
It’s honestly a bit of a pain. I’m using a library to help parse different formats, but there are many custom cases to handle. Dates are a good example. I’m parsing more than a dozen formats, and there’s no real pattern in how sites display their published dates. Some blogs even use unusual formats that aren’t common anywhere else.
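For illustration only, here is a minimal sketch of the try-each-format fallback that this kind of date parsing tends to come down to. It uses only Python's standard library, not the unnamed library mentioned above. The format list and the function name `parse_published_date` are hypothetical, and it assumes English month names (the default C locale).

```python
from datetime import datetime
from typing import Optional

# Hypothetical sample of formats; a real scraper would need many more.
KNOWN_FORMATS = [
    "%Y-%m-%dT%H:%M:%S%z",  # 2024-03-05T14:30:00+0000
    "%Y-%m-%d",             # 2024-03-05
    "%B %d, %Y",            # March 5, 2024
    "%b %d, %Y",            # Mar 5, 2024
    "%d %B %Y",             # 5 March 2024
    "%m/%d/%Y",             # 03/05/2024 (assumes month-first)
]


def parse_published_date(raw: str) -> Optional[datetime]:
    """Try each known format in order; return None if none match."""
    # Collapse runs of whitespace so stray spaces and newlines don't break matching.
    text = " ".join(raw.split())
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # Not this format; try the next one.
    return None
```

Order matters in a list like this: a string such as `03/05/2024` is ambiguous between month-first and day-first, so whichever format comes first wins. Returning `None` instead of raising lets the caller decide what to do with the one-off formats that nothing in the list covers.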

I try to avoid altering the original content as much as possible. I do need to sanitize and adjust parts of it to produce clean text on my site, but I’m careful not to change anything in a way that misrepresents the source. Only a few short phrases appear on GreatReads, and users cannot read the full article without visiting the original source.


