It’s honestly a bit of a pain. I’m using a library to help parse different formats, but there are many custom cases to handle. Dates are a good example. I’m parsing more than a dozen formats, and there’s no real pattern in how sites display their published dates. Some blogs even use unusual formats that aren’t common anywhere else.
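To make the date problem concrete, here is a minimal sketch of one common way to cope with many formats: try a list of known patterns in order and fall back to "unknown" when none match. This is not the actual GreatReads code or the library mentioned above. The function name `parse_published_date`, the `CANDIDATE_FORMATS` list, and the choice of day-first for slash-separated dates are all assumptions made for illustration, and the month-name patterns assume an English locale.

```python
from datetime import datetime
from typing import Optional

# Hypothetical list of formats; a real scraper would grow this as new sites appear.
CANDIDATE_FORMATS = [
    "%Y-%m-%dT%H:%M:%S%z",  # 2024-03-05T14:30:00+0000
    "%Y-%m-%d",             # 2024-03-05
    "%B %d, %Y",            # March 5, 2024
    "%b %d, %Y",            # Mar 5, 2024
    "%d %B %Y",             # 5 March 2024
    "%d/%m/%Y",             # 05/03/2024 (day-first is an assumption)
]


def parse_published_date(raw: str) -> Optional[datetime]:
    """Return the first successful parse, or None if no known format matches."""
    # Collapse runs of whitespace so stray spaces and newlines do not break matching.
    text = " ".join(raw.split())
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # This format did not match; try the next one.
    return None
```

Returning `None` instead of raising keeps a single oddly formatted blog from failing the whole import. Order matters here: an ambiguous string such as `05/03/2024` is read by whichever matching pattern comes first in the list.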
I try to avoid altering the original content as much as possible. I do need to sanitize and adjust parts of it to produce clean text on my site, but I’m careful not to change anything in a way that misrepresents the source. Only a few short phrases appear on GreatReads, and users cannot read the full article without visiting the original source.