
I use HXT to parse HTML. AFAICT, Hexpat doesn't do much besides parse the XML file into a tree. It doesn't have the niceties that Nokogiri or BeautifulSoup do. For example, I can use Nokogiri to get all the links on a page like so: page.css("a").

HXT allows me to come close to this:

tree >>> getXPathTreesInDoc "//a"

But I haven't seen a single Haskell XML parsing library that is as nice as Nokogiri.
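For comparison with the Nokogiri one-liner, here is a minimal sketch of pulling every link's href out of a page using only Text.XML.HXT.Core, with no XPath package. The helper name extractLinks and the sample markup are made up for illustration, and it assumes the hxt package is installed.

```haskell
import Text.XML.HXT.Core

-- Hypothetical helper (not part of HXT): collect the href of every <a> element.
extractLinks :: String -> IO [String]
extractLinks html =
  runX $
    -- Parse the string leniently as HTML and silence parser warnings.
    readString [withParseHTML yes, withWarnings no] html
      -- Walk the tree and keep the elements named "a".
      >>> deep (isElem >>> hasName "a")
      -- Read the href attribute of each match ("" if it is missing).
      >>> getAttrValue "href"

-- Made-up sample document, for illustration only.
sampleHtml :: String
sampleHtml =
  "<html><body><a href=\"/a\">A</a><p><a href=\"/b\">B</a></p></body></html>"

main :: IO ()
main = extractLinks sampleHtml >>= mapM_ putStrLn
```

Note that deep stops descending once it finds a match, so it is not an exact stand-in for "//a"; multi keeps going and also returns nested matches. For links that makes no practical difference, since anchors should not nest.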



In my work, I read in XML, parse its elements, attributes, and data, producing new XML. Along with Parsec, Hexpat is well-suited to the task.

I haven't had to parse HTML in Haskell. I use BeautifulSoup for that. I wouldn't be surprised if the Haskell libraries aren't as useful for that kind of thing.


I wrote up a guide to working with HTML in HXT: http://adit.io/posts/2012-04-14-working_with_HTML_in_haskell...

You might find it handy if you decide to give HXT another go :)



