
Any chance you would consider explaining it in something other than Haskell (if that is what it is)?


Sure:

My algorithm is something I made up, and from memory it works like this:

1) Remove HTML, stem the words, remove stopwords, etc.

2) Sort the unique words by how often they appear in the text, most frequent first.

3) Split the original text on sentence boundaries.

4) Include each sentence that first mentions the next most popular word, until the summary is the maximum length requested.
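
For anyone who would rather read code than a list, here is a rough Python sketch of the four steps above. It is not the original implementation, just one reading of the description. Several things are assumptions made for illustration: the tiny `STOPWORDS` set, the naive `stem` helper (a stand-in for a real stemmer), the regex-based HTML stripping and sentence splitting, measuring the maximum length in characters via a hypothetical `max_chars` parameter, and emitting the chosen sentences in their original order.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "on",
             "for", "that", "this", "with", "as", "are", "was"}


def strip_html(text):
    # Crude tag removal, good enough for a sketch.
    return re.sub(r"<[^>]+>", " ", text)


def stem(word):
    # Naive suffix stripping; a stand-in for a real stemmer such as Porter's.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def terms(text):
    # Step 1: lowercase, tokenize, drop stopwords, stem.
    words = re.findall(r"[a-z']+", text.lower())
    return [stem(w) for w in words if w not in STOPWORDS]


def summarize(raw, max_chars=200):
    text = strip_html(raw)

    # Step 2: unique words, most frequent first (ties keep first-seen order).
    ranked = [word for word, _ in Counter(terms(text)).most_common()]

    # Step 3: split on sentence boundaries (end punctuation plus whitespace).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    sentence_terms = [set(terms(s)) for s in sentences]

    # Step 4: for each word in rank order, take the first sentence that
    # mentions it, stopping once the next sentence would exceed the limit.
    chosen = []
    length = 0
    for word in ranked:
        idx = next((i for i, ts in enumerate(sentence_terms) if word in ts), None)
        if idx is None or idx in chosen:
            continue
        if length + len(sentences[idx]) > max_chars:
            break
        chosen.append(idx)
        length += len(sentences[idx])

    # Assumption: present the selected sentences in their original order.
    return " ".join(sentences[i] for i in sorted(chosen))


if __name__ == "__main__":
    doc = "<p>Cats sleep a lot. Dogs bark. Cats and dogs play. Cats purr.</p>"
    print(summarize(doc, max_chars=30))  # prints "Cats sleep a lot. Dogs bark."
```

In the small example, "cat" is the most frequent stem and first appears in the first sentence, "dog" is next and first appears in the second, and the third candidate sentence would push the summary past 30 characters, so selection stops there.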

http://news.ycombinator.com/item?id=1803020

Googling turns up http://sujitpal.blogspot.com.au/2009/02/summarization-with-l... which compares a few approaches.

Edit: Also http://stackoverflow.com/questions/2829303/given-a-document-..., which I think is by Dn_Ab who wrote the OP.



