I don't understand; 1) What's the 'top words' which appears when you search for ...

spgenot · on July 23, 2015

First of all, this is all still very early beta work! I'm not sure how it got to HN, but here it is, so all your feedback is welcome. I'll try to answer your questions as best I can:

1) The top words are those that best characterize the show. This is not an exact science: they are an output of the LDA algorithm, but they already give a good indication. Some words indeed shouldn't be there. Possible explanations: a subtitle mistake or a bug...

2) The words that appear more than once are again a glitch, and should be fixed. Again, work in progress...

3) The top topics are found using a topic modelling algorithm. It splits a corpus of documents into a number of topics, and every document contains a certain proportion of each topic (20% Police, 80% Terrorism, for example). The topics are bags of words, so we manually give each one the name we think fits best.

4) Again beta...
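
To make point 3 a bit more concrete, here is a toy sketch of topic modelling. It is not the code behind the site: it is a minimal collapsed Gibbs sampler for LDA using only the standard library, and the function name `lda_gibbs`, the parameters `alpha` and `beta`, and the example "subtitles" are all made up for illustration. It returns, for each document, a proportion per topic, and for each topic, its most frequent words (which a human would then name).

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenized documents.

    alpha and beta are smoothing priors (assumed values, not tuned).
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]   # count of word w in topic k
    topic_total = [0] * num_topics                        # total tokens in topic k

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic given all other assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed share of each topic in each document (each row sums to 1).
    proportions = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    # Topics are just bags of words; list the most frequent ones.
    top_words = [
        [w for w, c in tw.most_common(3) if c > 0] for tw in topic_word
    ]
    return proportions, top_words


# Made-up "subtitles", already tokenized.
docs = [
    "police officer arrest suspect police badge".split(),
    "bomb threat attack terror bomb cell".split(),
    "police arrest suspect bomb threat attack terror".split(),
]
proportions, top_words = lda_gibbs(docs, num_topics=2)
for doc_id, props in enumerate(proportions):
    print(doc_id, [round(p, 2) for p in props])
print(top_words)
```

The topics come out as unlabeled word lists, which is why the naming step is manual. On a corpus this small the result mostly shows the shape of the output, not a meaningful model.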

I hope the 'about' is clear enough. If you have any questions, feel free to ask!

testudovictoria · on July 23, 2015

3.) I think the 'Top Topics' are categorizations based upon the words found in the subtitles. Each word in the English language is mapped to a category, and based upon the word content of the show, it is assigned a category. While interesting in theory, it definitely misses the mark on certain shows. I assume the result for The Simpsons is due to their 27 Treehouse of Horror events, while the rest of the show does not necessarily have a central focus.

frazras · on July 23, 2015

Right! I saw profanity in The Big Bang Theory too, but unless it was bleeped out, I don't believe that has ever happened.

codereflection · on July 23, 2015

Same with Adventure Time

bvm · on July 23, 2015

and Frasier...?