Thanks. The Twitter API gives t.co links, so the next step will be to crawl the linked sites/documents and pull a little more data from them, as well as purge the blogs that got past the filter. From there we can use 'do follow' links to let search engines find the documents.
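The link-resolution step above could be sketched roughly like this, assuming Python; the function names are illustrative, not from our actual code. Extracting the t.co links is a simple regex match, and resolving one means following its HTTP redirects to the final URL:

```python
import re
import urllib.request

# Matches Twitter's t.co short links in raw tweet text.
TCO_RE = re.compile(r"https?://t\.co/\w+")

def extract_tco_links(text):
    """Return all t.co short links found in a tweet's text."""
    return TCO_RE.findall(text)

def resolve(url, timeout=10):
    """Follow redirects and return the final URL (makes a network call)."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.geturl()
```

Comparing the resolved URLs would also make it easy to drop duplicates before indexing.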
Not very helpful when there is very little indication of what each link is or whether it is a duplicate, and the tweets are truncated. How is this more useful than searching Twitter?
We've been hacking away for the past few hours. I've not yet had time to look into the links etc., so data is still scarce.
As for 'How is this more useful than searching twitter?', I think it's nice to have everything in one place and listed. We're collecting data which we can manipulate at a later date when we have more time. As I see it, gathering the data first and then analysing it is far better than coding everything up front, losing lots of data in the meantime, and only being able to analyse a small portion of it.
Nice idea. Question - Is there a way to "Storify" these archives? If not, you could build a UI that helps users search through the data. Another idea would be to collect information on the linked PDFs.