Thanks. The Twitter API gives t.co links, so the next step will be to crawl the linked sites/documents and pull a little more data from them, as well as purge the blogs that got past the filter. From there we can use 'do follow' links to let search engines find the documents.
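The link-resolution step above could be sketched roughly like this, assuming Python; the function names are illustrative, not from our actual code. Extracting the t.co links is a simple regex match, and resolving one means following its HTTP redirects to the final URL:

```python
import re
import urllib.request

# Matches Twitter's t.co short links in raw tweet text.
TCO_RE = re.compile(r"https?://t\.co/\w+")

def extract_tco_links(text):
    """Return all t.co short links found in a tweet's text."""
    return TCO_RE.findall(text)

def resolve(url, timeout=10):
    """Follow redirects and return the final URL (makes a network call)."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.geturl()
```

Comparing the resolved URLs would also make it easy to drop duplicates before indexing.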
Not very helpful when there is very little indication of what each link is or whether it is a duplicate, and the tweets are truncated. How is this more useful than searching Twitter?
We've been hacking away for the past few hours. I've not yet had time to look into the links etc., so data is still scarce.
As for 'How is this more useful than searching twitter?', I think it's nice to have everything in one place and listed. We're collecting data which we can manipulate at a later date when we have more time. As I see it, gathering the data first and then analysing it is far better than coding everything up front, losing lots of data in the meantime, and only being able to analyse a small portion of it.
Nice idea. Question - Is there a way to "Storify" these archives? If not, you could build a UI that helps users search through the data. Another idea would be to collect information on the linked PDFs.