
To echo the meme around here, there appears to be an opportunity for a startup :)

If you can get buy-in from the research community, even in one niche field, I could see great utility in a central escrow house of sorts, where people can post research software, datasets, etc. and others can download the stuff and validate the results only after agreeing in some reasonably binding way not to use that information to scoop or copy the original researcher.

Buy-in in this case means that everybody would have to do it in order to publish in a certain venue. You might even do something fancy like giving people access to a VPS that runs the software without allowing validators to scp everything back to their own machines. Of course this is impossible to do perfectly; once you allow people to view the data in a text editor or hex dump, it is all over in a strict sense (scripts + screen-scraping), but it is still a step up from just handing over the files, and it keeps honest people honest.

You might also provide some anonymization/randomization service for datasets (for instance, by applying an unknown linear transform to absolute numbers in cases where the results don't change, similar to what was done here: http://googleresearch.blogspot.com/2010/01/google-cluster-da... ).
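
To make the linear-transform idea concrete, here is a rough sketch in Python. It is only an illustration under my own assumptions, not a description of what the linked post actually did: the function names, the sample numbers, and the ranges for the hidden scale and offset are all made up. The point is that a secret transform y = a*x + b with a > 0 hides the absolute numbers while leaving orderings and correlations intact, so a validator can still check results that depend only on those.

```python
import random


def make_secret_transform(rng):
    """Draw a hidden scale and offset (hypothetical ranges).

    The escrow service would keep these two numbers private.
    """
    scale = rng.uniform(0.5, 2.0)        # strictly positive, so ordering is preserved
    offset = rng.uniform(-100.0, 100.0)
    return scale, offset


def obscure(values, scale, offset):
    """Apply the same linear transform y = scale * x + offset to every value."""
    return [scale * v + offset for v in values]


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def ranking(values):
    """Indices of the values, sorted from smallest to largest value."""
    return sorted(range(len(values)), key=lambda i: values[i])


# Made-up sample data standing in for a sensitive column and a public one.
rng = random.Random(0)
cpu = [0.12, 0.40, 0.33, 0.90, 0.57]
mem = [1.1, 2.9, 2.0, 6.5, 4.2]

scale, offset = make_secret_transform(rng)
cpu_hidden = obscure(cpu, scale, offset)

# Validators see cpu_hidden, never cpu, scale, or offset.
same_order = ranking(cpu) == ranking(cpu_hidden)
corr_drift = abs(pearson(cpu, mem) - pearson(cpu_hidden, mem))
```

The obvious caveat: anyone who learns two true values can solve for the scale and offset, so this only obscures the numbers, in the same "keeps honest people honest" spirit as the VPS idea. It also only works for analyses that are invariant under the transform; anything that depends on absolute magnitudes would change.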

I think there are two reasons why researchers can be reluctant to share:

1) They are afraid of getting scooped by somebody else leveraging all the effort they put into developing the tools and curating the data and leapfrogging them.

2) Their results are suspect.

Such an escrow house would ameliorate the concern behind 1) and help suss out 2) :)

Anyone else have thoughts on this?



In biomedicine, the National Library of Medicine has been taking steps in this direction. There are repositories for sequence, protein structure, and microarray data, for example, and many journals require that relevant data be deposited in these repositories as a condition of publication.

Efforts such as the Science Commons are also thinking about/working towards these sorts of solutions.


Actually, the NIH has already done this with many recent grants. Everyone funded through certain grants has to make their data available in dbGaP (genotype and phenotype), sometimes even pre-publication (which is ruffling some feathers).

Point 1 will always be true when people share methods and/or data pre-publication. If point 2 cannot be discerned with currently available methods (and I would argue that it can), then we are in a world of trouble.



