
That's a very valid point. Hadoop does a lot more than handle the basic logic of MapReduce (which, btw, is not terribly complicated). Fault tolerance, preemptive duplication of tasks in case of unexpected slowness (speculative execution), etc. are all part of the ugliness that Hadoop takes care of on large clusters.

That said, I can see this being useful for prototyping MapReduce code quickly in Python without going through the hassle of setting up Hadoop and coding in Java.
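
To be concrete, the basic logic really does fit in a few lines of Python; here's a toy single-process version of the map/shuffle/reduce cycle you'd prototype (word count as the stock example, all names illustrative):

    from collections import defaultdict

    def mapfn(key, value):
        # Emit (word, 1) for every word in the input line.
        for word in value.split():
            yield word, 1

    def reducefn(key, values):
        # Sum the counts collected for each word.
        return sum(values)

    def run_local(data, mapfn, reducefn):
        # Shuffle phase: group mapped values by intermediate key.
        groups = defaultdict(list)
        for k, v in data.items():
            for mk, mv in mapfn(k, v):
                groups[mk].append(mv)
        # Reduce phase: fold each group down to a single result.
        return {k: reducefn(k, vs) for k, vs in groups.items()}

    print(run_local({0: "a b a"}, mapfn, reducefn))  # {'a': 2, 'b': 1}

Once the logic checks out locally, you can worry about distributing it.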



Mincemeat.py actually does do fault tolerance and preemptive duplication. I'm trying to get it to handle all of the MapReduce logic (and reliability, security, etc.) without the extra burdens: no distributed file system to set up (the machines I've been running it on share a hefty NFS mount, which is good enough for most purposes), no preallocation of the machines that will be part of the cluster (my university uses Condor, which spawns processes randomly across a large cluster), and so on.
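
For the curious, driving a job looks roughly like this (essentially the word-count example from the project page; the data and password are placeholders):

    import mincemeat

    # Toy datasource; in practice the values could be paths on the NFS share.
    data = ["humpty dumpty sat on a wall",
            "humpty dumpty had a great fall"]
    datasource = dict(enumerate(data))

    def mapfn(k, v):
        for w in v.split():
            yield w, 1

    def reducefn(k, vs):
        return sum(vs)

    s = mincemeat.Server()
    s.datasource = datasource
    s.mapfn = mapfn
    s.reducefn = reducefn

    # Blocks until the connected workers finish. Workers can join at any
    # time (e.g. spawned by Condor):
    #     python mincemeat.py -p changeme <server-host>
    results = s.run_server(password="changeme")
    print(results)

The map and reduce functions are shipped to the workers over the wire, so a Condor job only needs mincemeat.py itself and that one command line.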


In that case, kudos! One of my pet gripes about Hadoop is that it used to be all but unusable without HDFS (I don't know if that's changed). If you truly manage to develop it into a stable release with all the features of Hadoop, I can definitely see it making some inroads in scientific computation.



