
That's a limitation of Hadoop, though, not of the MR idea itself.


The limitation on the map end is only a limitation of Hadoop.

But the limit on the reducer is fundamental. Some reduce functions are not associative, and some don't even have the type [T x U] -> T x U. In those cases, there is nothing to be done but redo the whole reduce.
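
To make the associativity point concrete, here is a rough Python sketch. The choice of sum and median, the helper names, and the sample data are all mine, purely for illustration. A sum can be updated by folding new values into the old result; an exact median cannot be rebuilt from the old median alone, so the full input has to be reduced again.

```python
from functools import reduce
from statistics import median

def incremental_sum(old_total, new_values):
    # Sum is associative: fold the new batch into the previous result.
    return reduce(lambda a, b: a + b, new_values, old_total)

def recomputed_median(all_values):
    # No merge step exists for an exact median: the old median alone
    # is not enough, so the whole input is reduced again.
    return median(all_values)

old = [1, 2, 100]   # hypothetical data already processed
new = [3, 4]        # hypothetical data that arrived later

# Reusing the old result is safe for the sum.
total = incremental_sum(sum(old), new)
assert total == sum(old + new)

# Merging partial medians is not the same as the median of everything.
naive = median([median(old), median(new)])
exact = recomputed_median(old + new)
assert naive != exact
```

Approximate or sketch-based medians do exist, but that is a different trade-off from the exact reduce discussed here.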


Indeed, reduce is the difficult part. OTOH, I think this limitation shows up in many algorithms at a fairly fundamental level, and is not just an artefact of MR. The only alternative framework I can think of for dealing with really large datasets in a distributed manner is sampling-based methods, with one-pass (or mostly one-pass) algorithms.
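
As one example of what a one-pass, sampling-based method can look like, here is a minimal reservoir sampling sketch in Python. Reservoir sampling is my pick for illustration, not something named above, and the function name and parameters are hypothetical. It keeps a uniform random sample of k items from a stream of unknown length while reading each item exactly once.

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Return k items chosen uniformly at random from stream, in one pass."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Keep the new item with probability k / (i + 1) by picking
            # a slot in [0, i] and replacing only if it falls inside
            # the reservoir.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample

# Estimate the mean of a large stream from a small sample.
sample = reservoir_sample(range(1_000_000), 10, random.Random(0))
estimate = sum(sample) / len(sample)
```

Memory use is O(k) regardless of stream length, which is the appeal when the dataset is too large to revisit.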



