That's a limitation of Hadoop, though, not of the MR idea itself.

yummyfajitas · on June 12, 2011

The limitation on the map end is only a limitation for Hadoop.

But the limit on the reducer is fundamental. Some reduce functions are not associative, and some don't even have type [T x U] -> T x U. In those cases, there is nothing to be done but redo the reduce.
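
To make the associativity point concrete, here is a minimal Python sketch. The helper name `combine_partials` and the toy data are made up for illustration and have nothing to do with Hadoop's API. It reduces each chunk on its own and then reduces the partial results, which only matches the sequential answer when the operator is associative.

```python
from functools import reduce
import operator

def combine_partials(op, chunks):
    """Reduce each chunk independently, then reduce the partial results.

    Hypothetical helper: this matches a plain left-to-right reduce over
    the concatenated data only when `op` is associative.
    """
    partials = [reduce(op, chunk) for chunk in chunks]
    return reduce(op, partials)

data = [10, 3, 2, 1]
chunks = [data[:2], data[2:]]  # pretend each chunk lives on a different node

# Addition is associative: the partial results combine to the full answer.
add_full = reduce(operator.add, data)
add_chunked = combine_partials(operator.add, chunks)

# Subtraction is not: the chunked result differs from the sequential one,
# so the only safe option is to redo the reduce over all of the data.
sub_full = reduce(operator.sub, data)
sub_chunked = combine_partials(operator.sub, chunks)

print(add_full, add_chunked)
print(sub_full, sub_chunked)
```

With addition the two numbers agree; with subtraction they do not, which is the case where partial results can't be reused.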

cdavid · on June 12, 2011

Indeed, reduce is the difficult part. OTOH, I think this limitation shows up in many algorithms at a fairly fundamental level, and is not just an artefact of MR. The only alternative framework I can think of for dealing with really large datasets in a distributed manner is sampling-based methods, with one-pass (or mostly one-pass) algorithms.
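
As one example of what a one-pass, sampling-based method can look like, here is a rough sketch of reservoir sampling (the classic "Algorithm R") in Python. This is my own pick of technique for illustration, not something tied to any particular framework, and the function name and parameters are assumptions. It keeps a uniform random sample of `k` items from a stream of unknown length while looking at each item exactly once.

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Return up to k items drawn uniformly at random from `stream`.

    Single pass, O(k) memory. `rng` is an optional random.Random
    instance, accepted here only so runs can be made reproducible.
    """
    rng = rng or random.Random()
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Keep the new item with probability k / (i + 1),
            # evicting a uniformly chosen current member.
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                sample[j] = item
    return sample

sample = reservoir_sample(range(1_000_000), 10, random.Random(0))
print(sample)
```

Whatever expensive computation you wanted can then run on the small sample instead of the whole dataset, at the cost of getting an estimate rather than an exact answer.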