The limitation on the map end is only a limitation for hadoop.
But the limit on the reducer is fundamental. Some reduce functions are not associative, and some don't even have type [T x U] -> T x U. In those cases, there is nothing to be done but redo the reduce.
Indeed, reduce is the difficult part. OTOH, I think this limitation is seen in many algorithms at a fairly fundamental level, and not just an artefact of MR. The only alternative framework I can think of for dealing with really large datasets in a distributed manner is sampling-based methods, with one-pass algorithms (or mostly one pass algorithm).