
The only downside for me is that not only do you have to find a massively parallel problem to work on, but the computation per work unit also needs to take much longer than the network latency. With network latency in the ms range, the per-unit computation needs to be really slow to benefit.
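To make the latency point concrete, here is a back-of-the-envelope sketch (all numbers are illustrative assumptions, not measurements): if shipping a batch of work costs one network round trip, the batch has to contain enough compute to make that round trip negligible.

```python
import math

# Assumed, illustrative numbers:
RTT_S = 2e-3       # 2 ms round-trip network latency
PER_ITEM_S = 5e-6  # 5 microseconds of compute per item

def min_items_to_amortize(rtt_s: float, per_item_s: float,
                          overhead_ratio: float = 0.1) -> int:
    """Smallest batch size for which the round trip is at most
    `overhead_ratio` of the compute time shipped in the batch."""
    return math.ceil(rtt_s / (overhead_ratio * per_item_s))

# With these assumed rates, each batch needs ~4000 items before the
# 2 ms round trip drops below 10% of the work it carries.
print(min_items_to_amortize(RTT_S, PER_ITEM_S))  # → 4000
```

With faster per-item compute (or higher latency) the required batch size grows proportionally, which is the commenter's point: cheap computations need very large batches to be worth distributing.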


Not necessarily. It can also just be a vast amount of data, in which case bandwidth (which is generally pretty good), not latency, is your limiting factor with mapreduce.

Also, you only need one part of your infrastructure to require mapreduce's parallelism in order to argue for using it across the board. If you have simpler problems to solve, you may as well solve them with mapreduce since you'll be thinking in that computation model anyway, and you can more easily use the results later in a computation that does require mapreduce.


Your problem still has to have the property that processing the dataset locally is slower than sending it over the network, processing it remotely, and getting the result back, though...
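That trade-off can be sketched as a simple cost model. All rates below are assumed for illustration (a 100 GB dataset, a single local machine at 200 MB/s, a 1 GB/s network link, 100 remote workers at 200 MB/s each, and a small 1 MB result):

```python
def local_time(data_bytes: float, local_rate: float) -> float:
    """Seconds to process the dataset on one local machine."""
    return data_bytes / local_rate

def distributed_time(data_bytes: float, bandwidth: float,
                     remote_rate: float, workers: int,
                     result_bytes: float) -> float:
    """Seconds to ship the data out, process it in parallel,
    and ship the (small) result back."""
    return (data_bytes / bandwidth                 # send data over the network
            + data_bytes / (remote_rate * workers) # parallel processing
            + result_bytes / bandwidth)            # return the result

GB = 1e9
print(local_time(100 * GB, 200e6))                       # → 500.0 (seconds)
print(distributed_time(100 * GB, 1e9, 200e6, 100, 1e6))  # → ~105.0 (seconds)
```

Under these assumed numbers distribution wins, but note that the transfer term dominates the distributed cost: if the network were slower than ~200 MB/s here, shipping the data would already take longer than just processing it locally.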



