
The only downside for me is that not only do you have to find a massively parallel problem to work on, but the computation per work unit also needs to take much longer than the network latency. With network latency in the ms range, the per-unit computation needs to be really slow to benefit.
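To make the latency point concrete, here is a back-of-the-envelope sketch (all numbers are illustrative assumptions, not measurements): if shipping a batch of work costs one network round trip, the batch has to contain enough compute to make that round trip negligible.

```python
import math

# Assumed, illustrative numbers:
RTT_S = 2e-3       # 2 ms round-trip network latency
PER_ITEM_S = 5e-6  # 5 microseconds of compute per item

def min_items_to_amortize(rtt_s: float, per_item_s: float,
                          overhead_ratio: float = 0.1) -> int:
    """Smallest batch size for which the round trip is at most
    `overhead_ratio` of the compute time shipped in the batch."""
    return math.ceil(rtt_s / (overhead_ratio * per_item_s))

# With these assumed rates, each batch needs ~4000 items before the
# 2 ms round trip drops below 10% of the work it carries.
print(min_items_to_amortize(RTT_S, PER_ITEM_S))  # → 4000
```

With faster per-item compute (or higher latency) the required batch size grows proportionally, which is the commenter's point: cheap computations need very large batches to be worth distributing.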


Not necessarily. It can also just be a vast amount of data, in which case bandwidth (which is generally pretty good), not latency, is your limiting factor with mapreduce.

Also, you only need one part of your infrastructure to require mapreduce's parallelism in order to argue for using it across the board. If you have simpler problems to solve, you may as well solve them with mapreduce since you'll be thinking in that computation model anyway, and you can more easily use the results later in a computation that does require mapreduce.


Your problem still has to have the property that processing the dataset locally is slower than sending it over the network, processing it remotely, and getting the result back, though...
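That trade-off can be sketched as a simple cost model. All rates below are assumed for illustration (a 100 GB dataset, a single local machine at 200 MB/s, a 1 GB/s network link, 100 remote workers at 200 MB/s each, and a small 1 MB result):

```python
def local_time(data_bytes: float, local_rate: float) -> float:
    """Seconds to process the dataset on one local machine."""
    return data_bytes / local_rate

def distributed_time(data_bytes: float, bandwidth: float,
                     remote_rate: float, workers: int,
                     result_bytes: float) -> float:
    """Seconds to ship the data out, process it in parallel,
    and ship the (small) result back."""
    return (data_bytes / bandwidth                 # send data over the network
            + data_bytes / (remote_rate * workers) # parallel processing
            + result_bytes / bandwidth)            # return the result

GB = 1e9
print(local_time(100 * GB, 200e6))                       # → 500.0 (seconds)
print(distributed_time(100 * GB, 1e9, 200e6, 100, 1e6))  # → ~105.0 (seconds)
```

Under these assumed numbers distribution wins, but note that the transfer term dominates the distributed cost: if the network were slower than ~200 MB/s here, shipping the data would already take longer than just processing it locally.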



