Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Not necessarily. It can also just be a vast amount of data, in which case bandwidth (which is generally pretty good), not latency, is your limiting factor with mapreduce.

Also, you only need one part of your infrastructure to require mapreduce's parallelism in order to argue for using it across the board. If you have simpler problems to solve, you may as well solve them with mapreduce since you'll be thinking in that computation model anyway, and you can easier use the results later in a computation that may require mapreduce.



Your problem still has to have the property that loading and processing the dataset needs to be slower than sending it over the network and getting the result back, though...




Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: