Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I don't think it needs to be more efficient than the CPU to merit moving to the GPU. If the horsing around to get the data in and out is less work than doing the job, then you may was well put the GPU to work and improve your total throughput.

Perhaps the memory page deduplication candidate detection could run out there. It would be memory bound, but maybe by not ruining the CPU cache it would be a win. (This is important for systems running a bunch of virtual machines.)



"the horsing around to get the data in and out" seems to be the key factor. An analysis of BLAS libraries' performance across several architectures [1] showed that GPU-based calculation only approached implementations like Goto BLAS with matrix dimensions well up into the thousands. That's just one example, but there seems to be a fair bit of overhead in getting the data to and from the GPU.

[1] http://dirk.eddelbuettel.com/blog/code/gcbd/




Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: