How Much Linear Memory Access Is Enough?

(solidean.com)

50 points | by PhilipTrettner 3 days ago

2 comments

PhilipTrettner 3 days ago
I looked into this because part of our pipeline is forced to be chunked. Most advice I've seen boils down to "more contiguity = better", but without numbers, or at least not generalizable ones.
My concrete tasks already reach peak performance at chunk sizes below 128 kB, and I couldn't find pure processing workloads that benefit significantly beyond a 1 MB chunk size. Code is linked in the post; it would be nice to see results on more systems.
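For readers who want to see the shape of such an experiment, here is a minimal sketch in Python. It is not the code linked in the post: the names `chunked_sum` and `time_chunk_size`, the buffer size, and the chunk sizes are all assumptions made for illustration. It reads a buffer in fixed-size chunks visited in shuffled order, so access is linear within a chunk and scattered between chunks. Interpreter overhead dominates in pure Python, so treat the timings as a demonstration of the structure only, not as a measurement of memory behavior.

```python
import random
import time
from array import array


def chunked_sum(data, chunk_elems, order):
    """Sum `data` chunk by chunk, visiting the chunks in the given order."""
    view = memoryview(data)  # slicing a memoryview does not copy the data
    total = 0
    for c in order:
        start = c * chunk_elems
        # Access is linear inside a chunk; the jump happens between chunks.
        total += sum(view[start:start + chunk_elems])
    return total


def time_chunk_size(data, chunk_elems, seed=0):
    """Return (seconds per element, checksum) for one chunk size.

    Elements past the last full chunk are ignored.
    """
    n_chunks = len(data) // chunk_elems
    order = list(range(n_chunks))
    random.Random(seed).shuffle(order)  # scattered chunk placement
    t0 = time.perf_counter()
    total = chunked_sum(data, chunk_elems, order)
    elapsed = time.perf_counter() - t0
    return elapsed / (n_chunks * chunk_elems), total


if __name__ == "__main__":
    data = array("q", range(1 << 20))  # 8 MiB of 64-bit integers (assumed size)
    for chunk_elems in (16, 256, 4096, 65536):
        per_elem, _ = time_chunk_size(data, chunk_elems)
        size = chunk_elems * data.itemsize
        print(f"{size:>8} B chunks: {per_elem * 1e9:.1f} ns/element")
```

A compiled version of the same loop is what would actually expose the point where larger chunks stop helping.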
- twoodfin 3 hours ago
  Your results match similar analyses of database systems I’ve seen.
  64KB-128KB seems like the sweet spot.
_zoltan_ 1 hour ago
Is this an attempt at nerd sniping? ;-)
In GPU databases we sometimes go up to the GB range per "item of work" (input permitting), as it's very efficient.
I need to add it to my TODO list to have a look at your GitHub code...
- PhilipTrettner 50 minutes ago
  It definitely worked on me :)
  Do have a look, I've tried to keep it small and readable. It's effectively ~250 LOC.
  Also, this is CPU only. I'm not super sure what a good GPU version of my benchmark would be, though ... Maybe measuring a "map" more than a "reduction" like I do on the CPU? We should probably take a look at common chunking patterns there.
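To make the "map" versus "reduction" distinction concrete, here is a small CPU-side sketch in Python. The names `reduce_chunk`, `map_chunk`, and `run`, and the per-element kernel `2 * x + 1`, are hypothetical and not taken from the benchmark. A reduction turns each chunk into a single scalar, so it only reads memory; a map produces one output per input, so every chunk also writes an equally large region.

```python
from array import array


def reduce_chunk(chunk):
    """Reduction: many inputs in, one scalar out (read traffic only)."""
    return sum(chunk)


def map_chunk(chunk, out, offset):
    """Map: one output per input, so every chunk also writes memory."""
    for i, x in enumerate(chunk):
        out[offset + i] = 2 * x + 1  # hypothetical per-element kernel


def run(data, chunk_elems):
    """Apply both patterns chunk by chunk; return (reduction total, map output)."""
    view = memoryview(data)
    out = array("q", [0]) * len(data)  # zero-filled output buffer for the map
    total = 0
    for start in range(0, len(data), chunk_elems):
        chunk = view[start:start + chunk_elems]  # the last chunk may be shorter
        total += reduce_chunk(chunk)
        map_chunk(chunk, out, start)
    return total, out
```

Whether a GPU benchmark should time the map, the reduction, or both is left open here, as it is in the comment above.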