This would be great, and is greatly needed. But I have a question for the performance junkies here:
If the cost of atomically setting bits in the mark bitmap turns out to be
high, we could instead dedicate a byte per object for the mark. This idea is
mentioned in GC literature [Garner, 2007]. Obviously, this involves an 8×
increase in memory overhead. It’s likely that on modern hardware, the
cost of the atomic bit operation is small, while the cost of increasing
the cache footprint of the mark structure is probably large.
I have been trained to think of an atomic operation at the CPU level as flushing the cache line, possibly out of L1 and beyond, impacting all cores sharing that cache. This statement implies the opposite. Can anyone comment on this vis-à-vis a modern Intel chip?
This sounds wrong to me, too. For the single-core case, there's no such thing as an atomic byte store: the underlying mechanism reads a cache line and modifies it. The same goes for a bit operation (possibly needing more instructions).
For the multi-core case, there's always going to be cache-line thrashing regardless of mark size. Cache lines will evict and transfer between cores as each accesses them.
Probably best to go with a single bit, unless it can be shown that the overhead of generating a bit mask and doing the read-modify-write is far more expensive.
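To make the trade-off concrete, here's a minimal sketch of the two mark schemes being debated. The type and method names are hypothetical, not from any real runtime: the bitmap path needs index math, a mask, and a CAS loop to make the read-modify-write safe against concurrent markers, while the bytemap path is a single store (which is atomic at the hardware level on x86/ARM, though a plain store like this would still be a data race under Go's memory model if raced).

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Hypothetical mark structures, for illustration only.
type bitmap []uint32
type bytemap []uint8

// markBit sets bit i atomically: compute word and mask, then
// CAS-loop the read-modify-write until it sticks.
func (b bitmap) markBit(i uint) {
	word := &b[i/32]
	mask := uint32(1) << (i % 32)
	for {
		old := atomic.LoadUint32(word)
		if old&mask != 0 || atomic.CompareAndSwapUint32(word, old, old|mask) {
			return
		}
	}
}

// markByte is one plain store: no mask generation, no CAS loop,
// at the cost of 8x the space.
func (m bytemap) markByte(i uint) {
	m[i] = 1
}

func main() {
	b := make(bitmap, 2)
	b.markBit(5)
	b.markBit(37)
	m := make(bytemap, 64)
	m.markByte(5)
	fmt.Printf("%#x %#x %d\n", b[0], b[1], m[5]) // prints 0x20 0x20 1
}
```

Either way, both paths end up doing a full cache-line read and write at the hardware level; the bitmap just adds the mask bookkeeping and the retry loop on top.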
An alternative to side bitmaps is a bytemap, a byte-grained data structure on the side. Bytemaps trade an 8× space overhead to avoid synchronizing on each set operation (if we assume support for atomic byte-grained stores).
I think the idea is that, since a bytemap is less dense, two cores less often touch the same cache lines, leading to fewer invalidations. Though I wonder if it pays off, since it also means less locality.
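The density intuition can be put in numbers. Assuming a 64-byte cache line and 16-byte objects (both just illustrative figures), one cache line of bitmap holds the marks for 512 objects, so it covers 8 KiB of heap; one cache line of bytemap holds only 64 marks, covering 1 KiB. Two cores contend on a mark line only when they mark objects within the same covered region, so the bytemap shrinks that sharing window 8-fold:

```go
package main

import "fmt"

func main() {
	const cacheLine = 64 // bytes per cache line, typical on x86
	const objSize = 16   // assumed small-object size in bytes

	bitmapMarks := cacheLine * 8 // one mark *bit* per object
	bytemapMarks := cacheLine    // one mark *byte* per object

	// Heap region covered by a single cache line of mark metadata:
	fmt.Println(bitmapMarks * objSize)  // prints 8192 (8 KiB per line)
	fmt.Println(bytemapMarks * objSize) // prints 1024 (1 KiB per line)
}
```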
No, it is referring to the fact that the smallest unit that can be written to memory isn't a bit but a byte. To write a bit, the whole byte must be read, modified, and then written back.
Isn't the atomic part of that sentence referring to the fact that they are changing bits separately? That's a bit inefficient, since most architectures simply don't allow it: to set a single bit, the writer must read the whole byte, modify it, and write the whole byte back. This is slower than simply writing the whole byte.
That is, I don't think they are referring to the same concept of atomicity as you are.
The point is that atomicity doesn't work like this on x86, ARM, and most other architectures. It's the right conclusion from the wrong analysis: the difference in speed will be the time spent managing a packed bitmap, not the atomicity itself being any faster.
From the diagram, it looks like the spans point to arbitrary offsets within the mark bitmaps. Does that mean there's a lookup table to map from a span to its mark bitmap/offset?