I believe his argument is that it would reduce thrashing between the caches. With the GIL, ownership of the cache line containing a given object's reference count only has to be transferred at most once per timeslice. If multiple threads were accessing a Python object concurrently, that line would ping-pong back and forth between caches much more frequently.
EDIT: Also, "stop whatever they are doing to synchronize the dirty cache lines with RAM" is not a very good way to describe what is going on; oftentimes you don't have to hit RAM at all, since the caches synchronize directly with each other. It is still pretty bad for performance, though.
>I believe his argument is that it would reduce thrashing between the caches. With the GIL, ownership of the cache line containing a given object's reference count only has to be transferred at most once per timeslice.
Ah. Makes sense.
>just synchronize between each other
Yes, but that's bad because that cache line is 'stuck' for all processors while the synchronization is occurring, if I'm not mistaken...
In general, at least the two processors with the conflict will either have to stall for a bit or switch to another hardware thread while the write conflicts are occurring. There are lots of architectural tricks to mitigate the impact, but the reality is that frequently mutating shared state (e.g. reference counts) makes it extremely difficult to get good performance with threads running in parallel.