
The advantage of tombstones is that it is dead simple to implement. Backshift removals can cause a lot of memory shuffling if you are removing an element from a very congested part of the hash table.
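To illustrate how simple tombstones are, here is a minimal sketch of an open-addressing table with tombstone deletion (integer keys, toy fixed capacity; all names and the layout are illustrative assumptions, not any particular library's code):

```c
#include <assert.h>

/* Slot states for open addressing with tombstones (illustrative toy). */
enum { EMPTY, USED, DELETED };

#define CAP 8  /* power of two for cheap masking; toy size */
static struct { int key; int state; } tab[CAP];

static unsigned home(int key) { return (unsigned)key & (CAP - 1); }

/* Probing must skip DELETED slots but may stop at EMPTY ones. */
static int find(int key) {
    unsigned i = home(key);
    while (tab[i].state != EMPTY) {
        if (tab[i].state == USED && tab[i].key == key) return (int)i;
        i = (i + 1) & (CAP - 1);
    }
    return -1;
}

static void insert(int key) {
    unsigned i = home(key);
    while (tab[i].state == USED) i = (i + 1) & (CAP - 1);
    tab[i].key = key;
    tab[i].state = USED;
}

/* The entire deletion: just flip the state. */
static void erase(int key) {
    int i = find(key);
    if (i >= 0) tab[i].state = DELETED;
}
```

Note that `erase` never moves any entry; the cost is that every later search must probe past the dead slot.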


Backshift has two potential disadvantages: 1) copying objects may be slower than testing equality (case dependent); 2) each deletion involves the entire probe chain (with tombstones, you inspect half of the probe chain on average). On the other hand, with tombstones, each probe chain is longer. This slows down searching and insertion as well as deletion. Tombstones also increase the frequency of rehashing.

> The advantage of tombstones is that it is dead simple to implement.

Backshift is harder to understand, but it actually takes fewer LOC to implement in my hands. That is because insertion and searching become simpler (and slightly faster) with backshift.

> Backshift removals can cause a lot of memory shuffling

Not sure what you mean by memory shuffling. Backshift is applied to linear probing only. You don't jump around except when you wrap around the end of the bucket array, which is rare.
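For concreteness, a minimal sketch of backshift deletion under linear probing (integer keys, toy fixed capacity; names are made up, and this is a generic illustration rather than the code discussed above):

```c
#include <assert.h>
#include <stdbool.h>

#define CAP 8  /* power of two for cheap masking; toy size */

typedef struct { int key; bool used; } Slot;
static Slot table[CAP];

static unsigned home(int key) { return (unsigned)key & (CAP - 1); }

/* Linear-probe insert; assumes the table never fills up. */
static void insert(int key) {
    unsigned i = home(key);
    while (table[i].used) i = (i + 1) & (CAP - 1);
    table[i].key = key;
    table[i].used = true;
}

static int find(int key) {  /* returns slot index or -1 */
    unsigned i = home(key);
    while (table[i].used) {
        if (table[i].key == key) return (int)i;
        i = (i + 1) & (CAP - 1);
    }
    return -1;
}

/* Backshift deletion: fill the hole by sliding later entries of the
 * probe chain backward when their home slot permits, so no tombstone
 * is ever left behind. */
static void erase(int key) {
    int s = find(key);
    if (s < 0) return;
    unsigned i = (unsigned)s, j = i;
    for (;;) {
        j = (j + 1) & (CAP - 1);
        if (!table[j].used) break;        /* end of probe chain */
        unsigned k = home(table[j].key);  /* home slot of the candidate */
        /* Move table[j] into the hole unless its home lies cyclically
         * in (i, j], in which case shifting it would break its chain. */
        if (i <= j ? (k <= i || k > j) : (k <= i && k > j)) {
            table[i] = table[j];
            i = j;
        }
    }
    table[i].used = false;
}
```

As the thread says, the moves only walk forward through adjacent slots, wrapping around the array at most once.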


> Backshift has two potential disadvantages: 1) copying objects may be slower than testing equality (case dependent);

I would wager that is true in most cases.

> This slows down searching and insertion as well as deletion. Tombstones also increase the frequency of rehashing.

Yes, but you could argue that is an advantage; the hash table reaches its "natural size" faster.

> Backshift is harder to understand, but it actually takes fewer LOC to implement in my hands. That is because insertion and searching become simpler (and slightly faster) with backshift.

I looked at your code and, afaict, it only supports linear probing. Backshift removals would be much harder or maybe even impossible to implement with quadratic probing or double hashing.


> I would wager that is true in most cases.

Strings. In C, comparing two strings involves a function call, a loop and a potential cache miss. Copying a string just moves a pointer, which is much cheaper. The additional copying costs little for strings.

> the hash table reaches its "natural size" faster.

Not sure what you mean by this. The "natural size" is the size without tombstones. Clearly backshift is the better one.

> it only supports linear probing

Yes, backshift only supports linear probing so far as I know. Most high-performance hash table libraries use linear probing these days. A few use quadratic probing. I rarely see double hashing.

Anyway, backshift always uses less memory and is always faster for insertion and searching. Under certain deletion loads, backshift may be slower due to 1) and 2), but the difference is fairly small considering the savings in memory.


> Strings. In C, comparing two strings involves a function call, a loop and a potential cache miss. Copying a string just moves a pointer, which is much cheaper. The additional copying costs little for strings.

A better approach would be to store the string's hashcode somewhere and only perform the element-wise comparison when the hashcodes match. For most distinct strings, the hashcodes are unlikely to match.
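A sketch of that idea: keep the hash next to the pointer and compare hashes before falling back to `strcmp`. All names here are illustrative, and `djb2` is just one simple hash chosen for the example:

```c
#include <string.h>

/* Store each string's hash alongside the pointer; compare hashes
 * first and call strcmp only on a hash match. */
typedef struct { const char *s; unsigned long h; } HStr;

static unsigned long djb2(const char *s) {  /* simple illustrative hash */
    unsigned long h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

static HStr hstr(const char *s) { HStr x = { s, djb2(s) }; return x; }

/* Equal only if the cached hashes match AND the bytes match; unequal
 * strings almost always fail the cheap hash test first. */
static int hstr_eq(HStr a, HStr b) {
    return a.h == b.h && strcmp(a.s, b.s) == 0;
}
```

The hash is typically already computed during probing, so caching it per entry costs one extra word per slot.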

> Not sure what you mean by this. The "natural size" is the size without tombstones. Clearly backshift is the better one.

You can count the fraction of the table consumed by tombstones and grow it a little earlier if that fraction is significant.
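One way to express that rule: count tombstones toward the load factor, so a tombstone-heavy table rehashes (or compacts) earlier than its live count alone would require. The 75% threshold here is an illustrative assumption, not a universal constant:

```c
#include <stddef.h>

/* Trigger a rehash once live entries plus tombstones exceed 3/4 of
 * capacity. On rehash, a table whose live count is still small can be
 * rebuilt at the same capacity, which simply sweeps the tombstones out. */
static int should_rehash(size_t live, size_t tombstones, size_t cap) {
    return (live + tombstones) * 4 > cap * 3;
}
```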

> Yes, backshift only supports linear probing so far as I know. Most high-performance hash table libraries use linear probing these days. A few use quadratic probing. I rarely see double hashing.

Yes, but you are trading amazing performance in the average case against awful performance in the degenerate case. Poor hash functions or adversarial input could cause congestion in parts of the hash table.


> Yes, but you are trading amazing performance in the average case against awful performance in the degenerate case. Poor hash functions or adversarial input could cause congestion in parts of the hash table.

It is not just me. As I said, most high-performance hash tables these days (absl, F14, and all Robin Hood implementations) use linear probing. Providing a proper default hash function avoids most worst cases in practice. In addition, the hash library can use a secondary hash function [1], which makes the library much more robust to bad hash functions such as the identity hash function.

[1] https://attractivechaos.wordpress.com/2018/10/01/advanced-te...
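One common form of such a secondary hash is a multiplicative mix applied on top of the user-supplied hash, so that even a weak hash (e.g. the identity function) spreads keys across the table. The sketch below uses the 64-bit Fibonacci constant (2^64 divided by the golden ratio); this is a generic illustration, not necessarily the exact scheme in the linked post or in any particular library:

```c
#include <stdint.h>

/* Scramble the user hash so that consecutive or low-entropy hashes
 * still map to well-spread slots. The multiplier is 2^64/phi
 * (Fibonacci hashing); the xor-shift folds high bits into low bits. */
static uint64_t scramble(uint64_t h) {
    h *= 11400714819323198485ull;
    return h ^ (h >> 31);
}

/* Map the scrambled hash to a power-of-two table of size 1 << bits,
 * taking the top bits, which the multiplication mixes best. */
static uint64_t slot_of(uint64_t h, unsigned bits) {
    return scramble(h) >> (64 - bits);
}
```

With this in place, an identity hash over consecutive integer keys no longer produces one long probe chain.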


Since you need to do collision counting for security reasons anyway, it's trivial to check the backshift threshold while searching for your key. If you have too many tombstones in your chain, backshift.


If you are using a load factor around 75%, it means you'll do at most 3 shifts. On average it is less than that, since you are ordering your map on each removal, and the only case where you'll do backshifts is when your items are not ordered according to the hash function.

Tombstones are silly: if you add and remove 75% of your map's capacity, you then have to rehash your map even though it is empty. I can't think of a single scenario where tombstones perform better. They cause much more probing and many more cache misses than backshift deletion.



