Hacker News

If you had to implement the hash table every time you wanted to use one, you'd probably realize that there is a better alternative data structure. Quite often your data is not as randomly distributed as you may initially think it is.


One of the cool nuances of hash tables is that you can exploit the nonrandomness of your data to get a more uniform distribution of keys over hash buckets than you would get with an ideal random-oracle hash function. For example, it's very common to use hash functions which will produce four consecutive output values (possibly shuffled) for string keys "r0", "r1", "r2", and "r3", which guarantees that those keys will not collide with each other. Hashpjw is one such widely-used hash function; FNV is another. (Better not use linear probing with such a hash function, though! And it's probably better not to use them in cases where the keys might be chosen adversarially to produce a DoS.)


> Quite often your data is not as randomly distributed as you may initially think it is.

That's why it's important to pick the right hash function. A hash table usually goes a long way before it becomes a performance issue.


What?


I think the parent comment is highlighting that a hash table is just one implementation of a "Map" abstract data type. Depending on the use case, other implementations (such as a balanced search tree, which is what std::map in the C++ STL uses) may have better performance.


True, but as far as I know it's rarely the worst choice you can make, and it's plenty fast for a lot of applications. I don't know that there's a single data structure that's always better, which is what the GP seems to be implying.


The STL is always the wrong solution for performance.

set: the worst. unordered_set: the worst. vector: no small vector optimizations. string: horrible. array: use small_vector instead. deque: slow. list and forward_list: nobody uses that for performance.


Or for the example of word frequencies in the article, a prefix trie.


I tried a trie with data that I thought would work perfectly with it. There were a good number of shared prefixes. But it ended up performing worse than a hash map. I don't remember all the details now, but after seeing that I'll just keep reaching for hash maps first (unless I'm sure that there's only going to be a very small number of elements, in which case I'll just iterate in order over a vector).





