# Hash Table

 This is a stub or unfinished. Contribute by editing me.

A hash table is a dictionary-like data structure that provides $O(1)$ lookup time. You associate keys with values (e.g. a dictionary associates words with definitions). Often a hash table is implemented using a hash function and an ordinary array. Collisions (when multiple keys map to the same array index) must be dealt with somehow or the hash table is severely crippled.

Typically a standard hash table library (see UsingHashMap for details) will be adequate.

There are two completely separate algorithms involved in a hash table -- the "hash function" and the "collision resolution algorithm". Any hash function can be mix-and-matched with any collision resolution algorithm.

## Hash Functions

Most hash functions attempt to minimize the number of collisions, by attempting to spread all the expected keys uniformly over the array. In a few rare cases where all possible values are known ahead of time, it is possible to design a "perfect hash function" that maps each value to a different key, which eliminates all collisions.

### Key Length

An incredibly simple hash function could be the length of the key. This is a terrible hash function for (hopefully) obvious reasons.

### Key Value

If the key is given as a C-like string of length L, we can define the key value as such:

key[0] + key[1] + ... + key[L-2] + key[L-1]

This is a common but poor hash function. For example, "foo", "oof", and "fnp" all hash to the same index.

### non-uniform hash functions

Has anyone done research on non-uniform hash functions? Non-uniform hash functions that deliberately make several related values hash to the same key?

• In a spelling checker using a Bloom filter, it's OK if a hash function allows thousands of non-words (such as strings that begin with "qqtt") to "collide" to a single key, as long as no correctly-spelled word collides with that same key.
• When searching a table of information using a person's name as the "value", it may be useful to have common mis-spellings of that name all hash to the same key.
• Since certain patterns in Hashlife occur overwhelmingly more often than other patterns, it may be useful to have a hash function that makes each of the most-frequent patterns hash to a key with no collisions, even if it causes less-frequent patterns to hash to a key with an average of 2 or 3 collisions, and some extremely rare patterns to a key with dozens of collisions.
• Most "hash functions" attempt to simulate a "random oracle", to have "well-mixed ... avalanche". However, the hash function used in Python dictionaries has horrible avalanche behavior, but it gives better results on typical hash tables than a perfect random oracle (and therefore better than even cryptographically secure hashes).[1].