Mark Gritter (markgritter) wrote,
Mark Gritter

Better Hash

I came up with a two-level hashing implementation that works better than the two algorithms I described previously. Here's the idea:

First hash the key to a bucket (again I used bucket = ( key ^ x ) % a.) This bucket is a 32-byte structure containing a multiplier m, modulus b and 14 data entries forming a secondary hash table. The second bucket is given by (( key * m ) % b) & 0xf. We just need to search for appropriate (a, x) and per-bucket (b, m) such that there are no collisions (and no secondary buckets > 14.)

This is easily accomplished for a ~ 52C5, x = 0, m = 1, which uses two less operations to generate the hashes (eliminating the XOR and multiplication.) With m=1 fixed we can also use a 15th entry in the secondary hash table. That doesn't run any faster than previous efforts despite the secondary hash table fitting into a single cache line.

However, we can lower the table size and increase the cache hit rate. a=600071, x=0 gives a table that comes in around 1.85-1.88 seconds per 10M lookups, about 30% improvement over hash_map. I'm looking for an 'a' value that will allow m=1 (to eliminate the multiply) but haven't found one yet.
Tags: algorithm, hashing, programming
  • Post a new comment


    default userpic

    Your reply will be screened

    Your IP address will be recorded 

    When you submit the form an invisible reCAPTCHA check will be performed.
    You must follow the Privacy Policy and Google Terms of use.