Fibonacci Hashing : How do we find out what this Fibonacci Hashing is?
In real world Fibonacci hashing is not implemented or used even a part of it, because it’s actually more (Time and Space) expensive for some reason. But Few parts of this Hashing we can use and understand.
So lets start understanding First Fibonacci series and then how its idea came to implement for Hashing to Hashtable.
Fibonacci series is …
F(n) = F(n-1) + F(n-2)
Which results into sequence as
1 1 2 3 5 8 13 21 34 55 89 and so on .....
Now as Nature always uses Fibonacci series to develop new things for various reason example in plants, a new leaf always come into sequence of fibonacci in order to get proper sunlight.
And Each subsequent number of fibonacci series follows Golden Ratio Φ = 1.6180
example 8/5 = 1.6 and 34/21=1.61 and as number grows its becoming more accurate.
image source google
If you want to draw 8 leaf in a circle such that each gets proper sunlight, you can just make each step around the circle be an angle of 360° / 8.
But What if you don’t know how many leaf gonna come in future ? You create a new leaf in angle of 360°/ Φ
To understand more How nature works with Fibonacci see this Video, which I found on youtube.
So coming back to how it can be used into Hashing and HashTable…
Any time that you want something to look randomly distributed, but you want to be sure that there are no clusters, you should at least try to see if you can use the golden ratio for that.
Let’s say our hash table is 1024 slots large, and we want to map an arbitrarily large hash value into that range. The first thing we do is we map it using the above trick into the full 64 bit range of numbers. So we multiply the incoming hash value with
2^64 / Φ ≈ 11400714819323198485
Multiplying with that number will overflow so we can take few bits for distribution, just as we wrapped around the circle in the flower example in video, this will wrap around the whole 64 bit range in a nice pattern, giving us an even distribution across the whole range from 0 to 2^64
Let’s just look at the upper three bits. So we can code like this :
ulong fibhash (ulong n, ulong mul = 11400714819323198485UL, ulong shift = 61UL) { return ( n * mul ) >> shift; }
This will return the upper three bits after doing the multiplication with the magic constant. And we’re looking at just three bits because it’s easy to see how the golden ratio wraparound behaves when we just look at the top three bits. If we pass in some small numbers for the hash value, we get the following results from this:
fibhash(0) == 0 fibhash(8) == 7 fibhash(1) == 4 fibhash(9) == 4 fibhash(2) == 1 fibhash(10) == 1 fibhash(3) == 6 fibhash(11) == 6 fibhash(4) == 3 fibhash(12) == 3 fibhash(5) == 0 fibhash(13) == 0 fibhash(6) == 5 fibhash(14) == 5 fibhash(7) == 2 fibhash(15) == 2 fibhash(16) == 7
This gives a pretty even distribution: The number 0 comes up three times, all other numbers come up twice. And every number is far removed from the previous and the next number. If we increase the input by one, the output jumps around quite a bit. So this is starting to look like a good hash function. And also a good way to map a number from a larger range into the range from 0 to 7.
The problems with integer modulo
So why is integer modulo for HashTable is bad? Two reasons: 1. It’s slow. 2. input data can not be evenly distributed to avoid/reduce collision.
: a modulo reduction involves a division, and divisions are expensive. Much more expensive than multiplications. A single 32-bit division on a recent x64 processor has a throughput of one instruction every six cycles with a latency of 26 cycles. In contrast, a multiplication has a throughput of one instruction every cycle and a latency of 3 cycles.
Why Fibonacci Hashing is the Solution
Fibonacci hashing solves both of these problems. 1. It’s really fast. It’s a integer multiplication followed by a shift. 2. It mixes up input and evenly distributes it.
You can read about both in few lines here : https://www.csee.umbc.edu/courses/undergraduate/341/fall99/frey/Lectures/Hashing/hashFunc.html
Finally Here is the link to implementation of unordered_map to understand implementation of Hashtable with Fibonacci from Rich Geldreich & Malte Skarupke.
Their other two hash tables are also there in code: flat_hash_map has very fast lookups and bytell_hash_map is also very fast but was designed more to save memory compared to flat_hash_map.
To Understand performance of flat_hash_map and other is here
Want to write such good article, please write at admin@gohired.in
See other mathematical article and puzzles at : http://gohired.in/tag/mathematical/
Rajesh says
August 23, 2018 at 2:50 pmTypo in series…
F(n) = F(n-1) + F(n-1)
Its F(n) = F(n-1) + F(n-2)
Dhaval Dave says
August 23, 2018 at 3:02 pmThanks for pointing out Rajesh