SPO - Stage 1

While looking for project ideas, I ran across an article comparing several popular hash table implementations and their performance: http://incise.org/hash-table-benchmarks.html .

The author picked some C++ libraries that include hash tables (Google’s dense & sparse hash tables, GCC’s unordered_map, GLib’s GHashTable, the Qt & Boost implementations) and also ran tests against Python’s dict and Ruby’s hash map (just for ‘fun’, as he says). Now, the text is about 8 years old, and it would be interesting to re-run the benchmarking code and see how much has changed since then. I did try running it with 2 of the 7 libraries (not willing to build all of them), but the process was taking too long and also seemed extremely resource hungry. I should leave it running overnight later or tweak some parameters; after all, I am still curious to see whether anything has improved since 2009.

If you bother to go into the GitHub repo containing the benchmarking code, you can find some interesting forks, including this one (adds some new hash map implementations) and this one (faster, plus benchmark additions such as counting unique words in the Bible). The point is, the article’s author made it very easy to add your own hash tables and benchmark them against popular alternatives.

I thought it would be interesting to take a look at the list of ‘Losers’, since hash tables use hash algorithms (duh), which we are supposed to optimize for the SPO course. It is important to note that bad hash-table performance may be caused by other factors, including data structure design flaws. The library I will be looking at, Boost, was labeled a ‘Fail Club’ member for the following reasons (you can look up the graphs/stats in the article):

  • bad performance for over ~15 million string entries
  • same for integers (execution time for both sequential & random inserts gets really bad over 20 million entries)

This is slightly alarming, since the issues might not be caused by a poor hash function at all. Nevertheless, I decided to give it a try and see if anything can be done here. Boost is a hodgepodge of various tools that haven’t made it into the C++ Standard Library yet, including linear algebra & math stuff, multithreading, image processing, data structures, graphs and many other things (some of them sound very exotic). Since it’s both portable and a library, the code base is guaranteed to be a macro hell. On the other hand, if I fail to optimise the hash function itself (spoiler alert: I think I did), I can take a look at the various library components that use the hash algorithm and see if anything can be done there.

Building The Library

Building Boost is very easy. In fact, you don’t need to build it at all (the stuff we will use is header-only). Just grab the latest version:

wget https://dl.bintray.com/boostorg/release/1.65.1/source/boost_1_65_1.tar.bz2


tar --bzip2 -xf  boost_1_65_1.tar.bz2

You can test that it works with the following code:

#include <boost/functional/hash.hpp>
#include <iostream>
#include <string>

int main() {
    boost::hash<std::string> string_hash;
    std::size_t h = string_hash("Hash me");
    std::cout << h << std::endl;
    return 0;
}

Build with GCC; you might need to point at the Boost folder with the -I parameter:

c++ -O3  -g -I boost_1_65_1 hello_hash.cpp -o hello_hash

This should work (too bad if it didn’t).

Since I will be focusing on one particular Boost component (functional/hash), it may be enough to clone just that module, https://github.com/boostorg/functional , and include it in the code. The reason I downloaded the super-project is that I wanted to see where/how the hash function is used across the library.

The boost.org docs say that the hash implementation is used by Boost.Unordered, Boost.Intrusive and some other modules. It can work with the following datatypes:

  • integers
  • floats
  • pointers
  • strings

You can also define a custom hash implementation for your own data structure (in case you want to use your own datatype with a hash table).

Dissecting Hash

The first thing to do is to familiarise ourselves with the functional/hash code and see how it works under the hood. I will use GDB to jump between breakpoints and step into the library code. For the debugger to be useful, you need to pass the -g flag to GCC and preferably disable any optimisation (just for now).

The simple program below hashes 3 datatypes (integers, floats and strings) and prints the results. I set breakpoints on each call to a hash function in the for loop:

#include <boost/functional/hash.hpp>
#include <cstdio>
#include <string>

int main(int argc, char** argv) {
    // Test arrays (I know, very good naming conventions)
    std::string strings[4] = {"Tuesday", "Wednesday", "Thursday", "Friday"};
    int ints[4]      = {1, 2, 3, 4};
    float floats[4]  = {1.3f, 0.1f, 5501.23f, 0.1401f};
    boost::hash<int>   int_hash;
    boost::hash<float> float_hash;
    boost::hash<std::string> string_hash;
    for (int i = 0; i < 4; i++) {
        size_t hash1 = int_hash(ints[i]);        // Breakpoint 1
        size_t hash2 = float_hash(floats[i]);    // Breakpoint 2
        size_t hash3 = string_hash(strings[i]);  // Breakpoint 3
        printf("Hashed Int    : %d into %lu \n", ints[i], hash1);
        printf("Hashed Float  : %f into %lu \n", floats[i], hash2);
        printf("Hashed String : %s into %lu \n", strings[i].c_str(), hash3);
    }
    return 0;
}

The output:

Hashed Int    : 1 into 1 
Hashed Float  : 1.300000 into 1067869798 
Hashed String : Tuesday into 2547806126 
Hashed Int    : 2 into 2 
Hashed Float  : 0.100000 into 1036831949 
Hashed String : Wednesday into 1222668964 
Hashed Int    : 3 into 3 
Hashed Float  : 5501.229980 into 1168894423 
Hashed String : Thursday into 1820509330 
Hashed Int    : 4 into 4 
Hashed Float  : 0.140100 into 1041200736 
Hashed String : Friday into 3247388519 

The implementation of hash will vary depending on the datatype you pass, but the program flow seems to have the following general pattern:

Upon a boost::hash<datatype> call, execution is taken to a type-specific hash_value overload, which in turn calls the specific hash implementation (e.g. hash_combine).


You’ve probably noticed that the integer hash output looks very ‘sophisticated’. In fact, if you step into the int hash implementation, all you’ll see is a one-liner casting the passed argument to size_t:

template <typename T>
typename boost::hash_detail::basic_numbers<T>::type hash_value(T v)
{
    return static_cast<std::size_t>(v);
}

Any non-negative number will hash to itself, and any negative one will wrap around when cast to unsigned (e.g. if you pass -4 to the hash function, it will return 4294967292).

The floating point hasher seems a bit more interesting: if you pass a float value, as in boost::hash<float>()(1.3f), program flow is taken to the hash.hpp file:

And the library will pick a function according to the data type passed (float_hash_value in this case):

template <typename T>
typename boost::hash_detail::float_numbers<T>::type hash_value(T v)
{
    return boost::hash_detail::float_hash_value(v);
}

Next comes float_hash_value, which performs an is_zero check before dispatching to the hash implementation itself:

template <class T>
inline std::size_t float_hash_value(T v)
{
    return boost::hash_detail::is_zero(v) ? 0 : float_hash_impl(v, 0);
}

Well, almost there. The float value is cast to a char buffer:

template <class Float>
inline std::size_t float_hash_impl(Float v,
    BOOST_DEDUCED_TYPENAME boost::enable_if_c<
        enable_binary_hash<Float, 24, 128>::value,
        std::size_t>::type)
{
    return hash_binary((char*) &v, 4);
}

We’ve landed in the float-specific implementation of the hash inside the hash_float.hpp header; the length passed to hash_binary equals 4, which is smaller than the 8-byte size_t (the program was compiled for 64-bit). The buffer manipulation does look pretty cryptic to me; from what I understand, the binary hash splits ptr into size_t-sized chunks (how many depends on the length), and these data blocks get hashed one by one.

inline std::size_t hash_binary(char* ptr, std::size_t length)
{
    std::size_t seed = 0;

    if (length >= sizeof(std::size_t)) {
        std::memcpy(&seed, ptr, sizeof(std::size_t));
        length -= sizeof(std::size_t);
        ptr += sizeof(std::size_t);

        while (length >= sizeof(std::size_t)) {
            std::size_t buffer = 0;
            std::memcpy(&buffer, ptr, sizeof(std::size_t));
            hash_float_combine(seed, buffer);
            length -= sizeof(std::size_t);
            ptr += sizeof(std::size_t);
        }
    }

    if (length > 0) {
        std::size_t buffer = 0;
        std::memcpy(&buffer, ptr, length);
        hash_float_combine(seed, buffer);
    }

    return seed;
}

The float combine (the hash algorithm itself) looks as follows:

inline void hash_float_combine(std::size_t& seed, std::size_t value)
{
    seed ^= value + (seed<<6) + (seed>>2);
}

In fact, this hash algorithm is present in other (not only float-specific) combine functions as well. It is called Shift-Add-XOR: it was originally designed as a string hashing algorithm but was later adapted for other datatypes (here it’s used for floats).

The string_hash may take you to a very similar algorithm:

template <typename SizeT>
inline void hash_combine_impl(SizeT& seed, SizeT value)
{
    seed ^= value + 0x9e3779b9 + (seed<<6) + (seed>>2);
}

Interestingly enough, there is also a MurmurHash implementation right beside Shift-Add-XOR, though I couldn’t figure out where/when it’s used. I was only able to invoke it directly; maybe the Murmur hash is architecture-specific (my guess is that it’s what gets picked on a 32-bit build, where std::size_t matches boost::uint32_t), who knows (who cares)?

// moved macro for convenience
#define BOOST_FUNCTIONAL_HASH_ROTL32(x, r) (x << r) | (x >> (32 - r))

inline void hash_combine_impl(boost::uint32_t& h1,
                              boost::uint32_t k1)
{
    const uint32_t c1 = 0xcc9e2d51;
    const uint32_t c2 = 0x1b873593;

    k1 *= c1;
    k1 = BOOST_FUNCTIONAL_HASH_ROTL32(k1,15);
    k1 *= c2;

    h1 ^= k1;
    h1 = BOOST_FUNCTIONAL_HASH_ROTL32(h1,13);
    h1 = h1*5+0xe6546b64;
}

Anyway, I hope this walk-through wasn’t too confusing; here’s a graph demonstrating the float_hash execution flow (I guess this may be somewhat helpful).


I decided to test the two algorithms (MurmurHash & Shift-Add-XOR) and measure their performance:

#include <boost/functional/hash.hpp>
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <limits>

#define SIZE_R 10000000

int main(int argc, char** argv) {
    boost::uint32_t* uints  = (boost::uint32_t*) malloc(SIZE_R * sizeof(boost::uint32_t));
    size_t* murmur_hashes   = (size_t*) malloc(SIZE_R * sizeof(size_t));
    size_t* standard_hashes = (size_t*) malloc(SIZE_R * sizeof(size_t));
    for (boost::uint32_t i = 0; i < SIZE_R; i++) {
        uints[i] = rand() % std::numeric_limits<boost::uint32_t>::max();
    }

    ////////////////// HASH STARTS HERE //////////////////

    clock_t begin = clock();
    /// MURMUR HASH ///
    for (int i = 0; i < SIZE_R; i++) {
        boost::uint32_t h = 0;
        // invoking the Murmur-style combine directly (see above)
        boost::hash_detail::hash_combine_impl(h, uints[i]);
        murmur_hashes[i] = h;
    }
    clock_t end = clock();
    double ts = (double)(end - begin) / CLOCKS_PER_SEC;
    printf("Murmur hash took : %.3f\n", ts);

    long long sum_check_1 = 0;
    for (int i = 0; i < SIZE_R; i++) {
        sum_check_1 += murmur_hashes[i];
    }

    begin = clock();
    /// Standard Impl ///
    for (int i = 0; i < SIZE_R; i++) {
        boost::hash<boost::uint32_t> reg_hash;
        standard_hashes[i] = reg_hash(uints[i]);
    }
    end = clock();
    ts = (double)(end - begin) / CLOCKS_PER_SEC;
    printf("Standard hash took : %.3f\n", ts);

    long long sum_check_2 = 0;
    for (int i = 0; i < SIZE_R; i++) {
        sum_check_2 += standard_hashes[i];
    }

    ////////// PRINT & FREE //////////////
    printf("Sum checks = %lld, %lld \n", sum_check_1, sum_check_2);
    free(uints);
    free(murmur_hashes);
    free(standard_hashes);
    return 0;
}

First, I will check if anyone else is logged in:

obelavin pts/0     23:04    0.00s  0.17s  0.01s w

Seems like I am good to go!

The program output:

With -O0

Murmur hash took : 0.800
Standard hash took : 0.895
Sum checks = 21473488956304150, 10738138201479754 

With -O3

Murmur hash took : 0.070
Standard hash took : 0.041
Sum checks = 21475772362107714, 10738138201479754

Optimization Strategy

Honestly, I have zero idea what to do now. Issue number 1 is that everything gets vectorized at -O3, so simply rewriting the rotations in assembly with vector instructions would be pretty meaningless (thank you, compiler; here’s proof, by the way: https://gist.github.com/belavina/0b6c66e69cb8ab2719ff8c531866a73f )

I think what I should do is attempt to optimize the modules that use the hash functions rather than rewriting the hash algorithms themselves (maybe enforcing vectorization one level up). Maybe benchmarking the Boost hash table is a better idea than benchmarking the hash algorithm itself.

Written on December 17, 2017