Koustav Chowdhury / Jun 05 2019
Remix of Julia by Nextjournal

Blog #2 : Julia Season of Code

This blog is about documenting certain performance benchmarks for RobinDict. I have pushed a bench_robin_dict.jl in my Pull Request, for the sole purpose of benchmarking the performance of RobinDict. In this blog, I'll solely focus on the benchmarking process of the insert operation of RobinDict.

Insert operation

There will be several stages mentioned here and with each step, I'll explain why the present stage is an improvement over the last one, (except the first one!).

Stage 1 : As I built it

RobinDict had a Load Factor (LF) of 90% when I wrote the code for it initially. The repository at that point of time can be viewed here.

Shift+Enter to run
.
Sample #1 Key => Integer , Size => 10^6 entries
.
        add_entries for RobinDict()
  885.254 ms (7189157 allocations: 201.70 MiB)
        add_entries for Dict()
  1.171 s (5981582 allocations: 156.44 MiB)
        add_entries for RobinDict{Int, Int}()
  391.609 ms (55 allocations: 92.00 MiB)
        add_entries for Dict{Int, Int}()
  180.072 ms (54 allocations: 65.17 MiB)
.
Sample #2 Key => Float32 , Size => 10^6 entries
.
        add_entries for RobinDict()
  1.194 s (7189193 allocations: 201.70 MiB)
        add_entries for Dict()
  1.010 s (5977346 allocations: 156.38 MiB)
        add_entries for RobinDict{Float32, Float32}()
  219.201 ms (53 allocations: 61.34 MiB)
        add_entries for Dict{Float32, Float32}()
  103.065 ms (52 allocations: 34.50 MiB)
.
Sample #3 Key => String , Size => 10^6 entries
.
        add_entries for RobinDict()
  1.252 s (1728897 allocations: 118.38 MiB)
        add_entries for Dict()
  565.276 ms (2562159 allocations: 104.26 MiB)
        add_entries for RobinDict{String, String}()
  902.739 ms (55 allocations: 92.00 MiB)
        add_entries for Dict{String, String}()
  519.767 ms (54 allocations: 65.17 MiB)
.
Sample #4 Key => Integer , Size => 10^7 entries
.
        add_entries for RobinDict()
  21.810 s (64824799 allocations: 1.71 GiB)
        add_entries for Dict()
  19.453 s (57818021 allocations: 1.39 GiB)
        add_entries for RobinDict{Int, Int}()
  3.139 s (73 allocations: 764.00 MiB)
        add_entries for Dict{Int, Int}()
  1.755 s (72 allocations: 541.17 MiB)
.
Sample #5 Key => Integer , Size => 10^5 entries
.
        add_entries for RobinDict()
  33.398 ms (435326 allocations: 14.64 MiB)
        add_entries for Dict()
  18.232 ms (397871 allocations: 11.74 MiB)
        add_entries for RobinDict{Int, Int}()
  16.032 ms (37 allocations: 8.00 MiB)
        add_entries for Dict{Int, Int}()
  6.086 ms (36 allocations: 5.67 MiB)
.
Sample #6 Key => Float32 , Size => 10^5 entries
.
        add_entries for RobinDict()
  46.769 ms (435328 allocations: 14.64 MiB)
        add_entries for Dict()
  28.369 ms (398505 allocations: 11.75 MiB)
        add_entries for RobinDict{Float32, Float32}()
  10.911 ms (35 allocations: 5.34 MiB)
        add_entries for Dict{Float32, Float32}()
  5.346 ms (34 allocations: 3.00 MiB)
.
Sample #7 Key => String , Size => 10^5 entries
.
        add_entries for RobinDict()
  56.725 ms (78017 allocations: 9.19 MiB)
        add_entries for Dict()
  25.956 ms (115939 allocations: 7.44 MiB)
        add_entries for RobinDict{String, String}()
  28.982 ms (37 allocations: 8.00 MiB)
        add_entries for Dict{String, String}()
  15.631 ms (36 allocations: 5.67 MiB)

Stage 2 : Finding an optimal Load Factor

Well, as you can see above, the performance of RobinDict is not good enough, both in terms of memory and time. There were a couple of reasons that I speculated.

  1. The high LF was leading to increasing probes when insertion were happening in bulky-yet-to-be-rehashed dictionary. So, I played around with several LF( decreasing it from 90%), and found 70% quite optimal for my purpose.
  2. Frequency of rehash! operation would be affecting performance, as it's kind of costly operation. So, I decided to reduce the frequency of rehash by increasing the size of my dictionary two times. The result of this operation is that when the rehash is completed, the dictionary will be at a LF of approx 35%. (I will optimize it further in stage 4). And, I have branched out of dependency on number of elements in the dictionary (the current implementation in base/dict.jl has this dependency). As of now, in base/dict.jl, it is dependent on h.count, but in my implementation, I've made it independent, doubling the size of the table, every time, rehash! function gets called.

Here is the performance after accounting for those two changes.

Shift+Enter to run
.
Sample #1 Key => Integer , Size => 10^6 entries
.
        add_entries for RobinDict()
  638.793 ms (6401685 allocations: 193.69 MiB)
        add_entries for Dict()
  1.607 s (5980440 allocations: 156.42 MiB)
        add_entries for RobinDict{Int, Int}()
  263.412 ms (85 allocations: 96.00 MiB)
        add_entries for Dict{Int, Int}()
  160.540 ms (54 allocations: 65.17 MiB)
.
Sample #2 Key => Float32 , Size => 10^6 entries
.
        add_entries for RobinDict()
  793.756 ms (6401468 allocations: 193.68 MiB)
        add_entries for Dict()
  1.802 s (5981097 allocations: 156.43 MiB)
        add_entries for RobinDict{Float32, Float32}()
  257.217 ms (83 allocations: 64.00 MiB)
        add_entries for Dict{Float32, Float32}()
  154.797 ms (52 allocations: 34.50 MiB)
.
Sample #3 Key => String , Size => 10^6 entries
.
        add_entries for RobinDict()
  961.626 ms (1465635 allocations: 118.37 MiB)
        add_entries for Dict()
  770.944 ms (2562102 allocations: 104.26 MiB)
        add_entries for RobinDict{String, String}()
  780.450 ms (85 allocations: 96.00 MiB)
        add_entries for Dict{String, String}()
  656.538 ms (54 allocations: 65.17 MiB)
.
Sample #4 Key => Integer , Size => 10^7 entries
.
        add_entries for RobinDict()
  10.117 s (55229355 allocations: 1.57 GiB)
        add_entries for Dict()
  13.596 s (57828624 allocations: 1.39 GiB)
        add_entries for RobinDict{Int, Int}()
  2.485 s (103 allocations: 768.01 MiB)
        add_entries for Dict{Int, Int}()
  1.985 s (72 allocations: 541.17 MiB)
.
Sample #5 Key => Integer , Size => 10^5 entries
.
        add_entries for RobinDict()
  47.155 ms (748606 allocations: 23.43 MiB)
        add_entries for Dict()
  20.671 ms (397648 allocations: 11.74 MiB)
        add_entries for RobinDict{Int, Int}()
  16.719 ms (67 allocations: 12.00 MiB)
        add_entries for Dict{Int, Int}()
  6.046 ms (36 allocations: 5.67 MiB)
.
Sample #6 Key => Float32 , Size => 10^5 entries
.
        add_entries for RobinDict()
  68.728 ms (748580 allocations: 23.43 MiB)
        add_entries for Dict()
  18.499 ms (397389 allocations: 11.73 MiB)
        add_entries for RobinDict{Float32, Float32}()
  14.241 ms (65 allocations: 8.00 MiB)
        add_entries for Dict{Float32, Float32}()
  5.147 ms (34 allocations: 3.00 MiB)
.
Sample #7 Key => String , Size => 10^5 entries
.
        add_entries for RobinDict()
  70.362 ms (181572 allocations: 14.77 MiB)
        add_entries for Dict()
  24.401 ms (116018 allocations: 7.44 MiB)
        add_entries for RobinDict{String, String}()
  51.670 ms (67 allocations: 12.00 MiB)
        add_entries for Dict{String, String}()
  17.472 ms (36 allocations: 5.67 MiB)

As you can see, there's a improvement in the performance. But the memory consumption is still a cause of concern.

Stage 3 : Optimizing memory use

Till now, I was implementing everything in a pretty much straight forward manner, without caring about memory. These benchmarks showed that there's something amiss.

It took me sometime to realize that the point where I was getting battered by Dict was slots array. My implementation had a substitute called dibs(explained in the previous blog). My dibs array was declared as an Int type. But, on careful inspection, I did find my maxprobe wasn't going overboard than 17, even in the worst possible scenarios. So, surely I can cut down on memory by declaring as an Int8 type. And, viola!

Shift+Enter to run
.
Sample #1 Key => Integer , Size => 10^6 entries
.
        add_entries for RobinDict()
  471.782 ms (6401567 allocations: 165.68 MiB)
        add_entries for Dict()
  951.101 ms (5992313 allocations: 156.60 MiB)
        add_entries for RobinDict{Int, Int}()
  153.989 ms (83 allocations: 68.01 MiB)
        add_entries for Dict{Int, Int}()
  118.592 ms (54 allocations: 65.17 MiB)
.
Sample #2 Key => Float32 , Size => 10^6 entries
.
        add_entries for RobinDict()
  740.828 ms (6401737 allocations: 165.69 MiB)
        add_entries for Dict()
  1.044 s (5970254 allocations: 156.27 MiB)
        add_entries for RobinDict{Float32, Float32}()
  136.102 ms (81 allocations: 36.01 MiB)
        add_entries for Dict{Float32, Float32}()
  105.329 ms (52 allocations: 34.50 MiB)
.
Sample #3 Key => String , Size => 10^6 entries
.
        add_entries for RobinDict()
  601.298 ms (1465598 allocations: 90.37 MiB)
        add_entries for Dict()
  556.020 ms (2562018 allocations: 104.26 MiB)
        add_entries for RobinDict{String, String}()
  442.931 ms (83 allocations: 68.01 MiB)
        add_entries for Dict{String, String}()
  414.266 ms (54 allocations: 65.17 MiB)
.
Sample #4 Key => Integer , Size => 10^7 entries
.
        add_entries for RobinDict()
  9.591 s (55229329 allocations: 1.35 GiB)
        add_entries for Dict()
  12.006 s (57828210 allocations: 1.39 GiB)
        add_entries for RobinDict{Int, Int}()
  1.806 s (101 allocations: 544.01 MiB)
        add_entries for Dict{Int, Int}()
  1.494 s (72 allocations: 541.17 MiB)
.
Sample #5 Key => Integer , Size => 10^5 entries
.
        add_entries for RobinDict()
  33.722 ms (748702 allocations: 19.93 MiB)
        add_entries for Dict()
  17.690 ms (398958 allocations: 11.76 MiB)
        add_entries for RobinDict{Int, Int}()
  11.765 ms (65 allocations: 8.50 MiB)
        add_entries for Dict{Int, Int}()
  5.552 ms (36 allocations: 5.67 MiB)
.
Sample #6 Key => Float32 , Size => 10^5 entries
.
        add_entries for RobinDict()
  34.496 ms (748634 allocations: 19.93 MiB)
        add_entries for Dict()
  18.088 ms (399388 allocations: 11.76 MiB)
        add_entries for RobinDict{Float32, Float32}()
  10.865 ms (63 allocations: 4.50 MiB)
        add_entries for Dict{Float32, Float32}()
  4.771 ms (34 allocations: 3.00 MiB)
.
Sample #7 Key => String , Size => 10^5 entries
.
        add_entries for RobinDict()
  42.924 ms (181544 allocations: 11.27 MiB)
        add_entries for Dict()
  19.146 ms (115751 allocations: 7.43 MiB)
        add_entries for RobinDict{String, String}()
  32.988 ms (65 allocations: 8.50 MiB)
        add_entries for Dict{String, String}()
  15.462 ms (36 allocations: 5.67 MiB)

Stage 4 : Almost there

Well, upon careful observation, it can be seen that rehash operation is the most expensive operation in terms of both memory and time. Reducing frequency of rehash calls, thus can help in squeezing out some performance. So, I decided to expand the dictionary 4 times (as opposed to 2 times, in stage 2), leaving the newly rehashed dictionary at an approx LF of 17%. The result is there for you to see!

Shift+Enter to run
.
Sample #1 Key => Integer , Size => 10^6 entries
.
        add_entries for RobinDict()
  414.494 ms (4935410 allocations: 165.98 MiB)
        add_entries for Dict()
  697.873 ms (5980104 allocations: 156.42 MiB)
        add_entries for RobinDict{Int, Int}()
  154.609 ms (48 allocations: 90.67 MiB)
        add_entries for Dict{Int, Int}()
  112.004 ms (54 allocations: 65.17 MiB)
.
Sample #2 Key => Float32 , Size => 10^6 entries
.
        add_entries for RobinDict()
  961.174 ms (4935424 allocations: 165.98 MiB)
        add_entries for Dict()
  1.038 s (5993783 allocations: 156.63 MiB)
        add_entries for RobinDict{Float32, Float32}()
  121.045 ms (46 allocations: 48.00 MiB)
        add_entries for Dict{Float32, Float32}()
  100.450 ms (52 allocations: 34.50 MiB)
.
Sample #3 Key => String , Size => 10^6 entries
.
        add_entries for RobinDict()
  546.071 ms (978070 allocations: 105.59 MiB)
        add_entries for Dict()
  592.393 ms (2562226 allocations: 104.27 MiB)
        add_entries for RobinDict{String, String}()
  456.497 ms (48 allocations: 90.67 MiB)
        add_entries for Dict{String, String}()
  439.412 ms (54 allocations: 65.17 MiB)
.
Sample #4 Key => Integer , Size => 10^7 entries
.
        add_entries for RobinDict()
  6.294 s (31743327 allocations: 847.03 MiB)
        add_entries for Dict()
  12.765 s (57816986 allocations: 1.39 GiB)
        add_entries for RobinDict{Int, Int}()
  1.336 s (54 allocations: 362.67 MiB)
        add_entries for Dict{Int, Int}()
  1.564 s (72 allocations: 541.17 MiB)
.
Sample #5 Key => Integer , Size => 10^5 entries
.
        add_entries for RobinDict()
  16.962 ms (383058 allocations: 11.51 MiB)
        add_entries for Dict()
  18.223 ms (398862 allocations: 11.75 MiB)
        add_entries for RobinDict{Int, Int}()
  6.815 ms (36 allocations: 5.67 MiB)
        add_entries for Dict{Int, Int}()
  5.561 ms (36 allocations: 5.67 MiB)
.
Sample #6 Key => Float32 , Size => 10^5 entries
.
        add_entries for RobinDict()
  16.879 ms (383075 allocations: 11.51 MiB)
        add_entries for Dict()
  18.585 ms (397571 allocations: 11.74 MiB)
        add_entries for RobinDict{Float32, Float32}()
  5.915 ms (34 allocations: 3.00 MiB)
        add_entries for Dict{Float32, Float32}()
  4.764 ms (34 allocations: 3.00 MiB)
.
Sample #7 Key => String , Size => 10^5 entries
.
        add_entries for RobinDict()
  20.587 ms (60727 allocations: 6.60 MiB)
        add_entries for Dict()
  18.973 ms (115979 allocations: 7.44 MiB)
        add_entries for RobinDict{String, String}()
  16.995 ms (36 allocations: 5.67 MiB)
        add_entries for Dict{String, String}()
  15.657 ms (36 allocations: 5.67 MiB)

Notice how the allocations and the time has gone down! Now, I believe this is an improvement that can be brought about in the current form of implementation in Dict also.

Looking ahead

Well, there are a few things that can still be done to eke out certain performance improvements. One of them is based on the fact that even in the worst possible scenarios maxprobe is within 31. So, I can use 6 bits instead of 8 bits provided by Int8. I'll try this on, sometimes later in the project.