Fnv hasher with default capacity #44

greyblake · 2018-09-21T18:30:05Z

This PR goes on top of #43

Changes:

Initialize entities with a default capacity > 0.

This may save some extra memory allocations.

The benchmarks are again slightly contradictory.
Please treat this PR only as an idea. Feel free to reject it if it does not bring any useful improvement.
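
For readers following along, the idea can be sketched roughly as below. This is a minimal illustration, not the code from this PR: `Fnv1aHasher`, `FnvHashMap`, and `new_map` are hypothetical names, the hand-written FNV-1a hasher only stands in for whatever hasher the branch actually uses so that the snippet compiles with the standard library alone, and the capacity of 10 is taken from the diff in the review below.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

// Capacity value taken from the diff in this PR; everything else is illustrative.
const DEFAULT_HASHMAP_CAPACITY: usize = 10;

/// Minimal 64-bit FNV-1a hasher (hypothetical stand-in for an external FNV crate).
pub struct Fnv1aHasher(u64);

impl Default for Fnv1aHasher {
    fn default() -> Self {
        // 64-bit FNV offset basis.
        Fnv1aHasher(0xcbf2_9ce4_8422_2325)
    }
}

impl Hasher for Fnv1aHasher {
    fn write(&mut self, bytes: &[u8]) {
        for &byte in bytes {
            // FNV-1a: XOR the byte in first, then multiply by the 64-bit FNV prime.
            self.0 ^= u64::from(byte);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3);
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

pub type FnvBuildHasher = BuildHasherDefault<Fnv1aHasher>;
pub type FnvHashMap<K, V> = HashMap<K, V, FnvBuildHasher>;

/// Create a map that has room for DEFAULT_HASHMAP_CAPACITY entries up front,
/// so the first few inserts do not trigger a reallocation.
pub fn new_map<K, V>() -> FnvHashMap<K, V> {
    HashMap::with_capacity_and_hasher(DEFAULT_HASHMAP_CAPACITY, FnvBuildHasher::default())
}

fn main() {
    let mut map: FnvHashMap<String, i64> = new_map();
    // The standard library guarantees at least the requested capacity.
    assert!(map.capacity() >= DEFAULT_HASHMAP_CAPACITY);
    map.insert("answer".to_string(), 42);
    println!("{:?}", map.get("answer"));
}
```

Whether pre-allocating helps depends on how many entries a typical map ends up holding: for mostly empty maps the up-front allocation is wasted work, which may be one reason the benchmarks are inconclusive.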

mre · 2018-09-21T21:19:54Z

src/lib.rs

 use std::fmt;
 use std::marker::PhantomData;

 use pyo3::prelude::*;
 use serde::de::{self, DeserializeSeed, Deserializer, MapAccess, SeqAccess, Visitor};
 use serde::ser::{self, Serialize, SerializeMap, SerializeSeq, Serializer};

+const DEFAUL_HASHMAP_CAPACITY: usize = 10;


should be DEFAULT_HASHMAP_CAPACITY I guess. 😉

Oh right. Shame on me)

mre · 2018-09-21T21:23:45Z

Thanks for your PRs @greyblake. For sure sounds like a nice idea to try.
I'm gonna run the benchmarks, but what I would really love to do is run a profiler on the new and the old version to see what changed. Just haven't gotten around to adding this to the project (see #38).

mre · 2018-09-21T21:45:00Z

So I ran the benchmarks on my machine, and the values for this branch are very close to those of the master branch, well within the standard deviation. That means I can't make any conclusive judgement as to which version is faster. Maybe others can try to reproduce this on their machines?

main.txt
fnv-bench.txt

greyblake · 2018-09-22T14:37:38Z

Here are my benchmarks for Python 3.5, done on my laptop (Debian).

master.txt
fnv-hasher.txt
fnv-hasher-with-default-capacity.txt

Here are the results aggregated in one spreadsheet: https://docs.google.com/spreadsheets/d/1vrERpk-QLZYLQOHu8nh6fIAeTdEbNxp5bJEIPc-N-MM/edit?usp=sharing

However, if I run the benchmark multiple times I get different results, so based on this it's hard to judge whether this is a real improvement.
Would it make sense to increase the number of iterations in the benchmarks?

mre · 2018-09-22T15:59:16Z

Pretty similar to what I saw in my benchmarks.
You can control the number of iterations and rounds with the following parameters:

pipenv run pytest benchmarks --benchmark-min-rounds=BENCHMARK_MIN_ROUNDS --benchmark-warmup-iterations=NUM

The docs are here.
I haven't tried that myself, though. 😉

greyblake · 2018-09-22T18:35:46Z

Ok, running the following command:

time pipenv run pytest benchmarks  --benchmark-warmup-iterations=100000 --benchmark-min-rounds=100000

This gives me a more or less reproducible result (I've tried it twice).
The fnv-hasher branch is slightly faster than master, and fnv-hasher-with-default-capacity is slightly faster than fnv-hasher. In particular (total time):

master:                           2m40.388s
fnv-hasher:                       2m38.193s
fnv-hasher-with-default-capacity: 2m37.594s
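
To put the totals above in relative terms, here is a small sketch that converts the `time` output into seconds and computes how much faster each branch is than master. The helper names `parse_wall_time` and `relative_speedup_percent` are made up for this illustration; only the three timings come from the run above. Both branches come out under 2% faster than master.

```rust
/// Parse a `time`-style wall-clock string such as "2m40.388s" into seconds.
fn parse_wall_time(s: &str) -> Option<f64> {
    let (minutes, rest) = s.trim().split_once('m')?;
    let seconds = rest.strip_suffix('s')?;
    let minutes: f64 = minutes.parse().ok()?;
    let seconds: f64 = seconds.parse().ok()?;
    Some(minutes * 60.0 + seconds)
}

/// Percentage by which `candidate` is faster than `baseline` (both in seconds).
fn relative_speedup_percent(baseline: f64, candidate: f64) -> f64 {
    (baseline - candidate) / baseline * 100.0
}

fn main() {
    // Total times reported in this comment.
    let master = parse_wall_time("2m40.388s").expect("valid time string");
    let branches = [
        ("fnv-hasher", "2m38.193s"),
        ("fnv-hasher-with-default-capacity", "2m37.594s"),
    ];

    for (name, time) in branches {
        let secs = parse_wall_time(time).expect("valid time string");
        println!(
            "{}: {:.2}% faster than master",
            name,
            relative_speedup_percent(master, secs)
        );
    }
}
```

Note that these are totals for the whole pytest run, so they include everything the run does besides the benchmarked calls; a difference this small should be read with that in mind.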

mre · 2018-11-10T16:35:34Z

I'm in the process of setting up a machine for profiling. Have a dedicated Linux box now for that purpose. If anybody has time to profile the code before me, feel free to do that and add some data here.

greyblake added 2 commits September 21, 2018 19:42

Use fnv hasher

9bdf577

Create HashMap with default capacity

67cc975

mre reviewed Sep 21, 2018

Fix typo: DEFAUL_HASHMAP_CAPACITY -> DEFAULT_HASHMAP_CAPACITY

a563db0
