Skip to content
This repository has been archived by the owner on Sep 14, 2023. It is now read-only.

Fnv hasher with default capacity #44

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

greyblake
Copy link

This PR goes on top of #43

Changes:

  • Initialize entites with default capacity > 0.

This may safe some extra memory allocations.

Benchmarks are again slightly contradictional.
Please consider this PR only like an idea. Feel free to reject it if it does not bring any useful improvement

src/lib.rs Outdated
use std::fmt;
use std::marker::PhantomData;

use pyo3::prelude::*;
use serde::de::{self, DeserializeSeed, Deserializer, MapAccess, SeqAccess, Visitor};
use serde::ser::{self, Serialize, SerializeMap, SerializeSeq, Serializer};

const DEFAUL_HASHMAP_CAPACITY: usize = 10;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be DEFAULT_HASHMAP_CAPACITY I guess. 😉

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right. Shame on me)

@mre
Copy link
Owner

mre commented Sep 21, 2018

Thanks for your PRs @greyblake. For sure sounds like a nice idea to try.
I'm gonna run the benchmarks, but what I would really love to do is running a profiler on the new and the old version to see what changed. Just haven't gotten around to add this to the project (see #38).

@mre
Copy link
Owner

mre commented Sep 21, 2018

So I ran the benchmarks on my machine and the values of this branch are very close to the master branch, well within the standard deviation. That means I can't make any conclusive judgement as to what is the faster version. Maybe others can try to reproduce on their machine?

main.txt
fnv-bench.txt

@greyblake
Copy link
Author

Here are my benchmarks for Python 3.5, done on my laptop (debian).

master.txt
fnv-hasher.txt
fnv-hasher-with-default-capacity.txt

Here the result aggregated in one spreadsheet: https://docs.google.com/spreadsheets/d/1vrERpk-QLZYLQOHu8nh6fIAeTdEbNxp5bJEIPc-N-MM/edit?usp=sharing

However, if I run the benchmark multiple times I get different results, so yea, based on this it's hard to judge if this is a real improvement.
Would it make sense to increase number of iterations in the benchmarks?

@mre
Copy link
Owner

mre commented Sep 22, 2018

Pretty similar to what I saw in my benchmarks.
You can control the number of iterations and rounds with the following parameters:

pipenv run pytest benchmarks --benchmark-min-rounds=BENCHMARK_MIN_ROUNDS --benchmark-warmup-iterations=NUM

The docs are here.
I haven't tried that myself, though. 😉

@greyblake
Copy link
Author

Ok, running the following command:

time pipenv run pytest benchmarks  --benchmark-warmup-iterations=100000 --benchmark-min-rounds=100000

Shows to me a more or less reproducible result (I've tried 2 times).
fnv-hasher branch slightly faster than master, and fnv-hasher-with-default-capacity is slightly faster than fnv-hasher. In particular (total time):

master: 	                  2m40.388s
fnv-hasher:                       2m38.193s
fnv-hasher-with-default-capacity: 2m37.594s

@mre
Copy link
Owner

mre commented Nov 10, 2018

I'm in the process of setting up a machine for profiling. Have a dedicated Linux box now for that purpose. If anybody has time to profile the code before me, feel free to do that and add some data here.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants