
The rank hash table could be further reduced (~7%) #80

Open
m-anthony opened this issue Apr 25, 2024 · 3 comments

@m-anthony

Hi, first of all, thank you for sharing this awesome project. I was looking for a card evaluation algorithm for a custom project in Java, and yours is clearly excellent (my Java implementation evaluates a random hand in ~8 ns, while generating a random hand takes ~16 ns).

Since I wanted to understand how it worked rather than just copy magic numbers from your source code, I had a look at the hash table generator script (perfect_hash.py) and enhanced it so that it runs fast enough for a brute-force search of the parameters that produce the smallest hash table. The original script takes about 1 hour to compute a hash table; mine does the computation in roughly 1 second.

Currently, the best values I have are M = 2369371 and power = 9 (so side = 512), which produce a rank array of 38995 elements, down from 42077 in your source code (a 7% decrease on that table).
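
To make that concrete, the outer search loop looks roughly like this; packTable here is only a placeholder for my packer (not code from this repository), and the candidate ranges are purely illustrative:

```java
import java.util.OptionalInt;

final class ParameterSweep {

    /**
     * Stub standing in for my packer: it should build the rank table for the
     * given modulus M and table width side = 1 << power, and return the packed
     * length, or an empty result when packing fails for those parameters.
     */
    static OptionalInt packTable(int m, int power) {
        return OptionalInt.empty(); // real packing omitted here
    }

    public static void main(String[] args) {
        int bestSize = Integer.MAX_VALUE;
        // Illustrative candidate ranges; the sweep I actually ran was wider.
        for (int power = 8; power <= 10; ++power) {
            for (int m = 2_000_000; m <= 2_400_000; ++m) {
                OptionalInt size = packTable(m, power);
                if (size.isPresent() && size.getAsInt() < bestSize) {
                    bestSize = size.getAsInt();
                    System.out.printf("new best: M=%d, power=%d, %d entries%n",
                            m, power, bestSize);
                }
            }
        }
    }
}
```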

FYI, here are the changes I made to your algorithm:

  • rewrote it in Java; Python is way too slow for such a brute-force task
  • removed "diffused_keys": it is only used to perform a sanity check that is redundant with the check on square
  • swapped both indexes on "square" so that the inner loop iterates over the second index: at least in Java, the iteration then touches elements that are adjacent in memory, so it is more CPU-cache friendly
  • used arrays of 64-bit ints as bitsets to locate potential collisions faster: instead of iterating over each value of square/ranks, I maintain a bitset indicating where actual values are stored, so with bitwise operators I only visit indexes where I know that both arrays hold something other than NOT_A_VALUE (see the sketch just after this list). This trick gave a 30x speedup (but unfortunately I don't know how to implement it in Python to provide a PR)
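
Here is that bitset trick, simplified from my Java code (the names are mine, not from the repository):

```java
final class CollisionCheck {

    /**
     * Returns true when placing the row at the given offset would overwrite an
     * already occupied slot of the packed ranks array. Occupancy of both arrays
     * is tracked in long[] bitsets (one bit per slot), so the test costs a few
     * shifts and ANDs per 64 slots instead of 64 comparisons against NOT_A_VALUE.
     */
    static boolean collides(long[] rowBits, long[] tableBits, int offset) {
        int wordShift = offset >>> 6; // whole 64-bit words to skip
        int bitShift = offset & 63;   // remaining bit offset within a word
        for (int i = 0; i < rowBits.length; ++i) {
            long w = rowBits[i];
            if (w == 0L) continue; // this chunk of the row holds no values at all
            long lo = w << bitShift;                              // lands in word i + wordShift
            long hi = bitShift == 0 ? 0L : w >>> (64 - bitShift); // spill into the next word
            int j = i + wordShift;
            if (j < tableBits.length && (tableBits[j] & lo) != 0L) return true;
            if (hi != 0L && j + 1 < tableBits.length && (tableBits[j + 1] & hi) != 0L) return true;
        }
        return false; // slots past the current end of the table are free by construction
    }
}
```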

Moreover, there is a mismatch between perfect_hash.py and the current hash table: the script uses M = 31, which computes a hash table of size 44969 (the hash table used in an earlier version of the C code).

@kennethshackleton
Owner

Many thanks for your interest and hard work. Your contributions are very welcome, and I'll be very interested in reviewing your PR. It's fine to completely replace Python with another language (e.g. Java), or to have two scripts co-exist if you wish.

@m-anthony
Author

I was in the process of contributing a whole Java project for this (not only the script part, but a complete Java evaluator where all the "magic numbers" can be recomputed by a single command in the build script), so I still have some work to do to get the code clean.

During this I realized that your latest optimisation of the hash table was achieved not by using different size parameters, but by computing the ranks in a different order. I applied the same technique in my code (effectively reproducing an array of 42077 elements for M = 31, as in the current code) together with my prime factor, and the size decreased even further: it is now 34843, a 17% decrease compared to the current code!
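
To illustrate why the order matters (this toy is mine, not the ordering your C generator actually uses): with first-fit packing, each row lands at the smallest offset that does not clash with what is already stored, so the sequence in which rows are offered changes which offsets get picked and therefore the final length:

```java
import java.util.BitSet;
import java.util.List;

final class FirstFitToy {

    /**
     * Toy first-fit packer: slides each row right until none of its occupied
     * columns clashes with a slot already taken in the packed array, then
     * marks those slots as used. Feeding the same rows in a different order
     * generally yields a different packed length.
     */
    static int packedLength(List<boolean[]> rowsInOrder, int side) {
        BitSet used = new BitSet();
        for (boolean[] row : rowsInOrder) {
            int offset = 0;
            search:
            while (true) {
                for (int c = 0; c < side; ++c) {
                    if (row[c] && used.get(offset + c)) {
                        ++offset;
                        continue search; // clash: try the next offset
                    }
                }
                break; // no clash: this offset is the first fit
            }
            for (int c = 0; c < side; ++c) {
                if (row[c]) used.set(offset + c);
            }
        }
        return used.length(); // index of the highest occupied slot + 1
    }
}
```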

I am still working on this and will definitely open a PR as soon as the code is clean and readable enough, but since the algorithm for computing the most compact hash table has changed, I will redo the brute-force search to see whether another prime can produce an even shorter hash table.

@kennethshackleton
Owner

Wonderful!
