
Optimize duplicut for SSDs #22

Open
nil0x42 opened this issue Sep 20, 2020 · 1 comment

Comments

@nil0x42 (Owner)

nil0x42 commented Sep 20, 2020

HDD vs SSD

On an HDD, sequential access is relatively fast, while random access is terribly slow. That's why duplicut, written back in 2014, was optimized with that in mind.
At the time it made no sense to have multiple threads concurrently reading a massive wordlist, so sequential access with a single thread was more performant whenever all lines fit in the hashmap at once.

Now that we have entered the SSD era, concurrency could deliver much better performance, since random access is far faster.

@solardiz suggested OpenMP, which would probably increase performance a lot.

TODO

  • compare duplicut/unique/rling on HDD to verify my assumption
  • compare duplicut/unique/rling on massive wordlist (>30GB)

@solardiz I'd love your suggestions & opinion about duplicut & ways to optimize it 😄

@solardiz (Contributor)

My idea was to continue reading the input (or previously-written part of output when we're low on RAM) sequentially (tricky to do otherwise when the input is lines of varying lengths), but buffer it rather than process it against the hash table line by line. Once the buffer fills up, process it with multiple threads (mark for removal entries that are seen in the global hash table). Then repeat for the next buffer's worth of input. I guess a reasonable buffer size can be a few MB (maybe similar to L3 cache size). A complication is dealing with duplicates within a buffer - perhaps that needs to be taken care of separately, maybe using a separate smaller hash table, and maybe sequentially.
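
A rough sketch of that buffered approach, in C with OpenMP (all names here, such as `line_t`, `hashmap_lookup` and `hashmap_insert`, are hypothetical placeholders, not duplicut's real internals), could look like this:

```c
/*
 * Illustrative sketch of the buffering idea above, not duplicut code.
 * Compile with -fopenmp.
 */
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    char   *ptr;      /* start of the line inside the read buffer */
    size_t  len;      /* line length */
    bool    remove;   /* set when the line is a duplicate */
} line_t;

bool hashmap_lookup(const char *line, size_t len);  /* read-only, safe to call in parallel */
bool hashmap_insert(const char *line, size_t len);  /* returns false if already present */

/* Called once a few-MB buffer has been filled by the sequential reader. */
void process_buffer(line_t *lines, size_t nlines)
{
    /* phase 1 (parallel): mark lines already seen in earlier buffers */
#pragma omp parallel for schedule(dynamic, 1024)
    for (size_t i = 0; i < nlines; i++)
        if (hashmap_lookup(lines[i].ptr, lines[i].len))
            lines[i].remove = true;

    /* phase 2 (sequential): handle duplicates *within* this buffer,
     * and add the surviving lines to the global hash table */
    for (size_t i = 0; i < nlines; i++)
        if (!lines[i].remove && !hashmap_insert(lines[i].ptr, lines[i].len))
            lines[i].remove = true;
}
```

In this sketch the parallel phase only does read-only lookups against the global table, which sidesteps synchronization; the within-buffer duplicates mentioned above are then caught by the sequential insert pass (a smaller per-buffer table would work there too).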

Random reading of input is also possible, perhaps skipping until the start of a new line and somehow processing the partial lines on block boundaries separately.
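
For the record, a minimal sketch of that block-aligned random reading (again purely illustrative, with a hypothetical `handle_line()` callback) might be:

```c
/* Illustrative only: pread() a block at an arbitrary offset, skip the
 * partial line at its start, stop before the partial line at its end;
 * the cut-off fragments on block boundaries get handled separately. */
#include <string.h>
#include <unistd.h>

void handle_line(const char *line, size_t len);   /* hypothetical */

void process_block(int fd, off_t offset)
{
    char buf[64 * 1024];                          /* block size is arbitrary here */
    ssize_t n = pread(fd, buf, sizeof(buf), offset);
    if (n <= 0)
        return;

    char *p = buf, *end = buf + n;

    /* skip the partial first line: it is the previous block's tail */
    if (offset != 0) {
        char *nl = memchr(buf, '\n', (size_t)n);
        if (!nl)
            return;                               /* no complete line in this block */
        p = nl + 1;
    }

    /* iterate over complete lines only; the trailing fragment is left
     * to whatever separate partial-line handling is chosen */
    char *nl;
    while (p < end && (nl = memchr(p, '\n', (size_t)(end - p))) != NULL) {
        handle_line(p, (size_t)(nl - p));
        p = nl + 1;
    }
}
```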

I suggested OpenMP because that's what we already use in JtR, and because it's easy to use in this way. Since you already use explicit pthreads, you probably shouldn't mix different threading technologies. You can implement the above with either technology.
