perf: C++ CRC calculation speed is suboptimal #708

Open
james-rms opened this issue Nov 4, 2022 · 4 comments
Labels
c++ feature New feature or request

Comments

@james-rms
Collaborator

james-rms commented Nov 4, 2022

CRC calculation in the writer can be a bottleneck in some situations. This ticket tracks work to make it faster.
Relates to: #707 #706

@james-rms james-rms added the feature New feature or request label Nov 4, 2022
@foxymiles
Contributor

Maybe worth a look at SIMD implementations? https://chromium.googlesource.com/chromium/src/+/HEAD/third_party/zlib/crc32_simd.c

@foxymiles
Contributor

Some comments on this post claim a 14x speedup for SIMD over zlib crc32: https://fastcompression.blogspot.com/2019/03/presenting-xxh3.html

@james-rms
Collaborator Author

@wkalt notes:

For the chunk CRC (not the data CRC), it looks like we do an update every time a message is written, but we also have the full uncompressed chunk data in hand when we finalize the chunk. If that understanding is correct, I wonder whether it would be beneficial to compute the uncompressed CRC for the chunk in one shot when the chunk is finalized, rather than updating it on each message we write.
If the issue is the lookup table falling out of the CPU cache, that could potentially help. It also seems like it would eliminate sensitivity to message size, though I'm not sure we actually see that in the charts. Weirdly, the charts seem to show that mixed message sizes are quicker than either small or large, though it's kind of hard to say.
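
To make the two strategies concrete, here is a minimal sketch. It is not the actual writer code: the function names (`crc32Update`, `chunkCrcPerMessage`, `chunkCrcOneShot`) are hypothetical, and a plain table-driven CRC-32 stands in for whatever implementation the writer uses. It only illustrates that updating per message and computing once over the finalized chunk give the same checksum, so switching between them should not change the output.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Build the 256-entry table for the reflected CRC-32 polynomial 0xEDB88320
// (the polynomial used by zlib's crc32). Assumption: a stand-in for the
// writer's real CRC implementation.
inline std::array<uint32_t, 256> makeCrc32Table() {
  std::array<uint32_t, 256> table{};
  for (uint32_t i = 0; i < 256; ++i) {
    uint32_t c = i;
    for (int k = 0; k < 8; ++k) {
      c = (c & 1u) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
    }
    table[i] = c;
  }
  return table;
}

constexpr uint32_t kCrc32Init = 0xFFFFFFFFu;

// Fold `len` bytes into a running CRC state (no final XOR applied here).
inline uint32_t crc32Update(uint32_t state, const uint8_t* data, size_t len) {
  static const std::array<uint32_t, 256> table = makeCrc32Table();
  for (size_t i = 0; i < len; ++i) {
    state = table[(state ^ data[i]) & 0xFFu] ^ (state >> 8);
  }
  return state;
}

inline uint32_t crc32Final(uint32_t state) { return state ^ 0xFFFFFFFFu; }

// Strategy A (hypothetical): update the chunk CRC as each message is written.
inline uint32_t chunkCrcPerMessage(const std::vector<std::string>& messages) {
  uint32_t state = kCrc32Init;
  for (const auto& m : messages) {
    state = crc32Update(state, reinterpret_cast<const uint8_t*>(m.data()), m.size());
  }
  return crc32Final(state);
}

// Strategy B (hypothetical): compute the CRC once over the whole uncompressed
// chunk at finalize time. The concatenation below only simulates the chunk
// buffer that the writer would already hold.
inline uint32_t chunkCrcOneShot(const std::vector<std::string>& messages) {
  std::string chunk;
  for (const auto& m : messages) chunk += m;
  return crc32Final(crc32Update(
      kCrc32Init, reinterpret_cast<const uint8_t*>(chunk.data()), chunk.size()));
}

// Sanity check: both strategies must agree on the chunk CRC.
inline void checkStrategiesAgree(const std::vector<std::string>& messages) {
  assert(chunkCrcPerMessage(messages) == chunkCrcOneShot(messages));
}
```

For example, splitting `"123456789"` into the messages `"1234"` and `"56789"` gives `0xCBF43926` (the standard CRC-32 check value) under either strategy. Any difference between the two would therefore be purely in speed, for instance from call overhead or cache behavior.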

@wkalt
Contributor

wkalt commented Nov 4, 2022

Related to the above: I observe some improvement between the two cases in this Go program, which simulates per-message CRC (kilobyte-sized messages) vs. per-chunk CRC (5 MB chunks): https://gist.github.com/wkalt/22dbfad2a353443b4a812fe950b5bb2d

It's a 34% speedup on a single-core DigitalOcean VM. On my much more capable laptop, the improvement narrows to a bit over 7%. This isn't testing the same C++ code, but it might help validate the strategy.
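
A rough C++ analogue of that comparison could look like the sketch below. It does not reproduce the gist: the names (`makeChunk`, `timePerMessage`, `timePerChunk`, `Timing`) are hypothetical, the CRC is a simple table-driven stand-in, and with no other writer work between updates it mostly measures per-call overhead rather than cache effects.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Table-driven CRC-32 (reflected polynomial 0xEDB88320). Assumption: a
// stand-in for the writer's CRC, not the real implementation.
inline uint32_t crc32Update(uint32_t state, const uint8_t* data, size_t len) {
  static const std::array<uint32_t, 256> table = [] {
    std::array<uint32_t, 256> t{};
    for (uint32_t i = 0; i < 256; ++i) {
      uint32_t c = i;
      for (int k = 0; k < 8; ++k) c = (c & 1u) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
      t[i] = c;
    }
    return t;
  }();
  for (size_t i = 0; i < len; ++i) {
    state = table[(state ^ data[i]) & 0xFFu] ^ (state >> 8);
  }
  return state;
}

struct Timing {
  uint32_t crc;    // final CRC-32 of the chunk
  double seconds;  // wall-clock time spent computing it
};

// Deterministic filler bytes standing in for uncompressed chunk data.
inline std::vector<uint8_t> makeChunk(size_t size) {
  std::vector<uint8_t> chunk(size);
  for (size_t i = 0; i < size; ++i) chunk[i] = static_cast<uint8_t>((i * 31u + 7u) & 0xFFu);
  return chunk;
}

// Per-message case: one CRC update call per `messageSize`-byte slice.
inline Timing timePerMessage(const std::vector<uint8_t>& chunk, size_t messageSize) {
  assert(messageSize > 0);
  const auto start = std::chrono::steady_clock::now();
  uint32_t state = 0xFFFFFFFFu;
  for (size_t off = 0; off < chunk.size(); off += messageSize) {
    const size_t n = std::min(messageSize, chunk.size() - off);
    state = crc32Update(state, chunk.data() + off, n);
  }
  const auto end = std::chrono::steady_clock::now();
  return {state ^ 0xFFFFFFFFu, std::chrono::duration<double>(end - start).count()};
}

// Per-chunk case: a single CRC update call over the whole buffer.
inline Timing timePerChunk(const std::vector<uint8_t>& chunk) {
  const auto start = std::chrono::steady_clock::now();
  const uint32_t state = crc32Update(0xFFFFFFFFu, chunk.data(), chunk.size());
  const auto end = std::chrono::steady_clock::now();
  return {state ^ 0xFFFFFFFFu, std::chrono::duration<double>(end - start).count()};
}
```

To mirror the sizes mentioned above, one could call `timePerMessage(makeChunk(5 * 1024 * 1024), 1024)` and `timePerChunk(makeChunk(5 * 1024 * 1024))`, repeat each several times, and compare the `seconds` fields. The `crc` fields should match in both cases; the timings will vary by machine and compiler flags.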

@jtbandes jtbandes added the c++ label Nov 11, 2022
Development

No branches or pull requests

4 participants