perf: C++ CRC calculation speed is suboptimal #708

james-rms · 2022-11-04T03:18:15Z

CRC calculation in the writer can be a bottleneck in some situations. This ticket tracks making it faster.
Relates to: #707 #706

foxymiles · 2022-11-04T03:24:35Z

Maybe worth a look at SIMD implementations? https://chromium.googlesource.com/chromium/src/+/HEAD/third_party/zlib/crc32_simd.c

foxymiles · 2022-11-04T03:28:19Z

Some comments on this post claim 14x speedup for SIMD over zlib crc32: https://fastcompression.blogspot.com/2019/03/presenting-xxh3.html

james-rms · 2022-11-04T03:42:27Z

@wkalt notes:

For the chunk CRC (not the data CRC), it looks like we do an update every time a message gets written, but we also have the full uncompressed chunk data in hand when we finalize the chunk. If that understanding is correct, I wonder whether there would be a benefit to computing the uncompressed CRC for the chunk in one shot when the chunk gets finalized, rather than updating on each message we write.
If the issue is the lookup table falling out of CPU cache, it seems like that could help. It also seems like it would eliminate sensitivity to message size, though I'm not sure we actually see that in the charts. Weirdly, the charts seem to show that mixed message sizes are quicker than either small or large, though it's kind of hard to say.
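
To make the two strategies concrete, here is a minimal C++ sketch. It is not the writer's actual code: the function names (`crc32Update`, `crcPerMessage`, `crcOneShot`, `checkStrategiesAgree`) are hypothetical, and it assumes the standard reflected CRC-32 polynomial (0xEDB88320) with a byte-at-a-time lookup table. It only shows that both approaches produce the same checksum, so switching between them should be safe from a correctness standpoint.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Build the 256-entry lookup table for the reflected CRC-32 polynomial.
inline std::array<uint32_t, 256> makeCrc32Table() {
  std::array<uint32_t, 256> table{};
  for (uint32_t i = 0; i < 256; ++i) {
    uint32_t c = i;
    for (int bit = 0; bit < 8; ++bit) {
      c = (c & 1u) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
    }
    table[i] = c;
  }
  return table;
}

// Feed `len` bytes into a running CRC state. A fresh state is 0xFFFFFFFF.
inline uint32_t crc32Update(uint32_t state, const uint8_t* data, size_t len) {
  static const std::array<uint32_t, 256> table = makeCrc32Table();
  for (size_t i = 0; i < len; ++i) {
    state = table[(state ^ data[i]) & 0xFFu] ^ (state >> 8);
  }
  return state;
}

// Apply the final XOR to turn a running state into the checksum.
inline uint32_t crc32Finalize(uint32_t state) { return state ^ 0xFFFFFFFFu; }

// Strategy A: update the running CRC once per message as it is written.
inline uint32_t crcPerMessage(const std::vector<std::string>& messages) {
  uint32_t state = 0xFFFFFFFFu;
  for (const auto& m : messages) {
    state = crc32Update(state, reinterpret_cast<const uint8_t*>(m.data()), m.size());
  }
  return crc32Finalize(state);
}

// Strategy B: concatenate the messages into one chunk buffer, then compute
// the CRC in a single pass when the chunk is finalized.
inline uint32_t crcOneShot(const std::vector<std::string>& messages) {
  std::string chunk;
  for (const auto& m : messages) chunk += m;
  const auto* bytes = reinterpret_cast<const uint8_t*>(chunk.data());
  return crc32Finalize(crc32Update(0xFFFFFFFFu, bytes, chunk.size()));
}

// Both strategies must agree, because CRC updates compose over concatenation.
inline void checkStrategiesAgree(const std::vector<std::string>& messages) {
  assert(crcPerMessage(messages) == crcOneShot(messages));
}
```

Calling `checkStrategiesAgree({"1234", "56789"})` passes, and both functions return 0xCBF43926 for that input, the standard CRC-32 check value for the string "123456789". Whether the one-shot form is actually faster in the writer is a separate question that needs measurement.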

wkalt · 2022-11-04T04:08:29Z

Related to the above, I observe some improvement between the two cases in this Go program: https://gist.github.com/wkalt/22dbfad2a353443b4a812fe950b5bb2d. It simulates per-message CRC (kilobyte size) vs. per-chunk CRC (5 MB size).

It's a 34% speedup on a single-core DigitalOcean VM. On my much more capable laptop it narrows to a bit over 7% improvement. This isn't testing the same C++ code, but it might help to validate the strategy.
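
A comparable measurement could be sketched in C++ along the following lines. This is not a port of the gist and not the library's code: the `crcbench` namespace, `Result`, and `run` are hypothetical names, and the table-driven CRC-32 here is a stand-in that would be swapped for the writer's real routine. It also runs the CRC back to back with no other work in between, so it may understate any cache-eviction effect seen in a real writer.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace crcbench {

// Stand-in table-driven CRC-32 (reflected polynomial 0xEDB88320).
inline uint32_t update(uint32_t state, const uint8_t* data, size_t len) {
  static const std::array<uint32_t, 256> table = [] {
    std::array<uint32_t, 256> t{};
    for (uint32_t i = 0; i < 256; ++i) {
      uint32_t c = i;
      for (int bit = 0; bit < 8; ++bit) {
        c = (c & 1u) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
      }
      t[i] = c;
    }
    return t;
  }();
  for (size_t i = 0; i < len; ++i) {
    state = table[(state ^ data[i]) & 0xFFu] ^ (state >> 8);
  }
  return state;
}

struct Result {
  uint32_t crc;    // final checksum over the whole buffer
  double seconds;  // wall-clock time for the pass
};

// CRC the buffer in slices of `sliceSize` bytes and time the whole pass.
// A small slice models per-message updates; sliceSize == buf.size() models
// the one-shot per-chunk case.
inline Result run(const std::vector<uint8_t>& buf, size_t sliceSize) {
  assert(sliceSize > 0 && "slice size must be non-zero");
  const auto start = std::chrono::steady_clock::now();
  uint32_t state = 0xFFFFFFFFu;
  for (size_t off = 0; off < buf.size(); off += sliceSize) {
    const size_t n = std::min(sliceSize, buf.size() - off);
    state = update(state, buf.data() + off, n);
  }
  const std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return {state ^ 0xFFFFFFFFu, elapsed.count()};
}

}  // namespace crcbench
```

For example, filling a 5 MB `std::vector<uint8_t>` and comparing `crcbench::run(buf, 1024).seconds` against `crcbench::run(buf, buf.size()).seconds` over many repetitions would give a rough per-message vs. per-chunk comparison. The two `crc` fields should always match.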

james-rms added the feature New feature or request label Nov 4, 2022

jtbandes added the c++ label Nov 11, 2022
