AES-CTR (and probably more stream ciphers) are slow when used via the FFI's Stream_Cipher mode. #3925

Closed
aahajj opened this issue Feb 28, 2024 · 8 comments · Fixed by #3951

aahajj commented Feb 28, 2024

Hello everyone,

I have created a benchmarking tool to compare performance between libraries and have noticed a significant drop in performance with Botan's AES CTR. In my benchmark, I generated plaintexts with a length of 4096 Bytes for each round and measured the time it takes for 1 million iterations. For my performance test I have used the Botan C interface ffi.h.

Results:

AES-NI and SSSE3 enabled:

| Library     | Algorithm   | Time (seconds) |
|-------------|-------------|----------------|
| Botan 3.2.0 | AES-128     | 0.554971       |
| Botan 3.2.0 | AES-128/CTR | 77.982720      |
| Botan 3.2.0 | AES-128/CBC | 5.963908       |
| Botan 3.2.0 | AES-128/CFB | 7.617746       |

AES-NI disabled:

| Library     | Algorithm   | Time (seconds) |
|-------------|-------------|----------------|
| Botan 3.2.0 | AES-128     | 9.187737       |
| Botan 3.2.0 | AES-128/CTR | 81.234424      |
| Botan 3.2.0 | AES-128/CBC | 19.137964      |
| Botan 3.2.0 | AES-128/CFB | 20.943194      |

With AES-NI and SSSE3 disabled:

| Library     | Algorithm   | Time (seconds) |
|-------------|-------------|----------------|
| Botan 3.2.0 | AES-128     | 66.311669      |
| Botan 3.2.0 | AES-128/CTR | 135.916029     |
| Botan 3.2.0 | AES-128/CBC | 153.400266     |
| Botan 3.2.0 | AES-128/CFB | 151.787276     |
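
To make these timings easier to compare with throughput figures, the small helper below converts a total run time into MiB/sec. It is only a sketch: the function name is made up for illustration, and it assumes each reported time covers all 1 million iterations over a 4096-byte plaintext, as described above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of Botan or the benchmarking tool).
// Converts the total wall-clock time for `iterations` buffers of
// `bytes_per_iteration` bytes each into a throughput in MiB/sec.
double throughput_mib_per_sec(std::size_t bytes_per_iteration,
                              std::size_t iterations,
                              double total_seconds) {
   assert(total_seconds > 0.0);
   const double total_bytes =
      static_cast<double>(bytes_per_iteration) * static_cast<double>(iterations);
   return total_bytes / (1024.0 * 1024.0) / total_seconds;
}
```

Under that assumption, `throughput_mib_per_sec(4096, 1000000, 77.982720)` gives roughly 50 MiB/sec for AES-128/CTR with AES-NI enabled, while `throughput_mib_per_sec(4096, 1000000, 0.554971)` gives roughly 7000 MiB/sec for the bare AES-128 block cipher.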

Environment:

  • Operating System: Ubuntu 22.04.3 LTS (Codename: jammy)
  • CPU: AMD Ryzen 5 5600U with Radeon Graphics
    • Architecture: x86_64
    • CPU(s): 12
    • Thread(s) per core: 2
    • Core(s) per socket: 6
    • CPU max MHz: 4289,0000
    • CPU min MHz: 400,0000
lscpu | grep "Byte Order"
Byte Order:                         Little Endian

I suspect that the AES CTR encryption process is not being parallelized, which could be causing the significant drop in performance. However, I'm uncertain about the exact reason for this issue and would appreciate your assistance in investigating further.

Thank you and best regards,
Ahmed

Edit 1: add Byte Order
Edit 2: fix a comma typo

@aahajj aahajj changed the title Possible issue with AES CTR Preformance Possible issue with AES CTR Preformance when using ffi.h Feb 28, 2024
Collaborator

reneme commented Mar 4, 2024

Thanks for sharing your findings! With a quick and dirty run on my machine (*) using botan's integrated benchmarking tool, I see the opposite. Basically running the following CLI command(s):

./botan speed --buf-size=4096000 AES-128
./botan speed --buf-size=4096000 AES-128/CTR-BE
./botan speed --buf-size=4096000 AES-128/CBC
./botan speed --buf-size=4096000 AES-128/CFB

| Algorithm      | Throughput        |
|----------------|-------------------|
| AES-128        | 11398.724 MiB/sec |
| AES-128/CTR-BE | 6014.384 MiB/sec  |
| AES-128/CBC    | 1372.127 MiB/sec  |
| AES-128/CFB    | 715.270 MiB/sec   |

This tool doesn't use the FFI. One (technical) difference: CTR-mode is a StreamCipher, while all the other modes of operation are subclasses of Cipher_Mode. Perhaps this is a reasonable starting point for further investigation.

(*) "my machine"

11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz 2.50 GHz
Ubuntu 22.04 (running in WSL)

Collaborator

reneme commented Mar 4, 2024

Mhh, current working hypothesis: CTR-mode being a stream cipher, it features an update_granularity() of 1 byte (!). Other cipher modes typically use the underlying block size of the bare symmetric algorithm. Also, CTR-mode doesn't provide an authentication tag or padding; hence, its minimum_final_size() is 0 bytes.

The FFI adapter layer uses this information to determine the update granularity it will use internally:

if(minimum_final_size == 0 || update_granularity > minimum_final_size) {
   BOTAN_ASSERT_NOMSG(update_granularity > 0);
   return update_granularity;
}
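
To see what this branch means for CTR, here is a minimal standalone sketch of the selection logic. `ModeInfo` and `choose_update_size` are hypothetical names, not Botan's API, and the fallback after the `if` is not shown in the excerpt above, so the rounding used there is purely an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the two properties the FFI layer queries.
struct ModeInfo {
   std::size_t update_granularity;
   std::size_t minimum_final_size;
};

// Mirrors the branch quoted above; everything after it is an assumption.
std::size_t choose_update_size(const ModeInfo& mode) {
   assert(mode.update_granularity > 0);

   if(mode.minimum_final_size == 0 ||
      mode.update_granularity > mode.minimum_final_size) {
      return mode.update_granularity;
   }

   // ASSUMPTION (not in the excerpt): round minimum_final_size up to the
   // next multiple of update_granularity.
   const std::size_t rem = mode.minimum_final_size % mode.update_granularity;
   return rem == 0 ? mode.minimum_final_size
                   : mode.minimum_final_size + (mode.update_granularity - rem);
}
```

With the values described for CTR (granularity 1, minimum final size 0), `choose_update_size({1, 0})` returns 1, whereas a mode with a 16-byte granularity and no minimum final size would get 16.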

... and then slices the input data accordingly:

const size_t ud = cipher_obj->update_size();
mbuf.resize(ud);
size_t taken = 0, written = 0;
while(input_size >= ud && output_size >= ud) {
   // FIXME we can use process here and avoid the copy
   copy_mem(mbuf.data(), input, ud);
   cipher.update(mbuf);
   input_size -= ud;
   copy_mem(output, mbuf.data(), ud);
   input += ud;
   taken += ud;
   output_size -= ud;
   output += ud;
   written += ud;
}

Therefore, we might actually pass the payload data byte-by-byte to the encryption/decryption routine. Obviously, that's awfully slow. 😨
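
The cost of that slicing can be illustrated without Botan at all. The sketch below uses a toy XOR keystream (not real cryptography, and not Botan's CTR implementation) purely to count how many update calls a given chunk size causes; all names in it are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stream "cipher": XORs each byte with a running counter and counts
// how often update() is invoked. For illustration only.
struct ToyStream {
   std::size_t calls = 0;
   std::uint8_t counter = 0;

   void update(std::uint8_t* buf, std::size_t len) {
      ++calls;
      for(std::size_t i = 0; i < len; ++i) {
         buf[i] ^= counter++;
      }
   }
};

// Feeds `data` to the cipher in slices of `chunk` bytes (plus one final
// shorter slice if needed) and returns the number of update() calls made.
std::size_t process_in_chunks(ToyStream& cipher,
                              std::vector<std::uint8_t>& data,
                              std::size_t chunk) {
   assert(chunk > 0);
   std::size_t offset = 0;
   while(data.size() - offset >= chunk) {
      cipher.update(data.data() + offset, chunk);
      offset += chunk;
   }
   if(offset < data.size()) {
      cipher.update(data.data() + offset, data.size() - offset);
   }
   return cipher.calls;
}
```

For a 4096-byte buffer, a chunk size of 1 results in 4096 calls while a chunk size of 4096 results in a single call, and both produce the same output. In the real FFI loop each of those calls additionally pays for two `copy_mem` invocations.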

Thanks again, for reporting this!

@reneme reneme self-assigned this Mar 4, 2024
@reneme reneme added the performance Go faster label Mar 4, 2024
@reneme reneme added this to the Botan 3.4.0 milestone Mar 4, 2024
@reneme reneme changed the title Possible issue with AES CTR Preformance when using ffi.h AES-CTR is slow when used via the FFI Mar 4, 2024
Author

aahajj commented Mar 4, 2024

Ah, I stumbled upon that function today too, and I was a bit puzzled by the update granularity being just 1 byte. But now that you mention it, it actually makes perfect sense! Thank you for the clarification!

Oh, and by the way, AES-128/CCM also has an update granularity of 1 Byte. CCM, among the three AEAD algorithms I've tested, seems to be the slowest, and I think it might be due to the same issue.

Author

aahajj commented Mar 4, 2024

I couldn't help but notice a naming inconsistency between the function botan_cipher_get_ideal_granularity() in the website's documentation and its actual declaration:

BOTAN_FFI_EXPORT(3, 0) int botan_cipher_get_ideal_update_granularity(botan_cipher_t cipher, size_t* ug);

@reneme reneme changed the title AES-CTR is slow when used via the FFI AES-CTR (and probably more stream ciphers) are slow when used via the FFI's Stream_Cipher mode. Mar 5, 2024
Collaborator

reneme commented Apr 9, 2024

For the record: pull request #3951 addresses the performance issues described in this ticket. While working on the performance fix, another (functional) issue was found in the same code location (#3971). As a result, the performance fix didn't make it into the just-released 3.4.0. The mentioned bugfix, however, landed and was released.

@aahajj if you have time, please repeat your measurements after applying the patch suggested in #3951. It should speed up the stream-cipher-esque modes quite significantly.

Author

aahajj commented May 4, 2024

Sorry for the late response. I've changed jobs and am officially no longer working on that project. I was about to redo the measurements again but found that the patch is not merged yet, pending review from @randombit. By the way, I obtained approval to publish the tool on my GitHub page if you want to take a look at it: CBOS. Once the pull request is merged, I will happily do the measurements again.

Collaborator

reneme commented May 23, 2024

I did a quick-and-dirty run with CBOS, first against the latest Botan master and then against #3951.

The effect is especially visible for CTR and CCM modes, which go from 0.0x bytes/cycle to almost 2 bytes/cycle. The other modes also seem to benefit, with a speedup of roughly 2x. LGTM!

Thanks again for reporting.

master

| Algorithm   | Bytes/Cycle |
|-------------|-------------|
| AES-128/CTR | 0.045291    |
| AES-128/CBC | 0.202803    |
| AES-128/CFB | 0.318757    |
| AES-128/GCM | 0.344758    |
| AES-128/OCB | 0.301191    |
| AES-128/CCM | 0.095022    |

fix/performance_of_ffi_stream_ciphers

| Algorithm   | Bytes/Cycle |
|-------------|-------------|
| AES-128/CTR | 1.825574    |
| AES-128/CBC | 0.682218    |
| AES-128/CFB | 0.545389    |
| AES-128/GCM | 0.863399    |
| AES-128/OCB | 0.888618    |
| AES-128/CCM | 1.436926    |
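
For reference, the per-mode speedup is simply the ratio of the two bytes/cycle figures. The helper below is a trivial sketch with a made-up name, and it uses only the numbers from the two tables above.

```cpp
#include <cassert>

// Hypothetical helper: ratio of throughput after the fix to throughput
// before it (both in bytes/cycle).
double speedup(double before, double after) {
   assert(before > 0.0);
   return after / before;
}
```

For example, `speedup(0.045291, 1.825574)` is roughly 40x for CTR and `speedup(0.095022, 1.436926)` is roughly 15x for CCM, while the other modes land between about 1.7x (CFB) and 3.4x (CBC).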

Author

aahajj commented May 23, 2024

looks much better now :)

@aahajj aahajj closed this as completed May 23, 2024