Expose the LMDB Encrypt/Decrypt feature #134

Kerollmops · 2022-09-16T19:05:54Z

Call for Advice

As a reminder, heed is a wrapper on top of LMDB. LMDB is a memory-mapped key-value store battle-tested for a long time. This branch introduces a wrapper on top of the "recently" released support for page-level encryption. It has been released in the latest version of LMDB, and heed has been updated in #128. This PR will be merged in it after the reviews.

I would like help from people who know how to use encryption algorithms. Maximum feedback on the new encryption and checksum API would be appreciated. As I am not very proficient in cryptography, would it be possible to help me with these subjects:

How can I document every parameter of the Encrypt::encrypt_decrypt trait method?
Would it be possible to help me create a complex example that uses these parameters? ☝️
Would it be possible to help me create an example that uses the auth_size parameter of the EnvOpenOptions::encrypt_with method?
I added a piece of advice to the documentation of the encrypt_with method to set a checksum too, and to interpret a checksum error as a decryption error. Is it correct to do so?
Is there a more idiomatic way not to declare the Checksum and Encrypt traits when we don't need them in the EnvOpenOptions instead of having dummy types? (Would be so good when the never type is released).

Note that I already have successfully used this new API and exposed a simple example that uses the ChaCha20 crate. I would like to know if this example is valid, please.

You'll probably need the documentation of the API of LMDB. Unfortunately, the only one that is available only is outdated. I advise you to install and run Doxygen and generate the doc from lmdb-master3-sys/lmdb/libraries/liblmdb (don't forget to git submodule init --update when you clone this repo), or you can directly read the raw documentation from there too.

Where Should I Start?

To help me, you can simply create a new Rust crate, add this line to your dependencies and inspire yourself by looking at the simple example I linked above ☝️:

heed = { git = "https://github.com/meilisearch/heed", branch = "encdec-func" }

Once you've found improvements, please comment directly in this PR ❤️ 👨‍💻

What should I do next?

Make the EnvOpenOptions::encrypt_with method accept a Key of the right length and let the user provide it by using its own KDF and salt. Heed must not know about the KDF used.
Remove the Checksumming feature from this crate for now, we can introduce in another PR.
Move the DummyEncrypt traits impls elsewhere e.g., end of the file, another module, to avoid polluting this part of the code.
Check the features of the newly introduced dependencies to disable the useless ones.
Review and rewrite the documentation of the new methods and modules added in this PR.
Document the size of the nonce provided by LMDB.
XChaCha20 needs a bigger nonce than the one provided but there is no need in using it with LMDB anyway.
We should probably remove the DummyEncrypt type and rename it NoopEncrypt to miror the Support customized key compare function #212 changes.

sgued · 2022-10-31T21:11:54Z

I see multiple issues with the current PR, mostly coming from the LMDB API.

TLDR: LMDB should probably expose an AEAD API instead of the current "handmade" API.

Unsoundness

Unwinding across FFI is UB, so this is unsound. The callback should be wrapped in a catch_unwind.

AEAD

Instead of rolling your own traits I'd recommend leveraging the existing RustCrypto ecosystem, especially the aead crate. What you'd probably be looking for are the AeadMutInPlace (using their detached methods), and the KeyInit trait to initialize with the key. GenericArray is not obvious to use but they implement TryFrom<&[u8]> for &GenericArray<…> and have an as_slice method. Take a generic parameter <A: KeyInit + AeadMutInPlace>.

AEAD stands for Authenticated Encryption with Associated Data, which is the standard API for authenticated data. It is much better to not separate the encryption from the authentication (checksum), as it leaves less room for error and also can improve performance. For crates implementing AEADs look at (X)ChaCh20Poly1305, AES-GCM and AES-GCM-SIV.

The example

The example is somewhat broken. CRC codes are not cryptographically secure. Even with the encryption, it may be possible for an attacker to modify the data so that the checksum passes and modify it subtly without it being noticeable. AEADs prevent that kind of issue :)

Limits of LMDB

The API proposed by LMDB separates the encryption from the authentication(checksum), which is bad as it prevents efficient usage of standard, misuse-resistant AEAD constructs. LMDB also performs the checksum before the actual encryption, which is very dangerous. Also, just looking at the code I wasn't able to find how the nonce (iv) is generated. If the same iv is used twice with the same key, it leads to the plaintext being recoverable and sometimes even the key itself. I'm worried that if the nonce is generated randomly, it could lead to nonce reuse, since the nonces of standard AEADs are rather small (96bits), and you reach 50% chances of a collision after $2^{48}$ encryption operations, which can happen for a long-lived database. In that cases either longer nonces should be used (the X variants of chacha20poly1305), or a nonce-reuse resistant AEAD should be used (aes-gcm-siv). Either way the source of the nonces should probably be documented.

It appears that the functionality is not yet stabilized (still a release candidate) so I'd recommend contacting them to change the encryption API to be that of an AEAD:

Two fallible functions, one for encryption and one for decryption.
- Encryption
  - In: Key, Plaintext, Nonce and Associated data
  - Out: Ciphertext and Authentication tag
- Decryption:
  - In: Key, Nonce, Ciphertext, Associated data, Tag
  - Out: Plaintext

The associated data could be whatever metadata LMDB needs but cannot encrypt. The AEAD ensures that is hasn't been tampered.

If needed a checksum could also be used, but it should be separate from the encryption (even maybe disallowed to be used at the same time) and used only to detect accidental data corruption.

Kerollmops · 2022-11-04T16:37:59Z

Thank you very much @sgued for this detailed explanation,

I just pushed a commit to catch the possible panics of the checksum and encryption functions. Unfortunately, there is no way to tell LMDB that the checksum function failed. Should I abort the program instead of doing nothing?

About the example that is wrong, as I am using a non-cryptographic valid checksum method, what do you recommend? Is there anything that can be done without implementing an AEAD interface that checksums by itself?

I will try to ping @hyc here, and if he doesn't respond I will probably contact him via Twitter. So Howard, if you read me, I am trying to expose a safe LMDB wrapper for Rust users. As I am not a cryptographer, I asked the community for help with how I have designed this wrapper so far. @sgued answered just above. Could you take a look at this, if you have time, please? What do you think about exposing an API closer to an AEAD interface?

hyc · 2022-11-04T17:52:40Z

With respect, @sgued's review is incorrect. LMDB's API is fully compliant with AEAD APIs and the code already provides an example implementation thereof.

https://github.com/LMDB/lmdb/blob/mdb.master3/libraries/liblmdb/crypto.c
https://github.com/LMDB/lmdb/blob/mdb.master3/libraries/liblmdb/mtest_enc2.c

The LMDB checksum API is only for use when you only want a checksum and no encryption.

sgued · 2022-11-04T18:56:23Z

Oh, sorry, I was mislead by the documentation:

The result must be the same number of bytes as the input.
@param[in] src The input data to be transformed.
@param[out] dst Storage for the result.
@param[in] key An array of three values: key[0] is the encryption key,
key[1] is the initialization vector, and key[2] is the authentication
data, if any.

That doesn't leave any place for outputting the authentication tag, and given that MDB_sum_func also takes a key as an argument I jumped to conclusions.

Looking at your example it appears that key[2] serves as output and input, which is not clear from the docs. It makes more sense to me now. The API looked a bit weird I should have asked for clarifications rather than changing it 😅 sorry.

Kerollmops · 2022-11-05T11:41:44Z

Thank you very much Howard, for your intervention. It is much appreciated 🔥

Thank you @sgued, for helping me. Could you help me understand and design a good interface on top of LMDB? Can I ask the user to give me an AeadMutInPlace object? It looks complex because this trait asks for a &mut self, and LMDB doesn't seem to let the user pass any user data.

According to Howard's analysis, I conclude that I should remove the comment I added about preferring encryption with checksum and interpret the checksum error as a wrong password key. Instead, we should be able to use a valid AEAD algorithm that also checksums cryptographically correctly and returns an appropriate error. Am I right?

hyc · 2022-11-05T12:21:04Z

By the way, I would encourage you to copy the interface used in liblmdb/crypto.c and encapsulate it as a dynamic object, the way that sample does. That way the LMDB commandline tools can load your crypto mechanism and operate on your encrypted databases still.

sgued · 2022-11-06T11:58:55Z

Thank you @sgued, for helping me. Could you help me understand and design a good interface on top of LMDB? Can I ask the user to give me an AeadMutInPlace object? It looks complex because this trait asks for a &mut self, and LMDB doesn't seem to let the user pass any user data.

You don't need to pass user data. The KeyInnit trait gives you a new method that allows you to create it directly from the key. Something like that:

use aead::{AeadMutInPlace, Key, KeyInit, Nonce, Tag};

fn encrypt<A: AeadMutInPlace + KeyInit>(
    key: &[u8],
    nonce: &[u8],
    aad: &[u8],
    plaintext: &[u8],
    chipertext_out: &mut [u8],
    auth_out: &mut [u8],
) {
    chipertext_out.copy_from_slice(plaintext);
    let key: &Key<A> = key.try_into().unwrap();
    let nonce: &Nonce<A> = nonce.try_into().unwrap();
    let mut aead = A::new(key);
    let tag = aead
        .encrypt_in_place_detached(nonce, aad, chipertext_out)
        .unwrap();
    auth_out.copy_from_slice(&tag);
}

fn decrypt<A: AeadMutInPlace + KeyInit>(
    key: &[u8],
    nonce: &[u8],
    aad: &[u8],
    chipher_text: &[u8],
    output: &mut [u8],
    auth_in: &[u8],
) -> bool {
    output.copy_from_slice(chipher_text);
    let key: &Key<A> = key.try_into().unwrap();
    let nonce: &Nonce<A> = nonce.try_into().unwrap();
    let tag: &Tag<A> = auth_in.try_into().unwrap();
    let mut aead = A::new(key);
    aead.decrypt_in_place_detached(nonce, aad, output, tag)
        .is_ok()
}

fn with_chacha() {
    use aead::{generic_array::typenum::Unsigned, AeadCore, KeySizeUser};
    use chacha20poly1305::ChaCha20Poly1305;
    let mut pl = [1; 1024];
    let aad = [];
    let mut ciphertext = [0; 1024];
    // AeadCore and KeySizeUser are required for all T: AeadMutInPlace + KeyInnit, so you can use them to get the size of the key/tag/nonce.
    let nonce = [0; <ChaCha20Poly1305 as AeadCore>::NonceSize::USIZE];
    let mut auth = [0; <ChaCha20Poly1305 as AeadCore>::TagSize::USIZE];
    let key = [2; <ChaCha20Poly1305 as KeySizeUser>::KeySize::USIZE];
    encrypt::<ChaCha20Poly1305>(&key, &nonce, &aad, &pl, &mut ciphertext, &mut auth);
    pl = [0; 1024];
    assert!(decrypt::<ChaCha20Poly1305>(&key, &nonce, &aad, &ciphertext, &mut pl, &auth));
}

Kerollmops · 2022-11-12T13:04:08Z

I tried working with the AeadMutInPlace and KeyInit traits with LMDB, but I got a runtime error with the size of the initialization vector LMDB returns. It gives a 16 bytes array when ChaCha20 asks for a 12 bytes length array as a nonce. I am probably doing something wrong here and should derive a nonce from the IV.

Could someone help me with that subject?

You can reproduce the issue by cargo run --example encrypt, and if you want a better backtrace, you can RUST_BACKTRACE=1.

Kerollmops · 2022-11-12T15:29:01Z

Ok, so I was able to encrypt/decrypt an LMDB database with ChaCha20Poly1305. The thing is that I cropped the 16 bytes initialization vector coming from LMDB to be 12 bytes and be used by this algorithm. I don't know if it is cryptographically correct. What should I do if an algorithm needs a bigger nonce (iv) like XChaCha20Poly1305 that requires a 20 bytes nonce?

@hyc and @sgued, Can I ask you a technical question about how I should design an app on top of LMDB? I am designing a small app that stores files in an encrypted LMDB; this app will likely be encrypted with ChaCha20Poly1305, so the key will be 20 bytes long. As I can't ask a user to use a 20-byte long password, I thought about hashing the user password to output a 20 bytes long key and use that for the encryption. I also read about the difference between an encryption key and a password (on quora 😬), but they advise creating a very long key and then using the user password to encrypt it instead. Unfortunately, the key must be 20-byte long again and I am not sure we can do that in LMDB.

What is your advice about that?

hyc · 2022-11-12T17:07:20Z

I tried working with the AeadMutInPlace and KeyInit traits with LMDB, but I got a runtime error with the size of the initialization vector LMDB returns. It gives a 16 bytes array when ChaCha20 asks for a 12 bytes length array as a nonce. I am probably doing something wrong here and should derive a nonce from the IV.

Could someone help me with that subject?

Again, just study the example in liblmdb/crypto.c. Just truncate.

hyc · 2022-11-12T17:24:41Z

What should I do if an algorithm needs a bigger nonce (iv) like XChaCha20Poly1305 that requires a 20 bytes nonce?

XChaCha20 is used to protect against nonce misuse in systems that use randomly generated nonces (i.e., to protect against weak RNGs). There is no need to use that in LMDB since LMDB nonces aren't random and are guaranteed to be unique.

As I can't ask a user to use a 20-byte long password, I thought about hashing the user password to output a 20 bytes long key and use that for the encryption.

What you're asking about here is known as a Key Derivation Function, KDF. https://en.wikipedia.org/wiki/Key_derivation_function Popular choices are pbkdf or argon2. Again, for LMDB, you should expose whatever you use in your module as the string2key function.

Kerollmops · 2022-12-16T17:17:02Z

Call for Testing

Hey Rustaceans, I did a lot of work to provide the "recently" released support for page-level encryption of LMDB in heed. It is exposed as an encrypt_with function on the EnvOpenOptions builder. You can find a very basic example in the examples folder. I am asking you to help me test this feature to improve it, document it better, or fix it in the worst case:

Try it with a set of entries and custom LMDB flags/options, but please report LMDB bugs on the dedicated website.
Try it with different AEAD crates.
Be creative and use it in different projects or contexts.

I would be glad if you help me improve this crate. Note that this PR is part of a massive rewrite of the crate to use the latest version of LMDB. You'll notice many undocumented breaking changes compared to the other versions of heed.

Kerollmops · 2023-01-11T10:27:24Z

I must cherry-pick these 18 commits on top of #128 instead of rebasing...

Kerollmops · 2023-11-26T14:05:38Z

This branch has just been rebased on the latest main commit 3a20c96.

hyc · 2023-11-26T19:59:51Z

There is an open report claiming that AEAD is not working. I've not seen a reproducer for it. https://bugs.openldap.org/show_bug.cgi?id=9920 If your testing turns up any problems in LMDB please followup there.

Kerollmops force-pushed the encdec-func branch 5 times, most recently from 41bd7c7 to 13a78ff Compare September 23, 2022 13:32

Kerollmops force-pushed the encdec-func branch 7 times, most recently from bb35c93 to b75b672 Compare October 30, 2022 11:37

Kerollmops changed the base branch from main to a-new-era October 30, 2022 12:02

Kerollmops added the help wanted Extra attention is needed label Oct 30, 2022

Kerollmops force-pushed the encdec-func branch from 7bf57bf to 248ed54 Compare November 12, 2022 15:19

Kerollmops mentioned this pull request Nov 23, 2022

Publish milli in crates.io meilisearch/meilisearch#3367

Open

Kerollmops force-pushed the encdec-func branch 2 times, most recently from b9d1c21 to 75bdf1f Compare December 16, 2022 16:50

Kerollmops force-pushed the a-new-era branch from 580168f to 31e4ddb Compare January 11, 2023 10:17

Kerollmops mentioned this pull request Jan 11, 2023

Rework heed to be lightweight and simpler to maintain #128

Merged

20 tasks

Kerollmops force-pushed the a-new-era branch from 31e4ddb to 2d032f7 Compare January 11, 2023 10:23

Kerollmops force-pushed the a-new-era branch 2 times, most recently from 7d7dac4 to df71fed Compare January 11, 2023 10:44

Base automatically changed from a-new-era to main January 11, 2023 11:13

Kerollmops added this to the v0.21.0 milestone Jan 11, 2023

Kerollmops force-pushed the encdec-func branch 2 times, most recently from 6efe3b8 to 341bcb6 Compare January 11, 2023 20:49

Kerollmops and others added 16 commits November 26, 2023 14:50

Change the submodule to use mdb.master3

75d39a2

Expose the LMDB encrypt/decrypt and checksum features

b2cc01f

Make the SimplifiedOpenOptions public and documented

a1ac979

Add a method name to the Checksum trait and document it

d0c7136

Add a method name to the Encrypt trait and document it

b771592

Document the encrypt_with and checksum_with EnvOpenOptions methods

760cc1f

Update the SimplifiedOpenOptions to store the checksum and encrypt names

08650df

Use catchunwind for when functions can panic

5fa5c15

Use the aead crate and traits instead

d860864

Make the encrypt_with method to accept a Key and update the example

072d63d

Remove the useless chacha20 dependency

1f6c67a

Remove the checksum feature to reintroduce it later

ab1aa02

Move the DummyEncrypt type lower in the file

59922f3

Disable the default features of the aead dependency

84dffc5

Rewrite the documentation of the new encryption methods

fe67d92

Remove inexistant imports

c800c0a

Kerollmops force-pushed the encdec-func branch from 341bcb6 to c800c0a Compare November 26, 2023 14:04

Fix a small doc test

72ca1f5

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Expose the LMDB Encrypt/Decrypt feature #134

Expose the LMDB Encrypt/Decrypt feature #134

Kerollmops commented Sep 16, 2022 •

edited

sgued commented Oct 31, 2022

Kerollmops commented Nov 4, 2022

hyc commented Nov 4, 2022 •

edited

sgued commented Nov 4, 2022 •

edited

Kerollmops commented Nov 5, 2022 •

edited

hyc commented Nov 5, 2022

sgued commented Nov 6, 2022

Kerollmops commented Nov 12, 2022 •

edited

Kerollmops commented Nov 12, 2022 •

edited

hyc commented Nov 12, 2022

hyc commented Nov 12, 2022

Kerollmops commented Dec 16, 2022 •

edited

Kerollmops commented Jan 11, 2023

Kerollmops commented Nov 26, 2023

hyc commented Nov 26, 2023

Expose the LMDB Encrypt/Decrypt feature #134

Are you sure you want to change the base?

Expose the LMDB Encrypt/Decrypt feature #134

Conversation

Kerollmops commented Sep 16, 2022 • edited

Call for Advice

Where Should I Start?

What should I do next?

sgued commented Oct 31, 2022

Unsoundness

AEAD

The example

Limits of LMDB

Kerollmops commented Nov 4, 2022

hyc commented Nov 4, 2022 • edited

sgued commented Nov 4, 2022 • edited

Kerollmops commented Nov 5, 2022 • edited

hyc commented Nov 5, 2022

sgued commented Nov 6, 2022

Kerollmops commented Nov 12, 2022 • edited

Kerollmops commented Nov 12, 2022 • edited

hyc commented Nov 12, 2022

hyc commented Nov 12, 2022

Kerollmops commented Dec 16, 2022 • edited

Call for Testing

Kerollmops commented Jan 11, 2023

Kerollmops commented Nov 26, 2023

hyc commented Nov 26, 2023

Kerollmops commented Sep 16, 2022 •

edited

hyc commented Nov 4, 2022 •

edited

sgued commented Nov 4, 2022 •

edited

Kerollmops commented Nov 5, 2022 •

edited

Kerollmops commented Nov 12, 2022 •

edited

Kerollmops commented Nov 12, 2022 •

edited

Kerollmops commented Dec 16, 2022 •

edited