
integrate with standard Read/Write IO traits #227

Open
vlmutolo opened this issue Oct 17, 2021 · 12 comments
Labels: new feature New feature or request

Comments

@vlmutolo
Contributor

vlmutolo commented Oct 17, 2021

Summary

The Rust ecosystem heavily integrates with std's Read and Write traits. It would be helpful to interoperate with these traits, especially in the context of the aead and hash modules.

Example

Hashing

use orion::hash::{self, Digest};
use std::io::Cursor;

let mut data = Cursor::new(vec![0, 1, 2, 3, 4]);
let digest: Digest = hash::from_reader(&mut data)?;

AEAD

use orion::aead;
use std::io::Cursor;

let mut data = Cursor::new(vec![0, 1, 2, 3, 4]);
let secret_key = aead::SecretKey::default();

// Encrypt from a `Read` type.
let mut encrypted_data = Vec::new();
aead::seal_copy(&secret_key, &mut data, &mut encrypted_data)?;

// Decrypt into a `Write` type.
let mut decrypted_data = Vec::new();
aead::open_copy(&secret_key, &mut encrypted_data, &mut decrypted_data)?;

I'm definitely open to suggestions on the API described here. Particularly, I'm not sure if we'd want convenience wrappers around the *_copy functions for the AEAD case. The copy functions are fairly general, and will allow for people to allocate buffers how they want. But there's probably a case to be made to provide a simpler type that just outputs a Vec of encrypted/decrypted data.


In any case, the real advantage of implementing these APIs is that someone could do, for example, the following:

use std::fs::File;
use orion::hash;

let file = File::open("filename.txt").unwrap(); // may want to use a BufReader (?)
let digest = hash::from_reader(file).unwrap();

And it would work as expected. The big deal here is that large files should be read and hashed in pieces, which currently requires reaching into the hazardous module and using the "streaming interface" (the update method). This looks something like the following.

use orion::hazardous::hash::blake2b::Blake2b;
use std::fs::File;
use std::io::Read;

// Figuring out the "right" size isn't something users should have to do.
let mut state = Blake2b::new(None, 32)?;
let mut reader = File::open("filename.txt")?;
// Neither is picking this buffer size.
let mut buf = [0u8; 4096];

loop {
    let bytes_read = reader.read(&mut buf)?;
    if bytes_read == 0 {
        break;
    }
    state.update(&buf[..bytes_read])?;
}
let digest = state.finalize()?;

So it's longer, has a little more room for user error, and in general we probably just want to support the existing IO traits.

I already started the implementation, and I'll post a draft PR soon.

Also, I considered making AsyncRead and AsyncWrite part of this issue/PR, but that seems like it should be its own thing. There's talk about integrating those traits into the standard library some time soon, so maybe we should hold off until then.

@vlmutolo vlmutolo added the new feature New feature or request label Oct 17, 2021
@vlmutolo vlmutolo changed the title implement Write for relevant types implement Read/Write for relevant types Oct 17, 2021
@vlmutolo vlmutolo changed the title implement Read/Write for relevant types integrate with standard Read/Write IO traits Oct 17, 2021
@vlmutolo vlmutolo self-assigned this Oct 17, 2021
@brycx
Member

brycx commented Oct 18, 2021

I'll take a closer look at this ASAP, but for now I very much agree with leaving AsyncRead/AsyncWrite as an aside.

@brycx
Member

brycx commented Oct 23, 2021

AEAD example
This looks very much like the high-level API to me at this point (except, of course, for the traits). Would we simply add Read/Write support to the existing AEAD API? Don't we want to consider things like the streaming API for AEAD, which could provide the same benefits when encrypting files as the hash example does?

Hash example
The approach for using the streaming interface seems good to me. I am wondering though, if you intend to have update() take a Read argument or if you want to only provide a from_reader method, which automatically sets the buffer size, BLAKE2b output size etc? In the latter case, incremental hashing of files for example, would happen transparently to the user. However, some users might want to decide buffer size themselves, which is only possible with the former.

@vlmutolo
Contributor Author

vlmutolo commented Dec 26, 2021

Don't we want to consider things like the streaming API for AEAD, which could provide the same benefits when encrypting files as the hash example does?

Yes, I think this would be good to support. I'm unclear, though, how to directly convert between the streaming API and the Read and Write traits. Read and Write both work on streams of bytes broken up into chunks of indeterminate size, while it seems like the AEAD streaming API relies on having precisely sized and separated messages to encode/decode. Is there a standard libsodium serialization of these messages into a byte stream?


I am wondering … if you want to only provide a from_reader method, which automatically sets the buffer size, BLAKE2b output size etc?… users might want to decide buffer size themselves

For the output size, this isn't something that Orion currently exposes in the safe API. And if users want to set it, they can grab the Blake2b type directly and use its Write implementation. The from_reader function would basically just be a safe wrapper that creates a Blake2b state with sane parameters and drives its Write impl with the std::io::copy function.

For the "buffer", I'm not sure which buffer you mean. If users want to buffer the reader, they can use a BufReader and then pass that to the from_reader function.

@brycx
Member

brycx commented Dec 27, 2021

For the "buffer", I'm not sure which buffer you mean. If users want to buffer the reader, they can use a BufReader and then pass that to the from_reader function.

I think I understand now; I misunderstood the purpose of from_reader() initially.

@brycx
Member

brycx commented Dec 27, 2021

Is there a standard libsodium serialization of these messages into a byte stream?

I see your point, in that this can be tricky. Especially since the streaming API writes directly to an output and has no internal state. Perhaps, in that case, each call to seal_chunk()/open_chunk() would just need a reader and a writer, as the current API suggestion does for non-streaming AEAD.

I'll have to think about this one a bit more. If we don't find a solution to this, I definitely think that we should start with adding the integration to the hashing primitives and non-streaming AEAD, and then see how that turns out. WDYT?

I was also thinking that perhaps the naming of the functions for non-streaming AEAD could be made a bit less generic. How about sealing_read()/opening_write() or seal_read()/open_write()?


In any case, I really like the idea behind the hashing primitives so far. This should definitely go in. Regarding a separate io feature, I don't think this is necessary. If we just gate it behind safe-api, that should be fine, just like we do with the std::error::Error trait.

@vlmutolo
Contributor Author

vlmutolo commented Dec 28, 2021

each call to seal_chunk()/open_chunk() would just need a reader and writer

Oh, interesting. Yeah this probably would work. Instead of having a single call to e.g. sealing_read() handle all the individual streaming chunks, the user would call sealing_read once per chunk. That sounds good to me.

If we don't find a solution to this, I definitely think that we should start with adding the integration to the hashing primitives and non-streaming AEAD

Yes, agreed. Hashing is pretty straightforward here. Basically just consume any reader and output the digest. Though I do see where the user may want to control the size of the buffer we read into (which I see now is probably what you meant above). Maybe we could do something like internally allocate a Vec<u8> to read into and grow it dynamically until reads don't fill the whole buffer. Kind of like TCP slow-start.

How about sealing_read()/opening_write() or seal_read()/seal_write()?

I like these names. Not sure about sealing_ vs seal_. They're both good. I think I like sealing_read and sealing_write better, but not by much. That's what I'll go with unless you want to switch them.

For the hash names, are you still good with from_reader or should we switch to something again less generic? Maybe digest_from_reader or consume_to_digest? I think digesting_read sounds a bit odd haha.

@brycx
Member

brycx commented Dec 28, 2021

Though I do see where the user may want to control the size of the buffer we read into (which I see now is probably what you meant above). Maybe we could do something like internally allocate a Vec to read into and grow it dynamically until reads don't fill the whole buffer. Kind of like TCP slow-start.

Right now I'm kind of lost in all these buffers and read/write directions/sources tbh. Do you perhaps have a small example?

That's what I'll go with unless you want to switch them.

Fine by me. The only downside I see is giving up the small consistency with the existing seal/open method naming that seal_read/open_write would keep. I'm open to both and will let you decide.

For the hash names, are you still good with from_reader or should we switch to something again less generic?

Good point. I very much like the digest_from_reader name. Plays well with the current digest function IMO.

I think digesting_read sounds a bit odd haha.

Almost tempting haha.

@vlmutolo
Contributor Author

vlmutolo commented Dec 28, 2021

Do you perhaps have a small example?

Here's what the API might look like.

let reader = File::open("file.txt")?;
let digest = orion::hash::digest_from_reader(reader)?;

Nice and easy. Internal to the digest_from_reader function, we could do something like the following to achieve somewhat of an "ideal" buffer size.

fn digest_from_reader(mut reader: impl Read) -> Result<Digest, Error> {
    let mut state = Blake2b::new(None, 32)?;
    // Start with a buffer of 1 KiB (not sure what this initial value should be).
    let mut read_buffer = vec![0u8; 1024];
    loop {
        let bytes_read = reader.read(&mut read_buffer)?;
        if bytes_read == 0 {
            break;
        }
        // Hash the slice given by: &read_buffer[..bytes_read]
        state.update(&read_buffer[..bytes_read])?;
        // Double the size of the buffer if we filled it up last time.
        if bytes_read == read_buffer.len() {
            read_buffer.resize(read_buffer.len() * 2, 0);
        }
    }
    state.finalize()
}

@brycx
Member

brycx commented Dec 28, 2021

That is a very straightforward API indeed. Thanks for the example! I think this is a good approach. We can figure out a good buffer size and document it clearly. Assuming users of digest_from_reader() typically pass readers with large amounts of data (an assumption which should probably also be documented, if taken), we could take a buffer size of, say, 128 or 256 KiB? The reason behind this suggestion stems mostly from the fact that these are the two largest sizes used in benchmarks.

Edit: If we take such large buffers, we probably don't want to resize them dynamically though.

@vlmutolo
Contributor Author

Allocating a buffer of 256 KiB seems reasonable. Very few applications would be affected by that use of memory, and those that are can use the hazardous module and Blake2b's implementation of Read. And no, we probably wouldn't want to keep growing that buffer.

@brycx
Member

brycx commented Dec 29, 2021

implementation of Read

Just making sure, but this is the Write impl right? I saw no Read impl in the draft PR AFAIR.

Regardless, sounds like a good plan.

@vlmutolo
Contributor Author

Yes, exactly. I meant Write.

@brycx brycx added this to the 0.17.2 milestone Feb 2, 2022
@brycx brycx modified the milestones: 0.17.2, 0.17.3 Aug 16, 2022
@brycx brycx modified the milestones: 0.17.3, 0.17.4 Dec 7, 2022
@brycx brycx removed this from the 0.17.4 milestone Mar 2, 2023