
Knowing the amount of bits that will be written #11

Closed
cBournhonesque opened this issue Sep 17, 2023 · 6 comments

Comments

@cBournhonesque

Hi,

I'd like to use bitcode for game networking, and it would be useful to have a function that reports how many bits/bytes a structure would take if it were encoded, without doing the actual encoding (so that I know which packet the encoded data will fit in).

Something similar to https://docs.rs/bincode/latest/bincode/fn.serialized_size.html

@finnbear (Member) commented Sep 17, 2023

Unlike bincode, bitcode doesn't support serializing into a mutable packet structure or stream because performance would suffer from lack of alignment/wide-integer instructions. bitcode only serializes into Vec<u8> (via allocation) or &[u8] (via &mut bitcode::Buffer).

As a result, the minimal-allocation method is to reuse a bitcode::Buffer (or pool of them) and copy from the resulting &[u8] into your packet, at which point you know the number of bytes from <&[u8]>::len().
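The flow described above can be sketched in plain Rust. This is a minimal sketch, not bitcode's API: `fake_serialize`, `Packet`, and `UDP_PACKET_SIZE` are invented names, with `fake_serialize` standing in for reusing a `bitcode::Buffer` (which yields a `&[u8]` of the encoding). The point is that the byte count only becomes available after encoding, via `<&[u8]>::len()`.

```rust
// Placeholder for serializing via a reused `bitcode::Buffer`.
fn fake_serialize(value: u32) -> Vec<u8> {
    value.to_le_bytes().to_vec()
}

const UDP_PACKET_SIZE: usize = 1400; // typical UDP payload budget

struct Packet {
    bytes: [u8; UDP_PACKET_SIZE],
    len: usize,
}

impl Packet {
    fn new() -> Self {
        Self { bytes: [0; UDP_PACKET_SIZE], len: 0 }
    }

    /// Copies `encoded` into the packet if it fits, returning whether it did.
    /// The length check uses `encoded.len()`, known only after encoding.
    fn try_push(&mut self, encoded: &[u8]) -> bool {
        if self.len + encoded.len() > UDP_PACKET_SIZE {
            return false;
        }
        self.bytes[self.len..self.len + encoded.len()].copy_from_slice(encoded);
        self.len += encoded.len();
        true
    }
}

fn main() {
    let mut packet = Packet::new();
    let encoded = fake_serialize(42);
    assert!(packet.try_push(&encoded));
    println!("packet now holds {} bytes", packet.len);
}
```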

Feel free to give other/more specific reasons to implement this functionality, e.g. a code example, taking into account the above limitations.

@cBournhonesque (Author)

I'm not sure I fully understood your comment; what I meant was a trait like this: https://github.com/naia-lib/naia/blob/main/shared/serde/src/serde.rs#L4

Where there could be an additional function that simply returns the number of bytes the struct/enum will serialize into, without doing the actual serialization. For example, via these kinds of implementations: https://github.com/naia-lib/naia/blob/main/shared/serde/src/impls/string.rs#L28
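For concreteness, a hypothetical sketch of the kind of trait being described (the trait name, the fixed-width sizes, and the `u32` length prefix are all invented for illustration, loosely mirroring naia's approach of computing a size without serializing):

```rust
/// Hypothetical trait: report the encoded size without encoding.
trait SerializedSize {
    /// Number of bytes this value would occupy when serialized.
    fn serialized_size(&self) -> usize;
}

impl SerializedSize for u32 {
    fn serialized_size(&self) -> usize {
        4 // fixed-width encoding assumed for illustration
    }
}

impl SerializedSize for String {
    fn serialized_size(&self) -> usize {
        // assumed u32 length prefix + the UTF-8 bytes themselves
        4 + self.len()
    }
}

fn main() {
    let s = String::from("hello");
    assert_eq!(s.serialized_size(), 9);
    assert_eq!(7u32.serialized_size(), 4);
}
```

Note that a real bitcode implementation would have to mirror its variable-width bit packing exactly, which is part of why such a function is nontrivial to maintain.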

@finnbear (Member) commented Sep 19, 2023

> For example, via these kinds of implementations: https://github.com/naia-lib/naia/blob/main/shared/serde/src/impls/string.rs#L28

Thanks for providing a code example! It looks like you are using the bit length to decide whether to serialize the message at all, which could legitimately benefit from the functionality.

(Edit: FWIW, I tried implementing the desired functionality on the predict_len branch).

@caibear (Member) commented Sep 19, 2023

I avoided adding something similar to bincode::serialized_size since I've noticed lots of people misuse it to allocate buffers with capacity as an optimization. This usually results in half the performance and double the binary size for everything but the most trivial structures (see bincode-org/bincode#401).

> it would be useful to have a function to know how many bits/bytes a structure would take if it were encoded, but without doing the actual encoding (so that i know in which packet i can put the encoded data).

I would advise serializing each structure to a Vec<u8> with bitcode::encode and then appending as many as possible to another Vec<u8>, each with a length prefix such as a u16 or u32. The length prefix is required so you can pass a &[u8] of the original structure length to bitcode::decode.

While copying the bytes isn't ideal, it should be much faster than something like serialized_size.
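The length-prefixed packing suggested above can be sketched as follows. This is an illustrative sketch, not bitcode code: `encode` here is a stand-in for `bitcode::encode`, and the `u16` prefix and `MAX_PACKET` size are assumptions.

```rust
// Stand-in for `bitcode::encode`, which returns the encoded bytes.
fn encode(value: u32) -> Vec<u8> {
    value.to_le_bytes().to_vec()
}

const MAX_PACKET: usize = 1400; // assumed packet budget

/// Appends `encoded` to `packet` with a u16 length prefix if it fits.
fn append_with_prefix(packet: &mut Vec<u8>, encoded: &[u8]) -> bool {
    let needed = 2 + encoded.len();
    if packet.len() + needed > MAX_PACKET {
        return false;
    }
    packet.extend_from_slice(&(encoded.len() as u16).to_le_bytes());
    packet.extend_from_slice(encoded);
    true
}

/// Splits the next length-prefixed message off the front of `packet`,
/// returning (message, rest). The message slice is what you would pass
/// to the decoder, since it has the original structure's exact length.
fn split_next(packet: &[u8]) -> Option<(&[u8], &[u8])> {
    if packet.len() < 2 {
        return None;
    }
    let len = u16::from_le_bytes([packet[0], packet[1]]) as usize;
    let rest = &packet[2..];
    if rest.len() < len {
        return None;
    }
    Some(rest.split_at(len))
}

fn main() {
    let mut packet = Vec::new();
    assert!(append_with_prefix(&mut packet, &encode(1)));
    assert!(append_with_prefix(&mut packet, &encode(2)));
    let (first, rest) = split_next(&packet).unwrap();
    assert_eq!(first, encode(1).as_slice());
    let (second, _) = split_next(rest).unwrap();
    assert_eq!(second, encode(2).as_slice());
}
```

The prefix lets the receiver recover each structure's byte range without any out-of-band size information, at the cost of one copy per message.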

@finnbear (Member) commented Sep 19, 2023

@caibear brings up some good points against implementing this and a possible alternative for your code.

Here is one more possible alternative for you, in the form of code that you can drop into your project:

    use std::cell::RefCell;
    use serde::Serialize;
    use bitcode::{Encode, Buffer, Error};

    // for serde::Serialize
    fn serialize_len<T: Serialize + ?Sized>(t: &T) -> Result<usize, Error> {
        thread_local! {
            // Lazily-initialized, reusable buffer (one per thread).
            static BUFFER: RefCell<Option<Buffer>> = RefCell::new(None);
        }

        BUFFER.with(|buffer| {
            let mut buffer = buffer.borrow_mut();
            let buffer = buffer.get_or_insert_with(Buffer::default);
            buffer.serialize(t).map(|bytes| bytes.len())
        })
    }

    // for bitcode::Encode
    fn encode_len<T: Encode + ?Sized>(t: &T) -> Result<usize, Error> {
        thread_local! {
            static BUFFER: RefCell<Option<Buffer>> = RefCell::new(None);
        }

        BUFFER.with(|buffer| {
            let mut buffer = buffer.borrow_mut();
            let buffer = buffer.get_or_insert_with(Buffer::default);
            buffer.encode(t).map(|bytes| bytes.len())
        })
    }

Use these as a last resort if you can't refactor your code as suggested by @caibear. By reusing the Buffer, they avoid repeated memory allocations. They don't require additional codegen and won't be significantly slower than my predict_len changes mentioned above.

@cBournhonesque (Author)

Thank you!
In general I'll be encoding everything into a buffer of size UDP_PACKET_SIZE (around 1400 bytes), so I wouldn't be using this to optimize allocations.
Both options that you provided make sense to me.

finnbear closed this as not planned on Sep 20, 2023