Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rust: optional protobuf schema and direct protobuf message writing support #688

Open
neilisaac opened this issue Oct 28, 2022 · 4 comments
Labels
feature New feature or request rust Related to the rust implementation

Comments

@neilisaac
Copy link

neilisaac commented Oct 28, 2022

Encoding protobuf message in mcap logs is a common use case that could be streamlined. Currently a file descriptor set must be assembled and write to the schema manually, and messages must be encoded into a buffer before writing that buffer to mcap.

There are two popular rust protobuf crates: protobuf and prost. protobuf provides reflection and json encoding support, whereas prost generates more idiomatic rust structs with less heap allocations. Supporting both may be relevant to this project, however working with protobuf may be easier since message descriptors can be accessed via the MessageDyn trait, while prost (or protoc) can be configured to generate a file descriptor set file that must be loaded in order to write file descriptor sets to the mcap schema.

At the moment I'm using protobuf and am more familiar with it.

A protobuf file descriptor set schema may be naively built:

fn protobuf_schema(
    message_descriptor: &protobuf::reflect::MessageDescriptor,
) -> Result<Arc<Schema<'static>>, protobuf::Error> {
    fn collect_dependencies(
        descriptor: &protobuf::reflect::FileDescriptor,
        already_collected: &mut HashSet<String>,
    ) -> Vec<protobuf::descriptor::FileDescriptorProto> {
        let mut descriptors = vec![descriptor.proto().to_owned()];
        already_collected.insert(descriptor.name().to_string());
        for dep in descriptor.deps() {
            if already_collected.get(dep.name()).is_none() {
                descriptors.extend(collect_dependencies(dep, already_collected));
            }
        }
        descriptors
    }

    let data = Cow::Owned(protobuf::Message::write_to_bytes(
        &protobuf::descriptor::FileDescriptorSet {
            file: collect_dependencies(message_descriptor.file_descriptor(), &mut HashSet::new()),
            ..Default::default()
        },
    )?);

    Ok(Arc::new(Schema {
        name: message_descriptor.full_name().to_string(),
        encoding: "protobuf".to_string(),
        data,
    }))
}

It would be nice for this functionality to be built into the rust mcap library ex.

Writer::add_protobuf_channel(&mut self, topic: String, message_descriptor: &protobuf::reflect::MessageDescriptor, metadata: Option<BTreeMap<String, String>>) -> Result

In order to encode a message, we can do:

fn write_protobuf_message<W: Write + Seek>(
    writer: &mut mcap::Writer<W>,
    channel_id: u16,
    sequence_number: u32,
    log_time: SystemTime,
    publish_time: SystemTime,
    message: &dyn protobuf::MessageDyn,
) -> anyhow::Result<()> {
    // TODO: ideally use write_to_writer_dyn to avoid extra allocation and copy
    let data = message.write_to_bytes_dyn().context("encode protobuf")?;
    writer
        .write_to_known_channel(
            &mcap::records::MessageHeader {
                channel_id,
                sequence: sequence_number,
                log_time: log_time
                    .duration_since(SystemTime::UNIX_EPOCH)
                    .unwrap_or_default()
                    .as_nanos() as u64,
                publish_time: publish_time
                    .duration_since(SystemTime::UNIX_EPOCH)
                    .unwrap_or_default()
                    .as_nanos() as u64,
            },
            &data,
        )
        .context("write mcap message")
}

however this requires encoding the message into a temporary buffer before writing to the mcap writer.

Writer could potentially expose a method to borrow the a Write object ex. Writer::message_writer(&mut self, channel_id, sequence, log_time, publish_time) -> &MessageWriter where MessageWriter implements Write and computes the message length for you to allow using protobuf::MessageDyn::write_to_writer_dyn(&self, w: &mut dyn Write)

The library could also expose a convenience method ex. Writer::write_protobuf_message_to_existing_channel(channel_id, sequence, log_time, publish_time, message: &protobuf::MessageDyn).

If zero-copy encoding is to be supported for prost too, we could have something like Writer<W: BufMut>::write_prost_message_to_existing_channel(channel_id, sequence, log_time, publish_time, message: &prost::Message)

This functionality could be added as optional features in the mcap crate, or as additional crate(s) implementing an extension trait on Writer.

@neilisaac neilisaac added the feature New feature or request label Oct 28, 2022
@defunctzombie
Copy link
Contributor

defunctzombie commented Oct 29, 2022

I don't think we should put any protobuf specific dependencies or requirements in the base mcap library since there are users of that who do not use protobuf and should not need to bring in protobuf dependencies to build on top of the base library.

The way we've supported this in our other languages is to provide a separate support package which simplifies some of the common use cases for a specific serialization format. It would make sense to have a mcap_protobuf crate to handle some of these common tasks. Possibly even one for each variant of protobuf library if we have enough demand for that.

Here's the python version of a support package: https://github.com/foxglove/mcap/tree/main/python/mcap-protobuf-support

@neilisaac
Copy link
Author

@defunctzombie makes sense. Cargo optional features could also work, but I have no strong feelings.

How about adding a mechanism for zero-copy message writing?

@defunctzombie
Copy link
Contributor

@neilisaac

How about adding a mechanism for zero-copy message writing?

Seems sensible. Can you write up a separate feature request for that with specific note for how we would change the API to support this?

@neilisaac
Copy link
Author

@defunctzombie I added #693

@james-rms james-rms added the rust Related to the rust implementation label Nov 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature or request rust Related to the rust implementation
Development

No branches or pull requests

3 participants