Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rust: expose api to write messages directly #693

Open
neilisaac opened this issue Nov 1, 2022 · 2 comments
Open

rust: expose api to write messages directly #693

neilisaac opened this issue Nov 1, 2022 · 2 comments
Labels
feature New feature or request rust Related to the rust implementation

Comments

@neilisaac
Copy link

Currently Writer::write_to_known_channel takes a byte slice argument, so messages must be encoded into a buffer before writing to the log. In order to efficiently write protobuf messages to mcap (#688) without encoding into a vec first, we would like the Writer to provide a method that exposes std::io::Write.

Writer could potentially expose a method to borrow the a Write object ex. Writer<W>::message_writer(&'a mut self, channel_id, sequence, log_time, publish_time) -> MessageWriter<'a, W> where MessageWriter implements Write and computes the message length for you. This would allow us to encode protobuf messages using protobuf::MessageDyn::write_to_writer_dyn(&self, w: &mut dyn Write).

@james-rms
Copy link
Collaborator

Related to #658 - when writing in chunks you either need to:

  1. maintain a buffer for the chunk content in progress, then once the chunk is complete write the full chunk record at once
  2. Write a dummy chunk header to the file, then write the chunk content directly to the file, then once the chunk is done, seek back to the chunk header, and rewrite the header content with the now-known record length, compressed length, uncompressed length, CRC etc.

Option 1 precludes zero-copy writing, though agree we could potentially reduce the number of copies to 1.
Option 2 means we diverge from our our append-only writing strategy, meaning if the writer dies we can be left with a corrupt chunk record at the end of the file. This might still be reasonable, since the corruption is limited to the last chunk which would not be written at all in the (1) case.

For this reason I feel zero-copy writing is best suited for writing un-chunked MCAPs, which have the disadvantage that their messages aren't indexed.

@james-rms james-rms added the rust Related to the rust implementation label Nov 1, 2022
@neilisaac
Copy link
Author

neilisaac commented Nov 1, 2022

@james-rms understood. My benchmarking numbers aren't fully verified yet, but this might be helpful to motivate this ask.

I benchmarked this by writing an uncompressed mcap file (containing protobuf PointCloud messages) to memory (tmpfs) using this library, and well as my own mcap implementation (which doesn't currently support chunks or compression), and found this library to be about 50% slower. I'm guessing it's due chunked two copy vs unchunked zero copy.

Using chunking is desirable because I'd like to use zstd compression too. I'm not sure whether eliminating the chunk buffer is viable with compression or not though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature or request rust Related to the rust implementation
Development

No branches or pull requests

2 participants