Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose API for raw decoder primitives for LZMA and LZMA2. #74

Merged
merged 2 commits into from Aug 5, 2022

Conversation

chyyran
Copy link
Contributor

@chyyran chyyran commented Jun 9, 2022

Pull Request Overview

This pull request fixes #72 by refactoring decoder functions to expose a zero-cost raw decoder API. The existing public API has been refactored to use the raw decoder API internally. The following additions to the public API become available with the raw_decoder feature enabled under the decompress::raw module.

  • LzmaDecoder
    • LzmaDecoder::new(params: LzmaParams, memlimit: Option<usize>) -> Result<Self>
    • LzmaDecoder::reset(&mut self, unpacked_size: Option<Option<u64>>)
    • LzmaDecoder::decompress<'a, W: io::Write, R: io::BufRead>(&mut self, input: &mut R, output: &'a mut W)
  • Lzma2Decoder
    • Lzma2Decoder::new() -> Self
    • Lzma2Decoder::reset(&mut self)
    • Lzma2Decoder::decompress<'a, W: io::Write, R: io::BufRead>(&mut self, input: &mut R, output: &'a mut W)
  • LzmaProperties
  • LzmaParams
    • LzmaParams::new(properties: LzmaProperties, dict_size: u32, unpacked_size: Option<u64>) -> Self
    • LzmaParams:read_header<R>(input: &mut R, options: &Options) -> error::Result<LzmaParams>
      • This was a previously internal API that is now exposed only when raw_decoder is enabled. I don't see any reason to restrict this to internal usage.

Additionally, annotations have been added to indicate availability of APIs on docs.rs.
image

Testing Strategy

  • The existing test suite is sufficient and should all pass. The existing API tests have been refactored to use the raw_decoder APIs internally.

@chyyran chyyran force-pushed the feature-raw-decoder branch 5 times, most recently from 07055f4 to ba2d082 Compare June 9, 2022 17:49
src/lib.rs Outdated
output.finish()?;
let mut decoder = decode::lzma::LzmaDecoder::new(params, options.memlimit)?;

decoder.decompress(input, output)?;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decoding logic has been moved into LzmaDecoder::decompress.

@chyyran
Copy link
Contributor Author

chyyran commented Jun 9, 2022

The current API for decompress is very simple and matches up well with the existing one-shot functions. One downside of such a design is that the LzBuffers are still one-shot and result in an allocation every time decompress is called. However, being able to reuse a DecoderState is already a huge savings owing to the size of that struct.

It might be worth looking into allowing reusage of a backing buffer in an LzBuffer but this requires further design work on the LzBuffer implementations that isn't really related to #72. The current API design here does not preclude a future API that would allow reusage of these buffers.

A note, LzmaDecoder::reset/Lzma2Decoder::reset does not reallocate because the lclppb parameters are the same throughout the lifetime of the LzmaDecoder.

@chyyran
Copy link
Contributor Author

chyyran commented Jul 7, 2022

cc @gendx

@chyyran
Copy link
Contributor Author

chyyran commented Aug 3, 2022

I encountered a situation where simply resetting the decoder state was not enough but since the decoder is being reused for each raw LZMA chunk, the unpacked size was also needed to be set. Since this might not be the case for every usage, I have it in reset as Option<Option<u64>> where None leaves the unpacked size unchanged, and Some(_) changes the unpacked size in the decoder state. Having to keep track of the previous unpacked size for streams with headers seemed annoying and Option<u64> is ambiguous between "keep the unpacked size the same" and "change the unpacked size to None (for an end of payload marker)".

I thought about exposing set_unpacked_size to LzmaDecoder but it didn't really make sense in my perspective since reset prepares the decoder for the next chunk of data. For some streams, forgetting to call set_unpacked_size may leave the decoder in an invalid state for that specific stream, so keeping it atomic was best. I don't see any situation where someone would need to change the unpacked size between raw chunks.

Copy link
Owner

@gendx gendx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay, I just came back from holidays.

Overall this looks good, I just left a few nits.
Thank you a lot for contributing this!

src/decode/lzma.rs Outdated Show resolved Hide resolved
src/decode/lzma.rs Outdated Show resolved Hide resolved
src/decode/lzma2.rs Outdated Show resolved Hide resolved
src/decode/lzma2.rs Outdated Show resolved Hide resolved
src/decode/xz.rs Outdated Show resolved Hide resolved
src/lib.rs Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
@chyyran chyyran force-pushed the feature-raw-decoder branch 4 times, most recently from 1af381d to 58155ef Compare August 5, 2022 05:07
@chyyran chyyran requested a review from gendx August 5, 2022 05:12
@chyyran chyyran force-pushed the feature-raw-decoder branch 2 times, most recently from c30bb38 to e13c0fb Compare August 5, 2022 06:20
src/decode/lzma.rs Outdated Show resolved Hide resolved
src/decode/lzma2.rs Outdated Show resolved Hide resolved
src/decode/lzma2.rs Outdated Show resolved Hide resolved
src/decode/lzma2.rs Outdated Show resolved Hide resolved
src/decode/lzma2.rs Outdated Show resolved Hide resolved
@jtmoon79
Copy link

jtmoon79 commented Sep 5, 2023

Hi @chyyran @gendx

I wanted to read an XZ file into decompressed chunks of some constant size. Currently I'm using xz_decompress. But xz_decompress reads the entire file in one call. I want to call some "read" function multiple times to sequentially reads the XZ into fixed size buffers (jtmoon79/super-speedy-syslog-searcher#182).

Can I do read an XZ file as a sequence of decompressed byte chunks (Vec<u8>) using these new API endpoints?

Thanks for creating this library! 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expose decoder/encoder primitives as public
3 participants