
Added support for streaming multipart decoding #222

Open · wants to merge 1 commit into master

Conversation

tzickel (Contributor) commented Jun 16, 2018

This is feature complete and should parse the stream like the normal MultipartDecoder class (and passes its tests).

The added benefit over the normal API is better memory use and time savings on large inputs, provided you can handle the data as a stream. The normal code reads all of the stream chunks into memory, joins them into one copy (then discards the chunks), and then splits the data into parts, while still retaining the original copy inside the request itself unless you explicitly delete the request right after using MultipartDecoder.
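To make the memory contrast concrete, here is a stdlib-only sketch of the two reading strategies (the helper names are mine, not part of this PR or requests_toolbelt):

```python
import io

def read_all(stream):
    # Buffer-everything strategy: peak memory is at least the full
    # body size, before any splitting into parts even begins.
    return stream.read()

def iter_chunks(stream, chunk_size=8192):
    # Streaming strategy: hand back fixed-size chunks as they arrive,
    # so peak memory stays around chunk_size regardless of body size.
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk

body = io.BytesIO(b'x' * 100_000)
total = sum(len(chunk) for chunk in iter_chunks(body, chunk_size=4096))
print(total)  # 100000
```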

```python
import requests
from requests_toolbelt import MultipartStreamDecoder

r = requests.get('some multipart result url', stream=True)  # Output needs to be streamable
with MultipartStreamDecoder.from_response(r) as decoder:
    for part in decoder:
        print(part.headers)
        for stream in part:
            print(stream)
        # print(part.content)  # Read comment below
```
  • The context manager ensures the input stream is depleted on an exception or when it is not fully read (mainly useful if you want to reuse the socket with HTTP keep-alive).
  • StreamingPart (part in the example) lets you either stream the part or use part.content or part.text to get it all at once, like before.
  • If you use MultipartStreamDecoder.from_response, you can pass chunk_size to set how much data to try to read per iteration.
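The depletion behaviour described in the first bullet can be sketched with a stdlib-only context manager (DrainOnExit is a hypothetical name for illustration, not the PR's class):

```python
import io

class DrainOnExit:
    """Read and discard whatever remains of a stream on exit, so the
    underlying socket could be reused (e.g. for HTTP keep-alive)."""

    def __init__(self, stream, chunk_size=8192):
        self._stream = stream
        self._chunk_size = chunk_size

    def __enter__(self):
        return self._stream

    def __exit__(self, exc_type, exc, tb):
        # Deplete the remainder even if the caller stopped reading
        # early or an exception was raised; do not suppress it.
        while self._stream.read(self._chunk_size):
            pass
        return False

stream = io.BytesIO(b'x' * 10_000)
with DrainOnExit(stream) as s:
    s.read(100)  # the caller reads only part of the body
print(stream.read())  # b'' -- the rest was drained on exit
```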

tzickel (Contributor, Author) commented Jun 21, 2018

https://gist.github.com/tzickel/4a81503acdb843dab4f03cfe950e84f3

This is a benchmark for this code that shows a potential use case. You can adjust the data size and chunk size at the end and compare the timings of the two versions (peak memory measurement is much trickier and depends on your OS). In your own use case you might write the big part to disk instead of keeping it in memory as done here, in which case the peak memory used by this code will be at most about chunk_size.
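The disk-writing variant mentioned above can be sketched with stdlib pieces only (the function names are mine; real code would iterate a StreamingPart instead of this stand-in chunk source):

```python
import io
import os
import tempfile

def iter_chunks(stream, chunk_size=4096):
    # Stand-in for iterating a streamed part.
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk

def spool_to_disk(chunks, path):
    # Write each chunk as it arrives: only one chunk is ever held in
    # memory, so peak usage is about chunk_size, not the part size.
    written = 0
    with open(path, 'wb') as f:
        for chunk in chunks:
            f.write(chunk)
            written += len(chunk)
    return written

source = io.BytesIO(os.urandom(50_000))
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    print(spool_to_disk(iter_chunks(source), path))  # 50000
    print(os.path.getsize(path))  # 50000
finally:
    os.remove(path)
```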
