Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Content-Encoding: gzip along with Transfer-Encoding: chunked sometimes terminates early #1608

Merged
merged 3 commits into from Mar 29, 2022

Conversation

BenWhitehead
Copy link
Contributor

The issue

When GZIPInputStream completes processing an individual member it will call InputStream#available()
to determine if there is more stream to try and process. If the call to available() returns 0
GZIPInputStream will determine it has processed the entirety of the underlying stream. This is
spurious, as InputStream#available() is allowed to return 0 if it would require blocking in order
for more bytes to be available. When GZIPInputStream is reading from a Transfer-Encoding: chunked
response, if the chunk boundary happens to align closely enough to the member boundary
GZIPInputStream won't consume the whole response.

The fix

Add new OptimisticAvailabilityInputStream, which provides an optimistic "estimate" of the number of
available() bytes in the underlying stream. When instantiating a GZIPInputStream for a response,
automatically decorate the provided InputStream with an OptimisticAvailabilityInputStream.

Verification

This scenario isn't unique to processing of chunked responses, and can be replicated reliably using
a java.io.SequenceInputStream with two underlying java.io.ByteArrayInputStream. See
GzipSupportTest.java for a reproduction.

The need for this class has been verified for the following JVMs:

  • openjdk version "1.8.0_292"
    OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_292-b10)
    OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.292-b10, mixed mode)
    
  • openjdk version "11.0.14.1" 2022-02-08
    OpenJDK Runtime Environment Temurin-11.0.14.1+1 (build 11.0.14.1+1)
    OpenJDK 64-Bit Server VM Temurin-11.0.14.1+1 (build 11.0.14.1+1, mixed mode)
    
  • openjdk version "17" 2021-09-14
    OpenJDK Runtime Environment Temurin-17+35 (build 17+35)
    OpenJDK 64-Bit Server VM Temurin-17+35 (build 17+35, mixed mode, sharing)
    

… sometimes terminates early

#### The issue
When `GZIPInputStream` completes processing an individual member it will call `InputStream#available()`
to determine if there is more stream to try and process. If the call to `available()` returns 0
`GZIPInputStream` will determine it has processed the entirety of the underlying stream. This is
spurious, as `InputStream#available()` is allowed to return 0 if it would require blocking in order
for more bytes to be available. When `GZIPInputStream` is reading from a `Transfer-Encoding: chunked`
response, if the chunk boundary happens to align closely enough to the member boundary
`GZIPInputStream` won't consume the whole response.

#### The fix
Add new `OptimisticAvailabilityInputStream`, which provides an optimistic "estimate" of the number of
`available()` bytes in the underlying stream. When instantiating a `GZIPInputStream` for a response,
automatically decorate the provided `InputStream` with an `OptimisticAvailabilityInputStream`.

#### Verification
This scenario isn't unique to processing of chunked responses, and can be replicated reliably using
a `java.io.SequenceInputStream` with two underlying `java.io.ByteArrayInputStream`. See
GzipSupportTest.java for a reproduction.

The need for this class has been verified for the following JVMs:
* ```
  openjdk version "1.8.0_292"
  OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_292-b10)
  OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.292-b10, mixed mode)
  ```
* ```
  openjdk version "11.0.14.1" 2022-02-08
  OpenJDK Runtime Environment Temurin-11.0.14.1+1 (build 11.0.14.1+1)
  OpenJDK 64-Bit Server VM Temurin-11.0.14.1+1 (build 11.0.14.1+1, mixed mode)
  ```
* ```
  openjdk version "17" 2021-09-14
  OpenJDK Runtime Environment Temurin-17+35 (build 17+35)
  OpenJDK 64-Bit Server VM Temurin-17+35 (build 17+35, mixed mode, sharing)
  ```
@BenWhitehead BenWhitehead requested a review from a team as a code owner March 23, 2022 22:30
@BenWhitehead BenWhitehead merged commit 941da8b into main Mar 29, 2022
@BenWhitehead BenWhitehead deleted the gzip-full-consume branch March 29, 2022 19:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants