
Weird issue when piping chunked data from HTTP to a file in OS X #1567

Closed
jviotti opened this issue May 7, 2015 · 8 comments · Fixed by #1568

Comments
@jviotti
jviotti commented May 7, 2015

From http://stackoverflow.com/questions/30085943/nodejs-unexpected-behaviour-when-piping-chunked-http-data-to-a-file-in-os-x?noredirect=1#comment48289848_30085943.

I'm getting a weird issue when piping chunked data from HTTP to a file in OS X.

When triggering the GET HTTP request that initiates the download, my server responds with the following headers:

< content-disposition: attachment; filename="resin-Rpi2-0.1.0-0.0.12.img"
< content-encoding: gzip
< content-type: application/octet-stream
< date: Wed, 06 May 2015 19:03:33 GMT
< X-Frame-Options: DENY
< x-powered-by: Express
< transfer-encoding: chunked
< Connection: keep-alive

My code goes like this:

fs = require('fs')
request = require('request')

# If you want to try it by yourself, register to https://dashboard.resin.io
# create a Raspberry Pi application (note the id on the url)
# and get the token from the preferences page.
APPID = '...'
TOKEN = '...'
DEST = 'imagetest.img'

request
  url: "https://dashboard.resin.io/download?network=ethernet&wifiSsid=&wifiKey=&appId=#{APPID}"
  method: 'GET'
  gzip: true
  headers:
      Authorization: "Bearer #{TOKEN}"

.on 'data', (data) ->
  console.log(data)

.pipe(fs.createWriteStream(DEST))

If I run that script, I correctly get the initial chunks (corresponding to about 10MB of data), but the connection then suddenly ends, and no other event (close, end, etc.) is triggered from any of the streams.

Even weirder, commenting out gzip: true leads to the desired behaviour (I of course get a *.img.gz file in that case), and it also works as expected when keeping gzip: true and piping to process.stdout instead of to a file.

Also notice the issue is not reproducible on Ubuntu; it only seems to happen on OS X and Windows.

cURL also works as expected:

$ curl -H "Authorization: Bearer [token]" "https://dashboard.resin.io/download?network=ethernet&wifiSsid=&wifiKey=&appId=[appid]" -o image.img

I've proceeded to analyse the packet communication (you can get the pcap here):

I can see my IP is suddenly sending an ACK/FIN while the server is still sending me chunks:

Id = 2984
Source = 192.168.0.16
Destination = 52.4.102.97
Captured Length = 66
Packet Length = 66
Protocol = TCP
Date Received = 2015-05-06 18:11:36 +0000
Time Delta = 10.27246999740601
Information = 52264 -> 443 ([ACK, FIN], Seq=2137874087, Ack=3255859962, Win=10407)

Followed by a long loop of RST and ACK:

...
192.168.0.16 -> 52.4.102.97, 52264 -> 443 RST
52.4.102.97 -> 192.168.0.16, 443 -> 52264 ACK
...

And finally an agreed-upon ACK/FIN from both sides.

After a lot of experimentation, the issue is gone if I write the chunks to the destination stream manually instead of piping:

destination = fs.createWriteStream(DEST)

request
  url: "https://dashboard.resin.io/download?network=ethernet&wifiSsid=&wifiKey=&appId=#{APPID}"
  method: 'GET'
  gzip: true
  headers:
      Authorization: "Bearer #{TOKEN}"

.on 'data', (data) ->
  destination.write(data)
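For what it's worth, this manual-write workaround bypasses pipe()'s backpressure handling entirely: it never checks write()'s return value. A version that still honors backpressure would pause the source whenever write() returns false and resume on 'drain'. Here is a hedged sketch in plain JavaScript (source and destination are stand-ins for the request and file streams, not request's actual internals):

```javascript
// Illustrative sketch of a backpressure-aware manual pipe: pause the
// source when the destination's buffer is full, resume when it drains.
function pipeManually(source, destination) {
  source.on('data', (chunk) => {
    if (!destination.write(chunk)) {
      // The writable's internal buffer is full: stop reading until it drains.
      source.pause();
      destination.once('drain', () => source.resume());
    }
  });
  source.on('end', () => destination.end());
}
```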
@jviotti
Author
jviotti commented May 7, 2015

It's worth noting that using the builtin https module works fine, which makes me think it's an issue with request.

@jviotti
Author
jviotti commented May 7, 2015

I've also bisected the issue down to 10246c8, the commit that implements gzip content decoding, which didn't help much.

@simov
Member
simov commented May 7, 2015

Thanks for the thorough bug report @jviotti. Pinging @kevinoid

@kevinoid
Contributor
kevinoid commented May 7, 2015

Thanks for the ping @simov, and thank you for the very detailed bug report @jviotti; this looks like an interesting issue. I'll dig in and see what I can figure out.

@jviotti
Author
jviotti commented May 7, 2015

@simov @kevinoid You're welcome. Let me know if I can help in any way!

@jviotti
Author
jviotti commented May 7, 2015

In case it also helps: avoiding the zlib.createGunzip() stream (https://github.com/request/request/blob/master/request.js#L1129) also prevents the issue, which at first made me think it could be a bug in zlib. However, using https + zlib.createGunzip() works as expected with the same resource.

Maybe there is a subtle bug in the way request overrides the classic stream functions (pipe, write, etc.) (https://github.com/request/request/blob/master/request.js#L1501)?

kevinoid added a commit to kevinoid/request that referenced this issue May 8, 2015

When the response content is piped through additional streams for
decoding (e.g. for gzip decompression), pause and resume calls should be
propagated to the last stream in the pipeline so that back pressure
propagates correctly.

This avoids an issue where simultaneous back pressure from the content
decoding stream and from a stream to which Request is piped could cause
the response stream to get stuck waiting for a drain event on the
content decoding stream which never occurs. See request#1567 for an example.

This commit also renames dataStream to responseContent to remedy my
previous poor choice of name; since the name will be exposed on the
Request instance, it should be clearer and closer to the name used to
refer to this data in the relevant RFCs.

Fixes request#1567

Signed-off-by: Kevin Locke <kevin@kevinlocke.name>
@kevinoid
Contributor
kevinoid commented May 8, 2015

Thanks @jviotti, I think I've figured it out. It looks like the response stream can get into a stalled state due to how backpressure is handled on the streams. Request was propagating pause and resume calls directly to the response stream rather than to the gzip stream, which could cause the stream to get stuck in the paused state and cause Node to exit prematurely.

The issue appears to affect all platforms (at least, I was able to reproduce it on Linux), but only Node 0.11 and 0.12, not 0.10. This is due to the changes discussed in nodejs/node-v0.x-archive#8351: stream.Readable now handles backpressure by pairing pause() with flowing = true instead of resume(), which differs from Request, which always uses pause() and resume(). The issue is intermittent due to the sequencing of calls from each stream, though, so it can be difficult to test or reproduce reliably.

If you could give the changes in #1568 a try and let me know whether it fixes the issue for you, I'd appreciate it!

@jviotti
Author
jviotti commented May 8, 2015

@kevinoid I can confirm your fix works for me! Thanks a lot for the support. Can you let me know when this change is published to npm, please?
