-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement opaque ETag in Compact Index to avoid falling back to old index in servers with different etag implementations #7122
Conversation
cd0b0a2
to
a823684
Compare
@@ -193,6 +193,21 @@ def digest(name) | |||
Digest(name) | |||
end | |||
|
|||
def checksum_for_file(path, digest) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this just take in an array of digest symbols and then return a hash?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only use it in one place only for the MD5 of the version info files for the compact index. The server side of the compact index is hard coded to MD5 as well. The most correct thing is probably to move this completely into the compact index client and only generalize if we need to later.
@@ -76,6 +84,16 @@ def info_path(name) | |||
end | |||
end | |||
|
|||
def info_etag_path(name) | |||
name = name.to_s | |||
if /[^a-z0-9_-]/.match?(name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so... question. since this is new, might it be easier to only do the #{name}-#{digest}
file name, and then only need to maintain a single directory of etags?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of this logic? @simi?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was about to keep it the same as other data we store
if /[^a-z0-9_-]/.match?(name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So then is @segiddins suggesting that we use MD5 of every name instead of having both? Could we just do the MD5 of the name without the prefix?
|
||
headers["If-None-Match"] = etag_for(local_temp_path) | ||
headers["If-None-Match"] = local_etag_temp_path.read |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we strip!
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strip!
and chomp!
both return nil if there's nothing to strip or chomp, which makes this pretty ugly:
etag = local_etag_path.read
etag.chomp!
headers["If-None-Match"] = etag
this is one line but the to_proc might negate the benefit of using chomp!
over chomp
:
headers["If-None-Match"] = local_etag_path.read.tap(&:chomp!)
local_temp_path = local_temp_path.sub(/$/, ".tmp") | ||
|
||
headers["Want-Content-Digest"] = WANT_DIGEST |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not seeing Want-Content-Digest
listed at all on MDN, and it says Want-Digest
is deprecated https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Want-Digest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's only a draft 🤷♂️ though, to be fair the Content-Digest
header seems to come from the same draft.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could totally skip this header. It's just future proofing.
# Unwrap surrounding colons (byte sequence) or quotes (string). | ||
# The wrapping characters must be matched or we return nil. | ||
# Converts hex to base64 based on field type assumptions. | ||
def parse_structured_value(value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this definitely needs some tests
94e0179
to
14a0b4e
Compare
Hey @segiddins & @simi, I refactored this pretty significantly. I mostly replaced the giant method with two smaller methods. I also cut down on the pending temporary files, clearly separated what was called "retrying" logic, and removed the nested call to update which made the file management weird. The behavior changes a little, but I think for the best: In the case where a range request is ignored or not available, it no longer attempts the full request twice (it was requesting a range |
613b8ef
to
06b95be
Compare
@martinemde this is on my list, I'll do some local testing first. |
I'm still working on fixing and adding tests. I think the mocked ones are a good way to test the more subtle behavior. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpicks of my own code that I want to remember to fix.
c17960b
to
7ca3f7b
Compare
800ecd4
to
5298b97
Compare
5298b97
to
9fdf6fe
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! took me a while to read through the changes, but I think this is both good future-proofing, as well as fully backwards compatible. well done!
This changes the CompactIndexClient to store etags received from the compact index in separate files rather than relying on the MD5 checksum of the file as the etag. Smoothes the upgrade from md5 etags to opaque by generating them when no etag file exists. This should reduce the initial impact of changing the caching behavior by reducing cache misses when the MD5 etag is the same. Eventually, the MD5 behavior should be retired and the etag should be considered completely opaque with no assumption that MD5 would match.
9fdf6fe
to
a3ef6d9
Compare
@@ -5,7 +5,13 @@ | |||
|
|||
module Bundler | |||
class CompactIndexClient | |||
# NOTE: MD5 is here not because we expect a server to respond with it, but |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be TODO instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure. I mentioned this temporary md5 thing in a bunch of places so maybe we'll remember to get it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean I do rarely scan codebase for NOTE
, but I do often for TODO
. I'll try to keep this in my mind as well for now as a candidate to update at some point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I don't usually leave TODO in PRs, but that's a good point if you actually search for it. We could update.
@@ -76,8 +81,19 @@ def info_path(name) | |||
end | |||
end | |||
|
|||
def info_etag_path(name) | |||
name = name.to_s | |||
info_etag_root.join("#{name}-#{SharedHelpers.digest(:MD5).hexdigest(name).downcase}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️ love this doesn't follow need for 2 directories (one with special characters) in the end and unifies it.
Thanks a lot for taking over @martinemde. Well done 💪. |
@simi and @segiddins thanks for reviewing and supporting this, and especially thanks @simi for jumping on a call to go over this code. Now let's just hope it rolls out smoothly. |
This is awesome, thanks for all the work here to not just quickly patch this and move on, but to fix it the right way! |
What was the end-user or developer problem that led to this PR?
Building on @simi's branch for opaque etags (related to #6632) and merging in my work with DigestIO in the CompactIndex client at #7039 (closed).
The problem that originally surfaced here was when other compact index sources did not calculate their etags the same way as rubygems.org and S3. This would result in the compact index never doing incremental updates, which is much slower. As a solution we decided to use the Digest, and then the Repr-Digest header and allow etags to be opaque.
What is your fix for the problem, implemented in this PR?
Change the compact index client to verify compact index file downloads using the
Repr-Digest
orDigest
header. Usessha-256
right now but opens up the possibility to support any algorithm going forward. This is a change from assuming that the etag is an MD5 sum of the full file body and using it to verify the content.The compact index client now stores etags in a separate file, allowing them to be completely opaque, as intended. If an etag file does not exist, we fallback to generating the md5 of the file on disk and saving that to the etag file. After the first pass, the etag will either be validated or replaced, resuming normal operation of the compact index client.
The Repr-Digest header to conform to Structured Fields in HTTP Headers RFC 8941, but we also read
Digest
which comes from Fastly/S3 for all compact index files and is (incorrectly if you're picky) wrapped in double quotes. Once the changes are deployed to fastly, this client will read from Repr-Digest preferentially.Make sure the following tasks are checked