
feature: ngclient: Lazy refresh #2251

Closed (2 commits)

Conversation

jku (Member) commented Dec 28, 2022

Fixes #2225. This is an experiment that I would like comments on.

Implement a "lazy refresh" feature for ngclient: when the configuration lazy_refresh=True is set, the Updater will try to load only local top-level metadata. The objective is to avoid the two mandatory requests (root N+1 and timestamp) in situations where a single client typically does many refreshes while the repository does not change, and where the client is content with a delay of "timestamp expiry period" in getting new updates.
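
For illustration, an application might opt in like this. A minimal sketch assuming the lazy_refresh flag proposed in this PR lands on UpdaterConfig; paths and URLs are made up:

```python
from tuf.ngclient import Updater, UpdaterConfig

# lazy_refresh is the flag proposed in this PR; the rest is the existing
# ngclient API. Paths and URLs are illustrative.
config = UpdaterConfig(lazy_refresh=True)

updater = Updater(
    metadata_dir="/var/lib/myapp/tuf-metadata",
    metadata_base_url="https://example.com/metadata/",
    target_base_url="https://example.com/targets/",
    config=config,
)

# With lazy_refresh this only touches the network if valid local
# top-level metadata cannot be loaded.
updater.refresh()
```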

Notes:

  • The use case comes from sigstore-python, where the expectation is that the same binary might be called thousands of times during a period when the repository has not changed: this results in thousands of unneeded requests and makes each call a bit slower.
  • It can be argued that this is not compatible with the TUF specification. On the other hand, it can be argued that we are not trying to do a "client update" as described in the spec: we are trying to use metadata that has been previously updated already. The situation is comparable to a client that just stays running longer and keeps using the same metadata it has in memory.
  • If a repository suggests using lazy_refresh, it must be more careful than usual, as clients may be using metadata that is "timestamp expiry period" older than normal:
    • delegated metadata expiry bumps must happen earlier than normal
    • old delegated metadata versions must be available for longer
    • old target files must be available for longer
  • The code follows the Code Style Guidelines
  • Tests have been added for the bug fix or new feature
  • Docs have been added for the bug fix or new feature

Signed-off-by: Jussi Kukkonen <jkukkonen@google.com>
coveralls commented Dec 28, 2022

Pull Request Test Coverage Report for Build 3793352517

  • 16 of 16 (100.0%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.01%) to 98.209%

Totals:
  • Change from base Build 3787209428: +0.01%
  • Covered lines: 1363
  • Relevant lines: 1379

💛 - Coveralls

This configuration avoids fetching remote metadata if local top-level
metadata is valid.
* This is _not_ TUF spec compliant
* May lead to issues on download_target if delegated metadata is expired
* Requires the repository to have a reasonably quick timestamp expiry
  (otherwise clients could be unaware of updates for a long time)
* Because of that, this should likely be accompanied by the client
  application doing "non-lazy" refreshes occasionally, regardless of expiry
* Still, could be a useful feature for clients of repositories that have
  slow-moving metadata in situations where clients are content with a
  delay of timestamp expiry time for any updates.

The implementation aims to be as simple as possible: this should
only be done if the code remains understandable.

Signed-off-by: Jussi Kukkonen <jkukkonen@google.com>
jku (Member Author) commented Dec 28, 2022

CC @woodruffw @joshuagl @asraa @kommendorkapten -- this should fulfill the spirit of the "lazy metadata fetch" idea in the sigstore client doc: https://docs.google.com/document/d/1QWBvpwYxOy9njAmd8vpizNQpPti9rd5ugVhji0r3T4c. I did not implement it exactly as the doc says, as I think checking only timestamp expiry smells fishy -- IMO we should do every check in the client workflow except fetching new metadata.

Other notes:

  • I would argue this could still be spec compliant: we're avoiding a client update until we have to do it (and until then we continue using an older update state that we validate), and IMO nothing in the spec says we can't do that
  • this does put a bit more responsibility on the repository maintenance, but I don't think there's anything unmanageable there

Most of you have no python-tuf connection, so I'm not asking for a code review, but if you have comments on the concept, they are welcome.

jku (Member Author) commented Dec 28, 2022

From a high-level security perspective, the big change is that when using lazy_refresh, the client gives the repository some long-term decision-making power it did not have before: a single malicious timestamp update could set expiry to 1000 years in the future, and that could "prevent" further client updates for a long time -- until some other top-level metadata expires.

As a countermeasure, the client should have some client-controlled "minimum update frequency" at which it does non-lazy refreshes from time to time.

(EDIT: removed some details that I'll flesh out in another comment)
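
As a sketch of what such an application-level countermeasure could look like (the marker file, limit, and helper names are all hypothetical, not part of this PR):

```python
import os
import time

# Hypothetical marker file recording the last non-lazy refresh.
LAST_FULL_REFRESH = "/var/lib/myapp/last-full-refresh"
MAX_LAZY_AGE = 7 * 24 * 3600  # client-chosen limit: one week

def needs_full_refresh() -> bool:
    """True if the client should bypass lazy refresh this time."""
    try:
        return time.time() - os.path.getmtime(LAST_FULL_REFRESH) > MAX_LAZY_AGE
    except OSError:
        return True  # no marker yet: do a full refresh

def record_full_refresh() -> None:
    """Touch the marker after a successful non-lazy refresh."""
    with open(LAST_FULL_REFRESH, "w"):
        pass
```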

The review comment below refers to this excerpt from the diff:

```python
                data, Targets.type, Root.type
            )
            return
        except (OSError, exceptions.RepositoryError):
```
Contributor:
Does it make sense to have a DEBUG or similar log statement on this error case? I assume what it means is that a client that has configured lazy_refresh will be making network request(s) anyways, which would be useful to surface in downstream logging if errors arise 🙂
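
Something like this sketch, perhaps. The wrapper and logger are illustrative; only the exception types come from the diff above:

```python
import logging
from typing import Callable

from tuf.api import exceptions

logger = logging.getLogger(__name__)

def load_local_or_fall_back(load_local: Callable[[], None]) -> bool:
    """Run the local-only load; log at DEBUG when falling back to network."""
    try:
        load_local()
        return True
    except (OSError, exceptions.RepositoryError) as e:
        logger.debug("Lazy refresh failed, doing a full refresh: %s", e)
        return False
```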

@woodruffw (Contributor):

> I think checking only timestamp expiry smells fishy -- IMO we should do every check in the client workflow except fetching new metadata

Agreed -- I'd go beyond that and suggest that maybe we should update the Sigstore client doc to change that recommendation 🙂

> As a countermeasure, the client should have some client-controlled "minimum update frequency" at which it does non-lazy refreshes from time to time.

That SGTM. I need to give some thought to the UX on that for sigstore-python -- it arguably makes sense to do refreshes on every sigstore sign operation (since we're dialing out to the network anyways) and to present a "nag" message on sigstore verify invocations after a certain number of days without an update. Then, a user could run sigstore refresh-tuf (or similar) at their convenience.

jku (Member Author) commented Dec 29, 2022

> That SGTM. I need to give some thought to the UX on that for sigstore-python -- it arguably makes sense to do refreshes on every sigstore sign operation (since we're dialing out to the network anyways)

oh, this is a good point

> and to present a "nag" message on sigstore verify invocations after a certain number of days without an update. Then, a user could run sigstore refresh-tuf (or similar) at their convenience.

I don't think I agree with exposing any TUF decisions to the sigstore-python user -- it's asking way too much from someone who just wants to verify a signature. This should IMO be completely invisible to the user, just like it is now: if "lazy" mode can't work like that, then it should probably not be done.

I think what I proposed should work fine "invisibly", with possibly one exception: if a verifying client hits the "lazy" path and uses week-old certs/keys but the signature was actually made with keys/certs introduced just yesterday. This is basically not a TUF problem but a sigstore problem: if you want to cache trust roots, then those trust roots need to be useful for the cache period...

jku (Member Author) commented Dec 29, 2022

https://github.com/jku/python-tuf/commits/lazy-refresh-with-max-timestamp-period

This is a branch on top of this one that changes lazy_refresh: bool to lazy_refresh_max: int. It works like before, except the value is the maximum timestamp expiry period the client accepts for lazy refresh: if the repository timestamp expires further in the future than that, the client does a normal refresh instead.

This works, with caveats:

  • It is immediately a bit more difficult to understand.
  • It does not actually give the option that the "sigstore tuf clients" doc wants: you can't say "local cache is valid for 3 days", only "use local cache if metadata is not expired". (I think this 3-day cache is an application feature which sigstore-python can easily implement on top of "lazy refresh" if they think it makes sense.)

The upside here is that it requires no additional application data storage: we don't store a "last updated" timestamp, we just define a limit for acceptable metadata expiry (compared to the current time).
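
A sketch of the check that lazy_refresh_max implies (names are illustrative, not the branch's actual code):

```python
from datetime import datetime, timedelta, timezone

def within_lazy_window(timestamp_expires: datetime, lazy_refresh_max: int) -> bool:
    """Use local metadata only if the timestamp is unexpired AND expires
    within lazy_refresh_max seconds from now; otherwise do a normal refresh."""
    now = datetime.now(timezone.utc)
    return now < timestamp_expires <= now + timedelta(seconds=lazy_refresh_max)
```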

asraa commented Dec 29, 2022

I'll check over the implementation; I'm a bit confused, but I probably agree with this:

> this should fulfill the spirit of the "lazy metadata fetch" idea in the sigstore client doc: https://docs.google.com/document/d/1QWBvpwYxOy9njAmd8vpizNQpPti9rd5ugVhji0r3T4c. I did not implement it exactly as the doc says, as I think checking only timestamp expiry smells fishy -- IMO we should do every check in the client workflow except fetching new metadata

Without new metadata fetching, the client workflow is: load trusted metadata (skip the root role update except for the expiration check; skip the timestamp role except for the expiration check), load the trusted snapshot, check the targets version and expiration, and fetch the target. If so, I think I agree, and the doc can be updated.
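
To make that ordering concrete, a rough sketch built on ngclient's internal TrustedMetadataSet (an internal API, shown only for illustration; the file layout is made up):

```python
from tuf.ngclient._internal.trusted_metadata_set import TrustedMetadataSet

def load_local_only(md_dir: str) -> TrustedMetadataSet:
    """Run the client-workflow checks on local metadata, fetching nothing."""
    def read(role: str) -> bytes:
        with open(f"{md_dir}/{role}.json", "rb") as f:
            return f.read()

    # Each update call verifies signatures, versions, and expiry per the
    # client workflow, just without any network traffic.
    trusted = TrustedMetadataSet(read("root"))
    trusted.update_timestamp(read("timestamp"))
    trusted.update_snapshot(read("snapshot"))
    trusted.update_targets(read("targets"))
    return trusted
```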

Your correction here is protecting against a tampered cache, or a cache where the targets/snapshot/root are expired, right? Worth making clear that if the other top-level roles are expired, the timestamp check isn't sufficient. I think that was the intent, but it wasn't spelled out clearly.

asraa commented Dec 29, 2022

> this is basically not a TUF problem but a sigstore problem: if you want to cache trust roots, then those trust roots need to be useful for the cache period...

Agree. One thing we did to mitigate this when we did regular root rotations (obviously this would not be possible for a compromise) was to introduce a new target during version N of the timestamp, allow N to expire, and then initiate the actual rotation on issuance of timestamp N+1. This allowed us to make sure all cacheable clients received the update before it actually went into use.

If there is a compromise, I'd assume users would likely be able to find a notice for it and understand why they need to refresh their cache. That's not ideal, and it does make sense to shorten that lazy window, or to make it completely opt-in for users who know they are in air-gapped environments.

jku (Member Author) commented Jan 4, 2023

I notice I hadn't responded, sorry.

> Your correction here is protecting against a tampered cache, or a cache where the targets/snapshot/root are expired, right? Worth making clear that if the other top-level roles are expired, the timestamp check isn't sufficient. I think that was the intent, but it wasn't spelled out clearly.

Yeah, there were two reasons I didn't like peeking at the timestamp metadata from the application code:

  • I think the metadata storage is an internal implementation detail of python-tuf ngclient -- applications should not make assumptions about its contents
  • The ngclient design is that no metadata is ever used without checking its validity according to the client workflow. I think violating that leads to issues at some point... whether because of tampering or just bugs

jku (Member Author) commented Jan 24, 2023

I think I'll close this to get the PR count down -- this PR was an experiment and is likely not going to be merged as is.

Linked issue: ngclient feature: Add option to only update metadata if needed (#2225)