Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cargo vet certify unnecessarily reformats config.toml #589

Open
anforowicz opened this issue Jan 25, 2024 · 10 comments
Open

cargo vet certify unnecessarily reformats config.toml #589

anforowicz opened this issue Jan 25, 2024 · 10 comments

Comments

@anforowicz
Copy link
Contributor

In my project the supply-chain/config.toml file is auto-generated from project-specific metadata by a tool (*). The file has a copyright header and some other headers like # @generated by tools/crates/gnrt vendor. Do not edit.. These headers are clobbered by some cargo vet commands.

I understand that cargo vet fmt will reformat all the files, including config.toml. OTOH, IMO cargo vet certify ... should only write to audits.toml and should leave config.toml unchanged. Is that a reasonable expectation?


(*) We use the gnrt tool to mapping project-specific categorization of crates into cargo vet requirements like:

[policy."autocfg:1.1.0"]                                               
criteria = ["does-not-implement-crypto", "safe-to-deploy", "ub-risk-2"]
@bholley
Copy link
Collaborator

bholley commented Jan 25, 2024

This seems reasonable at first glance. Certain certifications can necessitate changes in config.toml, but I think activating them entails prune? @mystor WDYT?

@mystor
Copy link
Collaborator

mystor commented Jan 26, 2024

cargo vet certify can cause changes to config.toml. For example, a new certify call will remove now-unnecessary exemptions for the crate which was just certified. This happens automatically for the crate which was certified in order to keep the tree relatively clean even without a manual prune call.

In general, all cargo vet commands which load the store also reformat it, even if there are no other changes to the store. When running with --locked, the cargo vet subcommand will actually fail if the formatting of the store doesn't match the formatting which would be generated.

@anforowicz
Copy link
Contributor Author

Would it be possible to somehow split config.toml into 1) cargo vet-maintained/reformattable part and 2) human-authored part (in our case - with crate-specific, project-specific, separately-generated criteria = ... entries - an example can be found in the original issue report/comment above)? (Just trying to brainstorm a way forward here.)

@anforowicz
Copy link
Contributor Author

Hmmm... I guess one other alternative may be to modify our tooling (i.e. gnrt) so that it won't generate config.toml from scratch (using handlebars + a .hbs template) but instead will read/parse the existing config.toml, make its modifications, and then save config.toml again.

@anforowicz
Copy link
Contributor Author

/cc @danakj

@anforowicz
Copy link
Contributor Author

anforowicz commented Mar 13, 2024

When running with --locked, the cargo vet subcommand will actually fail if the formatting of the store doesn't match the formatting which would be generated.

This seems rather undesirable. I see no harm in having a "@generated by tools/crates/gnrt vendor. Do not edit. " comment at the top. As-is, I have to take out this comment (and a few other comments) to enable running cargo vet in --locked mode (for presubmit/CI which probably should be somewhat hermetic and therefore needs --locked).

cargo vet certify can cause changes to config.toml.

I agree that rewriting config.toml in such cases is reasonable (dropping comments, sorting, and changing formatting in other ways). OTOH I would claim that config.toml shouldn't be modified unless its actual contents change (e.g. most check invocations won't have to change config.toml to remove exemptions or do other changes).

anforowicz added a commit to anforowicz/cargo-vet that referenced this issue Mar 13, 2024
After this commit `audits.toml`, `config.toml`, and `imports.lock` will
not be written if their contents wouldn't have changed ("contents" in a
business logic sense - ignoring comments and/or formatting differences).

Fixes mozilla#589
@anforowicz
Copy link
Contributor Author

In terms of proposed code changes, I am thinking about something along the following lines:

  • Remove the if check_file_formatting { ... } block. I understand that this block was introduced in When running --locked, check formatting of store files #348 for a good reason, but I think an alternative fix would be to avoid writing the audits.toml / config.toml / imports.lock files unless their actual content changes (see the next item)

  • Tweak fn commit to avoid writing the files unless their contents have actually changed (e.g. if deserializing the old file contents results in the same objects (Eq-wise) then keep the old file (with its comments, whitespace, and other potential minor formatting differences)

Please comment if you have any concerns with the proposal above. In the meantime, let me open a pull request along those lines so that we have more concrete code changes to discuss.

@mystor
Copy link
Collaborator

mystor commented Mar 14, 2024

I don't think that this is the approach that we want to take.

The automatic formatting, and especially the validation when running under --locked from #348, provides additional valuable benefits for the maintenance of a repository's supply-chain, especially around manual store edits. While the intention is generally that new audits are created using the cargo vet certify tool, it is not uncommon for folks to manually modify the relevant .toml files to add new audits or exemptions. When doing this, it is easy to make some mistakes which the formatting validation code would help fix:

  1. Ordering of members, fields, etc. - When formatting we always generate fields and members in a specific order, to keep things consistent. While it may seem inconsequential to allow adding new code which uses a different format or member order, this then leads to more churn down the line, as CI will not prevent landing the incorrectly formatted change, and the next developer using the cargo vet tool to modify the store will end up causing unrelated formatting changes to the document. This churn is undesirable within commit history, and keeping things formatted before landing is a way to keep it down.
  2. Unknown attributes - when parsing TOML using serde, unknown fields in things like structs are discarded. Without the formatting validation, this would mean that it may be possible to land an audit with e.g. a ntoes = "..." member, which would later be discarded when a command is used to update the contents of the file. Again this leads to unnecessary churn, and could also lead to deceptive audit entries due to missing or unrecognized fields.

Overall, I would prefer to keep the existing behaviour of maintaining strict formatting requirements on files in the store to keep down this kind of churn, and keep supply chain file deltas easier to review.


To my understanding, the main reason for this request is due to wanting to preserve the comment at the start of the config.toml file. I think adding special handling to allow a leading sequence of comments and whitespace to be preserved & replace the default # cargo vet config file. line is something we could do, even across modifications to the rest of the file (e.g. due to formatting).

Would that be sufficient for your use-case? This way we'd still be validating and ensuring that the TOML is correctly formatted and won't generate unnecessary deltas in the future due to manual edits when running with --locked, while also allowing for some comments and other documentation to be included at the top of the file.

I noticed that in your linked chromium revision, there were also some comments inline in the TOML. Preserving something like that would unfortunately be much more difficult (which is one of the reasons why we have an explicit notes = field for audits). However, we could easily add an optional notes: Option<String> member to the RemoteImport struct which would allow notes to be added and preserved around where you added comments.

@anforowicz
Copy link
Contributor Author

To my understanding, the main reason for this request is due to wanting to preserve the comment at the start of the config.toml file. I think adding special handling to allow a leading sequence of comments and whitespace to be preserved & replace the default # cargo vet config file. line is something we could do, even across modifications to the rest of the file (e.g. due to formatting).

Hmmm... I guess this is indeed correct. Of course this assumes that we are able to use a handlebars-based template expansion to generate a config.toml file that is compatible with cargo-vet's formatting expectations (modulo the initial, top comment and/or whitespace). This assumption is definitely true today and it seems okay to assume that things will stay this way in the future (or that we will be able to modify how we generate config.toml to stay compliant with cargo-vet's expectations if anything changes - e.g. if there are other sorting or formatting requirements).

I noticed that in your linked chromium revision, there were also some comments inline in the TOML. Preserving something like that would unfortunately be much more difficult.

Ack. I think it's okay to replace these comments with handlebars-only comments inside our vet_config.toml.hbs.

@mystor
Copy link
Collaborator

mystor commented Mar 18, 2024

Hmmm... I guess this is indeed correct. Of course this assumes that we are able to use a handlebars-based template expansion to generate a config.toml file that is compatible with cargo-vet's formatting expectations (modulo the initial, top comment and/or whitespace). This assumption is definitely true today and it seems okay to assume that things will stay this way in the future (or that we will be able to modify how we generate config.toml to stay compliant with cargo-vet's expectations if anything changes - e.g. if there are other sorting or formatting requirements).

Another option here might be to run cargo vet fmt to reformat the file immediately after generating it. This should just re-format the file to match how cargo-vet would, without doing any other changes. Not sure if that would work well with your build system though.

aarongable pushed a commit to chromium/chromium that referenced this issue Mar 20, 2024
In the future we want `cargo vet` to be used by
`third_party/rust/PRESUBMIT.py` and when used in this mode, we want
`cargo vet` to be somewhat hermetic.  This CL helps with that by
ensuring that `--locked --frozen` switches work when used as follows:

    ```
    $ tools/crates/run_cargo_vet.py check \
        --locked --frozen \
        --no-minimize-exemptions --no-registry-suggestions \
        --output-format=json
    {
      "conclusion": "success",
      "vetted_fully": [
        {
          "name": "aho-corasick",
          "version": "1.1.2"
        },
    ...
    ```

The CL achieves the above by:

* Reintroducing a pre-populated `imports.lock` file.
  - Having this in the repo means that code reviews will explicitly
    include information about newly imported audits.
  - Note that (per mozilla/cargo-vet#272)
    `imports.lock` should hopefully see minimal churn, because it
    should only cover imports that are actually used.
  - In a sense, this CL reverts/reconsiders `imports.lock`-related parts
    of https://crrev.com/c/5368308
* Tweaking `vet_config.toml.hbs` to produce `config.toml` with the exact
  formatting expected by `cargo vet`.
  - This has some downsides - e.g. we loose the
    '@generated by tools/crates/gnrt vendor. Do not edit.' comment.
  - This is necessary because even formatting-only discrepancies trigger
    an error in `--locked` mode: "A file in the store is not correctly
    formatted [...] run `cargo vet` without --locked to reformat files
    in the store"
  - This is somewhat related to
    mozilla/cargo-vet#589
  - In a sense, this CL makes obsolete the `config.toml`-related parts
    of https://crrev.com/c/5368308.  OTOH, detecting changes to
    `config.toml` (including a comments-only change) is not harmful, so
    let's leave the detection logic in place (in `run_cargo_vet.py).

Bug: 41494083
Change-Id: Ia5967fa8e8b13b885f703810ef84c5d79030dfae
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5368743
Reviewed-by: danakj <danakj@chromium.org>
Commit-Queue: Łukasz Anforowicz <lukasza@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1275813}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants