Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement git-only option #1227

Draft
wants to merge 15 commits into
base: main
Choose a base branch
from
Draft

Conversation

jdidion
Copy link

@jdidion jdidion commented Jan 22, 2024

No description provided.

@jdidion jdidion mentioned this pull request Jan 22, 2024
3 tasks
use std::collections::BTreeMap;

lazy_static::lazy_static! {
// match PR/issue numbers, e.g. `#123`
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this comment true?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops no, that was a copy-paste error

@giangndm
Copy link
Contributor

@jdidion How about this PR? I am interested in this feature. If you need help, I would be very glad to join.

@jdidion
Copy link
Author

jdidion commented Feb 19, 2024

@giangndm sure that would be great. I can add you as a collaborator on my fork. I'd you have a good plan for finishing the implementation go for it.

@ozgunozerk
Copy link

Hey guys, don't misunderstand this as pressure, but it's been a long time. If you want my help, I think I can contribute to finalizing this PR.

Best!

@jdidion
Copy link
Author

jdidion commented Feb 27, 2024

That would be great, thanks! I'll add you as a contributor.

@ozgunozerk
Copy link

ozgunozerk commented Feb 29, 2024

To let everyone know and manage expectations: due to unforeseen new tasks and a business trip to Singapore, I'll be available after March 15th.

Will gladly help the issue if you guys still need help at that time.

@ozgunozerk
Copy link

Got my hands dirty with implementing returning anyhow::Result instead of panicking, and also did some minor refactorings. I did not have enough time to test it. If there is more to implement, I can happily continue contributing

@ozgunozerk
Copy link

Any updates on this? What else is required?

@jdidion
Copy link
Author

jdidion commented Mar 27, 2024

@ozgunozerk the big outstanding issue is implementing next_ver::Updater::get_diff for a git-only package. I'm thinking we need some higher level interface which can abstract whether the package information is coming from the registry or the git repository. @MarcoIeni suggested a preference to define an enum as the basis for this abstraction. If you want to spend some time thinking through that implementation it could help move things along. I might have some time this week to work on it as well.

@jdidion
Copy link
Author

jdidion commented Mar 29, 2024

@MarcoIeni @ozgunozerk I introduced the BasePackage enum and did a bit of refactoring. Now I think the only thing left is to implement the Repository case for the three functions in BasePackage's impl.

@ozgunozerk
Copy link

ozgunozerk commented Mar 29, 2024

Great work! I've made the code more idiomatic, and also fixed the lifetime / borrow checker issues.

I believe we only have 3 unimplemented logics, and then I guess this will be ready for review, what do you think @MarcoIeni?

Sorry for being lazy, but I prefer asking the question here instead of going through the code of the whole repository:

  • we will compare the base package (itself, cargo.lock, and cargo.toml) with the repository's original package.
  • but, where is the base package retrieved from? Isn't it also from the repository itself?
    • if so, are these unimplemented functions unnecessary, and should just return Ok(true)? (because, we shouldn't compare the package retrieved from the repo, to the package retrieved from the repo?? :D )

I did not touch the empty implementations because I believe I don't have the full details on what to implement there. If the big-picture is supplied, I can get down to coding the solution.

@MarcoIeni
Copy link
Owner

Hey, thanks for working on this! I'll do my best to support you and get this shipped 🚀

where is the base package retrieved from?

From this code:

#[derive(Debug, Clone)]
enum BasePackage<'a> {
    Registry(&'a Package),
    Repository { name: &'a str, version: &'a Version },
}

I guess the base package can be either:

  • the package retrieved from crates.io if git_only option is false
  • the package retrieved from git tags if git_only option is true

In case of git_only option true, we need to compare:

  • the current code of the repository
  • the code of the repository at the latest git tag of the package. (the base package).

I suggest to keep the base package in a separate directory, so you can compare the two repositories. You can do git checkout <git_tag> in this separate directory to obstain the code at the tag. E.g. git checkout cargo_utils-v0.1.0.
You can clone the repository in a temporary directory, as we already do here

@ozgunozerk
Copy link

ozgunozerk commented Mar 29, 2024

Ah, silly me! I confused myself that we will be comparing the git repo with the git repo, hence it did not make sense at first (brainfart moment).

We will compare the checked out code at git repo, with the release in git repo, makes sense!

If I have time, I may start implementing the remaining unimplemented ones in the weekend!

@ozgunozerk
Copy link

ozgunozerk commented Mar 30, 2024

Hi again! Finally found the time to inspect all the codebase, and here is the plan that I come up with:

Plan: The only remaining thing is to find the latest previous release in the github, and parse it's cargo.toml and cargo.lock file for the comparison. Then, we will just call the existing functions (which are handling the actual comparison process) in this crate with the parsed toml and lock file, and it should be good to go.

Problems: However, I believe there are some inconsistencies in the code base. If my understanding it correct, I believe a minor PR prior to this one would be helpful. Below are my findings:

@MarcoIeni, this one is from the original code base:

  • get_registry_package
    • this is calling download() on the downloader, but with an option to provide another registry instead of crates.io
    • this download() function is calling clone()
      • The explanation for this function is Clone the specified crates from registry or git repository.
      • this is quite confusing, because now the wording is registry or git, so we treat them differently.
      • but, the caller function is get_registry_package

So, the wording should be changed. We have two options afaik:

  • registry may mean git as well. In other words, both crates.io, and git would fall under potential registry candidates.
  • or, the function names should be changed: get_registry_package -> get_registry_or_git_package (or any other naming).

@jdidion this one is from your suggested changes:

  • get_repo_versions used in here:
    let repository_packages = repo_versions::get_repo_versions(
        &repository.repo,
        local_project.contains_multiple_pub_packages,
    )?;
    this returns RepoVersions, but the variable name is repository_packages. However, there are no packages in the variable, this is misleading imo (I expected that github releases is parsed as packages, and tried to implement the remaining functions, only to realize there are no packages parsed in this variable)

If I understand your intentions correctly, you parsed the latest release's name and version, but did not parse the content of it yet.

What I suggest is

  • saving the tag of the latest release (I believe you already parsed that via git commands) into a variable (or a field in a struct)
  • checking out to that tag/commit in a temp directory
  • parsing all the metadata, lock and toml files in there
  • then, calling the already existing compare functions should be trivial

I made a lot of assumptions here, it'd be best to get confirmations before I proceed.
Please correct me if I'm wrong about any of these 👍

@MarcoIeni
Copy link
Owner

Thanks for working on this!

is to find the latest previous release in the github

why do we use github releases instead of git tags?

because it could have been that another registry is provided

I don't get why there's an inconsistency. The log info doesn't name crates.io

but, the caller function is get_registry_package

Yeah, you are right. With "registry" package, I mean "remote" package or similar. What do you think is a better term? Anyway, let's keep this term for now and refactor in another pr to avoid making this pr too big.

there are no packages in the variable

I think the name of that struct RepoVersions is misleading. RepoVersions contains the names of the packages and their most recent versions.
If the repo only contains one package, it doesn't contain the package name.

saving the tag of the latest release (I believe you already parsed that via git commands) into a variable (or a field in a struct)

Isn't this RepoVersions?

checking out to that tag/commit in a temp directory

yes!

parsing all the metadata, lock and toml files in there

I don't get why is this needed. But maybe I'm missing something, so if you think this is needed, go ahead 👍

@ozgunozerk
Copy link

Appreciate the fast response @MarcoIeni! Diving right into your questions/feeback 👀

why do we use github releases instead of git tags?

I choose the word release because there can be a tag without a release, but not vice versa. And if someone is using this tool, they probably want to compare their new code against the most recent "release" in their repo, but not necessarily the most recent tag.

I don't get why there's an inconsistency. The log info doesn't name crates.io

Yep, my bad on that. No worries about this one! 👍

Yeah, you are right. With "registry" package, I mean "remote" package or similar. What do you think is a better term?

I think registry is a good term, but I find other namings and inline docs across the codebase confusing w.r.t this terminology.

Anyway, let's keep this term for now and refactor in another pr to avoid making this pr too big.

Absolutely, and I wrote: "I believe a minor PR prior to this one would be helpful". It was not my intention to include multiple things in a single PR. Let me elaborate on that: it was confusing for me to went through the codebase whilst being not sure on whether registry term includes downloading from git or not. Considering this is an open-source project, I think I won't be the only one confused about that. And making these documentations/wordings clearer before this PR might help others who may want to contribute in the future. Otherwise, git-only feature based on potentially confusing registry terminology may become even more confusing for those who in the future who will take a look at this PR.

It's your call though :)

Isn't this RepoVersions?

I believe so! Wanted to get confirmation from @jdidion if he had the same design in his mind.

I don't get why is this needed. But maybe I'm missing something, so if you think this is needed, go ahead 👍

Hmm, this may be the most important discussion point. So, how can we compare the packages of the registry (downloaded from remote git), with the local one (to be released one)?

From what I understood (for the crates.io default workflow),

  1. it downloads the latest release from crates.io
  2. parses the cargo.toml and cargo.lock files of both the local version (to be released one), and the released one (from crates.io)
  3. compares the local toml and lock file against the registry ones, and generating a diff based on these comparisons

If we do not parse the toml and lock file of the latest release in git registry, how can we generate these diff's?


As usual, feel free to correct me on any of the above if I'm missing anything

@MarcoIeni
Copy link
Owner

And if someone is using this tool, they probably want to compare their new code against the most recent "release" in their repo, but not necessarily the most recent tag.

I'm not too sure about this.

there can be a tag without a release, but not vice versa

For this reason, I would use git tags instead of github releases. Also, git tags work across git hosts (github, gitlab, etc.).

It's your call though :)

My preference goes to adding the functionality first and refactoring it later so that we can ship the functionality earlier without arguing about names 😅. But if you prefer to open a refactoring pr before working on this, I won't stop you :)

If we do not parse the toml and lock file of the latest release in git registry, how can we generate these diff's?

You can re-use the existing algorithm:

P.S. sorry about the new merge conflicts! 😅

@ozgunozerk
Copy link

You can re-use the existing algorithm:

That's what I meant by parsing cargo.lock and cargo.toml, I'm so glad it is already implemented, that's fantastic 🥳

P.S. sorry about the new merge conflicts! 😅

No worries, I'm quite used to resolving those 👍

Agreed on tags instead of releases, I'm convinced! And ok, considering that the other PR might take some time due to potential discussions, I'll follow your suggestion and finish this one first, let's get down to work 🤓

@ozgunozerk
Copy link

I have to take a break, but here is the summary:

  • updated the branch, inspected the newly introduced code to main, made necessary changes, all conflicts are resolved.
  • changed repository_packages to repository_versions, because it does not have packages in it, but only names and versions.
  • changed the Repository variant of the BasePackage, added package there, because we have to be able to access the packages in order to do the comparisons. See:

    Now I think the only thing left is to implement the Repository case for the three functions in BasePackage's impl.

  • initialized the Repository variant's package field using registry_packages -> this probably needs to change.
  • implemented the remaining unimplemented functions.

@MarcoIeni @jdidion
Remaining tasks/questions:

  • I'm still not 100% clear on what get_registry_packages does. Based on the details, I think maybe this can be slightly modified to get the packages for the checked out git tag.
    • if it can be used for that, we wouldn't need to change anything in here:

      initialized the Repository variant's package field using registry_packages -> this probably needs to change.

    • however, also in this case, I don't think the BasePackage enum will be required at all. We will basically compare registry (including also git) packages, against the local packages. Comparison logic does not need to know where they arrive from, they just need the content. And BasePackage is defined for comparison logic. In this case, I think it is not necessary.
    • if get_registry_packages cannot be used to retrieve packages for a git tag, then we should change this part:

      initialized the Repository variant's package field using registry_packages -> this probably needs to change.

The reasons on why I got confused on get_registry_packages:

  1. This function uses registry_manifest, and I found the below:
    /// The manifest of the project you want to update.
    local_manifest: Utf8PathBuf,
    /// Cargo metadata.
    metadata: Metadata,
    /// Manifest of the project containing packages at the versions published in the Cargo registry.
    registry_manifest: Option<Utf8PathBuf>,
    For git-only mode, I assume we can supply the manifest of the project that is belonging to the specified git tag.
  2. then, this function will retrieve packages based on the manifest of the specified git tag, and all packages will be retrieved into a PackagesCollection
  3. If my previous assumption is correct, we only need to fetch the manifest related to the given git tag (or the latest git tag by default).
  4. then, this PR will be basically finished
  5. in this case, we can get rid of BasePackage and revert some of the changes related to that, that will make this PR much smaller. This way, it will be easier to review, and also to take a look at it in the future.
  • possibly unrelated question: registry_manifest is an optional argument for get_registry_packages. When not supplied, get_registry_packages downloads the packages from local dependencies. Probably unrelated to our git only feature, but will it compare local to local in this case? It did not make sense to me, and I got suspicious on whether my interpretation of the function is correct.

@MarcoIeni
Copy link
Owner

MarcoIeni commented Apr 4, 2024

I'm still not 100% clear on what get_registry_packages does

I documented the function in #1381

Comparison logic does not need to know where they arrive from, they just need the content.

If the comparison logic doesn't need to know, we must ensure every package "looks like" a registry package.
Registry packages have the following differences:

  • they have a Cargo.toml.orig
  • they have a README in the same directory as Cargo.toml (instead normally you can override the README in the Cargo.toml

Probably running cargo package -p <my_crate> would do that for us. 👍
I realized we don't do this at the moment if we supply a local_manifest which might be a bug 🤔

in this case, we can get rid of BasePackage and revert some of the changes related to that, that will make this PR much smaller. This way, it will be easier to review, and also to take a look at it in the future.

yes, probably you are right, if we do it this way, we can simplify the comparison logic 👍 Unless there's some corner case I'm not thinking about right now.

For git-only mode, I assume we can supply the manifest of the project that is belonging to the specified git tag.

yes

It did not make sense to me, and I got suspicious on whether my interpretation of the function is correct.

Hopefully #1381 clarifies it a bit.

@ozgunozerk
Copy link

Ok, here is my proposed change, if @jdidion also agrees with it, I can push my commit. However it is overriding quite a big chunk of the work @jdidion did, so I wanted to get his opinion first.

This is the new next_versions function. If we agree on this approach, we can basically revert 99% of the changes made to next_ver.rs:

pub fn next_versions(input: &UpdateRequest) -> anyhow::Result<(PackagesUpdate, TempRepo)> {
    let overrides = input.packages_config.overrides.keys().cloned().collect();
    let local_project = Project::new(
        &input.local_manifest,
        input.single_package.as_deref(),
        overrides,
        &input.metadata,
        input,
    )?;
    let updater = Updater {
        project: &local_project,
        req: input,
    };

    let repository = local_project
        .get_repo()
        .context("failed to determine local project repository")?;
    if !input.allow_dirty {
        repository.repo.is_clean()?;
    }

    // NEW CODE BEGIN
    if input.packages_config.default.git_only && input.registry_manifest.is_none() {
        debug!("git-only feature is enabled, and no registry manifest supplied. Will try to fetch the manifest from the latest tag");
        match get_repo_versions(
            &repository.repo,
            local_project.contains_multiple_pub_packages,
        )
        .context("could not get tag versions from the repo")?
        {
            Some(RepoVersions::Single(version)) => {
                repository.repo.checkout(version.to_string().as_str())?;
                // TODO:
                // 1. find the manifest (cargo.toml) file in this repo
                // 2. use it to override the `None` value for `registry_manifest`:
                // input.registry_manifest = Some(latest_tag_registry_manifest)
                repository.repo.checkout_head()?;
            }
            Some(RepoVersions::ByPackage(_b_tree_map)) => {
                unimplemented!("I don't know how this will work with multiple packages")
            }
            None => {
                unimplemented!("means that there is no previous tag in the repo. What should we do in this case?")
            }
        }
    }
    // NEW CODE END

    let registry_packages = registry_packages::get_registry_packages(
        input.registry_manifest.as_deref(),
        &local_project.publishable_packages(),
        input.registry.as_deref(),
    )?;

    let packages_to_update =
        updater.packages_to_update(&registry_packages, &repository.repo, input.local_manifest())?;
    Ok((packages_to_update, repository))
}

Rest of the next_ver.rs file should work as expected, and no further change is required. We are simply replacing the registry_manifest, and that's it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants