renv::restore always seems to download from GitHub, but will find 'file up to date' for CRAN/P3m #1882

Open
chrisknoll opened this issue Apr 23, 2024 · 7 comments

Comments

@chrisknoll

chrisknoll commented Apr 23, 2024

I'm running several restores of a lockfile I'm experimenting with. Since each restore fetches the same versions of the packages, I'd expect later restores to find that the downloads from GitHub are already up to date. Instead, I get output like this:

- Downloading aws.s3 from CRAN ...              OK [file is up to date]
- Downloading R.oo from P3M ...                 OK [file is up to date]
- Downloading R.utils from P3M ...              OK [file is up to date]
- Downloading CirceR from GitHub ...            OK [4.2 Mb in 7.6s]
- Downloading CohortGenerator from GitHub ...   OK [696.9 Kb in 4.0s]

Is there any reason why CRAN and P3M downloads are reported as up to date, while every run re-downloads from GitHub? I'd expect the GitHub files to be found up to date, just like the CRAN/P3M ones.

@kevinushey
Collaborator

Can you share the lockfile? Are those packages available in public repositories?

@chrisknoll
Author

chrisknoll commented Apr 23, 2024

I am using this one from a git repo (we fetch the file from the remote into the local filesystem and then use renv::restore(lockfile = 'filename') to restore it).

Admittedly, it's not a nice minimal test case since there are a lot of references in it, but it does show you what we're working with.

All packages are available in public repos. We're an Open Science organization that makes our packages public.

Edit:
In case you're wondering, the use case here is that we develop the packages to run clinical studies, and these lockfiles ensure that everyone running a study has the correct versions of the dependent packages for study execution.

@kevinushey
Collaborator

Thanks! I mainly ask because it's very helpful to have a reproducible example I can run locally; this will help me ascertain whether this is a bug in renv, or something else.

@chrisknoll
Author

I don't know enough about the internals of CRAN/P3M vs. GitHub, but GitHub doesn't have a tar file, whereas maybe CRAN/P3M does. So the file it reports as 'up to date' may be the downloaded tarball, while for GitHub it needs to rebuild the tarball each time it pulls the source. If that is the case, it would be nice if renv could store the commit hash it's fetching and use that as a 'checksum' to decide whether the 'file is up to date', rather than downloading and rebuilding the tarball each time.

@chrisknoll
Author

A minimal test case for you might be to take that lockfile and remove a bunch of entries, leaving an assortment of CRAN, P3M and GitHub sources. If you then restore multiple times from that lockfile, you should see that CRAN/P3M packages aren't fetched again, while GitHub packages are downloaded every time.
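For what it's worth, here is a rough sketch of how the trimming could be scripted. It is plain Python, not part of renv, and `trim_lockfile` and `per_group` are hypothetical names. It assumes the lockfile is JSON with a top-level "Packages" object, and that repository packages carry a "Repository" field (e.g. "CRAN" or "P3M") while GitHub packages are identified by "Source"; the sample records are cut down to those fields only.

```python
from collections import defaultdict


def trim_lockfile(lockfile, per_group=2):
    """Keep at most `per_group` package records per repository/source."""
    counts = defaultdict(int)
    kept = {}
    for name, record in lockfile.get("Packages", {}).items():
        # Assumption: group by "Repository" when present, else by "Source".
        group = record.get("Repository") or record.get("Source", "unknown")
        if counts[group] < per_group:
            kept[name] = record
            counts[group] += 1
    # Copy every other top-level section unchanged.
    trimmed = {key: value for key, value in lockfile.items() if key != "Packages"}
    trimmed["Packages"] = kept
    return trimmed


# Sample input using the package names from the restore output above.
sample = {
    "Packages": {
        "aws.s3": {"Package": "aws.s3", "Source": "Repository", "Repository": "CRAN"},
        "R.oo": {"Package": "R.oo", "Source": "Repository", "Repository": "P3M"},
        "R.utils": {"Package": "R.utils", "Source": "Repository", "Repository": "P3M"},
        "CirceR": {"Package": "CirceR", "Source": "GitHub"},
        "CohortGenerator": {"Package": "CohortGenerator", "Source": "GitHub"},
    }
}

print(list(trim_lockfile(sample, per_group=1)["Packages"]))
```

Note that the dropped entries may be dependencies of the ones that are kept, so a file trimmed this way is only meant for watching the download messages, not for running a study.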

@kevinushey
Collaborator

I suspect the issue here arises because your lockfile entries don't record the associated GitHub commit for the dependent packages; e.g.

		"ShinyAppBuilder" : {
			"Package" : "ShinyAppBuilder",
			"Version" : "1.1.2",
			"Source" : "GitHub",
			"RemoteType" : "github",
			"RemoteHost" : "api.github.com",
			"RemoteRepo" : "ShinyAppBuilder",
			"RemoteUsername" : "ohdsi",
			"RemoteRef" : "v1.1.2"
		},

Normally, we'd expect a RemoteSha field as well, providing the specific commit hash. Here's what I see locally with e.g. renv::install("r-lib/rlang"):

    "rlang": {
      "Package": "rlang",
      "Version": "1.1.3.9000",
      "Source": "GitHub",
      "RemoteType": "github",
      "RemoteHost": "api.github.com",
      "RemoteUsername": "r-lib",
      "RemoteRepo": "rlang",
      "RemoteRef": "main",
      "RemoteSha": "3cc68e9c4f503340e8d11ca7af4a99b8ac816f1d",
      "Requirements": [
        "R",
        "utils"
      ],
      "Hash": "ff466d21f6c3811c09c1148b3cd612a8"
    }

Are you generating the lockfile with renv::snapshot(), or are you generating it by hand? If by hand, is the omission of the RemoteSha field intentional?
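If it helps to check the whole lockfile at once, the following is a minimal sketch that lists GitHub records with no RemoteSha. It is plain Python rather than anything provided by renv, and `github_entries_missing_sha` is a hypothetical helper name. It assumes the lockfile is JSON with a top-level "Packages" object; the sample reuses the two records above, cut down to the relevant fields.

```python
import json

# Two records trimmed from the entries quoted in this thread.
SAMPLE_LOCKFILE = json.loads("""
{
  "Packages": {
    "ShinyAppBuilder": {
      "Package": "ShinyAppBuilder",
      "Version": "1.1.2",
      "Source": "GitHub",
      "RemoteRef": "v1.1.2"
    },
    "rlang": {
      "Package": "rlang",
      "Version": "1.1.3.9000",
      "Source": "GitHub",
      "RemoteRef": "main",
      "RemoteSha": "3cc68e9c4f503340e8d11ca7af4a99b8ac816f1d"
    }
  }
}
""")


def github_entries_missing_sha(lockfile):
    """Return sorted names of GitHub-sourced packages lacking a RemoteSha."""
    packages = lockfile.get("Packages", {})
    return sorted(
        name
        for name, record in packages.items()
        if record.get("Source") == "GitHub" and not record.get("RemoteSha")
    )


print(github_entries_missing_sha(SAMPLE_LOCKFILE))  # ['ShinyAppBuilder']
```

To run it against a real file, replace the sample with the result of `json.load()` on the opened lockfile.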

@chrisknoll
Author

I'm not familiar with how it's constructed, but I'm pretty sure it's curated in some way to remove certain packages which make renv::restore challenging (for example, changing rlang from one version to another sometimes seems to run into issues).

But I am not sure why they would remove the RemoteSha and Hash fields.

I can check with the maintainers of those lock files and ask. Thanks for the insight!
