Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support configuring the browser binary download path #6014

Merged
merged 1 commit into from Aug 10, 2020

Conversation

lcabral37
Copy link
Contributor

By adding support for a variable PUPPETEER_DOWNLOAD_PATH it is possible
to support downloading the browser binaries into a folder outside the
node_modules this will in a way allow to preserve previous downloaded
binaries and not required to download then again

This PR is very similar to #5105 but with the addition that the puppeteer runner also makes use of the PUPPETEER_DOWNLOAD_PATH without additional configuration

@jackfranklin
Copy link
Collaborator

@lcabral37 thank you for the PR, appreciate it.

My worry with this change is that it's becoming a bit complex to understand which download variables do what and how many you may or may not need to set. I'm wondering if there's a nicer way of doing this or we could take an opportunity to tidy up the ENV vars we support in a breaking change release.

I'd also be interested to know the situation that means you need to install the binaries outside of node_modules - most CI systems will cache this I think?

@mathiasbynens what do you think?

@mathiasbynens mathiasbynens changed the base branch from master to main June 15, 2020 14:37
@lcabral37
Copy link
Contributor Author

Thanks for the quick feedback.
I certainly do not have an overview as the maintainers of the project regarding its increasing complexity use of environment variables but I can at least explain the reason behind this PR.
This was a result from a hackday in my organization that was a reflection of the issues I've notice our systems and colleagues dependency on downloading the chromium binaries.
We build our product with npm ci, (whether we are in our development environment or in our build CI service) which will result in purging the node_modules/ folder so it results in downloading the binaries every time we run this command (which might occur also very often in a developers daily workflow, when changing branches, starting a new task, etc.)
This behavior has resulted in slowing down the build process. At some point in time, we even had 3 different version dependencies which would result in requiring to download 3 different chromium binaries( I know ! I know! insane but it has been fixed now).
I've also seen issues with storage.googleapis blocking our IPs (due rate limit), effectively stopping our build servers from building.
Another scenario I found out recently is that some colleagues are permanently blocked unless they are working within the VPN... storage.googleapis has permanently blocked their country (a quiet country in Asia)

Anyhow this is the short story of the pain I've seen with having to download the chromium binaries and I understand that this might not event be the best resolution for many of theses issue (using a mirror host would be better in some of these scenarios for sure) but I honestly think this approach would also be a major gain in this architecture. I've researched a bit through the documentation and could not find any combination that would mach this workflow. And it would make it possible to persist the same configuration for the download path while upgrading puppeteer (it would just download and run the newer revision without additional changes)

To answer your question regarding CI systems, I can be sure but our experience is that we use npm ci which will build the project with a clean slate while respecting the package-lock. npm itself will maintain a local cache of all npm modules downloaded so that running npm ci is very fast but it will still run every package postinstall script and in puppeteer case, will perform the download. I mean... this is what we do. I definitely can't say for sure this is the best workflow.

Having bothered you with a 5m read already, I can finish with saying that if you can identify any changes that could make this PR more future prof or at least less complex, I'm willing to make it so.

@jackfranklin
Copy link
Collaborator

@lcabral37 thanks for explaining your situation, that makes sense. I can see the benefit of being able to manage where the binary is put yourselves.

@mathiasbynens what do you think to this as a new feature? I think we could rework the ENV vars potentially but at a high level do you think this is a feature worth adding?

@lcabral37

This comment has been minimized.

@lcabral37

This comment has been minimized.

@mathiasbynens
Copy link
Member

mathiasbynens commented Jul 28, 2020

Longer-term we'll want to change the default download dir to something like ~/.puppeteer. The ENV var could then be used to restore the current functionality of downloading it to a local subdirectory.

I wonder if we should make this change all at once (to reduce confusion), or land this as-is first, and then change the default later. WDYT?

@OrKoN
Copy link
Collaborator

OrKoN commented Jul 28, 2020

I think this can be landed as-is first because it does not block changing the default in the future.

@mathiasbynens
Copy link
Member

I think this can be landed as-is first because it does not block changing the default in the future.

To clarify, my worry is that people will learn about the new env var, and by the time they start to use it, we'll have changed its meaning again. We could avoid this confusion by ensuring the two changes (this + a future patch that changes the default) make it into the same release, but we'd have to decide soonish.

@OrKoN
Copy link
Collaborator

OrKoN commented Jul 28, 2020

I think this can be landed as-is first because it does not block changing the default in the future.

To clarify, my worry is that people will learn about the new env var, and by the time they start to use it, we'll have changed its meaning again. We could avoid this confusion by ensuring the two changes (this + a future patch that changes the default) make it into the same release, but we'd have to decide soonish.

By changing the meaning of the env var, do you mean the change of the default value? I think in this case the meaning is still the same: your custom download path would be used. Btw, I am not sure how much effort is it to change the default to something like ~/.puppeteer?

@jackfranklin
Copy link
Collaborator

I think 99% of people won't have a strong opinion or care about where Puppeteer downloads things, so I think we can land this and then make a new change in the future and clearly document it as breaking. So I'm happy to ship this as is.

Copy link
Collaborator

@hanselfmu hanselfmu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little hesitant around default-downloading something to a user's home directory. I think we can ship it as-is, and for me if I don't like where it currently downloads, I could just set the ENV.

I also find this feature quite useful, for example, when I need to switch back and forth my Chromium versions for unit-testing purposes.

@johanbay
Copy link
Collaborator

I definitely think it makes sense to allow configuring the download path. I see the point in downloading to the home folder to enable sharing binaries across projects or (as here) including the binaries in a CI image, but as a user I would be slightly annoyed if ~/.puppeteer was the default. Maybe that is just me though, but I would prefer if I had to explicitly enable downloading large binaries to a hidden folder in my home directory 🙂

@mathiasbynens
Copy link
Member

By changing the meaning of the env var, do you mean the change of the default value? I think in this case the meaning is still the same: your custom download path would be used.

When the var is set, that path is used, sure. The difference we'd get when changing the default is when it's not set — the "meaning" changes in the sense that you now need to use the var in different cases.

Sounds like there's appetite for not putting large binaries in the home folder by default. Thanks for chiming in, everyone! Let's go with this as-is then.

@mathiasbynens
Copy link
Member

@lcabral37 Would you be up for rebasing this patch?

@googlebot
Copy link

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and then comment @googlebot I fixed it.. If the bot doesn't comment, it means it doesn't think anything has changed.

ℹ️ Googlers: Go here for more info.

@googlebot
Copy link

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

@lcabral37
Copy link
Contributor Author

@lcabral37 Would you be up for rebasing this patch?

Most certainly.
Rebase done with minimal manual changes to move introduced logic from install.js to src/install.js

@mathiasbynens
Copy link
Member

Thanks @lcabral37! I think we can land this soon — I'll just wait for @jschfflr to chime in on the above discussion as well.

@jschfflr
Copy link
Contributor

Just my 2 cents: I think, having puppeteer download files to ./.puppeteer instead of node_modules by default would be a better solution because it would minimise the amount of downloads a user would have to do by default. To make sure, people are aware of if it, I'd maybe print a line like: Downloading chromium to ~/.puppeteer or Found an already downloaded chromium version in ~/.puppetter.
By having it as the default, tools using puppeteer via npx like for example https://github.com/puppeteer/recorder would also benefit from this.

@mathiasbynens
Copy link
Member

@lcabral37 Looks like npm run lint is unhappy, could you please take a look?

@mjzffr
Copy link
Contributor

mjzffr commented Jul 28, 2020

I suggest removing the changes to the puppeteer-firefox directory, since it's deprecated and not tested.

By adding support for a variable PUPPETEER_DOWNLOAD_PATH it is possible
to support downloading the browser binaries into a folder outside the
node_modules this will in a way allow to preserve previous downloaded
binaries and not required to download then again
@lcabral37
Copy link
Contributor Author

@lcabral37 Looks like npm run lint is unhappy, could you please take a look?
Sorry for that... when the lint tool is unhappy I'm unhappy :(

@mjzffr if you insist I can remove it, but for the time being I would prefer to leave it there for consistency. Test wise, I agree it should be tested if it is to be there. so I've performed a quick & dirty manual test to validate firefox downloads to the specified folder (if specified) and that it runs form the specified (if specified)

@jschfflr It would certainly resolve the problem I have but after thinking for a bit I would dislike this approach as a user as I see that npm modules should be self contained in its ecosystem (node_modules, cache folders, etc ) and not leak out artifacts through the system... Unless I specify otherwise (not trying to defend my PR in this manner), I would find it very annoying to find a ~/.puppetter or other npm artifacts (imagine several npm packages start to implement this approach) that I might not even have installed directly but is instead a dependency of another package.

Thank you all for your patience

@mathiasbynens mathiasbynens merged commit 13ea347 into puppeteer:main Aug 10, 2020
Krinkle added a commit to Krinkle/grunt-contrib-qunit that referenced this pull request Apr 12, 2021
Some internal breaking changes, but nothing that affects this project
afaik. https://github.com/puppeteer/puppeteer/releases/tag/v5.0.0

Main reason for updating is so that 5.3.0 becomes in-range,
which introduced support for PUPPETEER_DOWNLOAD_PATH, which can
be set to a cached directory (e.g. for CI to save/restore, or to
persist locally in a way compatible with npm-ci)

Ref puppeteer/puppeteer#6014
Krinkle added a commit to qunitjs/qunit that referenced this pull request Jul 4, 2021
* Puppeteer stores the Chromium binary only in node_modules/
  (not in $XDG_CACHE_HOME or some other directory that is generally
  persisted in dev and CI environments, nor in any other directory
  that one can opt-in to caching/persistence).

  This was fixed in Puppeteer 5+ (ref puppeteer/puppeteer#6014),
  through an opt-in PUPPETEER_DOWNLOAD_PATH env var.

* I enalbed this in commit 7cd99a2, but it didn't do anything
  yet as grunt-contrib-qunit wasn't using the newer Puppeteer yet.

  This was fixed in gruntjs/grunt-contrib-qunit#173.
Krinkle added a commit to qunitjs/qunit that referenced this pull request Jul 4, 2021
* Puppeteer stores the Chromium binary only in node_modules/
  (not in $XDG_CACHE_HOME or some other directory that is generally
  persisted in dev and CI environments, nor in any other directory
  that one can opt-in to caching/persistence).

  This was fixed in Puppeteer 5+ (ref puppeteer/puppeteer#6014),
  through an opt-in PUPPETEER_DOWNLOAD_PATH env var.

* I enalbed this in commit 7cd99a2, but it didn't do anything
  yet as grunt-contrib-qunit wasn't using the newer Puppeteer yet.

  This was fixed in gruntjs/grunt-contrib-qunit#173.

* Fix ENV syntax since tildes aren't expanded in this context.
Krinkle added a commit to qunitjs/qunit that referenced this pull request Jul 4, 2021
* Puppeteer stores the Chromium binary only in node_modules/
  (not in $XDG_CACHE_HOME or some other directory that is generally
  persisted in dev and CI environments, nor in any other directory
  that one can opt-in to caching/persistence).

  This was fixed in Puppeteer 5+ (ref puppeteer/puppeteer#6014),
  through an opt-in PUPPETEER_DOWNLOAD_PATH env var.

* I enalbed this in commit 7cd99a2, but it didn't do anything
  yet as grunt-contrib-qunit wasn't using the newer Puppeteer yet.

  This was fixed in gruntjs/grunt-contrib-qunit#173.

* Fix ENV syntax (tildes aren't expanded in this context, other
  variables like $HOME aren't expanded either, and absolute paths
  like /tmp also can't because Windows runners interpret /tmp as D:\tmp
  which doesn't exist by default, and Puppeteer requires the parent dir
  to exist instead of lazy-creating it). Use a local subdirectory
  instead.
Krinkle added a commit to qunitjs/qunit that referenced this pull request Jul 4, 2021
* Puppeteer stores the Chromium binary only in node_modules/
  (not in $XDG_CACHE_HOME or some other directory that is generally
  persisted in dev and CI environments, nor in any other directory
  that one can opt-in to caching/persistence).

  This was fixed in Puppeteer 5+ (ref puppeteer/puppeteer#6014),
  through an opt-in PUPPETEER_DOWNLOAD_PATH env var.

* I enalbed this in commit 7cd99a2, but it didn't do anything
  yet as grunt-contrib-qunit wasn't using the newer Puppeteer yet.

  This was fixed in gruntjs/grunt-contrib-qunit#173.

* Fix ENV syntax.
  - Tildes aren't expanded in this context.
  - Other variables like $HOME aren't expanded either.
  - Absolute paths like /tmp can't be used because Windows runners interpret
    `/tmp` as `D:\tmp`, which doesn't exist by default, and Puppeteer requires
    the parent dir to exist instead of simply lazy-creating it.
  - Can't use a relative directory in the workspace like ".puppeteer"
    because Puppeteer insists on the variable being an absolute path.
  - We can use the `github.workspace` builtin variable,
    which we can subsitute per <https://git.io/JUJEE>.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants