
Provide an option for puppeteer.launch() to run a docker image #8272

Closed
andyearnshaw opened this issue Apr 25, 2022 · 20 comments

andyearnshaw commented Apr 25, 2022

I came here to ask about making it easier to run Puppeteer in Docker and I spotted #3072, which is very much related. Docker is arguably the ideal way to run puppeteer because it guarantees a consistent environment for your puppeteer scripts regardless of the operating system you're running them on. Some examples of differences across operating systems are:

  • Font rendering (makes visual snapshot tests difficult)
  • Permissions (for example, it's not possible to grant clipboard access on macOS)

For this reason, we skip some of our E2E tests when running locally and pray that they pass in the build pipeline (we can always use docker locally for debugging if they do fail). What would be ideal is if running in docker was integrated directly with Puppeteer so that very few changes are required to the puppeteer scripts in order to run in the docker environment.

I've been looking at Browserless and it provides a simple way to run a Chrome environment for Puppeteer to connect to:

$ docker run \
  --rm \
  -p 3000:3000 \
  -e "MAX_CONCURRENT_SESSIONS=10" \
  browserless/chrome:latest

However, this has an obvious downside: you have to change your process to ensure the container is running, and switch your scripts from launch() to connect(). This is not particularly ideal for our pipeline, and neither is maintaining two separate workflows for each environment. What I would like to see is support for puppeteer.launch() to run a docker container and connect to it.

const browser = await puppeteer.launch({
  docker: !process.env.CI && {
    // Let's pretend an official Puppeteer image exists.
    image: "puppeteer/chrome:v13.5.0",

    // Optional, could just defer to the Dockerfile's ENTRYPOINT/CMD
    executablePath: "/usr/bin/chromium",
  },

  // ...
});

Other options could also be provided, for instance volume mapping for downloads, etc. In addition, the puppeteer node module could use an environment variable to predownload the docker image instead of a Chrome binary.

According to the Troubleshooting guide, "getting headless Chrome up and running in Docker can be tricky". The advice for Alpine also becomes out of date every time a new version of Chrome is published in the package repo (at the time of writing, copying and pasting the script would install Chrome 100 alongside Puppeteer v10.0.0, which is intended for Chrome 92). Having Puppeteer be able to just launch Chromium in Docker seems like it would be a win here.


andyearnshaw commented Apr 25, 2022

I was able to hack this in quite quickly by making changes to BrowserRunner.ts and a couple of other files. Here's the launch() configuration I tested with:

this._browser = await puppeteer.launch({
  docker: {
    image: "browserless/chrome",
    executablePath: "google-chrome"
  },
  args: this._flags,
  userDataDir: "/usr/src/app/workspace"
});

And here are the changes I made:

diff --git a/src/node/BrowserRunner.ts b/src/node/BrowserRunner.ts
index f2dc6075..174a4005 100644
--- a/src/node/BrowserRunner.ts
+++ b/src/node/BrowserRunner.ts
@@ -25,7 +25,7 @@ import { promisify } from 'util';
 
 import { assert } from '../common/assert.js';
 import { helper, debugError } from '../common/helper.js';
-import { LaunchOptions } from './LaunchOptions.js';
+import { DockerOptions, LaunchOptions } from './LaunchOptions.js';
 import { Connection } from '../common/Connection.js';
 import { NodeWebSocketTransport as WebSocketTransport } from '../node/NodeWebSocketTransport.js';
 import { PipeTransport } from './PipeTransport.js';
@@ -49,6 +49,7 @@ export class BrowserRunner {
   private _processArguments: string[];
   private _userDataDir: string;
   private _isTempUserDataDir?: boolean;
+  private _dockerOptions?: DockerOptions;
 
   proc = null;
   connection = null;
@@ -62,13 +63,15 @@ export class BrowserRunner {
     executablePath: string,
     processArguments: string[],
     userDataDir: string,
-    isTempUserDataDir?: boolean
+    isTempUserDataDir?: boolean,
+    dockerOptions?: DockerOptions
   ) {
     this._product = product;
     this._executablePath = executablePath;
     this._processArguments = processArguments;
     this._userDataDir = userDataDir;
     this._isTempUserDataDir = isTempUserDataDir;
+    this._dockerOptions = dockerOptions;
   }
 
   start(options: LaunchOptions): void {
@@ -86,19 +89,36 @@ export class BrowserRunner {
     debugLauncher(
       `Calling ${this._executablePath} ${this._processArguments.join(' ')}`
     );
-    this.proc = childProcess.spawn(
-      this._executablePath,
-      this._processArguments,
-      {
-        // On non-windows platforms, `detached: true` makes child process a
-        // leader of a new process group, making it possible to kill child
-        // process tree with `.kill(-pid)` command. @see
-        // https://nodejs.org/api/child_process.html#child_process_options_detached
-        detached: process.platform !== 'win32',
-        env,
-        stdio,
+    if (this._dockerOptions) {
+      this.proc = childProcess.spawn(
+        '/usr/local/bin/docker',
+        ['run', '-p=9333:9333', '-p=3000:3000', this._dockerOptions.image, this._dockerOptions.executablePath, ...this._processArguments, "--remote-debugging-address=0.0.0.0"],
+        {
+          // On non-windows platforms, `detached: true` makes child process a
+          // leader of a new process group, making it possible to kill child
+          // process tree with `.kill(-pid)` command. @see
+          // https://nodejs.org/api/child_process.html#child_process_options_detached
+          detached: process.platform !== 'win32',
+          env,
+          stdio,
+        }
+      );
+    }
+    else {
+      this.proc = childProcess.spawn(
+        this._executablePath,
+        this._processArguments,
+        {
+          // On non-windows platforms, `detached: true` makes child process a
+          // leader of a new process group, making it possible to kill child
+          // process tree with `.kill(-pid)` command. @see
+          // https://nodejs.org/api/child_process.html#child_process_options_detached
+          detached: process.platform !== 'win32',
+          env,
+          stdio,
+        }
+        );
       }
-    );
     if (dumpio) {
       this.proc.stderr.pipe(process.stderr);
       this.proc.stdout.pipe(process.stdout);
diff --git a/src/node/LaunchOptions.ts b/src/node/LaunchOptions.ts
index da9fd650..0dd2d8ac 100644
--- a/src/node/LaunchOptions.ts
+++ b/src/node/LaunchOptions.ts
@@ -58,6 +58,11 @@ export type ChromeReleaseChannel =
   | 'chrome-canary'
   | 'chrome-dev';
 
+export interface DockerOptions {
+  image: string;
+  executablePath: string;
+}
+
 /**
  * Generic launch options that can be passed when launching any browser.
  * @public
@@ -67,6 +72,10 @@ export interface LaunchOptions {
    * Chrome Release Channel
    */
   channel?: ChromeReleaseChannel;
+  /**
+   * Docker options. If provided, `channel`, `executablePath`, `product` and `pipe` are ignored.
+   */
+  docker?: DockerOptions;
   /**
    * Path to a browser executable to use instead of the bundled Chromium. Note
    * that Puppeteer is only guaranteed to work with the bundled Chromium, so use
diff --git a/src/node/Launcher.ts b/src/node/Launcher.ts
index 17301d0e..c113bc15 100644
--- a/src/node/Launcher.ts
+++ b/src/node/Launcher.ts
@@ -73,6 +73,7 @@ class ChromeLauncher implements ProductLauncher {
       dumpio = false,
       channel = null,
       executablePath = null,
+      docker = null,
       pipe = false,
       env = process.env,
       handleSIGINT = true,
@@ -157,7 +158,8 @@ class ChromeLauncher implements ProductLauncher {
       chromeExecutable,
       chromeArguments,
       userDataDir,
-      isTempUserDataDir
+      isTempUserDataDir,
+      docker
     );
     runner.start({
       handleSIGHUP,

It works like a charm if the docker image is already downloaded. I used it to run ~800 tests with Karma, including several visual snapshot tests. I also tested it with our internal docker image that runs in the CI pipeline (Alpine, with a different executable path), and that worked just fine too.

I'd be happy to work on contributing this feature properly if the Puppeteer team are willing to accept it.


andyearnshaw commented May 3, 2022

I've published an npm package that can help with this in the meantime, https://github.com/andyearnshaw/docketeer:

$ npx docketeer browserless/chrome:latest npm run visual-snapshot

It works by setting the PUPPETEER_EXECUTABLE_PATH environment variable and launching the browser via a docker container. There are a few kinks to work out, such as the container not properly exiting when Puppeteer calls kill() on the process group. Puppeteer being aware of the docker process would make this simpler, so I'm still hoping there's interest in adding direct docker support and therefore keeping this issue open.

@frank-dspeed

lol, Docker is the most complicated way to run Puppeteer, as even Chromium does not run well in it because of many isolation issues. I know that because I have been a Docker/Moby contributor since 2014, so over 8 years of experience with it. The GPU part, for example, is relatively problematic, and there are many other issues that programmers who are not OS programmers are not aware of.

Do not get me wrong, it is possible, but internally it is a nightmare, because Docker and Chrome use the same system APIs: you need to relax Docker's own isolation or turn off Chrome's isolation. Both are designed to run exclusively on an operating system, unlike Chrome OS, which can run Docker itself and is itself an OS. Chromium OS shares a lot of code with Chromium itself, so you see where the problem comes from: Chromium is more than a simple browser (render engine), it is a full-blown platform with a gigantic standard set of libs and modules that builds into a full OS as soon as you add more window and user management code.


frank-dspeed commented May 28, 2022

It makes zero sense to implement this inside Puppeteer; it is out of scope, and this issue should get closed.

Everyone who wants to do this can simply create a Puppeteer docker image and then use child_process.execSync (or spawnSync, whichever you favor, or whatever system management sits in between) to do the magic.

It would not even be clear what that launch() would do!

  • Should it detect a local Docker socket?
  • Should it accept Docker hosts as targets?
  • Should it use a local Docker CLI?
  • Should it use the HTTP API?
  • Should it support full-blown Kubernetes?
  • …the list is endless.

Short Conclusion

Puppeteer can be seen as an application; it is out of scope for it to supply commands or functions that allow it to fork itself inside a different environment.

Does anyone else's software support that? I am not aware of a single case where an app has a function to start itself inside docker. No joke, I do not know of a single one.

@andyearnshaw

Thanks for your comment @frank-dspeed.

docker is the most complicated way to run Puppeteer

Yes, this is one of the main motivations for this suggestion: to improve the ergonomics of it.

chromium does not run well in it
[...]
Do not get me wrong it is possible but internal it is a nightmare

Chromium runs fine in Docker, and I'm speaking from plenty of experience too. Our pipelines run various docker images with Chromium many times a day for e2e tests and integration tests (including visual snapshot tests). I haven't run into any such issues in the several years I've spent building and maintaining these images and the extensive tests that run on them.

It's imperative to use Puppeteer in a Docker environment for visual snapshot tests to get consistent rendering across different host operating systems.

Everyone who wants to do this can simply create a Puppeteer docker image and then use child_process.execSync (or spawnSync, whichever you favor, or whatever system management sits in between) to do the magic.

It's not as simple as this for many reasons, some of which I've discovered since implementing such a solution (see the library above). I've given other reasons in my main post too.

It would not even be clear what that launch() would do!

I don't understand your point here. This is merely an implementation detail that would be the choice of whoever implements this feature, if it were accepted by the Puppeteer team. The fact that there are many different approaches that could be taken is not a good reason to dismiss the idea.

In any case, I would argue that using the docker CLI would be the simplest approach and wouldn't require too many additional hoops compared to how Chromium is currently launched. This is how I've achieved it, but the aforementioned issues make it an imperfect solution. Having the launched instance be directly under the control of the Puppeteer library removes the need for the workarounds I've had to employ, improves resilience and robustness and, to circle back to the first point I made, improves the ergonomics of having that consistent browser experience on all operating systems.

@frank-dspeed

But all your changes can be easily addressed by an extra package like puppeteer-docker; it is not hard to integrate. But OK, if this issue gets upvotes, why not. I guess it would slow down overall Puppeteer development, though; the project already has a lot of open issues, and I guess keeping up with the Chromium API is enough work for this team, but maybe I am wrong.

@andyearnshaw

The changes cannot be easily addressed by an external package, for the reasons I've already pointed out. That's what https://github.com/andyearnshaw/docketeer is about, but there are several problems that are difficult to overcome, such as Puppeteer killing the process and leaving the container orphaned, or connecting the fd3/fd4 pipes for --remote-debugging-pipe.

Please feel free to take a look, I'm sure your experience as a Docker/Moby contributor for 8 years would be really useful.

@frank-dspeed

@andyearnshaw ah, that is related to the init process and such stuff; it needs to use an s6 init, then the pipes will work.

@frank-dspeed

Update: condensed the problem down.

The root effect you want is to replace the Chrome executable with the following command:

/usr/local/bin/docker run -p=9333:9333 -p=3000:3000 <this._dockerOptions.image> <this._dockerOptions.executablePath> <...this._processArguments> --remote-debugging-address=0.0.0.0

Solution

Create a local wrapper that runs the command, something like a bash script; name it chromium-binary and mark it +x (executable). Pseudo code, still working on it:

#!/bin/bash
set -e
# The placeholders stand in for this._dockerOptions.image,
# this._dockerOptions.executablePath and this._processArguments;
# "$@" forwards whatever args Puppeteer passes to the "browser binary".
exec /usr/local/bin/docker run -p=9333:9333 -p=3000:3000 "<image>" "<executable-path>" "$@" --remote-debugging-address=0.0.0.0

example.js

const browser = await puppeteer.launch({
  args: [
    "--remote-debugging-address=0.0.0.0"
  ],
  executablePath: './chromium-binary'
});
// ready to take off

@andyearnshaw

@frank-dspeed thanks for posting that suggestion. I have a few concerns, but I haven't gotten around to evaluating it yet. I've set up andyearnshaw/docketeer#7 to handle that evaluation and any further discussion, as I'd prefer to keep this thread noise-free and reserved for discussion about the feature request itself.

@jrandolf it appears you tagged this as bug instead of feature-request; was that intentional?


frank-dspeed commented Jun 8, 2022

Final Solution!

The following example exposes defaultArgs for the Chrome and Firefox launchers via undocumented APIs, and uses the documented ignoreDefaultArgs: true option to supply a fully customized args: [] containing the defaults for the given browser launcher.

Import:

import Puppeteer from 'puppeteer';

In case of Firefox inside Docker, set the product before running the next part
(not needed if you started Puppeteer with an environment variable):

Puppeteer._productName = 'firefox'

Finally, launch:

const browser = await Puppeteer.launch({
    // product: 'firefox', // if you want to use Firefox, do not forget the above
    ignoreDefaultArgs: true, // use only our supplied args: []
    executablePath: '/usr/bin/docker', // we want to run docker, right?
    args: [
        // docker arguments
        'run',
        '-p=9333:9333',
        '-p=3000:3000',
        '-it',
        `${dockerImage}`,
        '/usr/bin/google-chrome',
        // end docker arguments
        ...Puppeteer._launcher.defaultArgs({
            devtools: false, // devtools auto-open needs headless: false
            headless: true,  // needs to be false if devtools should auto-open
            userDataDir,     // needs to get set here, nowhere else; optional
        }),
        // here comes what you would put into args in general
    ],
    pipe: true
});

How this works

Puppeteer._launcher is a getter that looks up _productName and returns the correct launcher, which holds the defaultArgs method.

_productName also gets set to the value of product on Puppeteer.launch({ product: 'chrome' }).

With the ignoreDefaultArgs: true option we avoid Puppeteer's internal Puppeteer._launcher.defaultArgs(options) call; since we call it ourselves, we can also filter its output if needed.


jrandolf commented Jun 8, 2022

@jrandolf it appears you tagged as bug instead of feature-request, was that intentional?

Result of a bulk labeling.

@frank-dspeed

@jrandolf is this class-based application design fixed, or would I be allowed to refactor to a more functional codebase?

I guess it would improve iteration speed a lot, as well as types and code reuse overall. I needed to read the whole code base because of the class inheritance; it would be easier to use functional composition to configure Puppeteer, and it would also make it more testable and so on. I would invest in that; it would take less than 3 days including testing, as I would keep the original class-based structure as a facade until it is complete.


jrandolf commented Jun 9, 2022

This is closed in favour of #3072. In particular, puppeteer is already disjoint enough from chromium through the 'connect' interface, so containerising puppeteer fills the gap of "inconsistent" environments.

@jrandolf jrandolf closed this as completed Jun 9, 2022
@andyearnshaw

@jrandolf I feel like #3072 only strengthens the case for this feature request; the two requests complement each other significantly.

Let me clarify. If official Puppeteer images are published, the typical approach to using them would be to map in your scripts using -v in some docker run command. This is problematic in non-Linux environments when you have a lot of disk I/O, because of how volumes are virtualized on Mac and Windows. This pretty much means any TypeScript or other transpiled codebases are much slower inside a Docker container in those environments. For some of the tests run at the company I work for, the difference is >15 minutes in running visual snapshot tests. The biggest benefit of this feature request is making the process of avoiding that trap as ergonomic as possible for Puppeteer users.

If Puppeteer is included in the container but can also launch the containerised browser from outside it, you're covering all bases for users: the container can be used exclusively in the CI pipeline, and just the browser can be used outside it.


jrandolf commented Jun 9, 2022

@andyearnshaw Those statistics look very off. Did you try other mount methods? How about repeated runs? Does caching help?

Also, if puppeteer runs in a container, then with this feature, puppeteer will run the containerized browser in the container, so I don't see how #3072 complements this feature.

@andyearnshaw

@jrandolf yes, I've tried other mount methods to no avail, but the slowness is acknowledged by the Docker engineers themselves. Docker for Mac 4.6 has an experimental file sharing implementation called virtiofs which improves performance drastically. If you read the release post, you'll see that their own statistics are similar to mine. To pick one quote that supports my statement, virtiofs gains:

An 80% improvement in the time taken to boot a monolithic Typescript app (1m 30s to 18s)

However, my test suites running with virtiofs still take twice as long as when they run on the host instead of in the container (and there are other issues with the "new Virtualization framework" that it requires). As for repeated runs, caching, etc., those all depend on the configuration of the build tool, and such options may or may not be available. Some build tools, like webpack, offer watching, which will obviously increase speed for re-runs.

Also, if puppeteer runs in a container, then with this feature, puppeteer will run the containerized browser in the container, so I don't see how #3072 complements this feature.

Because although the idea in that feature (currently) is that Puppeteer will run in a container, it will also make it easier to run just the browser in the container, with Puppeteer running on the host. Currently we maintain our own Puppeteer builds with Node, so every time we need to upgrade Puppeteer in our projects, we also need to produce a new image. If #3072 gets done, then upgrading would mostly be a simple case of updating the image tag in the same commit as updating package.json. Going even further, Puppeteer could provide default values for the image and tag, reducing the configuration (as suggested in my first post) to docker: true in the simplest of cases. Running Puppeteer in a container just can't get much easier than that.


frank-dspeed commented Jun 9, 2022

I guess it is a design failure to even use defaultArgs for anything; the documentation of the args and why they are needed should be improved. We need something like the devtools-protocol repo for Chromium and Firefox args and so on.

You cannot see Docker as an independent target, as Docker is just the app itself. By design, Docker is not a VM; it is a compile target, like compiling to a single binary. So your resulting Docker image with Chromium included should contain only the ENTRYPOINT, because of edge cases like CMD usage breaking the signaling and so on.

So a Docker image should be seen as a single-file binary, while Docker handles IPC.

The Docker arguments depend highly on the Docker host's configuration; they can be seen as an individual OS, so with every args change your binary supports a different OS.

What you're asking for is like supporting: https://enigmaprotector.com/en/aboutvb.html

Enigma Virtual Box embeds a special loader into the main application module which runs before the main code of the application. The loader intercepts system calls to the disk for file read/write, and if the target file is virtualized, the Enigma Virtual Box will emulate the file in memory and return the required result.

@andyearnshaw

@frank-dspeed I'm not sure if something is getting lost in translation, but it feels like you're implying that this isn't possible. I have proven that not to be the case. Yes, there are issues (some of which could be mitigated by Puppeteer's source code knowing that it has launched a docker container), but there is no point letting perfect be the enemy of good.

Here are the main problems I've come across:

  • Puppeteer sends the SIGKILL signal to the docker run process if a custom userDataDir is not specified. This could easily be solved by changing Puppeteer to not send SIGKILL when running a docker container.
  • --remote-debugging-pipe does not work, and it seems like it might be impossible at the moment; I'm not sure about that. Is this problematic? Sure: this would have been my preferred way to connect Puppeteer with the browser. However, remote debugging via the network is still available.
  • --remote-debugging-port=0 can't work, because we need to know which ports to bind on the host before the browser runs. Manually choosing a random available port will suffice here, though it's not perfect.

Despite these issues, I'm using this solution in the real world to run my tests much, much faster. My UX colleagues have praised the significant amount of time it saves them when running or updating their hundreds of visual snapshot tests locally. So please don't imply that this cannot work simply because it is not perfect.


frank-dspeed commented Jun 10, 2022

You got me wrong: I say it works, and it works perfectly, and it has nothing to do with Puppeteer. I have now recoded it completely, including for Playwright, and did many iterations on it; I know everything it does and does not do.

To get it working with the remote pipe, you need to adjust the Docker args and then the Puppeteer launcher, or you do it even more cleverly: spawn the Docker container yourself and then use puppeteer.connect(); there you can also (undocumented) insert a custom transport, which would be the stdio in and out.

All this is doable, all this works, and all of it is out of scope for the Puppeteer project; that's my whole point.

The core, vital part of Puppeteer is its Connection.ts; that's it. The rest is sugar and wrappers, let's say opinionated models around the existing API.

This defines the communication channel to every Page and Frame:

https://github.com/puppeteer/puppeteer/blob/main/src/common/Connection.ts

When you want to use the WebSocket method, use puppeteer.connect() and spawn the container in your JS. If you want to code a bit more, examine that connect method; you will see you can supply a custom transport. Examine the transport, pass your stdio, and you get the pipe.
The pipe arg does not work together with a remote port and so on; there can be only one debugger attached.
That is the only file that matters in the whole project.
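To illustrate the custom-transport route: puppeteer.connect() accepts a transport object with send/close/onmessage/onclose. Below is a minimal sketch of one that speaks CDP's NUL-delimited pipe framing over a pair of streams; the class name is made up here, and the fd numbers in the usage comment are assumptions based on the --remote-debugging-pipe convention:

```javascript
// Minimal transport in the shape puppeteer.connect({ transport }) accepts.
// CDP's pipe protocol frames each JSON message with a trailing NUL byte.
class StdioPipeTransport {
  constructor(writable, readable) {
    this._writable = writable;
    this._buf = '';
    this.onmessage = undefined;
    this.onclose = undefined;
    readable.on('data', (chunk) => {
      this._buf += chunk.toString();
      let idx;
      // Split the stream on NUL bytes and deliver complete messages.
      while ((idx = this._buf.indexOf('\0')) !== -1) {
        const message = this._buf.slice(0, idx);
        this._buf = this._buf.slice(idx + 1);
        if (this.onmessage) this.onmessage(message);
      }
    });
    readable.on('close', () => this.onclose && this.onclose());
  }
  send(message) {
    this._writable.write(message + '\0');
  }
  close() {
    if (this.onclose) this.onclose();
  }
}

// Usage sketch (fd 3 = write to browser, fd 4 = read from browser,
// per the --remote-debugging-pipe convention):
// const child = spawn(exe, [...args, '--remote-debugging-pipe'],
//   { stdio: ['inherit', 'inherit', 'inherit', 'pipe', 'pipe'] });
// const transport = new StdioPipeTransport(child.stdio[3], child.stdio[4]);
// const browser = await puppeteer.connect({ transport });
```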
