[Feature] Java library poor playwright/java docker image integration #1268

Open
msangel opened this issue Apr 27, 2023 · 7 comments

Comments

msangel commented Apr 27, 2023

Right now, the Java library is not automatically aware of the installed browsers when it runs in the playwright/java Docker image. Also, even with so much already installed, the library requires a Node.js installation that is not bundled with the image (yes, it is bundled with the library itself, but this has some drawbacks, see below).

Right now, running a Java application with this library inside the playwright/java container with default settings behaves the same way as it would on an empty Java base image. When the application starts and the Playwright API is called, these steps happen:

  1. java playwright detects no node.js installed
  2. java playwright unpacks node.js into the tmp folder
  3. java playwright can't see the installed browsers
  4. java playwright downloads browsers into a temp directory

Expected behavior:

  1. java playwright detects the node.js installation bundled with the docker image via an environment variable exposed in the image
  2. java playwright detects the browsers bundled with the docker image via an environment variable exposed in the image
  3. No executable files are created in /tmp; the docker image follows security best practices for immutable infrastructure
  4. The Java library is ready to run immediately without any "preparation"
Member

yury-s commented Apr 28, 2023

As long as the image tag matches the Playwright version you use in the project, the image should already contain all 3 browsers (in the /ms-playwright directory) and all Playwright Maven packages cached locally (under /root/.m2/repository/com/microsoft/playwright). Please make sure you use a Playwright version that matches the image. If Playwright still downloads browsers or its packages from the network, we'll need a repro.

A bit more on each of the items from the observed behavior section:

  • java playwright detects no node.js installed

Playwright doesn't detect whether Node.js is installed on the system; instead it uses the one shipped in the driver-bundle package. This is a deliberate decision. We can change this model in the future, but it has served well so far. There is a related feature request: #1196. Also, you can manually specify an alternative Node.js location via PLAYWRIGHT_NODEJS_PATH or the entire driver location via playwright.cli.dir. If you specify the latter, Playwright will not extract anything from the driver bundle.
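
A minimal sketch of the playwright.cli.dir route, assuming it is read as a JVM system property that must be set before the first Playwright instance is created. The class name, the helper, and the `/opt/playwright-driver` path are hypothetical and only illustrate the idea; the sketch uses nothing beyond the JDK.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class DriverLocation {
    // Property name taken from the comment above.
    static final String CLI_DIR_PROPERTY = "playwright.cli.dir";

    /** Points the JVM at an already unpacked driver directory, failing fast if it is missing. */
    static Path usePreinstalledDriver(Path driverDir) {
        Path dir = driverDir.toAbsolutePath().normalize();
        if (!Files.isDirectory(dir)) {
            throw new IllegalArgumentException("Driver directory not found: " + dir);
        }
        System.setProperty(CLI_DIR_PROPERTY, dir.toString());
        return dir;
    }

    public static void main(String[] args) {
        // "/opt/playwright-driver" is a hypothetical location baked into the image.
        String location = args.length > 0 ? args[0] : "/opt/playwright-driver";
        Path dir = usePreinstalledDriver(Paths.get(location));
        System.out.println(CLI_DIR_PROPERTY + "=" + dir);
        // Create the Playwright instance only after this point.
    }
}
```

Passing `-Dplaywright.cli.dir=<dir>` on the `java` command line should have the same effect without any code. PLAYWRIGHT_NODEJS_PATH, by contrast, is an environment variable, so it would be set in the container environment rather than from Java code.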

  • java playwright unpacks node.js into the tmp folder

Yes, this happens on every launch of playwright java. Unfortunately, Maven doesn't provide any means for extracting a bundled executable into a managed location.

  • java playwright can't see installed browsers

As I mentioned above, this is likely because of a version mismatch.

  • java playwright downloads browsers into a temp directory

The browsers are downloaded into a cache directory managed by playwright rather than /tmp. See this page for more details.

Member

yury-s commented May 9, 2023

Closing per the response above, feel free to open a new issue if it doesn't work.

yury-s closed this as completed May 9, 2023
Author

msangel commented May 9, 2023

Doesn't.

I have dug into this more, and these are the problems the library has:

  • driver instantiation happens on instance initialization and, by default, works by unpacking binaries from the jar to /tmp
  • there is a key for using a pre-installed driver, and this works
  • when using the playwright base image, the driver should already be installed and an environment variable should point to its location; the java code should pick up that environment variable

Author

msangel commented May 9, 2023

Anyway, even manually crafting a base image with such capabilities is close to impossible, as there is no tooling to unpack the driver to a given location. I literally had to unzip it from the jar and put the binaries into the Docker filesystem. My approach will fail the next time the Maven dependency is upgraded (it will fail anyway, as the library already relies on a specific base image version via PLAYWRIGHT_BROWSERS_PATH).
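
For illustration, the manual unzip step could be scripted roughly as below. This is a sketch using only `java.util.zip`, not tooling the library ships; the class name is hypothetical, and the entry prefix inside the driver-bundle jar (for example something like `driver/linux/`) is an assumption that should be checked against the actual jar. The prefix is expected to end with `/`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class DriverUnpacker {
    /**
     * Copies every file entry under {@code prefix} from the archive into {@code targetDir}
     * and returns the number of files written.
     */
    static int unpack(Path jar, String prefix, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        int written = 0;
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                String name = entry.getName();
                if (!name.startsWith(prefix) || entry.isDirectory()) {
                    continue;
                }
                Path out = root.resolve(name.substring(prefix.length())).normalize();
                // Reject entries that would land outside the target directory.
                if (!out.startsWith(root)) {
                    throw new IOException("Entry escapes target directory: " + name);
                }
                Files.createDirectories(out.getParent());
                try (InputStream in = zip.getInputStream(entry)) {
                    Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
                }
                // File modes are not restored from the archive here, so every
                // extracted file is marked executable (a coarse simplification).
                out.toFile().setExecutable(true, false);
                written++;
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java DriverUnpacker <driver-bundle.jar> <prefix> <targetDir>
        int n = unpack(Paths.get(args[0]), args[1], Paths.get(args[2]));
        System.out.println("Unpacked " + n + " files");
    }
}
```

Run at image build time, this would leave a directory that playwright.cli.dir can point to, but it would still need to be rerun whenever the Maven dependency version changes.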

Author

msangel commented May 9, 2023

A trade-off would be to provide tooling for unpacking the driver to a specific location.

Author

msangel commented May 9, 2023

The environment I work in explicitly disallows execution of dynamically created shell scripts. Sad to hear that Microsoft allows it.

Member

yury-s commented May 12, 2023

Preinstalling Node.js and the driver once in the Docker images makes total sense. We may even try switching to a mode where we download Node.js and the driver to a local cache when starting Playwright, the same way we download browsers today. That has been a low-priority task so far, but we might find some resources to work on it.
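
A rough sketch of what such an install-once cache could look like, purely as an illustration of the idea and not the library's implementation. The class, the `Installer` callback, the `driver-<version>` directory layout, and the `.complete` marker file are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class DriverCache {
    /** Hypothetical callback that fills a directory with the driver files. */
    interface Installer {
        void installInto(Path dir) throws IOException;
    }

    /**
     * Returns the cached driver directory for {@code version}, installing it on first use.
     * The marker file is written last, so an interrupted install is retried next time.
     */
    static Path ensureInstalled(Path cacheRoot, String version, Installer installer) throws IOException {
        Path dir = cacheRoot.resolve("driver-" + version);
        Path marker = dir.resolve(".complete");
        if (Files.exists(marker)) {
            return dir; // already installed, nothing to download or unpack
        }
        Files.createDirectories(dir);
        installer.installInto(dir);
        Files.createFile(marker);
        return dir;
    }
}
```

The sketch is not safe against two processes installing at the same time; a real implementation would need a lock or an atomic directory rename. In a Docker image, the install step could run once at build time so that nothing is written at container start.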
