
pnpm monorepo docker support #1637

Closed
vjpr opened this issue Jan 31, 2019 · 25 comments
Labels
area: monorepo Everything related to the pnpm workspace feature

Comments

@vjpr
Contributor

vjpr commented Jan 31, 2019

I'm starting work on Dockerizing my pnpm monorepo. Hopefully when I'm done we can have a recipe for how to deploy a monorepo containing multiple services sharing multiple packages as separate Docker images that build quickly after changes are made, and are lightweight.

Plan

Assumption: You don't care about publishing packages to npm during a deploy.

Goal: Only build the services whose dependencies have changed.

  • Work out what has changed using git diff.
  • For every modified file, find its containing package (find up package.json).
  • Find all dependents of this package (i.e. pnpm --filter ...pkg-with-changes selects the package plus its dependents).
    • NOTE: Need to make sure all dependencies are explicitly defined. E.g. If I change the root babel.config.js, every file will need rebuilding. Need to include the babel cache key in the decision to rebuild a package or not. The root package should use the files prop to define all files that are needed for a full build.
  • If a dependent package has a Dockerfile in its root directory (i.e. it is a deployable service), mark it as needing to be rebuilt.
  • For all services that need to be rebuilt...
    • Find all dependencies which are monorepo packages.
    • Copy the entire repo to a temp dir (.docker-trees), filtering out any packages not in this service's dependency tree.
      • This dir should be inspectable because if a dependency is not defined correctly, the user will need to look into this dir to find out what is missing.
    • Run docker build -f <path-to-dockerfile> . from the filtered repo temp dir. (The Dockerfile could be shared).
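The first few steps above (git diff, find the containing package) can be sketched in shell. This is a sketch only; the base ref and directory layout are assumptions, and none of this is a pnpm API:

```shell
# Sketch: map changed files (from git diff) to their owning packages.
# find_pkg_root walks up from a file until it finds a directory containing
# package.json; changed_packages applies it to a list of paths on stdin and
# de-duplicates the result.
find_pkg_root() {
  dir=$(dirname "$1")
  while [ "$dir" != "." ] && [ "$dir" != "/" ]; do
    if [ -f "$dir/package.json" ]; then
      echo "$dir"
      return
    fi
    dir=$(dirname "$dir")
  done
}

changed_packages() {
  while IFS= read -r file; do
    find_pkg_root "$file"
  done | sort -u
}

# Typical invocation (origin/master is a placeholder base ref):
# git diff --name-only origin/master... | changed_packages
```

Files at the repo root (e.g. babel.config.js) have no containing package and fall through silently here, which is exactly the case the NOTE above warns about.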

Perf (local dev and CI)

  • If I have 100 services, the file copying could be slow and take up a lot of space because we need 100 temp dirs. Maybe hard links would be better. Can Docker context handle hard links?
  • docker build accepts a tar as the context on stdin (instead of a path, as in docker build .).
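Packing a pre-filtered tree into a context tarball (paths here are hypothetical) also sidesteps part of the hard-link question, since a dereferencing tar delivers plain file contents to the daemon:

```shell
# Sketch: pack a filtered tree into a context tarball that docker build can
# consume from stdin. The -h flag dereferences symlinks, so the daemon
# receives regular files rather than links.
make_context() {
  tree_dir=$1
  out_tar=$2
  tar -chf "$out_tar" -C "$tree_dir" .
}

# Then stream it (docker invocation shown for illustration only):
# make_context .docker-trees/my-service ctx.tar
# docker build -t my-service - < ctx.tar
```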

Perf (during local dev)

  • Can we use our existing babel (and other) caches in node_modules/.cache, or should we force a rebuild just to be safe?
  • Could --use-store-server speed up multiple Docker installs? Can a server be shared outside of a Docker container?
  • Is there any point running a server now that we have shared-workspace-shrinkwrap?

Perf - prevent all root package dependencies (node_modules) being included in every service build

peer deps

The root package contains peer deps. If there is a large peer dep required by only two packages, it will be in the root package, and included in every Docker image. This is not good.

We only want peer deps that are used by packages in our partial repo tree. So we could filter the root package.json - using a custom pnpmfile perhaps...

All peer deps should be explicitly defined (if they are not, we use the pnpmfile.js to declare peer deps). Sometimes we cheat, though, and just install them in the root package, which makes them resolvable. E.g. knex has optional peer deps of mssql, pg, etc. These should be added as peer deps to the knex package, or installed in the root.

We can use --strict-peer to ensure that all peers are satisfied. If this is not the case, then include all deps from the root package.json.

tools

Sometimes there are tools in the root package like @babel/core.

Perf - stripping unnecessary files

We want the tests included during CI testing, but when deploying we want to remove them. Use build containers. We should simulate the Docker structure in the .docker-trees dir.

Perf - caching node_modules between builds

Ideal scenario:

  • Pnpm runs a headless install.
  • The virtual store (nm/.registry.npmjs.org) contains all packages needed in the repo.
  • We just need to symlink sub-packages.
  • If a package is missing from the virtual store (nm/.registry...):
    • all packages are already in the global pnpm store, so
    • only the file copying from store to node_modules needs to run.

Options for global store persistence:

  • Bind mount. Mount host's ~/.pnpm-store/.
    • Could it be a problem if the Docker container runs pnpm and the host runs pnpm at the same time (and they could be different pnpm versions)?
      • What about native dep artifacts - are they stored separately depending on OS+node version? See --side-effects-cache
    • PRO: Can share the host's pnpm-store during dev, so it's much faster to build an image and test it.
  • Data volume.
    • CON: Would grow quite large as every build would add to it.
    • PRO: Doesn't interfere with local development store.
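For completeness, a BuildKit cache mount is a third option that avoids both drawbacks: the store persists on the builder between builds but never enters an image layer. A rough sketch, in which the image tag, store path, and the set of copied manifests are all assumptions:

```dockerfile
# syntax=docker/dockerfile:1
FROM node:18
RUN corepack enable
WORKDIR /repo
# Per-package manifests would also need to be copied here for a real workspace
COPY pnpm-lock.yaml pnpm-workspace.yaml package.json ./
# The cache mount lives outside the image: it persists between builds on the
# same builder, but never bloats the final layers.
RUN --mount=type=cache,id=pnpm-store,target=/pnpm-store \
    pnpm install --frozen-lockfile --store-dir /pnpm-store
```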

Avoiding copying from global store to node_modules each install:

  • Symlinking?

Virtual Store (node_modules/.registry...) caching:

  • If we cannot hardlink, we need to reduce file copy operations from store to node_modules.
  • Every service build can use the same virtual store dir.
    • We mount it as a bind volume or data volume.
    • Do cloud CI services allow sharing a cache across builds of different services?
    • CON: It will make the Docker images bigger than they need to be, because each image will carry packages in the shared virtual store that it does not need.
  • Then we just need to update the symlinks of the sub-packages.

Plan - npm publishing

Alternative: you want some packages published to npm, and for them to be installed from npm when building your image.

I guess Lerna's approach works.


Pnpm apis to use

  • How can I programmatically get the dependent tree of all local workspace packages? Does pnpm have an api for this?

CI notes

  • Seen in v3 features: Don't run verify-store-integrity for CI.
@zkochan
Member

zkochan commented Jan 31, 2019

Could --use-store-server speed up multiple Docker installs? Can a server be shared outside of a Docker container?

If you mean when preparing the servers to be dockerized, then no. Just running pnpm recursive install will be faster. Recursive install can install into several independent node_modules directories concurrently (if they use the same store).

How can I programmatically get the dependent tree of all local workspace packages? Does pnpm have an api for this?

No way to do it programmatically at the moment, but we can move this logic to a separate package.

Regarding the virtual-store/store, I had an idea that might be related. I'd like to move the tarball files out from the store. That would give us several benefits, some of which:

  • we may store the tarballs as a verdaccio storage. In that case, pnpm's registry mirror could be used as a storage for a local verdaccio. You could browse your registry mirror via the nice verdaccio web interface.
  • we should have one store per disk, but we may have one registry mirror per system

Another thing I was thinking about: pnpm could have a feature that packs all the tarballs needed by a repo into a local, repo-specific registry mirror. That registry mirror could be committed with the repo. As a result, the git repo becomes completely independent.

cc @octogonz, @etamponi


@kaidjohnson

We have successfully built a docker build and deployment infrastructure on top of our pnpm workspace projects.

Package Structure

  • We have our own bespoke packages in /packages
  • Our application stacks are in /apps
  • We have some legacy application stuff in /legacy

Each of these folders is included in our pnpm-workspace.yaml:

packages:
  - 'packages/**'
  - 'apps/**'
  - 'legacy/**'

Docker Layers

Pnpm - handles the vm requirements (node and pnpm)

FROM node:10.18.0
# Control pnpm version dependency explicitly. (A trailing # on the ENV line
# would become part of the value, so the comment goes on its own line.)
ENV PNPM_VERSION 4.9.2
RUN curl -sL https://unpkg.com/@pnpm/self-installer | node

Install - handles the top-level pnpm install. We've kept the file content light:

FROM pnpm
WORKDIR /repo

# These folders include `package.json` files _only_ to avoid having to re-run pnpm install on every code change. They are pre-filtered via our build orchestrator (gradle)
COPY legacy/ legacy/
COPY apps/ apps/
COPY packages/ packages/

# Top-level monorepo concerns that are shared across projects
COPY .browserslistrc .browserslistrc
COPY .npmrc .npmrc
COPY pnpmfile.js pnpmfile.js
COPY pnpm-workspace.yaml pnpm-workspace.yaml
COPY pnpm-lock.yaml pnpm-lock.yaml
RUN pnpm install -r --frozen-lockfile && \
	rm -rf ~/.pnpm-store;

Lint - allows linting to be controlled separately from other build concerns to maximize caching

FROM install as lint
WORKDIR /repo

# These folders include all lintable (.js and .ts) files in each package as well as eslint-specific configs (eslintignore, eslint.config.js, etc). They are pre-filtered via our build orchestrator (gradle)
COPY legacy/ legacy/ 
COPY apps/ apps/
COPY packages/ packages/
COPY .eslint/ .eslint/
RUN pnpm run lint -r --no-sort && \
	find . -delete && \
	echo "OK" > results.txt

FROM alpine
COPY --from=lint /repo/results.txt /repo/results.txt

Test - allows testing to be controlled separately from other build concerns to maximize caching (our legacy stack is tested on a separate image that isn't worth exploring here)

FROM install as test
WORKDIR /repo

# These folders include all files needed for testing from each package as well as test-specific configs (webpack.test.js, jest.config.js, etc). They are pre-filtered via our build orchestrator (gradle)
COPY packages/ packages/
RUN pnpm run test -r --filter ./packages --no-sort -- --runInBand
COPY apps/ apps/
RUN pnpm run test -r --filter ./apps --no-sort -- --runInBand
RUN find . -type f -not -path "*/build/coverage/*" -delete && \
	find . -type l -delete && \
	find . -type d -empty -delete

FROM alpine
COPY --from=test /repo/ /repo/

With those layers decoupled, we can build our application (deployable) images without additional cruft:

FROM install AS build
WORKDIR /repo

# These files contain all the necessary runtime files (no test config, no test
# files, no lint config, etc.): just the requirements for building your
# application for runtime. Dependencies from /packages are copied as needed.
# (Inline comments after a COPY instruction are not valid Dockerfile syntax,
# so they are kept on their own lines.)
COPY /packages/ /repo/packages/
COPY /apps/my-app/ /repo/apps/my-app/
RUN pnpm run build --filter my-app \
	&& find . -type f -not -path "*/build/dist/*" -not -path "*/build/conf/*" -delete \
	&& find . -type l -delete \
	&& find . -type d -empty -delete

FROM nginx:1.16.1
COPY nginx.conf /etc/nginx/nginx.conf
COPY default.conf.template /etc/nginx/conf.d/default.conf.template
COPY --from=build /repo/apps/my-app/build/conf /etc/nginx/conf.d
COPY --from=build /repo/apps/my-app/build/dist /var/www
# HEALTHCHECK goes here, if needed
# ENTRYPOINT goes here, if needed
CMD ["nginx", "-g", "daemon off;"]

Note that each decoupled layer inherits from the base install layer, and not from each other. By separating concerns, we have been able to achieve:

  1. maximal caching of our pnpm installs
  2. maximal caching of test running and linting
  3. lightweight production images

It is important to re-emphasize that the folder COPY tasks are pre-filtered via our build orchestrator (gradle), so we're cherry-picking only the files that we actually want. You could, alternatively, write a much more verbose COPY stack that does the cherry-picking for you, but Docker's COPY command is not capable of doing this gracefully. Pre-filtering before the docker build layers worked well for us, and it scales more elegantly because the orchestrator can use globbing for this requirement.

As with all things docker, composition is your friend. I hope these examples are useful. We have a few additional layers between install and the actual application images, since we have a few cross-dependencies from our legacy stacks, but the example laid out above is the core architecture for our builds. Because we have some layers in between, we've also proven that this setup scales to individual circumstances.

@zkochan
Member

zkochan commented Feb 5, 2020

@kaidjohnson Thanks for sharing! This is really useful information!

@RDeluxe

RDeluxe commented May 4, 2020

Hi @kaidjohnson, I might be missing something, but your production image still packs the whole node_modules folder, right?
The one created in your "install" layer contains all the deps for all your packages, or are you filtering right from this first step?

Edit: ok, scratch that, it's written right there.

@kaidjohnson

@RDeluxe While the install image does have all the package deps, we only end up copying the build outputs to the final application images, so those images do not include node_modules.

@RDeluxe

RDeluxe commented May 4, 2020

Ok, that's what I thought at the beginning. The problem is, some apps (say NestJS or Next.js apps) need the node_modules folder.

@kaidjohnson

kaidjohnson commented May 4, 2020

That's true. If we needed to run a nodejs container rather than nginx, we would probably not discard the pnpm store in the install layer, and we would probably rerun pnpm install -P on the specific application package (in the final layer) so as to hydrate only as much of the node_modules as is needed for production. It's the same set of layers, with some slightly different implementation details to support the production image requirements.
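A hypothetical sketch of that node-runtime variant (stage names, paths, and the app name are placeholders, not from the author's actual setup):

```dockerfile
FROM install AS prod-deps
WORKDIR /repo/apps/my-api
# Re-hydrate only this package's production deps; assumes the install layer
# kept its pnpm store instead of discarding it
RUN pnpm install -P --frozen-lockfile

FROM node:10.18.0-slim
WORKDIR /app
COPY --from=build /repo/apps/my-api/build/dist ./dist
COPY --from=prod-deps /repo/apps/my-api/node_modules ./node_modules
# NOTE: pnpm's node_modules contains symlinks into the workspace root, so in
# practice the whole /repo tree (or a hoisted install) may need to be copied
CMD ["node", "dist/main.js"]
```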

@gouroujo

Hello,
I'm having some trouble using pnpm in a monorepo of NestJS micro-services. I've got a /libraries/ folder containing common libraries, and /packages/ containing all the micro-services.
I'm trying to build and run a specific package. External dependencies are not bundled and must be installed to launch the server.
My Dockerfile looks like:

FROM node:12-alpine as pnpm
# Control pnpm version dependency explicitly (a trailing # would become part
# of the ENV value, so the comment goes on its own line)
ENV PNPM_VERSION 5.4
RUN apk --no-cache add curl
RUN curl -sL https://unpkg.com/@pnpm/self-installer | node

FROM pnpm as install
ARG NPM_TOKEN
ARG PACKAGE_NAME=""
ENV NPM_CONFIG_LOGLEVEL error
WORKDIR /app
COPY pnpm-lock.yaml .
COPY pnpm-workspace.yaml .
COPY pnpmfile.js .
COPY *.json ./
RUN echo "//registry.npmjs.org/:_authToken=${NPM_TOKEN}" > ~/.npmrc
COPY libraries/libA/package.json libraries/libA/package.json
# COPY all libs ...
COPY packages/${PACKAGE_NAME}/package.json packages/${PACKAGE_NAME}/package.json


FROM install as builder
ARG PACKAGE_NAME=""
WORKDIR /app
# install dependencies for the selected package and its dependencies (direct and non-direct)
RUN pnpm install -r --reporter=append-only --ignore-scripts --filter @mymodule/${PACKAGE_NAME}...
COPY libraries/ libraries/
COPY packages/ packages/
RUN pnpm run build --filter @mymodule/${PACKAGE_NAME}...


FROM install as runner
ARG PACKAGE_NAME=""
ARG PORT=3000
ENV NPM_CONFIG_LOGLEVEL warn
ENV NODE_ENV production
ENV PORT ${PORT}
EXPOSE ${PORT}
WORKDIR /app
RUN pnpm install -rP --no-optional --frozen-lockfile --reporter=append-only --shamefully-hoist --filter @mymodule/${PACKAGE_NAME}... && \
  pnpm store prune && \
	rm -rf ~/.pnpm-store
COPY --from=builder /app/libraries/ ./libraries/

WORKDIR /app/packages/${PACKAGE_NAME}
COPY --from=builder /app/packages/${PACKAGE_NAME}/dist/ ./dist/

CMD ["node", "dist/main.js"]

Even if I put the --shamefully-hoist flag when installing production dependencies, I've got plenty of errors like

Error: Cannot find module 'rxjs'
Require stack:
- /app/node_modules/.pnpm/@nestjs/common@7.3.2_reflect-metadata@0.1.13/node_modules/@nestjs/common/cache/interceptors/cache.interceptor.js
- /app/node_modules/.pnpm/@nestjs/common@7.3.2_reflect-metadata@0.1.13/node_modules/@nestjs/common/cache/interceptors/index.js
- /app/node_modules/.pnpm/@nestjs/common@7.3.2_reflect-metadata@0.1.13/node_modules/@nestjs/common/cache/index.js
- /app/node_modules/.pnpm/@nestjs/common@7.3.2_reflect-metadata@0.1.13/node_modules/@nestjs/common/index.js

The only solution so far is to manually add all missing dependencies in the readPackage hook, but it's far from ideal!

What do you think? How should the shamefully-hoist option be used? Thanks!!

@gouroujo

gouroujo commented Jul 16, 2020

I have the feeling that the --shamefully-hoist flag was not being used in the install command. I've put a .npmrc at the root of the project with the following content:

recursive-install = false
shamefully-hoist = true

and it's now working !

(I have also changed a little the Dockerfile if anyone is interested)

EDIT: In fact, I still get the same issue when I include the pnpm-lock.yaml file inside the build.

@Madvinking

I had the same problem with Yarn before, so I created the yarn-isolate-workspace package, which took care of that exact problem.
I just forked it and created pnpm-isolate-workspace.

That works fine except for the lock file, which is much harder to create from Yarn, since they keep all the workspace resolutions in it.

Other than that, it has some good practices for Dockerfiles, such as separating workspaces' src-less files, so you can install all workspace dependencies first and then copy the src files.
It also creates a prod-package.json, so that installing with the --prod flag won't try to resolve the devDependencies.

@AlonMiz

AlonMiz commented Jun 12, 2021

@Madvinking the isolation solution is great!
This should be the go-to method when dealing with Docker builds in a monorepo.
A sample Dockerfile would be a great improvement ;)

@zkochan
Member

zkochan commented Jun 28, 2022

I have released a new command in pnpm v7.4.0-4 called deploy.

You can deploy a project from a workspace by selecting the project with --filter and specifying the output directory, for example:

pnpm --filter=foo deploy dist
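In a Dockerfile, the new command might be used along these lines. This is a sketch, not an official recipe: the project name foo comes from the example above, while the image tags, build script, and entry point are assumptions:

```dockerfile
FROM node:16 AS build
RUN corepack enable
WORKDIR /repo
COPY . .
RUN pnpm install --frozen-lockfile
RUN pnpm --filter=foo run build
# Copy foo plus a pruned node_modules into /repo/dist, ready to ship alone
RUN pnpm --filter=foo deploy dist

FROM node:16-slim
COPY --from=build /repo/dist /app
WORKDIR /app
CMD ["node", "server.js"]
```

Whether devDependencies end up in the deployed directory changed across versions, as the following comments discuss.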

@weyert
Contributor

weyert commented Jul 4, 2022

Looks interesting. How do you imagine it being used? Does it pick up the workspace projects used by the package passed via --filter? Can I then build it in the Dockerfile, i.e. does it come with all the necessary dependencies and devDependencies? Or is it meant to be built outside the Dockerfile, and then used to bundle up the final build artefacts into a Docker image?

@zkochan
Member

zkochan commented Jul 4, 2022

I assumed you would build it before deployment, so devDependencies are not installed in the target directory. If this is not a correct assumption, I am fine with changing it.

@weyert
Contributor

weyert commented Jul 4, 2022

Yeah, good point. Maybe as an opt-in? That would be handy.

@zkochan
Member

zkochan commented Jul 4, 2022

ok

@AWare
Contributor

AWare commented Jul 5, 2022

It would be really handy if deploy would infer the package to use when the current working directory is one, and also take its output path relative to the working directory.

Sorry if that's a bit off-topic; I've added a new issue for it: #4980

@DRoet

DRoet commented Jul 6, 2022

Having an opt-in for devDependencies would indeed be pretty handy when using pnpm deploy inside the Dockerfile

@zkochan
Member

zkochan commented Aug 14, 2022

pnpm deploy installs dev deps by default now.
