Use BuildKit for Docker builds #54

Merged (9 commits) on Jul 9, 2022
Conversation

br3ndonland (Owner) commented Jun 11, 2022

Description

This PR will update Docker builds to use BuildKit, a next-generation build back-end that can be used with front-ends including buildctl, docker, and docker buildx.

BuildKit supports some helpful features, including:

  • Heredocs: here-documents allow multiple lines of text to be passed into a shell command. Heredoc support was added to Dockerfiles in the 1.4.0 release. This feature enables Dockerfile RUN commands to be written like shell scripts, instead of jamming commands into long run-on lines (see the sketch after this list).
  • COPY --link, which allows layers to be re-used, even if layers built previously have changed
  • RUN --mount, which allows advanced mounting and caching of files during builds, as well as access to secrets without baking the secrets into the final image layers
  • RUN --network, which allows advanced networking configuration
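
A minimal Dockerfile sketch of these features, not taken from the inboard Dockerfile (the base image, file names, and packages here are placeholders):

```dockerfile
# syntax=docker/dockerfile:1.4
FROM python:3.10-slim

# Heredoc: the RUN command reads like a small shell script.
RUN <<EOF
apt-get update
apt-get install --no-install-recommends -y gcc
rm -rf /var/lib/apt/lists/*
EOF

# COPY --link: this layer can be reused even if earlier layers change.
COPY --link requirements.txt /tmp/requirements.txt

# RUN --mount: cache pip downloads without baking them into the image.
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r /tmp/requirements.txt
```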

In addition to the buildctl front-end, Docker confusingly offers two front-end options for working with the BuildKit back-end:

  1. DOCKER_BUILDKIT=1 docker build, using the same Docker CLI commands as usual, but with BuildKit as a back-end instead of the legacy (but apparently not deprecated) build back-end
  2. docker buildx build, invoking a Docker CLI plugin which also uses BuildKit as a back-end

The differences between these two options are poorly documented, and it is unclear why the Buildx features aren't included in the mainline Docker CLI. The DOCKER_BUILDKIT=1 docker build option will be used.
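
For comparison, the two invocations look like this (the tag and target names are just examples):

```sh
# Option 1: the standard Docker CLI, with BuildKit enabled by an environment variable
DOCKER_BUILDKIT=1 docker build . --target fastapi -t inboard:fastapi

# Option 2: the Buildx CLI plugin, which always uses BuildKit
docker buildx build . --target fastapi -t inboard:fastapi
```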

Changes

Dockerfile

  • Refactor Dockerfile with heredoc syntax
    • The contents of the RUN command were previously added to the base stage to support Alpine Linux (Add support for Alpine Linux #37) and Debian "slim" Linux (Add support for Debian slim Docker images #38).
    • Split the previous base stage into builder and base stages (the builder name comes from the "builder" pattern discussed in the Docker docs)
    • In the builder stage, install build-time dependencies and Python packages (but don't uninstall the build-time dependencies yet, as the previous RUN command did)
    • In the base, starlette, and fastapi stages, start with FROM builder, install any additional packages, and then uninstall the build-time dependencies (see the sketch after this list)
  • Break heredoc across multiple stages
    • Previously, build-time dependencies for building Python packages with binary extensions were installed and uninstalled in one RUN command, as they were only needed to install some dependencies on Alpine Linux. However, this also meant that build-time dependencies were not available to later installation steps, such as installing FastAPI dependencies. The FastAPI dependency pydantic has binary extensions, and although it now provides wheels with the musllinux tag (for Alpine), inboard should account for the possibility that packages like pydantic might need to build from source.
    • This PR will refactor build-time dependency installation into two steps. The dependencies will be installed in the new builder stage, then uninstalled at the end of each subsequent stage.
  • Use COPY --link to improve layer caching
    • COPY --link allows re-use of layers, even when layers built previously have changed.
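
A rough sketch of the resulting stage structure, assuming a Debian-based image (file names, package lists, and commands are placeholders, and the Alpine handling and starlette stage are omitted for brevity):

```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.10-slim AS builder
# builder: install build-time dependencies and base Python packages,
# leaving the build-time dependencies in place for later stages.
COPY --link requirements.txt /tmp/requirements.txt
RUN <<EOF
apt-get update
apt-get install --no-install-recommends -y gcc libc-dev
pip install --no-cache-dir -r /tmp/requirements.txt
rm -rf /var/lib/apt/lists/*
EOF

FROM builder AS base
# base: no additional packages, so just uninstall the build-time dependencies.
RUN apt-get purge --auto-remove -y gcc libc-dev

FROM builder AS fastapi
# fastapi: install additional packages, then uninstall the build-time dependencies.
RUN <<EOF
pip install --no-cache-dir fastapi
apt-get purge --auto-remove -y gcc libc-dev
EOF
```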

docker build command

  • Enable BuildKit in GitHub Actions
    • The DOCKER_BUILDKIT=1 docker build option will be used (see the command sketch after this list).
  • Add BUILDKIT_INLINE_CACHE Docker build argument
  • Update docker build --cache-from for inboard
    • --cache-from allows build commands to specify external cache sources.
    • The Docker build was previously caching from the official Python image.
    • With the BuildKit inline cache metadata provided by images built with BUILDKIT_INLINE_CACHE=1, layers from inboard can now also be cached.
    • The inboard caching may not kick in until after this PR is merged, because images must first be pushed to the registry before the cache metadata can be read.
    • The standard caching mechanism may not be effective for the multi-stage builds used here, so --cache-from may not end up reducing build times. See notes section for additional info on caching in GitHub Actions.
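
A hedged sketch of the resulting build command (the registry path, tags, and target are illustrative, not the exact values used in the workflow):

```sh
DOCKER_BUILDKIT=1 docker build . \
  --build-arg BUILDKIT_INLINE_CACHE=1 \
  --cache-from python:3.10-slim \
  --cache-from ghcr.io/br3ndonland/inboard:fastapi \
  --target fastapi \
  -t ghcr.io/br3ndonland/inboard:fastapi
```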

Docs

  • Add Docker BuildKit info to CONTRIBUTING.md
  • Add heredoc examples and info to docs

TODOs

  • Build Starlette and FastAPI images concurrently?
  • Figure out which combo of back-end and front-end to use
  • Use the GitHub Actions external cache source?

Notes

Build concurrency

BuildKit automatically runs build stages concurrently when possible. Unfortunately, tags can't be specified for intermediate stages/targets, and inboard needs to specify tags for each stage. Docker recommends multiple builds for multiple targets, so inboard will continue using separate build commands for each stage.
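
Something along these lines, with illustrative tags and the stage names described above:

```sh
# One build command per stage, because intermediate targets can't be tagged directly.
for target in base starlette fastapi; do
  DOCKER_BUILDKIT=1 docker build . --target "$target" -t "inboard:$target"
done
```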

GitHub Actions external cache source

At this time, the caching APIs are confusing and experimental. There are three different caching APIs, two cache export modes, and four "exporters."

Caching APIs:

  1. The Docker CLI offers a --cache-from flag, which allows build commands to specify external cache sources. The docs on use of this flag are confusing and poorly written, but it appears that this option uses min caching mode, which does not cache intermediate stages.
  2. The BuildKit buildctl front-end offers different flags, --export-cache and --import-cache, which apparently support at least two modes and at least four "exporters."
    1. Modes:
      1. mode=min. The BuildKit README says min only caches layers from the final image, and not from intermediate build stages.
      2. mode=max. This mode can read layer data from intermediate stages in multi-stage builds.
    2. Exporters:
      1. inline: "embed the cache into the image, and push them to the registry together" (isn't this specified by BUILDKIT_INLINE_CACHE=1?)
      2. gha: "export to GitHub Actions cache"
      3. local: "export to a local directory"
      4. registry: "push the image and the cache separately"
  3. Buildx (which uses BuildKit) offers --cache-from as the Docker CLI does, but accepts different values, and also offers a --cache-to flag that apparently supports mode=min and mode=max (see the sketch after this list).
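
For reference, hedged examples of the buildctl and Buildx forms (the cache ref and image name are placeholders):

```sh
# buildctl: explicit cache export/import with the registry exporter and max mode
buildctl build \
  --frontend dockerfile.v0 \
  --local context=. --local dockerfile=. \
  --export-cache type=registry,ref=ghcr.io/example/app:buildcache,mode=max \
  --import-cache type=registry,ref=ghcr.io/example/app:buildcache

# Buildx: the same idea via --cache-to and --cache-from
docker buildx build . \
  --cache-to type=registry,ref=ghcr.io/example/app:buildcache,mode=max \
  --cache-from type=registry,ref=ghcr.io/example/app:buildcache \
  -t ghcr.io/example/app
```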

In addition to deciphering the confusing caching APIs, there are other considerations.

Trying to figure out all the confusion around caching takes an inordinate amount of human time, and in the case of this project, would only save a negligible amount of machine time (without caching, Docker builds only take 1-3 minutes). Not worth it at all.

Related

Commit messages

Previously, build-time dependencies for building Python packages with
binary extensions were installed and uninstalled in one `RUN` command,
as they were only needed to install some dependencies on Alpine Linux.
However, this also meant that build-time dependencies were not available
to later installation steps, such as installing FastAPI dependencies.
pydantic has binary extensions, and although it now provides wheels with
the `musllinux` tag (for Alpine), inboard should account for the
possibility that packages like pydantic might need to build from source.

This commit will refactor build-time dependency installation into two
steps. The dependencies will be installed in the new `builder` stage,
then uninstalled at the end of each subsequent stage.

---

This commit will refactor the Dockerfile examples in the docs to use
heredoc syntax, and will also add a tip explaining the syntax.

---

This commit will add `--link` to `COPY` commands in the Dockerfile. This
allows re-use of these layers, even when layers built previously change.

https://github.com/moby/buildkit/blob/HEAD/frontend/dockerfile/docs/syntax.md

---

The `BUILDKIT_INLINE_CACHE` build argument tells Docker to write cache
metadata into the image during `docker build`. The image can then be
used as a cache source for subsequent builds.

The build argument will be provided with the `docker build` command.
The build argument could be provided directly within the Dockerfile,
but BuildKit does not offer any guidance on where it goes. It could
either be supplied before the first `FROM`, or after the first `FROM`,
and could work differently in each case.

https://docs.docker.com/engine/reference/commandline/build/#specifying-external-cache-sources

---

`--cache-from` allows build commands to specify external cache sources.
The Docker build was previously caching from the official Python image.
With the BuildKit inline cache metadata provided by images built with
`BUILDKIT_INLINE_CACHE=1`, layers from inboard can now also be cached.

The inboard caching may not kick in until after this PR is merged,
because images must first be pushed to a registry before cache metadata
can be read.

The standard caching mechanism may not be effective for the multi-stage
builds used here, so `--cache-from` may not end up reducing build times.

https://docs.docker.com/engine/reference/commandline/build/#specifying-external-cache-sources

---

Now that Dockerfile 1.4 is in stable release, this commit will relax the
Dockerfile version from 1.4 to 1, to allow further 1.x releases.

https://github.com/moby/buildkit/blob/f4eb826799e53547c793bfa83a035b8e24a2b88d/frontend/dockerfile/docs/reference.md
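
This change amounts to relaxing the syntax directive at the top of the Dockerfile. Previously it was pinned as `# syntax=docker/dockerfile:1.4`; pinning only the major version lets the build pick up any stable 1.x release, roughly:

```dockerfile
# syntax=docker/dockerfile:1
```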