-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
π³ (dockerfile) follow hadolint's best practice #65
Conversation
.github/workflows/docker_build.yml
Outdated
- uses: hadolint/hadolint-action@v1.6.0 | ||
with: | ||
dockerfile: kuwala/pipelines/population-density/dockerfile | ||
|
||
# This is the a separate action that sets up buildx runner | ||
- name: Set up Docker Buildx | ||
uses: docker/setup-buildx-action@v1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
main logic as follow:
- run
hadolint
on all dockerfiles - run docker build to make sure no one screw up (there's too many dockerfiles so I only setup 1 docker build in this CI pipeline)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the lint tool is good. This is the first time I know hadolint
π
It helps create a good standard docker file in the CI step. If you can show the error output when the lint is failed in GitHub action, it would be nice. And I wonder if the lint can run in multiple files in parallel without fail-fast and report in the last step to save time. Current lint steps if fail, it will be one by one then it will be annoying when fix one file, commit & push, then wait then fail again.
However, I think it is good practice if, in the local dev environment, we can integrate some plugins into IDE (vscode or intellij or pre-commit hook) to make linter more verbose to the developer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea I've integrated hadolint
into pre-commit
hook on my local. but I'm using npm's pre-commit
π
not sure wether it's ok to commit this npm's package.json
to this repo lol
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I use this pre-commit hook with hadolint, but as you can see there's many security issues π
need some time to upgrade these packages, or I can create a PR directly since hook are just used on local so no worry about security issue (?)
.github/workflows/docker_build.yml
Outdated
${{ runner.os }}-buildx- | ||
|
||
# And make it available for the builds | ||
- name: Build neo4j and push |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only validate no one screw up neo4j in CI pipeline
.github/workflows/docker_build.yml
Outdated
push: false | ||
tags: kuwala/neo4j:${{ github.sha }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
your call to decide wether/which CR to push
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the impact/purpose of this? Does it specify which "version of Kuwala" it belongs to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes it specify the version of this Kuwala image with sha256 digest of commit π
The point of using this tag is for the sake of rollback, you can rollback your docker image to whatever commit you want by specify the specific commit hash
FROM jupyter/pyspark-notebook | ||
FROM jupyter/pyspark-notebook:2021-11-20 | ||
|
||
RUN pip install pandas-profiling[notebook] | ||
RUN pip install --no-cache-dir "pandas-profiling[notebook]==3.1.0" | ||
|
||
COPY ./common/jupyter/requirements.txt /opt/requirements.txt | ||
RUN pip install -r /opt/requirements.txt | ||
RUN pip install --no-cache-dir -r /opt/requirements.txt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are all minor stuff that linter (hadolint) complained about:
- should always specify a version
cc1fcce
to
1ce0530
Compare
FROM maven | ||
FROM maven:3.8.4-jdk-11-slim AS maven | ||
COPY ./core/neo4j/plugins/udfs /usr/src/mymaven | ||
WORKDIR /usr/src/mymaven | ||
RUN mvn clean install | ||
|
||
FROM neo4j:4.3.0-community | ||
COPY --from=0 /usr/src/mymaven/target/cypher-udfs-0.1-SNAPSHOT.jar ./plugins/cypher-udfs-0.1-SNAPSHOT.jar | ||
COPY --from=maven /usr/src/mymaven/target/cypher-udfs-0.1-SNAPSHOT.jar /var/lib/neo4j/plugins/cypher-udfs-0.1-SNAPSHOT.jar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- use absolute path or specify a
WORKDIR
.github/workflows/docker_build.yml
Outdated
push: | ||
branches: '*' | ||
jobs: | ||
build_and_deploy: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should separate build_and_deploy
into 2 jobs: lint
and build
. That makes the workflow is cleaner and easy to spot out any problem in each job.
jobs:
lint:
// any docker lint step with `hadolint`
build:
// do build and cache
// it should run if lint is successful
// https://docs.github.com/en/actions/learn-github-actions/workflow-syntax-for-github-actions#jobsjob_idif
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh yea good point~ on it!
.github/workflows/docker_build.yml
Outdated
@@ -0,0 +1,68 @@ | |||
name: Build Dockerfile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should Build Docker Image
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea haha my poor english
.github/workflows/docker_build.yml
Outdated
- uses: hadolint/hadolint-action@v1.6.0 | ||
with: | ||
dockerfile: kuwala/pipelines/population-density/dockerfile | ||
|
||
# This is the a separate action that sets up buildx runner | ||
- name: Set up Docker Buildx | ||
uses: docker/setup-buildx-action@v1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the lint tool is good. This is the first time I know hadolint
π
It helps create a good standard docker file in the CI step. If you can show the error output when the lint is failed in GitHub action, it would be nice. And I wonder if the lint can run in multiple files in parallel without fail-fast and report in the last step to save time. Current lint steps if fail, it will be one by one then it will be annoying when fix one file, commit & push, then wait then fail again.
However, I think it is good practice if, in the local dev environment, we can integrate some plugins into IDE (vscode or intellij or pre-commit hook) to make linter more verbose to the developer
.github/workflows/docker_build.yml
Outdated
# This ugly bit is necessary if you don't want your cache to grow forever | ||
# till it hits GitHub's limit of 5GB. | ||
# Temp fix | ||
# https://github.com/docker/build-push-action/issues/252 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used this trick for caching in the local registry.
https://github.com/zero88/gh-registry/blob/main/README.md#usage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good to know this tricks, thx~ π
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
9586e58
to
b28cf33
Compare
COPY ./pipelines/common/python_utils /opt/app/pipelines/common/python_utils | ||
COPY ./common/python_utils /opt/app/pipelines/common/python_utils |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zero88 need to have your eye on it~
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tested everything, and it's all still working. ππ½
I want to request a small change to keep the .yml
file naming of the workflows consistent. Most of it is snake case, and only Neo4j.yml
and Neo4j_importer.yml
are not, and I feel we should write them in snake case as well.
There is one particularity to a pipeline. We are using a git submodule for OSM to process the pbf
files to parquet
files. We created a fork from another repo for that: https://github.com/kuwala-io/osm-parquetizer
That repo contains a Docker file which is also referenced in our docker-compose.yml
. Should we add the build of that image to the GitHub actions as well, or should this be done in the submodule repo? If we do it here, I guess we need to clone the repo in the workflow of this repo. What do you think?
Updateabout osm-parquetizer, here's the PR π |
b28cf33
to
7d4b9e0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! This PR really adds some value! Well done @david30907d ! ππ½
Types of changes
Description
Try to make your PR easy to review
ref: How to Split Pull Requests β Good Practices, Methods and Git Strategies
Checklist:
Steps to Test This Pull Request
Expected behavior
services should remain the same except there's no python's cache-dir