🐳 (dockerfile) follow hadolint's best practice #65

david30907d · 2021-12-07T14:13:03Z

Types of changes

Refactoring

Description

Use hadolint to lint the Dockerfiles
Set up GitHub Actions as the CI/CD pipeline

Try to make your PR easy to review

ref: How to Split Pull Requests – Good Practices, Methods and Git Strategies

Annotate pull requests: is there a specific order in which to review?
Separate refactoring from features
The feature didn't require too many code changes

Checklist:

Update the documentation if necessary

Steps to Test This Pull Request

Build the new images
Run these services (I'm still new to this community, so to be honest I have no idea how to test these images 😅)

Expected behavior

The services should behave the same, except that the images no longer contain pip's cache directory.

david30907d · 2021-12-07T14:14:24Z

.github/workflows/docker_build.yml

+    - uses: hadolint/hadolint-action@v1.6.0
+      with:
+        dockerfile: kuwala/pipelines/population-density/dockerfile
+
+    # This is a separate action that sets up the buildx runner
+    - name: Set up Docker Buildx
+      uses: docker/setup-buildx-action@v1 


The main logic is as follows:

Run hadolint on all Dockerfiles.

Run docker build to make sure nobody breaks the build (there are too many Dockerfiles, so I only set up one docker build, for neo4j, in this CI pipeline).

zero88 · 2021-12-14T02:59:13Z

I think the lint tool is good. This is the first time I've come across hadolint 👍
It helps keep our Dockerfiles to a good standard in the CI step. It would be nice if the GitHub Action showed the error output when linting fails. I also wonder whether the lint could run on multiple files in parallel, without fail-fast, and report everything in a final step to save time. Currently, lint failures show up one file at a time, which gets annoying: you fix one file, commit and push, wait, and then it fails again on the next one.

However, I think it would also be good practice to integrate the linter into the local dev environment (an IDE plugin for VS Code or IntelliJ, or a pre-commit hook) so that developers see lint errors earlier.

david30907d

Yeah, I've integrated hadolint into a pre-commit hook locally, but I'm using npm's pre-commit 😅
Not sure whether it's OK to commit that npm package.json to this repo lol

Hi @zero88, sorry for the late reply. Yeah, making it output the errors and run in parallel makes sense!

Output: the current settings already output the error message (example).

Parallel: let me split them into multiple CI YAML files so that they can run in parallel~
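Instead of one YAML file per Dockerfile, a single job with a build matrix and `fail-fast: false` also lints every Dockerfile in parallel, and a failure in one entry does not cancel the others. A minimal sketch, assuming the same `hadolint/hadolint-action@v1.6.0` input used in this PR; the path list only covers Dockerfiles that appear in this PR and would need to be extended:

```yaml
name: Lint Dockerfiles

on: push

jobs:
  lint:
    runs-on: ubuntu-latest
    strategy:
      # Keep linting the remaining Dockerfiles even if one of them fails
      fail-fast: false
      matrix:
        # Paths that appear in this PR; extend the list with the other Dockerfiles
        dockerfile:
          - kuwala/pipelines/population-density/dockerfile
          - kuwala/core/jupyter/docker/dockerfile
          - kuwala/core/neo4j/docker/dockerfile
          - kuwala/core/cli/dockerfile
    steps:
      - uses: actions/checkout@v2
      # Each matrix entry runs as its own job, with its own hadolint output in the log
      - uses: hadolint/hadolint-action@v1.6.0
        with:
          dockerfile: ${{ matrix.dockerfile }}
```

Because each matrix entry shows up as a separate job, every failing Dockerfile is reported after a single push rather than one per push.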

I use this pre-commit hook with hadolint, but as you can see there are many security issues 😅
I'd need some time to upgrade these packages. Or I could create a PR directly, since the hook is only used locally, so the security issues shouldn't matter (?)
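If committing an npm package.json is a concern, the Python pre-commit framework is one alternative that only needs a `.pre-commit-config.yaml`. A sketch, assuming the hook definitions published in the hadolint repository (`hadolint-docker` runs hadolint from its Docker image, so nothing has to be installed locally) and using `v2.8.0` as an example tag. The `files`/`types` overrides are an assumption to pick up this repo's lowercase `dockerfile` names:

```yaml
# .pre-commit-config.yaml (enable with: pip install pre-commit && pre-commit install)
repos:
  - repo: https://github.com/hadolint/hadolint
    rev: v2.8.0  # assumed tag; pin to the hadolint release you actually use
    hooks:
      - id: hadolint-docker
        # Kuwala's Dockerfiles are named `dockerfile` (lowercase), so match them by path
        files: '(^|/)dockerfile$'
        types: [file]
```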

david30907d · 2021-12-07T14:14:53Z

.github/workflows/docker_build.yml

+          ${{ runner.os }}-buildx-
+
+    # And make it available for the builds
+    - name: Build neo4j and push


This only validates in the CI pipeline that nobody breaks the neo4j build.

david30907d · 2021-12-07T14:15:23Z

.github/workflows/docker_build.yml

+        push: false
+        tags: kuwala/neo4j:${{ github.sha }}


It's your call to decide whether to push, and to which container registry (CR).

What's the impact/purpose of this? Does it specify which "version of Kuwala" it belongs to?

Yes, it tags this Kuwala image with the SHA of the commit that triggered the build 😄
The point of this tag is rollback: you can roll your Docker image back to any commit you want by specifying that commit's hash.
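Put together, the neo4j build step discussed here looks roughly like the following. This is a sketch: the build context `./kuwala` is inferred from the `COPY ./core/neo4j/...` paths in the neo4j Dockerfile, and the `docker/build-push-action@v2` version is an assumption:

```yaml
- name: Build neo4j (no push)
  uses: docker/build-push-action@v2
  with:
    # Context is kuwala/ because the Dockerfile copies ./core/neo4j/plugins/udfs
    context: ./kuwala
    file: ./kuwala/core/neo4j/docker/dockerfile
    # Only validate that the image builds; nothing is pushed to a registry
    push: false
    # github.sha is the commit SHA that triggered the workflow,
    # so every image can be traced back (and rolled back) to a commit
    tags: kuwala/neo4j:${{ github.sha }}
```

Once images are pushed to a registry, rolling back would mean deploying the tag for the desired commit, e.g. `kuwala/neo4j:<commit-sha>`.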

david30907d · 2021-12-07T14:16:07Z

kuwala/core/jupyter/docker/dockerfile

-FROM jupyter/pyspark-notebook
+FROM jupyter/pyspark-notebook:2021-11-20

-RUN pip install pandas-profiling[notebook]
+RUN pip install --no-cache-dir "pandas-profiling[notebook]==3.1.0"

 COPY ./common/jupyter/requirements.txt /opt/requirements.txt
-RUN pip install -r /opt/requirements.txt
+RUN pip install --no-cache-dir -r /opt/requirements.txt


These are all minor issues that the linter (hadolint) complained about:

Always specify a version (pin base image tags and package versions).

david30907d · 2021-12-07T14:17:47Z

kuwala/core/neo4j/docker/dockerfile

-FROM maven
+FROM maven:3.8.4-jdk-11-slim AS maven
 COPY ./core/neo4j/plugins/udfs /usr/src/mymaven
 WORKDIR /usr/src/mymaven
 RUN mvn clean install

 FROM neo4j:4.3.0-community
-COPY --from=0 /usr/src/mymaven/target/cypher-udfs-0.1-SNAPSHOT.jar ./plugins/cypher-udfs-0.1-SNAPSHOT.jar
+COPY --from=maven /usr/src/mymaven/target/cypher-udfs-0.1-SNAPSHOT.jar /var/lib/neo4j/plugins/cypher-udfs-0.1-SNAPSHOT.jar


Use an absolute path or specify a WORKDIR (hadolint's DL3045 flags COPY to a relative destination when no WORKDIR is set).

zero88 · 2021-12-14T02:36:25Z

.github/workflows/docker_build.yml

+  push:
+    branches: '*'
+jobs:
+  build_and_deploy:


I think you should split build_and_deploy into two jobs: lint and build. That makes the workflow cleaner and makes it easy to spot a problem in each job.

jobs:
  lint:
    # any docker lint step with `hadolint`
  build:
    # do build and cache
    # it should run if lint is successful
    # https://docs.github.com/en/actions/learn-github-actions/workflow-syntax-for-github-actions#jobsjob_idif

oh yea good point~ on it!
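The suggested split maps onto two jobs where `build` declares `needs: lint`. GitHub Actions then waits for `lint` and skips `build` if `lint` fails (unless an `if:` condition overrides that). A minimal sketch reusing the lint and Buildx steps from this PR; the remaining cache and build steps are left as a placeholder comment:

```yaml
name: Build Docker Image

on: push

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: hadolint/hadolint-action@v1.6.0
        with:
          dockerfile: kuwala/pipelines/population-density/dockerfile

  build:
    # Waits for `lint`; if lint fails, this job is skipped
    needs: lint
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1
      # The existing cache and build steps would follow here
```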

zero88 · 2021-12-14T02:36:58Z

.github/workflows/docker_build.yml

@@ -0,0 +1,68 @@
+name: Build Dockerfile


Should be "Build Docker Image"

Yeah haha, my poor English


zero88 · 2021-12-14T03:02:37Z

.github/workflows/docker_build.yml

+      # This ugly bit is necessary if you don't want your cache to grow forever
+      # till it hits GitHub's limit of 5GB.
+      # Temp fix
+      # https://github.com/docker/build-push-action/issues/252


I used this trick for caching in the local registry.
https://github.com/zero88/gh-registry/blob/main/README.md#usage

Good to know this trick, thx~ 🎉

Oh, it seems they've fixed this issue (PR).
I'll give this trick a shot next time (I'd need to change my cache ref, otherwise I'd get this error 😅)
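For reference, the temporary fix from docker/build-push-action issue #252 is usually written along these lines. The build writes its cache to a fresh directory, and a final step swaps it in, so stale layers are not carried forward indefinitely. A sketch, assuming the `/tmp/.buildx-cache` paths and action versions, which are not shown in full in this PR:

```yaml
- name: Cache Docker layers
  uses: actions/cache@v2
  with:
    path: /tmp/.buildx-cache
    key: ${{ runner.os }}-buildx-${{ github.sha }}
    restore-keys: |
      ${{ runner.os }}-buildx-

- name: Build neo4j (no push)
  uses: docker/build-push-action@v2
  with:
    # context, file, push and tags as in the existing step
    cache-from: type=local,src=/tmp/.buildx-cache
    # Write to a new directory instead of growing the restored cache
    cache-to: type=local,dest=/tmp/.buildx-cache-new,mode=max

# Replace the old cache so the cache saved at the end of the job holds only current layers
- name: Move cache
  run: |
    rm -rf /tmp/.buildx-cache
    mv /tmp/.buildx-cache-new /tmp/.buildx-cache
```

Newer Buildx versions also offer a GitHub Actions cache backend (`cache-from: type=gha`, `cache-to: type=gha,mode=max`), which avoids the manual move step.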

david30907d · 2021-12-17T04:04:22Z

kuwala/core/cli/dockerfile

-COPY ./pipelines/common/python_utils /opt/app/pipelines/common/python_utils
+COPY ./common/python_utils /opt/app/pipelines/common/python_utils


@zero88, could you take a look at this one~

mattigrthr

I have tested everything, and it's all still working. 👍🏽

I want to request a small change to keep the naming of the workflow .yml files consistent. Most of them use snake case; only Neo4j.yml and Neo4j_importer.yml don't, and I feel we should rename them to snake case as well.

There is one peculiarity in one of the pipelines. We use a git submodule for OSM to convert the pbf files to Parquet files; for that, we created a fork of another repo: https://github.com/kuwala-io/osm-parquetizer
That repo contains a Dockerfile, which is also referenced in our docker-compose.yml. Should we add the build of that image to the GitHub Actions here as well, or should this be done in the submodule repo? If we do it here, I guess we need to clone that repo in this repo's workflow. What do you think?

david30907d · 2021-12-18T03:50:33Z

@mattigrthr Yeah, snake case makes sense to me, on it~
It seems to me that setting up the CI workflow in osm-parquetizer makes more sense, so let me create a PR for that repo~

Update

About osm-parquetizer, here's the PR 🙏

mattigrthr

Awesome! This PR really adds some value! Well done @david30907d ! 🙌🏽

david30907d commented Dec 7, 2021

david30907d force-pushed the github-action branch from cc1fcce to 1ce0530 on December 7, 2021 14:17

david30907d commented Dec 7, 2021

zero88 suggested changes Dec 14, 2021

david30907d force-pushed the github-action branch 4 times, most recently from 9586e58 to b28cf33 on December 17, 2021 04:00

david30907d commented Dec 17, 2021

david30907d mentioned this pull request Dec 17, 2021 in Could not build CLI Docker Image #68 (closed)

mattigrthr linked an issue Dec 17, 2021 that may be closed by this pull request: Could not build CLI Docker Image #68 (closed)

mattigrthr assigned david30907d Dec 17, 2021

mattigrthr added the "enhancement" (New feature or request) label Dec 17, 2021

mattigrthr requested changes Dec 17, 2021

Commit 7d4b9e0: 🐳 (dockerfile) follow hadolint's best practice

david30907d force-pushed the github-action branch from b28cf33 to 7d4b9e0 on December 18, 2021 03:53

david30907d mentioned this pull request Dec 18, 2021 in 👷 (github action) build CI pipeline kuwala-io/osm-parquetizer#1 (merged)

david30907d requested a review from mattigrthr December 18, 2021 04:04

mattigrthr approved these changes Dec 20, 2021

mattigrthr merged commit 50fdfbe into kuwala-io:master Dec 20, 2021

mattigrthr added this to In progress in Kuwala via automation Dec 20, 2021

mattigrthr moved this from In progress to Done in Kuwala Dec 20, 2021

david30907d deleted the github-action branch December 20, 2021 10:59
