Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐳 (dockerfile) follow hadolint's best practice #65

Merged
merged 1 commit into from
Dec 20, 2021

Conversation

david30907d
Copy link
Contributor

@david30907d david30907d commented Dec 7, 2021

Types of changes

  • Refactoring

Description

  1. use hadolint to lint dockerfiles
  2. setup github Action as CI/CD pipeline

Try to make your PR easy to review

ref: How to Split Pull Requests – Good Practices, Methods and Git Strategies

  • annotate Pull Requests: Is there a specific order in which to review
  • Separate refactoring from features
  • The feature didn't require too many code changes

Checklist:

  • Update the documentation if necessary

Steps to Test This Pull Request

  1. build new images
  2. run these services (I'm still new in this community so have no idea how to test these images to be honest πŸ˜… )

Expected behavior

services should remain the same except there's no python's cache-dir

Comment on lines 34 to 40
- uses: hadolint/hadolint-action@v1.6.0
with:
dockerfile: kuwala/pipelines/population-density/dockerfile

# This is the a separate action that sets up buildx runner
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

main logic as follow:

  1. run hadolint on all dockerfiles
  2. run docker build to make sure no one screw up (there's too many dockerfiles so I only setup 1 docker build in this CI pipeline)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the lint tool is good. This is the first time I know hadolint πŸ‘
It helps create a good standard docker file in the CI step. If you can show the error output when the lint is failed in GitHub action, it would be nice. And I wonder if the lint can run in multiple files in parallel without fail-fast and report in the last step to save time. Current lint steps if fail, it will be one by one then it will be annoying when fix one file, commit & push, then wait then fail again.

However, I think it is good practice if, in the local dev environment, we can integrate some plugins into IDE (vscode or intellij or pre-commit hook) to make linter more verbose to the developer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea I've integrated hadolint into pre-commit hook on my local. but I'm using npm's pre-commit πŸ˜…
not sure wether it's ok to commit this npm's package.json to this repo lol

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @zero88 sorry for the late reply, yea make sense about making it output the error and running in parallel!

  1. output: current settings would output the error msg example
  2. parallel: let me separate them into multiple CI yaml so that they can run in parallel~

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I use this pre-commit hook with hadolint, but as you can see there's many security issues πŸ˜…
need some time to upgrade these packages, or I can create a PR directly since hook are just used on local so no worry about security issue (?)

${{ runner.os }}-buildx-

# And make it available for the builds
- name: Build neo4j and push
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only validate no one screw up neo4j in CI pipeline

Comment on lines 56 to 57
push: false
tags: kuwala/neo4j:${{ github.sha }}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

your call to decide wether/which CR to push

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the impact/purpose of this? Does it specify which "version of Kuwala" it belongs to?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it specify the version of this Kuwala image with sha256 digest of commit πŸ˜„
The point of using this tag is for the sake of rollback, you can rollback your docker image to whatever commit you want by specify the specific commit hash

Comment on lines -1 to +6
FROM jupyter/pyspark-notebook
FROM jupyter/pyspark-notebook:2021-11-20

RUN pip install pandas-profiling[notebook]
RUN pip install --no-cache-dir "pandas-profiling[notebook]==3.1.0"

COPY ./common/jupyter/requirements.txt /opt/requirements.txt
RUN pip install -r /opt/requirements.txt
RUN pip install --no-cache-dir -r /opt/requirements.txt
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are all minor stuff that linter (hadolint) complained about:

  1. should always specify a version

Comment on lines -1 to +7
FROM maven
FROM maven:3.8.4-jdk-11-slim AS maven
COPY ./core/neo4j/plugins/udfs /usr/src/mymaven
WORKDIR /usr/src/mymaven
RUN mvn clean install

FROM neo4j:4.3.0-community
COPY --from=0 /usr/src/mymaven/target/cypher-udfs-0.1-SNAPSHOT.jar ./plugins/cypher-udfs-0.1-SNAPSHOT.jar
COPY --from=maven /usr/src/mymaven/target/cypher-udfs-0.1-SNAPSHOT.jar /var/lib/neo4j/plugins/cypher-udfs-0.1-SNAPSHOT.jar
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. use absolute path or specify a WORKDIR

push:
branches: '*'
jobs:
build_and_deploy:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should separate build_and_deploy into 2 jobs: lint and build. That makes the workflow is cleaner and easy to spot out any problem in each job.

jobs:
  lint:
      // any docker lint step with `hadolint`
  build:
     // do build and cache
     // it should run if lint is successful
     // https://docs.github.com/en/actions/learn-github-actions/workflow-syntax-for-github-actions#jobsjob_idif

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yea good point~ on it!

@@ -0,0 +1,68 @@
name: Build Dockerfile
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should Build Docker Image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea haha my poor english

Comment on lines 34 to 40
- uses: hadolint/hadolint-action@v1.6.0
with:
dockerfile: kuwala/pipelines/population-density/dockerfile

# This is the a separate action that sets up buildx runner
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the lint tool is good. This is the first time I know hadolint πŸ‘
It helps create a good standard docker file in the CI step. If you can show the error output when the lint is failed in GitHub action, it would be nice. And I wonder if the lint can run in multiple files in parallel without fail-fast and report in the last step to save time. Current lint steps if fail, it will be one by one then it will be annoying when fix one file, commit & push, then wait then fail again.

However, I think it is good practice if, in the local dev environment, we can integrate some plugins into IDE (vscode or intellij or pre-commit hook) to make linter more verbose to the developer

# This ugly bit is necessary if you don't want your cache to grow forever
# till it hits GitHub's limit of 5GB.
# Temp fix
# https://github.com/docker/build-push-action/issues/252
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used this trick for caching in the local registry.
https://github.com/zero88/gh-registry/blob/main/README.md#usage

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good to know this tricks, thx~ πŸŽ‰

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, seems to me that they've fixed this issue (PR)
I'll give this trick a shot next time (I need to change my cache ref, other would get this error πŸ˜… )

@david30907d david30907d force-pushed the github-action branch 4 times, most recently from 9586e58 to b28cf33 Compare December 17, 2021 04:00
COPY ./pipelines/common/python_utils /opt/app/pipelines/common/python_utils
COPY ./common/python_utils /opt/app/pipelines/common/python_utils
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zero88 need to have your eye on it~

@mattigrthr mattigrthr linked an issue Dec 17, 2021 that may be closed by this pull request
@mattigrthr mattigrthr added the enhancement New feature or request label Dec 17, 2021
Copy link
Contributor

@mattigrthr mattigrthr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tested everything, and it's all still working. πŸ‘πŸ½

I want to request a small change to keep the .yml file naming of the workflows consistent. Most of it is snake case, and only Neo4j.yml and Neo4j_importer.yml are not, and I feel we should write them in snake case as well.

There is one particularity to a pipeline. We are using a git submodule for OSM to process the pbf files to parquet files. We created a fork from another repo for that: https://github.com/kuwala-io/osm-parquetizer
That repo contains a Docker file which is also referenced in our docker-compose.yml. Should we add the build of that image to the GitHub actions as well, or should this be done in the submodule repo? If we do it here, I guess we need to clone the repo in the workflow of this repo. What do you think?

@david30907d
Copy link
Contributor Author

david30907d commented Dec 18, 2021

  1. @mattigrthr yea snake case makes sense to me, on it~
  2. seems to me that setup CI workflow in osm-parquetizer makes more sense, that me create a PR for that repo~

Update

about osm-parquetizer, here's the PR πŸ™

Copy link
Contributor

@mattigrthr mattigrthr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! This PR really adds some value! Well done @david30907d ! πŸ™ŒπŸ½

@mattigrthr mattigrthr merged commit 50fdfbe into kuwala-io:master Dec 20, 2021
@mattigrthr mattigrthr added this to In progress in Kuwala via automation Dec 20, 2021
@mattigrthr mattigrthr moved this from In progress to Done in Kuwala Dec 20, 2021
@david30907d david30907d deleted the github-action branch December 20, 2021 10:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

Could not build CLI Docker Image
3 participants