Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

All files are included when a non-DBT yaml file is edited #163

Open
willbowditch opened this issue Nov 8, 2023 · 4 comments
Open

All files are included when a non-DBT yaml file is edited #163

willbowditch opened this issue Nov 8, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@willbowditch
Copy link

Describe the bug

I believe pre-commit should only check files staged to be committed. However, if one of the files edited is in the root directory and is a yml or sql file , all files are checked.

This is due to the get_missing_file_paths adding the paths for everything. check-model-has-tests and other hooks are effected.

To Reproduce
Steps to reproduce the behavior:

  1. Have a repo that already has some issues that would be flagged by the hook
  2. Install dbt-checkpoint hook check-model-has-tests
  3. Edit a file in the root of the repo, such as .pre-commit-config.yaml
  4. The hook is then checked against the entire DBT project, not just changed files.

Expected behavior

Only files staged for the commit (and those directly related, such as properties files) should be checked

Version:
v0.1.1

Additional context

I think the problem is that root files break the logic in _add_related_sqls. To prevent this adding an additional condition here:
elif (suffix == ".yml" or suffix == ".yaml") and ".sql" in extensions and Path(path).parent != Path('.'):
fixes it.

Happy to raise a PR if useful.

@willbowditch willbowditch added the bug Something isn't working label Nov 8, 2023
@nathangriffiths-cdx
Copy link

nathangriffiths-cdx commented Nov 17, 2023

I think I just ran into this issue myself, ironically while adding dbt-checkpoint to the project. This required adding or editing these files to the root of the project:

.dbt-checkpoint.yaml
.pre-commit-config.yaml

Which then triggered dbt-checkpoint to run for the entire project on commit, finding a large number of non-compliant files and in theory preventing me from commiting changes to implement dbt-checkpoint in the first place until all the issues are resolved. Not sure if this is intentional or not?

@noel
Copy link
Collaborator

noel commented Nov 17, 2023 via email

@nathangriffiths-cdx
Copy link

@noel

Environment:
dbt-bigquery 1.5.6
Python 3.10.10
Gitbash on Windows 11

Git repo structure:
project_root_folder/dbt_project_folder

dbt_project_folder contains an existing dbt project with a large number of non-compliant files.

Steps
In project_root_folder:
Create the file .dbt-checkpoint.yaml
Update the file .pre-commit-config.yaml

.dbt-checkpoint.yaml

version: 1
dbt-project-dir: dbt_project_folder

.pre-commit-config.yaml

-   repo: https://github.com/dbt-checkpoint/dbt-checkpoint
    rev: v1.1.1
    hooks:
    - id: dbt-compile
    - id: dbt-docs-generate
    - id: check-column-desc-are-same
    - id: check-model-columns-have-desc
    - id: check-model-has-all-columns
    - id: check-model-has-description
    - id: check-model-has-properties-file
    - id: check-script-semicolon
    - id: check-script-has-no-table-name
    - id: check-script-ref-and-source
    - id: check-source-columns-have-desc
    - id: check-source-has-all-columns
    - id: check-source-table-has-description
    - id: check-macro-has-description
    - id: check-macro-arguments-have-desc
$ git status
On branch dbt-checkpoint-implementation
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        new file:   .dbt-checkpoint.yaml
        modified:   .pre-commit-config.yaml

Now commit the changes:
git commit -m "some commit message"

Output:

check yaml....................................................................Passed
check json................................................(no files to check)Skipped
fix end of files..............................................................Passed
trim trailing whitespace......................................................Passed
isort.....................................................(no files to check)Skipped
black.....................................................(no files to check)Skipped
flake8....................................................(no files to check)Skipped
mypy......................................................(no files to check)Skipped
sqlfluff-lint.............................................(no files to check)Skipped
dbt compile...............................................(no files to check)Skipped
dbt docs generate.............................................................Passed
Check column descriptions are same............................................Passed
Check the model columns have description......................................Passed
Check the model has all columns in properties file............................Failed
- hook id: check-model-has-all-columns
- exit code: 1

<< This hook takes about 50 minutes to run and generates errors for dozens of models in the project similar to below >>

Columns in .., but not in Database (....sql):
- name: middle_name
Unable to find model `model....` in catalog file. Make sure you run `dbt docs generate` before executing this hook.
Unable to find model `model....` in catalog file. Make sure you run `dbt docs generate` before executing this hook.
... 
Columns in Database (....sql), but not in ....yml:
- name: some_col
- name: some_other_col
...
Columns in ....yml, but not in Database (...sql):
- name: my_col_1
- name: my_col_2

Check the model has description...............................................Passed
Check the model has properties file...........................................Passed
Check the script does not contain a semicolon.............(no files to check)Skipped
Check the script has not table name.......................(no files to check)Skipped
Check the script has existing refs and sources............(no files to check)Skipped
Check for source column descriptions..........................................Passed
Check the source has all columns in the properties file.......................Passed
Check the source table has description........................................Passed
Check the macro has description...............................................Passed

I may be doing something wrong here but I can't see what it might be. It definitely looks like these hooks are running for all models in the dbt project, not just the two files I am actually committing.

@nathangriffiths-cdx
Copy link

The workaround for this e.g. when adding dbt-checkpoint to a new repo is:

  • stage (git add) only the changed yaml files
  • commit the files bypassing pre-commit checks e.g. git commit -m "my message" --no-verify

After this it appears dbt-checkpoint will only include staged files in checks as intended.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants