Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: add optional --schema_location argument to check_model_parents_schema #132

Open
wants to merge 21 commits into
base: main
Choose a base branch
from

Conversation

kiliantscherny
Copy link

@kiliantscherny kiliantscherny commented May 3, 2023

I'm proposing a small change to the check_model_parents_schema check to fix an issue with the parsing of the parent models' schema.

Current state
When this check is run, it parses the schema from nodes → schema, which is prefixed by an individual's configuration in profiles.yml – for example:

For a BigQuery-based profile:

[project_name]:
  outputs:
    dev:
      dataset: [dev_dataset_name]
...

For a Snowflake-based profile:

[project_name]:
  outputs:
    dev:
      ...
      schema: [dev_dataset_name]
...

In manifest.json, for any given model, nodes → schema inherits the individual's [dev_dataset_name] and prefixes it in front of the "true" schema name, which itself comes from dbt_project.yml, e.g. for a "dim" folder in an example repo:

models:
  [project_name]:
    +materialized: view
    ...
    dim:
      +materialized: table
      +schema: dim_schema_name

This then causes the check to fail if we set the args as:

- id: check-model-parents-schema
    args: ["--manifest", "analytics/target/manifest.json", "--whitelist", "dim_schema_name", "--"]

Because the schema is parsed from nodes → schema as dev_dim_schema_name rather than dim_schema_name (which is found in nodes → config → schema), the check doesn't work as expected.

Proposed solution
In our organisation, we have had issues with this check because we actually need it to parse the schema from nodes → config → schema instead, as the different dbt developers each have their own dev dataset name which prefixes the model schema in their dev environment. This must have changed when the manifest was updated to a newer version.

By selecting from the "nested" schema that's inside config, we get the "true" schema name reliably every time, for each developer and in production.

The additional argument I'm adding to args is --schema_location to optionally specify if the intended schema is located inside config.

How it works
It's completely optional, and only requires the user to add "--schema_location", "config" to the args after the white-/blacklist argument, like:

- id: check-model-parents-schema
    args: ["--manifest", "analytics/target/manifest.json", "--whitelist", "dim_schema_name",  "--schema_location", "config", "--"]

Note on testing coverage
I'm not really too familiar with Pytest or how to ensure complete coverage, so if there are any other contributors with an idea of how to help with this I'd be hugely grateful!

@kiliantscherny kiliantscherny marked this pull request as ready for review May 3, 2023 15:24
@kiliantscherny
Copy link
Author

kiliantscherny commented May 3, 2023

Not sure how to ensure test coverage if it's needed – would be very grateful for any advice as I kept getting coverage failures for this section:

if schema_location == "config": # pragma: no cover
    # Chooses the schema inside in the model config (nodes -> model -> config -> schema)
    db = parent.node.get("config").get("schema")
else: # pragma: no cover
    # Chooses the schema outside the model config (nodes -> model -> schema)
    db = parent.node.get("schema")

For now, I have escaped the coverage for the new optional --schema_location argument with the "pragma: no cover" comment as I have seen in other checks' files.

@codecov-commenter
Copy link

codecov-commenter commented May 4, 2023

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (a3a276c) 100.00% compared to head (981f42a) 98.39%.
Report is 14 commits behind head on main.

Files Patch % Lines
dbt_checkpoint/utils.py 95.23% 0 Missing and 1 partial ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@             Coverage Diff             @@
##              main     #132      +/-   ##
===========================================
- Coverage   100.00%   98.39%   -1.61%     
===========================================
  Files           47       50       +3     
  Lines         2194     2486     +292     
  Branches       272      321      +49     
===========================================
+ Hits          2194     2446     +252     
- Misses           0       24      +24     
- Partials         0       16      +16     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@kiliantscherny
Copy link
Author

Potential fix for #135

Copy link
Collaborator

@BAntonellini BAntonellini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kiliantscherny thanks for contributing to dbt-checkpoint

Would you be able to extend existing test check model parents schema with this new argument?

dbt_checkpoint/check_model_parents_schema.py Show resolved Hide resolved
@kiliantscherny
Copy link
Author

kiliantscherny commented Dec 27, 2023

@kiliantscherny thanks for contributing to dbt-checkpoint

Would you be able to extend existing test check model parents schema with this new argument?

@BAntonellini I've just had a go at adding the test coverage for this new argument, but please just let me know if it wasn't done correctly. Thanks!

@kiliantscherny
Copy link
Author

Hi @BAntonellini would you be able to take another look at this when you get the chance?

@@ -40,7 +42,13 @@ def check_parents_schema(
)
)
for parent in parents:
db = parent.node.get("schema")
# Selecting the location of the model schema
if schema_location == "config": # pragma: no cover
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why adding the # pragma: no cover? Please cover this case as part of the tests

if schema_location == "config": # pragma: no cover
# Chooses the schema inside in the model config (nodes -> model -> config -> schema)
db = parent.node.get("config").get("schema")
else: # pragma: no cover
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants