
unable to limit sql asset to top 1000 rows #9917

Open
MarkAngler opened this issue May 12, 2024 · 0 comments

Describe the bug
There does not seem to be any clear way to limit a table asset to the top 1,000 rows.

To Reproduce

import great_expectations as gx
from great_expectations.core.batch import BatchRequest
import sqlalchemy
from azure.keyvault.secrets import SecretClient
from azure.identity import DefaultAzureCredential
import os

# env = os.environ['ENVIRONMENT']
env = 'Dev'

if env == 'Prod':
    keyVaultName = 'secrets'  # vault name redacted
else:
    keyVaultName = 'secrets'  # vault name redacted

KVUri = f"https://{keyVaultName}.vault.azure.net"

credential = DefaultAzureCredential()
client = SecretClient(vault_url=KVUri, credential=credential)

username = 'bankdwadmin'
password = client.get_secret('bankdw-sql-administrator-login')
password = password.value

context = gx.get_context()
expectation_suite_name_prefix = "ExpectationSuite_BankDW"
expectation_suite = context.add_or_update_expectation_suite(expectation_suite_name=expectation_suite_name_prefix)


dwConnString = f'mssql+pyodbc://{username}:{password}@servername.database.windows.net:1433/BankDW?driver=ODBC+Driver+17+for+SQL+Server'
BankDwEngine = sqlalchemy.create_engine(dwConnString)
datasource = context.sources.add_sql(
    name="BankDW", connection_string=dwConnString
)



asset_tablename = 'Account'
asset_name = f'asset_{asset_tablename}'


datasource.add_table_asset(name=asset_name, table_name=asset_tablename)



# query = f"select top (1000) * from dbo.{asset_tablename}"
# datasource.add_query_asset(name=asset_name, query=query)


# asset = datasource.get_asset(asset_name)



batch_request = datasource.get_asset(asset_name).build_batch_request(batch_slice="[:1001]")

validator = context.get_validator(
    batch_request=batch_request
)

# onboarding
# missingness
data_assistant_onboarding_result = context.assistants.onboarding.run(
    validator=validator,
    exclude_column_names=[],
)


validator.expectation_suite = data_assistant_onboarding_result.get_expectation_suite(
    expectation_suite_name=f"{expectation_suite_name_prefix}_{asset_name}"
)

validator.save_expectation_suite()



checkpoint = context.add_or_update_checkpoint(
    name="CheckPoint_BankDw",
    validator=validator
)

checkpoint_result = checkpoint.run()

context.build_data_docs()
context.open_data_docs()

Expected behavior
A keyword, API parameter, or any other mechanism to limit the number of rows to scan. Using a SQL query asset (the commented-out add_query_asset approach above) doesn't work; it just errors out.
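For reference, the row limit belongs in the SQL itself: SQL Server uses SELECT TOP (n), while most other dialects use LIMIT n. A minimal stdlib sketch (using sqlite3 purely as a stand-in for the real warehouse, so LIMIT in place of TOP; the Account table and row counts here are illustrative, not from the reporter's database):

```python
import sqlite3

# In-memory database standing in for the warehouse. Assumption: sqlite3
# only illustrates the row-limiting SQL; on SQL Server the equivalent
# query would be "SELECT TOP (1000) * FROM dbo.Account".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Account (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO Account (name) VALUES (?)",
    [(f"acct_{i}",) for i in range(5000)],
)

# Limit the scan to at most 1000 rows at the SQL level, rather than
# trying to slice batches after the fact.
rows = conn.execute("SELECT * FROM Account LIMIT 1000").fetchall()
print(len(rows))  # 1000
```

This is only a sketch of the SQL-level workaround the reporter attempted via add_query_asset; whether that asset type accepts such a query without error in the GX version used here is exactly what this issue reports.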

Environment (please complete the following information):

  • Operating System: [e.g. Linux, MacOS, Windows]
  • Great Expectations Version: [e.g. 0.13.2]
  • Data Source: [e.g. Pandas, Snowflake]
  • Cloud environment: [e.g. Airflow, AWS, Azure, Databricks, GCP]

Additional context
Add any other context about the problem here.
