Environment
Operating system: Databricks + Azure ADLS Gen2
Version numbers: GX 0.18.10
Programming language: Python
Problem Statement:
I was trying to do the following:
I configured GX so that my GX artifacts (Expectations, Checkpoints, etc.) are stored in an ADLS Gen2 project container. I also wanted to store the GX Data Docs from executed checkpoints as a static website in the $web container. Because several other projects use the same Azure storage account, I wanted to put my Data Docs in the $web/ct10/ subdirectory (ct10 is my project), expecting that the main page index.html would show a list of executed checkpoint results. This works, by the way, when I use the root directory $web instead of $web/ct10/.
Instead, this is happening:
After executing some checkpoints, index.html shows no list.
Reproducing the issue:
My code:
import great_expectations as gx

context_root_dir = f"/dbfs{project.MNT_PATH}/DataQuality/GX/"
project_config = gx.data_context.types.base.DataContextConfig(
    ## Local storage backend
    store_backend_defaults=gx.data_context.types.base.FilesystemStoreBackendDefaults(
        root_directory=context_root_dir
    ),
    ## Data docs site storage
    data_docs_sites={
        "az_site": {
            "class_name": "SiteBuilder",
            "store_backend": {
                "class_name": "TupleAzureBlobStoreBackend",
                "container": "$web",
                "prefix": "ct10/",
                "connection_string": "DefaultEndpointsProtocol=https;AccountName=<storage_account>;AccountKey=<key>;EndpointSuffix=core.windows.net",
            },
            "site_index_builder": {
                "class_name": "DefaultSiteIndexBuilder",
                "show_cta_footer": True,
            },
        }
    },
)
context = gx.get_context(project_config=project_config)
data_source_name = f"{get_hive_table_db_name()}".lower()
data_asset_name = f"{get_product_name()}".lower()
batch_request = (
    context.sources
    .add_or_update_spark(name=data_source_name)
    .add_dataframe_asset(name=data_asset_name, dataframe=df_Staging)
    .build_batch_request()
)

# Run the default onboarding profiler on the batch request
onboarding_data_assistant_result = context.assistants.onboarding.run(
    batch_request=batch_request,
    exclude_column_names=[],
    estimation="flag_outliers",  # default: "exact"
)

# Get the suite with specific name
onboarding_suite_name = "onboarding_" + data_source_name + "_" + data_asset_name
onboarding_suite = onboarding_data_assistant_result.get_expectation_suite(
    expectation_suite_name=onboarding_suite_name
)

# Persist expectation suite with the specified suite name from above
context.add_or_update_expectation_suite(expectation_suite=onboarding_suite)
onboarding_checkpoint_name = "onboarding_" + data_source_name + "_" + data_asset_name

# Create and persist checkpoint to reuse for multiple batches
context.add_or_update_checkpoint(
    name=onboarding_checkpoint_name,
    config_version=1,
    class_name="SimpleCheckpoint",
    validations=[
        {"expectation_suite_name": onboarding_suite_name}
    ],
)

# Run Onboarding checkpoint
checkpoint_result = context.run_checkpoint(
    checkpoint_name=onboarding_checkpoint_name,
    batch_request=batch_request,
)
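As a possible diagnostic step after the checkpoint run, the site could be rebuilt explicitly and the URLs that GX reports for it inspected. This is only a sketch: the helper name `rebuild_site_and_get_urls` is hypothetical, and it assumes the context exposes `build_data_docs` and `get_docs_sites_urls` as in GX 0.18, with the latter returning dictionaries that carry `site_name` and `site_url` keys.

```python
def rebuild_site_and_get_urls(context, site_name="az_site"):
    """Rebuild one Data Docs site and return the URLs reported for it.

    Hypothetical helper; `context` is assumed to be a GX 0.18 data context.
    """
    # Force a rebuild of the site, including its index page.
    context.build_data_docs(site_names=[site_name])
    # Keep only the entries that belong to the requested site.
    return [
        site["site_url"]
        for site in context.get_docs_sites_urls()
        if site["site_name"] == site_name
    ]
```

Calling `rebuild_site_and_get_urls(context)` right after `context.run_checkpoint(...)` would show whether GX itself resolves an index URL under the `ct10/` prefix.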
Additional information:
The documentation was at least created, but it is only reachable via deep link.
The newest release, 0.18.10, was used.
Expected behavior
I would expect to be able to use subdirectories, and that .html files are written with content-type "text/html" in the $web container on ADLS Gen2.
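To make the expected content type concrete, here is a minimal sketch of how a MIME type could be derived from a blob name with Python's standard `mimetypes` module. The function name `guess_blob_content_type` and the example blob names are hypothetical; this is not how GX itself chooses the content type.

```python
import mimetypes


def guess_blob_content_type(blob_name: str) -> str:
    """Return the MIME type a browser needs in order to render the file inline."""
    content_type, _encoding = mimetypes.guess_type(blob_name)
    # Unknown extensions fall back to the generic binary type,
    # which browsers download instead of rendering.
    return content_type or "application/octet-stream"


for name in ("ct10/index.html", "ct10/no_extension"):
    print(name, "->", guess_blob_content_type(name))
```

A file stored as application/octet-stream corresponds to the fallback branch, which matches the download-only behaviour described in this issue.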
Additional context
When I use TupleFilesystemStoreBackend instead of TupleAzureBlobStoreBackend, Data Docs are written to the desired subfolder, but with content-type application/octet-stream, which cannot be opened directly in a browser (the files need to be downloaded and opened locally). So that is not an option either.
With a different setting, files are written to the /subfolder subdirectory, and the .html files have content-type text/html. Unfortunately, I get an error at the end, and no index.html is generated in $web/subfolder:
HttpResponseError: The requested URI does not represent any resource on the server.
RequestId:e02116b8-601e-0035-66ef-6a7eae000000
Time:2024-02-29T09:11:25.6007194Z
ErrorCode:InvalidUri
Content: <?xml version="1.0" encoding="utf-8"?>
<Error><Code>InvalidUri</Code><Message>The requested URI does not represent any resource on the server.
RequestId:e02116b8-601e-0035-66ef-6a7eae000000
Time:2024-02-29T09:11:25.6007194Z</Message></Error>
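To check what actually ended up in the container, the blobs under the prefix can be listed together with their content types. The following is a rough sketch that assumes the `azure-storage-blob` package is installed; `index_blob_name` and `report_site_blobs` are hypothetical helper names, and the path logic is an illustration, not GX's internal implementation.

```python
import posixpath


def index_blob_name(prefix: str) -> str:
    """Blob name where the site index would be expected for a given prefix."""
    # Strip stray slashes so "ct10", "ct10/" and "/ct10/" all give "ct10/index.html".
    cleaned = prefix.strip("/")
    return posixpath.join(cleaned, "index.html") if cleaned else "index.html"


def report_site_blobs(connection_string: str, prefix: str, container: str = "$web") -> bool:
    """Print each blob under the prefix with its content type.

    Returns True if the expected index blob exists.
    """
    # Imported lazily so the pure helper above works without the Azure SDK.
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(connection_string)
    container_client = service.get_container_client(container)

    cleaned = prefix.strip("/")
    # A trailing slash keeps "ct10" from also matching e.g. "ct100/...".
    name_prefix = f"{cleaned}/" if cleaned else None
    expected_index = index_blob_name(prefix)

    found_index = False
    for blob in container_client.list_blobs(name_starts_with=name_prefix):
        print(blob.name, blob.content_settings.content_type)
        if blob.name == expected_index:
            found_index = True
    return found_index
```

Running `report_site_blobs("<connection string>", "ct10/")` would show whether `ct10/index.html` exists and which content type each file received.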
Describe the bug
Please see also https://discourse.greatexpectations.io/t/data-docs-in-azure-adls-web-subdirectory-not-working/1627
Hello everyone,
With a further setting, I get errors like those described here:
https://discourse.greatexpectations.io/t/data-docs-in-azure-adls-web-subdirectory-not-working/1627/6?u=hdamczy