Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[engsys] Global Sanitizers inconsistently sanitize storage account names, recordings unreplayable #35447

Closed
kdestin opened this issue May 1, 2024 · 2 comments
Assignees
Labels
test-proxy Anything relating to test-proxy requests or issues.

Comments

@kdestin
Copy link
Member

kdestin commented May 1, 2024

Describe the bug

#35196 introduced a collection of "global" sanitizers that scrub secrets from recordings as they are written to disk.

I'm currently writing a test, where the code path involves:

  1. Fetching details about a storage account

  2. Usage those details to build the uri for the next request

This sanitizer will redact the storage account name in the recording from the response in Step 1.

{"json_path": "$..accountName", "value": SANITIZED},

There is no "global" sanitizer that sanitizes storage account names from request urls.

This leaves my recording un-replayable.

In recording mode, the code receives the sanitized request and tries to send a subsequent request to a URL it builds with the sanitized values: https://sanitized.blob.core.windows.net. But the recording stored an unsanitized URL for that subsequent request, https://account-name.blob.core.windows.net, so the proxy is unable to find a match.

To Reproduce
Steps to reproduce the behavior:

  1. Succesfully record a test in live mode that:

    1. Fetches some response with details about a storage account
    // Example response
    {
            "id": "/subscriptions/00000000-0000-0000-0000-000000000/resourceGroups/00000/providers/Microsoft.MachineLearningServices/workspaces/00000/datastores/workspaceblobstore",
            "name": "workspaceblobstore",
            "type": "Microsoft.MachineLearningServices/workspaces/datastores",
            "properties": {
              ...,
              "subscriptionId": "00000000-0000-0000-0000-000000000",
              "resourceGroup": "resource-group",
              "datastoreType": "AzureBlob",
              "accountName": "account-name",
              "containerName": "d49eda6a-ab96-4d00-b108-33768a3d0aee-azureml-blobstore",
              "endpoint": "core.windows.net",
              "protocol": "https",
              "serviceDataAccessAuthIdentity": "WorkspaceSystemAssignedIdentity"
            },
            "systemData": {
                ...
            }
    
    }
    1. Uses that response to build the URL for a subsequent request

    https://account-name.blob.core.windows.net/d49eda6a-ab96-4d00-b108-33768a3d0aee-azureml-blobstore/path/to/files

  2. Attempt to re-run the test in recording mode

Expected behavior

The test should run off the recording, and pass

Actual behavior

The test fails

ERROR    root:proxy_fixtures.py:312 

-----Test proxy playback error:-----

Unable to find a record for the request PUT https://sanitized.blob.core.windows.net/d49eda6a-ab96-4d00-b108-33768a3d0aee-azureml-blobstore/LocalUpload/0e7abff4dcb2ddd489d3e72fa2039bf6/README.md?sv=2021-10-04&si=azureml-system-datastore-policy&sr=c&sig=Sanitized
Method doesn't match, request <PUT> record <HEAD>
Uri doesn't match:
    request <https://sanitized.blob.core.windows.net/d49eda6a-ab96-4d00-b108-33768a3d0aee-azureml-blobstore/LocalUpload/0e7abff4dcb2ddd489d3e72fa2039bf6/README.md?sv=2021-10-04&si=azureml-system-datastore-policy&sr=c&sig=Sanitized>
    record  <https://account-name.blob.core.windows.net/d49eda6a-ab96-4d00-b108-33768a3d0aee-azureml-blobstore/LocalUpload/0e7abff4dcb2ddd489d3e72fa2039bf6/README.md?sv=2021-10-04&si=azureml-system-datastore-policy&sr=c&sig=Sanitized>

Screenshots
If applicable, add screenshots to help explain your problem.

Additional context
Add any other context about the problem here.

@github-actions github-actions bot added the needs-triage This is a new issue that needs to be triaged to the appropriate team. label May 1, 2024
@xiangyan99 xiangyan99 added EngSys This issue is impacting the engineering system. test-proxy Anything relating to test-proxy requests or issues. and removed needs-triage This is a new issue that needs to be triaged to the appropriate team. EngSys This issue is impacting the engineering system. labels May 2, 2024
@mccoyp
Copy link
Member

mccoyp commented May 13, 2024

Hi @kdestin, I have good news: Paul merged an update to our tooling that adds the ability to remove sanitizers (#35385), so there should now be a better solution than your current workaround.

Some sanitizers were already disabled for ML as part of the update:

# Remove the following sanitizers since certain fields are needed in tests and are non-sensitive:
# - AZSDK3430: $..id
# - AZSDK3493: $..name
# - AZSDK2003: Location
remove_batch_sanitizers(["AZSDK3430", "AZSDK3493", "AZSDK2003"])

If you do the same with the accountName body key sanitizer, you should be able to remove the additional sanitizer you added as a workaround. The ID for the central sanitizer is "AZSDK3478"; here's where it's registered by the test proxy.

@kdestin
Copy link
Member Author

kdestin commented May 14, 2024

Awesome, thank you so much. I'll close this issue

@kdestin kdestin closed this as completed May 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
test-proxy Anything relating to test-proxy requests or issues.
Projects
None yet
Development

No branches or pull requests

3 participants