AML workspace outbound rules remove newly created rules #453

Open
kimyen opened this issue Apr 19, 2024 · 1 comment
Labels
bug Something isn't working example Example request upstream-api

Comments

@kimyen

kimyen commented Apr 19, 2024

Brief description of the problem

  • When using Microsoft.MachineLearningServices/workspaces/outboundRules@2023-10-01, as documented here, to create multiple outbound rules, the behavior is non-deterministic and destructive.
    • Destructive: when multiple rules are created in the same plan, one can observe in the Azure portal, while the rules are being created (~20 minutes), that a previously created rule is deleted while a new one is created.
    • Non-deterministic: which rules get deleted is not predictable. For example, a plan that created 6 FQDN rules and 2 Private Endpoint rules resulted in all of them being created, then 5 FQDN rules and 1 Private Endpoint rule being deleted after creation, leaving 1 FQDN rule and 1 Private Endpoint rule at the end. When 2 FQDN rules and 1 Private Endpoint rule were then added, 1 of the 2 new FQDN rules was deleted after it was created. The existing FQDN rule remained, while the existing Private Endpoint rule was deleted. It is not clear what determines whether a rule is deleted.
  • terraform apply reports that X rules were created successfully. terraform state list and terraform state show correctly show the rules that were created. However, the Azure portal shows only the rules that have not been deleted. A subsequent terraform plan with no code change shows that the deleted rules need to be re-created.

How to reproduce

Step 1: Add the following to your workspace .tf file:

resource "azapi_resource" "conda_anaconda_outbound_rules" {
  type = "Microsoft.MachineLearningServices/workspaces/outboundRules@2023-10-01"
  name = "conda-anaconda-org"
  parent_id = azurerm_machine_learning_workspace.aml_workspace.id
  body = jsonencode({
    properties = {
      category = "UserDefined"
      status = "Active"
      type = "FQDN"
      destination = "conda.anaconda.org"
    }
  })
}

resource "azapi_resource" "repo_anaconda_outbound_rules" {
  type = "Microsoft.MachineLearningServices/workspaces/outboundRules@2023-10-01"
  name = "repo-anaconda-org"
  parent_id = azurerm_machine_learning_workspace.aml_workspace.id
  body = jsonencode({
    properties = {
      category = "UserDefined"
      status = "Active"
      type = "FQDN"
      destination = "repo.anaconda.org"
    }
  })
}
  • [Optional] Run terraform plan; it should show 2 FQDN rules to be created.

Step 2: Run terraform apply. After ~20 minutes, it should succeed.

  • [Optional] Run terraform state list; both resources should be present in the state.
  • [Optional] Run terraform state show for each resource; each should show full details.

Step 3: Run terraform plan again. It reports that 1 of the 2 rules needs to be created.

  • [Optional] In the Azure portal, only 1 of the 2 FQDN rules shows up.

Other setup

  • AML workspace networking has public access disabled
  • AML workspace outbound config is set to Allow only approved outbound

Desired resolution

  • After terraform apply runs to completion, all outbound rules are created and visible in the Azure portal. A subsequent terraform plan with no code change results in no changes.
@ms-henglu
Collaborator

Hi @kimyen,

Thank you for taking the time to open this issue, and apologies for the late response.

Thanks for the details; I was able to reproduce this issue. It seems that this API only works if the outbound rules are created one at a time.
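
As an illustration of "one at a time", the two rules from the report could be chained with Terraform's depends_on meta-argument so that the second rule is not started until the first has finished. This is only a sketch: it is not something proposed in this thread, and whether it avoids the deletions has not been verified against this API.

```hcl
resource "azapi_resource" "conda_anaconda_outbound_rules" {
  type      = "Microsoft.MachineLearningServices/workspaces/outboundRules@2023-10-01"
  name      = "conda-anaconda-org"
  parent_id = azurerm_machine_learning_workspace.aml_workspace.id
  body = jsonencode({
    properties = {
      category    = "UserDefined"
      status      = "Active"
      type        = "FQDN"
      destination = "conda.anaconda.org"
    }
  })
}

resource "azapi_resource" "repo_anaconda_outbound_rules" {
  type      = "Microsoft.MachineLearningServices/workspaces/outboundRules@2023-10-01"
  name      = "repo-anaconda-org"
  parent_id = azurerm_machine_learning_workspace.aml_workspace.id
  body = jsonencode({
    properties = {
      category    = "UserDefined"
      status      = "Active"
      type        = "FQDN"
      destination = "repo.anaconda.org"
    }
  })

  # Explicit dependency: Terraform creates this rule only after the first
  # rule's create operation has completed.
  # Assumption: sequential creation is enough to avoid the deletions.
  depends_on = [azapi_resource.conda_anaconda_outbound_rules]
}
```

The drawback of this approach is that every new rule has to be added to the chain by hand.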

The azapi_resource supports a locks field, which lets the user specify a list of ARM resource IDs used to prevent azapi resources from being created, modified, or deleted at the same time.

However, I also noticed an API bug (Azure/azure-rest-api-specs#28982) that makes azapi v1.13.x crash. I have two workarounds for this case; I hope they help.

Workaround 1. (Recommended)

  1. Use azapi v1.12.1 to deploy the following config; you can upgrade to the latest version once the bug fix is released.
resource "azurerm_machine_learning_workspace" "example" {
  name                    = "acctesthenglu562"
  location                = azurerm_resource_group.example.location
  resource_group_name     = azurerm_resource_group.example.name
  application_insights_id = azurerm_application_insights.example.id
  key_vault_id            = azurerm_key_vault.example.id
  storage_account_id      = azurerm_storage_account.example.id

  identity {
    type = "SystemAssigned"
  }
  public_network_access_enabled = true
  managed_network {
    isolation_mode  = "AllowOnlyApprovedOutbound"
  }
}

resource "azapi_resource" "example" {
  count     = 3
  type      = "Microsoft.MachineLearningServices/workspaces/outboundRules@2023-10-01"
  name      = "test2${count.index}"
  parent_id = azurerm_machine_learning_workspace.example.id
  body = jsonencode({
    properties = {
      category    = "UserDefined"
      status      = "Active"
      type        = "FQDN"
      destination = "conda.anaconda${count.index}.org"
    }
  })
  # Every rule locks on the workspace ID, so the rules are not
  # created, modified, or deleted at the same time.
  locks = [azurerm_machine_learning_workspace.example.id]
}
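
The version requirement in step 1 can be written as a provider constraint, and the same locks pattern can be applied to the two rules from the original report. The following is a minimal sketch, assuming the provider is installed from the Azure/azapi registry source. The local value fqdn_rules and the resource name fqdn_outbound_rules are hypothetical names introduced for illustration.

```hcl
terraform {
  required_providers {
    azapi = {
      source = "Azure/azapi"
      # Pin to v1.12.1 as suggested above; relax once the fix is released.
      version = "= 1.12.1"
    }
  }
}

locals {
  # Hypothetical map of rule name => FQDN destination (values from the report).
  fqdn_rules = {
    "conda-anaconda-org" = "conda.anaconda.org"
    "repo-anaconda-org"  = "repo.anaconda.org"
  }
}

resource "azapi_resource" "fqdn_outbound_rules" {
  for_each  = local.fqdn_rules
  type      = "Microsoft.MachineLearningServices/workspaces/outboundRules@2023-10-01"
  name      = each.key
  parent_id = azurerm_machine_learning_workspace.example.id
  body = jsonencode({
    properties = {
      category    = "UserDefined"
      status      = "Active"
      type        = "FQDN"
      destination = each.value
    }
  })
  # All rules lock on the same workspace ID, so they are not
  # created, modified, or deleted concurrently.
  locks = [azurerm_machine_learning_workspace.example.id]
}
```

Using for_each keyed by rule name keeps each rule's state address stable when rules are added or removed, which count does not.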

Workaround 2.
If you prefer the dynamic properties that v1.13.x provides, you can use azapi_resource_action to bypass the bug. Note, however, that the action resource does not track the resource's state.

data "azapi_resource_id" "outboundRules" {
  count     = 3
  type      = "Microsoft.MachineLearningServices/workspaces/outboundRules@2023-10-01"
  name      = "test2${count.index}"
  parent_id = azurerm_machine_learning_workspace.example.id
}

resource "azapi_resource_action" "outboundRules" {
  count       = 3
  type        = "Microsoft.MachineLearningServices/workspaces/outboundRules@2023-10-01"
  resource_id = data.azapi_resource_id.outboundRules[count.index].id
  method      = "PUT"
  locks       = [azurerm_machine_learning_workspace.example.id]
  body = {
    properties = {
      category    = "UserDefined"
      status      = "Active"
      type        = "FQDN"
      destination = "repo.anaconda.org${count.index}"
    }
  }
}
