HTTP 500 on some resources when reinstalling Proxmox #1152

Open
aleprovencio opened this issue Mar 23, 2024 · 4 comments
Labels
🐛 bug Something isn't working

@aleprovencio

Describe the bug
I have several resources created by this provider on a node. After reinstalling Proxmox on it and trying to get it back to its previous state by running terraform apply, most resources are restored fine, but I have found a few problems.

I'm not sure whether I should have modified the Terraform state beforehand (I did not). Terraform recreates everything else (VMs, containers, etc.) but returns HTTP 500 errors for the following:

  • proxmox_virtual_environment_group
  • proxmox_virtual_environment_user
  • proxmox_virtual_environment_role
  • proxmox_virtual_environment_cluster_firewall_security_group

To Reproduce
Steps to reproduce the behavior:

  1. Create any of the described resources above
  2. Run terraform apply
  3. Reinstall Proxmox
  4. Run terraform apply
  5. See errors

resource "proxmox_virtual_environment_group" "admin" {
  group_id = "admin"
  comment  = "Managed by Terraform"
  acl {
    path      = "/"
    propagate = true
    role_id   = "Administrator"
  }
}

resource "proxmox_virtual_environment_user" "tf-packer" {
  acl {
    path    = "/"
    role_id = proxmox_virtual_environment_role.tf-packer.role_id
  }
  comment = "Managed by Terraform"
  user_id = "tf-packer@pve"
}

resource "proxmox_virtual_environment_user" "prometheus" {
  acl {
    path    = "/"
    role_id = "PVEAuditor"
  }
  comment = "Managed by Terraform"
  user_id = "prometheus@pve"
}

resource "proxmox_virtual_environment_role" "tf-packer" {
  role_id = "tf-packer"
  privileges = [
    "VM.Allocate",
    "VM.Clone",
    "VM.Config.CDROM",
    "VM.Config.CPU",
    "VM.Config.Cloudinit",
    "VM.Config.Disk",
    "VM.Config.HWType",
    "VM.Config.Memory",
    "VM.Config.Network",
    "VM.Config.Options",
    "VM.Console",
    "VM.Monitor",
    "VM.Audit",
    "VM.PowerMgmt",
    "Datastore.AllocateSpace",
    "Datastore.Audit",
    "Pool.Allocate",
    "Sys.Audit",
    "Sys.Console",
    "Sys.Modify",
    "SDN.Use",
    "VM.Migrate",
  ]
}

resource "proxmox_virtual_environment_cluster_firewall_security_group" "ping_ssh" {
  name    = "ssh-ping"
  comment = "SSH and ping"

  rule {
    comment = "Ping"
    type    = "in"
    action  = "ACCEPT"
    proto   = "icmp"
  }

  rule {
    comment = "SSH"
    type    = "in"
    action  = "ACCEPT"
    macro   = "SSH"
  }

}

resource "proxmox_virtual_environment_cluster_firewall_security_group" "promtail_node_exp" {
  name    = "promtail-node-exp"
  comment = "Promtail and node exporter"

  rule {
    type    = "in"
    action  = "ACCEPT"
    comment = "Promtail"
    proto   = "tcp"
    dport   = "9080"
  }

  rule {
    type    = "in"
    action  = "ACCEPT"
    comment = "Prometheus node exporter"
    proto   = "tcp"
    dport   = "9100"
  }

}

Expected behavior
These resources are recreated like the other ones.

Additional context
  • Single or clustered Proxmox: single
  • Provider version (ideally it should be the latest version): 0.50.0
  • Terraform version: 1.7.3
  • OS (where you run Terraform from): Archlinux
  • Debug logs (TF_LOG=DEBUG terraform apply):
@aleprovencio aleprovencio added the 🐛 bug Something isn't working label Mar 23, 2024
@bpg
Owner

bpg commented Mar 25, 2024

Hi @aleprovencio! 👋🏼

Were there any error messages reported by PVE? These issues are quite hard to debug without reproducing them, which takes quite a bit of effort since it means reinstalling PVE. So any additional details are really appreciated.

In general, if you reset the remote state of a resource (i.e. deleted the resource outside of Terraform), its local TF state should also be deleted, so there is no inconsistency, or "state drift", for the provider to reconcile.
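As a sketch of what deleting the local state could look like, assuming Terraform 1.7 or later (the reporter uses 1.7.3), a removed block with destroy = false tells Terraform to forget a resource without deleting the remote object. While the removed block is present, the matching resource block must be deleted from the configuration. After one apply, the removed block can be dropped and the original resource block restored, so the next apply creates the object again on the reinstalled node. The two addresses below come from the configuration in this issue. Because Terraform may still refresh these resources during planning, the HTTP 500 described here could also break this step. In that case, running the apply with -refresh=false, or using terraform state rm with the same addresses, avoids calling the provider for them.

```hcl
# Forget the group in Terraform state only; with destroy = false,
# Terraform does not ask the provider to delete the object in PVE.
removed {
  from = proxmox_virtual_environment_group.admin

  lifecycle {
    destroy = false
  }
}

# Same for one of the firewall security groups.
removed {
  from = proxmox_virtual_environment_cluster_firewall_security_group.ping_ssh

  lifecycle {
    destroy = false
  }
}
```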

@aleprovencio
Author

Hello @bpg, thanks for the reply and of course, also for this awesome project.

It does make sense to me that I should probably remove all Proxmox-related resources from the Terraform state prior to reinstalling it, in order to prevent the so-called "state drift".

However, I would still like to understand why, in a test like mine with no manual intervention on the Terraform state, resources like proxmox_virtual_environment_vm or proxmox_virtual_environment_container get recreated while the resources mentioned in this issue do not.

Regarding errors, I don't see anything special on PVE's side. On Terraform's side, besides the HTTP 500, it just says that those resources do not exist. I wish I could give you more details on the problem, but that's all I have for now; maybe I could try more thorough debugging with your help.

@bpg
Owner

bpg commented Mar 26, 2024

It looks like the affected resources are "compound resources", i.e. they reference other, separate Proxmox entities that live on different API paths. When the provider applies a change, it first has to read the resource state from the remote to detect the "drift". I think there are logical or implementation bugs in those resources: they are probably trying to read the dependent objects first (like ACLs for a user, or rules for a security group) using the "parent" object ID as the request criterion. Those parents do not exist, so the requests fail.

That's my hypothesis, without any actual debugging. There is definitely something in the provider's implementation that can be improved in this regard, though a proper investigation is needed to make a fix.

@aleprovencio
Author

Yeah, I guess you are on the right track.

I ran a new test where I removed those resources from the state and reinstalled Proxmox. Although terraform apply seemed to work flawlessly the first time, running the same command again still proposed changes to the resources we are discussing.
