Upgrading cluster to v1.9.4 fails #15210

Closed
fransguelinckx opened this issue Apr 28, 2022 · 3 comments
Labels
bug Used to indicate a potential bug

Comments

@fransguelinckx

fransguelinckx commented Apr 28, 2022

Describe the bug
When trying to upgrade a 3-node Vault cluster from v1.8.9 to v1.9.4 by replacing one of the old nodes, the new node hits a configuration error that prevents it from starting up.

To Reproduce
Steps to reproduce the behavior:

  1. Start from a v1.8.9 cluster with 3 nodes.
  2. Terminate one of the follower nodes and replace it with a new node running v1.9.4.

Expected behavior
The v1.9.4 node joins the cluster.

Environment:

  • Vault Server Version: v1.9.4
  • Server Operating System/Architecture: EC2 instances running Ubuntu Linux 5.4.0-1071-aws.

Vault server configuration file(s):

ui = true

log_level = "trace"

storage "raft" {
  path = "/ebs1/vault"
  node_id = "vault-bb28de72914b59dfb.development.us-east-1.mycorp.com"
  retry_join {
    auto_join = "provider=aws region=us-east-1 tag_key=VaultCluster tag_value=vault-server-development-us-east-1"
    leader_tls_servername = "*.development.us-east-1.mycorp.com"
    leader_client_cert_file = "/opt/vault/tls/vault.crt"
    leader_client_key_file = "/opt/vault/tls/vault.key"
  }
}

listener "tcp" {
  address = "0.0.0.0:8200"
  tls_min_version = "tls12"
  tls_cert_file = "/opt/vault/tls/vault.crt"
  tls_key_file = "/opt/vault/tls/vault.key"
  proxy_protocol_behavior = "use_always"
  proxy_protocol_authorized_addrs = "0.0.0.0:8200"

  telemetry {
    unauthenticated_metrics_access = true
  }
}

api_addr = "https://vault-bb28de72914b59dfb.development.us-east-1.mycorp.com:8200"
cluster_addr = "https://vault-bb28de72914b59dfb.development.us-east-1.mycorp.com:8201"

disable_mlock = true

telemetry {
  disable_hostname = true
  prometheus_retention_time = "744h"
}

seal "awskms" {
  kms_key_id = "f1d7b410-e984-4028-a30c-7c4993d0f3e2"
}

Error log:

Apr 28 06:17:38 vault-bb28de72914b59dfb systemd[1]: Started "HashiCorp Vault - A tool for managing secrets".
Apr 28 06:17:54 vault-bb28de72914b59dfb vault[2205]: error loading configuration from /etc/vault.d/vault.hcl: unable to parse address template "https://vault-bb28de72914b59dfb.development.us-east-1.mycorp.com:8200": unable to query interface addresses: route ip+net: netlinkrib: address family not supported by protocol
Apr 28 06:17:54 vault-bb28de72914b59dfb systemd[1]: vault.service: Main process exited, code=exited, status=1/FAILURE
Apr 28 06:17:54 vault-bb28de72914b59dfb systemd[1]: vault.service: Failed with result 'exit-code'.

Additional context

  • We just upgraded our Vault clusters from v1.7.8 to v1.8.9 without issues.
  • The existing v1.8.9 nodes have the same vault.hcl configuration file as the one I posted above.

@ncabatoff
Collaborator

Hi @fransguelinckx,

Thanks for reporting this. It looks like you've discovered a bad interaction with #9109. I wrote the code to try to expand any templates that might be defined in the configured address, which requires asking the OS about what network interfaces exist. Something in your environment is preventing that query from working.

I took this simpler approach because it felt safer than trying to identify first whether an address was templated or not, and only trying to expand templates if it was. It sounds like we'll have to explore that slightly harder approach in order to avoid this issue.
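
For readers following along, the "slightly harder approach" described above can be sketched roughly as follows. This is only an illustration, not the code from #9109 or the eventual fix: `looksTemplated`, `resolveAddr`, `expandFunc`, and `alwaysFails` are hypothetical names, and the check for a `{{ ... }}` pair is an assumed heuristic for "is this address templated". The expander is passed in as a function so the sketch stays self-contained; in a real server it would be whatever routine queries the OS for network interfaces.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// expandFunc is a hypothetical stand-in for the routine that expands
// address templates (and therefore has to query network interfaces).
type expandFunc func(addr string) (string, error)

// looksTemplated reports whether addr contains an opening "{{" followed
// later by a closing "}}". This is an assumed heuristic, not Vault's check.
func looksTemplated(addr string) bool {
	open := strings.Index(addr, "{{")
	return open >= 0 && strings.Contains(addr[open+2:], "}}")
}

// resolveAddr returns plain addresses unchanged and only calls expand
// when the address appears to contain a template.
func resolveAddr(addr string, expand expandFunc) (string, error) {
	if !looksTemplated(addr) {
		return addr, nil
	}
	return expand(addr)
}

// alwaysFails simulates an environment where the interface query
// does not work, as in the error log reported in this issue.
func alwaysFails(string) (string, error) {
	return "", errors.New("unable to query interface addresses")
}

func main() {
	// A plain address never reaches the failing expander.
	out, err := resolveAddr("https://vault.example.com:8200", alwaysFails)
	fmt.Println(out, err) // prints: https://vault.example.com:8200 <nil>
}
```

With a guard like this, a plain `api_addr` such as the one in the configuration above would never trigger the interface query, so a host where that query fails would only be affected if it actually used a templated address.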

@fransguelinckx
Author

Thanks for the quick response! We'll be happy to help if we can.

@ncabatoff
Collaborator

Fixed by #15224. I don't think this will be in the very next release, since that's already underway and should be out soon to fix a critical issue. More likely this will be released in about a month. Thanks @peteski22!
