(eks.AlbController): 'UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress' when AlbController chart install is attempted #19705
-
@Aidan-Pasquale99 Thank you for raising the issue. I am also facing the same issue.
-
Seeing the same thing in the Python CDK using e.g.
cluster = eks.Cluster(
    self,
    "some-id",
    cluster_name="some-cluster-name",
    vpc=vpc,
    vpc_subnets=[ec2.SubnetSelection(...)],
    default_capacity=4,
    version=eks.KubernetesVersion.V1_21,
    alb_controller=eks.AlbControllerOptions(version=eks.AlbControllerVersion.V2_3_1),
)
-
Is there any workaround? I'm struggling with this.
-
@peterwoodworth I noticed this was converted from an issue to a discussion. Is additional information or exploration needed for it to be treated as an "issue"?
-
I am also struggling with this. Has anyone found a solution yet?
-
I am also facing the same issue and struggling with this. I am using Kubernetes version 1.22.
-
Same problem. I have tried different versions (V2_3_1 and V2_4_1) and neither works: Received response status [FAILED] from custom resource.
-
Hi team, I am experiencing the same issue. Do we have any updates on this?
My Dependencies
Error
-
Ran into the same issue! Why is this not a bug?
-
I'm having this same issue. Not sure why @peterwoodworth converted this from an issue to a discussion.
-
Related to #22005. I will try to reproduce this in my environment and see what I can find out.
-
I was able to deploy successfully with no error; see #22005 (comment). Is this issue still valid?
-
I had the same problem. The solution that fixed it for me:
-
Anyone still having this issue now? Can someone share the full CDK code to reproduce it?
-
The workaround for me was to either specify some default capacity or call AddNodegroupCapacity on the cluster, followed by new AlbController with a dependency on the nodes, i.e. in C#:
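The ordering described in this comment (node group first, then an AlbController that explicitly depends on it) can be sketched in Python, the language used elsewhere in this thread. This is a minimal sketch under assumptions, not the commenter's code: it assumes the cluster was created without the `alb_controller` option, and the helper name `add_alb_controller_after_nodes` and the construct IDs `"AlbNodes"` and `"AlbController"` are hypothetical. The controller version is the one mentioned in the thread.

```python
def add_alb_controller_after_nodes(scope, cluster, *, desired_size=2):
    """Add a managed node group, then an ALB controller that waits for it.

    `scope` is the stack (or any construct) and `cluster` an eks.Cluster
    created WITHOUT the alb_controller option (assumption).
    """
    # Imported lazily so this helper can be loaded without aws-cdk-lib installed.
    from aws_cdk import aws_eks as eks

    # Capacity is added explicitly instead of relying on default_capacity.
    nodegroup = cluster.add_nodegroup_capacity("AlbNodes", desired_size=desired_size)

    # The controller is created as a separate construct rather than through
    # the cluster's alb_controller option.
    controller = eks.AlbController(
        scope,
        "AlbController",
        cluster=cluster,
        version=eks.AlbControllerVersion.V2_4_1,
    )

    # Explicit dependency: the Helm chart is only installed once nodes exist.
    controller.node.add_dependency(nodegroup)
    return controller
```

Inside a stack's `__init__` this would be called as `add_alb_controller_after_nodes(self, cluster)`. Whether this resolves the error in every setup is not confirmed by the thread; it only mirrors the ordering reported to work above.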
-
The below Python 3.9 code triggers the behavior @pahud...
self.eks_cluster[app] = eks.Cluster(self, f"{app}-eks",
    alb_controller=eks.AlbControllerOptions(
        version=eks.AlbControllerVersion.V2_4_1,
    ),
    cluster_logging=[eks.ClusterLoggingTypes.API,
                     eks.ClusterLoggingTypes.AUDIT,
                     eks.ClusterLoggingTypes.AUTHENTICATOR,
                     eks.ClusterLoggingTypes.SCHEDULER],
    cluster_name=f"{app}",
    default_capacity=0,
    output_cluster_name=True,
    output_masters_role_arn=True,
    place_cluster_handler_in_vpc=True,
    tags=addtl_tags,
    version=eks.KubernetesVersion.V1_21,
    vpc=self.vpc,
)
Also tried:
For all cases, same results afterwards:
This is one error out of the many tests, taken from the CloudWatch log stream:
I suppose this is happening within the Lambda function (I am not sure of the calling order) and might be as simple as a timeout caused by some missing rule somewhere. However, the documentation had me thinking that the construct would build proper security groups automatically. The VPC was created at
If I attempt to create the EKS cluster without the AlbController, construction succeeds. I can provide more code if this would help.
Best regards -lem
-
@nerdlem From what I can see in my recent test, the following works if you specify some default capacity.
eks.Cluster(self, 'Cluster',
    vpc=vpc,
    alb_controller=eks.AlbControllerOptions(
        version=eks.AlbControllerVersion.V2_4_1,
    ),
    cluster_logging=[eks.ClusterLoggingTypes.API,
                     eks.ClusterLoggingTypes.AUDIT,
                     eks.ClusterLoggingTypes.AUTHENTICATOR,
                     eks.ClusterLoggingTypes.SCHEDULER],
    version=eks.KubernetesVersion.V1_24,
    default_capacity=2,
    output_cluster_name=True,
    output_masters_role_arn=True,
    place_cluster_handler_in_vpc=True,
    kubectl_layer=kubectl.KubectlLayer(self, 'KubectlLayer')
)
However, if you set default_capacity=0, the deployment will not complete. Please try again with some default capacity or
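The combination reported here as never completing (the cluster-level `alb_controller` option together with `default_capacity=0`) could be caught before deploying with a small fail-fast check. This is only a sketch: the function name `require_capacity_for_alb` and the constant `CDK_DEFAULT_CAPACITY` are hypothetical, the check encodes nothing beyond the observation in this thread, and it is not meant for FargateCluster setups, which behave differently according to another comment here.

```python
from typing import Optional

# eks.Cluster provisions 2 instances when default_capacity is left unset.
CDK_DEFAULT_CAPACITY = 2


def require_capacity_for_alb(default_capacity: Optional[int],
                             alb_controller_requested: bool) -> None:
    """Raise early for the ALB-controller-without-nodes combination.

    Hypothetical guard: it mirrors the thread's observation only and does
    not inspect a real cluster.
    """
    effective = CDK_DEFAULT_CAPACITY if default_capacity is None else default_capacity
    if alb_controller_requested and effective == 0:
        raise ValueError(
            "alb_controller is set but default_capacity is 0: the Helm install "
            "has no nodes to run on; add capacity first"
        )
```

Calling it with the same values passed to `eks.Cluster` makes a misconfiguration fail at synth time instead of after a long CloudFormation wait.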
-
Today I tried to create a FargateCluster (v1.24) with ALB Controller (v2.4.1), which forced me to add a kubectlLayer, and that failed with the above-mentioned 'operation in progress' error. Next I tried to create a FargateCluster v1.21 with the same ALB Controller version, a combination which does not require an explicit kubectlLayer to be defined. All other settings and properties were the same. This combination gave me a successful setup. I hope this information will help someone more knowledgeable than me to debug the issue.
-
Check out this working sample for the workaround:
-
@pahud thank you for the investigation. However, what is your proposed workaround for scenarios that require fine-grained control over the node group, i.e. a custom node group as opposed to default capacity, for example when we need to specify a node group role or change sizing? From testing, it's not possible to zero out default capacity, and it's also not possible to simply add capacity after the fact (see comment). Thanks.
-
General Issue
Observing 'UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress' error when installing AlbController chart via construct
The Question
I am trying to install the ALB Controller Helm chart via my CDK code using the AlbController construct as follows:
I run the above CDK code and get the following Helm error in the CloudFormation events output, as well as in the Lambda function logs:
I've looked through this and this, but the solutions do not apply here: there appears to be no known state of the AlbController release, so I am unable to perform any rollback commands.
CDK CLI Version
2.17.0 (build f9cd009)
Framework Version
No response
Node.js Version
No response
OS
No response
Language
Java
Language Version
No response
Other information
No response