cdk deploy fails: indeterminate cause #25806

Open
garethjudson opened this issue Jun 1, 2023 · 19 comments · Fixed by #25846
Labels
@aws-cdk/aws-appsync (Related to AWS AppSync), bug (This issue is a bug.), p1

Comments

@garethjudson

garethjudson commented Jun 1, 2023

Describe the bug

Mid way through a cdk deployment the following error is raised:

❌ Deployment failed: Error: Unable to make progress anymore among:
<...>
at WorkGraph.updateReadyPool (/builds/.../node_modules/aws-cdk/lib/index.js:400:138218)
at start (/builds/.../node_modules/aws-cdk/lib/index.js:400:136061)
at /builds/.../node_modules/aws-cdk/lib/index.js:400:136488
at runMicrotasks ()
at processTicksAndRejections (node:internal/process/task_queues:96:5)
Unable to make progress anymore among:
<...>

Where '<...>' is a printed list of every stack, similar to:
wwwStack := completed stack, ... yyyStack := pending asset-publish, wwwStack := pending stack

This issue occurs on v2.81.0 (TypeScript), but downgrading to 2.76.0 resolves it.
The output gives no detail from which to determine a cause.
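
For readers unfamiliar with this kind of error, here is a minimal sketch of how a dependency-ordered scheduler can reach a state where nothing is ready but work remains. It is a simplified, hypothetical model: the names `WorkNode` and `runWorkGraph` are assumptions made for illustration, and this is not the CDK CLI's actual `WorkGraph` code.

```typescript
// Hypothetical, simplified model of a deployment work graph.
// This is an illustration only, not the CDK CLI implementation.
type NodeState = "pending" | "completed";

interface WorkNode {
  id: string;
  dependencies: string[];
  state: NodeState;
}

// Repeatedly completes every node whose dependencies are all completed.
// Throws if pending nodes remain but none of them is ready.
function runWorkGraph(nodes: WorkNode[]): string[] {
  const byId = new Map<string, WorkNode>();
  for (const n of nodes) {
    byId.set(n.id, n);
  }
  const order: string[] = [];

  for (;;) {
    const pending = nodes.filter((n) => n.state === "pending");
    if (pending.length === 0) {
      return order;
    }

    // A node is ready only if every dependency exists and is completed.
    const ready = pending.filter((n) =>
      n.dependencies.every((d) => byId.get(d)?.state === "completed")
    );

    if (ready.length === 0) {
      const summary = nodes.map((n) => `${n.id} := ${n.state}`).join(", ");
      throw new Error(`Unable to make progress anymore among: ${summary}`);
    }

    for (const n of ready) {
      n.state = "completed";
      order.push(n.id);
    }
  }
}
```

In this model the error fires for a dependency cycle, and also for a dependency that is missing from the graph entirely, which is one reason the message alone does not pinpoint a cause.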

Expected Behavior

Either the stack deploys successfully or an error message with a clear description of the cause is displayed

Current Behavior

Deployment of stacks is partially completed, and then the deployment fails with:

❌ Deployment failed: Error: Unable to make progress anymore among: <...>

Reproduction Steps

I don't know how to reproduce this other than by running my own environment deployment, because I can't determine the cause of the issue.
Our environment is ~50 CDK stacks with a relatively straightforward dependency order.

Happy to provide more detail if I can get help with how best to do so.
I suspect it's related to dependent stacks, but I have no idea.

Possible Solution

Downgrading to version 2.76.0

Additional Information/Context

I get this issue in v2.81.0 of CDK (with TypeScript), and I suspect it is related to dependent stacks.

In my environment we have many stacks that depend on each other. We set this up as a linear dependency, where each stack depends on the immediately prior stack.

We found this the easiest way to manage dependency order, since individual devs may not have the context to determine it easily. This way we don't have to think about dependency order too hard: earlier means before anything later... easy.
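
A linear chain like the one described can be sketched as follows. This is a minimal illustration under stated assumptions, not the reporter's actual code: `chainInOrder` and `FakeStack` are hypothetical names, and the only thing assumed about a stack is that it exposes an `addDependency(target)` method, as the CDK `Stack` class does.

```typescript
// Minimal structural type: anything exposing addDependency(target).
// The CDK Stack class has a method of this shape; everything else here
// is a hypothetical stand-in for illustration.
interface HasDependencies<T> {
  addDependency(target: T): void;
}

// Makes each entry depend on the entry immediately before it,
// so list order becomes deployment order.
function chainInOrder<T extends HasDependencies<T>>(stacks: T[]): void {
  for (let i = 1; i < stacks.length; i++) {
    stacks[i].addDependency(stacks[i - 1]);
  }
}

// Stand-in used only to demonstrate the helper without the CDK library.
class FakeStack implements HasDependencies<FakeStack> {
  readonly name: string;
  readonly dependsOn: string[] = [];

  constructor(name: string) {
    this.name = name;
  }

  addDependency(target: FakeStack): void {
    this.dependsOn.push(target.name);
  }
}

const demoStacks = [
  new FakeStack("first"),
  new FakeStack("second"),
  new FakeStack("third"),
];
chainInOrder(demoStacks);
```

With N stacks this produces N - 1 explicit dependency edges, so a chain of ~50 stacks deploys strictly one after another.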
If this were causing the issue, we could change it; however, before expending development effort, I would like to understand the cause. This issue looks suspiciously similar to #25714, given that it is visible in 2.81.0 but not in 2.76.0.

Downgrading to version 2.76.0 resolves the issue.
Happy to help with any additional information.

Thanks a bunch.

CDK CLI Version

2.81.0 (build bd920f2)

Framework Version

2.81.0

Node.js Version

18.16.0

OS

macOS, Alpine Linux (node:hydrogen-alpine container)

Language

Typescript

Language Version

5.0.4

Other information

Could be related to #25714.

I commented on that issue, but realised the comment was at the end of a closed issue, so I thought I had better raise a new one.

@garethjudson garethjudson added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Jun 1, 2023
@github-actions github-actions bot added the @aws-cdk/aws-appsync Related to AWS AppSync label Jun 1, 2023
@rix0rrr
Contributor

rix0rrr commented Jun 1, 2023

Hi @garethjudson. Thanks for the report!

A copy of your cdk.out/manifest.json would help in debugging this. You can send it to huijbers ~A~T~ amazon.nl if you are uncomfortable posting it anywhere here.

Thanks!

@mericod

mericod commented Jun 1, 2023

Hello @garethjudson
What do you mean by "v1.81.0 (typescript)" and "given the issue is visible in 1.81.0, but not 1.76.0"? As far as I can see, that matches neither the CDK version (2.81) nor the TypeScript version (5.0.4).
Was it a typo?

@peterwoodworth peterwoodworth added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. p1 and removed needs-triage This issue or PR still needs to be triaged. labels Jun 1, 2023
@nuzayets

nuzayets commented Jun 1, 2023

We're also experiencing this issue with cdk 2.81; we rolled back to 2.71 for now. TypeScript 4.9.x. Node 18.15.0.

We also have a high # of stacks with a high # of dependencies.

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Jun 1, 2023
@garethjudson
Author

Correct, it was a typo; it should be v2.81.0.

I entered the versions correctly in the subsequent fields, and I have updated the issue text.

@garethjudson
Author

> Hi @garethjudson. Thanks for the report!
>
> A copy of your cdk.out/manifest.json would help in debugging this. You can send it to huijbers ~A~T~ amazon.nl if you are uncomfortable posting it anywhere here.
>
> Thanks!

Unfortunately, I haven't got the cdk.out/manifest.json from my failed build; sorry about this.
I tried to reproduce subsequently, by upgrading to v2.81.0 again but when I ran the deployment, it passed 🤦‍♀️ of course it did.

My only thought here is that we were jumping a number of versions: we were upgrading from 2.24.1 to 2.81.0.

In that case, maybe applying the update from 2.24.1 to 2.76.0 fixed the root cause?
I'm not sure if I can offer any more help, but if there is something, let me know!

@rix0rrr
Contributor

rix0rrr commented Jun 2, 2023

> but when I ran the deployment, it passed 🤦‍♀️ of course it did.

Of course 😆. That's unfortunate. Any intermediary versions you've been using shouldn't make a difference to this issue if it's in the area I think it is in (but of course, this being software, "should" is the operative word here).

For anyone reading who runs into the same issue, a copy of manifest.json emailed to the above address would really help a lot in tracking this down. In the meantime, I'll see if I can find and print the dependency cycle if the work graph runs into this problem. That should help us make sense of what's going on if it happens again.
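
As an illustration of what "find and print the dependency cycle" could look like, here is a minimal depth-first search sketch. It is not the code from the referenced commit: `findCycle` and the plain-object graph shape are assumptions made for this example.

```typescript
// Hypothetical sketch: find one dependency cycle in a graph given as
// { nodeId: [ids it depends on] }. Not the CDK CLI's implementation.
function findCycle(deps: Record<string, string[]>): string[] | undefined {
  const done = new Set<string>(); // fully explored, known to be cycle-free
  const onPath = new Set<string>(); // nodes on the current DFS path
  const path: string[] = [];

  function visit(id: string): string[] | undefined {
    if (onPath.has(id)) {
      // Reached a node already on the current path: that closes a cycle.
      return path.slice(path.indexOf(id)).concat(id);
    }
    if (done.has(id)) {
      return undefined;
    }
    onPath.add(id);
    path.push(id);
    for (const dep of deps[id] ?? []) {
      const cycle = visit(dep);
      if (cycle) {
        return cycle;
      }
    }
    path.pop();
    onPath.delete(id);
    done.add(id);
    return undefined;
  }

  for (const id of Object.keys(deps)) {
    const cycle = visit(id);
    if (cycle) {
      return cycle;
    }
  }
  return undefined;
}

// Example: the first id is repeated at the end to show where the cycle closes.
const exampleCycle = findCycle({ a: ["b"], b: ["c"], c: ["a"], d: [] });
console.log(exampleCycle ? exampleCycle.join(" -> ") : "no cycle");
```

Printing only the nodes on the cycle, rather than every remaining node, keeps the message readable for graphs with dozens of stacks.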

rix0rrr added a commit that referenced this issue Jun 2, 2023
To help with diagnosing #25806, if the work graph can't make any
progress anymore because of a dependency cycle, print the cycle that was
found instead of all the remaining nodes.
@rix0rrr
Contributor

rix0rrr commented Jun 2, 2023

I've had someone send me a manifest.json, and it doesn't reproduce on my machine. I wonder if it might be due to a race condition. If the CLI throws this error and you run while true; do npx cdk deploy --all; done (bash), does it eventually succeed? (Or does it get stuck in a different place?)

@graemevwilson

graemevwilson commented Jun 2, 2023

I have had issues with cdk deploy since 2.80.0. In 2.81.0 I get the error in the OP. In 2.80.0 there's no error: just one of my 5 stacks deploys, and nothing happens for the other stacks.

I have a workaround. Say I have 3 stacks: stack3 depends on stack2, and stack2 depends on stack1. If I run:

cdk deploy stack1
cdk deploy stack1 stack2 stack3

then all of the stacks will deploy successfully. Similarly I've seen that this works:

cdk deploy stack1
cdk deploy --all

This works in 2.80.0 and 2.81.0. I also don't see any errors or unusual behaviour in 2.79.0 so I suspect the defect originates in 2.80.0.

mergify bot pushed a commit that referenced this issue Jun 2, 2023
To help with diagnosing #25806, if the work graph can't make any progress anymore because of a dependency cycle, print the cycle that was found instead of all the remaining nodes.
@rix0rrr
Contributor

rix0rrr commented Jun 5, 2023

I've identified the problem. The next release will have the fix.

@mergify mergify bot closed this as completed in 8b97bdf Jun 5, 2023
@mergify mergify bot closed this as completed in #25846 Jun 5, 2023
@github-actions

github-actions bot commented Jun 5, 2023

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

@jogold
Contributor

jogold commented Jun 5, 2023

@rix0rrr just tried with the changes from #25846 (using patch-package) and I'm still getting the error Unable to make progress anymore among: ...

@rix0rrr
Contributor

rix0rrr commented Jun 5, 2023

Thanks for testing @jogold! That's annoying. Do you have anything you can share so that I can repro?

@rix0rrr rix0rrr reopened this Jun 5, 2023
@jogold
Contributor

jogold commented Jun 5, 2023

> Thanks for testing @jogold! That's annoying. Do you have anything you can share so that I can repro?

Will send you my manifest.json.

@rix0rrr
Contributor

rix0rrr commented Jun 5, 2023

And your *.asset.json files as well please.

@jogold
Contributor

jogold commented Jun 5, 2023

> And your *.asset.json files as well please.

sent

@rix0rrr
Contributor

rix0rrr commented Jun 6, 2023

Thanks! Unfortunately, it didn't help: with the code above, there are no cycles anymore in the graph that's built from your manifest.json (or at least, as far as I could tell). Can you send me the error message as well? It should include the cycle that it found.

@jogold
Contributor

jogold commented Jun 7, 2023

@rix0rrr trying now with v2.83.0, will let you know if this can be closed.

@jogold
Contributor

jogold commented Jun 7, 2023

@rix0rrr the issue is fixed for me in v2.83.0. Thanks and sorry again for the confusion with the patch.

7 participants