Ambient mesh: upstream connect error or disconnect/reset before headers when using Gateway API #50983
Comments
Didn't get a chance to try to fully reproduce, but I fixed a bug very similar to this recently. Worth giving https://github.com/istio/istio/releases/tag/1.22.0-rc.0 a shot. Another question is why the gateway has ztunnel capture enabled on it. This sounds like what 385708e was meant to fix (the gateway doesn't need ztunnel capture since it can natively do all the functionality). That commit should be in beta.1. 291cba8 is after beta.1 and touches the same code, though I doubt it's fixing this.
Huh, the internal view (configz) of the GW has CNI shows
Seems like a bug. Will try to replicate. Thanks!
Thanks for looking into it! I tested with |
I tested it by removing the |
Very interesting, it doesn't reproduce for me on rc0 or beta1 unless I run it in a script instead of interactively. Maybe a race condition.
Hmm, it's pretty rare for me to trigger it, actually. I got it once but am struggling to do so again (which again hints at a race).
I originally got the issue with Flux, and it reproduced pretty reliably for me there, but I wanted to share a more minimal example.
I've reproduced a few times on the beta but not yet on the RC. Will try more tomorrow. I think its
Really frustrating debugging this because I can only reproduce it 1 in 10 times or so... I know there is a bug around mutating an object in a shared cache that we shouldn't mutate. I just cannot verify it 100% since I cannot reliably reproduce 😬. I suppose I can just send a fix for that and we can see how it goes.
Fixes istio#50983, I think. Basically, we have logic to use `infra.Labels || meta.Labels`. We modify `meta.Labels`. This is wrong; we should modify AFTER we pick which set of labels to use. Also, we need to deep-copy anything we mutate, since it is accessing a shared object cache.
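To make the description above concrete, here is a minimal Go sketch of the pattern. It is not the actual Istio code: the `Gateway` struct, the `buggyLabels` and `fixedLabels` functions, and the `example.io/injected` label key are all hypothetical stand-ins chosen for illustration. The only things taken from the fix description are the "infra labels, else metadata labels" selection, the order of the mutation, and the need to copy before mutating an object that lives in a shared cache.

```go
package main

import "fmt"

// Gateway is a hypothetical, simplified stand-in for a cached Gateway object.
type Gateway struct {
	MetaLabels  map[string]string // stands in for metadata labels
	InfraLabels map[string]string // stands in for infrastructure labels
}

// buggyLabels mirrors the problem described above: it writes the extra
// labels into the cached metadata labels BEFORE choosing which set to use.
// It assumes MetaLabels is non-nil.
func buggyLabels(gw *Gateway, extra map[string]string) map[string]string {
	for k, v := range extra {
		gw.MetaLabels[k] = v // mutates the shared cache object
	}
	if gw.InfraLabels != nil {
		return gw.InfraLabels // the extra labels are lost on this path
	}
	return gw.MetaLabels
}

// fixedLabels picks the label set first, copies it, and only then mutates
// the copy, so the cached object is never modified.
func fixedLabels(gw *Gateway, extra map[string]string) map[string]string {
	src := gw.MetaLabels
	if gw.InfraLabels != nil {
		src = gw.InfraLabels
	}
	out := make(map[string]string, len(src)+len(extra))
	for k, v := range src {
		out[k] = v
	}
	for k, v := range extra {
		out[k] = v
	}
	return out
}

func main() {
	extra := map[string]string{"example.io/injected": "true"} // hypothetical key

	cached := &Gateway{
		MetaLabels:  map[string]string{"app": "gateway"},
		InfraLabels: map[string]string{"team": "platform"},
	}
	fmt.Println("fixed result:", fixedLabels(cached, extra))
	fmt.Println("cache after fixed:", cached.MetaLabels, cached.InfraLabels)

	fmt.Println("buggy result:", buggyLabels(cached, extra))
	fmt.Println("cache after buggy:", cached.MetaLabels, cached.InfraLabels)
}
```

In the buggy variant the returned labels can miss the injected entry whenever infrastructure labels are set, and the cached object changes as a side effect, so the result depends on who read the cache first. That would be consistent with the intermittent, race-like reproduction reported in this thread, though the sketch does not prove that this is the cause.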
I think you can work around this (infrastructure.annotations -> metadata), if I am right about #51013.
Fixes #50983, I think. Basically, we have logic to use `infra.Labels || meta.Labels`. We modify `meta.Labels`. This is wrong; we should modify AFTER we pick which set of labels to use. Also, we need to deep-copy anything we mutate, since it is accessing a shared object cache. Co-authored-by: John Howard <john.howard@solo.io>
Thank you for the investigation and the fix. Was this included in 1.22, or should I use the workaround for now?
Is this the right place to submit this?
Bug Description
I have been trying to get Ambient Mesh working on one of my side projects at home. I'm using version `1.22.0-beta.1` because 1.21 had some issues with init containers that were fixed in this version.
Everything seems to work fine up until the point I try to set up a Gateway to my services. Without ambient mesh the Gateway works properly, but when I turn on ambient for the gateway and service namespaces, whenever I try to make a connection I get the following in the `ztunnel` logs (and the response that I included in the issue title):
Removing the ambient annotation on either the GW namespace or the service namespace fixes the issue.
You can use the following script and `kind` configuration to reproduce the issue (I'm using rootless podman for the cluster; you might need to change the LB IPs to be on the same subnet):
kind.config.yaml
repro.sh
curl
Version
Additional Information
bug-report.tar.gz