Envoy can't start the egress listener(configured by Sidecar) on VM workload #50954

Open
2 tasks done
CadenGuo opened this issue May 9, 2024 · 2 comments
Labels
area/networking feature/Virtual-machine issues related with VM support

Comments

@CadenGuo

CadenGuo commented May 9, 2024

Is this the right place to submit this?

  • This is not a security vulnerability or a crashing bug
  • This is not a question about how to use Istio

Bug Description

I have a VM workload connected to a single-primary Istio cluster. For the VM workload, I am using a Sidecar resource to configure the ingress and egress listeners:

  • sidecar:
apiVersion: networking.istio.io/v1alpha3
kind: Sidecar
metadata:
  name: "mesh-east-vm"
  namespace: "mesh-east-vm-istio"
spec:
  workloadSelector:
    labels:
      app: "mesh-east-vm"
  ingress:
    - port:
        number: 21100
        protocol: GRPC
        name: grpc-ingress
      captureMode: NONE
      defaultEndpoint: 127.0.0.1:8087
    - port:
        number: 21101
        protocol: HTTP
        name: http-ingress
      captureMode: NONE
      defaultEndpoint: 127.0.0.1:8088
  egress:
    - port:
        number: 15050
        protocol: GRPC
        name: grpc-egress
      bind: 127.0.0.1
      hosts:
        - istio-workload/client-000.istio-workload.svc.cluster.local
  outboundTrafficPolicy:
    mode: REGISTRY_ONLY

After applying the Sidecar, I can see the ingress listeners are configured and Envoy is correctly listening on both 127.0.0.1:8087 and 127.0.0.1:8088. The egress listener, however, seems to be ignored by Envoy.

However, after checking Envoy's configuration, I found that Envoy did receive the configuration of the egress listener and the related upstream cluster from xDS.

# curl 127.0.0.1:15000/clusters 2>/dev/null | grep 15050
outbound|15050||server-000.istio-workload.svc.cluster.local::observability_name::outbound|15050||server-000.istio-workload.svc.cluster.local
outbound|15050||server-000.istio-workload.svc.cluster.local::default_priority::max_connections::4294967295
outbound|15050||server-000.istio-workload.svc.cluster.local::default_priority::max_pending_requests::4294967295
outbound|15050||server-000.istio-workload.svc.cluster.local::default_priority::max_requests::4294967295
outbound|15050||server-000.istio-workload.svc.cluster.local::default_priority::max_retries::4294967295
outbound|15050||server-000.istio-workload.svc.cluster.local::high_priority::max_connections::1024
outbound|15050||server-000.istio-workload.svc.cluster.local::high_priority::max_pending_requests::1024
outbound|15050||server-000.istio-workload.svc.cluster.local::high_priority::max_requests::1024
outbound|15050||server-000.istio-workload.svc.cluster.local::high_priority::max_retries::3
outbound|15050||server-000.istio-workload.svc.cluster.local::added_via_api::true
outbound|15050||server-000.istio-workload.svc.cluster.local::eds_service_name::outbound|15050||server-000.istio-workload.svc.cluster.local
outbound|15050||server-000.istio-workload.svc.cluster.local::10.169.159.23:21100::cx_active::0
outbound|15050||server-000.istio-workload.svc.cluster.local::10.169.159.23:21100::cx_connect_fail::0
outbound|15050||server-000.istio-workload.svc.cluster.local::10.169.159.23:21100::cx_total::0
outbound|15050||server-000.istio-workload.svc.cluster.local::10.169.159.23:21100::rq_active::0
outbound|15050||server-000.istio-workload.svc.cluster.local::10.169.159.23:21100::rq_error::0
outbound|15050||server-000.istio-workload.svc.cluster.local::10.169.159.23:21100::rq_success::0
outbound|15050||server-000.istio-workload.svc.cluster.local::10.169.159.23:21100::rq_timeout::0
outbound|15050||server-000.istio-workload.svc.cluster.local::10.169.159.23:21100::rq_total::0
outbound|15050||server-000.istio-workload.svc.cluster.local::10.169.159.23:21100::hostname::
outbound|15050||server-000.istio-workload.svc.cluster.local::10.169.159.23:21100::health_flags::healthy
outbound|15050||server-000.istio-workload.svc.cluster.local::10.169.159.23:21100::weight::1
outbound|15050||server-000.istio-workload.svc.cluster.local::10.169.159.23:21100::region::ap-southeast-1
outbound|15050||server-000.istio-workload.svc.cluster.local::10.169.159.23:21100::zone::ap-southeast-1c
outbound|15050||server-000.istio-workload.svc.cluster.local::10.169.159.23:21100::sub_zone::
outbound|15050||server-000.istio-workload.svc.cluster.local::10.169.159.23:21100::canary::false
outbound|15050||server-000.istio-workload.svc.cluster.local::10.169.159.23:21100::priority::0
outbound|15050||server-000.istio-workload.svc.cluster.local::10.169.159.23:21100::success_rate::-1
outbound|15050||server-000.istio-workload.svc.cluster.local::10.169.159.23:21100::local_origin_success_rate::-1

The Envoy config_dump also contains the dynamic_listener config, routeConfig and cluster config that correspond to the egress configuration in the Sidecar.

Envoy just doesn't actually start a listener on 127.0.0.1:15050, and I can't find any informative debug logs from Istio or Envoy.
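For anyone reproducing this, one way to confirm whether a listener socket is really open (independently of what the config dump says) is to attempt a TCP connection to it. The following is only a minimal sketch: the helper name `is_listening` is hypothetical, and the ports probed are simply the ones mentioned in this issue.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise,
        # so a refused connection does not raise an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # Ports taken from the Sidecar resources in this issue.
    for port in (15050, 15051):
        state = "listening" if is_listening("127.0.0.1", port) else "not listening"
        print(f"127.0.0.1:{port} is {state}")
```

Run on the VM itself, this reports whether anything accepts connections on the egress ports; it cannot tell which process owns the socket.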

Version

$ istioctl version
client version: 1.21.0
control plane version: 1.20.3
data plane version: 1.20.3 (10 proxies), 1.21.2 (1 proxies)

$ kubectl version
Client Version: v1.28.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.23.17-eks-7a52527

Additional Information

No response

@howardjohn
Member

Showing the full config dump would be handy here

@CadenGuo
Author

CadenGuo commented May 10, 2024

Hi @howardjohn. Thanks for the reply. I got the full config_dump, plus some findings. Please refer to the details below.

I changed the Sidecar object a bit so that it now has two different egress listeners (port 15050 and port 15051):

apiVersion: networking.istio.io/v1alpha3
kind: Sidecar
metadata:
  name: "mesh-east-vm"
  namespace: "mesh-east-vm-istio"
spec:
  workloadSelector:
    labels:
      app: "mesh-east-vm"
  ingress:
    - port:
        number: 21100
        protocol: GRPC
        name: grpc-ingress
      captureMode: NONE
      defaultEndpoint: 127.0.0.1:8087
    - port:
        number: 21101
        protocol: HTTP
        name: http-ingress
      captureMode: NONE
      defaultEndpoint: 127.0.0.1:8088
  egress:
    - port:
        number: 15050
        protocol: GRPC
        name: grpc-egress
      bind: 127.0.0.1
      captureMode: NONE
      hosts:
        - "istio-workload-payment/server-000.istio-workload-payment.svc.cluster.local"
        - "istio-workload-payment/server-001.istio-workload-payment.svc.cluster.local"
    - port:
        number: 15051
        protocol: HTTP
        name: http-egress
      bind: 127.0.0.1
      hosts:
        - "istio-workload-payment/client-000.istio-workload-payment.svc.cluster.local"
  outboundTrafficPolicy:
    mode: REGISTRY_ONLY

The major difference between them is that the egress listener on port 15050 has captureMode: NONE configured, while the other listener (on 15051) does not.

After applying this Sidecar, I ran systemctl restart istio on the VM. Envoy then started to listen on 15050, but not on 15051. The full config dump is attached below:
config_dump_istio_2024-05-10T03_23_56+0000.json

Having compared the dynamic listeners for port 15050 and port 15051, I found that the dynamic listener config for 15051 contains this line:

"bind_to_port": false

The listener for 15050 does not have this line.

I suppose this is the reason why Envoy only listens on 15050 but not on 15051. I don't know if this is the intended behaviour, but it would be nice if it were documented in the Sidecar config reference.
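To make this comparison repeatable, the dynamic listeners in a saved config dump can be summarized with a short script. This is a sketch under assumptions: it relies on the usual layout of Envoy's admin /config_dump output (a "configs" array containing a ListenersConfigDump section), the function name `summarize_listeners` is hypothetical, and the sample data is hand-written to mirror the two listeners described here rather than copied from the attached file.

```python
import json

LISTENERS_DUMP_TYPE = "type.googleapis.com/envoy.admin.v3.ListenersConfigDump"


def summarize_listeners(config_dump):
    """Return (name, address, port, binds) for each active dynamic listener."""
    rows = []
    for section in config_dump.get("configs", []):
        if section.get("@type") != LISTENERS_DUMP_TYPE:
            continue
        for entry in section.get("dynamic_listeners", []):
            listener = entry.get("active_state", {}).get("listener")
            if listener is None:
                continue  # skip entries that have no active state
            sock = listener.get("address", {}).get("socket_address", {})
            # When bind_to_port is absent, Envoy opens the socket itself.
            binds = listener.get("bind_to_port", True)
            rows.append(
                (listener.get("name"), sock.get("address"), sock.get("port_value"), binds)
            )
    return rows


# Hand-written sample (assumption): listener names and layout are illustrative.
sample = json.loads("""
{"configs": [{
  "@type": "type.googleapis.com/envoy.admin.v3.ListenersConfigDump",
  "dynamic_listeners": [
    {"name": "127.0.0.1_15050", "active_state": {"listener": {
      "name": "127.0.0.1_15050",
      "address": {"socket_address": {"address": "127.0.0.1", "port_value": 15050}}}}},
    {"name": "127.0.0.1_15051", "active_state": {"listener": {
      "name": "127.0.0.1_15051",
      "address": {"socket_address": {"address": "127.0.0.1", "port_value": 15051}},
      "bind_to_port": false}}}
  ]}]}
""")

for name, address, port, binds in summarize_listeners(sample):
    print(f"{name}: {address}:{port} bind_to_port={binds}")
```

To use it against a real dump, replace `sample` with `json.load(open(path))`, where `path` points at the file saved from 127.0.0.1:15000/config_dump.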

Another experiment
Based on the above setup, I later added captureMode: NONE to the 15051 egress listener in the Sidecar object and applied it. Inspecting the config_dump again, I found that the line "bind_to_port": false was gone from the 15051 dynamic listener. However, Envoy didn't start the 15051 listener until I ran systemctl restart istio.

So does this mean that any change of captureMode in the Sidecar object requires a restart of the istio-sidecar on the VM workload to actually take effect? Is this also intended? I would expect that as long as Envoy receives the latest configuration from xDS, it should react by starting the new listeners.
