
NestJS gRPC Kubernetes: Microservice stuck on initialization when run inside a Kubernetes cluster #9726

Closed
TMInnovations opened this issue Jun 4, 2022 · 9 comments
Labels
needs triage: This issue has not been looked into

Comments

@TMInnovations

TMInnovations commented Jun 4, 2022

Is there an existing issue for this?

  • I have searched the existing issues

Current behavior

Starting the gRPC microservice container with url: 0.0.0.0:5000

import { MicroserviceOptions, Transport } from '@nestjs/microservices';
import { join } from 'path';

app.connectMicroservice<MicroserviceOptions>({
  transport: Transport.GRPC,
  options: {
    // GRPC_ITEM_PACKAGE_URL is set to '0.0.0.0:5000' here
    url: process.env.GRPC_ITEM_PACKAGE_URL,
    package: 'item',
    protoPath: join(__dirname, 'proto/item.proto'),
  },
});

in a Docker environment produces these logs:

[Nest] 24352  - 04.06.2022, 11:36:37     LOG [InstanceLoader] UserMessageModule dependencies initialized +3ms
[Nest] 24352  - 04.06.2022, 11:36:37     LOG [InstanceLoader] GraphQLModule dependencies initialized +3ms
[Nest] 24352  - 04.06.2022, 11:36:37     LOG [NestMicroservice] Nest microservice successfully started +132ms
[Nest] 24352  - 04.06.2022, 11:36:37     LOG [RoutesResolver] AppController {/}: +10ms
[Nest] 24352  - 04.06.2022, 11:36:37     LOG [RouterExplorer] Mapped {/health, GET} route +5ms

When running it inside a Kubernetes cluster, the logs seem to be stuck after the second line:

[Nest] 18 - 06/04/2022, 9:04:38 AM LOG [InstanceLoader] UserMessageModule dependencies initialized +1ms
[Nest] 18 - 06/04/2022, 9:04:38 AM LOG [InstanceLoader] GraphQLModule dependencies initialized +0ms

Any ideas why this would happen without any error message (timeout or similar)?
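
One way to make a hang like this visible is to race the startup against a timeout. This is a hypothetical diagnostic sketch, not from the original report; it assumes a standard bootstrap function and NestJS v8's promise-based startAllMicroservices():

import { NestFactory } from '@nestjs/core';
import { MicroserviceOptions, Transport } from '@nestjs/microservices';
import { join } from 'path';
import { AppModule } from './app.module';

async function bootstrap() {
  const app = await NestFactory.create(AppModule);
  app.connectMicroservice<MicroserviceOptions>({
    transport: Transport.GRPC,
    options: {
      url: process.env.GRPC_ITEM_PACKAGE_URL,
      package: 'item',
      protoPath: join(__dirname, 'proto/item.proto'),
    },
  });

  // If a transport never finishes connecting, this race rejects after
  // 30 seconds instead of hanging silently with no output.
  const timeout = new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error('Microservice start timed out')), 30_000),
  );
  await Promise.race([app.startAllMicroservices(), timeout]);
  await app.listen(3000);
}

bootstrap().catch((err) => {
  console.error('Bootstrap failed:', err);
  process.exit(1);
});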

Minimum reproduction code

Steps to reproduce

Run any NestJS gRPC microservice that provides a gRPC package with url: 0.0.0.0:5000 inside a Kubernetes cluster.

Expected behavior

The microservice should be reachable on port 5000 from other services. If there is a problem, an error log should be produced.

Package

  • I don't know. Or some 3rd-party package
  • @nestjs/common
  • @nestjs/core
  • @nestjs/microservices
  • @nestjs/platform-express
  • @nestjs/platform-fastify
  • @nestjs/platform-socket.io
  • @nestjs/platform-ws
  • @nestjs/testing
  • @nestjs/websockets
  • Other (see below)

Other package

No response

NestJS version

No response

Packages versions

"@nestjs/microservices": "^8.4.4",

Node.js version

16.14.2

In which operating systems have you tested?

  • macOS
  • Windows
  • Linux

Other

No response

TMInnovations added the needs triage label Jun 4, 2022
@Tony133
Contributor

Tony133 commented Jun 4, 2022

Hi @TMInnovations, for this type of question, please use the support Discord channel.

The NestJS Core team uses GitHub to track bug reports, feature requests, and regressions.

@TMInnovations
Author

Now that I have found the cause of the issue, I think this really is a bug.

I added a RabbitMQ transporter to the application. However, the URL that I provided was wrong. Instead of throwing an error, the initialization process just hung. This is horrible to debug in production, which is exactly where it happened to me today.
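
For context, a minimal sketch of the kind of RMQ transporter configuration involved; the queue name and environment variable below are hypothetical, not taken from the original report:

import { MicroserviceOptions, Transport } from '@nestjs/microservices';

// If RABBITMQ_URL points at an unreachable or mistyped broker address,
// the reported behavior is that bootstrap hangs here with no error.
app.connectMicroservice<MicroserviceOptions>({
  transport: Transport.RMQ,
  options: {
    urls: [process.env.RABBITMQ_URL], // e.g. 'amqp://user:pass@rabbitmq:5672'
    queue: 'items_queue', // hypothetical queue name
  },
});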

@micalevisk
Member

So to reproduce this, we just need to supply any invalid URL to options.url, regardless of the environment?

@Tony133
Contributor

Tony133 commented Jun 5, 2022

@TMInnovations, if you can create a minimal reproduction of the issue in a clonable Git repository, it would be of great help to the Nest Core Team.

@TMInnovations
Author

TMInnovations commented Jun 5, 2022

@micalevisk Yes, I tried it in local, Docker, and Kubernetes environments. Initialization gets stuck and no error is thrown. It is really hard to debug when it happens for the first time, and on your production server at that (a faulty env var for the broker connection URL).

@Tony133 I will do my best to create a minimal reproduction soon.

@delucca
Contributor

delucca commented Jun 9, 2022

@TMInnovations as mentioned by @micalevisk, your issue is related to #9749. It seems that when the connection fails within the transport, we don't resolve the server promise, and therefore the bootstrap of the application hangs.

I'll work on a solution for this 😄
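
A simplified illustration of that failure mode, not the actual NestJS source (all names here are hypothetical):

// Hypothetical sketch: if the transport's error path never settles the
// promise handed back to bootstrap, whoever awaits it waits forever.
declare function connectToBroker(): Promise<void>;

function listen(): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    connectToBroker()
      .then(() => resolve())
      .catch(() => {
        // Bug pattern: the error is swallowed and neither resolve()
        // nor reject() is called, so `await listen()` hangs silently.
      });
  });
}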

@delucca
Contributor

delucca commented Jun 10, 2022

@TMInnovations unfortunately I was not able to reproduce your issue 😢. Also, your issue is not related to #9749 as we expected: after further investigation, I found that that issue was only related to how RabbitMQ handles failed connections.

Here is what I did to try to reproduce your problem:

  • I've used a boilerplate app and added a gRPC microservice to it pointing to an invalid server (so the gRPC server is unreachable), but after launching NestJS it works as expected (it doesn't hang)
  • I've tried the same, but pointing to an existing port that isn't a gRPC service (to evaluate a misconfiguration on your cluster), and NestJS throws an error as expected
  • I've created a Minikube instance, deployed a sample gRPC server, and then deployed that same app to test it within a "Kubernetes-like" context, and it works (as you can see here; a sketch of the configuration shape follows below)
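
As referenced above, a sketch of the configuration shape those attempts exercised; the invalid value below is a placeholder, since the exact values used aren't given:

import { MicroserviceOptions, Transport } from '@nestjs/microservices';
import { join } from 'path';

// Hypothetical repro sketch: supply an invalid value to options.url and
// observe whether the bootstrap hangs or throws.
app.connectMicroservice<MicroserviceOptions>({
  transport: Transport.GRPC,
  options: {
    url: 'invalid-host:5000', // placeholder invalid URL (assumption)
    package: 'item',
    protoPath: join(__dirname, 'proto/item.proto'),
  },
});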

Could you help me reproduce it? 😄
I think the best way, in this case, is for you to provide the two Docker images plus their manifests, so we can deploy them into a Minikube instance to reproduce and debug the issue.

@TMInnovations
Author

@delucca As I wrote in one of my comments above, my problem has nothing to do with gRPC, only with RMQ. Actually, I had the exact same issue as #9749 and just was not sure where the problem came from (due to no logs or errors).
Thank you for investigating so thoroughly!

@delucca
Contributor

delucca commented Jun 12, 2022

Oh! Sorry. For some reason I understood that we also had an issue with gRPC 🤦🏻

Regarding RabbitMQ, I've already solved that issue in #9751; we're just discussing the optimal way to handle the reconnection, but we're probably going to merge it soon 😄
