Dial fails with NNGException: Connection shutdown in kubernetes #116

Open
twisteroidambassador opened this issue Jul 7, 2023 · 0 comments

I started seeing this problem after migrating some apps to Kubernetes.

My original bare-metal setup looks like this: there is one instance of the "responder" app, listening on a Rep0 socket, and several instances of the "requester" app, dialing the responder with Req0 sockets. All of these instances run on the same host machine. Every day, on a timer, the requester instances start first, and the responder starts a few minutes later. The requester's code looks like this:

import contextlib

import pynng


async def do_work():
    async with contextlib.AsyncExitStack() as stack:
        req = stack.enter_context(pynng.Req0(dial='tcp://responder:7470'))
        while True:
            await req.asend(b'Hello')
            resp = await req.arecv()
            # do stuff

It was never a problem for the requesters to start dialing before the responder started listening. The Req0 socket simply fails the initial synchronous dial, falls back to asynchronous dialing, and eventually connects.

Then I had to migrate this setup to Kubernetes. I created a responder deployment with one pod, a responder service pointing to the Rep0 port of the responder pod, and a requester deployment with several pods. The requesters dial the service address of the responder.

In this setup, there's a chance that the requesters' dialing attempts fail outright:

File "/app/requester.py", line 54, in do_work
  req = stack.enter_context(pynng.Req0(dial=f'tcp://{responder_host}:{responder_port}'))
File "/app/venv/lib/python3.9/site-packages/pynng/nng.py", line 938, in __init__
  super().__init__(**kwargs)
File "/app/venv/lib/python3.9/site-packages/pynng/nng.py", line 349, in __init__
  self.dial(dial, block=block_on_dial)
File "/app/venv/lib/python3.9/site-packages/pynng/nng.py", line 374, in dial
  return self.dial(address, block=True)
File "/app/venv/lib/python3.9/site-packages/pynng/nng.py", line 371, in dial
  return self._dial(address, flags=0)
File "/app/venv/lib/python3.9/site-packages/pynng/nng.py", line 390, in _dial
  check_err(ret)
File "/app/venv/lib/python3.9/site-packages/pynng/exceptions.py", line 201, in check_err
  raise exc(string, err)
pynng.exceptions.NNGException: Connection shutdown

This is an uncaught exception, and the requester basically dies without retrying. Why does this happen?
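
Until the cause is understood, one possible stopgap is to retry the dial in application code. The following is only a minimal sketch under stated assumptions: `retry_call` is a hypothetical helper, not part of pynng, and the attempt count and delay are arbitrary placeholders. It simply calls a zero-argument function again whenever it raises one of the given exception types.

```python
import time


def retry_call(connect, retry_on, attempts=5, delay=1.0, sleep=time.sleep):
    """Call connect() until it succeeds or the attempts run out.

    connect  -- zero-argument callable that opens and returns the socket
    retry_on -- exception class (or tuple of classes) treated as transient
    sleep    -- injectable sleep function (hypothetical hook, useful in tests)
    """
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)  # wait before the next attempt
```

In the requester, this could be used as `retry_call(lambda: pynng.Req0(dial=f'tcp://{responder_host}:{responder_port}'), pynng.exceptions.NNGException)`, with the result passed to `stack.enter_context(...)` as before. Note that `time.sleep` blocks the event loop, so an asyncio program would probably want a non-blocking wait instead. This only works around the failure and does not explain why the dial fails with "Connection shutdown" in the first place.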
