Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Node crashes if timeout (rc: -7) is thrown during init() #262

Open
sirmonin opened this issue Jan 11, 2021 · 16 comments
Open

Node crashes if timeout (rc: -7) is thrown during init() #262

sirmonin opened this issue Jan 11, 2021 · 16 comments

Comments

@sirmonin
Copy link

sirmonin commented Jan 11, 2021

Describe the bug
If timeout is reached (rc: -7) during the first init() call, I receive the error below and app crashes with code: 139.

Assertion failed: loop->watchers[w->fd] == w

I am sure that app doesn't use Zookeeper instance until there is a 'connect' event (which is never received).

To Reproduce

  1. Initialize Zookeeper via .init() or .connect() with a timeout set to a tiny value. E.g. 5ms. (This will produce the same effect as on our server, where timeout is set to 10000ms.)
  2. See error.

Error Log

2021-01-11 16:58:46,945:14539:ZOO_INFO@log_env@1102: Client environment:zookeeper.version=zookeeper C client 3.5.8
2021-01-11 16:58:46,945:14539:ZOO_INFO@log_env@1106: Client environment:host.name=project
2021-01-11 16:58:46,945:14539:ZOO_INFO@log_env@1113: Client environment:os.name=Linux
2021-01-11 16:58:46,945:14539:ZOO_INFO@log_env@1114: Client environment:os.arch=4.19.121-linuxkit
2021-01-11 16:58:46,946:14539:ZOO_INFO@log_env@1115: Client environment:os.version=#1 SMP Tue Dec 1 17:50:32 UTC 2020
2021-01-11 16:58:46,946:14539:ZOO_INFO@log_env@1123: Client environment:user.name=(null)
2021-01-11 16:58:46,946:14539:ZOO_INFO@log_env@1131: Client environment:user.home=/home/node
2021-01-11 16:58:46,946:14539:ZOO_INFO@log_env@1143: Client environment:user.dir=/srv
2021-01-11 16:58:46,946:14539:ZOO_INFO@zookeeper_init_internal@1177: Initiating client connection, host=zookeeper:2181 sessionTimeout=5 watcher=0x7f988ef61490 sessionId=0 sessionPasswd=<null> context=0x5623b4960d20 flags=0
2021-01-11 16:58:46,947:14539:ZOO_DEBUG@get_next_server_in_reconfig@1330: [OLD] count=0 capacity=0 next=0 hasnext=0
2021-01-11 16:58:46,947:14539:ZOO_DEBUG@get_next_server_in_reconfig@1333: [NEW] count=1 capacity=16 next=0 hasnext=1
2021-01-11 16:58:46,947:14539:ZOO_DEBUG@get_next_server_in_reconfig@1344: Using next from NEW=172.18.0.2:2181
2021-01-11 16:58:46,947:14539:ZOO_DEBUG@zookeeper_connect@2232: [zk] connect()
Interest in (fd=19, read=true, write=true, timeout=1)
2021-01-11 16:58:46,957:14539:ZOO_WARN@zookeeper_interest@2284: Exceeded deadline by 9ms
zk_timer_cb fired
ping timer went off
2021-01-11 16:58:46,957:14539:ZOO_ERROR@handle_socket_error_msg@2386: Socket [172.18.0.2:2181] zk retcode=-7, errno=110(Operation timed out): connection to 172.18.0.2:2181 timed out (exceeded timeout by 7ms)
2021-01-11 16:58:46,958:14539:ZOO_DEBUG@cleanup@1785: Previous connection=[172.18.0.2:2181] delay=0
yield:zookeeper_interest returned error: -7 - operation timeout
zk_timer_cb fired
ping timer went off
2021-01-11 16:58:46,959:14539:ZOO_WARN@zookeeper_interest@2284: Exceeded deadline by 11ms
2021-01-11 16:58:46,960:14539:ZOO_DEBUG@get_next_server_in_reconfig@1330: [OLD] count=0 capacity=0 next=0 hasnext=0
2021-01-11 16:58:46,960:14539:ZOO_DEBUG@get_next_server_in_reconfig@1333: [NEW] count=1 capacity=16 next=1 hasnext=0
2021-01-11 16:58:46,960:14539:ZOO_DEBUG@get_next_server_in_reconfig@1355: Failed to find either new or old
2021-01-11 16:58:46,960:14539:ZOO_DEBUG@zookeeper_connect@2232: [zk] connect()
Assertion failed: loop->watchers[w->fd] == w (../deps/uv/src/unix/core.c: uv__io_stop: 932)

Expected behavior
Timeout doesn't crash node app and closes client connection, so I could recreate client instance and retry to connect.

Desktop (please complete the following information):

  • OS: Linux
  • Node.js version: v10.23.0
@DavidVujic
Copy link
Collaborator

Thank you for reporting this @sirmonin!

@DavidVujic
Copy link
Collaborator

DavidVujic commented Jan 11, 2021

Could be related to #238

@DavidVujic
Copy link
Collaborator

Hi @sirmonin!

To find out what could be wrong:

  • What version of node-zookeeper are you using?
  • What version of the ZooKeeper server are you connecting to?

@sirmonin
Copy link
Author

sirmonin commented Jan 11, 2021

@DavidVujic, thanks for a quick reply!

node-zookeeper: Tested both on 4.6.0 and 4.7.1
Zookeeper: I am using Docker image bitnami/zookeeper:latest which is 3.6.2

@DavidVujic
Copy link
Collaborator

Great, thanks!

@sirmonin
Copy link
Author

@DavidVujic It is also observed, that timeout with crashing happens while application is running under some load.

@DavidVujic
Copy link
Collaborator

Thanks!

I'm planning to look at this later this week.

Notes:
I think there is need for some extra uv_closing (or something similar) checks here: https://github.com/yfinkelstein/node-zookeeper/blob/master/src/node-zk.cpp#L265

As hinted in this StackOverflow post: https://stackoverflow.com/questions/47522854/main-src-unix-core-c117-uv-close-assertion-uv-is-closinghandle-failed

If you, or if you know someone with C and/or C++ skills, please feel free to jump in and help 😄 It is much appreciated.

@DavidVujic
Copy link
Collaborator

Just a short update: I'm working on this issue and am trying different approaches on how to solve it. I also believe I can reproduce the assertion error using the code in the examples folder and restart a docker container with a zookeeper server running.

@DavidVujic
Copy link
Collaborator

DavidVujic commented Jan 30, 2021

Hello again @sirmonin!

I've been working on solving this issue, but cannot really find where and how to prevent the assertion error to happen.

What I've found out is that it most likely is when watchers are attached. Earlier today I pushed new versions of the code in the examples folder, making sure only one client is used, and function calls is made only when the client is connected.

And that has made a difference: on Mac OS X I get no assertion error. But on Linux I get the error - only occasionally. Haven't found a pattern.

The way I've been doing is:
starting a zookeeper server hosted in a docker container on my local machine.
starting the examples with node examples/index.js
running docker restart <container-id>

Before the changes in the example code, I got assertion error every time. Now, only randomly it seems and only on Linux.

I have never got the error you are describing (with short timeouts).

Could you help me by sharing another code example, or if you can try the example code described above?

Here's a diff of the changes I have made: https://github.com/yfinkelstein/node-zookeeper/pull/264/files

@sirmonin
Copy link
Author

sirmonin commented Feb 1, 2021

Hi @DavidVujic! Thanks a lot for time and effort you've spent on this issue.

My zookeeper connect method looks like this. Timeout is set to 5ms to simulate timeout error.

public connect = () => {
        const host = this.config.get('ZOOKEEPER_HOST');
        const port = this.config.get('ZOOKEEPER_PORT');
        this.zkConfig = {
            connect: `${host}:${port}`,
            timeout: 5,
            debug_level: ZooKeeper.ZOO_LOG_LEVEL_DEBUG,
            host_order_deterministic: false,
        };

        this.client = new ZooKeeper(this.zkConfig);

        this.client.init(this.zkConfig);
    };

This produces error logs:

2021-02-01 17:34:32,821:139:ZOO_INFO@log_env@1098: Client environment:zookeeper.version=zookeeper C client 3.5.5
2021-02-01 17:34:32,821:139:ZOO_INFO@log_env@1102: Client environment:host.name=test
2021-02-01 17:34:32,821:139:ZOO_INFO@log_env@1109: Client environment:os.name=Linux
2021-02-01 17:34:32,821:139:ZOO_INFO@log_env@1110: Client environment:os.arch=4.19.121-linuxkit
2021-02-01 17:34:32,821:139:ZOO_INFO@log_env@1111: Client environment:os.version=#1 SMP Tue Dec 1 17:50:32 UTC 2020
2021-02-01 17:34:32,821:139:ZOO_INFO@log_env@1119: Client environment:user.name=(null)
2021-02-01 17:34:32,821:139:ZOO_INFO@log_env@1127: Client environment:user.home=/home/node
2021-02-01 17:34:32,821:139:ZOO_INFO@log_env@1139: Client environment:user.dir=/srv
2021-02-01 17:34:32,821:139:ZOO_INFO@zookeeper_init_internal@1173: Initiating client connection, host=zookeeper:2181 sessionTimeout=5 watcher=0x7f365d600480 sessionId=0 sessionPasswd=<null> context=0x55d714f278a0 flags=0
Interest in (fd=19, read=true, write=true, timeout=1)2021-02-01 17:34:32,823:139:ZOO_DEBUG@get_next_server_in_reconfig@1326: [OLD] count=0 capacity=0 next=0 hasnext=0
2021-02-01 17:34:32,823:139:ZOO_DEBUG@get_next_server_in_reconfig@1329: [NEW] count=1 capacity=16 next=0 hasnext=1
2021-02-01 17:34:32,823:139:ZOO_DEBUG@get_next_server_in_reconfig@1340: Using next from NEW=172.18.0.2:2181
2021-02-01 17:34:32,823:139:ZOO_DEBUG@zookeeper_connect@2228: [zk] connect()
zk_timer_cb firedping timer went off2021-02-01 17:34:32,832:139:ZOO_WARN@zookeeper_interest@2280: Exceeded deadline by 9ms
yield:zookeeper_interest returned error: -7 - operation timeout
2021-02-01 17:34:32,833:139:ZOO_ERROR@handle_socket_error_msg@2382: Socket [172.18.0.2:2181] zk retcode=-7, errno=110(Operation timed out): connection to 172.18.0.2:2181 timed out (exceeded timeout by 7ms)
2021-02-01 17:34:32,833:139:ZOO_DEBUG@cleanup@1781: Previous connection=[172.18.0.2:2181] delay=0
zk_timer_cb firedping timer went off2021-02-01 17:34:32,834:139:ZOO_WARN@zookeeper_interest@2280: Exceeded deadline by 11ms
2021-02-01 17:34:32,835:139:ZOO_DEBUG@get_next_server_in_reconfig@1326: [OLD] count=0 capacity=0 next=0 hasnext=0
2021-02-01 17:34:32,835:139:ZOO_DEBUG@get_next_server_in_reconfig@1329: [NEW] count=1 capacity=16 next=1 hasnext=0
2021-02-01 17:34:32,836:139:ZOO_DEBUG@get_next_server_in_reconfig@1351: Failed to find either new or old
2021-02-01 17:34:32,836:139:ZOO_DEBUG@zookeeper_connect@2228: [zk] connect()
Interest in (fd=20, read=true, write=true, timeout=1)Assertion failed: loop->watchers[w->fd] == w (../deps/uv/src/unix/core.c: uv__io_stop: 932)

@sirmonin
Copy link
Author

sirmonin commented Feb 1, 2021

However, adding a setTimeout seemed to help a bit. App is no longer crashing and tries to reconnect.

public connect = () => {
        const host = this.config.get('ZOOKEEPER_HOST');
        const port = this.config.get('ZOOKEEPER_PORT');
        this.zkConfig = {
            connect: `${host}:${port}`,
            timeout: 5,
            debug_level: ZooKeeper.ZOO_LOG_LEVEL_DEBUG,
            host_order_deterministic: false,
        };

        this.client = new ZooKeeper(this.zkConfig);

        setTimeout(() => {
            this.client.init(this.zkConfig);
        }, 1000);
    };

@DavidVujic
Copy link
Collaborator

Thanks @sirmonin!

@DavidVujic
Copy link
Collaborator

I'm closing this one. Please reopen if it still is an issue!

@sirmonin
Copy link
Author

sirmonin commented Apr 29, 2022

@DavidVujic Issue still persists when server is under high load.

2022-04-29 17:48:16,829:448:ZOO_ERROR@handle_socket_error_msg@2489: Socket 172.19.0.4:2181 zk retcode=-7, errno=110(Operation timed out): connection to 172.19.0.4:2181 timed out (exceeded timeout by 8849ms)
yield:zookeeper_interest returned error: -7 - operation timeout
2022-04-29 17:48:18,481:448:ZOO_WARN@zookeeper_interest@2375: Exceeded deadline by 12182ms
Assertion failed: loop->watchers[w->fd] == w (../deps/uv/src/unix/core.c: uv__io_stop: 930)

Node.js v14.16.1
node-zookeeper v5.4.0

@DavidVujic
Copy link
Collaborator

Thank you for notifying. I'll reopen this one!

@DavidVujic DavidVujic reopened this Apr 29, 2022
@sirmonin
Copy link
Author

sirmonin commented Apr 29, 2022

@DavidVujic I tested under Node v16.14.2. It looks like yield:zookeeper_interest returned error: -7 - operation timeout no longer causes libuv to fail.

Will keep you posted in case issue repeats itself again.
For now the issue is solved by updating to Node v16.14.2 or later.

However, timeout issue still persists (eventhough it doesn't crash node anymore)

2022-04-29 20:18:04,369:218:ZOO_ERROR@handle_socket_error_msg@2489: Socket 172.19.0.4:2181 zk retcode=-7, errno=110(Operation timed out): connection to 172.19.0.4:2181 timed out (exceeded timeout by 10530ms)
yield:zookeeper_interest returned error: -7 - operation timeout

Update: Node.js v14.19.1 doesn't crash as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants