HTTP health checks failing for persistent instances #12068

Joey777210 · 2024-05-08T08:21:55Z

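We have roughly 80,000 persistent instances registered and use HTTP health checks. After running for a while, a large number of instances are marked unhealthy, and naming.log shows many entries like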

http:500 milliseconds timeout on connection http-outgoing-52823038824

and

http:Connection lease request time out

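Looking at the code, Nacos's default HTTP health check timeout is 500 ms, the connection pool size equals the number of CPU cores, and the health check runs every 5 s.

It looks like the server's capacity cannot keep up with the number of instances that need to be checked. Is there a way to mitigate this? The only idea I have so far is to lengthen the health check timing, but I could not find anywhere in the code where it can be configured.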
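To put rough numbers on the defaults quoted above, here is a back-of-the-envelope sketch. It is not Nacos code: it assumes a simplified model in which every check holds one pooled connection for its whole duration, and the core count of 8 and the 20 ms "healthy" latency are hypothetical values chosen only for illustration.

```java
// Rough capacity model for HTTP health checks (illustrative only, not Nacos code).
// Assumption: each check occupies one pooled connection for its full duration.
final class HealthCheckCapacity {

    // How many checks a pool can complete within one check interval.
    static long maxChecksPerInterval(long connections, long perCheckMillis, long intervalMillis) {
        return connections * intervalMillis / perCheckMillis;
    }

    // How many connections are needed to check every instance once per interval (rounded up).
    static long connectionsNeeded(long instances, long perCheckMillis, long intervalMillis) {
        return (instances * perCheckMillis + intervalMillis - 1) / intervalMillis;
    }

    public static void main(String[] args) {
        long cores = 8;            // hypothetical core count; the thread does not state it
        long timeoutMillis = 500;  // default timeout quoted in the thread
        long intervalMillis = 5000; // default interval quoted in the thread
        long instances = 80_000;   // instance count quoted in the thread

        // Worst case: every check runs into the 500 ms timeout.
        System.out.println(maxChecksPerInterval(cores, timeoutMillis, intervalMillis)); // 80
        System.out.println(connectionsNeeded(instances, timeoutMillis, intervalMillis)); // 8000

        // Hypothetical healthy case: every check answers in 20 ms.
        System.out.println(connectionsNeeded(instances, 20, intervalMillis)); // 320
    }
}
```

Under this model, even fast responses would need far more concurrent connections than a pool sized to the core count, which is consistent with the "Connection lease request time out" entries.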


KomachiSion · 2024-05-10T04:01:35Z

As far as I know there is currently no way to adjust this, but there are plans to redesign the health check parts in 3.0.

Joey777210 · 2024-05-10T10:27:48Z

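One more thing: when using HTTP health checks on persistent instances, naming-server.log prints a large number of "Client change for service ......." and "http check started before last one finished" entries, and many instances become unhealthy. What could be the cause?
@KomachiSion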

KomachiSion · 2024-05-15T02:15:04Z

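"Client change for service ......." means that an instance of some service has changed: for example its health check status changed, it was newly registered, or it was updated.
"http check started before last one finished" is most likely the problem you described above. Because of connection timeouts or an undersized thread pool, tasks pile up, so the next check task starts while the previous one has not finished yet.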
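The overlap described above can be sketched with a simple in-flight flag. This is a minimal illustration of the pattern, not the actual Nacos implementation; the class name `OverlapGuard` and its methods are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative guard that detects a check round starting before the previous one
// has finished. Hypothetical names; not taken from the Nacos source.
final class OverlapGuard {
    private final AtomicBoolean inFlight = new AtomicBoolean(false);

    // Returns true if the check may start, false if the previous check is still pending.
    boolean tryStart() {
        return inFlight.compareAndSet(false, true);
    }

    // Must be called when the check completes, fails, or times out.
    void finish() {
        inFlight.set(false);
    }

    public static void main(String[] args) {
        OverlapGuard guard = new OverlapGuard();
        System.out.println(guard.tryStart()); // true: first round starts
        System.out.println(guard.tryStart()); // false: next round arrives while the first is pending
        guard.finish();                       // first round finally completes
        System.out.println(guard.tryStart()); // true: checks can proceed again
    }
}
```

When checks routinely take longer than the scheduling interval, the second case becomes the norm, which would match a log flooded with this message.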

If you are already hitting this problem, I suggest scaling out the Nacos cluster so that the health check load is spread across more nodes.
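As a rough illustration of what scaling out buys, the sketch below estimates the check rate each node has to sustain. It assumes the checks are distributed evenly across nodes, which the thread does not confirm in detail; the node counts are hypothetical.

```java
// Estimates per-node health check load under an even-distribution assumption.
// Illustrative arithmetic only; not Nacos code.
final class ScaleOutEstimate {

    // Checks per second each node must perform, rounded up.
    static long checksPerSecondPerNode(long instances, long nodes, long intervalSeconds) {
        long divisor = nodes * intervalSeconds;
        return (instances + divisor - 1) / divisor;
    }

    public static void main(String[] args) {
        long instances = 80_000;  // instance count quoted in the thread
        long intervalSeconds = 5; // default interval quoted in the thread

        System.out.println(checksPerSecondPerNode(instances, 1, intervalSeconds)); // 16000
        System.out.println(checksPerSecondPerNode(instances, 3, intervalSeconds)); // 5334
        System.out.println(checksPerSecondPerNode(instances, 4, intervalSeconds)); // 4000
    }
}
```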

Joey777210 · 2024-05-16T05:58:23Z

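The "Client change for service" entries occur extremely frequently; the log is refreshing at millisecond intervals.
As for the probing frequency problem, I have worked around it for now by modifying the code.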

KomachiSion added the kind/discussion label on May 10, 2024
