ECONNREFUSED due to connecting to the wrong port after a while #1001

Open
PScharrenberg opened this issue Feb 21, 2023 · 2 comments

PScharrenberg commented Feb 21, 2023

Problem

fluent-plugin-elasticsearch successfully pushes logs to our Elasticsearch server, which sits behind an SSL-offloading nginx proxy listening on port 443.
After a few hours, no more logs are transferred and we find this warning in the fluentd logs (where X.X.X.X is the correct IP address of our ES server):

2023-02-21 11:07:15 +0000 [warn]: #0 [clusterflow:flow] failed to flush the buffer. retry_times=12 next_retry_time=2023-02-21 12:10:46 +0000 chunk="5f52bba4e6c17284274d9814840cea63" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch-fqdn\", :port=>443, :scheme=>\"https\", :user=>\"logging\", :password=>\"obfuscated\"}): Connection refused - connect(2) for X.X.X.X:9200 (Errno::ECONNREFUSED)"

So after a while it tries to connect to the Elasticsearch server directly on port 9200, bypassing the proxy on port 443, which obviously does not work.

After restarting fluentd inside the k8s pod (fluent-ctl restart), the logs are shipped again.

Steps to replicate

The relevant config part in fluentd.conf:

  <match **>
    @type elasticsearch
    @id clusterflow:flow
    exception_backup true
    fail_on_putting_template_retry_exceed true
    host elasticsearch-fqdn
    logstash_dateformat %Y-%m-%d
    logstash_format true
    logstash_prefix logging
    password xxxxxxxxxx
    port 443
    reload_connections true
    scheme https
    ssl_verify true
    user logging
    utc_index true
    verify_es_version_at_startup true
    <buffer tag,time>
      @type file
      chunk_limit_size 8MB
      path /buffers/clusterflow:flow.*.buffer
      retry_forever true
      timekey 10m
      timekey_wait 1m
    </buffer>
  </match>

Expected Behavior or What you need to ask

We expect it to continue connecting to the configured port.

Using Fluentd and ES plugin versions

We're using the rancher-logging "app" provided by rancher (rancher-logging:100.1.3+up3.17.7)
We're seeing this issue after upgrading from an older version.

  • Debian Buster
  • Kubernetes
  • Fluentd 1.14.6
  • ES plugin 5.2.2
  • ES version 6.8.12

cosmo0920 (Collaborator) commented

This could be caused by the Elasticsearch sniffing feature.

For how this feature is configured, see: https://github.com/uken/fluent-plugin-elasticsearch#sniffer-class-name
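
For a setup like the one in this issue, where Fluentd can only reach Elasticsearch through a proxy, the README section linked above describes replacing the default sniffer with one that keeps the configured hosts. A minimal sketch, assuming the plugin's `Fluent::Plugin::ElasticsearchSimpleSniffer` class is available and has been loaded in your Fluentd image (the README describes loading it with `fluentd -r`; whether the rancher-logging image allows that is not confirmed here):

```
<match **>
  @type elasticsearch
  host elasticsearch-fqdn
  port 443
  scheme https
  # Connection reloading stays enabled, but the sniffer is swapped for one
  # that reuses the host/port from this config instead of the node
  # addresses reported by the cluster (assumption: class is loaded).
  reload_connections true
  sniffer_class_name Fluent::Plugin::ElasticsearchSimpleSniffer
</match>
```

The remaining parameters from the original config (credentials, buffer section, and so on) are omitted for brevity and would stay unchanged.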


GiZZoR commented Apr 11, 2023

You probably hit this time bomb someone left for you: https://github.com/uken/fluent-plugin-elasticsearch#reload-after
This causes the activation of the sniffer. Yes, a sniffer that hunts out the nodes in your ES cluster and then bypasses the configuration you explicitly set, thereby voiding any load balancing you may have configured. Bonus feature: it uses the scheme from the config you supplied to hit the host and port it finds in the nodes catalog.

I'd recommend reload_connections false, as the sniffer just shouldn't be needed in any properly configured environment.
You'd either correctly configure the hosts it uses, or use a load balancer.
This "feature" should only be enabled if explicitly needed, which should be never.

IMHO the sniffer should exist as an optional plugin, and should be promptly removed/disabled.
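
Applied to the configuration from the original report, the `reload_connections false` recommendation above would look roughly like the following sketch. Only `reload_connections false` comes from this thread; `reconnect_on_error true` is an additional plugin option included here as an assumption, so that a failed connection is re-created against the configured host rather than reused.

```
<match **>
  @type elasticsearch
  @id clusterflow:flow
  host elasticsearch-fqdn
  port 443
  scheme https
  ssl_verify true
  user logging
  password xxxxxxxxxx
  logstash_format true
  logstash_prefix logging
  # Disable the sniffer so the plugin keeps using host/port from this config
  # instead of node addresses (e.g. X.X.X.X:9200) discovered from the cluster.
  reload_connections false
  # Assumption, not from this thread: reset the connection on errors.
  reconnect_on_error true
  <buffer tag,time>
    @type file
    path /buffers/clusterflow:flow.*.buffer
    retry_forever true
    timekey 10m
    timekey_wait 1m
  </buffer>
</match>
```

With the sniffer off, availability and load balancing are left entirely to the proxy in front of the cluster, which matches the setup described in the report.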
