
Problems routing to dynamically named data streams #1025

Open · 1 task done
javajawa opened this issue Jul 31, 2023 · 2 comments
Problem

During a migration from regular ES indexes to data streams, I discovered I was unable to generate a data stream name with placeholders.

We are in a situation where dev teams can put messages onto an exchange with properties that describe their product and format. These were being routed to distinct Logstash-style indexes (containing data with different schemas) with a config along the lines of:

<match to.es>
  @type elasticsearch
  hosts my-cluster:9200

  logstash_format true
  logstash_prefix demo-${product}-${logstream}
</match>

When attempting to switch to data streams, I found that the plugin did not seem to offer any equivalent placeholder support.

What I have tried:

  • Placeholders in data_stream_name are ignored (this setup also requires _index in the buffer config).
  • Using _index as a field in the record might have been doing the right thing, but the commits fail because it is not a permitted ES field. This happens even if we ask Fluentd not to send the field; see the hedged reconstruction after this list.
  • Trying to use a different name with target_index_key is seemingly ignored (everything is written to ceres-).
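
For illustration, here is a hedged reconstruction of the second attempt; the exact configs are in the gist linked under "Steps to replicate". The record_transformer filter and the plugin's remove_keys option are assumptions about how the _index field was injected and then suppressed, not a copy of the real config.

# Hedged reconstruction: inject _index into each record, then ask the
# plugin not to forward it to ES.
<filter to.es>
  @type record_transformer
  enable_ruby true
  <record>
    _index ceres-${record["product"]}-${record["logstream"]}
  </record>
</filter>

<match to.es>
  @type elasticsearch_data_stream
  hosts my-cluster:9200
  data_stream_name ceres-default
  data_stream_template_name ceres-datastream
  remove_keys _index
</match>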

I did find a config (probably the middle one) that created all the expected data streams but did not populate them with any documents. I have been unable to fully replicate that today.
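
For reference, that state can be observed with standard ES APIs (the host name is taken from the gist's test setup; adjust as needed):

# List any data streams that were created, then count their documents.
curl -s 'http://host.docker.internal:9200/_data_stream/ceres-*?pretty'
curl -s 'http://host.docker.internal:9200/ceres-*/_count?pretty'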

Steps to replicate

The complete minimal test cases and logs for the situations described above can be found at https://gist.github.com/javajawa/4283666d2489b0d9d9abb8909d077476

Expected Behavior or What you need to ask

Are dynamically named data streams supported?
If so, how?
If not, can they be?

Using Fluentd and ES plugin versions

This situation has been tested using a Docker image running with:

   ruby 3.1.4p223 (2023-03-30 revision 957bb7cb81) [x86_64-linux-musl]
   elastic-transport (8.2.2)
   elasticsearch (8.9.0)
   elasticsearch-api (8.9.0)
   fluent-plugin-elasticsearch (5.3.0)
   fluentd (1.16.2)
   typhoeus (1.4.0)

The cluster is ES 8.3.3. The templates in question are all very trivial (for now).

@cosmo0920 (Collaborator) commented Nov 27, 2023

During a migration from regular ES to datastreams, I discovered I was unable to generate a data stream name with placeholders.

We are in a situation where dev teams can put messages onto an exchange with properties that describe their product and format. This was being routed to distinct logstash indexes (containing data with different schemas) with a config along the lines of

<match to.es>
  @type elasticsearch
  hosts my-cluster:9200

  logstash_format true
  logstash_prefix demo-${product}-${logstream}
</match>

This is because you are not using a buffer directive that includes the information needed to replace the placeholders with actual values.

See: https://docs.fluentd.org/configuration/buffer-section#placeholders
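
For concreteness, a minimal sketch of what the linked docs describe, applied to the original logstash-style config; it assumes the records actually carry product and logstream fields:

<match to.es>
  @type elasticsearch
  hosts my-cluster:9200

  logstash_format true
  logstash_prefix demo-${product}-${logstream}

  # ${product} and ${logstream} only resolve when listed as chunk keys
  <buffer product,logstream>
    @type memory
  </buffer>
</match>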

@javajawa (Author) commented Nov 27, 2023

Can you point out where those are missing in the attached gist, which shows the variations of the config that were attempted?

To quote the gist:

# This was the first obvious thing, and the thing I hoped would work
# Just specify the same replacements in datastream name
<match to.es>
  @id es_with_name_placeholder
  @type elasticsearch_data_stream

  hosts host.docker.internal:9200
  http_backend typhoeus
  reload_connections false

  data_stream_name ceres-${product}-${logstream}
  data_stream_template_name ceres-datastream

  id_key _hash

  <buffer product,logstream>
    @type memory
    chunk_limit_records 2
    queued_chunks_limit_size 1
    retry_max_times 0
  </buffer>
</match>

# The above errors because the buffer config does not contain _index.
# I did a little poking around and determined that that field was likely the index/data-stream name in use.
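
For completeness, a hedged reconstruction of the variation that error appears to ask for; per the first bullet in the issue, the data_stream_name placeholders were still ignored even in this form.

# Same match block as above, with _index added to the chunk keys to
# satisfy the "buffer config not containing _index" error.
<buffer product,logstream,_index>
  @type memory
  chunk_limit_records 2
  queued_chunks_limit_size 1
  retry_max_times 0
</buffer>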
