Vector metric vector_open_files not showing correct data and missing description in documentation #20431

Open
ShahroZafar opened this issue May 4, 2024 · 3 comments
Labels: domain: observability, source: file, type: bug

Comments

@ShahroZafar

ShahroZafar commented May 4, 2024

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

We are using the file source to collect pod logs and push them to Kafka using the kafka sink. We want a mechanism that lets us be certain that Vector is not losing data or lagging far behind.
To do so, we looked at the vector_open_files metric, which is not mentioned in the documentation but does exist. We assume this metric reports how many files Vector has open for reading at any given time.
Our configuration is such that, at any given time, the Vector agent should be reading at most 2 files (the current file and the second file created by rotation, which contains a copy of the first). However, the graph shows the metric reaching 3 from time to time. Also, when a file is rotated and Vector detects it, ideally it should finish reading that file and vector_open_files should drop to 1.
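One way to check this gauge outside a dashboard is to scrape the prometheus_exporter sink directly. The sketch below is a minimal parser for the Prometheus text format, assuming the exporter from the configuration in this issue serves metrics at http://127.0.0.1:9090/metrics and that the gauge appears under the name vector_open_files. The threshold of 2 comes from the expectation described above; fetch_metrics, parse_samples and over_limit are hypothetical helper names.

```python
import urllib.request


def fetch_metrics(url: str = "http://127.0.0.1:9090/metrics") -> str:
    """Download the exporter payload (assumed endpoint, not verified here)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def parse_samples(text: str, metric: str) -> list[tuple[str, float]]:
    """Return (label_string, value) pairs for `metric` from Prometheus text format."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and # HELP / # TYPE comments.
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            name, rest = line.split("{", 1)
            labels, rest = rest.rsplit("}", 1)
        else:
            name, rest = line.split(None, 1)
            labels = ""
        if name != metric:
            continue
        # First token after the labels is the value; an optional timestamp may follow.
        samples.append((labels, float(rest.split()[0])))
    return samples


def over_limit(text: str, limit: float = 2) -> list[tuple[str, float]]:
    """vector_open_files samples above the expected maximum number of files."""
    return [s for s in parse_samples(text, "vector_open_files") if s[1] > limit]
```

Calling over_limit(fetch_metrics()) periodically, or expressing the same `> 2` condition as an alert rule, turns the graph observation into something that can page.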

Our main blocker to moving to Vector is the lack of a way to be absolutely sure that Vector is keeping up and not lagging far behind, and of a mechanism that gives us insight that no data is being lost while reading files.
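A rough check for lag inside Vector is to compare how many events the file source has received with how many the kafka sink has sent, across two scrapes of the exporter. This sketch assumes the counters are exposed as vector_component_received_events_total (tagged component_id="logs") and vector_component_sent_events_total (tagged component_id="kafka"); confirm the names and tags against your own /metrics output. The Snapshot structure is hypothetical, and the check cannot see lines that are still unread on disk.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Counter values from one scrape (hypothetical structure)."""
    timestamp_s: float
    source_received: float  # assumed: vector_component_received_events_total{component_id="logs"}
    sink_sent: float        # assumed: vector_component_sent_events_total{component_id="kafka"}


def in_flight(s: Snapshot) -> float:
    """Events received by the source but not yet sent by the sink."""
    return s.source_received - s.sink_sent


def backlog_growth_per_s(before: Snapshot, after: Snapshot) -> float:
    """Positive values mean the sink is falling behind the source."""
    elapsed = after.timestamp_s - before.timestamp_s
    if elapsed <= 0:
        raise ValueError("snapshots must be in chronological order")
    return (in_flight(after) - in_flight(before)) / elapsed
```

Counters reset when Vector restarts, so a real check should discard pairs of snapshots that straddle a restart.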

Configuration

customConfig:
  data_dir: /vector-data-dir
  acknowledgements:
    enabled: true
  api:
    enabled: true
    address: 127.0.0.1:8686
    playground: true
  sources:
    logs:
      type: file
      oldest_first: true
      exclude:
        - /var/log/pods/particular-pod-directory-*/container_name/*.tmp
        - /var/log/pods/particular-pod-directory-*/container_name/*.gz
      include:
        - /var/log/pods/particular-pod-directory-*/container_name/*
    internal_metrics:
      type: internal_metrics
  sinks:
    prom_exporter:
      type: prometheus_exporter
      inputs: [internal_metrics]
      address: 0.0.0.0:9090
      buffer:
        type: disk
        when_full: block
        max_size: 10000000000
    kafka:
      type: kafka
      inputs:
        - logs
      bootstrap_servers: brokers:9092
      topic: test
      encoding:
        codec: json
      compression: zstd
      healthcheck:
        enabled: false
      librdkafka_options:
        request.required.acks: "1"
        message.timeout.ms: "0"
        batch.num.messages: "8192"
        linger.ms: "100"
        batch.size: "1000000"
      message_timeout_ms: 0
      buffer:
        type: disk
        when_full: block
        max_size: 10000000000

Currently we are testing Vector at 20k requests per second. Our actual application can produce logs at about 200k requests per second.
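The gap between the test rate and the production rate matters for the disk buffers in the configuration above (max_size: 10000000000 bytes, about 10 GB). This back-of-the-envelope sketch estimates how long the kafka sink's buffer could absorb a complete Kafka stall before when_full: block starts applying backpressure. It assumes one log event per request and a hypothetical 500-byte average event, and ignores buffer encoding overhead.

```python
def buffer_fill_seconds(max_size_bytes: int, events_per_s: float, avg_event_bytes: float) -> float:
    """Seconds until a buffer of max_size_bytes fills if nothing drains it."""
    return max_size_bytes / (events_per_s * avg_event_bytes)


# 10 GB buffer from the config above; 500 bytes per event is an assumed average.
for rate in (20_000, 200_000):
    secs = buffer_fill_seconds(10_000_000_000, rate, 500)
    print(f"{rate} events/s -> {secs:.0f} s")
```

At ten times the tested rate, the same buffer covers a stall roughly ten times shorter, so a test at 20k may hide backpressure that appears at 200k.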

We haven't chosen the kubernetes_logs source for now since we don't want any enrichment.

Version

0.37.1

Debug Output

No response

Example Data

No response

Additional Context

No response

References

No response

@ShahroZafar ShahroZafar added the type: bug label May 4, 2024
@jszwedko jszwedko added the source: file and domain: observability labels May 6, 2024
@jszwedko
Member

jszwedko commented May 6, 2024

Thanks @ShahroZafar . I do see that the metric is undocumented. It should, as you note, measure the number of files the file source has open.

Also when a file is rotated and vector detects it, ideally it should complete reading that file and the vector_open_files should drop to 1.

When the file is rotated, does it match one of the exclude patterns?

As an aside, you could try increasing max_read_bytes. Users often see better performance with a limit.
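The difference between balancing reads across files and draining the oldest file first can be illustrated with a toy scheduler. This is a simplified model written for illustration, not Vector's file server code: files are ordered oldest to newest, each read takes at most max_read_bytes, and in oldest_first mode a pass stops at the first file that still has unread data. Whether Vector's real loop applies max_read_bytes this way with oldest_first is an assumption; TrackedFile, read_pass and drain are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TrackedFile:
    name: str
    unread_bytes: int  # hypothetical amount of data not yet read


def read_pass(files: list[TrackedFile], max_read_bytes: int, oldest_first: bool) -> list[tuple[str, int]]:
    """Run one pass over files ordered oldest -> newest; return (name, bytes read) pairs."""
    reads = []
    for f in files:
        if f.unread_bytes == 0:
            continue
        n = min(f.unread_bytes, max_read_bytes)
        f.unread_bytes -= n
        reads.append((f.name, n))
        if oldest_first and f.unread_bytes > 0:
            # Oldest file is not drained yet: leave newer files for a later pass.
            break
    return reads


def drain(files: list[TrackedFile], max_read_bytes: int, oldest_first: bool) -> list[list[tuple[str, int]]]:
    """Repeat passes until every file has been fully read."""
    passes = []
    while any(f.unread_bytes for f in files):
        passes.append(read_pass(files, max_read_bytes, oldest_first))
    return passes
```

With a 5000-byte rotated file and a 3000-byte live file, the round-robin run advances both files on every pass, while the oldest_first run finishes the rotated file before it starts on the live one.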

@ShahroZafar
Author

When the file is rotated, does it match one of the exclude patterns?

No. The exclude patterns only cover .tmp and .gz. The rotated file is not .gz; it's in the format 0.logs.{Timestamp}.
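The answer above can be double-checked by testing candidate filenames against the include and exclude patterns from the configuration. This sketch uses Python's PurePosixPath.match, whose per-segment `*` handling only approximates Vector's own glob matching. The pod directory suffix and the timestamps are hypothetical, with the rotated name following the 0.logs.{Timestamp} format described above.

```python
from pathlib import PurePosixPath

# Patterns copied from the file source configuration in this issue.
INCLUDE = ["/var/log/pods/particular-pod-directory-*/container_name/*"]
EXCLUDE = [
    "/var/log/pods/particular-pod-directory-*/container_name/*.tmp",
    "/var/log/pods/particular-pod-directory-*/container_name/*.gz",
]


def is_watched(path: str) -> bool:
    """True if path matches an include pattern and no exclude pattern."""
    p = PurePosixPath(path)
    return any(p.match(g) for g in INCLUDE) and not any(p.match(g) for g in EXCLUDE)


base = "/var/log/pods/particular-pod-directory-abc/container_name/"
for name in ["0.log", "0.logs.20240504-120000", "0.logs.20240504-110000.gz", "0.log.tmp"]:
    print(name, is_watched(base + name))
```

Only the compressed and temporary files are excluded, so an uncompressed rotated file stays in the watched set alongside the live file.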

As an aside, you could try increasing max_read_bytes. Users often see better performance with a limit.

We have oldest_first: true since these are Kubernetes logs and we want to read the old files as soon as possible, before they are further rotated to .gz. And I think, as per the docs (please correct me if I am wrong), if oldest_first is set, max_read_bytes doesn't come into play.

@jszwedko
Member

jszwedko commented May 6, 2024

Ah, I missed that you had oldest_first. Yes, that should cause it to read the oldest files first rather than balancing reads round-robin across files. My expectation would then match yours:

Also when a file is rotated and vector detects it, ideally it should complete reading that file and the vector_open_files should drop to 1.

However, I believe the file source will maintain open file handles to all matching files, even if it isn't actively reading them. Related: #10005
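If a handle is held for every matching file, as suggested above, then vector_open_files tracks how many matching files exist on disk rather than how many still have unread data. The toy model below (hypothetical file names and states, not Vector code) shows one possible scenario: right after a new rotation, before the previous rotated file has been compressed to .gz, three files match the include pattern and the gauge would read 3 even though only the live file has unread data.

```python
from dataclasses import dataclass


@dataclass
class MatchedFile:
    name: str
    unread_bytes: int  # hypothetical amount of data not yet read


def open_handles(files: list[MatchedFile]) -> int:
    """Gauge value if a handle is kept for every matching file."""
    return len(files)


def files_with_unread_data(files: list[MatchedFile]) -> int:
    """Gauge value the reporter expected: only files still being read."""
    return sum(1 for f in files if f.unread_bytes > 0)


# Hypothetical moment right after a rotation, before the older rotated file is compressed.
snapshot = [
    MatchedFile("0.logs.20240504-110000", 0),  # previously rotated, fully read
    MatchedFile("0.logs.20240504-120000", 0),  # just rotated, fully read
    MatchedFile("0.log", 4096),                # live file
]
print(open_handles(snapshot), files_with_unread_data(snapshot))
```

Under this assumption the gauge is a measure of matched files, not of reading progress, so it cannot by itself show whether the rotated file has been fully read.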
