add service cmd flag and custom labels, add process metric (process_gorup_count) add process custom labels #1194

peekjef72 · 2023-04-22T14:13:42Z

Replaces previous PRs #1180 and #1185.

This PR adds several features to the service collector (see service):

Add a config parameter collector.service.services-list that takes a comma-separated list of service names:

collector:
  service:
    services-list: windows_exporter, winRM, pushprox_client, Dhcp

The parameter is only checked if services-where is not set.
The list is used to build a services-where query based on Name and is equivalent to:

collector:
  service:
    services-where: "Name = 'elmt_1' or Name = 'elmt_X' or ..."
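
As an illustration only (this is not the PR's Go code), the list-to-query expansion described above could be sketched in Python as follows. The function name `build_services_where` is hypothetical, and rejecting names that contain quotes is an assumption made to keep the sketch safe, not documented behavior.

```python
def build_services_where(services_list: str) -> str:
    """Expand a comma-separated list of service names into a Name-based filter.

    Hypothetical helper: mirrors the equivalence stated in the PR text,
    not the exporter's actual implementation.
    """
    # Split on commas and drop surrounding whitespace and empty entries.
    names = [name.strip() for name in services_list.split(",")]
    names = [name for name in names if name]

    # Assumption: refuse names containing quotes instead of guessing
    # how the query language expects them to be escaped.
    for name in names:
        if "'" in name or '"' in name:
            raise ValueError(f"unsupported character in service name: {name!r}")

    # One "Name = '...'" term per service, joined with "or".
    return " or ".join(f"Name = '{name}'" for name in names)


if __name__ == "__main__":
    print(build_services_where("windows_exporter, winRM, pushprox_client, Dhcp"))
```

An empty list yields an empty filter string; how the collector treats that case is not stated in the PR text.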

Add a new parameter, services, that can only be set in the config file.
It allows setting any number of custom label values for each service,
e.g.:

collector:
  service:
    services:
      windows_exporter:
        application: prometheus
        custom1: val1
      pushprox_client:
        application: prometheus
        custom1: val1
      winRM:
        application: windows
        custom1: val2
      Dhcp:
        application: windows
        custom1: val3

Use case: build a generic Prometheus alert on services that are not running, and use the associated labels to drive a specific behavior. In my case they are used to route to specific documentation based on context.

Label names must be identical for every service. Label names that are not identical across services are removed!
This parameter has a lower priority than services-where and services-list.
It accepts a dict (see the example above) or a list; given a list, it behaves like services-list,
e.g.:

collector:
  service:
    services:
      - dhcp
      - pushprox_client
      - windows_exporter
      - winRM
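
The dict-or-list handling and the label-name rule can be sketched in Python as below. This is one reading of the description, not the PR's code: it assumes "not identical label names are removed" means that only label names present on every service are kept, and the function name `normalize_services` is hypothetical.

```python
def normalize_services(services):
    """Return {service_name: labels}, keeping only label names shared by all services.

    Hypothetical helper illustrating the rule stated in the PR description.
    """
    # A plain list carries no labels; it only selects services (like services-list).
    if isinstance(services, list):
        return {name: {} for name in services}

    # Collect the label names declared for each service (None means no labels).
    label_sets = [set(labels or {}) for labels in services.values()]
    # Assumption: a label survives only if every service declares it.
    common = set.intersection(*label_sets) if label_sets else set()

    return {
        name: {key: value for key, value in (labels or {}).items() if key in common}
        for name, labels in services.items()
    }


if __name__ == "__main__":
    config = {
        "windows_exporter": {"application": "prometheus", "custom1": "val1"},
        "Dhcp": {"application": "windows"},  # no custom1, so custom1 is dropped everywhere
    }
    print(normalize_services(config))
```

Keeping a single label set matters because a Prometheus metric family is normally exposed with one consistent set of label names.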

A generic service alert can then be written as:

groups:
- name: Window Server Alerts
  rules:

  # Sends an alert when any monitored service is not in the running state for 3 minutes.
  - alert: WindowServiceNotRunning
    expr: windows_service_state{state="running"} == 0
    for: 3m
    labels:
      severity: high
    annotations:
      summary: "Service {{ $labels.name }} for {{ $labels.application }} down for 3 min."
      description: "Service {{ $labels.name }} for Application {{ $labels.application }} on instance {{ $labels.instance }} has been down for more than 3 minutes."

This PR also adds features to the process collector (see process).
It allows defining "process groups" and setting any number of custom label values for each group,
e.g.:

collector:
  process:
    processes:
      browsers:
        include: "(?i)(firefox|chrome).*"
        exclude: "(?i)safari"
        application: browsers
        custom1: val4
      Visual Studio Code:
        include: "(?i)code.*"
        application: "vscode"
        custom1: val5

The custom labels are added to each windows_process_* metric:

# HELP windows_process_handles Total number of handles the process has open. This number is the sum of the handles currently open by each thread in the process.
# TYPE windows_process_handles gauge
windows_process_handles{application="browsers",creating_process_id="15988",custom1="val4",process="firefox",process_id="11184"} 257
windows_process_handles{application="browsers",creating_process_id="15988",custom1="val4",process="firefox",process_id="11588"} 256
windows_process_handles{application="browsers",creating_process_id="15988",custom1="val4",process="firefox",process_id="12428"} 320
windows_process_handles{application="browsers",creating_process_id="15988",custom1="val4",process="firefox",process_id="12536"} 333
windows_process_handles{application="browsers",creating_process_id="15988",custom1="val4",process="firefox",process_id="12544"} 323
windows_process_handles{application="browsers",creating_process_id="15988",custom1="val4",process="firefox",process_id="13620"} 346
windows_process_handles{application="browsers",creating_process_id="15988",custom1="val4",process="firefox",process_id="16700"} 366
....
windows_process_handles{application="vscode",creating_process_id="10392",custom1="val5",process="Code",process_id="9172"} 2474
windows_process_handles{application="vscode",creating_process_id="22356",custom1="val5",process="Code",process_id="12076"} 182
windows_process_handles{application="vscode",creating_process_id="22356",custom1="val5",process="Code",process_id="13124"} 186
windows_process_handles{application="vscode",creating_process_id="22356",custom1="val5",process="Code",process_id="24212"} 247
windows_process_handles{application="vscode",creating_process_id="9172",custom1="val5",process="Code",process_id="10664"} 296
windows_process_handles{application="vscode",creating_process_id="9172",custom1="val5",process="Code",process_id="11028"} 229
windows_process_handles{application="vscode",creating_process_id="9172",custom1="val5",process="Code",process_id="13516"} 236

It also adds a metric named windows_process_group_count that counts the number of matching processes for each group:

# HELP windows_process_group_count Number of processes found for the matching patterns.
# TYPE windows_process_group_count gauge
windows_process_group_count{application="browsers",custom1="val4",group="browsers"} 29
windows_process_group_count{application="vscode",custom1="val5",group="Visual Studio Code"} 12
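
A rough Python sketch of how such a per-group count could be derived from the include/exclude patterns is shown below. It is not the PR's Go implementation: the function name `count_process_groups` is hypothetical, and matching each pattern against the whole process name is an assumption.

```python
import re


def count_process_groups(groups, process_names):
    """Count, per group, the process names matched by include and not by exclude.

    Hypothetical helper; assumes patterns must match the full process name.
    """
    counts = {}
    for group, cfg in groups.items():
        include = re.compile(cfg["include"])
        # The exclude pattern is optional, as in the "Visual Studio Code" group.
        exclude = re.compile(cfg["exclude"]) if cfg.get("exclude") else None
        counts[group] = sum(
            1
            for name in process_names
            if include.fullmatch(name)
            and not (exclude is not None and exclude.fullmatch(name))
        )
    return counts


if __name__ == "__main__":
    groups = {
        "browsers": {"include": "(?i)(firefox|chrome).*", "exclude": "(?i)safari"},
        "Visual Studio Code": {"include": "(?i)code.*"},
    }
    print(count_process_groups(groups, ["firefox", "chrome", "Code", "notepad"]))
```

A group with no matching process gets a count of 0 rather than disappearing, which is what makes a generic "equals 0" alert possible.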

This metric can be used to define a generic alert that fires when its value is equal to 0, instead of a process-specific alert such as absent(windows_process_handles{process="firefox"}) == 1.

jkroepke · 2023-04-23T10:13:38Z

> Use case: build a generic Prometheus alert on services that are not running, and use the associated labels to drive a specific behavior. In my case they are used to route to specific documentation based on context.

Why are you not using relabel config for this?

metric_relabel_configs:
- source_labels: [__name__, name]
  regex: windows_service_state;windows_exporter
  target_label: custom1
  replacement: val1

peekjef72 · 2023-04-23T16:45:25Z

> Use case: build a generic Prometheus alert on services that are not running, and use the associated labels to drive a specific behavior. In my case they are used to route to specific documentation based on context.
>
> Why are you not using relabel config for this?
>
> metric_relabel_configs:
> - source_labels: [__name__, name]
>   regex: windows_service_state;windows_exporter
>   target_label: custom1
>   replacement: val1

It is easier for us to provide specific configuration data for each of our hosts: the files are generated with Ansible and deployed with the WinRM module.
We have more than 500 hosts, each with specific "application" switches; the metric relabel rules would have to be specific to each service and each host.
But you are right, it is possible to do so!

peekjef72 · 2023-06-18T09:38:11Z

Can't understand why it was closed...

ordimans · 2023-10-19T09:21:01Z

Can we do the same thing for process?
I have several windows_exporter instances on several computers.
But I don't know how to filter metrics for one server in Grafana, because there is no hostname or custom label on all metrics.
Am I wrong?

peekjef72 · 2023-10-21T12:42:02Z

> Can we do the same thing for process? I have several windows_exporter instances on several computers. But I don't know how to filter metrics for one server in Grafana, because there is no hostname or custom label on all metrics. Am I wrong?

Of course it's possible with a little work, but it's useless unless the current PR is accepted!
And right now it isn't, so it probably never will be!
In addition, the base (master branch) has changed since the PR was published, and a code review is required before the PR can be accepted...

Signed-off-by: Peekjef72 <67902897+peekjef72@users.noreply.github.com>

peekjef72 · 2023-10-23T15:58:16Z

> Can we do the same thing for process? I have several windows_exporter instances on several computers. But I don't know how to filter metrics for one server in Grafana, because there is no hostname or custom label on all metrics. Am I wrong?

> Of course it's possible with a little work, but it's useless unless the current PR is accepted! And right now it isn't, so it probably never will be! In addition, the base (master branch) has changed since the PR was published, and a code review is required before the PR can be accepted...

I've rebased the code/branch for the service collector.
I've added a part for the process collector (see the doc on the dev branch).

Signed-off-by: Peekjef72 <67902897+peekjef72@users.noreply.github.com>

jkroepke · 2024-01-16T17:40:45Z

Hi, you can use the relabel feature of Prometheus to set custom labels. It should be preferred over custom logic in this exporter.

peekjef72 · 2024-01-28T10:58:44Z

> Hi, you can use the relabel feature of Prometheus to set custom labels. It should be preferred over custom logic in this exporter.

As you already mentioned, and you are right, it is possible to do so. But only if you have 10 hosts to collect, or if you do everything by hand.

Now imagine having 3 thousand hosts! I no longer share your point of view.

Almost every server has its own configuration containing the processes and services to monitor. This configuration is contained in a file, generally generated by Ansible from a CMDB (a windows_exporter file and a job part).
So we have two files for each host: one for windows_exporter and one for the Prometheus config.

With the proposed solution, you would therefore need a Prometheus configuration file with a gigantic relabel rule for the job, containing the labels to set per host and per service, and evaluated at each scrape for each host!
With the PR solution, which remains optional, it is enough to set the labels once for the host concerned, when its configuration is first generated.

To sum up:

On the one hand (labels set in the exporter):
- the configuration is generated only once for windows_exporter, for which the labels do not matter and do not interfere with scraping;
- to add or remove a host, add or remove its file (file_sd_configs.files) on the Prometheus side;
- there is no rule evaluation, therefore no additional CPU, therefore no additional electricity (yes, green IT, you know ;)!).

On the other hand (relabeling in Prometheus):
- for each host, the global rule containing the metric_relabel_configs must be regenerated for the job,
- or there must be one job per host with a specific rule;
- in all cases the rules must be evaluated at each scrape, costing CPU and therefore electricity.

jkroepke · 2024-01-28T11:34:38Z

Hi,

I understand your case.

In summary, grafana-agent is a better solution for you, since you could do the relabeling on the local machine, and you could eliminate your Prometheus configuration by enabling the Remote Write Endpoint. Grafana Agent also has remote config capabilities, which reduce the Ansible work to installation only. That should work for 3000 hosts; it is what we do with approx. 950 at the moment.

The main purpose of windows-exporter is gathering metrics. Adding additional business logic like the process group increases the complexity. Gathering metrics on Windows is already complex enough, since there are a lot of APIs (WMI, perfcounter, Registry) to observe.

In any case, to continue here, the merge conflicts need to be resolved, and the custom labels and process group features should be separate pull requests.

peekjef72 · 2024-01-28T15:49:50Z

Thanks for the reply.

> In any case, to continue here, the merge conflicts need to be resolved, and the custom labels and process group features should be separate pull requests.

I am not sure I should continue developing the branch if there is no chance that the additions will one day be accepted.
So should I continue?

peekjef72 requested a review from a team as a code owner April 22, 2023 14:13

peekjef72 closed this Jun 18, 2023

peekjef72 force-pushed the add_service_custom_labels_v2 branch 2 times, most recently from 11543a9 to 6ba0297 Compare June 18, 2023 09:10

peekjef72 reopened this Jun 18, 2023

peekjef72 closed this Oct 21, 2023

peekjef72 force-pushed the add_service_custom_labels_v2 branch from a2274d0 to 6ba0297 Compare October 21, 2023 15:56

add service and process custom labels

7abc192

Signed-off-by: Peekjef72 <67902897+peekjef72@users.noreply.github.com>

peekjef72 reopened this Oct 23, 2023

peekjef72 changed the title ~~add service cmd flag and custom labels~~ add service cmd flag and custom labels, add process metric (process_gorup_count) add process custom labels Oct 23, 2023

peekjef72 added 2 commits October 23, 2023 17:26

fix golangci-lit and spell errors

f07c46e

Signed-off-by: Peekjef72 <67902897+peekjef72@users.noreply.github.com>

fix spelling error in comment

73466bf

Signed-off-by: Peekjef72 <67902897+peekjef72@users.noreply.github.com>

peekjef72 added 2 commits November 12, 2023 12:08

add debug

ff0e11b

Signed-off-by: Peekjef72 <67902897+peekjef72@users.noreply.github.com>

fix

a78f275

Signed-off-by: Peekjef72 <67902897+peekjef72@users.noreply.github.com>

github-actions bot added the Stale label Apr 28, 2024
