Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exporter does not collect metrics from multiple databases #999

Open
freneticpony1995 opened this issue Jan 24, 2024 · 5 comments
Open

Exporter does not collect metrics from multiple databases #999

freneticpony1995 opened this issue Jan 24, 2024 · 5 comments

Comments

@freneticpony1995
Copy link

What did you do?

I've started postgres-exporter in my k8s cluster using official prometheus helmchart v5.3.0.
I serve connection information to exporter using environmental variable "DATA_SOURCE_NAME" containing connection strings for 23 host with Postgresql 13.

Example:
DATA_SOURCE_NAME=postgresql://username1:PASSWD1@server1:5432/service1db?sslmode=disable,postgresql://username1:PASSWD2@server2:5432/service2db?sslmode=disable ... etc

What did you expect to see?

I expected to see labels "datname" and "server" in all metrics on /metrics endpoint + all enabled metrics for all databases.

Example:

# HELP pg_stat_database_conflicts_confl_tablespace Number of queries in this database that have been canceled due to dropped tablespaces
# TYPE pg_stat_database_conflicts_confl_tablespace counter
pg_stat_database_conflicts_confl_tablespace{datid="20146",datname="service1db",server="FQDN1:5432"} 0
pg_stat_database_conflicts_confl_tablespace{datid="20366",datname="service2db",server="FQDN2:5432"} 0

What did you see instead? Under which circumstances?

I see some metrics has been exported only for 1 database (first in DATA_SOURCE_NAME list) without label "server" or even without any label.

Example:

# HELP pg_stat_database_deadlocks Number of deadlocks detected in this database
# TYPE pg_stat_database_deadlocks counter
pg_stat_database_deadlocks{datid="20146",datname="service1db"} 0

List of metrics with this issue:
pg_database_size_bytes
pg_locks_count
pg_stat_database_*
pg_stat_user_tables_*
pg_statio_user_tables_.*

List of metrics without issue:
pg_stat_database_conflicts_confl_*
pg_stat_archiver_*
pg_stat_activity_*
pg_settings_*
pg_stat_bgwriter_*

pg_stat_bgwriter_* metrics has been served without labels at all.

Example:

# HELP pg_stat_bgwriter_buffers_alloc_total Number of buffers allocated
# TYPE pg_stat_bgwriter_buffers_alloc_total counter
pg_stat_bgwriter_buffers_alloc_total 1.42966019e+08
# HELP pg_stat_bgwriter_buffers_backend_fsync_total Number of times a backend had to execute its own fsync call (normally the background writer handles those even when the backend does its own write)
# TYPE pg_stat_bgwriter_buffers_backend_fsync_total counter
pg_stat_bgwriter_buffers_backend_fsync_total 0
# HELP pg_stat_bgwriter_buffers_backend_total Number of buffers written directly by a backend
# TYPE pg_stat_bgwriter_buffers_backend_total counter
pg_stat_bgwriter_buffers_backend_total 5.7748244e+07
# HELP pg_stat_bgwriter_buffers_checkpoint_total Number of buffers written during checkpoints
# TYPE pg_stat_bgwriter_buffers_checkpoint_total counter
pg_stat_bgwriter_buffers_checkpoint_total 2.70184913e+08
# HELP pg_stat_bgwriter_buffers_clean_total Number of buffers written by the background writer
# TYPE pg_stat_bgwriter_buffers_clean_total counter
pg_stat_bgwriter_buffers_clean_total 1.977864e+06
# HELP pg_stat_bgwriter_checkpoint_sync_time_total Total amount of time that has been spent in the portion of checkpoint processing where files are synchronized to disk, in milliseconds
# TYPE pg_stat_bgwriter_checkpoint_sync_time_total counter
pg_stat_bgwriter_checkpoint_sync_time_total 226029
# HELP pg_stat_bgwriter_checkpoint_write_time_total Total amount of time that has been spent in the portion of checkpoint processing where files are written to disk, in milliseconds
# TYPE pg_stat_bgwriter_checkpoint_write_time_total counter
pg_stat_bgwriter_checkpoint_write_time_total 7.680013381e+09
# HELP pg_stat_bgwriter_checkpoints_req_total Number of requested checkpoints that have been performed
# TYPE pg_stat_bgwriter_checkpoints_req_total counter
pg_stat_bgwriter_checkpoints_req_total 780
# HELP pg_stat_bgwriter_checkpoints_timed_total Number of scheduled checkpoints that have been performed
# TYPE pg_stat_bgwriter_checkpoints_timed_total counter
pg_stat_bgwriter_checkpoints_timed_total 198817
# HELP pg_stat_bgwriter_maxwritten_clean_total Number of times the background writer stopped a cleaning scan because it had written too many buffers
# TYPE pg_stat_bgwriter_maxwritten_clean_total counter
pg_stat_bgwriter_maxwritten_clean_total 8284
# HELP pg_stat_bgwriter_stats_reset_total Time at which these statistics were last reset
# TYPE pg_stat_bgwriter_stats_reset_total counter
pg_stat_bgwriter_stats_reset_total 1.646306272e+09

Environment

  • System information:

k8s cluster

  • postgres_exporter version:
postgres_exporter, version 0.15.0 (branch: HEAD, revision: 68c176b8833b7580bf847cecf60f8e0ad5923f9a)
  build user:       root@88f74f2c2888
  build date:       20231027-14:38:56
  go version:       go1.21.3
  platform:         linux/amd64
  tags:             unknown
  • postgres_exporter flags:
'--config.file=/etc/postgres_exporter.yml'
'--web.listen-address=:9187'
'--log.level=debug'
'--collector.long_running_transactions'
'--collector.stat_activity_autovacuum'
'--collector.process_idle'
'--no-collector.wal'
  • PostgreSQL version: PostgreSQL 13
  • Logs:
    Logs does not contain errors
ts=2024-01-24T13:57:26.583Z caller=main.go:86 level=warn msg="Error loading config" err="Error opening config file \"/etc/postgres_exporter.yml\": open /etc/postgres_exporter.yml: no such file or directory"
ts=2024-01-24T13:57:26.583Z caller=proc.go:267 msg="Excluded databases" databases=[]
ts=2024-01-24T13:57:26.584Z caller=tls_config.go:274 level=info msg="Listening on" address=[::]:9187
ts=2024-01-24T13:57:26.584Z caller=tls_config.go:277 level=info msg="TLS is disabled." http2=false address=[::]:9187
ts=2024-01-24T13:58:02.930Z caller=server.go:74 level=info msg="Established new database connection" fingerprint=server970:5432
ts=2024-01-24T13:58:02.942Z caller=postgres_exporter.go:585 level=debug msg="Querying PostgreSQL version" server=server970:5432
ts=2024-01-24T13:58:02.942Z caller=postgres_exporter.go:613 level=info msg="Semantic version changed" server=server970:5432 from=0.0.0 to=13.6.0
ts=2024-01-24T13:58:02.942Z caller=postgres_exporter.go:287 level=debug msg="Column is being forced to discard due to version incompatibility" column=procpid
ts=2024-01-24T13:58:02.943Z caller=postgres_exporter.go:287 level=debug msg="Column is being forced to discard due to version incompatibility" column=sent_location
ts=2024-01-24T13:58:02.943Z caller=postgres_exporter.go:287 level=debug msg="Column is being forced to discard due to version incompatibility" column=replay_location
ts=2024-01-24T13:58:02.943Z caller=postgres_exporter.go:287 level=debug msg="Column is being forced to discard due to version incompatibility" column=flush_location
ts=2024-01-24T13:58:02.943Z caller=postgres_exporter.go:287 level=debug msg="Column is being forced to discard due to version incompatibility" column=write_location
ts=2024-01-24T13:58:02.943Z caller=postgres_exporter.go:287 level=debug msg="Column is being forced to discard due to version incompatibility" column=pg_xlog_location_diff
ts=2024-01-24T13:58:02.943Z caller=pg_setting.go:35 level=debug msg="Querying pg_setting view" server=server970:5432
ts=2024-01-24T13:58:02.944Z caller=collector.go:206 level=debug msg="collector succeeded" name=stat_activity_autovacuum duration_seconds=0.00232475
ts=2024-01-24T13:58:02.945Z caller=collector.go:206 level=debug msg="collector succeeded" name=replication duration_seconds=0.003091174
ts=2024-01-24T13:58:02.947Z caller=namespace.go:193 level=debug msg="Querying namespace" namespace=pg_stat_database_conflicts
ts=2024-01-24T13:58:02.948Z caller=namespace.go:193 level=debug msg="Querying namespace" namespace=pg_stat_replication
ts=2024-01-24T13:58:02.950Z caller=namespace.go:193 level=debug msg="Querying namespace" namespace=pg_replication_slots
ts=2024-01-24T13:58:02.951Z caller=namespace.go:193 level=debug msg="Querying namespace" namespace=pg_stat_archiver
ts=2024-01-24T13:58:02.952Z caller=namespace.go:193 level=debug msg="Querying namespace" namespace=pg_stat_activity
ts=2024-01-24T13:58:02.954Z caller=server.go:74 level=info msg="Established new database connection" fingerprint=server910:5432
ts=2024-01-24T13:58:02.956Z caller=pg_stat_database.go:280 level=debug collector=stat_database msg="Skipping collecting metric because it has no datname"
ts=2024-01-24T13:58:02.957Z caller=pg_stat_database.go:350 level=debug collector=stat_database msg="No metric for stats_reset, will collect 0 instead"
ts=2024-01-24T13:58:02.957Z caller=pg_stat_database.go:350 level=debug collector=stat_database msg="No metric for stats_reset, will collect 0 instead"
ts=2024-01-24T13:58:02.957Z caller=collector.go:206 level=debug msg="collector succeeded" name=stat_database duration_seconds=0.014814364
ts=2024-01-24T13:58:02.957Z caller=collector.go:206 level=debug msg="collector succeeded" name=stat_bgwriter duration_seconds=0.015447574
ts=2024-01-24T13:58:02.959Z caller=collector.go:206 level=debug msg="collector succeeded" name=long_running_transactions duration_seconds=0.016531943
ts=2024-01-24T13:58:02.974Z caller=postgres_exporter.go:585 level=debug msg="Querying PostgreSQL version" server=server910:5432
ts=2024-01-24T13:58:02.975Z caller=postgres_exporter.go:613 level=info msg="Semantic version changed" server=server910:5432 from=0.0.0 to=13.5.0
ts=2024-01-24T13:58:02.975Z caller=postgres_exporter.go:287 level=debug msg="Column is being forced to discard due to version incompatibility" column=procpid
ts=2024-01-24T13:58:02.975Z caller=postgres_exporter.go:287 level=debug msg="Column is being forced to discard due to version incompatibility" column=sent_location
ts=2024-01-24T13:58:02.975Z caller=postgres_exporter.go:287 level=debug msg="Column is being forced to discard due to version incompatibility" column=replay_location
ts=2024-01-24T13:58:02.975Z caller=postgres_exporter.go:287 level=debug msg="Column is being forced to discard due to version incompatibility" column=flush_location
ts=2024-01-24T13:58:02.975Z caller=postgres_exporter.go:287 level=debug msg="Column is being forced to discard due to version incompatibility" column=write_location
ts=2024-01-24T13:58:02.975Z caller=postgres_exporter.go:287 level=debug msg="Column is being forced to discard due to version incompatibility" column=pg_xlog_location_diff
ts=2024-01-24T13:58:02.975Z caller=pg_setting.go:35 level=debug msg="Querying pg_setting view" server=server910:5432
ts=2024-01-24T13:58:02.979Z caller=collector.go:206 level=debug msg="collector succeeded" name=stat_user_tables duration_seconds=0.036695873
ts=2024-01-24T13:58:02.981Z caller=namespace.go:193 level=debug msg="Querying namespace" namespace=pg_stat_database_conflicts
ts=2024-01-24T13:58:02.982Z caller=collector.go:206 level=debug msg="collector succeeded" name=statio_user_tables duration_seconds=0.04011164
ts=2024-01-24T13:58:02.983Z caller=collector.go:206 level=debug msg="collector succeeded" name=replication_slot duration_seconds=0.040864841
ts=2024-01-24T13:58:02.986Z caller=collector.go:206 level=debug msg="collector succeeded" name=locks duration_seconds=0.043772689
ts=2024-01-24T13:58:02.989Z caller=collector.go:206 level=debug msg="collector succeeded" name=process_idle duration_seconds=0.046559469
ts=2024-01-24T13:58:02.993Z caller=namespace.go:193 level=debug msg="Querying namespace" namespace=pg_stat_replication
ts=2024-01-24T13:58:02.995Z caller=collector.go:206 level=debug msg="collector succeeded" name=database duration_seconds=0.053059736
ts=2024-01-24T13:58:02.996Z caller=namespace.go:193 level=debug msg="Querying namespace" namespace=pg_replication_slots
ts=2024-01-24T13:58:02.998Z caller=namespace.go:193 level=debug msg="Querying namespace" namespace=pg_stat_archiver
ts=2024-01-24T13:58:02.999Z caller=namespace.go:193 level=debug msg="Querying namespace" namespace=pg_stat_activity
@hajali-amine
Copy link

There's also all metrics following this pattern pg_stat_statements_* are only showing the values for the first datasource.

@oadekoya
Copy link

What is the update on this? I think deprecating the auto-discover-databases feature without any alternative solution is very disadvantageous.

@thampiotr
Copy link

It seems to be because of this logic, which takes only first DSN to instantiate the collector.

When I tried to instantiate multiple collectors in a fork I ran into issues with attempts to register multiple metrics. Seems like running an exporter per DB is the workaround for now.

@sysadmind
Copy link
Contributor

The best way to use one collector for metrics on several postgres instances is to use the multi-target support. We're in the (slow) process of standardizing the code and how the labels are applied.

@thampiotr
Copy link

Thanks for the context @sysadmind! We're looking forward to the new versions with standardized implementation.

Are all the metrics, including the ones defined in the postgres_exporter.go and custom metrics from a YAML file supported by the multi-target support? Or only those from collector package?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants