Releases: cortexproject/cortex

Cortex 1.10.0-rc.0

21 Jul 14:28
Pre-release

This was a release candidate for 1.10.0.

1.9.0 / 2021-05-14

18 May 10:51
v1.9.0
ed4f339

This release contains 131 contributions from 28 authors. Thank you!

Highlights

  • Several exciting features have become stable: shuffle-sharding, querying the chunks and blocks stores simultaneously, lazy mmap-ing of block indexes, etc.
  • Several query and ingest performance improvements!
  • Tons of bugfixes and optimisations!

Changelog

  • [CHANGE] Alertmanager now removes a tenant's local files once the Alertmanager is no longer running for that tenant, because the tenant was removed or resharded. #3910
  • [CHANGE] Alertmanager now stores local files in per-tenant folders. Files stored by Alertmanager previously are migrated to new hierarchy. Support for this migration will be removed in Cortex 1.11. #3910
  • [CHANGE] Ruler: deprecated -ruler.storage.* CLI flags (and their respective YAML config options) in favour of -ruler-storage.*. The deprecated config will be removed in Cortex 1.11. #3945
  • [CHANGE] Alertmanager: deprecated -alertmanager.storage.* CLI flags (and their respective YAML config options) in favour of -alertmanager-storage.*. This change doesn't apply to -alertmanager.storage.path and -alertmanager.storage.retention. The deprecated config will be removed in Cortex 1.11. #4002
  • [CHANGE] Alertmanager: removed -cluster. CLI flags deprecated in Cortex 1.7. The new config options to use are: #3946
    • -alertmanager.cluster.listen-address instead of -cluster.listen-address
    • -alertmanager.cluster.advertise-address instead of -cluster.advertise-address
    • -alertmanager.cluster.peers instead of -cluster.peer
    • -alertmanager.cluster.peer-timeout instead of -cluster.peer-timeout
  • [CHANGE] Blocks storage: removed the config option -blocks-storage.bucket-store.index-cache.postings-compression-enabled, which was deprecated in Cortex 1.6. Postings compression is always enabled. #4101
  • [CHANGE] Querier: removed the config option -store.max-look-back-period, which was deprecated in Cortex 1.6 and was used only by the chunks storage. You should use -querier.max-query-lookback instead. #4101
  • [CHANGE] Query Frontend: removed the config option -querier.compress-http-responses, which was deprecated in Cortex 1.6. You should use -api.response-compression-enabled instead. #4101
  • [CHANGE] Runtime-config / overrides: removed the config options -limits.per-user-override-config (use -runtime-config.file) and -limits.per-user-override-period (use -runtime-config.reload-period), both deprecated since Cortex 0.6.0. #4112
  • [CHANGE] Cortex now fails fast on startup if unable to connect to the ring backend. #4068
  • [FEATURE] The following features have been marked as stable: #4101
    • Shuffle-sharding
    • Querier support for querying chunks and blocks store at the same time
    • Tracking of active series and exporting them as metrics (-ingester.active-series-metrics-enabled and related flags)
    • Blocks storage: lazy mmap of block indexes in the store-gateway (-blocks-storage.bucket-store.index-header-lazy-loading-enabled)
    • Ingester: close idle TSDB and remove them from local disk (-blocks-storage.tsdb.close-idle-tsdb-timeout)
  • [FEATURE] Memberlist: add TLS configuration options for the memberlist transport layer used by the gossip KV store. #4046
    • New flags added for memberlist communication:
      • -memberlist.tls-enabled
      • -memberlist.tls-cert-path
      • -memberlist.tls-key-path
      • -memberlist.tls-ca-path
      • -memberlist.tls-server-name
      • -memberlist.tls-insecure-skip-verify
  • [FEATURE] Ruler: added local backend support to the ruler storage configuration under the -ruler-storage. flag prefix. #3932
  • [ENHANCEMENT] Upgraded Docker base images to alpine:3.13. #4042
  • [ENHANCEMENT] Blocks storage: reduce ingester memory by eliminating series reference cache. #3951
  • [ENHANCEMENT] Ruler: optimized <prefix>/api/v1/rules and <prefix>/api/v1/alerts when ruler sharding is enabled. #3916
  • [ENHANCEMENT] Ruler: added the following metrics when ruler sharding is enabled: #3916
    • cortex_ruler_clients
    • cortex_ruler_client_request_duration_seconds
  • [ENHANCEMENT] Alertmanager: Add API endpoint to list all tenant alertmanager configs: GET /multitenant_alertmanager/configs. #3529
  • [ENHANCEMENT] Ruler: Add API endpoint to list all tenant ruler rule groups: GET /ruler/rule_groups. #3529
  • [ENHANCEMENT] Query-frontend/scheduler: added querier forget delay (-query-frontend.querier-forget-delay and -query-scheduler.querier-forget-delay) to mitigate the blast radius in the event queriers crash because of a repeatedly sent "query of death" when shuffle-sharding is enabled. #3901
  • [ENHANCEMENT] Query-frontend: reduced memory allocations when serializing query response. #3964
  • [ENHANCEMENT] Querier / ruler: some optimizations to PromQL query engine. #3934 #3989
  • [ENHANCEMENT] Ingester: reduce CPU and memory when a high number of errors are returned by the ingester on the write path with the blocks storage. #3969 #3971 #3973
  • [ENHANCEMENT] Distributor: reduce CPU and memory when a high number of errors are returned by the distributor on the write path. #3990
  • [ENHANCEMENT] Put metric before label value in the "label value too long" error message. #4018
  • [ENHANCEMENT] Allow use of y|w|d suffixes for duration related limits and per-tenant limits. #4044
  • [ENHANCEMENT] Query-frontend: Small optimization on top of PR #3968 to avoid unnecessary Extents merging. #4026
  • [ENHANCEMENT] Add a metric cortex_compactor_compaction_interval_seconds for the compaction interval config value. #4040
  • [ENHANCEMENT] Ingester: added the following per-ingester (instance) experimental limits: max number of series in memory (-ingester.instance-limits.max-series), max number of users in memory (-ingester.instance-limits.max-tenants), max ingestion rate (-ingester.instance-limits.max-ingestion-rate), and max inflight requests (-ingester.instance-limits.max-inflight-push-requests). These limits are only used with the blocks storage. Limits can also be configured using the runtime-config feature, and current values are exported as the cortex_ingester_instance_limits metric. #3992
  • [ENHANCEMENT] Cortex is now built with Go 1.16. #4062
  • [ENHANCEMENT] Distributor: added per-distributor experimental limits: max number of inflight requests (-distributor.instance-limits.max-inflight-push-requests) and max ingestion rate in samples/sec (-distributor.instance-limits.max-ingestion-rate). If not set, these two are unlimited. Also added metrics to expose current values (cortex_distributor_inflight_push_requests, cortex_distributor_ingestion_rate_samples_per_second) as well as limits (cortex_distributor_instance_limits with various limit label values). #4071
  • [ENHANCEMENT] Ruler: Added -ruler.enabled-tenants and -ruler.disabled-tenants to explicitly enable or disable rules processing for specific tenants. #4074
  • [ENHANCEMENT] Block Storage Ingester: /flush now accepts two new parameters: tenant to specify tenant to flush and wait=true to make call synchronous. Multiple tenants can be specified by repeating tenant parameter. If no tenant is specified, all tenants are flushed, as before. #4073
  • [ENHANCEMENT] Alertmanager: validate configured -alertmanager.web.external-url and fail if ends with /. #4081
  • [ENHANCEMENT] Alertmanager: added -alertmanager.receivers-firewall.block.cidr-networks and -alertmanager.receivers-firewall.block.private-addresses to block specific network addresses in HTTP-based Alertmanager receiver integrations. #4085
  • [ENHANCEMENT] Allow configuration of Cassandra's host selection policy. #4069
  • [ENHANCEMENT] Store-gateway: retry synching blocks if a per-tenant sync fails. #3975 #4088
  • [ENHANCEMENT] Add metric cortex_tcp_connections exposing the current number of accepted TCP connections. #4099
  • [ENHANCEMENT] Querier: Allow federated queries to run concurrently. #4065
  • [ENHANCEMENT] Label Values API call now supports match[] parameter when querying blocks on storage (assuming -querier.query-store-for-labels-enabled is enabled). #4133
  • [BUGFIX] Ruler-API: fix bug where the /api/v1/rules/<namespace>/<group_name> endpoint returned 400 instead of 404. #4013
  • [BUGFIX] Distributor: reverted changes done to rate limiting in #3825. #3948
  • [BUGFIX] Ingester: Fix race condition when opening and closing tsdb concurrently. #3959
  • [BUGFIX] Querier: streamline tracing spans. #3924
  • [BUGFIX] Ruler Storage: ignore objects with empty namespace or group in the name. #3999
  • [BUGFIX] Distributor: fix issue causing distributors to not extend the replication set because of failing instances when zone-aware replication is enabled. #3977
  • [BUGFIX] Query-frontend: Fix issue where cached entry size keeps increasing when making tiny query repeatedly. #3968
  • [BUGFIX] Compactor: -compactor.blocks-retention-period now supports weeks (w) and years (y). #4027
  • [BUGFIX] Querier: returning 422 (instead of 500) when query hits max_chunks_per_query limit with block storage, when the limit is hit in the store-gateway. #3937
  • [BUGFIX] Ruler: Rule group limit enforcement should now allow the same number of rules in a group as the limit. #3616
  • [BUGFIX] Frontend, Query-scheduler: allow querier to notify about shutdown without providing any authentication. #4066
  • [BUGFIX] Querier: fixed race condition causing queries to fail right after querier startup with the "empty ring" error. #4068
  • [BUGFIX] Compactor: Increment cortex_compactor_runs_failed_total if the compactor failed to compact a single tenant. #4094
  • [BUGFIX] Tracing: hotfix to prevent the Jaeger tracing client from indefinitely blocking the Cortex process shutdown when the HTTP connection to the tracing backend is blocked. #4134
  • [BUGFIX] Forward proper EndsAt from ruler to Alertmanager, in line with Prometheus behaviour. #4017
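
Several [CHANGE] entries above remove flags outright and name a replacement. As a rough aid when upgrading, the renames can be collected into a lookup table and applied to an existing argument list. The following is a minimal sketch, not a Cortex tool: the helper name `migrate_args` and the sample arguments are made up for illustration, and only the flag names come from the entries above. It handles the single-dash `-flag=value` and `-flag value` forms only, and a replacement flag may accept its value in a different form, so check each flag's documentation before relying on the rewritten arguments.

```python
# Removed flag -> replacement, copied from the [CHANGE] entries above.
RENAMED_FLAGS = {
    "-cluster.listen-address": "-alertmanager.cluster.listen-address",
    "-cluster.advertise-address": "-alertmanager.cluster.advertise-address",
    "-cluster.peer": "-alertmanager.cluster.peers",
    "-cluster.peer-timeout": "-alertmanager.cluster.peer-timeout",
    "-store.max-look-back-period": "-querier.max-query-lookback",
    "-querier.compress-http-responses": "-api.response-compression-enabled",
    "-limits.per-user-override-config": "-runtime-config.file",
    "-limits.per-user-override-period": "-runtime-config.reload-period",
}


def migrate_args(args):
    """Return (new_args, renamed), where renamed lists (old, new) flag pairs.

    Hypothetical helper: it only rewrites flag names and leaves values as-is.
    """
    new_args, renamed = [], []
    for arg in args:
        # Split "-flag=value" into the flag name and the optional "=value" part.
        name, sep, value = arg.partition("=")
        replacement = RENAMED_FLAGS.get(name)
        if replacement is None:
            new_args.append(arg)  # unknown flags and bare values pass through
            continue
        new_args.append(replacement + sep + value)
        renamed.append((name, replacement))
    return new_args, renamed


# Sample arguments, invented for this example.
old = ["-target=alertmanager", "-cluster.peer=am-1:9094", "-cluster.peer-timeout=15s"]
new, changes = migrate_args(old)
for before, after in changes:
    print(f"{before} -> {after}")
```

Matching on the exact flag name (rather than a prefix) keeps `-cluster.peer` and `-cluster.peer-timeout` from being confused with each other.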

Blocksconvert

  • [ENHANCEMENT] Builder: add `-builder.timestamp-...

Cortex 1.9.0-rc.0

29 Apr 14:17
v1.9.0-rc.0
bf75b7f
Pre-release

This was a release candidate for 1.9.0.

Cortex 1.8.1

27 Apr 12:08
v1.8.1
4afaa35

1.8.1 / 2021-04-27

  • [CHANGE] Fix for CVE-2021-31232: Local file disclosure vulnerability when -experimental.alertmanager.enable-api is used. The HTTP basic auth password_file can be used as an attack vector to send any file content via a webhook. The alertmanager templates can be used as an attack vector to send any file content because the alertmanager can load any text file specified in the templates list.

Cortex 1.7.1

27 Apr 12:08
v1.7.1
06bbda1

1.7.1 / 2021-04-27

  • [CHANGE] Fix for CVE-2021-31232: Local file disclosure vulnerability when -experimental.alertmanager.enable-api is used. The HTTP basic auth password_file can be used as an attack vector to send any file content via a webhook. The alertmanager templates can be used as an attack vector to send any file content because the alertmanager can load any text file specified in the templates list.

Cortex 1.8.0

25 Mar 13:27
v1.8.0
51662ea

Cortex 1.8.0 features 122 contributions by 35 authors. Thank you!

Highlights

  • Automatic deletion of old blocks with configurable per-tenant retention
  • Introduction of new storage options in Ruler and Alertmanager, using the bucket client from Thanos. Previous storage options will be deprecated in the next release.
  • New thanosconvert tool to migrate Thanos or Prometheus block metadata to Cortex
  • Support for @ <timestamp> in PromQL (needs to be enabled by flag)
  • Configurable per-tenant server-side encryption for S3
  • Work on sharding Alertmanager continues (not finished yet)

Changelog

  • [CHANGE] Alertmanager: Don't expose cluster information to tenants via the /alertmanager/api/v1/status API endpoint when operating with clustering enabled. #3903
  • [CHANGE] Ingester: don't update internal "last updated" timestamp of TSDB if tenant only sends invalid samples. This affects how "idle" time is computed. #3727
  • [CHANGE] Require explicit flag -<prefix>.tls-enabled to enable TLS in GRPC clients. Previously it was enough to specify a TLS flag to enable TLS validation. #3156
  • [CHANGE] Query-frontend: removed -querier.split-queries-by-day (deprecated in Cortex 0.4.0). Please use -querier.split-queries-by-interval instead. #3813
  • [CHANGE] Store-gateway: the chunks pool controlled by -blocks-storage.bucket-store.max-chunk-pool-bytes is now shared across all tenants. #3830
  • [CHANGE] Ingester: return error code 400 instead of 429 when per-user/per-tenant series/metadata limits are reached. #3833
  • [CHANGE] Compactor: add reason label to cortex_compactor_blocks_marked_for_deletion_total metric. Source blocks marked for deletion by compactor are labelled as compaction, while blocks passing the retention period are labelled as retention. #3879
  • [CHANGE] Alertmanager: the DELETE /api/v1/alerts is now idempotent. No error is returned if the alertmanager config doesn't exist. #3888
  • [FEATURE] Experimental Ruler Storage: Add a separate set of configuration options to configure the ruler storage backend under the -ruler-storage. flag prefix. All blocks storage bucket clients and the config service are currently supported. Clients using this implementation will only be enabled if the existing -ruler.storage flags are left unset. #3805 #3864
  • [FEATURE] Experimental Alertmanager Storage: Add a separate set of configuration options to configure the alertmanager storage backend under the -alertmanager-storage. flag prefix. All blocks storage bucket clients and the config service are currently supported. Clients using this implementation will only be enabled if the existing -alertmanager.storage flags are left unset. #3888
  • [FEATURE] Adds support to S3 server-side encryption using KMS. The S3 server-side encryption config can be overridden on a per-tenant basis for the blocks storage, ruler and alertmanager. Deprecated -<prefix>.s3.sse-encryption, please use the following CLI flags that have been added. #3651 #3810 #3811 #3870 #3886 #3906
    • -<prefix>.s3.sse.type
    • -<prefix>.s3.sse.kms-key-id
    • -<prefix>.s3.sse.kms-encryption-context
  • [FEATURE] Querier: Enable @ <timestamp> modifier in PromQL using the new -querier.at-modifier-enabled flag. #3744
  • [FEATURE] Overrides Exporter: Add overrides-exporter module for exposing per-tenant resource limit overrides as metrics. It is not included in all target (single-binary mode), and must be explicitly enabled. #3785
  • [FEATURE] Experimental thanosconvert: introduce an experimental tool thanosconvert to migrate Thanos block metadata to Cortex metadata. #3770
  • [FEATURE] Alertmanager: It now shards the /api/v1/alerts API using the ring when sharding is enabled. #3671
    • Added -alertmanager.max-recv-msg-size (defaults to 16M) to limit the size of HTTP request body handled by the alertmanager.
    • New flags added for communication between alertmanagers:
      • -alertmanager.max-recv-msg-size
      • -alertmanager.alertmanager-client.remote-timeout
      • -alertmanager.alertmanager-client.tls-enabled
      • -alertmanager.alertmanager-client.tls-cert-path
      • -alertmanager.alertmanager-client.tls-key-path
      • -alertmanager.alertmanager-client.tls-ca-path
      • -alertmanager.alertmanager-client.tls-server-name
      • -alertmanager.alertmanager-client.tls-insecure-skip-verify
  • [FEATURE] Compactor: added blocks storage per-tenant retention support. This is configured via -compactor.retention-period, and can be overridden on a per-tenant basis. #3879
  • [ENHANCEMENT] Queries: Instrument queries that were discarded due to the configured max_outstanding_requests_per_tenant. #3894
    • cortex_query_frontend_discarded_requests_total
    • cortex_query_scheduler_discarded_requests_total
  • [ENHANCEMENT] Ruler: Add TLS and explicit basic authentication configuration options for the HTTP client the ruler uses to communicate with the alertmanager. #3752
    • -ruler.alertmanager-client.basic-auth-username: Configure the basic authentication username used by the client. Takes precedence over a URL configured username.
    • -ruler.alertmanager-client.basic-auth-password: Configure the basic authentication password used by the client. Takes precedence over a URL configured password.
    • -ruler.alertmanager-client.tls-ca-path: File path to the CA file.
    • -ruler.alertmanager-client.tls-cert-path: File path to the TLS certificate.
    • -ruler.alertmanager-client.tls-insecure-skip-verify: Boolean to disable verifying the certificate.
    • -ruler.alertmanager-client.tls-key-path: File path to the TLS key certificate.
    • -ruler.alertmanager-client.tls-server-name: Expected name on the TLS certificate.
  • [ENHANCEMENT] Ingester: exposed metric cortex_ingester_oldest_unshipped_block_timestamp_seconds, tracking the unix timestamp of the oldest TSDB block not shipped to the storage yet. #3705
  • [ENHANCEMENT] Prometheus upgraded. #3739 #3806
    • Avoid unnecessary runtime.GC() during compactions.
    • Prevent compaction loop in TSDB on data gap.
  • [ENHANCEMENT] Query-Frontend now returns server side performance metrics using Server-Timing header when query stats is enabled. #3685
  • [ENHANCEMENT] Runtime Config: Add a mode query parameter for the runtime config endpoint. /runtime_config?mode=diff now shows the YAML runtime configuration with all values that differ from the defaults. #3700
  • [ENHANCEMENT] Distributor: Enable downstream projects to wrap the distributor push function and access the deserialized write requests before/after they are pushed. #3755
  • [ENHANCEMENT] Add flag -<prefix>.tls-server-name to require a specific server name instead of the hostname on the certificate. #3156
  • [ENHANCEMENT] Alertmanager: Remove a tenant's alertmanager instead of pausing it as we determine it is no longer needed. #3722
  • [ENHANCEMENT] Blocks storage: added more configuration options to S3 client. #3775
    • -blocks-storage.s3.tls-handshake-timeout: Maximum time to wait for a TLS handshake. 0 means no limit.
    • -blocks-storage.s3.expect-continue-timeout: The time to wait for a server's first response headers after fully writing the request headers if the request has an Expect header. 0 to send the request body immediately.
    • -blocks-storage.s3.max-idle-connections: Maximum number of idle (keep-alive) connections across all hosts. 0 means no limit.
    • -blocks-storage.s3.max-idle-connections-per-host: Maximum number of idle (keep-alive) connections to keep per-host. If 0, a built-in default value is used.
    • -blocks-storage.s3.max-connections-per-host: Maximum number of connections per host. 0 means no limit.
  • [ENHANCEMENT] Ingester: when tenant's TSDB is closed, Ingester now removes pushed metrics-metadata from memory, and removes metadata (cortex_ingester_memory_metadata, cortex_ingester_memory_metadata_created_total, cortex_ingester_memory_metadata_removed_total) and validation metrics (cortex_discarded_samples_total, cortex_discarded_metadata_total). #3782
  • [ENHANCEMENT] Distributor: cleanup metrics for inactive tenants. #3784
  • [ENHANCEMENT] Ingester: the ingester now re-emits the following TSDB metrics. #3800
    • cortex_ingester_tsdb_blocks_loaded
    • cortex_ingester_tsdb_reloads_total
    • cortex_ingester_tsdb_reloads_failures_total
    • cortex_ingester_tsdb_symbol_table_size_bytes
    • cortex_ingester_tsdb_storage_blocks_bytes
    • cortex_ingester_tsdb_time_retentions_total
  • [ENHANCEMENT] Querier: distribute workload across -store-gateway.sharding-ring.replication-factor store-gateway replicas when querying blocks and -store-gateway.sharding-enabled=true. #3824
  • [ENHANCEMENT] Distributor / HA Tracker: added cleanup of unused elected HA replicas from KV store. Added following metrics to monitor this process: #3809
    • cortex_ha_tracker_replicas_cleanup_started_total
    • cortex_ha_tracker_replicas_cleanup_marked_for_deletion_total
    • cortex_ha_tracker_replicas_cleanup_deleted_total
    • cortex_ha_tracker_replicas_cleanup_delete_failed_total
  • [ENHANCEMENT] Ruler now has a new API endpoint /ruler/delete_tenant_config that can be used to delete all ruler groups for a tenant. It is intended to be used by administrators who wish to clean up state after a user has been removed. Note that this endpoint is enabled regardless of -experimental.ruler.enable-api. #3750 #3899
  • [ENHANCEMENT] Query-frontend, query-scheduler: cleanup metrics for inactive tenants. #3826
  • [ENHANCEMENT] Blocks storage: added -blocks-storage.s3.region support to S3 client configuration. #3811
  • [ENHANCEMENT] Distributor: Remove cached subrings for inactive users when using shuffle sharding. #3849
  • [ENHANCEMENT] Store-gateway: Reduced memory used to fetch chunks at query time. #3855
  • [ENHANCEMENT] Ingester: attempt to prevent idle compaction from happening in concurrent ingesters by introducing a 25% jitter to the configu...
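
The `/runtime_config?mode=diff` entry above describes output limited to the values that differ from the defaults. The idea can be sketched as a recursive comparison of two nested mappings. This is only an illustration of the concept, not how Cortex implements the endpoint: the function name `diff_from_defaults` and every key and value in the sample data are hypothetical.

```python
def diff_from_defaults(current, defaults):
    """Return only the entries of `current` whose values differ from `defaults`."""
    diff = {}
    for key, value in current.items():
        default = defaults.get(key)
        if isinstance(value, dict) and isinstance(default, dict):
            # Recurse into nested sections and keep them only if something differs.
            nested = diff_from_defaults(value, default)
            if nested:
                diff[key] = nested
        elif key not in defaults or value != default:
            diff[key] = value
    return diff


# Hypothetical configuration, invented for this example (not real Cortex options).
defaults = {"example_limits": {"max_items": 0, "window": "0s"}, "example_feature": {"enabled": True}}
current = {"example_limits": {"max_items": 5000, "window": "0s"}, "example_feature": {"enabled": True}}

print(diff_from_defaults(current, defaults))
# prints {'example_limits': {'max_items': 5000}}
```

Sections with no differing values are dropped entirely, which keeps the output short when only a few settings have been overridden.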

Cortex 1.8.0-rc.1

15 Mar 16:49
4aa2783
Pre-release

Changes from 1.8.0-rc.0:

  • [BUGFIX] Distributor: reverted changes done to rate limiting in #3825. #3948

Cortex 1.8.0-rc.0

08 Mar 09:55
v1.8.0-rc.0
da31295
Pre-release

This was a release candidate for 1.8.0.

Cortex 1.7.0

23 Feb 18:26
v1.7.0
3a3015e

Changelog

Cortex

Note: in this version, the blocks storage compactor runs a migration task at startup, which can take many minutes and use a lot of RAM. Turn it off after the first successful run by setting -compactor.block-deletion-marks-migration-enabled=false.

  • [CHANGE] FramedSnappy encoding support has been removed from Push and Remote Read APIs. This means Prometheus 1.6 support has been removed and the oldest Prometheus version supported in the remote write is 1.7. #3682
  • [CHANGE] Ruler: removed the flag -ruler.evaluation-delay-duration-deprecated which was deprecated in 1.4.0. Please use the ruler_evaluation_delay_duration per-tenant limit instead. #3694
  • [CHANGE] Removed the flags -<prefix>.grpc-use-gzip-compression which were deprecated in 1.3.0: #3694
    • -query-scheduler.grpc-client-config.grpc-use-gzip-compression: use -query-scheduler.grpc-client-config.grpc-compression instead
    • -frontend.grpc-client-config.grpc-use-gzip-compression: use -frontend.grpc-client-config.grpc-compression instead
    • -ruler.client.grpc-use-gzip-compression: use -ruler.client.grpc-compression instead
    • -bigtable.grpc-use-gzip-compression: use -bigtable.grpc-compression instead
    • -ingester.client.grpc-use-gzip-compression: use -ingester.client.grpc-compression instead
    • -querier.frontend-client.grpc-use-gzip-compression: use -querier.frontend-client.grpc-compression instead
  • [CHANGE] Querier: it's not required to set -frontend.query-stats-enabled=true in the querier anymore to enable query statistics logging in the query-frontend. The flag is now required to be configured only in the query-frontend and it will be propagated to the queriers. #3595 #3695
  • [CHANGE] Blocks storage: compactor is now required when running a Cortex cluster with the blocks storage, because it also keeps the bucket index updated. #3583
  • [CHANGE] Blocks storage: block deletion marks are now stored in a per-tenant global markers/ location too, other than within the block location. The compactor, at startup, will copy deletion marks from the block location to the global location. This migration is required only once, so you can safely disable it via -compactor.block-deletion-marks-migration-enabled=false once new compactor has successfully started once in your cluster. #3583
  • [CHANGE] OpenStack Swift: the default value for the -ruler.storage.swift.container-name and -swift.container-name config options has changed from cortex to empty string. If you were relying on the default value, you should set it back to cortex. #3660
  • [CHANGE] HA Tracker: configured replica label is now verified against label value length limit (-validation.max-length-label-value). #3668
  • [CHANGE] Distributor: extend_writes field in YAML configuration has moved from lifecycler (inside ingester_config) to distributor_config. This doesn't affect command line option -distributor.extend-writes, which stays the same. #3719
  • [CHANGE] Alertmanager: Deprecated -cluster. CLI flags in favor of their -alertmanager.cluster. equivalent. The deprecated flags (and their respective YAML config options) are: #3677
    • -cluster.listen-address in favor of -alertmanager.cluster.listen-address
    • -cluster.advertise-address in favor of -alertmanager.cluster.advertise-address
    • -cluster.peer in favor of -alertmanager.cluster.peers
    • -cluster.peer-timeout in favor of -alertmanager.cluster.peer-timeout
  • [CHANGE] Blocks storage: the default value of -blocks-storage.bucket-store.sync-interval has been changed from 5m to 15m. #3724
  • [FEATURE] Querier: Queries can be federated across multiple tenants. The tenants IDs involved need to be specified separated by a | character in the X-Scope-OrgID request header. This is an experimental feature, which can be enabled by setting -tenant-federation.enabled=true on all Cortex services. #3250
  • [FEATURE] Alertmanager: introduced the experimental option -alertmanager.sharding-enabled to shard tenants across multiple Alertmanager instances. This feature is still under heavy development and its usage is discouraged. The following new metrics are exported by the Alertmanager: #3664
    • cortex_alertmanager_ring_check_errors_total
    • cortex_alertmanager_sync_configs_total
    • cortex_alertmanager_sync_configs_failed_total
    • cortex_alertmanager_tenants_discovered
    • cortex_alertmanager_tenants_owned
  • [ENHANCEMENT] Allow specifying JAEGER_ENDPOINT instead of sampling server or local agent port. #3682
  • [ENHANCEMENT] Blocks storage: introduced a per-tenant bucket index, periodically updated by the compactor, used to avoid full bucket scanning done by queriers, store-gateways and rulers. The bucket index is updated by the compactor during blocks cleanup, on every -compactor.cleanup-interval. #3553 #3555 #3561 #3583 #3625 #3711 #3715
  • [ENHANCEMENT] Blocks storage: introduced an option -blocks-storage.bucket-store.bucket-index.enabled to enable the usage of the bucket index in the querier, store-gateway and ruler. When enabled, the querier, store-gateway and ruler will use the bucket index to find a tenant's blocks instead of running the periodic bucket scan. The following new metrics are exported by the querier and ruler: #3614 #3625
    • cortex_bucket_index_loads_total
    • cortex_bucket_index_load_failures_total
    • cortex_bucket_index_load_duration_seconds
    • cortex_bucket_index_loaded
  • [ENHANCEMENT] Compactor: exported the following metrics. #3583 #3625
    • cortex_bucket_blocks_count: Total number of blocks per tenant in the bucket. Includes blocks marked for deletion, but not partial blocks.
    • cortex_bucket_blocks_marked_for_deletion_count: Total number of blocks per tenant marked for deletion in the bucket.
    • cortex_bucket_blocks_partials_count: Total number of partial blocks.
    • cortex_bucket_index_last_successful_update_timestamp_seconds: Timestamp of the last successful update of a tenant's bucket index.
  • [ENHANCEMENT] Ruler: Add cortex_prometheus_last_evaluation_samples to expose the number of samples generated by a rule group per tenant. #3582
  • [ENHANCEMENT] Memberlist: add status page (/memberlist) with available details about memberlist-based KV store and memberlist cluster. It's also possible to view KV values in Go struct or JSON format, or download for inspection. #3575
  • [ENHANCEMENT] Memberlist: client can now keep a size-bounded buffer with sent and received messages and display them in the admin UI (/memberlist) for troubleshooting. #3581 #3602
  • [ENHANCEMENT] Blocks storage: added block index attributes caching support to metadata cache. The TTL can be configured via -blocks-storage.bucket-store.metadata-cache.block-index-attributes-ttl. #3629
  • [ENHANCEMENT] Alertmanager: Add support for Azure blob storage. #3634
  • [ENHANCEMENT] Compactor: tenants marked for deletion will now be fully cleaned up after some delay since deletion of last block. Cleanup includes removal of remaining marker files (including tenant deletion mark file) and files under debug/metas. #3613
  • [ENHANCEMENT] Compactor: retry compaction of a single tenant on failure instead of re-running compaction for all tenants. #3627
  • [ENHANCEMENT] Querier: Implement result caching for tenant query federation. #3640
  • [ENHANCEMENT] API: Add a mode query parameter for the config endpoint: #3645
    • /config?mode=diff: Shows the YAML configuration with all values that differ from the defaults.
    • /config?mode=defaults: Shows the YAML configuration with all the default values.
  • [ENHANCEMENT] OpenStack Swift: added the following config options to OpenStack Swift backend client: #3660
    • Chunks storage: -swift.auth-version, -swift.max-retries, -swift.connect-timeout, -swift.request-timeout.
    • Blocks storage: -blocks-storage.swift.auth-version, -blocks-storage.swift.max-retries, -blocks-storage.swift.connect-timeout, -blocks-storage.swift.request-timeout.
    • Ruler: -ruler.storage.swift.auth-version, -ruler.storage.swift.max-retries, -ruler.storage.swift.connect-timeout, -ruler.storage.swift.request-timeout.
  • [ENHANCEMENT] Disabled in-memory shuffle-sharding subring cache in the store-gateway, ruler and compactor. This should reduce the memory utilisation in these services when shuffle-sharding is enabled, without significantly increasing CPU utilisation. #3601
  • [ENHANCEMENT] Shuffle sharding: optimised subring generation used by shuffle sharding. #3601
  • [ENHANCEMENT] New /runtime_config endpoint that returns the defined runtime configuration in YAML format. The returned configuration includes overrides. #3639
  • [ENHANCEMENT] Query-frontend: included the name of the parameter that failed validation in the HTTP 400 message. #3703
  • [ENHANCEMENT] Cortex now fails to start if the provided runtime config is invalid. #3707
  • [ENHANCEMENT] Alertmanager: Add flags to customize the cluster configuration: #3667
    • -alertmanager.cluster.gossip-interval: The interval between sending gossip messages. By lowering this value (more frequent) gossip messages are propagated across cluster more quickly at the expense of increased bandwidth usage.
    • -alertmanager.cluster.push-pull-interval: The interval between gossip state syncs. Setting this interval lower (more frequent) will increase convergence speeds across larger clusters at the expense of increased bandwidth usage.
  • [ENHANCEMENT] Distributor: change the error message returned when a received series has too many label values. The new message format has the series at the end and this plays better with Prometheus logs truncation. #3718
    • From: sample for '<series>' has <value> label names; limit <value>
    • To: series has too many labels (actual: <value>, limit: <value>) series: '<series>'
  • [ENHANCEMENT] Improve bucket index loader to...
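
The tenant federation entry above says the tenant IDs are sent in the X-Scope-OrgID request header, separated by a | character. A client could build that header value as sketched below. This is a minimal sketch under stated assumptions: the helper name `federated_org_id` and the tenant IDs are hypothetical, and the check that an ID is non-empty and contains no `|` is a precaution added here, not a statement of the validation Cortex itself performs.

```python
def federated_org_id(tenant_ids):
    """Join tenant IDs with "|" for use as the X-Scope-OrgID header value."""
    if not tenant_ids:
        raise ValueError("at least one tenant ID is required")
    for tenant in tenant_ids:
        # An ID containing the separator would be read as two tenants.
        if not tenant or "|" in tenant:
            raise ValueError(f"invalid tenant ID: {tenant!r}")
    return "|".join(tenant_ids)


# Hypothetical tenant IDs, invented for this example.
headers = {"X-Scope-OrgID": federated_org_id(["team-a", "team-b"])}
print(headers)
# prints {'X-Scope-OrgID': 'team-a|team-b'}
```

The resulting dictionary can be passed as request headers by any HTTP client. Per the entry above, the feature also has to be enabled with -tenant-federation.enabled=true on all Cortex services.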

Cortex 1.7.0-rc.2

19 Feb 21:42
v1.7.0-rc.2
a87f171
Pre-release
  • [BUGFIX] Handle missing samples due to large steps and single point extents. #3818 #3835