elasticsearchv8 cluster_settings unmarshal error #840

Open
Jjungs7 opened this issue Dec 25, 2023 · 4 comments

Comments

@Jjungs7

Jjungs7 commented Dec 25, 2023

Related issue: #509

I was testing elasticsearch_exporter against Elasticsearch 8.x and hit an unmarshal error with the --collector.clustersettings flag enabled.

  1. Environment
    • elasticsearch_exporter: v1.7.0
    • elasticsearch: 8.11.3 (and all versions that include the new setting "cluster.routing.allocation.disk.watermark.low.max_headroom")
    • golang: 1.21.3
  2. Steps to reproduce
# Prepare elasticsearch_exporter
go build .
./elasticsearch_exporter --es.uri http://localhost:9200 --log.level debug --collector.clustersettings --es.all --es.indices_settings --es.shards

# run elasticsearchv8 instance
docker run -d --name elasticsearch -p 9200:9200 -e "discovery.type=single-node" -e"xpack.security.enabled=false" elasticsearch:8.11.3

# Change default settings. This removes the "cluster.routing.allocation.disk.watermark.low" field in the Defaults section
curl -XPUT -H"Content-Type: application/json" http://localhost:9200/_cluster/settings -d'{"transient":{"cluster.routing.allocation.disk.watermark.low": "90%"}}'
  3. Expected result: "elasticsearch_clustersettings_allocation_watermark_low_ratio" metric in /metrics
# HELP elasticsearch_clustersettings_allocation_watermark_low_ratio Low watermark for disk usage as a ratio.
# TYPE elasticsearch_clustersettings_allocation_watermark_low_ratio gauge
elasticsearch_clustersettings_allocation_watermark_low_ratio 0.9
  4. Actual result: an unmarshal error is shown in the elasticsearch_exporter logs and the expected metric is not exported in /metrics
level=error ts=2023-12-25T09:56:11.492789Z caller=collector.go:189 msg="collector failed" name=clustersettings duration_seconds=0.015642208 err="json: cannot unmarshal object into Go struct field clusterSettingsWatermark.defaults.cluster.routing.allocation.disk.watermark.low of type string"
  5. Possible solution: apply flat_settings: true when invoking "GET _cluster/settings" and refactor the struct to match the flat_settings format

Example:

// clusterSettingsSection is a representation of a Elasticsearch Cluster Settings
type clusterSettingsSection struct {
	Cluster clusterSettingsCluster `json:"cluster"`
}

// clusterSettingsCluster is a representation of a Elasticsearch clusterSettingsCluster Settings
type clusterSettingsCluster struct {
	Routing clusterSettingsRouting `json:"routing"`
	// This can be either a JSON object (which does not contain the value we are interested in) or a string
	MaxShardsPerNode interface{} `json:"max_shards_per_node"`
}

// clusterSettingsRouting is a representation of a Elasticsearch Cluster shard routing configuration
type clusterSettingsRouting struct {
	Allocation clusterSettingsAllocation `json:"allocation"`
}

// clusterSettingsAllocation is a representation of a Elasticsearch Cluster shard routing allocation settings
type clusterSettingsAllocation struct {
	Enabled string              `json:"enable"`
	Disk    clusterSettingsDisk `json:"disk"`
}

// clusterSettingsDisk is a representation of a Elasticsearch Cluster shard routing disk allocation settings
type clusterSettingsDisk struct {
	ThresholdEnabled string                   `json:"threshold_enabled"`
	Watermark        clusterSettingsWatermark `json:"watermark"`
}

// clusterSettingsWatermark is representation of Elasticsearch Cluster shard routing disk allocation watermark settings
type clusterSettingsWatermark struct {
	FloodStage string `json:"flood_stage"`
	High       string `json:"high"`
	Low        string `json:"low"`
}

// clusterSettingsResponse is a representation of a Elasticsearch Cluster Settings
type clusterSettingsResponse struct {
	Defaults   clusterSettingsSection `json:"defaults"`
	Persistent clusterSettingsSection `json:"persistent"`
	Transient  clusterSettingsSection `json:"transient"`
}

// clusterSettingsSection is a representation of a Elasticsearch Cluster Settings
type clusterSettingsSection struct {
	ClusterMaxShardsPerNode                         string `json:"cluster.max_shards_per_node"`
	ClusterRoutingAllocationBalanceDiskUsage        string `json:"cluster.routing.allocation.balance.disk_usage"`
	ClusterRoutingAllocationBalanceIndex            string `json:"cluster.routing.allocation.balance.index"`
	ClusterRoutingAllocationBalanceShard            string `json:"cluster.routing.allocation.balance.shard"`
	ClusterRoutingAllocationBalanceThreshold        string `json:"cluster.routing.allocation.balance.threshold"`
	ClusterRoutingAllocationBalanceWriteLoad        string `json:"cluster.routing.allocation.balance.write_load"`
	ClusterRoutingAllocationEnable                  string `json:"cluster.routing.allocation.enable"`
	ClusterRoutingAllocationDiskThresholdEnabled    string `json:"cluster.routing.allocation.disk.threshold_enabled"`
	ClusterRoutingAllocationDiskWatermarkFloodStage string `json:"cluster.routing.allocation.disk.watermark.flood_stage"`
	ClusterRoutingAllocationDiskWatermarkHigh       string `json:"cluster.routing.allocation.disk.watermark.high"`
	ClusterRoutingAllocationDiskWatermarkLow        string `json:"cluster.routing.allocation.disk.watermark.low"`
}

q.Set("include_defaults", "true")

...
u := c.u.ResolveReference(&url.URL{Path: "_cluster/settings"})
q := u.Query()
q.Set("flat_settings", "true")
q.Set("include_defaults", "true")
...
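To illustrate why the flat format would avoid the collision, here is a minimal sketch, not the exporter's actual code. The types `flatSection` and `flatResponse`, the helper `decodeFlat`, and the sample body are hypothetical and hand-written for this example (the sample is not captured from a real cluster). With flat_settings, "…watermark.low" and "…watermark.low.max_headroom" are two separate string keys, so a string field tagged with the first key simply ignores the second.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// flatSection is a hypothetical, trimmed-down flat struct with a single field.
type flatSection struct {
	WatermarkLow string `json:"cluster.routing.allocation.disk.watermark.low"`
}

// flatResponse mirrors the three top-level sections of the settings response.
type flatResponse struct {
	Defaults   flatSection `json:"defaults"`
	Persistent flatSection `json:"persistent"`
	Transient  flatSection `json:"transient"`
}

// sampleFlat is a hand-written body in flat_settings style (assumption, not real output).
const sampleFlat = `{
  "persistent": {},
  "transient": {"cluster.routing.allocation.disk.watermark.low": "90%"},
  "defaults": {"cluster.routing.allocation.disk.watermark.low.max_headroom": "-1"}
}`

// decodeFlat parses a flat-style settings body into flatResponse.
func decodeFlat(body []byte) (flatResponse, error) {
	var r flatResponse
	err := json.Unmarshal(body, &r)
	return r, err
}

func main() {
	r, err := decodeFlat([]byte(sampleFlat))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	// The max_headroom key does not match the tagged field, so it is skipped.
	fmt.Printf("transient low=%q, defaults low=%q\n",
		r.Transient.WatermarkLow, r.Defaults.WatermarkLow)
}
```

The trade-off is that every setting of interest needs its own field with the full dotted key as the tag, as in the proposed struct above.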
@Skunnyk

Skunnyk commented Feb 1, 2024

I can reproduce the problem on elasticsearch_exporter 1.7.0 with both ES 8.12.0 and ES 7.17.

Setting only cluster.routing.allocation.disk.watermark.low triggers the unmarshal error on one cluster, but the same setting works on another cluster, which instead fails on cluster.routing.allocation.disk.watermark.flood_stage 🤔

@sysadmind
Contributor

It would be really helpful to have an example of the API response from Elasticsearch to use in our tests. cluster_settings_test.go has tests covering the cluster settings endpoint, including the watermark metrics. From what I understand, based on the provided error message, the problem is that a nested JSON object is now being returned for the watermark metrics instead of a string.

This could be a scenario where we need to use different structs based on elasticsearch version or customize the unmarshal.
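As one possible shape for the "customize the unmarshal" option, here is a minimal sketch. The type `stringOrObject`, the struct `watermark`, the helper `decodeWatermark`, and the sample body are all hypothetical names made up for illustration and are not part of the exporter. The idea is a field type that stores the value when the JSON is a string and silently leaves itself unset when it is anything else, such as a nested object.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// stringOrObject is a hypothetical field type: it keeps string values and
// ignores any other JSON value (object, null, ...) instead of failing.
type stringOrObject struct {
	Value string
	IsSet bool
}

// UnmarshalJSON implements json.Unmarshaler.
func (s *stringOrObject) UnmarshalJSON(data []byte) error {
	data = bytes.TrimSpace(data)
	if len(data) > 0 && data[0] == '"' {
		if err := json.Unmarshal(data, &s.Value); err != nil {
			return err
		}
		s.IsSet = true
		return nil
	}
	// Not a string: treat the setting as absent.
	*s = stringOrObject{}
	return nil
}

// watermark is a hypothetical stand-in for the watermark struct.
type watermark struct {
	FloodStage stringOrObject `json:"flood_stage"`
	High       stringOrObject `json:"high"`
	Low        stringOrObject `json:"low"`
}

// sampleMixed is hand-written: one plain string next to two nested objects.
const sampleMixed = `{
  "flood_stage": {"frozen": "95%", "max_headroom": "-1"},
  "high": "93%",
  "low": {"max_headroom": "-1"}
}`

// decodeWatermark parses a watermark object that may mix strings and objects.
func decodeWatermark(body []byte) (watermark, error) {
	var w watermark
	err := json.Unmarshal(body, &w)
	return w, err
}

func main() {
	w, err := decodeWatermark([]byte(sampleMixed))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("high=%q (set=%v), low set=%v, flood_stage set=%v\n",
		w.High.Value, w.High.IsSet, w.Low.IsSet, w.FloodStage.IsSet)
}
```

This keeps the nested struct layout and works regardless of Elasticsearch version, at the cost of callers having to check `IsSet` before using a value.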

@Skunnyk

Skunnyk commented Feb 26, 2024

Hello! Here is an extract of a plain _cluster/settings response from an ES 7.17 cluster where I have the problem:

{                        
  "persistent" : {                     
    "cluster" : {
      "routing" : {
        "allocation" : {
          "disk" : {                  
            "watermark" : {               
              "low" : "88%",
              "flood_stage" : "100%",
              "high" : "93%"
            }                
          }
        }
      }
    }
  }
}

AFAIK, the format/hierarchy is the same in ES 8 and hasn't changed for ages 🤔

@Skunnyk

Skunnyk commented Feb 26, 2024

Ooooh, by querying it with include_defaults=true (like the exporter), we can see the following in the defaults section:
ES 7.17:

          "disk" : {
            "threshold_enabled" : "true",
            "watermark" : {
              "enable_for_single_data_node" : "false",
              "flood_stage" : {
                "frozen" : "95%",
                "frozen.max_headroom" : "20GB"
              }
            },
            "include_relocations" : "true",
            "reroute_interval" : "60s"

ES 8.12:

          "disk" : {
            "threshold_enabled" : "true",
            "reroute_interval" : "60s",
            "watermark" : {
              "flood_stage" : {
                "frozen" : "95%",
                "frozen.max_headroom" : "20GB",
                "max_headroom" : "-1"
              },
              "high" : {
                "max_headroom" : "-1"
              },
              "low" : {
                "max_headroom" : "-1"
              },
              "enable_for_single_data_node" : "true"
            }
          },

So we can have an entry in persistent as cluster.routing.allocation.disk.watermark.low, and in defaults as cluster.routing.allocation.disk.watermark.low.max_headroom. Could the problem come from here?
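A quick way to check this hypothesis outside the exporter is the minimal sketch below. The struct `watermarkStrings` only mirrors the three string fields quoted in the first comment, the two sample bodies are trimmed from the extracts above, and the helpers `tryDecode` and `isObjectTypeError` are hypothetical names for this example. Decoding the defaults-style body, where "low" is an object, into string fields should produce the same kind of json.UnmarshalTypeError as in the log, while the persistent-style body decodes cleanly.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// watermarkStrings mirrors the string-typed watermark fields from the issue.
type watermarkStrings struct {
	FloodStage string `json:"flood_stage"`
	High       string `json:"high"`
	Low        string `json:"low"`
}

// es8Defaults is trimmed from the ES 8.12 defaults extract above.
const es8Defaults = `{
  "flood_stage": {"frozen": "95%", "frozen.max_headroom": "20GB", "max_headroom": "-1"},
  "high": {"max_headroom": "-1"},
  "low": {"max_headroom": "-1"}
}`

// es7Persistent is trimmed from the ES 7.17 persistent extract above.
const es7Persistent = `{"low": "88%", "flood_stage": "100%", "high": "93%"}`

// tryDecode attempts to decode a watermark object into string fields.
func tryDecode(body string) error {
	var w watermarkStrings
	return json.Unmarshal([]byte(body), &w)
}

// isObjectTypeError reports whether err says a JSON object was found
// where another Go type was expected.
func isObjectTypeError(err error) bool {
	var typeErr *json.UnmarshalTypeError
	return errors.As(err, &typeErr) && typeErr.Value == "object"
}

func main() {
	fmt.Println("persistent-style body:", tryDecode(es7Persistent))
	err := tryDecode(es8Defaults)
	fmt.Println("defaults-style body:", err)
	fmt.Println("object-vs-string mismatch:", isObjectTypeError(err))
}
```

If this matches what the exporter sees, it would also explain why ES 7.17 fails on flood_stage: its defaults section already contains a flood_stage object for the frozen tier.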


3 participants