/v2 and /v2/candidates endpoints not respecting broker partition pruning - range partition #16222

ColeAtCharter · 2024-04-01T17:59:10Z

This issue reports unexpected behavior in which the broker selects too many segments to query. The scenario arises when secondary partition information exists (e.g., for range partitioning) but is not applied to a query filter on the corresponding partition column. Because the broker does not fully prune segments, queries touch more segments than necessary, which hurts performance and cost.

Affected Version

28

Description

The /druid/v2 and /druid/v2/candidates broker endpoints return segments that should be filtered out based on secondary partition metadata. Returning unneeded segments when planning and executing queries causes unnecessary I/O throughout the system, with an avoidable impact on cost and performance.

Steps to reproduce:

Create segments with range partitioning, mark them as used, and load them onto historicals. Testing requires multiple segments per combination of datasource and time chunk.
Identify the partition column value(s) corresponding to a single segment (within a datasource/time chunk).
Submit /v2 and /v2/candidates requests to the broker for the identified datasource and time chunk. Add a query filter on the first partition column, using a partition value identified in the previous step.

Expected/observed behavior summary

The expected behavior is that the response will return only the segment identified and no others for the loaded datasource/time chunk.
The observed behavior is that the response will include additional segments for the loaded datasource/time chunk.

Testing notes

Testing used a segment whose range for the partition/filter column was neither the first nor the last in the set (i.e., neither bound was negative or positive infinity). Further, to reduce ambiguity, the chosen filter value was not the first or last value for the column within the segment.
The test setup had multiple partition columns in the queried segments but only filtered on the first partition column
Testing both included and omitted the query context value "secondaryPartitionPruning"
Some testing included "bySegment" in the query context to assess broker logic for selecting segments
Testing only queried a single time chunk (aligning with segment granularity). This shouldn't be strictly necessary, but reproductions of the test may need to at least ensure that all segments in the time chunk have the same partitioning.

Representation of the test setup

segments for "src1"

there are multiple segments for the tested time chunk

segment_id
src1_2025-02-16T01:00:00.000Z_2025-02-16T02:00:00.000Z_2024-04-01T23:59:59.999Z
src1_2025-02-16T01:00:00.000Z_2025-02-16T02:00:00.000Z_2024-04-01T23:59:59.999Z_1
src1_2025-02-16T01:00:00.000Z_2025-02-16T02:00:00.000Z_2024-04-01T23:59:59.999Z_2

server segments for datasource "src1"

2x replication - similar to a typical production setup, but not strictly required for testing

segment_id,server
src1_2025-02-16T01:00:00.000Z_2025-02-16T02:00:00.000Z_2024-04-01T23:59:59.999Z,host1:8283
src1_2025-02-16T01:00:00.000Z_2025-02-16T02:00:00.000Z_2024-04-01T23:59:59.999Z,host2:8283
src1_2025-02-16T01:00:00.000Z_2025-02-16T02:00:00.000Z_2024-04-01T23:59:59.999Z_1,host1:8283
src1_2025-02-16T01:00:00.000Z_2025-02-16T02:00:00.000Z_2024-04-01T23:59:59.999Z_1,host3:8283
src1_2025-02-16T01:00:00.000Z_2025-02-16T02:00:00.000Z_2024-04-01T23:59:59.999Z_2,host2:8283
src1_2025-02-16T01:00:00.000Z_2025-02-16T02:00:00.000Z_2024-04-01T23:59:59.999Z_2,host3:8283

segment metadata - range partition column values

for testing we can assume that partitionDimensions are the first columns in dimensions, but that should not be strictly necessary

src1_2025-02-16T01:00:00.000Z_2025-02-16T02:00:00.000Z_2024-04-01T23:59:59.999Z: partitionDimensions: ["col_1","col_2"], start: [-inf, -inf], end: ["value1ghi","value2tuv"]
src1_2025-02-16T01:00:00.000Z_2025-02-16T02:00:00.000Z_2024-04-01T23:59:59.999Z_1: partitionDimensions: ["col_1","col_2"], start: ["value1ghi","value2jkl"], end: ["value1lmn", "value2mnop"]
src1_2025-02-16T01:00:00.000Z_2025-02-16T02:00:00.000Z_2024-04-01T23:59:59.999Z_2: partitionDimensions: ["col_1","col_2"], start: ["value1lmn", "value2qrs"], end: [+inf, +inf]
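
The pruning this report expects can be illustrated with a minimal sketch. It is only a model of the expected outcome for the metadata above, not Druid's actual pruning code; the `SEGMENTS` table, `may_contain`, and `prune` are hypothetical names, and `None` stands in for the -inf/+inf bounds. Because only the first partition column is checked, both bounds are treated as inclusive.

```python
# Hypothetical model of the range metadata listed above, keyed by partition number.
# None represents an unbounded side (-inf for start, +inf for end).
SEGMENTS = {
    0: {"start": None, "end": ("value1ghi", "value2tuv")},
    1: {"start": ("value1ghi", "value2jkl"), "end": ("value1lmn", "value2mnop")},
    2: {"start": ("value1lmn", "value2qrs"), "end": None},
}


def may_contain(start, end, value):
    """Return True if a segment could hold rows whose first partition column equals value."""
    # Only the first dimension is compared, so a value equal to either
    # bound may still be present; both ends are treated as inclusive.
    if start is not None and value < start[0]:
        return False
    if end is not None and value > end[0]:
        return False
    return True


def prune(segments, value):
    """Return the partition numbers that survive pruning for an equality filter."""
    return sorted(
        number
        for number, rng in segments.items()
        if may_contain(rng["start"], rng["end"], value)
    )


print(prune(SEGMENTS, "value1jkl"))
```

For the filter value used in this report, only partition 1 survives in this model, which is the basis for the expected result.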

test query

select the identified datasource/time chunk and filter on the first partition column using a value matching only one of the multiple segments

{
  "queryType": "scan",
  "dataSource": "src1",
  "intervals": "2025-02-16T01:00:00.000Z/2025-02-16T02:00:00.000Z",
  "columns": ["__time", "col_1", "col_15"],
  "filter": {
    "type": "selector",
    "dimension": "col_1",
    "value": "value1jkl"
  },
  "context": {"secondaryPartitionPruning": true}
}
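
For reproduction, the same query can be posted to both endpoints. The following is a sketch using only the Python standard library; the broker address `http://localhost:8082` and the helper name `build_request` are assumptions and should be adjusted to the cluster under test. The requests are built but not sent, so the snippet can be inspected without a running broker.

```python
import json
import urllib.request

BROKER = "http://localhost:8082"  # assumed broker address; adjust for your cluster

QUERY = {
    "queryType": "scan",
    "dataSource": "src1",
    "intervals": "2025-02-16T01:00:00.000Z/2025-02-16T02:00:00.000Z",
    "columns": ["__time", "col_1", "col_15"],
    "filter": {"type": "selector", "dimension": "col_1", "value": "value1jkl"},
    "context": {"secondaryPartitionPruning": True},
}


def build_request(path, query, by_segment=False):
    """Build a POST request for a broker endpoint without sending it."""
    body = dict(query)
    # Copy the context so the shared QUERY dict is not modified.
    body["context"] = dict(query.get("context", {}))
    if by_segment:
        body["context"]["bySegment"] = True
    return urllib.request.Request(
        BROKER + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


candidates_req = build_request("/druid/v2/candidates", QUERY)
native_req = build_request("/druid/v2", QUERY, by_segment=True)

# To send against a live broker:
# with urllib.request.urlopen(candidates_req) as resp:
#     candidates = json.load(resp)
```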

Expected result example - /v2/candidates

[
  {
    "interval": "2025-02-16T01:00:00.000Z/2025-02-16T02:00:00.000Z",
    "version": "2024-04-01T23:59:59.999Z",
    "partitionNumber": 1,
    "size": 999999,
    "locations": [
      {
        "name": "host1:8283",
        "host": null,
        "hostAndTlsPort": "host1:8283",
        "maxSize": 999999999999,
        "type": "historical",
        "tier": "_default_tier",
        "priority": 0
      },
      {
        "name": "host3:8283",
        "host": null,
        "hostAndTlsPort": "host3:8283",
        "maxSize": 999999999999,
        "type": "historical",
        "tier": "tier2",
        "priority": 0
      }
    ]
  }
]

If partition pruning is applied at the broker, neither of the other two segments should be returned (regardless of server/replication).

Unexpected result example 1: /druid/v2 endpoint - extra (but not all) segments are returned

expected only src1_2025-02-16T01:00:00.000Z_2025-02-16T02:00:00.000Z_2024-04-01T23:59:59.999Z_1
(FYI) unexpected segments are loaded on different servers than the expected segments
the test adds "bySegment" to the query context

[
    {
        "timestamp": "2024-04-01T09:00:00.000Z",
        "result": {
            "results": [
                {
                    "segmentId": "src1_2025-02-16T01:00:00.000Z_2025-02-16T02:00:00.000Z_2024-04-01T23:59:59.999Z_1",
                    "columns": ["__time", "col_1", "col_15"],
                    "events": [
                        {"__time": 1739667600000, "col_1": "value1jkl", "col_15": "hello"},
                        {"__time": 1739668980000, "col_1": "value1jkl", "col_15": "world"},
                        {"__time": 1739668980000, "col_1": "value1jkl", "col_15": "hello"},
                        {"__time": 1739670360000, "col_1": "value1jkl", "col_15": "world"}
                    ],
                    "rowSignature": [
                        {"name": "__time", "type": "LONG"},
                        {"name": "col_1", "type": "STRING"},
                        {"name": "col_15", "type": "STRING"}
                    ]
                }
            ],
            "segment": "src1_2025-02-16T01:00:00.000Z_2025-02-16T02:00:00.000Z_2024-04-01T23:59:59.999Z_1",
            "interval": "2025-02-16T01:00:00.000Z/2025-02-16T02:00:00.000Z"
        }
    },
    {
        "timestamp": "2024-04-01T09:00:00.000Z",
        "result": {
            "results": [],
            "segment": "src1_2025-02-16T01:00:00.000Z_2025-02-16T02:00:00.000Z_2024-04-01T23:59:59.999Z",
            "interval": "2025-02-16T01:00:00.000Z/2025-02-16T02:00:00.000Z"
        }
    }
]

Unexpected result example 2: /druid/v2/candidates endpoint - all segments are returned

expected only src1_2025-02-16T01:00:00.000Z_2025-02-16T02:00:00.000Z_2024-04-01T23:59:59.999Z_1

[
  {
    "interval": "2025-02-16T01:00:00.000Z/2025-02-16T02:00:00.000Z",
    "version": "2024-04-01T23:59:59.999Z",
    "partitionNumber": 0,
    "size": 999999,
    "locations": [
      {
        "name": "host1:8283",
        "host": null,
        "hostAndTlsPort": "host1:8283",
        "maxSize": 999999999999,
        "type": "historical",
        "tier": "_default_tier",
        "priority": 0
      },
      {
        "name": "host2:8283",
        "host": null,
        "hostAndTlsPort": "host2:8283",
        "maxSize": 999999999999,
        "type": "historical",
        "tier": "tier2",
        "priority": 0
      }
    ]
  },
  {
    "interval": "2025-02-16T01:00:00.000Z/2025-02-16T02:00:00.000Z",
    "version": "2024-04-01T23:59:59.999Z",
    "partitionNumber": 1,
    "size": 999999,
    "locations": [
      {
        "name": "host1:8283",
        "host": null,
        "hostAndTlsPort": "host1:8283",
        "maxSize": 999999999999,
        "type": "historical",
        "tier": "_default_tier",
        "priority": 0
      },
      {
        "name": "host3:8283",
        "host": null,
        "hostAndTlsPort": "host3:8283",
        "maxSize": 999999999999,
        "type": "historical",
        "tier": "tier2",
        "priority": 0
      }
    ]
  },
  {
    "interval": "2025-02-16T01:00:00.000Z/2025-02-16T02:00:00.000Z",
    "version": "2024-04-01T23:59:59.999Z",
    "partitionNumber": 2,
    "size": 999999,
    "locations": [
      {
        "name": "host2:8283",
        "host": null,
        "hostAndTlsPort": "host2:8283",
        "maxSize": 999999999999,
        "type": "historical",
        "tier": "_default_tier",
        "priority": 0
      },
      {
        "name": "host3:8283",
        "host": null,
        "hostAndTlsPort": "host3:8283",
        "maxSize": 999999999999,
        "type": "historical",
        "tier": "tier2",
        "priority": 0
      }
    ]
  }
]
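
A response like the one above can be checked mechanically when repeating the test. This is a small sketch, assuming the /druid/v2/candidates response has been parsed into a list of dicts with a `partitionNumber` field as shown; the function name `extra_partitions` and the trimmed sample data are illustrative only.

```python
def extra_partitions(candidates, expected):
    """Return partition numbers present in the response but not expected after pruning."""
    returned = {entry["partitionNumber"] for entry in candidates}
    return sorted(returned - set(expected))


# Trimmed stand-in for the response above (locations and sizes omitted).
observed = [
    {"partitionNumber": 0},
    {"partitionNumber": 1},
    {"partitionNumber": 2},
]

# Only partition 1 is expected for the filter value "value1jkl".
print(extra_partitions(observed, expected=[1]))
```

An empty list would indicate that the broker pruned down to the expected segment.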

Summary

The broker should select and/or query only the smallest set of segments that match on datasource, time chunk, and partition dimensions. When it has enough partition information about the used/loaded segments, it should prune segments that don't match on partition dimensions, in addition to those that don't match on datasource or time chunk.
