Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Slow query when trying to retrieve tuples used in chunk segmentation #6863

Open
chicco785 opened this issue Apr 25, 2024 · 2 comments

Comments

@chicco785
Copy link

What type of bug is this?

Performance issue

What subsystems and features are affected?

Compression, Query executor, Query planner

What happened?

I have a compressed hypertable that stores data from 55 devices every 20ms.

CREATE TABLE phasor_values (
    "timestamp" timestamp with time zone default now() not null,
    "mRID" uuid not null default uuid_generate_v4(),
    producer_id text,
    measurement_id TEXT not null,
    magnitude_sensor_accuracy real,
    angle_sensor_accuracy real,
    magnitude real not null,
    angle real not null,
    estimated BOOLEAN not null default false
);

SELECT create_hypertable('phasor_values',  by_range('timestamp', INTERVAL '1h'));

The table is compressed as follows:

ALTER TABLE phasor_values
SET (
    timescaledb.compress,
    timescaledb.compress_segmentby='measurement_id,estimated,producer_id',
    timescaledb.compress_orderby='timestamp DESC'
);

I am looking for an efficient query to extract the distinct couples of measurement_id and producer_id over a time range:

e.g.

SELECT measurement_id, producer_id
         FROM phasor_values
	     WHERE timestamp >= '2023-04-23T15:40:58.601Z' and timestamp <= '2024-04-25T15:40:58.601Z' and estimated = false
GROUP by measurement_id, producer_id

or

SELECT DISTINCT measurement_id, producer_id
         FROM phasor_values
	     WHERE timestamp >= '2023-04-23T15:40:58.601Z' and timestamp <= '2024-04-25T15:40:58.601Z' and estimated = false

These queries take more than 13 minutes to execute on my system, while given that I suppose finding the tuples should be possible just by looking at the indexes created by chunk segmentation, the query should be much faster :/

Group  (cost=15853018.64..15853232.06 rows=877 width=23)
  Group Key: _hyper_5_37_chunk.measurement_id, _hyper_5_37_chunk.producer_id
  ->  Gather Merge  (cost=15853018.64..15853223.29 rows=1754 width=23)
        Workers Planned: 2
        ->  Sort  (cost=15852018.62..15852020.81 rows=877 width=23)
              Sort Key: _hyper_5_37_chunk.measurement_id, _hyper_5_37_chunk.producer_id
              ->  Partial HashAggregate  (cost=15851966.98..15851975.75 rows=877 width=23)
                    Group Key: _hyper_5_37_chunk.measurement_id, _hyper_5_37_chunk.producer_id
                    ->  Parallel Append  (cost=0.06..7976631.98 rows=1575067000 width=23)
                          ->  Custom Scan (DecompressChunk) on _hyper_5_37_chunk  (cost=0.07..4291.12 rows=65875000 width=23)
                                Filter: (NOT estimated)
                                Vectorized Filter: (("timestamp" >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND ("timestamp" <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                                ->  Parallel Seq Scan on compress_hyper_12_53_chunk  (cost=0.00..4291.12 rows=65875 width=76)
                                      Filter: ((_ts_meta_max_1 >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND (_ts_meta_min_1 <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                          ->  Custom Scan (DecompressChunk) on _hyper_5_127_chunk  (cost=0.06..4270.12 rows=65875000 width=23)
                                Filter: (NOT estimated)
                                Vectorized Filter: (("timestamp" >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND ("timestamp" <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                                ->  Parallel Seq Scan on compress_hyper_12_143_chunk  (cost=0.00..4270.12 rows=65875 width=76)
                                      Filter: ((_ts_meta_max_1 >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND (_ts_meta_min_1 <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                          ->  Custom Scan (DecompressChunk) on _hyper_5_4_chunk  (cost=0.06..4191.88 rows=65525000 width=23)
                                Filter: (NOT estimated)
                                Vectorized Filter: (("timestamp" >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND ("timestamp" <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                                ->  Parallel Seq Scan on compress_hyper_12_16_chunk  (cost=0.00..4191.88 rows=65525 width=76)
                                      Filter: ((_ts_meta_max_1 >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND (_ts_meta_min_1 <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                          ->  Custom Scan (DecompressChunk) on _hyper_5_8_chunk  (cost=0.06..4191.88 rows=65525000 width=23)
                                Filter: (NOT estimated)
                                Vectorized Filter: (("timestamp" >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND ("timestamp" <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                                ->  Parallel Seq Scan on compress_hyper_12_22_chunk  (cost=0.00..4191.88 rows=65525 width=76)
                                      Filter: ((_ts_meta_max_1 >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND (_ts_meta_min_1 <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                          ->  Custom Scan (DecompressChunk) on _hyper_5_15_chunk  (cost=0.06..4191.88 rows=65525000 width=23)
                                Filter: (NOT estimated)
                                Vectorized Filter: (("timestamp" >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND ("timestamp" <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                                ->  Parallel Seq Scan on compress_hyper_12_28_chunk  (cost=0.00..4191.88 rows=65525 width=76)
                                      Filter: ((_ts_meta_max_1 >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND (_ts_meta_min_1 <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                          ->  Custom Scan (DecompressChunk) on _hyper_5_19_chunk  (cost=0.06..4191.88 rows=65525000 width=23)
                                Filter: (NOT estimated)
                                Vectorized Filter: (("timestamp" >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND ("timestamp" <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                                ->  Parallel Seq Scan on compress_hyper_12_35_chunk  (cost=0.00..4191.88 rows=65525 width=76)
                                      Filter: ((_ts_meta_max_1 >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND (_ts_meta_min_1 <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                          ->  Custom Scan (DecompressChunk) on _hyper_5_25_chunk  (cost=0.06..4191.88 rows=65525000 width=23)
                                Filter: (NOT estimated)
                                Vectorized Filter: (("timestamp" >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND ("timestamp" <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                                ->  Parallel Seq Scan on compress_hyper_12_42_chunk  (cost=0.00..4191.88 rows=65525 width=76)
                                      Filter: ((_ts_meta_max_1 >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND (_ts_meta_min_1 <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                          ->  Custom Scan (DecompressChunk) on _hyper_5_31_chunk  (cost=0.06..4191.88 rows=65525000 width=23)
                                Filter: (NOT estimated)
                                Vectorized Filter: (("timestamp" >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND ("timestamp" <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                                ->  Parallel Seq Scan on compress_hyper_12_48_chunk  (cost=0.00..4191.88 rows=65525 width=76)
                                      Filter: ((_ts_meta_max_1 >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND (_ts_meta_min_1 <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                          ->  Custom Scan (DecompressChunk) on _hyper_5_67_chunk  (cost=0.06..4191.88 rows=65525000 width=23)
                                Filter: (NOT estimated)
                                Vectorized Filter: (("timestamp" >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND ("timestamp" <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                                ->  Parallel Seq Scan on compress_hyper_12_83_chunk  (cost=0.00..4191.88 rows=65525 width=76)
                                      Filter: ((_ts_meta_max_1 >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND (_ts_meta_min_1 <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                          ->  Custom Scan (DecompressChunk) on _hyper_5_73_chunk  (cost=0.06..4191.88 rows=65525000 width=23)
                                Filter: (NOT estimated)
                                Vectorized Filter: (("timestamp" >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND ("timestamp" <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                                ->  Parallel Seq Scan on compress_hyper_12_89_chunk  (cost=0.00..4191.88 rows=65525 width=76)
                                      Filter: ((_ts_meta_max_1 >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND (_ts_meta_min_1 <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                          ->  Custom Scan (DecompressChunk) on _hyper_5_79_chunk  (cost=0.06..4191.88 rows=65525000 width=23)
                                Filter: (NOT estimated)
                                Vectorized Filter: (("timestamp" >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND ("timestamp" <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                                ->  Parallel Seq Scan on compress_hyper_12_96_chunk  (cost=0.00..4191.88 rows=65525 width=76)
                                      Filter: ((_ts_meta_max_1 >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND (_ts_meta_min_1 <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                          ->  Custom Scan (DecompressChunk) on _hyper_5_85_chunk  (cost=0.06..4191.88 rows=65525000 width=23)
                                Filter: (NOT estimated)
                                Vectorized Filter: (("timestamp" >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND ("timestamp" <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                                ->  Parallel Seq Scan on compress_hyper_12_101_chunk  (cost=0.00..4191.88 rows=65525 width=76)
                                      Filter: ((_ts_meta_max_1 >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND (_ts_meta_min_1 <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                          ->  Custom Scan (DecompressChunk) on _hyper_5_91_chunk  (cost=0.06..4191.88 rows=65525000 width=23)
                                Filter: (NOT estimated)
                                Vectorized Filter: (("timestamp" >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND ("timestamp" <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                                ->  Parallel Seq Scan on compress_hyper_12_106_chunk  (cost=0.00..4191.88 rows=65525 width=76)
                                      Filter: ((_ts_meta_max_1 >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND (_ts_meta_min_1 <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                          ->  Custom Scan (DecompressChunk) on _hyper_5_97_chunk  (cost=0.06..4191.88 rows=65525000 width=23)
                                Filter: (NOT estimated)
                                Vectorized Filter: (("timestamp" >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND ("timestamp" <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                                ->  Parallel Seq Scan on compress_hyper_12_114_chunk  (cost=0.00..4191.88 rows=65525 width=76)
                                      Filter: ((_ts_meta_max_1 >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND (_ts_meta_min_1 <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                          ->  Custom Scan (DecompressChunk) on _hyper_5_103_chunk  (cost=0.06..4191.88 rows=65525000 width=23)
                                Filter: (NOT estimated)
                                Vectorized Filter: (("timestamp" >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND ("timestamp" <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                                ->  Parallel Seq Scan on compress_hyper_12_120_chunk  (cost=0.00..4191.88 rows=65525 width=76)
                                      Filter: ((_ts_meta_max_1 >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND (_ts_meta_min_1 <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                          ->  Custom Scan (DecompressChunk) on _hyper_5_109_chunk  (cost=0.06..4191.88 rows=65525000 width=23)
                                Filter: (NOT estimated)
                                Vectorized Filter: (("timestamp" >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND ("timestamp" <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                                ->  Parallel Seq Scan on compress_hyper_12_126_chunk  (cost=0.00..4191.88 rows=65525 width=76)
                                      Filter: ((_ts_meta_max_1 >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND (_ts_meta_min_1 <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                          ->  Custom Scan (DecompressChunk) on _hyper_5_115_chunk  (cost=0.06..4191.88 rows=65525000 width=23)
                                Filter: (NOT estimated)
                                Vectorized Filter: (("timestamp" >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND ("timestamp" <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                                ->  Parallel Seq Scan on compress_hyper_12_131_chunk  (cost=0.00..4191.88 rows=65525 width=76)
                                      Filter: ((_ts_meta_max_1 >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND (_ts_meta_min_1 <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                          ->  Custom Scan (DecompressChunk) on _hyper_5_121_chunk  (cost=0.06..4191.88 rows=65525000 width=23)
                                Filter: (NOT estimated)
                                Vectorized Filter: (("timestamp" >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND ("timestamp" <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                                ->  Parallel Seq Scan on compress_hyper_12_137_chunk  (cost=0.00..4191.88 rows=65525 width=76)
                                      Filter: ((_ts_meta_max_1 >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND (_ts_meta_min_1 <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                          ->  Custom Scan (DecompressChunk) on _hyper_5_134_chunk  (cost=0.06..4191.88 rows=65525000 width=23)
                                Filter: (NOT estimated)
                                Vectorized Filter: (("timestamp" >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND ("timestamp" <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                                ->  Parallel Seq Scan on compress_hyper_12_146_chunk  (cost=0.00..4191.88 rows=65525 width=76)
                                      Filter: ((_ts_meta_max_1 >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND (_ts_meta_min_1 <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                          ->  Custom Scan (DecompressChunk) on _hyper_5_43_chunk  (cost=0.06..4213.12 rows=65875000 width=23)
                                Filter: (NOT estimated)
                                Vectorized Filter: (("timestamp" >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND ("timestamp" <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                                ->  Parallel Seq Scan on compress_hyper_12_59_chunk  (cost=0.00..4213.12 rows=65875 width=76)
                                      Filter: ((_ts_meta_max_1 >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND (_ts_meta_min_1 <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                          ->  Custom Scan (DecompressChunk) on _hyper_5_50_chunk  (cost=0.06..4213.12 rows=65875000 width=23)
                                Filter: (NOT estimated)
                                Vectorized Filter: (("timestamp" >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND ("timestamp" <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                                ->  Parallel Seq Scan on compress_hyper_12_65_chunk  (cost=0.00..4213.12 rows=65875 width=76)
                                      Filter: ((_ts_meta_max_1 >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND (_ts_meta_min_1 <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                          ->  Custom Scan (DecompressChunk) on _hyper_5_55_chunk  (cost=0.06..4213.12 rows=65875000 width=23)
                                Filter: (NOT estimated)
                                Vectorized Filter: (("timestamp" >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND ("timestamp" <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                                ->  Parallel Seq Scan on compress_hyper_12_70_chunk  (cost=0.00..4213.12 rows=65875 width=76)
                                      Filter: ((_ts_meta_max_1 >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND (_ts_meta_min_1 <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                          ->  Custom Scan (DecompressChunk) on _hyper_5_61_chunk  (cost=0.06..4213.12 rows=65875000 width=23)
                                Filter: (NOT estimated)
                                Vectorized Filter: (("timestamp" >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND ("timestamp" <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                                ->  Parallel Seq Scan on compress_hyper_12_76_chunk  (cost=0.00..4213.12 rows=65875 width=76)
                                      Filter: ((_ts_meta_max_1 >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND (_ts_meta_min_1 <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                          ->  Custom Scan (DecompressChunk) on _hyper_5_1_chunk  (cost=0.05..3732.89 rows=75126000 width=23)
                                Filter: (NOT estimated)
                                Vectorized Filter: (("timestamp" >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND ("timestamp" <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                                ->  Parallel Seq Scan on compress_hyper_12_10_chunk  (cost=0.00..3732.89 rows=75126 width=76)
                                      Filter: ((_ts_meta_max_1 >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND (_ts_meta_min_1 <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                          ->  Custom Scan (DecompressChunk) on _hyper_5_139_chunk  (cost=0.05..888.46 rows=17898000 width=23)
                                Filter: (NOT estimated)
                                Vectorized Filter: (("timestamp" >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND ("timestamp" <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
                                ->  Parallel Seq Scan on compress_hyper_12_149_chunk  (cost=0.00..888.46 rows=17898 width=76)
                                      Filter: ((_ts_meta_max_1 >= '2023-04-23 15:40:58.601+00'::timestamp with time zone) AND (_ts_meta_min_1 <= '2024-04-25 15:40:58.601+00'::timestamp with time zone))
JIT:
  Functions: 104
  Options: Inlining true, Optimization true, Expressions true, Deforming true

TimescaleDB version affected

2.13.0

PostgreSQL version used

15.5

What operating system did you use?

Ubuntu 22.04 x64

What installation method did you use?

Docker

What platform did you run on?

Amazon Web Services (AWS)

Relevant log output and stack trace

No response

How can we reproduce the bug?

See above data on table structure and query.
@chicco785 chicco785 added the bug label Apr 25, 2024
@antekresic
Copy link
Contributor

Hi! Sorry for the late reply here.

Looks like you are hitting a bug that's being addressed in version 2.15. Here is the link to the specific PR.

I'm keeping the ticket open since I'm not 100% sure this will solve all your performance problems here but its a step in the right direction. Would you mind updating us when 2.15 lands and you are able to test it? Release should be happening very soon (within the next week I believe).

Hope this helps!

@chicco785
Copy link
Author

Hi! Sorry for the late reply here.

Looks like you are hitting a bug that's being addressed in version 2.15. Here is the link to the specific PR.

I'm keeping the ticket open since I'm not 100% sure this will solve all your performance problems here but its a step in the right direction. Would you mind updating us when 2.15 lands and you are able to test it? Release should be happening very soon (within the next week I believe).

Hope this helps!

Thanks, we will surely test!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants