[Bug]: NaN behavior changes in compressed tables #6884

jakehedlund · 2024-05-03T18:45:34Z

What type of bug is this?

Incorrect result

What subsystems and features are affected?

Compression

What happened?

Summary

When data in a compressed chunk is queried for NaN values using a WHERE clause, the clause has no effect.

This is significant when running aggregate queries; for example, avg() will return NaN for an entire time bucket if just one NaN is contained therein. As a workaround, NaN values can be ignored by adding ... WHERE value <> 'NaN' ... but this only is working on uncompressed chunks.

Details

Table DDL:

CREATE TABLE public.sensor_data (
	"time" timestamp NOT NULL,
	sensor_id int4 NOT NULL,
	value float8 NULL,
	eng_val float8 NULL,
	CONSTRAINT sens_pkey PRIMARY KEY ("time", sensor_id)
);

SELECT create_hypertable('sensor_data', 'time', 
	chunk_time_interval => INTERVAL '2 hours', 
	partitioning_column => 'sensor_id', 
	number_partitions => 6 
	);

Sample data:

INSERT INTO public.sensor_data ("time", sensor_id, value, eng_val)
VALUES('2024-01-01 01:59:57.000', 1, 'Nan'::double precision, NULL);
INSERT INTO public.sensor_data ("time", sensor_id, value, eng_val)
VALUES('2024-01-01 01:59:56.000', 1, 'Nan'::double precision, NULL);
INSERT INTO public.sensor_data ("time", sensor_id, value, eng_val)
VALUES('2024-01-01 01:59:55.000', 1, 49.99952403604293, NULL);
INSERT INTO public.sensor_data ("time", sensor_id, value, eng_val)
VALUES('2024-01-01 01:59:54.000', 1, 49.999314612380545, NULL);
INSERT INTO public.sensor_data ("time", sensor_id, value, eng_val)
VALUES('2024-01-01 01:59:53.000', 1, 'Inf'::double precision, NULL);
INSERT INTO public.sensor_data ("time", sensor_id, value, eng_val)
VALUES('2024-01-01 01:59:52.000', 1, 49.99878153520547, NULL);
INSERT INTO public.sensor_data ("time", sensor_id, value, eng_val)
VALUES('2024-01-01 01:59:51.000', 1, 49.99845788216983, NULL);
INSERT INTO public.sensor_data ("time", sensor_id, value, eng_val)
VALUES('2024-01-01 01:59:50.000', 1, 49.99809615313412, NULL);

Run the below query examples both before and after compressing the chunk, or insert more data that will fall into a different chunk.

To compress the chunk:

select compress_chunk(c, true, true) 
from show_chunks('monitor_data', 
  older_than => timestamp '2024-01-02', 
  newer_than => timestamp '2024-01-01') c;

Query examples:

select * from sensor_data where time between '2024-01-01' and '2024-01-02' and value = 'inf'::numeric;
- Result: returns all 'Infinity' values (correct)
select * from sensor_data where time between '2024-01-01' and '2024-01-02' and value = 'nan'::numeric;
- Result: Returns nothing if chuck is compressed. Returns rows where value is 'nan' if chunk is uncompressed.
- Expected result: all rows with NaN as a value are returned.
select * from sensor_data where time between '2024-01-01' and '2024-01-02' and value <> 'nan'::numeric;
- Returns everything, including NaN values, if chunk is compressed.
- Expected result: returns everything except NaN rows.

Inconsistent behavior occurs when updating into a compressed chunk: the updates succeed, but then the above example queries behave as expected. As in, they return/ignore NaN values when asked to. I'm assuming this is because TimescaleDB temporarily decompresses the chunk/slice.

I believe the root cause is due to inconsistent assumptions regarding the equality of NaN: in PostgreSQL, NaN == NaN. This can be shown by running select 'nan'::numeric = 'nan'::numeric as isnan;. However, the IEEE 754 spec indicates that NaN should NOT equal NaN (which seems to be the behavior when querying compressed chunks).

See PostgreSQL note (https://www.postgresql.org/docs/14/datatype-numeric.html):

IEEE 754 specifies that NaN should not compare equal to any other floating-point value (including NaN). In order to allow floating-point values to be sorted and used in tree-based indexes, PostgreSQL treats NaN values as equal, and greater than all non-NaN values.

Workaround

As a workaround, the nullif() function can be used in the where clause. For example:

select *
from sensor_data 
where time between '2024-01-01' and '2024-01-02' and sensor_id = 1
and not nullif(value, 'nan') is null   -- works
-- and value <> 'nan'   -- fails
order by 1 ;

Whereas using ...and value <> 'nan' ... fails (returns NaNs when it should exclude them).

The positive (... and nullif(value, 'nan') is null behaves as expected.

Defining functions like isnan() (create or replace function isnan(double precision) returns boolean as $$ select $1 > 0 and -$1 > 0 $$ language sql; fail as well.

TimescaleDB version affected

2.14.2

PostgreSQL version used

14.11

What operating system did you use?

RHEL7, 64 bit

What installation method did you use?

RPM

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

No response

How can we reproduce the bug?

See description.

The text was updated successfully, but these errors were encountered:

akuzm · 2024-05-06T09:50:36Z

Sorry about that, I think it's a bug in implementation of vectorized filters in TimescaleDB. I didn't realize Postgres used a different convention for NaNs. Nice workaround, there's also a more heavy-handed way to work around it by disabling the bulk decompression and therefore all vectorization with set timescaledb.enable_bulk_decompression to off.

jakehedlund added the bug label May 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug]: NaN behavior changes in compressed tables #6884

[Bug]: NaN behavior changes in compressed tables #6884

jakehedlund commented May 3, 2024 •

edited

akuzm commented May 6, 2024

[Bug]: NaN behavior changes in compressed tables #6884

[Bug]: NaN behavior changes in compressed tables #6884

Comments

jakehedlund commented May 3, 2024 • edited

What type of bug is this?

What subsystems and features are affected?

What happened?

Summary

Details

Table DDL:

Sample data:

Query examples:

Workaround

TimescaleDB version affected

PostgreSQL version used

What operating system did you use?

What installation method did you use?

What platform did you run on?

Relevant log output and stack trace

How can we reproduce the bug?

akuzm commented May 6, 2024

jakehedlund commented May 3, 2024 •

edited