Fix update triggers issue caused by decompression #6903

Merged
merged 1 commit into timescale:main from fix_update_trigger on May 27, 2024

Conversation

antekresic
Contributor

@antekresic antekresic commented May 9, 2024

Update triggers were broken due to snapshot manipulation and usage of dynamic snapshots for scanning tuples. Using a static snapshot guarantees we cannot see tuples that would be invisible during an update trigger.

Fixes #6858
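
For context, a minimal sketch of the pattern the description refers to, using the standard PostgreSQL snapshot API; the wrapper function below is illustrative and not the exact call site touched by this PR.

#include "postgres.h"
#include "nodes/execnodes.h"
#include "utils/snapmgr.h"

/*
 * Minimal sketch: pin a static copy of the current transaction snapshot on
 * the executor state.  RegisterSnapshot() copies a non-registered snapshot,
 * so later command counter bumps (for example from a BEFORE UPDATE trigger
 * modifying rows) no longer change what this scan considers visible.
 * The registered snapshot must eventually be released with
 * UnregisterSnapshot() to avoid a snapshot reference leak.
 */
static void
use_static_snapshot(EState *estate)
{
	estate->es_snapshot = RegisterSnapshot(GetTransactionSnapshot());
}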

codecov bot commented May 9, 2024

Codecov Report

Attention: Patch coverage is 86.95652%, with 3 lines in your changes missing coverage. Please review.

Project coverage is 81.01%. Comparing base (59f50f2) to head (4bfbedb).
Report is 182 commits behind head on main.

Current head 4bfbedb differs from pull request most recent head 156e1cc

Please upload reports for the commit 156e1cc to get more accurate results.

Files                               Patch %   Lines
tsl/src/compression/compression.c   84.21%    2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6903      +/-   ##
==========================================
+ Coverage   80.06%   81.01%   +0.94%     
==========================================
  Files         190      199       +9     
  Lines       37181    37317     +136     
  Branches     9450     9740     +290     
==========================================
+ Hits        29770    30231     +461     
- Misses       2997     3186     +189     
+ Partials     4414     3900     -514     


@antekresic antekresic force-pushed the fix_update_trigger branch 4 times, most recently from 1b9bb6a to 4bfbedb on May 9, 2024 12:04
@antekresic antekresic self-assigned this May 9, 2024
@antekresic antekresic added the bug label May 9, 2024
@antekresic antekresic marked this pull request as ready for review May 9, 2024 12:07
@antekresic antekresic requested a review from svenklemm May 9, 2024 12:07
SELECT 'f2ca7073-1395-5770-8378-7d0339804580', '2024-04-16 04:50:00+02',
1100.00, '2024-04-23 11:56:38.494095+02' FROM generate_series(1,2500,1) c;

VACUUM FULL update_trigger_test;
Contributor

Why VACUUM FULL here??

Contributor Author

The issue depends on how tuples are laid out at the block level. VACUUM FULL was the only way to get a consistent reproduction case.

@akuzm
Member

akuzm commented May 10, 2024

@antekresic can you explain more about this issue? What row was it trying to lock that was invisible?

Comment on lines +762 to +724
/* use a static copy of current transaction snapshot
* this needs to be a copy so we don't read trigger updates
*/
estate->es_snapshot = RegisterSnapshot(GetTransactionSnapshot());
Member

Do you mean that on the next update cycle, we're trying to lock the tuple previously modified by the BEFORE UPDATE trigger on the previous cycle? And the "invisible" error happens in GetTupleForTrigger()?

I think normally the new tuples from the previous update cycles are skipped because they are TM_SelfModified and not TM_Invisible. This is determined by HeapTupleSatisfiesUpdate. So if we're getting TM_Invisible, this means we either forgot to update the command counter after modifying the tuple, or we've taken the new snapshot with a command counter higher than the previous update's. I guess here it's the latter, and you prevent that by making a copy of the snapshot?
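
For reference, a rough sketch (not code from this PR) of how the HeapTupleSatisfiesUpdate outcomes mentioned above come about; the wrapper function and logging are illustrative.

#include "postgres.h"
#include "access/heapam.h"
#include "access/tableam.h"

/*
 * Sketch: HeapTupleSatisfiesUpdate() compares the tuple's cmin/cmax with
 * the command id passed in.  A tuple inserted by this transaction at or
 * after curcid is reported as TM_Invisible ("inserted after scan started"),
 * while one updated by this transaction at or after curcid is
 * TM_SelfModified, which UPDATE normally just skips.
 */
static TM_Result
classify_for_update(HeapTuple tuple, CommandId curcid, Buffer buffer)
{
	TM_Result	result = HeapTupleSatisfiesUpdate(tuple, curcid, buffer);

	if (result == TM_Invisible)
		elog(DEBUG1, "tuple is not visible to command id %u", curcid);
	else if (result == TM_SelfModified)
		elog(DEBUG1, "tuple was already modified by this statement");

	return result;
}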

Member

Normally the es_snapshot is initialized with a registered snapshot copy (standard_ExecutorStart), so this is probably correct. But I wonder if we should re-take the snapshot like this at all? Not sure if it's a standard practice. Normally the update code uses the es_output_cid as the current command id for locking the tuples, so maybe just incrementing the output cid after decompression would be enough?

Contributor Author

Yep, exactly what you said. By using a static snapshot which includes the decompressed tuples, we make sure we see only those tuples and not the versions updated after this point.

Contributor Author

You need to update the snapshot of the estate so the decompressed tuples (which are inserted in the same transaction) are included in the scan of the uncompressed chunk. That's the main reason for this.

Member

But I wonder if we should re-take the snapshot like this at all? Not sure if it's a standard practice. Normally the update code uses the es_output_cid as the current command id for locking the tuples, so maybe just incrementing the output cid after decompression would be enough?

Looks like we must do it, otherwise we won't see the tuples we've decompressed. The es_output_cid comes into play later, when we do see the tuples and want to determine if they are updated by us.
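
To make the division of labor concrete, here is a sketch loosely modeled on the shape of PostgreSQL's GetTupleForTrigger(); the function name, lock mode, and flags are illustrative, not the exact code involved here.

#include "postgres.h"
#include "access/tableam.h"
#include "nodes/execnodes.h"
#include "nodes/lockoptions.h"

/*
 * Sketch: lock the old row before firing a row-level trigger.  The
 * es_snapshot controls which tuple versions can be seen at all (hence the
 * decompressed rows must be covered by it), while es_output_cid lets the
 * table AM classify rows already modified by this statement as
 * TM_SelfModified rather than TM_Invisible.
 */
static TM_Result
lock_row_for_trigger(EState *estate, Relation rel, ItemPointer tid,
					 TupleTableSlot *oldslot)
{
	TM_FailureData tmfd;

	return table_tuple_lock(rel, tid,
							estate->es_snapshot,	/* visibility */
							oldslot,
							estate->es_output_cid,	/* "our" command id */
							LockTupleExclusive,
							LockWaitBlock,
							TUPLE_LOCK_FLAG_FIND_LAST_VERSION,
							&tmfd);
}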

Comment on lines +750 to +711
/* Modify snapshot only if something got decompressed */
if (ts_cm_functions->decompress_target_segments &&
ts_cm_functions->decompress_target_segments(ht_state))
Member

OK, I think I managed to understand the rationale for adding the RegisterSnapshot below. But what about this new check, why is it needed? Your test passes without this change, with just the RegisterSnapshot. I also double-checked that it fails without the RegisterSnapshot.

Member

It seems reasonable to me that the update process should work even if we just increment the command counter and take the new snapshot unconditionally.

Contributor Author

The new check is there to avoid messing with snapshots (which I really don't like) when nothing is decompressed.

I started off by fixing this issue for uncompressed hypertables, and that was the first change.

Member

Well, that seems to be the correct way to work with the snapshots there, so maybe we should instead apply it unconditionally? If you only apply it under some complicated conditions, the bugs will be more difficult to find. E.g. it now looks like we should add isolation tests with both REPEATABLE READ and READ COMMITTED to test the result == TM_Deleted && !IsolationUsesXactSnapshot() code path. We still have to do the potentially problematic snapshot operations in some cases, and if we add more conditions for invoking them, I would say it doesn't really improve things.
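
For reference, the "deleted" branch mentioned above has roughly this shape under the two isolation levels (a simplified sketch following PostgreSQL's usual update handling, not the exact TimescaleDB code):

#include "postgres.h"
#include "access/tableam.h"
#include "access/xact.h"

/*
 * Sketch: handle a row that turned out to be concurrently deleted.  Under
 * READ COMMITTED the row is simply skipped; under REPEATABLE READ or
 * SERIALIZABLE (IsolationUsesXactSnapshot()) a serialization failure must
 * be raised instead.
 */
static bool
handle_deleted_row(TM_Result result)
{
	if (result == TM_Deleted)
	{
		if (IsolationUsesXactSnapshot())
			ereport(ERROR,
					(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
					 errmsg("could not serialize access due to concurrent delete")));
		return false;		/* READ COMMITTED: skip this row */
	}
	return true;			/* proceed with the update */
}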

Contributor Author

I take issue with increasing the command counter when there isn't anything that warrants it. Also, snapshot manipulation is just not necessary if nothing is decompressed.

So I personally don't see a problem here. There is already a check for compression support which skips all this.

Member

@akuzm akuzm left a comment

Let's try to add an isolation test for the "deleted" branch.

Update triggers were broken due to snapshot manipulation
and usage of dynamic snapshots for scanning tuples. Using
a static snapshot guarantees we cannot see tuples that would
be invisible during an update trigger.
@antekresic antekresic merged commit cec6b37 into timescale:main May 27, 2024
36 checks passed
@timescale-automation

Automated backport to 2.15.x not done: cherry-pick failed.

Git status

HEAD detached at origin/2.15.x
You are currently cherry-picking commit cec6b37fb.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	new file:   .unreleased/fix_6858
	modified:   src/nodes/hypertable_modify.c
	modified:   tsl/src/compression/compression.c
	modified:   tsl/test/isolation/expected/compression_dml_iso.out
	modified:   tsl/test/isolation/specs/compression_dml_iso.spec
	modified:   tsl/test/sql/compression_update_delete.sql

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   tsl/test/expected/compression_update_delete.out


Job log

@timescale-automation timescale-automation added the auto-backport-not-done (automated backport of this PR has failed non-retriably, e.g. conflicts) and backported-2.15.x labels May 27, 2024
Labels
auto-backport-not-done (automated backport of this PR has failed non-retriably, e.g. conflicts), backported-2.15.x, bug

Successfully merging this pull request may close these issues.

[Bug]: Before Update trigger on hypertable make updates fail
5 participants