Fix update triggers issue caused by decompression #6903

Merged
merged 1 commit into timescale:main from fix_update_trigger on May 27, 2024

Conversation

antekresic
Contributor

@antekresic antekresic commented May 9, 2024

Update triggers were broken due to snapshot manipulation and usage of dynamic snapshots for scanning tuples. Using a static snapshot guarantees we cannot see tuples that would be invisible during an update trigger.

Fixes #6858
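
For context, a minimal sketch of the pattern the description refers to, using the standard PostgreSQL snapshot API; the wrapper function below is illustrative and not the exact call site touched by this PR.

#include "postgres.h"
#include "nodes/execnodes.h"
#include "utils/snapmgr.h"

/*
 * Minimal sketch: pin a static copy of the current transaction snapshot on
 * the executor state.  RegisterSnapshot() copies a non-registered snapshot,
 * so later command counter bumps (for example from a BEFORE UPDATE trigger
 * modifying rows) no longer change what this scan considers visible.
 * The registered snapshot must eventually be released with
 * UnregisterSnapshot() to avoid a snapshot reference leak.
 */
static void
use_static_snapshot(EState *estate)
{
	estate->es_snapshot = RegisterSnapshot(GetTransactionSnapshot());
}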

codecov bot commented May 9, 2024

Codecov Report

Attention: Patch coverage is 86.95652%, with 3 lines in your changes missing coverage. Please review.

Project coverage is 81.01%. Comparing base (59f50f2) to head (4bfbedb).
Report is 182 commits behind head on main.

Current head 4bfbedb differs from pull request most recent head 156e1cc

Please upload reports for the commit 156e1cc to get more accurate results.

Files                               Patch %   Lines
tsl/src/compression/compression.c   84.21%    2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6903      +/-   ##
==========================================
+ Coverage   80.06%   81.01%   +0.94%     
==========================================
  Files         190      199       +9     
  Lines       37181    37317     +136     
  Branches     9450     9740     +290     
==========================================
+ Hits        29770    30231     +461     
- Misses       2997     3186     +189     
+ Partials     4414     3900     -514     


@antekresic antekresic force-pushed the fix_update_trigger branch 4 times, most recently from 1b9bb6a to 4bfbedb on May 9, 2024 12:04
@antekresic antekresic self-assigned this May 9, 2024
@antekresic antekresic added the bug label May 9, 2024
@antekresic antekresic marked this pull request as ready for review May 9, 2024 12:07
@antekresic antekresic requested a review from svenklemm May 9, 2024 12:07
SELECT 'f2ca7073-1395-5770-8378-7d0339804580', '2024-04-16 04:50:00+02',
1100.00, '2024-04-23 11:56:38.494095+02' FROM generate_series(1,2500,1) c;

VACUUM FULL update_trigger_test;
Contributor

Why VACUUM FULL here??

Contributor Author

The issue depends on how tuples are laid out at the block level. VACUUM FULL was the only way to get a consistent reproduction case.

@akuzm
Member

akuzm commented May 10, 2024

@antekresic can you explain more about this issue? What row was it trying to lock that was invisible?

Comment on lines +762 to +724
/* use a static copy of current transaction snapshot
* this needs to be a copy so we don't read trigger updates
*/
estate->es_snapshot = RegisterSnapshot(GetTransactionSnapshot());
Member

Do you mean that on the next update cycle, we're trying to lock the tuple previously modified by the BEFORE UPDATE trigger on the previous cycle? And the "invisible" error happens in GetTupleForTrigger()?

I think normally the new tuples from the previous update cycles are skipped because they are TM_SelfModified and not TM_Invisible. This is determined by HeapTupleSatisfiesUpdate. So if we're getting TM_Invisible, this means we either forgot to update the command counter after modifying the tuple, or we've taken the new snapshot with a command counter higher than the previous update's. I guess here it's the latter, and you prevent that by making a copy of the snapshot?
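
For reference, a rough sketch (not code from this PR) of how the HeapTupleSatisfiesUpdate outcomes mentioned above come about; the wrapper function and logging are illustrative.

#include "postgres.h"
#include "access/heapam.h"
#include "access/tableam.h"

/*
 * Sketch: HeapTupleSatisfiesUpdate() compares the tuple's cmin/cmax with
 * the command id passed in.  A tuple inserted by this transaction at or
 * after curcid is reported as TM_Invisible ("inserted after scan started"),
 * while one updated by this transaction at or after curcid is
 * TM_SelfModified, which UPDATE normally just skips.
 */
static TM_Result
classify_for_update(HeapTuple tuple, CommandId curcid, Buffer buffer)
{
	TM_Result	result = HeapTupleSatisfiesUpdate(tuple, curcid, buffer);

	if (result == TM_Invisible)
		elog(DEBUG1, "tuple is not visible to command id %u", curcid);
	else if (result == TM_SelfModified)
		elog(DEBUG1, "tuple was already modified by this statement");

	return result;
}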

Member

Normally the es_snapshot is initialized with a registered snapshot copy (standard_ExecutorStart), so this is probably correct. But I wonder if we should re-take the snapshot like this at all? Not sure if it's a standard practice. Normally the update code uses the es_output_cid as the current command id for locking the tuples, so maybe just incrementing the output cid after decompression would be enough?

Contributor Author

Yep, exactly what you said. By using a static snapshot which includes the decompressed tuples, we make sure we see only those tuples and not the versions updated after this point.

Contributor Author

You need to update the snapshot of the estate so the decompressed tuples (which are inserted in the same transaction) are included in the scan of the uncompressed chunk. That's the main reason for this.

Member

But I wonder if we should re-take the snapshot like this at all? Not sure if it's a standard practice. Normally the update code uses the es_output_cid as the current command id for locking the tuples, so maybe just incrementing the output cid after decompression would be enough?

Looks like we must do it, otherwise we won't see the tuples we've decompressed. The es_output_cid comes into play later, when we do see the tuples and want to determine if they are updated by us.
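
To make the division of labor concrete, here is a sketch loosely modeled on the shape of PostgreSQL's GetTupleForTrigger(); the function name, lock mode, and flags are illustrative, not the exact code involved here.

#include "postgres.h"
#include "access/tableam.h"
#include "nodes/execnodes.h"
#include "nodes/lockoptions.h"

/*
 * Sketch: lock the old row before firing a row-level trigger.  The
 * es_snapshot controls which tuple versions can be seen at all (hence the
 * decompressed rows must be covered by it), while es_output_cid lets the
 * table AM classify rows already modified by this statement as
 * TM_SelfModified rather than TM_Invisible.
 */
static TM_Result
lock_row_for_trigger(EState *estate, Relation rel, ItemPointer tid,
					 TupleTableSlot *oldslot)
{
	TM_FailureData tmfd;

	return table_tuple_lock(rel, tid,
							estate->es_snapshot,	/* visibility */
							oldslot,
							estate->es_output_cid,	/* "our" command id */
							LockTupleExclusive,
							LockWaitBlock,
							TUPLE_LOCK_FLAG_FIND_LAST_VERSION,
							&tmfd);
}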

Comment on lines +750 to +711
/* Modify snapshot only if something got decompressed */
if (ts_cm_functions->decompress_target_segments &&
ts_cm_functions->decompress_target_segments(ht_state))
Member

OK, I think I managed to understand the rationale for adding the RegisterSnapshot below. But what about this new check, why is it needed? Your test passes without this change, with just the RegisterSnapshot. I also double-checked that it fails without the RegisterSnapshot.

Member

It seems reasonable to me that the update process should work even if we just increment the command counter and take the new snapshot unconditionally.

Contributor Author

The new check is there to avoid messing with snapshots (which I really don't like) when nothing is decompressed.

I started off by fixing this issue for uncompressed hypertables, and that was the first change.

Member

Well, that seems to be the correct way to work with the snapshots there, so maybe we should instead apply it unconditionally? If you only apply it under some complicated conditions, the bugs will be more difficult to find. E.g. it now looks like we should add isolation tests with both REPEATABLE READ and READ COMMITTED to test the result == TM_Deleted && !IsolationUsesXactSnapshot() code path. We still have to do the potentially problematic snapshot operations in some cases, and if we add more conditions for invoking them, I would say it doesn't really improve things.
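
For reference, the "deleted" branch mentioned above has roughly this shape under the two isolation levels (a simplified sketch following PostgreSQL's usual update handling, not the exact TimescaleDB code):

#include "postgres.h"
#include "access/tableam.h"
#include "access/xact.h"

/*
 * Sketch: handle a row that turned out to be concurrently deleted.  Under
 * READ COMMITTED the row is simply skipped; under REPEATABLE READ or
 * SERIALIZABLE (IsolationUsesXactSnapshot()) a serialization failure must
 * be raised instead.
 */
static bool
handle_deleted_row(TM_Result result)
{
	if (result == TM_Deleted)
	{
		if (IsolationUsesXactSnapshot())
			ereport(ERROR,
					(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
					 errmsg("could not serialize access due to concurrent delete")));
		return false;		/* READ COMMITTED: skip this row */
	}
	return true;			/* proceed with the update */
}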

Contributor Author

I take issue with increasing the command counter when there isn't anything that warrants it. Also, snapshot manipulation is just not necessary if nothing is decompressed.

So I personally don't see a problem here. There is already a check for compression support which skips all this.

Member

@akuzm akuzm left a comment

Let's try to add an isolation test for the "deleted" branch.

Update triggers were broken due to snapshot manipulation
and usage of dynamic snapshots for scanning tuples. Using
a static snapshot guarantees we cannot see tuples that would
be invisible during an update trigger.
@antekresic antekresic merged commit cec6b37 into timescale:main May 27, 2024
36 checks passed
@timescale-automation

Automated backport to 2.15.x not done: cherry-pick failed.

Git status

HEAD detached at origin/2.15.x
You are currently cherry-picking commit cec6b37fb.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	new file:   .unreleased/fix_6858
	modified:   src/nodes/hypertable_modify.c
	modified:   tsl/src/compression/compression.c
	modified:   tsl/test/isolation/expected/compression_dml_iso.out
	modified:   tsl/test/isolation/specs/compression_dml_iso.spec
	modified:   tsl/test/sql/compression_update_delete.sql

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   tsl/test/expected/compression_update_delete.out


Job log

@timescale-automation timescale-automation added the auto-backport-not-done (automated backport of this PR has failed non-retriably, e.g. conflicts) and backported-2.15.x labels May 27, 2024
Labels
auto-backport-not-done (automated backport of this PR has failed non-retriably, e.g. conflicts), backported-2.15.x, bug

Successfully merging this pull request may close these issues.

[Bug]: Before Update trigger on hypertable make updates fail
5 participants