
MDEV-15990 REPLACE on a precise-versioned table returns ER_DUP_ENTRY #3229

Open
wants to merge 6 commits into base: bb-10.5-nikita-MDEV-30046

Conversation

FooBarrior (Contributor)

  • The Jira issue number for this PR is: MDEV-15990

Description

This PR addresses several issues related to history collision handling in system-versioned tables.

First, the subject of MDEV-15990: REPLACE shouldn't end up with a duplicate key error.

Secondly, the user should not lose history, when possible, if several updates to a row happen to apply at the same timestamp. This is in contrast to trx_id-based versioning, where a collision can only happen within a single transaction and shouldn't be preserved by design; see MDEV-15427.

In addition, don't save history with row_start > row_end. This conforms to the current historical row insertion behavior. In production this may still happen, e.g. after an NTP time correction; in that case we would rather set row_start = row_end and save the row. I'm open to discussion.

Release Notes

Fix REPLACE behavior on a TRX-ID versioned table when a single row is rewritten more than once; this no longer causes a duplicate key error.
Improve timestamp-based versioned history collision handling when several changes to a single row happen at the same timestamp.

How can this PR be tested?

For the first two points, a test case is added to versioning.replace. For the third, the test is in versioning.misc.

Basing the PR against the correct MariaDB version

  • This is a new feature and the PR is based against the latest MariaDB development branch.
  • This is a bug fix and the PR is based against the earliest maintained branch in which the bug can be reproduced.

PR quality check

  • I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

We had protection against this by allowing a versioned delete only if:
 trx->id != table->vers_start_id()

For REPLACE this check fails: REPLACE calls ha_delete_row(record[2]), but
table->vers_start_id() returns the value from record[0], which is irrelevant here.

The same problem hits Field::is_max, which may have checked the wrong record.

Fix:
* Refactor Field::is_max to always accept a pointer as an argument. This
  ensures that a developer will consider passing the correct pointer in the
  future.
* Refactor vers_start_id and vers_end_id to always accept a pointer to the
  record. The difference from is_max is that is_max accepts a pointer to the
  field data rather than to the record.
  Refactoring val_int() to accept such an argument would be too effortful, so
  instead the value is fetched from the record directly, as is done in
  Field_longlong. A standalone sketch of the idea follows.
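A minimal standalone sketch of the problem and the fix (illustrative C++ only, not the server code; the struct, member names and buffer sizes are invented for the example):

  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <cstring>

  // A field accessor that is implicitly bound to one record buffer answers
  // for the wrong row image when the operation works on another buffer.
  struct LonglongFieldSketch
  {
    size_t offset;                                  // column offset in a record

    static uint64_t read(const unsigned char *p)
    {
      uint64_t v;
      std::memcpy(&v, p, sizeof v);                 // like int8korr in the server
      return v;
    }

    // Old style: always reads from the buffer the field happens to point at.
    bool is_max_bound(const unsigned char *record0) const
    { return read(record0 + offset) == UINT64_MAX; }

    // Refactored style: the caller states which field image it asks about.
    bool is_max(const unsigned char *field_ptr) const
    { return read(field_ptr) == UINT64_MAX; }
  };

  int main()
  {
    unsigned char record0[8], record2[8];
    uint64_t max_val= UINT64_MAX, trx_id= 42;
    std::memcpy(record0, &max_val, 8);              // current row image
    std::memcpy(record2, &trx_id, 8);               // row image REPLACE deletes

    LonglongFieldSketch row_start{0};
    assert(row_start.is_max_bound(record0));                 // answers for record[0]
    assert(!row_start.is_max(record2 + row_start.offset));   // explicit row image
    return 0;
  }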
@FooBarrior FooBarrior requested a review from midenok April 28, 2024 00:15
Timestamp-versioned row deletion was exposed to a collision problem: if the
current timestamp hasn't changed, a sequence of row delete+insert could get a
duplicate key error. The row delete would find another conflicting history row
and return an error.

This is true for both REPLACE and DELETE statements; however, in REPLACE the
"optimized" path is usually taken, especially in the tests. There, delete+insert
is replaced with a single versioned row update. In the end, both paths end up
as ha_update_row + ha_write_row.

The solution is to handle the history collision gracefully.

From the design perspective, the user shouldn't experience loss of history rows
unless there's a technical limitation.

By contrast, trx_id-based changes should never generate history for the same
transaction, see MDEV-15427.

If two operations on the same row happen so quickly that they get the same
timestamp, the history row shouldn't be lost. We can still write a history
row, though it'll have row_start == row_end.

We cannot store more than one such historical row, as this would violate the
unique constraint on row_end. So we have to physically delete the row if such
a history row is already present.

In this commit:
1. Improve TABLE::delete_row to handle the history collision: if the update
   results in a duplicate key error, delete the row for real (see the sketch
   after this list).
2. Use TABLE::delete_row in the non-optimized path of REPLACE, where the
   system-versioned case now belongs entirely.
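A minimal standalone sketch of that fallback (illustrative C++ only; the handler type, method names and error codes below are invented for the example, not the server API):

  #include <cstdio>

  enum err_t { OK, DUP_KEY };

  struct FakeHandler
  {
    bool history_slot_taken;     // a history row with the same row_end exists

    err_t update_to_history() { return history_slot_taken ? DUP_KEY : OK; }
    err_t physical_delete()   { std::puts("row deleted for real"); return OK; }
  };

  // Analogue of the improved TABLE::delete_row() decision: try to archive the
  // row as history; on a row_end collision, delete the row physically.
  err_t versioned_delete(FakeHandler &h)
  {
    err_t err= h.update_to_history();
    if (err == DUP_KEY)
      err= h.physical_delete();
    return err;
  }

  int main()
  {
    FakeHandler normal{false}, collide{true};
    return (versioned_delete(normal) == OK &&
            versioned_delete(collide) == OK) ? 0 : 1;
  }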
…row insert

Returning DB_FOREIGN_DUPLICATE_KEY from row_ins_duplicate_error_in_clust was
motivated by handling cascade changes during versioned operations.

It was found, though, that certain row_update_for_mysql calls could return
DB_FOREIGN_DUPLICATE_KEY even when no foreign key relations are involved.

Change DB_FOREIGN_DUPLICATE_KEY to DB_DUPLICATE_KEY in
row_ins_duplicate_error_in_clust.

It will be later converted to DB_FOREIGN_DUPLICATE_KEY in
row_ins_check_foreign_constraint if needed.

Additionally, ha_delete_row should return neither error code; ensure this with an assertion. A sketch of the intended error flow follows.
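A rough illustration of that error flow (illustrative C++ only; the enum and function names are simplified stand-ins, not the InnoDB code):

  enum dberr_sketch { SKETCH_SUCCESS, SKETCH_DUPLICATE_KEY,
                      SKETCH_FOREIGN_DUPLICATE_KEY };

  // Clustered-index duplicate check: report a plain duplicate key error,
  // even for the versioned same-transaction case.
  dberr_sketch duplicate_in_clust()
  {
    return SKETCH_DUPLICATE_KEY;
  }

  // Foreign key check: upgrade the error only when a cascade is really
  // involved, mirroring what the commit says about
  // row_ins_check_foreign_constraint.
  dberr_sketch check_foreign_constraint(dberr_sketch err, bool cascading_fk)
  {
    if (err == SKETCH_DUPLICATE_KEY && cascading_fk)
      return SKETCH_FOREIGN_DUPLICATE_KEY;
    return err;
  }

  int main()
  {
    // Without a cascade, the caller sees a plain duplicate key error.
    dberr_sketch err= check_foreign_constraint(duplicate_in_clust(), false);
    return err == SKETCH_DUPLICATE_KEY ? 0 : 1;
  }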
@midenok midenok (Contributor) left a comment


This is a partial review. I still need to look more at TABLE::delete_row(), but I'm posting it now for you @FooBarrior to evaluate if you want to.

set timestamp=12344;
delete from t1;
select pk,i from t1 for system_time all;
pk i
Contributor

I think either it correctly sees the historical point in the past and works with it like the current point (it must not change any data in the current case), or it works like before. I don't like the idea of unversioned DML from the past affecting the future. That is equally incorrect, but worse, because no history is kept.

@@ -2442,7 +2442,7 @@ row_ins_duplicate_error_in_clust(
&trx_id_len);
ut_ad(trx_id_len == DATA_TRX_ID_LEN);
if (trx->id == trx_read_trx_id(trx_id)) {
err = DB_FOREIGN_DUPLICATE_KEY;
err = DB_DUPLICATE_KEY;
Contributor

In the commit message you should refer to your previous commit "MDEV-15990 REPLACE on a precise-versioned table returns ER_DUP_ENTRY", as this commit depends on it. You did not remove DB_FOREIGN_DUPLICATE_KEY from TABLE::delete_row(). Was that intended? There is no point in this assignment since err was already DB_DUPLICATE_KEY (line 2432); you could revert the whole hunk from MDEV-23644.

Contributor Author

Yes, intended. DB_DUPLICATE_KEY is converted to DB_FOREIGN_DUPLICATE_KEY when it is returned from a cascade function call.

Contributor

I don't fully understand the case. Is it when a foreign key cascade generates history and that history hits a duplicate error? That doesn't come from a single command. Is there any test case? I doubt it should skip history in that case.

@@ -1610,7 +1610,7 @@ int multi_delete::do_table_deletes(TABLE *table, SORT_INFO *sort_info,
during ha_delete_row.
Also, don't execute the AFTER trigger if the row operation failed.
*/
if (unlikely(!local_error))
if (likely(!local_error))
Contributor

I think you should either just remove the unlikely() wrapper or keep it as is. This likely/unlikely change is just a time waster.

@@ -4626,17 +4626,17 @@ void Field_longlong::set_max()
int8store(ptr, unsigned_flag ? ULONGLONG_MAX : LONGLONG_MAX);
}

bool Field_longlong::is_max()
bool Field_longlong::is_max(const uchar *ptr_arg) const
Contributor

I think you should not change the interface like that; it is too dirty. Alternatively, add a default value of NULL and keep the code unaffected by the bug intact.

Contributor Author

I think the opposite: a pointer should ALWAYS be supplied. And it makes no sense to store ptr in Field (an offset should be stored instead). You may already have noticed how hard it can be to do literally anything with a record that is not table->record[0]. Many things break because of that.

In my opinion, these field functions are better off without any default parameters (or a parameter-less shortcut), so that whoever writes a call has to think about what they are going to pass there.

As an example, look at how the code in log_event_server.cc was simplified once I forced a correct parameter to be passed: it turns out it was just is_max called on data from record[0] and record[1]!

Contributor

Sometimes it simplifies things, sometimes not. Most of the code got dirtier; I don't like how it looks. The correct paradigm is: fields should be properties of the record, not of the table. Until then I'd prefer record[0] to be the default parameter of table fields. Imagine your paradigm in an implementation: you supply the offset to every property access, and there are hundreds of them, with no end in sight. That is horrible!

Contributor Author

fields should be the properties of record, not of the table

Totally agree. I had some similar design ideas. I think the transition will be easier if most of the code already accepts the correct argument.

Also, can you suggest how the changed code in log_event_server.cc should look, in your opinion?

@midenok midenok (Contributor) May 27, 2024

Also, can you suggest how the changed code in log_event_server.cc should look, in your opinion?

If it worked, you may keep it intact. Or you may apply your solution; I'm not against it as a local measure. I'm against the total conversion of the interface to this paradigm.

I had some similar design ideas.

That is not something I would call an idea; it is obvious, evident, taken for granted.

@montywi
Contributor

montywi commented May 25, 2024 via email

@FooBarrior
Contributor Author

FooBarrior commented May 27, 2024

@montywi This approach of accessing the records has already been presented in a number of Field methods:

int cmp(const uchar *a,const uchar *b) const;
int cmp_binary(const uchar *a,const uchar *b, uint32 max_length) const;
int cmp_prefix(const uchar *a, const uchar *b, size_t prefix_char_len) const;
int key_cmp(const uchar *,const uchar*) const override;
uchar *pack(uchar *to, const uchar *from, uint max_length);
const uchar *unpack(uchar* to, const uchar *from, const uchar *from_end, uint param_data);
void val_str_from_ptr(String *val, const uchar *ptr) const;

So this is not something new for the Field interface, but an organic continuation of it (see the declaration below).
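Under the proposed change, is_max joins that list with the same shape; the signature below follows the definition shown in the diff quoted earlier in this review and is included only to illustrate the pattern:

bool is_max(const uchar *ptr_arg) const;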

We should not change interfaces for how records are accessed just for one
case and in an old MariaDB version.
When we do that, we should do it systematically and fix all interfaces at
the same time.

There are plans for how to do this, and the suggested change is not the way
to do it.

Labels
None yet
3 participants