dolt diff does not display added and corresponding deleted rows next to each other #7808

liuliu-dev · 2024-05-01T21:35:17Z

this affects how we display diff rows on dolthub

repro:

clone this db, checkout a new branch.
change pk2 in 2 rows:

UPDATE `change_in_pks` SET `pk2` = "4" WHERE `pk1` = "2" AND `pk2` = "3";
UPDATE `change_in_pks` SET `pk2` = "5" WHERE `pk1` = "2" AND `pk2` = "6";

run dolt diff

related dolthub pull: https://www.dolthub.com/repositories/liuliu/test1/pulls/11/compare

The text was updated successfully, but these errors were encountered:

nicktobey · 2024-05-13T23:48:15Z

I'm not sure this is even possible to fix without breaking history-independence.

Commits store the current state of the table, but they don't track the commands that led to that state. This is on purpose: two tables with the same state should have the same hash regardless of the sequence of commands that were run.

This means that there's no way for the commit to know that the added and deleted row were from the same update operation.

liuliu-dev · 2024-05-14T16:37:30Z

@nicktobey Seems this is not about the sequence of commands? I might be wrong since I'm not familiar with dolt implementation...

A modified row is displayed as deleted and added rows, and we hope to display them next to each other. But when a commit change happens in primary keys, it might mess up this order, since the diff rows are sorted by primary keys.

Here is another example that Tim found if that's clearer: https://www.dolthub.com/repositories/dolthub/hospital-price-transparency-v3/pulls/169/compare, where the values of column code changed from 51079- 0667- to 51079- 0667 in this commit, makes all the added rows go up.

nicktobey · 2024-05-14T17:30:20Z

The reason I mentioned the sequence of commands is because, for instance, the diff you linked to could have happened in multiple ways:

An update command like UPDATE prices SET code = "51079- 0667" WHERE code = "51079- 0667-"
A deletion of the old rows followed by inserting the new rows.

The diff operation only sees the start and end state of the table, so it can't tell the difference between these two possibilities. It has to display the same diff for both.

One thing we could do is attempt to detect when a change was likely an update to a primary key, match each row-add to its corresponding row-remove, and display them next to each other. It means that would do this even if the change wasn't the result of changing a primary key, as long as it could have been... but that's probably rare enough that we don't mind.

The question is how we would detect these pairs of rows and whether we can do it efficiently.

liuliu-dev · 2024-05-14T17:45:49Z

I see now, thanks for the clarification :)

timsehn added enhancement New feature or request diff version control labels May 2, 2024

liuliu-dev mentioned this issue May 9, 2024

dolthub UI improvements list dolthub/dolthub-issues#537

Open

12 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dolt diff does not display added and corresponding deleted rows next to each other #7808

dolt diff does not display added and corresponding deleted rows next to each other #7808

liuliu-dev commented May 1, 2024

nicktobey commented May 13, 2024

liuliu-dev commented May 14, 2024 •

edited

nicktobey commented May 14, 2024

liuliu-dev commented May 14, 2024

dolt diff does not display added and corresponding deleted rows next to each other #7808

dolt diff does not display added and corresponding deleted rows next to each other #7808

Comments

liuliu-dev commented May 1, 2024

nicktobey commented May 13, 2024

liuliu-dev commented May 14, 2024 • edited

nicktobey commented May 14, 2024

liuliu-dev commented May 14, 2024

liuliu-dev commented May 14, 2024 •

edited