Support Conflict Resolution During Rebase #7820

fulghum · 2024-05-03T20:39:14Z

Dolt currently supports an interactive rebase workflow, but if any data or schema conflicts arise, the rebase is automatically aborted. Customers would like to be able to resolve data and schema conflicts as part of the rebase workflow.

Splitting this work into two passes is probably a good way to tackle it: first add support for handling data conflicts, then add support for handling schema conflicts.

Data Conflicts

Rebase is built on top of cherry-pick, which already supports handling data conflicts through the standard Dolt conflict resolution tables. Today, rebase simply aborts if cherry-pick reports any data conflicts. Instead, we could pause the rebase, let the user resolve the conflicts, and then have the --continue parameter resume the rebase at the next step in the rebase plan once all data conflicts are resolved. This should be fairly straightforward; implementation and testing are likely 3 to 5 days of work.

As a side note, supporting a continuable workflow for interactive rebase would also make it easy to support the edit rebase action. Currently, the --continue parameter can only start a rebase plan at its first step.
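To make the proposed pause-and-continue control flow concrete, here is a minimal Python model of a rebase plan that stops when a step reports data conflicts (or when the step is an edit action) and, on continue, resumes at the following step rather than at the first one. Everything here is hypothetical: the Step and Rebase classes, the conflicts flag, and the status strings are illustrative names, not Dolt's implementation, and "resolving conflicts" is reduced to a boolean argument.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    # Hypothetical model of one rebase plan step (not Dolt's actual type).
    action: str               # "pick", "edit", or "drop"
    commit: str               # label for the commit being replayed
    conflicts: bool = False   # whether cherry-picking this commit reports data conflicts


@dataclass
class Rebase:
    plan: list
    next_step: int = 0                           # index where run() resumes
    applied: list = field(default_factory=list)  # commits replayed so far
    status: str = "pending"

    def run(self) -> str:
        """Apply steps from next_step onward, pausing on conflicts or 'edit'."""
        while self.next_step < len(self.plan):
            step = self.plan[self.next_step]
            self.next_step += 1  # a later continue resumes at the following step
            if step.action == "drop":
                continue
            # The commit is replayed; with conflicts, the user finishes it after resolving.
            self.applied.append(step.commit)
            if step.conflicts:
                self.status = "paused: resolve conflicts"
                return self.status
            if step.action == "edit":
                self.status = "paused: edit"
                return self.status
        self.status = "done"
        return self.status

    def continue_(self, conflicts_resolved: bool = True) -> str:
        """Model of --continue: stay paused until conflicts are resolved, then resume."""
        if not conflicts_resolved:
            return self.status
        return self.run()


rebase = Rebase([Step("pick", "a"), Step("pick", "b", conflicts=True),
                 Step("edit", "c"), Step("pick", "d")])
print(rebase.run(), rebase.applied)
```

The key design point is that the plan position (next_step) is persisted when the rebase pauses, so both a conflict stop and an edit stop reuse the same resume path.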

Schema Conflicts

Cherry-pick does support applying schema changes, but it does not support resolving schema conflicts; the cherry-pick is aborted if a schema conflict is encountered. For example, you can cherry-pick a commit that creates a new table or modifies an existing table, but you currently can't cherry-pick a commit that inserts or modifies data in a table whose columns on the branch being rebased don't match the columns on the upstream branch. Note that if the upstream branch has only added or dropped columns, without changing existing columns, then the commit can already be rebased/cherry-picked successfully today.
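The rule described above (added and dropped columns map automatically, while changed columns do not) can be sketched as a row-mapping function. This is a simplified illustration, not Dolt's mapping logic: it assumes schemas are plain {column name: type} dicts and matches columns by name, and it treats every type change as a conflict even though Dolt already handles certain column changes automatically.

```python
from typing import Any, Dict

Schema = Dict[str, str]  # assumption: column name -> column type string


def map_row(row: Dict[str, Any], old_schema: Schema, new_schema: Schema) -> Dict[str, Any]:
    """Map a row written under old_schema onto new_schema, matching columns by name.

    Columns added upstream get None (NULL); columns dropped upstream are discarded.
    A column present in both schemas with a different type cannot be mapped by this
    simplified sketch and is reported as a schema conflict.
    """
    mapped: Dict[str, Any] = {}
    for name, new_type in new_schema.items():
        if name not in old_schema:
            mapped[name] = None  # column was added upstream
        elif old_schema[name] != new_type:
            raise ValueError(
                f"schema conflict on column {name!r}: {old_schema[name]} -> {new_type}"
            )
        else:
            mapped[name] = row[name]  # unchanged column: copy the value over
    return mapped


old = {"pk": "int", "c1": "varchar(100)"}
# Additive upstream change, as in scenario 2: row 100 maps cleanly with c2 = None.
print(map_row({"pk": 100, "c1": "hundy"}, old, {**old, "c2": "int"}))
```

Under this model, scenario 1 below (c1 going from varchar(100) to int) raises a conflict, while scenario 2 (adding c2) maps cleanly.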

There is a key difference between merge and rebase here. Merge takes two database versions (plus their common ancestor, which is used to determine which side changed the data or schema), whereas rebase takes the commits from the branch being rebased and reapplies each commit's changes on top of a new branch created from the tip of the upstream branch.

Both merge and rebase hit schema conflicts in the same situations, but it currently seems slightly easier to resolve the more antagonistic changes (e.g. changing a column from varchar to float) with merge: you can add a new commit on the branch that brings the branch's schema in line with the upstream branch's schema, and then merge successfully once you've manually resolved the schema conflict that way. With rebase, replaying the commits on top of the upstream branch's changed schema either 1) has to rely on the automatic schema mapping we support today (e.g. adding/dropping columns, certain types of column changes), or 2) requires you to apply a commit to the branch that brings the schema in line with upstream, squash all of the branch's commits together, and then rebase against the tip of the upstream branch. The second approach of course loses some of the main benefits of rebasing, but I'd also expect these larger schema changes to be less frequent.
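The role of the common ancestor in a merge can be illustrated with a generic three-way rule applied per column: if only one side changed a column relative to the ancestor, take that side; if both sides changed it differently, report a conflict. This is a textbook three-way merge sketched in Python, not Dolt's merge code. It assumes schemas are {column name: type} dicts with None meaning "column absent", and it covers column definitions only; reconciling the rows written under each side's schema is a separate problem it does not model.

```python
from typing import Dict, Optional

Schema = Dict[str, str]  # assumption: column name -> column type string


def merge_column(ancestor: Optional[str], ours: Optional[str],
                 theirs: Optional[str]) -> Optional[str]:
    """Three-way merge of one column definition; None means the column is absent."""
    if ours == theirs:
        return ours      # both sides agree (possibly neither changed it)
    if ours == ancestor:
        return theirs    # only their side changed it (including add/drop)
    if theirs == ancestor:
        return ours      # only our side changed it
    raise ValueError("schema conflict: both sides changed the column differently")


def merge_schemas(ancestor: Schema, ours: Schema, theirs: Schema) -> Schema:
    """Merge every column that appears in any of the three schemas."""
    merged: Schema = {}
    for name in sorted(set(ancestor) | set(ours) | set(theirs)):
        result = merge_column(ancestor.get(name), ours.get(name), theirs.get(name))
        if result is not None:
            merged[name] = result
    return merged


ancestor = {"pk": "int", "c1": "varchar(100)"}
# branch1 left the schema alone; main added c2, so main's change is taken.
print(merge_schemas(ancestor, dict(ancestor), {**ancestor, "c2": "int"}))
```

A rebase has no single ancestor comparison like this; it replays each commit's diff in order, which is why it depends on being able to map each commit's data onto the new schema.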

There are more ways we could enhance the automatic schema mapping, but it seems worth first spending some time thinking through these cases to really understand how customers need schema conflict resolution during rebase to work.

To make this a little more concrete, here are two example scenarios: one where we cannot automatically handle a schema change, and one where we can.

Scenario 1: Incompatible data for upstream schema change

In this scenario, the schema changed on the main branch after branch1 was created: the c1 varchar column was dropped and a new c1 int column was added. This type of schema change invalidates any rows inserted on branch1. When the commits from branch1 are rebased onto the tip of main, they can't be applied cleanly because Dolt doesn't know how to map the data between the two schemas.

create table t (pk int primary key, c1 varchar(100));
call dolt_commit('-Am', 'adding table t on main');
insert into t values (1, "one");
call dolt_commit('-am', 'adding row 1 on main');
call dolt_branch('branch1');
insert into t values (2, "two");
call dolt_commit('-am', 'adding row 2 on main');
alter table t drop column c1;
alter table t add column c1 int;
call dolt_commit('-am', 'dropping and readding t.c1 on main');

call dolt_checkout('branch1');
insert into t (pk, c1) values (100, "hundy");
call dolt_commit('-am', 'inserting row 100 on branch1');
select * from dolt_log;
call dolt_rebase('-i', 'main');
select * from dolt_rebase;
call dolt_rebase('--continue');

-- Today, this fails with:
-- merge conflict detected while rebasing commit tfpgijqe317clm2cl3trl15lnm16tljj. the rebase has been automatically aborted

-- Merging this would also result in a schema conflict that has to be resolved manually. It seems *slightly* easier to merge than to rebase, though: you can add a new commit at the tip of the branch that applies the schema changes, and then merge. To rebase, you'd need to do that, then squash all the changes on this branch into a single commit, and then perform the rebase.

Scenario 2: Upstream additive schema change

In this scenario, the schema changed on the main (upstream) branch after branch1 was created, but in an additive way (adding column c2). This type of schema change can be rebased or merged without manually resolving the schema difference first.

create table t (pk int primary key, c1 varchar(100));
call dolt_commit('-Am', 'adding table t on main');
insert into t values (1, "one");
call dolt_commit('-am', 'adding row 1 on main');
call dolt_branch('branch1');
insert into t values (2, "two");
call dolt_commit('-am', 'adding row 2 on main');
alter table t add column c2 int;
call dolt_commit('-am', 'adding t.c2 on main');

call dolt_checkout('branch1');
insert into t (pk, c1) values (100, "hundy");
call dolt_commit('-am', 'inserting row 100 on branch1');
select * from dolt_log;
call dolt_rebase('-i', 'main');
select * from dolt_rebase;
call dolt_rebase('--continue');

-- Today, this completes successfully.


fulghum · 2024-05-07T23:56:50Z

If a schema conflict like the one in scenario 1 above prevents using a rebase workflow, you can still use a merge workflow to update the branch. (Using rebase to squash the commits on the dev branch into a single commit that can then be rebased onto the tip of main will currently still hit a schema conflict.)

1 – Bring the schema into sync with the upstream branch

We can use the dolt_patch() table function to easily find the SQL statements needed to migrate the schema to match main:

-- find the schema differences using dolt_patch()
select statement_order, statement from dolt_patch('branch1', 'main') where diff_type="schema";
+-----------------+-------------------------------+
| statement_order | statement                     |
+-----------------+-------------------------------+
| 1               | ALTER TABLE `t` DROP `c1`;    |
| 2               | ALTER TABLE `t` ADD `c1` int; |
+-----------------+-------------------------------+

-- apply the schema changes
ALTER TABLE `t` DROP `c1`;
ALTER TABLE `t` ADD `c1` int;

-- create a Dolt commit
call dolt_commit('-am', 'updating schemas to match main');

2 – Merge the latest changes from the upstream branch into the dev branch.

-- before merging main into branch1, we can look at the current log for branch1 to see the reachable commits
select * from dolt_log;
+----------------------------------+-----------+-------------------------+---------------------+--------------------------------+
| commit_hash                      | committer | email                   | date                | message                        |
+----------------------------------+-----------+-------------------------+---------------------+--------------------------------+
| 314crgv3oj20o2blsjcin3g50nld258j | root      | root@localhost          | 2024-05-07 23:35:43 | updating schemas to match main |
| jmstip9r3d992h80ik5v95a76d4l7rnc | root      | root@localhost          | 2024-05-07 23:34:48 | inserting row 100 on branch1   |
| 38lpgag8jkbjo7l5gsmnfks4vgjqo6h3 | root      | root@localhost          | 2024-05-07 23:34:48 | adding row 1 on main           |
| df42kstlos2i84c7vujnv6kofrudn332 | root      | root@localhost          | 2024-05-07 23:34:48 | adding table t on main         |
| d0ltbo8su41muhftfg3gm6jr6re19jpt | jfulghum  | jason.fulghum@gmail.com | 2024-05-07 23:29:01 | Initialize data repository     |
+----------------------------------+-----------+-------------------------+---------------------+--------------------------------+

-- merge the main branch into branch1
call dolt_merge('main');
+----------------------------------+--------------+-----------+------------------+
| hash                             | fast_forward | conflicts | message          |
+----------------------------------+--------------+-----------+------------------+
| bjjli2hp9jti0vc9djdkss7gtceenuof | 0            | 0         | merge successful |
+----------------------------------+--------------+-----------+------------------+

-- now examine the commit log for branch1: new commits from main (e.g. "adding row 2 on main") are now
-- reachable on branch1
select * from dolt_log;
+----------------------------------+-----------+-------------------------+---------------------+------------------------------------+
| commit_hash                      | committer | email                   | date                | message                            |
+----------------------------------+-----------+-------------------------+---------------------+------------------------------------+
| bjjli2hp9jti0vc9djdkss7gtceenuof | root      | root@localhost          | 2024-05-07 23:53:07 | Merge branch 'main' into branch1   |
| 314crgv3oj20o2blsjcin3g50nld258j | root      | root@localhost          | 2024-05-07 23:35:43 | updating schemas to match main     |
| jp0kv3re8ltafhef1k0a4g7p66t4gu53 | root      | root@localhost          | 2024-05-07 23:34:48 | dropping and readding t.c1 on main |
| jmstip9r3d992h80ik5v95a76d4l7rnc | root      | root@localhost          | 2024-05-07 23:34:48 | inserting row 100 on branch1       |
| bohjglronlvql4ltdkekloqbrdcp2m21 | root      | root@localhost          | 2024-05-07 23:34:48 | adding row 2 on main               |
| 38lpgag8jkbjo7l5gsmnfks4vgjqo6h3 | root      | root@localhost          | 2024-05-07 23:34:48 | adding row 1 on main               |
| df42kstlos2i84c7vujnv6kofrudn332 | root      | root@localhost          | 2024-05-07 23:34:48 | adding table t on main             |
| d0ltbo8su41muhftfg3gm6jr6re19jpt | jfulghum  | jason.fulghum@gmail.com | 2024-05-07 23:29:01 | Initialize data repository         |
+----------------------------------+-----------+-------------------------+---------------------+------------------------------------+
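The change between the two logs above comes down to commit reachability: the log lists the commits reachable from the branch head by following parent links, and the merge commit adds main's head as a second parent. The Python sketch below models that with a breadth-first walk over a parent map. The short commit labels are stand-ins for the hashes in the logs, and the parent relationships are inferred from the two outputs above rather than read from Dolt's storage.

```python
from collections import deque
from typing import Dict, List, Set

# Stand-in labels for the commits shown in the dolt_log output above
# (parent links inferred from the two logs, not read from Dolt).
PARENTS: Dict[str, List[str]] = {
    "table t": ["init"],
    "row 1": ["table t"],
    "row 2": ["row 1"],                      # main
    "readd c1": ["row 2"],                   # main
    "row 100": ["row 1"],                    # branch1
    "schema sync": ["row 100"],              # branch1
    "merge": ["schema sync", "readd c1"],    # merge commit has two parents
}


def reachable(head: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return every commit reachable from head by following parent links (BFS)."""
    seen = {head}
    queue = deque([head])
    while queue:
        commit = queue.popleft()
        for parent in parents.get(commit, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


before = reachable("schema sync", PARENTS)  # branch1 head before the merge
after = reachable("merge", PARENTS)         # branch1 head after the merge
print(sorted(after - before))
```

The commits that appear only after the merge are the merge commit itself plus main's "adding row 2" and "dropping and readding t.c1" commits, matching the extra rows in the second log.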

fulghum added the enhancement (New feature or request) and customer issue labels on May 3, 2024.
