Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

copy-info: initial design doc #3574

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

copy-info: initial design doc #3574

wants to merge 1 commit into from

Conversation

torquestomp
Copy link
Collaborator

Checklist

If applicable:

  • I have updated CHANGELOG.md
  • I have updated the documentation (README.md, docs/, demos/)
  • I have updated the config schema (cli/src/config-schema.json)
  • I have added tests to cover my changes

@torquestomp torquestomp mentioned this pull request Apr 25, 2024
4 tasks
Copy link
Collaborator

@PhilipMetzger PhilipMetzger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some minor stuff, as I'm no expert.

I do have one question though, will implicit tracking also be a future goal, where are plain mv/cp in a jj repo gets rewritten to a jj mv/cp?

And a minor note on the title, if it only targets metadata it is fine. But if it also tackles #47 and #3386 then a better title would be "Copy info and Copy tracing Design" or "Copy Info and rename tracking for Jujutsu".

docs/design/copy-info.md Outdated Show resolved Hide resolved
docs/design/copy-info.md Outdated Show resolved Hide resolved
docs/design/copy-info.md Outdated Show resolved Hide resolved
docs/design/copy-info.md Outdated Show resolved Hide resolved
docs/design/copy-info.md Outdated Show resolved Hide resolved
docs/design/copy-info.md Outdated Show resolved Hide resolved
docs/design/copy-info.md Outdated Show resolved Hide resolved
docs/design/copy-info.md Outdated Show resolved Hide resolved
docs/design/copy-info.md Outdated Show resolved Hide resolved
docs/design/copy-info.md Outdated Show resolved Hide resolved
docs/design/copy-info.md Outdated Show resolved Hide resolved
docs/design/copy-info.md Outdated Show resolved Hide resolved
@torquestomp
Copy link
Collaborator Author

Partial reply; I haven't added anything for merge resolution yet, need to think about it for a while.

docs/design/copy-info.md Outdated Show resolved Hide resolved
docs/design/copy-info.md Outdated Show resolved Hide resolved
docs/design/copy-info.md Outdated Show resolved Hide resolved
docs/design/copy-info.md Outdated Show resolved Hide resolved
docs/design/copy-info.md Outdated Show resolved Hide resolved
docs/design/copy-info.md Outdated Show resolved Hide resolved
docs/design/copy-info.md Outdated Show resolved Hide resolved
docs/design/copy-info.md Outdated Show resolved Hide resolved
docs/design/copy-info.md Outdated Show resolved Hide resolved
@torquestomp torquestomp force-pushed the dploch/copy-design branch 3 times, most recently from cfd7e64 to 528f120 Compare May 8, 2024 18:15
docs/design/copy-info.md Outdated Show resolved Hide resolved
```rust
pub struct Commit {
...
copies: Option<HashMap<RepoPathBuf, MergedCopySoure>>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the "Merged" (or conflicting) CopySource represent here?

Based on "Squash conflicts" example,

Suppose we create commit A by renaming file F1 -> F, then we create child
commit B in which we replace F by renaming F2 -> F. This touches on two issues.

Firstly, if these two commits are squashed together, then we have a destination
F with two copy sources, F1 and F2. In this case, we can store a
Merge[Some(F1), None, Some(F2)] as the copy source for F,

a file in a squashed commit may have multiple copy sources. However, this "squash" operation doesn't involve merging, and I'm not sure what the negative term None represents. Suppose copy info forms a file-level DAG, F: vec![F1, F2] might be more appropriate (if the rename in the commit B is not destructive.)

FWIW, I don't know how Merge<Option<CopySource>> conflicts (calculated by transitive get_copy_records()) will be processed by caller. Other Merge<T> types represent conflicts of values, but CopySource is a diff at a certain point in the DAG.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, so instead of a Merge<>, we could represent copy info as an ordered set (stored as a Vec<>)? That's an interesting idea, though some backends (ours definitely) would reject multiple copies in a commit, and I'm not sure how you'd handle that for something like jj diff or jj rebase. I chose the Merge<> struct just to represent it as a conflict that needed resolving (where None means "no copy source"). Maybe it should still represent that just as a Vec<> instead of a Merge<>, where len()==0 means no copy, len()==1 is resolved and len()>1 is a conflict.

Is your concern with the Merge<> type itself or do you think we should actually support copies from multiple sources in the same commit?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but CopySource is a diff at a certain point in the DAG.

Is it, really?

do you think we should actually support copies from multiple sources in the same commit?

Wouldn't this be like doing jj new x y, in which both x and y rename a file to something different, or even F1 to F to F2?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is specifically about two copies into the same destination, since copy info is keyed by destination. Two separate copies from F -> F1 and F -> F2 can coexist without any complications.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so it's like jj new x y, in which both x and y rename a file to the same name?

Copy link
Collaborator

@yuja yuja May 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, so instead of a Merge<>, we could represent copy info as an ordered set (stored as a Vec<>)? That's an interesting idea, though some backends (ours definitely) would reject multiple copies in a commit, and I'm not sure how you'd handle that for something like jj diff or jj rebase.

If the backend reject multiple sources, Merge<> (which is basically Vec of positive and negative terms) wouldn't work either. I thought your intention was to support that.

I chose the Merge<> struct just to represent it as a conflict that needed resolving (where None means "no copy source"). Maybe it should still represent that just as a Vec<> instead of a Merge<>, where len()==0 means no copy, len()==1 is resolved and len()>1 is a conflict.

Is your concern with the Merge<> type itself or do you think we should actually support copies from multiple sources in the same commit?

My concern is what the negative CopySource term represents, and where it comes from for the linear squash case. Let's say we have actually squashed a merge commit A->{B, C},

         A
        / \
{f<-b} B   C {f<-c}
        \ /
         D {f<-d}

we could say that the resulting commit had a copy source f: Merge[b, d, c] (if we do merge copy sources just like the other value types.) However, I have no idea what the negative term f: d would mean. (btw, I think f: [b, c] is more correct in this example.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a CopySources enum to represent resolved/conflicted copies without the negative elements that the Merge type has, I think this resolves the main issue.

If the backend reject multiple sources

To clarify: our backend can support this (and will), but the backend's backend does not and so users will be unable to do the internal equivalent of 'git push' on such commits, which I think is fine (same as you can't push jj commits with conflicts).

@PhilipMetzger
Copy link
Collaborator

@newren Hello Elijah, this may be of interest for you, would you mind taking a look at it?

@torquestomp
Copy link
Collaborator Author

Just some minor stuff, as I'm no expert.

I do have one question though, will implicit tracking also be a future goal, where are plain mv/cp in a jj repo gets rewritten to a jj mv/cp?

And a minor note on the title, if it only targets metadata it is fine. But if it also tackles #47 and #3386 then a better title would be "Copy info and Copy tracing Design" or "Copy Info and rename tracking for Jujutsu".

I renamed the doc and file name to better capture the 'tracking' aspect

Comment on lines +40 to +48
/// The source commit the target was copied from. If not specified, then the
/// parent of the target commit is the source commit. Backends may use this
/// field to implement 'integration' logic, where a source may be
/// periodically merged into a target, similar to a branch, but the
/// branching occurs at the file level rather than the repository level. It
/// also follows naturally that any copy source targeted to a specific
/// commit should avoid copy propagation on rebasing, which is desirable
/// for 'fork' style copies.
commit: Option<CommitId>,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's still a bit unclear to me how this will be used. What will it look like to the user? Will they do something like jj cp --from-commit xyz old_name new_name?

It bothers me that there's a commit id in here. Do we need to guarantee that it's an ancestor of the commit we encountered the CopySource on? What does it mean if it's not?

My gut feeling is still that this is too specific to Google's Perforce-like VCS. Do we think it'll be useful with other backends in the future? If not, can we store it in the commit objects at Google somehow and use a Google-specific command to manipulate it? Maybe for upstream all we'll want is a boolean value indicating whether to propagate changes. We might then have jj cp --[no-]propagate-changes old_name new_name (not sure which should be the default).

// TODO: Probably something for git similarity detection
}

using CopyRecordStream = Pin<Box<dyn Stream<Item = BackendResult<CopyRecord>>>>;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: using is called type in Rust


In terms of _copy info_ there is no conflict here, because C does not have copy
info and needs none, but resolving the contents of F becomes more complicated.
We need to (1) identify the greatest command ancestor of A and B (D)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "command" -> "common"

jj forget-cp $DEST [-r REV]
```

## Behavioral Changes
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might also be worth mentioning jj restore behavior even though it's probably pretty straight-forward how it should behave. I would think (hope) that a simple jj new xyz-; jj restore --from xyz would result in the same copies as in xyz. Presumably what we'll do is to find copies between xyz- and xyz and record them in the new commit.

Comment on lines +172 to +175
1. Rebase commit A from parents [B] to parents [C]
1. Get copy records from [D]->[B] and [D]->[C], where [D] are the common ancestors of [B] and [C]
1. DescendantRebaser maintains an in-memory map of commits to extra copy info, which it may inject into (2). When squashing a rename of a newly created file into the commit that creates that file, DescendentRebase will return this rename for all rebases of descendants of the newly modified commit. The rename lives ephemerally in memory and has no persistence after the rebase completes.
1. A to-be-determined algorithm diffs the copy records between [D]->[B] and [D]->[C] in order to make changes to the rebased commit. This results in edits to renamed files being propagated to those renamed files, and avoiding conflicts on the deletion of their sources. A copy/move may also be undone in this way; abandoning a commit which renames A->B should move all descendant edits of B back into A.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: It sounds like this is mostly generic and not related to the problem mentioned above. I think it would be clearer to describe the general rebase behavior first and then describe how we can handle the thorny problem after that.

Comment on lines +172 to +175
1. Rebase commit A from parents [B] to parents [C]
1. Get copy records from [D]->[B] and [D]->[C], where [D] are the common ancestors of [B] and [C]
1. DescendantRebaser maintains an in-memory map of commits to extra copy info, which it may inject into (2). When squashing a rename of a newly created file into the commit that creates that file, DescendentRebase will return this rename for all rebases of descendants of the newly modified commit. The rename lives ephemerally in memory and has no persistence after the rebase completes.
1. A to-be-determined algorithm diffs the copy records between [D]->[B] and [D]->[C] in order to make changes to the rebased commit. This results in edits to renamed files being propagated to those renamed files, and avoiding conflicts on the deletion of their sources. A copy/move may also be undone in this way; abandoning a commit which renames A->B should move all descendant edits of B back into A.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mentions how rebasing will look at both the commits between D and B and between D and C. I'm not sure it's worth explaining here, but FYI, Mercurial takes the copies from D to B and keeps only the renames from that set, then reverses the renames to get renames from B to D. It then chains that data with the copies and renames from D to C.

Comment on lines +79 to +85
/// Get all copy records for `paths` in the dag range `roots..heads`.
///
/// The exact order these are returned is unspecified, but it is guaranteed
/// to be reverse-topological. That is, for any two copy records with
/// different commit ids A and B, if A is an ancestor of B, A is streamed
/// after B.
async fn get_copy_records(&self, paths: &[RepoPathBuf], roots: &[CommitId], heads: &[CommitId]) -> CopyRecordStream;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are the results streamed instead of being returned in a single vector? Do we have use cases where you would want to learn about some copies and then terminate the stream? Ah, is it for blame/annotate support, so that can be incrementally populated in a GUI? Could you mention that here?

Comment on lines +68 to +72
/// If true, follow transitive copy records. That is, if file A is copied to
/// B and then to C, a request for copy records of file C should also return
/// the copy from A to B. These two copies will be returned in separate
/// CopyRecords.
transitive: bool
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When is this useful?

Comment on lines +258 to +262
Alternatively, we can keep copy-tracking simple in jj by taking a stronger
stance here and treating all copies-onto-existing-files as 'replacement'
operations. This makes integrations with more complex VCSs that do support
'integrate'-style operations trickier, but it is possible that a more generic
commit extension system is better suited to such backends.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this is what I was asking about on the CopySource::commit field. Thanks for mentioning it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants