New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
copy-info: initial design doc #3574
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some minor stuff, as I'm no expert.
I do have one question though, will implicit tracking also be a future goal, where are plain mv/cp
in a jj
repo gets rewritten to a jj mv/cp
?
And a minor note on the title, if it only targets metadata it is fine. But if it also tackles #47 and #3386 then a better title would be "Copy info and Copy tracing Design" or "Copy Info and rename tracking for Jujutsu".
4a4cba8
to
477e237
Compare
477e237
to
a646cce
Compare
Partial reply; I haven't added anything for merge resolution yet, need to think about it for a while. |
cfd7e64
to
528f120
Compare
528f120
to
2e6ff94
Compare
docs/design/copy-info.md
Outdated
```rust | ||
pub struct Commit { | ||
... | ||
copies: Option<HashMap<RepoPathBuf, MergedCopySoure>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does the "Merged" (or conflicting) CopySource
represent here?
Based on "Squash conflicts" example,
Suppose we create commit A by renaming file F1 -> F, then we create child
commit B in which we replace F by renaming F2 -> F. This touches on two issues.Firstly, if these two commits are squashed together, then we have a destination
F with two copy sources, F1 and F2. In this case, we can store a
Merge[Some(F1), None, Some(F2)]
as the copy source for F,
a file in a squashed commit may have multiple copy sources. However, this "squash" operation doesn't involve merging, and I'm not sure what the negative term None
represents. Suppose copy info forms a file-level DAG, F: vec![F1, F2]
might be more appropriate (if the rename in the commit B is not destructive.)
FWIW, I don't know how Merge<Option<CopySource>>
conflicts (calculated by transitive get_copy_records()
) will be processed by caller. Other Merge<T>
types represent conflicts of values, but CopySource
is a diff at a certain point in the DAG.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, so instead of a Merge<>, we could represent copy info as an ordered set (stored as a Vec<>)? That's an interesting idea, though some backends (ours definitely) would reject multiple copies in a commit, and I'm not sure how you'd handle that for something like jj diff
or jj rebase
. I chose the Merge<> struct just to represent it as a conflict that needed resolving (where None
means "no copy source"). Maybe it should still represent that just as a Vec<> instead of a Merge<>, where len()==0 means no copy, len()==1 is resolved and len()>1 is a conflict.
Is your concern with the Merge<> type itself or do you think we should actually support copies from multiple sources in the same commit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but
CopySource
is a diff at a certain point in the DAG.
Is it, really?
do you think we should actually support copies from multiple sources in the same commit?
Wouldn't this be like doing jj new x y
, in which both x and y rename a file to something different, or even F1 to F to F2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is specifically about two copies into the same destination, since copy info is keyed by destination. Two separate copies from F -> F1 and F -> F2 can coexist without any complications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so it's like jj new x y
, in which both x and y rename a file to the same name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, so instead of a Merge<>, we could represent copy info as an ordered set (stored as a Vec<>)? That's an interesting idea, though some backends (ours definitely) would reject multiple copies in a commit, and I'm not sure how you'd handle that for something like
jj diff
orjj rebase
.
If the backend reject multiple sources, Merge<>
(which is basically Vec
of positive and negative terms) wouldn't work either. I thought your intention was to support that.
I chose the Merge<> struct just to represent it as a conflict that needed resolving (where
None
means "no copy source"). Maybe it should still represent that just as a Vec<> instead of a Merge<>, where len()==0 means no copy, len()==1 is resolved and len()>1 is a conflict.Is your concern with the Merge<> type itself or do you think we should actually support copies from multiple sources in the same commit?
My concern is what the negative CopySource
term represents, and where it comes from for the linear squash case. Let's say we have actually squashed a merge commit A->{B, C}
,
A
/ \
{f<-b} B C {f<-c}
\ /
D {f<-d}
we could say that the resulting commit had a copy source f: Merge[b, d, c]
(if we do merge copy sources just like the other value types.) However, I have no idea what the negative term f: d
would mean. (btw, I think f: [b, c]
is more correct in this example.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a CopySources
enum to represent resolved/conflicted copies without the negative elements that the Merge
type has, I think this resolves the main issue.
If the backend reject multiple sources
To clarify: our backend can support this (and will), but the backend's backend does not and so users will be unable to do the internal equivalent of 'git push' on such commits, which I think is fine (same as you can't push jj commits with conflicts).
@newren Hello Elijah, this may be of interest for you, would you mind taking a look at it? |
I renamed the doc and file name to better capture the 'tracking' aspect |
2e6ff94
to
ec0fabb
Compare
ec0fabb
to
75cf906
Compare
/// The source commit the target was copied from. If not specified, then the | ||
/// parent of the target commit is the source commit. Backends may use this | ||
/// field to implement 'integration' logic, where a source may be | ||
/// periodically merged into a target, similar to a branch, but the | ||
/// branching occurs at the file level rather than the repository level. It | ||
/// also follows naturally that any copy source targeted to a specific | ||
/// commit should avoid copy propagation on rebasing, which is desirable | ||
/// for 'fork' style copies. | ||
commit: Option<CommitId>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's still a bit unclear to me how this will be used. What will it look like to the user? Will they do something like jj cp --from-commit xyz old_name new_name
?
It bothers me that there's a commit id in here. Do we need to guarantee that it's an ancestor of the commit we encountered the CopySource
on? What does it mean if it's not?
My gut feeling is still that this is too specific to Google's Perforce-like VCS. Do we think it'll be useful with other backends in the future? If not, can we store it in the commit objects at Google somehow and use a Google-specific command to manipulate it? Maybe for upstream all we'll want is a boolean value indicating whether to propagate changes. We might then have jj cp --[no-]propagate-changes old_name new_name
(not sure which should be the default).
// TODO: Probably something for git similarity detection | ||
} | ||
|
||
using CopyRecordStream = Pin<Box<dyn Stream<Item = BackendResult<CopyRecord>>>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: using
is called type
in Rust
|
||
In terms of _copy info_ there is no conflict here, because C does not have copy | ||
info and needs none, but resolving the contents of F becomes more complicated. | ||
We need to (1) identify the greatest command ancestor of A and B (D) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "command" -> "common"
jj forget-cp $DEST [-r REV] | ||
``` | ||
|
||
## Behavioral Changes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might also be worth mentioning jj restore
behavior even though it's probably pretty straight-forward how it should behave. I would think (hope) that a simple jj new xyz-; jj restore --from xyz
would result in the same copies as in xyz
. Presumably what we'll do is to find copies between xyz-
and xyz
and record them in the new commit.
1. Rebase commit A from parents [B] to parents [C] | ||
1. Get copy records from [D]->[B] and [D]->[C], where [D] are the common ancestors of [B] and [C] | ||
1. DescendantRebaser maintains an in-memory map of commits to extra copy info, which it may inject into (2). When squashing a rename of a newly created file into the commit that creates that file, DescendentRebase will return this rename for all rebases of descendants of the newly modified commit. The rename lives ephemerally in memory and has no persistence after the rebase completes. | ||
1. A to-be-determined algorithm diffs the copy records between [D]->[B] and [D]->[C] in order to make changes to the rebased commit. This results in edits to renamed files being propagated to those renamed files, and avoiding conflicts on the deletion of their sources. A copy/move may also be undone in this way; abandoning a commit which renames A->B should move all descendant edits of B back into A. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: It sounds like this is mostly generic and not related to the problem mentioned above. I think it would be clearer to describe the general rebase behavior first and then describe how we can handle the thorny problem after that.
1. Rebase commit A from parents [B] to parents [C] | ||
1. Get copy records from [D]->[B] and [D]->[C], where [D] are the common ancestors of [B] and [C] | ||
1. DescendantRebaser maintains an in-memory map of commits to extra copy info, which it may inject into (2). When squashing a rename of a newly created file into the commit that creates that file, DescendentRebase will return this rename for all rebases of descendants of the newly modified commit. The rename lives ephemerally in memory and has no persistence after the rebase completes. | ||
1. A to-be-determined algorithm diffs the copy records between [D]->[B] and [D]->[C] in order to make changes to the rebased commit. This results in edits to renamed files being propagated to those renamed files, and avoiding conflicts on the deletion of their sources. A copy/move may also be undone in this way; abandoning a commit which renames A->B should move all descendant edits of B back into A. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mentions how rebasing will look at both the commits between D and B and between D and C. I'm not sure it's worth explaining here, but FYI, Mercurial takes the copies from D to B and keeps only the renames from that set, then reverses the renames to get renames from B to D. It then chains that data with the copies and renames from D to C.
/// Get all copy records for `paths` in the dag range `roots..heads`. | ||
/// | ||
/// The exact order these are returned is unspecified, but it is guaranteed | ||
/// to be reverse-topological. That is, for any two copy records with | ||
/// different commit ids A and B, if A is an ancestor of B, A is streamed | ||
/// after B. | ||
async fn get_copy_records(&self, paths: &[RepoPathBuf], roots: &[CommitId], heads: &[CommitId]) -> CopyRecordStream; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are the results streamed instead of being returned in a single vector? Do we have use cases where you would want to learn about some copies and then terminate the stream? Ah, is it for blame/annotate support, so that can be incrementally populated in a GUI? Could you mention that here?
/// If true, follow transitive copy records. That is, if file A is copied to | ||
/// B and then to C, a request for copy records of file C should also return | ||
/// the copy from A to B. These two copies will be returned in separate | ||
/// CopyRecords. | ||
transitive: bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When is this useful?
Alternatively, we can keep copy-tracking simple in jj by taking a stronger | ||
stance here and treating all copies-onto-existing-files as 'replacement' | ||
operations. This makes integrations with more complex VCSs that do support | ||
'integrate'-style operations trickier, but it is possible that a more generic | ||
commit extension system is better suited to such backends. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, this is what I was asking about on the CopySource::commit
field. Thanks for mentioning it.
Checklist
If applicable:
CHANGELOG.md