Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cli: add jj fix proof of concept #3647

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft

Conversation

martinvonz
Copy link
Owner

Checklist

If applicable:

  • I have updated CHANGELOG.md
  • I have updated the documentation (README.md, docs/, demos/)
  • I have updated the config schema (cli/src/config-schema.json)
  • I have added tests to cover my changes

@martinvonz martinvonz force-pushed the push-ypqvxpyoowtm branch 2 times, most recently from 666a710 to 9247a5f Compare May 8, 2024 07:14
Copy link
Collaborator

@arxanas arxanas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if there's context for this PR or if you're seeking review from anyone 🙂.

tx.finish(ui, format!("fixed {} commits", root_commits.len()))
}

fn format_files<'a>(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment: Ultimately, we presumably want to program against an interface dealing with commits as an intermediary between jj fix and the formatter, which I guess would eventually arrive as part of jj run?

For reference, here's what the git-branchless interface currently looks like (it's used in git test but also in git submit for the Phabricator backend): https://github.com/arxanas/git-branchless/blob/e87ab50603fe40e1285bba574d9d58e7b9c88baa/git-branchless-test/src/lib.rs#L1204-L1215.

(One thing I would change about the interface is to not necessarily pass in an entire slice of Commits, and use only a lazily-evaluated revset of some sort, for the case of searching commits. It's also actually relevant for formatting commits, if your objective is to find the first badly-formatted commit and apply the formatting as a patch, rather than formatting all commits in a stateless manner — both are meaningful strategies.)

That interface only supports external commands, but I assume that jj also wants to support in-process commands/functions. In either case, I think there's not too much information that we need to pass to the command/function, which would form the interface of a "command" that can be run as part of jj run/jj fix:

  • The ID of the commit being processed.
  • The IDs of ancestors of that commit which are also roots of the commit set being formatted.
  • If literally running a command in a workspace, then a handle to the workspace.
  • For convenience, perhaps also the following, even though they can be computed directly from the IDs of all the involved commits?
    • The set of files changed compared to parents.
    • The set of files changed compared to roots.

The approach as written in this PR deduplicates requests to format the same file, but I would encourage to not make that the direct interface, as it's not general enough to handle certain classes of fix. If the formatter function is actually running in-process, then it shouldn't be too hard to recover the caching at a different level of abstraction instead?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some questions

(One thing I would change about the interface is to not necessarily pass in an entire slice of Commits, and use only a lazily-evaluated revset of some sort, for the case of searching commits. It's also actually relevant for formatting commits, if your objective is to find the first badly-formatted commit and apply the formatting as a patch, rather than formatting all commits in a stateless manner — both are meaningful strategies.)

Can you explain your idea a bit more, as with run we try to cache commits somewhere, either on disk or depending on backend somewhere else.

That interface only supports external commands, but I assume that jj also wants to support in-process commands/functions. In either case, I think there's not too much information that we need to pass to the command/function, which would form the interface of a "command" that can be run as part of jj run/jj fix:

  • The ID of the commit being processed.

  • The IDs of ancestors of that commit which are also roots of the commit set being formatted.

  • If literally running a command in a workspace, then a handle to the workspace.

  • For convenience, perhaps also the following, even though they can be computed directly from the IDs of all the involved commits?

    • The set of files changed compared to parents.
    • The set of files changed compared to roots.

This looks like a really good idea, but how should that interface look? Should there be some Context which is passed into the inner run impl?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach as written in this PR deduplicates requests to format the same file, but I would encourage to not make that the direct interface, as it's not general enough to handle certain classes of fix. If the formatter function is actually running in-process, then it shouldn't be too hard to recover the caching at a different level of abstraction instead?

jj fix is specialized for formatting individual files (does not support formatters that need to inspect multiple files) and can therefore avoid materializing working copies. Maybe we can rewrite it using the jj run infrastructure once we have some virtual file system support so we don't need to materialize the working copies. Does that sound okay?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not about working copies specifically, but rather about organizing around commits rather than files:

  • organizing around commits will be better for the eventual jj run integration
  • there are purely in-memory reasons to operate on commits for jj fix
    • some formatters are designed to operate specifically on changed sections of code (particularly useful for transition periods between not having and having a formatter, or upgrading the formatter style)
    • there are some meaningful operations to do on commit messages as well, like inserting a change ID or word wrapping (I guess?)
  • error reporting is almost certainly going to want to be per-commit, not per-file — you want to know which commit to go to in order to fix a syntax error

Therefore I recommend that even the current iteration be designed around commits.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot to unpack here, but I can say much of it has come up already in our internal use of hg fix, and my plans for jj fix aren't much different from what we converged on there. I don't yet see a way of totally unifying fix and run that seems worth it to me, but I do want jj run --fix -r <rev> <command> to work as an alternative to jj fix for commands that need more context than formatters.

The point about error reporting is interesting, because there can be multiple root commits where a syntax error would need to be fixed. So you can't just pick a good one to associate the error message with. hg fix just prints the error once for every commit where it occurs. Once we add support for formatting changed line ranges, there will be less possible de-duplication of work across commits, but it will still be quite beneficial to some users if we can do O(changes) work instead of O(changes*commits) for chains of commits with disjoint sets of modified files.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • some formatters are designed to operate specifically on changed sections of code (particularly useful for transition periods between not having and having a formatter, or upgrading the formatter style)

@hooper and I talked a bit about this the other day. I think we can do it by still fully formatting both sides of the diff and then keep only hunks that overlap with hunks in the diff. Consequences:

  • Works independent of formatter (no need for formatter to support partial formatting, no need for jj fix to provide a way to pass in line ranges)
  • Wastefully formats entire files
  • If formatting results in a large chunk of the file changing (e.g. reflowing), then this algorithm will make the reformatting apply to a large chunk too, for better or worse (I don't know how formatters that support partial formatting treat this case)


let formatted = format_files(tx.repo().store().as_ref(), &file_ids)?;

tx.mut_repo().transform_descendants(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment: One thing to consider is how to recover from failures. My initial expectation was that I would want the entire process to fail if processing any commit failed. But, so far in my experience running git test fix, if processing one or more commits fails —

  • I've still actually always wanted to apply any fixes for commits whose ancestors did not fail.
  • In some but not all cases, I've wanted to apply fixes for commits that passed, even if some ancestor failed.
  • I don't think I've ever wanted to apply a fix if the command for it returned a non-zero exit status, even if it did successfully format some files on disk.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I suspect @hooper has thought about that but I don't think I have. I think what you say makes sense. I suppose the effect is that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach that has worked well for us in hg fix is to only use the output if the exit code is zero, and to print any stderr lines with a prefix that indicates which commit/file caused it. Part of this problem is just meeting the formatting tools where they are instead of trying to negotiate some new contract with them.

@martinvonz
Copy link
Owner Author

I don't know if there's context for this PR or if you're seeking review from anyone 🙂.

Sorry that I didn't provide context. I wrote this as a starting point for @hooper who is going to finish this PR (or duplicate it).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants