Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support subpackage for Git's pathspecs #588

Closed
wants to merge 5 commits into from
Closed

Conversation

mih
Copy link
Member

@mih mih commented Jan 9, 2024

The main (if not only) purpose of this functionality is pathspec mangling/translation for handing them over to analog Git command calls on submodules -- for any Git command that supports pathspecs, but not recursion.

A simple example for such a command is git ls-files --other. It accepts pathspecs, but does not implement --recurse-submodules for listing untracked files.

The goal of this functionality is to be able to take pathspecs that is valid in the context of a top-level repository, and translate it such that the set of paths specs given to the same command running on/in a submodule/subdirectory gives the same results, as if the initial top-level invocation reported them (if it even could).

The included sketch of a testbattery uses ``git ls-files --other` for testing, rather than a formal description -- because the behavior of the implementation is more elaborate than the documentation at https://git-scm.com/docs/gitglossary#Documentation/gitglossary.txt-aiddefpathspecapathspec suggests.

All testing is (for now) performed within a single repository, and with translation for execution in subdirectories.

The implementation is a rough sketch for exploring the problem, rather than anything polished.

Ping #587

Also see datalad/datalad#6933

TODO:

  • ready tests for case-insensitive file systems
  • support :(glob) pathspecs via PurePath.match()
  • test :(exclude) pathspec handling
  • test :(attr:...) pathspec handling
  • have another go at figuring out why the icase magic does not work as documented
  • add optimization to bypass expensive translations when a pathspec is just a path with no wildcards
  • Add constraint to validate pathspecs
  • Add pathspec support to iter_submodules(). Alternatively, add pathspec support to iter_gitworktree(). However, the former could be less complex. We could take any reported submodule, and try to port any given pathspec to it, and report only submodule that has any match. In iter_gitworktree() we'd have the problem that git ls-files is totally silent if given a pathspec that points (exclusively) into a submodule.

@mih mih force-pushed the pathspec branch 3 times, most recently from 46fad4e to e1baf68 Compare January 9, 2024 16:26
@mih mih added this to the 1.2 milestone Jan 21, 2024
@mih
Copy link
Member Author

mih commented Jan 26, 2024

The additions related to next-status now provide a nice test ground for getting this done.

Copy link

codecov bot commented Jan 26, 2024

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (3d0c8a4) 92.68% compared to head (5ec4ebf) 26.00%.

Files Patch % Lines
datalad_next/gitpathspecs/__init__.py 98.21% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main     #588       +/-   ##
===========================================
- Coverage   92.68%   26.00%   -66.69%     
===========================================
  Files         160      162        +2     
  Lines       11495    12022      +527     
  Branches     1769     1832       +63     
===========================================
- Hits        10654     3126     -7528     
- Misses        652     8776     +8124     
+ Partials      189      120       -69     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

The main (if not only) purpose of this functionality is pathspec
mangling/translation for handing them over to analog Git command
calls on submodules -- for any Git command that supports pathspecs,
but not recursion.

A simple example for such a command is `git ls-files --other`. It
accepts pathspecs, but does not implement `--recurse-submodules` for
listing untracked files.

The goal of this functionality is to be able to take pathspecs that is
valid in the context of a top-level repository, and translate it such
that the set of paths specs given to the same command running on/in a
submodule/subdirectory gives the same results, as if the initial
top-level invocation reported them (if it even could).

The included sketch of a testbattery uses ``git ls-files --other`
for testing, rather than a formal description -- because the behavior
of the implementation is more elaborate than the documentation at
https://git-scm.com/docs/gitglossary#Documentation/gitglossary.txt-aiddefpathspecapathspec
suggests.

All testing is (for now) performed within a single repository, and with
translation for execution in subdirectories.

The implementation is a rough sketch for exploring the problem, rather
than anything polished.

Ping datalad#587

Also see datalad/datalad#6933
There is still some weird behavior coming from Git some times, but this
has no impact on this code, it just makes constructing test cases
more complex.
@mih
Copy link
Member Author

mih commented Jan 31, 2024

Leaving a note here re adoption of pathspecs.

I found that one implementation is missing: Recursive discovery of untracked files that honors pathspecs.

A use case would be a recursive save of a pathspec-defined subset.

However, for consistency and simplicity of behavior, even a status report should be able to select or ignore untracked elements by pathspec.

A git status does not do that. A submodule is considered "untracked content" whenever there is any untracked content. It does not pass on pathspecs that may filter out all untracked content, and would make a submodule "clean considering constraints".

For next-status this would have the implication that, when pathspecs are given, we never let Git do a full submodule evaluation, but recursive ourselves.

@mih mih mentioned this pull request Feb 2, 2024
@mih mih modified the milestones: 1.2, 1.3 Feb 2, 2024
@mih mih modified the milestones: 1.3, 1.4 Mar 19, 2024
@mih mih modified the milestones: 1.4, 1.5 May 16, 2024
@mih mih mentioned this pull request May 23, 2024
6 tasks
@mih
Copy link
Member Author

mih commented May 27, 2024

Closing this PR, because #714 has an advanced implementation.

@mih mih closed this May 27, 2024
@mih mih deleted the pathspec branch May 27, 2024 10:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant