Clarify Git.execute and Popen arguments #1688

Merged
merged 11 commits into from Oct 3, 2023

Contributor

@EliahKagan EliahKagan commented Oct 3, 2023

This is a sequel to #1687, improving how parameters to Git.execute are documented, and improving debug logging and tests of how they affect the arguments passed to Popen.

In the Git.execute docstring, the method is no longer described as using a shell, since this is typically (and by default) not the case; the items documenting each parameter are listed in the order of the parameters; and some copyediting is done, for clarity, consistency, spacing, and in a few cases other formatting. I copyedited some other docstrings in the module as well.

The order of names in the execute_kwargs set is tweaked to also match the order in which they appear as parameters to Git.execute. (This is just for the purpose of code clarity, as set objects guarantee no particular iteration order.) Since not all unintentional mismatches between this set and the defined method parameters would cause existing tests to fail, and the failures that would occur would not always immediately show the cause of the problem, I also added a test that execute_kwargs has the exact expected relationship to the parameters of Git.execute. (Though an alternative might be to generate execute_kwargs programmatically from Git.execute using a similar technique.)

One part of the test logic in #1687 was unnecessarily complicated, due to swallowing an exception produced by running git with no arguments. This changes that by passing with_exceptions=False so that exception is never generated.

Another part of the test logic in #1687 combined claims about the code under test with custom assertion logic in a way that made it hard to see what claims were being made by reading the test code. This fixes that by generalizing, and extracting out, an _assert_logged_for_popen test helper method. I also took this opportunity to remove its unstated incompatibility with value representations containing regular expression metacharacters. (Please note that the new code is still only robust enough for the special purpose for which it is intended; it does not actually parse the debug message rigorously.)
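As a rough illustration of that kind of check (a sketch, not the helper added in this PR), the following assumes a hypothetical log line of the form `Popen(..., name=value, ...)`. The function name `assert_logged_for_popen` and the sample message are made up for the example. It uses `re.escape` so that value representations containing metacharacters are matched literally, and it does not assume the value ends at a word boundary.

```python
import re


def assert_logged_for_popen(messages, name, value):
    """Assert that some message reports a Popen call with name=value.

    Hypothetical stand-in for a test helper; the log format is an assumption.
    """
    # Escape the whole name=value text so characters such as "<", "(" or "."
    # are matched literally instead of being read as regex syntax.
    needle = re.escape(f"{name}={value}")
    # Require a comma or the closing parenthesis after the value instead of a
    # word boundary, because a value need not end in a word character.
    pattern = re.compile(rf"\bPopen\(.*\b{needle}[,)]")
    if not any(pattern.search(message) for message in messages):
        raise AssertionError(f"no Popen log entry with {name}={value}")


# Sample message in the assumed format, for illustration only.
messages = [
    "Popen(['git'], cwd=/tmp, stdin=<valid stream>, shell=False, universal_newlines=False)",
]
assert_logged_for_popen(messages, "stdin", "<valid stream>")
assert_logged_for_popen(messages, "shell", False)
```

Like the helper described above, this is only robust enough for a narrow purpose; it does not parse the message.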

As requested in #1686 (comment), I changed istream= to stdin= in the debug logging message documenting the Popen call. I reused the test helper in writing a new test, test_it_logs_istream_summary_for_stdin, which checks both that it has the new name and that it has the expected simplified value representation. I did not change any parameters or variables called istream to stdin or any other name; rather, this changes the message to better reflect the Popen call it exists to document.

If I had changed only the displayed parameter name, I would likely not have written a test, but I also wanted to change two other things, for which the test helped verify that correctness was maintained. First, I reordered the displayed name=value representations in the log message to match the relative order in which they are passed to Popen. Second, I eliminated the istream_ok variable (which was named like a boolean flag but was not one), because having the logic for producing the string in the logging call makes it better resemble the nonidentical but corresponding logic in the Popen call, so they can be compared.

Although this is peripheral to the core concept of the clarity of function arguments, I also renamed and fixed the docstring of a local function that appeared (including by claiming) to be a method.

- Reorder the items in the git.cmd.Git.execute docstring that
  document its parameters, to be in the same order the parameters
  are actually defined in.

- Use consistent spacing, by having a blank line between successive
  items that document parameters. Before, most of them were
  separated this way, but some of them were not.

- Reorder the elements of execute_kwargs (which list all those
  parameters except the first parameter, command) to be listed in
  that order as well. These were mostly in order, but a couple were
  out of order. This is just about the order they appear in the
  definition, since sets in Python (unlike dicts) have no key order
  guarantees.

The top line of the Git.execute docstring said that it used a
shell, which is not necessarily the case (and is not usually the
case, since the default is not to use one). This removes that
claim while keeping the top-line wording otherwise the same.

It also rephrases the description of the command parameter, in a
way that does not change its meaning but reflects the more common
practice of passing a sequence of arguments (since portable calls
that do not use a shell must do that).
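To illustrate the point about shell-free calls with the standard library alone (this is not GitPython code), here is a minimal sketch that passes a sequence of arguments to `subprocess.run` with `shell=False`. The child command and its argument are arbitrary choices for the example.

```python
import subprocess
import sys

# Each list element reaches the program as one argument. No shell parses the
# command, so the space and "$HOME" in the last element are neither split nor
# expanded.
command = [sys.executable, "-c", "import sys; print(sys.argv[1])", "two words $HOME"]
result = subprocess.run(command, shell=False, capture_output=True, text=True, check=True)
print(result.stdout.strip())  # two words $HOME
```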
These are some small clarity and consistency revisions to the
docstring of git.cmd.Git.execute that didn't specifically fit in
the narrow topics of either of the preceding two commits.

(Not specific to git.cmd.Git.execute.)

The kill_process local function defined in the Git.execute method
is a local function and not a method, but it was easy to misread as
a method for two reasons:

- Its docstring described it as a method.

- It was named with a leading underscore, as though it were a
  nonpublic method. But this name is a local variable, and local
  variables are always nonpublic (except when they are function
  parameters, in which case they are in a sense public). A leading
  underscore in a local variable name usually means the variable is
  unused in the function.

This fixes the docstring and drops the leading underscore from the
name. If this is ever extracted from the Git.execute method and
placed at class or module scope, then the name can be changed back.

Instead of swallowing GitCommandError exceptions in the helper used
by test_it_uses_shell_or_not_as_specified and
test_it_logs_if_it_uses_a_shell, this modifies the helper so it
prevents Git.execute from raising the exception in the first place.

This extracts the logic of searching log messages, and asserting
that (at least) one matches a pattern for the report of a Popen
call with a given argument, from test_it_logs_if_it_uses_a_shell
into a new nonpublic test helper method _assert_logged_for_popen.

The extracted version is modified to make it slightly more general,
and slightly more robust. This is still not extremely robust: the
notation used to log Popen calls is informal, so it wouldn't make
sense to really parse it as code. But this no longer assumes that
the representation of a value ends at a word boundary, nor that the
value is free of regular expression metacharacters.

This changes how the Popen call debug logging line shows the
informal summary of what kind of thing is being passed as the stdin
argument to Popen, showing it with stdin= rather than istream=.

The new test, with "istream" in place of "stdin", passed before the
code change in the git.cmd module, failed once "istream" was
changed to "stdin" in the test, and then, as expected, passed again
once "istream=" was changed to "stdin=" in the log.debug call in
git.cmd.Git.execute.

This is still not including all or even most of the arguments, nor
are all the logged arguments literal (nor should either of those
things likely be changed). It is just to facilitate easier
comparison of what is logged to the Popen call in the code.

In Git.execute, the stdin argument to Popen is the only one where a
compound expression (rather than a single term) is currently
passed. So having that be the same in the log message makes it
easier to understand what is going on, as well as to see how the
information shown in the log corresponds to what Popen receives.
@EliahKagan EliahKagan marked this pull request as ready for review October 3, 2023 15:34
Comment on lines +378 to +384
def test_execute_kwargs_set_agrees_with_method(self):
    parameter_names = inspect.signature(cmd.Git.execute).parameters.keys()
    self_param, command_param, *most_params, extra_kwargs_param = parameter_names
    self.assertEqual(self_param, "self")
    self.assertEqual(command_param, "command")
    self.assertEqual(set(most_params), cmd.execute_kwargs)  # Most important.
    self.assertEqual(extra_kwargs_param, "subprocess_kwargs")
Contributor Author

Or git.cmd.execute_kwargs could be generated in a manner similar to this, and this test, the need to manually update it, and the note in the Git.execute docstring about that need, could all be done away with. Maybe something like this:

execute_kwargs = inspect.signature(Git.execute).parameters.keys() - {"self", "command", "subprocess_kwargs"}
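The one-liner above can be tried without GitPython on a stand-in class. In this sketch, `Runner` and its shortened parameter list are hypothetical; only the `inspect.signature(...).parameters.keys() - {...}` expression mirrors the suggestion.

```python
import inspect


class Runner:
    """Hypothetical stand-in for a class like Git; not GitPython code."""

    def execute(self, command, istream=None, with_exceptions=True, shell=None, **subprocess_kwargs):
        return command


# Key views support set operations, so subtracting a set yields a plain set
# of the remaining parameter names.
execute_kwargs = inspect.signature(Runner.execute).parameters.keys() - {
    "self",
    "command",
    "subprocess_kwargs",
}
print(sorted(execute_kwargs))  # ['istream', 'shell', 'with_exceptions']
```

Because the result is a set, it carries no ordering, which matches how the manually written set behaves.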

Right now it is defined very high up in the module, even higher than __all__, and this suggests that it may be intended to be available for static inspection. But I don't think any tools will statically inspect a set of strings like that (plus, static analysis tools can examine the parameters of Git.execute... though the incomplete lists of parameters in the @overload-decorated stubs that precede it confuse the situation somewhat).

git.cmd.__all__ contains only "Git" and I hope that means code that uses GitPython should not be using git.cmd.execute_kwargs or relying on its presence. If possible, perhaps it could be made more explicitly private (_execute_kwargs, or if it needs to remain statically defined, maybe _EXECUTE_KWARGS) or, even better, removed altogether if the Git.execute method's arguments can be inspected efficiently enough without it where execute_kwargs is currently used. On the other hand, people may have come to depend on it even though the presence of __all__ that omits it means no one should have depended on it.

Anyway, I do not think any of the changes I suggest in this comment need to be made in this pull request. But I wanted to mention the issue just in case.

Member

Removing execute_kwargs seems like a reduction of complexity, which would always be a valuable reduction of maintenance costs should changes need to be made.

As execute_kwargs was never advertised in __all__, I'd think that it's fair to say that those who depend on it nonetheless knew the risk. I think the same argument is valid knowing that nothing is ever truly private, everything can be introspected if one truly wants to, yet it's something one simply has to ignore in order to be able to make any changes to python software once released.

Comment on lines -69 to +72
-    "stdout_as_string",
     "output_stream",
-    "with_stdout",
+    "stdout_as_string",
     "kill_after_timeout",
+    "with_stdout",
Contributor Author

@EliahKagan EliahKagan Oct 3, 2023

This changes only the order in which they are given, not the contents, and the set is equal to the same value as before.

However, I am wondering if command should actually be added here. Although the @overload-decorated stubs for Git.execute define some parameters as keyword-only, in the actual definition no parameter is keyword-only and there is definitional symmetry between command and the others. Should I worry about what happens if someone has a git-command script that they can usually run as git command and tries to run g.command() where g is a git.cmd.Git instance? Or related situations?

Another ramification of the parameters not being keyword-only is that, because they can be passed positionally, it is a breaking change to add a new one to Git.execute elsewhere than the end. Still, regarding #1686 (comment), if you think a stdin parameter should be added as a synonym of istream, this could still be done even with stdin at the end, with the docstring items that document the parameters referring to each other to overcome any confusion. I am inclined to say the added complexity is not worthwhile (for example, the function would have to handle the case where they are both passed and with inconsistent values). But a possible argument against this is that the text synonym of universal_newlines could also be added.
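The positional-argument concern can be shown with two hypothetical signatures (neither is the real Git.execute): inserting a parameter anywhere but the end silently rebinds arguments for callers that pass values positionally.

```python
def execute_old(command, istream=None, shell=None):
    """Hypothetical original signature."""
    return {"command": command, "istream": istream, "shell": shell}


def execute_new(command, istream=None, stdin=None, shell=None):
    """Hypothetical revision with a parameter inserted before shell."""
    return {"command": command, "istream": istream, "stdin": stdin, "shell": shell}


# An existing caller that passes shell positionally as the third argument.
args = (["git", "status"], None, True)

print(execute_old(*args)["shell"])  # True
print(execute_new(*args)["shell"])  # None: True was bound to stdin instead
```

Appending the new parameter after `shell` would leave this caller unaffected, which is why the end of the parameter list is the only safe place for additions.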

Member

> Should I worry about what happens if someone has a git-command script that they can usually run as git command and tries to run g.command() where g is a git.cmd.Git instance? Or related situations?

In these situations it would be great to know why command was put there in the first place, and I for one do not remember. I know there never was an issue related to command specifically, which seems to indicate it's a limitation worth ignoring or fixing in a non-breaking fashion (which seems possible).

Regarding istream, I'd think maintaining stability would let me sleep better at night especially since you seem to agree that it's used consistently here and in gitdb. Probably along with the documentation improvements, all is well and better than it was before, which even in the past didn't seem to have caused enough confusion to cause an issue to be opened.

Contributor Author

@EliahKagan EliahKagan Oct 3, 2023

> In these situations it would be great to know why command was put there in the first place, and I for one do not remember.

Do you mean why it was omitted from the execute_kwargs set? If so, my guess is that it was intended to be used positionally while the others were intended to be keyword-only. This is judging by how they are given as keyword-only arguments in the @overloads, where they are preceded by a *, item in the list of formal parameters, which is absent in the actual function definition (thus causing them to accept both positional and keyword arguments).

But it may be that I am misunderstanding what you mean here. It is also possible that their keyword-only status in the @overloads was intentionally different from the ultimate definition and intended to provide guidance. (I'm not sure. The @overloads are missing most of the arguments, which confuses tooling and seems unintentional.)
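The effect of the `*,` item can be checked with `inspect`. The two functions below are hypothetical, shortened signatures written for this sketch; they only contrast a definition with `*` against one without it.

```python
import inspect


def stub_style(command, *, istream=None, shell=None):
    """Hypothetical signature where * makes later parameters keyword-only."""
    return command, istream, shell


def actual_style(command, istream=None, shell=None):
    """Hypothetical signature without *, so every parameter may be positional."""
    return command, istream, shell


def parameter_kinds(func):
    """Map each parameter name to the name of its inspect.Parameter kind."""
    return {name: p.kind.name for name, p in inspect.signature(func).parameters.items()}


print(parameter_kinds(stub_style))
print(parameter_kinds(actual_style))

actual_style(["git"], None, True)  # accepted: all arguments bound positionally
try:
    stub_style(["git"], None, True)
except TypeError as error:
    print("rejected:", error)
```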

Member

I am happy to adopt your conclusion in that this is not intentional and since there are negative side-effects, like tooling not working correctly, it's certainly something that could be improved.
If against all odds such an action does create downstream breakage, it could also be undone with a patch release.

Contributor Author

I don't think we should make the parameters in the real definition actually keyword-only, since that is a breaking change. Or, at least, I would not rush to that. It seems likely to me that, at least for the first few, people are passing them positionally.

I believe it is merely the absence of many parameters from the @overload-decorated stubs (which precede the real definition) that is causing VS Code not to show or autocomplete some arguments. Adding the missing parameters to the @overload-decorated stubs should be sufficient to fix that, and it shouldn't be a breaking change because (a) they are just type stubs, so it doesn't break runtime behavior, (b) the actual change seems compatible even relative to what the type stubs said before, and (c) GitPython doesn't have static typing stability currently anyway, because although py.typed is included in the package, mypy checks don't pass (nor, per #1659, do pyright checks pass).
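A small sketch of why changing the stubs does not affect runtime behavior: `@overload`-decorated definitions are only read by type checkers and editors, and the final undecorated definition is what actually runs. The function and its `as_text` and `shell` parameters are hypothetical, chosen to mimic stubs that omit a parameter.

```python
from typing import Literal, Sequence, Union, overload


@overload
def execute(command: Sequence[str], *, as_text: Literal[True] = ...) -> str: ...
@overload
def execute(command: Sequence[str], *, as_text: Literal[False]) -> bytes: ...


def execute(command: Sequence[str], as_text: bool = True, shell: bool = False) -> Union[str, bytes]:
    """Hypothetical implementation; the stubs above omit shell entirely."""
    joined = " ".join(command)
    return joined if as_text else joined.encode()


# At runtime only the final definition exists, so this positional call works,
# although a type checker reading the stubs would report it.
print(execute(["git", "status"], False))  # b'git status'
# shell is accepted at runtime even though no stub mentions it.
print(execute(["git", "status"], shell=True))  # git status
```

Tools that rely on the stubs would not offer `shell` here, which is the kind of gap that adding the missing parameters to the stubs would close.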

@Byron Byron added this to the v3.1.38 - Bugfixes milestone Oct 3, 2023
Member

@Byron Byron left a comment

Thanks a lot for this second round of improvements!

It's at the core of GitPython and any improvement is most certainly valued downstream as long as it's not subtly breaking, which you made sure of.

With that said, I remember that this module is also used to start two long-running python processes lazily when objects are accessed, along with various calls to git for doing pretty much anything else.

Also thanks to your work, the idea of using gitoxide (or rather its still-to-be-created Python bindings) has been forming in my head, which could lead to git invocations being reduced to zero while probably performing as well or better. Of course, anything like it is still years away, but it's something I look forward to.

@Byron Byron merged commit 91f63cd into gitpython-developers:main Oct 3, 2023
8 checks passed
@EliahKagan EliahKagan deleted the execute-args branch October 3, 2023 17:54
renovate bot added a commit to allenporter/flux-local that referenced this pull request Oct 20, 2023