Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New code to check ignored files trivially leads to command-line overload #1758

Closed
mjpieters opened this issue Jun 21, 2021 · 3 comments · Fixed by #1759
Closed

New code to check ignored files trivially leads to command-line overload #1758

mjpieters opened this issue Jun 21, 2021 · 3 comments · Fixed by #1759
Labels
bug Something isn't working

Comments

@mjpieters
Copy link
Contributor

mjpieters commented Jun 21, 2021

5.9.0 made changes to the way gitignore is handled (see 927cf7e and 4f32c44), but unfortunately doesn't take into account that a typical project could easily have many thousands of files. The following line then blows up:

files = glob.glob(str(git_folder) + "/**/*", recursive=True)

with:

Traceback (most recent call last):
  File "/.../bin/isort", line 8, in <module>
    sys.exit(main())
  File "/.../site-packages/isort/main.py", line 1130, in main
    if config.is_skipped(Path(file_name)):
  File "/.../site-packages/isort/settings.py", line 586, in is_skipped
    git_folder = self._check_folder_gitignore(str(file_path.parent))
  File "/.../site-packages/isort/settings.py", line 550, in _check_folder_gitignore
    subprocess.check_output(  # nosec # skipcq: PYL-W1510
  File "/.../subprocess.py", line 424, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/.../subprocess.py", line 505, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/.../subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/.../subprocess.py", line 1821, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 7] Argument list too long: 'git'

In our project, git tracks 214 files. But, running glob.glob() on the same directory gives me:

$ python3 -c 'import glob; print(sum(1 for _ in glob.glob("./**/*", recursive=True)))'
8677

8677 is way too many arguments to pass to a single command. Our project happens to include a Node.js component and so has a node_modules subfolder (ignored), accounting for 7.3k files in that total.

The code should either be reverted, or updated to handle a large number of files in batches, or use the --stdin to pipe through the filenames over stdin.

E.g. the following code works for our project without breaking command-line length:

            files_result = subprocess.check_output(
                ["git", "-C", str(git_folder), "check-ignore", "--stdin"],
                encoding="utf-8",
                env={"LANG": "C.UTF-8"},
                input="\n".join(files),
            ).splitlines()

Note: in addition to switching to --stdin, I also added LANG=C.UTF-8 to the environment (should make the codecs.escape_decode() call obsolete on Windows), and used .splitlines() to remove the need to handle an empty last line result.

@timothycrosley
Copy link
Member

Thanks again for identifying, articulating, and fixing this issue! A hotfix release has been pushed to PyPI containing your patch (5.9.1)

@mjpieters
Copy link
Contributor Author

mjpieters commented Jun 21, 2021

Fantastic, thanks for the fast turn-around! I was looking into why escape_decode() (an unsupported legacy function) was being used and found out about git’s octal escapes. These can be disabled by passing -c core.quotePath= to git commands. Add -z to use NUL delimiters should eliminate all remaining quoting reasons.

See the git config documentation on core.quotePath and the git check-ignore -z switch.

Shall I create another PR to address this?

@mjpieters
Copy link
Contributor Author

I've just gone ahead and created an issue and PR for this, see #1760

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants