Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix handling recursive symlinks #7956

Merged
merged 1 commit into from
Oct 31, 2020
Merged

Conversation

csernazs
Copy link
Contributor

When pytest was run on a directory containing a recursive symlink it failed
with ELOOP as the library was not able to determine the type of the
direntry:

src/_pytest/main.py:685: in collect
if not direntry.is_file():
E OSError: [Errno 40] Too many levels of symbolic links: '/home/florian/proj/pytest/tests/recursive'

This is fixed now by handling ELOOP.

Fixes #7951

@The-Compiler
Copy link
Member

Looks like the test fails on Windows - I guess it should probably be skipped there instead.

@csernazs csernazs force-pushed the fix-7951 branch 2 times, most recently from f98502c to a2884aa Compare October 28, 2020 19:09
@csernazs
Copy link
Contributor Author

Something is not entirely correct with codecov, I'm not sure if my patch caused that drop in coverage, I've rebased my commit to master and it still appears. Could you please look at it? Thanks.

@bluetech
Copy link
Member

Something is not entirely correct with codecov, I'm not sure if my patch caused that drop in coverage, I've rebased my commit to master and it still appears. Could you please look at it? Thanks.

The codecov/patch check looks at the PR diff itself to see how much of it is covered. The useful place to look at is the Diff tab in the codecov page, which shows that the raise statement in the patch is not covered by any test.

If you can reasonably write a test to cover it, then that would fix it, but sometimes it's not worth the effort, in which case we can override it and merge anyway.

(As for the change itself, I need to look into it a bit before giving a proper review, which I'll do soon)

@csernazs
Copy link
Contributor Author

csernazs commented Oct 28, 2020

Thanks! When I looked at codecov it resulted me a really huge diff about many many lines of difference, but now it seems trivial. Maybe I wasn't paying attention to every detail.

Anyway, I did some research on how I could add a test for that bare raise statement. Thinking about it, what other errno can be raised from that is_file() call? It is a DirEntry object which was created from the results returned by getdents() system call (on Linux). Even if I create a directory with 0o444, so it can be listed but the contents cannot be stat()ed, is_file() method returns True. This is because the entries returned by getdents() contains the file type (regular, socket, symlink, etc..) and it is used so not stat() is made. For a recursive symlink, I think it returns DT_UNKNOWN so based on posixmodule.c, cpython calls stat() on it which is then fails with ELOOP - this appears in strace.

Reading further, man readdir says:

          Currently, only some filesystems (among them: Btrfs, ext2, ext3,
          and ext4) have full support  for  returning  the  file  type  in
          d_type.   All  applications  must  properly  handle  a return of
          DT_UNKNOWN.

So it seems that it is implementation specific. I ran out of ideas adding coverage for that raise (mounting an nfs share is a bit overkill), but I would keep it there for completeness.

@csernazs csernazs force-pushed the fix-7951 branch 2 times, most recently from 4abf898 to d3db71f Compare October 29, 2020 13:44
Copy link
Member

@nicoddemus nicoddemus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great finding.

Also please add yourself to AUTHORS @csernazs! 👍

if not direntry.is_file():
continue
except OSError as err:
# 1921: ERROR_CANT_RESOLVE_FILENAME - fix for broken symlink pointing to itself
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# 1921: ERROR_CANT_RESOLVE_FILENAME - fix for broken symlink pointing to itself
# 1921: ERROR_CANT_RESOLVE_FILENAME - fix for broken symlink pointing to itself (#7951)

@nicoddemus nicoddemus added the needs backport applied to PRs, indicates that it should be ported to the current bug-fix branch label Oct 31, 2020
Copy link
Member

@bluetech bluetech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @csernazs,

I think this approach is good.

There is one more place which calls visit() and would require a similar treatment. Also, while looking at this I discovered #7981 and submitted PR #7982 to fix it, but that PR would add yet another place which can trigger this exception. I think that adding the try-catch code in every instance would be too annoying, therefore I suggest the following:

  • Copy _ignore_error() from pathlib into _pytest/pathlib.py (with a comment indicating it was copied from pathlib as of Python 3.9).

  • Modify _pytest.pathlib.visit() to do a pre-iteration of the os.scandir results, running entry.is_file() (while ignoring the result) and skipping the entry if the error should be ignored.

This way, we can localize the try-catch in one place and know that it doesn't happen at later stages.

WDYT?

@csernazs
Copy link
Contributor Author

I completely agree with moving the code to pathlib.py (I also wanted to do it but I couldn't find a good place for it, but pathlib.py is perfect)
I see the logic behind _ignore_errors and I can support it, to my best knowledge I thought that no others than ELOOP should be handled but re-reading stat(2) shows me that ENOTDIR and ENOENT is also possible. And on the plus side I think including _ignore_errors is a good idea as it gives the same protection as python core.

@csernazs
Copy link
Contributor Author

I've pushed the suggested changes. A few remarks:

  • I kept the test in test_collection.py as it tests the original case described in the issue
  • a new test added to test_pathlib.py for this case which calls visit() directly
  • while copying the ignore_errors function, one typo fixed (_IGNORED_ERROS -> _IGNORED_ERRORS)

Copy link
Member

@bluetech bluetech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@csernazs Perfect! I left a suggestion and a comment.


entries = []
for entry in os.scandir(path):
try:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add a comment here, something like this:

Skip entries with symlink loops and other brokenness, so the caller doesn't
have to deal with it.

entry.is_file()
except OSError as err:
if _ignore_error(err):
continue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
continue
continue
raise

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In such case, when the directory have n entries where 1 entry is broken, the caller will not able to see any entries but the error itself. In such case, the caller won't be able to ignore or somehow workaround the error.
On the other hand, raising the error makes sense as if is_file() raises some error, probably the following open() or other file-related call will fail and it is better to report the error as soon as possible.
So made the suggested change, just wanted to comment on it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with both the first and second part.

def test_foo(): assert True
"""
)
os.symlink("recursive", str(testdir.tmpdir.join("recursive")))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another comment: should use symlink_or_skip here (already imported), for platforms which don't support symlinks.

tmpdir.join("foo").write_binary(b"")
tmpdir.join("bar").write_binary(b"")

os.symlink("recursive", str(tmpdir.join("recursive")))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, looks like it's here too...

entry.is_file()
except OSError as err:
if _ignore_error(err):
continue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with both the first and second part.

When pytest was run on a directory containing a recursive symlink it failed
with ELOOP as the library was not able to determine the type of the
direntry:

src/_pytest/main.py:685: in collect
    if not direntry.is_file():
E   OSError: [Errno 40] Too many levels of symbolic links: '/home/florian/proj/pytest/tests/recursive'

This is fixed by handling ELOOP and other errors in the visit function in
pathlib.py, so the entries whose is_file() call raises an OSError with the
pre-defined list of error numbers will be exluded from the result.

The _ignore_errors function was copied from Lib/pathlib.py of cpython 3.9.

Fixes pytest-dev#7951
Copy link
Member

@bluetech bluetech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @csernazs! Will also backport this to 6.1.x.

@bluetech bluetech merged commit 7fb0ea3 into pytest-dev:master Oct 31, 2020
bluetech added a commit to bluetech/pytest that referenced this pull request Oct 31, 2020
Fix handling recursive symlinks

(cherry picked from commit 7fb0ea3)
@csernazs
Copy link
Contributor Author

Thank you guys for all the help!

bluetech added a commit to bluetech/pytest that referenced this pull request Oct 31, 2020
Fix handling recursive symlinks

(cherry picked from commit 7fb0ea3)

Adjusted to pass str(path) for Python 3.5 support.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs backport applied to PRs, indicates that it should be ported to the current bug-fix branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pytest crashes when recursive symlinks present
4 participants