Fix handling recursive symlinks #7956

csernazs · 2020-10-28T11:29:39Z

When pytest was run on a directory containing a recursive symlink it failed
with ELOOP as the library was not able to determine the type of the
direntry:

src/_pytest/main.py:685: in collect
if not direntry.is_file():
E OSError: [Errno 40] Too many levels of symbolic links: '/home/florian/proj/pytest/tests/recursive'

This is fixed now by handling ELOOP.

Fixes #7951

The-Compiler · 2020-10-28T13:00:26Z

Looks like the test fails on Windows - I guess it should probably be skipped there instead.

csernazs · 2020-10-28T19:28:30Z

Something is not entirely correct with codecov, I'm not sure if my patch caused that drop in coverage, I've rebased my commit to master and it still appears. Could you please look at it? Thanks.

bluetech · 2020-10-28T20:47:18Z

Something is not entirely correct with codecov, I'm not sure if my patch caused that drop in coverage, I've rebased my commit to master and it still appears. Could you please look at it? Thanks.

The codecov/patch check looks at the PR diff itself to see how much of it is covered. The useful place to look at is the Diff tab in the codecov page, which shows that the raise statement in the patch is not covered by any test.

If you can reasonably write a test to cover it, then that would fix it, but sometimes it's not worth the effort, in which case we can override it and merge anyway.

(As for the change itself, I need to look into it a bit before giving a proper review, which I'll do soon)

csernazs · 2020-10-28T22:46:13Z

Thanks! When I looked at codecov it resulted me a really huge diff about many many lines of difference, but now it seems trivial. Maybe I wasn't paying attention to every detail.

Anyway, I did some research on how I could add a test for that bare raise statement. Thinking about it, what other errno can be raised from that is_file() call? It is a DirEntry object which was created from the results returned by getdents() system call (on Linux). Even if I create a directory with 0o444, so it can be listed but the contents cannot be stat()ed, is_file() method returns True. This is because the entries returned by getdents() contains the file type (regular, socket, symlink, etc..) and it is used so not stat() is made. For a recursive symlink, I think it returns DT_UNKNOWN so based on posixmodule.c, cpython calls stat() on it which is then fails with ELOOP - this appears in strace.

Reading further, man readdir says:

          Currently, only some filesystems (among them: Btrfs, ext2, ext3,
          and ext4) have full support  for  returning  the  file  type  in
          d_type.   All  applications  must  properly  handle  a return of
          DT_UNKNOWN.

So it seems that it is implementation specific. I ran out of ideas adding coverage for that raise (mounting an nfs share is a bit overkill), but I would keep it there for completeness.

nicoddemus

Great finding.

Also please add yourself to AUTHORS @csernazs! 👍

nicoddemus · 2020-10-31T12:15:57Z

src/_pytest/main.py

+                        if not direntry.is_file():
+                            continue
+                    except OSError as err:
+                        # 1921: ERROR_CANT_RESOLVE_FILENAME - fix for broken symlink pointing to itself


Suggested change

# 1921: ERROR_CANT_RESOLVE_FILENAME - fix for broken symlink pointing to itself

# 1921: ERROR_CANT_RESOLVE_FILENAME - fix for broken symlink pointing to itself (#7951)

bluetech

Hi @csernazs,

I think this approach is good.

There is one more place which calls visit() and would require a similar treatment. Also, while looking at this I discovered #7981 and submitted PR #7982 to fix it, but that PR would add yet another place which can trigger this exception. I think that adding the try-catch code in every instance would be too annoying, therefore I suggest the following:

Copy _ignore_error() from pathlib into _pytest/pathlib.py (with a comment indicating it was copied from pathlib as of Python 3.9).
Modify _pytest.pathlib.visit() to do a pre-iteration of the os.scandir results, running entry.is_file() (while ignoring the result) and skipping the entry if the error should be ignored.

This way, we can localize the try-catch in one place and know that it doesn't happen at later stages.

WDYT?

csernazs · 2020-10-31T15:04:49Z

I completely agree with moving the code to pathlib.py (I also wanted to do it but I couldn't find a good place for it, but pathlib.py is perfect)
I see the logic behind _ignore_errors and I can support it, to my best knowledge I thought that no others than ELOOP should be handled but re-reading stat(2) shows me that ENOTDIR and ENOENT is also possible. And on the plus side I think including _ignore_errors is a good idea as it gives the same protection as python core.

csernazs · 2020-10-31T15:52:06Z

I've pushed the suggested changes. A few remarks:

I kept the test in test_collection.py as it tests the original case described in the issue
a new test added to test_pathlib.py for this case which calls visit() directly
while copying the ignore_errors function, one typo fixed (_IGNORED_ERROS -> _IGNORED_ERRORS)

bluetech

@csernazs Perfect! I left a suggestion and a comment.

bluetech · 2020-10-31T16:00:41Z

src/_pytest/pathlib.py

+
+    entries = []
+    for entry in os.scandir(path):
+        try:


I would add a comment here, something like this:

Skip entries with symlink loops and other brokenness, so the caller doesn't
have to deal with it.

bluetech · 2020-10-31T16:01:05Z

src/_pytest/pathlib.py

+            entry.is_file()
+        except OSError as err:
+            if _ignore_error(err):
+                continue


Suggested change

continue

continue

raise

In such case, when the directory have n entries where 1 entry is broken, the caller will not able to see any entries but the error itself. In such case, the caller won't be able to ignore or somehow workaround the error.
On the other hand, raising the error makes sense as if is_file() raises some error, probably the following open() or other file-related call will fail and it is better to report the error as soon as possible.
So made the suggested change, just wanted to comment on it.

I agree with both the first and second part.

bluetech · 2020-10-31T16:03:51Z

testing/test_collection.py

+        def test_foo(): assert True
+        """
+    )
+    os.symlink("recursive", str(testdir.tmpdir.join("recursive")))


Another comment: should use symlink_or_skip here (already imported), for platforms which don't support symlinks.

bluetech · 2020-10-31T16:26:32Z

testing/test_pathlib.py

+    tmpdir.join("foo").write_binary(b"")
+    tmpdir.join("bar").write_binary(b"")
+
+    os.symlink("recursive", str(tmpdir.join("recursive")))


Sorry, looks like it's here too...

bluetech · 2020-10-31T16:27:42Z

src/_pytest/pathlib.py

+            entry.is_file()
+        except OSError as err:
+            if _ignore_error(err):
+                continue


I agree with both the first and second part.

When pytest was run on a directory containing a recursive symlink it failed with ELOOP as the library was not able to determine the type of the direntry: src/_pytest/main.py:685: in collect if not direntry.is_file(): E OSError: [Errno 40] Too many levels of symbolic links: '/home/florian/proj/pytest/tests/recursive' This is fixed by handling ELOOP and other errors in the visit function in pathlib.py, so the entries whose is_file() call raises an OSError with the pre-defined list of error numbers will be exluded from the result. The _ignore_errors function was copied from Lib/pathlib.py of cpython 3.9. Fixes pytest-dev#7951

bluetech

Thanks @csernazs! Will also backport this to 6.1.x.

Fix handling recursive symlinks (cherry picked from commit 7fb0ea3)

csernazs · 2020-10-31T17:06:05Z

Thank you guys for all the help!

Fix handling recursive symlinks (cherry picked from commit 7fb0ea3) Adjusted to pass str(path) for Python 3.5 support.

csernazs force-pushed the fix-7951 branch 2 times, most recently from f98502c to a2884aa Compare October 28, 2020 19:09

csernazs force-pushed the fix-7951 branch 2 times, most recently from 4abf898 to d3db71f Compare October 29, 2020 13:44

nicoddemus approved these changes Oct 31, 2020

View reviewed changes

nicoddemus added the needs backport applied to PRs, indicates that it should be ported to the current bug-fix branch label Oct 31, 2020

bluetech requested changes Oct 31, 2020

View reviewed changes

csernazs force-pushed the fix-7951 branch from d3db71f to 02b23b3 Compare October 31, 2020 15:47

bluetech reviewed Oct 31, 2020

View reviewed changes

csernazs force-pushed the fix-7951 branch from 02b23b3 to 5ce7c73 Compare October 31, 2020 16:24

bluetech reviewed Oct 31, 2020

View reviewed changes

csernazs force-pushed the fix-7951 branch from 5ce7c73 to 67682d6 Compare October 31, 2020 16:36

csernazs force-pushed the fix-7951 branch from 67682d6 to 8a38e7a Compare October 31, 2020 16:41

bluetech approved these changes Oct 31, 2020

View reviewed changes

bluetech merged commit 7fb0ea3 into pytest-dev:master Oct 31, 2020

bluetech added a commit to bluetech/pytest that referenced this pull request Oct 31, 2020

Merge pull request pytest-dev#7956 from csernazs/fix-7951

b1110cb

Fix handling recursive symlinks (cherry picked from commit 7fb0ea3)

bluetech mentioned this pull request Oct 31, 2020

[6.1.x] Fix handling recursive symlinks #7986

Merged

bluetech added a commit to bluetech/pytest that referenced this pull request Oct 31, 2020

Merge pull request pytest-dev#7956 from csernazs/fix-7951

1adbb04

Fix handling recursive symlinks (cherry picked from commit 7fb0ea3) Adjusted to pass str(path) for Python 3.5 support.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix handling recursive symlinks #7956

Fix handling recursive symlinks #7956

csernazs commented Oct 28, 2020

The-Compiler commented Oct 28, 2020

csernazs commented Oct 28, 2020

bluetech commented Oct 28, 2020

csernazs commented Oct 28, 2020 •

edited

nicoddemus left a comment

nicoddemus Oct 31, 2020

bluetech left a comment

csernazs commented Oct 31, 2020

csernazs commented Oct 31, 2020

bluetech left a comment

bluetech Oct 31, 2020

bluetech Oct 31, 2020

csernazs Oct 31, 2020

bluetech Oct 31, 2020

bluetech Oct 31, 2020

bluetech Oct 31, 2020

bluetech Oct 31, 2020

bluetech left a comment

csernazs commented Oct 31, 2020

	# 1921: ERROR_CANT_RESOLVE_FILENAME - fix for broken symlink pointing to itself
	# 1921: ERROR_CANT_RESOLVE_FILENAME - fix for broken symlink pointing to itself (#7951)

Fix handling recursive symlinks #7956

Fix handling recursive symlinks #7956

Conversation

csernazs commented Oct 28, 2020

The-Compiler commented Oct 28, 2020

csernazs commented Oct 28, 2020

bluetech commented Oct 28, 2020

csernazs commented Oct 28, 2020 • edited

nicoddemus left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bluetech left a comment

Choose a reason for hiding this comment

csernazs commented Oct 31, 2020

csernazs commented Oct 31, 2020

bluetech left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bluetech left a comment

Choose a reason for hiding this comment

csernazs commented Oct 31, 2020

csernazs commented Oct 28, 2020 •

edited