Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve iterparentnodeids to consume / parts until the first :: #8577

Merged
merged 4 commits into from Apr 29, 2021

Conversation

parthdpa
Copy link
Contributor

@parthdpa parthdpa commented Apr 23, 2021

Fix for issue #8509 and improve iterparentnodeidsto consume / parts until the first ::

…th a / in its name by checking if there is :: before the selected / or any :: after. Also added a test case for this.
@parthdpa parthdpa changed the title Fix TestCase.setUpClass not being called for test names with / in them for issue #8509 Fix TestCase.setUpClass not being called for test names with / in them for versions later than 6.2.0 Apr 23, 2021
@parthdpa
Copy link
Contributor Author

How do I go about fixing these errors?

@RonnyPfannschmidt
Copy link
Member

The python 3.10 are unrelated and we ignore them as others fix it this weekend

Copy link
Member

@RonnyPfannschmidt RonnyPfannschmidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, im under the impression we have a follow-up coming

Thanks!

@@ -23,6 +23,8 @@
("a/b/c::D/d::e", ["", "a", "a/b", "a/b/c::D", "a/b/c::D/d", "a/b/c::D/d::e"]),
# : alone is not a separator.
("a/b::D:e:f::g", ["", "a", "a/b", "a/b::D:e:f", "a/b::D:e:f::g"]),
# / not considered if a part of a test name
("a/b::c/d::e[/test]", ["", "a", "a/b::c", "a/b::c/d", "a/b::c/d::e[/test]"]),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im under the impression a/b is missing as element

But thats for a follow-up as the same detail applies to other nodes as well

Copy link
Contributor Author

@parthdpa parthdpa Apr 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, but I believe that is apart of the functionality of the iterparentnodeids function because on line 65 of src/_pytest/nodes.py @bluetech put a comment that says

Note that :: parts are only considered at the last / component.

So that any :: between / would be considered as a part of the path.
Unless you mean this may need to be reconsidered, then I apologize for this comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i mean this may need a reconsideration much later, your pr is fine

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh okay. I understand. For this to be merged do I just wait for @bluetech to review it? I am new to the process, sorry.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh okay. I understand. For this to be merged do I just wait for @bluetech to review it? I am new to the process, sorry.

Right -- I will try to review it today or tomorrow.

Copy link
Member

@bluetech bluetech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @parthdpa!

In the issue you described the solution as:

I return the path only if there is :: after or there are no :: before.

I'm having a bit of trouble wrapping my head around the new behavior and relating it to the code. It might be good but I think maybe refactoring the code or adding some explanation to the docstring would help.

To maybe help with that, let me try to reconstruct how I got to this code in the first place. I was probably thinking of a grammar like the following, using the longest-match/"eager" rule to resolve ambiguities:

SEP_PART = [^/]*   # any text without /
COLON_PART = [^::]*  # any text without :: (not proper regex syntax...)
NODEID = "" (SEP_PART "/")* (COLON_PART "::")* COLON_PART

which translates to something like this:

def iterparentnodeids(nodeid: str) -> Iterator[str]:
    pos = 0
    # The root Session node - always present.
    yield ""
    # Eagerly consume SEP parts.
    while True:
        at = nodeid.find(SEP, pos)
        if at == -1:
            break
        if at > 0:
            yield nodeid[:at]
        pos = at + len(SEP)
    # Eagerly consume :: parts.
    while True:
        at = nodeid.find("::", pos)
        if at == -1:
            break
        if at > 0:
            yield nodeid[:at]
        pos = at + len("::")
    # The node ID itself.
    if nodeid:
        yield nodeid

Then for some reason the code tries to be too clever by half and merges the two loops, but that seems useless now, the above code is probably better and easier to understand. So I'd start with that.

My thought was to change the rule such that we eagerly consume / parts until the first ::.
Taking the a/b/c::D/d::e test as an example, that would change the interpretation from

["", "a", "a/b", "a/b/c::D", "a/b/c::D/d", "a/b/c::D/d::e"]

to

["", "a", "a/b", "a/b/c", "a/b/c::D/d", "a/b/c::D/d::e"]

i.e. the path parts cannot contain ::.

But maybe your idea is better, with a bit more explaining so I can understand it :)

@parthdpa
Copy link
Contributor Author

parthdpa commented Apr 25, 2021

Thank you for the reply @bluetech
Some explanation of my thought process when writing my code, I was trying to keep the same functionality ie. still pass all the test cases that were already present as is, so that nothing drastically changes and only accounts for any / in test names. However, now that you mention

My thought was to change the rule such that we eagerly consume / parts until the first ::.
Taking the a/b/c::D/d::e test as an example, that would change the interpretation from

["", "a", "a/b", "a/b/c::D", "a/b/c::D/d", "a/b/c::D/d::e"]

to

["", "a", "a/b", "a/b/c", "a/b/c::D/d", "a/b/c::D/d::e"]

This seems better. I was assuming that I should keep the same behavior so nothing breaks.

Clarifying Questions

I think what needed to be clarified is the possibilities of paths when running pytest (but these questions are because of my little experience of using pytest). Is it ever possible for a path to only contain just /? i.e. a/b or can a path only ever contain / followed by :: (a/b::c). Also, is it possible for a path to contain :: followed by another / like the above example? i.e. a/b::c/d::D. This does not make sense to me as a tests files function cannot lead to another file with a function like this when pytest gathers tests to run. If these are not possible, then I think some of the test cases become irrelevant as they are not even possible, which would then lead to the implementation you mention (quoted above).

My Code

What lead to my implementation was keeping all existing tests passing.

                 if sep == SEP:
                    colons_after_at = nodeid.find("::", at)
                    colons_before_at = nodeid.find("::", pos, at)
                    if colons_after_at > -1 or colons_before_at == -1:
                        yield nodeid[:at]
                    else:
                        sep = "::"
                        continue
                else:
                    yield nodeid[:at]

After finding a location i.e. at != -1, I check to see if it is a /. If so I follow these steps to fix issues with / in a test name.
I check to see if there is :: before and after the at location. I assumed that a path's / can only exist in a path if / is followed by :: or if there are no :: before the located /.
So a / can exist like this

a/b

or

a/b/c::C/d

but not like this

a/b::c/d

as I am now following an assumption that anything after the last :: is part of the test's name.
But to change my implementation to the idea you mention I can simply change my code to this

                 if sep == SEP:
                    colons_before_at = nodeid.find("::", pos, at)
                    if colons_before_at == -1:
                        yield nodeid[:at]
                    else:
                        sep = "::"
                        continue
                else:
                    yield nodeid[:at]

so as to consume / until the first ::.

I think what would lead to this new implementation would be answers of "no" to my clarifying questions above.

Sorry about how the code looks, I do not know how to make the code colors and nice looking like you do above. Can you let me know how?

@bluetech
Copy link
Member

Sorry about how the code looks, I do not know how to make the code colors and nice looking like you do above. Can you let me know how?

To create a "code block" you do

```
def my_code_here(): pass
```

To add syntax highlighting, you need to specify the programming language. In pytest's case that's python or py for short. It's specified in the first line:

```py
def my_code_here(): pass
```

You can see here for more details.

This seems better. I was assuming that I should keep the same behavior so nothing breaks.

Right - the issue is not about a bug in the code, it's about the implemented behavior, so if we change the behavior, it's expected that some tests would require changes.

If these are not possible, then I think some of the test cases become irrelevant as they are not even possible, which would then lead to the implementation you mention (quoted above).

I think some of these are not possible in practice, but we get enough oddities and allow enough flexibility that I think that in general we should assume as little as possible. Basically, the function should handle any arbitrary string as a node ID.

So to change it to what you are thinking @bluetech this is what I think will do that. So that until the first ::, / is consumed.

I think this would have the desired outcome, but the code becomes a bit complicated and I'd need to spend some time with it.

For me it would be easier if you try to start with the code in #8577 (review), which implements the existing behavior in the main branch in a slightly different way, and modify that to the new behavior, just because I think it would be easier to evaluate.

@parthdpa
Copy link
Contributor Author

parthdpa commented Apr 26, 2021

def iterparentnodeids(nodeid: str) -> Iterator[str]:
    pos = 0
    # The root Session node - always present.
    yield ""
    # Eagerly consume SEP parts.
    while True:
        at = nodeid.find(SEP, pos)
        if at == -1:
            break
        if at > 0:
            # CHANGED HERE
            # Only consume / until the first ::
            first_colons = nodeid.find("::", pos, at)
            # break loop and consider :: when the first :: is encountered
            if first_colons != -1:
                        break
           # Otherwise yield the consumed / part
           yield nodeid[:at]
        pos = at + len(SEP)
    # Eagerly consume :: parts.
    while True:
        at = nodeid.find("::", pos)
        if at == -1:
            break
        if at > 0:
            yield nodeid[:at]
        pos = at + len("::")
    # The node ID itself.
    if nodeid:
        yield nodeid

I have implemented the behavior to the code above in the first while loop and added comments where I changed the functionality to consume / until the first ::.

@bluetech
Copy link
Member

@parthdpa that looks good!

I think you can simplify a bit this way:

  1. At the beginning of the function, do first_colons = nodeid.find("::")
  2. Change at = nodeid.find(SEP, pos) to at = nodeid.find(SEP, pos, first_colons)

I think this will have the same outcome but a bit simpler/more efficient.

@parthdpa
Copy link
Contributor Author

def iterparentnodeids(nodeid: str) -> Iterator[str]:
    pos = 0
    first_colons = nodeid.find("::")
    # The root Session node - always present.
    yield ""
    # Eagerly consume SEP parts until first colons.
    while True:
        at = nodeid.find(SEP, pos, first_colons)
        if at == -1:
            break
        if at > 0:
           yield nodeid[:at]
        pos = at + len(SEP)
    # Eagerly consume :: parts.
    while True:
        at = nodeid.find("::", pos)
        if at == -1:
            break
        if at > 0:
            yield nodeid[:at]
        pos = at + len("::")
    # The node ID itself.
    if nodeid:
        yield nodeid

So like this? A question I have about this is if first_colons is -1, will this be a problem? Since the .find will not look at the last character in the nodeid. Should a check be added like

    if first_colons == -1:
        at = nodeid.find(SEP, pos)

or something of the like?

@bluetech
Copy link
Member

Right, if we leave it as -1 then it would exclude the last character of nodeid. I think it would be a little nicer to do the check only once outside the loop, so something like this should work:

first_colons: Optional[int] = nodeid.find("::")
if first_colons == -1:
    first_colons = None

I added the Optional[int] type annotation, otherwise first_colons would get type int and the first_colons = None would get a mypy error.

The value None in str.find(end=...) should work for representing "search until the end of the string", as it does in slice notation.

@parthdpa
Copy link
Contributor Author

parthdpa commented Apr 27, 2021

def iterparentnodeids(nodeid: str) -> Iterator[str]:
    pos = 0
    first_colons: Optional[int] = nodeid.find("::")
    if first_colons == -1:
        first_colons = None
    # The root Session node - always present.
    yield ""
    # Eagerly consume SEP parts until first colons.
    while True:
        at = nodeid.find(SEP, pos, first_colons)
        if at == -1:
            break
        if at > 0:
           yield nodeid[:at]
        pos = at + len(SEP)
    # Eagerly consume :: parts.
    while True:
        at = nodeid.find("::", pos)
        if at == -1:
            break
        if at > 0:
            yield nodeid[:at]
        pos = at + len("::")
    # The node ID itself.
    if nodeid:
        yield nodeid

Like this? The None does work in str.find for searching until the end of the string. And should I update the file to this and fix the test cases accordingly or is this still something we should look over?

@bluetech
Copy link
Member

Like this?

Yes, that looks good to me!

And should I update the file to this and fix the test cases accordingly or is this still something we should look over?

Yes, let's update the PR to this and update the tests accordingly (you can also add other test cases if you can think of them or want to test something specifically). Then we'll see how this approach actually works out :)

…consider ::. Tests were changed to reflect this.
@parthdpa
Copy link
Contributor Author

@bluetech I have made the changes and pushed here

@parthdpa parthdpa changed the title Fix TestCase.setUpClass not being called for test names with / in them for versions later than 6.2.0 Improve iterparentnodeids to consume / parts until the first :: Apr 27, 2021
Copy link
Member

@bluetech bluetech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think this is good to go! I only left a suggestion on the changelog.

In the future we should definitely explore @RonnyPfannschmidt suggestion to use something structured and information-preserving for the node IDs, so that we can fix all such issues, but for now hopefully the tradeoff we're making here is the better one.

changelog/8509.improvement.rst Outdated Show resolved Hide resolved
@bluetech
Copy link
Member

@RonnyPfannschmidt this is somewhat different than when you first reviewed, can you PTAL again?

@RonnyPfannschmidt
Copy link
Member

Im currently down with a cold, if its good for you let's go for it

Long term we want a more nuanced datastructure to help us with the ambiguous details

@bluetech
Copy link
Member

OK. @parthdpa I'll take the liberty of applying my suggestion, will merge after CI is green.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants