Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extract attribute docstrings for FieldInfo.description #6563

Merged
merged 38 commits into from Jan 25, 2024

Conversation

Viicos
Copy link
Contributor

@Viicos Viicos commented Jul 10, 2023

Change Summary

This implements attribute docstring parsing when the FieldInfo.description attribute is None (that is it wasn't set by the user). Thanks to @danields761 for the original implementation (which is really clean) of which the following modifications have been made:

  • Only use quotes docstrings (# comments not implemented), e.g:
class A(BaseModel):
    a: int
    "doc"
    b: int
    """triple quotes supported"""
    c: int

    """Spaces supported as well"""
  • Support spaces between assignment and docstring (unlike the original implementation) (this removes the need for a custom set of ast routines, and VSCode seems to support spaces as well)

A few things to take into account:

  • This feature won't be available in some specific environments (e.g. when in an interactive session). The reason is inspect.getsource.
  • Maybe a model_config value should be added to activate this feature (with a default value of False), considering some people might set some "private" information in these docstrings that could get exposed with FastAPI for example.

I would understand if this feature doesn't make it into Pydantic, as it could be considered a breaking change, and relying on inspect feels a bit hacky to me. But it would be a great feature nonetheless, considering @tiangolo had something similar in mind.

I'll wait for a review before adding tests and documentation

Related issue number

Closes pydantic/pydantic-settings#2.
Related discussion: #4281.

EDIT by @Kludex:

Checklist

  • Unit tests for the changes exist
  • Tests pass on CI and coverage remains at 100%
  • Documentation reflects the changes where applicable
  • changes/<pull request or issue id>-<github username>.md file added describing change
    (see changes/README.md for details)
  • My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

Selected Reviewer: @davidhewitt

@Viicos
Copy link
Contributor Author

Viicos commented Jul 10, 2023

Please review

Copy link
Contributor

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this seems pretty neat to me 🚀. It would be interesting to get some feedback on what other maintainers think, overall the implementation looks quite small which is pleasing!

Some initial feedback having just scanned the PR (not reviewed in depth yet):

  • Talking with @samuelcolvin we agree this would definitely need an opt-in config flag to avoid breakage (if this is popular then we could change the default in v3 or other future major version)
  • If Field(description="...") is set as well as then should it error? I think no, probably best to use the description in that case (i.e. for cases when you want the Python and schema to diverge).

pydantic/_internal/_docs_extraction.py Outdated Show resolved Hide resolved
@Viicos
Copy link
Contributor Author

Viicos commented Jul 12, 2023

If Field(description="...") is set as well as then should it error? I think no, probably best to use the description in that case (i.e. for cases when you want the Python and schema to diverge).

Yes currently description has a higher priority and is used even if docstring is set. Having an error would be a breaking change

Copy link
Member

@Kludex Kludex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we instead create an ast.Visitor?

Something like:

from _ast import AnnAssign, Expr
from typing import Any
from pydantic import BaseModel
import inspect
import ast


class Model(BaseModel):
    a: int
    """My docstring."""
    b: int = 1
    """Docstring"""
    c: int = 1
    # This doesn't get to the AST.


class Visitor(ast.NodeVisitor):
    def __init__(self) -> None:
        super().__init__()

        self.target: str | None = None
        self.attrs: dict[str, str] = {}

    def visit_AnnAssign(self, node: AnnAssign) -> Any:
        if isinstance(node.target, ast.Name):
            self.target = node.target.id

    def visit_Expr(self, node: Expr) -> Any:
        if isinstance(node.value, ast.Str):
            docstring = inspect.cleandoc(node.value.s)
            if self.target:
                self.attrs[self.target] = docstring
            self.target = None


source = inspect.getsource(Model)
tree = ast.parse(source)

visitor = Visitor()
visitor.visit(tree)
print(visitor.attrs)

EDIT: I guess we need to make sure the Expr is the next node? 🤔 That can be done implementing the def visit().

pydantic/_internal/_docs_extraction.py Outdated Show resolved Hide resolved
pydantic/_internal/_docs_extraction.py Show resolved Hide resolved
pydantic/_internal/_docs_extraction.py Outdated Show resolved Hide resolved
pydantic/_internal/_docs_extraction.py Outdated Show resolved Hide resolved
pydantic/_internal/_docs_extraction.py Outdated Show resolved Hide resolved
pydantic/_internal/_docs_extraction.py Outdated Show resolved Hide resolved
pydantic/_internal/_docs_extraction.py Outdated Show resolved Hide resolved
@Viicos
Copy link
Contributor Author

Viicos commented Jul 12, 2023

Thanks @Kludex for the review, I'll apply the modifications and try using NodeVisitor, which seems like a good fit as well

@Kludex
Copy link
Member

Kludex commented Jul 12, 2023

Thanks @Kludex for the review, I'll apply the modifications and try using NodeVisitor, which seems like a good fit as well

Cool. 👍

You need to make sure the sequence of nodes is valid i.e. ast.AnnAssign comes before ast.Expr(ast.Str).

@adriangb
Copy link
Member

Will this work for dataclasses? TypedDict? NamedTuple?

@Viicos
Copy link
Contributor Author

Viicos commented Jul 12, 2023

Will this work for dataclasses? TypedDict? NamedTuple?

I've implemented it for dataclasses, and will look into TypedDict and NamedTuple

@Kludex
Copy link
Member

Kludex commented Jul 12, 2023

Will this work for dataclasses? TypedDict? NamedTuple?

I've implemented it for dataclasses, and will look into TypedDict and NamedTuple

There's no difference.

@Kludex
Copy link
Member

Kludex commented Jul 12, 2023

Beautiful! 👏

@Viicos
Copy link
Contributor Author

Viicos commented Jul 12, 2023

Went for the ast.NodeVisitor solution. One test is failing (🎉):

def test_resolve_annotations_module_missing(tmp_path):
# see https://github.com/pydantic/pydantic/issues/2363
file_path = tmp_path / 'module_to_load.py'
# language=Python
file_path.write_text(
"""
from pydantic import BaseModel
class User(BaseModel):
id: int
name: str = 'Jane Doe'
"""
)
spec = importlib.util.spec_from_file_location('my_test_module', file_path)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
assert module.User(id=12).model_dump() == {'id': 12, 'name': 'Jane Doe'}

To determine if a class is a built-in class, inspect checks for a __module__ attribute, and it seems creating a module on the fly doesn't provide this attribute.

(I will add tests in an hour or two)

@Viicos
Copy link
Contributor Author

Viicos commented Jul 12, 2023

Will this work for dataclasses? TypedDict? NamedTuple?

I've implemented it for dataclasses, and will look into TypedDict and NamedTuple

This will probably require some additionnal work for TypedDict and NamedTuple



def test_model_docs_extraction():
class Model(BaseModel):
Copy link
Contributor Author

@Viicos Viicos Jul 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately inspect.getsource isn't working as expected with classes. Tests are failing under 3.9 because prior to python/cpython#10307, the following would happen:

MRE
import inspect


def test1():
    class Model:
        a = 1

    print(inspect.getsource(Model))


def test2():
    class Model:
        a = 2

    print(inspect.getsource(Model))


test1()
#> uses Model with a = 1
test2()
#> uses Model with a = 1 (wrong!)

This has been fixed since then, but I encountered this one: python/cpython#106727

I think this isn't a big deal, as users will probably not define two models with the same name in the same module.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That kinda sounds like a dealbreaker to me. I don’t know why you think it’s not common, it seems to me like it would happen all the time in tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imo it's not common from a user perspective (as this only occurs if in the same module). But it is indeed way more common in tests 😕

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users write tests…

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes didn't think about that 😅 That was what I was afraid of by using inspect, and I don't see any other path except shipping our own source parsing logic, which probably isn't a good idea

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to improve class detection on the Python side (see the related issue on cpython), but unfortunately it will not be able to cover all edge cases, such as:

if cond:
    class A: pass
else:
    class A: pass        

As this feature would be off by default, users could be warned about the potential unexpected behavior in the documentation. I'll let you decide on the output of this PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So Python got some improvements since then:

It works way better if a method is defined on the class (thanks to __code__.co_firstlineno), but can still pick the wrong class otherwise.

Copy link
Contributor

@dmontagu dmontagu Aug 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I understanding correctly that this misbehavior is only on Python<=3.9?

Also, when it gets the wrong source, does it lead to actual exceptions being raised, or does it just retrieve the docstrings incorrectly?

If it only affects <=3.9 and fails gracefully in the sense that it just doesn't get the docstrings correctly, then unless I'm missing something I think the issues with inspect.getsource maybe shouldn't be a dealbreaker since it would require all of the following conditions to be met for it to cause problems:

  • You are using Python<=3.9
  • You have opted into the new config setting for this functionality
  • You have defined a model with the same name multiple times in the same module
  • You require the description attribute for the FieldInfo must be correct in the context you are using it (I mention this because most tests, even if affected by this, won't be failing because of it or testing what it affects)
  • You aren't happy to (or aren't aware that you can) explicitly specify the description in Field(...) for affected attributes

In particular, for the case of writing tests that you mentioned @adriangb I would think that this would very rarely cause problems for tests that aren't explicitly testing this functionality.

If nothing else, it seems reasonable to me to emit a warning if this particular config flag is set in Python<=3.9.

(That said, it looks to me like there are other test failures happening right now, not just caused by this issue.)

Copy link
Contributor Author

@Viicos Viicos Aug 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following PR:

was introduced in Python 3.9, and fixes class detection when they are nested in other classes.

Up until now, it would still fail with the following:

import inspect

class Model:
    a = 1

class Model:
    a = 2

print(Model.a)
#> 2
print(inspect.getsource(Model))
#> class Model:
#>     a = 1

This got improved with the following PR (it will now pick the last class definition):

But still can get it wrong if you're doing something like:

if True:
    class Model:
        a = 1
else:
    class Model:
        a = 2
# Which really is an uncommon edge case

when it gets the wrong source, does it lead to actual exceptions being raised, or does it just retrieve the docstrings incorrectly?

It will not raise currently, as I'm applying field docstrings in a safe way, but I could maybe raise if we encounter unexpected fields (I'll have to see if it's doable when inheritance is being used).

So to summarize, all of your conditions should indeed be met, except:

  • You are using Python<=3.9
  • it fails gracefully (can be fixed as I said)

I'll be off for a while and will continue working on this in September if this is still accepted

Edit: we are now able to accurately find the right class by walking up the frames. The solution is a bit more complex/hard to understand but work as expected, and we still fallback to the former solution if using stack frames doesn't work.

@Viicos
Copy link
Contributor Author

Viicos commented Sep 20, 2023

Will this work for dataclasses? TypedDict? NamedTuple?

I've Implemented TypedDict, left NamedTuple as it doesn't seem to support pydantic config. Next I might see if I can raise an exception if the wrong class is being used (to be correct, if the retrieved field to docstring mapping does not match the model fields).

@Viicos
Copy link
Contributor Author

Viicos commented Nov 10, 2023

I think I should probably make another PR to update pyright and fix the new issues

@Viicos Viicos mentioned this pull request Nov 10, 2023
5 tasks
Copy link

codspeed-hq bot commented Nov 15, 2023

CodSpeed Performance Report

Merging #6563 will not alter performance

Comparing Viicos:docstrings-description (f3e84a6) with main (91bc828)

Summary

✅ 10 untouched benchmarks

Copy link

codspeed-hq bot commented Nov 15, 2023

CodSpeed Performance Report

Merging #6563 will not alter performance

Comparing Viicos:docstrings-description (d9150c6) with main (91bc828)

Summary

✅ 10 untouched benchmarks

Copy link

codspeed-hq bot commented Nov 15, 2023

CodSpeed Performance Report

Merging #6563 will not alter performance

Comparing Viicos:docstrings-description (abba7d2) with main (91bc828)

Summary

✅ 10 untouched benchmarks

@alexmojaki
Copy link
Contributor

@dmontagu ping

@Pwuts
Copy link

Pwuts commented Jan 2, 2024

bump

(Bumping seems appropriate since it looks like there aren't any blockers on this PR. If there are, it would be useful to have it mentioned here.)

@Viicos
Copy link
Contributor Author

Viicos commented Jan 2, 2024

There have been a lot of reviewing from @alexmojaki so I'm going to sum things up here so that it can be easier to review.

Class definition parsing

This was first implemented in this commit: 29bbf3e, inspired by danields761/pydantic-settings. @Kludex then suggested using an ast.NodeVisitor instead, but @alexmojaki gave some arguments here to go back to the first solution. I'll let maintainers give input on which solution they want to keep.

Retrieving the class source code

To retrieve the source code of a class so that it can be parsed, I was first using inspect.getsource, but I faced some issues described here. After opening an issue on the cpython repo, class discovery got improved on Python 3.12, but can still be wrong in some cases.

Then @alexmojaki suggested using stack frames to circumvent this issue. We got hit by a few issues going this way, in particular with decorated classes (depending on the Python version, the frame f_lineno would differ). We ended up parsing the current frame block source code as a temporary AST to see if it would match the correct class definition (see the implementation of _extract_source_from_frame).

I believe we added extensive testing with edge cases that will most probably never happen in real code. This is also gated behind a config flag, so it won't break existing code.


There has been some recent discussion about PEP 224 -- Attribute Docstrings, which would make this PR obsolete if implemented. However, it would only be available in the latest Python version, so I think this feature PR would still be relevant.

@sydney-runkle
Copy link
Member

@Viicos,

Sorry for all of the pings this week. Gearing up to do a final review + to merge this - could you please rebase when you have a chance? Thank you 🌟!

Copy link

codspeed-hq bot commented Jan 13, 2024

CodSpeed Performance Report

Merging #6563 will not alter performance

Comparing Viicos:docstrings-description (caa5cbe) with main (2e459bb)

Summary

✅ 10 untouched benchmarks

@sydney-runkle sydney-runkle mentioned this pull request Jan 15, 2024
10 tasks
@alexmojaki alexmojaki merged commit 6e59619 into pydantic:main Jan 25, 2024
54 of 55 checks passed
@alexmojaki
Copy link
Contributor

Thanks @Viicos!

@Viicos
Copy link
Contributor Author

Viicos commented Jan 25, 2024

Thanks for the reviews @alexmojaki, glad to see this one merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Request for "with_attr_docs" feature Use property docstrings as descriptions
8 participants