New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extract attribute docstrings for FieldInfo.description
#6563
Conversation
Please review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, this seems pretty neat to me 🚀. It would be interesting to get some feedback on what other maintainers think, overall the implementation looks quite small which is pleasing!
Some initial feedback having just scanned the PR (not reviewed in depth yet):
- Talking with @samuelcolvin we agree this would definitely need an opt-in config flag to avoid breakage (if this is popular then we could change the default in v3 or other future major version)
- If
Field(description="...")
is set as well as then should it error? I think no, probably best to use thedescription
in that case (i.e. for cases when you want the Python and schema to diverge).
Yes currently |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we instead create an ast.Visitor
?
Something like:
from _ast import AnnAssign, Expr
from typing import Any
from pydantic import BaseModel
import inspect
import ast
class Model(BaseModel):
a: int
"""My docstring."""
b: int = 1
"""Docstring"""
c: int = 1
# This doesn't get to the AST.
class Visitor(ast.NodeVisitor):
def __init__(self) -> None:
super().__init__()
self.target: str | None = None
self.attrs: dict[str, str] = {}
def visit_AnnAssign(self, node: AnnAssign) -> Any:
if isinstance(node.target, ast.Name):
self.target = node.target.id
def visit_Expr(self, node: Expr) -> Any:
if isinstance(node.value, ast.Str):
docstring = inspect.cleandoc(node.value.s)
if self.target:
self.attrs[self.target] = docstring
self.target = None
source = inspect.getsource(Model)
tree = ast.parse(source)
visitor = Visitor()
visitor.visit(tree)
print(visitor.attrs)
EDIT: I guess we need to make sure the Expr
is the next node? 🤔 That can be done implementing the def visit()
.
Thanks @Kludex for the review, I'll apply the modifications and try using |
Cool. 👍 You need to make sure the sequence of nodes is valid i.e. |
Will this work for dataclasses? TypedDict? NamedTuple? |
I've implemented it for dataclasses, and will look into TypedDict and NamedTuple |
There's no difference. |
01ed888
to
3655b42
Compare
Beautiful! 👏 |
Went for the pydantic/tests/test_edge_cases.py Lines 2169 to 2185 in 176714d
To determine if a class is a built-in class, (I will add tests in an hour or two) |
This will probably require some additionnal work for |
tests/test_docs_extraction.py
Outdated
|
||
|
||
def test_model_docs_extraction(): | ||
class Model(BaseModel): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately inspect.getsource
isn't working as expected with classes. Tests are failing under 3.9 because prior to python/cpython#10307, the following would happen:
MRE
import inspect
def test1():
class Model:
a = 1
print(inspect.getsource(Model))
def test2():
class Model:
a = 2
print(inspect.getsource(Model))
test1()
#> uses Model with a = 1
test2()
#> uses Model with a = 1 (wrong!)
This has been fixed since then, but I encountered this one: python/cpython#106727
I think this isn't a big deal, as users will probably not define two models with the same name in the same module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That kinda sounds like a dealbreaker to me. I don’t know why you think it’s not common, it seems to me like it would happen all the time in tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Imo it's not common from a user perspective (as this only occurs if in the same module). But it is indeed way more common in tests 😕
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Users write tests…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes didn't think about that 😅 That was what I was afraid of by using inspect
, and I don't see any other path except shipping our own source parsing logic, which probably isn't a good idea
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try to improve class detection on the Python side (see the related issue on cpython), but unfortunately it will not be able to cover all edge cases, such as:
if cond:
class A: pass
else:
class A: pass
As this feature would be off by default, users could be warned about the potential unexpected behavior in the documentation. I'll let you decide on the output of this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So Python got some improvements since then:
- gh-106727: Make
inspect.getsource
smarter for class for same name definitions python/cpython#106815 - gh-106727: Add
__module__
check forinspect.getsource(cls)
python/cpython#106968
It works way better if a method is defined on the class (thanks to __code__.co_firstlineno
), but can still pick the wrong class otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I understanding correctly that this misbehavior is only on Python<=3.9?
Also, when it gets the wrong source, does it lead to actual exceptions being raised, or does it just retrieve the docstrings incorrectly?
If it only affects <=3.9 and fails gracefully in the sense that it just doesn't get the docstrings correctly, then unless I'm missing something I think the issues with inspect.getsource
maybe shouldn't be a dealbreaker since it would require all of the following conditions to be met for it to cause problems:
- You are using Python<=3.9
- You have opted into the new config setting for this functionality
- You have defined a model with the same name multiple times in the same module
- You require the description attribute for the
FieldInfo
must be correct in the context you are using it (I mention this because most tests, even if affected by this, won't be failing because of it or testing what it affects) - You aren't happy to (or aren't aware that you can) explicitly specify the description in
Field(...)
for affected attributes
In particular, for the case of writing tests that you mentioned @adriangb I would think that this would very rarely cause problems for tests that aren't explicitly testing this functionality.
If nothing else, it seems reasonable to me to emit a warning if this particular config flag is set in Python<=3.9
.
(That said, it looks to me like there are other test failures happening right now, not just caused by this issue.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The following PR:
was introduced in Python 3.9, and fixes class detection when they are nested in other classes.
Up until now, it would still fail with the following:
import inspect
class Model:
a = 1
class Model:
a = 2
print(Model.a)
#> 2
print(inspect.getsource(Model))
#> class Model:
#> a = 1
This got improved with the following PR (it will now pick the last class definition):
But still can get it wrong if you're doing something like:
if True:
class Model:
a = 1
else:
class Model:
a = 2
# Which really is an uncommon edge case
when it gets the wrong source, does it lead to actual exceptions being raised, or does it just retrieve the docstrings incorrectly?
It will not raise currently, as I'm applying field docstrings in a safe way, but I could maybe raise if we encounter unexpected fields (I'll have to see if it's doable when inheritance is being used).
So to summarize, all of your conditions should indeed be met, except:
- You are using Python<=3.9
- it fails gracefully (can be fixed as I said)
I'll be off for a while and will continue working on this in September if this is still accepted
Edit: we are now able to accurately find the right class by walking up the frames. The solution is a bit more complex/hard to understand but work as expected, and we still fallback to the former solution if using stack frames doesn't work.
df17772
to
7aa0992
Compare
cd9ad99
to
09919ae
Compare
I've Implemented |
I think I should probably make another PR to update |
6839495
to
f3e84a6
Compare
CodSpeed Performance ReportMerging #6563 will not alter performanceComparing Summary
|
CodSpeed Performance ReportMerging #6563 will not alter performanceComparing Summary
|
CodSpeed Performance ReportMerging #6563 will not alter performanceComparing Summary
|
@dmontagu ping |
bump (Bumping seems appropriate since it looks like there aren't any blockers on this PR. If there are, it would be useful to have it mentioned here.) |
There have been a lot of reviewing from @alexmojaki so I'm going to sum things up here so that it can be easier to review. Class definition parsingThis was first implemented in this commit: 29bbf3e, inspired by danields761/pydantic-settings. @Kludex then suggested using an Retrieving the class source codeTo retrieve the source code of a class so that it can be parsed, I was first using Then @alexmojaki suggested using stack frames to circumvent this issue. We got hit by a few issues going this way, in particular with decorated classes (depending on the Python version, the frame I believe we added extensive testing with edge cases that will most probably never happen in real code. This is also gated behind a config flag, so it won't break existing code. There has been some recent discussion about PEP 224 -- Attribute Docstrings, which would make this PR obsolete if implemented. However, it would only be available in the latest Python version, so I think this feature PR would still be relevant. |
Sorry for all of the pings this week. Gearing up to do a final review + to merge this - could you please rebase when you have a chance? Thank you 🌟! |
CodSpeed Performance ReportMerging #6563 will not alter performanceComparing Summary
|
Thanks @Viicos! |
Thanks for the reviews @alexmojaki, glad to see this one merged |
Change Summary
This implements attribute docstring parsing when the
FieldInfo.description
attribute isNone
(that is it wasn't set by the user). Thanks to @danields761 for the original implementation (which is really clean) of which the following modifications have been made:#
comments not implemented), e.g:A few things to take into account:
inspect.getsource
.model_config
value should be added to activate this feature (with a default value ofFalse
), considering some people might set some "private" information in these docstrings that could get exposed with FastAPI for example.I would understand if this feature doesn't make it into Pydantic, as it could be considered a breaking change, and relying on
inspect
feels a bit hacky to me. But it would be a great feature nonetheless, considering @tiangolo had something similar in mind.I'll wait for a review before adding tests and documentation
Related issue number
Closes pydantic/pydantic-settings#2.
Related discussion: #4281.
EDIT by @Kludex:
Checklist
changes/<pull request or issue id>-<github username>.md
file added describing change(see changes/README.md for details)
Selected Reviewer: @davidhewitt