Extract attribute docstrings for `FieldInfo.description` #6563

Viicos · 2023-07-10T18:00:04Z

Change Summary

This implements attribute docstring parsing when the FieldInfo.description attribute is None (that is it wasn't set by the user). Thanks to @danields761 for the original implementation (which is really clean) of which the following modifications have been made:

Only use quotes docstrings (# comments not implemented), e.g:

class A(BaseModel):
    a: int
    "doc"
    b: int
    """triple quotes supported"""
    c: int

    """Spaces supported as well"""

Support spaces between assignment and docstring (unlike the original implementation) (this removes the need for a custom set of ast routines, and VSCode seems to support spaces as well)

A few things to take into account:

This feature won't be available in some specific environments (e.g. when in an interactive session). The reason is inspect.getsource.
Maybe a model_config value should be added to activate this feature (with a default value of False), considering some people might set some "private" information in these docstrings that could get exposed with FastAPI for example.

I would understand if this feature doesn't make it into Pydantic, as it could be considered a breaking change, and relying on inspect feels a bit hacky to me. But it would be a great feature nonetheless, considering @tiangolo had something similar in mind.

I'll wait for a review before adding tests and documentation

Related issue number

Closes pydantic/pydantic-settings#2.
Related discussion: #4281.

EDIT by @Kludex:

Closes Use property docstrings as descriptions #5656

Checklist

Unit tests for the changes exist
Tests pass on CI and coverage remains at 100%
Documentation reflects the changes where applicable
changes/<pull request or issue id>-<github username>.md file added describing change
(see changes/README.md for details)
My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

Selected Reviewer: @davidhewitt

pydantic/_internal/_docs_extraction.py

Viicos · 2023-07-10T18:02:23Z

Please review

davidhewitt

Nice, this seems pretty neat to me 🚀. It would be interesting to get some feedback on what other maintainers think, overall the implementation looks quite small which is pleasing!

Some initial feedback having just scanned the PR (not reviewed in depth yet):

Talking with @samuelcolvin we agree this would definitely need an opt-in config flag to avoid breakage (if this is popular then we could change the default in v3 or other future major version)
If Field(description="...") is set as well as then should it error? I think no, probably best to use the description in that case (i.e. for cases when you want the Python and schema to diverge).

pydantic/_internal/_docs_extraction.py

Viicos · 2023-07-12T06:05:27Z

If Field(description="...") is set as well as then should it error? I think no, probably best to use the description in that case (i.e. for cases when you want the Python and schema to diverge).

Yes currently description has a higher priority and is used even if docstring is set. Having an error would be a breaking change

Kludex

Can't we instead create an ast.Visitor?

Something like:

from _ast import AnnAssign, Expr
from typing import Any
from pydantic import BaseModel
import inspect
import ast


class Model(BaseModel):
    a: int
    """My docstring."""
    b: int = 1
    """Docstring"""
    c: int = 1
    # This doesn't get to the AST.


class Visitor(ast.NodeVisitor):
    def __init__(self) -> None:
        super().__init__()

        self.target: str | None = None
        self.attrs: dict[str, str] = {}

    def visit_AnnAssign(self, node: AnnAssign) -> Any:
        if isinstance(node.target, ast.Name):
            self.target = node.target.id

    def visit_Expr(self, node: Expr) -> Any:
        if isinstance(node.value, ast.Str):
            docstring = inspect.cleandoc(node.value.s)
            if self.target:
                self.attrs[self.target] = docstring
            self.target = None


source = inspect.getsource(Model)
tree = ast.parse(source)

visitor = Visitor()
visitor.visit(tree)
print(visitor.attrs)

EDIT: I guess we need to make sure the Expr is the next node? 🤔 That can be done implementing the def visit().

pydantic/_internal/_docs_extraction.py

Viicos · 2023-07-12T11:01:59Z

Thanks @Kludex for the review, I'll apply the modifications and try using NodeVisitor, which seems like a good fit as well

Kludex · 2023-07-12T11:13:55Z

Thanks @Kludex for the review, I'll apply the modifications and try using NodeVisitor, which seems like a good fit as well

Cool. 👍

You need to make sure the sequence of nodes is valid i.e. ast.AnnAssign comes before ast.Expr(ast.Str).

adriangb · 2023-07-12T12:27:03Z

Will this work for dataclasses? TypedDict? NamedTuple?

Viicos · 2023-07-12T12:29:43Z

Will this work for dataclasses? TypedDict? NamedTuple?

I've implemented it for dataclasses, and will look into TypedDict and NamedTuple

Kludex · 2023-07-12T12:31:09Z

Will this work for dataclasses? TypedDict? NamedTuple?

I've implemented it for dataclasses, and will look into TypedDict and NamedTuple

There's no difference.

Kludex · 2023-07-12T18:34:58Z

Beautiful! 👏

Viicos · 2023-07-12T18:35:39Z

Went for the ast.NodeVisitor solution. One test is failing (🎉):

pydantic/tests/test_edge_cases.py

Lines 2169 to 2185 in 176714d

    
           def test_resolve_annotations_module_missing(tmp_path): 
        
               # see https://github.com/pydantic/pydantic/issues/2363 
        
               file_path = tmp_path / 'module_to_load.py' 
        
               # language=Python 
        
               file_path.write_text( 
        
                   """ 
        
           from pydantic import BaseModel 
        
           class User(BaseModel): 
        
               id: int 
        
               name: str = 'Jane Doe' 
        
           """ 
        
               ) 
        
               spec = importlib.util.spec_from_file_location('my_test_module', file_path) 
        
               module = importlib.util.module_from_spec(spec) 
        
               spec.loader.exec_module(module) 
        
               assert module.User(id=12).model_dump() == {'id': 12, 'name': 'Jane Doe'}

To determine if a class is a built-in class, inspect checks for a __module__ attribute, and it seems creating a module on the fly doesn't provide this attribute.

(I will add tests in an hour or two)

Viicos · 2023-07-12T20:37:37Z

Will this work for dataclasses? TypedDict? NamedTuple?

I've implemented it for dataclasses, and will look into TypedDict and NamedTuple

This will probably require some additionnal work for TypedDict and NamedTuple

Viicos · 2023-07-13T18:01:47Z

tests/test_docs_extraction.py

+
+
+def test_model_docs_extraction():
+    class Model(BaseModel):


Unfortunately inspect.getsource isn't working as expected with classes. Tests are failing under 3.9 because prior to python/cpython#10307, the following would happen:

MRE

import inspect def test1(): class Model: a = 1 print(inspect.getsource(Model)) def test2(): class Model: a = 2 print(inspect.getsource(Model)) test1() #> uses Model with a = 1 test2() #> uses Model with a = 1 (wrong!)

This has been fixed since then, but I encountered this one: python/cpython#106727

I think this isn't a big deal, as users will probably not define two models with the same name in the same module.

That kinda sounds like a dealbreaker to me. I don’t know why you think it’s not common, it seems to me like it would happen all the time in tests.

Imo it's not common from a user perspective (as this only occurs if in the same module). But it is indeed way more common in tests 😕

Users write tests…

Ah yes didn't think about that 😅 That was what I was afraid of by using inspect, and I don't see any other path except shipping our own source parsing logic, which probably isn't a good idea

I'll try to improve class detection on the Python side (see the related issue on cpython), but unfortunately it will not be able to cover all edge cases, such as:

if cond: class A: pass else: class A: pass

As this feature would be off by default, users could be warned about the potential unexpected behavior in the documentation. I'll let you decide on the output of this PR

So Python got some improvements since then:

gh-106727: Make inspect.getsource smarter for class for same name definitions python/cpython#106815

gh-106727: Add __module__ check for inspect.getsource(cls) python/cpython#106968

It works way better if a method is defined on the class (thanks to __code__.co_firstlineno), but can still pick the wrong class otherwise.

Am I understanding correctly that this misbehavior is only on Python<=3.9?

Also, when it gets the wrong source, does it lead to actual exceptions being raised, or does it just retrieve the docstrings incorrectly?

If it only affects <=3.9 and fails gracefully in the sense that it just doesn't get the docstrings correctly, then unless I'm missing something I think the issues with inspect.getsource maybe shouldn't be a dealbreaker since it would require all of the following conditions to be met for it to cause problems:

You are using Python<=3.9

You have opted into the new config setting for this functionality

You have defined a model with the same name multiple times in the same module

You require the description attribute for the FieldInfo must be correct in the context you are using it (I mention this because most tests, even if affected by this, won't be failing because of it or testing what it affects)

You aren't happy to (or aren't aware that you can) explicitly specify the description in Field(...) for affected attributes

In particular, for the case of writing tests that you mentioned @adriangb I would think that this would very rarely cause problems for tests that aren't explicitly testing this functionality.

If nothing else, it seems reasonable to me to emit a warning if this particular config flag is set in Python<=3.9.

(That said, it looks to me like there are other test failures happening right now, not just caused by this issue.)

The following PR:

bpo-35113: Fix inspect.getsource to return correct source for inner classes python/cpython#10307

was introduced in Python 3.9, and fixes class detection when they are nested in other classes.

Up until now, it would still fail with the following:

import inspect class Model: a = 1 class Model: a = 2 print(Model.a) #> 2 print(inspect.getsource(Model)) #> class Model: #> a = 1

This got improved with the following PR (it will now pick the last class definition):

gh-106727: Make inspect.getsource smarter for class for same name definitions python/cpython#106815

But still can get it wrong if you're doing something like:

if True: class Model: a = 1 else: class Model: a = 2 # Which really is an uncommon edge case

when it gets the wrong source, does it lead to actual exceptions being raised, or does it just retrieve the docstrings incorrectly?

It will not raise currently, as I'm applying field docstrings in a safe way, but I could maybe raise if we encounter unexpected fields (I'll have to see if it's doable when inheritance is being used).

So to summarize, all of your conditions should indeed be met, except:

You are using Python<=3.9

it fails gracefully (can be fixed as I said)

I'll be off for a while and will continue working on this in September if this is still accepted

Edit: we are now able to accurately find the right class by walking up the frames. The solution is a bit more complex/hard to understand but work as expected, and we still fallback to the former solution if using stack frames doesn't work.

Viicos · 2023-09-20T21:31:06Z

Will this work for dataclasses? TypedDict? NamedTuple?

I've Implemented TypedDict, left NamedTuple as it doesn't seem to support pydantic config. Next I might see if I can raise an exception if the wrong class is being used (to be correct, if the retrieved field to docstring mapping does not match the model fields).

Viicos · 2023-11-10T12:44:49Z

I think I should probably make another PR to update pyright and fix the new issues

codspeed-hq · 2023-11-15T17:13:05Z

CodSpeed Performance Report

Merging #6563 will not alter performance

_{Comparing Viicos:docstrings-description (f3e84a6) with main (91bc828)}

Summary

✅ 10 untouched benchmarks

codspeed-hq · 2023-11-15T17:18:23Z

CodSpeed Performance Report

Merging #6563 will not alter performance

_{Comparing Viicos:docstrings-description (d9150c6) with main (91bc828)}

Summary

✅ 10 untouched benchmarks

codspeed-hq · 2023-11-15T17:19:46Z

CodSpeed Performance Report

Merging #6563 will not alter performance

_{Comparing Viicos:docstrings-description (abba7d2) with main (91bc828)}

Summary

✅ 10 untouched benchmarks

alexmojaki · 2023-11-21T10:22:50Z

@dmontagu ping

Pwuts · 2024-01-02T09:24:32Z

bump

(Bumping seems appropriate since it looks like there aren't any blockers on this PR. If there are, it would be useful to have it mentioned here.)

Viicos · 2024-01-02T09:51:12Z

There have been a lot of reviewing from @alexmojaki so I'm going to sum things up here so that it can be easier to review.

Class definition parsing

This was first implemented in this commit: 29bbf3e, inspired by danields761/pydantic-settings. @Kludex then suggested using an ast.NodeVisitor instead, but @alexmojaki gave some arguments here to go back to the first solution. I'll let maintainers give input on which solution they want to keep.

Retrieving the class source code

To retrieve the source code of a class so that it can be parsed, I was first using inspect.getsource, but I faced some issues described here. After opening an issue on the cpython repo, class discovery got improved on Python 3.12, but can still be wrong in some cases.

Then @alexmojaki suggested using stack frames to circumvent this issue. We got hit by a few issues going this way, in particular with decorated classes (depending on the Python version, the frame f_lineno would differ). We ended up parsing the current frame block source code as a temporary AST to see if it would match the correct class definition (see the implementation of _extract_source_from_frame).

I believe we added extensive testing with edge cases that will most probably never happen in real code. This is also gated behind a config flag, so it won't break existing code.

There has been some recent discussion about PEP 224 -- Attribute Docstrings, which would make this PR obsolete if implemented. However, it would only be available in the latest Python version, so I think this feature PR would still be relevant.

sydney-runkle · 2024-01-12T19:51:13Z

@Viicos,

Sorry for all of the pings this week. Gearing up to do a final review + to merge this - could you please rebase when you have a chance? Thank you 🌟!

codspeed-hq · 2024-01-13T12:28:00Z

CodSpeed Performance Report

Merging #6563 will not alter performance

_{Comparing Viicos:docstrings-description (caa5cbe) with main (2e459bb)}

Summary

✅ 10 untouched benchmarks

alexmojaki · 2024-01-25T18:36:14Z

Thanks @Viicos!

Viicos · 2024-01-25T18:58:01Z

Thanks for the reviews @alexmojaki, glad to see this one merged

Viicos added 2 commits July 9, 2023 22:09

WIP: Implement docs extraction

29bbf3e

Implement attribute docstring extractiob

792da19

Viicos commented Jul 10, 2023

View reviewed changes

pydantic/_internal/_docs_extraction.py Outdated Show resolved Hide resolved

pydantic-hooky bot added the ready for review label Jul 10, 2023

pydantic-hooky bot assigned davidhewitt Jul 10, 2023

davidhewitt reviewed Jul 11, 2023

View reviewed changes

pydantic/_internal/_docs_extraction.py Outdated Show resolved Hide resolved

Kludex reviewed Jul 12, 2023

View reviewed changes

Apply feedback

d3e8402

Viicos force-pushed the docstrings-description branch from 01ed888 to 3655b42 Compare July 12, 2023 16:59

Implement ast.NodeVisitor

c9325d7

Viicos added 2 commits July 12, 2023 22:35

Add use_attributes_docstring config value

1f81724

WIP: First tests

cab7a07

Viicos mentioned this pull request Jul 13, 2023

inspect.getsource (findsource) not working as expected when duplicate class is used python/cpython#106727

Closed

Viicos commented Jul 13, 2023

View reviewed changes

Fix tests to cope with inspect.getsource

e46890c

Viicos force-pushed the docstrings-description branch from df17772 to 7aa0992 Compare September 20, 2023 19:29

Viicos added 2 commits September 20, 2023 21:34

Add config docstring

af4f340

Support TypedDict

d3a4761

Viicos force-pushed the docstrings-description branch from cd9ad99 to 09919ae Compare September 20, 2023 21:28

Update to latest pyright

f3e84a6

Viicos mentioned this pull request Nov 10, 2023

Update to pyright==1.1.335 #8075

Merged

5 tasks

Viicos force-pushed the docstrings-description branch from 6839495 to f3e84a6 Compare November 15, 2023 17:09

Viicos added 2 commits November 15, 2023 18:14

fix rebase

d9150c6

Use latest pyright

abba7d2

Merge branch 'main' into docstrings-description

caa5cbe

Viicos mentioned this pull request Jan 13, 2024

adds model_name option in ConfigDict #6814

Open

5 tasks

sydney-runkle mentioned this pull request Jan 15, 2024

🚀 2.6.0b1 checklist ✅ #8549

Closed

10 tasks

kschwab mentioned this pull request Jan 19, 2024

Feature Request: Add CLI argument parsing for model initialization pydantic/pydantic-settings#209

Open

sydney-runkle added the relnotes-feature label Jan 25, 2024

alexmojaki merged commit 6e59619 into pydantic:main Jan 25, 2024
54 of 55 checks passed

This was referenced Jan 25, 2024

Request for "with_attr_docs" feature pydantic/pydantic-settings#2

Open

[Feature Request] Auto-Docstring generation for Schema bearing Models #638

Closed

Viicos deleted the docstrings-description branch January 25, 2024 18:57

kschwab mentioned this pull request Jan 28, 2024

Add CLI Settings Source pydantic/pydantic-settings#214

Open

rmehyde mentioned this pull request Feb 25, 2024

Render Enum Member Docstrings in JSON Schema #8888

Open

13 tasks

PaleNeutron mentioned this pull request Mar 27, 2024

add comments on tables and columns, visible also on the generated DB tiangolo/sqlmodel#492

Closed

8 tasks

Antyos mentioned this pull request Apr 11, 2024

FieldInfo Description Docstrings are not working #9218

Closed

1 task

Galarzaa90 mentioned this pull request Apr 17, 2024

Duplicated field description when using Pydantic 2.7's use_attribute_docstrings setting mansenfranzen/autodoc_pydantic#254

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Extract attribute docstrings for `FieldInfo.description` #6563

Extract attribute docstrings for `FieldInfo.description` #6563

Viicos commented Jul 10, 2023 •

edited

Viicos commented Jul 10, 2023

davidhewitt left a comment

Viicos commented Jul 12, 2023

Kludex left a comment •

edited

Viicos commented Jul 12, 2023

Kludex commented Jul 12, 2023

adriangb commented Jul 12, 2023

Viicos commented Jul 12, 2023

Kludex commented Jul 12, 2023

Kludex commented Jul 12, 2023

Viicos commented Jul 12, 2023 •

edited

Viicos commented Jul 12, 2023

Viicos Jul 13, 2023 •

edited

adriangb Jul 13, 2023

Viicos Jul 13, 2023

adriangb Jul 13, 2023

Viicos Jul 13, 2023

Viicos Jul 16, 2023

Viicos Aug 10, 2023

dmontagu Aug 16, 2023 •

edited

Viicos Aug 17, 2023 •

edited

Viicos commented Sep 20, 2023

Viicos commented Nov 10, 2023

codspeed-hq bot commented Nov 15, 2023

codspeed-hq bot commented Nov 15, 2023

codspeed-hq bot commented Nov 15, 2023

alexmojaki commented Nov 21, 2023

Pwuts commented Jan 2, 2024 •

edited

Viicos commented Jan 2, 2024

sydney-runkle commented Jan 12, 2024

codspeed-hq bot commented Jan 13, 2024

alexmojaki commented Jan 25, 2024

Viicos commented Jan 25, 2024

Extract attribute docstrings for FieldInfo.description #6563

Extract attribute docstrings for FieldInfo.description #6563

Conversation

Viicos commented Jul 10, 2023 • edited

Change Summary

Related issue number

Checklist

Viicos commented Jul 10, 2023

davidhewitt left a comment

Choose a reason for hiding this comment

Viicos commented Jul 12, 2023

Kludex left a comment • edited

Choose a reason for hiding this comment

Viicos commented Jul 12, 2023

Kludex commented Jul 12, 2023

adriangb commented Jul 12, 2023

Viicos commented Jul 12, 2023

Kludex commented Jul 12, 2023

Kludex commented Jul 12, 2023

Viicos commented Jul 12, 2023 • edited

Viicos commented Jul 12, 2023

Viicos Jul 13, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dmontagu Aug 16, 2023 • edited

Choose a reason for hiding this comment

Viicos Aug 17, 2023 • edited

Choose a reason for hiding this comment

Viicos commented Sep 20, 2023

Viicos commented Nov 10, 2023

codspeed-hq bot commented Nov 15, 2023

Merging #6563 will not alter performance

Summary

codspeed-hq bot commented Nov 15, 2023

Merging #6563 will not alter performance

Summary

codspeed-hq bot commented Nov 15, 2023

Merging #6563 will not alter performance

Summary

alexmojaki commented Nov 21, 2023

Pwuts commented Jan 2, 2024 • edited

Viicos commented Jan 2, 2024

Class definition parsing

Retrieving the class source code

sydney-runkle commented Jan 12, 2024

codspeed-hq bot commented Jan 13, 2024

Merging #6563 will not alter performance

Summary

alexmojaki commented Jan 25, 2024

Viicos commented Jan 25, 2024

Extract attribute docstrings for `FieldInfo.description` #6563

Extract attribute docstrings for `FieldInfo.description` #6563

Viicos commented Jul 10, 2023 •

edited

Kludex left a comment •

edited

Viicos commented Jul 12, 2023 •

edited

Viicos Jul 13, 2023 •

edited

dmontagu Aug 16, 2023 •

edited

Viicos Aug 17, 2023 •

edited

Pwuts commented Jan 2, 2024 •

edited