[Feature Request] Auto-Docstring generation for Schema bearing Models #638

Closed
Lnaden opened this issue Jul 3, 2019 · 17 comments

@Lnaden
Contributor

Lnaden commented Jul 3, 2019

Feature Request

This request is to have Pydantic models auto-generate their docstrings for their parameters, reading the parameters' Schema objects for more information. The end result would be Model classes whose __doc__ provides details about the parameters the model has. This would be useful for people who generate docs for their models through a program like Sphinx, which could then pick up the more complete docstring automatically.

This can, and probably should, be an optional thing the user sets or calls, since it will require overwriting the __doc__ variable.

This may turn out to be too dependent on individual user preferences of doc style flavors to have any viable officially supported flavor(s) in pydantic, but I wanted to propose anyways.

Below I have a crude toy implementation with examples to show the outputs. I have tested this in Python 3.6 and 3.7 with Pydantic versions 0.26 and 0.29, and it should run with no external dependencies beyond pydantic itself.

Foreseeable difficulties:

  • Hard to catch every combination and case to ensure doc is formatted correctly
  • Very dependent on the internal pydantic representation structure (See code where I have to check for pre 0.28 and post 0.28)
  • Requires overwriting the __doc__ variable
  • Documentation style could be in debate (e.g. NumPy, Google, reStructuredText, Epytext, etc.)
  • Not sure how this would affect speed
  • How to handle model nesting
  • How to handle a mix of variables documented through Schema and variables that are not

Known issues with toy implementation:

  • Very crude
  • Does not support mixed Schema vs. non-Schema parameters
  • Nesting pydantic models requires the nested model to have its own Description
  • Have not tested all combinations of parameters
  • Formats to NumPy Docstring style
  • Formats some things to Sphinx cross-reference style (e.g. nested models get cast to :class:`TargetClass` instead of any further docstring description, which Sphinx's RST format would render as a link to that class in the docs; not exactly helpful in all cases, though)
from enum import Enum
from textwrap import dedent, indent
from typing import Tuple, Dict

from pydantic import BaseModel, Schema, confloat, BaseSettings, validator, ValidationError

####################################
# Start of Auto-Doc Generation block
####################################


class _JsonRefModel(BaseModel):
    """
    Reference model for Json replacement fillers

    Matches style of:

    ``'allOf': [{'$ref': '#/definitions/something'}]}``

    and will always be a length 1 list
    """
    allOf: Tuple[Dict[str, str]]

    @validator("allOf", whole=True)
    def all_of_entries(cls, v):
        value = v[0]
        if len(value) != 1:
            raise ValueError("Dict must be of length 1")
        elif '$ref' not in value:
            raise ValueError("Dict needs to have key $ref")
        elif not isinstance(value["$ref"], str) or not value["$ref"].startswith('#/'):
            raise ValueError("$ref should be formatted as #/definitions/...")
        return v


def doc_formatter(target_object):
    """
    Set the docstring for a Pydantic object automatically based on the parameters

    This could use improvement.
    """
    doc = target_object.__doc__

    # Handle non-pydantic objects
    if doc is None:
        new_doc = ''
    elif 'Parameters\n' in doc or not (issubclass(target_object, BaseSettings) or issubclass(target_object, BaseModel)):
        new_doc = doc
    else:
        type_formatter = {'boolean': 'bool',
                          'string': 'str',
                          'integer': 'int',
                          'number': 'float'
                          }
        # Add the white space
        if not doc.endswith('\n\n'):
            doc += "\n\n"
        new_doc = dedent(doc) + "Parameters\n----------\n"
        target_schema = target_object.schema()
        # Go through each property
        for prop_name, prop in target_schema['properties'].items():
            # Catch lookups for other Pydantic objects
            if '$ref' in prop:
                # Pre 0.28 lookup
                lookup = prop['$ref'].split('/')[-1]
                prop = target_schema['definitions'][lookup]
            elif 'allOf' in prop:
                # Post 0.28 lookup
                try:
                    # Validation, we don't need output, just the object
                    _JsonRefModel(**prop)
                    lookup = prop['allOf'][0]['$ref'].split('/')[-1]
                    prop = target_schema['definitions'][lookup]
                except ValidationError:
                    # Doesn't conform, pass on
                    pass
            # Get common properties
            prop_type = prop["type"]
            new_doc += prop_name + " : "
            prop_desc = prop['description']

            # Check for enumeration
            if 'enum' in prop:
                new_doc += '{' + ', '.join(prop['enum']) + '}'

            # Set the name/type of object
            else:
                if prop_type == 'object':
                    prop_field = prop['title']
                else:
                    prop_field = prop_type
                new_doc += f'{type_formatter[prop_field] if prop_field in type_formatter else prop_field}'

            # Handle Classes so as not to re-copy pydantic descriptions
            if prop_type == 'object':
                if not ('required' in target_schema and prop_name in target_schema['required']):
                    new_doc += ", Optional"
                prop_desc = f":class:`{prop['title']}`"

            # Handle non-classes
            else:
                if 'default' in prop:
                    default = prop['default']
                    try:
                        # Get the explicit default value for enum classes
                        if issubclass(default, Enum):
                            default = default.value
                    except TypeError:
                        pass
                    new_doc += f", Default: {default}"
                elif not ('required' in target_schema and prop_name in target_schema['required']):
                    new_doc += ", Optional"

            # Finally, write the detailed doc string
            new_doc += "\n" + indent(prop_desc, "    ") + "\n"

    # Assign the new doc string
    target_object.__doc__ = new_doc

########################
# Start of Example block
########################


class FruitEnum(str, Enum):
    apple = "apple"
    orange = "orange"


class Taxes(BaseModel):
    """The State and Federal Taxes charged for operation"""
    state: float = 0.06
    federal: float = 0.08
    city: float = None


class FruitStandNoDoc(BaseModel):
    """
    My fruit stand that I sell various things from
    """
    fruit: FruitEnum = FruitEnum.apple
    stock: int
    price: confloat(ge=0) = 0.6
    advertising: str = None
    currently_open: bool = False
    taxes: Taxes = Taxes()


class FruitStand(BaseModel):
    """
    My fruit stand that I sell various things from
    """
    fruit: FruitEnum = Schema(
        FruitEnum.apple,
        description="The fruit which I have available at my stand"
    )
    stock: int = Schema(
        ...,
        description="How many of each fruit to keep on hand"
    )
    price: float = Schema(
        0.60,
        description="Price per piece of fruit",
        ge=0
    )
    advertising: str = Schema(
        None,
        description="Advertising message to display"
    )
    currently_open: bool = Schema(
        False,
        description="Is the fruit stand open or not?"
    )
    taxes: Taxes = Schema(
        Taxes(),
        description="Taxes charged by the state and local level"
    )


print(FruitStandNoDoc.__doc__)
print('-'*20)
print(FruitStand.__doc__)
print('-'*20)
doc_formatter(FruitStand)
print(FruitStand.__doc__)

Outputs the following lines:

    My fruit stand that I sell various things from
    
--------------------

    My fruit stand that I sell various things from
    
--------------------

My fruit stand that I sell various things from


Parameters
----------
fruit : {apple, orange}, Default: apple
    The fruit which I have available at my stand
stock : int
    How many of each fruit to keep on hand
price : float, Default: 0.6
    Price per piece of fruit
advertising : str, Optional
    Advertising message to display
currently_open : boolean, Default: False
    Is the fruit stand open or not?
taxes : Taxes, Optional
    :class:`Taxes`
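
The version-dependent lookup in doc_formatter (a bare `$ref` before 0.28, a one-element `allOf` wrapper after) can be isolated into a small helper that works on plain dictionaries. This is only a minimal sketch, not part of pydantic: `resolve_ref` and the hand-written `schema` dict are hypothetical names for illustration, and the dict merely imitates the shape that `Model.schema()` returned in those versions.

```python
def resolve_ref(prop, definitions):
    """Return the definition a property points at, or the property unchanged."""
    ref = prop.get("$ref")  # pre-0.28 style: {'$ref': '#/definitions/Taxes'}
    if ref is None:
        # post-0.28 style: {'allOf': [{'$ref': '#/definitions/Taxes'}], ...}
        all_of = prop.get("allOf")
        if (
            isinstance(all_of, list)
            and len(all_of) == 1
            and isinstance(all_of[0], dict)
            and set(all_of[0]) == {"$ref"}
        ):
            ref = all_of[0]["$ref"]
    if isinstance(ref, str) and ref.startswith("#/"):
        # Fall back to the property itself if the definition is missing.
        return definitions.get(ref.split("/")[-1], prop)
    return prop


# Hand-written stand-in for a model schema (illustrative only).
schema = {
    "properties": {
        "stock": {"title": "Stock", "type": "integer"},
        "old_taxes": {"$ref": "#/definitions/Taxes"},
        "new_taxes": {"title": "Taxes", "allOf": [{"$ref": "#/definitions/Taxes"}]},
    },
    "definitions": {"Taxes": {"title": "Taxes", "type": "object"}},
}

resolved = {
    name: resolve_ref(prop, schema["definitions"])
    for name, prop in schema["properties"].items()
}
```

Properties that do not reference another model pass through untouched, so the rest of a formatter would not need to care which pydantic version produced the schema.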

@samuelcolvin
Member

I'm not opposed to it, my questions/feedback would be:

  • how useful would this actually be? I think I only ever look at docstrings in code
  • how much will it slow things down? Big Python applications can become noticeably slow to load; we wouldn't want to slow things down by default
  • could we add a util function to create/set a docstring so it has to be called manually?
  • we could have a config parameter defaulting to no that could be no / if_missing / always

Before we add it, would anyone else want this?
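
One way to read the last two suggestions together is a helper that must be called explicitly and takes a mode. The sketch below is hypothetical (`DocMode` and `set_docstring` are made-up names, not pydantic API) and uses plain classes so it runs without pydantic; the generated text is passed in rather than derived from a schema.

```python
from enum import Enum


class DocMode(str, Enum):
    no = "no"                  # never touch __doc__ (the proposed default)
    if_missing = "if_missing"  # only fill in classes without a docstring
    always = "always"          # overwrite whatever is there


def set_docstring(cls, generated, mode=DocMode.no):
    """Assign `generated` to cls.__doc__ according to `mode`; returns cls."""
    mode = DocMode(mode)  # accepts either the enum member or its string value
    if mode is DocMode.no:
        return cls
    if mode is DocMode.if_missing and cls.__doc__:
        return cls
    cls.__doc__ = generated
    return cls


class Documented:
    """Hand-written docstring."""


class Undocumented:
    pass


set_docstring(Documented, "generated text", mode="if_missing")    # left alone
set_docstring(Undocumented, "generated text", mode="if_missing")  # filled in
```

Keeping the default at `no` means nothing changes unless the user opts in, which matches the concern about import-time cost.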

@dgasmith
Contributor

dgasmith commented Jul 3, 2019

I think this would be pretty popular as it would interface with canonical Sphinx documentation tech. Effectively auto-docs from the Schema so that you do not need to write this twice.

For speed, we could use the @property decorator so that the doc string would only be evaluated when called (often during docs generation or Jupyter notebooks).
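
A sketch of the lazy idea, under stated assumptions: a plain `property` only fires on instances (accessed on the class, it returns the property object itself), so this uses a small custom descriptor stored as the class's `__doc__` instead. `LazyDoc` and `build_doc` are hypothetical names, and `build_doc` just lists annotations as a stand-in for a schema-driven formatter such as doc_formatter above.

```python
class LazyDoc:
    """Descriptor that builds a class docstring on first access and caches it."""

    def __init__(self, builder):
        self._builder = builder  # callable: owner class -> docstring text
        self._cache = {}

    def __get__(self, obj, owner=None):
        cls = owner if owner is not None else type(obj)
        if cls not in self._cache:
            self._cache[cls] = self._builder(cls)
        return self._cache[cls]


def build_doc(cls):
    # Hypothetical formatter; counts calls so the laziness is observable.
    build_doc.calls += 1
    lines = [cls.__name__, "", "Parameters", "----------"]
    for name, tp in cls.__annotations__.items():
        lines.append(f"{name} : {getattr(tp, '__name__', str(tp))}")
    return "\n".join(lines)


build_doc.calls = 0


class FruitStand:
    stock: int
    price: float


# Nothing is generated here; the builder runs on the first __doc__ lookup.
FruitStand.__doc__ = LazyDoc(build_doc)
```

With this arrangement importing the module costs only the assignment, and tools that read `__doc__` (Sphinx, `help()`, notebooks) trigger the generation once per class.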

@StephenBrown2
Contributor

StephenBrown2 commented Jul 17, 2019

I'm quite in favor of this, as I've been beginning to create docs with pydoc-markdown for an API client I'm writing using Pydantic for data validation/parsing/coercion and such, and having to re-write my docstrings, especially for inherited models, is a bit tedious. I would gladly switch all my definitions to Schema()/Field() defs if I could get auto-generated docs for each attribute.

@samuelcolvin samuelcolvin added the Schema JSON schema label Jul 19, 2019
@dgasmith
Contributor

dgasmith commented Sep 26, 2019

See an example of the autogenerated docs here. This is something that we would still be quite interested in getting into Pydantic. It is fairly straightforward to lazily generate through metaclasses here so that there are no runtime performance penalties.

@dmontagu
Contributor

dmontagu commented Sep 26, 2019

@dgasmith For what it's worth, you could implement it without modifying metaclass by making use of __init_subclass__; that would probably be preferable (at least if we followed a similar approach in pydantic), in order to prevent downstream metaclass conflicts.

So ProtoModel would become:

class ProtoModel(BaseModel):
    def __init_subclass__(cls) -> None:
        cls.__doc__ = AutoPydanticDocGenerator(cls, always_apply=True)

and you could drop the metaclass.
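
To show the hook on its own, here is a minimal sketch that runs without pydantic. `AutoDocBase` is a hypothetical base class, and reading `__annotations__` is only a stand-in; a real model would take field metadata from pydantic instead. The hook accepts `**kwargs` and calls `super()` so it cooperates with other base classes.

```python
class AutoDocBase:
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        summary = (cls.__doc__ or "").strip()
        params = [
            f"{name} : {getattr(tp, '__name__', str(tp))}"
            for name, tp in getattr(cls, "__annotations__", {}).items()
        ]
        # NumPy-style layout, as in the toy implementation at the top of the thread.
        cls.__doc__ = "\n".join([summary, "", "Parameters", "----------", *params])


class Taxes(AutoDocBase):
    """The State and Federal Taxes charged for operation"""
    state: float = 0.06
    federal: float = 0.08
```

Because the hook runs once per subclass at class-creation time, no metaclass is involved and downstream users remain free to pick their own.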

@dgasmith
Contributor

@dmontagu Thanks! I was not aware of this.

@nicobako

@dmontagu's comment was very helpful for finding a way to autogenerate my own documentation. I really like the __init_subclass__ solution. After quite a bit of experimentation I feel like this would be a feature that doesn't have to be part of pydantic. Instead, what might be extremely useful is to simply add a little section to pydantic's documentation showing how a user could achieve this. A small example would suffice. If you decide this is a good approach I can submit a pull request.

@vlcinsky
Contributor

@mansenfranzen

mansenfranzen commented Apr 27, 2021

Since this might be related, I created a sphinx autodoc extension called autodoc_pydantic that seamlessly integrates with sphinx autodoc to document pydantic models/settings. It is mainly based upon object inspection like reading field's description or alias attributes but uses standard doc strings.

We use pydantic settings for many applications but lacked a proper way to document them appropriately within our sphinx documentation (e.g. sphinx autodoc does not show default values for fields, it is hard to distinguish validators from standard class methods etc...). It might be helpful for you, too.

@nicobako

@mansenfranzen , autodoc_pydantic looks really great! Thanks for creating that sphinx autodoc extension! I found it the other day and have already started using it in my projects! It's also definitely related to this issue!

There's one more thing, and I wonder if autodoc_pydantic can do it...

One of the ideas presented in this issue was wanting to be able to set the class's __doc__ string. That way, users can access the class's docstring by using, for example help(MyPydanticModel).
That way, if I were writing code in a jupyter notebook I can type help(MyPydanticModel) to access the docstring.

Of course, there are pros and cons to setting a class's docstring, as discussed above. Moreover, if people are using tools like autodoc_pydantic, then who in their right mind would use help(MyPydanticModel)? They would instead browse the Sphinx documentation!

Anyway, I was just curious what you think about it. Adding this capability (of setting a model's docstring) to pydantic is probably not wise. Offloading this functionality to tools like autodoc_pydantic seems much better, but now people must use Sphinx, and can't access their docstring via help()... so many pros and cons either way.

@mansenfranzen

@nicobako I'm glad you're happy with autodoc_pydantic :-).

To support the ipykernel's help() function, I like the solution you already came up with (using __init_subclass__). This works at least for pydantic models that employ this functionality.

To provide the same functionality for all pydantic models regardless of whether they have the appropriate doc string or not, you'll need to overload/patch the help function. Monkey patching existing models does not seem appropriate to me.

Patching the help function should not be too difficult (even though also not very elegant). This functionality should be moved to a separate package, though. I don't think autodoc_pydantic should provide such functionality since it is mainly concerned with sphinx docs. However, the inspection methods and doc creation logic are common to both use cases (sphinx and jupyter's help). Hence, it could be also placed in autodoc_pydantic as two separate functionalities with different dependencies like pip install autodoc_pydantic[sphinx] and pip install autodoc_pydantic[jupyter].

@nicobako

nicobako commented May 17, 2021

@mansenfranzen sorry it has taken me a long time to respond to you.

I agree with you that autodoc_pydantic shouldn't support this functionality. I also agree that this functionality, if it came into being, should not be part of pydantic. Even if it were a separate package, the truth is that it would be very difficult to standardize how a pydantic Model gets represented as a docstring... We would have endless debates about whether to use numpy-style, google-style, etc.

I recently created a small blog site, and I just created an article to cover this topic.

https://blog.nicobako.dev/articles/pydantic_autodoc.html

My hope is that it will give people the tools they need to implement custom autodoc functionality themselves -- if they need to.

@link89

link89 commented Nov 23, 2022

What's the recommended way to get this feature in 2022? I am thinking about using pydantic to validate kwargs and generate documentation for them. Do I have to build this from scratch, or are there some tools that could be used as a starting point?

@nicobako

nicobako commented Nov 28, 2022

Hey @link89, I don't think there are any tools that will do this for you, but pydantic has a really nice API, and lets you introspect your models quite easily.

I wrote up a blog post a while ago on how you could build this functionality yourself: https://blog.nicobako.dev/articles/pydantic_autodoc.html

I think you'll find building your own solution to be easy, and enjoyable!

@samuelcolvin
Member

samuelcolvin commented Nov 28, 2022 via email

@dmontagu
Contributor

I'll just note here that for compatibility with v2, you'll probably need to change from using __init_subclass__ to __pydantic_init_subclass__ since due to refactors in the metaclass, __init_subclass__ gets called before the fields are accessible. We added __pydantic_init_subclass__ explicitly to address this shortcoming, and it should function basically the same as __init_subclass__, it just doesn't get called until class creation is finished.

Because it seems that this issue has a good solution (namely, the docstring can be configured automatically in the __pydantic_init_subclass__), I'm going to close this issue now. But feel free to request it be reopened, or even better, create a new issue identifying precisely the change desired (as the target evolved a bit over the course of discussion in this issue.)
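
For reference, a minimal sketch of what that could look like, assuming pydantic v2 is installed. `AutoDocModel` is a hypothetical base class and the NumPy-style layout is just one choice; only `__pydantic_init_subclass__`, `model_fields`, and the `FieldInfo` attributes used below come from pydantic.

```python
from pydantic import BaseModel, Field


class AutoDocModel(BaseModel):
    @classmethod
    def __pydantic_init_subclass__(cls, **kwargs):
        super().__pydantic_init_subclass__(**kwargs)
        lines = [(cls.__doc__ or "").strip(), "", "Parameters", "----------"]
        # model_fields is fully populated by the time this hook runs.
        for name, field in cls.model_fields.items():
            type_name = getattr(field.annotation, "__name__", str(field.annotation))
            suffix = "" if field.is_required() else f", Default: {field.default!r}"
            lines.append(f"{name} : {type_name}{suffix}")
            if field.description:
                lines.append(f"    {field.description}")
        cls.__doc__ = "\n".join(lines)


class FruitStand(AutoDocModel):
    """My fruit stand that I sell various things from"""
    stock: int = Field(..., description="How many of each fruit to keep on hand")
    price: float = Field(0.60, description="Price per piece of fruit", ge=0)
```

Fields without a description simply get no indented line, which sidesteps the mixed documented/undocumented case listed among the known issues of the toy implementation.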

@alexmojaki
Contributor

When #6563 is released (not in 2.6.0, unfortunately) use_attribute_docstrings in the model config will copy docstrings underneath field declarations into the field description.
