Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sphinx AutoSummary to automate API documentation #5082

Closed
wants to merge 8 commits into from

Conversation

Sayam753
Copy link
Member

@Sayam753 Sayam753 commented Oct 16, 2021

Right now, we have many rST files here that use references to classes/functions to generate API documentation. It is hard to regularly sync these files with the codebase. So, this PR uses sphinx autosummary extension to automatically generate API documentation.

Details -

  • By default, sphinx autosummary extension cannot parse the codebase recursively. This PR is based on a stackoverflow answer that explains how to tweak autosummary extension. I recommend going through the linked stackoverflow post.
  • The recursive parsing would expose all the functions/classes in the API documentation, which may not be a good. Instead, I opted to expose the functions/classes present in __all__ of any python module.
    • On the implementation side, sphinx does not provide an option to use __all__ to expose stuff, find here the Feature Request Issue and related PR. This feature will be available in next sphinx release. Till the time, I have added the PR changes to sphinx_autosummary/custom_generate.py and its working.
    • If __all__ is not present in python module, then default behaviour is to expose all the functions/classes present in that module.
  • All rST files present in docs/source/api are deleted.
  • The rST files will still be created automatically for each build, so no need to track them. That's why I added a line to .gitignore

Note - This PR is targeted to doc_skeleton_update branch that @martinacantaro is actively working on. I thought to directly build on top of Martina's work.

TODOs

  • Increase depth of left sidebar
  • Add information from previous rST files to related python modules
  • Adjust API landing page to directly show modules under pymc
  • Make the API docs look consistent with existing ones

OriolAbril and others added 4 commits October 11, 2021 21:38
working and hopefully illustrative doc skeleton

update requirements

WIP
Change log:
- Upload new PyMC logo
- Updated conf.py to pymc instead of pymc3
- Start configuring and populating index page
- Start configuring and populating learning page
- Start configuring and populating api page
- Updated API to follow the docs/Architecture.png layout
- Start configuring and populating learning page
- Fix typo in filename of "index legacy"

add community page, other minor changes

Continue restructuring the API
Delete conflicting files

example doc restructure

update requirements

update requirements and gitignore

some fixes

add changes introduced by pre-commit checks

fix community page title

update api for distributions

update gaussian process api

fix end of file

more changes

sponsor logos

sponsor logos cleanup, other changes
@codecov
Copy link

codecov bot commented Oct 16, 2021

Codecov Report

❗ No coverage uploaded for pull request base (doc_skeleton_update@46532c2). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@                  Coverage Diff                   @@
##             doc_skeleton_update    #5082   +/-   ##
======================================================
  Coverage                       ?   78.41%           
======================================================
  Files                          ?      131           
  Lines                          ?    24464           
  Branches                       ?        0           
======================================================
  Hits                           ?    19184           
  Misses                         ?     5280           
  Partials                       ?        0           

@Sayam753
Copy link
Member Author

There are a few rST files in docs/source/api that contain additional information related to functions/classes apart from docstrings. So, time to backport.

Copy link
Member

@OriolAbril OriolAbril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After taking a look at the issue and PR it looks like that will soon be available via sphinx core so generating an extension for such a short time is probably an overkill. Let's keep it like this for now and integrate with sphinx when possible.

There also might not be an alternative to the parent api page having only pymc linked there but we might be able to make that page an orphan and link directly to api/pymc from the navbar in the homepage toctree 🤔

docs/source/api.rst Outdated Show resolved Hide resolved
autosummary_generate = True

## Overwrite the autosummary generator function with custom one
generate.generate_autosummary_content = custom_generate.generate_autosummary_content
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see that line in the stackoverflow answer nor on the linked example, why is it needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sphinx autosummary extension uses the function generate_autosummary_content to generate documentation.

I am more interested to show documentation of functions/classes listed in __all__ of each python module. So, I have a custom implementation of this function in the file docs/source/sphinx_autosummary/custom_generate.py. And I overwrite the default function at run time.

Definitely not a good practice, but it serves our purpose till the time the related sphinx PR gets merged.

@@ -12,6 +12,13 @@
# See the License for the specific language governing permissions and
# limitations under the License.

"""
This submodule contains various mathematical functions.
Most of them are imported directly from aesara.tensor (see there for more details).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we link to aesara.tensor?

@martinacantaro
Copy link
Contributor

martinacantaro commented Oct 24, 2021

Hi! Thank you Sayam!

Regarding the TODOs:

Increase depth of left sidebar

You mean toctree depth, right?

Add information from previous rST files to related python modules

And this would mean adding all the information from the rST files that we don't want to lose in the python module docstrings so that autosummary picks it up right?

So I'm thinking next steps would be, based on the v4 changelog:

  • Include everything that needs to show up in API docs in all
  • Add the corresponding docstrings

@OriolAbril
Copy link
Member

OriolAbril commented Oct 25, 2021

And this would mean adding all the information from the rST files that we don't want to lose in the python module docstrings so that autosummary picks it up right?

I think @Sayam753 did that already

Include everything that needs to show up in API docs in all
Add the corresponding docstrings

I agree someone should double check, but I think docstrings have been kept mostly up to date as they are right side by side with the actual functions. I would also check them outside this PR as that should probably be done by the person who wrote the functions or someone who knows about it, docstring changes automatically reflect on the website

@Sayam753
Copy link
Member Author

Thanks for the review Oriol and Martina. I'll get back to completing the remaining TODOs in the coming week :)

@OriolAbril
Copy link
Member

Just realized there is a potential issue with this approach. What are the correct import paths for each function?

Is everything available directly as pymc.? If not, what is and what isn't?

For example sample now appears inside the sampling module but I don't think we want anyone to use pymc.sampling.sample instead of pymc.sample

@Sayam753
Copy link
Member Author

Sayam753 commented Nov 6, 2021

I think the recursive parsing implemented in autosummary extension does not care if functions/classes are exposed any level up in package hierarchy.

As an example, the sample function is implemented in pymc/sampling.py. It is also exposed at pymc namespace.

For current stable docs, even the sample function is mentioned as pymc.sampling.sample. But all the times, we use pymc.sample

So, the problem is where to show documentation when functions/classes are also exposed at different namespaces.

In this PR, the docs are shown where the functions and classes are implemented. I think the stable versioned docs also follow the same.

@Sayam753
Copy link
Member Author

Sayam753 commented Nov 6, 2021

What are the correct import paths for each function?

Nothing is changed. Everything will be accessed the same way it was accessed before.

Is everything available directly as pymc.? If not, what is and what isn't?

No. Just look up pymc.__init__.py to find what all stuff is exposed at pymc namespace.

@Sayam753
Copy link
Member Author

Sayam753 commented Nov 6, 2021

Thinking more of this -

So, the problem is where to show documentation when functions/classes are also exposed at different namespaces.

I have a simple solution.

The documentation of functions/classes are shown where they are implemented i.e. at complete paths. We can mention in API quickstart notebook that commonly used functions/classes are exposed at pymc namespace.

Thoughts?

@twiecki
Copy link
Member

twiecki commented Nov 7, 2021

@Sayam753 Yes, that seems like the only logical solution to me.

@OriolAbril
Copy link
Member

I hadn't realized that our current docs do that. In my opinion, the number one priority of documentation is serving the users, and users do not care about where things are implemented, only how to use them.

Automating everything is a good argument to document everything where it is implemented, but we should then try and document which functions are available as part of the pymc namespace. (looping over dir(pymc) and excluding elements that start with underscore or are modules?

I think now with v4 it's particularly important to document what is available directly in pymc. as we have changed many things that are quite common like moving logp as a method to pymc.logp

@Sayam753
Copy link
Member Author

Sayam753 commented Nov 7, 2021

Great opinions @OriolAbril . I also agree to user-serving purpose of the documentation.

So, I'll switch to show documentation for classes/functions at pymc namespace and look how to use sphinx for this.

@OriolAbril
Copy link
Member

We might be able to combine the two approaches, have an extra page called 'pymc namespace"or something like this that only lists functions and links to their actual api page. My main concern is that if we only have complete import paths, what lives in the pymc namespace becomes undocumented, even though I think the main reason we expose everything there is to make user's life easier. As long as we manage to document it it's fine, it doesn't need to be necessarily showing the recommended import paths on the api page

@OriolAbril
Copy link
Member

Just realized an extra drawback of documenting on where things are implemented and not where we expose them. I am setting up sphinx-codeautolink for pymc-examples, which automatically adds links from code blocks to their respective API pages. That means that all pymc funcitions will be clickable and will link to API docs. However, we use pm.Normal, pm.sample... but they are documented in pm.distributions.continuous.Normal, pm.sampling.sample and so it can't add the api links atomatically.

If you take a look at the preview at https://pymc-examples--239.org.readthedocs.build/en/239/splines/spline.html for example, you see how all libraries for which we have intersphinx configured (arviz, numpy, pandas, matplotlib) have their functions and classes linked (not methods though) but pymc objects don't.

@OriolAbril
Copy link
Member

I have been thinking a bit but don't really understand how i works yet and can't think of a good alternative. I did realize though that https://pymc-examples.readthedocs.io/en/latest/gaussian_processes/GP-Heteroskedastic.html#heteroskedastic-gp-latent-variance-model for example has working codeautolink for the gp.cov.ExpQuad which I think is a good example of what we could get if documenting everything at the recommended import path (note that the link isn't working right now but this has already been fixed in the development version of codeautolink).

Would it be possible for example to use the exploring code to define the hierarchy but to document things only on some user provided set of modules? Those could be pymc, pymc.gp, pymc.gp.cov...

I also don't want anyone to dedicate too much time to this as writing those api pages is a bit of work, but is generally a once per major version thing that doesn't need to be updated unless new functions are added, and when it is updated is 1 line, 1 function name worth of characters only

@Sayam753
Copy link
Member Author

Would it be possible for example to use the exploring code to define the hierarchy but to document things only on some user provided set of modules?

I need to lookup how recursive parsing is done in autosummary extension to answer this.

@Sayam753
Copy link
Member Author

Hi. I tried looking into autosummary codebase to figure out sidebar issues. I think it will require a lot more time for me to wrap my head around autosummary and sphinx.
My exams are coming up. And I will able to revisit here only after mid December. So, if anyone wishes, feel free to carry the work forward.

@twiecki
Copy link
Member

twiecki commented Nov 18, 2021

@OriolAbril What do you think we should do here?

@Sayam753
Copy link
Member Author

Now, I think more straight forward and less time consuming way would be to update the rST files manually with all the new stuff that got into v4. I can take a stab doing that early next week.

And we can revisit here once if needed in future.

@OriolAbril
Copy link
Member

Agreed, I can probably update #5059 directly with manual API files and make sure we use "recommended" import path. I might need help knowing what the "recommended" import path is in some cases though.

@OriolAbril
Copy link
Member

closing for now, we'll discuss in the doc meeting how to divide the todos left from #5059 which include api docs

@OriolAbril OriolAbril closed this Nov 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants