Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recurse through submodules in extension modules #318

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

robamler
Copy link

@robamler robamler commented Mar 1, 2021

Fixes #319

The original method of finding submodules by recursing through the
directory structure didn't work with extension modules (aka native "C"
extensions) because there is no directory structure to recurse through.
This commit adds code to recurse through submodules by inspecting
obj.__all__, i.e., using the same mechanism that is already used for
finding variables, functions, and classes.

This is a minimalistic first sketch of a solution. It works on a native
extension that I tried it on but it introduces some duplication since
there are now two strategies to search for submodules. For a native
module, the first strategy (searching the directory structure) will fail
but the second (new) strategy (using obj.__all__) will work. For
regular modules, both strategies will likely work and so the second
strategy will just overwrite the results from the first strategy with
identical objects. This should technically work but it seems
unnecessary. I just don't feel confident enough to remove the first
strategy without further guidance.

The original method of finding submodules by recursing through the
directory structure didn't work with extension modules (aka native "C"
extensions) because there is no directory structure to recurse through.
This commit adds code to recurse through submodules by inspecting
`obj.__all__`, i.e., using the same mechanism that is already used for
finding variables, functions, and classes.

This is a minimalistic first sketch of a solution. It works on a native
extension that I tried it on but it introduces some duplication since
there are now two strategies to search for submodules. For a native
module, the first strategy (searching the directory structure) will fail
but the second (new) strategy (using `obj.__all__`) will work. For
regular modules, both strategies will likely work and so the second
strategy will just overwrite the results from the first strategy with
identical objects. This should technically work but it seems
unnecessary. I just don't feel confident enough to remove the first
strategy without further guidance.
Copy link
Member

@kernc kernc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you construct a minimal, self-contained test with a module that has no path, but contains a reference to another module, and assert both modules are discovered? Like:

EMPTY_MODULE = ModuleType('empty')

For regular modules, the second strategy will just overwrite the results

The procedure actually skips modules that have the same name as one that was already defined:

pdoc/pdoc/__init__.py

Lines 734 to 737 in aef1917

for root in iter_modules(self.obj.__path__):
# Ignore if this module was already doc'd.
if root in self.doc:
continue

Though this method seems unreliable.

Further, if I:

import .foo as bar

module foo will now be be documented as foo and as bar ... 🤔

pdoc/__init__.py Outdated Show resolved Hide resolved
pdoc/__init__.py Outdated Show resolved Hide resolved
- The submodule detection for extension modules (aka native "C"
  extensions) now runs only if the regular submodule detection didn't
  run.
- Further, the submodule detection for extension modules runs only if
  submodules are explicitly exposed via the "__all__" attribute. This is
  is because it would otherwise be difficult or impossible to reliably
  detect cycles and/or reexports of external modules.
For portability reasons, this unit test only simulates how an extension
model would look like. It doesn't actually package an extension module,
because then we'd have to provide a separte extension module for each
platform, os, and python version.
@robamler
Copy link
Author

robamler commented Mar 3, 2021

Thanks for the quick feedback! I've moved the new code into the if self.is_package - else block and added a test.

Studying the existing code for "normal" submodule discovery, I realized I should be more careful to avoid capturing cycles and reexports (not sure if they're possible in native extension modules though). So I made sure the new code only runs if the parent module explicitly exposes public members through the __all__ attribute. This restriction seems to be necessary because the is_from_this_module function doesn't work on submodules (inspect.getmodule(x) returns x rather than x's parent if x is a submodule). This restriction doesn't seem to be a problem for the native extension modules I tried, but I'd be happy to relax it if you have an idea how.

Further, if I:

import .foo as bar

module foo will now be be documented as foo and as bar ... 🤔

Is that possible with native extension modules? I know too little about Python's module system to tell.

@robamler
Copy link
Author

robamler commented Mar 3, 2021

One problem I encountered: the HTML documentation for a native extension module now places all files in the top level directory. The contents of the files shows the correct nested hierarchy of submodules but the paths for the corresponding HTML files doesn't reflect this, so paths may clash if, e.g., a submodule has the same name as its parent module. Any idea how to fix this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Submodules not showing up for (native) extension modules
2 participants