Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DOCS] Unclear documentation for VietorisRipsPersistence padding #632

Open
raphaelreinauer opened this issue Apr 30, 2022 · 3 comments
Open
Labels
documentation Improvements or additions to documentation

Comments

@raphaelreinauer
Copy link
Contributor

raphaelreinauer commented Apr 30, 2022

There is unclear documentation for the padding used in VietorisRipsPersistence. The documentation says that diagrams may be padded with some points on the diagonal, but it does not say what the padding values are or how they are chosen. This cannot be confusing for users trying to understand the output of the persistence algorithm.

Xt_padded[j, end_idx_nontrivial:end_idx, :2] = [padding_value] * 2
explains that the padding points are chosen as the minimum birth ever observed in that homology dimension, but this is not clear from the documentation.

It would be helpful if the documentation for VietorisRipsPersistence clarified the padding strategy used or if the code were changed to use a more standard padding strategy such as padding with zeros.

@raphaelreinauer raphaelreinauer added the bug Something isn't working label Apr 30, 2022
@wreise
Copy link
Collaborator

wreise commented May 3, 2022

Hey @raphaelreinauer , thank you for pointing this out.

Let me provide clarification here before the docs are updated. For each dimension, we choose the minimum value which appears in any of the diagrams (it is set to zero if there is no finite value), see

min_values = [min([np.min(diagram[dim][:, 0]) if diagram[dim].size
else np.inf for diagram in Xt])
for dim in homology_dimensions]
min_values = [min_value if min_value != np.inf else 0
for min_value in min_values]

This choice is indeed not standard, but to the best of my recollection, it was made with the composition of Transformers in mind. Several transformers in gtda.diagrams use the min-max values of diagrams passed as arguments to .fit to estimate the range to discretize over. By choosing values already in the image of non-trivial points, we make sure that this range is not distorted by padding.

Please let me know if that is clear and/or convincing.

@ulupo ulupo added documentation Improvements or additions to documentation and removed bug Something isn't working labels May 3, 2022
@ulupo ulupo changed the title [BUG] Unclear documentation for VietorisRipsPersistence padding [DOCS] Unclear documentation for VietorisRipsPersistence padding May 3, 2022
@ulupo
Copy link
Collaborator

ulupo commented May 3, 2022

Excellent reply @wreise! I agree that padding was done this way for a reason, but @raphaelreinauer has a point that we could/should document it somewhere.

@raphaelreinauer
Copy link
Contributor Author

Thanks, @wreise for providing clarity on this. As @ulupo pointed out, it would be ideal if you could state the padding strategy and your reason for doing this in the docs.
For example, people familiar with Transformer Models (like the ones in used in NLP) could mistakenly assume that the diagrams are padded with zeroes as this is the most common form of padding for the input to Transformers which could lead to bugs that are hard to detect.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

No branches or pull requests

3 participants