[DOCS] Unclear documentation for VietorisRipsPersistence padding #632

raphaelreinauer · 2022-04-30T09:22:52Z

The documentation for the padding used in VietorisRipsPersistence is unclear. It says that diagrams may be padded with some points on the diagonal, but it does not say what the padding values are or how they are chosen. This can be confusing for users trying to understand the output of the persistence algorithm.

giotto-tda/gtda/homology/_utils.py

Line 63 in 8d09a39

Xt_padded[j, end_idx_nontrivial:end_idx, :2] = [padding_value] * 2

shows that the padding points are set to the minimum birth value observed in that homology dimension, but this is not clear from the documentation.

It would be helpful if the documentation for VietorisRipsPersistence clarified the padding strategy used or if the code were changed to use a more standard padding strategy such as padding with zeros.

wreise · 2022-05-03T17:24:58Z

Hey @raphaelreinauer, thank you for pointing this out.

Let me provide clarification here before the docs are updated. For each homology dimension, we choose the minimum birth value that appears in any of the diagrams (it is set to zero if there is no finite value), see

giotto-tda/gtda/homology/_utils.py

Lines 44 to 48 in 8d09a39

    
    min_values = [min([np.min(diagram[dim][:, 0]) if diagram[dim].size
                       else np.inf for diagram in Xt])
                  for dim in homology_dimensions]
    min_values = [min_value if min_value != np.inf else 0
                  for min_value in min_values]
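To make the snippet above concrete, here is a small self-contained sketch of the same computation on toy data. The input layout (one dict per sample, mapping each homology dimension to an array of (birth, death) rows) and all numeric values are assumptions made for illustration; the actual internal layout in giotto-tda may differ.

```python
import numpy as np

homology_dimensions = (0, 1)

# Hypothetical toy input: one dict per sample, dimension -> (n_points, 2) array
# of (birth, death) pairs. Dimension 1 is empty in every sample.
Xt = [
    {0: np.array([[0.3, 0.8], [0.2, 1.1]]), 1: np.empty((0, 2))},
    {0: np.array([[0.4, 0.5]]), 1: np.empty((0, 2))},
]

# Per dimension: smallest birth value over all samples, or inf if no points.
min_values = [
    min(np.min(diagram[dim][:, 0]) if diagram[dim].size else np.inf
        for diagram in Xt)
    for dim in homology_dimensions
]

# Dimensions with no finite birth value fall back to 0.
min_values = [v if v != np.inf else 0 for v in min_values]

print(min_values)
```

In this toy case the padding value for dimension 0 is the smallest birth seen across both samples, and dimension 1 falls back to zero because it has no points at all.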

This choice is indeed not standard, but to the best of my recollection, it was made with the composition of Transformers in mind. Several transformers in gtda.diagrams use the min-max values of diagrams passed as arguments to .fit to estimate the range to discretize over. By choosing values already in the image of non-trivial points, we make sure that this range is not distorted by padding.
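The effect on the min-max range can be illustrated with a simplified sketch. The `pad` helper below is hypothetical and is not the library's implementation; it only pads toy (birth, death) diagrams in a single homology dimension to a common length with a chosen diagonal value, so that the two padding strategies can be compared.

```python
import numpy as np

# Toy diagrams in one homology dimension; the values are made up.
diagrams = [
    np.array([[0.5, 1.0], [0.7, 2.0]]),
    np.array([[0.6, 1.5]]),
]


def pad(diagrams, padding_value):
    """Pad each diagram to a common length with diagonal points (v, v)."""
    n_max = max(len(d) for d in diagrams)
    padded = np.full((len(diagrams), n_max, 2), padding_value, dtype=float)
    for j, d in enumerate(diagrams):
        padded[j, :len(d)] = d
    return padded


# Range of the non-trivial points only.
range_true = (min(d.min() for d in diagrams), max(d.max() for d in diagrams))

# Padding with the minimum birth value leaves the range unchanged.
min_birth = min(d[:, 0].min() for d in diagrams)
padded_min = pad(diagrams, min_birth)
range_min = (padded_min.min(), padded_min.max())

# Padding with zeros pulls the lower end of the range down to 0.
padded_zero = pad(diagrams, 0.0)
range_zero = (padded_zero.min(), padded_zero.max())

print(range_true, range_min, range_zero)
```

A transformer that estimates its discretization range from the padded array would see the same interval as the non-trivial points under minimum-birth padding, and a wider one under zero padding.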

Please let me know if that is clear and/or convincing.

ulupo · 2022-05-03T18:27:18Z

Excellent reply @wreise! I agree that padding was done this way for a reason, but @raphaelreinauer has a point that we could/should document it somewhere.

raphaelreinauer · 2022-05-03T19:05:22Z

Thanks, @wreise, for providing clarity on this. As @ulupo pointed out, it would be ideal if you could state the padding strategy and your reason for it in the docs.
For example, people familiar with Transformer models (like the ones used in NLP) could mistakenly assume that the diagrams are padded with zeros, since this is the most common form of padding for Transformer inputs. That assumption could lead to bugs that are hard to detect.
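One way to avoid relying on any particular padding value downstream is to detect padded points by their position on the diagonal rather than by comparing against zero. The sketch below assumes diagrams are stored as an array of shape (n_samples, n_points, 3) whose rows are (birth, death, homology dimension), and that padding points have birth equal to death; `nontrivial_mask` is a hypothetical helper, not a giotto-tda function.

```python
import numpy as np

# Toy output for a single sample; rows are (birth, death, homology_dimension).
# The last row stands in for a padding point on the diagonal.
X = np.array([[
    [0.0, 0.8, 0.0],
    [0.0, 1.2, 0.0],
    [0.9, 1.4, 1.0],
    [0.9, 0.9, 1.0],
]])


def nontrivial_mask(X):
    """Boolean mask of shape (n_samples, n_points): True where birth != death."""
    return X[:, :, 0] != X[:, :, 1]


mask = nontrivial_mask(X)

# Keep only the off-diagonal points of the first sample.
points = X[0][mask[0]]
print(points)
```

Note that this also drops any genuine point with zero persistence, which is usually harmless because such points carry no topological information.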

raphaelreinauer added the bug label Apr 30, 2022

ulupo added the documentation label and removed the bug label May 3, 2022

ulupo changed the title ~~[BUG] Unclear documentation for VietorisRipsPersistence padding~~ [DOCS] Unclear documentation for VietorisRipsPersistence padding May 3, 2022
