Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proof on principle: linkcheck exclude matched documents #9894

Merged
merged 1 commit into from Dec 8, 2021

Conversation

croth1
Copy link
Contributor

@croth1 croth1 commented Nov 26, 2021

Subject: Proof on principle: linkcheck exclude matched documents

  • Feature

Purpose

For a use-case certain parts of the documentation are merely historical protocols which do need maintenance. We would benefit from telling linkcheck to ignore these parts of the documentation entirely.

Here I implement a system modelled after linkcheck_anchors_ignore. linkcheck_documents_ignore ignores links in documents that match one or more user-given document exclusion patterns.

I would be interested if you would consider merging this into the main repo, or whether building a custom builder for our use case is a better idea. This is at the moment merely a proof of principle, if you are planning to move on with this, I will finish it up.

@tk0miya
Copy link
Member

tk0miya commented Nov 28, 2021

I'm not sure your use-case is commonly needed. I feel broken links are not acceptable even if it's historical documents. Are there any examples? I need to know why such a feature is needed.

@croth1
Copy link
Contributor Author

croth1 commented Nov 29, 2021

Hmh, I am honestly not sure how widely useful this would be. In my research I found this user on the sphinx mailing list who seems to have another use case: https://www.mail-archive.com/sphinx-users@googlegroups.com/msg04536.html

In my case I am pondering whether to add linkcheck to our CI to ensure all links stay valid. This however comes at costs:

  • it seems we have so many links that API requests are throttled, leading to long CI times
  • we have many links in historical documents that would need to stay up-to-date which would require a significant commitment into the future.

I agree that ideally one would ensure that all links, also those in historical documents (more concretely this minutes section) stay valid. However I feel enforcing the validity of all our historical documents is just not a good use of our very limited (human) resources.

What do you think?

@francoisfreitag
Copy link
Contributor

francoisfreitag commented Nov 29, 2021

I’m unsure about this idea. My initial guts is the same as #9894 (comment), all links should be working.

this minutes section

These links look like they could reasonably be ignored with a linkcheck_ignore regexp? Something like r'^https://conda-forge\.org/docs/orga/minutes/.*\.html$'. Do you have another use case where links to very different destinations should all be ignored?


Perhaps a not-so-uncommon use case: a listing of contributors with links to their profile page page.
Some developers self-host their profile page, and it may break. Verifying the links in that document isn’t as valuable for the project, and maintaining linkcheck_ignore is a bit tedious.
OTOH, including it in the Sphinx documentation might not be as valuable either. 🤷

@croth1
Copy link
Contributor Author

croth1 commented Nov 29, 2021

Hi @francoisfreitag,

These links look like they could reasonably be ignored with a linkcheck_ignore regexp? Something like r'^https://conda-forge\.org/docs/orga/minutes/.*\.html$'. Do you have another use case where links to very different destinations should all be ignored?

Wouldn't linkcheck_ignore only work if the links are all external links? Currently we build the minutes section directly with sphinx by iterating all .md files in a dedicated folder. What I want to achieve is that linkcheck ignores all links that occur in any of the markdown files in this folder.

@francoisfreitag
Copy link
Contributor

Thanks for clarifying, I misunderstood the intent with the minutes/* documents.

@francoisfreitag
Copy link
Contributor

Because the use case seems uncommon, it’s tempting to refuse to support it to avoid feature creep.

OTOH, it has a sort symmetry with linkcheck_ignore, which makes it easy enough to explain and reason about.
I can’t find a good alternative to accomplish that behavior without this option, and the code to achieve it is simple enough.
Not verifying some links is a bit odd, but I understand that projects might want to skip on certain parts of the documentation (e.g. they keep it for legacy reasons, or a changelog with old entries, or an author list), because they don’t care as much about those links and to keep linkcheck run times in check.

I’m +0 on adding it. If @tk0miya agrees, I’ll help refine the patch.

@tk0miya
Copy link
Member

tk0miya commented Dec 4, 2021

Thank you for your clarification. The case of the minutes is a good example. I agree that it's non-sense to maintain hyperlinks of them. Okay, let's go forward.

Copy link
Member

@tk0miya tk0miya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to this, document and testcase are needed.

sphinx/builders/linkcheck.py Outdated Show resolved Hide resolved
@tk0miya tk0miya added this to the 4.4.0 milestone Dec 4, 2021
@croth1
Copy link
Contributor Author

croth1 commented Dec 4, 2021

Thanks for the heads up! I will have a look at the docs and tests later this weekend 👍

@croth1
Copy link
Contributor Author

croth1 commented Dec 6, 2021

This should now be ready for a round of reviews!

Copy link
Contributor

@francoisfreitag francoisfreitag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise looking pretty good, thanks!

doc/usage/configuration.rst Outdated Show resolved Hide resolved
CHANGES Outdated Show resolved Hide resolved
tests/test_build_linkcheck.py Outdated Show resolved Hide resolved
sphinx/builders/linkcheck.py Outdated Show resolved Hide resolved
sphinx/builders/linkcheck.py Outdated Show resolved Hide resolved
tests/roots/test-linkcheck-documents_exclude/conf.py Outdated Show resolved Hide resolved
Copy link
Contributor

@francoisfreitag francoisfreitag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple small things.

CHANGES Outdated Show resolved Hide resolved
CHANGES Outdated Show resolved Hide resolved
doc/usage/configuration.rst Outdated Show resolved Hide resolved
CHANGES Outdated Show resolved Hide resolved
tests/test_build_linkcheck.py Outdated Show resolved Hide resolved
@croth1
Copy link
Contributor Author

croth1 commented Dec 6, 2021

This is getting a bit messy :-) You will squash merge when everything is ready, so I don't have to clean up, right?

@francoisfreitag
Copy link
Contributor

You will squash merge when everything is ready, so I don't have to clean up, right?

Sure can!

@croth1
Copy link
Contributor Author

croth1 commented Dec 7, 2021

From my side this should be ready now :-)

Copy link
Member

@tk0miya tk0miya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@tk0miya
Copy link
Member

tk0miya commented Dec 7, 2021

@francoisfreitag Please merge this if you are also ok!

@francoisfreitag
Copy link
Contributor

Work has been busy, I’ll give it a look (and squash the commits) later today or tomorrow.

@francoisfreitag francoisfreitag changed the base branch from master to 4.x December 8, 2021 08:56
@francoisfreitag francoisfreitag merged commit 5851344 into sphinx-doc:4.x Dec 8, 2021
@francoisfreitag
Copy link
Contributor

Thanks!

@croth1 croth1 deleted the linkcheck_ignore_docs branch December 9, 2021 08:29
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 9, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants