Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linkchecker croaks on specific anchors of GitHub-rendered reStructuredText documents #9016

Open
amotl opened this issue Mar 18, 2021 · 8 comments

Comments

@amotl
Copy link

amotl commented Mar 18, 2021

Dear Sphinx developers,

first things first: Thanks a stack for your paramount work on Sphinx. You already saved many souls of people writing technical documentation and probably also beyond this audience.

We just observed a minor woe with Sphinx' linkchecker we wanted to share with you. We really like that the linkchecker is able to check anchors within HTML documents as contributed by @intgr on behalf of #842.

With kind regards,
Andreas.


Describe the bug
We had the link [1] in our documentation, and, maybe after upgrading to more recent versions of Sphinx, the linkchecker suddenly started croaking on that. After changing it to [2], it worked again. When inspecting the source code of the respective HTML page, you can clearly see that the anchor name #user-content-make-changes defined by

<a name="user-content-make-changes"></a>
<a id="user-content-make-changes" class="anchor" aria-hidden="true" href="#make-changes">

is technically correct. However, it apparently has worked before by referencing #make-changes. So, we are wondering if something changed on GitHub's reStructuredText renderer or even Browsers interpreting the HTML link anchors differently. When invoking those links [1,2] in the Browser, actually both work, including navigation to the appropriate place within the page. Funny, hm?

[1] https://github.com/crate/crate-docs-theme/blob/master/DEVELOP.rst#make-changes
[2] https://github.com/crate/crate-docs-theme/blob/master/DEVELOP.rst#user-content-make-changes

Expected behavior
Technically, from the perspective we know our way around HTML, the behavior is probably the right thing and correct.

However, as we can see, something might have been changed on the HTML standard that Browsers are capable of interpreting different styles of defining link anchors. So, it might be worth to revisit this space and maybe improve the linkchecker implementation on those aspects.

Environment info

  • OS: Linux
  • Python version: 3.9.2
  • Sphinx version: 3.5.2
  • Firefox: 86.0
@tk0miya
Copy link
Member

tk0miya commented Mar 18, 2021

If my understanding is correct, the behavior of anchors in GH came from JavaScript works. There is no anchors without "user-content" prefix in the HTML output. So it's difficult to check the anchor.

Do you remember what version have you used before the update? I'd like to try to reproduce why the old version can handle it correctly.

@amotl
Copy link
Author

amotl commented Mar 18, 2021

Hi Takeshi,

thank you for your quick response. I will try to answer your question thoroughly but please bear with me that I might not know every detail.

So, as far as I am involved, I know that we are currently upgrading from Sphinx 1 to Sphinx 3. We can see from crate/crate-docs@1650a50, that we probably used Sphinx 1.7.4 beforehand.

However, I am currently not sure if something might have changed on GitHub's reStructuredText rendering. So, taking that moving target into account, maybe Sphinx isn't even responsible at all for seeing our CI croaking on this detail now. The upgrading to Sphinx 3, which revealed this, might just have been a coincidence.

There is no anchors without "user-content" prefix in the HTML output.

Exactly.

If my understanding is correct, the behavior of anchors in GH came from JavaScript works.

I already also thought about whether some JavaScript served by GitHub might be involved to make those anchors (here: #make-changes) work (again). However, I haven't investigated into this direction yet, so I can't say for sure.

So it's difficult to check the anchor.

Absolutely. I also believe that there might be nothing we can do about it. Nevertheless, I wanted to share this story as a reference point for others. Maybe it can also serve as an anchor [sic!] to point this out to the GitHub developers, to check if they had some recent changes on their content rendering engine and whether they might be keen on rethinking the respective updates.

With kind regards,
Andreas.

@amotl
Copy link
Author

amotl commented Mar 19, 2021

Hi Takeshi,

apparently, anchors with "user-content" prefix are already around for quite some time, as the references at github/markup#425, MicrosoftDocs/azure-devops-docs#6015, go-gitea/gitea#11896 and go-gitea/gitea#12062 are outlining.

There even seem to be recipes like [1,2] and software solutions like [3,4] which show a way how to apply workarounds for that dilemma.

While I still don't know why this hasn't tripped the linkchecker before on our end, as the implementation coming from e0e9d2a7 seems to be around since Sphinx 1.2 already, I quickly wanted to share those observations and findings with you.

With kind regards,
Andreas.

[1] https://gist.github.com/borekb/f83e48479aceaafa43108e021600f7e3
[2] https://github.com/sergeiudris/lab.gtihub-markdown-big-doc-header-name-uniqueness-dilemma-2020-09-05
[3] https://github.com/samjabrahams/anchorhub
[4] https://github.com/Flet/markdown-it-github-headings

@tk0miya
Copy link
Member

tk0miya commented Mar 20, 2021

I tried to check the anchors with Sphinx-1.7.4. And it also fails as the newest one does.

root@95d7d9cd8d94:/doc# make clean linkcheck
Removing everything under '_build'...
Running Sphinx v1.7.4
making output directory...
loading pickled environment... not yet created
building [mo]: targets for 0 po files that are out of date
building [linkcheck]: targets for 1 source files that are out of date
updating environment: 1 added, 0 changed, 0 removed
reading sources... [100%] index
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
writing output... [100%] index
(line    9) broken    https://github.com/crate/crate-docs-theme/blob/master/DEVELOP.rst#make-changes - Anchor 'make-changes' not found
(line    9) ok        https://github.com/crate/crate-docs-theme/blob/master/DEVELOP.rst#user-content-make-changes

build finished with problems.
make: *** [Makefile:20: linkcheck] Error 1

I created a sphinx project via sphinx-quickstart command and modified index.rst only as following:

.. p documentation master file, created by
   sphinx-quickstart on Sat Mar 20 14:18:21 2021.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to p's documentation!
=============================

https://github.com/crate/crate-docs-theme/blob/master/DEVELOP.rst#make-changes
https://github.com/crate/crate-docs-theme/blob/master/DEVELOP.rst#user-content-make-changes

I still don't understand why the linkcheck was succeeded before the update...

@amotl
Copy link
Author

amotl commented Mar 20, 2021

Hi Takeshi,

thanks for taking the time to look into this. The most probable reason why that didn't croak before is probably that we might have turned warnings into errors by using the -W command line option now. Otherwise, I wouldn't be able to explain it either.

So, I believe it is totally fine to close this issue.

However, if you feel we could improve the Sphinx linkchecker on this matter, I will be happy to submit a patch which takes that topic into account by adding the user-content- prefix as another candidate to evaluate valid anchors against, if that would be appropriate at all.

Given that users would still like to use the short anchor variant (like #make-changes) for writing down the links (as it will work on Browsers), it would be nice to have that kind of convenience that the linkchecker will not croak on this. Maybe we can attach this feature through another linkcheck_anchors configuration setting like linkcheck_anchors_prefixes?

With kind regards,
Andreas.

@tk0miya
Copy link
Member

tk0miya commented May 22, 2021

I have some contradictory opinions for this:

  • Adding a new confval to prefix the anchor is meaningless. AFAIK, this behavior is only needed for GH. No reason to be configurable.
  • It's nice to add code to modify the anchors if the baseurl is GH, instead of configuration. It must be worthy.
  • But, I hesitate to add code for the specific website to Sphinx.
    • Because it should track the change of the structure of the website (in this case, GH). It will break suddenly if the structure of the website has changed. So it's a bit fragile. It's tough to support.
    • If we give the special treatment for the specific website, it will bring another special treatment for the website. Somebody will say "Why not supporting this site?".
  • I understand GH is very commonly used website. So it's very worthy to give special treatment. And it helps many users. In other words, I think not supporting GH is a kind of defect.
    • We already have sphinx.ext.githubpages!

tk0miya added a commit to tk0miya/sphinx that referenced this issue May 22, 2021
@tk0miya tk0miya added this to the 4.1.0 milestone May 31, 2021
@tk0miya tk0miya closed this as completed in 92335bd Jun 3, 2021
tk0miya added a commit that referenced this issue Jun 3, 2021
Close #9016: linkcheck builder failed to check the anchors of github.com
@amotl
Copy link
Author

amotl commented Jun 3, 2021

Dear Takeshi,

thank you very much for #9260.

With kind regards,
Andreas.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 10, 2021
@tk0miya tk0miya reopened this Jul 18, 2021
@tk0miya
Copy link
Member

tk0miya commented Jul 18, 2021

Unfortunately, #9260 causes trouble on checking normal anchors (see #9435). So I determined to disable this feature temporarily (#9467). So I reopened this to resolve this issue again.

@tk0miya tk0miya modified the milestones: 4.1.0, 4.2.0 Jul 18, 2021
@tk0miya tk0miya modified the milestones: 4.2.0, 4.3.0 Sep 12, 2021
@tk0miya tk0miya modified the milestones: 4.3.0, 4.4.0 Nov 8, 2021
@tk0miya tk0miya modified the milestones: 4.4.0, 4.5.0 Jan 12, 2022
@tk0miya tk0miya modified the milestones: 4.5.0, 5.0.0 Mar 27, 2022
@tk0miya tk0miya modified the milestones: 5.0.0, 5.x May 22, 2022
@AA-Turner AA-Turner modified the milestones: 5.x, some future version May 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants