This repository has been archived by the owner on Jan 29, 2024. It is now read-only.

Canonical tags ending with .html create a redirect-canonical loop and make the pages non-indexable #1691

Open
angelinekwan opened this issue Jan 11, 2023 · 0 comments

Collaborator

angelinekwan commented Jan 11, 2023

What's wrong?

On every page of docs.aiven.io, the canonical tag points to a URL ending in .html, and that URL redirects to the extension-less version of the same page. For example, https://docs.aiven.io/docs/tools/api has a canonical tag pointing to https://docs.aiven.io/docs/tools/api.html, which redirects back to https://docs.aiven.io/docs/tools/api. This creates a redirect-canonical loop and makes the pages non-indexable.

To reproduce the issue:

Canonical tag in all pages

  1. Navigate to any page of the docs site, for example https://docs.aiven.io/, and inspect the page source for the canonical tag.
     [Screenshot 2023-05-30 at 15 38 58]

  2. Notice that the canonical URL, https://docs.aiven.io/index.html, redirects to https://docs.aiven.io

  3. Search engines treat this as a redirect loop, because the page declares a canonical URL that redirects back to the page itself. The canonical URL should be the final redirect target, in this case <link rel="canonical" href="https://docs.aiven.io">

  4. The same applies to other pages, such as https://docs.aiven.io/docs/platform, which has the wrong canonical <link rel="canonical" href="https://docs.aiven.io/docs/platform.html">. The expected canonical has no .html: <link rel="canonical" href="https://docs.aiven.io/docs/platform">
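
The check described in these steps can be sketched in Python. This is a minimal illustration, not part of the docs build: the names `CanonicalFinder`, `find_canonical`, and `canonical_matches` are hypothetical, and the function works on an HTML string you have already fetched, so it makes no network requests.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only the first canonical link is recorded.
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel:
            self.canonical = attr.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in the HTML, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


def canonical_matches(page_url: str, html: str) -> bool:
    """True if the page's canonical tag points at the page's own final URL."""
    canonical = find_canonical(html)
    if canonical is None:
        return False
    # Ignore a trailing slash so https://docs.aiven.io and
    # https://docs.aiven.io/ compare as equal.
    return canonical.rstrip("/") == page_url.rstrip("/")
```

For the `/docs/platform` page described above, `canonical_matches` returns `False` for the current `.html` canonical and `True` for the expected one.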

Sitemap issue

  1. Open the sitemap at https://docs.aiven.io/sitemap.xml and notice that every <loc> ends in .html, and that each of these URLs redirects to its non-.html path.
<url>
<loc>https://docs.aiven.io/docs/platform.html</loc>
</url>
  2. Search engines treat this as a redirect loop as well. Each <loc> should be the final redirect target, in this case:
<url>
<loc>https://docs.aiven.io/docs/platform</loc>
</url>
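
One way to get the expected `<loc>` values is to post-process the generated sitemap. The sketch below is an assumption about how that could look, not something sphinx-sitemap provides: `strip_html_suffix` and `rewrite_sitemap` are hypothetical helpers, and the namespace is the standard sitemaps.org one.

```python
import xml.etree.ElementTree as ET

# Standard sitemap namespace from sitemaps.org.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def strip_html_suffix(url: str) -> str:
    """Drop a trailing index.html or .html from a URL."""
    if url.endswith("/index.html"):
        return url[: -len("index.html")]  # keeps the trailing slash
    if url.endswith(".html"):
        return url[: -len(".html")]
    return url


def rewrite_sitemap(xml_text: str) -> str:
    """Return the sitemap XML with every <loc> made extension-less."""
    # Keep the sitemap namespace as the default one in the output.
    ET.register_namespace("", SITEMAP_NS)
    root = ET.fromstring(xml_text)
    for loc in root.iter(f"{{{SITEMAP_NS}}}loc"):
        if loc.text:
            loc.text = strip_html_suffix(loc.text.strip())
    return ET.tostring(root, encoding="unicode")
```

Note that this maps `https://docs.aiven.io/index.html` to `https://docs.aiven.io/` with a trailing slash; whether the slash should be kept is a choice to check against the actual redirect behaviour.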

Expected behaviour

URL of affected page, and any other information

The issue affects all pages. It appeared after the migration from Netlify to Cloudflare Pages on 22.12.2022.
Cloudflare Pages automatically redirects HTML pages to their extension-less counterparts: for instance, /contact.html is redirected to /contact, and /about/index.html is redirected to /about/ (see the Cloudflare Pages documentation).
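
The redirect rule quoted above can be modelled in a few lines, which makes it easy to test whether a given canonical or sitemap URL would be redirected. This is only a model of the two documented cases (`.html` and `index.html`); the function names `redirect_target` and `is_redirected` are hypothetical, and real Cloudflare Pages behaviour may cover cases not handled here.

```python
from urllib.parse import urlsplit, urlunsplit


def redirect_target(url: str) -> str:
    """Model where an .html URL ends up after the extension-less redirect."""
    parts = urlsplit(url)
    path = parts.path
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]  # /about/index.html -> /about/
    elif path.endswith(".html"):
        path = path[: -len(".html")]  # /contact.html -> /contact
    return urlunsplit(parts._replace(path=path))


def is_redirected(url: str) -> bool:
    """True if the URL would be redirected, so it is unsuitable as a canonical."""
    return redirect_target(url) != url
```

Under this model, `https://docs.aiven.io/docs/tools/api.html` is a redirected URL, while `https://docs.aiven.io/docs/tools/api` is not.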

Notes

  • The canonical URL is added automatically by Sphinx, based on html_baseurl.
  • The sitemap is generated by the Sphinx extension sphinx-sitemap, also based on html_baseurl.

Tested

Serve pages without the .html extension
Reference: -b dirhtml builds HTML pages with a single directory per document, which makes for prettier URLs (no .html) when served from a web server. However, the canonical URL still points to the .html extension, which is a known bug.

Remove the default canonical URL
Remove html_baseurl from conf.py and add a hardcoded canonical tag in _templates/base.html. Without html_baseurl the sitemap has no hostname, so this breaks the sitemap.
