From fdf0d800081970049810e553c35a775cae0976a4 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?S=C3=A9bastien=20Lorber?=
Date: Wed, 17 Aug 2022 16:41:41 +0200
Subject: [PATCH] fix(sitemap): filter all routes with robots meta containing noindex (#7964)

---
 website/versioned_docs/version-2.1.0/seo.md | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/website/versioned_docs/version-2.1.0/seo.md b/website/versioned_docs/version-2.1.0/seo.md
index 578bb7761c30..f252889cdf54 100644
--- a/website/versioned_docs/version-2.1.0/seo.md
+++ b/website/versioned_docs/version-2.1.0/seo.md
@@ -124,7 +124,11 @@ Read more about the robots file in [the Google documentation](https://developers
 
 :::caution
 
-**Important**: the `robots.txt` file does **not** prevent HTML pages from being indexed. Use `<meta name="robots" content="noindex, nofollow" />` as [page metadata](#single-page-metadata) to prevent it from appearing in search results entirely.
+**Important**: the `robots.txt` file does **not** prevent HTML pages from being indexed.
+
+To prevent your whole Docusaurus site from being indexed, use the [`noIndex`](./api/docusaurus.config.js.md#noIndex) site config. Some [hosting providers](./deployment.mdx) may also let you configure a `X-Robots-Tag: noindex` HTTP header (GitHub Pages does not support this).
+
+To prevent a single page from being indexed, use `<meta name="robots" content="noindex, nofollow" />` as [page metadata](#single-page-metadata). Read more about the [robots meta tag](https://developers.google.com/search/docs/advanced/robots/robots_meta_tag).
 
 :::
 
@@ -132,6 +136,20 @@ Read more about the robots file in [the Google documentation](https://developers
 
 Docusaurus provides the [`@docusaurus/plugin-sitemap`](./api/plugins/plugin-sitemap.md) plugin, which is shipped with `preset-classic` by default. It autogenerates a `sitemap.xml` file which will be available at `https://example.com/[baseUrl]/sitemap.xml` after the production build. This sitemap metadata helps search engine crawlers crawl your site more accurately.
 
+:::tip
+
+The sitemap plugin automatically filters pages containing a `noindex` [robots meta directive](https://developers.google.com/search/docs/advanced/robots/robots_meta_tag).
+
+For example, [`/examples/noIndex`](/examples/noIndex) is not included in the [Docusaurus sitemap.xml file](pathname:///sitemap.xml) because it contains the following [page metadata](#single-page-metadata):
+
+```html
+<head>
+  <meta name="robots" content="noindex, nofollow" />
+</head>
+```
+
+:::
+
 ## Human readable links {#human-readable-links}
 
 Docusaurus uses your file names as links, but you can always change that using slugs, see this [tutorial](./guides/docs/docs-introduction.md#document-id) for more details.
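
Note on the documented behavior: the filtering described in the tip can be pictured with a minimal sketch. This is not the plugin's actual code; the helper names `hasNoIndexDirective` and `filterIndexableRoutes`, and the `{path, metaTags}` route shape, are hypothetical and assumed only for illustration. The idea is that a route is left out of the sitemap when any of its `robots` meta tags lists a `noindex` directive.

```javascript
// Hypothetical sketch, NOT the actual @docusaurus/plugin-sitemap implementation.
// Assumption: each route carries a plain list of {name, content} meta tags.

// Returns true when any robots meta tag lists the "noindex" directive.
function hasNoIndexDirective(metaTags) {
  return metaTags.some(
    (tag) =>
      typeof tag.name === 'string' &&
      tag.name.toLowerCase() === 'robots' &&
      typeof tag.content === 'string' &&
      tag.content
        .split(',')
        .map((directive) => directive.trim().toLowerCase())
        .includes('noindex'),
  );
}

// Keeps only the routes that may appear in the sitemap.
function filterIndexableRoutes(routes) {
  return routes.filter((route) => !hasNoIndexDirective(route.metaTags ?? []));
}

// Illustrative data only.
const routes = [
  {path: '/docs/intro', metaTags: []},
  {
    path: '/examples/noIndex',
    metaTags: [{name: 'robots', content: 'noindex, nofollow'}],
  },
];

const sitemapPaths = filterIndexableRoutes(routes).map((route) => route.path);
console.log(sitemapPaths);
```

Splitting the `content` value on commas, rather than doing a substring match, keeps the check tied to whole directives, so a value such as `nofollow` alone does not exclude the page.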