Duplicated site content if navigating to */index.html documents directly #9552

Closed
4 of 7 tasks
hofalk opened this issue Nov 15, 2023 · 9 comments · May be fixed by #10059
Labels
bug An error in the Docusaurus core causing instability or issues with its execution

Comments

@hofalk

hofalk commented Nov 15, 2023

Have you read the Contributing Guidelines on issues?

Prerequisites

  • I'm using the latest version of Docusaurus.
  • I have tried the npm run clear or yarn clear command.
  • I have tried rm -rf node_modules yarn.lock package-lock.json and re-installing packages.
  • I have tried creating a repro with https://new.docusaurus.io.
  • I have read the console error message carefully (if applicable).

Description

Navigation to index.html of any page on a clean docusaurus 3.0 installation behind webserver/loadbalancer renders the page twice (e.g. site.xy/index.html or site.xy/blog/index.html). Navigation to the folder url works as expected (e.g. site.xy or site.xy/blog)

I've experienced this issue behind an AWS NLB and was able to reproduce it with a clean docusaurus 3 installation on a local nginx webserver as well (see reproduce case below). We only experienced it after upgrading to 3.0 so I am assuming it's a 3.0 only issue.

The reproduction case uses docker & docker-compose, but it should also work with any locally installed & configured webserver. However, the issue does NOT occur when running a development server, nor does it occur when using npm run build && npm run serve

Reproducible demo

No response

Steps to reproduce

  1. create fresh docusaurus installation through: npx create-docusaurus@latest docusaurus-test classic --typescript and change to installation dir docusaurus-test
  2. run npm run build
  3. create a basic ./default.conf for nginx:
server {
    listen       80;
    listen  [::]:80;
    server_name  localhost;

    location / {
        root   /usr/share/nginx/html;
        index  index.html index.htm;
    }
}
  4. create a basic ./docker-compose.yaml service mounting the config and build:
version: '3'
services:
  docs:
    image: nginx:latest
    container_name: docs
    ports:
      - "8080:80"
    volumes:
      - ./default.conf:/etc/nginx/conf.d/default.conf
      - ./build:/usr/share/nginx/html
  5. run docker-compose up
  6. navigate to localhost:8080/index.html or localhost:8080/blog/index.html
  7. scroll down to see that the whole page is rendered twice

Expected behavior

The page should only be rendered once

Actual behavior

Every page accessed directly through */index.html is rendered twice.
Browser console does not show any errors.

Your environment

  • Docusaurus version used: 3.0.0 with react 18.2.0
  • Environment name and version: Chrome 119.0.6045.124, Node.js 18.13
  • Operating system and version: Ubuntu 20.04.6 LTS

Self-service

  • I'd be willing to fix this bug myself.
@hofalk hofalk added bug An error in the Docusaurus core causing instability or issues with its execution status: needs triage This issue has not been triaged by maintainers labels Nov 15, 2023
@Josh-Cena Josh-Cena added status: needs more information There is not enough information to take action on the issue. and removed status: needs triage This issue has not been triaged by maintainers labels Nov 15, 2023
@Josh-Cena
Collaborator

Very interesting! My first thought is that this is an issue with the web server that we may be able to remedy but ultimately not our fault, but we would need to experiment. Is it possible that you put up such a site somewhere so we may take a look without setting it all up ourselves?

@hofalk
Author

hofalk commented Nov 16, 2023

We are running the site on our company internal network, so I cannot provide you with a public link.
But if you don't want to setup docker on your machine, you can reproduce it with a simple express.js server as well.
Skip steps 3-5 of the reproduce case and instead:

  • run npm install express
  • create an index.js file in your installation folder as follows:
const express = require('express');
const path = require('path');
const app = express();

app.use(express.static(path.join(__dirname, 'build')));

app.get('/', async(req, res) => {
    res.sendFile(path.join(__dirname, 'build', 'index.html'));
});

app.listen(8080, () => {
    console.log("Server successfully running on port 8080");
});
  • run node index.js
  • navigate to localhost:8080/index.html or localhost:8080/blog/index.html
  • scroll down to see that the whole page is rendered twice

Note that it only occurs if you navigate explicitly to the index.html pages, the sendFile for the root url works as expected.

@hofalk
Author

hofalk commented Nov 16, 2023

I was able to further drill down on the issue by bluntly changing the generated html pages.
The issue seems to be related to the execution of the runtime~main and main javascript files:

<head>
  <!-- cut for brevity -->
  <script src="/assets/js/runtime~main.054d4dfc.js" defer="defer"></script>
  <script src="/assets/js/main.b954583d.js" defer="defer"></script>
</head>

Removing those two scripts from the html fixes the page so that it's only rendered once (probably breaking other things, but that is beside the point). It also seems to work fine if I only remove the defer attributes, though I am unsure of the implications, as from what I understand the scripts will then be executed straight away.

Diffing with a v2 build, the inclusion seems to have changed by making use of the defer attribute in favor of preloading and appending at the end of the <body> tag.

Interestingly though, if I switch back to the v2 behaviour for a v3 build by moving the html around, it still breaks:

<head>
  <!-- cut for brevity -->
  <link rel="preload" href="/assets/js/runtime~main.054d4dfc.js" as="script">
  <link rel="preload" href="/assets/js/main.b954583d.js" as="script">
</head>
<body>
  <!-- ... -->
  <script src="/assets/js/runtime~main.054d4dfc.js"></script>
  <script src="/assets/js/main.b954583d.js"></script>
</body>
<!-- Still duplicate page output -->

Hope this helps in fixing the issue

@slorber
Collaborator

slorber commented Nov 21, 2023

The problem is that you are accessing:

The problem is not the JS that we load, but the React hydration. For some reason it seems to fail to hydrate the existing markup when loaded from urls ending with index.html.

Considering you should serve your pages through their canonical urls (not using /index.html), I'm not sure we should consider this a bug, so I'm going to close it. After all maybe we don't want to fix "bugs" that encourage using antipatterns, and prefer to fail fast so that you know you are using the antipattern in the first place.

You should configure your host so that /index.html is redirected to /.

Most static hosts support this out of the box, and otherwise it should be possible to configure it. This feature is usually called "clean urls" or "pretty urls".
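For setups like the Express reproduction earlier in this thread, the redirect could be sketched as a small middleware. This is only an illustration under stated assumptions: canonicalPath and redirectIndexHtml are hypothetical names, not part of Docusaurus or Express, and the middleware assumes an Express-style req/res/next signature.

```javascript
// Hypothetical helper: returns the canonical path for a request path,
// or null when the path does not end with "/index.html".
function canonicalPath(requestPath) {
  const suffix = '/index.html';
  if (!requestPath.endsWith(suffix)) {
    return null;
  }
  // Drop "index.html" but keep the trailing slash,
  // so "/blog/index.html" becomes "/blog/".
  return requestPath.slice(0, -'index.html'.length);
}

// Express-style middleware built on the helper above.
function redirectIndexHtml(req, res, next) {
  const target = canonicalPath(req.path);
  if (target === null) {
    return next();
  }
  // Preserve the query string, if there is one.
  const queryStart = req.url.indexOf('?');
  const query = queryStart === -1 ? '' : req.url.slice(queryStart);
  res.redirect(301, target + query);
}
```

In the Express example from this thread, it would have to be registered with app.use(redirectIndexHtml) before the express.static line, since the static handler would otherwise serve /index.html directly.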

@slorber slorber closed this as not planned Won't fix, can't repro, duplicate, stale Nov 21, 2023
@slorber slorber removed the status: needs more information There is not enough information to take action on the issue. label Nov 21, 2023
@hofalk
Author

hofalk commented Nov 23, 2023

I can see how this is sufficient for most hosting setups, but it does break implementations that rely on the static content being served directly from file storage (like S3 or NFS), where you do not have a dedicated webserver sitting in front of every request.

For example: We are running our Docusaurus site behind a LoadBalancer with a few simple rules to redirect clients to the correct pages inside a S3 bucket. This is a very cost-effective way to host it, as we can re-use a single LB for as many sites as we like. But we do need to provide a fully qualified URI as there is no implicit fallback to any default pages when targeting folders.

I would even argue that for hosting a static site, the antipattern is to rely on implicit configuration over explicit URIs. And as the wording "clean urls" or "pretty urls" imply, this is more of an esthetical than a functional feature.

@Josh-Cena
Collaborator

@hofalk The point is: a path with index.html is NOT a valid path in Docusaurus' opinion. It only happens to work due to your server configuration—it serves an HTML file that's not 404 and we work from there, but the route itself is unknown to us, so anything may happen. If you click any link on the site, you will be taken to a URL without index.html anyway. We have never supported paths with .html nor do we think we would do, because it's kind of anti-React.
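To illustrate what "the route itself is unknown" can mean in practice, here is a toy sketch of exact-match route lookup. It is not Docusaurus' actual router: the route table, normalize and isKnownRoute are hypothetical and only show why a path ending in index.html may fail to match a route that the same HTML file was generated for.

```javascript
// Toy route table (hypothetical); a real site generates its routes at build time.
const knownRoutes = new Set(['/', '/blog', '/docs/intro']);

// Treat "/blog/" and "/blog" as the same route.
function normalize(pathname) {
  return pathname.length > 1 && pathname.endsWith('/')
    ? pathname.slice(0, -1)
    : pathname;
}

// A route is known only if its normalized path is in the table.
function isKnownRoute(pathname) {
  return knownRoutes.has(normalize(pathname));
}

console.log(isKnownRoute('/blog'));            // true
console.log(isKnownRoute('/blog/'));           // true
console.log(isKnownRoute('/blog/index.html')); // false
```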

@slorber
Collaborator

slorber commented Nov 23, 2023

> For example: We are running our Docusaurus site behind a LoadBalancer with a few simple rules to redirect clients to the correct pages inside a S3 bucket. This is a very cost-effective way to host it, as we can re-use a single LB for as many sites as we like. But we do need to provide a fully qualified URI as there is no implicit fallback to any default pages when targeting folders.

That doesn't look cost effective to me. A cost-effective setup is having your LB being your CDN at the edge, e.g. Vercel, Netlify, Cloudflare, or Cloudfront if you really want to stick to AWS tech

> I would even argue that for hosting a static site, the antipattern is to rely on implicit configuration over explicit URIs. And as the wording "clean urls" or "pretty urls" imply, this is more of an esthetical than a functional feature.

We rely on conventions and sensible defaults.

This is not implicit: the url of docs/intro.md is /intro, not /intro.html, according to our documentation.

If you want explicit URIs then you can use the slug frontmatter.

If you want to serve docs with a path including an html extension, you can do that. This prevents the double-rendering issue you reported:

---
slug: /intro.html
---

In other words: docs have a default url, but you can override it if you want. It is your responsibility to be explicit if the defaults do not fit your taste.

@hofalk
Author

hofalk commented Nov 27, 2023

> > For example: We are running our Docusaurus site behind a LoadBalancer with a few simple rules to redirect clients to the correct pages inside a S3 bucket. This is a very cost-effective way to host it, as we can re-use a single LB for as many sites as we like. But we do need to provide a fully qualified URI as there is no implicit fallback to any default pages when targeting folders.

> That doesn't look cost effective to me. A cost-effective setup is having your LB being your CDN at the edge, e.g. Vercel, Netlify, Cloudflare, or Cloudfront if you really want to stick to AWS tech

Except you cannot use Cloudfront for internal sites in private VPCs, as it will always use public IPs. If you know of a more cost-effective solution for that on AWS, please let me know, I am very interested.

> This is not implicit: the url of docs/intro.md is /intro, not /intro.html, according to our documentation.

What I meant was that the required configuration of the webserver to work with a production build of docusaurus/react is implicit. There is no URL for docs/intro.md. Once built, there is only a URL for docs/intro/index.html, and docusaurus/react implicitly relies on the webserver configuration to serve it from docs/intro/. Which is fine. But serving the content twice without any error message when you access it directly as docs/intro/index.html may come as a surprise to some.
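The implicit step described here, mapping a directory URL to its index.html file, is roughly what a static file server does on every request. A minimal sketch, assuming a hypothetical resolveBuildFile helper and a build directory named build (neither is part of Docusaurus):

```javascript
const path = require('node:path');

// Hypothetical helper: maps a request path to the file a static server
// would look for inside the build directory.
function resolveBuildFile(buildDir, requestPath) {
  // Simplification: any path without a file extension is treated as a
  // directory, so a directory such as "/docs/v1.2" would be misclassified.
  const isDirectory = path.posix.extname(requestPath) === '';
  const relative = isDirectory
    ? path.posix.join(requestPath, 'index.html')
    : requestPath;
  return path.posix.join(buildDir, relative);
}

// Both the directory URL and the explicit file URL resolve to the same file.
console.log(resolveBuildFile('build', '/docs/intro/'));
console.log(resolveBuildFile('build', '/docs/intro/index.html'));
```

Storage backends that serve objects by exact key skip this mapping, which is why the explicit index.html path has to appear in the URL there.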

> In other words: docs have a default url, but you can override it if you want. It is your responsibility to be explicit if the defaults do not fit your taste.

This is unfortunately not a matter of taste for us. There is simply no way to direct an ALB to S3 for serving static content, without pointing it explicitly to the file that should be served, as there is no webserver in between which we could configure to serve the index.html file of a targeted directory by default.

In any case: since you have already made clear that this regression from v2 will not be fixed, we will be moving away from the S3 setup of hosting the site towards hosting it through ECS. That will allow us to bundle it with a dedicated webserver which we can properly configure. Unfortunately it costs a little more than doing it through S3, but we think the docusaurus v3 update in general is worth it.

And last but not least: Keep up all the good work! Loving docusaurus, especially the possibility to individually pick between mdx and md parsing that you introduced in v3.

@slorber
Collaborator

slorber commented Apr 19, 2024

We'll try to fix this for V4 in #10059
