[🐞] 404 in Search Console reported for /~partytown/partytown-sandbox-sw.html?{timestamp} #546

Open
MTheProgrammer opened this issue Jan 31, 2024 · 7 comments
Labels
bug Something isn't working

Comments

@MTheProgrammer

MTheProgrammer commented Jan 31, 2024

Describe the bug

Hello, other people mentioned this problem, but they couldn't reproduce the bug:

In Google Search Console, every crawl reports new 404 pages for URLs of the form /~partytown/partytown-sandbox-sw.html?XXX:
image

Reproduction

https://lucidmodules.com/~partytown/partytown-sandbox-sw.html?1706003192708

Steps to reproduce

The result may vary depending on whether you have already visited this page and the worker has been installed. After clearing the browser cache, it should behave as follows.

404 page on a hard reload or first-time download:
404 page

The correct page, returned after the worker has been installed:
partytown worker content

Browser Info

Chrome

Additional Information

Maybe adding <meta name="robots" content="noindex,nofollow"> to the head would stop Googlebot from trying to index the partytown-sandbox-sw.html page.

MTheProgrammer added the bug label Jan 31, 2024
@gioboa
Collaborator

gioboa commented Jan 31, 2024

This solution makes sense to me. I don't think there is any specific reason to index this HTML. Can you change your HTML file and verify whether it works correctly?

@MTheProgrammer
Author

I've forked the repo and updated the code that generates the HTML:
5061997

In Gatsby, these files are copied from the Partytown package to the static directory:

// gatsby-node.js
const path = require("path");
const { copyLibFiles } = require("@builder.io/partytown/utils");

// Copy the Partytown library files into static/~partytown before each build.
exports.onPreBuild = async () => {
  await copyLibFiles(path.join(__dirname, "static", "~partytown"));
};

I'll check GSC in a few days to verify whether this page is still being reported.

@MTheProgrammer
Author

It doesn't seem to help:

image

@gioboa
Collaborator

gioboa commented Feb 17, 2024

@MTheProgrammer I see 🤔 maybe it is the ~partytown folder. Can you try removing the ~ from the folder name, please?

@MTheProgrammer
Author

MTheProgrammer commented Feb 24, 2024

The official documentation uses the ~partytown directory: https://partytown.builder.io/gatsby#copy-library-files
Do you mean changing the folder name everywhere it is used?

The page ~partytown/partytown-sandbox-sw.html is dynamic; the static folder contains only .js files:
image

My guess is that Googlebot crawls the page without a cache, and without a cache the request returns 404 because the Partytown worker has not been installed yet.
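One way to test this guess outside the browser is to request the sandbox URL from a client that never installs a worker. The sketch below is only an illustration: `sandboxUrl` and `coldRequestStatus` are hypothetical helper names, the numeric query string is assumed from the example URL in this issue, and the default `fetch` assumes Node 18 or newer.

```javascript
// Builds the sandbox URL in the shape reported by Search Console.
// Assumption: the query string is a millisecond timestamp, as in the
// example URL from this issue.
function sandboxUrl(origin, timestamp = Date.now()) {
  const url = new URL("/~partytown/partytown-sandbox-sw.html", origin);
  url.search = String(timestamp);
  return url.href;
}

// Requests the URL from a client that has no installed worker and
// resolves to the HTTP status code. fetchImpl can be swapped for a stub.
async function coldRequestStatus(url, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  return response.status;
}
```

Running `coldRequestStatus(sandboxUrl("https://lucidmodules.com")).then(console.log)` from Node should print the status a cold client sees. If the guess is right, that is the same 404 shown in the first screenshot.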

Every page includes an iframe that links to the Partytown sandbox page. However, rel="nofollow" is not a valid attribute on an iframe the way it is on an anchor tag, e.g. <a href="www.example.com" rel="nofollow">
image

EDIT: I'm testing a hack with an empty physical ~partytown/partytown-sandbox-sw.html file containing the noindex,nofollow directive. Once the worker is ready, it returns the correct dynamic page instead.
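A minimal sketch of how such a placeholder could be generated at build time, assuming the same static/~partytown layout as the Gatsby snippet earlier in the thread. `writeSandboxPlaceholder` and the placeholder markup are hypothetical and not part of Partytown; whether the hack actually clears the Search Console reports is still untested.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical placeholder markup: only the robots meta tag matters here.
const PLACEHOLDER_HTML = [
  "<!DOCTYPE html>",
  "<html>",
  "<head>",
  '<meta name="robots" content="noindex,nofollow">',
  "<title>Partytown sandbox placeholder</title>",
  "</head>",
  "<body></body>",
  "</html>",
  "",
].join("\n");

// Writes the placeholder into <staticDir>/~partytown and returns its path.
function writeSandboxPlaceholder(staticDir) {
  const targetDir = path.join(staticDir, "~partytown");
  fs.mkdirSync(targetDir, { recursive: true });
  const target = path.join(targetDir, "partytown-sandbox-sw.html");
  fs.writeFileSync(target, PLACEHOLDER_HTML, "utf8");
  return target;
}
```

In a Gatsby setup this could be called from onPreBuild as `writeSandboxPlaceholder(path.join(__dirname, "static"))`, probably after copyLibFiles in case that call rewrites the directory contents.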

@gioboa
Collaborator

gioboa commented Feb 24, 2024

I see, great research. So I'm wondering how to serve a different, valid HTML page to the crawler while preserving the Partytown code in the HTML 🤔

@f33w

f33w commented Apr 16, 2024

@MTheProgrammer Did applying noindex, nofollow to the script folder help? I'm facing the same issue in GSC.
