Make sure Netlify deployments are not listed with their .netlify.com domain in Google and Co #1949

Closed
janpio opened this issue Mar 27, 2020 · 22 comments

@janpio
Member

janpio commented Mar 27, 2020

Currently this can happen (note the .netlify.com domain instead of prisma.io):
image

Unfortunately, this is not limited to one domain; it is happening for many of them:
https://www.google.com/search?q=site%3Anetlify.com+prisma
image


We need to implement something that tells search engines like Google not to list these .netlify.com domains, while still allowing indexing of prisma.io, v1.prisma.io, etc.

For many subpage deployments (old docs, old homepages, etc.) we can probably just add a robots.txt that forbids indexing. For the main deployment of the website we will need something smarter.

After this is in place, we will need to tell Google to remove these pages and track the removal.


Internal discussion with possible workarounds or solutions:
https://prisma-company.slack.com/archives/C5Z9TH6N9/p1585147685000200

@janpio janpio self-assigned this Apr 9, 2020
@janpio janpio changed the title from "Fix Prisma 2 Docs Netlify in Google" to "Make sure Netlify deployments are not listed with their .netlify.com domain in Google and Co" Apr 11, 2020
@janpio janpio removed their assignment Apr 11, 2020
@janpio janpio added this to the Beta 3 milestone Apr 11, 2020
@Jolg42
Member

Jolg42 commented Apr 14, 2020

@janpio I suggest adding a robots.txt like this (deny all)

User-agent: *
Disallow: /

This would go on all Netlify deployments that are deployed as subdirectories.

For the rest, and in general, use canonical URLs that point to the production URL.
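A quick way to sanity-check a deny-all robots.txt before deploying it is to run it through a parser. The sketch below uses Python's standard `urllib.robotparser`; the helper name `is_blocked` and the `Googlebot` user agent are assumptions for illustration, and real crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# The deny-all file suggested above.
DENY_ALL = """\
User-agent: *
Disallow: /
"""


def is_blocked(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if `robots_txt` forbids `user_agent` from fetching `url`."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines; no network access needed.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)
```

With this file, `is_blocked(DENY_ALL, "https://prisma-docs.netlify.app/some/page")` should be true for any path, which is the intended effect for deployments that should not be crawled at all.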

@janpio
Member Author

janpio commented Apr 14, 2020

OK, that will require collecting and classifying all the hosts/deployments and then applying these carefully so we don't accidentally break anything (especially with the Netlify redirect fallbacks that are sometimes in use).

@Jolg42
Member

Jolg42 commented Apr 28, 2020

For Netlify, it looks like setting the logs to private makes the pages private, so I did that for most of them, except https://app.netlify.com/sites/graphql-playground/deploys, as there is some activity there.

Screen Shot 2020-04-28 at 15 32 27

@janpio
Member Author

janpio commented Apr 28, 2020

Based on this Google search I found that we need to remove:

We should do this more fundamentally and also apply it to other Netlify properties that are not listed yet but might be one day.

@Jolg42
Member

Jolg42 commented Apr 28, 2020

@Jolg42
Member

Jolg42 commented Apr 28, 2020

@janpio We can probably redirect https://ru.howtographql.com/ to https://www.howtographql.com/

@Jolg42
Member

Jolg42 commented Apr 28, 2020

@janpio Do we still use it or can we remove it? (https://app.netlify.com/sites/prisma-docs-till-1-33/)
https://prisma-docs-till-1-33.netlify.app/
If we still use it, I suggest adding a robots.txt, but as it's a manual deploy not linked to a Git repository, I have no idea who should do it.

@Jolg42
Member

Jolg42 commented Apr 28, 2020

@janpio https://prisma-docs.netlify.app is linked to https://github.com/graphcool/prisma-docs but 404, do you know where it lives now?

@janpio
Member Author

janpio commented Apr 28, 2020

@janpio We can probably delete:
app.netlify.com/sites/homepage-v7-photonjs/overview
app.netlify.com/sites/homepage-v7-lift/overview

Because lift.prisma.io and photonjs.prisma.io are redirecting to:
2.0.0-preview020 (release)

#1949 (comment)
I set a password (WiFi password) for these: https://homepage-v7-photonjs.netlify.app/ instead of deleting, because we might want to look at these ourselves.
Not sure how Google will deal with this though.
Alternatively, of course, we can again add a robots.txt.

@janpio We can probably redirect ru.howtographql.com to howtographql.com

Yep, go ahead and fix this, but maybe ask why this exists in the first place before changing anything.

@janpio Do we still use it or can we remove it? (app.netlify.com/sites/prisma-docs-till-1-33)
prisma-docs-till-1-33.netlify.app
If we still use it, I suggest adding a robots.txt, but as it's a manual deploy not linked to a Git repository, I have no idea who should do it.

Yes, it is still in use at https://github.com/prisma/v1.prisma.io/blob/master/_redirects. @2color just deployed all of these and can do that for you if needed.

@janpio prisma-docs.netlify.app is linked to graphcool/prisma-docs but 404, do you know where it lives now?

You can ask Dom for access to https://github.com/prisma/prisma-docs-v1 (which seems to be the private repo that backs this). Again, @2color can deploy stuff there if needed.

@2color
Contributor

2color commented Apr 28, 2020

I can take this over, as I've recently deployed the v1 docs. It shouldn't be a problem to add the robots.txt file.

@Jolg42
Member

Jolg42 commented Apr 29, 2020

I successfully added a canonical URL on https://www.howtographql.com/ 🎊
Fun session with netlify deploys 😅
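To confirm that a deployed page actually carries the canonical link, something like the following could be run against the fetched HTML. This is a minimal sketch using only Python's standard library; the helper names `canonical_url` and `points_to_host` are made up for illustration, and it only looks at `<link rel="canonical">` tags (not HTTP `Link` headers).

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urlparse


class _CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens, or be missing entirely.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attr.get("href")


def canonical_url(html: str) -> Optional[str]:
    """Return the canonical URL declared in `html`, or None if there is none."""
    finder = _CanonicalFinder()
    finder.feed(html)
    return finder.canonical


def points_to_host(html: str, expected_host: str) -> bool:
    """True if the page declares a canonical URL on `expected_host`."""
    url = canonical_url(html)
    return url is not None and urlparse(url).hostname == expected_host
```

For a page served from a Netlify subdomain, `points_to_host(html, "www.howtographql.com")` being true would indicate that search engines are being pointed at the production host.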

@matthewmueller matthewmueller removed their assignment Apr 30, 2020
@2color
Contributor

2color commented Apr 30, 2020

Added robots.txt to https://prisma-docs.netlify.app/robots.txt

@Jolg42
Member

Jolg42 commented May 4, 2020

Google currently returns "About 1.400 results". I will check again in a few days.

@janpio
Member Author

janpio commented May 4, 2020

https://www.google.com/webmasters/tools/removals?pli=1 might help if the correct robots.txt is in place.

@Jolg42
Member

Jolg42 commented May 12, 2020

I just requested the removals through your link @janpio 😃

@janpio janpio modified the milestones: Beta 5, Beta 6 May 12, 2020
@Jolg42
Member

Jolg42 commented May 18, 2020

It looks like some pages have disappeared; the rest will either take more time or are not ours.

@Jolg42 Jolg42 closed this as completed May 18, 2020
@janpio
Member Author

janpio commented May 19, 2020

Problem still exists, now with a new TLD as well:
https://www.google.com/search?q=site%3Anetlify.app+prisma - About 1.290 results
https://www.google.com/search?q=site%3Anetlify.com+prisma - About 1.270 results

@janpio janpio reopened this May 19, 2020
@Jolg42
Member

Jolg42 commented May 20, 2020

I requested removals from Google with both http and https URLs for all of them, including .com and .app. We'll see how it ends up next week 🤷‍♂️

I checked https://prisma-docs-till-1-33.netlify.app/ (https://app.netlify.com/sites/prisma-docs-till-1-33/overview) and I can't find a robots.txt or a canonical URL, so this one needs to be fixed first, or maybe just redeployed if the fix was already done 🤔

Also, the number of results is really not accurate: the later pages are full of content that we don't own.
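Since each removal request had to cover both schemes and both Netlify domains, the list of URLs to submit can be generated rather than typed by hand. A small sketch, assuming the site names are known up front; the function name `removal_candidates` is hypothetical, and this only builds the list (the removal tool itself is a manual web form).

```python
from itertools import product
from typing import Iterable, List


def removal_candidates(
    site_names: Iterable[str],
    schemes: Iterable[str] = ("http", "https"),
    domains: Iterable[str] = ("netlify.com", "netlify.app"),
) -> List[str]:
    """Build every scheme/domain variant of each Netlify site's root URL."""
    return [
        f"{scheme}://{name}.{domain}/"
        for name, scheme, domain in product(site_names, schemes, domains)
    ]
```

For example, `removal_candidates(["prisma-docs", "prisma-docs-till-1-33"])` yields eight URLs, four per site.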

@janpio janpio modified the milestones: Beta 6, Beta 7, Beta 8 May 26, 2020
@Jolg42
Member

Jolg42 commented Jun 2, 2020

@2color just added https://prisma-docs-till-1-33.netlify.app/robots.txt 🎊
I'm requesting deletion from Google for it and closing this for good: the other results have a canonical URL, and requesting a Google update doesn't work, so they will just disappear one day 🤷‍♂️

@Jolg42 Jolg42 closed this as completed Jun 2, 2020
@janpio
Member Author

janpio commented Jun 2, 2020

@janpio janpio reopened this Jun 2, 2020
@Jolg42
Member

Jolg42 commented Jun 4, 2020

@janpio We can't put a robots.txt on these pages because it would also apply to the production URLs and take all of that content out of the search results.

For these we added a canonical URL, but it seems that Google is kind of slow at picking that up, and the removal tool is not cooperative on that either 🤷‍♂️
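The constraint can be illustrated in a few lines: robots.txt rules match on the URL path, so a single file shipped with a deploy gives the same verdict under every hostname that serves that deploy. The sketch below uses Python's standard `urllib.robotparser` as a stand-in for a real crawler; the function name and both hostnames are hypothetical.

```python
from typing import Dict, Iterable
from urllib.robotparser import RobotFileParser


def blocked_by_host(
    robots_txt: str,
    hosts: Iterable[str],
    path: str = "/",
    user_agent: str = "Googlebot",
) -> Dict[str, bool]:
    """Evaluate one robots.txt against the same path on several hostnames."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        host: not parser.can_fetch(user_agent, f"https://{host}{path}")
        for host in hosts
    }
```

Calling it with a deny-all file and the hosts `["example-site.netlify.app", "www.example.com"]` marks both as blocked, which is why the canonical URL was used for these deployments instead.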

@Jolg42 Jolg42 closed this as completed Jun 4, 2020
@janpio janpio modified the milestones: Beta 8, Beta new 8 Jun 4, 2020