Make sure Netlify deployments are not listed with their .netlify.com domain in Google and Co #1949
Comments
@janpio I suggest adding a robots.txt like this (deny all)
On all Netlify deployments that are deployed as subdirectories. For the rest, and in general, use canonical URLs that point to the production URL.
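For reference, a deny-all robots.txt is conventionally just two lines. The snippet below is a minimal sketch, not the file originally posted in this comment: it assumes that conventional body and checks it with Python's standard `urllib.robotparser`. The helper name `is_blocked` and the example host are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Conventional "deny all" robots.txt body (an assumption; the snippet
# originally attached to the comment is not preserved in this thread).
DENY_ALL = """User-agent: *
Disallow: /
"""


def is_blocked(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `robots_txt` forbids `agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)
```

Calling `is_blocked(DENY_ALL, "https://example.netlify.app/docs/")` is a quick way to sanity-check a file before deploying it to a preview host.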
OK, that will require collecting and classifying all the hosts/deployments and then applying these carefully so we don't mess it up accidentally (especially with the Netlify redirect fallbacks that are sometimes in use).
It looks like we have about 3000 urls that should not be indexed, based on this Google search I found that we need to remove:
I also found these public Netlify pages, I didn't know these could be public 🙃
We should do this more fundamentally and also apply this to other Netlify properties that are not listed yet, or that might be one day.
@janpio We can probably delete: Because lift.prisma.io and photonjs.prisma.io are redirecting to:
@janpio We can probably redirect https://ru.howtographql.com/ to https://www.howtographql.com/
@janpio Do we still use it or can we remove it? (https://app.netlify.com/sites/prisma-docs-till-1-33/)
@janpio https://prisma-docs.netlify.app is linked to https://github.com/graphcool/prisma-docs, but that returns a 404. Do you know where it lives now?
#1949 (comment)
Yep, go ahead and fix this, but maybe ask why this exists in the first place before changing anything.
Yes, still in use at https://github.com/prisma/v1.prisma.io/blob/master/_redirects. @2color just deployed all these, so he can do that for you if needed.
You can ask Dom for access to https://github.com/prisma/prisma-docs-v1 (which seems to be the name of the private repo that backs this). Again, @2color can deploy stuff there if needed.
I can take this over as I've recently deployed the v1 docs. Shouldn't be a problem to add the
I successfully added canonical on
Added
Google currently returns "About 1.400 results". I will check again in a few days.
https://www.google.com/webmasters/tools/removals?pli=1 might help if the correct robots.txt is in place.
I just requested the removals through your link @janpio 😃
It looks like some pages disappeared, and for the rest it will take more time or it's not from us.
Problem still exists, now with a new TLD as well:
I requested removals from Google with both http and https URLs for all of them, including .com and .app. We'll see how it ends up next week 🤷♂️ I checked https://prisma-docs-till-1-33.netlify.app/ (https://app.netlify.com/sites/prisma-docs-till-1-33/overview) and I can't find a robots.txt or a canonical, so this one needs to be fixed first, or maybe just redeployed if the fix was already done 🤔 Also, the number of results is really not accurate: the next pages are full of content that we don't own.
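Checking a deployment for a canonical link by hand gets tedious across many hosts. As a rough sketch, assuming the page HTML has already been downloaded, the check could be scripted with Python's standard `html.parser`. The names `find_canonical` and `_CanonicalFinder` are hypothetical helpers, not part of any existing tooling mentioned in this thread.

```python
from html.parser import HTMLParser
from typing import Optional


class _CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens, e.g. "canonical alternate".
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attr.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in `html`, or None if there is none."""
    finder = _CanonicalFinder()
    finder.feed(html)
    return finder.canonical
```

A result of `None` for a preview host would mean the canonical is still missing there, which matches the situation described above for prisma-docs-till-1-33.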
@2color just added https://prisma-docs-till-1-33.netlify.app/robots.txt 🎊
@janpio We can't put a robots.txt on these pages because it would also apply to the production URLs and get all of that content removed as well. For these we added a canonical URL, but it seems that Google is rather slow at picking that up, and the removal tool is not cooperative on that either 🤷♂️
Currently this can happen (note the .netlify.com domain instead of prisma.io):
Unfortunately this is not just one domain, but this is happening for many of them:
https://www.google.com/search?q=site%3Anetlify.com+prisma
We need to implement something that tells search engines like Google not to list these .netlify.com domains, while still allowing indexing of prisma.io, v1.prisma.io etc.
For many subpage deployments we can probably just add a robots.txt that forbids indexing (old docs, old homepages etc). For the main deployment of the website we will need to think of something smarter somehow.
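One possible shape for the "something smarter", sketched here purely as an assumption and not as what was actually implemented: decide per request host. Responses served under a Netlify hostname get an `X-Robots-Tag: noindex` header, while the production hostname gets none, and every page carries a canonical link to production. The function names and the suffix list below are hypothetical, and wiring this into a real deployment is out of scope for the sketch.

```python
from typing import Dict

# Hostname suffixes treated as non-production (assumed from the domains
# mentioned in this issue).
NETLIFY_SUFFIXES = (".netlify.com", ".netlify.app")


def is_netlify_host(host: str) -> bool:
    """True if the Host header value points at a Netlify default domain."""
    hostname = host.split(":", 1)[0].lower().rstrip(".")
    return hostname.endswith(NETLIFY_SUFFIXES)


def indexing_headers(host: str) -> Dict[str, str]:
    """Extra response headers: ask crawlers not to index Netlify hostnames."""
    if is_netlify_host(host):
        return {"X-Robots-Tag": "noindex"}
    return {}


def canonical_link(production_origin: str, path: str) -> str:
    """Build a canonical <link> tag that points at the production URL."""
    url = production_origin.rstrip("/") + "/" + path.lstrip("/")
    return f'<link rel="canonical" href="{url}">'
```

Unlike a shared robots.txt, a host-dependent header does not affect the same content when it is served from the production domain.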
After this is in place, we will need to tell Google to remove these pages and track the removal.
Internal discussion with possible workarounds or solutions:
https://prisma-company.slack.com/archives/C5Z9TH6N9/p1585147685000200