Do we need a robots.txt file to stop AI crawlers? #1032

Open
2 tasks done
ClJarvis opened this issue Oct 5, 2023 · 5 comments
Labels: Status: Assigned, Type: enhancement

Comments

@ClJarvis
Contributor

ClJarvis commented Oct 5, 2023

Is there an existing issue for this?

  • I have searched the existing issues

Type of Change

Brand new page

URL of existing page

No response

Context for content change

Do we need a robots.txt to stop AI crawlers from training on VC content? We have a lot of content here and are adding more constantly.
We also have a list of members' names and socials, plus now approximate locations. Do we need to disallow OpenAI and Bard from training on what our members are writing?

Proposed solution

I could write a robots.txt file that tells the crawlers not to read our files.

Resources that can help

No response

Collaborators

No response

Code of Conduct

  • I've read the Code of Conduct and understand my responsibilities as a member of the Virtual Coffee community
@ClJarvis added the labels "Status: Needs Triage" and "Type: Content" on Oct 5, 2023
@danieltott
Member

@ClJarvis In general, we do want robots to crawl the site so that we show up on Google, etc. However, I hadn't really thought about AI. If you can find some documentation on how to do that (telling OpenAI and Bard not to crawl while still allowing other bots), I'd definitely consider this.

@danieltott added the labels "Type: enhancement" and "Status: Discussion" and removed "Status: Needs Triage" and "Type: Content" on Oct 6, 2023
@ClJarvis
Contributor Author

ClJarvis commented Oct 7, 2023

It's my understanding that we can allow Google while blocking Bard and other AI bots. I will find the docs I used a while ago.
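A minimal sketch of how such a file could be generated, assuming one `Disallow: /` group per AI crawler and a wildcard group that keeps everything else allowed. The helper name `build_robots_txt` and the agent list are hypothetical and not exhaustive; `GPTBot` is OpenAI's crawler token and `Google-Extended` is the token Google publishes for opting out of AI training without affecting Googlebot. Each vendor's current documentation should be checked before relying on these names.

```python
# Illustrative list only (an assumption, not a complete inventory of AI crawlers).
AI_AGENTS = ["GPTBot", "Google-Extended"]


def build_robots_txt(blocked_agents):
    """Return robots.txt text that blocks the listed agents and allows all others."""
    # One group per blocked crawler: "Disallow: /" covers the whole site.
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in blocked_agents]
    # Wildcard group so search engines such as Googlebot can still crawl.
    groups.append("User-agent: *\nAllow: /")
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(AI_AGENTS))
```

Keeping the blocked agents in a list means adding another crawler later is a one-line change.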

@paceaux
Sponsor Contributor

paceaux commented Apr 30, 2024

I have updated robots.txt on my own sites to forbid the AI crawlers. I would recommend it because content on VC properties is copyrighted. The VC code is copyrighted under a Creative Commons license, so I personally would not recommend contributing our content to any AI unless we intentionally want to.

This is what I added to my sites:

User-agent: GPTBot
Disallow: /
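One way to sanity-check the rule above before deploying it is Python's standard-library `urllib.robotparser`. This is only a sketch: the URL is a placeholder, `is_allowed` is a hypothetical helper, and the parser shows how a compliant crawler would interpret the file, not whether any given bot actually honors it.

```python
from urllib.robotparser import RobotFileParser

# The two lines from the comment above.
RULES = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


URL = "https://example.com/members"  # placeholder URL, an assumption

gptbot_allowed = is_allowed(RULES, "GPTBot", URL)
googlebot_allowed = is_allowed(RULES, "Googlebot", URL)

print("GPTBot allowed:", gptbot_allowed)        # False: blocked site-wide
print("Googlebot allowed:", googlebot_allowed)  # True: no rule applies to it
```

Because no group matches Googlebot and there is no wildcard group, search crawlers remain unaffected by this rule.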

@danieltott
Member

@paceaux That sounds reasonable. Are there any others aside from GPTBot?

Do you think you could make a PR for us?

@paceaux
Sponsor Contributor

paceaux commented Apr 30, 2024

@danieltott Yes, there are a few others. And sure, I'll do a PR.

@danieltott added the label "Status: Assigned" and removed "Status: Discussion" on Apr 30, 2024
3 participants