Do we need a robots.txt file to stop AI crawlers? #1032

ClJarvis · 2023-10-05T18:07:30Z

Is there an existing issue for this?

I have searched the existing issues

Type of Change

Brand new page

URL of existing page

No response

Context for content change

Do we need a robots.txt to stop AI crawlers from training on VC content? We have a lot of content here and are adding more constantly.
We also have a list of members' names and socials, plus now approximate locations. Do we need to disallow OpenAI and Bard from training on what our members are writing?

Proposed solution

I could write a robots.txt file that tells the crawlers not to read our files.

Resources that can help

No response

Collaborators

No response

Code of Conduct

I've read the Code of Conduct and understand my responsibilities as a member of the Virtual Coffee community

danieltott · 2023-10-06T15:05:18Z

@ClJarvis in general, we do want robots to crawl the site so that we show up on Google etc. However, I hadn't really thought about AI. If you can find some documentation on how to do that (telling OpenAI and Bard not to crawl while allowing other bots), I'd definitely consider this.

ClJarvis · 2023-10-07T00:06:37Z

It's my understanding that we can allow Google while blocking Bard/AI bots. I will find the docs I used a while ago.

paceaux · 2024-04-30T13:26:00Z

I have updated robots.txt on my own sites to forbid the AI crawlers. I would recommend doing the same here because content on VC properties is copyrighted. The VC Code is copyrighted under Creative Commons, so I personally would not recommend contributing our content to any AI unless we intentionally want to.

This is what I added to my sites:

User-agent: GPTBot
Disallow: /
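
A quick way to sanity-check a rule like this is Python's standard-library `urllib.robotparser`. The sketch below is only an illustration: the `Googlebot` agent name and the example.com URL are assumptions for the demo, not something from this thread, and real crawlers may interpret rules slightly differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# The rule from the comment above, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URL, used only for illustration.
PAGE = "https://example.com/members"

gptbot_allowed = is_allowed(ROBOTS_TXT, "GPTBot", PAGE)
googlebot_allowed = is_allowed(ROBOTS_TXT, "Googlebot", PAGE)
print(gptbot_allowed, googlebot_allowed)  # prints: False True
```

Because the file has no `User-agent: *` group, any agent not named in it is allowed by default, which is why search crawlers are unaffected. Keep in mind that robots.txt is advisory: it only stops crawlers that choose to honor it.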

danieltott · 2024-04-30T13:40:25Z

@paceaux that sounds reasonable. Are there any others aside from GPTBot?

Do you think you could make a PR for us?

paceaux · 2024-04-30T14:52:03Z

@danieltott yes, there are a few others. And sure, I'll do a PR.
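
For several crawlers, the file is just one group per agent. Here is a minimal sketch of generating it, assuming a list of agent tokens: `GPTBot` comes from this thread, while `Google-Extended` and `CCBot` are my assumptions about which other tokens might be relevant and should be checked against each operator's current documentation. The helper name `build_robots_txt` is hypothetical.

```python
# Assumed list of AI-related user agent tokens. Only GPTBot is named in
# this thread; verify the others before relying on them.
AI_AGENTS = ["GPTBot", "Google-Extended", "CCBot"]


def build_robots_txt(blocked_agents: list[str]) -> str:
    """Build robots.txt text that blocks each listed agent and allows all others."""
    groups = [f"User-agent: {agent}\nDisallow: /\n" for agent in blocked_agents]
    # An empty Disallow value means nothing is disallowed for everyone else.
    groups.append("User-agent: *\nDisallow:\n")
    return "\n".join(groups)


robots_txt = build_robots_txt(AI_AGENTS)
print(robots_txt)
```

The explicit `User-agent: *` group with an empty `Disallow:` is optional, since unlisted agents are allowed by default, but it makes the intent (search engines stay welcome) obvious to anyone reading the file.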

ClJarvis added the labels Status: Needs Triage and Type: Content on Oct 5, 2023

danieltott added the labels Type: enhancement and Status: Discussion, and removed Status: Needs Triage and Type: Content, on Oct 6, 2023

danieltott assigned paceaux on Apr 30, 2024

danieltott added the label Status: Assigned and removed Status: Discussion on Apr 30, 2024
