Find ways to limit and/or block LLM/AI bots scraping our content #51

Ambient-Impact · 2024-03-01T13:19:14Z

Compliance with robots.txt is purely voluntary, so there's no realistic guarantee that most or all bots will adhere to it. That leaves fully blocking their requests on our end as the main option. In a perfect world, this responsibility would fall on the organizations running these bots, but I digress.

Requirements

Block requests from most known bots, as well as from unknown bots that misbehave.
Do not block requests from bots that behave themselves, e.g. the Internet Archive's crawler or community tools that capture page snapshots for gameplay purposes.
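Taken together, the two requirements above amount to a classification rule: let allowlisted crawlers through, refuse known AI crawlers, and rate-limit everything else so that unknown but misbehaving bots get cut off. Below is a minimal Python sketch of that rule. It assumes user-agent substring matching and a per-IP sliding-window limit. The token lists, the thresholds, and the `RequestFilter` class are hypothetical illustrations, not an existing Drupal or Cloudflare API. User-agent strings are trivially spoofed, so a real deployment would likely need stronger verification (such as reverse DNS checks) before trusting an allowlisted name.

```python
from collections import defaultdict, deque

# Hypothetical, non-exhaustive user-agent substrings used by AI/LLM crawlers.
# Check each vendor's current documentation before relying on these names.
AI_BOT_TOKENS = ("GPTBot", "CCBot", "ClaudeBot", "Bytespider", "PerplexityBot")

# Hypothetical allowlist of well-behaved crawlers, e.g. the Internet Archive's.
ALLOWED_BOT_TOKENS = ("archive.org_bot",)


class RequestFilter:
    """Classifies a request as "allow" or "block" (illustrative sketch only)."""

    def __init__(self, max_requests=60, window_seconds=60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Timestamps of recently allowed requests, keyed by client IP.
        self._hits = defaultdict(deque)

    def classify(self, ip, user_agent, now):
        ua = user_agent.lower()
        # Allowlisted bots are never blocked (note: UA strings can be spoofed).
        if any(token.lower() in ua for token in ALLOWED_BOT_TOKENS):
            return "allow"
        # Known AI crawlers are blocked outright, whether or not they obey robots.txt.
        if any(token.lower() in ua for token in AI_BOT_TOKENS):
            return "block"
        # Everyone else is subject to a sliding-window rate limit per IP.
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            # Blocked requests are not recorded, so the client regains
            # access once its earlier requests leave the window.
            return "block"
        hits.append(now)
        return "allow"
```

Passing `now` in explicitly (rather than reading the clock inside `classify`) keeps the sketch deterministic and easy to test; a real request handler would pass something like `time.monotonic()`.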

Solutions

A combination of methods may be needed: Cloudflare rules, robots.txt rules (for bots that obey them), and Drupal/PHP-level blocking via contrib and/or custom Drupal modules.
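The robots.txt part of this is cheap to generate, even though it only helps with crawlers that choose to honour it. Below is a minimal Python sketch that assumes a hand-maintained list of AI crawler tokens. The names are examples only and should be checked against each vendor's documentation, and `build_robots_txt` and `is_allowed` are hypothetical helpers. The sketch builds a robots.txt that disallows those agents and allows everyone else. It then uses the standard library's `urllib.robotparser` to check the result the way a compliant crawler would. That check runs in the crawler's own code, which is exactly why compliance is voluntary.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical list of AI crawler robots.txt tokens; verify current names with each vendor.
AI_BOT_TOKENS = ["GPTBot", "CCBot", "ClaudeBot", "Google-Extended", "PerplexityBot"]


def build_robots_txt(blocked_agents):
    """Return robots.txt text that disallows the given agents and allows all others."""
    lines = []
    for agent in blocked_agents:
        # One group per blocked agent, separated by a blank line.
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    # Default group: every other crawler may fetch everything.
    lines += ["User-agent: *", "Allow: /"]
    return "\n".join(lines) + "\n"


def is_allowed(robots_txt, agent, url):
    """Evaluate robots_txt as a compliant crawler would; misbehaving bots skip this."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    robots = build_robots_txt(AI_BOT_TOKENS)
    print(robots)
    print(is_allowed(robots, "GPTBot", "https://example.com/wiki/Page"))
    print(is_allowed(robots, "archive.org_bot", "https://example.com/wiki/Page"))
```

Generating the file from one list means the same tokens could also feed server-side blocking, so both layers stay in sync.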

Links


Ambient-Impact added the enhancement (New feature or request) and security (Security improvements) labels on Mar 1, 2024.
