
Say Goodbye to AI Spies: New Internet Laws Block Data-Hungry Bots!

1 min read

New standards are being developed to expand the capabilities of the Robots Exclusion Protocol and Meta Robots tags, allowing publishers to block all AI crawlers from using publicly available content for training models. This proposal, led by Krishna Madhavan, Principal Product Manager at Microsoft AI, and Fabrice Canel, Principal Product Manager at Microsoft Bing, will enable publishers to block mainstream AI crawlers with a single rule. Since legitimate crawlers adhere to Robots.txt and Meta Robots directives, this proposal is ideal for content creators who want to protect their work from being used in AI training.


About the Internet Engineering Task Force (IETF)


The IETF, an international standards organization founded in 1986, oversees the development and voluntary adoption of Internet standards. The Robots Exclusion Protocol (REP), originally created in 1994, allows websites to manage how crawlers interact with their content; it was brought to the IETF for standardization in 2019 and published as an official standard (RFC 9309) in 2022.


Three Methods to Block AI Training Bots


The new proposal outlines three ways to block AI training crawlers:

  1. Robots.txt Protocol Enhancements: The proposal extends the Robots Exclusion Protocol (REP) to cover AI training bots, giving publishers clear options for managing AI crawlers. Though compliance with Robots.txt is voluntary, legitimate bots typically respect these rules. The new Robots.txt rules include:
    • DisallowAITraining: Prevents data use in AI model training.
    • AllowAITraining: Permits data use in AI model training.
  2. Meta Robots HTML Tag: The proposal also defines matching Meta Robots directives, providing finer, page-level control over data usage by placing them within HTML meta tags, for example:
    • <meta name="robots" content="DisallowAITraining">
    • <meta name="examplebot" content="AllowAITraining">
  3. Application Layer Response Headers: The proposal adds new response headers so that web servers can state the same rules directly in HTTP responses (a sketch follows this list):
    • DisallowAITraining: Prevents data use in AI model training.
    • AllowAITraining: Allows data use in AI model training.
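As an illustration of the third mechanism, a server could attach the directive to every page it serves. The header name below follows the directive names described above; the value and surrounding response lines are only an assumed sketch, since the draft's exact header syntax may differ:

    HTTP/1.1 200 OK
    Content-Type: text/html; charset=utf-8
    DisallowAITraining: /

A crawler that honors the proposal would then exclude this response from its training data, even where no Robots.txt rule applies.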


Empowering Publishers with Control


Historically, AI companies have claimed fair use in scraping public data for training purposes, drawing parallels to search engines. These new standards grant publishers greater control over data access, allowing them to distinguish between general search crawlers and AI training bots.

With these protocols, publishers gain a powerful toolset to decide whether or not their content can fuel AI models, leveling the playing field between search engines and AI systems.
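For example, a publisher that wants to stay visible in search while opting out of model training could combine the existing and proposed rules in a single Robots.txt file. This sketch assumes the new rules accept the same path syntax as Disallow and Allow, which the final standard may refine:

    User-Agent: *
    Allow: /
    DisallowAITraining: /

Under these rules, search crawlers could keep indexing the site while compliant AI training bots would be told to leave the content alone.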


If you’re still finding it all a bit overwhelming, take a look at our monthly SEO packages and let our experts handle it for you!

Shilpi Mathur
navyya.shilpi@gmail.com