Google Introduces Two Innovative Web Crawlers

Google Announces Two New Crawlers for Scraping Images and Videos for Research and Development.

Google has unveiled details of two new crawlers built to scrape image and video content for research and development. The documentation does not state it explicitly, but blocking these crawlers presumably has no effect on a site's rankings.

Note also that the data these crawlers collect is not intended for AI training. Publishers manage that use through Google-Extended, which is a robots.txt product token rather than a crawler of its own.

GoogleOther Crawlers

The two new crawlers are versions of Google’s GoogleOther crawler, launched in April 2023. The original GoogleOther crawler was designated for use by Google product teams for research and development in what are described as one-off crawls. This description provides insights into the intended use of the new GoogleOther variants.

The purpose of the original GoogleOther crawler is officially described as follows:

“GoogleOther is the generic crawler that various product teams may use to fetch publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development.”

Two GoogleOther Variants

Google has introduced two new GoogleOther crawlers:

GoogleOther-Image 

GoogleOther-Video

Both variants are designed to crawl binary data, meaning data that isn't text. HTML files are text files (encoded as ASCII or Unicode) and can be read in a text viewer; binary files, such as images, audio, and video, cannot. As their names suggest, GoogleOther-Image targets image content and GoogleOther-Video targets video content.

Google provides user agent tokens for both new crawlers, which can be used in a robots.txt file to block them if desired.

  1. GoogleOther-Image
    • User agent tokens:
      • GoogleOther-Image
      • GoogleOther
    • Full user agent string:
      • GoogleOther-Image/1.0
  2. GoogleOther-Video
    • User agent tokens:
      • GoogleOther-Video
      • GoogleOther
    • Full user agent string:
      • GoogleOther-Video/1.0
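
To see how these tokens behave in robots.txt, here is a minimal sketch that checks a hypothetical robots.txt with Python's standard-library urllib.robotparser. The rules, the example.com URLs, and the build_parser helper are illustrative assumptions. RobotFileParser applies the first group whose name occurs in the crawler's token, so the more specific GoogleOther-Image group is listed first; Google's own crawlers pick the most specific matching group regardless of order, so treat this as an approximation of Google's behavior, not a reproduction of it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep GoogleOther-Image out of the whole site and
# keep every other GoogleOther crawler out of /private/ only.
# RobotFileParser uses the first group that matches, so the more specific
# GoogleOther-Image group is listed before the GoogleOther group.
ROBOTS_TXT = """\
User-agent: GoogleOther-Image
Disallow: /

User-agent: GoogleOther
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    checks = [
        ("GoogleOther-Image/1.0", "https://example.com/photos/cat.jpg"),
        ("GoogleOther-Video/1.0", "https://example.com/videos/clip.mp4"),
        ("GoogleOther-Video/1.0", "https://example.com/private/clip.mp4"),
        # No group names Googlebot and there is no "*" group, so it is allowed.
        ("Googlebot", "https://example.com/photos/cat.jpg"),
    ]
    for agent, url in checks:
        print(f"{agent:<22} {url:<40} allowed={rp.can_fetch(agent, url)}")
```

Because GoogleOther-Video has no group of its own here, it falls back to the GoogleOther rules, which is consistent with GoogleOther being listed as a token for both new crawlers.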

Newly Updated GoogleOther User Agent Strings

Google has also updated the user agent strings of the regular GoogleOther crawler. The robots.txt token has not changed, so you can keep blocking it with GoogleOther. The new strings say more about the crawler by explicitly naming the technology it uses, in this case Chrome. The Chrome version number is updated periodically, which is why Google shows it as the placeholder W.X.Y.Z.

The complete list of updated GoogleOther user agent strings is as follows:

  1. Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; GoogleOther)
  2. Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GoogleOther) Chrome/W.X.Y.Z Safari/537.36
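
Because the Chrome version changes over time, matching these strings character for character will eventually break. The minimal Python sketch below assumes it is enough to look for the "compatible; GoogleOther)" marker that both templates share and to read the four-part Chrome version that replaces W.X.Y.Z. The parse_googleother_ua helper, the GoogleOtherUA type, and the version 100.0.0.0 are hypothetical. A match only tells you which crawler a client claims to be, since user agent strings are easy to spoof.

```python
import re
from typing import NamedTuple, Optional


class GoogleOtherUA(NamedTuple):
    mobile: bool          # True for the Android / "Mobile Safari" template
    chrome_version: str   # the concrete value behind the W.X.Y.Z placeholder


# Both templates contain "compatible; GoogleOther)" and a four-part Chrome
# version; matching on those pieces keeps working when the version changes.
_TOKEN = re.compile(r"compatible; GoogleOther\)")
_CHROME = re.compile(r"Chrome/(\d+(?:\.\d+){3})")


def parse_googleother_ua(ua: str) -> Optional[GoogleOtherUA]:
    """Return device type and Chrome version for a GoogleOther UA string, else None."""
    if not _TOKEN.search(ua):
        return None
    chrome = _CHROME.search(ua)
    if chrome is None:
        return None
    return GoogleOtherUA(mobile="Mobile Safari" in ua, chrome_version=chrome.group(1))


if __name__ == "__main__":
    # Illustrative string; 100.0.0.0 stands in for a real Chrome version.
    sample = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
              "GoogleOther) Chrome/100.0.0.0 Safari/537.36")
    print(parse_googleother_ua(sample))
```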

GoogleOther Family of Bots

These bots may occasionally appear in your server logs. The user agent details above help you recognize them as Google crawlers, and the robots.txt tokens let publishers opt out of having their images and videos scraped for research and development.
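
For a quick look at how often these crawlers visit, here is a minimal Python sketch that tallies log lines by the GoogleOther variant named in the user agent. It assumes the Apache/Nginx "combined" log format, where the user agent is the last double-quoted field; the sample lines, the IP addresses (from the 192.0.2.0/24 documentation range), and the helper names are made up for illustration. The counts reflect what clients claim to be; confirming that a request really came from Google needs a separate check, such as the reverse DNS lookup Google recommends.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# In the "combined" log format the user agent is the last quoted field.
_LAST_QUOTED = re.compile(r'"([^"]*)"\s*$')

# Hypothetical log lines (documentation-range IPs, made-up requests).
SAMPLE_LOG = [
    '192.0.2.1 - - [10/Jan/2024:10:00:00 +0000] "GET /photos/cat.jpg HTTP/1.1" 200 5120 "-" "GoogleOther-Image/1.0"',
    '192.0.2.2 - - [10/Jan/2024:10:00:05 +0000] "GET /videos/clip.mp4 HTTP/1.1" 200 90210 "-" "GoogleOther-Video/1.0"',
    '192.0.2.3 - - [10/Jan/2024:10:00:09 +0000] "GET / HTTP/1.1" 200 2048 "-" '
    '"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GoogleOther) Chrome/100.0.0.0 Safari/537.36"',
    '192.0.2.4 - - [10/Jan/2024:10:00:12 +0000] "GET / HTTP/1.1" 200 2048 "-" "curl/8.0"',
]


def googleother_variant(user_agent: str) -> Optional[str]:
    """Return the GoogleOther crawler a user agent claims to be, or None."""
    # Check the specific variants first: "GoogleOther" is a substring of both.
    for token in ("GoogleOther-Image", "GoogleOther-Video"):
        if token in user_agent:
            return token
    if "GoogleOther" in user_agent:
        return "GoogleOther"
    return None


def count_googleother_hits(lines: Iterable[str]) -> Counter:
    """Count log lines per claimed GoogleOther variant."""
    counts: Counter = Counter()
    for line in lines:
        match = _LAST_QUOTED.search(line)
        if match is None:
            continue  # line is not in the expected format
        variant = googleother_variant(match.group(1))
        if variant is not None:
            counts[variant] += 1
    return counts


if __name__ == "__main__":
    for variant, hits in count_googleother_hits(SAMPLE_LOG).most_common():
        print(f"{variant}: {hits}")
```

Checking the two specific tokens before plain GoogleOther matters, because a simple substring test for "GoogleOther" would otherwise lump all three crawlers together.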

If you need help with this complex topic, consider our monthly SEO packages and let the experts help you.

Shilpi Mathur
navyya.shilpi@gmail.com