Google Announces Two New Crawlers for Scraping Images and Videos for Research and Development
Google has published details of two new crawlers optimized for fetching image and video content for research and development purposes. Google's documentation does not say so explicitly, but blocking these new crawlers is presumed not to affect website rankings.
The data collected by these crawlers is also not explicitly intended for AI training; controlling that use is the job of Google-Extended, which is a robots.txt product token rather than a separate crawler.
GoogleOther Crawlers
The two new crawlers are versions of Google’s GoogleOther crawler, launched in April 2023. The original GoogleOther crawler was designated for use by Google product teams for research and development in what are described as one-off crawls. This description provides insights into the intended use of the new GoogleOther variants.
The purpose of the original GoogleOther crawler is officially described as follows:
“GoogleOther is the generic crawler that various product teams may use to fetch publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development.”
Two GoogleOther Variants
Google has introduced two new GoogleOther crawlers:
GoogleOther-Image
GoogleOther-Video
These new variants are designed to crawl binary data, meaning data that is not text. HTML is stored as plain text (ASCII or Unicode), whereas binary files such as images, audio, and video cannot be read meaningfully in a text viewer. GoogleOther-Image targets image content and GoogleOther-Video targets video content.
Google provides user agent tokens for both new crawlers, which can be used in a robots.txt file to block them if desired.
- GoogleOther-Image
  - User agent tokens:
    - GoogleOther-Image
    - GoogleOther
  - Full user agent string:
    - GoogleOther-Image/1.0
- GoogleOther-Video
  - User agent tokens:
    - GoogleOther-Video
    - GoogleOther
  - Full user agent string:
    - GoogleOther-Video/1.0
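As a rough illustration of how these tokens might be used, the sketch below embeds a sample robots.txt that disallows only GoogleOther-Image and GoogleOther-Video, then checks it with Python's standard urllib.robotparser module. The robots.txt contents, the is_allowed helper, and the example.com URL are assumptions made for illustration. Python's parser does not necessarily match user agents exactly the way Google does, so treat this as a sanity check rather than a guarantee of crawler behavior.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt (an assumption, not taken from Google's docs):
# block only the two new variants and leave every other crawler alone.
ROBOTS_TXT = """\
User-agent: GoogleOther-Image
Disallow: /

User-agent: GoogleOther-Video
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Hypothetical helper: True if ROBOTS_TXT lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    test_url = "https://example.com/media/photo.jpg"
    for agent in ("GoogleOther-Image", "GoogleOther-Video", "Googlebot"):
        print(agent, "allowed" if is_allowed(agent, test_url) else "blocked")
```

Because both variants also list GoogleOther as a token, a single group for GoogleOther would presumably cover them as well; the sample above targets the variants individually so that the generic GoogleOther crawler is left unaffected.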
Newly Updated GoogleOther User Agent Strings
Google has updated the GoogleOther user agent strings for the regular GoogleOther crawler. For blocking purposes, you can continue using the same user agent token as before (GoogleOther). The new user agent strings provide more detailed information about the crawlers, explicitly indicating the technology used. In this case, the technology is Chrome, with the version number periodically updated (W.X.Y.Z represents a placeholder for the Chrome version number).
The complete list of updated GoogleOther user agent strings is as follows:
- Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; GoogleOther)
- Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GoogleOther) Chrome/W.X.Y.Z Safari/537.36
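To make the strings above easier to spot in server logs, here is a minimal sketch of a classifier that labels a user agent string as GoogleOther, GoogleOther-Image, or GoogleOther-Video. The classify_googleother function and its regular expressions are hypothetical and were written only against the strings listed in this article. A user agent string can be spoofed, so a match here identifies what a request claims to be, not that it truly came from Google.

```python
import re
from typing import Optional

# Hypothetical patterns based on the user agent strings quoted in this article.
# The variants are checked first because "GoogleOther" is a prefix of both.
_PATTERNS = [
    ("GoogleOther-Image", re.compile(r"GoogleOther-Image/\d+(?:\.\d+)*")),
    ("GoogleOther-Video", re.compile(r"GoogleOther-Video/\d+(?:\.\d+)*")),
    ("GoogleOther", re.compile(r"compatible; GoogleOther\)")),
]


def classify_googleother(user_agent: str) -> Optional[str]:
    """Return the GoogleOther family member a user agent claims to be, or None."""
    for name, pattern in _PATTERNS:
        if pattern.search(user_agent):
            return name
    return None


if __name__ == "__main__":
    samples = [
        "GoogleOther-Image/1.0",
        "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
        "GoogleOther) Chrome/120.0.0.0 Safari/537.36",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    ]
    for ua in samples:
        print(classify_googleother(ua), "<-", ua)
```

The Chrome version in the sample is a made-up stand-in for the W.X.Y.Z placeholder; the patterns deliberately ignore it so they keep working as the version changes.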
GoogleOther Family of Bots
These new bots may occasionally appear in your server logs. The user agent tokens and strings listed above will help you recognize them as Google crawlers and will assist publishers who wish to opt out of having their images and videos scraped for research and development purposes.