Google has released updated guidance on using HTTP caching mechanisms, specifically recommending ETag headers to optimize crawling efficiency. This update, detailed in its refreshed crawler documentation, aims to help SEOs and publishers conserve resources while improving Googlebot's effectiveness.
What’s New in the Crawler Documentation
Google’s revamped documentation now includes a section on how its crawlers leverage HTTP caching to minimize resource consumption for both publishers and Google. These updates expand significantly on prior guidance, offering actionable insights into best practices for caching.
Key Caching Mechanisms
Google highlights several caching mechanisms to help reduce unnecessary crawling:
- ETag and If-None-Match Headers: These headers signal whether content has changed, enabling efficient re-crawling.
- Last-Modified and If-Modified-Since Headers (Optional): These can also signal content updates, but they are less reliable than ETag because date values are easier to format incorrectly.
The documentation notes:
“Google’s crawling infrastructure supports heuristic HTTP caching as defined by the HTTP caching standard, specifically through the ETag response- and If-None-Match request header, and the Last-Modified response- and If-Modified-Since request header.”
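To make that exchange concrete, here is a minimal server-side sketch, assuming a Flask app; the route, page body, and hashing scheme are illustrative choices, not part of Google's guidance.

```python
# Minimal sketch (not Google's implementation) of a server honoring
# conditional requests with ETag, assuming Flask.
import hashlib

from flask import Flask, Response, request

app = Flask(__name__)

PAGE_BODY = "<html><body>Example page</body></html>"

@app.route("/page")
def page():
    # Derive a strong ETag from the content; any stable fingerprint works.
    etag = '"' + hashlib.sha256(PAGE_BODY.encode()).hexdigest() + '"'

    # If the crawler sends If-None-Match with a matching ETag, the content
    # has not changed: answer 304 and skip sending the body entirely.
    if request.headers.get("If-None-Match") == etag:
        return Response(status=304, headers={"ETag": etag})

    # Otherwise serve the full page with the ETag for future revalidation.
    return Response(PAGE_BODY, headers={"ETag": etag})
```

When the content is unchanged, the 304 response saves the server the cost of regenerating and transmitting the page, which is exactly the resource saving the documentation describes.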
Why ETag Is Preferred
Google emphasizes ETag as the preferred method over Last-Modified due to its accuracy and lower likelihood of errors, such as incorrect date formatting. When both ETag and Last-Modified headers are present, Google’s crawlers prioritize ETag per HTTP standards:
“If both ETag and Last-Modified response header fields are present in the HTTP response, Google’s crawlers use the ETag value as required by the HTTP standard.”
Note that Google's crawlers do not support other HTTP caching directives.
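To illustrate that precedence from the client side, here is a hypothetical revalidation helper sketched with Python's requests library; the cache dictionary and function name are assumptions made for the example, not Google's code.

```python
# Hypothetical revalidation helper: when a cached entry holds both
# validators, the ETag (If-None-Match) is what decides freshness,
# mirroring the HTTP-standard precedence the article describes.
import requests

def revalidate(url: str, cached: dict) -> bytes:
    headers = {}
    # Prefer the ETag validator when both are available.
    if cached.get("etag"):
        headers["If-None-Match"] = cached["etag"]
    elif cached.get("last_modified"):
        headers["If-Modified-Since"] = cached["last_modified"]

    resp = requests.get(url, headers=headers, timeout=10)

    if resp.status_code == 304:
        # Unchanged: reuse the stored body, no re-download needed.
        return cached["body"]

    # Changed: refresh the cache entry with the new body and validators.
    cached.update(
        body=resp.content,
        etag=resp.headers.get("ETag"),
        last_modified=resp.headers.get("Last-Modified"),
    )
    return resp.content
```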
Differences in Crawler Support
Support for caching varies across Google’s crawlers:
- Googlebot: Fully supports caching for re-crawling URLs in Google Search.
- Storebot-Google: Offers limited caching support under specific conditions.
Google elaborates:
“Individual Google crawlers and fetchers may or may not make use of caching, depending on the needs of the product they’re associated with. For example, Googlebot supports caching when re-crawling URLs for Google Search, and Storebot-Google only supports caching in certain conditions.”
Implementation Tips
Google encourages publishers to collaborate with hosting or CMS providers to implement proper caching. While not mandatory, Google suggests setting the max-age field in the Cache-Control response header to guide crawlers on when to revisit URLs.
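For example, a server could advertise that hint as shown below, again sketched with Flask; the one-day max-age is an illustrative value, not a Google recommendation, and should be tuned to how often the content actually changes.

```python
# Minimal sketch of the optional Cache-Control hint: max-age suggests
# how long a crawler may wait before revisiting the URL.
from flask import Flask, Response

app = Flask(__name__)

@app.route("/article")
def article():
    resp = Response("<html><body>Article</body></html>")
    # Hint that the content is unlikely to change for 24 hours (86400 s).
    resp.headers["Cache-Control"] = "max-age=86400"
    return resp
```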
For more details on optimizing crawl efficiency, see [9 Tips to Optimize Crawl Budget for SEO].
By adopting these updated caching recommendations, publishers can improve their sites’ crawling efficiency, conserve server resources, and stay aligned with Google’s best practices.
If it all still feels overwhelming, don't worry: our monthly SEO packages are here to help. Let the experts handle it for you!