
Think You Know CDNs? Google Just Revealed a Game-Changer for SEO!

3 min read

Google recently released new documentation outlining how Content Delivery Networks (CDNs) affect search crawling and SEO. While CDNs can enhance website performance and search visibility, they can also introduce crawling challenges that need to be addressed. Here’s a breakdown of the key takeaways.


What Is a CDN?


A Content Delivery Network (CDN) is a service that caches copies of web pages and serves them from the data center closest to the user. By creating and storing these copies, a CDN reduces the number of hops required to deliver content from the origin server to the user. This speeds up page delivery and improves the user experience.


How CDNs Unlock Increased Crawling


One significant benefit of CDNs is that they often lead to higher crawl rates from Googlebot. Google increases crawling when it detects that a site is served via a CDN, making CDNs attractive to SEOs and publishers who aim to boost the number of pages crawled.

Normally, Googlebot throttles crawling when server performance issues arise. However, a CDN’s distributed infrastructure raises the throttling threshold, allowing more pages to be crawled without overwhelming the server.


Initial Crawl Challenges with CDNs


When a CDN is implemented, the first access of a URL requires the origin server to serve the page. Google refers to this as a “cold” cache. For instance, if a website with over a million URLs is backed by a CDN, the origin server must serve each URL at least once to “warm up” the CDN’s cache. This can place a temporary burden on the server and consume a significant portion of the site’s crawl budget.

Google provides this example:

“Even if your website is backed by a CDN, your server will need to serve those 1,000,007 URLs at least once. Only after that initial serve can your CDN help you with its caches. That’s a significant burden on your crawl budget and the crawl rate will likely be high for a few days; keep that in mind if you’re planning to launch many URLs at once.”
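
If you’re planning a launch like that, you can spread some of that initial load yourself by pre-warming the cache before Googlebot arrives. Here’s a minimal Python sketch of the idea; the sitemap URL, request pacing, and cache header names are assumptions you’d adapt to your own site and CDN, not something from Google’s documentation:

```python
import time
import xml.etree.ElementTree as ET

import requests

SITEMAP_URL = "https://example.com/sitemap.xml"  # hypothetical sitemap location

def warm_cdn_cache(sitemap_url, delay_seconds=0.5):
    """Request each URL in a sitemap once so the CDN caches it before
    Googlebot's first visit, turning a cold cache into a warm one."""
    sitemap = requests.get(sitemap_url, timeout=10)
    sitemap.raise_for_status()

    # Sitemap <loc> entries live under the standard sitemap namespace.
    namespace = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    root = ET.fromstring(sitemap.content)
    urls = [loc.text for loc in root.findall(".//sm:loc", namespace)]

    for url in urls:
        response = requests.get(url, timeout=10)
        # Many CDNs report cache status in a response header such as
        # X-Cache or CF-Cache-Status; the exact name varies by provider.
        print(response.status_code, url, response.headers.get("X-Cache", "n/a"))
        time.sleep(delay_seconds)  # pace requests so the origin isn't overwhelmed

if __name__ == "__main__":
    warm_cdn_cache(SITEMAP_URL)
```

The pacing matters here: the point is to fill the cache gradually, not to hit the origin with the same burst of traffic you’re trying to shield it from.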


When CDNs Backfire for Crawling


While CDNs generally improve crawling, they can sometimes block Googlebot’s access, resulting in what Google describes as “hard blocks” and “soft blocks.”


Hard Blocks

Hard blocks occur when the CDN returns server errors that signal major issues:

  • 500 Internal Server Error: Indicates a serious server problem.
  • 502 Bad Gateway: Suggests a communication error between the server and CDN.

Both errors can cause Googlebot to reduce crawl rates. Persistent errors may even lead to URLs being dropped from Google’s search index. Google recommends responding with a 503 Service Unavailable status code for temporary issues to prevent indexing problems.
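
To make the contrast concrete, here’s a minimal sketch (using Python’s standard http.server, purely for illustration) of an origin that answers with a 503 and a Retry-After header during a temporary outage, rather than a 500 that tells Googlebot something is seriously wrong:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

MAINTENANCE_MODE = True  # hypothetical flag you would set during a temporary outage

class MaintenanceAwareHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if MAINTENANCE_MODE:
            # A 503 tells crawlers the problem is temporary, so Googlebot
            # slows down and retries instead of dropping URLs from the index.
            self.send_response(503)
            self.send_header("Retry-After", "3600")  # suggest retrying in an hour
            self.end_headers()
            self.wfile.write(b"Temporarily unavailable")
        else:
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(b"<html><body>Normal page</body></html>")

if __name__ == "__main__":
    HTTPServer(("", 8000), MaintenanceAwareHandler).serve_forever()
```

In practice this logic usually lives in your CDN or load balancer configuration rather than application code, but the signal is the same: 503 means “come back later,” not “something is broken.”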

Another form of hard block is a “random error,” where the server returns a 200 OK status code while serving an error page. Google interprets these error pages as duplicate content and may drop them from the index. Recovering from such errors can be time-consuming.
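
Because the status code looks healthy, these masked errors are easy to miss. Here’s a rough monitoring sketch in Python; the URL list and error markers are placeholders you’d tailor to your own pages and error templates:

```python
import requests

# Placeholder sample of URLs to spot-check and phrases that appear
# on this hypothetical site's error template.
URLS_TO_CHECK = ["https://example.com/", "https://example.com/products"]
ERROR_MARKERS = ["something went wrong", "service unavailable", "error 500"]

def find_masked_errors(urls):
    """Flag pages that return 200 OK but look like error pages, which
    Google may treat as duplicate content and drop from the index."""
    suspicious = []
    for url in urls:
        response = requests.get(url, timeout=10)
        body = response.text.lower()
        if response.status_code == 200 and any(m in body for m in ERROR_MARKERS):
            suspicious.append(url)
    return suspicious

if __name__ == "__main__":
    for url in find_masked_errors(URLS_TO_CHECK):
        print("Possible masked error page:", url)
```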


Soft Blocks

Soft blocks occur when a CDN shows bot-verification interstitials (e.g., “Are you human?” pop-ups) to Googlebot. These interstitials should return a 503 HTTP status code, signaling a temporary issue to prevent indexing disruptions.

Google’s documentation explains:

“When the interstitial shows up, that’s all [Googlebot] sees, not your awesome site. In case of these bot-verification interstitials, we strongly recommend sending a clear signal in the form of a 503 HTTP status code to automated clients like crawlers that the content is temporarily unavailable. This will ensure that the content is not removed from Google’s index automatically.”
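
How your CDN decides which requests get that 503 instead of the interstitial depends on its bot-detection layer, but Google’s documented way to confirm that a request really comes from Googlebot is a reverse-then-forward DNS check. Here’s a sketch of that verification in Python; the surrounding interstitial logic (shown as a comment) is hypothetical:

```python
import socket

# Googlebot's reverse DNS names end in these domains, per Google's
# crawler-verification documentation.
GOOGLEBOT_SUFFIXES = (".googlebot.com", ".google.com")

def is_verified_googlebot(client_ip):
    """Reverse-resolve the client IP, check the hostname suffix, then
    forward-resolve the hostname to confirm it maps back to the same IP."""
    try:
        hostname, _, _ = socket.gethostbyaddr(client_ip)
        if not hostname.endswith(GOOGLEBOT_SUFFIXES):
            return False
        # The forward lookup must return the original IP to rule out spoofing.
        return socket.gethostbyname(hostname) == client_ip
    except OSError:  # no reverse record, or forward lookup failed
        return False

# Hypothetical use inside a bot-verification layer:
# if challenge_required and is_verified_googlebot(client_ip):
#     respond with 503 and a Retry-After header instead of the interstitial
```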


Debugging CDN Crawling Issues


Google suggests using the following tools and techniques to debug crawling issues:

  1. URL Inspection Tool in Google Search Console: This tool can show how the CDN serves your web pages to Googlebot.
  2. Web Application Firewall (WAF) Controls: Check whether the CDN’s firewall is blocking Googlebot’s IP addresses. Compare any blocked IPs against Google’s official list of Googlebot IPs to ensure Googlebot is not mistakenly blocked (see the sketch below).
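
For the second check, Google publishes Googlebot’s IP ranges as a JSON file, which makes the comparison easy to script. Here’s a minimal sketch; the blocklist itself is a placeholder you’d export from your WAF or CDN:

```python
import ipaddress

import requests

# Google's published list of Googlebot IP ranges.
GOOGLEBOT_RANGES_URL = (
    "https://developers.google.com/static/search/apis/ipranges/googlebot.json"
)

def load_googlebot_networks():
    """Fetch Google's official Googlebot ranges and parse them into networks."""
    data = requests.get(GOOGLEBOT_RANGES_URL, timeout=10).json()
    prefixes = (p.get("ipv4Prefix") or p.get("ipv6Prefix") for p in data["prefixes"])
    return [ipaddress.ip_network(cidr) for cidr in prefixes]

def blocked_googlebot_ips(blocklist):
    """Return blocklisted IPs that fall inside Googlebot's official ranges."""
    networks = load_googlebot_networks()
    return [
        ip for ip in blocklist
        if any(ipaddress.ip_address(ip) in network for network in networks)
    ]

if __name__ == "__main__":
    # Placeholder blocklist; export the real one from your WAF or CDN.
    waf_blocklist = ["66.249.66.1", "203.0.113.7"]
    for ip in blocked_googlebot_ips(waf_blocklist):
        print("WAF is blocking a Googlebot IP:", ip)
```

Running a check like this on a schedule catches the case Google warns about below: IPs landing on a blocklist automatically, without you knowing.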


Google advises:

“If you need your site to show up in search engines, we strongly recommend checking whether the crawlers you care about can access your site. Remember that the IPs may end up on a blocklist automatically, without you knowing, so checking in on the blocklists every now and then is a good idea for your site’s success in search and beyond.”

By proactively managing CDN settings and monitoring Googlebot’s access, you can maximize the benefits of CDNs while avoiding common pitfalls that affect crawling and SEO.


If you’re still feeling overwhelmed, don’t worry—our monthly SEO packages are here to take the stress off your shoulders. Let the experts handle it for you!

Shilpi Mathur
navyya.shilpi@gmail.com