In May, during an interview at SERP Conf 2024 in Bulgaria, Google's Gary Illyes discussed the causes of the dreaded "crawled but not indexed" status. He offered several possible reasons, which are useful for debugging and fixing the issue.
The video of the interview went largely unnoticed. I became aware of it thanks to Olesia Korobka (@Giridja), who recently highlighted it in a Facebook post. Although the interview took place a few months ago, the insights remain timely and valuable.
Reasons for Crawled – Currently Not Indexed
"Crawled - currently not indexed" is a status in the Google Search Console Page Indexing report. It indicates that Google crawled a page but did not index it.
During the interview, the following question was submitted:
“Can ‘crawled but not indexed’ be a result of a page being too similar to other content already indexed? Is Google suggesting enough similar content, making your page not unique enough?”
Google’s Search Console documentation doesn’t clearly answer why Google might crawl a page but not index it, so this is a legitimate question. Gary Illyes responded that one of the reasons could be the presence of similar content. However, he noted that there are other factors as well.
He explained:
“Yeah, that could be one reason. Ideally, we would break up that category into more granular chunks, but it’s super hard because of how the data internally exists.
It can be a bunch of things. Duplicate elimination is one of those factors where we crawl a page and decide not to index it because there’s already a version of that content or a remarkably similar version with better signals in our index.
But yeah, it can be multiple things.”
General Quality of Site Can Impact Indexing
Illyes highlighted another reason why Google might crawl pages but not index them: the overall quality of the site.
He elaborated:
“And the site’s general quality can significantly impact the number of ‘crawled but not indexed’ pages you see in Search Console. If you notice many of these URLs, it could indicate general quality issues.
Since February, I’ve seen instances where we decided not to index many URLs because our perception of the site’s quality had changed.”
Other Reasons for Crawled Not Indexed
Gary Illyes discussed additional reasons why URLs might be crawled but not indexed, noting that it could be due to changes in Google’s perception of the site or potential technical issues. He explained:
“One possibility is that when you see that number rising, Google’s perception of the site has changed. However, there could also be technical errors, such as the site serving the same page for every URL. That could also contribute to the increase in ‘crawled but not indexed’ pages.
So, there could be many factors at play.”
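Illyes didn't describe how to check for the "same page for every URL" error, so the following is only a rough sketch under stated assumptions, not a Google tool or method. It assumes you already have the HTML for a sample of your URLs (the hypothetical helper `fetch_html` downloads raw HTML with Python's standard library, without rendering JavaScript). The hypothetical `find_identical_pages` function collapses whitespace, hashes each page body with SHA-256, and groups URLs whose content is identical. It only catches exact duplicates: pages that differ by a timestamp or token will hash differently, and it says nothing about the near-duplicates that Google's duplicate elimination also handles.

```python
import hashlib
import re
import urllib.request
from collections import defaultdict


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Hypothetical helper: download a page's raw HTML (no JavaScript rendering)."""
    req = urllib.request.Request(url, headers={"User-Agent": "duplicate-check/0.1"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def content_fingerprint(html: str) -> str:
    """Hash the page after collapsing whitespace, so formatting-only
    differences don't hide an otherwise identical page."""
    normalized = re.sub(r"\s+", " ", html).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_identical_pages(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose normalized HTML is identical.

    `pages` maps URL -> HTML. Only groups with two or more URLs are
    returned; each group and the list of groups are sorted.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for url, html in pages.items():
        groups[content_fingerprint(html)].append(url)
    return sorted(sorted(urls) for urls in groups.values() if len(urls) > 1)
```

For example, build `pages = {url: fetch_html(url) for url in sample_urls}` from a hypothetical list of URLs taken from the Page Indexing report. If a single group contains nearly every URL, the server may be returning the same page regardless of the requested path.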
Takeaways
Gary Illyes provided valuable insights to help debug why a web page might be crawled but not indexed by Google:
- Similar Content: The content may be too similar to content already in Google's index.
- Duplicate Content with Better Signals: The same content, or a remarkably similar version, is already indexed with better signals.
- General Site Quality Issues: The site's overall quality could impact indexing, and a change in Google's perception of that quality can increase the number of affected URLs.
- Technical Issues: Technical errors, such as a site serving the same page for every URL, can also cause pages to be crawled but not indexed.
Illyes didn't elaborate on what he meant by a version with "better signals." One plausible scenario is content syndication: when a site syndicates its content to another site, Google may choose to index and rank the syndicated copy instead of the original publisher's page.