A groundbreaking vulnerability discovered by Mark Williams-Cook has exposed over 2,000 properties Google uses to classify queries and websites. This discovery sheds light on previously hidden classifications, such as consensus scoring, refined query types, and site quality scores.
Why This Matters
This vulnerability offers unprecedented insights into Google’s search mechanisms. Earlier this year, the massive Content API Warehouse leak provided substantial revelations about Google’s ranking factors. Now, these findings further unravel the intricate layers of scoring, classification, and quality evaluation that drive Google’s algorithm.
Consensus Scoring
Google evaluates the number of passages in content that align with, contradict, or remain neutral to the "general consensus." This evaluation results in a consensus score, which likely influences rankings for specific queries, particularly those aimed at debunking myths (e.g., "Is the Earth flat?"). Content with higher alignment to the general consensus may gain an edge in search results.
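To make the idea concrete, here is a minimal sketch of how passage-level labels could be reduced to a single consensus score. The label names, the [-1, 1] scale, and the formula are illustrative assumptions; the leak does not reveal Google's actual computation.

```python
# Hypothetical consensus scoring: each passage is labeled as agreeing
# with, contradicting, or neutral toward the general consensus, and the
# labels are collapsed into one score. All details here are assumptions.

def consensus_score(labels):
    """Return a score in [-1, 1]: +1 fully aligned, -1 fully contradicting."""
    agree = labels.count("agree")
    disagree = labels.count("disagree")
    total = len(labels)  # neutral passages dilute the score
    if total == 0:
        return 0.0
    return (agree - disagree) / total

passage_labels = ["agree", "agree", "neutral", "disagree"]
print(consensus_score(passage_labels))  # 0.25
```

Under a model like this, content that mostly contradicts the consensus would score negative, which matches the article's suggestion that myth-debunking queries favor consensus-aligned pages.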
Query Classifications
Google organizes nearly all queries into eight refined semantic classes:
- Short Fact
- Bool (Boolean yes/no questions)
- Other
- Instruction
- Definition
- Reason
- Comparison
- Consequence (Your Money or Your Life, or YMYL queries)
These classifications guide how Google adjusts its algorithm to suit specific query types. For example, YMYL queries, which involve sensitive topics like finance and health, have different ranking weights—a practice confirmed as early as 2019.
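A toy rule-based classifier can illustrate what routing queries into these eight classes might look like. The keyword rules below are invented for illustration only; Google's real classifier is almost certainly a learned model, not string matching.

```python
# Illustrative only: a keyword-based stand-in for Google's eight refined
# semantic query classes. The rules are assumptions, not the real system.

def classify_query(query: str) -> str:
    q = query.lower().strip()
    if q.startswith(("is ", "are ", "can ", "does ", "do ")):
        return "Bool"            # yes/no questions
    if q.startswith("how to"):
        return "Instruction"
    if q.startswith(("what is ", "define ")):
        return "Definition"
    if q.startswith("why"):
        return "Reason"
    if " vs " in q or q.startswith("compare "):
        return "Comparison"
    if q.startswith("what happens if"):
        return "Consequence"     # YMYL-style queries
    if q.startswith(("when ", "who ", "where ", "how many ")):
        return "Short Fact"
    return "Other"

print(classify_query("is the earth flat"))       # Bool
print(classify_query("how to fix a flat tire"))  # Instruction
```

Even this crude sketch shows why the classification matters: a "Bool" query like "is the earth flat" can be routed to consensus-weighted ranking, while an "Instruction" query can favor step-by-step content.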
Site Quality Scores
According to Williams-Cook, Google’s results are heavily influenced by site quality scores, calculated at the subdomain level. Factors contributing to these scores include:
- Brand Visibility: Metrics like branded searches or searches containing the brand’s name.
- User Interactions: Clicks and engagement, even when the site isn’t in Position 1.
- Anchor Text Relevance: The contextual relevance of anchor text linking to the site across the web.
Sites failing to meet a quality threshold (e.g., 0.4 on a 0-1 scale) are excluded from prominent search features like featured snippets and People Also Ask boxes.
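The thresholding behavior described above can be sketched as follows. The 0.4 cutoff comes from the article; the three inputs and their equal weighting are illustrative assumptions, since the leak does not disclose how the factors are combined.

```python
# Hedged sketch: a subdomain-level quality score gates eligibility for
# rich features such as featured snippets and People Also Ask boxes.
# The 0.4 threshold is reported; the inputs and weights are assumptions.

FEATURE_THRESHOLD = 0.4

def site_quality(brand_visibility, user_interaction, anchor_relevance):
    """Each input is normalized to [0, 1]; equal weighting is assumed."""
    return (brand_visibility + user_interaction + anchor_relevance) / 3

def eligible_for_rich_features(score):
    return score >= FEATURE_THRESHOLD

score = site_quality(0.6, 0.5, 0.4)
print(round(score, 2), eligible_for_rich_features(score))  # 0.5 True
```

Note that the gate operates per subdomain, so under this model a strong blog.example.com would not rescue a weak shop.example.com.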
Click Probability
While Google does not use click-through rate (CTR) directly in its ranking algorithm, it appears to factor in a "click probability" for each organic result. This metric estimates the likelihood of a user clicking on a specific result. Page title modifications, for instance, can influence this probability. Tools like the Google Ads Planner offer estimated click-through rates, providing indirect clues for optimizing click probability.
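One simple way to think about click probability is as a position-based prior adjusted by result-specific factors such as title appeal, then renormalized across the result set. The priors and multipliers below are invented for illustration; nothing in the leak specifies this model.

```python
# Illustrative model only: click probability as a position-based prior
# times a title-relevance multiplier, renormalized to sum to 1.
# All numbers are invented to show the idea, not Google's actual model.

def click_probabilities(priors, title_multipliers):
    weighted = [p * m for p, m in zip(priors, title_multipliers)]
    total = sum(weighted)
    return [w / total for w in weighted]

priors = [0.30, 0.15, 0.10]   # assumed baseline CTRs for positions 1-3
multipliers = [1.0, 1.4, 1.0]  # position 2 rewrote its title
probs = click_probabilities(priors, multipliers)
print([round(p, 2) for p in probs])  # [0.49, 0.34, 0.16]
```

The sketch captures the article's point: improving a title can raise a result's estimated click probability even without changing its position.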
The Data Behind the Discovery
Williams-Cook and his team analyzed an astonishing 2 terabytes of data, encompassing more than 90 million queries. In recognition of their work uncovering the endpoint vulnerability, Google awarded them $13,337.
Final Thoughts
This discovery underscores the immense complexity of Google’s search ecosystem. By understanding these intricate systems, SEO professionals can better align strategies with Google’s evolving ranking methodologies, giving them a crucial edge in the ever-competitive digital landscape.
If you still find it all difficult and confusing, check out our monthly SEO packages and let the experts handle it for you.