
Microsoft has introduced significant updates to Bing’s search infrastructure, integrating advanced language models and optimization techniques. The updates promise faster, more accurate search results while reducing operational costs.
A New Era of Search: LLMs and SLMs Combined
Bing’s innovative approach combines Large Language Models (LLMs) and Small Language Models (SLMs) to redefine search capabilities. According to Microsoft:
“At Bing, we are always pushing the boundaries of search technology. Leveraging both Large Language Models (LLMs) and Small Language Models (SLMs) marks a significant milestone in enhancing our search capabilities. While transformer models have served us well, the growing complexity of search queries necessitated more powerful models.”
This hybrid model aims to overcome the challenges associated with LLMs, such as high costs and slower speeds. Bing-trained SLMs deliver roughly 100 times the throughput of LLMs, enabling quicker and more efficient processing of search queries.
NVIDIA Technology Optimizes Performance
Bing leverages NVIDIA TensorRT-LLM technology to enhance the performance of its SLMs. TensorRT-LLM reduces the time and cost of running large models on NVIDIA GPUs, leading to significant gains in search performance.
Before optimization, Bing’s original transformer model had a 95th-percentile latency of 4.76 seconds per batch of 20 queries and a throughput of 4.2 queries per second per instance. After optimization with TensorRT-LLM, latency dropped to 3.03 seconds per batch and throughput rose to 6.6 queries per second per instance: a 36% reduction in latency and a 57% increase in throughput, which Microsoft says translates into a 57% reduction in operational costs.
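The percentage claims follow directly from the reported before/after numbers. A quick sanity check (the figures are from the article; only the percentage arithmetic is added here):

```python
# Reported Bing/TensorRT-LLM figures from the article.
before_latency_s = 4.76   # 95th-percentile latency per 20-query batch
after_latency_s = 3.03
before_qps = 4.2          # throughput, queries per second per instance
after_qps = 6.6

# Relative change in each metric.
latency_reduction = (before_latency_s - after_latency_s) / before_latency_s
throughput_gain = (after_qps - before_qps) / before_qps

print(f"Latency reduction:   {latency_reduction:.0%}")   # ~36%
print(f"Throughput increase: {throughput_gain:.0%}")     # ~57%
```

Note that the 36% latency figure and the 57% throughput figure describe the same optimization from two angles: each batch finishes sooner, so each instance serves more queries per second.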
Enhanced Deep Search Capability
The integration of SLMs and TensorRT-LLM has bolstered Bing’s “Deep Search” feature. Deep Search utilizes real-time processing to deliver highly relevant web results, offering users a more efficient search experience without compromising on quality.
Microsoft emphasized:
“Our product is built on the foundation of providing the best results, and we will not compromise on quality for speed. TensorRT-LLM reduces model inference time and, consequently, the end-to-end experience latency without sacrificing result quality.”
Key Benefits for Bing Users
The updates bring several advantages for Bing users:
- Faster Search Results: Optimized inference reduces response times, delivering results more quickly than before.
- Improved Accuracy: Advanced SLM capabilities provide more contextual and precise results.
- Cost Efficiency: Lower operational costs enable reinvestment in innovation and further improvements.
Why This Matters for Search
As search queries grow increasingly complex, the ability to process and respond effectively is critical. Bing’s adoption of LLMs and SLMs, combined with NVIDIA’s optimization technology, sets a new standard for the search industry. While the long-term impact remains to be seen, these updates position Bing as a serious contender in the evolving search landscape.
If you’re still feeling overwhelmed, why not leave it to the experts? Explore our monthly SEO packages and let us handle the hard work for you!