In our first episode of No Math AI, Akash and Isha are joined by guest research engineers, Shivchander Sudalairaj, GX Xu, and Kai Xu, to discuss a crucial topic that’s making waves in AI performance: inference-time scaling.
Simply put, inference-time scaling is a cost-effective method for improving AI model performance. Discover how this technique enhances reasoning in smaller language models, powers agentic AI, and delivers higher accuracy in mission-critical applications where precision is key.
The discussion covers how inference-time scaling boosts model performance and decision-making in AI systems. Our guests also highlight a recent research paper showing how a probabilistic approach to selecting the best answers from reasoning models can significantly improve accuracy.
Read more about the research paper here: https://probabilistic-inference-scaling.github.io/
00:00 Why are people interested in inference-time scaling?
01:18 What is inference-time scaling in the context of LLMs?
07:14 What are my technology options for inference-time scaling?
11:38 How does inference-time scaling apply to enterprise settings?
17:12 Inference-time scaling vs reasoning
Tune in to learn how inference-time scaling is transforming the way AI operates in real-world scenarios.
Like and subscribe to stay up to date on the latest AI innovations.
Aligned with its commitment to open-source AI, Red Hat is proud to support and facilitate the production of this community-focused show.