Random Samples is a weekly seminar series that bridges the gap between cutting-edge AI research and real-world application. Designed for AI developers, data scientists, and researchers, each episode explores the latest advancements in AI and how they’re being used in production today.
This week’s topic: Instance-Adaptive Inference-Time Scaling with Calibrated Process Reward Models
Abstract:
Process reward models (PRMs) play a central role in guiding inference-time scaling algorithms for LLMs. However, even state-of-the-art PRMs can be poorly calibrated. To address this, we present a calibration approach that adjusts PRM outputs to better align with true success probabilities. Building on it, we introduce an instance-adaptive scaling (IAS) framework that dynamically adjusts the inference budget based on the estimated likelihood that a partial reasoning trajectory will yield a correct final answer. Experiments on math reasoning benchmarks show that (i) our PRM calibration method achieves low calibration error, outperforming baseline methods, (ii) calibration is crucial for enabling effective adaptive scaling, and (iii) the proposed IAS strategy reduces inference cost while maintaining final accuracy.
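To make the adaptive-scaling idea concrete, here is a minimal, hypothetical sketch (not the speaker's actual algorithm): given a calibrated PRM estimate of the probability that a partial trajectory will succeed, it picks the smallest best-of-n sampling budget expected to reach a target success rate, assuming independent samples. The function name `adaptive_budget` and the independence assumption are illustrative only.

```python
import math

def adaptive_budget(p_success: float,
                    target_coverage: float = 0.95,
                    max_samples: int = 32) -> int:
    """Choose how many candidate continuations to sample so the chance of
    at least one correct completion reaches `target_coverage`, assuming
    independent samples that each succeed with probability `p_success`
    (a calibrated PRM estimate)."""
    # Clamp to avoid log(0) on degenerate probabilities.
    p = min(max(p_success, 1e-6), 1 - 1e-6)
    # Smallest n with 1 - (1 - p)**n >= target_coverage.
    n = math.ceil(math.log(1.0 - target_coverage) / math.log(1.0 - p))
    return max(1, min(n, max_samples))

# Easy instances (high calibrated success probability) get a small budget,
# hard ones get a larger share of the compute.
print(adaptive_budget(0.9))  # -> 2
print(adaptive_budget(0.2))  # -> 14
```

The same budgeting logic only makes sense if the PRM scores are calibrated: a miscalibrated score of 0.9 that really means a 40% success rate would systematically under-allocate samples, which is why calibration is presented as a prerequisite for adaptive scaling.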
Speaker Bio: Young-Jin Park is a PhD candidate at MIT’s Laboratory for Information and Decision Systems (LIDS), where he develops methods for making AI systems more reliable at scale. His research spans inference-time scaling, AI safety and alignment, and sequential decision-making, with a recent focus on uncertainty calibration of process reward models for efficient LLM reasoning. Prior to his doctoral studies, he worked as a research engineer at NAVER AI Lab from 2019 to 2022.
Subscribe to stay ahead of the curve with weekly deep dives into AI! New episodes drop every Friday.