Random Samples: On scalable RL in the era of agentic LLMs

Random Samples is a weekly seminar series that bridges the gap between cutting-edge AI research and real-world application. Designed for AI developers, data scientists, and researchers, each episode explores the latest advancements in AI and how they’re being used in production today.

This week’s topic:
On scalable RL in the era of agentic LLMs

Abstract:
As AI progresses beyond individual models toward tool-using, multi-agent systems, a new frontier is emerging: one where language models act over time, coordinate with one another, and interface with real-world environments. These agentic systems promise to automate complex tasks, but realizing their full potential will require more than just scale. Customization, efficiency, and precise control will be essential to adapting these systems for specialized domains.

This talk will focus on the broader challenge of optimizing interactive AI behaviors and on the limitations of conventional supervised fine-tuning in such settings. I will introduce async-grpo, a new high-performance reinforcement learning library purpose-built for training language models. Its asynchronous architecture, combined with the Group Relative Policy Optimization (GRPO) algorithm, enables significant throughput improvements, opening new possibilities for efficient, scalable experimentation with adaptive systems.
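For readers new to GRPO, here is a minimal sketch of the group-relative advantage idea as it is commonly described: several completions are sampled for the same prompt, and each one's reward is normalized against the mean and standard deviation of its own group, so no separate value model is needed. The function name, the `eps` parameter, and the example rewards are illustrative assumptions and are not taken from async-grpo or its API.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group's mean and std.

    `rewards` holds one scalar reward per completion sampled for the
    same prompt. `eps` (an assumed small constant) avoids division by
    zero when every completion in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of one prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that score above their group's average get a positive advantage and those below get a negative one; a group with identical rewards yields advantages of zero and so contributes no learning signal.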

Subscribe to stay ahead of the curve with weekly deep dives into AI! New episodes drop every Friday.

Date: July 9, 2025
