Inside distributed inference with llm-d ft. Carlos Costa | Technically Speaking with Chris Wright

Deploying models on one server is easy, but how do we achieve production-grade distributed inference at scale? This isn’t just about adding more machines; it requires a new kind of intelligence in the infrastructure itself.

In this episode, Red Hat CTO Chris Wright sits down with Carlos Costa, a Distinguished Engineer at IBM Research and a key figure in the llm-d project. They discuss how llm-d extends Kubernetes to create a common, open source control plane for managing complex AI workloads, why disaggregating prefill and decode matters, and how the community is tackling the challenge of running massive models such as Mixture of Experts (MoE) efficiently.

00:54 – From high-performance computing to distributed AI
05:30 – The evolution from scaled-out training to scaled-out inference
07:20 – The origin story of the llm-d project
10:47 – Building a community around a shared vision for AI infrastructure
14:39 – How llm-d extends Kubernetes for AI workloads
16:25 – Supporting Mixture of Experts (MoE) models with wide parallelism
18:24 – Fine-tuning resource management for cost-effective AI
21:39 – The future of llm-d and how to get involved
25:27 – The power of open source in production-grade AI

Learn More:
Explore the llm-d project on GitHub: https://github.com/llm-d/llm-d
Get a more technical deep dive on Kubernetes-native distributed inferencing with llm-d: https://developers.redhat.com/articles/2025/05/20/llm-d-kubernetes-native-distributed-inferencing

Follow us:
Chris Wright (LinkedIn): https://www.linkedin.com/in/chris-wright-b733851/
Chris Wright (Twitter): https://twitter.com/kernelcdub
Carlos Costa (LinkedIn): https://www.linkedin.com/in/carlos-costa-9b9b1a1/

What is Technically Speaking?
Technically Speaking taps into emerging technology trends with insights from leading experts across the globe and Red Hat CTO Chris Wright. The series blends deep-dive discussions, tech updates, and creative short-form content, solidifying Red Hat’s role as a pioneer in technology innovation and open source thought leadership.

Want to participate? Leave us a comment if there’s a topic or a guest you’d like to see featured.

Watch More Technically Speaking:

YouTube Playlist: https://www.youtube.com/playlist?list=PLbMP1JcGBmSGMPKkLoq6CGwwKDxl9VvBE
Show Page: https://www.redhat.com/en/technically-speaking
Subscribe to Red Hat’s YouTube channel: https://www.youtube.com/redhat/?sub_confirmation=1

Date: August 6, 2025