
Distributed inference with llm-d’s “well-lit paths”

Large language models like DeepSeek-R1 have parameter counts too large to serve efficiently on a single accelerator, so they must run across distributed hardware. Making such a system perform well requires distributed inference. Enter llm-d, an open source framework for distributed LLM inference.

Join Robert Shaw, Red Hat's Director of Engineering for AI, as he dives into llm-d's well-lit paths approach: a straightforward and efficient way to manage distributed LLM inference and meet the demands of large-scale AI workloads.

00:00 Introduction
00:43 The Enterprise Generative AI Inference Platform Stack
04:36 The llm-d Architecture Overview
08:39 Introducing Well-Lit Paths
09:54 Intelligent Inference Scheduling: Prefix-Aware & Load-Aware Routing
14:14 P/D Disaggregation: Splitting Prefill and Decode for Efficiency
17:45 Efficient KV Cache Transfer in vLLM with NIXL and RDMA
18:36 Flexible, Configurable Deployments with Heterogeneous Tensor Parallelism
19:32 KV Cache Management
22:58 Mixture of Experts Overview and Model Deployment
24:26 Wide Expert Parallelism (WideEP) Optimizations for MoE Scaling
27:45 Performance Summary and Closing

🔗 Read more about distributed inference: https://www.redhat.com/en/topics/ai/what-is-distributed-inference

#AI #RedHat #Kubernetes #llmd

Date: November 19, 2025