Join us for our next vLLM Office Hours on July 31 at 2:00 PM ET! These bi-weekly sessions are your chance to stay up to date on the latest in the vLLM ecosystem, ask questions, and hear directly from contributors and power users.
This week’s special topic: Scaling MoE models with llm-d
Join Robert Shaw and Tyler Smith, vLLM core committers from Red Hat, as they walk through our work on scaling Mixture-of-Experts (MoE) models with llm-d. They’ll cover:
1. How llm-d enables wide expert-parallel (EP) MoE deployments with vLLM (see the sketch after this list)
2. How to leverage prefill/decode (P/D) disaggregation for more efficient cluster-scale inference
3. Early insights and lessons learned from real-world, multi-node MoE deployments
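As a rough illustration of item 1, here is a minimal sketch of turning on expert parallelism in vLLM itself; llm-d's role is orchestrating deployments like this at cluster scale, which the session will cover. This is not code from the talk: the model name and parallelism sizes are placeholders, and the `enable_expert_parallel` option assumes a recent vLLM release.

```python
# Minimal sketch: serving an MoE model with vLLM's offline API,
# sharding the experts across GPUs via expert parallelism.
# Assumes a recent vLLM release; model and sizes are illustrative.
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",  # illustrative MoE model
    tensor_parallel_size=8,        # shard dense/attention layers across 8 GPUs
    enable_expert_parallel=True,   # shard the MoE experts across those GPUs
)

outputs = llm.generate(
    ["Explain expert parallelism in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```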
llm-d GitHub: https://github.com/llm-d/llm-d