Join us for our next vLLM Office Hours on November 20, 2025, at 2:00 PM EST! These bi-weekly sessions are your chance to stay current with the vLLM ecosystem, ask questions, and hear directly from contributors and power users.
This week’s special topic: vLLM-triton-backend – How to get SOTA performance on NVIDIA and AMD with just Triton
We’ll kick off with the latest vLLM project update from Michael Goin, followed by a deep dive with Burkhard Ringlein from IBM on how to achieve state-of-the-art performance across NVIDIA and AMD GPUs using Triton. vLLM, now part of the PyTorch ecosystem, is the industry standard for serving large language models, powering production workloads across NVIDIA GPUs, AMD GPUs, AWS Inferentia, and more. Burkhard will share how IBM’s new Triton backend eliminates the need for hand-written CUDA or HIP kernels, delivering top-tier performance and portability from a single Triton-only code base through advanced autotuning and optimized kernel design.
Want to join the discussion live on Google Meet? Fill out this form to get a calendar invite: https://red.ht/office-hours