![[vLLM Office Hours #41] LLM Compressor Update & Case Study - January 22, 2026](https://i1.ytimg.com/vi/lXub9qlQ1YM/hqdefault.jpg)
Welcome to vLLM office hours! These bi-weekly sessions are your chance to stay current with the vLLM ecosystem, ask questions, and hear directly from contributors and power users.
This week’s special topic: LLM Compressor Update & Quantization Case Study from Cohere
We'll start with our regular bi-weekly vLLM update from core committer Michael Goin. Then teams from Red Hat AI will give a focused update on LLM Compressor 0.9.0, covering new attention and KV cache quantization features, model-free PTQ for FP8, AutoRound, experimental MXFP4 support, and performance improvements like batched calibration and expanded AWQ support. Finally, Cohere will share a real-world case study on how they use LLM Compressor for model quantization in production.
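If you haven't used LLM Compressor before, here is a minimal sketch of what a data-free FP8 quantization workflow looks like with it today, for context on the features above. The model ID and output path are illustrative, and this standard `oneshot` recipe is not necessarily the new model-free PTQ path covered in the session:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# Illustrative model choice; any HF causal LM works the same way.
MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# FP8 dynamic quantization of all Linear layers, skipping the lm_head.
# FP8_DYNAMIC needs no calibration data, so oneshot runs data-free.
recipe = QuantizationModifier(
    targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"]
)
oneshot(model=model, recipe=recipe)

# Save a checkpoint that vLLM can load directly.
SAVE_DIR = MODEL_ID.split("/")[-1] + "-FP8-Dynamic"
model.save_pretrained(SAVE_DIR)
tokenizer.save_pretrained(SAVE_DIR)
```

Bring your own quantization questions like these to the session.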
Want to join the discussion live on Google Meet? Get a calendar invite by filling out this form: https://red.ht/office-hours