![[vLLM Office Hours #41] LLM Compressor Update & Case Study - January 22, 2026](https://i1.ytimg.com/vi/lXub9qlQ1YM/hqdefault.jpg)
Welcome to vLLM office hours! These bi-weekly sessions are your chance to stay current with the vLLM ecosystem, ask questions, and hear directly from contributors and power users.
This week’s special topic: LLM Compressor Update & Quantization Case Study from Cohere
We'll start with our regular bi-weekly vLLM update from core committer Michael Goin. Then teams from Red Hat AI will give a focused update on LLM Compressor 0.9.0, covering new attention and KV cache quantization features, model-free PTQ for FP8, AutoRound, experimental MXFP4 support, and performance improvements like batched calibration and expanded AWQ support. Finally, Cohere will share a real-world case study on how they use LLM Compressor for model quantization in production.
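If you haven't used LLM Compressor before, here is a minimal sketch of what a data-free FP8 quantization workflow looks like with it today, for context on the features above. The model ID and output path are illustrative, and this standard `oneshot` recipe is not necessarily the new model-free PTQ path covered in the session:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# Illustrative model choice; any HF causal LM works the same way.
MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# FP8 dynamic quantization of all Linear layers, skipping the lm_head.
# FP8_DYNAMIC needs no calibration data, so oneshot runs data-free.
recipe = QuantizationModifier(
    targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"]
)
oneshot(model=model, recipe=recipe)

# Save a checkpoint that vLLM can load directly.
SAVE_DIR = MODEL_ID.split("/")[-1] + "-FP8-Dynamic"
model.save_pretrained(SAVE_DIR)
tokenizer.save_pretrained(SAVE_DIR)
```

Bring your own quantization questions like these to the session.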
Want to join the discussion live on Google Meet? Get a calendar invite by filling out this form: https://red.ht/office-hours