
Exponential growth in LLM parameter counts brings serious deployment and infrastructure challenges. Principal Software Engineer Dipika Sikka and Machine Learning Engineer Kyle Sayers break down how LLM Compressor, an open-source framework, can streamline model deployment and deliver higher throughput and lower latency.
00:00 Introduction
00:14 The Artificial Intelligence (AI) Scaling Challenge
03:28 Why Optimize LLMs?
07:49 Introducing LLM Compressor
08:29 Using LLM Compressor to Optimize Models
13:52 Available Algorithms
20:34 Model Inference with vLLM
22:22 Supported Workflows with Examples
26:16 LLM Compressor User Summary
26:49 Where to Get Started
27:07 Roadmap and Conclusion
🔗 Check out the project: https://github.com/vllm-project/llm-compressor
🔗 Ready to put your optimized models into production? See how Red Hat’s AI portfolio helps you deploy and accelerate inference at scale. https://www.redhat.com/en/products/ai
#RedHat #AI #vLLM #LLMCompressor