Massive language models are here, but getting them to run efficiently is a major challenge. In this episode, Red Hat CTO Chris Wright sits down with Nick Hill, Senior Principal Software Engineer at Red Hat, to explore vLLM, an open-source project revolutionizing AI inference. They discuss how innovations born from systems-level thinking are making AI more practical and accessible.
00:00 – The challenge of running massive language models
00:59 – Nick Hill’s journey from IBM Watson to generative AI
03:03 – What is vLLM and why is it different?
05:41 – Optimizing the KV Cache and GPU utilization
07:35 – PagedAttention: Virtual memory for your GPU
09:51 – Speculative decoding and its CPU parallels
11:50 – The future of distributed and heterogeneous hardware in AI
16:38 – How open source and community are accelerating AI innovation
Learn More:
vLLM Project: https://vllm.ai/
Sky Computing Lab at UC Berkeley: https://sky.cs.berkeley.edu/
Follow us:
Chris Wright on LinkedIn: https://www.linkedin.com/in/chris-wright-b733851/
Chris Wright on X (Twitter): https://twitter.com/kernelcdub
What is Technically Speaking?
Hosted by Red Hat CTO Chris Wright, Technically Speaking taps into emerging technology trends with insights from leading experts across the globe. The series blends deep-dive discussions, tech updates, and creative short-form content, solidifying Red Hat's role as a pioneer in technology innovation and open source thought leadership.
Want to participate? Leave us a comment if there’s a topic or a guest you’d like to see featured.
Watch More Technically Speaking:
YouTube Playlist: https://www.youtube.com/playlist?list=PLbMP1JcGBmSGfI0Rl4s6PpycLF4rZcfW8
Show Page: https://www.redhat.com/en/technically-speaking
Subscribe to Red Hat’s YouTube channel: https://www.youtube.com/redhat/?sub_confirmation=1
#RedHat #vLLM #AIInference #TechnicallySpeaking #OpenSource