
The future of AI: Distributing inference beyond a few GPUs

How do you run an AI model with a million-token context? 🕸️ Chris Wright and Nick Hill discuss the future of AI scaling, covering distributed inference, splitting tasks across different hardware, and the challenge of compressing the KV cache for massive models.

Explore the future of enterprise AI in the full Technically Speaking episode, now on YouTube!

#DistributedInference #LLM #AI #vLLM #llm-d #RedHat

Date: July 30, 2025