How do you run an AI model with a million-token context? 🕸️ Chris Wright and Nick Hill discuss the future of AI scaling, covering distributed inference, splitting tasks across different hardware, and the challenge of compressing the KV cache for massive models.
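To see why the KV cache becomes a scaling challenge at these lengths, here is a minimal back-of-envelope sketch. The model dimensions below (80 layers, 8 KV heads with grouped-query attention, head dimension 128, fp16) are illustrative assumptions for a large model, not figures from the episode:

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    # Factor of 2 covers keys and values; one entry per layer,
    # per KV head, per token, at bytes_per_elem (2 for fp16).
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical large model with grouped-query attention (assumed shapes)
size = kv_cache_bytes(seq_len=1_000_000, n_layers=80, n_kv_heads=8, head_dim=128)
print(f"{size / 1e9:.1f} GB")  # ~327.7 GB for one 1M-token sequence
```

A single million-token sequence would need hundreds of gigabytes of cache under these assumptions, far beyond one accelerator's memory, which is why compression and distributing the cache across hardware come up in the discussion.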
Explore the future of enterprise AI in the full Technically Speaking episode, now on YouTube!
#DistributedInference #LLM #AI #vLLM #llm-d #RedHat
Date: July 30, 2025