
Learn why your powerful new AI model might be running slowly during inference. This video surveys the landscape of modern AI models, including Large Language Models (LLMs), Diffusion Models, Visual Language Models (VLMs), and Mixture of Experts (MoE). We break down the four common performance bottlenecks (compute, memory capacity, memory bandwidth, and networking) and share practical strategies engineers can use to identify and address each one, helping you get optimal performance from your AI applications.
Chapters:
0:00 – Introduction: Why is my AI model slow?
0:47 – The 4 types of modern AI models
2:03 – The four common bottlenecks
4:12 – Practical strategies for LLMs (Quantization)
4:52 – Practical strategies for Diffusion Models
5:24 – Practical strategies for Mixture of Experts (MoE)
5:53 – Conclusion: A playbook for performance
Resources:
AI Hypercomputer overview → https://goo.gle/3JMXNb2
Introduction to Cloud TPU → https://goo.gle/4nMA0WE
Subscribe to Google Cloud Tech → https://goo.gle/GoogleCloudTech
#GoogleCloud #LLM #VLM #AIModel
Speaker: Duncan Campbell
Products Mentioned: AI Infrastructure
