How to autoscale a TGI deployment on GKE

Tutorial: Configure autoscaling for TGI on GKE → https://goo.gle/3Z9a7WK
Learn more about observability on GKE → https://goo.gle/4951bWY
Hugging Face TGI (Text Generation Inference) → https://goo.gle/4hXScLk

Text Generation Inference (TGI) is a toolkit developed by Hugging Face for deploying and serving LLMs. TGI is production-ready, with built-in support for observability and metrics. Watch along as Googlers Wietse Venema and Abdel Sghiouar demonstrate how to autoscale TGI workloads on Google Kubernetes Engine (GKE) using TGI queue size as the scaling signal.
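As a rough sketch of the approach shown in the video, a HorizontalPodAutoscaler can scale a TGI Deployment on the queue-size metric TGI exposes via Prometheus (tgi_queue_size). The Deployment name, replica bounds, and target value below are illustrative assumptions, and the metric path assumes metrics are collected with Google Cloud Managed Service for Prometheus and surfaced through the Custom Metrics Stackdriver Adapter; see the linked tutorial for the exact setup.

```yaml
# Illustrative HPA scaling a TGI Deployment on TGI's queue-size metric.
# Assumes: a Deployment named "tgi", and tgi_queue_size scraped by
# Google Cloud Managed Service for Prometheus and exposed to the HPA
# via the Custom Metrics Stackdriver Adapter.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: tgi-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: tgi                # assumed Deployment name
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Pods
      pods:
        metric:
          name: prometheus.googleapis.com|tgi_queue_size|gauge
        target:
          type: AverageValue
          averageValue: "10" # example target: avg queued requests per replica
```

Using queue size rather than CPU or GPU utilization works well for LLM serving because queued requests reflect actual demand backlog, which utilization metrics on accelerators often fail to capture.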

More resources:
Learn more about the TGI architecture → https://goo.gle/3Oo8mzY
A deep dive into autoscaling LLM workloads on GKE → https://goo.gle/4fKpD2t

Watch more Google Cloud: Building with Hugging Face → https://goo.gle/BuildWithHuggingFace
Subscribe to Google Cloud Tech → https://goo.gle/GoogleCloudTech

#GoogleCloud #HuggingFace

Speakers: Wietse Venema, Abdel Sghiouar
Products Mentioned: Google Kubernetes Engine, Gemma

Date: November 26, 2024