
Tutorial: Configure autoscaling for TGI on GKE → https://goo.gle/3Z9a7WK
Learn more about observability on GKE → https://goo.gle/4951bWY
Hugging Face TGI (Text Generation Inference) → https://goo.gle/4hXScLk
Text Generation Inference (TGI) is a toolkit developed by Hugging Face for deploying and serving LLMs. TGI is production-ready, with built-in support for observability and metrics. Watch along as Googlers Wietse Venema and Abdel Sghiouar demonstrate how to autoscale TGI workloads on Google Kubernetes Engine (GKE) using TGI queue size as the scaling signal.
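As a rough sketch of the idea, a HorizontalPodAutoscaler can scale on TGI's queue-size metric once it is exported to Google Cloud Managed Service for Prometheus. The Deployment name, replica bounds, and target value below are illustrative assumptions, not values from the video:

```yaml
# Hypothetical HPA scaling a TGI Deployment on the tgi_queue_size
# Prometheus metric, surfaced to GKE as an external metric.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: tgi-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: tgi   # assumed Deployment name
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: External
    external:
      metric:
        name: prometheus.googleapis.com|tgi_queue_size|gauge
      target:
        type: AverageValue
        averageValue: 10   # illustrative target queue depth per replica
```

See the tutorial link above for the full, tested configuration.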
More resources:
Learn more about the TGI architecture → https://goo.gle/3Oo8mzY
A deep dive into autoscaling LLM workloads on GKE → https://goo.gle/4fKpD2t
Watch more Google Cloud: Building with Hugging Face → https://goo.gle/BuildWithHuggingFace
Subscribe to Google Cloud Tech → https://goo.gle/GoogleCloudTech
#GoogleCloud #HuggingFace
Speakers: Wietse Venema, Abdel Sghiouar
Products Mentioned: Google Kubernetes Engine, Gemma