Tutorial: Deploy Gemma 2 with multiple LoRA adapters using TGI on GKE → https://goo.gle/4f5KP1C
Video: Train a LoRA adapter with your own dataset → https://goo.gle/4gkBLar
Deep dive: A conceptual overview of Low-Rank Adaptation (LoRA) → https://goo.gle/4in4NrA
Learn how to serve multiple LoRA adapters from a single deployment on Google Kubernetes Engine. Low-Rank Adaptation, or LoRA, is a fine-tuning technique that adapts a base model to specific tasks without retraining the entire model. Watch along as we deploy Gemma 2, a powerful open large language model, with TGI, Hugging Face's open-source LLM inference server, and serve multiple LoRA adapters for different tasks.
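
For reference, here's a minimal sketch of how a client might target a specific adapter once the server is running. It assumes a TGI endpoint reachable at a placeholder URL and an adapter ID matching one of the adapters loaded at server startup (see the TGI CLI flags doc below); selecting adapters per request requires a recent huggingface_hub release.

# pip install huggingface_hub
from huggingface_hub import InferenceClient

# Placeholder endpoint: replace with the address of your TGI service on GKE.
client = InferenceClient("http://localhost:8080")

# adapter_id selects one of the LoRA adapters TGI loaded at startup;
# "my-org/gemma-2-coding-lora" is a hypothetical adapter name.
response = client.text_generation(
    "Write a haiku about Kubernetes.",
    max_new_tokens=128,
    adapter_id="my-org/gemma-2-coding-lora",
)
print(response)

Because all adapters share the same base model weights in memory, one deployment can serve many fine-tuned variants, with each request picking the adapter it needs.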
More resources:
Docs: Hugging Face Hub Inference client → https://goo.gle/3Zrwo2c
Docs: An overview of the TGI command line interface flags → https://goo.gle/41Fs1nd
Watch more Google Cloud: Building with Hugging Face → https://goo.gle/BuildWithHuggingFace
Subscribe to Google Cloud Tech → https://goo.gle/GoogleCloudTech
#GoogleCloud #HuggingFace
Speakers: Wietse Venema, Abdel Sghiouar
Products Mentioned: Gemma, Gemini, Google Kubernetes Engine (GKE)