text-generation-inference documentation
Using TGI with Google TPUs
Check out the Optimum TPU guide on how to serve models with TGI on Google TPUs.