Add examples/gke/tgi-tpu-deployment/ for TGI on TPU #62
Description
This PR adds an example of how to use the TGI container for TPU inference, recently created in #57, on Google Kubernetes Engine (GKE) with TPU v5e chips. In this case, the model served is `google/gemma-7b-it`, which is among the models supported by `optimum-tpu`.
For more information on `optimum-tpu`, please check https://github.com/huggingface/optimum-tpu
What's missing?
We still need to ping Google Cloud about the recent release of the TPU container and wait for it to be published; after that, only the `CONTAINER_URI` needs to be updated accordingly.