
Deploying Vision Models (TensorFlow) from 🤗 Transformers

By Chansung Park and Sayak Paul

This repository shows various ways of deploying a vision model (TensorFlow) from 🤗 Transformers using the TensorFlow ecosystem. In particular, we use TensorFlow Serving (for local deployment), Vertex AI (for serverless deployment), and Kubernetes on GKE (for more controlled deployment) with TensorFlow Serving and ONNX.

For this project, we leverage Google Cloud Platform for its managed services: Vertex AI and GKE.

Methods covered

  • Local TensorFlow Serving | Blog post from 🤗

    • We cover how to deploy a Vision Transformer (ViT) model from 🤗 Transformers locally with TensorFlow Serving (see the first sketch after this list).
    • With this, you will be able to serve your own machine learning models and query them from a standalone Python application.
  • TensorFlow Serving on Kubernetes (GKE) | Blog post from 🤗

    • We cover how to build a custom TensorFlow Serving Docker image with a Vision Transformer (ViT) model from 🤗 Transformers, provision a Google Kubernetes Engine (GKE) cluster, and deploy the Docker image to that cluster.
    • In particular, we cover Kubernetes-specific topics such as creating Deployment/Service/HPA objects for scalable deployment of the Docker image to the nodes (VMs) and exposing them as a service to clients.
    • With this, you will be able to serve and scale your own machine learning models according to the CPU utilization of the deployment as a whole.
    • We also provide utilities to perform load tests with Locust and a notebook to visualize the results (see the Locust sketch after this list). Refer here for more details.
  • ONNX on Kubernetes (GKE)

    • The workflow here is similar to the one above, except that we use an ONNX-optimized version of the ViT model (see the ONNX sketch after this list).
    • ONNX is particularly useful when you're deploying models on x86 CPUs.
    • This workflow doesn't require you to build a custom TF Serving image.
    • One important thing to keep in mind: generate the ONNX model on the same machine type as your deployment hardware. If you're going to deploy on the n1-standard-8 machine type, generate the ONNX model on an n1-standard-8 machine as well, so that the ONNX optimizations are relevant to it.
  • Vertex AI Prediction | Blog post from 🤗

    • We cover how to deploy a Vision Transformer (ViT) model from 🤗 Transformers to Google Cloud's fully managed machine learning deployment service, Vertex AI Prediction.
    • Under the hood, Vertex AI Prediction builds on GKE, TensorFlow Serving, and more.
    • That means you can deploy and scale machine learning models without having to build a custom Docker image, write Kubernetes-specific manifests, or set up model monitoring yourself.
    • With this, you will be able to serve and scale your own machine learning models by calling the google-cloud-aiplatform SDK's APIs to interact with Vertex AI (see the Vertex AI sketch after this list).
    • We provide utilities to perform load tests with Locust. Refer here for more details.
  • Vertex AI Prediction (w/ optimized TFRT)

    • TBD
    • Learn more about the optimized TensorFlow Runtime (TFRT) here.
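
Quick code sketches

The sketches below illustrate the basic shape of each workflow. They are minimal, hedged examples, not drop-in replacements for the notebooks in this repository. First, local TensorFlow Serving: exporting a ViT model as a SavedModel and querying a local TensorFlow Serving instance over REST. The checkpoint, served model name ("vit"), and input shape are illustrative assumptions; inspect the exported signature (for example with saved_model_cli) before sending real requests.

```python
import json

import numpy as np
import requests
from transformers import TFViTForImageClassification

# Export the model as a SavedModel under vit/saved_model/1, a layout
# that TensorFlow Serving can load directly.
model = TFViTForImageClassification.from_pretrained("google/vit-base-patch16-224")
model.save_pretrained("vit", saved_model=True)

# Serve it with the stock TF Serving image (run in a shell):
#   docker run -p 8501:8501 \
#     --mount type=bind,source=$(pwd)/vit/saved_model,target=/models/vit \
#     -e MODEL_NAME=vit -t tensorflow/serving

# Random values standing in for a preprocessed image. Channels-first
# pixel_values is an assumption here; verify the real signature with:
#   saved_model_cli show --dir vit/saved_model/1 --all
payload = {"instances": np.random.rand(1, 3, 224, 224).tolist()}
response = requests.post(
    "http://localhost:8501/v1/models/vit:predict", data=json.dumps(payload)
)
print(response.json())
```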
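
For the GKE deployment, a minimal Locust user for load-testing the served model. The endpoint path and payload shape are assumptions; match them to your deployment.

```python
import json

import numpy as np
from locust import HttpUser, between, task


class ViTUser(HttpUser):
    # Each simulated user waits 1 to 2 seconds between requests.
    wait_time = between(1, 2)

    @task
    def predict(self):
        # Assumed payload shape; match it to your model's serving signature.
        payload = {"instances": np.random.rand(1, 3, 224, 224).tolist()}
        self.client.post("/v1/models/vit:predict", data=json.dumps(payload))
```

Run it with, for example, locust -f locustfile.py --host http://<service-external-ip>:8501, pointing --host at your Kubernetes Service's external IP.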
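
For the ONNX workflow, a sketch of converting the SavedModel with tf2onnx and running it with ONNX Runtime on CPU. Paths and shapes are assumptions, and, per the note above, the conversion should run on the same machine type you plan to deploy on.

```python
# Convert the SavedModel to ONNX first (run in a shell):
#   python -m tf2onnx.convert --saved-model vit/saved_model/1 --output vit.onnx

import numpy as np
import onnxruntime as ort

# Run the ONNX model on CPU, which is where the ONNX optimizations
# pay off on x86 machine types.
session = ort.InferenceSession("vit.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name  # look up the actual input name

dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: dummy})
print(outputs[0].shape)
```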
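
For Vertex AI Prediction, a sketch of uploading and deploying the SavedModel with the google-cloud-aiplatform SDK. The project, region, GCS path, serving container image, and machine type are placeholders; substitute your own.

```python
from google.cloud import aiplatform

# Placeholders: use your own project and region.
aiplatform.init(project="my-gcp-project", location="us-central1")

# Upload the SavedModel (already copied to GCS) with a prebuilt
# TensorFlow serving container, then deploy it to an autoscaling endpoint.
model = aiplatform.Model.upload(
    display_name="vit",
    artifact_uri="gs://my-bucket/vit/saved_model",  # placeholder GCS path
    serving_container_image_uri=(
        "us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-11:latest"
    ),
)
endpoint = model.deploy(
    machine_type="n1-standard-8",
    min_replica_count=1,
    max_replica_count=3,
)

# The request payload depends on the model's serving signature:
# endpoint.predict(instances=[...])
```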

Acknowledgements

We're thankful to the ML Developer Programs team at Google for providing GCP support.