Online Endpoint

Overview

Online endpoints are used for online (real-time) inferencing. They deploy models behind a web server that can return predictions over HTTP.

Use them when:

  • you have low-latency requirements
  • your model can answer the request in a relatively short amount of time
  • your model's inputs fit in the HTTP payload of the request
  • you need to scale out in terms of the number of requests
note

We recommend you use Kubernetes online endpoints.
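To illustrate the constraints above, the following sketch builds (but does not send) a scoring request with Python's standard library. The endpoint URI, API key, and payload shape are hypothetical placeholders, not values from this document; replace them with your endpoint's actual values.

```python
import json
import urllib.request

# Hypothetical values -- substitute your endpoint's scoring URI and key.
scoring_uri = "https://my-endpoint.example.com/score"
api_key = "<api-key>"

# The model's inputs must fit in the HTTP request payload.
body = json.dumps({"data": [[1.0, 2.0, 3.0]]}).encode("utf-8")

request = urllib.request.Request(
    scoring_uri,
    data=body,
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    },
    method="POST",
)

# To actually score, send the request against a live endpoint:
# with urllib.request.urlopen(request) as response:
#     predictions = json.loads(response.read())
```

Because the endpoint answers each request synchronously, keeping the payload small and the model's response time short is what makes the low-latency scenario work.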

Model deployment example

Description

Submits a training pipeline and deploys the resulting model on AI-Platform.

The emphasis of the notebook is on the deployment of the trained model to an online endpoint.

The notebook includes:

  • A 3-stage training pipeline (train → analyze → score)
  • Model registration from the pipeline output
  • Creation of a Kubernetes online endpoint
  • Deployment of the model using KubernetesOnlineDeployment
  • Configuration of compute and resource constraints
  • Traffic assignment to the deployment
  • A curl-based test for online inference

Instructions

  1. Go to the repository folder that contains the notebook example and its associated files and folders.

  2. Copy train-on-ai-platform-aks.ipynb to your development environment in AI platform.

  3. Before running the notebook to deploy the pipeline, ensure your environment is properly set up and that the required configuration values (such as the workspace and compute names) are filled in.

  4. Follow the instructions in the notebook and run the code cells.