Online Endpoint
Overview
Online endpoints are used for online (real-time) inferencing. They deploy models behind a web server that can return predictions over HTTP.
You may use them when:
- you have low-latency requirements
- your model can answer the request in a relatively short amount of time
- your model's inputs fit in the HTTP payload of the request
- you need to scale out in terms of the number of requests
We recommend you use Kubernetes online endpoints.
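To make the real-time inferencing pattern concrete, here is a minimal sketch of building a scoring request against an online endpoint using only the Python standard library. The endpoint URI, the bearer-token auth mode, and the `{"data": ...}` payload shape are assumptions for illustration; check your endpoint's actual scoring contract.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri: str, api_key: str, inputs) -> urllib.request.Request:
    """Build an HTTP POST request for a real-time scoring endpoint.

    The payload shape ({"data": ...}) and bearer-token auth are assumptions;
    your endpoint's scoring contract and auth mode may differ.
    """
    body = json.dumps({"data": inputs}).encode("utf-8")
    return urllib.request.Request(
        scoring_uri,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical endpoint URI and key -- replace with your own.
    req = build_scoring_request(
        "https://my-endpoint.example.com/score", "<api-key>", [[0.1, 0.2, 0.3]]
    )
    # Sending the request would return the model's predictions:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
```

The same request can be issued from the command line with a single `curl -X POST -H "Authorization: Bearer <api-key>" -d '{"data": ...}' <scoring-uri>` call.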
Model deployment example
Description
Submits a training pipeline and deploys the trained model on AI-Platform.
The emphasis of the notebook is on the deployment of the trained model to an online endpoint.
The notebook includes:
- A 3-stage training pipeline (train → analyze → score)
- Model registration from the pipeline output
- Creates a Kubernetes Online Endpoint
- Deploys the model using `KubernetesOnlineDeployment`
- Configures compute and resource constraints
- Sets traffic to the deployment
- Includes a curl-based test for online inference
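The notebook performs these steps through the Python SDK class `KubernetesOnlineDeployment`. As a declarative counterpart, a deployment definition might look like the following sketch, assuming the platform accepts Azure ML v2-style YAML; every name and value here is a placeholder, and the exact fields depend on your platform version.

```yaml
# Hypothetical Kubernetes online deployment definition -- all values are placeholders.
name: blue
endpoint_name: my-endpoint
model: azureml:my-registered-model:1
code_configuration:
  code: ./scoring
  scoring_script: score.py
environment: azureml:my-inference-env:1
instance_count: 1
resources:
  requests:
    cpu: "0.5"
    memory: "1Gi"
```

The `resources.requests` block is where the compute and resource constraints mentioned above would be expressed; traffic is then routed to the named deployment (here, `blue`) at the endpoint level.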
Instructions
- Go to the repository folder with the notebook example and its associated files and folders.
- Copy `train-on-ai-platform-aks.ipynb` to your development environment in AI platform.
- Before running the notebook to deploy the pipeline, ensure your environment is properly set up and that the few configuration values (such as workspace and compute names) are filled in.
- Follow the instructions in the notebook and run the code cells.
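The configuration values mentioned above are typically collected near the top of the notebook. A minimal sketch of such a cell follows; every variable name and value here is hypothetical, and the notebook's actual names may differ.

```python
# Hypothetical configuration cell -- replace every value with your own.
config = {
    "subscription_id": "<subscription-id>",
    "resource_group": "<resource-group>",
    "workspace_name": "<workspace-name>",
    "compute_name": "<kubernetes-compute-name>",  # Kubernetes compute target for the endpoint
    "endpoint_name": "my-endpoint",               # endpoint names often must be unique per region
}

# Simple guard: flag any placeholder that has not been filled in yet.
missing = [key for key, value in config.items() if value.startswith("<")]
if missing:
    print(f"Fill in these configuration values before running: {missing}")
```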