Online Endpoint

Overview

Online endpoints are used for online (real-time) inferencing. They deploy models behind a web server that can return predictions over HTTP.

Use them when:

  • you have low-latency requirements
  • your model can answer the request in a relatively short amount of time
  • your model's inputs fit in the HTTP payload of the request
  • you need to scale out in terms of the number of requests
note

We recommend you use Kubernetes online endpoints.
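To illustrate the constraints above, the following sketch builds (but does not send) a scoring request with Python's standard library. The endpoint URI, API key, and payload shape are hypothetical placeholders, not values from this document; replace them with your endpoint's actual values.

```python
import json
import urllib.request

# Hypothetical values -- substitute your endpoint's scoring URI and key.
scoring_uri = "https://my-endpoint.example.com/score"
api_key = "<api-key>"

# The model's inputs must fit in the HTTP request payload.
body = json.dumps({"data": [[1.0, 2.0, 3.0]]}).encode("utf-8")

request = urllib.request.Request(
    scoring_uri,
    data=body,
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    },
    method="POST",
)

# To actually score, send the request against a live endpoint:
# with urllib.request.urlopen(request) as response:
#     predictions = json.loads(response.read())
```

Because the endpoint answers each request synchronously, keeping the payload small and the model's response time short is what makes the low-latency scenario work.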

Model deployment example

Description

Submits a training pipeline and deploys the resulting model on AI-Platform.

The emphasis of the notebook is on the deployment of the trained model to an online endpoint.

The notebook includes:

  • A 3-stage training pipeline (train → analyze → score)
  • Model registration from the pipeline output
  • Creation of a Kubernetes online endpoint
  • Deployment of the model using KubernetesOnlineDeployment
  • Configuration of compute and resource constraints
  • Traffic assignment to the deployment
  • A curl-based test for online inference

Instructions

  1. Go to the repository folder that contains the notebook example and its associated files and folders.

  2. Copy train-on-ai-platform-aks.ipynb to your development environment in AI platform.

  3. Before running the notebook to deploy the pipeline, ensure your environment is properly set up and that the required configuration values (such as the workspace and compute names) are filled in.

  4. Follow the instructions in the notebook and run the code cells.