
CV MLOps Architecture

Development architecture

The development stage architecture is set up to:

  1. Start in a dev notebook container → run experiments using Jupyter + CV tools
  2. Tune models with Ray Tune → log results to MLflow
  3. Deploy and serve model → deploy with KServe for real-time REST inference

Figure 1: Development stage architecture diagram

Figure 1 shows three components: the dev notebook container, the MLflow tracking server, and model serving.

Dev Notebook Container

Foundational libraries

  • Python, TensorFlow, OpenCV, scikit-image: core CV and ML libraries

Interactive development

  • Jupyter, Conda: notebook-based development and execution, plus environment management

Orchestration & resource management

  • Kubeflow: manages pipelines, execution, and resource orchestration

Model training & data quality

  • Three Lines of Code (3LC): simplifies model training
  • Label Studio: labeling and data versioning

Hyperparameter Tuning

  • Uses Ray Tune, a scalable hyperparameter tuning library
  • Automatically varies parameters and retrains models to find optimal settings
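As an illustration only, a minimal Ray Tune search over the parameters mentioned above might look like the following. The scoring function and its optimum are toy stand-ins for a real train-and-validate loop, and the exact `tune.report` signature varies across Ray versions:

```python
# Sketch of a Ray Tune hyperparameter search (assumes `pip install "ray[tune]"`).
# The scoring function is a toy stand-in for a real training loop.

def score(lr: float, batch_size: int) -> float:
    """Toy validation 'accuracy': peaks at lr=0.01, batch_size=32."""
    return 1.0 / (1.0 + abs(lr - 0.01) * 100 + abs(batch_size - 32) / 32)

def main():
    from ray import tune  # deferred so score() is usable without Ray installed

    def trainable(config):
        acc = score(config["lr"], config["batch_size"])
        tune.report({"accuracy": acc})  # older Ray versions: tune.report(accuracy=acc)

    tuner = tune.Tuner(
        trainable,
        param_space={
            "lr": tune.loguniform(1e-4, 1e-1),
            "batch_size": tune.choice([16, 32, 64, 128]),
        },
        tune_config=tune.TuneConfig(metric="accuracy", mode="max", num_samples=20),
    )
    results = tuner.fit()
    print(results.get_best_result().config)  # best parameter combination found

if __name__ == "__main__":
    main()
```

Each trial reports its metric back to Tune, which keeps sampling new parameter combinations and retains the best-scoring configuration.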

MLflow Tracking Server

MLflow logs and tracks:

  • Parameters (e.g., learning rate, batch size)
  • Metrics (e.g., accuracy, F1 score)
  • Artifacts (e.g., trained model files, plots)
  • Model metadata and versions
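A hedged sketch of logging one such run to the tracking server: `evaluate` is a stand-in for a real model evaluation step, while `log_params`, `log_metrics`, and `log_artifact` are the standard MLflow tracking calls.

```python
import json
import tempfile
from pathlib import Path

def evaluate(y_true, y_pred) -> dict:
    """Stand-in evaluation step producing the metrics to log."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return {"accuracy": correct / len(y_true)}

def log_run(params: dict, y_true, y_pred) -> str:
    import mlflow  # deferred so evaluate() is usable without MLflow installed

    metrics = evaluate(y_true, y_pred)
    with mlflow.start_run() as run:
        mlflow.log_params(params)    # e.g. {"learning_rate": 0.01, "batch_size": 32}
        mlflow.log_metrics(metrics)  # e.g. accuracy, F1 score
        with tempfile.TemporaryDirectory() as tmp:
            report = Path(tmp) / "report.json"      # artifact: any file, e.g. a
            report.write_text(json.dumps(metrics))  # plot or a trained model file
            mlflow.log_artifact(str(report))
        return run.info.run_id
```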

Model Card

  • Summarizes model details, training dataset, and evaluation metrics (e.g., mean precision, recall, F1 score)
  • Helps maintain model documentation and reproducibility
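The evaluation metrics a model card reports can be computed from prediction counts alone; a small self-contained helper (illustrative, not tied to any library in the stack above):

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Binary precision, recall, and F1 from paired labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```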

Model Serving

KServe

  • A standard model inference platform on Kubernetes, built for scalable, production-grade ML serving
  • Supports real-time inference via a REST API with a standardized protocol across ML frameworks
  • Enables modern serverless inference workloads with autoscaling (e.g., Scale to Zero, GPU-based)
  • Provides advanced deployment strategies such as canary rollouts, ensembles, and model transformers
  • Supports pre/post-processing, monitoring, and explainability
  • Integrates with ModelMesh for intelligent routing and high-density model serving
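As a sketch of the real-time REST path, a client can call a deployed model through KServe's V1 inference protocol (`POST /v1/models/<name>:predict` with an `{"instances": [...]}` body). The host and model names below are placeholders:

```python
import json
from urllib import request

def v1_predict_url(host: str, model: str) -> str:
    """KServe V1 protocol predict endpoint for a named model."""
    return f"http://{host}/v1/models/{model}:predict"

def predict(host: str, model: str, instances: list):
    """POST instances to the model and return its predictions."""
    body = json.dumps({"instances": instances}).encode()
    req = request.Request(
        v1_predict_url(host, model),
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:  # response body: {"predictions": [...]}
        return json.loads(resp.read())["predictions"]
```

Usage against a running InferenceService would look like `predict("cv-model.example.com", "cv-model", [[0.1, 0.2]])` (placeholder names).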

Deployment architecture

Data and metrics flow as follows:

  1. Input data flows in from the K8s Persistent Volume
  2. Data travels through each pipeline stage
  3. Results and logs are captured by MLflow
  4. Deployment can then trigger serving or monitoring
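The flow above can be sketched end to end in plain Python. The three stages are toy stand-ins for real pipeline steps, and the `log` list stands in for what MLflow would capture at each stage:

```python
def preprocess(pixels):
    """Scale raw 0-255 pixel values to [0, 1]."""
    return [p / 255 for p in pixels]

def process(pixels):
    """Stand-in model: mean intensity as a prediction score."""
    return sum(pixels) / len(pixels)

def postprocess(score, threshold=0.5):
    """Turn the raw score into a label for serving."""
    return "positive" if score >= threshold else "negative"

def run_pipeline(pixels, log):
    """Push data through each stage, recording results as MLflow would."""
    x = preprocess(pixels)
    log.append(("preprocess", len(x)))
    s = process(x)
    log.append(("process", s))
    label = postprocess(s)
    log.append(("postprocess", label))
    return label
```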

Figure 2: Deployment stage architecture diagram

Figure 2 shows three components: the AI Platform AKS cluster, the Kubeflow pipeline, and the MLflow integration.

AI Platform AKS Cluster

  • The architecture is deployed on an AI Platform Azure Kubernetes Service (AKS) cluster.
  • A Kubernetes (K8s) persistent volume stores the data used in the Kubeflow pipeline stages.

Kubeflow Pipeline

The core workflow is composed of three main stages:

  1. Pre-processing
  2. Processing
  3. Post-processing

MLflow Integration

  • MLflow logs the results from each Kubeflow pipeline stage.
  • The MLflow client is used to request logged run data.
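Requesting logged run data through the MLflow client might look like the following sketch; the tracking URI and experiment ID are placeholders:

```python
def summarize(run_id: str, metrics: dict) -> str:
    """One-line summary of a logged run."""
    stats = ", ".join(f"{k}={v:.3f}" for k, v in sorted(metrics.items()))
    return f"{run_id}: {stats}"

def fetch_summaries(tracking_uri: str, experiment_id: str = "0"):
    from mlflow.tracking import MlflowClient  # deferred; assumes MLflow is installed

    client = MlflowClient(tracking_uri=tracking_uri)
    runs = client.search_runs(experiment_ids=[experiment_id], max_results=5)
    return [summarize(r.info.run_id, r.data.metrics) for r in runs]
```

A call such as `fetch_summaries("http://mlflow.example.com:5000")` (placeholder URI) would return summaries of the most recent logged runs.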