Posted on May 31, 2024

LLM Deployment Pipeline with Azure and Kubeflow

Deploying models, especially LLM-based applications, on Azure can be a daunting task when done manually. We can automate the deployment pipeline with Kubeflow.

I am providing one example of an end-to-end machine learning deployment pipeline using Kubeflow on Azure. This example will cover setting up a Kubeflow pipeline, training a model, and deploying the model.

Prerequisites:

Azure Account: You need an Azure account.
Azure Kubernetes Service (AKS): You need a Kubernetes cluster. You can create an AKS cluster via the Azure portal or CLI.
Kubeflow: You need Kubeflow installed on your AKS cluster. Follow the Kubeflow on Azure documentation to set this up.

Step 1: Setting Up the Environment

First, ensure you have the Azure CLI and kubectl installed and configured.

```bash
# Install Azure CLI
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash

# Install kubectl
az aks install-cli

# Log in to Azure
az login

# Set the subscription (if you have multiple subscriptions)
az account set --subscription "<your-subscription-id>"

# Get credentials for your AKS cluster
az aks get-credentials --resource-group <resource-group-name> --name <aks-cluster-name>
```
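Before continuing, it can help to confirm that the tools from Step 1 are installed and that kubectl points at the AKS cluster. The following is a minimal sketch using only the Python standard library. The helper names `missing_tools` and `current_kube_context` are hypothetical, and the tool list simply mirrors the commands above.

```python
import shutil
import subprocess

# Tools used in Step 1 (assumption: extend this tuple if your setup needs more).
REQUIRED_TOOLS = ("az", "kubectl")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def current_kube_context():
    """Return the active kubectl context, or None if kubectl is unavailable or fails."""
    if shutil.which("kubectl") is None:
        return None
    result = subprocess.run(
        ["kubectl", "config", "current-context"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        # After `az aks get-credentials`, this is typically the AKS cluster name.
        print("kubectl context:", current_kube_context())
```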

Step 2: Deploying Kubeflow on AKS

Follow the official Kubeflow deployment guide for Azure AKS:

Deploy Kubeflow on Azure AKS

Step 3: Creating a Kubeflow Pipeline

We'll create a simple pipeline that trains a machine learning model; deploying the trained model is handled by a separate pipeline in Step 6.

Pipeline Definition

Create a file pipeline.py:

```python
# Requires the KFP v1 SDK: create_component_from_func is not available in kfp>=2.
import kfp
from kfp import dsl
from kfp.components import create_component_from_func


def train_model() -> str:
    # Imports live inside the function because only the function body is packaged
    # into the component; its dependencies must be installed in the container.
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    import joblib

    iris = load_iris()
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, test_size=0.2
    )
    clf = LogisticRegression()
    clf.fit(X_train, y_train)
    accuracy = clf.score(X_test, y_test)
    print(f"Model accuracy: {accuracy}")

    # This path is inside the step's container and is lost when the pod exits;
    # copy the file to persistent storage (e.g. a PVC or Azure Blob Storage) for real use.
    model_path = "/model.pkl"
    joblib.dump(clf, model_path)
    return model_path


train_model_op = create_component_from_func(
    train_model,
    base_image='python:3.8-slim',
    packages_to_install=['scikit-learn', 'joblib'],
)


@dsl.pipeline(
    name='Iris Training Pipeline',
    description='A pipeline to train and deploy an Iris classification model.'
)
def iris_pipeline():
    train_task = train_model_op()


if __name__ == '__main__':
    kfp.compiler.Compiler().compile(iris_pipeline, 'iris_pipeline.yaml')
```

Step 4: Deploying the Pipeline

Upload the pipeline to your Kubeflow instance.

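```bash
# create_component_from_func used above exists only in the KFP v1 SDK
pip install "kfp<2"
```

```python
import kfp

# Depending on your Kubeflow installation, you may need to pass host=...
# (and authentication) to kfp.Client().
kfp_client = kfp.Client()
kfp_client.upload_pipeline(
    pipeline_package_path='iris_pipeline.yaml',
    pipeline_name='Iris Training Pipeline'
)
```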

Step 5: Running the Pipeline

Once the pipeline is uploaded, you can run it via the Kubeflow dashboard or programmatically.

```python
# Run the pipeline
experiment = kfp_client.create_experiment('Iris Experiment')
run = kfp_client.run_pipeline(experiment.id, 'iris_pipeline_run', 'iris_pipeline.yaml')
```

Step 6: Deploying the Model

Assuming the trained model is saved in a storage bucket, you can create a deployment pipeline to deploy the model to Azure Kubernetes Service (AKS).

Model Deployment Component

Create a file deploy.py:

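```python
# Requires the KFP v1 SDK (pip install "kfp<2")
import kfp
from kfp import dsl
from kfp.components import create_component_from_func


def deploy_model(model_path: str):
    # Imports must live inside the function: only its body runs in the component container.
    from kubernetes import client, config

    # The step runs as a pod inside the cluster, so use the pod's service account
    # instead of a local kubeconfig. That account needs RBAC permission to create
    # Deployments in the target namespace.
    config.load_incluster_config()

    # Define deployment specs. model_path is not used here: the image is expected
    # to contain (or download) the model; adapt as needed.
    deployment = client.V1Deployment(
        metadata=client.V1ObjectMeta(name="iris-model-deployment"),
        spec=client.V1DeploymentSpec(
            replicas=1,
            selector=client.V1LabelSelector(match_labels={'app': 'iris-model'}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={'app': 'iris-model'}),
                spec=client.V1PodSpec(containers=[client.V1Container(
                    name="iris-model",
                    image="mydockerhub/iris-model:latest",
                    ports=[client.V1ContainerPort(container_port=80)]
                )])
            )
        )
    )

    # Create deployment
    apps_v1 = client.AppsV1Api()
    apps_v1.create_namespaced_deployment(namespace="default", body=deployment)


deploy_model_op = create_component_from_func(
    deploy_model,
    base_image='python:3.8-slim',
    packages_to_install=['kubernetes'],
)


@dsl.pipeline(
    name='Iris Deployment Pipeline',
    description='A pipeline to deploy an Iris classification model.'
)
def iris_deploy_pipeline(model_path: str):
    deploy_task = deploy_model_op(model_path)


if __name__ == '__main__':
    kfp.compiler.Compiler().compile(iris_deploy_pipeline, 'iris_deploy_pipeline.yaml')
```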
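For debugging, or for applying the same Deployment outside a pipeline, it helps to see the plain apps/v1 manifest that the Python client objects in deploy.py correspond to. This is a minimal sketch with the standard library only; `iris_deployment_manifest` is a hypothetical helper and the image is the article's placeholder.

```python
import json


def iris_deployment_manifest(image="mydockerhub/iris-model:latest", replicas=1):
    """Build the Deployment from deploy.py as a plain apps/v1 manifest dict."""
    labels = {"app": "iris-model"}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "iris-model-deployment"},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template labels or the API rejects it.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {
                    "containers": [
                        {
                            "name": "iris-model",
                            "image": image,
                            "ports": [{"containerPort": 80}],
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    # Pipe into kubectl, e.g.: python manifest.py | kubectl apply -f -
    print(json.dumps(iris_deployment_manifest(), indent=2))
```

A manifest applied with `kubectl apply` can be re-run safely, whereas `create_namespaced_deployment` returns a 409 Conflict error when a Deployment with that name already exists, so re-running the deployment pipeline as written fails until the old Deployment is deleted or replaced.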

Step 7: Running the Deployment Pipeline

Upload and run the deployment pipeline.

```python
# Upload the deployment pipeline
kfp_client.upload_pipeline(pipeline_package_path='iris_deploy_pipeline.yaml', pipeline_name='Iris Deployment Pipeline')

# Run the deployment pipeline
experiment = kfp_client.create_experiment('Iris Deployment Experiment')
run = kfp_client.run_pipeline(
    experiment.id,
    'iris_deploy_pipeline_run',
    'iris_deploy_pipeline.yaml',
    params={'model_path': '<path-to-your-model>'}
)
```

Conclusion

This end-to-end example demonstrates setting up a Kubeflow pipeline on Azure, training a model, and deploying it to AKS. Customize the model_path, Docker image, and other specifics as needed for your actual use case.

Deploying a Large Language Model (LLM) involves a few additional steps compared to a general machine learning model. Here’s how you can set up an end-to-end deployment pipeline for an LLM using Kubeflow on Azure, similar to the previous example.

Prerequisites

Ensure you have the necessary tools and environment set up as mentioned in the previous steps, including an Azure account, AKS cluster, and Kubeflow.

Step 1: Setting Up the Environment

Use the same steps as before to install Azure CLI, kubectl, and configure your environment.

Step 2: Deploying Kubeflow on AKS

Follow the official Kubeflow deployment guide for Azure AKS:

Deploy Kubeflow on Azure AKS

Step 3: Creating a Kubeflow Pipeline for LLM

Let's create a pipeline that fine-tunes a Hugging Face LLM and deploys it.

Pipeline Definition

Create a file llm_pipeline.py:

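```python
# Requires the KFP v1 SDK (pip install "kfp<2")
import kfp
from kfp import dsl
from kfp.components import create_component_from_func


def train_llm() -> str:
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        DataCollatorForLanguageModeling,
        Trainer,
        TrainingArguments,
    )
    from datasets import load_dataset

    # Load dataset and drop the empty lines WikiText contains
    dataset = load_dataset("wikitext", "wikitext-2-raw-v1")
    dataset = dataset.filter(lambda example: len(example["text"].strip()) > 0)

    # Load model and tokenizer
    model_name = "gpt2"
    model = AutoModelForCausalLM.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # GPT-2 ships without a padding token; reuse end-of-sequence so padding works
    tokenizer.pad_token = tokenizer.eos_token

    def tokenize_function(examples):
        return tokenizer(examples["text"], padding="max_length", truncation=True)

    tokenized_datasets = dataset.map(tokenize_function, batched=True)
    tokenized_datasets = tokenized_datasets.remove_columns(["text"])
    tokenized_datasets.set_format("torch")

    # mlm=False builds causal-LM labels from input_ids so the Trainer can compute a loss
    data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

    # Define training arguments
    # (eval_strategy is named evaluation_strategy in transformers < 4.41)
    training_args = TrainingArguments(
        output_dir="./results",
        eval_strategy="epoch",
        learning_rate=2e-5,
        per_device_train_batch_size=8,
        per_device_eval_batch_size=8,
        num_train_epochs=3,
        weight_decay=0.01,
    )

    # Create Trainer
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_datasets["train"],
        eval_dataset=tokenized_datasets["validation"],
        data_collator=data_collator,
    )

    # Train model
    trainer.train()

    # Save model. /model is inside the step's container and is lost when the pod
    # exits; point it at a mounted volume or upload it to storage for real use.
    model_path = "/model"
    model.save_pretrained(model_path)
    tokenizer.save_pretrained(model_path)
    return model_path


train_llm_op = create_component_from_func(
    train_llm,
    base_image='python:3.11-slim',
    packages_to_install=['torch', 'transformers', 'datasets', 'accelerate'],
)


@dsl.pipeline(
    name='LLM Training Pipeline',
    description='A pipeline to train and deploy a Large Language Model.'
)
def llm_pipeline():
    train_task = train_llm_op()
    # Fine-tuning is slow on CPU; with KFP v1 you can request a GPU via
    # train_task.set_gpu_limit(1) if the AKS cluster has GPU nodes.


if __name__ == '__main__':
    kfp.compiler.Compiler().compile(llm_pipeline, 'llm_pipeline.yaml')
```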
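The Trainer can only compute a loss if each batch carries labels. For causal language modeling the labels are the input ids themselves (the model shifts them internally), with padding positions set to -100 so the loss ignores them; this is what `DataCollatorForLanguageModeling(mlm=False)` does. The framework-free sketch below shows the idea on plain lists. The helper names are hypothetical and the token ids are illustrative; 50256 is GPT-2's end-of-sequence id, reused here as the pad token.

```python
# Hugging Face loss functions skip positions whose label is -100.
IGNORE_INDEX = -100


def pad_to_length(input_ids, length, pad_token_id):
    """Truncate or right-pad token ids; return (ids, attention_mask)."""
    ids = list(input_ids[:length])
    mask = [1] * len(ids)
    padding = length - len(ids)
    return ids + [pad_token_id] * padding, mask + [0] * padding


def causal_lm_labels(input_ids, pad_token_id):
    """Copy input ids as labels, masking padding so it does not contribute to the loss."""
    return [IGNORE_INDEX if tok == pad_token_id else tok for tok in input_ids]


if __name__ == "__main__":
    GPT2_EOS = 50256  # GPT-2's end-of-sequence id, used as the pad token
    ids, mask = pad_to_length([464, 3290, 318], 5, GPT2_EOS)
    print(ids)                              # [464, 3290, 318, 50256, 50256]
    print(mask)                             # [1, 1, 1, 0, 0]
    print(causal_lm_labels(ids, GPT2_EOS))  # [464, 3290, 318, -100, -100]
```

Because the pad token is the EOS token, genuine EOS tokens in the text are masked as well; that is a side effect of reusing EOS for padding.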

Step 4: Deploying the Pipeline

Upload the pipeline to your Kubeflow instance.

```bash
# The pipeline code above uses the KFP v1 SDK
pip install "kfp<2"
```

```python
import kfp

kfp_client = kfp.Client()
kfp_client.upload_pipeline(pipeline_package_path='llm_pipeline.yaml', pipeline_name='LLM Training Pipeline')
```

Step 5: Running the Pipeline

Once the pipeline is uploaded, run it via the Kubeflow dashboard or programmatically.

```python
# Run the pipeline
experiment = kfp_client.create_experiment('LLM Experiment')
run = kfp_client.run_pipeline(experiment.id, 'llm_pipeline_run', 'llm_pipeline.yaml')
```

Step 6: Deploying the Model

Create a deployment pipeline to deploy the LLM to Azure Kubernetes Service (AKS).

Model Deployment Component

Create a file deploy_llm.py:

```python
# Requires the KFP v1 SDK (pip install "kfp<2")
import kfp
from kfp import dsl
from kfp.components import create_component_from_func


def deploy_llm(model_path: str):
    # Imports must live inside the function: only its body runs in the component container.
    from kubernetes import client, config

    # The step runs as a pod in the cluster, so use its service account; that
    # account needs RBAC permission to create Deployments in the namespace.
    config.load_incluster_config()

    # Define deployment specs. The container mounts the PersistentVolumeClaim
    # "model-pvc" (which must already exist) at /model. model_path is not used
    # below; adapt the serving image to read the model from /model.
    deployment = client.V1Deployment(
        metadata=client.V1ObjectMeta(name="llm-deployment"),
        spec=client.V1DeploymentSpec(
            replicas=1,
            selector=client.V1LabelSelector(match_labels={'app': 'llm'}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={'app': 'llm'}),
                spec=client.V1PodSpec(
                    containers=[client.V1Container(
                        name="llm",
                        image="mydockerhub/llm:latest",
                        ports=[client.V1ContainerPort(container_port=80)],
                        volume_mounts=[client.V1VolumeMount(mount_path="/model", name="model-volume")]
                    )],
                    volumes=[client.V1Volume(
                        name="model-volume",
                        persistent_volume_claim=client.V1PersistentVolumeClaimVolumeSource(claim_name="model-pvc")
                    )]
                )
            )
        )
    )

    # Create deployment
    apps_v1 = client.AppsV1Api()
    apps_v1.create_namespaced_deployment(namespace="default", body=deployment)


deploy_llm_op = create_component_from_func(
    deploy_llm,
    base_image='python:3.8-slim',
    packages_to_install=['kubernetes'],
)


@dsl.pipeline(
    name='LLM Deployment Pipeline',
    description='A pipeline to deploy a Large Language Model.'
)
def llm_deploy_pipeline(model_path: str):
    deploy_task = deploy_llm_op(model_path)


if __name__ == '__main__':
    kfp.compiler.Compiler().compile(llm_deploy_pipeline, 'llm_deploy_pipeline.yaml')
```
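deploy_llm.py mounts a PersistentVolumeClaim named `model-pvc` but assumes it already exists. A minimal sketch of such a claim follows, built as a plain manifest dict; the `model_pvc_manifest` helper, the 10Gi size, and the use of AKS's built-in `managed-csi` storage class are assumptions to adjust.

```python
import json


def model_pvc_manifest(name="model-pvc", size="10Gi",
                       storage_class="managed-csi", access_mode="ReadWriteOnce"):
    """Build a PersistentVolumeClaim manifest for the model volume."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": [access_mode],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # Pipe into kubectl, e.g.: python pvc.py | kubectl apply -f -
    print(json.dumps(model_pvc_manifest(), indent=2))
```

`managed-csi` is backed by Azure Disk, which is ReadWriteOnce: only pods on a single node can mount it at a time. If the training step and the serving pod may land on different nodes, a `ReadWriteMany` claim on the `azurefile-csi` storage class is the usual alternative.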

Step 7: Running the Deployment Pipeline

Upload and run the deployment pipeline.

```python
# Upload the deployment pipeline
kfp_client.upload_pipeline(pipeline_package_path='llm_deploy_pipeline.yaml', pipeline_name='LLM Deployment Pipeline')

# Run the deployment pipeline
experiment = kfp_client.create_experiment('LLM Deployment Experiment')
run = kfp_client.run_pipeline(
    experiment.id,
    'llm_deploy_pipeline_run',
    'llm_deploy_pipeline.yaml',
    params={'model_path': '<path-to-your-model>'}
)
```

Conclusion

This example demonstrates how to create a Kubeflow pipeline for training and deploying a Large Language Model (LLM) on Azure Kubernetes Service (AKS). Adjust the model_path, Docker image, and other specifics as needed for your actual use case. The steps involve setting up the pipeline, running the training, and deploying the trained model, all within the Kubeflow framework.

To deploy containerized LLMs with Kubeflow on Azure, you'll need to follow these steps:

Containerize Your LLM: Create a Docker image of your LLM application.
Push the Docker Image to a Container Registry: Push the Docker image to Azure Container Registry (ACR) or Docker Hub.
Create a Kubeflow Pipeline for Deployment: Define a Kubeflow pipeline to deploy your LLM application using the Docker image.
Run the Deployment Pipeline: Execute the pipeline to deploy your LLM application on AKS.

Step 1: Containerize Your LLM

Create a Dockerfile for your LLM application.

Example Dockerfile

```dockerfile
# Use an official Python runtime as a parent image
FROM python:3.11-slim

# Set the working directory in the container
WORKDIR /app

# Copy the current directory contents into the container at /app
COPY . /app

# Install any needed packages specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# Make port 80 available to the world outside this container
EXPOSE 80

# Define environment variable
ENV NAME World

# Run app.py when the container launches
CMD ["python", "app.py"]
```

Example app.py

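```python
from flask import Flask, request, jsonify
from transformers import AutoModelForCausalLM, AutoTokenizer

app = Flask(__name__)

# Load the model once at startup, not on every request
model_name = "gpt2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)


@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()
    inputs = tokenizer(data['text'], return_tensors='pt')
    # Without max_new_tokens, generation stops at the short default max_length (20 tokens);
    # GPT-2 has no pad token, so pass EOS explicitly to avoid a warning.
    outputs = model.generate(
        **inputs,
        max_new_tokens=50,
        pad_token_id=tokenizer.eos_token_id,
    )
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return jsonify({'response': response})


if __name__ == '__main__':
    # Flask's built-in server is meant for development; use a WSGI server such as gunicorn in production.
    app.run(host='0.0.0.0', port=80)
```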
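To exercise the /predict endpoint, a small client is enough. This sketch uses only the standard library. The URL is an assumption (for example, the image started locally with `docker run -p 8080:80 mydockerhub/llm:latest`), and `build_predict_request` and `predict` are hypothetical helpers. The Content-Type header matters because Flask only parses the request body as JSON when it is declared as JSON.

```python
import json
import urllib.request

# Assumed address of the running app (e.g. `docker run -p 8080:80 ...`).
DEFAULT_URL = "http://localhost:8080/predict"


def build_predict_request(text, url=DEFAULT_URL):
    """Build a POST request with a JSON body matching app.py's {'text': ...} schema."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text, url=DEFAULT_URL, timeout=60):
    """Send the request and return the generated text from the 'response' field."""
    with urllib.request.urlopen(build_predict_request(text, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

Once the container is running, `predict("Kubeflow on Azure is")` returns the generated continuation as a string.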

Build and Push Docker Image

```bash
# Build the Docker image
docker build -t mydockerhub/llm:latest .

# Push the Docker image to Docker Hub or ACR
docker push mydockerhub/llm:latest
```

Step 2: Push Docker Image to Azure Container Registry

If you prefer to use ACR:

```bash
# Log in to Azure
az login

# Create an ACR if you don't have one
az acr create --resource-group <your-resource-group> --name <your-registry-name> --sku Basic

# Log in to the ACR
az acr login --name <your-registry-name>

# Tag the Docker image with the ACR login server name
docker tag mydockerhub/llm:latest <your-registry-name>.azurecr.io/llm:latest

# Push the Docker image to ACR
docker push <your-registry-name>.azurecr.io/llm:latest

# Allow the AKS cluster to pull images from this ACR
az aks update --resource-group <your-resource-group> --name <aks-cluster-name> --attach-acr <your-registry-name>
```
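The image reference passed to the deployment pipeline must match the ACR login server. The hypothetical helper below builds it and rejects names ACR would not accept: registry names are 5 to 50 alphanumeric characters, and the login server is the lowercase registry name followed by `.azurecr.io`. The repository `llm` and tag `latest` default to the values used above.

```python
import re

# ACR registry names: 5-50 alphanumeric characters (no hyphens or underscores).
_ACR_NAME = re.compile(r"[A-Za-z0-9]{5,50}")


def acr_image_ref(registry, repository="llm", tag="latest"):
    """Return '<registry>.azurecr.io/<repository>:<tag>' for a valid ACR name."""
    if not _ACR_NAME.fullmatch(registry):
        raise ValueError(f"invalid ACR registry name: {registry!r}")
    return f"{registry.lower()}.azurecr.io/{repository}:{tag}"


if __name__ == "__main__":
    # "myregistry01" is a placeholder; use your own registry name.
    print(acr_image_ref("myregistry01"))
```

The resulting string is what goes into `params={'image': ...}` when running the deployment pipeline.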

Step 3: Create a Kubeflow Pipeline for Deployment

Create a deployment pipeline to deploy the containerized LLM.

Deployment Component

Create a file deploy_llm.py:

```python
# Requires the KFP v1 SDK (pip install "kfp<2")
import kfp
from kfp import dsl
from kfp.components import create_component_from_func


def deploy_llm(image: str):
    # Imports must live inside the function: only its body runs in the component container.
    from kubernetes import client, config

    # The step runs inside the cluster, so authenticate with the pod's service account;
    # that account needs RBAC permission to create Deployments and Services.
    config.load_incluster_config()

    deployment = client.V1Deployment(
        metadata=client.V1ObjectMeta(name="llm-deployment"),
        spec=client.V1DeploymentSpec(
            replicas=1,
            selector=client.V1LabelSelector(match_labels={'app': 'llm'}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={'app': 'llm'}),
                spec=client.V1PodSpec(containers=[client.V1Container(
                    name="llm",
                    image=image,
                    ports=[client.V1ContainerPort(container_port=80)]
                )])
            )
        )
    )

    # ClusterIP Service (the default type); set type="LoadBalancer" to expose it outside the cluster
    service = client.V1Service(
        metadata=client.V1ObjectMeta(name="llm-service"),
        spec=client.V1ServiceSpec(
            selector={'app': 'llm'},
            ports=[client.V1ServicePort(protocol="TCP", port=80, target_port=80)]
        )
    )

    apps_v1 = client.AppsV1Api()
    core_v1 = client.CoreV1Api()
    apps_v1.create_namespaced_deployment(namespace="default", body=deployment)
    core_v1.create_namespaced_service(namespace="default", body=service)


deploy_llm_op = create_component_from_func(
    deploy_llm,
    base_image='python:3.8-slim',
    packages_to_install=['kubernetes'],
)


@dsl.pipeline(
    name='LLM Deployment Pipeline',
    description='A pipeline to deploy a containerized LLM.'
)
def llm_deploy_pipeline(image: str):
    deploy_task = deploy_llm_op(image=image)


if __name__ == '__main__':
    kfp.compiler.Compiler().compile(llm_deploy_pipeline, 'llm_deploy_pipeline.yaml')
```
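Loading even a small model such as GPT-2 takes time, and without a readiness probe the Service can send traffic to the pod before the app is listening. Because app.py loads the model before calling `app.run()`, port 80 only accepts connections once loading is done, so a TCP readiness probe is a reasonable approximation of "model ready". The sketch below adds one to a container spec expressed as a dict; the helper name and the timings are assumptions. With the Kubernetes Python client, the equivalent is passing `readiness_probe=client.V1Probe(tcp_socket=client.V1TCPSocketAction(port=80), initial_delay_seconds=30, period_seconds=10)` to `client.V1Container`.

```python
import copy


def with_readiness_probe(container, port=80, initial_delay_seconds=30, period_seconds=10):
    """Return a copy of a container spec dict with a TCP readiness probe added."""
    updated = copy.deepcopy(container)
    updated["readinessProbe"] = {
        "tcpSocket": {"port": port},
        "initialDelaySeconds": initial_delay_seconds,
        "periodSeconds": period_seconds,
    }
    return updated


if __name__ == "__main__":
    llm_container = {
        "name": "llm",
        "image": "<your-registry-name>.azurecr.io/llm:latest",
        "ports": [{"containerPort": 80}],
    }
    print(with_readiness_probe(llm_container))
```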

Step 4: Run the Deployment Pipeline

Upload and run the deployment pipeline.

```python
import kfp

# Upload the deployment pipeline. Pipeline names must be unique, so pick a
# different name if you already uploaded the earlier "LLM Deployment Pipeline".
kfp_client = kfp.Client()
kfp_client.upload_pipeline(pipeline_package_path='llm_deploy_pipeline.yaml', pipeline_name='LLM Deployment Pipeline')

# Run the deployment pipeline
experiment = kfp_client.create_experiment('LLM Deployment Experiment')
run = kfp_client.run_pipeline(
    experiment.id,
    'llm_deploy_pipeline_run',
    'llm_deploy_pipeline.yaml',
    params={'image': '<your-registry-name>.azurecr.io/llm:latest'}
)
```

Conclusion

By following these steps, you can deploy a containerized LLM using Kubeflow on Azure. This process involves containerizing your LLM application, pushing the Docker image to a container registry, creating a deployment pipeline in Kubeflow, and running the pipeline to deploy your LLM application on Azure Kubernetes Service (AKS). Adjust the specifics as needed for your actual use case.

You can get more help here. You can also find many machine learning and LLM notebooks, including a few for Kubeflow, here.

DEV Community
