I am creating a custom AzureML pipeline using the Python SDK. This is my code:
from azureml.core import Environment, Workspace, ComputeTarget, Experiment
from azureml.pipeline.core import Pipeline, StepSequence
from azureml.pipeline.steps import PythonScriptStep
from azureml.core.runconfig import RunConfiguration
env = Environment("stage-env")
env.docker.base_image = "<acr_name>.azurecr.io/sw-predc-stage-env:0.2"
env.python.user_managed_dependencies = True
env.docker.base_image_registry.address = "<acr_name>.azurecr.io"
env.docker.base_image_registry.username = "<registry_name>"
env.docker.base_image_registry.password = "<Password>"
# Set up the workspace
ws = Workspace.from_config()
# Define the compute target
compute_target = ComputeTarget(workspace=ws, name="gpu-compute1")
# Set up RunConfiguration with the new environment
run_config = RunConfiguration()
run_config.environment = env
run_config.target = compute_target
# Define the PythonScriptStep for Cayuga Prophet Model Training
cayuga_prophet_step = PythonScriptStep(
    name="Cayuga Prophet Model Training",
    script_name="Cayuga_prophet_Model.py",
    compute_target=compute_target,
    runconfig=run_config,
    source_directory="/mnt/batch/tasks/shared/LS_root/mounts/clusters/gpu-compute1/",
    allow_reuse=True
)
# Define the PythonScriptStep for Cayuga RandomForest Model Training
cayuga_rforest_step = PythonScriptStep(
    name="Cayuga RForest Model Training",
    script_name="Cayuga_randomforest_Model.py",
    compute_target=compute_target,
    runconfig=run_config,
    source_directory="/mnt/batch/tasks/shared/LS_root/mounts/clusters/gpu-compute1/",
    allow_reuse=True
)
# Define the PythonScriptStep for Cayuga DeepAR Model Training
cayuga_deepar_step = PythonScriptStep(
    name="Cayuga DeepAr Model Training",
    script_name="Cayuga_DeepAR_Model.py",
    compute_target=compute_target,
    runconfig=run_config,
    source_directory="/mnt/batch/tasks/shared/LS_root/mounts/clusters/gpu-compute1/",
    allow_reuse=True
)
# Define the PythonScriptStep for Cayuga By Hospital
cayuga_by_hospital_step = PythonScriptStep(
    name="Cayuga By Hospital",
    script_name="Cayuga_By_Hospital.py",
    compute_target=compute_target,
    runconfig=run_config,
    source_directory="/mnt/batch/tasks/shared/LS_root/mounts/clusters/gpu-compute1/",
    allow_reuse=True
)
# Define the PythonScriptStep for Model Selection
cayuga_model_selection_step = PythonScriptStep(
    name="Model Selection and Moving Inference Files",
    script_name="model_selection_accuracy_comparision.py",
    compute_target=compute_target,
    runconfig=run_config,
    source_directory="/mnt/batch/tasks/shared/LS_root/mounts/clusters/gpu-compute1/",
    allow_reuse=True
)
# Make the steps run sequentially
step_sequence = StepSequence(steps=[cayuga_prophet_step, cayuga_rforest_step, cayuga_deepar_step, cayuga_by_hospital_step, cayuga_model_selection_step])
# Define the pipeline
pipeline = Pipeline(workspace=ws, steps=step_sequence)
# Submit the pipeline
experiment = Experiment(workspace=ws, name="Cayuga-Models-Training-and-Selection-Pipeline")
pipeline_run = experiment.submit(pipeline, tags={"pipeline_name": "Cayuga Unified Model Training Pipeline"})
pipeline_run.wait_for_completion(show_output=True)
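(Before submitting, I also sanity-check that the target's VM size is a GPU family. This is only a rough heuristic of my own — the NC/ND/NV prefixes are an assumption about the common Azure GPU SKUs, and the VM size can be read from the studio UI for `gpu-compute1`:

```python
# Heuristic only: Azure GPU VM families typically start with NC, ND, or NV.
GPU_SKU_PREFIXES = ("STANDARD_NC", "STANDARD_ND", "STANDARD_NV")

def looks_like_gpu_sku(vm_size: str) -> bool:
    """True if the VM size name belongs to a GPU family (rough check)."""
    return vm_size.upper().startswith(GPU_SKU_PREFIXES)

print(looks_like_gpu_sku("Standard_NC6s_v3"))  # True: GPU family
print(looks_like_gpu_sku("Standard_DS3_v2"))   # False: CPU family
```

)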
My Dockerfile looks like this:
# Base image with Python 3.10 and pre-installed Miniconda
FROM mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.1-cudnn8-ubuntu20.04
# Set up working directory
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
        build-essential \
        curl \
        git \
    && rm -rf /var/lib/apt/lists/*
# Add conda to PATH (optional if not already in PATH)
ENV PATH /opt/miniconda/bin:$PATH
# Initialize conda
RUN conda init bash
# Copy the environment.yml file into the container
COPY pdc_dev_env.yml /app/environment.yml
# Create the conda environment based on environment.yml
RUN conda env create -f /app/environment.yml
# Set the default shell to bash to allow 'conda activate' to work
SHELL ["/bin/bash", "--login", "-c"]
# Activate the environment by default
RUN echo "conda activate pdc_dev_env" >> ~/.bashrc
# Ensure the conda environment stays activated
ENV PATH /opt/miniconda/envs/pdc_dev_env/bin:$PATH
# Copy the project files into the working directory
COPY . /app
# Set the entry point (replace 'your_script.py' with the actual script)
ENTRYPOINT ["conda", "run", "-n", "pdc_dev_env"]
And this is my YAML file:
name: pdc_dev_env
channels:
  - conda-forge
dependencies:
  - python=3.10
  - numpy
  - pip
  - scikit-learn
  - scipy
  - pandas
  - pip:
      - azureml-core
      - plotly
      - kaleido
      - azure-ai-ml
      - azureml
      - inference-schema[numpy-support]==1.3.0
      - mlflow==2.8.0
      - mlflow-skinny==2.8.0
      - azureml-mlflow==1.51.0
      - psutil>=5.8,<5.9
      - tqdm>=4.60
      - ipykernel~=6.0
      - matplotlib
      - prophet
      - azure-storage-blob
      - darts==0.30.0
I also tried a CPU compute instance with the following Dockerfile:
# Same Dockerfile as the GPU one above, except for the base image:
FROM mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest
With the same YAML file and the same pipeline code, the CPU version ran successfully. But when I switched to the GPU compute target with the GPU-based Dockerfile and image, the pipeline failed with the error below:
Failed to execute command group with error API queried with a bad parameter: {"message":"unknown or invalid runtime name: nvidia"}
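From what I understand (this is my reading, not something from the logs), the message comes from the Docker daemon on the compute node: the job is being launched with `--runtime=nvidia`, but no runtime with that name is registered with the daemon. On machines where the NVIDIA Container Toolkit is installed, `/etc/docker/daemon.json` typically registers it roughly like this (the exact path may differ per install):

```json
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
```

So it looks like the node running my GPU job does not have that runtime configured. What am I missing in my setup?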