Uploading Multiple Folders for Remote Job via Azure Machine Learning Python SDK (v2)

Question

I am working with the Python SDK (v2) of Azure Machine Learning.

I want to launch a training script on a serverless compute by using jobs. Typically, I create and launch a job using the Python command shown below.

The problem I’m facing is that my workspace has the source code, the experiment script, and the utility functions organized in separate folders. However, the command function in the SDK allows uploading only a single folder. I prefer not to restructure my entire codebase to fit AzureML's requirements. Is there a way to upload multiple folders?

from azure.ai.ml import command, Input

job = command(
    inputs=dict(...),
    code='path/to/code/folder',
    command='python main.py --train-data ${{inputs.train_data}} --test-data ${{inputs.test_data}}',
    environment="...",
    experiment_name='...'
)

ml_client.create_or_update(job)

Answer

Hello Diego STUCCHI,

Welcome to the Microsoft Q&A and thank you for posting your questions here.

I understand that you would like to upload multiple folders for Remote Job via Azure Machine Learning Python SDK (v2).

When working with the Azure Machine Learning Python SDK (v2) to upload multiple folders for a remote job, there are several methods to consider. Each method has its pros and cons, and it’s important to choose the one that best fits your needs.

Methods to Upload Multiple Folders

Using a ZIP File,
Using a Custom Docker Image,
Using AzureML Data Assets,
Using the code_paths Parameter.

Best/Optimal Approach is code_paths Parameter.

The code_paths parameter in the AzureML SDK v2 supports multiple folders directly. This method avoids the hassle of compressing files or building custom containers. This is how you can implement it:

from azure.ai.ml import command, Input
job = command(
    inputs=dict(
        train_data=Input(type="uri_file", path="path/to/train_data"),
        test_data=Input(type="uri_file", path="path/to/test_data"),
    ),
    code_paths=[
        "path/to/source",
        "path/to/experiment",
        "path/to/utils"
    ],
    command='python path/to/experiment/main.py --train-data ${{inputs.train_data}} --test-data ${{inputs.test_data}}',
    environment="azureml:",
    experiment_name='my_experiment'
)
ml_client.create_or_update(job)

Using a ZIP File.

Should there be any issues with the code_paths parameter, another practical solution is to consolidate your directories into a ZIP file. This method works within the current SDK constraints and preserves your codebase structure. This is how you can do it:

Use the following bash command to create a ZIP file containing all your folders:

zip -r code_archive.zip path/to/source path/to/experiment path/to/utils

Modify Your Python Script: Use the ZIP file as the code path in your script:

from azure.ai.ml import command, Input
job = command(
    inputs=dict(
        train_data=Input(type="uri_file", path="path/to/train_data"),
        test_data=Input(type="uri_file", path="path/to/test_data"),
    ),
    code="path/to/code_archive.zip",  # Use the ZIP file as the code path
    command='python path/to/experiment/main.py --train-data ${{inputs.train_data}} --test-data ${{inputs.test_data}}',
    environment="azureml:",
    experiment_name='my_experiment'
)
ml_client.create_or_update(job)

Check more details in the Documentation - https://zcusa.951200.xyz/en-us/azure/machine-learning/how-to-use-azureml-sdk and Azure ML Job Submission Best Practices - https://zcusa.951200.xyz/en-us/azure/machine-learning/how-to-submit-jobs

I hope this is helpful! Do not hesitate to let me know if you have any other questions.

Please don't forget to close up the thread here by upvoting and accept it as an answer if it is helpful.

Share via

Uploading Multiple Folders for Remote Job via Azure Machine Learning Python SDK (v2)

1 answer

Your answer