Uploading Multiple Folders for Remote Job via Azure Machine Learning Python SDK (v2)

Diego STUCCHI 0 Reputation points
2024-11-14T14:50:40.34+00:00

I am working with the Python SDK (v2) of Azure Machine Learning.

I want to launch a training script on a serverless compute by using jobs. Typically, I create and launch a job using the Python command shown below.

The problem I’m facing is that my workspace has the source code, the experiment script, and the utility functions organized in separate folders. However, the command function in the SDK allows uploading only a single folder. I prefer not to restructure my entire codebase to fit AzureML's requirements. Is there a way to upload multiple folders?

from azure.ai.ml import command, Input

job = command(
    inputs=dict(...),
    code='path/to/code/folder',
    command='python main.py --train-data ${{inputs.train_data}} --test-data ${{inputs.test_data}}',
    environment="...",
    experiment_name='...'
)

ml_client.create_or_update(job)
Azure Machine Learning

1 answer

  1. Sina Salam 12,491 Reputation points
    2024-11-14T20:54:22.88+00:00

    Hello Diego STUCCHI,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that you would like to upload multiple folders for a remote job using the Azure Machine Learning Python SDK (v2).

    There are several ways to make multiple folders available to a remote job with the Azure Machine Learning Python SDK (v2). Each has trade-offs, so choose the one that best fits your project.

    Methods to Upload Multiple Folders

    • Using a ZIP file
    • Using a custom Docker image
    • Using AzureML data assets
    • Using the code_paths parameter

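    Suggested approach: the code_paths parameter.

    The code_paths parameter is intended to accept multiple folders directly, which avoids compressing files or building custom containers. Note that code_paths is not listed among the documented parameters of command() in the SDK v2 reference, so confirm that your installed SDK version accepts it before relying on this method. If it does, you can implement it like this: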

    from azure.ai.ml import command, Input

    # Define the job; each input is exposed to the command as ${{inputs.<name>}}
    job = command(
        inputs=dict(
            train_data=Input(type="uri_file", path="path/to/train_data"),
            test_data=Input(type="uri_file", path="path/to/test_data"),
        ),
        code_paths=[
            "path/to/source",
            "path/to/experiment",
            "path/to/utils"
        ],
        command='python path/to/experiment/main.py --train-data ${{inputs.train_data}} --test-data ${{inputs.test_data}}',
        environment="azureml:<your-environment-name>",
        experiment_name='my_experiment'
    )

    # ml_client is assumed to be an already authenticated azure.ai.ml.MLClient
    ml_client.create_or_update(job)
    

    Using a ZIP File.

    If the code_paths parameter does not work for you, another practical option is to consolidate your directories into a single ZIP file. This works within the single-path constraint of the code argument and keeps the folder layout of your codebase intact inside the archive. Run a small test job first to confirm that the files are available on the compute at the paths your command expects.

    Use the following bash command to create a ZIP file containing all your folders:

    zip -r code_archive.zip path/to/source path/to/experiment path/to/utils
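
If the zip utility is not available (for example on Windows), the same archive can be built from Python. The following is a minimal sketch using only the standard library; the function name build_code_archive and its root parameter are hypothetical names chosen for illustration and are not part of the AzureML SDK. It stores every file under a path relative to root, so the folder layout inside the archive mirrors the one on disk.

```python
import zipfile
from pathlib import Path


def build_code_archive(folders, archive_path, root="."):
    """Zip every file under `folders`, storing paths relative to `root`.

    Hypothetical helper for illustration; not part of the AzureML SDK.
    """
    root = Path(root).resolve()
    archive_path = Path(archive_path)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for folder in folders:
            folder_path = (root / folder).resolve()
            if not folder_path.is_dir():
                raise FileNotFoundError(f"Not a directory: {folder_path}")
            # Walk the folder recursively and add regular files only
            for file in sorted(folder_path.rglob("*")):
                if file.is_file():
                    zf.write(file, file.relative_to(root).as_posix())
    return archive_path


if __name__ == "__main__":
    # Same folders as in the bash command above
    build_code_archive(
        ["path/to/source", "path/to/experiment", "path/to/utils"],
        "code_archive.zip",
    )
```

Keep the archive itself outside the folders being zipped, otherwise a rerun would pick up the previous archive as one of the files.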

    Then update your job definition to pass the ZIP file as the code path:

    from azure.ai.ml import command, Input

    # Define the job; each input is exposed to the command as ${{inputs.<name>}}
    job = command(
        inputs=dict(
            train_data=Input(type="uri_file", path="path/to/train_data"),
            test_data=Input(type="uri_file", path="path/to/test_data"),
        ),
        code="path/to/code_archive.zip",  # Use the ZIP file as the code path
        command='python path/to/experiment/main.py --train-data ${{inputs.train_data}} --test-data ${{inputs.test_data}}',
        environment="azureml:<your-environment-name>",
        experiment_name='my_experiment'
    )

    # ml_client is assumed to be an already authenticated azure.ai.ml.MLClient
    ml_client.create_or_update(job)
    
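A related option that stays within the single-folder limit of the code argument is to copy the folders into one temporary staging directory just before submission and point code at that directory. This is a different technique from the ZIP approach above, shown here only as a minimal sketch: the helper name stage_code_folders and its parameters are hypothetical, it uses only the Python standard library (Python 3.8+ for dirs_exist_ok), and your original codebase is left untouched.

```python
import shutil
import tempfile
from pathlib import Path


def stage_code_folders(folders, root=".", staging_dir=None):
    """Copy each folder (relative to `root`) into one staging directory.

    Hypothetical helper for illustration; not part of the AzureML SDK.
    Relative paths are preserved, so path/to/utils ends up at
    <staging>/path/to/utils.
    """
    root = Path(root)
    if staging_dir is None:
        staging = Path(tempfile.mkdtemp(prefix="aml_code_"))
    else:
        staging = Path(staging_dir)
        staging.mkdir(parents=True, exist_ok=True)
    for folder in folders:
        rel = Path(folder)
        if rel.is_absolute():
            raise ValueError(f"Expected a path relative to root: {folder}")
        src = root / rel
        if not src.is_dir():
            raise FileNotFoundError(f"Not a directory: {src}")
        shutil.copytree(
            src,
            staging / rel,
            dirs_exist_ok=True,
            # Skip bytecode caches so they are not uploaded with the job
            ignore=shutil.ignore_patterns("__pycache__", "*.pyc"),
        )
    return staging


# Usage sketch (not executed here): pass the staging folder as the single
# code folder of the job.
# staging = stage_code_folders(
#     ["path/to/source", "path/to/experiment", "path/to/utils"]
# )
# job = command(code=str(staging),
#               command="python path/to/experiment/main.py ...", ...)
```

Because the relative layout is preserved inside the staging folder, the command line and any imports that rely on that layout should not need to change.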

    For more details, see the documentation (https://zcusa.951200.xyz/en-us/azure/machine-learning/how-to-use-azureml-sdk) and Azure ML Job Submission Best Practices (https://zcusa.951200.xyz/en-us/azure/machine-learning/how-to-submit-jobs).

    I hope this is helpful! Do not hesitate to let me know if you have any other questions.


    If this answer was helpful, please close the thread by upvoting it and accepting it as the answer.

    1 person found this answer helpful.
