Can you use local file input for training a custom model?

Marques Chacon 40 Reputation points
2025-01-08T23:11:19.29+00:00

I know that you can use local files for RUNNING the read, layout, and custom models. However, in terms of training a model, I am not sure if you are able to use local files or if they have to be in Blob Storage containers.

Document Intelligence Studio requires blob storage input. I would prefer to upload files temporarily from an on-premises location rather than store training data continuously. Is there a solution for this?

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

Accepted answer
  1. Vinodh247 27,016 Reputation points MVP
    2025-01-09T01:00:10.4033333+00:00

    In the current Azure Document Intelligence workflow, you cannot train custom models from purely local files. Unlike the “test/try” flow (which lets you upload local files on the fly for inference), the training process requires an Azure Blob Storage location. That is why Document Intelligence Studio asks you to specify a Blob container that holds your training files.

    What if you don’t want to keep files in Blob Storage long-term?

    A common workaround is to use a temporary container in an Azure Storage account. You can upload your on-prem training files to this container just before training, and then remove them afterward. Here’s a suggested approach:

    1. Create a temporary container in Azure Storage.
    2. Upload your training files from your on-premises location to the container.
    3. Generate a SAS URL for that container that grants the Document Intelligence service read and list access.
    4. Point Document Intelligence Studio (or the API/SDK) at that container and its SAS URL for training.
    5. Delete or clean up the container/files once training is finished, if you don’t need to keep them online.
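
The five steps above could be scripted roughly as follows. This is a minimal sketch, not an official recipe: it assumes the `azure-storage-blob` and `azure-ai-formrecognizer` Python packages, key-based authentication, a local folder that already contains the training documents along with any label files the chosen build mode needs, and a two-hour SAS expiry. The function names, the `di-train` prefix, and all parameters are hypothetical and only illustrate the flow.

```python
import uuid
from datetime import datetime, timedelta, timezone
from pathlib import Path


def temp_container_name(prefix: str = "di-train") -> str:
    """Return a unique container name (lowercase letters, digits, hyphens)."""
    return f"{prefix}-{uuid.uuid4().hex[:12]}"


def container_sas_url(container_url: str, sas_token: str) -> str:
    """Join a container URL and a SAS token into a single SAS URL."""
    return f"{container_url}?{sas_token.lstrip('?')}"


def train_from_local_folder(local_dir, storage_account, storage_key,
                            di_endpoint, di_key, model_id):
    """Upload local files to a throwaway container, train, then delete it."""
    # Azure imports are local so the helpers above work without the SDKs.
    from azure.ai.formrecognizer import (
        DocumentModelAdministrationClient,
        ModelBuildMode,
    )
    from azure.core.credentials import AzureKeyCredential
    from azure.storage.blob import (
        BlobServiceClient,
        ContainerSasPermissions,
        generate_container_sas,
    )

    blob_service = BlobServiceClient(
        account_url=f"https://{storage_account}.blob.core.windows.net",
        credential=storage_key,
    )

    # Step 1: create the temporary container.
    name = temp_container_name()
    container = blob_service.create_container(name)
    try:
        # Step 2: upload every file under local_dir, keeping relative paths.
        root = Path(local_dir)
        for path in root.rglob("*"):
            if path.is_file():
                with path.open("rb") as fh:
                    container.upload_blob(
                        name=path.relative_to(root).as_posix(), data=fh
                    )

        # Step 3: short-lived SAS with read and list permissions (assumed
        # to be what training needs; adjust if your setup differs).
        token = generate_container_sas(
            account_name=storage_account,
            container_name=name,
            account_key=storage_key,
            permission=ContainerSasPermissions(read=True, list=True),
            expiry=datetime.now(timezone.utc) + timedelta(hours=2),
        )
        sas_url = container_sas_url(container.url, token)

        # Step 4: start training and wait for it to finish.
        admin = DocumentModelAdministrationClient(
            di_endpoint, AzureKeyCredential(di_key)
        )
        poller = admin.begin_build_document_model(
            ModelBuildMode.TEMPLATE,
            blob_container_url=sas_url,
            model_id=model_id,
        )
        return poller.result()
    finally:
        # Step 5: remove the container and its files, even if training failed.
        container.delete_container()
```

Putting the cleanup in `finally` means the training files are removed whether or not the build succeeds, which fits the goal of not keeping data in the cloud longer than necessary. If you need to inspect a failed run, you may prefer to delete the container manually instead.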

    With this approach, your training documents sit in Blob Storage only for as long as training needs them, so you avoid continuous storage in the cloud. Unfortunately, there isn’t a direct local-file-based training mechanism right now; Blob Storage is still required for the service to access your training documents.

