Unable to deploy fine-tuned Phi 3.5 MoE model on NC24ads_A100_v4 despite successful training on NC96ads_A100_v4

Sebastian Buzdugan 40 Reputation points
2024-12-18T20:06:30.6566667+00:00

Hello,

I'm trying to deploy a fine-tuned Phi 3.5 MoE model as a real-time endpoint in Azure ML. I've encountered an interesting situation:

Training Environment:

  • Successfully fine-tuned on NC96ads_A100_v4 (96 vCPUs, 4× A100 80 GB)
  • Using DeepSpeed with tensor parallelism
  • Model registered in MLflow format

Deployment Attempt:

  • Trying to deploy on NC24ads_A100_v4 (24 vCPUs, 1× A100 80 GB)
  • Using 3 instances for tensor parallelism
  • Getting error: "azureml-inference-server-http is missing" despite having it in conda.yaml
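For reference, here is a simplified sketch of the kind of conda.yaml I'm using (package list trimmed, versions illustrative):

```yaml
name: phi35-moe-inference
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pip
  - pip:
      # the package the deployment claims is missing
      - azureml-inference-server-http
      - mlflow
      - torch
      - transformers
```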

I've followed the solution from a similar issue I created earlier about the errors on the 24-core endpoint: https://zcusa.951200.xyz/en-us/answers/questions/2131992/deployment-on-azure-ml-with-tensor-parallelism-fai

However, deployment still fails on the NC24ads. Deploying on the NC96ads works perfectly but is very costly. Is this because the Phi models can only run on the NC96?
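One back-of-the-envelope check I did (assuming Phi-3.5-MoE has roughly 42B total parameters, since all experts must be resident in GPU memory even though only a fraction are active per token):

```python
# Rough check: do fp16 weights fit on a single 80 GB A100?
TOTAL_PARAMS_B = 42   # approximate total parameters, in billions
BYTES_PER_PARAM = 2   # fp16/bf16
A100_MEM_GB = 80      # per-GPU memory on NC24ads_A100_v4

weights_gb = TOTAL_PARAMS_B * BYTES_PER_PARAM
print(f"weights: ~{weights_gb} GB vs {A100_MEM_GB} GB per A100")
# Weights alone exceed one A100, before KV cache and activations,
# which may be why the 4-GPU NC96 works and the 1-GPU NC24 does not.
```

If that estimate is right, a single NC24ads instance can't hold the full model in fp16 without quantization or sharding across instances.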

Questions:

  1. Is it possible to deploy a model fine-tuned on NC96ads_A100_v4 to a smaller NC24ads_A100_v4 instance?
  2. Are there specific configurations needed when deploying to a different GPU size than training?
  3. Are there more cost-effective options for deploying this model while maintaining functionality?
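For context, the deployment spec I'm trying is roughly this (endpoint and model names are placeholders):

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: phi35-moe-deployment
endpoint_name: my-endpoint
model: azureml:my-finetuned-phi35-moe:1
instance_type: Standard_NC24ads_A100_v4
instance_count: 3
```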

Any guidance on making this work with the smaller instance would be greatly appreciated, as the NC96 is significantly more expensive for deployment.

Thanks!

Azure Machine Learning