Unable to deploy fine-tuned Phi 3.5 MoE model on NC24ads_A100_v4 despite successful training on NC96ads_A100_v4

Sebastian Buzdugan 40 Reputation points
2024-12-18T20:06:30.6566667+00:00

Hello,

I'm trying to deploy a fine-tuned Phi 3.5 MoE model as a real-time endpoint in Azure ML. I've encountered an interesting situation:

Training Environment:

  • Successfully fine-tuned on NC96ads_A100_v4 (96 vCPUs, 4× A100 80 GB)
  • Using DeepSpeed with tensor parallelism
  • Model registered in MLflow format

Deployment Attempt:

  • Trying to deploy on NC24ads_A100_v4 (24 vCPUs, 1× A100 80 GB)
  • Using 3 instances for tensor parallelism
  • Getting error: "azureml-inference-server-http is missing" despite having it in conda.yaml
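For reference, here is a simplified sketch of the kind of conda.yaml I'm using (package list trimmed, versions illustrative):

```yaml
name: phi35-moe-inference
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pip
  - pip:
      # the package the deployment claims is missing
      - azureml-inference-server-http
      - mlflow
      - torch
      - transformers
```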

I've followed the solution from a similar issue I created earlier about the errors on the 24-core endpoint: https://zcusa.951200.xyz/en-us/answers/questions/2131992/deployment-on-azure-ml-with-tensor-parallelism-fai

However, deployment still fails on the NC24ads. Deploying on the NC96ads works perfectly but is very costly. Is this because the Phi models can only run on the NC96?
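One back-of-the-envelope check I did (assuming Phi-3.5-MoE has roughly 42B total parameters, since all experts must be resident in GPU memory even though only a fraction are active per token):

```python
# Rough check: do fp16 weights fit on a single 80 GB A100?
TOTAL_PARAMS_B = 42   # approximate total parameters, in billions
BYTES_PER_PARAM = 2   # fp16/bf16
A100_MEM_GB = 80      # per-GPU memory on NC24ads_A100_v4

weights_gb = TOTAL_PARAMS_B * BYTES_PER_PARAM
print(f"weights: ~{weights_gb} GB vs {A100_MEM_GB} GB per A100")
# Weights alone exceed one A100, before KV cache and activations,
# which may be why the 4-GPU NC96 works and the 1-GPU NC24 does not.
```

If that estimate is right, a single NC24ads instance can't hold the full model in fp16 without quantization or sharding across instances.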

Questions:

  1. Is it possible to deploy a model fine-tuned on NC96ads_A100_v4 to a smaller NC24ads_A100_v4 instance?
  2. Are there specific configurations needed when deploying to a different GPU size than training?
  3. Are there more cost-effective options for deploying this model while maintaining functionality?
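For context, the deployment spec I'm trying is roughly this (endpoint and model names are placeholders):

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: phi35-moe-deployment
endpoint_name: my-endpoint
model: azureml:my-finetuned-phi35-moe:1
instance_type: Standard_NC24ads_A100_v4
instance_count: 3
```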

Any guidance on making this work with the smaller instance would be greatly appreciated, as the NC96 is significantly more expensive for deployment.

Thanks!

Azure Machine Learning