Microsoft.MachineLearningServices workspaces/onlineEndpoints/deployments 2024-04-01-preview

Bicep resource definition

The workspaces/onlineEndpoints/deployments resource type can be deployed with operations that target:

  • Resource groups

For a list of changed properties in each API version, see change log.

Resource format

To create a Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments resource, add the following Bicep to your template.

resource symbolicname 'Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments@2024-04-01-preview' = {
  parent: resourceSymbolicName
  identity: {
    type: 'string'
    userAssignedIdentities: {
      {customized property}: {}
    }
  }
  kind: 'string'
  location: 'string'
  name: 'string'
  properties: {
    appInsightsEnabled: bool
    codeConfiguration: {
      codeId: 'string'
      scoringScript: 'string'
    }
    dataCollector: {
      collections: {
        {customized property}: {
          clientId: 'string'
          dataCollectionMode: 'string'
          dataId: 'string'
          samplingRate: int
        }
      }
      requestLogging: {
        captureHeaders: [
          'string'
        ]
      }
      rollingRate: 'string'
    }
    description: 'string'
    egressPublicNetworkAccess: 'string'
    environmentId: 'string'
    environmentVariables: {
      {customized property}: 'string'
    }
    instanceType: 'string'
    livenessProbe: {
      failureThreshold: int
      initialDelay: 'string'
      period: 'string'
      successThreshold: int
      timeout: 'string'
    }
    model: 'string'
    modelMountPath: 'string'
    properties: {
      {customized property}: 'string'
    }
    readinessProbe: {
      failureThreshold: int
      initialDelay: 'string'
      period: 'string'
      successThreshold: int
      timeout: 'string'
    }
    requestSettings: {
      maxConcurrentRequestsPerInstance: int
      maxQueueWait: 'string'
      requestTimeout: 'string'
    }
    scaleSettings: {
      scaleType: 'string'
      // For remaining properties, see OnlineScaleSettings objects
    }
    endpointComputeType: 'string'
    // For remaining properties, see OnlineDeploymentProperties objects
  }
  sku: {
    capacity: int
    family: 'string'
    name: 'string'
    size: 'string'
    tier: 'string'
  }
  tags: {
    {customized property}: 'string'
  }
}

OnlineScaleSettings objects

Set the scaleType property to specify the type of object.

For Default, use:

{
  scaleType: 'Default'
}

For TargetUtilization, use:

{
  maxInstances: int
  minInstances: int
  pollingInterval: 'string'
  scaleType: 'TargetUtilization'
  targetUtilizationPercentage: int
}

OnlineDeploymentProperties objects

Set the endpointComputeType property to specify the type of object.

For Kubernetes, use:

{
  containerResourceRequirements: {
    containerResourceLimits: {
      cpu: 'string'
      gpu: 'string'
      memory: 'string'
    }
    containerResourceRequests: {
      cpu: 'string'
      gpu: 'string'
      memory: 'string'
    }
  }
  endpointComputeType: 'Kubernetes'
}

For Managed, use:

{
  endpointComputeType: 'Managed'
}

Property values

CodeConfiguration

Name Description Value
codeId ARM resource ID of the code asset. string
scoringScript [Required] The script to execute on startup, for example "score.py". string

Constraints:
Min length = 1
Pattern = [a-zA-Z0-9_] (required)
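The scoringScript constraints above can be checked client-side before a deployment is submitted. The following is a minimal Python sketch, assuming the pattern [a-zA-Z0-9_] is evaluated unanchored (JSON Schema-style search semantics), so a value passes if it is non-empty and contains at least one letter, digit, or underscore. The function name is hypothetical.

```python
import re

# Constraints copied from the CodeConfiguration table.
MIN_LENGTH = 1
# Assumption: the pattern is unanchored, so it only has to match somewhere in the value.
PATTERN = re.compile(r"[a-zA-Z0-9_]")


def is_valid_scoring_script(value: str) -> bool:
    """Return True if value meets the documented scoringScript constraints."""
    return len(value) >= MIN_LENGTH and PATTERN.search(value) is not None


print(is_valid_scoring_script("score.py"))  # True
print(is_valid_scoring_script(""))          # False
```

If the service actually anchors the pattern, a value such as "score.py" would fail it, so verify against a real deployment before relying on this check.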

Collection

Name Description Value
clientId The client ID of the managed identity (MSI) used to write logs to blob storage. If null, the backend uses a registered endpoint identity for authentication. string
dataCollectionMode Enables or disables data collection. 'Disabled'
'Enabled'
dataId ARM resource ID of the data asset. The client ensures that the data asset points to blob storage, and the backend collects data into that blob storage. string
samplingRate The sampling rate for the collection. A sampling rate of 1.0 means 100% of the data is collected; this is the default. int

ContainerResourceRequirements

Name Description Value
containerResourceLimits Container resource limits. ContainerResourceSettings
containerResourceRequests Container resource requests. ContainerResourceSettings

ContainerResourceSettings

Name Description Value
cpu Number of vCPUs to request or limit for the container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
string
gpu Number of NVIDIA GPUs to request or limit for the container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
string
memory Memory size to request or limit for the container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
string

DataCollector

Name Description Value
collections [Required] The collection configuration. Each collection has its own configuration for collecting model data, and collection names can be arbitrary strings.
The model data collector can be used for payload logging, custom logging, or both. The collections named request and response are reserved for payload logging; all other collections are for custom logging.
DataCollectorCollections (required)
requestLogging The request logging configuration for the model data collector (MDC). It includes advanced logging settings that apply to all collections. Optional. RequestLogging
rollingRate When model data is collected to blob storage, the data is rolled to different paths so that it is not all logged in a single blob file.
If the rolling rate is Hour, all data is collected in the blob path /yyyy/MM/dd/HH/.
If it is Day, all data is collected in the blob path /yyyy/MM/dd/.
Rolling paths also let the model monitoring UI select a time range of data quickly.
'Day'
'Hour'
'Minute'
'Month'
'Year'
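The rollingRate description gives the blob path layout for Hour and Day. This Python sketch computes the path prefix a record would land in for a given timestamp. Only the Hour and Day layouts come from this reference; the Year, Month, and Minute layouts are assumed to follow the same pattern and are hypothetical, as is the function name.

```python
from datetime import datetime

# Hour and Day come from the rollingRate description; Year, Month, and Minute
# are assumptions that extend the same pattern.
ROLLING_FORMATS = {
    "Year": "/%Y/",
    "Month": "/%Y/%m/",
    "Day": "/%Y/%m/%d/",
    "Hour": "/%Y/%m/%d/%H/",
    "Minute": "/%Y/%m/%d/%H/%M/",
}


def rolling_path(rolling_rate: str, timestamp: datetime) -> str:
    """Return the blob path prefix for data collected at timestamp."""
    try:
        fmt = ROLLING_FORMATS[rolling_rate]
    except KeyError:
        raise ValueError(f"unsupported rollingRate: {rolling_rate!r}") from None
    return timestamp.strftime(fmt)


print(rolling_path("Hour", datetime(2024, 4, 1, 13, 5)))  # /2024/04/01/13/
```

A finer rolling rate produces more, smaller blobs, which is what lets the monitoring UI narrow a time range without scanning unrelated data.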

DataCollectorCollections

Name Description Value

DefaultScaleSettings

Name Description Value
scaleType [Required] Type of deployment scaling algorithm 'Default' (required)

EndpointDeploymentPropertiesBaseEnvironmentVariables

Name Description Value

EndpointDeploymentPropertiesBaseProperties

Name Description Value

KubernetesOnlineDeployment

Name Description Value
containerResourceRequirements The resource requirements for the container (cpu and memory). ContainerResourceRequirements
endpointComputeType [Required] The compute type of the endpoint. 'Kubernetes' (required)

ManagedOnlineDeployment

Name Description Value
endpointComputeType [Required] The compute type of the endpoint. 'Managed' (required)

ManagedServiceIdentity

Name Description Value
type Type of managed service identity (where both SystemAssigned and UserAssigned types are allowed). 'None'
'SystemAssigned'
'SystemAssigned,UserAssigned'
'UserAssigned' (required)
userAssignedIdentities The set of user-assigned identities associated with the resource. The userAssignedIdentities dictionary keys are ARM resource IDs in the form '/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/{identityName}'. The dictionary values can be empty objects ({}) in requests. UserAssignedIdentities
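The userAssignedIdentities dictionary is keyed by identity resource ID, with empty objects as values. This Python sketch builds both the key and the identity object. The helper names and the subscription, resource group, and identity values in the example are hypothetical placeholders.

```python
def user_assigned_identity_id(subscription_id: str, resource_group: str, identity_name: str) -> str:
    """Build an ARM resource ID in the form documented for userAssignedIdentities keys."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.ManagedIdentity/userAssignedIdentities/{identity_name}"
    )


def user_assigned_identity_block(identity_ids: list) -> dict:
    """Return an identity object whose dictionary values are empty objects."""
    return {
        "type": "UserAssigned",
        "userAssignedIdentities": {identity_id: {} for identity_id in identity_ids},
    }


# Hypothetical values for illustration only.
rid = user_assigned_identity_id("00000000-0000-0000-0000-000000000000", "my-rg", "my-identity")
print(user_assigned_identity_block([rid]))
```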

Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments

Name Description Value
identity Managed service identity (system assigned and/or user assigned identities) ManagedServiceIdentity
kind Metadata used by portal/tooling/etc to render different UX experiences for resources of the same type. string
location The geo-location where the resource lives string (required)
name The resource name string

Constraints:
Pattern = ^[a-zA-Z0-9][a-zA-Z0-9\-_]{0,254}$ (required)
parent In Bicep, you can specify the parent resource for a child resource. You only need to add this property when the child resource is declared outside of the parent resource.

For more information, see Child resource outside parent resource.
Symbolic name for resource of type: workspaces/onlineEndpoints
properties [Required] Additional attributes of the entity. OnlineDeploymentProperties (required)
sku SKU details required by the ARM contract for autoscaling. Sku
tags Resource tags Dictionary of tag names and values. See Tags in templates
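The name constraint above can be checked locally before deployment. This Python sketch applies the documented pattern with re.fullmatch rather than re.match plus "$", because Python's "$" also matches just before a trailing newline. The function name is hypothetical.

```python
import re

# Pattern from the name constraint, without the ^ and $ anchors:
# fullmatch() already requires the whole string to match.
NAME_PATTERN = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9\-_]{0,254}")


def is_valid_deployment_name(name: str) -> bool:
    """Return True if name satisfies the documented deployment name pattern."""
    return NAME_PATTERN.fullmatch(name) is not None


print(is_valid_deployment_name("blue"))   # True
print(is_valid_deployment_name("-blue"))  # False
```

The pattern allows 1 to 255 characters: one leading letter or digit, followed by up to 254 letters, digits, hyphens, or underscores.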

OnlineDeploymentProperties

Name Description Value
appInsightsEnabled If true, enables Application Insights logging. bool
codeConfiguration Code configuration for the endpoint deployment. CodeConfiguration
dataCollector The model data collector (MDC) configuration. MDC is disabled when this is null. DataCollector
description Description of the endpoint deployment. string
egressPublicNetworkAccess If Enabled, allows public network egress. If Disabled, creates secure egress. Default: Enabled. 'Disabled'
'Enabled'
endpointComputeType Set to 'Kubernetes' for type KubernetesOnlineDeployment. Set to 'Managed' for type ManagedOnlineDeployment. 'Kubernetes'
'Managed' (required)
environmentId ARM resource ID of the environment specification for the endpoint deployment. string
environmentVariables Environment variables configuration for the deployment. EndpointDeploymentPropertiesBaseEnvironmentVariables
instanceType Compute instance type. string
livenessProbe Liveness probe monitors the health of the container regularly. ProbeSettings
model The URI path to the model. string
modelMountPath The path at which to mount the model in a custom container. string
properties Property dictionary. Properties can be added, but not removed or altered. EndpointDeploymentPropertiesBaseProperties
readinessProbe Readiness probe validates if the container is ready to serve traffic. The properties and defaults are the same as liveness probe. ProbeSettings
requestSettings Request settings for the deployment. OnlineRequestSettings
scaleSettings Scale settings for the deployment. If null or not provided, it defaults to TargetUtilizationScaleSettings for KubernetesOnlineDeployment and to DefaultScaleSettings for ManagedOnlineDeployment.
OnlineScaleSettings
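The scaleSettings default depends on endpointComputeType. This Python sketch resolves which scale type applies when scaleSettings is omitted. The function and table names are hypothetical; the mapping itself comes from the scaleSettings description.

```python
from typing import Optional

# Defaults described for scaleSettings when it is null or omitted.
DEFAULT_SCALE_TYPE = {
    "Kubernetes": "TargetUtilization",  # KubernetesOnlineDeployment
    "Managed": "Default",               # ManagedOnlineDeployment
}


def effective_scale_type(endpoint_compute_type: str, scale_settings: Optional[dict] = None) -> str:
    """Return the scaleType in effect for a deployment."""
    if scale_settings is not None:
        return scale_settings["scaleType"]
    try:
        return DEFAULT_SCALE_TYPE[endpoint_compute_type]
    except KeyError:
        raise ValueError(f"unknown endpointComputeType: {endpoint_compute_type!r}") from None


print(effective_scale_type("Managed"))     # Default
print(effective_scale_type("Kubernetes"))  # TargetUtilization
```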

OnlineRequestSettings

Name Description Value
maxConcurrentRequestsPerInstance The maximum number of concurrent requests allowed per node for the deployment. Defaults to 1. int
maxQueueWait The maximum amount of time a request stays in the queue, in ISO 8601 format. Defaults to 500ms.
string
requestTimeout The scoring timeout, in ISO 8601 format. Defaults to 5000ms.
string
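The maxQueueWait and requestTimeout defaults are stated in milliseconds, while the fields take ISO 8601 durations. This Python sketch converts milliseconds to an ISO 8601 seconds duration. ISO 8601 allows a decimal fraction on the smallest component, so 500 ms becomes PT0.5S; this reference does not say which duration forms the service accepts, so treat the output format as an assumption. The function name is hypothetical.

```python
def ms_to_iso8601(milliseconds: int) -> str:
    """Convert a non-negative number of milliseconds to an ISO 8601 duration like PT0.5S."""
    if milliseconds < 0:
        raise ValueError("duration must be non-negative")
    seconds, remainder = divmod(milliseconds, 1000)
    if remainder == 0:
        return f"PT{seconds}S"
    # Three-digit fraction with trailing zeros removed: 500 -> "5", 250 -> "25".
    fraction = f"{remainder:03d}".rstrip("0")
    return f"PT{seconds}.{fraction}S"


print(ms_to_iso8601(500))   # PT0.5S  (maxQueueWait default)
print(ms_to_iso8601(5000))  # PT5S    (requestTimeout default)
```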

OnlineScaleSettings

Name Description Value
scaleType Set to 'Default' for type DefaultScaleSettings. Set to 'TargetUtilization' for type TargetUtilizationScaleSettings. 'Default'
'TargetUtilization' (required)

ProbeSettings

Name Description Value
failureThreshold The number of failures to allow before returning an unhealthy status. int
initialDelay The delay before the first probe in ISO 8601 format. string
period The length of time between probes in ISO 8601 format. string
successThreshold The number of successful probes before returning a healthy status. int
timeout The probe timeout in ISO 8601 format. string

RequestLogging

Name Description Value
captureHeaders By default, payload logging collects only the payload. To also collect specific headers, list them in captureHeaders; the backend then collects those headers along with the payload. string[]

Sku

Name Description Value
capacity If the SKU supports scale out/in then the capacity integer should be included. If scale out/in is not possible for the resource this may be omitted. int
family If the service has different generations of hardware, for the same SKU, then that can be captured here. string
name The name of the SKU, for example P3. It is typically a letter and number code. string (required)
size The SKU size. When the name field is the combination of tier and some other value, this would be the standalone code. string
tier This field is required to be implemented by the Resource Provider if the service has more than one tier, but is not required on a PUT. 'Basic'
'Free'
'Premium'
'Standard'

TargetUtilizationScaleSettings

Name Description Value
maxInstances The maximum number of instances the deployment can scale to. Quota is reserved for maxInstances. int
minInstances The minimum number of instances to always be present. int
pollingInterval The polling interval in ISO 8601 format. Only whole-second precision is supported. string
scaleType [Required] Type of deployment scaling algorithm 'TargetUtilization' (required)
targetUtilizationPercentage Target CPU usage for the autoscaler. int
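This Python sketch assembles a TargetUtilization scaleSettings object and applies basic sanity checks. The minInstances/maxInstances ordering and the whole-second pollingInterval follow from the table above; the 1 to 100 range check on targetUtilizationPercentage is an assumption, not a documented constraint. The function name and example values are hypothetical.

```python
def target_utilization_scale_settings(
    min_instances: int,
    max_instances: int,
    target_utilization_percentage: int,
    polling_interval_seconds: int,
) -> dict:
    """Build a TargetUtilizationScaleSettings object as a plain dict."""
    if min_instances < 0 or max_instances < min_instances:
        raise ValueError("require 0 <= minInstances <= maxInstances")
    # Assumption: a CPU utilization target outside 1..100 is not meaningful.
    if not 1 <= target_utilization_percentage <= 100:
        raise ValueError("targetUtilizationPercentage must be between 1 and 100")
    # pollingInterval supports whole-second precision only.
    if not isinstance(polling_interval_seconds, int) or polling_interval_seconds <= 0:
        raise ValueError("pollingInterval must be a positive whole number of seconds")
    return {
        "scaleType": "TargetUtilization",
        "minInstances": min_instances,
        "maxInstances": max_instances,
        "targetUtilizationPercentage": target_utilization_percentage,
        "pollingInterval": f"PT{polling_interval_seconds}S",
    }


print(target_utilization_scale_settings(1, 5, 70, 10))
```

Keep maxInstances close to real peak demand, since quota is reserved for it.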

TrackedResourceTags

Name Description Value

UserAssignedIdentities

Name Description Value

UserAssignedIdentity

Name Description Value

ARM template resource definition

The workspaces/onlineEndpoints/deployments resource type can be deployed with operations that target:

  • Resource groups

For a list of changed properties in each API version, see change log.

Resource format

To create a Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments resource, add the following JSON to your template.

{
  "type": "Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments",
  "apiVersion": "2024-04-01-preview",
  "name": "string",
  "identity": {
    "type": "string",
    "userAssignedIdentities": {
      "{customized property}": {
      }
    }
  },
  "kind": "string",
  "location": "string",
  "properties": {
    "appInsightsEnabled": "bool",
    "codeConfiguration": {
      "codeId": "string",
      "scoringScript": "string"
    },
    "dataCollector": {
      "collections": {
        "{customized property}": {
          "clientId": "string",
          "dataCollectionMode": "string",
          "dataId": "string",
          "samplingRate": "int"
        }
      },
      "requestLogging": {
        "captureHeaders": [ "string" ]
      },
      "rollingRate": "string"
    },
    "description": "string",
    "egressPublicNetworkAccess": "string",
    "environmentId": "string",
    "environmentVariables": {
      "{customized property}": "string"
    },
    "instanceType": "string",
    "livenessProbe": {
      "failureThreshold": "int",
      "initialDelay": "string",
      "period": "string",
      "successThreshold": "int",
      "timeout": "string"
    },
    "model": "string",
    "modelMountPath": "string",
    "properties": {
      "{customized property}": "string"
    },
    "readinessProbe": {
      "failureThreshold": "int",
      "initialDelay": "string",
      "period": "string",
      "successThreshold": "int",
      "timeout": "string"
    },
    "requestSettings": {
      "maxConcurrentRequestsPerInstance": "int",
      "maxQueueWait": "string",
      "requestTimeout": "string"
    },
    "scaleSettings": {
      "scaleType": "string"
      // For remaining properties, see OnlineScaleSettings objects
    },
    "endpointComputeType": "string"
    // For remaining properties, see OnlineDeploymentProperties objects
  },
  "sku": {
    "capacity": "int",
    "family": "string",
    "name": "string",
    "size": "string",
    "tier": "string"
  },
  "tags": {
    "{customized property}": "string"
  }
}
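The JSON above is a schema outline rather than a deployable resource. This Python sketch emits a minimal Managed deployment resource for an ARM template. It relies on the ARM rule that a child resource declared at the top level of a template uses a slash-separated name with one segment per level of the type. The sku name "Default", the instance type, the model reference, and all helper names are hypothetical placeholders, not values confirmed by this reference.

```python
import json

API_VERSION = "2024-04-01-preview"
RESOURCE_TYPE = "Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments"


def managed_deployment_resource(
    workspace: str,
    endpoint: str,
    deployment: str,
    location: str,
    model: str,
    instance_type: str,
    instance_count: int = 1,
) -> dict:
    """Return a minimal ARM template resource for a Managed online deployment."""
    return {
        "type": RESOURCE_TYPE,
        "apiVersion": API_VERSION,
        # workspaces/onlineEndpoints/deployments has three levels, so three name segments.
        "name": f"{workspace}/{endpoint}/{deployment}",
        "location": location,
        # Assumption: sku name "Default", with capacity as the instance count.
        "sku": {"name": "Default", "capacity": instance_count},
        "properties": {
            "endpointComputeType": "Managed",
            "model": model,
            "instanceType": instance_type,
        },
    }


resource = managed_deployment_resource(
    "my-ws", "my-endpoint", "blue", "eastus", "<model-asset-id>", "Standard_DS3_v2"
)
print(json.dumps(resource, indent=2))
```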

OnlineScaleSettings objects

Set the scaleType property to specify the type of object.

For Default, use:

{
  "scaleType": "Default"
}

For TargetUtilization, use:

{
  "maxInstances": "int",
  "minInstances": "int",
  "pollingInterval": "string",
  "scaleType": "TargetUtilization",
  "targetUtilizationPercentage": "int"
}

OnlineDeploymentProperties objects

Set the endpointComputeType property to specify the type of object.

For Kubernetes, use:

{
  "containerResourceRequirements": {
    "containerResourceLimits": {
      "cpu": "string",
      "gpu": "string",
      "memory": "string"
    },
    "containerResourceRequests": {
      "cpu": "string",
      "gpu": "string",
      "memory": "string"
    }
  },
  "endpointComputeType": "Kubernetes"
}

For Managed, use:

{
  "endpointComputeType": "Managed"
}
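endpointComputeType acts as a discriminator: containerResourceRequirements exists only on the Kubernetes variant. This Python sketch merges shared properties with the discriminator and rejects container requirements on a Managed deployment. The function name and example values are hypothetical.

```python
from typing import Optional

COMPUTE_TYPES = ("Kubernetes", "Managed")


def online_deployment_properties(
    endpoint_compute_type: str,
    common: dict,
    container_resource_requirements: Optional[dict] = None,
) -> dict:
    """Combine shared OnlineDeploymentProperties with the variant-specific fields."""
    if endpoint_compute_type not in COMPUTE_TYPES:
        raise ValueError(f"unknown endpointComputeType: {endpoint_compute_type!r}")
    if container_resource_requirements is not None and endpoint_compute_type != "Kubernetes":
        raise ValueError("containerResourceRequirements is only defined for Kubernetes")
    properties = dict(common)  # copy so the caller's dict is not modified
    properties["endpointComputeType"] = endpoint_compute_type
    if container_resource_requirements is not None:
        properties["containerResourceRequirements"] = container_resource_requirements
    return properties


print(online_deployment_properties("Managed", {"description": "blue deployment"}))
```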

Property values

CodeConfiguration

Name Description Value
codeId ARM resource ID of the code asset. string
scoringScript [Required] The script to execute on startup, for example "score.py". string

Constraints:
Min length = 1
Pattern = [a-zA-Z0-9_] (required)

Collection

Name Description Value
clientId The client ID of the managed identity (MSI) used to write logs to blob storage. If null, the backend uses a registered endpoint identity for authentication. string
dataCollectionMode Enables or disables data collection. 'Disabled'
'Enabled'
dataId ARM resource ID of the data asset. The client ensures that the data asset points to blob storage, and the backend collects data into that blob storage. string
samplingRate The sampling rate for the collection. A sampling rate of 1.0 means 100% of the data is collected; this is the default. int

ContainerResourceRequirements

Name Description Value
containerResourceLimits Container resource limits. ContainerResourceSettings
containerResourceRequests Container resource requests. ContainerResourceSettings

ContainerResourceSettings

Name Description Value
cpu Number of vCPUs to request or limit for the container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
string
gpu Number of NVIDIA GPUs to request or limit for the container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
string
memory Memory size to request or limit for the container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
string

DataCollector

Name Description Value
collections [Required] The collection configuration. Each collection has its own configuration for collecting model data, and collection names can be arbitrary strings.
The model data collector can be used for payload logging, custom logging, or both. The collections named request and response are reserved for payload logging; all other collections are for custom logging.
DataCollectorCollections (required)
requestLogging The request logging configuration for the model data collector (MDC). It includes advanced logging settings that apply to all collections. Optional. RequestLogging
rollingRate When model data is collected to blob storage, the data is rolled to different paths so that it is not all logged in a single blob file.
If the rolling rate is Hour, all data is collected in the blob path /yyyy/MM/dd/HH/.
If it is Day, all data is collected in the blob path /yyyy/MM/dd/.
Rolling paths also let the model monitoring UI select a time range of data quickly.
'Day'
'Hour'
'Minute'
'Month'
'Year'
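This Python sketch builds a dataCollector object and classifies collection names, following the rule that request and response are reserved for payload logging while every other name is custom logging. It assumes names are matched exactly and case-sensitively, which this reference does not state. The header name, helper names, and the use of "Hour" as a default are hypothetical choices.

```python
# Per the collections description, these names are reserved for payload logging.
PAYLOAD_COLLECTIONS = frozenset({"request", "response"})
ROLLING_RATES = frozenset({"Year", "Month", "Day", "Hour", "Minute"})


def collection_purpose(name: str) -> str:
    """Return "payload" for reserved collection names, otherwise "custom"."""
    return "payload" if name in PAYLOAD_COLLECTIONS else "custom"


def data_collector(collection_names, rolling_rate="Hour", capture_headers=None) -> dict:
    """Build a DataCollector object with every named collection enabled."""
    if rolling_rate not in ROLLING_RATES:
        raise ValueError(f"unsupported rollingRate: {rolling_rate!r}")
    collector = {
        "collections": {name: {"dataCollectionMode": "Enabled"} for name in collection_names},
        "rollingRate": rolling_rate,
    }
    if capture_headers:
        # requestLogging is optional, so it is emitted only when headers are requested.
        collector["requestLogging"] = {"captureHeaders": list(capture_headers)}
    return collector


print(data_collector(["request", "response"], capture_headers=["x-request-id"]))
```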

DataCollectorCollections

Name Description Value

DefaultScaleSettings

Name Description Value
scaleType [Required] Type of deployment scaling algorithm 'Default' (required)

EndpointDeploymentPropertiesBaseEnvironmentVariables

Name Description Value

EndpointDeploymentPropertiesBaseProperties

Name Description Value

KubernetesOnlineDeployment

Name Description Value
containerResourceRequirements The resource requirements for the container (cpu and memory). ContainerResourceRequirements
endpointComputeType [Required] The compute type of the endpoint. 'Kubernetes' (required)

ManagedOnlineDeployment

Name Description Value
endpointComputeType [Required] The compute type of the endpoint. 'Managed' (required)

ManagedServiceIdentity

Name Description Value
type Type of managed service identity (where both SystemAssigned and UserAssigned types are allowed). 'None'
'SystemAssigned'
'SystemAssigned,UserAssigned'
'UserAssigned' (required)
userAssignedIdentities The set of user-assigned identities associated with the resource. The userAssignedIdentities dictionary keys are ARM resource IDs in the form '/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/{identityName}'. The dictionary values can be empty objects ({}) in requests. UserAssignedIdentities

Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments

Name Description Value
apiVersion The api version '2024-04-01-preview'
identity Managed service identity (system assigned and/or user assigned identities) ManagedServiceIdentity
kind Metadata used by portal/tooling/etc to render different UX experiences for resources of the same type. string
location The geo-location where the resource lives string (required)
name The resource name string

Constraints:
Pattern = ^[a-zA-Z0-9][a-zA-Z0-9\-_]{0,254}$ (required)
properties [Required] Additional attributes of the entity. OnlineDeploymentProperties (required)
sku SKU details required by the ARM contract for autoscaling. Sku
tags Resource tags Dictionary of tag names and values. See Tags in templates
type The resource type 'Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments'

OnlineDeploymentProperties

Name Description Value
appInsightsEnabled If true, enables Application Insights logging. bool
codeConfiguration Code configuration for the endpoint deployment. CodeConfiguration
dataCollector The model data collector (MDC) configuration. MDC is disabled when this is null. DataCollector
description Description of the endpoint deployment. string
egressPublicNetworkAccess If Enabled, allows public network egress. If Disabled, creates secure egress. Default: Enabled. 'Disabled'
'Enabled'
endpointComputeType Set to 'Kubernetes' for type KubernetesOnlineDeployment. Set to 'Managed' for type ManagedOnlineDeployment. 'Kubernetes'
'Managed' (required)
environmentId ARM resource ID of the environment specification for the endpoint deployment. string
environmentVariables Environment variables configuration for the deployment. EndpointDeploymentPropertiesBaseEnvironmentVariables
instanceType Compute instance type. string
livenessProbe Liveness probe monitors the health of the container regularly. ProbeSettings
model The URI path to the model. string
modelMountPath The path at which to mount the model in a custom container. string
properties Property dictionary. Properties can be added, but not removed or altered. EndpointDeploymentPropertiesBaseProperties
readinessProbe Readiness probe validates if the container is ready to serve traffic. The properties and defaults are the same as liveness probe. ProbeSettings
requestSettings Request settings for the deployment. OnlineRequestSettings
scaleSettings Scale settings for the deployment. If null or not provided, it defaults to TargetUtilizationScaleSettings for KubernetesOnlineDeployment and to DefaultScaleSettings for ManagedOnlineDeployment.
OnlineScaleSettings

OnlineRequestSettings

Name Description Value
maxConcurrentRequestsPerInstance The maximum number of concurrent requests allowed per node for the deployment. Defaults to 1. int
maxQueueWait The maximum amount of time a request stays in the queue, in ISO 8601 format. Defaults to 500ms.
string
requestTimeout The scoring timeout, in ISO 8601 format. Defaults to 5000ms.
string

OnlineScaleSettings

Name Description Value
scaleType Set to 'Default' for type DefaultScaleSettings. Set to 'TargetUtilization' for type TargetUtilizationScaleSettings. 'Default'
'TargetUtilization' (required)

ProbeSettings

Name Description Value
failureThreshold The number of failures to allow before returning an unhealthy status. int
initialDelay The delay before the first probe in ISO 8601 format. string
period The length of time between probes in ISO 8601 format. string
successThreshold The number of successful probes before returning a healthy status. int
timeout The probe timeout in ISO 8601 format. string
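The ProbeSettings fields interact: a container that never becomes healthy is reported unhealthy only after several probe periods. This Python sketch gives a back-of-the-envelope estimate under a simplified timing model, which is an assumption rather than documented service behavior: probes start at initialDelay, start again every period, every probe fails, and each failure uses the full timeout. It takes whole seconds instead of ISO 8601 strings for brevity. The defaults for these fields are not listed in this reference, so the example values are hypothetical.

```python
def estimated_seconds_to_unhealthy(
    initial_delay: int, period: int, timeout: int, failure_threshold: int
) -> int:
    """Rough seconds from container start until failure_threshold consecutive failures."""
    if failure_threshold < 1:
        raise ValueError("failureThreshold must be at least 1")
    # The Nth failing probe starts at initial_delay + (N - 1) * period ...
    last_probe_start = initial_delay + (failure_threshold - 1) * period
    # ... and is counted as failed once its timeout elapses.
    return last_probe_start + timeout


# Hypothetical values: 10 s delay, 10 s period, 2 s timeout, 30 allowed failures.
print(estimated_seconds_to_unhealthy(10, 10, 2, 30))  # 302
```

A large failureThreshold gives slow-starting containers more time at the cost of detecting genuinely broken ones later.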

RequestLogging

Name Description Value
captureHeaders By default, payload logging collects only the payload. To also collect specific headers, list them in captureHeaders; the backend then collects those headers along with the payload. string[]

Sku

Name Description Value
capacity If the SKU supports scale out/in then the capacity integer should be included. If scale out/in is not possible for the resource this may be omitted. int
family If the service has different generations of hardware, for the same SKU, then that can be captured here. string
name The name of the SKU, for example P3. It is typically a letter and number code. string (required)
size The SKU size. When the name field is the combination of tier and some other value, this would be the standalone code. string
tier This field is required to be implemented by the Resource Provider if the service has more than one tier, but is not required on a PUT. 'Basic'
'Free'
'Premium'
'Standard'

TargetUtilizationScaleSettings

Name Description Value
maxInstances The maximum number of instances the deployment can scale to. Quota is reserved for maxInstances. int
minInstances The minimum number of instances to always be present. int
pollingInterval The polling interval in ISO 8601 format. Only whole-second precision is supported. string
scaleType [Required] Type of deployment scaling algorithm 'TargetUtilization' (required)
targetUtilizationPercentage Target CPU usage for the autoscaler. int

TrackedResourceTags

Name Description Value

UserAssignedIdentities

Name Description Value

UserAssignedIdentity

Name Description Value

Terraform (AzAPI provider) resource definition

The workspaces/onlineEndpoints/deployments resource type can be deployed with operations that target:

  • Resource groups

For a list of changed properties in each API version, see change log.

Resource format

To create a Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments resource, add the following Terraform to your template.

resource "azapi_resource" "symbolicname" {
  type = "Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments@2024-04-01-preview"
  name = "string"
  identity = {
    type = "string"
    userAssignedIdentities = {
      {customized property} = {
      }
    }
  }
  kind = "string"
  location = "string"
  sku = {
    capacity = int
    family = "string"
    name = "string"
    size = "string"
    tier = "string"
  }
  tags = {
    {customized property} = "string"
  }
  body = jsonencode({
    properties = {
      appInsightsEnabled = bool
      codeConfiguration = {
        codeId = "string"
        scoringScript = "string"
      }
      dataCollector = {
        collections = {
          {customized property} = {
            clientId = "string"
            dataCollectionMode = "string"
            dataId = "string"
            samplingRate = int
          }
        }
        requestLogging = {
          captureHeaders = [
            "string"
          ]
        }
        rollingRate = "string"
      }
      description = "string"
      egressPublicNetworkAccess = "string"
      environmentId = "string"
      environmentVariables = {
        {customized property} = "string"
      }
      instanceType = "string"
      livenessProbe = {
        failureThreshold = int
        initialDelay = "string"
        period = "string"
        successThreshold = int
        timeout = "string"
      }
      model = "string"
      modelMountPath = "string"
      properties = {
        {customized property} = "string"
      }
      readinessProbe = {
        failureThreshold = int
        initialDelay = "string"
        period = "string"
        successThreshold = int
        timeout = "string"
      }
      requestSettings = {
        maxConcurrentRequestsPerInstance = int
        maxQueueWait = "string"
        requestTimeout = "string"
      }
      scaleSettings = {
        scaleType = "string"
        // For remaining properties, see OnlineScaleSettings objects
      }
      endpointComputeType = "string"
      // For remaining properties, see OnlineDeploymentProperties objects
    }
  })
}

OnlineScaleSettings objects

Set the scaleType property to specify the type of object.

For Default, use:

{
  scaleType = "Default"
}

For TargetUtilization, use:

{
  maxInstances = int
  minInstances = int
  pollingInterval = "string"
  scaleType = "TargetUtilization"
  targetUtilizationPercentage = int
}

OnlineDeploymentProperties objects

Set the endpointComputeType property to specify the type of object.

For Kubernetes, use:

{
  containerResourceRequirements = {
    containerResourceLimits = {
      cpu = "string"
      gpu = "string"
      memory = "string"
    }
    containerResourceRequests = {
      cpu = "string"
      gpu = "string"
      memory = "string"
    }
  }
  endpointComputeType = "Kubernetes"
}

For Managed, use:

{
  endpointComputeType = "Managed"
}

Property values

CodeConfiguration

Name Description Value
codeId ARM resource ID of the code asset. string
scoringScript [Required] The script to execute on startup, for example "score.py". string

Constraints:
Min length = 1
Pattern = [a-zA-Z0-9_] (required)

Collection

Name Description Value
clientId The client ID of the managed identity (MSI) used to write logs to blob storage. If null, the backend uses a registered endpoint identity for authentication. string
dataCollectionMode Enables or disables data collection. 'Disabled'
'Enabled'
dataId ARM resource ID of the data asset. The client ensures that the data asset points to blob storage, and the backend collects data into that blob storage. string
samplingRate The sampling rate for the collection. A sampling rate of 1.0 means 100% of the data is collected; this is the default. int

ContainerResourceRequirements

Name Description Value
containerResourceLimits Container resource limits. ContainerResourceSettings
containerResourceRequests Container resource requests. ContainerResourceSettings

ContainerResourceSettings

Name Description Value
cpu Number of vCPUs to request or limit for the container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
string
gpu Number of NVIDIA GPUs to request or limit for the container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
string
memory Memory size to request or limit for the container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
string
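ContainerResourceSettings values are Kubernetes resource quantity strings, and Kubernetes requires a container's request to be no larger than its limit. This Python sketch parses a subset of the quantity format (the "m" milli suffix, decimal k/M/G/T, and binary Ki/Mi/Gi/Ti; exponent forms and larger suffixes are not handled) and checks requests against limits. The helper names and example values are hypothetical.

```python
from fractions import Fraction

# Subset of Kubernetes quantity suffixes: milli, decimal SI, and binary IEC.
_SUFFIXES = {
    "Ki": Fraction(2**10),
    "Mi": Fraction(2**20),
    "Gi": Fraction(2**30),
    "Ti": Fraction(2**40),
    "m": Fraction(1, 1000),
    "k": Fraction(10**3),
    "M": Fraction(10**6),
    "G": Fraction(10**9),
    "T": Fraction(10**12),
}


def parse_quantity(text: str) -> Fraction:
    """Parse a quantity such as "500m", "2", or "512Mi" into an exact number."""
    text = text.strip()
    # Two-character suffixes are listed first so "Mi" is not read as "M".
    for suffix, multiplier in _SUFFIXES.items():
        if text.endswith(suffix):
            return Fraction(text[: -len(suffix)]) * multiplier
    return Fraction(text)


def requests_within_limits(requests: dict, limits: dict) -> bool:
    """Return True if every requested resource that has a limit stays within it."""
    for resource, request in requests.items():
        limit = limits.get(resource)
        if limit is not None and parse_quantity(request) > parse_quantity(limit):
            return False
    return True


print(requests_within_limits({"cpu": "500m", "memory": "512Mi"}, {"cpu": "1", "memory": "1Gi"}))  # True
```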

DataCollector

Name Description Value
collections [Required] The collection configuration. Each collection has its own configuration for collecting model data, and collection names can be arbitrary strings.
The model data collector can be used for payload logging, custom logging, or both. The collections named request and response are reserved for payload logging; all other collections are for custom logging.
DataCollectorCollections (required)
requestLogging The request logging configuration for the model data collector (MDC). It includes advanced logging settings that apply to all collections. Optional. RequestLogging
rollingRate When model data is collected to blob storage, the data is rolled to different paths so that it is not all logged in a single blob file.
If the rolling rate is Hour, all data is collected in the blob path /yyyy/MM/dd/HH/.
If it is Day, all data is collected in the blob path /yyyy/MM/dd/.
Rolling paths also let the model monitoring UI select a time range of data quickly.
'Day'
'Hour'
'Minute'
'Month'
'Year'

DataCollectorCollections

Name Description Value

DefaultScaleSettings

Name Description Value
scaleType [Required] Type of deployment scaling algorithm 'Default' (required)

EndpointDeploymentPropertiesBaseEnvironmentVariables

Name Description Value

EndpointDeploymentPropertiesBaseProperties

Name Description Value

KubernetesOnlineDeployment

Name Description Value
containerResourceRequirements The resource requirements for the container (cpu and memory). ContainerResourceRequirements
endpointComputeType [Required] The compute type of the endpoint. 'Kubernetes' (required)

ManagedOnlineDeployment

Name Description Value
endpointComputeType [Required] The compute type of the endpoint. 'Managed' (required)

ManagedServiceIdentity

Name Description Value
type Type of managed service identity (where both SystemAssigned and UserAssigned types are allowed). 'None'
'SystemAssigned'
'SystemAssigned,UserAssigned'
'UserAssigned' (required)
userAssignedIdentities The set of user-assigned identities associated with the resource. The userAssignedIdentities dictionary keys are ARM resource IDs in the form '/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/{identityName}'. The dictionary values can be empty objects ({}) in requests. UserAssignedIdentities

Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments

Name Description Value
identity Managed service identity (system assigned and/or user assigned identities) ManagedServiceIdentity
kind Metadata used by the portal, tooling, and so on to render different UX for resources of the same type. string
location The geo-location where the resource lives string (required)
name The resource name string

Constraints:
Pattern = ^[a-zA-Z0-9][a-zA-Z0-9\-_]{0,254}$ (required)
parent_id The ID of the resource that is the parent for this resource. ID for resource of type: workspaces/onlineEndpoints
properties [Required] Additional attributes of the entity. OnlineDeploymentProperties (required)
sku SKU details required by the ARM contract for autoscaling. Sku
tags Resource tags Dictionary of tag names and values.
type The resource type "Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments@2024-04-01-preview"
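The name constraint in the table above can be checked before submitting a template. A minimal Python sketch follows; DEPLOYMENT_NAME_PATTERN and is_valid_deployment_name are hypothetical names, and the regular expression is copied from the documented pattern.

```python
import re

# Pattern copied verbatim from the name constraint: one letter or digit,
# followed by up to 254 letters, digits, hyphens, or underscores.
DEPLOYMENT_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9\-_]{0,254}$")


def is_valid_deployment_name(name: str) -> bool:
    # fullmatch (rather than match) also rejects a trailing newline, which `$` alone would allow.
    return DEPLOYMENT_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["blue", "blue-v2_canary", "-blue", "blue.v2"]:
        print(candidate, is_valid_deployment_name(candidate))
```

Names therefore range from 1 to 255 characters and cannot start with a hyphen or underscore.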

OnlineDeploymentProperties

Name Description Value
appInsightsEnabled If true, enables Application Insights logging. bool
codeConfiguration Code configuration for the endpoint deployment. CodeConfiguration
dataCollector The model data collector (MDC) configuration. MDC is disabled when this value is null. DataCollector
description Description of the endpoint deployment. string
egressPublicNetworkAccess If Enabled, egress public network access is allowed. If Disabled, secure egress is created. Default: Enabled. 'Disabled'
'Enabled'
endpointComputeType Set to 'Kubernetes' for type KubernetesOnlineDeployment. Set to 'Managed' for type ManagedOnlineDeployment. 'Kubernetes'
'Managed' (required)
environmentId ARM resource ID of the environment specification for the endpoint deployment. string
environmentVariables Environment variables configuration for the deployment. EndpointDeploymentPropertiesBaseEnvironmentVariables
instanceType Compute instance type. string
livenessProbe Liveness probe monitors the health of the container regularly. ProbeSettings
model The URI path to the model. string
modelMountPath The path to mount the model in custom container. string
properties Property dictionary. Properties can be added, but not removed or altered. EndpointDeploymentPropertiesBaseProperties
readinessProbe Readiness probe validates if the container is ready to serve traffic. The properties and defaults are the same as liveness probe. ProbeSettings
requestSettings Request settings for the deployment. OnlineRequestSettings
scaleSettings Scale settings for the deployment. If null or not provided, it defaults to TargetUtilizationScaleSettings for KubernetesOnlineDeployment and to DefaultScaleSettings for ManagedOnlineDeployment.
OnlineScaleSettings
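The scaleSettings default depends on the endpointComputeType discriminator, which can be expressed as a small lookup. This is a hypothetical client-side helper (effective_scale_type is an illustrative name); the service itself applies the default, so the sketch only mirrors the rule stated in the table above.

```python
from __future__ import annotations

# Defaults stated for scaleSettings when it is null or not provided,
# keyed by the endpointComputeType discriminator.
DEFAULT_SCALE_TYPE = {
    "Kubernetes": "TargetUtilization",  # KubernetesOnlineDeployment
    "Managed": "Default",               # ManagedOnlineDeployment
}
SCALE_TYPES = {"Default", "TargetUtilization"}


def effective_scale_type(endpoint_compute_type: str, scale_settings: dict | None) -> str:
    """Return the scaleType a deployment ends up with, given its (possibly missing) scaleSettings."""
    if endpoint_compute_type not in DEFAULT_SCALE_TYPE:
        raise ValueError(f"unknown endpointComputeType: {endpoint_compute_type!r}")
    if scale_settings is None:
        return DEFAULT_SCALE_TYPE[endpoint_compute_type]
    scale_type = scale_settings.get("scaleType")
    if scale_type not in SCALE_TYPES:
        raise ValueError(f"scaleType is required and must be one of {sorted(SCALE_TYPES)}")
    return scale_type
```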

OnlineRequestSettings

Name Description Value
maxConcurrentRequestsPerInstance The maximum number of concurrent requests per node allowed per deployment. Defaults to 1. int
maxQueueWait The maximum amount of time a request can stay in the queue, in ISO 8601 format. Defaults to 500 ms.
string
requestTimeout The scoring timeout, in ISO 8601 format. Defaults to 5000 ms.
string
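Because both request settings take ISO 8601 durations while their defaults are quoted in milliseconds, a small converter helps when filling them in. A minimal Python sketch, assuming the service accepts fractional seconds such as PT0.5S; ms_to_iso8601 is a hypothetical helper name.

```python
def ms_to_iso8601(milliseconds: int) -> str:
    """Format a non-negative millisecond count as an ISO 8601 duration in seconds."""
    if milliseconds < 0:
        raise ValueError("duration must be non-negative")
    whole, frac = divmod(milliseconds, 1000)
    if frac:
        # Keep three fractional digits, then drop trailing zeros (500 -> "0.5").
        return f"PT{whole}.{frac:03d}".rstrip("0") + "S"
    return f"PT{whole}S"


if __name__ == "__main__":
    request_settings = {
        "maxConcurrentRequestsPerInstance": 1,
        "maxQueueWait": ms_to_iso8601(500),     # PT0.5S
        "requestTimeout": ms_to_iso8601(5000),  # PT5S
    }
    print(request_settings)
```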

OnlineScaleSettings

Name Description Value
scaleType Set to 'Default' for type DefaultScaleSettings. Set to 'TargetUtilization' for type TargetUtilizationScaleSettings. 'Default'
'TargetUtilization' (required)

ProbeSettings

Name Description Value
failureThreshold The number of failures to allow before returning an unhealthy status. int
initialDelay The delay before the first probe in ISO 8601 format. string
period The length of time between probes in ISO 8601 format. string
successThreshold The number of successful probes before returning a healthy status. int
timeout The probe timeout in ISO 8601 format. string
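The three duration-valued probe fields can be converted to seconds to sanity-check a probe configuration. The following Python sketch handles only the day/time subset of ISO 8601 (PnDTnHnMnS); duration_seconds and probe_durations_in_seconds are hypothetical names, and the example probe values are illustrative, not documented defaults.

```python
import re

# Covers only the day/time subset PnDTnHnMnS; years, months, and weeks are
# deliberately rejected because their length in seconds is not fixed.
_DURATION = re.compile(
    r"P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?"
)


def duration_seconds(value: str) -> float:
    """Convert an ISO 8601 duration such as 'PT10S' or 'PT1M30S' to seconds."""
    match = _DURATION.fullmatch(value)
    # 'P', 'PT', and 'P1DT' match the pattern but are not valid durations:
    # 'P' has no components, and 'T' must be followed by a time component.
    if match is None or value == "P" or value.endswith("T"):
        raise ValueError(f"unsupported ISO 8601 duration: {value!r}")
    parts = {key: float(text) if text else 0.0 for key, text in match.groupdict().items()}
    return parts["days"] * 86400 + parts["hours"] * 3600 + parts["minutes"] * 60 + parts["seconds"]


def probe_durations_in_seconds(probe: dict) -> dict:
    """Convert the duration-valued ProbeSettings fields that are present to seconds."""
    return {key: duration_seconds(probe[key]) for key in ("initialDelay", "period", "timeout") if key in probe}


if __name__ == "__main__":
    # Illustrative values only, not documented defaults.
    liveness = {"initialDelay": "PT10S", "period": "PT10S", "timeout": "PT2S", "failureThreshold": 30}
    print(probe_durations_in_seconds(liveness))
```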

RequestLogging

Name Description Value
captureHeaders For payload logging, only the payload is collected by default. To also collect specific headers, list them in captureHeaders; the backend then collects those headers along with the payload. string[]
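The collection-naming rule (request and response for payload logging, any other name for custom logging), rollingRate, and the optional requestLogging block fit together in one dataCollector object. A minimal Python sketch follows; logging_kind and build_data_collector are hypothetical names, and the dataCollectionMode value 'Enabled' is an assumption, since the template above only shows that field as a string.

```python
from __future__ import annotations

# Collection names the reference reserves for payload logging; any other
# collection name is treated as custom logging.
PAYLOAD_COLLECTIONS = {"request", "response"}
ROLLING_RATES = {"Year", "Month", "Day", "Hour", "Minute"}


def logging_kind(collection_name: str) -> str:
    return "payload" if collection_name in PAYLOAD_COLLECTIONS else "custom"


def build_data_collector(
    collection_names: list[str],
    rolling_rate: str,
    capture_headers: list[str] | None = None,
) -> dict:
    """Assemble a dataCollector object in which every collection is enabled."""
    if not collection_names:
        raise ValueError("collections is required")
    if rolling_rate not in ROLLING_RATES:
        raise ValueError(f"unsupported rollingRate: {rolling_rate!r}")
    collector: dict = {
        # 'Enabled' is assumed to be an accepted dataCollectionMode value.
        "collections": {name: {"dataCollectionMode": "Enabled"} for name in collection_names},
        "rollingRate": rolling_rate,
    }
    if capture_headers:
        # requestLogging is optional; listed headers are captured alongside the payload.
        collector["requestLogging"] = {"captureHeaders": list(capture_headers)}
    return collector
```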

Sku

Name Description Value
capacity If the SKU supports scale out/in then the capacity integer should be included. If scale out/in is not possible for the resource this may be omitted. int
family If the service has different generations of hardware, for the same SKU, then that can be captured here. string
name The name of the SKU, for example P3. It is typically a letter+number code. string (required)
size The SKU size. When the name field is the combination of tier and some other value, this would be the standalone code. string
tier This field is required to be implemented by the Resource Provider if the service has more than one tier, but is not required on a PUT. 'Basic'
'Free'
'Premium'
'Standard'
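Only name is required in the Sku object; the other fields are optional and tier is limited to the four values listed. A minimal Python sketch that omits unset fields; build_sku is a hypothetical helper, and any SKU names passed to it are illustrative.

```python
from __future__ import annotations

SKU_TIERS = {"Free", "Basic", "Standard", "Premium"}


def build_sku(
    name: str,
    capacity: int | None = None,
    tier: str | None = None,
    size: str | None = None,
    family: str | None = None,
) -> dict:
    """Assemble a Sku object, leaving out optional fields that are not set."""
    if not name:
        raise ValueError("sku.name is required")
    if tier is not None and tier not in SKU_TIERS:
        raise ValueError(f"tier must be one of {sorted(SKU_TIERS)}")
    sku: dict = {"name": name}
    optional = {"capacity": capacity, "tier": tier, "size": size, "family": family}
    sku.update({key: value for key, value in optional.items() if value is not None})
    return sku
```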

TargetUtilizationScaleSettings

Name Description Value
maxInstances The maximum number of instances that the deployment can scale to. Quota is reserved for maxInstances. int
minInstances The minimum number of instances to always be present. int
pollingInterval The polling interval in ISO 8601 format. Only durations with whole-second precision are supported. string
scaleType [Required] Type of deployment scaling algorithm 'TargetUtilization' (required)
targetUtilizationPercentage Target CPU usage for the autoscaler. int
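A TargetUtilizationScaleSettings object can be assembled with a few sanity checks. This Python sketch is hypothetical (build_target_utilization_settings is an illustrative name). The whole-second pollingInterval follows the documented precision limit, while the minInstances <= maxInstances and 1 to 100 percentage checks are assumptions made by the sketch.

```python
def build_target_utilization_settings(
    min_instances: int,
    max_instances: int,
    target_utilization_percentage: int,
    polling_interval_seconds: int,
) -> dict:
    """Assemble a TargetUtilizationScaleSettings object with basic sanity checks."""
    if not 0 <= min_instances <= max_instances:
        raise ValueError("expected 0 <= minInstances <= maxInstances")
    if not 1 <= target_utilization_percentage <= 100:
        raise ValueError("targetUtilizationPercentage is assumed to be a percentage in 1..100")
    if not isinstance(polling_interval_seconds, int) or polling_interval_seconds <= 0:
        raise ValueError("pollingInterval must be a positive whole number of seconds")
    return {
        "scaleType": "TargetUtilization",
        "minInstances": min_instances,
        "maxInstances": max_instances,
        "targetUtilizationPercentage": target_utilization_percentage,
        # Whole seconds only, matching the documented precision limit.
        "pollingInterval": f"PT{polling_interval_seconds}S",
    }
```

Since quota is reserved for maxInstances, setting it close to the expected peak avoids holding quota that is never used.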

TrackedResourceTags

Name Description Value

UserAssignedIdentities

Name Description Value

UserAssignedIdentity

Name Description Value