Configure dataflow endpoints for Microsoft Fabric OneLake
Important
Azure IoT Operations Preview – enabled by Azure Arc is currently in preview. You shouldn't use this preview software in production environments.
You'll need to deploy a new Azure IoT Operations installation when a generally available release becomes available. You won't be able to upgrade a preview installation.
For legal terms that apply to Azure features that are in beta, in preview, or otherwise not yet released into general availability, see the Supplemental Terms of Use for Microsoft Azure Previews.
To send data to Microsoft Fabric OneLake in Azure IoT Operations Preview, you can configure a dataflow endpoint. This configuration allows you to specify the destination endpoint, authentication method, table, and other settings.
Prerequisites
- An instance of Azure IoT Operations Preview
- A configured dataflow profile
- Microsoft Fabric OneLake. See the following steps to create a workspace and lakehouse.
  - Create a workspace. The default My workspace isn't supported.
  - Create a lakehouse.
    - If shown, ensure Lakehouse schemas (Public Preview) is unchecked.
  - Make note of the workspace and lakehouse names.
Create a Microsoft Fabric OneLake dataflow endpoint
To configure a dataflow endpoint for Microsoft Fabric OneLake, we suggest using the managed identity of the Azure Arc-enabled Kubernetes cluster. This approach is secure and eliminates the need for secret management.
- In the Azure portal, go to your Azure IoT Operations instance and select Overview.
- Copy the name of the extension listed after Azure IoT Operations Arc extension. For example, azure-iot-operations-xxxx7.
- In the Microsoft Fabric workspace you created, select Manage access > + Add people or groups. Search for the Azure IoT Operations Arc extension by its name and select it. Select Contributor as the role, then select Add.
- Create the DataflowEndpoint resource and specify the managed identity authentication method.
In the operations experience, select the Dataflow endpoints tab.
Under Create new dataflow endpoint, select Microsoft Fabric OneLake > New.
Enter the following settings for the endpoint:
Setting | Description |
---|---|
Host | The hostname of the Microsoft Fabric OneLake endpoint in the format onelake.dfs.fabric.microsoft.com. |
Lakehouse name | The name of the lakehouse where the data should be stored. |
Workspace name | The name of the workspace associated with the lakehouse. |
OneLake path type | The type of path used in OneLake. Select Files or Tables. |
Authentication method | The method used for authentication. Choose System assigned managed identity or User assigned managed identity. |
Client ID | The client ID of the user-assigned managed identity. Required if using User assigned managed identity. |
Tenant ID | The tenant ID of the user-assigned managed identity. Required if using User assigned managed identity. |

Select Apply to provision the endpoint.
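The same endpoint can also be declared as a DataflowEndpoint custom resource rather than through the operations experience. The manifest below is a minimal sketch, not a definitive reference: the apiVersion, the azure-iot-operations namespace, the endpoint name, and every field name other than oneLakePathType are assumptions based on the preview schema, and the angle-bracket values are placeholders. Check them against the resource definition installed in your cluster before relying on them.

```yaml
# Sketch of a DataflowEndpoint for Fabric OneLake.
# Field names are assumed from the preview schema and may differ in your release.
apiVersion: connectivity.iotoperations.azure.com/v1beta1  # assumed preview API version
kind: DataflowEndpoint
metadata:
  name: fabric-onelake-endpoint    # hypothetical endpoint name
  namespace: azure-iot-operations  # assumed default namespace
spec:
  endpointType: FabricOneLake
  fabricOneLakeSettings:
    # Host from the settings table; the https:// prefix is an assumption.
    host: https://onelake.dfs.fabric.microsoft.com
    # Tables is the default path type.
    oneLakePathType: Tables
    names:
      workspaceName: <WORKSPACE_NAME>
      lakehouseName: <LAKEHOUSE_NAME>
    authentication:
      # Uses the identity of the Azure IoT Operations Arc extension.
      method: SystemAssignedManagedIdentity
      systemAssignedManagedIdentitySettings: {}
```

If these assumptions hold for your deployment, saving the manifest to a file and running `kubectl apply -f <FILE>` should create the endpoint.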
OneLake path type
The oneLakePathType setting determines the type of path to use in the OneLake path. The default value is Tables, which is the recommended path type for the most common use cases. With Tables, the data is stored in a table in the OneLake lakehouse. With Files, the data is stored as a file in the OneLake lakehouse, which is useful when you want a file format that the Tables path type doesn't support.
The OneLake path type is set in the Basic tab for the dataflow endpoint.
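When the endpoint is defined as a custom resource instead, the path type could be switched to Files with a fragment like the one below. This is a sketch: it assumes the fragment is nested under spec in a DataflowEndpoint resource, and fabricOneLakeSettings is an assumed field name from the preview schema. Only oneLakePathType and its two values come from this article.

```yaml
# Fragment of a DataflowEndpoint spec (parent field name is an assumption).
fabricOneLakeSettings:
  # Accepts Tables (the default) or Files.
  oneLakePathType: Files
```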
Available authentication methods
The following authentication methods are available for Microsoft Fabric OneLake dataflow endpoints. For more information about enabling secure settings by configuring an Azure Key Vault and enabling workload identities, see Enable secure settings in Azure IoT Operations Preview deployment.
Before you create the dataflow endpoint, assign the workspace Contributor role to the Azure IoT Operations extension. This role grants permission to write to the Fabric lakehouse.
To learn more, see Give access to a workspace.
System-assigned managed identity
Using the system-assigned managed identity is the recommended authentication method for Azure IoT Operations. Azure IoT Operations creates the managed identity automatically and assigns it to the Azure Arc-enabled Kubernetes cluster. It eliminates the need for secret management and allows for seamless authentication.
In the DataflowEndpoint resource, specify the managed identity authentication method. In most cases, you don't need to specify other settings. This configuration creates a managed identity with the default audience.
In the operations experience dataflow endpoint settings page, select the Basic tab, then choose Authentication method > System assigned managed identity.
If you need to override the system-assigned managed identity audience, you can specify the audience setting.
In most cases, you don't need to specify a service audience. Not specifying an audience creates a managed identity with the default audience scoped to your storage account.
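For the custom resource, an audience override might look like the following fragment. Treat it as a sketch under stated assumptions: it is meant to be nested under spec in a DataflowEndpoint resource, the names fabricOneLakeSettings, authentication, method, and systemAssignedManagedIdentitySettings are assumed from the preview schema, and `<AUDIENCE_URL>` is a placeholder you must supply. Only the audience setting itself is named in this article.

```yaml
# Fragment of a DataflowEndpoint spec (field names other than audience are assumptions).
fabricOneLakeSettings:
  authentication:
    method: SystemAssignedManagedIdentity
    systemAssignedManagedIdentitySettings:
      # Optional. Omit this line to keep the default audience.
      audience: <AUDIENCE_URL>
```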
User-assigned managed identity
To use a user-assigned managed identity for authentication, you must first deploy Azure IoT Operations with secure settings enabled. To learn more, see Enable secure settings in Azure IoT Operations Preview deployment.
Then, specify the user-assigned managed identity authentication method along with the client ID, tenant ID, and scope of the managed identity.
In the operations experience dataflow endpoint settings page, select the Basic tab, then choose Authentication method > User assigned managed identity.
Enter the user assigned managed identity client ID and tenant ID in the appropriate fields.
Here, the scope is optional and defaults to https://storage.azure.com/.default. If you need to override the default scope, specify the scope setting using Bicep or Kubernetes.
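In a Kubernetes manifest, the user-assigned identity and an explicit scope could be expressed roughly as follows. This fragment is a sketch that assumes it sits under spec in a DataflowEndpoint resource. The names fabricOneLakeSettings, authentication, method, userAssignedManagedIdentitySettings, clientId, and tenantId are assumptions based on the preview schema, and the angle-bracket values are placeholders. The scope setting and its default value come from this article.

```yaml
# Fragment of a DataflowEndpoint spec (field names other than scope are assumptions).
fabricOneLakeSettings:
  authentication:
    method: UserAssignedManagedIdentity
    userAssignedManagedIdentitySettings:
      clientId: <CLIENT_ID>
      tenantId: <TENANT_ID>
      # Optional. Shown here with the documented default value.
      scope: https://storage.azure.com/.default
```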
Advanced settings
You can set advanced settings for the Fabric OneLake endpoint, such as the batching latency and message count. You can set these settings in the dataflow endpoint Advanced portal tab or within the dataflow endpoint custom resource.
Batching
Use the batching settings to configure the maximum number of messages and the maximum latency before the messages are sent to the destination. This setting is useful when you want to optimize for network bandwidth and reduce the number of requests to the destination.
Field | Description | Required |
---|---|---|
latencySeconds | The maximum number of seconds to wait before sending the messages to the destination. The default value is 60 seconds. | No |
maxMessages | The maximum number of messages to send to the destination. The default value is 100000 messages. | No |
For example, to configure the maximum number of messages to 1000 and the maximum latency to 100 seconds, use the following settings:
In the operations experience, select the Advanced tab for the dataflow endpoint.
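In the dataflow endpoint custom resource, the same example (1000 messages, 100 seconds) could be written as the fragment below. It is a sketch that assumes the batching block is nested under an assumed fabricOneLakeSettings field inside spec of a DataflowEndpoint resource. The batching, latencySeconds, and maxMessages names come from the table above.

```yaml
# Fragment of a DataflowEndpoint spec (parent field name is an assumption).
fabricOneLakeSettings:
  batching:
    # Maximum wait before sending a batch (default 60).
    latencySeconds: 100
    # Maximum messages per batch (default 100000).
    maxMessages: 1000
```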
Next steps
To learn more about dataflows, see Create a dataflow.