Enrich data by using dataflows
Important
Azure IoT Operations Preview – enabled by Azure Arc is currently in preview. You shouldn't use this preview software in production environments.
You'll need to deploy a new Azure IoT Operations installation when a generally available release becomes available. You won't be able to upgrade a preview installation.
For legal terms that apply to Azure features that are in beta, in preview, or otherwise not yet released into general availability, see the Supplemental Terms of Use for Microsoft Azure Previews.
You can enrich data by using the contextualization datasets function. When incoming records are processed, you can query these datasets based on conditions that relate to the fields of the incoming record. This capability allows for dynamic interactions. Data from these datasets can be used to supplement information in the output fields and participate in complex calculations during the mapping process.
For example, consider the following dataset with a few records, represented as JSON records:
[
  {
    "Position": "Analyst",
    "BaseSalary": 70000,
    "WorkingHours": "Regular"
  },
  {
    "Position": "Receptionist",
    "BaseSalary": 43000,
    "WorkingHours": "Regular"
  }
]
The mapper accesses the reference dataset stored in the Azure IoT Operations distributed state store (DSS) by using a key value based on a condition specified in the mapping configuration. Key names in the DSS correspond to a dataset in the dataflow configuration.
datasets: [
  {
    key: 'position'
    inputs: [
      '$source.Position'  // - $1
      '$context.Position' // - $2
    ]
    expression: '$1 == $2'
  }
]
When a new record is being processed, the mapper performs the following steps:
- Data request: The mapper sends a request to the DSS to retrieve the dataset stored under the key position.
- Record matching: The mapper then queries this dataset to find the first record where the Position field in the dataset matches the Position field of the incoming record.
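The matching step can be sketched in Python. This is an illustrative model only, not the mapper's actual code; the dataset literal mirrors the JSON example above, and match_record stands in for the '$1 == $2' condition:

```python
# Illustrative sketch of the record-matching step. The dataset below is the
# cached copy retrieved from the DSS under the key 'position'.
position_dataset = [
    {"Position": "Analyst", "BaseSalary": 70000, "WorkingHours": "Regular"},
    {"Position": "Receptionist", "BaseSalary": 43000, "WorkingHours": "Regular"},
]

def match_record(dataset, incoming):
    """Return the first dataset record whose Position field equals the
    incoming record's Position field, or None when nothing matches."""
    for record in dataset:
        if record["Position"] == incoming.get("Position"):
            return record
    return None

incoming = {"Position": "Analyst", "Temperature": 21.5}
context = match_record(position_dataset, incoming)
print(context["WorkingHours"])  # Regular
```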
{
  inputs: [
    '$context(position).WorkingHours' // - $1
  ]
  output: 'WorkingHours'
}
{
  inputs: [
    'BaseSalary'                    // - $1
    '$context(position).BaseSalary' // - $2
  ]
  output: 'BaseSalary'
  expression: 'if($1 == (), $2, $1)'
}
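A rough Python equivalent of these two mappings (a hypothetical helper, not the mapper's implementation) makes the conditional explicit: WorkingHours is always copied from the matched context record, while BaseSalary falls back to the context value only when the incoming record lacks it:

```python
# Hypothetical sketch of the two mappings above. The expression
# 'if($1 == (), $2, $1)' translates to: use the incoming BaseSalary when it's
# present, otherwise take the context record's BaseSalary.
def enrich(incoming, context):
    output = dict(incoming)
    output["WorkingHours"] = context["WorkingHours"]
    if incoming.get("BaseSalary") is None:  # field missing or null
        output["BaseSalary"] = context["BaseSalary"]
    return output

context = {"Position": "Analyst", "BaseSalary": 70000, "WorkingHours": "Regular"}
print(enrich({"Position": "Analyst"}, context))
print(enrich({"Position": "Analyst", "BaseSalary": 80000}, context))
```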
In this example, the WorkingHours field is added to the output record, while the BaseSalary field is used conditionally, only when the incoming record doesn't contain a BaseSalary field (or the value is null if it's a nullable field). The request for the contextualization data doesn't happen with every incoming record. The mapper requests the dataset once and then receives notifications from the DSS about changes, while it uses a cached version of the dataset.
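This caching behavior can be modeled as follows. This is an illustrative sketch under the assumption of a fetch-once, notify-on-change pattern, not the actual mapper code:

```python
# Illustrative model of the caching behavior: one initial request to the DSS,
# then the cached copy is reused until a change notification replaces it.
class CachedDataset:
    def __init__(self, fetch):
        self._fetch = fetch   # callable that reads the dataset from the DSS
        self._cache = None

    def get(self):
        # Only the first call hits the store; later calls use the cache.
        if self._cache is None:
            self._cache = self._fetch()
        return self._cache

    def on_change_notification(self, new_dataset):
        # The DSS pushes updates; no per-record re-request is needed.
        self._cache = new_dataset

calls = []
store = CachedDataset(lambda: calls.append(1) or [{"Position": "Analyst"}])
store.get()
store.get()
print(len(calls))  # 1: the DSS was queried only once
```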
It's possible to use multiple datasets:
datasets: [
  {
    key: 'position'
    inputs: [
      '$source.Position'  // - $1
      '$context.Position' // - $2
    ]
    expression: '$1 == $2'
  }
  {
    key: 'permissions'
    inputs: [
      '$source.Position'  // - $1
      '$context.Position' // - $2
    ]
    expression: '$1 == $2'
  }
]
Then you can mix references to the different datasets:
inputs: [
  '$context(position).WorkingHours'  // - $1
  '$context(permissions).NightShift' // - $2
]
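Continuing the earlier Python sketch (hypothetical names, not the mapper's code), a mapping that mixes two contexts reads one field from each matched record:

```python
# Hypothetical sketch: $1 and $2 come from the records matched in the
# 'position' and 'permissions' datasets, respectively.
def mixed_inputs(contexts):
    return [
        contexts["position"]["WorkingHours"],   # $1
        contexts["permissions"]["NightShift"],  # $2
    ]

contexts = {
    "position": {"Position": "Analyst", "WorkingHours": "Regular"},
    "permissions": {"Position": "Analyst", "NightShift": False},
}
print(mixed_inputs(contexts))  # ['Regular', False]
```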
The input references use the key of the dataset, like position or permissions. If the key in the DSS is inconvenient to use, you can define an alias:
datasets: [
  {
    key: 'datasets.parag10.rule42 as position'
    inputs: [
      '$source.Position'  // - $1
      '$context.Position' // - $2
    ]
    expression: '$1 == $2'
  }
]
The configuration renames the dataset with the key datasets.parag10.rule42 to position.