How data is processed in FinOps hubs
FinOps hubs perform many data processing activities to clean up, normalize, and optimize data. The following sections show how data flows from Cost Management into a hub instance.
Scope setup
A scope is a level within the cloud resource and account hierarchy that provides access to cost, usage, and carbon data. For FinOps hubs, we typically recommend using Enterprise Agreement (EA) billing accounts or Microsoft Customer Agreement (MCA) billing profiles; however, any cloud scope is sufficient for basic analysis. The main consideration is whether you need price and reservation data, since Cost Management only exposes that data for EA billing accounts and MCA billing profiles.
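To make the scope guidance concrete, here's a minimal Python sketch that checks whether a scope ID is one of the two scope types that expose price and reservation data. The function name is hypothetical, and the check assumes EA billing account IDs are numeric enrollment numbers while MCA billing profiles use the /providers/Microsoft.Billing/billingAccounts/###/billingProfiles/### form. Real billing account ID formats can vary, so treat this as an illustration rather than a validator.

```python
import re

# Assumption: EA billing account IDs are numeric enrollment numbers.
EA_BILLING_ACCOUNT = re.compile(
    r"^/providers/Microsoft\.Billing/billingAccounts/\d+$", re.IGNORECASE
)
# Assumption: MCA billing profiles are nested directly under a billing account.
MCA_BILLING_PROFILE = re.compile(
    r"^/providers/Microsoft\.Billing/billingAccounts/[^/]+/billingProfiles/[^/]+$",
    re.IGNORECASE,
)


def supports_price_and_reservation_data(scope_id: str) -> bool:
    """Return True if the scope looks like an EA billing account or an MCA billing profile."""
    scope = scope_id.rstrip("/")
    return bool(EA_BILLING_ACCOUNT.match(scope) or MCA_BILLING_PROFILE.match(scope))


print(supports_price_and_reservation_data("/providers/Microsoft.Billing/billingAccounts/1234567"))  # True
print(supports_price_and_reservation_data("/subscriptions/00000000-0000-0000-0000-000000000000"))  # False
```

Subscription and resource group scopes still work for basic cost analysis; they just return False here because Cost Management doesn't expose price and reservation data for them.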
FinOps hubs support configuring scopes either by manually configuring Cost Management exports or by granting FinOps hubs access to manage scopes on your behalf. Managed scopes are configured in the config/settings.json file in hub storage. The following steps describe what happens when a new managed scope is added to this file. Unmanaged scopes, where Cost Management exports are configured manually, don't require any additional setup.
- The config_SettingsUpdated trigger runs when the settings.json file is updated.
- The config_ConfigureExports pipeline creates new exports for any new scopes that were added.
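A minimal sketch of the change detection these steps imply, assuming (hypothetically) that settings.json stores managed scopes as a scopes array of objects with a scope property. The real file schema and pipeline logic may differ; this only illustrates comparing the previous and updated file to find scopes that need new exports.

```python
import json


def new_scopes(old_settings: str, new_settings: str) -> list[str]:
    """Return scope IDs that appear in the updated settings but not the previous ones."""

    def scope_ids(raw: str) -> list[str]:
        # Hypothetical schema: {"scopes": [{"scope": "<scope-id>"}, ...]}
        return [entry["scope"] for entry in json.loads(raw).get("scopes", [])]

    before = set(scope_ids(old_settings))
    # dict.fromkeys removes duplicates while keeping the order from the file.
    return list(dict.fromkeys(s for s in scope_ids(new_settings) if s not in before))


old = '{"scopes": [{"scope": "/subscriptions/###"}]}'
new = (
    '{"scopes": [{"scope": "/subscriptions/###"}, '
    '{"scope": "/providers/Microsoft.Billing/billingAccounts/###"}]}'
)
print(new_scopes(old, new))
```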
Data ingestion
Data ingestion can be broken down into two parts:
- Exports push data to storage.
- Hubs process and ingest data.
For managed scopes, hubs perform the following steps:
- The config_DailySchedule and config_MonthlySchedule triggers run on their respective schedules to kick off data ingestion.
- The config_StartExportProcess pipeline gets the applicable exports for the schedule that is running.
- The config_RunExportJobs pipeline executes each of the selected exports.
- Cost Management exports raw cost details to the msexports container. For more information, see About exports.
After exports are run, whether managed or unmanaged, hubs perform the following steps:
- The msexports_ExecuteETL pipeline kicks off the extract-transform-load (ETL) process when files are added to storage.
- The msexports_ETL_ingestion pipeline transforms the data to parquet format and moves it to the ingestion container using a scalable file structure. For more information, see About ingestion.
- Power BI or other tools read data from the ingestion container.
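The split between the daily and monthly triggers can be illustrated with a small filter. This Python sketch is hypothetical: it assumes the daily schedule runs exports with a MonthToDate time frame and the monthly schedule runs exports with a TheLastMonth time frame (both values appear in the export manifest's timeFrame property). It isn't how config_StartExportProcess is implemented, only a way to picture "gets the applicable exports for the schedule that is running".

```python
from dataclasses import dataclass


@dataclass
class Export:
    name: str
    time_frame: str  # a manifest timeFrame value, such as MonthToDate


# Hypothetical mapping from each trigger to the export time frames it runs.
SCHEDULE_TIME_FRAMES = {
    "daily": {"MonthToDate"},
    "monthly": {"TheLastMonth"},
}


def exports_for_schedule(schedule: str, exports: list[Export]) -> list[Export]:
    """Return the exports that the given schedule should run."""
    allowed = SCHEDULE_TIME_FRAMES[schedule]
    return [e for e in exports if e.time_frame in allowed]


exports = [Export("focus-mtd", "MonthToDate"), Export("focus-last-month", "TheLastMonth")]
for export in exports_for_schedule("daily", exports):
    print(export.name)
```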
About ingestion
FinOps hubs rely on a specific folder path in the ingestion container:
ingestion/{dataset}/{yyyy}/{mm}/{scope-id}
- ingestion is the container where the data pipeline saves data.
- {dataset} is the exported dataset type.
- {yyyy}/{mm} is the year and month of the exported data, stored as a four-digit year folder and a two-digit month folder.
- {scope-id} is expected to be the fully qualified resource ID of the scope the data is from.
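For example, the following sketch assembles the folder for one month of data. The helper name is hypothetical, and it assumes the leading slash of the scope's resource ID is dropped so the path doesn't contain an empty segment.

```python
from datetime import date


def ingestion_folder(dataset: str, month: date, scope_id: str) -> str:
    """Build ingestion/{dataset}/{yyyy}/{mm}/{scope-id} for one month of data."""
    # Assumption: drop the leading "/" of the resource ID to avoid an empty path segment.
    scope = scope_id.strip("/")
    return f"ingestion/{dataset}/{month:%Y}/{month:%m}/{scope}"


print(ingestion_folder("FocusCost", date(2024, 1, 1), "/subscriptions/###"))
```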
If you need to use hubs to monitor non-Azure data, convert the data to FOCUS and drop it into the ingestion container. Note that this process wasn't explicitly tested in the latest release. If you run into problems, create an issue.
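Converting non-Azure data to FOCUS is largely a matter of renaming and typing columns. The sketch below assumes a hypothetical source CSV with cost, currency, usage_start, usage_end, and service columns and maps them to the FOCUS columns BilledCost, BillingCurrency, ChargePeriodStart, ChargePeriodEnd, and ServiceName, plus ProviderName. It covers only a handful of FOCUS columns, and writing the result to parquet (for example, with a library such as pyarrow) is a separate step not shown here.

```python
import csv
import io

# Hypothetical source column names mapped to FOCUS column names.
COLUMN_MAP = {
    "cost": "BilledCost",
    "currency": "BillingCurrency",
    "usage_start": "ChargePeriodStart",
    "usage_end": "ChargePeriodEnd",
    "service": "ServiceName",
}


def to_focus(row: dict[str, str], provider_name: str) -> dict[str, object]:
    """Rename known columns to FOCUS names, drop unknown ones, and add ProviderName."""
    focus: dict[str, object] = {COLUMN_MAP[k]: v for k, v in row.items() if k in COLUMN_MAP}
    # FOCUS cost columns are numeric, so convert the billed cost from text.
    focus["BilledCost"] = float(focus["BilledCost"])
    focus["ProviderName"] = provider_name
    return focus


# "Contoso Cloud" is a made-up provider name for illustration.
source = io.StringIO("cost,currency,service\n12.5,USD,Storage\n")
rows = [to_focus(r, "Contoso Cloud") for r in csv.DictReader(source)]
print(rows)
```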
About exports
FinOps hubs use Cost Management exports to obtain cost data. Cost Management controls the folder structure for the exported data in the msexports container. A typical path looks like:
{container}/{path}/{date-range}/{export-name}/{export-time}/{guid}/{file}
As of v0.4, FinOps hubs don't rely on file paths. Hubs use the manifest file to identify the scope, dataset, month, and so on. The only part of the path that matters to hubs is the container, which must be msexports.
Warning
- Don't export data to the ingestion container. Exported CSVs must be published to the msexports container to be processed by the hubs engine.
- To ingest custom data, save FOCUS-aligned parquet files in the ingestion container so the FinOps toolkit Power BI reports work as expected.
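A small placement check can catch the mistakes this warning describes. This hypothetical helper only inspects the container (the first path segment) and the file extension; it doesn't validate file contents or FOCUS alignment.

```python
from pathlib import PurePosixPath
from typing import Optional


def placement_problem(blob_path: str) -> Optional[str]:
    """Return a warning if a file looks misplaced, or None if it looks fine."""
    path = PurePosixPath(blob_path.lstrip("/"))
    container, suffix = path.parts[0], path.suffix.lower()
    if container == "ingestion" and suffix == ".csv":
        return "Exported CSVs must go to msexports, not ingestion."
    if container == "ingestion" and suffix != ".parquet":
        return "Custom data in ingestion should be FOCUS-aligned parquet."
    return None


print(placement_problem("ingestion/FocusCost/2024/01/costs.csv"))
```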
Export manifests can change with API versions. Here's an example with API version 2023-07-01-preview:
```json
{
  "exportConfig": {
    "exportName": "<export-name>",
    "resourceId": "/<scope>/providers/Microsoft.CostManagement/exports/<export-name>",
    "dataVersion": "<dataset-version>",
    "apiVersion": "<api-version>",
    "type": "<dataset-type>",
    "timeFrame": "OneTime|TheLastMonth|MonthToDate",
    "granularity": "Daily"
  },
  "deliveryConfig": {
    "partitionData": true,
    "dataOverwriteBehavior": "CreateNewReport|OverwritePreviousReport",
    "fileFormat": "Csv",
    "containerUri": "<storage-resource-id>",
    "rootFolderPath": "<path>"
  },
  "runInfo": {
    "executionType": "Scheduled",
    "submittedTime": "2024-02-03T18:33:03.1032074Z",
    "runId": "af754a8e-30fc-4ef3-bfc6-71bd1efb8598",
    "startDate": "2024-01-01T00:00:00",
    "endDate": "2024-01-31T00:00:00"
  },
  "blobs": [
    {
      "blobName": "<path>/<export-name>/<date-range>/<export-time>/<guid>/<file-name>.csv",
      "byteCount": ###
    }
  ]
}
```
FinOps hubs use the following properties:
- exportConfig.resourceId to identify the scope.
- exportConfig.type to identify the dataset type.
- exportConfig.dataVersion to identify the dataset version.
- runInfo.startDate to identify the exported month.
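The following Python sketch shows how those four properties could be read from a manifest. The function name and returned keys are hypothetical, and deriving the scope by trimming the /providers/Microsoft.CostManagement/exports/<export-name> suffix from resourceId is an assumption based on the resourceId format in the example manifest.

```python
import json


def read_manifest(raw: str) -> dict[str, str]:
    """Extract the scope, dataset type, dataset version, and month from a manifest."""
    manifest = json.loads(raw)
    config, run = manifest["exportConfig"], manifest["runInfo"]
    resource_id = config["resourceId"]
    # Assumption: the scope is everything before the export's own
    # /providers/Microsoft.CostManagement/exports/ segment (matched case-insensitively).
    marker = "/providers/microsoft.costmanagement/exports/"
    cut = resource_id.lower().find(marker)
    scope = resource_id[:cut] if cut >= 0 else resource_id
    return {
        "scope": scope,
        "dataset": config["type"],
        "version": config["dataVersion"],
        # startDate looks like 2024-01-01T00:00:00, so the first 7 characters are yyyy-MM.
        "month": run["startDate"][:7],
    }


example = """{
  "exportConfig": {
    "resourceId": "/subscriptions/###/providers/Microsoft.CostManagement/exports/daily-focus",
    "type": "FocusCost",
    "dataVersion": "1.0"
  },
  "runInfo": {"startDate": "2024-01-01T00:00:00"}
}"""
print(read_manifest(example))
```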
FinOps hubs support the following dataset types, versions, and API versions:
- FocusCost: 1.0, 1.0-preview(v1)
- PriceSheet: 2023-05-01
- ReservationDetails: 2023-03-01
- ReservationRecommendations: 2023-05-01
- ReservationTransactions: 2023-05-01
- API versions: 2023-07-01-preview
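One way to express this support matrix in code is a lookup table, as in the following sketch. The values are copied from the list of supported dataset types and versions; the names SUPPORTED_DATASETS and is_supported are hypothetical and not part of FinOps hubs.

```python
# Supported dataset types and versions, copied from the list above.
SUPPORTED_DATASETS = {
    "FocusCost": {"1.0", "1.0-preview(v1)"},
    "PriceSheet": {"2023-05-01"},
    "ReservationDetails": {"2023-03-01"},
    "ReservationRecommendations": {"2023-05-01"},
    "ReservationTransactions": {"2023-05-01"},
}
SUPPORTED_API_VERSIONS = {"2023-07-01-preview"}


def is_supported(dataset: str, version: str, api_version: str) -> bool:
    """Return True if the dataset type, dataset version, and API version are all listed."""
    return (
        api_version in SUPPORTED_API_VERSIONS
        and version in SUPPORTED_DATASETS.get(dataset, set())
    )


print(is_supported("FocusCost", "1.0", "2023-07-01-preview"))
print(is_supported("PriceSheet", "1.0", "2023-07-01-preview"))
```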
FinOps hubs v0.4-0.5
The following information describes how data is processed in FinOps hubs v0.4-0.5.
Scope setup in v0.4-0.5
- The config_SettingsUpdated trigger runs when the settings.json file is updated.
- The config_ConfigureExports pipeline creates new exports for any new scopes that were added.
Data ingestion in v0.4-0.5
For managed scopes:
- The config_DailySchedule and config_MonthlySchedule triggers run on their respective schedules to kick off data ingestion.
- The config_ExportData pipeline gets the applicable exports for the schedule that is running.
- The config_RunExports pipeline executes each of the selected exports.
- Cost Management exports raw cost details to the msexports container. For more information, see About exports in v0.4-0.5.
After exports are completed, for both managed and unmanaged scopes:
- The msexports_ExecuteETL pipeline kicks off the extract-transform-load (ETL) process when files are added to storage.
- The msexports_ETL_ingestion pipeline transforms the data to a standard schema and saves the raw data in parquet format to the ingestion container. For more information, see About ingestion in v0.4-0.5.
- Power BI reads cost data from the ingestion container.
About ingestion in v0.4-0.5
FinOps hubs rely on a specific folder path in the ingestion container:
ingestion/{dataset}/{yyyy}/{mm}/{scope-id}
- ingestion is the container where the data pipeline saves data.
- {dataset} is the exported dataset type.
- {yyyy}/{mm} is the year and month of the exported data, stored as a four-digit year folder and a two-digit month folder.
- {scope-id} is expected to be the fully qualified resource ID of the scope the data is from.
If you need to use hubs to monitor non-Azure data, convert the data to FOCUS and drop it into the ingestion container. This process wasn't explicitly tested in the latest release. If you run into problems, create an issue.
About exports in v0.4-0.5
FinOps hubs use Cost Management exports to obtain cost data. Cost Management controls the folder structure for the exported data in the msexports container. A typical path looks like:
{container}/{path}/{date-range}/{export-name}/{export-time}/{guid}/{file}
As of v0.4, FinOps hubs don't rely on file paths. Hubs use the manifest file to identify the scope, dataset, month, and so on. The only part of the path that matters to hubs is the container, which must be msexports.
Note
Don't export data to the ingestion container. Exported CSVs must be published to the msexports container to be processed by the hubs engine.
To ingest custom data, save FOCUS-aligned parquet files in the ingestion container so the FinOps toolkit Power BI reports work as expected.
Export manifests can change with API versions. Here's an example with API version 2023-07-01-preview:
```json
{
  "exportConfig": {
    "exportName": "<export-name>",
    "resourceId": "/<scope>/providers/Microsoft.CostManagement/exports/<export-name>",
    "dataVersion": "<dataset-version>",
    "apiVersion": "<api-version>",
    "type": "<dataset-type>",
    "timeFrame": "OneTime|TheLastMonth|MonthToDate",
    "granularity": "Daily"
  },
  "deliveryConfig": {
    "partitionData": true,
    "dataOverwriteBehavior": "CreateNewReport|OverwritePreviousReport",
    "fileFormat": "Csv",
    "containerUri": "<storage-resource-id>",
    "rootFolderPath": "<path>"
  },
  "runInfo": {
    "executionType": "Scheduled",
    "submittedTime": "2024-02-03T18:33:03.1032074Z",
    "runId": "af754a8e-30fc-4ef3-bfc6-71bd1efb8598",
    "startDate": "2024-01-01T00:00:00",
    "endDate": "2024-01-31T00:00:00"
  },
  "blobs": [
    {
      "blobName": "<path>/<export-name>/<date-range>/<export-time>/<guid>/<file-name>.csv",
      "byteCount": ###
    }
  ]
}
```
FinOps hubs use the following properties:
- exportConfig.resourceId to identify the scope.
- exportConfig.type to identify the dataset type.
- exportConfig.dataVersion to identify the dataset version.
- runInfo.startDate to identify the exported month.
FinOps hubs support the following dataset types, versions, and API versions:
- FocusCost: 1.0, 1.0-preview(v1)
- PriceSheet: 2023-05-01
- ReservationDetails: 2023-03-01
- ReservationRecommendations: 2023-05-01
- ReservationTransactions: 2023-05-01
- API versions: 2023-07-01-preview
FinOps hubs v0.2-0.3
The following steps outline the process for exporting and processing cost data using FinOps hubs versions 0.2-0.3:
- Cost Management exports raw cost details to the msexports container. For more information, see About exports.
- The msexports_ExecuteETL pipeline kicks off the extract-transform-load (ETL) process when files are added to storage.
- The msexports_ETL_ingestion pipeline saves exported data in parquet format in the ingestion container. For more information, see About exports.
- Power BI reads cost data from the ingestion container.
FinOps hubs 0.2-0.3 use the export path to determine the exported scope and month. This matters because changes to the path can break the data pipelines. To avoid this problem, we recommend updating to FinOps hubs 0.4. The expected path is:
msexports/{scope-id}/{export-name}/{date-range}/{export-time}/{guid}/{file}
- msexports is the container specified on the export.
- {scope-id} is the folder path specified on the export. Hubs 0.3 and earlier use this to identify which scope the data is coming from. We recommend using the scope ID, but any value can be used. Example scope IDs include:

| Scope type | Example value |
|---|---|
| Subscription | /subscriptions/### |
| Resource group | /subscriptions/###/resourceGroups/### |
| Billing account | /providers/Microsoft.Billing/billingAccounts/### |
| Billing profile | /providers/Microsoft.Billing/billingAccounts/###/billingProfiles/### |

- {export-name} is the name of the export. Hubs ignore this folder.
- {date-range} is the date range of the exported data. Hubs 0.3 and earlier use this to identify the month. The format for this folder is yyyyMMdd-yyyyMMdd. Hubs 0.4 uses the manifest instead.
- {export-time} is a timestamp of when the export ran. Hubs ignore this folder. The format for this folder is yyyyMMddHHmm.
- {guid} is a unique GUID. Hubs ignore this folder. Cost Management doesn't always include it; whether it's included depends on the API version used to create the export.
- {file} is either a manifest or exported data. Version 0.3 and earlier ignore manifest files and only monitor *.csv files. Starting in 0.4, hubs use the manifest instead.
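Because hubs 0.2-0.3 derive the scope and month from the path, a path-parsing sketch shows why renaming folders breaks ingestion. This hypothetical parser assumes the msexports/{scope-id}/{export-name}/{date-range}/{export-time}/{guid}/{file} layout described for these versions. Since {scope-id} can itself contain slashes and {guid} is optional, it locates the {date-range} folder by pattern and treats everything between the container and {export-name} as the scope.

```python
import re
from datetime import datetime
from typing import NamedTuple


class ExportPathInfo(NamedTuple):
    scope_id: str
    export_name: str
    month: str  # yyyyMM, taken from the start of {date-range}


# {date-range} folders look like yyyyMMdd-yyyyMMdd.
DATE_RANGE = re.compile(r"^(\d{8})-(\d{8})$")


def parse_export_path(path: str) -> ExportPathInfo:
    """Parse msexports/{scope-id}/{export-name}/{date-range}/{export-time}/[{guid}/]{file}."""
    parts = path.strip("/").split("/")
    if parts[0] != "msexports":
        raise ValueError("expected a path in the msexports container")
    # Start at index 3 so at least one {scope-id} segment and the
    # {export-name} folder come before {date-range}.
    for idx in range(3, len(parts)):
        match = DATE_RANGE.match(parts[idx])
        if match:
            break
    else:
        raise ValueError("no {date-range} folder found")
    start = datetime.strptime(match.group(1), "%Y%m%d")
    scope_id = "/" + "/".join(parts[1 : idx - 1])
    return ExportPathInfo(scope_id, parts[idx - 1], start.strftime("%Y%m"))


info = parse_export_path(
    "msexports/subscriptions/###/resourceGroups/###/my-export/"
    "20240101-20240131/202402031833/part_0_0001.csv"
)
print(info)
```

If someone renames the {date-range} folder or moves exports under a different folder path, the derived month or scope changes, which is the fragility that the manifest-based approach in 0.4 avoids.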
FinOps hubs v0.1
The following steps outline the process for exporting and processing cost data using FinOps hubs version 0.1:
- Cost Management exports raw cost details to the msexports container.
- The msexports_transform pipeline saves the raw data in parquet format to the ingestion container.
- Power BI reads cost data from the ingestion container.