Customize Azure optimization engine

The Azure optimization engine (AOE) is a set of Azure Automation runbooks that collect, ingest, and analyze Azure consumption and performance data to provide cost optimization recommendations. The engine is designed to be flexible and customizable, allowing you to adjust its behavior to better fit your organization's needs. This article provides guidance on how to customize the engine's settings. It includes adjusting thresholds, changing schedules, and expanding the engine's scope.


Widen the engine scope

By default, the Azure Automation Managed Identity is assigned the Reader role only over the respective subscription. However, you can widen the scope of its recommendations just by granting the same Reader role to other subscriptions or, even simpler, to a top-level Management Group.

In the context of augmented virtual machine (VM) right-size recommendations, you might have your VMs reporting to multiple workspaces. If you need to include other workspaces - besides the main one AOE is using - in the recommendations scope, you just have to add their workspace IDs to the AzureOptimization_RightSizeAdditionalPerfWorkspaces variable (see more details in Configuring workspaces).

If you're a multitenant customer, you can extend the reach of AOE to a tenant other than the one where it was deployed. To achieve this extension, you must ensure the following prerequisites:

  • Create a service principal (App registration) and a secret in the secondary tenant.
  • Grant the required permissions to the service principal in the secondary tenant, namely Reader in Azure subscriptions/management groups and Global Reader in Microsoft Entra ID.
  • Create an Automation credential in the AOE's Automation Account. Set the service principal's client ID as username and the secret as password.
  • Execute the Register-MultitenantAutomationSchedules.ps1 script (available in the AOE root folder) in the context of the subscription where AOE was deployed. This script creates new job schedules for each of the export runbooks and configures them to query the secondary tenant. You just have to call the script using the following syntax:
./Register-MultitenantAutomationSchedules.ps1 -AutomationAccountName <AOE automation account> -ResourceGroupName <AOE resource group> -TargetSchedulesSuffix <suffix to append to every new job schedules, e.g., Tenant2> -TargetTenantId <secondary tenant GUID> -TargetTenantCredentialName <name of the Automation credential created in the previous step> [-TargetSchedulesOffsetMinutes <offset in minutes relative to original schedules, defaults to 0>] [-TargetAzureEnvironment <AzureUSGovernment|AzureGermanCloud|AzureCloud>] [-ExcludedRunbooks <An array of runbook names to exclude from the process>] [-IncludedRunbooks <An array of runbook names to include in the process>]

Adjust schedules

By default, the base time for the AOE Automation schedules is set as the deployment time. Soon after the initial deployment completes, the exports, ingests, and recommendations runbooks run according to the engine's default schedules. For example, if you deploy AOE on a Monday at 11:00 a.m., you get new recommendations every Monday at 2:30 p.m.. If this schedule, for some reason, doesn't fit your needs, you can reset it to the time that better suits you, by using the Reset-AutomationSchedules.ps1 script (available in the AOE root folder). You just have to call the following script. Follow the syntax and answer the input requests:

./Reset-AutomationSchedules.ps1 -AutomationAccountName <AOE automation account> -ResourceGroupName <AOE resource group> [-AzureEnvironment <AzureUSGovernment|AzureGermanCloud|AzureCloud>]

The base time you choose must be in UTC and must be defined according to the day of the week and hour you want recommendations to be generated. You must deduce 3h30m from the time you choose. It's because the base time defines the schedules for all the dependent automation runbooks that must run before the recommendations are generated. For example, let's say you want recommendations to be generated every Monday at 8h30 a.m.; the base time is the next calendar date falling on a Monday, at 5h00 AM. The format of the date you choose must be YYYY-MM-dd HH:mm:ss, for example, 2022-01-03 05:00:00.

The script also asks you to enter, if needed, the Hybrid Worker Group you want the runbooks to run in (see the next subsection).


Scale AOE runbooks with Hybrid Worker

By default, AOE Automation runbooks are executed in the context of the Azure Automation sandbox. You might face performance issues due to the memory limits of the Automation sandbox. Or, you might decide to implement private endpoints for the Storage Account or SQL Database to harden AOE's security. In either case, you need to execute runbooks from a Hybrid Worker. It’s an Azure or on-premises Virtual Machine with the Automation Hybrid Worker extension. To change the execution context for the AOE runbooks, you must use the Reset-AutomationSchedules.ps1 script. See how to use the script in the previous subsection. After setting the runbooks execution base time, enter the Hybrid Worker Group name you want the runbooks to run in.

Important

  • The Hybrid Worker machine must have the required PowerShell modules installed. The upgrade-manifest.json file contains the list of required modules.
  • Once you change the runbook execution context to Hybrid Worker, you must always use the DoPartialUpgrade flag whenever you upgrade AOE, or else you lose the runbook schedule settings and revert to the default sandbox configuration.
  • The Managed Identity used to authenticate against Azure, Microsoft Entra ID, and Billing Account scopes is still the one Azure Automation uses. It gets used even if the Hybrid Worker machine has a Managed Identity assigned (see details). User-assigned Managed Identities are supported in the context of Hybrid Workers only if:
    • The Automation Account doesn't have any associated Managed Identity, that is, only the Hybrid Worker machine can have a User-Assigned Managed Identity.
    • All runbooks run in the context of the Hybrid Worker. In this case, you must create an AzureOptimization_UAMIClientID Automation Variable with the User-Assigned Managed Identity Client ID as value.
    • The AzureOptimization_AuthenticationOption Automation variable value is updated to UserAssignedManagedIdentity.

Adjust thresholds

For Advisor cost recommendations, the AOE's default configuration produces percentile 99th VM metrics aggregations, but you can adjust them to be less conservative. There are also adjustable metrics thresholds that are used to compute the fit score. The default thresholds values are 30% for CPU (5% for shutdown recommendations), 50% for memory (100% for shutdown) and 750 Mbps for network bandwidth (10 Mbps for shutdown). All the adjustable configurations are available as Azure Automation variables. The information in the next table highlights the most relevant configuration variables. To access them, go to the Automation Account Shared Resources - Variables menu option.

Variable Description
AzureOptimization_AdvisorFilter If you aren't interested in getting recommendations for all the non-Cost Advisor pillars, you can specify a pillar-level filter (comma-separated list with at least one of the following values: HighAvailability,Security,Performance,OperationalExcellence). Defaults to all pillars.
AzureOptimization_AuthenticationOption The default authentication method for Automation Runbooks is RunAsAccount. But you can change to ManagedIdentity if you're using a Hybrid Worker in an Azure VM.
AzureOptimization_ConsumptionOffsetDays The Azure Consumption data collection runbook queries each day for billing events that occurred seven days ago (default). You can change to a closer offset, but bear in mind that some subscription types (for example, MSDN) to not support a lower value.
AzureOptimization_PerfPercentileCpu The default percentile for CPU metrics aggregations is 99. As the percentile lowers, the VM right-size fit score algorithm adjusts less conservatively.
AzureOptimization_PerfPercentileDisk The default percentile for disk IO/throughput metrics aggregations is 99. As the percentile lowers, the VM right-size fit score algorithm adjusts less conservatively.
AzureOptimization_PerfPercentileMemory The default percentile for memory metrics aggregations is 99. As the percentile lowers, the VM right-size fit score algorithm adjusts less conservatively.
AzureOptimization_PerfPercentileNetwork The default percentile for network metrics aggregations is 99. As the percentile lowers, the VM right-size fit score algorithm adjusts less conservatively.
AzureOptimization_PerfPercentileSqlDtu The default percentile to be used for SQL DB DTU metrics. As the percentile lowers, the SQL Database right-size algorithm adjusts less conservatively.
AzureOptimization_PerfThresholdCpuPercentage The CPU threshold (in % Processor Time). Above it, the VM right-size fit score decreases. Below it, the Azure Virtual Machine Scale Set (scale set) right-size Cost recommendation triggers.
AzureOptimization_PerfThresholdCpuShutdownPercentage The CPU threshold (in % Processor Time). Above it, the VM right-size fit score decreases (shutdown recommendations only).
AzureOptimization_PerfThresholdCpuDegradedMaxPercentage The CPU threshold (Maximum observed in % Processor Time). Above it, the scale set right-size Performance recommendation triggers.
AzureOptimization_PerfThresholdCpuDegradedAvgPercentage The CPU threshold (Average observed in % Processor Time). Above it, the scale set right-size Performance recommendation triggers.
AzureOptimization_PerfThresholdMemoryPercentage The memory threshold (in % Used Memory). Above it, the VM right-size fit score decreases. Below it, the scale set right-size Cost recommendation triggers.
AzureOptimization_PerfThresholdMemoryShutdownPercentage The memory threshold (in % Used Memory). Above it, the VM right-size fit score decreases (shutdown recommendations only).
AzureOptimization_PerfThresholdMemoryDegradedPercentage The memory threshold (in % Used Memory). Above it, the scale set right-size Performance recommendation triggers.
AzureOptimization_PerfThresholdNetworkMbps The network threshold (in Total Mbps). Above it, the VM right-size fit score decreases.
AzureOptimization_PerfThresholdNetworkShutdownMbps The network threshold (in Total Mbps). Above it, the VM right-size fit score decreases (shutdown recommendations only).
AzureOptimization_PerfThresholdDtuPercentage The DTU usage percentage threshold. Below it, a SQL Database instance is considered underutilized.
AzureOptimization_RecommendAdvisorPeriodInDays The interval in days to look for Advisor recommendations in the Log Analytics repository - the default is 7, as Advisor recommendations are collected once a week.
AzureOptimization_RecommendationAADMaxCredValidityYears The maximum number of years for a Service Principal credential/certificate validity - any validity above this interval generates a Security recommendation. Defaults to 2.
AzureOptimization_RecommendationAADMinCredValidityDays The minimum number of days for a Service Principal credential/certificate before it expires - any validity below this interval generates an Operational Excellence recommendation. Defaults to 30.
AzureOptimization_RecommendationLongDeallocatedVmsIntervalDays The number of consecutive days a VM was deallocated before being recommended for deletion (Virtual Machine has been deallocated for long with disks still incurring costs). Defaults to 30.
AzureOptimization_RecommendationVNetSubnetMaxUsedPercentageThreshold The maximum percentage tolerated for subnet IP space usage. Defaults to 80.
AzureOptimization_RecommendationVNetSubnetMinUsedPercentageThreshold The minimum percentage for subnet IP space usage - any usage below this value flags the respective subnet as using low IP space. Defaults to 5.
AzureOptimization_RecommendationVNetSubnetEmptyMinAgeInDays The minimum age in days for an empty subnet to be flagged, thus avoiding flagging newly created subnets. Defaults to 30.
AzureOptimization_RecommendationVNetSubnetUsedPercentageExclusions Comma-separated, single-quote enclosed list of subnet names that must be excluded from subnet usage percentage recommendations, for example, 'gatewaysubnet','azurebastionsubnet'. Defaults to 'gatewaysubnet'.
AzureOptimization_RecommendationRBACAssignmentsPercentageThreshold The maximum percentage of RBAC assignments limits usage. Defaults to 80.
AzureOptimization_RecommendationResourceGroupsPerSubPercentageThreshold The maximum percentage of Resource Groups count per subscription limits usage. Defaults to 80.
AzureOptimization_RecommendationRBACSubscriptionsAssignmentsLimit The maximum limit for RBAC assignments per subscription. Currently set to 2000 (as documented).
AzureOptimization_RecommendationRBACMgmtGroupsAssignmentsLimit The maximum limit for RBAC assignments per management group. Currently set to 500 (as documented).
AzureOptimization_RecommendationResourceGroupsPerSubLimit The maximum limit for Resource Group count per subscription. Currently set to 980 (as documented).
AzureOptimization_RecommendationStorageAcountGrowthThresholdPercentage The minimum Storage Account growth percentage required to flag Storage as not having a retention policy in place.
AzureOptimization_RecommendationStorageAcountGrowthMonthlyCostThreshold The minimum monthly cost (in your EA/MCA currency) required to flag Storage as not having a retention policy in place.
AzureOptimization_RecommendationStorageAcountGrowthLookbackDays The lookback period (in days) for analyzing Storage Account growth.
AzureOptimization_ReferenceRegion The Azure region used as a reference for getting the list of available SKUs (defaults to westeurope).
AzureOptimization_RemediateRightSizeMinFitScore The minimum fit score a VM right-size recommendation must have for the remediation to occur.
AzureOptimization_RemediateRightSizeMinWeeksInARow The minimum number of weeks in a row a VM right-size recommendation must be complete for the remediation to occur.
AzureOptimization_RemediateRightSizeTagsFilter The tag name/value pairs a VM right-size recommendation must have for the remediation to occur. Example: [ { "tagName": "a", "tagValue": "b" }, { "tagName": "c", "tagValue": "d" } ]
AzureOptimization_RemediateLongDeallocatedVMsMinFitScore The minimum fit score a long deallocated VM recommendation must have for the remediation to occur.
AzureOptimization_RemediateLongDeallocatedVMsMinWeeksInARow The minimum number of weeks in a row a long deallocated VM recommendation must be complete for the remediation to occur.
AzureOptimization_RemediateLongDeallocatedVMsTagsFilter The tag name/value pairs a long deallocated VM recommendation must have for the remediation to occur. Example: [ { "tagName": "a", "tagValue": "b" }, { "tagName": "c", "tagValue": "d" } ]
AzureOptimization_RemediateUnattachedDisksMinFitScore The minimum fit score an unattached disk recommendation must have for the remediation to occur.
AzureOptimization_RemediateUnattachedDisksMinWeeksInARow The minimum number of weeks in a row an unattached disk recommendation must be complete for the remediation to occur.
AzureOptimization_RemediateUnattachedDisksAction The action to apply for an unattached disk recommendation remediation (Delete or Downsize).
AzureOptimization_RemediateUnattachedDisksTagsFilter The tag name/value pairs an unattached disk recommendation must have for the remediation to occur. Example: [ { "tagName": "a", "tagValue": "b" }, { "tagName": "c", "tagValue": "d" } ]
AzureOptimization_RightSizeAdditionalPerfWorkspaces A comma-separated list of other Log Analytics workspace IDs where to look for VM metrics (see Configuring workspaces).
AzureOptimization_PerfThresholdDiskIOPSPercentage The disk IOPS usage percentage threshold. Below it, the underutilized Premium SSD disks recommendation triggers.
AzureOptimization_PerfThresholdDiskMBsPercentage The disk throughput usage percentage threshold. Below it, the underutilized Premium SSD disks recommendation triggers.
AzureOptimization_RecommendationsMaxAgeInDays The maximum age (in days) for a recommendation to be kept in the SQL database. Default: 365.
AzureOptimization_RetailPricesCurrencyCode The currency code (for example, EUR, USD, and so on) used to collect the Reservations retail prices.
AzureOptimization_PriceSheetMeterCategories The comma-separated meter categories used for Price sheet filtering, in order to avoid ingesting unnecessary data. Defaults to "Virtual Machines,Storage".
AzureOptimization_ConsumptionScope The scope of the consumption exports: Subscription (default), BillingProfile (MCA only), or BillingAccount (for MCA, requires adding the Billing Account Reader role to the AOE managed identity). See more details.

Related products:

Related solutions: