What factors affect the CPU utilization of Stream Analytics jobs?

kunming jian 80 Reputation points
2024-11-26T08:03:03.57+00:00

I have two Stream Analytics jobs with the same configuration.

One processes approximately 500k input events daily, while the other processes only 10k.

However, the job that processes 10k input events daily consistently has a CPU utilization of 80%, an SU (Stream Unit) utilization of 15%, and frequently experiences input event backlogs. On the other hand, the job that processes 500k input events daily maintains a CPU and SU utilization of 50% and runs very stably.

Their data processing logic is similar, with the only difference being that the problematic job has a more complex topology, with more temporary tables and outputs.

I would therefore like to confirm what exactly is causing this issue.

Azure Stream Analytics

Accepted answer
  1. phemanth 12,900 Reputation points Microsoft Vendor
    2024-11-26T11:35:54.0233333+00:00

    @kunming jian

    Thanks for using the Microsoft Q&A forum and posting your query.

    The differences in CPU utilization between your two Stream Analytics jobs can be attributed to several factors, particularly the complexity of the job processing logic and how it handles data. Here are some key points to consider:

    1. Complexity of Query Logic: The job processing 10k events has a more complex topology, which includes more temporary tables and outputs. This complexity can lead to higher CPU utilization because the system has to manage more state and perform additional computations, even if the input event rate is lower.
    2. Stateful Processing: Stream Analytics jobs often use stateful operations (like windowed aggregates and temporal joins) that require maintaining state in memory. If your complex job has many such operations, it can consume more CPU resources, leading to higher utilization.
    3. Data Skew: If the input data is unevenly distributed across partitions, some partitions may experience a higher load than others. This can cause certain streaming nodes to become overloaded, resulting in increased CPU utilization and potential backlogs.
    4. Streaming Units (SUs): The SU % utilization metric reflects memory consumption, so a low value (15%) does not mean the job has spare compute; CPU utilization is measured separately and can still be high. If the job is not fully parallelized, increasing the number of SUs might help distribute the workload more evenly and reduce CPU load.
    5. Watermark Delay and Backlogs: High CPU utilization often correlates with increased watermark delays and backlogged events. If the job cannot keep up with the input rate, it may struggle to process events in a timely manner, leading to backlogs.
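
    The data-skew point (item 3) can be checked offline if you have per-partition input event counts, for example exported from the job's metrics. The Python sketch below is illustrative only: the function name `partition_skew` and the sample counts are hypothetical and are not taken from the job in the question. It computes the ratio of the busiest partition's load to the mean load; a ratio near 1 suggests an even distribution.

```python
from statistics import mean


def partition_skew(events_per_partition):
    """Return the busiest partition's load divided by the mean load.

    A value close to 1.0 means events are spread evenly; larger values
    mean one partition receives disproportionately many events.
    """
    if not events_per_partition:
        raise ValueError("need at least one partition")
    average = mean(events_per_partition)
    if average == 0:
        return 1.0  # no events anywhere, so there is no skew to report
    return max(events_per_partition) / average


# Hypothetical daily counts for a 4-partition input (10k events in total).
even = [2500, 2500, 2500, 2500]
skewed = [9100, 300, 300, 300]

print(partition_skew(even))    # 1.0
print(partition_skew(skewed))  # 3.64
```

    In the skewed case a single partition handles over three times the average load, so the node reading it can fall behind even though the total event volume is small.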

    Recommendations:

    • Optimize Query Logic: Simplify the query where possible, reducing the number of temporary tables or complex joins.
    • Repartition Input Data: Ensure that input data is evenly distributed across partitions to avoid data skew.
    • Monitor Metrics: Keep an eye on CPU and SU utilization metrics to identify patterns and adjust resources accordingly.
    • Scale Up SUs: Consider increasing the number of streaming units to provide more resources for processing.
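
    To make the "Monitor Metrics" recommendation more concrete, here is a rough Python sketch of how readings like the ones in the question could be triaged. It assumes that SU % utilization tracks memory while CPU % utilization is a separate metric. Everything else is an assumption for illustration: the `JobMetrics` type, the `triage` function, the 80% threshold, and the suggested actions are hypothetical and are not part of any Azure SDK or official guidance.

```python
from dataclasses import dataclass


@dataclass
class JobMetrics:
    cpu_pct: float          # CPU % utilization
    su_pct: float           # SU % utilization (memory)
    backlogged_events: int  # backlogged input events


def triage(m: JobMetrics, high: float = 80.0) -> str:
    """Classify a job by which resource looks saturated.

    `high` is an assumed threshold, not an official limit.
    """
    cpu_bound = m.cpu_pct >= high
    mem_bound = m.su_pct >= high
    if cpu_bound and mem_bound:
        return "cpu and memory pressure: scale up and simplify the query"
    if cpu_bound:
        return "cpu-bound: simplify query steps or improve parallelism"
    if mem_bound:
        return "memory-bound: reduce window sizes or state, or add SUs"
    if m.backlogged_events > 0:
        return "backlog without saturation: check partition skew and outputs"
    return "healthy"


# Readings from the question. The backlog count of 1 is a placeholder,
# since the question only says that backlogs occur frequently.
small_job = JobMetrics(cpu_pct=80, su_pct=15, backlogged_events=1)
large_job = JobMetrics(cpu_pct=50, su_pct=50, backlogged_events=0)

print(triage(small_job))
print(triage(large_job))
```

    Under these assumptions the 10k-event job is classified as CPU-bound rather than memory-bound, which is consistent with the explanation that query complexity, not event volume, is driving the load.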

    References:
    https://zcusa.951200.xyz/en-us/azure/stream-analytics/stream-analytics-streaming-unit-consumption
    https://zcusa.951200.xyz/en-us/azure/stream-analytics/stream-analytics-job-analysis-with-metric-dimensions


1 additional answer

  1. Sander van de Velde | MVP 34,201 Reputation points MVP
    2024-11-26T15:10:35.0066667+00:00

    Hello @kunming jian,

    Welcome to this moderated Azure community forum.

    I expect you have basically answered your question yourself.

    CPU utilization depends heavily on the size and quality of the queries being executed.

    The guidelines listed by @phemanth are applicable too.

    Have you tested your queries using the VS Code ASA extension?

    If you have recordings of the ingested data, you can test the queries yourself.

