Process streaming data with Synapse Pipeline or Data Flow

Jia Zhang 60 Reputation points
2024-10-18T13:18:52.67+00:00

Hi Team,

May I know if I can use a Synapse pipeline or Data Flow to connect to Event Hubs and process streaming data?

Pipeline: [screenshot]

Data Flow: [screenshot]

I am trying to use these visual tools to build a process that reads streaming data from Event Hubs and joins it with Cosmos DB, etc., but I couldn't find an 'Event Hub' source in either Pipeline or Data Flow. Is this achievable? Or can both only process batch data from SQL pool, Storage, ADX, etc.?

Thanks!


Accepted answer
  1. Amira Bedhiafi 25,566 Reputation points
    2024-10-18T21:31:41.99+00:00

    Yes, you can process streaming data in Azure Synapse Analytics, but not directly through Synapse pipelines or Synapse data flows, which have no connector for event-driven sources like Azure Event Hubs. Synapse Pipelines and Data Flows are built primarily for batch processing rather than real-time streaming.

    However, you can process streaming data in real time with Azure Stream Analytics (ASA) jobs that write to Synapse, or with Apache Spark pools within Synapse. Here's how you can set it up:

    1. Azure Stream Analytics Integration

    You can use an Azure Stream Analytics (ASA) job alongside Synapse to process streaming data from Event Hubs. This allows you to apply real-time queries to transform and analyze the data in motion, including joining with Cosmos DB or other data stores.

    • Steps to set up:
      1. Configure an Event Hub as an input to Stream Analytics.
      2. Set up the necessary transformations and windowing functions (if required) within the Stream Analytics job.
      3. Send output data to Cosmos DB, SQL Pool, ADX, or even to storage for further analysis.
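To make the windowing in step 2 concrete, here is a plain-Python sketch of a tumbling-window count. It is not ASA code (an ASA job expresses this in its SQL-like query language, for example with `TumblingWindow`); the function name and the event shape are hypothetical, and the window boundary handling is a simplification that may differ from ASA's.

```python
from collections import Counter
from datetime import datetime, timezone


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start_epoch, key) over fixed, non-overlapping windows.

    `events` is an iterable of (timestamp, key) pairs with timezone-aware timestamps.
    """
    counts = Counter()
    for ts, key in events:
        epoch = int(ts.timestamp())
        # Each event belongs to exactly one window: round its time down
        # to the nearest multiple of the window length.
        window_start = epoch - (epoch % window_seconds)
        counts[(window_start, key)] += 1
    return dict(counts)


if __name__ == "__main__":
    utc = timezone.utc
    sample = [
        (datetime(2024, 10, 18, 13, 0, 1, tzinfo=utc), "sensor-a"),
        (datetime(2024, 10, 18, 13, 0, 9, tzinfo=utc), "sensor-a"),
        (datetime(2024, 10, 18, 13, 0, 11, tzinfo=utc), "sensor-a"),
    ]
    for (start, key), n in sorted(tumbling_window_counts(sample, 10).items()):
        print(datetime.fromtimestamp(start, tz=utc).isoformat(), key, n)
```

With a 10-second window, the first two sample events fall into one window and the third into the next.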
    2. Apache Spark Streaming in Synapse

    Alternatively, you can use Apache Spark in Synapse for real-time streaming data processing from Event Hubs. Spark Structured Streaming lets you continuously receive data from Event Hubs and join it with other sources, such as Cosmos DB.

    • Steps to set up:
      1. Use a Spark pool in Synapse to connect to Event Hub.
      2. Write a Spark Streaming job using Spark APIs to consume data from Event Hub and perform transformations or joins with other data sources like Cosmos DB.
      3. The output can be written back to Cosmos DB, ADX, or other supported Synapse destinations.
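The Spark steps above could look roughly like the following PySpark sketch. It makes several assumptions that are not in the answer: it reads through the Kafka-compatible endpoint of Event Hubs (one option among several, and it requires a tier that exposes that endpoint), the JSON fields `deviceId` and `temperature` are made up, `reference_df` is a static DataFrame you have already loaded (for example from Cosmos DB), and the output is written as Parquet files. The function names are hypothetical.

```python
def event_hubs_kafka_options(namespace, event_hub, connection_string):
    """Build Spark Kafka-source options for the Event Hubs Kafka endpoint."""
    jaas = (
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
        f'username="$ConnectionString" password="{connection_string}";'
    )
    return {
        "kafka.bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "subscribe": event_hub,  # the event hub name acts as the Kafka topic
        "kafka.security.protocol": "SASL_SSL",
        "kafka.sasl.mechanism": "PLAIN",
        "kafka.sasl.jaas.config": jaas,
        "startingOffsets": "latest",
    }


def start_enrichment_stream(spark, options, reference_df, output_path, checkpoint_path):
    """Read events, join them with static reference data, and write the result.

    Run this in a Synapse notebook where `spark` is the active SparkSession.
    """
    # Imported here so the options helper above can be used without pyspark installed.
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import DoubleType, StringType, StructField, StructType

    # Assumed payload shape; replace with the schema of your own events.
    schema = StructType([
        StructField("deviceId", StringType()),
        StructField("temperature", DoubleType()),
    ])

    events = (
        spark.readStream.format("kafka").options(**options).load()
        .select(from_json(col("value").cast("string"), schema).alias("e"))
        .select("e.*")
    )

    # Stream-static join: each micro-batch is joined against the reference data.
    enriched = events.join(reference_df, on="deviceId", how="left")

    return (
        enriched.writeStream.format("parquet")
        .option("checkpointLocation", checkpoint_path)
        .outputMode("append")
        .start(output_path)
    )
```

Keeping the connection string in a secret store rather than in notebook code is advisable; the sketch takes it as a plain argument only for brevity.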

    Why can't you find Event Hub in Synapse Pipelines/Data Flows?

    Currently, Synapse Pipelines and Data Flows do not natively support Event Hub as a source because they are designed primarily for batch data processing. While Synapse supports orchestration of streaming processes (for example, triggering Stream Analytics jobs), pipelines and data flows are not real-time data processing engines themselves.
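As an illustration of the orchestration point, a pipeline Web activity could start an existing Stream Analytics job by calling the Azure Resource Manager REST API. The sketch below only builds that request; the helper name is hypothetical, and the API version and request body are assumptions to verify against the current Stream Analytics REST reference before use.

```python
import json

ARM_ENDPOINT = "https://management.azure.com"
API_VERSION = "2020-03-01"  # assumed; check the REST reference for the current version


def build_start_job_request(subscription_id, resource_group, job_name):
    """Return the method, URL, and JSON body for starting a Stream Analytics job."""
    url = (
        f"{ARM_ENDPOINT}/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.StreamAnalytics/streamingjobs/{job_name}"
        f"/start?api-version={API_VERSION}"
    )
    # Start producing output from the moment the job starts.
    body = json.dumps({"outputStartMode": "JobStartTime"})
    return {"method": "POST", "url": url, "body": body}


if __name__ == "__main__":
    request = build_start_job_request("<subscription-id>", "<resource-group>", "<job-name>")
    print(request["method"], request["url"])
```

The resulting method, URL, and body map onto the corresponding fields of a Web activity, which also needs an identity that is allowed to start the job.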

