Copy from Azure Blob Storage to Lakehouse

In this tutorial, you build a data pipeline to move a CSV file from an input folder of an Azure Blob Storage source to a Lakehouse destination.

Prerequisites

To get started, you must complete the following prerequisites:

  • Make sure you have a Microsoft Fabric enabled workspace: Create a workspace.

  • Select the Try it now! button below to prepare the Azure Blob Storage data source for the copy. Create a new resource group for the Azure Blob Storage account and select Review + Create > Create.

    Try your first data factory demo

    Screenshot of Project details screen.

    An Azure Blob Storage account is then created, and moviesDB2.csv is uploaded to the input folder of the new storage account. (If you'd rather stage the sample file yourself, see the sketch after this list.)

    Screenshot showing where new storage appears in folder.
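
If you prefer to stage the sample file programmatically rather than through the template, the following minimal sketch uploads it with the azure-storage-blob Python package. The container name input and the AZURE_STORAGE_CONNECTION_STRING environment variable are assumptions for illustration; only the file name moviesDB2.csv comes from this tutorial.

    import os
    from azure.storage.blob import BlobServiceClient

    # Connection string for the storage account (assumed to be set in the environment).
    service = BlobServiceClient.from_connection_string(
        os.environ["AZURE_STORAGE_CONNECTION_STRING"]
    )

    # The tutorial expects the sample CSV in a container named "input" (assumption).
    container = service.get_container_client("input")
    if not container.exists():
        container.create_container()

    # Upload the sample file used throughout this tutorial.
    with open("moviesDB2.csv", "rb") as data:
        container.upload_blob(name="moviesDB2.csv", data=data, overwrite=True)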

Create a data pipeline

  1. Switch to the Data factory experience on the app.powerbi.com page.

  2. Create a new workspace for this demo.

    Screenshot of Workspace screen.

  3. Select New, and then select Data Pipeline.

    Screenshot of the New menu.

Copy data using the Copy Assistant

In this section, you start building a data pipeline with the following steps, which copy a CSV file from an input folder in Azure Blob Storage to a Lakehouse destination using the copy assistant.

Step 1: Start with copy assistant

  1. Select Copy data assistant on the canvas to open the copy assistant tool and get started. Alternatively, select Use copy assistant from the Copy data drop-down list on the Activities tab of the ribbon.

    Screenshot of two options to select copy assistant.

Step 2: Configure your source

  1. Type blob in the selection filter, then select Azure Blobs, and select Next.

    Screenshot showing where to choose Azure Blob Storage as data source.

  2. Provide your account name or URL, and create a connection to your data source by selecting Create new connection under the Connection drop-down list.

    Screenshot showing where to select New connection.

    1. After you select Create new connection and specify your storage account, you only need to fill in the Authentication kind. This demo uses Account key, but you can choose another authentication kind depending on your preference. (A quick way to verify these credentials in code is sketched after these steps.)

      Screenshot showing the Connect to data source screen of the copy data assistant.

    2. Once your connection is created successfully, you only need to select Next on the Connect to data source page.

  3. Choose the file moviesDB2.csv in the source configuration to preview it, and then select Next.

    Screenshot showing how to choose data source.
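
To sanity-check the account name and account key you entered in the connection dialog, you can list the contents of the input folder with the same credentials. This is a minimal sketch using the azure-storage-blob Python package; the account URL, account key, and container name are placeholders or assumptions for illustration.

    from azure.storage.blob import BlobServiceClient

    # Placeholders: use the same account and key you entered in the connection dialog.
    account_url = "https://<your-account-name>.blob.core.windows.net"
    account_key = "<your-account-key>"

    service = BlobServiceClient(account_url=account_url, credential=account_key)
    container = service.get_container_client("input")  # container name is an assumption

    # moviesDB2.csv should appear in the listing if the connection details are correct.
    for blob in container.list_blobs():
        print(blob.name)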

Step 3: Configure your destination

  1. Select Lakehouse.

    Screenshot showing the Choose data destination dialog with Lakehouse selected.

  2. Provide a name for the new Lakehouse. Then select Create and connect.

    Screenshot showing the Choose data destination dialog with the new lakehouse option selected.

  3. Configure and map your source data to your destination, and then select Next to finish your destination configuration. (If you want to inspect the source schema before mapping, see the sketch after this step.)

    Screenshot showing the Connect to data destination dialog in the copy data assistant with the table name MoviesDB filled in.
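
When deciding how to map source columns to the destination table, it can help to inspect the CSV's inferred schema first. Here's a minimal sketch with pandas, assuming you have a local copy of moviesDB2.csv:

    import pandas as pd

    # Load a local copy of the sample file (assumption: downloaded beside this script).
    df = pd.read_csv("moviesDB2.csv")

    # Inferred column names and types; compare these against the mapping in the assistant.
    print(df.dtypes)
    print(df.head())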

Step 4: Review and create your copy activity

  1. Review your copy activity settings from the previous steps, and select Save + run to finish. Or, you can go back to previous steps in the tool to edit your settings if needed.

    Screenshot showing the Review + create screen in the Copy data assistant dialog.

  2. Once finished, the copy activity is added to your data pipeline canvas and runs immediately if you left the Start data transfer immediately checkbox selected. (A run can also be started and monitored programmatically, as sketched below.)

    Screenshot showing the finished Copy activity.
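
Beyond the UI, you can also start and monitor a pipeline run through the Microsoft Fabric REST API's on-demand job endpoint. Treat the sketch below as illustrative rather than definitive: the endpoint shape, the placeholder GUIDs, the bearer token, and the status values being polled are all assumptions, not taken from this tutorial.

    import time

    import requests

    # Placeholders: find these GUIDs in your workspace and pipeline URLs.
    workspace_id = "<workspace-guid>"
    pipeline_id = "<pipeline-item-guid>"
    token = "<entra-bearer-token>"  # token with Fabric API scope (assumption)
    headers = {"Authorization": f"Bearer {token}"}

    # Start an on-demand run of the pipeline item (assumed endpoint shape).
    run_url = (
        f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}"
        f"/items/{pipeline_id}/jobs/instances?jobType=Pipeline"
    )
    resp = requests.post(run_url, headers=headers)
    resp.raise_for_status()

    # A successful request is expected to return 202 Accepted with a
    # Location header pointing at the job instance to poll (assumption).
    status_url = resp.headers["Location"]
    while True:
        state = requests.get(status_url, headers=headers).json()
        print(state.get("status"))
        # Assumed non-terminal statuses; adjust to what the API actually returns.
        if state.get("status") not in ("NotStarted", "InProgress"):
            break
        time.sleep(15)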

Run and schedule your data pipeline

  1. If you didn't leave the Start data transfer immediately checkbox selected on the Review + create page, switch to the Home tab and select Run. Then select Save and run in the dialog that appears.

    Screenshot showing the Copy activity's Run button on the Home tab.

    Screenshot showing the Save and run dialog for the Copy activity.

  2. On the Output tab, select the link with the name of your Copy activity to monitor progress and check the results of the run.

    Screenshot showing the run Details button.

  3. The Copy data details dialog displays results of the run including status, volume of data read and written, start and stop times, and duration.

    Screenshot showing the Copy data details dialog.

  4. You can also schedule the pipeline to run at a specific frequency, as required. The following example shows how to schedule the pipeline to run every 15 minutes. (Once any run completes, you can verify the copied data from a notebook, as sketched after these steps.)

    Screenshot showing the schedule configuration dialog.

    Screenshot showing a pipeline with a configured schedule to run every 15 minutes.
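
After a successful run, whether on demand or scheduled, you can confirm the copied data from a Fabric notebook attached to the destination Lakehouse. A minimal sketch: the table name MoviesDB matches the destination configured earlier, and spark is the SparkSession that Fabric notebooks provide by default.

    # In a Fabric notebook attached to the destination Lakehouse, `spark` is predefined.
    df = spark.read.table("MoviesDB")  # the table name configured in the destination step

    # The row count and a small sample should match the source moviesDB2.csv.
    print(df.count())
    df.show(5)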

The pipeline in this sample showed you how to copy data from Azure Blob Storage to a Lakehouse. You learned how to:

  • Create a data pipeline.
  • Copy data with the Copy Assistant.
  • Run and schedule your data pipeline.

Next, advance to learn more about monitoring your pipeline runs.