Connect to data sources from Azure Databricks
This article links to the Azure data sources that you can connect to Azure Databricks. Follow the examples in these links to extract data from Azure data sources (for example, Azure Blob Storage and Azure Event Hubs) into an Azure Databricks cluster and run analytical jobs on that data.
Prerequisites
- You must have an Azure Databricks workspace and a Spark cluster. Follow the instructions at Get started.
Data sources for Azure Databricks
The following list provides the data sources in Azure that you can use with Azure Databricks. For a complete list of data sources that can be used with Azure Databricks, see Data sources for Azure Databricks.
- SQL databases using JDBC
This link describes the DataFrame API for connecting to SQL databases using JDBC and explains how to control the parallelism of reads through the JDBC interface. It provides detailed examples using the Scala API, with abbreviated Python and Spark SQL examples at the end.
- Azure Data Lake Storage
This link provides examples of how to use a Microsoft Entra ID (formerly Azure Active Directory) service principal to authenticate with Azure Data Lake Storage. It also provides instructions on how to access the data in Azure Data Lake Storage from Azure Databricks.
- Azure Blob Storage
This link provides examples of how to access Azure Blob Storage directly from Azure Databricks using an access key or a shared access signature (SAS) for a given container. It also explains how to access Azure Blob Storage from Azure Databricks using the RDD API.
- Azure Event Hubs
This link provides instructions on how to use the Kafka connector from Azure Databricks to access data in Azure Event Hubs.
- Azure Synapse
This link provides instructions on how to query data in Azure Synapse.
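As a rough illustration of two of the connection patterns listed above (parallel JDBC reads and the Kafka connector for Azure Event Hubs), the following Python sketch builds the option dictionaries that a Spark reader would receive. It is not taken from the linked articles. The helper names `jdbc_parallel_read_options` and `event_hubs_kafka_options`, and all host, table, namespace, and connection-string values, are hypothetical. The `kafkashaded.` class prefix assumes the Kafka client that is bundled with the Databricks runtime. The helpers are plain Python, so they can be checked without a cluster.

```python
from typing import Dict


def jdbc_parallel_read_options(
    url: str,
    table: str,
    partition_column: str,
    lower_bound: int,
    upper_bound: int,
    num_partitions: int,
) -> Dict[str, str]:
    """Build Spark JDBC reader options that split one table read into parallel queries.

    Hypothetical helper. The option keys are the standard Spark JDBC options.
    lowerBound and upperBound only set the partition stride; they do not filter rows.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if lower_bound >= upper_bound:
        raise ValueError("lower_bound must be smaller than upper_bound")
    return {
        "url": url,
        "dbtable": table,
        "partitionColumn": partition_column,  # numeric, date, or timestamp column
        "lowerBound": str(lower_bound),
        "upperBound": str(upper_bound),
        "numPartitions": str(num_partitions),
    }


def event_hubs_kafka_options(
    namespace: str, event_hub: str, connection_string: str
) -> Dict[str, str]:
    """Build Kafka source options for the Kafka-compatible endpoint of Azure Event Hubs.

    Hypothetical helper. Assumes the shaded Kafka client class name used on Databricks.
    """
    jaas = (
        "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required "
        f'username="$ConnectionString" password="{connection_string}";'
    )
    return {
        "kafka.bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "subscribe": event_hub,  # the event hub name plays the role of the Kafka topic
        "kafka.security.protocol": "SASL_SSL",
        "kafka.sasl.mechanism": "PLAIN",
        "kafka.sasl.jaas.config": jaas,
    }


# Usage in a notebook, where `spark` is the session that Databricks provides:
#   df = spark.read.format("jdbc").options(**jdbc_opts).load()
#   stream = spark.readStream.format("kafka").options(**kafka_opts).load()
```

In practice, the connection string and any database credentials would come from a secret store rather than from literals in the notebook. The linked articles remain the authoritative source for the supported options.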
Next steps
To learn about the sources from which you can import data into Azure Databricks, see Data sources for Azure Databricks.