Read data from Databricks Delta Sharing to Azure Synapse

Cheng, Jack 0 Reputation points
2024-12-18T15:05:16.13+00:00

Hi,

I am trying to read data from Databricks Delta Sharing and write it to Azure Synapse, following this documentation: https://docs.databricks.com/en/connect/external-systems/synapse-analytics.html#language-python

I wrote the following Python script:

        data_frame.write \
            .format("com.databricks.spark.sqldw") \
            .option("url", config['synapse']['connection_string']) \
            .option("forwardSparkAzureStorageCredentials", "true") \
            .option("dbTable", f"{schema_name}.{table.name}") \
            .option("tempDir", temp_dir) \
            .save()

But it fails with the following error:

org.apache.spark.SparkClassNotFoundException: [DATA_SOURCE_NOT_FOUND] Failed to find the data source: com.databricks.spark.sqldw. Please find packages at https://spark.apache.org/third-party-projects.html.

How can I resolve this error?

Thanks.


1 answer

  1. Ganesh Gurram 2,305 Reputation points Microsoft Vendor
    2024-12-18T17:10:25.6633333+00:00

    Hello @Cheng, Jack

    Thanks for the question and for using the Microsoft Q&A platform.

    The error you are encountering, org.apache.spark.SparkClassNotFoundException: [DATA_SOURCE_NOT_FOUND] Failed to find the data source: com.databricks.spark.sqldw, indicates that the Azure Synapse (formerly SQL Data Warehouse, SQL DW) connector is not on the classpath of the Spark environment running your script. This can happen if the required library is not included in your Databricks cluster.

    To resolve this error, you should ensure that the appropriate library for the SQL DW connector is installed on your Databricks cluster. You can do this by following these steps:

    1. Go to your Databricks workspace.
    2. Navigate to the "Clusters" section.
    3. Select your cluster and click on the "Libraries" tab.
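    4. Install the com.databricks:spark-sqldw_2.11:<version> library, replacing <version> with the specific version compatible with your Spark version. The _2.11 suffix denotes the Scala version the library was built for, so it also has to match the Scala version of your cluster.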

    Once the library is installed, try running your script again.
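
    Before rerunning, it may also help to confirm that the script is really executing on a Databricks cluster, because the data source name is resolved from the classpath of whichever Spark environment runs the write. The following is only a minimal sketch, assuming the DATABRICKS_RUNTIME_VERSION environment variable is set on Databricks clusters and absent elsewhere. The helper names running_on_databricks, synapse_write_options and write_to_synapse are hypothetical; the option keys are the ones from the script in the question.

```python
import os


def running_on_databricks(env=None):
    """Return True if the Databricks Runtime marker variable is present."""
    env = os.environ if env is None else env
    # Assumption: Databricks clusters export DATABRICKS_RUNTIME_VERSION.
    return bool(env.get("DATABRICKS_RUNTIME_VERSION"))


def synapse_write_options(url, db_table, temp_dir):
    """Collect the writer options used in the question into one dict."""
    return {
        "url": url,
        "forwardSparkAzureStorageCredentials": "true",
        "dbTable": db_table,
        "tempDir": temp_dir,
    }


def write_to_synapse(data_frame, url, db_table, temp_dir, env=None):
    """Write through the Synapse connector, failing early off-cluster."""
    if not running_on_databricks(env):
        raise RuntimeError(
            "com.databricks.spark.sqldw was requested, but this does not "
            "look like a Databricks cluster"
        )
    (
        data_frame.write
        .format("com.databricks.spark.sqldw")
        .options(**synapse_write_options(url, db_table, temp_dir))
        .save()
    )
```

    With these helpers, the call from the question would become write_to_synapse(data_frame, config['synapse']['connection_string'], f"{schema_name}.{table.name}", temp_dir). If it raises RuntimeError, the script is probably running somewhere other than the Databricks cluster, which would also explain the DATA_SOURCE_NOT_FOUND error.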

    Similar thread for reference: Error Connecting to Delta Share

    Hope this helps. Let us know if you have any further questions.



