Connecting Azure SQL Database through Spark

Sachin Nandanwar 45 Reputation points
2024-12-26T05:09:09.8+00:00

I am trying to connect to Azure SQL through a Spark session using spark-mssql-connector but the code keeps erroring out with the message: Could not find driver

This is the code :

var jdbc_df = spark.Read().Format("com.microsoft.sqlserver.jdbc.spark")

          .Option("url", servername)

          .Option("dbtable", "dbo.Customers")

         .Option("user", "username") 

          .Option("password", "password")

          .Option("driver","com.microsoft.sqlserver.jdbc.spark").Load();  

Spark Version : version 3.0.1
Scala Version : version 2.12.10
Java Version : version "1.8.0_431"

I am using "spark-mssql-connector_2.12-1.1.0.jar" version as stated here

https://zcusa.951200.xyz/en-us/sql/connect/spark/connector?view=sql-server-ver16

Azure SQL Database
{count} votes

Accepted answer
  1. Vijayalaxmi Kattimani 905 Reputation points Microsoft Vendor
    2024-12-27T02:34:06.14+00:00

    Hi @Sachin Nandanwar,

    We would like to inform you that, the error Could not find driver typically indicates that the Spark session is unable to find the JDBC driver for Azure SQL. In order to connect to Azure SQL using Spark, you need to make sure that the JDBC driver is installed and configured correctly.

    Please try to troubleshoot the issue using below mentioned steps:

    Make sure that your Spark session includes the SQL server JARs and the sqljdbc42.jar is available as it’s required for the spark-mssql-connector

    SparkSession spark = SparkSession.Builder()
        .AppName("Spark SQL Example")
        .Config("spark.jars", "/path/to/spark-mssql-connector_2.12-1.1.0.jar")
        .GetOrCreate();
    

    Check that the JAR file is placed in a directory accessible to your Spark runtime, and it's added to the classpath during Spark session initialization. If not already done, you can pass the JAR using the --jars option when running the Spark job and programmatically set the JARs in your C# application.

    spark-submit --jars /path/to/spark-mssql-connector_2.12-1.1.0.jar your_application.py
    

    The driver property is set to com.microsoft.sqlserver.jdbc.SQLServerDriver instead of com.microsoft.sqlserver.jdbc.spark.

    var jdbc_df = spark.Read().Format("com.microsoft.sqlserver.jdbc.spark")
         .Option("url", servername)
         .Option("dbtable", "dbo.Customers")
         .Option("user", "username")
         .Option("password", "password")
         .Option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
         .Load();
    

    Double-check servername and the url should follow the below mentioned format

    jdbc:sqlserver://<your-server-name>.database.windows.net:1433;database=<your-database-name>
    

    Here is an example of how to specify the driver location and configuration in C#

    var jdbc_df = spark.Read().Format("com.microsoft.sqlserver.jdbc.spark")
         .Option("url", "jdbc:sqlserver://myserver.database.windows.net:1433;databaseName=mydatabase")
         .Option("dbtable", "dbo.Customers")
         .Option("user", "myusername")
         .Option("password", "mypassword")
         .Option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
         .Load();
    

    Please refer to the below mentioned links for more information.

    https://github.com/microsoft/sql-spark-connector/issues/26#issuecomment-672006339

    https://central.sonatype.com/search?q=spark-mssql-connector&smo=true

    https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html

    https://zcusa.951200.xyz/en-us/sql/connect/jdbc/download-microsoft-jdbc-driver-for-sql-server?view=sql-server-ver16

    I hope, This response will address your query and helped you to overcome on your challenges.

    If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.

    1 person found this answer helpful.

0 additional answers

Sort by: Most helpful

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.