Databricks JDBC Driver (OSS)

Important

This driver is in Public Preview and is not yet available as open source.

Databricks provides an open source software (OSS) JDBC driver that enables you to connect tools such as DataGrip, DBeaver, and SQL Workbench/J to Azure Databricks through Java Database Connectivity (JDBC), an industry-standard specification for accessing database management systems.

This driver has implemented the JDBC APIs and provides core functionality including OAuth, Cloud Fetch, and features such as Unity Catalog volume ingestion. It runs native query mode and supports native parameterized query, and can run using Statement Execution APIs, which provides the beneficial query result retention feature, or Thrift.

This article provides information on installing and using the Databricks JDBC Driver (OSS). For information about the non-OSS Databricks JDBC Driver see Databricks JDBC Driver.

Requirements

To use the Databricks JDBC Driver (OSS), the following requirements must be met:

  • Java Runtime Environment (JRE) 11.0 or above. CI testing is supported on JRE 11, 17, and 21.

Note

As a result of a change in JDK 16 that caused a compatibility issue with the Apache Arrow library used by the JDBC driver, runtime errors may occur when using the JDBC driver with JDK 16 or or above. To prevent these errors, restart your application or driver using the following JVM command option:

--add-opens=java.base/java.nio=org.apache.arrow.memory.core ALL-UNNAMED

Install the driver

The Databricks JDBC Driver (OSS) is published in the Maven Repository. The latest version is 0.9.6-oss.

To install the driver, you can do any of the following:

  • For Maven projects, add the following dependency to the project’s pom.xml file to instruct Maven to automatically download the JDBC driver with the specified version:

    <dependency>
      <groupId>com.databricks</groupId>
      <artifactId>databricks-jdbc</artifactId>
      <version>0.9.6-oss</version>
      <scope>runtime</scope>
    </dependency>
    
  • For Gradle projects, add the following dependency to the project’s build file to instruct Gradle to automatically download the JDBC driver with the specified version:

    implementation 'com.databricks:databricks-jdbc:0.9.6-oss'
    

To view the dependency syntax for other project types, and to get the latest version number of the Databricks JDBC Driver (OSS), see the Maven Repository.

Configure the connection URL

To connect to your Azure Databricks workspace using the JDBC driver, you need to specify a JDBC connection URL that includes various connection settings such as your Azure Databricks workspace’s server hostname, the compute resource settings, and authentication credentials to connect to the workspace.

You can set the value of these properties on the JDBC connection URL, set and pass them to the DriverManager.getConnection method, or a combination of both. See the provider’s documentation for how best to connect using your specific app, client, SDK, API, or SQL tool.

The JDBC connection URL must be in the following format. Properties are case insensitive.

jdbc:databricks://<server-hostname>:<port>/<schema>;[property1]=[value];[property2]=[value];...

Alternatively, specify the settings using the java.util.Properties class or a combination:

String url = "jdbc:databricks://<server-hostname>:<port>/<schema>";
Properties properties = new java.util.Properties();
properties.put("<property1>", "<value1");
properties.put("<property2>", "<value2");
// ...
Connection conn = DriverManager.getConnection(url, properties);
String url = "jdbc:databricks://<server-hostname>:<port>/<schema>;[property1]=[value];[property2]=[value];";
Connection conn = DriverManager.getConnection(url, "token", "12345678901234667890abcdabcd");

Connection URL elements are described in the following table. For information about additional properties, including authentication properties, see the sections below. URL elements and properties are case insensitive.

URL element or property Description
<server-hostname> The Azure Databricks compute resource’s server hostname value.
<port> The Azure Databricks compute resource’s port value. The default value is 443.
<schema> The name of the schema. Alternatively you can set the ConnSchema property. See Connection properties.
httpPath The Azure Databricks compute resource’s HTTP path value. The connector forms the HTTP address to connect to by appending the httpPath value to the host and port specified in the connection URL. For example, to connect to the HTTP address http://localhost:10002/cliservice, you would use the following connection URL: jdbc:databricks://localhost:10002;httpPath=cliservice

To get the JDBC connection URL for an Azure Databricks cluster:

  1. Log in to your Azure Databricks workspace.
  2. In the sidebar, click Compute, then click the target cluster’s name.
  3. On the Configuration tab, expand Advanced options.
  4. Click the JDBC/ODBC tab.
  5. Copy the JDBC URL to use as the JDBC connection URL, or construct the URL from values in the Server hostname, Port, and HTTP Path fields.

To get the JDBC connection URL for a Databricks SQL warehouse:

  1. Log in to your Azure Databricks workspace.
  2. In the sidebar, click SQL Warehouses, then click the target warehouse’s name.
  3. Click the Connection details tab.
  4. Copy the JDBC URL to use as the JDBC connection URL, or construct the URL from values in the Server hostname, Port, and HTTP Path fields.

Authenticate the driver

You can authenticate the JDBC driver connection using one of the following authentication mechanisms:

OAuth user-to-machine (U2M) authentication

The JDBC driver supports OAuth user-to-machine (U2M) authentication for real-time human sign in and consent to authenticate the target Databricks user account. This is also known as browser-based OAuth authentication.

Azure Databricks has created the OAuth client ID databricks-sql-jdbc for customers. This is also the default OAuth client ID used in the JDBC driver. To configure OAuth U2M authentication, just add the following properties to your existing JDBC connection URL or java.util.Properties object:

Property Value
AuthMech 11
Auth_Flow 2

OAuth machine-to-machine (M2M) authentication

The JDBC driver supports OAuth machine-to-machine (M2M) authentication using an Azure Databricks service principal. This is also known as OAuth 2.0 client credentials authentication. See Authenticate access to Azure Databricks with a service principal using OAuth (OAuth M2M).

To configure OAuth M2M or OAuth 2.0 client credentials authentication:

  1. Create a Microsoft Entra ID managed service principal and then assign it to Azure Databricks accounts and workspaces. For details, see Manage service principals.

    Important

    The Databricks JDBC Driver (OSS) supports Azure Databricks OAuth secrets for OAuth M2M or OAuth 2.0 client credentials authentication. Microsoft Entra ID secrets are not supported.

  2. Create an Azure Databricks OAuth secret for the service principal. To do this, see Manually generate and use access tokens for OAuth M2M authentication.

  3. Give the service principal access to your cluster or warehouse. See Compute permissions or Manage a SQL warehouse.

Add the following properties to your existing JDBC connection URL or java.util.Properties object:

Property Value
AuthMech 11
Auth_Flow 1
OAuth2ClientID The service principal’s Application (client) ID value.
OAuth2Secret The service principal’s Azure Databricks OAuth secret. (Microsoft Entra ID secrets are not supported for OAuth M2M or OAuth 2.0 client credentials authentication.)

Azure Databricks personal access token

To authenticate your JDBC driver connection using a Azure Databricks personal access token, add the following properties to your JDBC connection URL or java.util.Properties object:

Property Value
AuthMech 3
user The value token, as a string.
PWD or password Your Azure Databricks personal access token value, as a string.

Connection properties

The following additional connection properties are supported by the JDBC driver. Properties are case insensitive.

Property Default value Description
AuthMech Required The authentication mechanism, where 3 specifies the mechanism is a Azure Databricks personal access token, and 11 specifies the mechanism is OAuth 2.0 tokens. Additional properties are required for each mechanism. See Authenticate the driver.
Auth_Flow 0 The OAuth2 authentication flow for the driver connection. This property is required if AuthMech is 11.
SSL 1 Whether the connector communicates with the Spark server through an SSL-enabled socket.
ConnCatalog or catalog SPARK The name of the default catalog to use.
ConnSchema or schema default The name of the default schema to use. This can be specified either by replacing <schema> in the URL with the name of the schema to use or by setting the ConnSchema property to the name of the schema to use.
ProxyAuth 0 If set to 1, the driver uses the proxy authentication user and password, represented by ProxyUID and ProxyPwd.
ProxyHost null A string that represents the name of the proxy host to use when UseProxy is also set to 1.
ProxyPort null An integer that represents the number of the proxy port to use when UseProxy is also set to 1.
ProxyUID null A string that represents the username to use for proxy authentication when ProxyAuth and UseProxy are also set to 1.
ProxyPwd null A string that represents the password to use for proxy authentication when ProxyAuth and UseProxy are also set to 1.
UseProxy 0 If set to 1, the driver uses the provided proxy settings (for example: ProxyAuth, ProxyHost, ProxyPort, ProxyPwd, and ProxyUID).
UseSystemProxy 0 If set to 1, the driver uses the proxy settings that have been set at the system level. If any additional proxy properties are set in the connection URL, these additional proxy properties override those that have been set at the system level.
UseCFProxy 0 If set to 1, the driver uses the cloud fetch proxy settings if they are provided, otherwise use the regular proxy.
CFProxyAuth 0 If set to 1, the driver uses the proxy authentication user and password, represented by CFProxyUID and CFProxyPwd.
CFProxyHost null A string that represents the name of the proxy host to use when UseCFProxy is also set to 1.
CFProxyPort null An integer that represents the number of the proxy port to use when UseCFProxy is also set to 1.
CFProxyUID null A string that represents the username to use for proxy authentication when CFProxyAuth and UseCFProxy are also set to 1.
CFProxyPwd null A string that represents the password to use for proxy authentication when CFProxyAuth and UseCFProxy are also set to 1.
AsyncExecPollInterval 200 The time in milliseconds between each poll for the asynchronous query execution status. Asynchronous refers to the fact that the RPC call used to execute a query against Spark is asynchronous. It does not mean that JDBC asynchronous operations are supported.
UserAgentEntry browser The User-Agent entry to be included in the HTTP request. This value is in the following format: [ProductName]/[ProductVersion] [Comment]
UseThriftClient 0 Whether the JDBC driver should use the Thrift client to connect to an all-purpose cluster. The default value works for SQL warehouses.

SQL configuration properties

The following SQL configuration properties are supported by the JDBC driver. These are also described in Configuration parameters. Properties are case insensitive.

Property Default value Description
ansi_mode TRUE Whether to enable strict ANSI SQL behavior for certain functions and casting rules.
enable_photo TRUE Whether to enable the Photon vectorized query engine.
legacy_time_parser_policy EXCEPTION The methods used to parse and format dates and timestamps. Valid values are EXCEPTION, LEGACY, and CORRECTED.
max_file_partition_bytes 128m The maximum number of bytes to pack into a single partition when reading from file based sources. The setting can be any positive integer and optionally include a measure such as b (bytes), k or kb (1024 bytes).
read_only_external_metastore false Controls whether an external metastore is treated as read-only.
statement_timeout 172800 Sets a SQL statement timeout between 0 and 172800 seconds.
timezone UTC Set the local timezone. Region IDs in the form area/city, such as America/Los_Angeles or zone offsets in the format (+|-)HH, (+|-)HH:mm or (+|-)HH:mm:ss, e.g -08, +01:00 or -13:33:33. Also, UTC is supported as an alias for +00:00
use_cached_result true Whether Databricks SQL caches and reuses results whenever possible.

Logging properties

The following logging properties are supported by the JDBC driver. Properties are case insensitive.

Property Default value Description
LogLevel OFF The logging level, which is a value 0 through 6:

- 0: Disable all logging.
- 1: Enable logging on the FATAL level, which logs very severe error events that will lead the connector to abort.
- 2: Enable logging on the ERROR level, which logs error events that might still allow the connector to continue running.
- 3: Enable logging on the WARNING level, which logs events that might result in an error if action is not taken.
- 4: Enable logging on the INFO level, which logs general information that describes the progress of the connector.
- 5: Enable logging on the DEBUG level, which logs detailed information that is useful for debugging the connector.
- 6: Enable logging on the TRACE level, which logs all connector activity.

Use this property to enable or disable logging in the connector and to specify the amount of detail included in log files.
LogPath To determine the default path for logs, the driver uses the value set for these system properties, in this priority order:

1. user.dir
2. java.io.tmpdir
3. the current directory, in other words .
The full path to the folder where the connector saves log files when logging is enabled, as a string. To ensure that the connection URL is compatible with all JDBC applications, escape the backslashes (\) in your file path by typing another backslash.

If the LogPath value is invalid, the connector sends the logged information to the standard output stream (System.out).
LogFileSize No maximum The maximum allowed log file size, specified in MB
LogFileCount No maximum The maximum number of allowed log files

Enable and configure logging

The JDBC driver supports the Simple Logging Facade for Java (SLF4J) and java.util.logging (JUL) frameworks. The driver uses the JUL logging framework by default.

To enable and configure logging for the JDBC driver:

  1. Enable the logging framework that you want to use:

    • For SLF4J logging, set the system property -Dcom.databricks.jdbc.loggerImpl=SLF4JLOGGER and provide the SLF4J binding implementation (compatible with SLF4J version 2.0.13 and above) and corresponding configuration file in the classpath.
    • For JUL logging, set the system property -Dcom.databricks.jdbc.loggerImpl=JDKLOGGER. This is the default.
  2. Set the LogLevel property on the connection string to the desired level of information to include in log files.

  3. Set the LogPath property on the connection string to the full path to the folder where you want to save log files.

    For example, the following connection URL enables logging level 6 and saves the log files to the C:temp folder:

    jdbc: databricks://localhost:11000;LogLevel=6;LogPath=C:\\temp
    
  4. Restart your JDBC application and reconnect to the server to apply the settings.

Example: Run a query using the JDBC driver

The following example shows how to use the JDBC driver to run a Databricks SQL query using an Azure Databricks compute resource.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;
import java.util.Properties;

public class DatabricksJDBCExample {

    public static void main(String[] args) {

        Class.forName("com.databricks.client.jdbc.Driver");

        // Set JDBC URL properties
        String jdbcUrl = "jdbc:databricks://dbc-a1b2345c-d6e7.cloud.databricks.com:443";
        Properties connectionProperties = new Properties();
        connectionProperties.put("httpPath", "sql/protocolv1/o/123456780012345/0123-123450-z000pi22");
        connectionProperties.put("ssl", "1");

        // Set authentication properties (personal access token)
        connectionProperties.put("AuthMech", "3");
        connectionProperties.put("user", "token");
        connectionProperties.put("password", "12345678901234667890abcdabcd");

        // Set logging properties
        connectionProperties.put("logPath", "logs/myapplication.log");

        // Establish connection and execute query
        try (Connection connection = DriverManager.getConnection(jdbcUrl, connectionProperties);
             Statement statement = connection.createStatement();
             ResultSet resultSet = statement.executeQuery("SELECT * FROM samples.nyctaxi.trips")) {

            // Get metadata and column names
            ResultSetMetaData metaData = resultSet.getMetaData();
            String[] columns = new String[metaData.getColumnCount()];
            for (int i = 0; i < columns.length; i++) {
                columns[i] = metaData.getColumnName(i + 1);
            }

            // Process and print the result set
            while (resultSet.next()) {
                System.out.print("Row " + resultSet.getRow() + "=[");
                for (int i = 0; i < columns.length; i++) {
                    if (i != 0) {
                        System.out.print(", ");
                    }
                    System.out.print(columns[i] + "='" + resultSet.getObject(i + 1) + "'");
                }
                System.out.println("]");
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Additional resources