Databricks SQL Driver for Node.js

Article
11/16/2024

The Databricks SQL Driver for Node.js is a Node.js library that lets you use JavaScript code to run SQL commands on Azure Databricks compute resources.

Requirements

A development machine running Node.js version 14 or above. To print the installed version of Node.js, run the command node -v. To install and use different versions of Node.js, you can use tools such as Node Version Manager (nvm).
Node Package Manager (npm). Later versions of Node.js already include npm. To check whether npm is installed, run the command npm -v. To install npm if needed, you can follow instructions such as those in Download and install npm.
The @databricks/sql package from npm. To install the @databricks/sql package in your Node.js project as a dependency, use npm to run the following command from within the same directory as your project:
```
npm i @databricks/sql
```
If you want to install and use TypeScript in your Node.js project as devDependencies, use npm to run the following commands from within the same directory as your project:
```
npm i -D typescript
npm i -D @types/node
```
An existing cluster or SQL warehouse.
The Server Hostname and HTTP Path values for the existing cluster or SQL warehouse.
- Get these values for a cluster.
- Get these values for a SQL warehouse.
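
The Node.js version requirement can also be checked at startup. The following is a minimal sketch, not part of the driver: isSupportedNodeVersion and MIN_NODE_MAJOR are hypothetical names introduced here for illustration.

```javascript
// Minimal sketch: verify that the running Node.js meets the minimum major version.
// MIN_NODE_MAJOR mirrors the "version 14 or above" requirement in this article.
const MIN_NODE_MAJOR = 14;

// Hypothetical helper: accepts strings such as "v18.17.1" or "18.17.1".
function isSupportedNodeVersion(version, minMajor = MIN_NODE_MAJOR) {
  const match = /^v?(\d+)\./.exec(version);
  if (!match) {
    throw new Error(`Unrecognized Node.js version string: ${version}`);
  }
  return Number(match[1]) >= minMajor;
}

// Fail fast before any driver code runs.
if (!isSupportedNodeVersion(process.version)) {
  throw new Error(`Node.js ${MIN_NODE_MAJOR} or above is required; found ${process.version}.`);
}
```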

Authentication

The Databricks SQL Driver for Node.js supports the following Azure Databricks authentication types:

Databricks personal access token authentication
Microsoft Entra ID token authentication
OAuth machine-to-machine (M2M) authentication
OAuth user-to-machine (U2M) authentication

The Databricks SQL Driver for Node.js does not yet support the following Azure Databricks authentication types:

Note

As a security best practice, do not hard-code connection variable values in your code. Instead, retrieve these values from a secure location. For example, the code snippets and examples in this article use environment variables.
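
The snippets in this article each repeat the same check for missing environment variables. As a minimal sketch, that check could be factored into one helper that names every missing variable. The readRequiredEnv function is hypothetical and is not provided by the driver.

```javascript
// Hypothetical helper: returns the requested variables as an object,
// or throws an error that lists every variable that is missing or empty.
function readRequiredEnv(names, env = process.env) {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return Object.fromEntries(names.map((name) => [name, env[name]]));
}
```

For example, readRequiredEnv(['DATABRICKS_SERVER_HOSTNAME', 'DATABRICKS_HTTP_PATH', 'DATABRICKS_TOKEN']) would return the three values used for personal access token authentication, or fail with a single message that lists the variables still to be set.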

Databricks personal access token authentication

To use the Databricks SQL Driver for Node.js with personal access token authentication, you must first create an Azure Databricks personal access token. For details on this step, see Azure Databricks personal access tokens for workspace users.

To authenticate the Databricks SQL Driver for Node.js, use the following code snippet. This snippet assumes that you have set the following environment variables:

DATABRICKS_SERVER_HOSTNAME, set to the Server Hostname value for your cluster or SQL warehouse.
DATABRICKS_HTTP_PATH, set to the HTTP Path value for your cluster or SQL warehouse.
DATABRICKS_TOKEN, set to the Azure Databricks personal access token.

To set environment variables, see your operating system's documentation.

JavaScript

const { DBSQLClient } = require('@databricks/sql');

const serverHostname = process.env.DATABRICKS_SERVER_HOSTNAME;
const httpPath       = process.env.DATABRICKS_HTTP_PATH;
const token          = process.env.DATABRICKS_TOKEN;

if (!token || !serverHostname || !httpPath) {
    throw new Error("Cannot find Server Hostname, HTTP Path, or " +
                    "personal access token. " +
                    "Check the environment variables DATABRICKS_SERVER_HOSTNAME, " +
                    "DATABRICKS_HTTP_PATH, and DATABRICKS_TOKEN.");
}

const client = new DBSQLClient();
const connectOptions = {
  token: token,
  host:  serverHostname,
  path:  httpPath
};

client.connect(connectOptions)
// ...

TypeScript

import { DBSQLClient } from "@databricks/sql";

const serverHostname: string = process.env.DATABRICKS_SERVER_HOSTNAME || '';
const httpPath: string       = process.env.DATABRICKS_HTTP_PATH || '';
const token: string          = process.env.DATABRICKS_TOKEN || '';

if (token === '' || serverHostname === '' || httpPath === '') {
    throw new Error("Cannot find Server Hostname, HTTP Path, or personal access token. " +
                    "Check the environment variables DATABRICKS_SERVER_HOSTNAME, " +
                    "DATABRICKS_HTTP_PATH, and DATABRICKS_TOKEN.");
}

const client: DBSQLClient = new DBSQLClient();
const connectOptions = {
  token: token,
  host:  serverHostname,
  path:  httpPath
};

client.connect(connectOptions)
// ...

OAuth user-to-machine (U2M) authentication

Databricks SQL Driver for Node.js versions 1.8.0 and above support OAuth user-to-machine (U2M) authentication.

To authenticate the Databricks SQL Driver for Node.js with OAuth U2M authentication, use the following code snippet. This snippet assumes that you have set the following environment variables:

DATABRICKS_SERVER_HOSTNAME, set to the Server Hostname value for your cluster or SQL warehouse.
DATABRICKS_HTTP_PATH, set to the HTTP Path value for your cluster or SQL warehouse.

To set environment variables, see your operating system's documentation.

JavaScript

const { DBSQLClient } = require('@databricks/sql');

const serverHostname = process.env.DATABRICKS_SERVER_HOSTNAME;
const httpPath       = process.env.DATABRICKS_HTTP_PATH;

if (!serverHostname || !httpPath) {
    throw new Error("Cannot find Server Hostname or HTTP Path. " +
                    "Check the environment variables DATABRICKS_SERVER_HOSTNAME " +
                    "and DATABRICKS_HTTP_PATH.");
}

const client = new DBSQLClient();
const connectOptions = {
  authType:                  "databricks-oauth",
  useDatabricksOAuthInAzure: true,
  host:                      serverHostname,
  path:                      httpPath
};

client.connect(connectOptions)
// ...

TypeScript

import { DBSQLClient } from "@databricks/sql";

const serverHostname: string = process.env.DATABRICKS_SERVER_HOSTNAME || '';
const httpPath: string       = process.env.DATABRICKS_HTTP_PATH || '';

if (serverHostname === '' || httpPath === '') {
    throw new Error("Cannot find Server Hostname or HTTP Path. " +
                    "Check the environment variables DATABRICKS_SERVER_HOSTNAME " +
                    "and DATABRICKS_HTTP_PATH.");
}

const client: DBSQLClient = new DBSQLClient();
const connectOptions = {
  authType:                  "databricks-oauth",
  useDatabricksOAuthInAzure: true,
  host:                      serverHostname,
  path:                      httpPath
};

client.connect(connectOptions)
// ...

OAuth machine-to-machine (M2M) authentication

Databricks SQL Driver for Node.js versions 1.8.0 and above support OAuth machine-to-machine (M2M) authentication.

To use the Databricks SQL Driver for Node.js with OAuth M2M authentication, you must do the following:

Create an Azure Databricks service principal in your Azure Databricks workspace, and create an OAuth secret for that service principal.

To create the service principal and its OAuth secret, see Authenticate access to Azure Databricks with a service principal using OAuth (OAuth M2M). Make a note of the service principal's UUID or Application ID value, and of the Secret value for the service principal's OAuth secret.
Give the service principal access to your cluster or warehouse. See Compute permissions or Manage a SQL warehouse.

To authenticate the Databricks SQL Driver for Node.js, use the following code snippet. This snippet assumes that you have set the following environment variables:

DATABRICKS_SERVER_HOSTNAME, set to the Server Hostname value for your cluster or SQL warehouse.
DATABRICKS_HTTP_PATH, set to the HTTP Path value for your cluster or SQL warehouse.
DATABRICKS_CLIENT_ID, set to the service principal's UUID or Application ID value.
DATABRICKS_CLIENT_SECRET, set to the Secret value for the service principal's OAuth secret.

To set environment variables, see your operating system's documentation.

JavaScript

const { DBSQLClient } = require('@databricks/sql');

const serverHostname = process.env.DATABRICKS_SERVER_HOSTNAME;
const httpPath       = process.env.DATABRICKS_HTTP_PATH;
const clientId       = process.env.DATABRICKS_CLIENT_ID;
const clientSecret   = process.env.DATABRICKS_CLIENT_SECRET;

if (!serverHostname || !httpPath || !clientId || !clientSecret) {
    throw new Error("Cannot find Server Hostname, HTTP Path, or " +
                    "service principal ID or secret. " +
                    "Check the environment variables DATABRICKS_SERVER_HOSTNAME, " +
                    "DATABRICKS_HTTP_PATH, DATABRICKS_CLIENT_ID, and " +
                    "DATABRICKS_CLIENT_SECRET.");
}

const client = new DBSQLClient();
const connectOptions = {
  authType:                  "databricks-oauth",
  useDatabricksOAuthInAzure: true,
  host:                      serverHostname,
  path:                      httpPath,
  oauthClientId:             clientId,
  oauthClientSecret:         clientSecret
};

client.connect(connectOptions)
// ...

TypeScript

import { DBSQLClient } from "@databricks/sql";

const serverHostname: string = process.env.DATABRICKS_SERVER_HOSTNAME || '';
const httpPath: string       = process.env.DATABRICKS_HTTP_PATH || '';
const clientId: string       = process.env.DATABRICKS_CLIENT_ID || '';
const clientSecret: string   = process.env.DATABRICKS_CLIENT_SECRET || '';

if (serverHostname === '' || httpPath === '' || clientId === '' || clientSecret === '') {
    throw new Error("Cannot find Server Hostname, HTTP Path, or " +
                    "service principal ID or secret. " +
                    "Check the environment variables DATABRICKS_SERVER_HOSTNAME, " +
                    "DATABRICKS_HTTP_PATH, DATABRICKS_CLIENT_ID, and " +
                    "DATABRICKS_CLIENT_SECRET.");
}

const client: DBSQLClient = new DBSQLClient();
const connectOptions = {
  authType:                  "databricks-oauth",
  useDatabricksOAuthInAzure: true,
  host:                      serverHostname,
  path:                      httpPath,
  oauthClientId:             clientId,
  oauthClientSecret:         clientSecret
};

client.connect(connectOptions)
// ...

Microsoft Entra ID token authentication

To use the Databricks SQL Driver for Node.js with Microsoft Entra ID token authentication, you must provide the Databricks SQL Driver for Node.js with the Microsoft Entra ID token. To create a Microsoft Entra ID access token, do the following:

For an Azure Databricks user, you can use the Azure CLI. See Get Microsoft Entra ID tokens for users by using the Azure CLI.
For a Microsoft Entra ID service principal, see Get a Microsoft Entra ID access token with the Azure CLI. To create a Microsoft Entra ID managed service principal, see Manage service principals.

Microsoft Entra ID tokens have a default lifetime of about 1 hour. To create a new Microsoft Entra ID token, repeat this process.
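
Because the token is short-lived, a long-running process may want to know when to repeat this process. The following is a minimal sketch, assuming the token is a JWT whose payload carries a numeric exp claim. The secondsUntilExpiry and needsRefresh helpers and the five-minute margin are hypothetical and are not part of the driver. The sketch only decodes the payload and does not validate the token's signature.

```javascript
// Minimal sketch, assuming a JWT with a numeric `exp` claim (seconds since the Unix epoch).
// The signature is NOT verified; this only reads the expiry time.
function secondsUntilExpiry(jwt, nowMs = Date.now()) {
  const parts = jwt.split('.');
  if (parts.length !== 3) {
    throw new Error('Token is not in JWT format (expected three dot-separated parts).');
  }
  // Convert base64url to standard base64 before decoding the payload.
  const base64 = parts[1].replace(/-/g, '+').replace(/_/g, '/');
  const payload = JSON.parse(Buffer.from(base64, 'base64').toString('utf8'));
  if (typeof payload.exp !== 'number') {
    throw new Error('Token payload has no numeric "exp" claim.');
  }
  return payload.exp - Math.floor(nowMs / 1000);
}

// Hypothetical policy: treat the token as stale when less than five minutes remain.
function needsRefresh(jwt, nowMs = Date.now(), marginSeconds = 300) {
  return secondsUntilExpiry(jwt, nowMs) < marginSeconds;
}
```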

To authenticate the Databricks SQL Driver for Node.js, use the following code snippet. This snippet assumes that you have set the following environment variables:

DATABRICKS_SERVER_HOSTNAME, set to the Server Hostname value for your cluster or SQL warehouse.
DATABRICKS_HTTP_PATH, set to the HTTP Path value for your cluster or SQL warehouse.
DATABRICKS_TOKEN, set to the Microsoft Entra ID token.

To set environment variables, see your operating system's documentation.

JavaScript

const { DBSQLClient } = require('@databricks/sql');

const serverHostname = process.env.DATABRICKS_SERVER_HOSTNAME;
const httpPath       = process.env.DATABRICKS_HTTP_PATH;
const token          = process.env.DATABRICKS_TOKEN;

if (!token || !serverHostname || !httpPath) {
    throw new Error("Cannot find Server Hostname, HTTP Path, or " +
                    "Microsoft Entra ID token. " +
                    "Check the environment variables DATABRICKS_SERVER_HOSTNAME, " +
                    "DATABRICKS_HTTP_PATH, and DATABRICKS_TOKEN.");
}

const client = new DBSQLClient();
const connectOptions = {
  token: token,
  host:  serverHostname,
  path:  httpPath
};

client.connect(connectOptions)
// ...

TypeScript

import { DBSQLClient } from "@databricks/sql";

const serverHostname: string = process.env.DATABRICKS_SERVER_HOSTNAME || '';
const httpPath: string       = process.env.DATABRICKS_HTTP_PATH || '';
const token: string          = process.env.DATABRICKS_TOKEN || '';

if (token === '' || serverHostname === '' || httpPath === '') {
    throw new Error("Cannot find Server Hostname, HTTP Path, or " +
                    "Microsoft Entra ID token. " +
                    "Check the environment variables DATABRICKS_SERVER_HOSTNAME, " +
                    "DATABRICKS_HTTP_PATH, and DATABRICKS_TOKEN.");
}

const client: DBSQLClient = new DBSQLClient();
const connectOptions = {
  token: token,
  host:  serverHostname,
  path:  httpPath
};

client.connect(connectOptions)
// ...

Query data

The following code example demonstrates how to call the Databricks SQL Driver for Node.js to run a basic SQL query on an Azure Databricks compute resource. This command returns the first two rows from the trips table in the samples catalog's nyctaxi schema.

Note

The following code example demonstrates how to use an Azure Databricks personal access token for authentication. To use other available Azure Databricks authentication types instead, see Authentication.

This code example retrieves the token, server_hostname, and http_path connection variable values from a set of Azure Databricks environment variables. These environment variables have the following names:

DATABRICKS_TOKEN, which represents your Azure Databricks personal access token from the requirements.
DATABRICKS_SERVER_HOSTNAME, which represents the Server Hostname value from the requirements.
DATABRICKS_HTTP_PATH, which represents the HTTP Path value from the requirements.

You can use other approaches to retrieve these connection variable values. Using environment variables is just one approach among many.

The following code example demonstrates how to call the Databricks SQL Driver for Node.js to run a basic SQL command on a cluster or SQL warehouse. This command returns the first two rows from the trips table.

JavaScript

const { DBSQLClient } = require('@databricks/sql');

const token          = process.env.DATABRICKS_TOKEN;
const serverHostname = process.env.DATABRICKS_SERVER_HOSTNAME;
const httpPath       = process.env.DATABRICKS_HTTP_PATH;

if (!token || !serverHostname || !httpPath) {
  throw new Error("Cannot find Server Hostname, HTTP Path, or personal access token. " +
                  "Check the environment variables DATABRICKS_TOKEN, " +
                  "DATABRICKS_SERVER_HOSTNAME, and DATABRICKS_HTTP_PATH.");
}

const client = new DBSQLClient();
const connectOptions = {
  token: token,
  host: serverHostname,
  path: httpPath
};

client.connect(connectOptions)
  .then(async client => {
    const session = await client.openSession();
    const queryOperation = await session.executeStatement(
      'SELECT * FROM samples.nyctaxi.trips LIMIT 2',
      {
        runAsync: true,
        maxRows:  10000 // This option enables the direct results feature.
      }
    );

    const result = await queryOperation.fetchAll();

    await queryOperation.close();

    console.table(result);

    await session.close();
    await client.close();
  })
  .catch((error) => {
    console.error(error);
  });

TypeScript

import { DBSQLClient } from '@databricks/sql';
import IDBSQLSession from '@databricks/sql/dist/contracts/IDBSQLSession';
import IOperation from '@databricks/sql/dist/contracts/IOperation';

const serverHostname: string = process.env.DATABRICKS_SERVER_HOSTNAME || '';
const httpPath: string       = process.env.DATABRICKS_HTTP_PATH || '';
const token: string          = process.env.DATABRICKS_TOKEN || '';

if (serverHostname === '' || httpPath === '' || token === '') {
  throw new Error("Cannot find Server Hostname, HTTP Path, or personal access token. " +
                  "Check the environment variables DATABRICKS_SERVER_HOSTNAME, " +
                  "DATABRICKS_HTTP_PATH, and DATABRICKS_TOKEN.");
}

const client: DBSQLClient = new DBSQLClient();
const connectOptions = {
  host: serverHostname,
  path: httpPath,
  token: token
};

client.connect(connectOptions)
  .then(async client => {
    const session: IDBSQLSession = await client.openSession();

    const queryOperation: IOperation = await session.executeStatement(
      'SELECT * FROM samples.nyctaxi.trips LIMIT 2',
      {
        runAsync: true,
        maxRows: 10000 // This option enables the direct results feature.
      }
    );

    const result = await queryOperation.fetchAll();

    await queryOperation.close();

    console.table(result);

    await session.close();
    await client.close();
  })
  .catch((error) => {
    console.error(error);
  });

Output:

┌─────────┬─────┬────────┬───────────┬───────┬─────────┬────────┬───────┬───────┬────────┬────────┬────────┐
│ (index) │ _c0 │ carat  │    cut    │ color │ clarity │ depth  │ table │ price │   x    │   y    │   z    │
├─────────┼─────┼────────┼───────────┼───────┼─────────┼────────┼───────┼───────┼────────┼────────┼────────┤
│    0    │ '1' │ '0.23' │  'Ideal'  │  'E'  │  'SI2'  │ '61.5' │ '55'  │ '326' │ '3.95' │ '3.98' │ '2.43' │
│    1    │ '2' │ '0.21' │ 'Premium' │  'E'  │  'SI1'  │ '59.8' │ '61'  │ '326' │ '3.89' │ '3.84' │ '2.31' │
└─────────┴─────┴────────┴───────────┴───────┴─────────┴────────┴───────┴───────┴────────┴────────┴────────┘
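
The preceding samples close the operation, session, and client only when every step succeeds. As a minimal sketch of one way to release resources on failure as well, the calls can be wrapped in try/finally. The withSession and queryAll helpers are hypothetical names, and client is assumed to be an already connected DBSQLClient.

```javascript
// Hypothetical helper: opens a session, runs `work`, and always closes the session.
async function withSession(client, work) {
  const session = await client.openSession();
  try {
    return await work(session);
  } finally {
    await session.close();
  }
}

// Hypothetical helper: runs one statement and always closes the operation.
async function queryAll(client, statement) {
  return withSession(client, async (session) => {
    const operation = await session.executeStatement(statement, {
      runAsync: true,
      maxRows: 10000 // Same options as in the samples in this article.
    });
    try {
      return await operation.fetchAll();
    } finally {
      await operation.close();
    }
  });
}
```

With these helpers, a caller would only need to await queryAll(client, 'SELECT * FROM samples.nyctaxi.trips LIMIT 2') and then close the client.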

Sessions

All IDBSQLSession methods that return IOperation objects in the API reference have the following common parameters that affect their behavior:

Setting runAsync to true starts asynchronous mode. IDBSQLSession methods put operations into the queue and return as quickly as possible. The current state of the returned IOperation object may vary, and the client is responsible for checking its status before using the returned IOperation. See Operations. Setting runAsync to false means that IDBSQLSession methods wait for operations to complete. Databricks recommends always setting runAsync to true.
Setting maxRows to a non-null value enables direct results. With direct results, the server tries to wait for operations to complete and then fetches a portion of the data. Depending on how much work the server was able to complete within the defined time, IOperation objects return in some intermediate state instead of in a pending state. Very often, all of the metadata and query results are returned within a single request to the server. The server uses maxRows to determine how many records it can return immediately. However, the actual chunk may be of a different size; see IOperation.fetchChunk. Direct results are enabled by default. Databricks recommends against disabling direct results.

Operations

As described in Sessions, IOperation objects that are returned by IDBSQLSession session methods in the API reference are not fully populated. The related server operation might still be in progress, such as waiting for the Databricks SQL warehouse to start, running the query, or fetching the data. The IOperation class hides these details from users. For example, methods such as fetchAll, fetchChunk, and getSchema wait internally for operations to complete and then return results. You can use the IOperation.finished() method to explicitly wait for operations to complete. These methods take a callback that is called periodically while waiting for operations to complete. Setting the progress option to true attempts to request extra progress data from the server and pass it to that callback.
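
The explicit wait can be sketched as follows. This is an illustration rather than a definitive usage: waitWithProgress is a hypothetical helper, operation is assumed to be an IOperation returned by a session method, and the callback option name is an assumption based on the driver's published typings, so check it against the API reference.

```javascript
// Minimal sketch: wait for an operation to finish while observing status callbacks.
// `callback` as the option name is an assumption; `progress` is described in this article.
async function waitWithProgress(operation, onStatus = () => {}) {
  let updates = 0;
  await operation.finished({
    progress: true, // Ask the server for extra progress data.
    callback: (status) => {
      updates += 1;
      onStatus(status);
    }
  });
  // Number of times the driver invoked the callback while waiting.
  return updates;
}
```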

The close and cancel methods can be called at any time. When called, they immediately invalidate the IOperation object. All pending calls such as fetchAll, fetchChunk, and getSchema are immediately canceled, and an error is returned. In some cases, the server operation might have already completed and the cancel method affects only the client.
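
Because cancel can be called while a fetch is pending, it can serve as a client-side timeout. The following is a minimal sketch under that assumption. The fetchAllWithTimeout helper and its timeout policy are hypothetical, and operation is assumed to be an IOperation.

```javascript
// Minimal sketch: cancel an operation if fetching takes longer than `timeoutMs`.
async function fetchAllWithTimeout(operation, timeoutMs) {
  let timedOut = false;
  const timer = setTimeout(() => {
    timedOut = true;
    // cancel() invalidates the operation, so the pending fetchAll() rejects.
    operation.cancel().catch(() => {});
  }, timeoutMs);
  try {
    return await operation.fetchAll();
  } catch (error) {
    if (timedOut) {
      throw new Error(`Operation canceled after ${timeoutMs} ms`);
    }
    throw error;
  } finally {
    clearTimeout(timer);
  }
}
```

When the fetch succeeds, the caller is still responsible for closing the operation.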

The fetchAll method calls fetchChunk internally and collects all of the data into an array. While this is convenient, it might cause out-of-memory errors when used on large datasets. fetchAll options are typically passed through to fetchChunk.

Fetch chunks of data

Fetching data chunks uses the following code pattern:

do {
  const chunk = await operation.fetchChunk();
  // Process the data chunk.
} while (await operation.hasMoreRows());

The fetchChunk method in the API reference processes data in small portions to reduce memory consumption. fetchChunk first waits for operations to complete if they have not already completed, then calls a callback during the wait cycle, and then fetches the next data chunk.

You can use the maxRows option to specify the desired chunk size. However, the returned chunk might have a different size, smaller or sometimes even larger. fetchChunk does not attempt to prefetch any data internally in order to slice it into the requested portions. It sends the maxRows option to the server and returns whatever the server returns. Do not confuse this maxRows option with the one in IDBSQLSession. The maxRows option passed to fetchChunk defines the size of each chunk and does nothing else.
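
As a minimal sketch, the chunk-fetching pattern can be wrapped in an async generator so that callers process one chunk at a time. The iterateChunks and countRows helpers are hypothetical, and operation is assumed to be an IOperation.

```javascript
// Minimal sketch: expose fetchChunk as an async iterator.
// `maxRows` is only a size hint; the server may return fewer or more rows.
async function* iterateChunks(operation, maxRows) {
  do {
    yield await operation.fetchChunk({ maxRows });
  } while (await operation.hasMoreRows());
}

// Hypothetical consumer: count rows without holding the full result set in memory.
async function countRows(operation, maxRows = 1000) {
  let total = 0;
  for await (const chunk of iterateChunks(operation, maxRows)) {
    total += chunk.length;
  }
  return total;
}
```

Unlike fetchAll, this keeps only the current chunk in memory, which is the trade-off described earlier for large datasets.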

Manage files in Unity Catalog volumes

The Databricks SQL Driver enables you to write local files to Unity Catalog volumes, download files from volumes, and delete files from volumes, as shown in the following example:

JavaScript

const { DBSQLClient } = require('@databricks/sql');

const serverHostname = process.env.DATABRICKS_SERVER_HOSTNAME;
const httpPath       = process.env.DATABRICKS_HTTP_PATH;
const token          = process.env.DATABRICKS_TOKEN;

if (!token || !serverHostname || !httpPath) {
    throw new Error("Cannot find Server Hostname, HTTP Path, or " +
                    "personal access token. " +
                    "Check the environment variables DATABRICKS_SERVER_HOSTNAME, " +
                    "DATABRICKS_HTTP_PATH, and DATABRICKS_TOKEN.");
}

const client = new DBSQLClient();
const connectOptions = {
  token: token,
  host:  serverHostname,
  path:  httpPath
};

client.connect(connectOptions)
  .then(async client => {
    const session = await client.openSession();

    // Write a local file to a volume in the specified path.
    // For writing local files to volumes, you must first specify the path to the
    // local folder that contains the file to be written.
    // Specify OVERWRITE to overwrite any existing file in that path.
    await session.executeStatement(
      "PUT 'my-data.csv' INTO '/Volumes/main/default/my-volume/my-data.csv' OVERWRITE", {
        stagingAllowedLocalPath: ["/tmp/"]
      }
    );

    // Download a file from a volume in the specified path.
    // For downloading files in volumes, you must first specify the path to the
    // local folder that will contain the downloaded file.
    await session.executeStatement(
      "GET '/Volumes/main/default/my-volume/my-data.csv' TO 'my-downloaded-data.csv'", {
        stagingAllowedLocalPath: ["/Users/paul.cornell/samples/nodejs-sql-driver/"]
      }
    );

    // Delete a file in a volume from the specified path.
    // For deleting files from volumes, you must add stagingAllowedLocalPath,
    // but its value will be ignored. As such, in this example, an empty string is
    // specified.
    await session.executeStatement(
      "REMOVE '/Volumes/main/default/my-volume/my-data.csv'", {
        stagingAllowedLocalPath: [""]
      }
    );

    await session.close();
    await client.close();
  })
  .catch((error) => {
    console.error(error);
  });

TypeScript

import { DBSQLClient } from '@databricks/sql';

const serverHostname: string | undefined = process.env.DATABRICKS_SERVER_HOSTNAME;
const httpPath: string | undefined = process.env.DATABRICKS_HTTP_PATH;
const token: string | undefined = process.env.DATABRICKS_TOKEN;

if (!token || !serverHostname || !httpPath) {
  throw new Error("Cannot find Server Hostname, HTTP Path, or " +
                  "personal access token. " +
                  "Check the environment variables DATABRICKS_SERVER_HOSTNAME, " +
                  "DATABRICKS_HTTP_PATH, and DATABRICKS_TOKEN.");
}

const client: DBSQLClient = new DBSQLClient();
const connectOptions = {
  token: token,
  host: serverHostname,
  path: httpPath
};

client.connect(connectOptions)
  .then(async client => {
    const session = await client.openSession();

    // Write a local file to a volume in the specified path.
    // For writing local files to volumes, you must first specify the path to the
    // local folder that contains the file to be written.
    // Specify OVERWRITE to overwrite any existing file in that path.
    await session.executeStatement(
      "PUT 'my-data.csv' INTO '/Volumes/main/default/my-volume/my-data.csv' OVERWRITE", {
        stagingAllowedLocalPath: ["/tmp/"]
      }
    );

    // Download a file from a volume in the specified path.
    // For downloading files in volumes, you must first specify the path to the
    // local folder that will contain the downloaded file.
    await session.executeStatement(
      "GET '/Volumes/main/default/my-volume/my-data.csv' TO 'my-downloaded-data.csv'", {
        stagingAllowedLocalPath: ["/Users/paul.cornell/samples/nodejs-sql-driver/"]
      }
    );

    // Delete a file in a volume from the specified path.
    // For deleting files from volumes, you must add stagingAllowedLocalPath,
    // but its value will be ignored. As such, in this example, an empty string is
    // specified.
    await session.executeStatement(
      "REMOVE '/Volumes/main/default/my-volume/my-data.csv'", {
        stagingAllowedLocalPath: [""]
      }
    );

    await session.close();
    await client.close();
  })
  .catch((error: any) => {
    console.error(error);
  });

Configure logging

The logger provides information for debugging problems with the driver. All DBSQLClient objects are instantiated with a logger that prints to the console, but by passing in a custom logger, you can send this information to a file. The following example shows how to configure a logger and change its level.

JavaScript

const { DBSQLLogger, LogLevel } = require('@databricks/sql');
const logger = new DBSQLLogger({
  filepath: 'log.txt',
  level: LogLevel.info
});

// Set logger to different level.
logger.setLevel(LogLevel.debug);

TypeScript

import { DBSQLLogger, LogLevel } from '@databricks/sql';
const logger = new DBSQLLogger({
  filepath: 'log.txt',
  level: LogLevel.info,
});

// Set logger to different level.
logger.setLevel(LogLevel.debug);

For additional examples, see the examples folder in the databricks/databricks-sql-nodejs repository on GitHub.
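
The logging snippets create a logger but do not show a custom one. The following is a minimal sketch of a custom logger that keeps recent entries in memory. It assumes that the driver only requires a log(level, message) method on the object passed as the logger option of the DBSQLClient constructor; MemoryLogger and its maxEntries cap are hypothetical.

```javascript
// Minimal sketch of a custom logger, assuming the driver calls log(level, message).
class MemoryLogger {
  constructor(maxEntries = 1000) {
    this.maxEntries = maxEntries; // Hypothetical cap to bound memory use.
    this.entries = [];
  }

  log(level, message) {
    this.entries.push({ time: new Date().toISOString(), level, message });
    if (this.entries.length > this.maxEntries) {
      this.entries.shift(); // Drop the oldest entry.
    }
  }
}

// Assumed wiring (not executed here because it needs the @databricks/sql package):
// const { DBSQLClient } = require('@databricks/sql');
// const client = new DBSQLClient({ logger: new MemoryLogger() });
```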

Testing

To test your code, you can use JavaScript test frameworks such as Jest. To test your code under simulated conditions without calling Azure Databricks REST API endpoints or changing the state of your Azure Databricks accounts or workspaces, you can use Jest's built-in mocking frameworks.

For example, given the following file named helpers.js containing a getDBSQLClientWithPAT function that uses an Azure Databricks personal access token to return a connection to an Azure Databricks workspace, a getAllColumnsFromTable function that uses the connection to get the specified number of data rows from the specified table (for example, the trips table in the samples catalog's nyctaxi schema), and a printResult function to print the data rows' content:

// helpers.js

const { DBSQLClient } = require('@databricks/sql');

async function getDBSQLClientWithPAT(token, serverHostname, httpPath) {
  const client = new DBSQLClient();
  const connectOptions = {
    token: token,
    host: serverHostname,
    path: httpPath
  };
  try {
    return await client.connect(connectOptions);
  } catch (error) {
    console.error(error);
    throw error;
  }
}

async function getAllColumnsFromTable(client, tableSpec, rowCount) {
  let session;
  let queryOperation;
  try {
    session = await client.openSession();
    queryOperation = await session.executeStatement(
      `SELECT * FROM ${tableSpec} LIMIT ${rowCount}`,
      {
        runAsync: true,
        maxRows: 10000 // This option enables the direct results feature.
      }
    );
  } catch (error) {
    console.error(error);
    throw error;
  }
  let result;
  try {
    result = await queryOperation.fetchAll();
  } catch (error) {
    console.error(error);
    throw error;
  } finally {
    if (queryOperation) {
      await queryOperation.close();
    }
    if (session) {
      await session.close();
    }
  }
  return result;
}

function printResult(result) {
  console.table(result);
}

module.exports = {
  getDBSQLClientWithPAT,
  getAllColumnsFromTable,
  printResult
};

And given the following file named main.js that calls the getDBSQLClientWithPAT, getAllColumnsFromTable, and printResult functions:

// main.js

const { getDBSQLClientWithPAT, getAllColumnsFromTable, printResult } = require('./helpers');

const token          = process.env.DATABRICKS_TOKEN;
const serverHostname = process.env.DATABRICKS_SERVER_HOSTNAME;
const httpPath       = process.env.DATABRICKS_HTTP_PATH;
const tableSpec      = process.env.DATABRICKS_TABLE_SPEC;

if (!token || !serverHostname || !httpPath) {
  throw new Error("Cannot find Server Hostname, HTTP Path, or personal access token. " +
    "Check the environment variables DATABRICKS_TOKEN, " +
    "DATABRICKS_SERVER_HOSTNAME, and DATABRICKS_HTTP_PATH.");
}

if (!tableSpec) {
  throw new Error("Cannot find table spec in the format catalog.schema.table. " +
    "Check the environment variable DATABRICKS_TABLE_SPEC."
  )
}

getDBSQLClientWithPAT(token, serverHostname, httpPath)
  .then(async client => {
    const result = await getAllColumnsFromTable(client, tableSpec, 2);
    printResult(result);
    await client.close();
  })
  .catch((error) => {
    console.error(error);
  });

The following file named helpers.test.js tests whether the getAllColumnsFromTable function returns the expected response. Rather than creating a real connection to the target workspace, this test mocks a DBSQLClient object. The test also mocks some data that conforms to the schema and values that are in the real data. The test returns the mocked data through the mocked connection and then checks whether one of the mocked data rows' values matches the expected value.

// helpers.test.js

const { getDBSQLClientWithPAT, getAllColumnsFromTable, printResult} = require('./helpers')

jest.mock('@databricks/sql', () => {
  return {
    DBSQLClient: jest.fn().mockImplementation(() => {
      return {
        connect: jest.fn().mockResolvedValue({ mock: 'DBSQLClient'})
      };
    }),
  };
});

test('getDBSQLClientWithPAT returns mocked Promise<DBSQLClient> object', async () => {
  // Arguments are positional: token, serverHostname, httpPath.
  const result = await getDBSQLClientWithPAT(
    'my-token',
    'mock-server-hostname',
    'mock-http-path'
  );

  expect(result).toEqual({ mock: 'DBSQLClient' });
});

const data = [
  {
    tpep_pickup_datetime: new Date(2016, 1, 13, 15, 51, 12),
    tpep_dropoff_datetime: new Date(2016, 1, 13, 16, 15, 3),
    trip_distance: 4.94,
    fare_amount: 19.0,
    pickup_zip: 10282,
    dropoff_zip: 10171
  },
  {
    tpep_pickup_datetime: new Date(2016, 1, 3, 17, 43, 18),
    tpep_dropoff_datetime: new Date(2016, 1, 3, 17, 45),
    trip_distance: 0.28,
    fare_amount: 3.5,
    pickup_zip: 10110,
    dropoff_zip: 10110
  }
];

const mockDBSQLClientForSession = {
  openSession: jest.fn().mockResolvedValue({
    executeStatement: jest.fn().mockResolvedValue({
      fetchAll: jest.fn().mockResolvedValue(data),
      close: jest.fn().mockResolvedValue(null)
    }),
    close: jest.fn().mockResolvedValue(null)
  })
};

test('getAllColumnsFromTable returns the correct fare_amount for the second mocked data row', async () => {
  // Arguments are positional: client, tableSpec, rowCount.
  const result = await getAllColumnsFromTable(
    mockDBSQLClientForSession,
    'mock-table-spec',
    2);
  expect(result[1].fare_amount).toEqual(3.5);
});

global.console.table = jest.fn();

test('printResult mock prints the correct fare_amount for the second mocked data row', () => {
  printResult(data);
  expect(console.table).toHaveBeenCalledWith(data);
  expect(data[1].fare_amount).toBe(3.5);
});

For TypeScript, the preceding code looks similar. For Jest testing with TypeScript, use ts-jest.
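As a minimal sketch of what using ts-jest can look like, the following jest.config.js selects the ts-jest preset so that Jest compiles .ts test files before running them. It assumes that ts-jest, jest, and typescript are already installed as devDependencies in the project; the file name and the `config` variable are illustrative choices, not something this article prescribes.

```javascript
// jest.config.js
// Minimal sketch: route TypeScript test files through ts-jest.
// Assumes ts-jest, jest, and typescript are installed as devDependencies.
const config = {
  // The ts-jest preset tells Jest to transform .ts files with ts-jest.
  preset: 'ts-jest',
  // Run the tests in a Node.js environment rather than a browser-like one.
  testEnvironment: 'node',
};

module.exports = config;
```

With this file in the project root, a test such as helpers.test.ts would be picked up by running Jest as usual.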

Additional resources

The Databricks SQL Driver for Node.js repository on GitHub
Getting started with the Databricks SQL Driver for Node.js
Troubleshooting the Databricks SQL Driver for Node.js

API reference

Classes
- DBSQLClient class
  - Methods
    - connect method
    - openSession method
    - getClient method
    - close method
- DBSQLSession class
  - Methods
    - executeStatement method
    - close method
    - getId method
    - getTypeInfo method
    - getCatalogs method
    - getSchemas method
    - getTables method
    - getFunctions method
    - getPrimaryKeys method
    - getCrossReference method
- DBSQLOperation class
  - Methods
    - getId method
    - fetchAll method
    - fetchChunk method
    - close method

Classes

`DBSQLClient` class

Main entry point for interacting with a database.

Methods

`connect` method

Opens a connection to the database.

Parameters

options

Type: `ConnectionOptions`

The set of options used to connect to the database. The `host`, `path`, and other required fields must be populated. See Authentication.

Example:

const client: DBSQLClient = new DBSQLClient();

client.connect(
  {
    host: serverHostname,
    path: httpPath,
    // ...
  }
)

Returns: `Promise<IDBSQLClient>`

`openSession` method

Opens a session between DBSQLClient and the database.

Parameters

request

Type: `OpenSessionRequest`

A set of optional parameters for specifying the initial schema and the initial catalog.

Example:

const session = await client.openSession(
  {initialCatalog: 'catalog'}
);

Returns: `Promise<IDBSQLSession>`
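Because a session opened with openSession should be closed once it is no longer in use, one way to keep the two calls paired is a small wrapper. The sketch below is not part of the driver: `withSession`, its parameter names, and the callback shape are hypothetical, and it only assumes that the client exposes openSession and that the session exposes close, as described in this reference.

```javascript
// Hypothetical helper (not part of @databricks/sql): opens a session,
// runs the given callback with it, and always closes the session afterwards.
async function withSession(client, request, work) {
  // `request` is passed through unchanged, for example { initialCatalog: 'catalog' }.
  const session = await client.openSession(request);
  try {
    // Whatever the callback returns becomes the result of withSession.
    return await work(session);
  } finally {
    // Runs whether `work` succeeded or threw.
    await session.close();
  }
}
```

A caller could then write `await withSession(client, { initialCatalog: 'catalog' }, (session) => session.executeStatement('SELECT 1'))`, keeping in mind that any operation returned this way still needs its own close call.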

`getClient` method

Returns the internal thrift TCLIService.Client object. Must be called after DBSQLClient has connected.

No parameters.

Returns: `TCLIService.Client`

`close` method

Closes the connection to the database and releases all associated resources on the server. Any additional calls to this client will throw an error.

No parameters.

No return value.

`DBSQLSession` class

DBSQLSessions are primarily used for executing statements against the database, as well as for various metadata fetching operations.

Methods

`executeStatement` method

Executes a statement with the options provided.

Parameters

statement

Type: `string`

The statement to be executed.

options

Type: `ExecuteStatementOptions`

A set of optional parameters for determining the query timeout, the maximum number of rows for direct results, and whether to run the query asynchronously. By default, `maxRows` is set to 10000. If `maxRows` is set to null, the operation runs with the direct results feature turned off.

Example:

const session = await client.openSession(
  {initialCatalog: 'catalog'}
);

const queryOperation = await session.executeStatement(
  'SELECT "Hello, World!"', { runAsync: true }
);

Returns: `Promise<IOperation>`

`close` method

Closes the session. Must be called when the session is no longer in use.

No parameters.

No return value.

`getId` method

Returns the GUID of the session.

No parameters.

Returns: `string`

`getTypeInfo` method

Returns information about supported data types.

Parameters

request

Type: `TypeInfoRequest`

Request parameters.

Returns: `Promise<IOperation>`

`getCatalogs` method

Gets a list of catalogs.

Parameters

request

Type: `CatalogsRequest`

Request parameters.

Returns: `Promise<IOperation>`

`getSchemas` method

Gets a list of schemas.

Parameters

request

Type: `SchemasRequest`

Request parameters. The `catalogName` and `schemaName` fields can be used for filtering.

Returns: `Promise<IOperation>`
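Since getSchemas returns an operation rather than rows, the rows still have to be fetched and the operation closed. The following sketch shows one way to do that with the optional filters; `listSchemas` is a hypothetical helper name, `session` is assumed to be an open DBSQLSession, and the shape of the returned rows is whatever the server sends back (nothing about column names is assumed here).

```javascript
// Hypothetical helper: lists schemas, optionally filtered by catalog and schema name.
// `session` is assumed to be an open DBSQLSession.
async function listSchemas(session, catalogName, schemaName) {
  // Only include the filter fields that were actually supplied.
  const request = {};
  if (catalogName) request.catalogName = catalogName;
  if (schemaName) request.schemaName = schemaName;

  const operation = await session.getSchemas(request);
  try {
    // fetchAll waits for the operation to complete and returns every row.
    return await operation.fetchAll();
  } finally {
    // Release the operation's resources even if fetching fails.
    await operation.close();
  }
}
```

The same fetch-then-close shape applies to the other metadata methods in this class, because each of them also returns an operation.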

`getTables` method

Gets a list of tables.

Parameters

request

Type: `TablesRequest`

Request parameters. The `catalogName`, `schemaName`, and `tableName` fields can be used for filtering.

Returns: `Promise<IOperation>`

`getFunctions` method

Gets a list of functions.

Parameters

request

Type: `FunctionsRequest`

Request parameters. The `functionName` field is required.

Returns: `Promise<IOperation>`

`getPrimaryKeys` method

Gets a list of primary keys.

Parameters

request

Type: `PrimaryKeysRequest`

Request parameters. The `schemaName` and `tableName` fields are required.

Returns: `Promise<IOperation>`
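Because `schemaName` and `tableName` are both required, it can help to check them before sending the request. The sketch below does that in a hypothetical helper named `getPrimaryKeyRows`; the error message and the decision to validate on the client side are illustrative assumptions, not driver behavior.

```javascript
// Hypothetical helper: validates the required fields, then fetches primary-key rows.
// `session` is assumed to be an open DBSQLSession.
async function getPrimaryKeyRows(session, schemaName, tableName) {
  // Fail early, before any call reaches the server.
  if (!schemaName || !tableName) {
    throw new Error('Both schemaName and tableName are required.');
  }

  const operation = await session.getPrimaryKeys({ schemaName, tableName });
  try {
    return await operation.fetchAll();
  } finally {
    await operation.close();
  }
}
```

Since the function is async, a missing argument surfaces as a rejected promise, so callers handle it the same way as any other driver error.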

`getCrossReference` method

Gets information about foreign keys between two tables.

Parameters

request

Type: `CrossReferenceRequest`

Request parameters. The schema and catalog names must be specified for both tables (the parent table and the table that references it).

Returns: `Promise<IOperation>`
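A request that names two tables is easy to get wrong, so the sketch below builds it from two small descriptors and checks that nothing is missing. Note the assumptions: `buildCrossReferenceRequest` is a hypothetical helper, and the six field names it emits (`parentCatalogName` through `foreignTableName`) are not stated in this article; verify them against the `CrossReferenceRequest` type shipped with the installed version of the driver.

```javascript
// Hypothetical helper: builds a request object for getCrossReference.
// ASSUMPTION: the output field names below are not confirmed by this article;
// check them against the driver's CrossReferenceRequest type before relying on them.
function buildCrossReferenceRequest(parent, foreign) {
  const tables = [['parent', parent], ['foreign', foreign]];
  for (const [label, table] of tables) {
    for (const key of ['catalog', 'schema', 'table']) {
      if (!table || !table[key]) {
        throw new Error(`Missing ${key} name for the ${label} table.`);
      }
    }
  }
  return {
    parentCatalogName: parent.catalog,
    parentSchemaName: parent.schema,
    parentTableName: parent.table,
    foreignCatalogName: foreign.catalog,
    foreignSchemaName: foreign.schema,
    foreignTableName: foreign.table,
  };
}
```

The result would then be passed as `await session.getCrossReference(request)`, followed by the usual fetchAll and close calls on the returned operation.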

`DBSQLOperation` class

DBSQLOperations are created by DBSQLSessions and can be used to fetch the results of statements and check their execution. Data is fetched through the fetchChunk and fetchAll functions.

Methods

`getId` method

Returns the GUID of the operation.

No parameters.

Returns: `string`

`fetchAll` method

Waits for the operation to complete, then fetches all rows from the operation.

No parameters.

Returns: `Promise<Array<object>>`

`fetchChunk` method

Waits for the operation to complete, then fetches up to a specified number of rows from the operation.

Parameters

options

Type: `FetchOptions`

Options used to fetch. Currently, the only option is `maxRows`, which corresponds to the maximum number of data objects to be returned in any given array.

Returns: `Promise<Array<object>>`
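To process a large result without holding every row in memory, fetchChunk can be called in a loop. The sketch below does so with a hypothetical helper named `forEachChunk`. It relies on the operation's `hasMoreRows` method to decide when to stop; that method is not listed in this reference section, so treat its use here as an assumption to check against the driver version in use.

```javascript
// Hypothetical helper: reads an operation chunk by chunk and hands each
// chunk to a callback. Returns the total number of rows seen.
// ASSUMPTION: the operation exposes hasMoreRows(), which resolves to a boolean.
async function forEachChunk(operation, maxRows, onChunk) {
  let total = 0;
  do {
    // Each call returns at most `maxRows` rows.
    const chunk = await operation.fetchChunk({ maxRows });
    total += chunk.length;
    await onChunk(chunk);
  } while (await operation.hasMoreRows());
  return total;
}
```

The helper does not close the operation; the caller remains responsible for calling close once the loop has finished.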

`close` method

Closes the operation and releases all associated resources. Must be called when the operation is no longer in use.

No parameters.

No return value.
