Deploy models for batch inference and prediction

10/15/2024

This article describes what Databricks recommends for batch and streaming inference.

For real-time model serving on Azure Databricks, see Model serving with Azure Databricks.

Use ai_query for batch inference

Important

Databricks recommends using ai_query with Model Serving for batch inference. ai_query is a built-in Databricks SQL function that allows you to query existing model serving endpoints using SQL. See ai_query function for more detail about this AI function.

For quick experimentation, you can use ai_query with pay-per-token endpoints, because these endpoints are preconfigured in your workspace.

When you are ready to run batch inference on large or production data, Databricks recommends using provisioned throughput endpoints for faster performance. See Provisioned throughput Foundation Model APIs to create a provisioned throughput endpoint.
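To make the recommendation concrete, the following is a minimal sketch of a batch inference query built around ai_query, assuming the two-argument form ai_query(endpoint, request). The helper build_ai_query_sql, the endpoint name, the table main.default.reviews, and the column review_text are all hypothetical placeholders chosen for illustration, not names taken from this article. The sketch only builds the SQL text; running it requires a Databricks workspace.

```python
def build_ai_query_sql(endpoint: str, table: str, text_column: str, prompt_prefix: str) -> str:
    """Build a SQL statement that applies ai_query to every row of a table.

    Hypothetical helper for illustration; it is not part of any Databricks library.
    """
    # Reject quotes and backslashes instead of guessing at SQL escaping rules.
    for value in (endpoint, prompt_prefix):
        if "'" in value or "\\" in value:
            raise ValueError(f"unsupported character in literal: {value!r}")
    return (
        f"SELECT {text_column}, "
        f"ai_query('{endpoint}', CONCAT('{prompt_prefix}', {text_column})) AS response "
        f"FROM {table}"
    )


query = build_ai_query_sql(
    endpoint="databricks-meta-llama-3-1-70b-instruct",  # assumed pay-per-token endpoint name
    table="main.default.reviews",                       # hypothetical Unity Catalog table
    text_column="review_text",                          # hypothetical text column
    prompt_prefix="Summarize this review: ",
)
print(query)

# In a Databricks notebook, the statement could then be run with:
#   results = spark.sql(query)
```

Under these assumptions, moving from experimentation to production data would only mean replacing the endpoint argument with the name of your provisioned throughput endpoint; the rest of the query stays the same.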

See Perform batch inference using ai_query.
To get started with batch inference with LLMs on Unity Catalog tables, see the notebook examples in Batch inference using Foundation Model APIs provisioned throughput.
