Query generative AI models

Article
12/18/2024

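In this article, you learn how to format query requests for foundation models and external models and send them to your model serving endpoint.

For traditional ML or Python model query requests, see Query serving endpoints for custom models.

Mosaic AI Model Serving supports Foundation Model APIs and external models for accessing generative AI models. Model Serving uses a unified OpenAI-compatible API and SDK for querying them. This makes it possible to experiment with and customize generative AI models for production across supported clouds and providers.

Mosaic AI Model Serving provides the following options for sending scoring requests to endpoints that serve foundation models or external models: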

Method	Details
OpenAI client	Query a model hosted by a Mosaic AI Model Serving endpoint using the OpenAI client. Specify the model serving endpoint name as the `model` input. Supported for chat, embeddings, and completions models made available by Foundation Model APIs or external models.
Serving UI	Select Query endpoint from the Serving endpoint page. Insert JSON format model input data and click Send Request. If the model has an input example logged, use Show Example to load it.
REST API	Call and query the model using the REST API. See POST /serving-endpoints/{name}/invocations for details. For scoring requests to endpoints serving multiple models, see Query individual models behind an endpoint.
MLflow Deployments SDK	Use the MLflow Deployments SDK's predict() function to query the model.
Databricks Python SDK	The Databricks Python SDK is a layer on top of the REST API. It handles low-level details, such as authentication, making it easier to interact with the models.
SQL function	Invoke model inference directly from SQL using the `ai_query` SQL function. See Query a served model with ai_query.

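Requirements

- A model serving endpoint.
- A Databricks workspace in a supported region.
  - Foundation Model APIs regions
  - External models regions
- To send a scoring request through the OpenAI client, REST API, or MLflow Deployments SDK, you must have a Databricks API token.

Important

As a security best practice for production scenarios, Databricks recommends that you use machine-to-machine OAuth tokens for authentication during production.

For testing and development, Databricks recommends using a personal access token belonging to service principals instead of workspace users. To create tokens for service principals, see Manage tokens for a service principal.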
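
As a rough sketch of the token requirement above, the helper below reads the workspace URL and API token from environment variables and fails early if either is missing. The variable names DATABRICKS_HOST and DATABRICKS_TOKEN are the ones used in this article's MLflow examples; the function name `load_databricks_credentials` and its validation rules are illustrative assumptions, not part of any Databricks SDK.

```python
import os
from typing import Mapping, Optional, Tuple


def load_databricks_credentials(
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, str]:
    """Return (host, token) read from the environment, or raise ValueError."""
    env = os.environ if env is None else env
    host = env.get("DATABRICKS_HOST", "").strip().rstrip("/")
    token = env.get("DATABRICKS_TOKEN", "").strip()
    if not host.startswith("https://"):
        raise ValueError("DATABRICKS_HOST must be set to an https:// workspace URL")
    if not token:
        raise ValueError("DATABRICKS_TOKEN must be set to a Databricks API token")
    return host, token


# Example with an explicit mapping, so nothing is read from the real environment.
host, token = load_databricks_credentials(
    {
        "DATABRICKS_HOST": "https://<workspace_host>.databricks.com/",
        "DATABRICKS_TOKEN": "dapi-your-databricks-token",
    }
)
print(host)
```

Passing the mapping explicitly keeps the helper easy to test; calling it with no argument falls back to `os.environ`.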

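Install packages

After you have selected a querying method, you must first install the appropriate package to your cluster.

OpenAI client

To use the OpenAI client, the databricks-sdk[openai] package needs to be installed on your cluster. Databricks SDK provides a wrapper for constructing the OpenAI client with authorization automatically configured to query generative AI models. Run the following in your notebook or your local terminal:

!pip install "databricks-sdk[openai]>=0.35.0"

The following is only required when installing the package on a Databricks Notebook: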

dbutils.library.restartPython()

REST API

Access to the Serving REST API is available in Databricks Runtime for Machine Learning.

MLflow Deployments SDK

!pip install mlflow

The following is only required when installing the package on a Databricks Notebook:

dbutils.library.restartPython()

Databricks Python SDK

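The Databricks SDK for Python is already installed on all Azure Databricks clusters that use Databricks Runtime 13.3 LTS or above. For Azure Databricks clusters that use Databricks Runtime 12.2 LTS and below, you must install the Databricks SDK for Python first. See Databricks SDK for Python.

Query a chat completion model

The following are examples for querying a chat model. The examples apply to querying a chat model made available using either of the Model Serving capabilities: Foundation Model APIs or external models.

For a batch inference example, see Perform batch LLM inference using ai_query.

OpenAI client

The following is a chat request for the DBRX Instruct model made available by the Foundation Model APIs pay-per-token endpoint, databricks-dbrx-instruct, in your workspace.

To use the OpenAI client, specify the model serving endpoint name as the model input.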


from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
openai_client = w.serving_endpoints.get_open_ai_client()

response = openai_client.chat.completions.create(
    model="databricks-dbrx-instruct",
    messages=[
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is a mixture of experts model?",
      }
    ],
    max_tokens=256
)

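To query foundation models outside of your workspace, you must use the OpenAI client directly. You also need your Databricks workspace instance to connect the OpenAI client to Databricks. The following example assumes you have a Databricks API token and openai installed on your compute.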


import os
from openai import OpenAI

# Read the Databricks API token from the environment instead of hard-coding it
client = OpenAI(
    api_key=os.environ.get("DATABRICKS_TOKEN"),
    base_url="https://example.staging.cloud.databricks.com/serving-endpoints"
)

response = client.chat.completions.create(
    model="databricks-dbrx-instruct",
    messages=[
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is a mixture of experts model?",
      }
    ],
    max_tokens=256
)

REST API

Important

The following example uses REST API parameters for querying serving endpoints that serve foundation models. These parameters are in Public Preview and the definition might change. See POST /serving-endpoints/{name}/invocations.

The following is a chat request for the DBRX Instruct model made available by the Foundation Model APIs pay-per-token endpoint, databricks-dbrx-instruct, in your workspace.

curl \
-u token:$DATABRICKS_TOKEN \
-X POST \
-H "Content-Type: application/json" \
-d '{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is a mixture of experts model?"
    }
  ]
}' \
https://<workspace_host>.databricks.com/serving-endpoints/databricks-dbrx-instruct/invocations
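
For readers who prefer Python over curl, the following is a minimal sketch of the same request built with the standard library only. It mirrors the curl flags above (`-u token:$DATABRICKS_TOKEN` becomes an HTTP Basic `Authorization` header with user name `token`). The function name `build_invocation_request` is a hypothetical helper, and the request is only constructed here, not sent.

```python
import base64
import json
import urllib.request


def build_invocation_request(host, endpoint, token, payload):
    """Build (but do not send) a POST request to an endpoint's invocations URL."""
    url = f"{host.rstrip('/')}/serving-endpoints/{endpoint}/invocations"
    # curl's `-u token:$DATABRICKS_TOKEN` is HTTP Basic auth with user "token".
    credentials = base64.b64encode(f"token:{token}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
        },
        method="POST",
    )


request = build_invocation_request(
    host="https://<workspace_host>.databricks.com",
    endpoint="databricks-dbrx-instruct",
    token="dapi-your-databricks-token",
    payload={
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is a mixture of experts model?"},
        ]
    },
)
print(request.get_method(), request.full_url)

# To actually send it (requires a real workspace host and token):
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp))
```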

MLflow Deployments SDK

Important

The following example uses the predict() API from the MLflow Deployments SDK.

The following is a chat request for the DBRX Instruct model made available by the Foundation Model APIs pay-per-token endpoint, databricks-dbrx-instruct, in your workspace.


import os
import mlflow.deployments

# Only required when running this example outside of a Databricks Notebook
os.environ["DATABRICKS_HOST"] = "https://<workspace_host>.databricks.com"
os.environ["DATABRICKS_TOKEN"] = "dapi-your-databricks-token"

client = mlflow.deployments.get_deploy_client("databricks")

chat_response = client.predict(
    endpoint="databricks-dbrx-instruct",
    inputs={
        "messages": [
            {
              "role": "user",
              "content": "Hello!"
            },
            {
              "role": "assistant",
              "content": "Hello! How can I assist you today?"
            },
            {
              "role": "user",
              "content": "What is a mixture of experts model?"
            }
        ],
        "temperature": 0.1,
        "max_tokens": 20
    }
)

Databricks Python SDK

The following is a chat request for the DBRX Instruct model made available by the Foundation Model APIs pay-per-token endpoint, databricks-dbrx-instruct, in your workspace.

This code must be run in a notebook in your workspace. See Use the Databricks SDK for Python from an Azure Databricks notebook.

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import ChatMessage, ChatMessageRole

w = WorkspaceClient()
response = w.serving_endpoints.query(
    name="databricks-dbrx-instruct",
    messages=[
        ChatMessage(
            role=ChatMessageRole.SYSTEM, content="You are a helpful assistant."
        ),
        ChatMessage(
            role=ChatMessageRole.USER, content="What is a mixture of experts model?"
        ),
    ],
    max_tokens=128,
)
print(f"RESPONSE:\n{response.choices[0].message.content}")

LangChain

To query a foundation model endpoint using LangChain, you can use the ChatDatabricks ChatModel class and specify the endpoint.

The following example uses the ChatDatabricks ChatModel class in LangChain to query the Foundation Model APIs pay-per-token endpoint, databricks-dbrx-instruct.

%pip install databricks-langchain

from langchain_core.messages import HumanMessage, SystemMessage
from databricks_langchain import ChatDatabricks

messages = [
    SystemMessage(content="You're a helpful assistant"),
    HumanMessage(content="What is a mixture of experts model?"),
]

llm = ChatDatabricks(endpoint="databricks-dbrx-instruct")
llm.invoke(messages)

SQL

Important

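The following example uses the built-in SQL function, ai_query. This function is in Public Preview and the definition might change. See Query a served model with ai_query.

The following is a chat request for meta-llama-3-1-70b-instruct made available by the Foundation Model APIs pay-per-token endpoint, databricks-meta-llama-3-1-70b-instruct, in your workspace.

Note

The ai_query() function does not support query endpoints that serve the DBRX or the DBRX Instruct model.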

SELECT ai_query(
    "databricks-meta-llama-3-1-70b-instruct",
    "Can you explain AI in ten words?"
  )

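The following is the expected request format for a chat model when using the REST API. For external models, you can include additional parameters that are valid for a given provider and endpoint configuration. See Additional query parameters.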

{
  "messages": [
    {
      "role": "user",
      "content": "What is a mixture of experts model?"
    }
  ],
  "max_tokens": 100,
  "temperature": 0.1
}
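
A small sketch of how such a request body could be assembled and checked before sending it. The helper name `build_chat_payload` is hypothetical, and the set of allowed roles is an assumption limited to the roles that appear in this article's examples; a given provider may accept others.

```python
import json
from typing import Any, Dict, List

# Roles that appear in this article's examples; treated here as the allowed set.
ALLOWED_ROLES = {"system", "user", "assistant"}


def build_chat_payload(
    messages: List[Dict[str, str]],
    max_tokens: int = 100,
    temperature: float = 0.1,
) -> Dict[str, Any]:
    """Validate chat messages and return a request body in the format above."""
    if not messages:
        raise ValueError("messages must contain at least one message")
    for position, message in enumerate(messages):
        role = message.get("role")
        content = message.get("content")
        if role not in ALLOWED_ROLES:
            raise ValueError(f"message {position}: unsupported role {role!r}")
        if not isinstance(content, str) or not content.strip():
            raise ValueError(f"message {position}: content must be non-empty text")
    return {
        "messages": [dict(m) for m in messages],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


payload = build_chat_payload(
    [{"role": "user", "content": "What is a mixture of experts model?"}]
)
print(json.dumps(payload, indent=2))
```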

The following is an expected response format for a request made using the REST API:

{
  "model": "databricks-dbrx-instruct",
  "choices": [
    {
      "message": {},
      "index": 0,
      "finish_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 7,
    "completion_tokens": 74,
    "total_tokens": 81
  },
  "object": "chat.completion",
  "id": null,
  "created": 1698824353
}
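
To show how a response of this shape might be consumed, here is a minimal sketch that pulls the assistant's reply and the token count out of the parsed JSON. The helper names are hypothetical, and the message text in the sample is illustrative only (the format above shows `message` as an empty object).

```python
from typing import Any, Dict, Optional


def extract_chat_reply(response: Dict[str, Any], index: int = 0) -> Optional[str]:
    """Return the message content of the choice with the given index, if any."""
    for choice in response.get("choices", []):
        if choice.get("index") == index:
            return (choice.get("message") or {}).get("content")
    return None


def total_tokens(response: Dict[str, Any]) -> int:
    """Return usage.total_tokens, or 0 when the usage block is missing."""
    return (response.get("usage") or {}).get("total_tokens", 0)


sample_response = {
    "model": "databricks-dbrx-instruct",
    "choices": [
        {
            # Illustrative message content, not real model output.
            "message": {"role": "assistant", "content": "A mixture of experts model ..."},
            "index": 0,
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 7, "completion_tokens": 74, "total_tokens": 81},
    "object": "chat.completion",
    "id": None,
    "created": 1698824353,
}

print(extract_chat_reply(sample_response))
print(total_tokens(sample_response))
```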

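Query an embedding model

The following is an embeddings request for the gte-large-en model made available by Foundation Model APIs. The example applies to querying an embedding model made available using either of the Model Serving capabilities: Foundation Model APIs or external models.

OpenAI client

To use the OpenAI client, specify the model serving endpoint name as the model input.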


from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
openai_client = w.serving_endpoints.get_open_ai_client()

response = openai_client.embeddings.create(
  model="databricks-gte-large-en",
  input="what is databricks"
)

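To query foundation models outside your workspace, you must use the OpenAI client directly, as demonstrated below. The following example assumes you have a Databricks API token and openai installed on your compute. You also need your Databricks workspace instance to connect the OpenAI client to Databricks.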


import os
from openai import OpenAI

# Read the Databricks API token from the environment instead of hard-coding it
client = OpenAI(
    api_key=os.environ.get("DATABRICKS_TOKEN"),
    base_url="https://example.staging.cloud.databricks.com/serving-endpoints"
)

response = client.embeddings.create(
  model="databricks-gte-large-en",
  input="what is databricks"
)

REST API

Important

The following example uses REST API parameters for querying serving endpoints that serve foundation models or external models. These parameters are in Public Preview and the definition might change. See POST /serving-endpoints/{name}/invocations.


curl \
-u token:$DATABRICKS_TOKEN \
-X POST \
-H "Content-Type: application/json" \
-d  '{ "input": "Embed this sentence!"}' \
https://<workspace_host>.databricks.com/serving-endpoints/databricks-gte-large-en/invocations

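MLflow Deployments SDK

Important

The following example uses the predict() API from the MLflow Deployments SDK.


import os
import mlflow.deployments

# Only required when running this example outside of a Databricks Notebook
os.environ["DATABRICKS_HOST"] = "https://<workspace_host>.databricks.com"
os.environ["DATABRICKS_TOKEN"] = "dapi-your-databricks-token"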

client = mlflow.deployments.get_deploy_client("databricks")

embeddings_response = client.predict(
    endpoint="databricks-gte-large-en",
    inputs={
        "input": "Here is some text to embed"
    }
)

Databricks Python SDK


from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
response = w.serving_endpoints.query(
    name="databricks-gte-large-en",
    input="Embed this sentence!"
)
print(response.data[0].embedding)

LangChain

To use a Databricks Foundation Model APIs model in LangChain as an embedding model, import the DatabricksEmbeddings class and specify the endpoint parameter as follows:

%pip install databricks-langchain

from databricks_langchain import DatabricksEmbeddings

embeddings = DatabricksEmbeddings(endpoint="databricks-gte-large-en")
embeddings.embed_query("Can you explain AI in ten words?")

SQL

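Important

The following example uses the built-in SQL function, ai_query. This function is in Public Preview and the definition might change. See Query a served model with ai_query.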


SELECT ai_query(
    "databricks-gte-large-en",
    "Can you explain AI in ten words?"
  )

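The following is the expected request format for an embeddings model. For external models, you can include additional parameters that are valid for a given provider and endpoint configuration. See Additional query parameters.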


{
  "input": [
    "embedding text"
  ]
}

The following is the expected response format:

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": []
    }
  ],
  "model": "text-embedding-ada-002-v2",
  "usage": {
    "prompt_tokens": 2,
    "total_tokens": 2
  }
}
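
As a sketch of how this response could be turned into plain vectors, the helper below collects the `embedding` arrays in `index` order. The function name `extract_embeddings` is hypothetical, and the three-dimensional vectors in the sample are made up for illustration; real embeddings are much longer.

```python
from typing import Any, Dict, List


def extract_embeddings(response: Dict[str, Any]) -> List[List[float]]:
    """Return the embedding vectors from a response, ordered by their index."""
    items = [d for d in response.get("data", []) if d.get("object") == "embedding"]
    return [item["embedding"] for item in sorted(items, key=lambda d: d["index"])]


sample_response = {
    "object": "list",
    "data": [
        # Deliberately out of order to show that `index` drives the ordering.
        {"object": "embedding", "index": 1, "embedding": [0.0, 1.0, 0.0]},
        {"object": "embedding", "index": 0, "embedding": [1.0, 0.0, 0.0]},
    ],
    "model": "text-embedding-ada-002-v2",
    "usage": {"prompt_tokens": 2, "total_tokens": 2},
}

vectors = extract_embeddings(sample_response)
print(len(vectors), vectors[0])
```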

Check if embeddings are normalized

Use the following to check if the embeddings generated by your model are normalized.


  import numpy as np

  def is_normalized(vector: list[float], tol=1e-3) -> bool:
      magnitude = np.linalg.norm(vector)
      return abs(magnitude - 1) < tol
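
If NumPy is not available, the same check can be sketched with the standard library. The version below also shows one way to rescale a vector to unit length when the check fails. The names `l2_norm`, `is_normalized_stdlib`, and `normalize` are illustrative and not part of any Databricks API.

```python
import math
from typing import List, Sequence


def l2_norm(vector: Sequence[float]) -> float:
    """Euclidean (L2) length of the vector."""
    return math.sqrt(sum(x * x for x in vector))


def is_normalized_stdlib(vector: Sequence[float], tol: float = 1e-3) -> bool:
    """True when the vector's L2 length is within `tol` of 1."""
    return abs(l2_norm(vector) - 1.0) < tol


def normalize(vector: Sequence[float]) -> List[float]:
    """Scale the vector to unit L2 length; a zero vector cannot be normalized."""
    norm = l2_norm(vector)
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


raw = [3.0, 4.0]
print(is_normalized_stdlib(raw))             # length is 5, so not normalized
print(is_normalized_stdlib(normalize(raw)))  # rescaled to unit length
```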

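Query a text completion model

OpenAI client

Important

Querying text completion models made available using Foundation Model APIs pay-per-token using the OpenAI client is not supported. Only querying external models using the OpenAI client is supported, as demonstrated in this section.

To use the OpenAI client, specify the model serving endpoint name as the model input. The following example queries the claude-2 completions model hosted by Anthropic using the OpenAI client: the model field is populated with the name of the model serving endpoint that hosts the model you want to query.

This example uses a previously created endpoint, anthropic-completions-endpoint, configured for accessing external models from the Anthropic model provider. See how to create external model endpoints.

See Supported models for additional models you can query and their providers.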


from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
openai_client = w.serving_endpoints.get_open_ai_client()

completion = openai_client.completions.create(
    model="anthropic-completions-endpoint",
    prompt="what is databricks",
    temperature=1.0
)
print(completion)

REST API

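The following is a completions request for querying a completions model made available using external models.

Important

The following example uses REST API parameters for querying serving endpoints that serve external models. These parameters are in Public Preview and the definition might change. See POST /serving-endpoints/{name}/invocations.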


curl \
-u token:$DATABRICKS_TOKEN \
-X POST \
-H "Content-Type: application/json" \
-d '{"prompt": "What is a quoll?", "max_tokens": 64}' \
https://<workspace_host>.databricks.com/serving-endpoints/<completions-model-endpoint>/invocations

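MLflow Deployments SDK

The following is a completions request for querying a completions model made available using external models.

Important

The following example uses the predict() API from the MLflow Deployments SDK.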


import os
import mlflow.deployments

# Only required when running this example outside of a Databricks Notebook

os.environ['DATABRICKS_HOST'] = "https://<workspace_host>.databricks.com"
os.environ['DATABRICKS_TOKEN'] = "dapi-your-databricks-token"

client = mlflow.deployments.get_deploy_client("databricks")

completions_response = client.predict(
    endpoint="<completions-model-endpoint>",
    inputs={
        "prompt": "What is the capital of France?",
        "temperature": 0.1,
        "max_tokens": 10,
        "n": 2
    }
)

# Print the response
print(completions_response)

Databricks Python SDK

The following is a completions request for querying a completions model made available using external models.

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
response = w.serving_endpoints.query(
    name="<completions-model-endpoint>",
    prompt="Write 3 reasons why you should train an AI model on domain specific data sets."
)
print(response.choices[0].text)

SQL

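Important

The following example uses the built-in SQL function, ai_query. This function is in Public Preview and the definition might change. See Query a served model with ai_query.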

SELECT ai_query(
    "<completions-model-endpoint>",
    "Can you explain AI in ten words?"
  )

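The following is the expected request format for a completions model. For external models, you can include additional parameters that are valid for a given provider and endpoint configuration. See Additional query parameters.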

{
  "prompt": "What is mlflow?",
  "max_tokens": 100,
  "temperature": 0.1,
  "stop": [
    "Human:"
  ],
  "n": 1,
  "stream": false,
  "extra_params":
  {
    "top_p": 0.9
  }
}
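
One possible way to build a body of this shape in Python is sketched below. It leaves out optional fields that were not set and places any other keyword arguments under `extra_params`, following the layout shown above. The helper name `build_completion_payload` and its defaults are assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional


def build_completion_payload(
    prompt: str,
    max_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
    stop: Optional[List[str]] = None,
    n: int = 1,
    stream: bool = False,
    **extra_params: Any,
) -> Dict[str, Any]:
    """Build a completions request body, omitting options that were not set."""
    if not prompt:
        raise ValueError("prompt must be a non-empty string")
    payload: Dict[str, Any] = {"prompt": prompt, "n": n, "stream": stream}
    optional = {"max_tokens": max_tokens, "temperature": temperature, "stop": stop}
    payload.update({k: v for k, v in optional.items() if v is not None})
    if extra_params:
        # Any other keyword argument is nested under "extra_params".
        payload["extra_params"] = dict(extra_params)
    return payload


payload = build_completion_payload(
    "What is mlflow?", max_tokens=100, temperature=0.1, stop=["Human:"], top_p=0.9
)
print(payload)
```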

The following is the expected response format:

{
  "id": "cmpl-8FwDGc22M13XMnRuessZ15dG622BH",
  "object": "text_completion",
  "created": 1698809382,
  "model": "gpt-3.5-turbo-instruct",
  "choices": [
    {
    "text": "MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. It provides tools for tracking experiments, managing and deploying models, and collaborating on projects. MLflow also supports various machine learning frameworks and languages, making it easier to work with different tools and environments. It is designed to help data scientists and machine learning engineers streamline their workflows and improve the reproducibility and scalability of their models.",
    "index": 0,
    "logprobs": null,
    "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 83,
    "total_tokens": 88
  }
}
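
When a request sets `n` greater than 1, the response carries several entries in `choices`. The sketch below, using a hypothetical helper `extract_completion_texts` and made-up sample text, returns the generated strings in `index` order.

```python
from typing import Any, Dict, List


def extract_completion_texts(response: Dict[str, Any]) -> List[str]:
    """Return the text of every choice, ordered by the choice index."""
    choices = sorted(response.get("choices", []), key=lambda c: c.get("index", 0))
    return [choice["text"].strip() for choice in choices]


sample_response = {
    "object": "text_completion",
    "choices": [
        # Illustrative texts, listed out of order on purpose.
        {"text": " Second answer.", "index": 1, "logprobs": None, "finish_reason": "stop"},
        {"text": " First answer.", "index": 0, "logprobs": None, "finish_reason": "stop"},
    ],
    "usage": {"prompt_tokens": 5, "completion_tokens": 6, "total_tokens": 11},
}

for text in extract_completion_texts(sample_response):
    print(text)
```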

Chat with supported LLMs using AI Playground

You can interact with supported large language models using the AI Playground. The AI Playground is a chat-like environment where you can test, prompt, and compare LLMs from your Azure Databricks workspace.

