Data, privacy, and security for Azure OpenAI Service

아티클
09/03/2024

This article provides details regarding how data provided by you to the Azure OpenAI service is processed, used, and stored. Azure OpenAI stores and processes data to provide the service and to monitor for uses that violate the applicable product terms. Please also see the Microsoft Products and Services Data Protection Addendum, which governs data processing by the Azure OpenAI Service. Azure OpenAI is an Azure service; learn more about applicable Azure compliance offerings.

Important

Your prompts (inputs) and completions (outputs), your embeddings, and your training data:

are NOT available to other customers.
are NOT available to OpenAI.
are NOT used to improve OpenAI models.
are NOT used to train, retrain, or improve Azure OpenAI Service foundation models.
are NOT used to improve any Microsoft or 3rd party products or services without your permission or instruction.
Your fine-tuned Azure OpenAI models are available exclusively for your use.

The Azure OpenAI Service is operated by Microsoft as an Azure service; Microsoft hosts the OpenAI models in Microsoft’s Azure environment and the Service does NOT interact with any services operated by OpenAI (e.g. ChatGPT, or the OpenAI API).

What data does the Azure OpenAI Service process?

Azure OpenAI processes the following types of data:

Prompts and generated content. Prompts are submitted by the user, and content is generated by the service, via the completions, chat completions, images, and embeddings operations.
Uploaded data. You can provide your own data for use with certain service features (e.g., fine-tuning, assistants API, batch processing) using the Files API or vector store.
Data for stateful entities. When you use certain optional features of Azure OpenAI service, such as the Threads feature of the Assistants API, the service will create a data store to persist message history and other content, in accordance with how you configure the feature.
Augmented data included with or via prompts. When using data associated with stateful entities, the service retrieves relevant data from a configured data store and augments the prompt to produce generations that are grounded with your data. Prompts may also be augmented with data retrieved from a source included in the prompt itself, such as a URL.
Training & validation data. You can provide your own training data consisting of prompt-completion pairs for the purposes of fine-tuning an OpenAI model.

How does the Azure OpenAI Service process data?

The diagram below illustrates how your data is processed. This diagram covers several types of processing:

How the Azure OpenAI Service processes your prompts via inferencing to generate content (including when additional data from a designated data source is added to a prompt using Azure OpenAI on your data, Assistants, or batch processing).
How the Assistants feature stores data in connection with Messages, Threads, and Runs.
How the Batch feature processes your uploaded data.
How the Azure OpenAI Service creates a fine-tuned (custom) model with your uploaded data.
How the Azure OpenAI Service and Microsoft personnel analyze prompts and completions (text and image) for harmful content and for patterns suggesting the use of the service in a manner that violates the Code of Conduct or other applicable product terms.

As depicted in the diagram above, managed customers may apply to modify abuse monitoring.

Generating completions, images or embeddings through inferencing

Models (base or fine-tuned) deployed in your resource process your input prompts and generate responses with text, images, or embeddings. Customer interactions with the model are logically isolated and secured employing technical measures including but not limited to transport encryption of TLS1.2 or higher, compute security perimeter, tokenization of text, and exclusive access to allocated GPU memory. Prompts and completions are evaluated in real time for harmful content types and content generation is filtered based on configured thresholds. Learn more at Azure OpenAI Service content filtering.

Prompts and responses are processed within the customer-specified geography (unless you are using a Global deployment type), but may be processed between regions within the geography for operational purposes (including performance and capacity management). See below for information about location of processing when using a Global deployment type.

The models are stateless: no prompts or generations are stored in the model. Additionally, prompts and generations are not used to train, retrain, or improve the base models.

Understanding location of processing for "Global" deployment types

In addition to standard deployments, Azure OpenAI Service offers deployment options labelled as ‘Global.’ For any deployment type labeled as ‘Global,’ prompts and responses may be processed in any geography where the relevant Azure OpenAI model is deployed (learn more about region availability of models); any data stored at rest, such as uploaded data, is stored in the customer-designated geography. Only the location of processing is affected when a customer uses a Global deployment type in Azure OpenAI Service; Azure data processing and compliance commitments remain applicable.

Augmenting prompts to "ground" generated results "on your data"

The Azure OpenAI "on your data" feature lets you connect data sources to ground the generated results with your data. The data remains stored in the data source and location you designate; Azure OpenAI Service does not create a duplicate data store. When a user prompt is received, the service retrieves relevant data from the connected data source and augments the prompt. The model processes this augmented prompt and the generated content is returned as described above. Learn more about how to use the On Your Data feature securely.

Data storage for Azure OpenAI Service features

Some Azure OpenAI Service features store data in the service. This data is either uploaded by the customer, using the Files API or vector store, or is automatically stored in connection with certain stateful entities such as the Threads feature of the Assistants API. Data stored for Azure OpenAI Service features:

Is stored at rest in the Azure OpenAI resource in the customer’s Azure tenant, within the same geography as the Azure OpenAI resource;
Can be double encrypted at rest, by default with Microsoft's AES-256 encryption and optionally with a customer managed key (except preview features may not support customer managed keys);
Can be deleted by the customer at any time.

Stored data may be used with the following service features/capabilities:

Creating a customized (fine-tuned) model. Learn more about how fine-tuning works. Fine-tuned models are exclusively available to the customer whose data was used to create the fine-tuned model, are encrypted at rest (when not deployed for inferencing), and can be deleted by the customer at any time. Training data uploaded for fine-tuning is not used to train, retrain, or improve any Microsoft or 3rd party base models.
Batch processing (preview). Learn more about how batch processing works. Batch processing is a Global deployment type; data stored at rest remains in the designated Azure geography until processing capacity becomes available; processing may occur in any geography where the relevant Azure OpenAI model is deployed (learn more about region availability of models).
Assistants API (preview). Learn more about how the Assistants API works. Some features of Assistants, such as Threads, store message history and other content.

Preventing abuse and harmful content generation

To reduce the risk of harmful use of the Azure OpenAI Service, the Azure OpenAI Service includes both content filtering, safety evaluation of fine-tuned models, and abuse monitoring features. To learn more about content filtering, see Azure OpenAI Service content filtering. To learn more about safety evaluation, see safety evaluations of fine-tuned models. To learn more about abuse monitoring, see abuse monitoring.

Content filtering occurs synchronously as the service processes prompts to generate content as described above and here. No prompts or generated results are stored in the content classifier models, and prompts and results are not used to train, retrain, or improve the classifier models.

Safety evaluations of fine-tuned models evaluate a fine-tuned model for potentially harmful responses using Azure’s risk and safety metrics. Only the resulting assessment (deployable or not deployable) is logged by the service.

Azure OpenAI abuse monitoring detects and mitigates instances of recurring content and/or behaviors that suggest use of the service in a manner that may violate the code of conduct or other applicable product terms. To detect and mitigate abuse, Azure OpenAI stores all prompts and generated content securely for up to thirty (30) days. (No prompts or completions are stored if the customer is approved for and elects to configure abuse monitoring off, as described below.)

The data store where prompts and completions are stored is logically separated by customer resource (each request includes the resource ID of the customer’s Azure OpenAI resource). A separate data store is located in each geography in which the Azure OpenAI Service is available, and a customer’s prompts and generated content are stored in the Azure Geography where the customer’s Azure OpenAI service resource is deployed, within the Azure OpenAI service boundary. Human reviewers assessing potential abuse can access prompts and completions data only when that data has been flagged by the abuse monitoring system. The human reviewers are authorized Microsoft employees who access the data via point wise queries using request IDs, Secure Access Workstations (SAWs), and Just-In-Time (JIT) request approval granted by team managers. For Azure OpenAI Service deployed in the European Economic Area, the authorized Microsoft employees are located in the European Economic Area.

How can customers get an exemption from abuse monitoring and human review?

Some customers may want to use the Azure OpenAI Service for a use case that involves the processing of sensitive, highly confidential, or legally-regulated input data but where the likelihood of harmful outputs and/or misuse is low. These customers may conclude that they do not want or do not have the right to permit Microsoft to process such data for abuse detection, as described above, due to their internal policies or applicable legal regulations. To address these concerns, Microsoft allows customers who meet additional Limited Access criteria and attest to specific use cases to apply to modify Azure OpenAI abuse monitoring features by completing this form.

If Microsoft approves a customer's request to modify abuse monitoring, then Microsoft does not store any prompts and completions associated with the approved Azure subscription for which abuse monitoring is configured off. In this case, because no prompts and completions are stored at rest in the Service Results Store, the human review process is not possible and is not performed. See Abuse monitoring for more information.

How can a customer verify if data storage for abuse monitoring is off?

There are two ways for customers, once approved to turn off abuse monitoring, to verify that data storage for abuse monitoring has been turned off in their approved Azure subscription:

Using the Azure portal, or
Azure CLI (or any management API).

Note

The value of "false" for the "ContentLogging" attribute appears only if data storage for abuse monitoring is turned off. Otherwise, this property will not appear in either Azure portal or Azure CLI's output.

Prerequisites

Sign into Azure
Select the Azure Subscription which hosts the Azure OpenAI Service resource.
Navigate to the Overview page of the Azure OpenAI Service resource.

Using the Azure portal
Using the Azure CLI (or other management API)

Go to the resource Overview page
Click on the JSON view link on the top right corner as shown in the image below.

There will be a value in the Capabilities list called "ContentLogging" which will appear and be set to FALSE when logging for abuse monitoring is off.

{ 
    "name":"ContentLogging",
    "value":"false"
}

Execute the following command in Azure CLI to see the same JSON data as shown in the Azure portal above.

az cognitiveservices account show -n resource\_name -g resource \_group

To learn more about Microsoft's privacy and security commitments see the Microsoft Trust Center.

Change log

Date	Changes
4 September 2024	Added information (and revised existing text accordingly) about data processing for new features including Assistants API (preview), Batch (preview), and Global Deployments; revised language related to location of data processing, in accordance with Azure data residency principles; added information about data processing for safety evaluations of fine-tuned models; clarified commitments related to use of prompts and completions; minor revisions to improve clarity
23 June 2023	Added information about data processing for new Azure on your data feature; removed information about abuse monitoring which is now available at Azure OpenAI Service abuse monitoring. Added summary note. Updated and streamlined content and updated diagrams for additional clarity. added change log

Date

Changes

4 September 2024

Added information (and revised existing text accordingly) about data processing for new features including Assistants API (preview), Batch (preview), and Global Deployments; revised language related to location of data processing, in accordance with Azure data residency principles; added information about data processing for safety evaluations of fine-tuned models; clarified commitments related to use of prompts and completions; minor revisions to improve clarity

23 June 2023

Added information about data processing for new Azure on your data feature; removed information about abuse monitoring which is now available at Azure OpenAI Service abuse monitoring. Added summary note. Updated and streamlined content and updated diagrams for additional clarity. added change log

다음을 통해 공유