Datasets - Create
Uploads and creates a new dataset by getting the data from a specified URL or starts waiting for data blocks to be uploaded.
POST {endpoint}/speechtotext/v3.2-preview.2/datasets
URI Parameters
Name | In | Required | Type | Description |
---|---|---|---|---|
endpoint
|
path | True |
string |
Supported Cognitive Services endpoints (protocol and hostname, for example: https://westus.api.cognitive.microsoft.com). |
Request Body
Name | Required | Type | Description |
---|---|---|---|
displayName | True |
string |
The display name of the object. |
kind | True |
DatasetKind |
|
locale | True |
string |
The locale of the contained data. |
contentUrl |
string |
The URL of the data for the dataset. |
|
customProperties |
object |
The custom properties of this entity. The maximum allowed key length is 64 characters, the maximum allowed value length is 256 characters and the count of allowed entries is 10. |
|
description |
string |
The description of the object. |
|
project |
EntityReference |
||
properties |
DatasetProperties |
Responses
Name | Type | Description |
---|---|---|
201 Created |
The response contains information about the entity as payload and its location as header. Headers Location: string |
|
Other Status Codes |
An error occurred. |
Security
Ocp-Apim-Subscription-Key
Provide your cognitive services account key here.
Type:
apiKey
In:
header
Authorization
Provide an access token from the JWT returned by the STS of this region. Make sure to add the management scope to the token by adding the following query string to the STS URL: ?scope=speechservicesmanagement
Type:
apiKey
In:
header
Examples
Create a dataset with content url |
Create dataset from data blocks |
Create a dataset with content url
Sample request
POST {endpoint}/speechtotext/v3.2-preview.2/datasets
{
"kind": "Acoustic",
"contentUrl": "https://contoso.com/location",
"locale": "en-US",
"displayName": "My speech dataset name",
"description": "My speech dataset description"
}
Sample response
Location: https://westus.api.cognitive.microsoft.com/speechtotext/v3.2-preview.2/datasets/9d5f4100-5f8e-4dd6-bd83-9bbbf50d57f1
{
"self": "https://westus.api.cognitive.microsoft.com/speechtotext/v3.2-preview.2/datasets/9d5f4100-5f8e-4dd6-bd83-9bbbf50d57f1",
"kind": "Acoustic",
"contentUrl": "https://www.contoso.com/acousticdata/sourcelocation",
"links": {
"files": "https://westus.api.cognitive.microsoft.com/speechtotext/v3.2-preview.2/datasets/9d5f4100-5f8e-4dd6-bd83-9bbbf50d57f1/files"
},
"properties": {
"textNormalizationKind": "Default",
"acceptedLineCount": 11,
"rejectedLineCount": 2,
"duration": "PT4M12S"
},
"lastActionDateTime": "2019-01-07T11:36:07Z",
"status": "Succeeded",
"createdDateTime": "2019-01-07T11:34:12Z",
"locale": "en-US",
"displayName": "Acoustic dataset"
}
Create dataset from data blocks
Sample request
POST {endpoint}/speechtotext/v3.2-preview.2/datasets
{
"kind": "Acoustic",
"locale": "en-US",
"displayName": "My speech dataset name",
"description": "My speech dataset description"
}
Sample response
{
"self": "https://westus.api.cognitive.microsoft.com/speechtotext/v3.2-preview.2/datasets/9d5f4100-5f8e-4dd6-bd83-9bbbf50d57f1",
"kind": "Acoustic",
"links": {
"files": "https://westus.api.cognitive.microsoft.com/speechtotext/v3.2-preview.2/datasets/9d5f4100-5f8e-4dd6-bd83-9bbbf50d57f1/files",
"commitBlocks": "https://westus.api.cognitive.microsoft.com/speechtotext/v3.2-preview.2/datasets/9d5f4100-5f8e-4dd6-bd83-9bbbf50d57f1/blocks:commit",
"listBlocks": "https://westus.api.cognitive.microsoft.com/speechtotext/v3.2-preview.2/datasets/9d5f4100-5f8e-4dd6-bd83-9bbbf50d57f1/blocks",
"uploadBlocks": "https://westus.api.cognitive.microsoft.com/speechtotext/v3.2-preview.2/datasets/9d5f4100-5f8e-4dd6-bd83-9bbbf50d57f1/blocks"
},
"lastActionDateTime": "2019-01-07T11:36:07Z",
"status": "NotStarted",
"createdDateTime": "2019-01-07T11:34:12Z",
"locale": "en-US",
"displayName": "Acoustic dataset"
}
Definitions
Name | Description |
---|---|
Dataset |
Dataset |
Dataset |
DatasetKind |
Dataset |
DatasetLinks |
Dataset |
DatasetProperties |
Detailed |
DetailedErrorCode |
Entity |
EntityError |
Entity |
EntityReference |
Error |
Error |
Error |
ErrorCode |
Inner |
InnerError |
Status |
Status |
Text |
TextNormalizationKind |
Dataset
Dataset
Name | Type | Description |
---|---|---|
contentUrl |
string |
The URL of the data for the dataset. |
createdDateTime |
string |
The time-stamp when the object was created. The time stamp is encoded as ISO 8601 date and time format ("YYYY-MM-DDThh:mm:ssZ", see https://en.wikipedia.org/wiki/ISO_8601#Combined_date_and_time_representations). |
customProperties |
object |
The custom properties of this entity. The maximum allowed key length is 64 characters, the maximum allowed value length is 256 characters and the count of allowed entries is 10. |
description |
string |
The description of the object. |
displayName |
string |
The display name of the object. |
kind |
DatasetKind |
|
lastActionDateTime |
string |
The time-stamp when the current status was entered. The time stamp is encoded as ISO 8601 date and time format ("YYYY-MM-DDThh:mm:ssZ", see https://en.wikipedia.org/wiki/ISO_8601#Combined_date_and_time_representations). |
links |
DatasetLinks |
|
locale |
string |
The locale of the contained data. |
project |
EntityReference |
|
properties |
DatasetProperties |
|
self |
string |
The location of this entity. |
status |
Status |
DatasetKind
DatasetKind
Name | Type | Description |
---|---|---|
Acoustic |
string |
An acoustic dataset. |
AudioFiles |
string |
An audio files dataset. |
Language |
string |
A language dataset. |
LanguageMarkdown |
string |
A language markdown dataset. |
OutputFormatting |
string |
Dataset that contains rules to customize inverse text normalization, capitalization, reformulation, profanity and also defines tests for dataset validation |
Pronunciation |
string |
A pronunciation dataset. |
DatasetLinks
DatasetLinks
Name | Type | Description |
---|---|---|
commitBlocks |
string |
The location to commit the list of blocks when uploading a dataset using blocks. See operation "Datasets_CommitBlocks" for more details. |
files |
string |
The location to get all files of this entity. See operation "Datasets_ListFiles" for more details. |
listBlocks |
string |
The location to list the already uploaded blocks of this entity when uploading a dataset using blocks. See operation "Datasets_GetBlocks" for more details. |
uploadBlocks |
string |
The location to upload blocks to when uploading a dataset using blocks. See operation "Datasets_UploadBlock" for more details. |
DatasetProperties
DatasetProperties
Name | Type | Description |
---|---|---|
acceptedLineCount |
integer |
The number of lines accepted for this data set. |
duration |
string |
The total duration of the datasets if it contains audio files. The duration is encoded as ISO 8601 duration ("PnYnMnDTnHnMnS", see https://en.wikipedia.org/wiki/ISO_8601#Durations). |
string |
The email address to send email notifications to in case the operation completes. The value will be removed after successfully sending the email. |
|
error |
EntityError |
|
rejectedLineCount |
integer |
The number of lines rejected for this data set. |
textNormalizationKind |
TextNormalizationKind |
DetailedErrorCode
DetailedErrorCode
Name | Type | Description |
---|---|---|
DataImportFailed |
string |
Data import failed. |
DeleteNotAllowed |
string |
Delete not allowed. |
DeployNotAllowed |
string |
Deploy not allowed. |
DeployingFailedModel |
string |
Deploying failed model. |
EmptyRequest |
string |
Empty Request. |
EndpointCannotBeDefault |
string |
Endpoint cannot be default. |
EndpointNotUpdatable |
string |
Endpoint not updatable. |
EndpointWithoutLogging |
string |
Endpoint without logging. |
ExceededNumberOfRecordingsUris |
string |
Exceeded number of recordings uris. |
FailedDataset |
string |
Failed dataset. |
Forbidden |
string |
Forbidden. |
InUseViolation |
string |
In use violation. |
InaccessibleCustomerStorage |
string |
Inaccessible customer storage. |
InvalidAdaptationMapping |
string |
Invalid adaptation mapping. |
InvalidBaseModel |
string |
Invalid base model. |
InvalidCallbackUri |
string |
Invalid callback uri. |
InvalidCollection |
string |
Invalid collection. |
InvalidDataset |
string |
Invalid dataset. |
InvalidDocument |
string |
Invalid Document. |
InvalidDocumentBatch |
string |
Invalid Document Batch. |
InvalidLocale |
string |
Invalid locale. |
InvalidLogDate |
string |
Invalid log date. |
InvalidLogEndTime |
string |
Invalid log end time. |
InvalidLogId |
string |
Invalid log id. |
InvalidLogStartTime |
string |
Invalid log start time. |
InvalidModel |
string |
Invalid model. |
InvalidModelUri |
string |
Invalid model uri. |
InvalidParameter |
string |
Invalid parameter. |
InvalidParameterValue |
string |
Invalid parameter value. |
InvalidPayload |
string |
Invalid payload. |
InvalidPermissions |
string |
Invalid permissions. |
InvalidPrerequisite |
string |
Invalid prerequisite. |
InvalidProductId |
string |
Invalid product id. |
InvalidProject |
string |
Invalid project. |
InvalidProjectKind |
string |
Invalid project kind. |
InvalidRecordingsUri |
string |
Invalid recordings uri. |
InvalidRequestBodyFormat |
string |
Invalid request body format. |
InvalidSasValidityDuration |
string |
Invalid sas validity duration. |
InvalidSkipTokenForLogs |
string |
Invalid skip token for logs. |
InvalidSourceAzureResourceId |
string |
Invalid source Azure resource ID. |
InvalidSubscription |
string |
Invalid subscription. |
InvalidTest |
string |
Invalid test. |
InvalidTimeToLive |
string |
Invalid time to live. |
InvalidTopForLogs |
string |
Invalid top for logs. |
InvalidTranscription |
string |
Invalid transcription. |
InvalidWebHookEventKind |
string |
Invalid web hook event kind. |
MissingInputRecords |
string |
Missing Input Records. |
ModelCopyOperationExists |
string |
Model copy operation exists. |
ModelDeploymentNotCompleteState |
string |
Model deployment not complete state. |
ModelDeprecated |
string |
Model deprecated. |
ModelExists |
string |
Model exists. |
ModelMismatch |
string |
Model mismatch. |
ModelNotDeployable |
string |
Model not deployable. |
ModelVersionIncorrect |
string |
Model Version Incorrect. |
NoUtf8WithBom |
string |
No utf8 with bom. |
OnlyOneOfUrlsOrContainerOrDataset |
string |
Only one of urls or container or dataset. |
ProjectGenderMismatch |
string |
Project gender mismatch. |
QuotaViolation |
string |
Quota violation. |
SingleDefaultEndpoint |
string |
Single default endpoint. |
SkuLimitsExist |
string |
Sku limits exist. |
SubscriptionNotFound |
string |
Subscription not found. |
UnexpectedError |
string |
Unexpected error. |
UnsupportedClassBasedAdaptation |
string |
Unsupported class based adaptation. |
UnsupportedDelta |
string |
Unsupported delta. |
UnsupportedDynamicConfiguration |
string |
Unsupported dynamic configuration. |
UnsupportedFilter |
string |
Unsupported filter. |
UnsupportedLanguageCode |
string |
Unsupported language code. |
UnsupportedOrderBy |
string |
Unsupported order by. |
UnsupportedPagination |
string |
Unsupported pagination. |
UnsupportedTimeRange |
string |
Unsupported time range. |
EntityError
EntityError
Name | Type | Description |
---|---|---|
code |
string |
The code of this error. |
message |
string |
The message for this error. |
EntityReference
EntityReference
Name | Type | Description |
---|---|---|
self |
string |
The location of the referenced entity. |
Error
Error
Name | Type | Description |
---|---|---|
code |
ErrorCode |
|
details |
Error[] |
Additional supportive details regarding the error and/or expected policies. |
innerError |
InnerError |
|
message |
string |
High level error message. |
target |
string |
The source of the error. For example it would be "documents" or "document id" in case of invalid document. |
ErrorCode
ErrorCode
Name | Type | Description |
---|---|---|
Conflict |
string |
Representing the conflict error code. |
Forbidden |
string |
Representing the forbidden error code. |
InternalCommunicationFailed |
string |
Representing the internal communication failed error code. |
InternalServerError |
string |
Representing the internal server error error code. |
InvalidArgument |
string |
Representing the invalid argument error code. |
InvalidRequest |
string |
Representing the invalid request error code. |
NotAllowed |
string |
Representing the not allowed error code. |
NotFound |
string |
Representing the not found error code. |
PipelineError |
string |
Representing the pipeline error error code. |
ServiceUnavailable |
string |
Representing the service unavailable error code. |
TooManyRequests |
string |
Representing the too many requests error code. |
Unauthorized |
string |
Representing the unauthorized error code. |
UnprocessableEntity |
string |
Representing the unprocessable entity error code. |
UnsupportedMediaType |
string |
Representing the unsupported media type error code. |
InnerError
InnerError
Name | Type | Description |
---|---|---|
code |
DetailedErrorCode |
|
details |
object |
Additional supportive details regarding the error and/or expected policies. |
innerError |
InnerError |
|
message |
string |
High level error message. |
target |
string |
The source of the error. For example it would be "documents" or "document id" in case of invalid document. |
Status
Status
Name | Type | Description |
---|---|---|
Failed |
string |
The long running operation has failed. |
NotStarted |
string |
The long running operation has not yet started. |
Running |
string |
The long running operation is currently processing. |
Succeeded |
string |
The long running operation has successfully completed. |
TextNormalizationKind
TextNormalizationKind
Name | Type | Description |
---|---|---|
Default |
string |
Default text normalization (e.g. '2 to 3' is replaced by 'two to three' in en-US). |
None |
string |
No text normalization will be applied to the input text. This is an override option that should only be used when text is normalized before the upload. |