Quickstart: Post-call transcription and analytics

09/12/2024

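Language service documentation | Language Studio | Speech service documentation | Speech Studio

In this C# quickstart, you perform sentiment analysis and conversation summarization on call center transcriptions. The sample automatically identifies, categorizes, and redacts sensitive information. The quickstart implements a cross-service scenario that uses features of the Azure Cognitive Speech and Azure Cognitive Language services.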

Tip

Try Language Studio or Speech Studio for a demonstration of how to use the Language and Speech services to analyze call center conversations.

To deploy a call center transcription solution to Azure with a no-code approach, try the Ingestion Client.

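The quickstart uses the following Azure AI services features for Speech:

Batch transcription: Submit a batch of audio files for transcription.
Speaker separation: Separate multiple speakers through diarization of mono 16 kHz 16-bit PCM WAV files.

The quickstart uses the following Language service features:

Personally identifiable information (PII) extraction and redaction: Identify, categorize, and redact sensitive information in conversation transcriptions.
Conversation summarization: Summarize in abstract text what each conversation participant said about the issues and resolutions. For example, a call center can group product issues that have a high volume.
Sentiment analysis and opinion mining: Analyze transcriptions and associate positive, neutral, or negative sentiment at the utterance and conversation level.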

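Prerequisites

An Azure subscription - create a free account.
Create a multi-service resource in the Azure portal. This quickstart requires only one Azure AI services multi-service resource. The sample code also lets you specify separate Language and Speech resource keys.
Get the resource key and region. After your Azure AI services resource is deployed, select Go to resource to view and manage keys.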

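Important

This quickstart requires access to conversation summarization. To get access, you must submit an online request and have it approved.

The --languageKey and --languageEndpoint values in this quickstart must correspond to a resource in one of the regions supported by the conversation summarization API: eastus, northeurope, or uksouth.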

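Run post-call transcription analysis with C#

Follow these steps to build and run the post-call transcription analysis quickstart code example.

Copy the scenarios/csharp/dotnetcore/call-center/ sample files from GitHub. If you have Git installed, open a command prompt and run the git clone command to download the Speech SDK samples repository.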
```
git clone https://github.com/Azure-Samples/cognitive-services-speech-sdk.git
```
Open a command prompt and change to the project directory.
```
cd <your-local-path>/scenarios/csharp/dotnetcore/call-center/call-center/
```
Build the project with the .NET CLI.
```
dotnet build
```
Run the application with your preferred command-line arguments. For the available options, see Usage and arguments.

Here's an example that transcribes from a sample audio file on GitHub:
```
dotnet run --languageKey YourResourceKey --languageEndpoint YourResourceEndpoint --speechKey YourResourceKey --speechRegion YourResourceRegion --input "https://github.com/Azure-Samples/cognitive-services-speech-sdk/raw/master/scenarios/call-center/sampledata/Call1_separated_16k_health_insurance.wav" --stereo  --output summary.json
```
If you already have a transcription to use as input, here's an example that requires only a Language resource:
```
dotnet run --languageKey YourResourceKey --languageEndpoint YourResourceEndpoint --jsonInput "YourTranscriptionFile.json" --stereo  --output summary.json
```
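Replace YourResourceKey with your Azure AI services resource key, YourResourceRegion with your Azure AI services resource region (such as eastus), and YourResourceEndpoint with your Azure AI services endpoint. Make sure that the paths specified by --input and --output are valid. Otherwise, you must change the paths.

Important

Remember to remove the key from your code when you're done, and never post it publicly. For production, use a secure way of storing and accessing your credentials, such as Azure Key Vault. For more information, see the Azure AI services security article.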
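
One way to keep keys out of scripts and shell history is to read them from the environment when you assemble the command. The following Python sketch is only an illustration of that idea: the environment variable names (LANGUAGE_KEY, LANGUAGE_ENDPOINT, SPEECH_KEY, SPEECH_REGION) and the helper name are hypothetical and not part of the sample, while the flags are the ones shown in the commands above.

```python
import os
import subprocess


def build_call_center_command(env, audio_url, output_path="summary.json"):
    """Assemble the `dotnet run` argument list from environment-style settings.

    The setting names are hypothetical; use whatever names your
    deployment defines.
    """
    required = ["LANGUAGE_KEY", "LANGUAGE_ENDPOINT", "SPEECH_KEY", "SPEECH_REGION"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("Missing settings: " + ", ".join(missing))
    return [
        "dotnet", "run",
        "--languageKey", env["LANGUAGE_KEY"],
        "--languageEndpoint", env["LANGUAGE_ENDPOINT"],
        "--speechKey", env["SPEECH_KEY"],
        "--speechRegion", env["SPEECH_REGION"],
        "--input", audio_url,
        "--stereo",
        "--output", output_path,
    ]


def run_from_environment(audio_url):
    """Run the sample from the project directory using os.environ."""
    command = build_call_center_command(os.environ, audio_url)
    # Passing a list (no shell) avoids quoting problems in keys and URLs.
    return subprocess.run(command, check=True)
```

Calling run_from_environment with the sample audio URL from the project directory would then run the same command as the first example, without the keys appearing in the script.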

Check results

The console output shows the full conversation and summary. Here's an example of the overall summary, with redactions for brevity:

Conversation summary:
    issue: Customer wants to sign up for insurance.
    resolution: Customer was advised that customer would be contacted by the insurance company.

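If you specify the optional --output FILE argument, a JSON version of the results is written to the file. The file output is a combination of the JSON responses from the batch transcription (Speech), sentiment (Language), and conversation summarization (Language) APIs.

The transcription property contains a JSON object with the results of sentiment analysis merged with batch transcription. Here's an example, with redactions for brevity: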

{
    "source": "https://github.com/Azure-Samples/cognitive-services-speech-sdk/raw/master/scenarios/call-center/sampledata/Call1_separated_16k_health_insurance.wav",
// Example results redacted for brevity
        "nBest": [
          {
            "confidence": 0.77464247,
            "lexical": "hello thank you for calling contoso who am i speaking with today",
            "itn": "hello thank you for calling contoso who am i speaking with today",
            "maskedITN": "hello thank you for calling contoso who am i speaking with today",
            "display": "Hello, thank you for calling Contoso. Who am I speaking with today?",
            "sentiment": {
              "positive": 0.78,
              "neutral": 0.21,
              "negative": 0.01
            }
          },
        ]
// Example results redacted for brevity
}
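
To turn the three sentiment scores into a single label, you can pick the highest-scoring one. The following Python sketch assumes you already have one parsed nBest entry shaped like the example above; the function name is hypothetical, and the path from the top of the file down to nBest is elided in the example, so it isn't reproduced here.

```python
def dominant_sentiment(phrase):
    """Return the sentiment label with the highest score for one nBest entry."""
    scores = phrase["sentiment"]
    return max(scores, key=scores.get)


# One nBest entry, copied from the example output above.
phrase = {
    "display": "Hello, thank you for calling Contoso. Who am I speaking with today?",
    "sentiment": {"positive": 0.78, "neutral": 0.21, "negative": 0.01},
}

print(dominant_sentiment(phrase))  # positive
```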

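The conversationAnalyticsResults property contains a JSON object with the results of the conversation PII and conversation summarization analysis. Here's an example, with redactions for brevity: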

{
  "conversationAnalyticsResults": {
    "conversationSummaryResults": {
      "conversations": [
        {
          "id": "conversation1",
          "summaries": [
            {
              "aspect": "issue",
              "text": "Customer wants to sign up for insurance"
            },
            {
              "aspect": "resolution",
              "text": "Customer was advised that customer would be contacted by the insurance company"
            }
          ],
          "warnings": []
        }
      ],
      "errors": [],
      "modelVersion": "2022-05-15-preview"
    },
    "conversationPiiResults": {
      "combinedRedactedContent": [
        {
          "channel": "0",
          "display": "Hello, thank you for calling Contoso. Who am I speaking with today? Hi, ****. Uh, are you calling because you need health insurance?", // Example results redacted for brevity
          "itn": "hello thank you for calling contoso who am i speaking with today hi **** uh are you calling because you need health insurance", // Example results redacted for brevity
          "lexical": "hello thank you for calling contoso who am i speaking with today hi **** uh are you calling because you need health insurance" // Example results redacted for brevity
        },
        {
          "channel": "1",
          "display": "Hi, my name is **********. I'm trying to enroll myself with Contoso. Yes. Yeah, I'm calling to sign up for insurance.", // Example results redacted for brevity
          "itn": "hi my name is ********** i'm trying to enroll myself with contoso yes yeah i'm calling to sign up for insurance", // Example results redacted for brevity
          "lexical": "hi my name is ********** i'm trying to enroll myself with contoso yes yeah i'm calling to sign up for insurance" // Example results redacted for brevity
        }
      ],
      "conversations": [
        {
          "id": "conversation1",
          "conversationItems": [
            {
              "id": "0",
              "redactedContent": {
                "itn": "hello thank you for calling contoso who am i speaking with today",
                "lexical": "hello thank you for calling contoso who am i speaking with today",
                "text": "Hello, thank you for calling Contoso. Who am I speaking with today?"
              },
              "entities": [],
              "channel": "0",
              "offset": "PT0.77S"
            },
            {
              "id": "1",
              "redactedContent": {
                "itn": "hi my name is ********** i'm trying to enroll myself with contoso",
                "lexical": "hi my name is ********** i'm trying to enroll myself with contoso",
                "text": "Hi, my name is **********. I'm trying to enroll myself with Contoso."
              },
              "entities": [
                {
                  "text": "Mary Rondo",
                  "category": "Name",
                  "offset": 15,
                  "length": 10,
                  "confidenceScore": 0.97
                }
              ],
              "channel": "1",
              "offset": "PT4.55S"
            },
            {
              "id": "2",
              "redactedContent": {
                "itn": "hi **** uh are you calling because you need health insurance",
                "lexical": "hi **** uh are you calling because you need health insurance",
                "text": "Hi, ****. Uh, are you calling because you need health insurance?"
              },
              "entities": [
                {
                  "text": "Mary",
                  "category": "Name",
                  "offset": 4,
                  "length": 4,
                  "confidenceScore": 0.93
                }
              ],
              "channel": "0",
              "offset": "PT9.55S"
            },
            {
              "id": "3",
              "redactedContent": {
                "itn": "yes yeah i'm calling to sign up for insurance",
                "lexical": "yes yeah i'm calling to sign up for insurance",
                "text": "Yes. Yeah, I'm calling to sign up for insurance."
              },
              "entities": [],
              "channel": "1",
              "offset": "PT13.09S"
            },
// Example results redacted for brevity
          ],
          "warnings": []
        }
      ]
    }
  }
}
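
The following Python sketch shows one way to read this structure once the output file has been parsed (for example, with json.load). It assumes the property names shown in the example above; the helper names are hypothetical. In the example, each entity's offset and length index into the redactedContent text field, and the sketch relies on that observation.

```python
def summaries_by_aspect(results):
    """Map each conversation id to its {aspect: text} summaries."""
    summary = results["conversationAnalyticsResults"]["conversationSummaryResults"]
    return {
        conv["id"]: {s["aspect"]: s["text"] for s in conv["summaries"]}
        for conv in summary["conversations"]
    }


def redacted_spans(item):
    """Return (category, masked text) for each entity in one conversation item."""
    text = item["redactedContent"]["text"]
    return [
        (e["category"], text[e["offset"]:e["offset"] + e["length"]])
        for e in item["entities"]
    ]


# A trimmed copy of the example output above.
sample = {
    "conversationAnalyticsResults": {
        "conversationSummaryResults": {
            "conversations": [{
                "id": "conversation1",
                "summaries": [
                    {"aspect": "issue",
                     "text": "Customer wants to sign up for insurance"},
                ],
            }],
        },
        "conversationPiiResults": {
            "conversations": [{
                "id": "conversation1",
                "conversationItems": [{
                    "id": "1",
                    "redactedContent": {
                        "text": "Hi, my name is **********. I'm trying to enroll myself with Contoso.",
                    },
                    "entities": [
                        {"text": "Mary Rondo", "category": "Name",
                         "offset": 15, "length": 10, "confidenceScore": 0.97},
                    ],
                }],
            }],
        },
    },
}

pii = sample["conversationAnalyticsResults"]["conversationPiiResults"]
item = pii["conversations"][0]["conversationItems"][0]

print(summaries_by_aspect(sample)["conversation1"]["issue"])
print(redacted_spans(item))  # [('Name', '**********')]
```

Note that each entity still carries the original value in its own text field, so treat the JSON output file as sensitive even though the transcript text is masked.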

Usage and arguments

Usage: call-center -- [...]

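Important

You can use a multi-service resource or separate Language and Speech resources. In either case, the --languageKey and --languageEndpoint values must correspond to a resource in one of the regions supported by the conversation summarization API: eastus, northeurope, or uksouth.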

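Connection options include:

--speechKey KEY: Your Azure AI services or Speech resource key. Required for audio transcription (when you use the --input URL option).
--speechRegion REGION: Your Azure AI services or Speech resource region. Required for audio transcription (when you use the --input URL option). Examples: eastus, northeurope
--languageKey KEY: Your Azure AI services or Language resource key. Required.
--languageEndpoint ENDPOINT: Your Azure AI services or Language resource endpoint. Required. Example: https://YourResourceName.cognitiveservices.azure.com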

Input options include:

--input URL: Input audio from a URL. You must set either the --input or the --jsonInput option.
--jsonInput FILE: Input an existing batch transcription JSON result from a file. With this option, you need only a Language resource to process a transcription that you already have, and you don't need an audio file or a Speech resource. Overrides --input. You must set either the --input or the --jsonInput option.
--stereo: Indicates that the audio from the --input URL is in stereo format. If stereo isn't specified, mono 16 kHz 16-bit PCM WAV files are assumed. Diarization of mono files is used to separate multiple speakers. Diarization of stereo files isn't supported, because 2-channel stereo files should already have one speaker per channel.
--certificate: The PEM certificate file. Required for C++.

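Language options include:

--language LANGUAGE: The language to use for sentiment analysis and conversation analysis. This value must be a two-letter ISO 639-1 code. The default value is en.
--locale LOCALE: The locale to use for batch transcription of audio. The default value is en-US.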

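Output options include:

--help: Show the usage help and stop.
--output FILE: Output the transcription, sentiment, conversation PII, and conversation summaries in JSON format to a text file. For more information, see the output examples.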
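
The rules in this section for which options are required can be checked before you launch the sample. The following Python sketch restates those documented rules as a small pre-flight check; it is not the sample's own argument parser, and the function name and the dictionary representation of the arguments are assumptions made for illustration.

```python
import re


def check_arguments(args):
    """Return a list of problems found in a dict of option name -> value."""
    problems = []

    # The Language key and endpoint are always required.
    for name in ("languageKey", "languageEndpoint"):
        if not args.get(name):
            problems.append(f"--{name} is required")

    # --jsonInput overrides --input, so audio is used only without it.
    uses_json = bool(args.get("jsonInput"))
    uses_audio = bool(args.get("input")) and not uses_json
    if not uses_json and not uses_audio:
        problems.append("set --input or --jsonInput")

    # Speech settings are needed only when audio is transcribed.
    if uses_audio:
        for name in ("speechKey", "speechRegion"):
            if not args.get(name):
                problems.append(f"--{name} is required with --input")

    # Shape check only: two lowercase letters, not a lookup in ISO 639-1.
    language = args.get("language", "en")
    if not re.fullmatch(r"[a-z]{2}", language):
        problems.append("--language must be a two-letter ISO 639-1 code")

    return problems


print(check_arguments({
    "languageKey": "YourResourceKey",
    "languageEndpoint": "YourResourceEndpoint",
    "jsonInput": "YourTranscriptionFile.json",
}))  # []
```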

Clean up resources

You can use the Azure portal or the Azure Command Line Interface (CLI) to remove the Azure AI services resource you created.

Next steps

Try the Ingestion Client
