具有 LLM 提示的 Azure AI 影片索引器

發行項
11/01/2024

概觀

Azure AI 影片索引器會與大型語言模型（LLM）整合。 LLM 是自然語言 AI 模型，可用來詢問有關影片內容的問題等等。將 Azure AI 影片索引器深入解析擷取為可供 LLM 輕鬆使用的提示就緒格式。不需要重新編製影片的索引，即可建立影片的提示就緒格式。

使用案例

產生影片摘要： 您可以要求 LLM 模型產生整個影片或視訊區段的摘要。您可以結合這些區段來建立數種類型的摘要，例如資訊摘要、取戲器或其他摘要，視您的需求而定。

可搜尋性： 藉由將視訊內容轉換成以文字為基礎的提示就緒格式，您可以在影片內容中執行詳細的自然語言搜尋。這可大幅改善根據特定查詢的大型影片媒體櫃內的可探索性。

內容建立：您可以在與特定情緒或事件相關聯的影片中查詢影片庫的特定時刻。例如，您可以從影片系列擷取「有趣」或「悲傷」的時刻，並使用該片段來建立促銷或反白顯示。同樣地，您可以擷取與特定感興趣的事件相關的時刻，例如“過去十年的地震”。

教育目的：從講座影片建立摘要，讓學生更容易檢閱和瞭解材料。學生也可以詢問與講座材料相關的特定問題。您可以參考影片中討論文章的確切部分，讓學習體驗更有效率。

互動式體驗：您可以建立互動式體驗，例如視訊型聊天機器人或虛擬助理，以根據影片的內容回應用戶查詢。

運作方式

若要讓輸出準備好提示，影片會分割成符合視訊本質和提示大小的連貫區段。這些區段會根據 Azure AI 影片索引器場景分割和其他見解來分割。提示內容的結果會個別合併併產生每個區段。例如：

深入解析

下表包含用於產生提示的深入解析。

VI 深入解析	標記和格式
影片標題	[影片標題] <影片標題>
物件偵測	[偵測到的物件] <object 1>， <object 2>， ...
標籤	[視覺標籤] <label 1>， <label 2>， ...
OCR	[OCR] <ocr cluster1><ocr cluster2> ...
文字記錄和演講者	[文字記錄] <說話者名稱>： <文字記錄行>\n<說話者名稱>： <文字記錄行>\n ...
臉部	[已知人員] <臉部 1>， <臉部 2>， ...
音訊效果（AED）	[音訊效果] < 效果 1>， <效果 2>， ...
區段在影片中的位置	[標記][開始，中間，結束，滾動點數]

建立影片的提示內容

使用索引影片上的提示內容 API，以取得每個區段的提示就緒格式。

注意

提示內容深入解析會受限於用來編製影片索引的特定預設。

若要產生提示內容 API，請使用 POST 建立提示內容要求。
若要檢視提示內容，請使用取得 PromptContent 要求。

範例要求

使用您的 AVI 帳戶標識碼和視訊標識碼。

POST https://api.videoindexer.ai/trial/Accounts/{accountId}/Videos/{videoId}/PromptContent

範例回應

index
{
  "algoVersion": "2.0.0",
  "schemaVersion": "0.0.1",
  "partition": null,
  "name": "10_best_dressed_grammy",
  "sections": [
    {
      "id": 0,
      "start": "0:00:00",
      "end": "0:00:40.915875",
      "content": "[Video title] 10_best_dressed_grammy\n[Detected objects] necktie\n[Visual labels] human face, clothing, person, woman, suit, wedding dress, dress, indoor, wall, carpet, rug, fashion, lady, long hair, fashion accessory, fashion design\n[OCR] TROPHy, LIFE, SPECIAL, EDITION, news FEED, BY

 CLEVVER, CLEVVER, @NazPerez, BEST DRESSED CELEBS AT 2018 GRAMMYS\n[Transcript] Check out the 10 best dressed celebs from the 2018 Grammy Awards and don't forget to subscribe to our channel to get all the latest celebrity updates.\nFrom white roses to white hot looks, this year's Grammy Awards was a feast of fashion thanks to so many celebs bringing their A game to the show.\nSo let's kick off this list of the best dress from the red carpet, starting with Lady Gaga.\nGaga looked like a gothic Princess in her dramatic all black ball gown.\nThe Armani Preve dress featured A Lacy bodysuit and billowing black skirt with a huge train.\nAga's black heeled boots were also some of the highest we've ever seen, like ever, but we wouldn't expect anything less from Mama Monster.\nAnother look we love from the carpet was Anna Kendrick's sexy suit by Belmont."
    },
    {
      "id": 1,
      "start": "0:00:40.915875",
      "end": "0:01:17.202125",
      "content": "[Video title] 10_best_dressed_grammy\n[Detected objects] remote\n[Visual labels] human face, clothing, person, dress, carpet, rug, fashion, lady, furniture, female person, fashion model, model, haute couture, smile\n[OCR] TROPHy, LIFE, news FEED, BEST DRESSED CELEBS AT 2018 GRAMMYS, D CELEBS AT 2018 GRAMMYS, BEST DRESSED\n[Transcript] Anna gave the structured look a sexy feminine touch by wearing a Lacy strapless top underneath and some pale pink stilettos.\nHer suit may have said business, but her relaxed WAVY hairstyle said I came to get down.\nNext on our list is the literally red hot Camila Cabello.\nCamila was all glitzing glam in her strapless Vivian Westwood gown.\nThat humped her curves perfectly.\nCamila opted to wear her hair up and accessorized with some serious bling, but it's that plunging neckline that has this unable to look away.\nAnother look we loved came courtesy of Miley Cyrus, who absolutely slayed in this black velvet bodysuit.\nMiley looked beyond chic, from her classic Hollywood hairstyle to her glitter heels."
    },
}

檢查作業狀態

提示作業需要幾分鐘的時間才能完成。如果您想要檢查作業狀態，您可以使用取得作業狀態要求。

使用主要畫面格以可視化方式提示大型語言模型

提示內容要求支援可在提示中使用視覺輸入的語言模型。選取 GPT-4V 模型時，您可以在提供給模型的提示中包含主要畫面格。提示內容回應中傳回的畫面格代表影片的主要畫面格。對於影片中具有有限或沒有文字記錄的影片，或想要為語言模型提供更多內容以改善結果時，建議使用此功能。

建立並傳送提示內容要求

如上所述，提示的文字內容位於 JSON 回應中。 JSON 回應中「框架」部分的每個字串都是主要畫面格的標識碼。使用取得視訊縮圖 ThumbnailId 是來自提示內容的 FrameId。一旦您同時擁有文字內容和主要畫面格成品，就可以將它們合併為您選擇的 AI 模型提示。

限制

提示功能已針對包含盡可能多見解的影片進行優化。

共用方式為