如何使用Microsoft網狀架構 Rest API 建立及更新 Spark 作業定義

發行項
10/15/2024

Microsoft Fabric Rest API 提供 Fabric 項目的 CRUD 作業服務端點。在本教學課程中，我們會逐步解說如何建立和更新Spark作業定義成品的端對端案例。涉及三個高階步驟：

建立具有一些初始狀態的Spark作業定義項目
上傳主要定義檔和其他 lib 檔案
使用主要定義檔和其他 lib 檔案的 OneLake URL 更新 Spark 作業定義項目

必要條件

需要Microsoft Entra 令牌才能存取網狀架構 Rest API。建議使用 MSAL 連結庫來取得令牌。如需詳細資訊，請參閱 MSAL 中的驗證流程支援。
需要儲存體令牌才能存取 OneLake API。如需詳細資訊，請參閱 MSAL for Python。

建立具有初始狀態的Spark作業定義項目

Microsoft網狀架構 Rest API 會定義網狀架構項目的 CRUD 作業統一端點。端點為 https://api.fabric.microsoft.com/v1/workspaces/{workspaceId}/items。

項目詳細資料是在要求本文內指定。以下是建立Spark作業定義項目的要求本文範例：

{
    "displayName": "SJDHelloWorld",
    "type": "SparkJobDefinition",
    "definition": {
        "format": "SparkJobDefinitionV1",
        "parts": [
            {
                "path": "SparkJobDefinitionV1.json",
                "payload":"eyJleGVjdXRhYmxlRmlsZSI6bnVsbCwiZGVmYXVsdExha2Vob3VzZUFydGlmYWN0SWQiOiIiLCJtYWluQ2xhc3MiOiIiLCJhZGRpdGlvbmFsTGFrZWhvdXNlSWRzIjpbXSwicmV0cnlQb2xpY3kiOm51bGwsImNvbW1hbmRMaW5lQXJndW1lbnRzIjoiIiwiYWRkaXRpb25hbExpYnJhcnlVcmlzIjpbXSwibGFuZ3VhZ2UiOiIiLCJlbnZpcm9ubWVudEFydGlmYWN0SWQiOm51bGx9",
                "payloadType": "InlineBase64"
            }
        ]
    }
}

在這裡範例中，Spark 作業定義項目的名稱為SJDHelloWorld。欄位 payload 是詳細設定的base64編碼內容，譯碼之後的內容為：

{
    "executableFile":null,
    "defaultLakehouseArtifactId":"",
    "mainClass":"",
    "additionalLakehouseIds":[],
    "retryPolicy":null,
    "commandLineArguments":"",
    "additionalLibraryUris":[],
    "language":"",
    "environmentArtifactId":null
}

以下是用來編碼和譯碼詳細設定的兩個協助程式函式：

import base64

def json_to_base64(json_data):
    # Serialize the JSON data to a string
    json_string = json.dumps(json_data)
    
    # Encode the JSON string as bytes
    json_bytes = json_string.encode('utf-8')
    
    # Encode the bytes as Base64
    base64_encoded = base64.b64encode(json_bytes).decode('utf-8')
    
    return base64_encoded

def base64_to_json(base64_data):
    # Decode the Base64-encoded string to bytes
    base64_bytes = base64_data.encode('utf-8')
    
    # Decode the bytes to a JSON string
    json_string = base64.b64decode(base64_bytes).decode('utf-8')
    
    # Deserialize the JSON string to a Python dictionary
    json_data = json.loads(json_string)
    
    return json_data

以下是建立 Spark 作業定義項目的代碼段：

import requests

bearerToken = "breadcrumb"; # replace this token with the real AAD token

headers = {
    "Authorization": f"Bearer {bearerToken}", 
    "Content-Type": "application/json"  # Set the content type based on your request
}

payload = "eyJleGVjdXRhYmxlRmlsZSI6bnVsbCwiZGVmYXVsdExha2Vob3VzZUFydGlmYWN0SWQiOiIiLCJtYWluQ2xhc3MiOiIiLCJhZGRpdGlvbmFsTGFrZWhvdXNlSWRzIjpbXSwicmV0cnlQb2xpY3kiOm51bGwsImNvbW1hbmRMaW5lQXJndW1lbnRzIjoiIiwiYWRkaXRpb25hbExpYnJhcnlVcmlzIjpbXSwibGFuZ3VhZ2UiOiIiLCJlbnZpcm9ubWVudEFydGlmYWN0SWQiOm51bGx9"

# Define the payload data for the POST request
payload_data = {
    "displayName": "SJDHelloWorld",
    "Type": "SparkJobDefinition",
    "definition": {
        "format": "SparkJobDefinitionV1",
        "parts": [
            {
                "path": "SparkJobDefinitionV1.json",
                "payload": payload,
                "payloadType": "InlineBase64"
            }
        ]
    }
}

# Make the POST request with Bearer authentication
sjdCreateUrl = f"https://api.fabric.microsoft.com//v1/workspaces/{workspaceId}/items"
response = requests.post(sjdCreateUrl, json=payload_data, headers=headers)

上傳主要定義檔和其他 lib 檔案

需要儲存體令牌，才能將檔案上傳至 OneLake。以下是取得儲存體令牌的協助程式函式：


import msal

def getOnelakeStorageToken():
    app = msal.PublicClientApplication(
        "{client id}", # this filed should be the client id 
        authority="https://login.microsoftonline.com/microsoft.com")

    result = app.acquire_token_interactive(scopes=["https://storage.azure.com/.default"])

    print(f"Successfully acquired AAD token with storage audience:{result['access_token']}")

    return result['access_token']

現在我們已建立Spark作業定義項目，使其可執行，我們需要設定主要定義檔和必要的屬性。上傳此 SJD 項目的檔案端點為https://onelake.dfs.fabric.microsoft.com/{workspaceId}/{sjdartifactid}。應該使用上一個步驟中的相同 “workspaceId”，可以在上一個步驟的回應本文中找到 “sjdartifactid” 的值。以下是設定主要定義檔的代碼段：

import requests

# three steps are required: create file, append file, flush file

onelakeEndPoint = "https://onelake.dfs.fabric.microsoft.com/workspaceId/sjdartifactid"; # replace the id of workspace and artifact with the right one
mainExecutableFile = "main.py"; # the name of the main executable file
mainSubFolder = "Main"; # the sub folder name of the main executable file. Don't change this value


onelakeRequestMainFileCreateUrl = f"{onelakeEndPoint}/{mainSubFolder}/{mainExecutableFile}?resource=file" # the url for creating the main executable file via the 'file' resource type
onelakePutRequestHeaders = {
    "Authorization": f"Bearer {onelakeStorageToken}", # the storage token can be achieved from the helper function above
}

onelakeCreateMainFileResponse = requests.put(onelakeRequestMainFileCreateUrl, headers=onelakePutRequestHeaders)
if onelakeCreateMainFileResponse.status_code == 201:
    # Request was successful
    print(f"Main File '{mainExecutableFile}' was successfully created in onelake.")

# with previous step, the main executable file is created in OneLake, now we need to append the content of the main executable file

appendPosition = 0;
appendAction = "append";

### Main File Append.
mainExecutableFileSizeInBytes = 83; # the size of the main executable file in bytes
onelakeRequestMainFileAppendUrl = f"{onelakeEndPoint}/{mainSubFolder}/{mainExecutableFile}?position={appendPosition}&action={appendAction}";
mainFileContents = "filename = 'Files/' + Constant.filename; tablename = 'Tables/' + Constant.tablename"; # the content of the main executable file, please replace this with the real content of the main executable file
mainExecutableFileSizeInBytes = 83; # the size of the main executable file in bytes, this value should match the size of the mainFileContents

onelakePatchRequestHeaders = {
    "Authorization": f"Bearer {onelakeStorageToken}",
    "Content-Type" : "text/plain"
}

onelakeAppendMainFileResponse = requests.patch(onelakeRequestMainFileAppendUrl, data = mainFileContents, headers=onelakePatchRequestHeaders)
if onelakeAppendMainFileResponse.status_code == 202:
    # Request was successful
    print(f"Successfully Accepted Main File '{mainExecutableFile}' append data.")

# with previous step, the content of the main executable file is appended to the file in OneLake, now we need to flush the file

flushAction = "flush";

### Main File flush
onelakeRequestMainFileFlushUrl = f"{onelakeEndPoint}/{mainSubFolder}/{mainExecutableFile}?position={mainExecutableFileSizeInBytes}&action={flushAction}"
print(onelakeRequestMainFileFlushUrl)
onelakeFlushMainFileResponse = requests.patch(onelakeRequestMainFileFlushUrl, headers=onelakePatchRequestHeaders)
if onelakeFlushMainFileResponse.status_code == 200:
    print(f"Successfully Flushed Main File '{mainExecutableFile}' contents.")
else:
    print(onelakeFlushMainFileResponse.json())

請遵循相同的程序，視需要上傳其他 lib 檔案。

使用主要定義檔和其他 lib 檔案的 OneLake URL 更新 Spark 作業定義項目

到目前為止，我們已建立具有一些初始狀態的Spark作業定義項目、上傳主要定義檔和其他 lib 檔案，最後一個步驟是更新 Spark 作業定義項目，以設定主要定義檔和其他 lib 檔案的 URL 屬性。用於更新 Spark 作業定義項目的端點為https://api.fabric.microsoft.com/v1/workspaces/{workspaceId}/items/{sjdartifactid}。應該使用先前步驟中的相同「workspaceId」和「sjdartifactid」。。以下是更新 Spark 作業定義項目的代碼段：


mainAbfssPath = f"abfss://{workspaceId}@onelake.dfs.fabric.microsoft.com/{sjdartifactid}/Main/{mainExecutableFile}" # the workspaceId and sjdartifactid are the same as previous steps, the mainExecutableFile is the name of the main executable file
libsAbfssPath = f"abfss://{workspaceId}@onelake.dfs.fabric.microsoft.com/{sjdartifactid}/Libs/{libsFile}"  # the workspaceId and sjdartifactid are the same as previous steps, the libsFile is the name of the libs file
defaultLakehouseId = 'defaultLakehouseid'; # replace this with the real default lakehouse id

updateRequestBodyJson = {
    "executableFile":mainAbfssPath,
    "defaultLakehouseArtifactId":defaultLakehouseId,
    "mainClass":"",
    "additionalLakehouseIds":[],
    "retryPolicy":None,
    "commandLineArguments":"",
    "additionalLibraryUris":[libsAbfssPath],
    "language":"Python",
    "environmentArtifactId":None}

# Encode the bytes as a Base64-encoded string
base64EncodedUpdateSJDPayload = json_to_base64(updateRequestBodyJson)

# Print the Base64-encoded string
print("Base64-encoded JSON payload for SJD Update:")
print(base64EncodedUpdateSJDPayload)

# Define the API URL
updateSjdUrl = f"https://api.fabric.microsoft.com//v1/workspaces/{workspaceId}/items/{sjdartifactid}/updateDefinition"

updatePayload = base64EncodedUpdateSJDPayload
payloadType = "InlineBase64"
path = "SparkJobDefinitionV1.json"
format = "SparkJobDefinitionV1"
Type = "SparkJobDefinition"

# Define the headers with Bearer authentication
bearerToken = "breadcrumb"; # replace this token with the real AAD token

headers = {
    "Authorization": f"Bearer {bearerToken}", 
    "Content-Type": "application/json"  # Set the content type based on your request
}

# Define the payload data for the POST request
payload_data = {
    "displayName": "sjdCreateTest11",
    "Type": Type,
    "definition": {
        "format": format,
        "parts": [
            {
                "path": path,
                "payload": updatePayload,
                "payloadType": payloadType
            }
        ]
    }
}


# Make the POST request with Bearer authentication
response = requests.post(updateSjdUrl, json=payload_data, headers=headers)
if response.status_code == 200:
    print("Successfully updated SJD.")
else:
    print(response.json())
    print(response.status_code)

若要回顧整個程式，需要網狀架構 REST API 和 OneLake API 來建立和更新 Spark 作業定義項目。網狀架構 REST API 可用來建立和更新 Spark 作業定義項目，OneLake API 是用來上傳主要定義檔和其他 lib 檔案。主要定義檔和其他 lib 檔案會先上傳至 OneLake。然後，主要定義檔和其他 lib 檔案的 URL 屬性會設定在 Spark 作業定義項目中。

排程並執行 Apache Spark 作業定義

共用方式為

如何使用Microsoft網狀架構 Rest API 建立及更新 Spark 作業定義

必要條件

建立具有初始狀態的Spark作業定義項目

上傳主要定義檔和其他 lib 檔案

使用主要定義檔和其他 lib 檔案的 OneLake URL 更新 Spark 作業定義項目

意見反應

其他資源

共用方式為

如何使用Microsoft網狀架構 Rest API 建立及更新 Spark 作業定義

必要條件

建立具有初始狀態的Spark作業定義項目

上傳主要定義檔和其他 lib 檔案

使用主要定義檔和其他 lib 檔案的 OneLake URL 更新 Spark 作業定義項目

相關內容

意見反應

其他資源