如何使用Microsoft網狀架構 Rest API 建立及更新 Spark 作業定義
Microsoft Fabric Rest API 提供 Fabric 項目的 CRUD 作業服務端點。 在本教學課程中,我們會逐步解說如何建立和更新Spark作業定義成品的端對端案例。 涉及三個高階步驟:
- 建立具有一些初始狀態的Spark作業定義項目
- 上傳主要定義檔和其他 lib 檔案
- 使用主要定義檔和其他 lib 檔案的 OneLake URL 更新 Spark 作業定義項目
必要條件
- 需要Microsoft Entra 令牌才能存取網狀架構 Rest API。 建議使用 MSAL 連結庫來取得令牌。 如需詳細資訊,請參閱 MSAL 中的驗證流程支援。
- 需要儲存體令牌才能存取 OneLake API。 如需詳細資訊,請參閱 MSAL for Python。
建立具有初始狀態的Spark作業定義項目
Microsoft網狀架構 Rest API 會定義網狀架構項目的 CRUD 作業統一端點。 端點為 https://api.fabric.microsoft.com/v1/workspaces/{workspaceId}/items
。
項目詳細資料是在要求本文內指定。 以下是建立Spark作業定義項目的要求本文範例:
{
"displayName": "SJDHelloWorld",
"type": "SparkJobDefinition",
"definition": {
"format": "SparkJobDefinitionV1",
"parts": [
{
"path": "SparkJobDefinitionV1.json",
"payload":"eyJleGVjdXRhYmxlRmlsZSI6bnVsbCwiZGVmYXVsdExha2Vob3VzZUFydGlmYWN0SWQiOiIiLCJtYWluQ2xhc3MiOiIiLCJhZGRpdGlvbmFsTGFrZWhvdXNlSWRzIjpbXSwicmV0cnlQb2xpY3kiOm51bGwsImNvbW1hbmRMaW5lQXJndW1lbnRzIjoiIiwiYWRkaXRpb25hbExpYnJhcnlVcmlzIjpbXSwibGFuZ3VhZ2UiOiIiLCJlbnZpcm9ubWVudEFydGlmYWN0SWQiOm51bGx9",
"payloadType": "InlineBase64"
}
]
}
}
在這裡範例中,Spark 作業定義項目的名稱為SJDHelloWorld
。 欄位 payload
是詳細設定的base64編碼內容,譯碼之後的內容為:
{
"executableFile":null,
"defaultLakehouseArtifactId":"",
"mainClass":"",
"additionalLakehouseIds":[],
"retryPolicy":null,
"commandLineArguments":"",
"additionalLibraryUris":[],
"language":"",
"environmentArtifactId":null
}
以下是用來編碼和譯碼詳細設定的兩個協助程式函式:
import base64
def json_to_base64(json_data):
# Serialize the JSON data to a string
json_string = json.dumps(json_data)
# Encode the JSON string as bytes
json_bytes = json_string.encode('utf-8')
# Encode the bytes as Base64
base64_encoded = base64.b64encode(json_bytes).decode('utf-8')
return base64_encoded
def base64_to_json(base64_data):
# Decode the Base64-encoded string to bytes
base64_bytes = base64_data.encode('utf-8')
# Decode the bytes to a JSON string
json_string = base64.b64decode(base64_bytes).decode('utf-8')
# Deserialize the JSON string to a Python dictionary
json_data = json.loads(json_string)
return json_data
以下是建立 Spark 作業定義項目的代碼段:
import requests
bearerToken = "breadcrumb"; # replace this token with the real AAD token
headers = {
"Authorization": f"Bearer {bearerToken}",
"Content-Type": "application/json" # Set the content type based on your request
}
payload = "eyJleGVjdXRhYmxlRmlsZSI6bnVsbCwiZGVmYXVsdExha2Vob3VzZUFydGlmYWN0SWQiOiIiLCJtYWluQ2xhc3MiOiIiLCJhZGRpdGlvbmFsTGFrZWhvdXNlSWRzIjpbXSwicmV0cnlQb2xpY3kiOm51bGwsImNvbW1hbmRMaW5lQXJndW1lbnRzIjoiIiwiYWRkaXRpb25hbExpYnJhcnlVcmlzIjpbXSwibGFuZ3VhZ2UiOiIiLCJlbnZpcm9ubWVudEFydGlmYWN0SWQiOm51bGx9"
# Define the payload data for the POST request
payload_data = {
"displayName": "SJDHelloWorld",
"Type": "SparkJobDefinition",
"definition": {
"format": "SparkJobDefinitionV1",
"parts": [
{
"path": "SparkJobDefinitionV1.json",
"payload": payload,
"payloadType": "InlineBase64"
}
]
}
}
# Make the POST request with Bearer authentication
sjdCreateUrl = f"https://api.fabric.microsoft.com//v1/workspaces/{workspaceId}/items"
response = requests.post(sjdCreateUrl, json=payload_data, headers=headers)
上傳主要定義檔和其他 lib 檔案
需要儲存體令牌,才能將檔案上傳至 OneLake。 以下是取得儲存體令牌的協助程式函式:
import msal
def getOnelakeStorageToken():
app = msal.PublicClientApplication(
"{client id}", # this filed should be the client id
authority="https://login.microsoftonline.com/microsoft.com")
result = app.acquire_token_interactive(scopes=["https://storage.azure.com/.default"])
print(f"Successfully acquired AAD token with storage audience:{result['access_token']}")
return result['access_token']
現在我們已建立Spark作業定義項目,使其可執行,我們需要設定主要定義檔和必要的屬性。 上傳此 SJD 項目的檔案端點為https://onelake.dfs.fabric.microsoft.com/{workspaceId}/{sjdartifactid}
。 應該使用上一個步驟中的相同 “workspaceId”,可以在上一個步驟的回應本文中找到 “sjdartifactid” 的值。 以下是設定主要定義檔的代碼段:
import requests
# three steps are required: create file, append file, flush file
onelakeEndPoint = "https://onelake.dfs.fabric.microsoft.com/workspaceId/sjdartifactid"; # replace the id of workspace and artifact with the right one
mainExecutableFile = "main.py"; # the name of the main executable file
mainSubFolder = "Main"; # the sub folder name of the main executable file. Don't change this value
onelakeRequestMainFileCreateUrl = f"{onelakeEndPoint}/{mainSubFolder}/{mainExecutableFile}?resource=file" # the url for creating the main executable file via the 'file' resource type
onelakePutRequestHeaders = {
"Authorization": f"Bearer {onelakeStorageToken}", # the storage token can be achieved from the helper function above
}
onelakeCreateMainFileResponse = requests.put(onelakeRequestMainFileCreateUrl, headers=onelakePutRequestHeaders)
if onelakeCreateMainFileResponse.status_code == 201:
# Request was successful
print(f"Main File '{mainExecutableFile}' was successfully created in onelake.")
# with previous step, the main executable file is created in OneLake, now we need to append the content of the main executable file
appendPosition = 0;
appendAction = "append";
### Main File Append.
mainExecutableFileSizeInBytes = 83; # the size of the main executable file in bytes
onelakeRequestMainFileAppendUrl = f"{onelakeEndPoint}/{mainSubFolder}/{mainExecutableFile}?position={appendPosition}&action={appendAction}";
mainFileContents = "filename = 'Files/' + Constant.filename; tablename = 'Tables/' + Constant.tablename"; # the content of the main executable file, please replace this with the real content of the main executable file
mainExecutableFileSizeInBytes = 83; # the size of the main executable file in bytes, this value should match the size of the mainFileContents
onelakePatchRequestHeaders = {
"Authorization": f"Bearer {onelakeStorageToken}",
"Content-Type" : "text/plain"
}
onelakeAppendMainFileResponse = requests.patch(onelakeRequestMainFileAppendUrl, data = mainFileContents, headers=onelakePatchRequestHeaders)
if onelakeAppendMainFileResponse.status_code == 202:
# Request was successful
print(f"Successfully Accepted Main File '{mainExecutableFile}' append data.")
# with previous step, the content of the main executable file is appended to the file in OneLake, now we need to flush the file
flushAction = "flush";
### Main File flush
onelakeRequestMainFileFlushUrl = f"{onelakeEndPoint}/{mainSubFolder}/{mainExecutableFile}?position={mainExecutableFileSizeInBytes}&action={flushAction}"
print(onelakeRequestMainFileFlushUrl)
onelakeFlushMainFileResponse = requests.patch(onelakeRequestMainFileFlushUrl, headers=onelakePatchRequestHeaders)
if onelakeFlushMainFileResponse.status_code == 200:
print(f"Successfully Flushed Main File '{mainExecutableFile}' contents.")
else:
print(onelakeFlushMainFileResponse.json())
請遵循相同的程序,視需要上傳其他 lib 檔案。
使用主要定義檔和其他 lib 檔案的 OneLake URL 更新 Spark 作業定義項目
到目前為止,我們已建立具有一些初始狀態的Spark作業定義項目、上傳主要定義檔和其他 lib 檔案,最後一個步驟是更新 Spark 作業定義項目,以設定主要定義檔和其他 lib 檔案的 URL 屬性。 用於更新 Spark 作業定義項目的端點為https://api.fabric.microsoft.com/v1/workspaces/{workspaceId}/items/{sjdartifactid}
。 應該使用先前步驟中的相同 「workspaceId」 和 「sjdartifactid」。。 以下是更新 Spark 作業定義項目的代碼段:
mainAbfssPath = f"abfss://{workspaceId}@onelake.dfs.fabric.microsoft.com/{sjdartifactid}/Main/{mainExecutableFile}" # the workspaceId and sjdartifactid are the same as previous steps, the mainExecutableFile is the name of the main executable file
libsAbfssPath = f"abfss://{workspaceId}@onelake.dfs.fabric.microsoft.com/{sjdartifactid}/Libs/{libsFile}" # the workspaceId and sjdartifactid are the same as previous steps, the libsFile is the name of the libs file
defaultLakehouseId = 'defaultLakehouseid'; # replace this with the real default lakehouse id
updateRequestBodyJson = {
"executableFile":mainAbfssPath,
"defaultLakehouseArtifactId":defaultLakehouseId,
"mainClass":"",
"additionalLakehouseIds":[],
"retryPolicy":None,
"commandLineArguments":"",
"additionalLibraryUris":[libsAbfssPath],
"language":"Python",
"environmentArtifactId":None}
# Encode the bytes as a Base64-encoded string
base64EncodedUpdateSJDPayload = json_to_base64(updateRequestBodyJson)
# Print the Base64-encoded string
print("Base64-encoded JSON payload for SJD Update:")
print(base64EncodedUpdateSJDPayload)
# Define the API URL
updateSjdUrl = f"https://api.fabric.microsoft.com//v1/workspaces/{workspaceId}/items/{sjdartifactid}/updateDefinition"
updatePayload = base64EncodedUpdateSJDPayload
payloadType = "InlineBase64"
path = "SparkJobDefinitionV1.json"
format = "SparkJobDefinitionV1"
Type = "SparkJobDefinition"
# Define the headers with Bearer authentication
bearerToken = "breadcrumb"; # replace this token with the real AAD token
headers = {
"Authorization": f"Bearer {bearerToken}",
"Content-Type": "application/json" # Set the content type based on your request
}
# Define the payload data for the POST request
payload_data = {
"displayName": "sjdCreateTest11",
"Type": Type,
"definition": {
"format": format,
"parts": [
{
"path": path,
"payload": updatePayload,
"payloadType": payloadType
}
]
}
}
# Make the POST request with Bearer authentication
response = requests.post(updateSjdUrl, json=payload_data, headers=headers)
if response.status_code == 200:
print("Successfully updated SJD.")
else:
print(response.json())
print(response.status_code)
若要回顧整個程式,需要網狀架構 REST API 和 OneLake API 來建立和更新 Spark 作業定義項目。 網狀架構 REST API 可用來建立和更新 Spark 作業定義項目,OneLake API 是用來上傳主要定義檔和其他 lib 檔案。 主要定義檔和其他 lib 檔案會先上傳至 OneLake。 然後,主要定義檔和其他 lib 檔案的 URL 屬性會設定在 Spark 作業定義項目中。