Spark Class
Base class for Spark node, used for Spark component version consumption.
You should not instantiate this class directly. Instead, create it using the builder function spark.
Inheritance
- azure.ai.ml.entities._builders.base_node.BaseNode → Spark
- azure.ai.ml.entities._job.spark_job_entry_mixin.SparkJobEntryMixin → Spark
Constructor
Spark(*, component: str | SparkComponent, identity: Dict | ManagedIdentityConfiguration | AmlTokenConfiguration | UserIdentityConfiguration | None = None, driver_cores: int | str | None = None, driver_memory: str | None = None, executor_cores: int | str | None = None, executor_memory: str | None = None, executor_instances: int | str | None = None, dynamic_allocation_enabled: bool | str | None = None, dynamic_allocation_min_executors: int | str | None = None, dynamic_allocation_max_executors: int | str | None = None, conf: Dict[str, str] | None = None, inputs: Dict[str, NodeOutput | Input | str | bool | int | float | Enum] | None = None, outputs: Dict[str, str | Output] | None = None, compute: str | None = None, resources: Dict | SparkResourceConfiguration | None = None, entry: Dict[str, str] | SparkJobEntry | None = None, py_files: List[str] | None = None, jars: List[str] | None = None, files: List[str] | None = None, archives: List[str] | None = None, args: str | None = None, **kwargs: Any)
Parameters
Name | Description |
---|---|
component (Required) | The ID or instance of the Spark component or job to be run during the step. |
identity (Required) | Union[Dict[str, str], ManagedIdentityConfiguration, AmlTokenConfiguration, UserIdentityConfiguration]. The identity that the Spark job will use while running on compute. |
driver_cores (Required) | The number of cores to use for the driver process, only in cluster mode. |
driver_memory (Required) | The amount of memory to use for the driver process, formatted as a string with a size unit suffix ("k", "m", "g" or "t"), e.g. "512m", "2g". |
executor_cores (Required) | The number of cores to use on each executor. |
executor_memory (Required) | The amount of memory to use per executor process, formatted as a string with a size unit suffix ("k", "m", "g" or "t"), e.g. "512m", "2g". |
executor_instances (Required) | The initial number of executors. |
dynamic_allocation_enabled (Required) | Whether to use dynamic resource allocation, which scales the number of executors registered with this application up and down based on the workload. |
dynamic_allocation_min_executors (Required) | The lower bound for the number of executors if dynamic allocation is enabled. |
dynamic_allocation_max_executors (Required) | The upper bound for the number of executors if dynamic allocation is enabled. |
conf (Required) | A dictionary of pre-defined Spark configuration keys and values. |
inputs (Required) | Dict[str, Union[str, bool, int, float, Enum, NodeOutput, Input]]. A mapping of input names to input data sources used in the job. |
outputs (Required) | A mapping of output names to output data sources used in the job. |
args (Required) | The arguments for the job. |
compute (Required) | The compute resource the job runs on. |
resources (Required) | The compute resource configuration for the job. |
entry (Required) | The file or class entry point. |
py_files (Required) | The list of .zip, .egg or .py files to place on the PYTHONPATH for Python apps. |
jars (Required) | The list of .jar files to include on the driver and executor classpaths. |
files (Required) | The list of files to be placed in the working directory of each executor. |
archives (Required) | The list of archives to be extracted into the working directory of each executor. |
Keyword-Only Parameters
All of the constructor parameters listed in the Parameters table above are keyword-only.
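As noted above, these parameters are normally supplied through the spark builder function rather than by instantiating Spark directly. The following is a minimal sketch of that pattern; the entry script, data paths, instance type, and runtime version are illustrative assumptions, not values taken from this reference.

```python
# A minimal sketch, assuming the azure-ai-ml package is installed.
# File names, paths, instance type, and runtime version are placeholders.
from azure.ai.ml import spark, Input, Output

spark_step = spark(
    code="./src",                         # folder containing the entry script (assumed layout)
    entry={"file": "wrangle.py"},         # hypothetical entry file
    driver_cores=1,
    driver_memory="2g",
    executor_cores=2,
    executor_memory="2g",
    executor_instances=2,
    dynamic_allocation_enabled=True,
    dynamic_allocation_min_executors=1,
    dynamic_allocation_max_executors=3,
    inputs={
        "raw_data": Input(type="uri_file", path="./data/raw.csv", mode="direct"),
    },
    outputs={
        "wrangled_data": Output(type="uri_folder", mode="direct"),
    },
    args="--raw_data ${{inputs.raw_data}} --wrangled_data ${{outputs.wrangled_data}}",
    # Assumed serverless Spark compute settings; adjust to your workspace.
    resources={"instance_type": "standard_e4s_v3", "runtime_version": "3.4"},
)
```

The returned object is a Spark node that can be used as a step inside a pipeline.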
Methods
Name | Description |
---|---|
clear | |
copy | |
dump | Dumps the job content into a file in YAML format. |
fromkeys | Create a new dictionary with keys from iterable and values set to value. |
get | Return the value for key if key is in the dictionary, else default. |
items | |
keys | |
pop | If the key is not found, return the default if given; otherwise, raise a KeyError. |
popitem | Remove and return a (key, value) pair as a 2-tuple. Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty. |
setdefault | Insert key with a value of default if key is not in the dictionary. Return the value for key if key is in the dictionary, else default. |
update | If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]. If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v. In either case, this is followed by: for k in F: D[k] = F[k]. |
values | |
clear
clear() -> None. Remove all items from D.
copy
copy() -> a shallow copy of D
dump
Dumps the job content into a file in YAML format.
dump(dest: str | PathLike | IO, **kwargs: Any) -> None
Parameters
Name | Description |
---|---|
dest (Required) | The local path or file stream to write the YAML content to. If dest is a file path, a new file will be created. If dest is an open file, the file will be written to directly. |
Exceptions
Type | Description |
---|---|
Raised if dest is a file path and the file already exists. |
Raised if dest is an open file and the file is not writable. |
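As a brief sketch (reusing the hypothetical spark_step node from the earlier example), dumping the node's YAML representation might look like this; note that a file path destination must not already exist:

```python
# Illustrative only: "spark_step" is the hypothetical node built above.
# Writing to a path that already exists raises, so use a fresh file name.
spark_step.dump("spark_step.yml")

# Alternatively, pass an already-open, writable file object.
with open("spark_step_copy.yml", "w") as stream:
    spark_step.dump(stream)
```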
fromkeys
Create a new dictionary with keys from iterable and values set to value.
fromkeys(iterable, value=None, /)
Positional-Only Parameters
Name | Description |
---|---|
iterable (Required) | |
value | Default value: None |
get
Return the value for key if key is in the dictionary, else default.
get(key, default=None, /)
Positional-Only Parameters
Name | Description |
---|---|
key (Required) | |
default | Default value: None |
items
items() -> a set-like object providing a view on D's items
keys
keys() -> a set-like object providing a view on D's keys
pop
If the key is not found, return the default if given; otherwise, raise a KeyError.
pop(k, [d]) -> v, remove specified key and return the corresponding value.
popitem
Remove and return a (key, value) pair as a 2-tuple.
Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.
popitem()
setdefault
Insert key with a value of default if key is not in the dictionary.
Return the value for key if key is in the dictionary, else default.
setdefault(key, default=None, /)
Positional-Only Parameters
Name | Description |
---|---|
key (Required) | |
default | Default value: None |
update
If E is present and has a .keys() method, then does: for k in E: D[k] = E[k] If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v In either case, this is followed by: for k in F: D[k] = F[k]
update([E], **F) -> None. Update D from dict/iterable E and F.
values
values() -> an object providing a view on D's values
Attributes
base_path
code
The local or remote path pointing at source code.
component
The ID or instance of the Spark component or job to be run during the step.
creation_context
The creation context of the resource.
Returns
Type | Description |
---|---|
The creation metadata for the resource. |
entry
id
The resource ID.
Returns
Type | Description |
---|---|
The global ID of the resource, an Azure Resource Manager (ARM) ID. |
identity
The identity that the Spark job will use while running on compute.
inputs
Get the inputs for the object.
Returns
Type | Description |
---|---|
A dictionary containing the inputs for the object. |
log_files
Job output files.
Returns
Type | Description |
---|---|
The dictionary of log names and URLs. |
name
outputs
Get the outputs of the object.
Returns
Type | Description |
---|---|
A dictionary containing the outputs for the object. |
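As a short sketch (again reusing the hypothetical spark_step node), the inputs and outputs properties expose these mappings, and an individual output can be referenced when wiring downstream pipeline steps:

```python
# Illustrative only: "spark_step" is the hypothetical node from the earlier example.
print(spark_step.inputs)    # dict-like view of the node's bound inputs
print(spark_step.outputs)   # dict-like view of the node's bound outputs

# Inside a pipeline definition, a single output can be handed to a downstream node.
wrangled = spark_step.outputs["wrangled_data"]
```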
resources
status
The status of the job.
Common values returned include "Running", "Completed", and "Failed". All possible values are:
- NotStarted - This is a temporary state that client-side Run objects are in before cloud submission.
- Starting - The Run has started being processed in the cloud. The caller has a run ID at this point.
- Provisioning - On-demand compute is being created for a given job submission.
- Preparing - The run environment is being prepared and is in one of two stages: Docker image build, or conda environment setup.
- Queued - The job is queued on the compute target. For example, in BatchAI, the job is in a queued state while waiting for all the requested nodes to be ready.
- Running - The job has started to run on the compute target.
- Finalizing - User code execution has completed, and the run is in post-processing stages.
- CancelRequested - Cancellation has been requested for the job.
- Completed - The run has completed successfully. This includes both the user code execution and run post-processing stages.
- Failed - The run failed. Usually the Error property on a run will provide details as to why.
- Canceled - Follows a cancellation request and indicates that the run is now successfully cancelled.
- NotResponding - For runs that have Heartbeats enabled, no heartbeat has been recently sent.
Returns
Type | Description |
---|---|
Status of the job. |
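For illustration, a sketch of reading the status after submission, assuming an authenticated MLClient named ml_client and a pipeline job (pipeline_job) that contains this Spark step; both names are assumptions, not part of this reference:

```python
# Illustrative only: ml_client and pipeline_job are assumed to exist already.
returned_job = ml_client.jobs.create_or_update(pipeline_job)
print(returned_job.status)   # e.g. "NotStarted", "Running", "Completed"

# Stream logs and block until the job reaches a terminal state.
ml_client.jobs.stream(returned_job.name)
```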
studio_url
type
CODE_ID_RE_PATTERN
CODE_ID_RE_PATTERN = re.compile('\\/subscriptions\\/(?P<subscription>[\\w,-]+)\\/resourceGroups\\/(?P<resource_group>[\\w,-]+)\\/providers\\/Microsoft\\.MachineLearningServices\\/workspaces\\/(?P<workspace>[\\w,-]+)\\/codes\\/(?P<co)