SparkComponent Class
Spark component version, used to define a Spark Component or Job.
Inheritance (base classes of SparkComponent)
- azure.ai.ml.entities._component.component.Component
- azure.ai.ml.entities._job.parameterized_spark.ParameterizedSpark
- azure.ai.ml.entities._job.spark_job_entry_mixin.SparkJobEntryMixin
- azure.ai.ml.entities._component._additional_includes.AdditionalIncludesMixin
Constructor
SparkComponent(*, code: PathLike | str | None = '.', entry: Dict[str, str] | SparkJobEntry | None = None, py_files: List[str] | None = None, jars: List[str] | None = None, files: List[str] | None = None, archives: List[str] | None = None, driver_cores: int | str | None = None, driver_memory: str | None = None, executor_cores: int | str | None = None, executor_memory: str | None = None, executor_instances: int | str | None = None, dynamic_allocation_enabled: bool | str | None = None, dynamic_allocation_min_executors: int | str | None = None, dynamic_allocation_max_executors: int | str | None = None, conf: Dict[str, str] | None = None, environment: Environment | str | None = None, inputs: Dict | None = None, outputs: Dict | None = None, args: str | None = None, additional_includes: List | None = None, **kwargs: Any)
Keyword-Only Parameters
Name | Description |
---|---|
code | The source code to run the job. Can be a local path or an "http:", "https:", or "azureml:" URL pointing to a remote location. Defaults to ".", indicating the current directory. |
entry | The file or class entry point. |
py_files | The list of .zip, .egg or .py files to place on the PYTHONPATH for Python apps. Defaults to None. |
jars | The list of .jar files to include on the driver and executor classpaths. Defaults to None. |
files | The list of files to be placed in the working directory of each executor. Defaults to None. |
archives | The list of archives to be extracted into the working directory of each executor. Defaults to None. |
driver_cores | The number of cores to use for the driver process, only in cluster mode. |
driver_memory | The amount of memory to use for the driver process, formatted as a string with a size unit suffix ("k", "m", "g" or "t"), e.g. "512m" or "2g". |
executor_cores | The number of cores to use on each executor. |
executor_memory | The amount of memory to use per executor process, formatted as a string with a size unit suffix ("k", "m", "g" or "t"), e.g. "512m" or "2g". |
executor_instances | The initial number of executors. |
dynamic_allocation_enabled | Whether to use dynamic resource allocation, which scales the number of executors registered with this application up and down based on the workload. Defaults to False. |
dynamic_allocation_min_executors | The lower bound for the number of executors if dynamic allocation is enabled. |
dynamic_allocation_max_executors | The upper bound for the number of executors if dynamic allocation is enabled. |
conf | A dictionary of pre-defined Spark configuration keys and values. Defaults to None. |
environment | The Azure ML environment to run the job in. |
inputs | A mapping of input names to input data sources used in the job. Type: Optional[dict[str, Union[azure.ai.ml.entities._job.pipeline._io.NodeOutput, Input, str, bool, int, float, Enum]]]. Defaults to None. |
outputs | A mapping of output names to output data sources used in the job. Defaults to None. |
args | The arguments for the job. Defaults to None. |
additional_includes | A list of shared additional files to be included in the component. Defaults to None. |
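The memory and dynamic allocation parameters above follow simple conventions that can be checked before a component is constructed. The following is a minimal sketch, not part of the SDK: `is_valid_memory` and `check_dynamic_allocation` are hypothetical helpers. It assumes a memory size is a positive integer followed by exactly one of the lowercase suffixes listed above, and that the lower bound should not exceed the upper bound when dynamic allocation is enabled.

```python
import re
from typing import Optional

# Assumption: a size is a positive integer followed by one lowercase unit
# suffix ("k", "m", "g" or "t"), as in the parameter descriptions above.
_MEMORY_RE = re.compile(r"[1-9]\d*[kmgt]")


def is_valid_memory(value: str) -> bool:
    """Return True if value looks like "512m" or "2g" (hypothetical helper)."""
    return _MEMORY_RE.fullmatch(value) is not None


def check_dynamic_allocation(
    enabled: bool,
    min_executors: Optional[int] = None,
    max_executors: Optional[int] = None,
) -> None:
    """Raise ValueError if the bounds are inconsistent (hypothetical helper)."""
    if not enabled:
        # The bounds only apply when dynamic allocation is enabled.
        return
    if (
        min_executors is not None
        and max_executors is not None
        and min_executors > max_executors
    ):
        raise ValueError(
            f"dynamic_allocation_min_executors ({min_executors}) exceeds "
            f"dynamic_allocation_max_executors ({max_executors})"
        )
```

Values that pass these checks can then be supplied as `driver_memory`, `executor_memory`, `dynamic_allocation_min_executors` and `dynamic_allocation_max_executors`. The service may apply additional rules that this sketch does not cover.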
Examples
Creating SparkComponent.
from azure.ai.ml.entities import SparkComponent

component = SparkComponent(
    name="add_greeting_column_spark_component",
    display_name="Aml Spark add greeting column test module",
    description="Aml Spark add greeting column test module",
    version="1",
    inputs={
        "file_input": {"type": "uri_file", "mode": "direct"},
    },
    driver_cores=2,
    driver_memory="1g",
    executor_cores=1,
    executor_memory="1g",
    executor_instances=1,
    code="./src",
    entry={"file": "add_greeting_column.py"},
    py_files=["utils.zip"],
    files=["my_files.txt"],
    args="--file_input ${{inputs.file_input}}",
    base_path="./sdk/ml/azure-ai-ml/tests/test_configs/dsl_pipeline/spark_job_in_pipeline",
)
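The `args` value in the example binds a command-line flag to an input through a `${{inputs.<name>}}` placeholder. As a small sketch of how such a string could be assembled for several inputs, the hypothetical helper below (`build_args` is not part of the SDK) assumes each flag has the same name as the input it refers to, as in the example.

```python
from typing import Iterable


def build_args(input_names: Iterable[str]) -> str:
    """Build an args string of the form "--name ${{inputs.name}} ...".

    Hypothetical helper: assumes each flag is named after its input.
    """
    parts = []
    for name in input_names:
        # Plain concatenation keeps the literal "${{" and "}}" readable.
        parts.append("--" + name + " ${{inputs." + name + "}}")
    return " ".join(parts)


print(build_args(["file_input"]))  # --file_input ${{inputs.file_input}}
```

The result can be passed as the `args` keyword argument, provided every name also appears as a key in `inputs`.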
Methods
dump | Dump the component content into a file in YAML format. |
dump
Dump the component content into a file in YAML format.
dump(dest: str | PathLike | IO, **kwargs: Any) -> None
Parameters
Name | Description |
---|---|
dest | Required. The destination to receive this component's content. Must be either a path to a local file, or an already-open file stream. If dest is a file path, a new file will be created, and an exception is raised if the file exists. If dest is an open file, the file will be written to directly, and an exception will be raised if the file is not writable. |
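Because `dump` raises an exception when `dest` is a path to an existing file, callers that re-export a component may want to clear the destination first. The sketch below is one way to do that, under stated assumptions: `dump_replacing` is a hypothetical helper, not part of the SDK, and `SupportsDump` is an illustrative protocol that matches any object with a `dump(dest, **kwargs)` method, such as a `SparkComponent`.

```python
import os
from pathlib import Path
from typing import Any, Protocol, Union


class SupportsDump(Protocol):
    """Illustrative protocol: any object exposing dump(dest, **kwargs)."""

    def dump(self, dest: Any, **kwargs: Any) -> None: ...


def dump_replacing(component: SupportsDump, dest: Union[str, os.PathLike]) -> Path:
    """Dump component to dest, deleting an existing file first.

    Hypothetical helper: it relies only on the documented behavior that
    dump() creates a new file and raises if the file already exists.
    """
    path = Path(dest)
    if path.exists():
        # Remove the old file so dump() can create a fresh one.
        path.unlink()
    path.parent.mkdir(parents=True, exist_ok=True)
    component.dump(path)
    return path
```

A call such as `dump_replacing(component, "./spark_component.yml")` would then overwrite an earlier export; the file name here is only an example. Deleting the old file is a deliberate choice in this sketch, so callers that need to keep previous exports should pick a new path instead.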
Attributes
base_path
creation_context
The creation context of the resource.
Returns
Type | Description |
---|---|
The creation metadata for the resource. |
display_name
entry
environment
The Azure ML environment to run the Spark component or job in.
Returns
Type | Description |
---|---|
The Azure ML environment to run the Spark component or job in. |
id
The resource ID.
Returns
Type | Description |
---|---|
The global ID of the resource, an Azure Resource Manager (ARM) ID. |
inputs
is_deterministic
Whether the component is deterministic.
Returns
Type | Description |
---|---|
Whether the component is deterministic. |
outputs
type
version
CODE_ID_RE_PATTERN
CODE_ID_RE_PATTERN = re.compile('\\/subscriptions\\/(?P<subscription>[\\w,-]+)\\/resourceGroups\\/(?P<resource_group>[\\w,-]+)\\/providers\\/Microsoft\\.MachineLearningServices\\/workspaces\\/(?P<workspace>[\\w,-]+)\\/codes\\/(?P<co)