SparkComponent Class
Spark component version, used to define a Spark Component or Job.
Inheritance
- azure.ai.ml.entities._component.component.Component → SparkComponent
- azure.ai.ml.entities._job.parameterized_spark.ParameterizedSpark → SparkComponent
- azure.ai.ml.entities._job.spark_job_entry_mixin.SparkJobEntryMixin → SparkComponent
- azure.ai.ml.entities._component.code.ComponentCodeMixin → SparkComponent
Constructor
SparkComponent(*, code: PathLike | str | None = '.', entry: Dict[str, str] | SparkJobEntry | None = None, py_files: List[str] | None = None, jars: List[str] | None = None, files: List[str] | None = None, archives: List[str] | None = None, driver_cores: int | str | None = None, driver_memory: str | None = None, executor_cores: int | str | None = None, executor_memory: str | None = None, executor_instances: int | str | None = None, dynamic_allocation_enabled: bool | str | None = None, dynamic_allocation_min_executors: int | str | None = None, dynamic_allocation_max_executors: int | str | None = None, conf: Dict[str, str] | None = None, environment: Environment | str | None = None, inputs: Dict | None = None, outputs: Dict | None = None, args: str | None = None, **kwargs: Any)
Keyword-Only Parameters
Name | Description
---|---
code | The source code to run the job. Can be a local path or an "http:", "https:", or "azureml:" URL pointing to a remote location. Defaults to ".", indicating the current directory.
entry | The file or class entry point.
py_files | The list of .zip, .egg, or .py files to place on the PYTHONPATH for Python apps. Defaults to None.
jars | The list of .jar files to include on the driver and executor classpaths. Defaults to None.
files | The list of files to be placed in the working directory of each executor. Defaults to None.
archives | The list of archives to be extracted into the working directory of each executor. Defaults to None.
driver_cores | The number of cores to use for the driver process, only in cluster mode.
driver_memory | The amount of memory to use for the driver process, formatted as a string with a size unit suffix ("k", "m", "g" or "t"), e.g. "512m" or "2g".
executor_cores | The number of cores to use on each executor.
executor_memory | The amount of memory to use per executor process, formatted as a string with a size unit suffix ("k", "m", "g" or "t"), e.g. "512m" or "2g".
executor_instances | The initial number of executors.
dynamic_allocation_enabled | Whether to use dynamic resource allocation, which scales the number of executors registered with this application up and down based on the workload. Defaults to False. (See the sketch after this table.)
dynamic_allocation_min_executors | The lower bound for the number of executors if dynamic allocation is enabled.
dynamic_allocation_max_executors | The upper bound for the number of executors if dynamic allocation is enabled.
conf | A dictionary with pre-defined Spark configuration keys and values. Defaults to None.
environment | The Azure ML environment to run the job in.
inputs | A mapping of input names to input data sources used in the job (Optional[dict[str, Union[NodeOutput, Input, str, bool, int, float, Enum]]]). Defaults to None.
outputs | A mapping of output names to output data sources used in the job. Defaults to None.
args | The arguments for the job. Defaults to None.
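The dynamic allocation settings above work together: when dynamic_allocation_enabled is set, the min and max executor bounds constrain how far Spark may scale the executor count for the workload. The following sketch is illustrative only; the component name, entry file, and resource values are hypothetical and not taken from the library's documentation.

from azure.ai.ml.entities import SparkComponent

# Hypothetical component demonstrating dynamic executor allocation.
# With dynamic_allocation_enabled=True, the min/max bounds below limit
# how many executors Spark may register for this application.
scaling_component = SparkComponent(
    name="dynamic_allocation_example",
    code="./src",
    entry={"file": "main.py"},
    driver_cores=2,
    driver_memory="2g",
    executor_cores=2,
    executor_memory="2g",
    executor_instances=1,
    dynamic_allocation_enabled=True,
    dynamic_allocation_min_executors=1,
    dynamic_allocation_max_executors=4,
)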
Examples
Creating a SparkComponent.
from azure.ai.ml.entities import SparkComponent

component = SparkComponent(
    name="add_greeting_column_spark_component",
    display_name="Aml Spark add greeting column test module",
    description="Aml Spark add greeting column test module",
    version="1",
    inputs={
        "file_input": {"type": "uri_file", "mode": "direct"},
    },
    driver_cores=2,
    driver_memory="1g",
    executor_cores=1,
    executor_memory="1g",
    executor_instances=1,
    code="./src",
    entry={"file": "add_greeting_column.py"},
    py_files=["utils.zip"],
    files=["my_files.txt"],
    args="--file_input ${{inputs.file_input}}",
    base_path="./sdk/ml/azure-ai-ml/tests/test_configs/dsl_pipeline/spark_job_in_pipeline",
)
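A component defined this way can then be registered to a workspace for reuse in pipelines. The snippet below is a minimal sketch, assuming placeholder workspace coordinates (the subscription, resource group, and workspace names are not part of the original example).

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Placeholder workspace coordinates; substitute real values.
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# Register the SparkComponent defined above so it can be referenced by name and version.
registered_component = ml_client.components.create_or_update(component)
print(registered_component.name, registered_component.version)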
Methods
Name | Description
---|---
dump | Dump the component content into a file in yaml format.
dump
Dump the component content into a file in yaml format.
dump(dest: str | PathLike | IO, **kwargs: Any) -> None
Parameters
Name | Description
---|---
dest (Required) | The destination to receive this component's content. Must be either a path to a local file, or an already-open file stream. If dest is a file path, a new file will be created, and an exception is raised if the file exists. If dest is an open file, the file will be written to directly, and an exception will be raised if the file is not writable.
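For example, the component created in the Examples section above could be written out with a call like the following; the file name is illustrative, and the call raises an exception if the file already exists.

# Serialize the SparkComponent defined earlier to a new YAML file.
component.dump("add_greeting_column_spark_component.yml")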
Attributes
base_path
creation_context
The creation context of the resource.
Returns
Type | Description
---|---
 | The creation metadata for the resource.
display_name
entry
environment
The Azure ML environment to run the Spark component or job in.
Returns
Type | Description
---|---
 | The Azure ML environment to run the Spark component or job in.
id
The resource ID.
Returns
Type | Description
---|---
 | The global ID of the resource, an Azure Resource Manager (ARM) ID.
inputs
is_deterministic
Whether the component is deterministic.
Returns
Type | Description |
---|---|
Whether the component is deterministic |
outputs
type
version
CODE_ID_RE_PATTERN
CODE_ID_RE_PATTERN = re.compile('\\/subscriptions\\/(?P<subscription>[\\w,-]+)\\/resourceGroups\\/(?P<resource_group>[\\w,-]+)\\/providers\\/Microsoft\\.MachineLearningServices\\/workspaces\\/(?P<workspace>[\\w,-]+)\\/codes\\/(?P<co)
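The compiled pattern above is truncated on this page. As a rough sketch, the named groups that are visible (subscription, resource_group, workspace) can be read from a match against an ARM-style code asset ID; the ID below is hypothetical.

from azure.ai.ml.entities import SparkComponent

# Hypothetical ARM code asset ID; only the named groups visible in the
# truncated pattern above are read from the match.
code_id = (
    "/subscriptions/my-sub/resourceGroups/my-rg"
    "/providers/Microsoft.MachineLearningServices/workspaces/my-ws"
    "/codes/my-code/versions/1"
)

match = SparkComponent.CODE_ID_RE_PATTERN.match(code_id)
if match:
    print(match.group("subscription"), match.group("resource_group"), match.group("workspace"))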