MLflow and Spark
Bakhruz Dzhafarov
Hi, I encountered the following problem when I tried to use a model for Spark inference (via mlflow.pyfunc.spark_udf) that I had previously trained with pandas and saved to MLflow.
I saved the model like this:
import mlflow
from mlflow.tracking import MlflowClient
from azureml.core import Workspace

# Connect to the Azure ML workspace (requires a config.json file)
ws = Workspace.from_config()

# Point MLflow at the Azure ML workspace tracking server
mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())
client = MlflowClient()

# Create an experiment and a run in it
experiment_name = "CatBoost_Experiment"
experiment_id = client.create_experiment(experiment_name)
run = client.create_run(experiment_id)

# Log and register the trained CatBoost model
mlflow.catboost.log_model(model, "catboost_model_20", registered_model_name="catboost_model")
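One thing I noticed while reviewing this: client.create_run does not make the run active, so I believe mlflow.catboost.log_model here starts its own run rather than using run. A sketch of attaching the logging to the created run explicitly, in case the registration path matters (model is the CatBoost model trained earlier):

# Sketch: log inside an explicit run context so the artifact attaches to the
# run created above, instead of an auto-created one
with mlflow.start_run(run_id=run.info.run_id):
    mlflow.catboost.log_model(model, "catboost_model_20", registered_model_name="catboost_model")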
and then loaded it for Spark inference:
model_uri = "models:/catboost_model/latest"  # use the latest version of the registered model

# Fetch the model's dependency requirements for inspection
mlflow.pyfunc.get_model_dependencies(model_uri)

# Load the model as a PySpark UDF, restoring its environment with conda
loaded_model_udf = mlflow.pyfunc.spark_udf(spark, model_uri, env_manager="conda")
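For context, this is how I plan to apply the UDF (the feature column names below are placeholders for my actual inputs, and df stands for my inference DataFrame); the failure occurs already at the spark_udf call, before this step runs:

from pyspark.sql import functions as F

# Placeholder feature columns standing in for the model's real inputs
feature_cols = ["feature_1", "feature_2", "feature_3"]

# Score each row with the loaded model
df_scored = df.withColumn("prediction", loaded_model_udf(*[F.col(c) for c in feature_cols]))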
The full logs are in the attached stderr.txt; the relevant part of the traceback is:
File "/home/trusted-service-user/cluster-env/env/lib/python3.10/site-packages/mlflow/pyfunc/__init__.py", line 1069, in udf
pyfunc_backend.prepare_env(
File "/home/trusted-service-user/cluster-env/env/lib/python3.10/site-packages/mlflow/pyfunc/backend.py", line 89, in prepare_env
conda_env_path = os.path.join(local_path, self._config[ENV])
File "/home/trusted-service-user/cluster-env/env/lib/python3.10/posixpath.py", line 90, in join
genericpath._check_arg_types('join', a, *p)
File "/home/trusted-service-user/cluster-env/env/lib/python3.10/genericpath.py", line 152, in _check_arg_types
raise TypeError(f'{funcname}() argument must be str, bytes, or '
TypeError: join() argument must be str, bytes, or os.PathLike object, not 'dict'
I was able to find a relatively similar error.
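From the traceback, the TypeError seems to mean that the model's MLmodel configuration stores the env entry as a dict while the pyfunc backend on the cluster expects a string path; as far as I understand, this can happen when the model was logged with a newer MLflow version than the one installed on the Spark cluster. As a sketch of a workaround I could try, assuming the cluster image already has compatible catboost and mlflow packages installed, environment restoration can be skipped entirely:

# Possible workaround (sketch): use the cluster's current Python environment
# instead of restoring a conda environment, which avoids prepare_env
loaded_model_udf = mlflow.pyfunc.spark_udf(spark, model_uri, env_manager="local")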