Hello, Intro: I've trained ML algorithms using Azure AutoML, specifically time series forecasting algorithms. Goal: I would like to test the best model on the training data itself (I know it's not best practice)

Pruthvi Amin 0 Reputation points
2023-11-15T15:56:48.1833333+00:00

Hello,

Intro: I've trained ML algorithms using Azure AutoML, specifically time series forecasting algorithms.

Goal: I would like to test the best model on the training data itself (I know it's not best practice), but I would like the training predictions in a CSV file.

Problem: When I do the following: Azure Machine Learning Studio > Automated ML > My ML job > Models > Best Model > Test model (preview) > select a dataset > point to the training set or a small subset of the training set, I get the following error:

Error: UserError: Input prediction data X_pred or input forecast_destination contains dates prior to the latest date in the training data. Please remove prediction rows with datetimes in the training date range or adjust the forecast_destination date.

What I've tried: Google, Stack Overflow, Azure documentation.

Conclusion: No one seems to have posted this issue before, and I couldn't find any solutions online.


1 answer

  1. YutongTie-MSFT 53,726 Reputation points
    2023-11-15T23:58:16.7233333+00:00

    @Pruthvi Amin

    Thanks for reaching out to us. Unfortunately, there doesn't seem to be specific documentation covering this exact scenario, as testing a model on its own training data is not a common or recommended practice.

    However, the following documentation might be helpful:

    1. How to use AutoML for time series forecasting: This provides a general guide on using Azure AutoML for time series forecasting, including how to train the model and retrieve the best model.
    2. Forecast function in Azure AutoML: This provides information on the forecast function you can use to generate predictions.
    3. Pandas DataFrame to CSV: This is the official pandas documentation for the to_csv() function, which you can use to save your predictions into a CSV file.

    The error message indicates that the dates in your test set (which in this case is your training set) fall inside the date range of your training data. In time series forecasting, the test data is expected to be "future" data: data points that occur after the last date in your training data.

    Since you're trying to generate predictions on the training data itself, every row fails that date check, which causes this error.
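To see which rows such a date check would reject, here is a minimal pandas sketch. It is only an illustration: the column names `date` and `target` and both small frames are hypothetical, and the comparison follows the description above (rows must be later than the last training date) rather than the actual AutoML implementation.

```python
import pandas as pd

# Hypothetical training data with a daily time column named "date".
train = pd.DataFrame({
    "date": pd.date_range("2023-01-01", periods=5, freq="D"),
    "target": [10.0, 11.0, 12.0, 13.0, 14.0],
})

# Hypothetical prediction input; its first rows overlap the training range.
pred = pd.DataFrame({
    "date": pd.date_range("2023-01-04", periods=4, freq="D"),
})

# Latest timestamp seen during training.
latest_train_date = train["date"].max()

# Assumption: only rows strictly after the last training date count as "future".
in_sample = pred[pred["date"] <= latest_train_date]
future = pred[pred["date"] > latest_train_date]

print("Latest training date:", latest_train_date.date())
print("Rows inside the training range:", len(in_sample))
print("Rows after the training range:", len(future))
```

Running the same comparison on your own dataset shows how many rows overlap the training period before you submit a test job.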

    As a workaround, you can generate predictions on your training data by using the forecast function of your model in your local environment. Here is a simplified example:

    # Assuming 'automl_run' is your AutoML run object and 'train_data' is your training data (a pandas DataFrame)
    best_run, fitted_model = automl_run.get_output()
    X_train = train_data.drop(columns='target')  # replace 'target' with your target column name
    # forecast() returns a tuple: the predicted values and the transformed input data
    y_predictions, X_trans = fitted_model.forecast(X_train)

    Then, you can export the predictions to a CSV file:

    import pandas as pd

    # Convert the predictions to a DataFrame
    df_predictions = pd.DataFrame(y_predictions, columns=['Prediction'])

    # Save to CSV (index=False leaves out the row index column)
    df_predictions.to_csv('training_predictions.csv', index=False)

    Please replace 'target' with your actual target column name, and adjust the code as necessary based on your specific setup.

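    This should give you the predictions for your training data in a CSV file without going through the Azure Machine Learning Studio interface. Note that the error message refers to X_pred and forecast_destination, which are parameters of forecast(), so the same date check may apply to a local call as well.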
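A single Prediction column can be hard to line up with the original rows, so a slightly fuller export might keep the time column and the actual values next to each prediction. The sketch below is an assumption-laden illustration: `train_data`, the `date` and `target` column names, and the `AbsError` column are hypothetical, and `y_predictions` is a stand-in array rather than real model output. It also assumes the predictions come back in the same row order as the input.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical training data; replace with your own DataFrame.
train_data = pd.DataFrame({
    "date": pd.date_range("2023-01-01", periods=3, freq="D"),
    "target": [10.0, 12.0, 14.0],
})

# Stand-in for model output (in practice, the predicted values from the model).
y_predictions = np.array([9.5, 12.5, 14.0])

# Keep timestamps and actuals beside the predictions.
# Assumption: predictions are in the same row order as train_data.
df_out = train_data[["date", "target"]].copy()
df_out["Prediction"] = y_predictions
df_out["AbsError"] = (df_out["target"] - df_out["Prediction"]).abs()

# Write to an in-memory buffer here; pass a file path to to_csv() to write a file.
buffer = io.StringIO()
df_out.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Passing a path such as 'training_predictions.csv' to to_csv() instead of the buffer writes the same content to disk.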

    Please let us know how it goes. I hope it helps.

    Regards,

    Yutong

    Please accept the answer if you found it helpful, to support the community. Thanks a lot.
