@DJamin Thanks for the question. The algorithm will not split the data for you. The idea is to split the whole dataset into training and test sets, where the test set is held back from training your model. Then, during the training stage, the original training set is divided again into a (secondary) training set and a validation set, where the validation set is also held back from fitting the model. The reason for this second split is that most models have hyperparameters that need to be tuned, and the validation set is used with a specific model for exactly that purpose. Thus, if your model has no hyperparameters to tune, there is no need to split the training set into (secondary) training and validation sets.
• Training Dataset: The sample of data used to fit the model.
• Validation Dataset: The sample of data used to provide an unbiased evaluation of a model fit on the training dataset while tuning model hyperparameters.
• Test Dataset: The sample of data used to provide an unbiased evaluation of a final model fit on the training dataset.
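A minimal sketch of the two-stage split described above, assuming scikit-learn is available (the 80/20 and 75/25 ratios here are purely illustrative, not a recommendation):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 50 samples with 2 features each (illustrative only).
X = np.arange(100).reshape(50, 2)
y = np.arange(50)

# First split: hold the test set back from all training.
X_train_full, X_test, y_train_full, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Second split: carve a validation set out of the remaining training
# data, to be used only for hyperparameter tuning.
X_train, X_val, y_train, y_val = train_test_split(
    X_train_full, y_train_full, test_size=0.25, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 30 10 10
```

The final model is fit on `X_train`, hyperparameters are chosen by evaluating on `X_val`, and `X_test` is touched only once, for the final unbiased evaluation.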