Not able to re-train model [Multiclassification(AveragedPerceptron)]
Hello! I am new to ML.NET and have decided to try using it to build a dispatcher. Basically, I want it to classify text into one of multiple categories. Due to the high volume of data, when users confirm that a prediction is wrong, I want to add the corrected example to its database (or re-train the model).
I used AutoML to generate a base model. The algorithm that AutoML chose as having the best results for multiclass classification is AveragedPerceptron. I checked this page to make sure that it is re-trainable.
I am able to get the first model, but I am struggling to re-train it.
First, I created the model (reproducing all the steps generated by AutoML):
```csharp
// First Phase: Create the model
var mlContext = new MLContext(seed: 1);

// BuildTrainingPipeline
// Load Data
var data = mlContext.Data.LoadFromTextFile<ModelInput>(
    path: TRAIN_DATA_FILEPATH,
    hasHeader: false,
    separatorChar: '\t',
    allowQuoting: true,
    allowSparse: false);

// Data process configuration with pipeline data transformations
var dataProcessPipeline = mlContext.Transforms.Conversion.MapValueToKey("col0", "col0")
    .Append(mlContext.Transforms.Text.FeaturizeText("col1_tf", "col1"))
    .Append(mlContext.Transforms.CopyColumns("Features", "col1_tf"))
    .Append(mlContext.Transforms.NormalizeMinMax("Features", "Features"))
    .AppendCacheCheckpoint(mlContext);

// Set the training algorithm
var trainer = mlContext.MulticlassClassification.Trainers.OneVersusAll(
        mlContext.BinaryClassification.Trainers.AveragedPerceptron(
            labelColumnName: "col0", numberOfIterations: 10, featureColumnName: "Features"),
        labelColumnName: "col0")
    .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel", "PredictedLabel"));

IEstimator<ITransformer> trainingPipeline = dataProcessPipeline.Append(trainer);

// Train and save Model
// Create model here
ITransformer firstModel = trainingPipeline.Fit(data);

// Save the model
mlContext.Model.Save(firstModel, data.Schema, MODEL_FILEPATH);
```
Then, assuming I have new data to re-train the model with:
```csharp
// Second Phase: Re-training the model
// New Data
ModelInput[] ticketData = new ModelInput[]
{
    new ModelInput
    {
        Col0 = "Category 3",
        Col1 = "Text to classify 1"
    },
    new ModelInput
    {
        Col0 = "Category 2",
        Col1 = "Text to classify 2"
    },
    new ModelInput
    {
        Col0 = "Category 3",
        Col1 = "Text to classify 3"
    },
    new ModelInput
    {
        Col0 = "Category 2",
        Col1 = "Text to classify 4"
    },
    new ModelInput
    {
        Col0 = "Category 1",
        Col1 = "Text to classify 5"
    },
};

// Create MLContext
MLContext mlContext = new MLContext();

// Define DataViewSchema trained model
DataViewSchema modelSchema;

// Load trained model
var trainedModel = mlContext.Model.Load(MODEL_FILEPATH, out modelSchema);

// Load New Data
IDataView newData = mlContext.Data.LoadFromEnumerable<ModelInput>(ticketData);
```
And here I get stuck, because I don't know how to re-train the model with the new data. I have tried to follow the guidance from these topics: [here](https://zcusa.951200.xyz/en-us/dotnet/machine-learning/how-to-guides/retrain-model-ml-net) and [here](https://github.com/dotnet/machinelearning/blob/36fab9b6806260e64e50992450a219e869c7f74a/test/Microsoft.ML.Functional.Tests/Training.cs#L80-L118), as well as the changes suggested [here](https://github.com/dotnet/machinelearning/issues/5247), but with no result.
Issue
I think my issues are due to multiclass classification, because the trainer is of type EstimatorChain and my model is of type TransformerChain.
My trainer.Fit doesn't take two arguments.