How to implement multiclass classification on tree-structured data using ML.NET
I have hundreds of projects, and each one has a tree data structure like this:
Or like this:
Each project has its own tree structure, which is a modified version of a standard tree structure. What I am trying to do is map each project's tree structure to the standard tree structure, like this:
Or like this:
The mapping depends on each node's text rather than on its level in the tree.
I'm currently using multiclass classification in ML.NET. First, I manually map the existing projects' trees to the standard tree and save the results in the database, like this:
| Label | Level1 | Level2 | Level3 |
| -------- | -------------- | -------------- | -------------- |
| A | A | * | * |
| A-AA | A | AA1 | * |
| A-AA-AAA | A | AA1 | AAA1 |
| A-BB | A | BB2 | * |
| A-BB-BBB | A | BB2 | BBB2 |
| A | A | * | * |
| A-AA-AAA | A | AAA1 | * |
| A-BB | A | BB2 | * |
| A-BB-BBB | A | BB2 | BBB2 |
Because a column in ML.NET cannot contain missing values, I replace them with `*`. My tree has 15 levels, so there are 15 feature columns.
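The padding described above could be sketched roughly as follows. This is a minimal illustration rather than the actual code: the helper name `ToLevelColumns` and its parameters are made up for the example, and it assumes each node's path is available as a root-first list of strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PathPadding
{
    // Hypothetical helper: pads a node's path (root first) to a fixed number
    // of level columns, filling the unused trailing levels with a placeholder.
    public static string[] ToLevelColumns(
        IReadOnlyList<string> path, int levelCount = 15, string placeholder = "*")
    {
        if (path.Count > levelCount)
            throw new ArgumentException($"Path has more than {levelCount} levels.", nameof(path));

        return path
            .Concat(Enumerable.Repeat(placeholder, levelCount - path.Count))
            .ToArray();
    }

    public static void Main()
    {
        // Three levels are used here only to match the sample table above.
        var columns = ToLevelColumns(new[] { "A", "BB2" }, levelCount: 3);
        Console.WriteLine(string.Join(" | ", columns)); // prints "A | BB2 | *"
    }
}
```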
The multiclass classification algorithm I chose is SdcaMaximumEntropy. I hope to use its predictions to map the trees instead of doing it manually.
I got the prediction working, but the results are really poor.
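For context, a pipeline of the kind described above might look roughly like this. It is a sketch under assumptions, not the exact code: the class names `TreeRow` and `TreePrediction` are placeholders, only 3 of the 15 level columns are shown, the rows are copied from the sample table, and one-hot encoding the level columns is an assumed way of turning the text into features. It needs the Microsoft.ML NuGet package.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

// Placeholder row type: the real data would have Level1..Level15.
public class TreeRow
{
    public string Label { get; set; }
    public string Level1 { get; set; }
    public string Level2 { get; set; }
    public string Level3 { get; set; }
}

public class TreePrediction
{
    [ColumnName("PredictedLabel")]
    public string PredictedLabel { get; set; }
}

public static class Program
{
    public static void Main()
    {
        var mlContext = new MLContext(seed: 0);

        // A few manually mapped rows taken from the sample table.
        var rows = new List<TreeRow>
        {
            new TreeRow { Label = "A",        Level1 = "A", Level2 = "*",    Level3 = "*" },
            new TreeRow { Label = "A-AA",     Level1 = "A", Level2 = "AA1",  Level3 = "*" },
            new TreeRow { Label = "A-AA-AAA", Level1 = "A", Level2 = "AA1",  Level3 = "AAA1" },
            new TreeRow { Label = "A-BB",     Level1 = "A", Level2 = "BB2",  Level3 = "*" },
            new TreeRow { Label = "A-BB-BBB", Level1 = "A", Level2 = "BB2",  Level3 = "BBB2" },
            new TreeRow { Label = "A-AA-AAA", Level1 = "A", Level2 = "AAA1", Level3 = "*" },
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);

        // Label -> key, level columns -> one-hot vectors -> single feature vector,
        // then the SdcaMaximumEntropy trainer, then key -> original label text.
        var pipeline = mlContext.Transforms.Conversion.MapValueToKey("Label")
            .Append(mlContext.Transforms.Categorical.OneHotEncoding(new[]
            {
                new InputOutputColumnPair("Level1Encoded", "Level1"),
                new InputOutputColumnPair("Level2Encoded", "Level2"),
                new InputOutputColumnPair("Level3Encoded", "Level3"),
            }))
            .Append(mlContext.Transforms.Concatenate(
                "Features", "Level1Encoded", "Level2Encoded", "Level3Encoded"))
            .Append(mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy(
                labelColumnName: "Label", featureColumnName: "Features"))
            .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"));

        var model = pipeline.Fit(data);

        // Predict the standard-tree node for one project node.
        var engine = mlContext.Model.CreatePredictionEngine<TreeRow, TreePrediction>(model);
        var prediction = engine.Predict(
            new TreeRow { Level1 = "A", Level2 = "BB2", Level3 = "BBB2" });
        Console.WriteLine(prediction.PredictedLabel);
    }
}
```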
So my questions are:
- Is this the right approach?
- If yes, should I remove the duplicate rows, and should I replace the missing values with `*`?
Thanks in advance.