PretrainedTreeFeaturizationEstimator Class
Definition
Important
Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.
An IEstimator<TTransformer> that contains a pre-trained TreeEnsembleModelParameters. Calling its Fit(IDataView) produces a featurizer based on the pre-trained model.
public sealed class PretrainedTreeFeaturizationEstimator : Microsoft.ML.Trainers.FastTree.TreeEnsembleFeaturizationEstimatorBase
type PretrainedTreeFeaturizationEstimator = class
inherit TreeEnsembleFeaturizationEstimatorBase
Public NotInheritable Class PretrainedTreeFeaturizationEstimator
Inherits TreeEnsembleFeaturizationEstimatorBase
Remarks
Input and Output Columns
The input label column data must be Single. The input features column data must be a known-sized vector of Single.
This estimator outputs the following columns:
Output Column Name | Column Type | Description |
---|---|---|
Trees | Vector of Single | The output values of all trees. |
Leaves | Vector of Single | The IDs of all leaves where the input feature vector falls into. |
Paths | Vector of Single | The paths the input feature vector passed through to reach the leaves. |
These output columns are all optional, and you can change their names. To skip a column, set its name to null so that it is not produced.
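As a rough illustration of renaming and skipping output columns, the following sketch trains a small FastTree regression model, passes it in as the pre-trained ensemble, renames two outputs, and sets the third to null. It assumes the Microsoft.ML and Microsoft.ML.FastTree NuGet packages are referenced. The `DataPoint` type, the synthetic data, and the column names `TreeOutputs` and `LeafIds` are hypothetical choices made for this example only.

```csharp
using System;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers.FastTree;

var mlContext = new MLContext(seed: 0);

// Hypothetical toy data: 200 rows with 3 features; the label follows the first feature.
var random = new Random(0);
var rows = Enumerable.Range(0, 200).Select(_ =>
{
    var features = new[]
    {
        (float)random.NextDouble(),
        (float)random.NextDouble(),
        (float)random.NextDouble()
    };
    return new DataPoint { Label = 2f * features[0], Features = features };
}).ToList();
IDataView data = mlContext.Data.LoadFromEnumerable(rows);

// Train a small FastTree regression model to act as the pre-trained tree ensemble.
var pretrained = mlContext.Regression.Trainers
    .FastTree(numberOfLeaves: 4, numberOfTrees: 3)
    .Fit(data);

var options = new PretrainedTreeFeaturizationEstimator.Options
{
    InputColumnName = "Features",
    ModelParameters = pretrained.Model, // the pre-trained tree ensemble
    TreesColumnName = "TreeOutputs",    // renamed output column (hypothetical name)
    LeavesColumnName = "LeafIds",       // renamed output column (hypothetical name)
    PathsColumnName = null              // null name: this output is skipped
};

var featurizer = mlContext.Transforms
    .FeaturizeByPretrainTreeEnsemble(options)
    .Fit(data);
IDataView transformed = featurizer.Transform(data);

// List the columns of the transformed data to see which outputs were produced.
foreach (var column in transformed.Schema)
    Console.WriteLine(column.Name);

// Hypothetical input row type. VectorType makes Features a known-sized vector.
public class DataPoint
{
    public float Label { get; set; }

    [VectorType(3)]
    public float[] Features { get; set; }
}
```

Based on the description above, the listing should contain the two renamed columns and no paths column, since its name was set to null.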
Prediction Details
This estimator produces several output columns from a tree ensemble model. Assume that the model contains only one decision tree:
                   Node 0
                   /    \
                 /        \
               /            \
             /                \
          Node 1            Node 2
          /    \            /    \
        /        \        /        \
      /            \   Leaf -3    Node 3
    Leaf -1     Leaf -2           /    \
                                /        \
                             Leaf -4   Leaf -5
Assume that the input feature vector falls into Leaf -1. The output Trees may be a 1-element vector where the only value is the decision value carried by Leaf -1. The output Leaves is a 0-1 vector. If the reached leaf is the $i$-th leaf in the tree (counting from $i = 0$; leaves are indexed by $-(i+1)$, so the first leaf is Leaf -1), the $i$-th value in Leaves is 1 and all other values are 0. The output Paths is a 0-1 representation of the nodes passed through before reaching the leaf. The $i$-th element in Paths indicates whether the $i$-th node (indexed by $i$) is touched. For example, reaching Leaf -1 leads to $[1, 1, 0, 0]$ as the Paths. If there are multiple trees, this estimator simply concatenates the Trees, Leaves, and Paths from all trees (the first tree's information comes first in the concatenated vectors).
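The single-tree example above can be traced by hand with a small standalone sketch. This is not ML.NET code: it only mimics the indexing convention described here (internal nodes indexed by $i$, leaves by $-(i+1)$). The split features, thresholds, and leaf values are made up for illustration, and `Featurize`, `leftChild`, and `rightChild` are hypothetical names.

```csharp
using System;

// Hypothetical single tree shaped like the diagram: 4 internal nodes, 5 leaves.
// A non-negative child is an internal node index; a negative child -(i+1) is leaf i.
int[] splitFeature = { 0, 1, 0, 1 };                  // assumed feature tested at each node
float[] threshold = { 0.5f, 0.5f, 0.8f, 0.5f };       // assumed split thresholds
int[] leftChild = { 1, -1, -3, -4 };                  // taken when feature <= threshold
int[] rightChild = { 2, -2, 3, -5 };                  // taken when feature > threshold
float[] leafValue = { 0.1f, 0.2f, 0.3f, 0.4f, 0.5f }; // assumed decision values

(float[] Trees, float[] Leaves, float[] Paths) Featurize(float[] features)
{
    var leaves = new float[leafValue.Length];
    var paths = new float[splitFeature.Length];
    int node = 0;
    while (node >= 0)
    {
        paths[node] = 1f; // mark every internal node that is touched
        node = features[splitFeature[node]] <= threshold[node]
            ? leftChild[node]
            : rightChild[node];
    }
    int leaf = -(node + 1); // Leaf -1 maps to position 0, Leaf -2 to position 1, ...
    leaves[leaf] = 1f;      // one-hot indicator of the reached leaf
    return (new[] { leafValue[leaf] }, leaves, paths);
}

// This input goes left at Node 0 and left at Node 1, so it reaches Leaf -1.
var result = Featurize(new[] { 0.2f, 0.3f });
Console.WriteLine(string.Join(", ", result.Leaves)); // prints 1, 0, 0, 0, 0
Console.WriteLine(string.Join(", ", result.Paths));  // prints 1, 1, 0, 0
```

For an ensemble with several trees, the same three vectors would be computed per tree and concatenated in tree order, as described above.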
Check the See Also section for links to usage examples.
Methods
Fit(IDataView) | Produce a TreeEnsembleModelParameters which maps the column called InputColumnName in |
GetOutputSchema(SchemaShape) | PretrainedTreeFeaturizationEstimator adds three float-vector columns into |
Extension Methods
AppendCacheCheckpoint<TTrans>(IEstimator<TTrans>, IHostEnvironment) | Append a 'caching checkpoint' to the estimator chain. This ensures that the downstream estimators are trained against cached data. It is helpful to have a caching checkpoint before trainers that take multiple data passes. |
WithOnFitDelegate<TTransformer>(IEstimator<TTransformer>, Action<TTransformer>) | Given an estimator, return a wrapping object that will call a delegate once Fit(IDataView) is called. It is often important for an estimator to return information about what was fit, which is why the Fit(IDataView) method returns a specifically typed object rather than just a general ITransformer. At the same time, IEstimator<TTransformer> instances are often formed into pipelines with many objects, so we may need to build a chain of estimators via EstimatorChain<TLastTransformer> where the estimator whose transformer we want is buried somewhere in the chain. For that scenario, this method lets us attach a delegate that will be called once Fit is called. |