PretrainedTreeFeaturizationEstimator Class


Definition

Namespace: Microsoft.ML.Trainers.FastTree

Assembly: Microsoft.ML.FastTree.dll

Package: Microsoft.ML.FastTree v1.2.0, v1.3.1, v1.4.0, v1.5.5, v1.6.0, v1.7.0, v2.0.0, v3.0.1

Important

Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.

An IEstimator<TTransformer> that contains a pre-trained TreeEnsembleModelParameters. Calling its Fit(IDataView) produces a featurizer based on the pre-trained model.

public sealed class PretrainedTreeFeaturizationEstimator : Microsoft.ML.Trainers.FastTree.TreeEnsembleFeaturizationEstimatorBase

type PretrainedTreeFeaturizationEstimator = class
    inherit TreeEnsembleFeaturizationEstimatorBase

Public NotInheritable Class PretrainedTreeFeaturizationEstimator
Inherits TreeEnsembleFeaturizationEstimatorBase

Inheritance: Object → TreeEnsembleFeaturizationEstimatorBase → PretrainedTreeFeaturizationEstimator

Remarks

Input and Output Columns

The input label column data must be Single. The input features column data must be a known-sized vector of Single.

This estimator outputs the following columns:

Output Column Name	Column Type	Description
`Trees`	Vector of Single	The output values of all trees.
`Leaves`	Vector of Single	The IDs of the leaves that the input feature vector falls into.
`Paths`	Vector of Single	The paths the input feature vector takes through the trees to reach those leaves.

All of these output columns are optional, and their names can be changed. To skip a column, set its name to null; a column whose name is null is not produced.
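The following is a minimal sketch of renaming and skipping output columns, assuming the Microsoft.ML and Microsoft.ML.FastTree packages are referenced. It trains a small FastTree regression model on synthetic data only so that there is a pre-trained TreeEnsembleModelParameters to pass in; the temporary CSV file, the synthetic label, and the column names TreeValues and LeafIds are illustrative assumptions, not part of the API.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers.FastTree;

var mlContext = new MLContext(seed: 0);

// Hypothetical synthetic data in a temporary CSV: column 0 is the label, columns 1-3 the features.
var random = new Random(0);
string csvPath = Path.Combine(Path.GetTempPath(), "pretrained_tree_featurization.csv");
File.WriteAllLines(csvPath, Enumerable.Range(0, 200).Select(_ =>
{
    float[] x = { (float)random.NextDouble(), (float)random.NextDouble(), (float)random.NextDouble() };
    float label = 2f * x[0] + x[1];
    return string.Join(",", new[] { label }.Concat(x).Select(v => v.ToString(CultureInfo.InvariantCulture)));
}));

// "Features" is declared as a known-size vector (columns 1..3), as the estimator requires.
IDataView data = mlContext.Data.LoadFromTextFile(csvPath, new[]
{
    new TextLoader.Column("Label", DataKind.Single, 0),
    new TextLoader.Column("Features", DataKind.Single, 1, 3),
}, separatorChar: ',');

// A small FastTree regression model stands in for the pre-trained ensemble.
var treeModel = mlContext.Regression.Trainers.FastTree(
    labelColumnName: "Label", featureColumnName: "Features",
    numberOfLeaves: 4, numberOfTrees: 3).Fit(data);

var options = new PretrainedTreeFeaturizationEstimator.Options
{
    InputColumnName = "Features",
    ModelParameters = treeModel.Model, // FastTreeRegressionModelParameters is a TreeEnsembleModelParameters
    TreesColumnName = "TreeValues",    // renamed output column
    LeavesColumnName = "LeafIds",      // renamed output column
    PathsColumnName = null,            // null: no paths column is produced
};

// Fit wraps the given pre-trained model as a featurizer.
var featurizer = mlContext.Transforms.FeaturizeByPretrainTreeEnsemble(options).Fit(data);
IDataView transformed = featurizer.Transform(data);

// Print every column of the output schema with its type.
foreach (var column in transformed.Schema)
    Console.WriteLine($"{column.Name}: {column.Type}");
```

Because PathsColumnName is null, the printed schema should contain TreeValues and LeafIds next to the input columns, but no paths column.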

Prediction Details

This estimator produces several output columns from a tree ensemble model. Assume that the model contains only one decision tree:

               Node 0
               /    \
             /        \
           /            \
         /                \
       Node 1            Node 2
       /    \            /    \
     /        \        /        \
   /            \     Leaf -3  Node 3
  Leaf -1      Leaf -2         /    \
                             /        \
                            Leaf -4  Leaf -5

Assume that the input feature vector falls into Leaf -1. The output Trees is then a 1-element vector whose only value is the decision value carried by Leaf -1.

The output Leaves is a 0-1 vector. If the reached leaf is the $i$-th leaf in the tree (zero-based, where the $i$-th leaf is labeled Leaf $-(i+1)$, so the first leaf is Leaf -1), the $i$-th value in Leaves is 1 and all other values are 0. Reaching Leaf -1 therefore gives $[1, 0, 0, 0, 0]$ as Leaves.

The output Paths is a 0-1 representation of the nodes passed through before reaching the leaf: its $i$-th element indicates whether Node $i$ is visited. For example, reaching Leaf -1 leads to $[1, 1, 0, 0]$ as Paths, because the path visits Node 0 and Node 1 but not Node 2 or Node 3.

If there are multiple trees, this estimator concatenates the Trees, Leaves, and Paths vectors of all trees, with the first tree's values coming first in each concatenated vector.
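To make the encoding concrete, here is a dependency-free sketch that reproduces the Trees, Leaves, and Paths values for the tree in the diagram and then concatenates them with a second tree. The array layout is an assumption chosen to match the Leaf -1, Leaf -2, ... labeling: each internal node has a "less than or equal" child and a "greater than" child, and a negative child c denotes the leaf with zero-based index ~c. The split features, thresholds, leaf values, and the one-split stump are hypothetical, and this is an illustration of the output format, not the ML.NET implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed encoding (illustration only): internal node n sends the input to
// lteChild[n] when features[splitFeature[n]] <= threshold[n], else to gtChild[n].
// A negative child c is a leaf with zero-based index ~c, so -1 is Leaf -1 (index 0).
(float Value, float[] Leaves, float[] Paths) FeaturizeOneTree(
    int[] lteChild, int[] gtChild, int[] splitFeature, float[] threshold,
    float[] leafValues, float[] features)
{
    var paths = new float[lteChild.Length];    // one slot per internal node
    var leaves = new float[leafValues.Length]; // one slot per leaf
    int node = 0;
    while (node >= 0)
    {
        paths[node] = 1f;                      // node visited on the way to the leaf
        node = features[splitFeature[node]] <= threshold[node] ? lteChild[node] : gtChild[node];
    }
    int leaf = ~node;                          // -1 -> 0, -2 -> 1, ...
    leaves[leaf] = 1f;
    return (leafValues[leaf], leaves, paths);
}

// Concatenates the per-tree outputs, first tree first.
(float[] Trees, float[] Leaves, float[] Paths) FeaturizeEnsemble(
    (int[] Lte, int[] Gt, int[] Feature, float[] Threshold, float[] LeafValues)[] ensemble,
    float[] features)
{
    var trees = new List<float>();
    var leaves = new List<float>();
    var paths = new List<float>();
    foreach (var t in ensemble)
    {
        var (value, l, p) = FeaturizeOneTree(t.Lte, t.Gt, t.Feature, t.Threshold, t.LeafValues, features);
        trees.Add(value);
        leaves.AddRange(l);
        paths.AddRange(p);
    }
    return (trees.ToArray(), leaves.ToArray(), paths.ToArray());
}

// The tree from the diagram: Nodes 0..3, Leaves -1..-5 (hypothetical splits and leaf values).
var exampleTree = (Lte: new[] { 1, -1, -3, -4 }, Gt: new[] { 2, -2, 3, -5 },
    Feature: new[] { 0, 1, 1, 0 }, Threshold: new[] { 0.5f, 0.5f, 0.5f, 0.8f },
    LeafValues: new[] { -1f, -0.5f, 0.25f, 0.75f, 1.5f });
// A hypothetical one-split stump used to show concatenation.
var stump = (Lte: new[] { -1 }, Gt: new[] { -2 }, Feature: new[] { 0 },
    Threshold: new[] { 0.1f }, LeafValues: new[] { 2f, -2f });

var input = new[] { 0.2f, 0.3f };
var (treesOut, leavesOut, pathsOut) = FeaturizeEnsemble(new[] { exampleTree, stump }, input);
Console.WriteLine($"Leaves: [{string.Join(", ", leavesOut)}]"); // Leaves: [1, 0, 0, 0, 0, 0, 1]
Console.WriteLine($"Paths:  [{string.Join(", ", pathsOut)}]");  // Paths:  [1, 1, 0, 0, 1]
```

With the input (0.2, 0.3), the diagram's tree reaches Leaf -1 and the stump reaches its second leaf, so Paths is $[1, 1, 0, 0]$ from the first tree followed by $[1]$ from the stump, and Leaves is the first tree's five indicators followed by the stump's two.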

Check the See Also section for links to usage examples.

Methods

Fit(IDataView)	Produces a TreeEnsembleFeaturizationTransformer that maps the column called InputColumnName in `input` to up to three output columns. (Inherited from TreeEnsembleFeaturizationEstimatorBase)
GetOutputSchema(SchemaShape)	PretrainedTreeFeaturizationEstimator adds up to three float-vector columns to `inputSchema`. Given a feature vector column, the added columns hold the prediction values of all trees, the IDs of the leaves the feature vector falls into, and the paths to those leaves. (Inherited from TreeEnsembleFeaturizationEstimatorBase)

Extension Methods

AppendCacheCheckpoint<TTrans>(IEstimator<TTrans>, IHostEnvironment)

Append a 'caching checkpoint' to the estimator chain. This will ensure that the downstream estimators will be trained against cached data. It is helpful to have a caching checkpoint before trainers that take multiple data passes.
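As a hedged illustration, the sketch below places a caching checkpoint between the tree featurizer and SDCA, a trainer that makes several passes over its data, so the leaf features are computed once and then read from the cache. Feeding the Leaves column into a linear model is one way to use these features, shown here as an example rather than a recommendation; it assumes the Microsoft.ML and Microsoft.ML.FastTree packages, and the synthetic data, file path, and column choices are hypothetical.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers.FastTree;

var mlContext = new MLContext(seed: 0);

// Hypothetical synthetic data in a temporary CSV: label, then three features.
var random = new Random(0);
string csvPath = Path.Combine(Path.GetTempPath(), "tree_featurization_cache.csv");
File.WriteAllLines(csvPath, Enumerable.Range(0, 200).Select(_ =>
{
    float[] x = { (float)random.NextDouble(), (float)random.NextDouble(), (float)random.NextDouble() };
    float label = 2f * x[0] + x[1];
    return string.Join(",", new[] { label }.Concat(x).Select(v => v.ToString(CultureInfo.InvariantCulture)));
}));
IDataView data = mlContext.Data.LoadFromTextFile(csvPath, new[]
{
    new TextLoader.Column("Label", DataKind.Single, 0),
    new TextLoader.Column("Features", DataKind.Single, 1, 3), // known-size vector of Single
}, separatorChar: ',');

// A small FastTree model stands in for the pre-trained ensemble.
var treeModel = mlContext.Regression.Trainers.FastTree(
    labelColumnName: "Label", featureColumnName: "Features",
    numberOfLeaves: 4, numberOfTrees: 3).Fit(data);

var options = new PretrainedTreeFeaturizationEstimator.Options
{
    InputColumnName = "Features",
    ModelParameters = treeModel.Model,
    TreesColumnName = null,      // only the leaf indicators are used downstream
    LeavesColumnName = "Leaves",
    PathsColumnName = null,
};

// Cache the featurized rows, then train SDCA (multiple passes) on the Leaves column.
var pipeline = mlContext.Transforms.FeaturizeByPretrainTreeEnsemble(options)
    .AppendCacheCheckpoint(mlContext)
    .Append(mlContext.Regression.Trainers.Sdca(labelColumnName: "Label", featureColumnName: "Leaves"));
var chain = pipeline.Fit(data);
IDataView scored = chain.Transform(data);
Console.WriteLine($"Has Score column: {scored.Schema.GetColumnOrNull("Score").HasValue}");
```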

WithOnFitDelegate<TTransformer>(IEstimator<TTransformer>, Action<TTransformer>)

Given an estimator, return a wrapping object that will call a delegate once Fit(IDataView) is called. It is often important for an estimator to return information about what was fit, which is why the Fit(IDataView) method returns a specifically typed object rather than just a general ITransformer. However, IEstimator<TTransformer> instances are often formed into pipelines with many objects, so we may need to build a chain of estimators via EstimatorChain<TLastTransformer> in which the estimator whose transformer we want is buried somewhere in the chain. For that scenario, this method attaches a delegate that is called once Fit is called.
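A minimal sketch of using WithOnFitDelegate to keep a handle on the fitted featurizer (a TreeEnsembleFeaturizationTransformer) that would otherwise be buried inside the fitted chain. It assumes the Microsoft.ML and Microsoft.ML.FastTree packages are referenced; the synthetic data, file path, downstream SDCA trainer, and variable names are hypothetical choices for illustration.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers.FastTree;

var mlContext = new MLContext(seed: 0);

// Hypothetical synthetic data in a temporary CSV: label, then three features.
var random = new Random(0);
string csvPath = Path.Combine(Path.GetTempPath(), "tree_featurization_onfit.csv");
File.WriteAllLines(csvPath, Enumerable.Range(0, 200).Select(_ =>
{
    float[] x = { (float)random.NextDouble(), (float)random.NextDouble(), (float)random.NextDouble() };
    float label = 2f * x[0] + x[1];
    return string.Join(",", new[] { label }.Concat(x).Select(v => v.ToString(CultureInfo.InvariantCulture)));
}));
IDataView data = mlContext.Data.LoadFromTextFile(csvPath, new[]
{
    new TextLoader.Column("Label", DataKind.Single, 0),
    new TextLoader.Column("Features", DataKind.Single, 1, 3), // known-size vector of Single
}, separatorChar: ',');

// A small FastTree model stands in for the pre-trained ensemble.
var treeModel = mlContext.Regression.Trainers.FastTree(
    labelColumnName: "Label", featureColumnName: "Features",
    numberOfLeaves: 4, numberOfTrees: 3).Fit(data);

var options = new PretrainedTreeFeaturizationEstimator.Options
{
    InputColumnName = "Features",
    ModelParameters = treeModel.Model,
    TreesColumnName = null,
    LeavesColumnName = "Leaves",
    PathsColumnName = null,
};

// The delegate runs when the chain fits the featurizer and stores the result.
TreeEnsembleFeaturizationTransformer fittedFeaturizer = null;
var pipeline = mlContext.Transforms.FeaturizeByPretrainTreeEnsemble(options)
    .WithOnFitDelegate(featurizer => fittedFeaturizer = featurizer)
    .Append(mlContext.Regression.Trainers.Sdca(labelColumnName: "Label", featureColumnName: "Leaves"));
var chain = pipeline.Fit(data);

// The featurizer and its tree ensemble are now reachable without walking the chain.
Console.WriteLine(fittedFeaturizer.Model.GetType().Name);
```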

