Rediger

Del via


fastTrees: fastTrees

Creates a list containing the function name and arguments to train a FastTree model with rxEnsemble.

Usage

  fastTrees(numTrees = 100, numLeaves = 20, learningRate = 0.2,
    minSplit = 10, exampleFraction = 0.7, featureFraction = 1,
    splitFraction = 1, numBins = 255, firstUsePenalty = 0,
    gainConfLevel = 0, unbalancedSets = FALSE, trainThreads = 8,
    randomSeed = NULL, ...)
 

Arguments

numTrees

Specifies the total number of decision trees to create in the ensemble. By creating more decision trees, you can potentially get better coverage, but the training time increases. The default value is 100.

numLeaves

The maximum number of leaves (terminal nodes) that can be created in any tree. Higher values potentially increase the size of the tree and get better precision, but risk overfitting and requiring longer training times. The default value is 20.

learningRate

Determines the size of the step taken in the direction of the gradient in each step of the learning process. This determines how fast or slow the learner converges on the optimal solution. If the step size is too big, you might overshoot the optimal solution. If the step size is too small, training takes longer to converge to the best solution.

minSplit

Minimum number of training instances required to form a leaf. That is, the minimal number of documents allowed in a leaf of a regression tree, out of the sub-sampled data. A 'split' means that features in each level of the tree (node) are randomly divided. The default value is 10. Only the number of instances is counted even if instances are weighted.

exampleFraction

The fraction of randomly chosen instances to use for each tree. The default value is 0.7.

featureFraction

The fraction of randomly chosen features to use for each tree. The default value is 1.

splitFraction

The fraction of randomly chosen features to use on each split. The default value is 1.

numBins

Maximum number of distinct values (bins) per feature. If the feature has fewer values than the number indicated, each value is placed in its own bin. If there are more values, the algorithm creates numBins bins.

firstUsePenalty

The feature first use penalty coefficient. This is a form of regularization that incurs a penalty for using a new feature when creating the tree. Increase this value to create trees that don't use many features. The default value is 0.

gainConfLevel

Tree fitting gain confidence requirement (should be in the range [0,1)). The default value is 0.

unbalancedSets

If TRUE, derivatives optimized for unbalanced sets are used. Only applicable when type equal to "binary". The default value is FALSE.

trainThreads

The number of threads to use in training. The default value is 8.

randomSeed

Specifies the random seed. The default value is NULL.

...

Additional arguments.