Tokenizer Class
Definition
Important
Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.
A Tokenizer works as a pipeline: it takes raw text as input and outputs a TokenizerResult object.
public class Tokenizer
type Tokenizer = class
Public Class Tokenizer
Inheritance: Object → Tokenizer
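The pipeline idea in the definition can be illustrated with a small self-contained sketch. It does not use Microsoft.ML.Tokenizers: `SketchTokenizer`, `SketchResult`, the lowercase normalization, the whitespace splitting, and the word-level vocabulary lookup are all hypothetical stand-ins, chosen only to show how text might flow through normalization, pre-tokenization, and a model before the result object is built.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a result object: tokens, their ids, and their offsets.
public sealed record SketchResult(
    IReadOnlyList<string> Tokens,
    IReadOnlyList<int> Ids,
    IReadOnlyList<(int Start, int End)> Offsets);

// Hypothetical toy pipeline; this is not the library's Tokenizer.
public sealed class SketchTokenizer
{
    private readonly Dictionary<string, int> _vocab;
    private readonly int _unknownId;

    public SketchTokenizer(Dictionary<string, int> vocab, int unknownId)
    {
        _vocab = vocab;
        _unknownId = unknownId;
    }

    public SketchResult Encode(string text)
    {
        // Stage 1 (normalizer, assumed): lowercase the input. For ASCII text the
        // length is unchanged, so offsets also index into the original string.
        string normalized = text.ToLowerInvariant();

        var tokens = new List<string>();
        var ids = new List<int>();
        var offsets = new List<(int Start, int End)>();

        // Stage 2 (pre-tokenizer, assumed): split on whitespace, keeping each span.
        int i = 0;
        while (i < normalized.Length)
        {
            if (char.IsWhiteSpace(normalized[i])) { i++; continue; }
            int start = i;
            while (i < normalized.Length && !char.IsWhiteSpace(normalized[i])) i++;
            string word = normalized.Substring(start, i - start);

            // Stage 3 (model, assumed): look the word up, falling back to the unknown id.
            tokens.Add(word);
            ids.Add(_vocab.TryGetValue(word, out int id) ? id : _unknownId);
            offsets.Add((start, i)); // End is exclusive.
        }

        return new SketchResult(tokens, ids, offsets);
    }
}
```

With a vocabulary of `hello` = 0 and `world` = 1 and an unknown id of 2, encoding `"Hello  big World"` in this sketch yields the tokens `hello`, `big`, `world`, the ids 0, 2, 1, and the offsets (0, 5), (7, 10), (11, 16).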
Constructors
Tokenizer(Model, PreTokenizer, Normalizer) | Creates a new Tokenizer object.
Properties
Decoder | Gets or sets the Decoder in use by the Tokenizer.
Model | Gets the Model in use by the Tokenizer.
Normalizer | Gets or sets the Normalizer in use by the Tokenizer.
PreTokenizer | Gets or sets the PreTokenizer used by the Tokenizer.
Methods
Decode(IEnumerable<Int32>, Boolean) | Decodes the given ids back to a String.
Decode(Int32, Boolean) | Decodes the Id to the mapped token.
Encode(String) | Encodes the input text into an object that holds the token list, the token Ids, and the token offset mapping.
IsValidChar(Char) |
TrainFromFiles(Trainer, ReportProgress, String[]) | Trains the tokenizer model using input files.