EnglishRoberta Class
Definition
Important
Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.
Represents the Byte Pair Encoding model.
public sealed class EnglishRoberta : Microsoft.ML.Tokenizers.Model
type EnglishRoberta = class
    inherit Model
Public NotInheritable Class EnglishRoberta
    Inherits Model
Inheritance: Model → EnglishRoberta
Constructors
EnglishRoberta(Stream, Stream, Stream) | Constructs a tokenizer object to use with the English RoBERTa model.
EnglishRoberta(String, String, String) | Constructs a tokenizer object to use with the English RoBERTa model.
Properties
PadIndex | Gets the index of the pad symbol inside the symbols list.
SymbolsCount | Gets the length of the symbols list.
Methods
AddMaskSymbol(String) | Adds the mask symbol to the symbols list.
GetTrainer() | Gets a trainer object to use in training the model and generating the vocabulary and merges data.
GetVocab() | Gets the dictionary mapping tokens to IDs.
GetVocabSize() | Gets the size of the dictionary that maps tokens to IDs.
IdsToOccurrenceRanks(IReadOnlyList<Int32>) | Converts a list of token IDs to highest occurrence rankings.
IdsToOccurrenceValues(IReadOnlyList<Int32>) | Converts a list of token IDs to highest occurrence values.
IdToString(Int32, Boolean) | Maps the tokenized ID to the original string.
IdToToken(Int32, Boolean) | Maps the tokenized ID to the token.
IsValidChar(Char) |
OccurrenceRanksIds(IReadOnlyList<Int32>) | Converts a list of highest occurrence rankings to a list of token IDs.
Save(String, String) | Saves the model data into the vocabulary, merges, and occurrence mapping files.
Tokenize(String) | Tokenizes a sequence string into a list of tokens.
TokenToId(String) | Maps the token to its tokenized ID.
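Example

The members listed above might be combined roughly as follows. This is a minimal sketch, not an official sample. It assumes the prerelease Microsoft.ML.Tokenizers package is referenced. The three file names are hypothetical placeholders. The order of the constructor arguments (vocabulary, merges, occurrence mapping) is inferred from the description of Save. The result of Tokenize is assumed to be enumerable. Because this API is prerelease, return types may differ, so the sketch uses var throughout.

```csharp
using System;
using Microsoft.ML.Tokenizers;

// Hypothetical file names; substitute the paths to your own RoBERTa resources.
// Assumed argument order: vocabulary file, merges file, occurrence mapping file.
var model = new EnglishRoberta("encoder.json", "vocab.bpe", "dict.txt");

// Inspect basic model metadata exposed by the class.
Console.WriteLine($"Vocabulary size: {model.GetVocabSize()}");
Console.WriteLine($"Symbols count:   {model.SymbolsCount}");
Console.WriteLine($"Pad index:       {model.PadIndex}");

// Tokenize a string. The result is assumed to be enumerable;
// each item is printed using its default string representation.
foreach (var token in model.Tokenize("Hello world"))
{
    Console.WriteLine(token);
}
```

The Stream-based constructor can be used in the same way when the model data comes from embedded resources or a network source rather than files on disk.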