Byte Pair Encoding · Subword Tokenizer
unseel.com · Merge the most frequent pair · The tokenizer behind GPT
[Interactive animation. Counters at start: Merges 0 · Vocab 10. Legend: State · Symbol · Top pair · Merged token · Merge rule]
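
The rule in the tagline, merge the most frequent pair, can be sketched in a few lines of Python. This is a minimal character-level illustration, not the page's own code and not GPT's production tokenizer (which works on bytes and adds pre-tokenization). The function name `learn_bpe`, its parameters, and the tie-breaking choice (first pair seen wins) are assumptions made for the example.

```python
from collections import Counter


def learn_bpe(words, num_merges):
    """Learn up to num_merges merge rules from a list of words (hypothetical helper)."""
    # Each word starts as a tuple of single-character symbols.
    corpus = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single token
        # Top pair: highest count; ties go to the pair counted first (an assumption).
        best = max(pairs, key=pairs.get)
        merges.append(best)  # record the merge rule
        # Apply the rule: replace each occurrence of the pair with the merged token.
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_corpus[tuple(out)] += freq
        corpus = new_corpus
    return merges, corpus


merges, corpus = learn_bpe(["abab", "abc"], num_merges=2)
print(merges)  # [('a', 'b'), ('ab', 'ab')]
```

Each learned rule adds one merged token to the vocabulary, so the vocabulary size after training is the number of starting symbols plus the number of merges.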