A minor README tweak.

Dima 2017-03-20 14:45:53 -07:00 committed by GitHub
parent 93b6f93bfa
commit a4d5f42eb2


@@ -33,7 +33,7 @@ vocabulary. Unlike most unsupervised word segmentation algorithms, which
assume an infinite vocabulary, SentencePiece trains the segmentation model such
that the final vocabulary size is fixed, e.g., 8k, 16k, or 32k.
-#### Whitespace is considered as as a basic symbol
+#### Whitespace is treated as a basic symbol
The first step of Natural Language Processing is text tokenization. For
example, a standard English tokenizer would segment the text "Hello world." into the
following three tokens.
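
To make that concrete, here is a minimal sketch of such a segmentation, using NLTK's word tokenizer as one stand-in for "a standard English tokenizer" (the choice of NLTK is illustrative, not something the README prescribes):

```python
from nltk import word_tokenize  # requires: pip install nltk; nltk.download('punkt')

# A conventional English tokenizer splits on whitespace and punctuation.
print(word_tokenize("Hello world."))
# ['Hello', 'world', '.']
```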
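Likewise, the fixed vocabulary size mentioned above is set at training time; a minimal sketch with the SentencePiece Python wrapper, where the corpus path, model prefix, and the 8k size are illustrative placeholders:

```python
import sentencepiece as spm

# Train a model whose final vocabulary is fixed at exactly 8k symbols.
# "corpus.txt" and the model prefix "m" are placeholder names.
spm.SentencePieceTrainer.train(
    input="corpus.txt",   # raw text, one sentence per line
    model_prefix="m",     # writes m.model and m.vocab
    vocab_size=8000,      # the fixed, predetermined vocabulary size
)
```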