Update normalization.md

Taku Kudo 2018-04-30 12:41:17 +09:00 committed by GitHub
parent 7f02b6159c
commit 36a3b35e17
GPG Key ID: 4AEE18F83AFDEB23


@@ -2,10 +2,10 @@
By default, SentencePiece normalizes the input sentence with a variant of Unicode
[NFKC](https://en.wikipedia.org/wiki/Unicode_equivalence).
-SentencePiece framework allows us to define custom normalization rule, which is stored in the model file.
+SentencePiece allows us to define custom normalization rule, which is stored in the model file.
## Use pre-defined normalization rule
-SentencePiece framework provides the following pre-defined normalization rule. It is recommended to use one of them unless you have any special reasons.
+SentencePiece provides the following pre-defined normalization rule. It is recommended to use one of them unless you have any special reasons.
* **nfkc**: [NFKC](https://en.wikipedia.org/wiki/Unicode_equivalence) normalization (default)
* **identity**: no normalization
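
To make the difference between the two rules listed above concrete, here is a minimal sketch. It uses Python's standard `unicodedata` module, not SentencePiece itself. SentencePiece applies a *variant* of NFKC, so its real output may differ from plain NFKC in some cases. The `normalize` helper and its `rule` parameter are hypothetical names introduced only for this illustration.

```python
import unicodedata


def normalize(text: str, rule: str = "nfkc") -> str:
    """Hypothetical helper approximating the two rule names above.

    Assumption: plain Unicode NFKC stands in for SentencePiece's
    NFKC variant, which is not reproduced here.
    """
    if rule == "nfkc":
        # Compatibility decomposition followed by canonical composition.
        return unicodedata.normalize("NFKC", text)
    if rule == "identity":
        # No normalization: the input is returned unchanged.
        return text
    raise ValueError(f"unknown rule: {rule}")


sample = "ＡＢＣ ①"  # full-width letters and a circled digit
print(normalize(sample, "nfkc"))
print(normalize(sample, "identity"))
```

Under plain NFKC, full-width letters and circled digits fold to their ASCII counterparts, while `identity` leaves the input byte-for-byte as given.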