Update README.md

Taku Kudo 2018-06-29 02:44:57 +09:00 committed by GitHub
parent f4d0ddce6d
commit 6f84d6bfa5
GPG Key ID: 4AEE18F83AFDEB23

@@ -12,8 +12,7 @@ Neural Network-based text generation systems where the vocabulary size
is predetermined prior to the neural model training. SentencePiece implements
**subword units** (e.g., **byte-pair-encoding (BPE)** [[Sennrich et al.](http://www.aclweb.org/anthology/P16-1162)]) and
**unigram language model** [[Kudo.](https://arxiv.org/abs/1804.10959)])
-with the extension of direct training from raw sentences.
-Subword segmentation with unigram language model supports probabilistic subword sampling for **subword regularization** [[Kudo.](https://arxiv.org/abs/1804.10959)], a simple technique to improve the robustness of NMT model. SentencePiece allows us to make a purely end-to-end system that does not depend on language-specific pre/postprocessing.
+with the extension of direct training from raw sentences. SentencePiece allows us to make a purely end-to-end system that does not depend on language-specific pre/postprocessing.
**This is not an official Google product.**