Update README.md

Use https
akashhansda 2022-03-31 07:59:18 -07:00 committed by GitHub
parent 0c09e31605
commit ea1c0ae63f

@@ -13,7 +13,7 @@
SentencePiece is an unsupervised text tokenizer and detokenizer mainly for
Neural Network-based text generation systems where the vocabulary size
is predetermined prior to the neural model training. SentencePiece implements
-**subword units** (e.g., **byte-pair-encoding (BPE)** [[Sennrich et al.](http://www.aclweb.org/anthology/P16-1162)]) and
+**subword units** (e.g., **byte-pair-encoding (BPE)** [[Sennrich et al.](https://www.aclweb.org/anthology/P16-1162)]) and
**unigram language model** [[Kudo.](https://arxiv.org/abs/1804.10959)]
with the extension of direct training from raw sentences. SentencePiece allows us to make a purely end-to-end system that does not depend on language-specific pre/postprocessing.
@@ -23,7 +23,7 @@ with the extension of direct training from raw sentences. SentencePiece allows us
- **Purely data driven**: SentencePiece trains tokenization and detokenization
models from sentences. Pre-tokenization ([Moses tokenizer](https://github.com/moses-smt/mosesdecoder/blob/master/scripts/tokenizer/tokenizer.perl)/[MeCab](http://taku910.github.io/mecab/)/[KyTea](http://www.phontron.com/kytea/)) is not always required.
- **Language independent**: SentencePiece treats the sentences just as sequences of Unicode characters. There is no language-dependent logic.
-- **Multiple subword algorithms**: **BPE** [[Sennrich et al.](http://www.aclweb.org/anthology/P16-1162)] and **unigram language model** [[Kudo.](https://arxiv.org/abs/1804.10959)] are supported.
+- **Multiple subword algorithms**: **BPE** [[Sennrich et al.](https://www.aclweb.org/anthology/P16-1162)] and **unigram language model** [[Kudo.](https://arxiv.org/abs/1804.10959)] are supported.
- **Subword regularization**: SentencePiece implements subword sampling for [subword regularization](https://arxiv.org/abs/1804.10959) and [BPE-dropout](https://arxiv.org/abs/1910.13267) which help to improve the robustness and accuracy of NMT models.
- **Fast and lightweight**: Segmentation speed is around 50k sentences/sec, and memory footprint is around 6MB.
- **Self-contained**: The same tokenization/detokenization is obtained as long as the same model file is used.
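For readers skimming this diff, here is a minimal sketch of what the features listed above (training a subword model directly from raw sentences and subword sampling for regularization) look like with the official Python `sentencepiece` package. The corpus path, vocabulary size, and sampling parameters are illustrative only and are not part of this change.

```python
import sentencepiece as spm

# Train a unigram LM subword model directly from raw sentences
# ("corpus.txt" is an assumed path: one raw sentence per line).
spm.SentencePieceTrainer.train(
    input="corpus.txt",
    model_prefix="m",
    vocab_size=8000,
    model_type="unigram",  # or "bpe"
)

sp = spm.SentencePieceProcessor(model_file="m.model")

# Deterministic (best) segmentation.
print(sp.encode("Hello world.", out_type=str))

# Subword regularization: sample a segmentation from the n-best list.
print(sp.encode("Hello world.", out_type=str,
                enable_sampling=True, nbest_size=-1, alpha=0.1))
```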