From fa326d431c430eef1943a8532b1448f96b1e30df Mon Sep 17 00:00:00 2001
From: Ivan Provilkov
Date: Mon, 15 Feb 2021 11:59:34 +0300
Subject: [PATCH] Add note about BPE-Dropout during training

---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index 20f5693..3690de7 100644
--- a/README.md
+++ b/README.md
@@ -93,6 +93,8 @@ On top of the basic BPE implementation, this repository supports:
   use the argument `--dropout 0.1` for `subword-nmt apply-bpe` to randomly drop out possible merges.
   Doing this on the training corpus can improve the quality of the final system; at test time, use BPE without dropout.
   To obtain reproducible results, the argument `--seed` can be used to set the random seed.
+
+  **Note:** In the original paper, the authors applied BPE-Dropout to each new batch separately. To get similar behavior, you can copy the training corpus several times so that multiple segmentations are produced for the same sentence.
 - support for glossaries: use the argument `--glossaries` for `subword-nmt apply-bpe` to provide a list of words and/or regular expressions
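
The note added by this patch refers to BPE-Dropout's core mechanism: during segmentation, each applicable merge is skipped with some probability, so the same word can receive different segmentations across passes over the corpus. The following is a minimal toy sketch of that idea — it is not the subword-nmt implementation, and the merge table and function names are illustrative only.

```python
import random

def bpe_segment(word, merges, dropout=0.0, rng=random):
    """Toy BPE segmentation with merge dropout.

    `merges` maps a symbol pair to its merge priority (lower = earlier).
    With probability `dropout`, a candidate merge is skipped, which is
    the key idea of BPE-Dropout; with dropout=0.0 this is plain BPE.
    """
    symbols = list(word)
    while True:
        # Collect applicable merges, randomly dropping each candidate.
        candidates = []
        for i in range(len(symbols) - 1):
            pair = (symbols[i], symbols[i + 1])
            if pair in merges and rng.random() >= dropout:
                candidates.append((merges[pair], i))
        if not candidates:
            break
        # Apply the highest-priority surviving merge.
        _, i = min(candidates)
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols

# Illustrative merge table learned from some corpus.
merges = {("l", "o"): 0, ("lo", "w"): 1, ("w", "e"): 2}
rng = random.Random(0)
print(bpe_segment("lower", merges))            # deterministic segmentation
print(bpe_segment("lower", merges, 0.5, rng))  # stochastic segmentation
```

Running the stochastic call repeatedly yields different segmentations of the same word, which is why the note suggests duplicating the training corpus: each copy is segmented independently, approximating the per-batch dropout used in the paper.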