Add note about BPE-Dropout during training

Ivan Provilkov 2021-02-15 11:59:34 +03:00 committed by GitHub
parent 234923ed53
commit fa326d431c


@@ -93,6 +93,8 @@ On top of the basic BPE implementation, this repository supports:
use the argument `--dropout 0.1` for `subword-nmt apply-bpe` to randomly drop out possible merges.
Doing this on the training corpus can improve quality of the final system; at test time, use BPE without dropout.
To obtain reproducible results, use the argument `--seed` to set the random seed.
**Note:** In the original paper, the authors re-applied BPE-dropout to each new batch, so the same sentence is segmented differently across epochs. To obtain similar behavior, you can concatenate several copies of the training corpus and apply BPE with dropout, yielding multiple segmentations for the same sentence.
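The merge-dropout mechanism can be sketched as follows. This is a toy illustration of how BPE-dropout produces stochastic segmentations, not the actual `subword-nmt` implementation; the function name and the merge-table format are assumptions made for the example:

```python
import random

def apply_bpe_dropout(word, merges, dropout=0.1, rng=random):
    """Segment `word` with BPE merge operations, randomly skipping each
    candidate merge with probability `dropout` at every merge step.

    `merges` is a list of symbol pairs in priority order,
    e.g. [("l", "o"), ("lo", "w")].  Toy sketch only -- not the
    subword-nmt implementation.
    """
    symbols = list(word)                       # start from characters
    rank = {pair: i for i, pair in enumerate(merges)}
    while True:
        # Collect adjacent pairs that have a merge rule and that
        # survive dropout at this step.
        candidates = [
            (rank[(a, b)], i)
            for i, (a, b) in enumerate(zip(symbols, symbols[1:]))
            if (a, b) in rank and rng.random() >= dropout
        ]
        if not candidates:
            break
        # Apply the highest-priority (lowest-rank) surviving merge.
        _, i = min(candidates)
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols
```

With `dropout=0.0` this reduces to ordinary deterministic BPE; with `dropout > 0` repeated calls can return different segmentations of the same word, which is why re-applying dropout per batch (or to duplicated copies of the corpus) produces multiple segmentations per sentence.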
- support for glossaries:
use the argument `--glossaries` for `subword-nmt apply-bpe` to provide a list of words and/or regular expressions