# opus-2020-01-08.zip

* dataset: opus
* model: transformer-align
* pre-processing: normalization + SentencePiece
* download: [opus-2020-01-08.zip](https://object.pouta.csc.fi/OPUS-MT-models/en-mh/opus-2020-01-08.zip)
* test set translations: [opus-2020-01-08.test.txt](https://object.pouta.csc.fi/OPUS-MT-models/en-mh/opus-2020-01-08.test.txt)
* test set scores: [opus-2020-01-08.eval.txt](https://object.pouta.csc.fi/OPUS-MT-models/en-mh/opus-2020-01-08.eval.txt)

## Benchmarks

| testset               | BLEU  | chr-F |
|-----------------------|-------|-------|
| JW300.en.mh           | 29.7  | 0.479 |

# opus+bt-2020-05-23.zip

* dataset: opus+bt
* model: transformer-align
* source language(s): en
* target language(s): mh
* pre-processing: normalization + SentencePiece (spm4k,spm4k)
* download: [opus+bt-2020-05-23.zip](https://object.pouta.csc.fi/OPUS-MT-models/en-mh/opus+bt-2020-05-23.zip)
* test set translations: [opus+bt-2020-05-23.test.txt](https://object.pouta.csc.fi/OPUS-MT-models/en-mh/opus+bt-2020-05-23.test.txt)
* test set scores: [opus+bt-2020-05-23.eval.txt](https://object.pouta.csc.fi/OPUS-MT-models/en-mh/opus+bt-2020-05-23.eval.txt)

## Training data: opus+bt

* en-mh: Tatoeba (22)
* en-mh: total size = 22
* unused dev/test data is added to training data
* total size (opus+bt): 153290

## Validation data

* en-mh: JW300
* devset = top 2500 lines of JW300.src.shuffled!
* testset = next 2500 lines of JW300.src.shuffled!
* remaining lines are added to traindata

## Benchmarks

| testset               | BLEU  | chr-F |
|-----------------------|-------|-------|
| JW300.en.mh           | 28.7  | 0.470 |
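The validation-data split used for opus+bt-2020-05-23 (first 2500 shuffled JW300 lines as devset, next 2500 as testset, the rest added to training data) can be sketched as follows. This is a minimal illustration, not the OPUS-MT training scripts: it assumes source and target sentences are shuffled jointly as aligned pairs, and the function names and the fixed seed are hypothetical.

```python
import random
from typing import List, Tuple

# One aligned sentence pair: (English source, Marshallese target).
Pair = Tuple[str, str]


def shuffle_pairs(pairs: List[Pair], seed: int = 42) -> List[Pair]:
    """Shuffle pairs jointly so source and target stay aligned.

    The seed is a hypothetical choice made for reproducibility.
    """
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def split_dev_test_train(
    pairs: List[Pair], dev_size: int = 2500, test_size: int = 2500
) -> Tuple[List[Pair], List[Pair], List[Pair]]:
    """Take the top dev_size pairs as devset, the next test_size as testset,
    and return all remaining pairs as additional training data."""
    dev = pairs[:dev_size]
    test = pairs[dev_size:dev_size + test_size]
    train = pairs[dev_size + test_size:]
    return dev, test, train


if __name__ == "__main__":
    # Toy corpus standing in for JW300 (placeholder strings, not real data).
    corpus = [(f"en sentence {i}", f"mh sentence {i}") for i in range(6000)]
    dev, test, train = split_dev_test_train(shuffle_pairs(corpus))
    print(len(dev), len(test), len(train))  # 2500 2500 1000
```

Because the three portions are consecutive slices of one shuffled list, they are disjoint and together cover the whole corpus; if the corpus has fewer than 5000 pairs, the training portion is simply empty.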