OPUS-MT-train/models/lun-en/README.md

# opus-2020-01-09.zip
* dataset: opus
* model: transformer-align
* pre-processing: normalization + SentencePiece
* download: [opus-2020-01-09.zip](https://object.pouta.csc.fi/OPUS-MT-models/lun-en/opus-2020-01-09.zip)
* test set translations: [opus-2020-01-09.test.txt](https://object.pouta.csc.fi/OPUS-MT-models/lun-en/opus-2020-01-09.test.txt)
* test set scores: [opus-2020-01-09.eval.txt](https://object.pouta.csc.fi/OPUS-MT-models/lun-en/opus-2020-01-09.eval.txt)
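
The three links above share one URL pattern, so they can be built from the language pair and release name. The following is a minimal sketch, not part of the OPUS-MT tooling: `artifact_urls` and `download_model` are hypothetical helper names, and the pattern is inferred only from the links listed in this README.

```python
from urllib.request import urlretrieve

# Base location taken from the download links in this README.
BASE_URL = "https://object.pouta.csc.fi/OPUS-MT-models"


def artifact_urls(pair: str, release: str) -> dict:
    """Build the three artifact URLs for one release (pattern assumed from the links above)."""
    prefix = f"{BASE_URL}/{pair}/{release}"
    return {
        "model": f"{prefix}.zip",
        "translations": f"{prefix}.test.txt",
        "scores": f"{prefix}.eval.txt",
    }


def download_model(pair: str, release: str, dest: str) -> str:
    """Fetch the model archive to `dest`; requires network access."""
    path, _headers = urlretrieve(artifact_urls(pair, release)["model"], dest)
    return path


if __name__ == "__main__":
    # List the artifacts for the release described above without downloading anything.
    for name, url in artifact_urls("lun-en", "opus-2020-01-09").items():
        print(name, url)
```

Calling `download_model("lun-en", "opus-2020-01-09", "opus-2020-01-09.zip")` would save the archive next to the script.
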
## Benchmarks
| testset | BLEU | chr-F |
|-----------------------|-------|-------|
| JW300.lun.en | 30.6 | 0.466 |
# opus+bt-2020-05-23.zip
* dataset: opus+bt
* model: transformer-align
* source language(s): lun
* target language(s): en
* pre-processing: normalization + SentencePiece (spm4k,spm4k)
* download: [opus+bt-2020-05-23.zip](https://object.pouta.csc.fi/OPUS-MT-models/lun-en/opus+bt-2020-05-23.zip)
* test set translations: [opus+bt-2020-05-23.test.txt](https://object.pouta.csc.fi/OPUS-MT-models/lun-en/opus+bt-2020-05-23.test.txt)
* test set scores: [opus+bt-2020-05-23.eval.txt](https://object.pouta.csc.fi/OPUS-MT-models/lun-en/opus+bt-2020-05-23.eval.txt)
## Training data: opus+bt
* lun-en:
* lun-en: total size = 0
* unused dev/test data is added to training data
* total size (opus+bt): 132888
## Validation data
* en-lun: JW300
* devset = top 2500 lines of JW300.src.shuffled!
* testset = next 2500 lines of JW300.src.shuffled!
* remaining lines are added to traindata
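
The split described above can be sketched as plain list slicing. This is an illustration only, assuming the shuffled corpus is already loaded as a list of lines; `split_shuffled` is a hypothetical name, not a function from the OPUS-MT training scripts. In practice the same slice boundaries would have to be applied to the source and target files so that sentence pairs stay aligned.

```python
def split_shuffled(lines, dev_size=2500, test_size=2500):
    """Split an already-shuffled list of lines into dev, test, and leftover training data."""
    dev = lines[:dev_size]                           # top 2500 lines
    test = lines[dev_size:dev_size + test_size]      # next 2500 lines
    remaining = lines[dev_size + test_size:]         # everything else goes to training
    return dev, test, remaining


if __name__ == "__main__":
    # Stand-in data; a real run would read the shuffled JW300 file instead.
    corpus = [f"sentence {i}" for i in range(6000)]
    dev, test, remaining = split_shuffled(corpus)
    print(len(dev), len(test), len(remaining))
```
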
## Benchmarks
| testset | BLEU | chr-F |
|-----------------------|-------|-------|
| JW300.lun.en | 32.2 | 0.483 |
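
For a quick comparison of the two releases, the reported scores can be subtracted directly. The sketch below uses only the numbers from the two benchmark tables in this README; the dictionary layout and the `score_delta` name are illustrative assumptions. It also assumes both releases were scored on the same JW300 test lines, which this README does not state explicitly.

```python
# Scores copied from the two benchmark tables in this README.
SCORES = {
    "opus-2020-01-09": {"BLEU": 30.6, "chr-F": 0.466},
    "opus+bt-2020-05-23": {"BLEU": 32.2, "chr-F": 0.483},
}


def score_delta(old: str, new: str) -> dict:
    """Return new-minus-old for each metric, rounded to the precision the tables report."""
    return {
        "BLEU": round(SCORES[new]["BLEU"] - SCORES[old]["BLEU"], 1),
        "chr-F": round(SCORES[new]["chr-F"] - SCORES[old]["chr-F"], 3),
    }


if __name__ == "__main__":
    print(score_delta("opus-2020-01-09", "opus+bt-2020-05-23"))
```
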