mirror of
https://github.com/Helsinki-NLP/OPUS-MT-train.git
synced 2024-10-26 21:19:02 +03:00
# opus-2020-01-08.zip

* dataset: opus
* model: transformer-align
* pre-processing: normalization + SentencePiece
* download: [opus-2020-01-08.zip](https://object.pouta.csc.fi/OPUS-MT-models/en-mfe/opus-2020-01-08.zip)
* test set translations: [opus-2020-01-08.test.txt](https://object.pouta.csc.fi/OPUS-MT-models/en-mfe/opus-2020-01-08.test.txt)
* test set scores: [opus-2020-01-08.eval.txt](https://object.pouta.csc.fi/OPUS-MT-models/en-mfe/opus-2020-01-08.eval.txt)

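The three links above share one base URL and differ only in their suffix. The following is a minimal sketch of that pattern, inferred solely from the links in this README; the helper name `release_urls` and its dictionary keys are hypothetical and not part of OPUS-MT, and the pattern is not guaranteed to hold for other releases or language pairs.

```python
# Base URL as it appears in the links of this README.
BASE_URL = "https://object.pouta.csc.fi/OPUS-MT-models"


def release_urls(lang_pair: str, release: str) -> dict:
    """Build the artefact URLs for one release (hypothetical helper).

    Assumes the naming scheme seen in this README:
    <base>/<lang_pair>/<release>.{zip,test.txt,eval.txt}
    """
    prefix = f"{BASE_URL}/{lang_pair}/{release}"
    return {
        "model": f"{prefix}.zip",
        "test_translations": f"{prefix}.test.txt",
        "test_scores": f"{prefix}.eval.txt",
    }


if __name__ == "__main__":
    # Print the three links for the first release listed in this README.
    for name, url in release_urls("en-mfe", "opus-2020-01-08").items():
        print(f"{name}: {url}")
```

Passing `"opus+bt-2020-05-23"` as the release name yields the links listed for the second model in the same way.
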
## Benchmarks

| testset      | BLEU | chr-F |
|--------------|------|-------|
| JW300.en.mfe | 32.1 | 0.509 |

# opus+bt-2020-05-23.zip

* dataset: opus+bt
* model: transformer-align
* source language(s): en
* target language(s): mfe
* pre-processing: normalization + SentencePiece (spm4k,spm4k)
* download: [opus+bt-2020-05-23.zip](https://object.pouta.csc.fi/OPUS-MT-models/en-mfe/opus+bt-2020-05-23.zip)
* test set translations: [opus+bt-2020-05-23.test.txt](https://object.pouta.csc.fi/OPUS-MT-models/en-mfe/opus+bt-2020-05-23.test.txt)
* test set scores: [opus+bt-2020-05-23.eval.txt](https://object.pouta.csc.fi/OPUS-MT-models/en-mfe/opus+bt-2020-05-23.eval.txt)

## Training data: opus+bt

* en-mfe: QED (238) Tatoeba (6)
* en-mfe: total size = 244
* unused dev/test data is added to training data
* total size (opus+bt): 127051

## Validation data

* en-mfe: JW300

* devset = top 2500 lines of JW300.src.shuffled
* testset = next 2500 lines of JW300.src.shuffled
* remaining lines are added to the training data

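The split described above can be sketched as follows. This is an illustration only, not the OPUS-MT-train implementation: the function name `split_shuffled` and its parameters are hypothetical, and the input is assumed to be an already shuffled list of lines.

```python
def split_shuffled(lines, dev_size=2500, test_size=2500):
    """Split already-shuffled lines into dev, test, and leftover training data.

    Hypothetical helper mirroring the description in this README:
    the first `dev_size` lines form the dev set, the next `test_size`
    lines form the test set, and everything after that is returned
    so it can be added to the training data.
    """
    dev = lines[:dev_size]
    test = lines[dev_size:dev_size + test_size]
    extra_train = lines[dev_size + test_size:]
    return dev, test, extra_train


if __name__ == "__main__":
    # Toy stand-in for the shuffled file; real data would be read from disk.
    toy_lines = [f"line {i}" for i in range(6000)]
    dev, test, extra_train = split_shuffled(toy_lines)
    print(len(dev), len(test), len(extra_train))
```

Because the slices never overlap, no line ends up in more than one of the three sets.
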
## Benchmarks

| testset      | BLEU | chr-F |
|--------------|------|-------|
| JW300.en.mfe | 31.0 | 0.501 |