OPUS-MT-train/models/en-bcl
2021-05-04 08:49:16 +03:00

opus+nt+bt-2021-03-30.zip

  • dataset: opus+nt+bt
  • model: transformer-align
  • source language(s): en
  • target language(s): bcl
  • pre-processing: normalization + SentencePiece (spm12k,spm32k)
  • download: opus+nt+bt-2021-03-30.zip

Training data: opus+nt+bt

  • en-bcl: JW300 (470468) new-testament (11623) wiki.aa (43432)
  • en-bcl: total size = 525523
  • total size (opus+nt+bt): 525475
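As a quick sanity check on the figures above, the per-corpus counts can be summed and compared with the reported totals. This is only a minimal sketch: the dictionary restates the numbers listed above, and reading the gap between the two totals as sentence pairs dropped during data preparation is an assumption, not something this README states.

```python
# Sentence-pair counts copied from the training data list above.
corpus_sizes = {
    "JW300": 470468,
    "new-testament": 11623,
    "wiki.aa": 43432,
}

reported_sum = 525523    # "en-bcl: total size"
reported_final = 525475  # "total size (opus+nt+bt)"

computed_sum = sum(corpus_sizes.values())

# Assumption: the difference corresponds to pairs removed while the
# final training set was prepared; the README does not say why it differs.
difference = reported_sum - reported_final

print(f"sum of corpora:           {computed_sum}")
print(f"matches reported total:   {computed_sum == reported_sum}")
print(f"difference to final size: {difference}")
```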

Validation data

  • bcl-en: wikimedia, 1153
  • total-size-shuffled: 775
  • devset-selected: top 250 lines of wikimedia.src.shuffled!
  • testset-selected: next 525 lines of wikimedia.src.shuffled!
  • devset-unused: added to traindata
  • test set translations: opus+nt+bt-2021-03-30.test.txt
  • test set scores: opus+nt+bt-2021-03-30.eval.txt
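The dev/test selection described above amounts to slicing the shuffled file: the first 250 lines become the dev set and the next 525 the test set. The following is a minimal sketch under assumptions: `split_dev_test` is a hypothetical helper, not part of OPUS-MT-train, and it works on an in-memory list of placeholder lines rather than the actual wikimedia.src.shuffled file.

```python
from typing import List, Tuple


def split_dev_test(
    lines: List[str], dev_size: int, test_size: int
) -> Tuple[List[str], List[str], List[str]]:
    """Split already-shuffled lines into dev, test, and leftover parts."""
    dev = lines[:dev_size]
    test = lines[dev_size:dev_size + test_size]
    rest = lines[dev_size + test_size:]
    return dev, test, rest


# Placeholder for the 775 shuffled lines (total-size-shuffled above).
shuffled = [f"sentence {i}" for i in range(775)]

dev, test, rest = split_dev_test(shuffled, dev_size=250, test_size=525)
print(len(dev), len(test), len(rest))  # prints: 250 525 0
```

With the sizes listed here nothing is left over. For a larger validation pool, the remainder is presumably what "devset-unused: added to traindata" refers to.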

Benchmarks

testset           BLEU  chr-F  #sent  #words  BP
wikimedia.en-bcl  17.3  0.426  525    28399   0.840
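The BP column is BLEU's brevity penalty: it is 1 when the system output is at least as long as the reference, and exp(1 - r/c) otherwise, where r is the reference length and c the output length. The sketch below computes the penalty and inverts it to estimate how much shorter than the reference the output was. The function names are illustrative, and treating #words as the reference length is an assumption.

```python
import math


def brevity_penalty(hyp_len: int, ref_len: int) -> float:
    """Standard BLEU brevity penalty for corpus-level lengths."""
    if hyp_len <= 0:
        return 0.0
    if hyp_len >= ref_len:
        return 1.0
    return math.exp(1.0 - ref_len / hyp_len)


def length_ratio_from_bp(bp: float) -> float:
    """Invert BP = exp(1 - 1/ratio), where ratio = hyp_len / ref_len.

    For bp == 1.0 the result 1.0 is only a lower bound on the ratio.
    """
    if not 0.0 < bp <= 1.0:
        raise ValueError("bp must be in (0, 1]")
    return 1.0 / (1.0 - math.log(bp))


# Values from the benchmark row above; #words is assumed to be the reference length.
ratio = length_ratio_from_bp(0.840)
approx_hyp_len = round(ratio * 28399)

print(f"length ratio (output/reference): {ratio:.3f}")
print(f"approximate output length: {approx_hyp_len} words")
```

A penalty this far below 1 suggests that the translations were noticeably shorter than the references, which lowers BLEU independently of n-gram precision.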

opus+nt+bt+bt-2021-04-01.zip

  • dataset: opus+nt+bt+bt
  • model: transformer-align
  • source language(s): en
  • target language(s): bcl
  • pre-processing: normalization + SentencePiece (spm12k,spm32k)
  • download: opus+nt+bt+bt-2021-04-01.zip

Training data: opus+nt+bt+bt

  • en-bcl: JW300 (470468) new-testament (11623) wiki.aa (45474)
  • en-bcl: total size = 527565
  • total size (opus+nt+bt+bt): 527524

Validation data

Benchmarks

testset           BLEU  chr-F  #sent  #words  BP
wikimedia.en-bcl  21.6  0.476  525    28399   0.789

opus+nt+bt+bt+bt-2021-04-03.zip

  • dataset: opus+nt+bt+bt+bt
  • model: transformer-align
  • source language(s): en
  • target language(s): bcl
  • pre-processing: normalization + SentencePiece (spm12k,spm32k)
  • download: opus+nt+bt+bt+bt-2021-04-03.zip

Training data: opus+nt+bt+bt+bt

  • en-bcl: JW300 (470468) new-testament (11623) wiki.aa (45474)
  • en-bcl: total size = 527565
  • total size (opus+nt+bt+bt+bt): 527496

Validation data

Benchmarks

testset           BLEU  chr-F  #sent  #words  BP
wikimedia.en-bcl  22.7  0.482  525    28399   0.895

opus2+nt+bt+bt+bt-2021-04-03.zip

  • dataset: opus2+nt+bt+bt+bt
  • model: transformer-align
  • source language(s): en
  • target language(s): bcl
  • pre-processing: normalization + SentencePiece (spm12k,spm32k)
  • download: opus2+nt+bt+bt+bt-2021-04-03.zip

Training data: opus2+nt+bt+bt+bt

  • en-bcl: JW300 (470468) new-testament (11623) wiki.aa (45474) wiki.aa_opus+nt+bt-2021-04-01 (45474)
  • en-bcl: total size = 573039
  • total size (opus2+nt+bt+bt+bt): 572969

Validation data

Benchmarks

testset           BLEU  chr-F  #sent  #words  BP
wikimedia.en-bcl  23.9  0.497  525    28399   0.820

opus+nt+bt+bt+bt+bt-2021-04-06.zip

  • dataset: opus+nt+bt+bt+bt+bt
  • model: transformer-align
  • source language(s): en
  • target language(s): bcl
  • pre-processing: normalization + SentencePiece (spm12k,spm32k)
  • download: opus+nt+bt+bt+bt+bt-2021-04-06.zip

Training data: opus+nt+bt+bt+bt+bt

  • en-bcl: JW300 (470468) new-testament (11623) wiki.aa (45474) wiki.aa_opus+nt+bt+bt-2021-04-03 (45474) wiki.aa_opus+nt+bt-2021-04-01 (45474)
  • en-bcl: total size = 618513
  • total size (opus+nt+bt+bt+bt+bt): 618427

Validation data

Benchmarks

testset           BLEU  chr-F  #sent  #words  BP
wikimedia.en-bcl  24.4  0.498  525    28399   0.805

opus+nt+bt+bt-2021-04-10.zip

  • dataset: opus+nt+bt+bt
  • model: transformer-align
  • source language(s): en
  • target language(s): bcl
  • pre-processing: normalization + SentencePiece (spm12k,spm32k)
  • download: opus+nt+bt+bt-2021-04-10.zip

Training data: opus+nt+bt+bt

  • en-bcl: JW300 (470468) new-testament (11623) wiki.aa (45494) wiki.aa_opus+nt+bt+bt+bt-2021-04-05 (45474) wiki.aa_opus+nt+bt+bt-2021-04-03 (45474) wiki.aa_opus+nt+bt-2021-04-01 (45474)
  • en-bcl: total size = 664007
  • unused dev/test data is added to training data
  • total size (opus+nt+bt+bt): 665111

Validation data

Benchmarks

testset           BLEU  chr-F  #sent  #words  BP
wikimedia.en-bcl  30.7  0.572  500    29131   0.921

opus+nt+bt-2021-04-11.zip

  • dataset: opus+nt+bt
  • model: transformer-align
  • source language(s): en
  • target language(s): bcl
  • pre-processing: normalization + SentencePiece (spm12k,spm32k)
  • download: opus+nt+bt-2021-04-11.zip

Training data: opus+nt+bt

  • en-bcl: JW300 (470468) new-testament (11623) wiki.aa (45494) wiki.aa_opus+nt+bt+bt+bt-2021-04-05 (45474) wiki.aa_opus+nt+bt+bt-2021-04-03 (45474) wiki.aa_opus+nt+bt-2021-04-01 (45474)
  • en-bcl: total size = 664007
  • unused dev/test data is added to training data
  • total size (opus+nt+bt): 666118

Validation data

  • bcl-en: wikimedia, 5033
  • total-size-shuffled: 4207
  • devset-selected: top 1000 lines of wikimedia.src.shuffled!
  • testset-selected: next 1000 lines of wikimedia.src.shuffled!
  • devset-unused: added to traindata
  • test set translations: opus+nt+bt-2021-04-11.test.txt
  • test set scores: opus+nt+bt-2021-04-11.eval.txt

Benchmarks

testset           BLEU  chr-F  #sent  #words  BP
wikimedia.en-bcl  31.9  0.585  1000   27681   1.000
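To compare the releases at a glance, the benchmark rows above can be collected and grouped by test set size, since BLEU scores are only directly comparable when computed on the same test set. This is a minimal sketch: the tuples restate the rows above, and using the sentence count as a proxy for "same test set" is an assumption.

```python
from collections import defaultdict

# (release, BLEU, chr-F, #sent) copied from the benchmark tables above.
results = [
    ("opus+nt+bt-2021-03-30", 17.3, 0.426, 525),
    ("opus+nt+bt+bt-2021-04-01", 21.6, 0.476, 525),
    ("opus+nt+bt+bt+bt-2021-04-03", 22.7, 0.482, 525),
    ("opus2+nt+bt+bt+bt-2021-04-03", 23.9, 0.497, 525),
    ("opus+nt+bt+bt+bt+bt-2021-04-06", 24.4, 0.498, 525),
    ("opus+nt+bt+bt-2021-04-10", 30.7, 0.572, 500),
    ("opus+nt+bt-2021-04-11", 31.9, 0.585, 1000),
]

# Assumption: rows with the same #sent were scored on the same test set.
by_testset = defaultdict(list)
for release, bleu, chrf, n_sent in results:
    by_testset[n_sent].append((release, bleu, chrf))

# Best release per group, ranked by BLEU.
best_per_testset = {
    n_sent: max(rows, key=lambda row: row[1])
    for n_sent, rows in by_testset.items()
}

for n_sent in sorted(best_per_testset):
    release, bleu, chrf = best_per_testset[n_sent]
    print(f"{n_sent:>5} sentences: {release} (BLEU {bleu}, chr-F {chrf})")
```

The jump in BLEU for the last two releases coincides with a change of test set (500 and 1000 sentences instead of 525), so it should not be read as a like-for-like improvement over the earlier models.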