Helsinki-NLP/OPUS-MT-train

mirror of https://github.com/Helsinki-NLP/OPUS-MT-train.git synced 2024-10-27 05:30:16 +03:00

Joerg Tiedemann 31c123125a todos

2021-12-18 12:14:41 +02:00

Pivot-based data augmentation

pbt: pivot-back-translated (e.g. translate eng to swe to change fin-eng into fin-swe training data)
pft: pivot-forward-translated (e.g. translate eng to fin to change eng-swe into fin-swe training data)
pivotalign: merge fin-eng and eng-swe into fin-swe training data by simply matching eng sentences (see OPUS)
pftonly, pbtonly, paonly: targets that train only on the respective pivot-based data set, without the standard training data (relevant for distillation)
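The pivotalign idea can be illustrated with a minimal sketch. It assumes both corpora are already loaded as in-memory lists of sentence pairs and that "matching" means exact string identity of the English pivot sentences. The function name `pivot_align` and the data layout are hypothetical illustrations, not the recipe this repository actually uses.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def pivot_align(src_pivot: Iterable[Pair], pivot_tgt: Iterable[Pair]) -> List[Pair]:
    """Hypothetical sketch: join (src, pivot) and (pivot, tgt) pairs on identical pivot sentences."""
    # Index the pivot->target corpus by its pivot side. A pivot sentence may
    # occur with several translations, so all of them are kept.
    by_pivot: Dict[str, List[str]] = defaultdict(list)
    for pivot, tgt in pivot_tgt:
        by_pivot[pivot].append(tgt)

    merged: List[Pair] = []
    seen: Set[Pair] = set()
    for src, pivot in src_pivot:
        # .get() does not insert missing keys into the defaultdict.
        for tgt in by_pivot.get(pivot, []):
            if (src, tgt) not in seen:  # skip duplicate src-tgt pairs
                seen.add((src, tgt))
                merged.append((src, tgt))
    return merged


if __name__ == "__main__":
    fin_eng = [("Hyvää huomenta.", "Good morning."), ("Kiitos.", "Thank you.")]
    eng_swe = [("Good morning.", "God morgon."), ("Hello.", "Hej.")]
    print(pivot_align(fin_eng, eng_swe))
```

Here only the fin-eng and eng-swe pairs sharing the sentence "Good morning." produce a fin-swe pair. Exact matching keeps the sketch simple; a real pipeline would likely need normalization and filtering of very short or very frequent pivot sentences.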

Cleanup and documentation

clean up or remove the complicated test-set evaluation for multilingual models
clean up the recipes for Tatoeba model training
improve documentation
write tutorials

Evaluation

better linking of evaluation information for released models (see the eval subdirectory)
integrate the NMT map (move it to OPUS-MT-map)
better score tables / leaderboards for released NMT models

Knowledge distillation and compact models

systematically test model architectures
multilingual distillation
better data selection for student models
quantization and tuned int8 models (train the quantization alphas); see browsermt/students
lexical shortlists
better integration into translateLocally / OPUS-MT-app