Mirror of https://github.com/Helsinki-NLP/OPUS-MT-train.git
Models for Sami languages
Recipes for training multilingual translation models involving Sami languages.
Overview
Relevant makefiles:
Main recipes:
sami-data
: fetch and convert external data sets and prepare the train/dev/test splits
sami-train
: train a multilingual model for Sami languages
sami-eval
: evaluate the multilingual model above
sami-dist
: create a release package for the multilingual model
See also the implicit rules below for additional common recipes.
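The main recipes are meant to be run in the order listed. The Python sketch below only assembles (and, if asked, runs) the corresponding make calls. It assumes make is invoked from the top-level directory of the repository, and the helper name run_pipeline is hypothetical.

```python
import subprocess

# Main recipes, in the order listed above (data -> train -> eval -> dist).
SAMI_PIPELINE = ["sami-data", "sami-train", "sami-eval", "sami-dist"]


def run_pipeline(targets, dry_run=True):
    """Build one `make <target>` command per target (hypothetical helper).

    Assumes the current directory is the top level of the repository.
    With dry_run=True nothing is executed; the commands are only returned.
    """
    commands = [["make", target] for target in targets]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError, which stops at the first failing step
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for cmd in run_pipeline(SAMI_PIPELINE):
        print(" ".join(cmd))
```

Calling run_pipeline(SAMI_PIPELINE, dry_run=False) would run the four steps one after another; running each make target by hand is equally valid.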
Recipes for back-translation (go to the sub-directory backtranslate):
sami-corp
: download monolingual Sami corpora from Giellatekno
translate-sami-corp
: translate monolingual Sami corpora with a multilingual model (hardcoded in backtranslate/Makefile)
translate-sami-wiki
: translate Northern Sami wiki data into a range of other languages using a multilingual model (hardcoded in backtranslate/Makefile), and translate wiki data in a selected set of languages (currently: no, nn, ru, sv, en) into Sami languages using the same multilingual model
translate-sami
: do both of the above (translate-sami-corp and translate-sami-wiki)
translate-sami-xx-wiki
: translate wiki data from Northern Sami to other Sami languages, Finnish, Norwegian, and Swedish using a "sami-xx" model
translate-xx-sami-wiki
: translate wiki data from Finnish, Norwegian, and Swedish to Sami languages using an "xx-sami" model
translate-sami-xx-corp
: translate monolingual Giellatekno corpora from Sami languages to Finnish, Norwegian, and Swedish using a "sami-xx" model
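Since these targets are run in backtranslate/, they can also be started from the top-level directory with GNU make's -C option, which changes into a directory before reading its Makefile. The sketch below builds such a command and rejects names that are not in the list above; the function name backtranslate_command is hypothetical.

```python
# Back-translation targets listed above (run in the backtranslate/ sub-directory).
BT_TARGETS = frozenset({
    "sami-corp",
    "translate-sami-corp",
    "translate-sami-wiki",
    "translate-sami",
    "translate-sami-xx-wiki",
    "translate-xx-sami-wiki",
    "translate-sami-xx-corp",
})


def backtranslate_command(target, workdir="backtranslate"):
    """Return the make call for a back-translation target (hypothetical helper)."""
    if target not in BT_TARGETS:
        raise ValueError(f"unknown back-translation target: {target}")
    # `make -C DIR` changes into DIR before reading its Makefile.
    return ["make", "-C", workdir, target]


if __name__ == "__main__":
    # Presumably the corpora are downloaded before they are translated.
    for step in ("sami-corp", "translate-sami-corp"):
        print(" ".join(backtranslate_command(step)))
```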
Recipes for pivot-based translation (go to the sub-directory pivoting). Set SRC (e.g. fi), TRG (e.g. se), and PIVOT (e.g. nb):
all
: fetch the model, prepare the data sets, and translate
prepare
: prepare the data sets (pivot bitexts)
translate
: translate the pivot bitexts
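SRC, TRG, and PIVOT are ordinary make variables, so they can be passed on the command line, where they override any values set in the Makefile. The sketch below assembles such a call; pivot_command is a hypothetical helper, and the language codes are the examples given above.

```python
PIVOT_TARGETS = ("all", "prepare", "translate")


def pivot_command(src, trg, pivot, target="all", workdir="pivoting"):
    """Return the make call for a pivot-based translation run (hypothetical helper)."""
    if target not in PIVOT_TARGETS:
        raise ValueError(f"unknown pivoting target: {target}")
    # Variables given on the make command line override values set in the Makefile.
    return ["make", "-C", workdir, f"SRC={src}", f"TRG={trg}", f"PIVOT={pivot}", target]


if __name__ == "__main__":
    # fi -> se via nb, the example languages mentioned above
    print(" ".join(pivot_command("fi", "se", "nb")))
```

The printed command, make -C pivoting SRC=fi TRG=se PIVOT=nb all, is equivalent to running make SRC=fi TRG=se PIVOT=nb all inside pivoting/.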
Data-related recipes:
fetch-sami-tmx
: fetch translation memories from Giellatekno
convert-sami-tmx
: convert the TMX files from above
merge-sami-data
: merge the converted TMX files into one bitext
convert-sami-gloss
: convert bilingual glossaries from Giellatekno
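The TMX recipes form a chain (fetch, then convert, then merge). make resolves such prerequisites itself; the sketch below just computes one order consistent with them using Python's graphlib (3.9+). The dependency map is an assumption inferred from the descriptions above; the actual prerequisites in lib/models/sami.mk may differ.

```python
from graphlib import TopologicalSorter

# ASSUMED prerequisites, inferred from the recipe descriptions
# (each value lists the recipes that must run first).
DATA_DEPS = {
    "fetch-sami-tmx": set(),
    "convert-sami-tmx": {"fetch-sami-tmx"},
    "merge-sami-data": {"convert-sami-tmx"},
    "convert-sami-gloss": set(),
}


def data_recipe_order(deps=DATA_DEPS):
    """Return one execution order that respects the given prerequisites."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(data_recipe_order())
```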
Parameters / variables:
GIELLATEKNO_HOME
: URL for Giellatekno resources (default: https://victorio.uit.no/biggies/trunk)
GIELLATEKNO_TM_HOME
: directory of translation memories (default: ${GIELLATEKNO_HOME}/mt/omegat)
GIELLATEKNO_SAMI_TM
: list of translation memories to be downloaded (see lib/models/sami.mk)
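Because the default for GIELLATEKNO_TM_HOME refers to GIELLATEKNO_HOME, overriding only the base (for example make GIELLATEKNO_HOME=/data/giellatekno fetch-sami-tmx, with a hypothetical local path) should also move the translation-memory directory, assuming the default is defined as documented above. The Python sketch below imitates that expansion with string.Template, whose ${NAME} placeholders happen to match make's syntax; it is a simplified model, not how make evaluates variables.

```python
from string import Template

# Defaults as documented above; GIELLATEKNO_TM_HOME refers to GIELLATEKNO_HOME.
DEFAULTS = {
    "GIELLATEKNO_HOME": "https://victorio.uit.no/biggies/trunk",
    "GIELLATEKNO_TM_HOME": "${GIELLATEKNO_HOME}/mt/omegat",
}


def resolve(overrides=None):
    """Apply overrides, then expand ${VAR} references (simplified: one level deep)."""
    values = {**DEFAULTS, **(overrides or {})}
    return {name: Template(value).substitute(values) for name, value in values.items()}


if __name__ == "__main__":
    print(resolve()["GIELLATEKNO_TM_HOME"])
    # Hypothetical local mirror of the Giellatekno resources
    print(resolve({"GIELLATEKNO_HOME": "/data/giellatekno"})["GIELLATEKNO_TM_HOME"])
```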
Implicit rules:
%-sami
: run the recipe for a multilingual model including Sami languages (e.g. make train-sami)
%-sami-xx
: run the recipe for a model that translates Sami languages to selected other languages
%-xx-sami
: run the recipe for a model that translates selected languages to Sami languages
%-bt
: include back-translated data (can be combined with the implicit rules above, e.g. make train-bt-sami)
%-pivot
: include pivot-based translations (can be combined with the implicit rules above, e.g. make train-pivot-sami)
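Combined targets such as train-bt-sami are read from the right: the outermost suffix selects the implicit rule. The sketch below models that naming scheme, assuming (hypothetically) that each %-SUFFIX rule sets its variables and re-invokes make on the remaining stem. When several patterns match, GNU make (3.82 and later) prefers the rule with the shortest stem, so %-xx-sami wins over %-sami for train-xx-sami; trying longer suffixes first reproduces that. The function decompose is illustrative only and ignores explicit targets, which always take precedence over pattern rules.

```python
# Suffixes handled by the implicit rules above. Longer suffixes are tried first,
# mirroring GNU make's preference for the pattern rule with the shortest stem.
SUFFIXES = sorted(["-sami", "-sami-xx", "-xx-sami", "-bt", "-pivot"], key=len, reverse=True)


def decompose(target):
    """Split a target into its base recipe and its implicit-rule suffixes.

    HYPOTHETICAL model: assumes each %-SUFFIX rule re-invokes make on the
    remaining stem, so suffixes are peeled off one at a time from the right.
    """
    modifiers = []
    while True:
        for suffix in SUFFIXES:
            # Require a non-empty stem, as make's % must match at least one character
            if target.endswith(suffix) and len(target) > len(suffix):
                modifiers.append(suffix[1:])
                target = target[: -len(suffix)]
                break
        else:
            return target, modifiers


if __name__ == "__main__":
    for name in ("train-sami", "train-bt-sami", "train-pivot-sami", "train-xx-sami"):
        print(name, decompose(name))
```

The modifier list is ordered outermost first: for train-bt-sami the %-sami rule would apply first and %-bt second, provided the re-invocation assumption holds.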