Mirror of https://github.com/Helsinki-NLP/OPUS-MT-train.git (synced 2024-11-28 06:09:35 +03:00)
Translate data as synthetic training data
Use Wiki data as the source text:
- JSON processor (jq): https://stedolan.github.io/jq/
- Wiki JSON dumps (CirrusSearch): https://dumps.wikimedia.org/other/cirrussearch/current/
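
As a rough illustration of what the JSON processing step has to do, the following Python sketch pulls the plain text out of a dump. It is not the repository's actual pipeline (which relies on jq). It assumes that the dumps are gzip-compressed files with one JSON object per line, that metadata lines alternate with page documents, and that a page's content sits in a `text` field. The function names `extract_text` and `extract_text_from_dump` are hypothetical.

```python
import gzip
import json
from typing import Iterable, Iterator


def extract_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield the 'text' field of every JSON line that has one.

    Assumption: each line is a standalone JSON object, and only page
    documents (not the interleaved metadata lines) carry a 'text' field.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # Skip lines that are not valid JSON instead of aborting.
            continue
        if not isinstance(record, dict):
            continue
        text = record.get("text")
        if isinstance(text, str) and text.strip():
            yield text.strip()


def extract_text_from_dump(path: str) -> Iterator[str]:
    """Stream page texts from a gzip-compressed dump file (hypothetical path)."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        yield from extract_text(handle)
```

The extracted text would still need sentence splitting and the model's usual preprocessing before it can be translated.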
NOTE: This only works with SentencePiece models.
TODO
- download base models from ObjectStorage
- make it work with multilingual models (requires adjusting the preprocess-scripts for those models)