mirror of
https://github.com/Helsinki-NLP/OPUS-MT-train.git
synced 2024-09-11 20:27:19 +03:00
Things to do
- add backtranslations to the training data
  - monolingual data can come from the tokenized Wikipedia dumps of the Polyglot project: https://sites.google.com/site/rmyeid/projects/polyglot
  - raw Wikimedia dumps: https://dumps.wikimedia.org/backup-index.html
  - the CirrusSearch dumps provide the same content as JSON, which is probably easier to process: https://dumps.wikimedia.org/other/cirrussearch/current/
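
A rough sketch of how the JSON dumps could feed backtranslation. This is not code from this repository. It assumes the CirrusSearch dump is gzip-compressed, has one JSON object per line, and alternates between bulk-index action lines and document lines that carry the article body in a "text" field. The function names and the translate_to_source callable are hypothetical placeholders for whatever reverse model is used.

```python
import gzip
import json
from typing import Callable, Iterable, Iterator, Tuple


def iter_article_texts(lines: Iterable[str]) -> Iterator[str]:
    """Yield the article text from CirrusSearch-style JSON lines.

    Assumption: only document lines have a "text" field; the
    bulk-index action lines in between do not and are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        text = record.get("text") if isinstance(record, dict) else None
        if isinstance(text, str) and text:
            yield text


def read_dump(path: str) -> Iterator[str]:
    """Stream article texts from a gzip-compressed dump file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        yield from iter_article_texts(handle)


def backtranslate(
    target_sentences: Iterable[str],
    translate_to_source: Callable[[str], str],
) -> Iterator[Tuple[str, str]]:
    """Pair each monolingual target sentence with a synthetic source.

    translate_to_source is a hypothetical stand-in for a
    target-to-source translation model.
    """
    for target in target_sentences:
        yield translate_to_source(target), target
```

Each yielded text is a whole article, so it would still need sentence splitting and the usual preprocessing before translation. The resulting (synthetic source, original target) pairs could then be added to the parallel training data.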