97 Commits

Author | SHA1 | Message | Date
Joerg Tiedemann | e2bc2acb3b | re-organised targets for multilingual models of language groups | 2020-06-27 12:29:50 +03:00
Joerg Tiedemann | 9e186d82d6 | bugfix in tatoeba data extraction for multilingual data files (language code clash) | 2020-06-25 00:45:25 +03:00
Joerg Tiedemann | 844f8bf72a | removed unnecessary pre-processing for chinese | 2020-06-19 16:12:06 +03:00
Joerg Tiedemann | b7f45e2a74 | more details in model config | 2020-06-18 20:50:22 +03:00
Joerg Tiedemann | 4e18da6e4c | fix chinese/korean/japanese language codes | 2020-06-17 22:02:39 +03:00
Joerg Tiedemann | e141772b34 | fixed multilingual tatoeba evaluation | 2020-06-11 00:54:40 +03:00
Joerg Tiedemann | cc16be10d4 | final fixes to multilingual tatoeba model scripts | 2020-06-09 11:19:58 +03:00
Joerg Tiedemann | b7691875c2 | tatoeba models now operational | 2020-06-09 00:12:16 +03:00
Joerg Tiedemann | 035cca7c1a | fixed tatoeba model scripts | 2020-06-08 17:24:39 +03:00
Joerg Tiedemann | e07eb14984 | fit-data-size fixed | 2020-06-08 14:14:55 +03:00
Joerg Tiedemann | 6cb9959e82 | tatoeba challenge model scripts updated | 2020-06-06 20:49:54 +03:00
Joerg Tiedemann | edaf361803 | multilingual tatoeba models and some documentation added | 2020-06-03 15:39:18 +03:00
Joerg Tiedemann | c44e92d52a | fixed bug in tatoeba model call | 2020-06-03 01:09:28 +03:00
Joerg Tiedemann | eeaef7768c | tatoeba models added | 2020-06-03 00:16:21 +03:00
Joerg Tiedemann | ec43fcd30a | fixed a bug in eval-testsets | 2020-05-29 14:43:36 +03:00
Joerg Tiedemann | d0a217cf40 | wikimatrix models added | 2020-05-21 20:51:38 +03:00
Joerg Tiedemann | 716d7b52c1 | fixed testset names and backtranslation sentence splitting | 2020-05-20 23:19:48 +03:00
Joerg Tiedemann | 04d72ff8ed | fixes with pivoting | 2020-05-18 21:36:53 +03:00
Joerg Tiedemann | b01b4f22c3 | pivot-based translations added | 2020-05-17 22:43:05 +03:00
Joerg Tiedemann | 1246bcd271 | added some size info to train data README | 2020-05-17 01:21:57 +03:00
Joerg Tiedemann | 198c779e91 | make cascade job for train + backtranslate + retrain | 2020-05-15 20:59:05 +03:00
Joerg Tiedemann | 37a83a9eba | information about license for pre-trained models added | 2020-05-15 20:01:07 +03:00
Joerg Tiedemann | cb3b77573e | make it possible to exclude certain data sets | 2020-05-14 10:36:46 +03:00
Joerg Tiedemann | 7ef908dcd7 | translate with backtranslations | 2020-05-13 00:41:07 +03:00
Joerg Tiedemann | e4455e510a | a bit more info added for data sets | 2020-05-09 22:33:33 +03:00
Joerg Tiedemann | c98cc9bf26 | celtic - english | 2020-05-08 20:05:23 +03:00
Joerg Tiedemann | d4b71e0261 | fixed includes in backtranslate/evaluate/finetune makefiles | 2020-05-07 22:51:31 +03:00
Joerg Tiedemann | c703bb4c2b | fixed file name for wikimedia.mk and added memad-multi model | 2020-05-07 19:55:28 +03:00
Joerg Tiedemann | 79cc3d66f0 | removed old makefiles | 2020-05-03 21:56:08 +03:00
Joerg Tiedemann | 5404f515aa | new makefile structure | 2020-05-03 21:46:30 +03:00
Joerg Tiedemann | 6b8e69269a | better division of the massive tasks makefile | 2020-05-03 20:27:55 +03:00
Joerg Tiedemann | 49d8c77444 | new model for en-ml | 2020-04-28 23:43:36 +03:00
Joerg Tiedemann | 3f292fd7b8 | all models | 2020-04-27 13:56:40 +03:00
Joerg Tiedemann | 9ba784419e | updates celtic model | 2020-04-24 13:30:16 +03:00
Joerg Tiedemann | ea2b283ad4 | new sami model | 2020-04-19 19:48:01 +03:00
Joerg Tiedemann | e5e58d1a37 | fixed problem with missing link in reverse-data | 2020-04-19 01:08:26 +03:00
Joerg Tiedemann | 58f042d127 | add local config parameters | 2020-04-18 21:40:52 +03:00
Joerg Tiedemann | 294175f0fe | fixed sami models | 2020-04-18 01:05:02 +03:00
Joerg Tiedemann | 86e1b06da4 | remove wrong celtic model | 2020-04-11 15:12:10 +03:00
Joerg Tiedemann | afa2194c27 | new celtic models | 2020-04-11 15:05:07 +03:00
Joerg Tiedemann | d49a791cc7 | some new models | 2020-04-11 14:50:39 +03:00
Joerg Tiedemann | f508bb4df6 | use only latest backtranslation | 2020-04-01 20:18:06 +03:00
Joerg Tiedemann | 24fd67cc99 | sami model update | 2020-03-29 11:21:39 +03:00
Joerg Tiedemann | 08c17af2ee | sami | 2020-03-27 22:30:51 +02:00
Joerg Tiedemann | f4fdb304a5 | sami language task added | 2020-03-26 22:50:21 +02:00
Joerg Tiedemann | 14f6ef808a | more data for cy-en | 2020-03-25 20:40:29 +02:00
Joerg Tiedemann | 93f03a1fe7 | backtranslation data for multilingual models | 2020-03-24 23:47:57 +02:00
Joerg Tiedemann | 3bc480db1b | celtic and romance language tasks added | 2020-03-22 21:18:29 +02:00
Joerg Tiedemann | c94abcbb3f | finetuned packages | 2020-03-21 21:36:29 +02:00
Joerg Tiedemann | 87551ac387 | target for extracting text from all wikis | 2020-03-20 15:32:29 +02:00