Jörg Tiedemann
|
a61cf48443
|
add option to skip sentence piece vocabs but use marian_vocab instead
|
2020-09-16 19:33:19 +03:00 |
|
Jörg Tiedemann
|
58fbf0bdd8
|
back to old subword model names
|
2020-09-14 08:53:57 +03:00 |
|
Jörg Tiedemann
|
c2798e9758
|
plain text vocab files from spm models
|
2020-09-13 22:17:21 +03:00 |
|
Jörg Tiedemann
|
24e92de56a
|
proper release packages for models with internal sentence piece vocabs
|
2020-09-13 00:00:15 +03:00 |
|
Jörg Tiedemann
|
666b2b8462
|
internal sentence piece models in transformers
|
2020-09-12 16:16:01 +03:00 |
|
Jörg Tiedemann
|
ddafb43d66
|
removed dependence on moses tools in preprocessing script for released spm packages
|
2020-09-12 14:42:10 +03:00 |
|
Jörg Tiedemann
|
1a6e29275d
|
dev data is now uniq to avoid overlaps with test data
|
2020-09-09 23:21:07 +03:00 |
|
Jörg Tiedemann
|
ad828c3124
|
started tutorial and fixes to backtranslate makefile
|
2020-09-05 00:16:22 +03:00 |
|
Tiedemann Jörg
|
d11f74ce41
|
added bpe submodule
|
2020-09-04 15:34:20 +03:00 |
|
Tiedemann
|
96eaad2d05
|
added possibility to fetch moses file from ObjectStore (instead of reading with opus_read)
|
2020-09-03 22:04:44 +03:00 |
|
Tiedemann
|
2332732577
|
make compatible with mac osx and include submodules for required tools
|
2020-09-02 15:52:34 +03:00 |
|
Joerg Tiedemann
|
639bd2adda
|
started documentation of project specific models
|
2020-08-28 15:51:37 +03:00 |
|
Joerg Tiedemann
|
94eeec13eb
|
take away dependence on local OPUS files for finding data
|
2020-08-27 22:36:50 +03:00 |
|
Joerg Tiedemann
|
dac6070069
|
started some more documentation
|
2020-08-26 09:59:24 +03:00 |
|
Joerg Tiedemann
|
1b913277b3
|
tatoeba language group models with various sample sizes
|
2020-07-25 22:52:33 +03:00 |
|
Joerg Tiedemann
|
ec6d7c7142
|
tatoeba langgroups
|
2020-07-04 23:37:39 +03:00 |
|
Joerg Tiedemann
|
7df91a9eaa
|
language group jobs with some more documentation
|
2020-06-29 12:26:45 +03:00 |
|
Joerg Tiedemann
|
e2bc2acb3b
|
re-organised targets for multilingual models of language groups
|
2020-06-27 12:29:50 +03:00 |
|
Joerg Tiedemann
|
9e186d82d6
|
bugfix in tatoeba data extraction for multilingual data files (language code clash)
|
2020-06-25 00:45:25 +03:00 |
|
Joerg Tiedemann
|
844f8bf72a
|
removed unnecessary pre-processing for chinese
|
2020-06-19 16:12:06 +03:00 |
|
Joerg Tiedemann
|
b7f45e2a74
|
more details in model config
|
2020-06-18 20:50:22 +03:00 |
|
Joerg Tiedemann
|
4e18da6e4c
|
fix chinese/korean/japanese language codes
|
2020-06-17 22:02:39 +03:00 |
|
Joerg Tiedemann
|
e141772b34
|
fixed multilingual tatoeba evaluation
|
2020-06-11 00:54:40 +03:00 |
|
Joerg Tiedemann
|
e07eb14984
|
fit-data-size fixed
|
2020-06-08 14:14:55 +03:00 |
|
Joerg Tiedemann
|
6cb9959e82
|
tatoeba challenge model scripts updated
|
2020-06-06 20:49:54 +03:00 |
|
Joerg Tiedemann
|
edaf361803
|
multilingual tatoeba models and some documentation added
|
2020-06-03 15:39:18 +03:00 |
|
Joerg Tiedemann
|
eeaef7768c
|
tatoeba models added
|
2020-06-03 00:16:21 +03:00 |
|
Joerg Tiedemann
|
ec43fcd30a
|
fixed a bug in eval-testsets
|
2020-05-29 14:43:36 +03:00 |
|
Joerg Tiedemann
|
716d7b52c1
|
fixed testset names and backtranslation sentence splitting
|
2020-05-20 23:19:48 +03:00 |
|
Joerg Tiedemann
|
b01b4f22c3
|
pivot-based translations added
|
2020-05-17 22:43:05 +03:00 |
|
Joerg Tiedemann
|
1246bcd271
|
added some size info to train data README
|
2020-05-17 01:21:57 +03:00 |
|
Joerg Tiedemann
|
cb3b77573e
|
make it possible to exclude certain data sets
|
2020-05-14 10:36:46 +03:00 |
|
Joerg Tiedemann
|
7ef908dcd7
|
translate with backtranslations
|
2020-05-13 00:41:07 +03:00 |
|
Joerg Tiedemann
|
e4455e510a
|
a bit more info added for data sets
|
2020-05-09 22:33:33 +03:00 |
|
Joerg Tiedemann
|
5404f515aa
|
new makefile structure
|
2020-05-03 21:46:30 +03:00 |
|
Joerg Tiedemann
|
6b8e69269a
|
better division of the massive tasks makefile
|
2020-05-03 20:27:55 +03:00 |
|