13 Commits

Author     SHA1        Message                                                          Date
Taku Kudo  9ca65fa9b6  add optional NFKD support.                                       2022-05-29 11:43:42 +09:00
Taku Kudo  cbfc6b3c2c  updated *.tsv file.                                              2021-06-18 01:06:29 +09:00
Sarubi     c970dedd8f  Removed codes where Zero Width Joiner replaced with whitespace.  2021-02-23 20:47:25 +05:30
Taku Kudo  329383b455  Initial release of 0.19. Merged internal sentencepiece.          2020-05-08 01:06:50 +09:00
Taku Kudo  18c337f32d  remove control characters in the default nmt_* normalizers       2019-01-10 17:16:53 +09:00
Taku Kudo  e4a93f88c1  Do not parse deprecated proto fileds                             2019-01-08 19:25:11 +09:00
Taku Kudo  7b19d68be0  use builtin protobuf-lite package in third_party                 2019-01-08 11:40:08 +09:00
Taku Kudo  465a419250  pushed new nfkc_cf.tsv                                           2018-10-29 01:11:45 +09:00
Taku Kudo  573586854e  Added normalization with Unicode case folding                    2018-06-29 15:17:18 +09:00
Taku Kudo  cff66162c3  Remove empty lines from example data                             2018-06-20 17:23:18 +09:00
Taku Kudo  a65ca0d829  Uses NMT_NFKC rule by default.                                   2018-06-10 01:15:34 +09:00
Taku Kudo  4e3bcf1373  Updated normalizer                                               2018-06-04 20:32:37 +09:00
Taku Kudo  2928ce5307  Initialize repository                                            2017-03-07 19:43:50 +09:00