162 Commits

Author SHA1 Message Date
Taku Kudo
41c4b7f080 returns unicode character offsets in normalize method 2024-01-22 07:19:04 +00:00
Taku Kudo
6b468a0e01 support bytes output in decode method 2024-01-20 08:16:17 +00:00
Taku Kudo
de1747bbd4 added functionality to override normalizer spec 2024-01-16 04:06:05 +00:00
Taku Kudo
ed76ecc478 add more advanced SentencePieceNormalizer class 2024-01-13 17:19:50 +00:00
Taku Kudo
06eee09847 Added Normalization API 2024-01-04 09:04:20 +00:00
Taku Kudo
96aabaef96 add set_min_log_level function to python to change the loglevel from python wrapper. 2023-12-23 09:28:40 +00:00
Taku Kudo
8cbdf13794 Improves the thread utilization in batch encoding/decoding 2023-08-05 09:01:02 +00:00
Taku Kudo
635fe8423a Upgrade the sentencepiece_model_pb2.py and sentencepiece.py 2023-07-01 02:38:33 +00:00
Taku Kudo
f2219b53e2 prepare for 0.2.00 2023-05-14 14:35:14 +00:00
Taku Kudo
69d34c7171 prepare for v0.1.99 2023-04-15 06:33:01 +00:00
Taku Kudo
e58bb684d0 add pretokenization_delimiter options. Initialize seed pieces more accurately. 2023-04-10 02:11:37 +00:00
Chris Ha
9b53e211e8 Update sentencepiece_python_module_example.ipynb (fix typo) 2023-04-08 23:26:13 +09:00
Taku Kudo
f54d8ba070 includes the sentencepiece source files in python source package 2023-04-04 03:15:11 +00:00
Taku Kudo
ba466a6bae prepare for 0.1.98 2023-04-02 18:06:40 +00:00
Taku Kudo
c0766c9870 added option to /MT flag 2023-04-02 16:56:20 +00:00
kyoto7250
2ba0a5aae3 fix the path in add_new_vocab.ipynb, because the location of the path is different from when it was committed 2022-12-12 15:39:18 +09:00
Aleksey Morozov
df5f7fdfc6 Fixed errors in example notebook 2022-08-09 15:15:30 +03:00
Taku Kudo
58f256cf6f Updated the document 2022-08-06 20:41:00 +09:00
Taku Kudo
655b9447db Updated the document. 2022-08-06 19:24:41 +09:00
Taku Kudo
881229aeea Updated the document 2022-08-05 19:05:52 +09:00
Taku Kudo
5a53be25ba support slice in pieces/nbests objects 2022-08-05 16:34:44 +09:00
Taku Kudo
c14eb2eae2 automatically detect the number of CPUs in batch processing. 2022-08-05 14:47:02 +09:00
Taku Kudo
b738153dd7 Uses property in immutable proto 2022-08-04 16:03:31 +09:00
Taku Kudo
497ee76bd9 Fixed test failure. 2022-08-03 17:20:01 +09:00
Taku Kudo
005ad28c4d remove unused ifdef SWIG macro 2022-08-03 15:45:09 +09:00
Taku Kudo
1f21d38ced Adds SWIGPYTHON flag 2022-08-03 12:45:31 +09:00
Taku Kudo
6e6add560c Adds more unittests 2022-08-03 02:24:53 +09:00
Taku Kudo
13a877150e Supports ImmutableSentencePieceText from python module 2022-08-01 17:19:09 +09:00
Taku Kudo
631420b84b Uses absl::string_view as much as possible 2022-06-15 01:29:55 +09:00
Taku Kudo
5b21ad7804 Uses C++17 by default 2022-06-14 01:18:09 +09:00
Taku Kudo
1abd83621b add test to use tab as user defined symbols. 2022-06-13 16:46:18 +09:00
Taku Kudo
91809e5c70 remove debug symbols from wheel package 2022-06-08 17:00:48 +09:00
Taku Kudo
c6aca036fc remove debug symbols from wheel package 2022-06-08 16:38:21 +09:00
Taku Kudo
39b902a34f update python wrapper. 2022-06-08 15:22:20 +09:00
Taku Kudo
b2fd284592 update python wrapper. 2022-06-08 02:22:21 +09:00
Taku Kudo
2f44ee41e3 Uses build/root dir to make python wrapper 2022-06-04 11:55:44 +09:00
Taku Kudo
4f55d8f3f4 update setup.py 2022-06-04 00:46:21 +09:00
Taku Kudo
a57b326d89 update setup.py 2022-06-03 13:41:40 +09:00
Taku Kudo
4b3d6bfa9d update setup.py 2022-06-03 01:02:18 +09:00
Taku Kudo
7a5d14cfdf update setup.py 2022-06-03 00:55:38 +09:00
Taku Kudo
3028663ac1 update setup.py 2022-06-03 00:19:14 +09:00
Taku Kudo
188f8ce9a6 update setup.py 2022-06-02 00:53:49 +09:00
Taku Kudo
bc28729d7b update setup.py 2022-06-02 00:41:53 +09:00
Taku Kudo
c1e40b7278 update setup.py 2022-06-02 00:33:19 +09:00
Taku Kudo
a61584b770 update setup.py 2022-06-02 00:04:45 +09:00
Taku Kudo
7b326ebd88 updated the python setup script for github actions 2022-06-01 19:50:11 +09:00
Taku Kudo
b108472a70 fixed CI errors 2022-05-31 02:50:10 +09:00
Taku Kudo
60bb2062d3 updated test case 2022-05-31 01:17:18 +09:00
Taku Kudo
c86a8a62de add nbest|sample encoding method to python wrapper 2022-05-30 19:39:57 +09:00
Taku Kudo
7d8fabefcb 1) override logging stream in training, 2) Makes 1-best and viterbi decoding identical 2022-05-30 01:50:59 +09:00