Taku Kudo
|
41c4b7f080
|
returns unicode character offsets in normalize method
|
2024-01-22 07:19:04 +00:00 |
|
Taku Kudo
|
6b468a0e01
|
support bytes output in decode method
|
2024-01-20 08:16:17 +00:00 |
|
Taku Kudo
|
de1747bbd4
|
added functionality to override normalizer spec
|
2024-01-16 04:06:05 +00:00 |
|
Taku Kudo
|
ed76ecc478
|
add more advanced SentencePieceNormalizer class
|
2024-01-13 17:19:50 +00:00 |
|
Taku Kudo
|
06eee09847
|
Added Normalization API
|
2024-01-04 09:04:20 +00:00 |
|
Taku Kudo
|
96aabaef96
|
add set_min_log_level function to python to change the loglevel from python wrapper.
|
2023-12-23 09:28:40 +00:00 |
|
Taku Kudo
|
8cbdf13794
|
Improves the thread utilization in batch encoding/decoding
|
2023-08-05 09:01:02 +00:00 |
|
Taku Kudo
|
635fe8423a
|
Upgrade the sentencepiece_model_pb2.py and sentencepiece.py
|
2023-07-01 02:38:33 +00:00 |
|
Taku Kudo
|
f2219b53e2
|
prepare for 0.2.00
|
2023-05-14 14:35:14 +00:00 |
|
Taku Kudo
|
69d34c7171
|
prepare for v0.1.99
|
2023-04-15 06:33:01 +00:00 |
|
Taku Kudo
|
e58bb684d0
|
add pretokenization_delimiter options. Initialize seed pieces more accurately.
|
2023-04-10 02:11:37 +00:00 |
|
Chris Ha
|
9b53e211e8
|
Update sentencepiece_python_module_example.ipynb
fix typo
|
2023-04-08 23:26:13 +09:00 |
|
Taku Kudo
|
f54d8ba070
|
includes the sentencepiece source files in python source package
|
2023-04-04 03:15:11 +00:00 |
|
Taku Kudo
|
ba466a6bae
|
prepare for 0.1.98
|
2023-04-02 18:06:40 +00:00 |
|
Taku Kudo
|
c0766c9870
|
added option to /MT flag
|
2023-04-02 16:56:20 +00:00 |
|
kyoto7250
|
2ba0a5aae3
|
fix the path in add_new_vocab.ipynb
Because the location of the path is different from when it was committed
|
2022-12-12 15:39:18 +09:00 |
|
Aleksey Morozov
|
df5f7fdfc6
|
Fixed errors in example notebook
|
2022-08-09 15:15:30 +03:00 |
|
Taku Kudo
|
58f256cf6f
|
Updated the document
|
2022-08-06 20:41:00 +09:00 |
|
Taku Kudo
|
655b9447db
|
Updated the document.
|
2022-08-06 19:24:41 +09:00 |
|
Taku Kudo
|
881229aeea
|
Updated the document
|
2022-08-05 19:05:52 +09:00 |
|
Taku Kudo
|
5a53be25ba
|
support slice in pieces/nbests objects
|
2022-08-05 16:34:44 +09:00 |
|
Taku Kudo
|
c14eb2eae2
|
automatically detect the number of CPUs in batch processing.
|
2022-08-05 14:47:02 +09:00 |
|
Taku Kudo
|
b738153dd7
|
Uses property in immutable proto
|
2022-08-04 16:03:31 +09:00 |
|
Taku Kudo
|
497ee76bd9
|
Fixed test failure.
|
2022-08-03 17:20:01 +09:00 |
|
Taku Kudo
|
005ad28c4d
|
remove unused ifdef SWIG macro
|
2022-08-03 15:45:09 +09:00 |
|
Taku Kudo
|
1f21d38ced
|
Adds SWIGPYTHON flag
|
2022-08-03 12:45:31 +09:00 |
|
Taku Kudo
|
6e6add560c
|
Adds more unittests
|
2022-08-03 02:24:53 +09:00 |
|
Taku Kudo
|
13a877150e
|
Supports ImmutableSentencePieceText from python module
|
2022-08-01 17:19:09 +09:00 |
|
Taku Kudo
|
631420b84b
|
Uses absl::string_view as much as possible
|
2022-06-15 01:29:55 +09:00 |
|
Taku Kudo
|
5b21ad7804
|
Uses C++17 by default
|
2022-06-14 01:18:09 +09:00 |
|
Taku Kudo
|
1abd83621b
|
add test to use tab as user defined symbols.
|
2022-06-13 16:46:18 +09:00 |
|
Taku Kudo
|
91809e5c70
|
remove debug symbols from wheel package
|
2022-06-08 17:00:48 +09:00 |
|
Taku Kudo
|
c6aca036fc
|
remove debug symbols from wheel package
|
2022-06-08 16:38:21 +09:00 |
|
Taku Kudo
|
39b902a34f
|
update python wrapper.
|
2022-06-08 15:22:20 +09:00 |
|
Taku Kudo
|
b2fd284592
|
update python wrapper.
|
2022-06-08 02:22:21 +09:00 |
|
Taku Kudo
|
2f44ee41e3
|
Uses build/root dir to make python wrapper
|
2022-06-04 11:55:44 +09:00 |
|
Taku Kudo
|
4f55d8f3f4
|
update setup.py
|
2022-06-04 00:46:21 +09:00 |
|
Taku Kudo
|
a57b326d89
|
update setup.py
|
2022-06-03 13:41:40 +09:00 |
|
Taku Kudo
|
4b3d6bfa9d
|
update setup.py
|
2022-06-03 01:02:18 +09:00 |
|
Taku Kudo
|
7a5d14cfdf
|
update setup.py
|
2022-06-03 00:55:38 +09:00 |
|
Taku Kudo
|
3028663ac1
|
update setup.py
|
2022-06-03 00:19:14 +09:00 |
|
Taku Kudo
|
188f8ce9a6
|
update setup.py
|
2022-06-02 00:53:49 +09:00 |
|
Taku Kudo
|
bc28729d7b
|
update setup.py
|
2022-06-02 00:41:53 +09:00 |
|
Taku Kudo
|
c1e40b7278
|
update setup.py
|
2022-06-02 00:33:19 +09:00 |
|
Taku Kudo
|
a61584b770
|
update setup.py
|
2022-06-02 00:04:45 +09:00 |
|
Taku Kudo
|
7b326ebd88
|
updated the python setup script for github actions
|
2022-06-01 19:50:11 +09:00 |
|
Taku Kudo
|
b108472a70
|
fixed CI errors
|
2022-05-31 02:50:10 +09:00 |
|
Taku Kudo
|
60bb2062d3
|
updated test case
|
2022-05-31 01:17:18 +09:00 |
|
Taku Kudo
|
c86a8a62de
|
add nbest|sample encoding method to python wrapper
|
2022-05-30 19:39:57 +09:00 |
|
Taku Kudo
|
7d8fabefcb
|
1) override logging stream in training, 2) Makes 1-best and viterbi decoding identical
|
2022-05-30 01:50:59 +09:00 |
|