sentencepiece/src
2023-05-14 09:08:39 +00:00
builtin_pb add pretokenization_delimiter options. Initialize seed pieces more accurately. 2023-04-10 02:11:37 +00:00
bpe_model_test.cc Fixed windows build failure 2020-05-10 02:01:28 +09:00
bpe_model_trainer_test.cc Use absl::flags 2020-06-01 00:53:07 +09:00
bpe_model_trainer.cc Fix bugs in the handling of duplicated bigrams 2023-04-24 07:25:10 +00:00
bpe_model_trainer.h Sync internal to github. DP related features are added. 2022-05-25 14:03:45 +09:00
bpe_model.cc Added ImmutableSentencePiece class 2022-06-20 00:55:46 +09:00
bpe_model.h clear description for alpha of BPE-dropout 2020-09-04 15:58:39 +02:00
builder_test.cc Use absl::flags 2020-06-01 00:53:07 +09:00
builder.cc Uses absl::string_view as much as possible 2022-06-15 01:29:55 +09:00
builder.h Uses absl::string_view as much as possible 2022-06-15 01:29:55 +09:00
char_model_test.cc Initial release of 0.19. Merged internal sentencepiece. 2020-05-08 01:06:50 +09:00
char_model_trainer_test.cc Use absl::flags 2020-06-01 00:53:07 +09:00
char_model_trainer.cc merges internal changes to github 2020-10-13 13:02:56 +09:00
char_model_trainer.h Port absl::flat_hash_map 2020-06-02 01:56:48 +09:00
char_model.cc stop normalization for user_defined_symbols 2018-11-08 17:26:14 +09:00
char_model.h Port absl::flat_hash_map 2020-06-02 01:56:48 +09:00
CMakeLists.txt Fixes build test errors in big-endian machines 2023-05-14 09:08:39 +00:00
common.h Fixes build test errors in big-endian machines 2023-05-14 09:08:39 +00:00
compile_charsmap_main.cc added ShutdownLibrary function to uninitialize global variables 2022-08-20 23:34:37 +09:00
error.cc added ShutdownLibrary function to uninitialize global variables 2022-08-20 23:34:37 +09:00
filesystem_test.cc Use absl::flags 2020-06-01 00:53:07 +09:00
filesystem.cc Initial release of 0.19. Merged internal sentencepiece. 2020-05-08 01:06:50 +09:00
filesystem.h Initial release of 0.19. Merged internal sentencepiece. 2020-05-08 01:06:50 +09:00
freelist_test.cc Sync internal to github. DP related features are added. 2022-05-25 14:03:45 +09:00
freelist.h Sync internal to github. DP related features are added. 2022-05-25 14:03:45 +09:00
init_test.cc change the type of input_sentence_size from int32 to uint64 2021-01-08 16:20:57 +09:00
init.h Fixes include path when using external protobuf 2023-04-10 10:15:46 +00:00
model_factory_test.cc Initialize repository 2017-03-07 19:43:50 +09:00
model_factory.cc Initial release of 0.19. Merged internal sentencepiece. 2020-05-08 01:06:50 +09:00
model_factory.h Port absl::flat_hash_map 2020-06-02 01:56:48 +09:00
model_interface_test.cc Added ImmutableSentencePiece class 2022-06-20 00:55:46 +09:00
model_interface.cc sync from internal 2021-06-16 19:04:14 +09:00
model_interface.h Added ImmutableSentencePiece class 2022-06-20 00:55:46 +09:00
normalization_rule.h sync from internal 2021-06-16 19:04:14 +09:00
normalizer_test.cc sync from internal 2021-06-16 19:04:14 +09:00
normalizer.cc Fixes build test errors in big-endian machines 2023-05-14 09:08:39 +00:00
normalizer.h Sync internal to github. DP related features are added. 2022-05-25 14:03:45 +09:00
pretokenizer_for_training_test.cc add pretokenization_delimiter options. Initialize seed pieces more accurately. 2023-04-10 02:11:37 +00:00
pretokenizer_for_training.cc add pretokenization_delimiter options. Initialize seed pieces more accurately. 2023-04-10 02:11:37 +00:00
pretokenizer_for_training.h add pretokenization_delimiter options. Initialize seed pieces more accurately. 2023-04-10 02:11:37 +00:00
sentencepiece_model.proto add pretokenization_delimiter options. Initialize seed pieces more accurately. 2023-04-10 02:11:37 +00:00
sentencepiece_processor_test.cc Adds more unittests 2022-08-03 02:49:38 +09:00
sentencepiece_processor.cc Fixed test failure. 2022-08-03 17:20:01 +09:00
sentencepiece_processor.h Fixed test failure. 2022-08-03 17:20:01 +09:00
sentencepiece_trainer_test.cc Port absl::flat_hash_map 2020-06-02 01:56:48 +09:00
sentencepiece_trainer.cc fixed link error 2021-06-17 01:56:17 +09:00
sentencepiece_trainer.h Uses absl::string_view as much as possible 2022-06-15 01:29:55 +09:00
sentencepiece.proto Initial release of 0.19. Merged internal sentencepiece. 2020-05-08 01:06:50 +09:00
spec_parser.h add pretokenization_delimiter options. Initialize seed pieces more accurately. 2023-04-10 02:11:37 +00:00
spm_decode_main.cc added ShutdownLibrary function to uninitialize global variables 2022-08-20 23:34:37 +09:00
spm_encode_main.cc added ShutdownLibrary function to uninitialize global variables 2022-08-20 23:34:37 +09:00
spm_export_vocab_main.cc added ShutdownLibrary function to uninitialize global variables 2022-08-20 23:34:37 +09:00
spm_normalize_main.cc added ShutdownLibrary function to uninitialize global variables 2022-08-20 23:34:37 +09:00
spm_train_main.cc add pretokenization_delimiter options. Initialize seed pieces more accurately. 2023-04-10 02:11:37 +00:00
test_main.cc added ShutdownLibrary function to uninitialize global variables 2022-08-20 23:34:37 +09:00
testharness.cc support to build spm with external absl 2021-01-08 11:33:31 +09:00
testharness.h support to build spm with external absl 2021-01-08 11:33:31 +09:00
trainer_factory_test.cc Initial release of 0.19. Merged internal sentencepiece. 2020-05-08 01:06:50 +09:00
trainer_factory.cc Initial release of 0.19. Merged internal sentencepiece. 2020-05-08 01:06:50 +09:00
trainer_factory.h Port absl::flat_hash_map 2020-06-02 01:56:48 +09:00
trainer_interface_test.cc support pretokenization in BPE mode. 2023-04-11 06:48:08 +00:00
trainer_interface.cc increases the max number of threads 2023-04-30 17:37:15 +00:00
trainer_interface.h Sync internal to github. DP related features are added. 2022-05-25 14:03:45 +09:00
unicode_script_map.h Port absl::flat_hash_map 2020-06-02 01:56:48 +09:00
unicode_script_test.cc Initial release of 0.19. Merged internal sentencepiece. 2020-05-08 01:06:50 +09:00
unicode_script.cc Port absl::flat_hash_map 2020-06-02 01:56:48 +09:00
unicode_script.h fix address sanitizers on clang problem 2021-12-21 19:52:47 +08:00
unigram_model_test.cc Fix the test error on windows 2023-04-28 06:20:50 +00:00
unigram_model_trainer_test.cc Fixes build test errors in big-endian machines 2023-05-14 09:08:39 +00:00
unigram_model_trainer.cc Fix the ULM training bugs 2023-04-27 17:32:57 +00:00
unigram_model_trainer.h Fix the ULM training bugs 2023-04-27 17:32:57 +00:00
unigram_model.cc Fix the ULM training bugs 2023-04-27 17:32:57 +00:00
unigram_model.h Added ImmutableSentencePiece class 2022-06-20 00:55:46 +09:00
util_test.cc Use absl::flags 2020-06-01 00:53:07 +09:00
util.cc make the error message more descriptive. null terminate string in Utf8ToWide 2023-04-03 02:24:52 +00:00
util.h fixes IS_BIGENDIAN macro places 2023-04-10 02:28:20 +00:00
word_model_test.cc Port absl::flat_hash_map 2020-06-02 01:56:48 +09:00
word_model_trainer_test.cc Use absl::flags 2020-06-01 00:53:07 +09:00
word_model_trainer.cc merges internal changes to github 2020-10-13 13:02:56 +09:00
word_model_trainer.h Port absl::flat_hash_map 2020-06-02 01:56:48 +09:00
word_model.cc Initial release of 0.19. Merged internal sentencepiece. 2020-05-08 01:06:50 +09:00
word_model.h Port absl::flat_hash_map 2020-06-02 01:56:48 +09:00