Commit Graph

  • c0f838b3b9 fix else if Christopher Hong 2023-10-12 16:46:49 -0400
  • c585cb6059 cleanup Christopher Hong 2023-10-12 16:37:10 -0400
  • 90acc12100 automated replacement of third_party/absl with absl in includes Christopher Hong 2023-10-12 12:48:41 -0400
  • 5054d1d8fb Move absl into internal directory to facilitate include space separation in third party libraries Christopher Hong 2023-10-12 12:19:50 -0400
  • 0417dcb84c change include_directories to add third_party Christopher Hong 2023-10-12 12:16:36 -0400
  • 7c0f780f08 Allow external absl, change semantics for internal/external protobuf Christopher Hong 2023-10-12 11:59:08 -0400
  • d1a4bb4c12 Add some install instructions for finding the build path. Joe Rowell 2023-10-05 11:53:19 +0100
  • 159efa2d65 Make it compile Rohit Jain 2023-09-18 12:42:02 -0700
  • 4ca953c2cd Recompile protobuf files with protoc Rohit Jain 2023-09-18 12:10:37 -0700
  • 362d1c2e17 Merge branch 'gmaster' into rjai/update Rohit Jain 2023-09-18 12:05:53 -0700
  • 55a74850a9 improve a comment Quentin Carbonneaux 2023-08-31 04:37:42 -0700
  • 7c73e86159 support overriding the model add_dummy_prefix setting Quentin Carbonneaux 2023-08-31 04:02:33 -0700
  • 8cbdf13794 Improves the thread utilization in batch encoding/decoding Taku Kudo 2023-08-05 09:01:02 +0000
  • fb6f8e408d Fix compilation on MSVC 2022 XapaJIaMnu 2023-07-13 00:26:04 +0100
  • 635fe8423a Upgrade the sentencepiece_model_pb2.py and sentencepiece.py Taku Kudo 2023-07-01 02:38:33 +0000
  • 0386f3a9f4 v5 Johannes Thiem 2023-06-07 18:26:15 +0200
  • 315d9e28cf v4 Johannes Thiem 2023-06-07 18:25:31 +0200
  • 4295c14e09 v3 Johannes Thiem 2023-06-07 16:20:42 +0000
  • 40afb53a2c v2 Johannes Thiem 2023-06-07 15:29:35 +0000
  • eb802c17de v1 Johannes Thiem 2023-06-07 15:23:36 +0000
  • f6d4ed8473 v3 Johannes Thiem 2023-06-07 14:19:49 +0000
  • 272de6a2f8 v2 Johannes Thiem 2023-06-07 13:01:31 +0000
  • 8d928ac999 Merge pull request #1 from Fovty/v1 Johannes Thiem 2023-06-07 14:00:37 +0200
  • a152e17f8d Update README.md Johannes Thiem 2023-06-07 14:00:24 +0200
  • 150ce9d591 Fix encoding non-verbatim text Vadim Markovtsev 2023-05-31 16:52:41 +0200
  • ba480318ee Balance sorting and freq computation Vadim Markovtsev 2023-05-29 16:50:32 +0200
  • 17acb6d1e7 Optimize the parallel BPE iterations Vadim Markovtsev 2023-05-29 14:37:53 +0200
  • b6c76660bc Avoid the second overhead of GetCachedPairSymbol() Vadim Markovtsev 2023-05-29 11:39:01 +0200
  • cc8c56d0ab Make bpe position updates parallel Vadim Markovtsev 2023-05-28 23:38:58 +0200
  • ebd85f3089 Add cache only mode in GetPairSymbol Vadim Markovtsev 2023-05-27 00:24:45 +0200
  • aeb2bd785b Polish UpdateActiveSymbols Vadim Markovtsev 2023-05-27 00:13:19 +0200
  • f637bf041d Parallelize bpe pairs construction Vadim Markovtsev 2023-05-26 19:04:35 +0200
  • cc6a20faad Switch to 32-bit bpe pointers Vadim Markovtsev 2023-05-26 15:48:28 +0200
  • 429c830b18 Successfully load 80% of the Pile Vadim Markovtsev 2023-05-26 12:34:19 +0200
  • 8e28f03be2 Initialize bpe symbols in parallel Vadim Markovtsev 2023-05-26 11:42:38 +0200
  • 4f984b0583 Fix memory pressure in sorting final merged bpe cache Vadim Markovtsev 2023-05-26 10:30:38 +0200
  • 28f7bf106c Reduce memory pressure in bpe trainer symbols construction Vadim Markovtsev 2023-05-26 10:05:08 +0200
  • e8f8d3baf8 Implement merging cached bpe frequency dicts Vadim Markovtsev 2023-05-25 20:09:42 +0200
  • 7b694e4bdb Merge pull request #867 from vmarkovtsev/patch-1 Taku Kudo 2023-05-25 16:31:30 +0900
  • 4d156ed8cb Cache sentence frequencies to a file Vadim Markovtsev 2023-05-24 20:00:23 +0200
  • cb22883439 Merge pull request #870 from ryandesign/ryandesign-protobuf-lite Taku Kudo 2023-05-25 01:27:11 +0900
  • 581efe6ed9 Link external protobuf statically Vadim Markovtsev 2023-05-24 16:20:28 +0200
  • d020431d36 Indicate progress in spm_encode Vadim Markovtsev 2023-05-23 21:36:32 +0200
  • 27f78bdf49 Parallel spm_encode Vadim Markovtsev 2023-05-23 20:18:18 +0200
  • 6faa2d5966 Get rid of TBB to sort in parallel Vadim Markovtsev 2023-05-23 16:38:48 +0200
  • 20f12dc460 Fix not adding all-whitespace pairs on --verbatim_control_char Vadim Markovtsev 2023-05-23 13:32:42 +0200
  • 8f26fad99e Ensure that Encode/Decode restores whitespace in the code Vadim Markovtsev 2023-05-23 00:33:57 +0200
  • 8d932e562b Support --verbatim_control_char to mark source code sentences Vadim Markovtsev 2023-05-22 16:59:28 +0200
  • 41835971b7 Fix pkg-config file to avoid overlinking Ryan Schmidt 2023-05-21 13:38:08 -0500
  • e081c671b2 Remove empty placeholders in pkg-config file Ryan Schmidt 2023-05-21 13:26:39 -0500
  • 20be455a65 Allow missing tcmalloc Vadim Markovtsev 2023-05-19 19:28:48 +0200
  • f64cbf48da Repair the python package build Vadim Markovtsev 2023-05-19 19:19:46 +0200
  • 23c0c09ce0 Update the built-in protobuf sources Vadim Markovtsev 2023-05-19 19:14:49 +0200
  • 527af0c0a7 Fix the overflow errors Vadim Markovtsev 2023-05-18 19:52:43 +0200
  • 3805cbb616 Fix nasty bug in BPE position encoding Vadim Markovtsev 2023-05-18 19:39:30 +0200
  • 180a6fb2c7 Allow switching between internal and external abseil Vadim Markovtsev 2023-05-18 15:40:22 +0200
  • 8fc56b5db8 Add installation instructions and use external abseil Vadim Markovtsev 2023-05-18 15:11:30 +0200
  • 9e1557377a Add multiple memory and performance optimizations Vadim Markovtsev 2023-05-18 12:48:11 +0200
  • 35c2eba4a3 Add "nfkc_code" normalization rules Vadim Markovtsev 2023-05-18 12:44:09 +0200
  • f2219b53e2 prepare for 0.2.00 Taku Kudo 2023-05-14 14:35:14 +0000
  • 0b344d0b61 Added arm architecture Taku Kudo 2023-05-14 11:21:38 +0000
  • 2f66fbff33 Added arm architecture Taku Kudo 2023-05-14 11:17:25 +0000
  • 6693e7eb68 Fixes test workpath Taku Kudo 2023-05-14 10:57:55 +0000
  • b857ba94e9 Split build and test Taku Kudo 2023-05-14 10:51:31 +0000
  • fad8ae6def Added fail first flag Taku Kudo 2023-05-14 10:36:13 +0000
  • f2fcd859b3 Fixes cross build yaml Taku Kudo 2023-05-14 10:29:06 +0000
  • 6c901b0fb5 Fixes build test errors in big-endian machines Taku Kudo 2023-05-14 09:54:52 +0000
  • 17f9c6bd2c Fixes build test errors in big-endian machines Taku Kudo 2023-05-14 09:53:35 +0000
  • 827591a0c5 Fixes build test errors in big-endian machines Taku Kudo 2023-05-14 09:08:39 +0000
  • 3863f7648e increases the max number of threads v0.1.99 Taku Kudo 2023-04-30 17:37:15 +0000
  • 25b64fc630 Fix the test error on windows v0.1.99pre1 Taku Kudo 2023-04-28 06:20:50 +0000
  • bb0b610fae Fix the ULM training bugs Taku Kudo 2023-04-27 17:32:57 +0000
  • ba44ab1ca0 Fix bugs in the handling of duplicated bigrams Taku Kudo 2023-04-24 07:25:10 +0000
  • 69d34c7171 prepare for v0.1.99 Taku Kudo 2023-04-15 06:33:01 +0000
  • d9a2b216b1 Fix bugs the seed score computation. Taku Kudo 2023-04-15 05:59:52 +0000
  • 518c57c335 build wheel from sdist for testing v0.1.98 Taku Kudo 2023-04-12 07:41:58 +0000
  • fabfe3095b build wheel from sdist for testing Taku Kudo 2023-04-12 07:35:45 +0000
  • d6e597b391 build wheel from sdist for testing Taku Kudo 2023-04-12 07:24:31 +0000
  • f2884a17e9 test loacl sdist build on github actions Taku Kudo 2023-04-12 04:42:27 +0000
  • 609a2b7d88 test loacl sdist build on github actions Taku Kudo 2023-04-12 02:17:27 +0000
  • 8fd5c6b587 test loacl sdist build on github actions Taku Kudo 2023-04-12 01:43:39 +0000
  • e07ebf74d7 support pretokenization in BPE mode. v0.1.98pre1 Taku Kudo 2023-04-11 06:48:08 +0000
  • 119e58d97a Fixes include path when using external protobuf Taku Kudo 2023-04-10 10:15:46 +0000
  • 2b0713791a fixes IS_BIGENDIAN macro places Taku Kudo 2023-04-10 02:28:20 +0000
  • e58bb684d0 add pretokenization_delimiter options. Initialize seed pieces more accurately. Taku Kudo 2023-04-10 02:11:37 +0000
  • 531646653c Bug bounty test - please ignore.... (vcxzcf) gulugulu6 2023-04-09 21:19:05 +0000
  • 6c9fd791cf Merge pull request #845 from chris-ha458/patch-1 Taku Kudo 2023-04-09 17:13:58 +0900
  • 9b53e211e8 Update sentencepiece_python_module_example.ipynb Chris Ha 2023-04-08 23:26:13 +0900
  • c032c261c2 automatically detect -latomic linker option Taku Kudo 2023-04-05 00:01:15 +0000
  • 5489c0a56a add -latomic in static linking Taku Kudo 2023-04-04 17:26:29 +0000
  • c945229958 updated set-output commands Taku Kudo 2023-04-04 15:51:06 +0000
  • 799c025aea creates sdist with build_sdist.sh Taku Kudo 2023-04-04 05:05:44 +0000
  • 59d84babc9 Ubuntu 18.04 to 20.04 migration Taku Kudo 2023-04-04 03:53:27 +0000
  • f54d8ba070 includes the sentencepiece source files in python source package Taku Kudo 2023-04-04 03:15:11 +0000
  • d0d1066dbf use /MD to build wheel package on windows Taku Kudo 2023-04-03 18:18:29 +0000
  • 573cc39aab make the error message more descriptive. null termnate string in Utf8ToWide Taku Kudo 2023-04-03 02:24:52 +0000
  • 359c04397c handle the exception of std::random_device Taku Kudo 2023-04-02 18:56:19 +0000
  • d4c58fc779 handle the exception of std::random_device Taku Kudo 2023-04-02 18:23:42 +0000
  • ba466a6bae prepare for 0.1.98 Taku Kudo 2023-04-02 18:06:40 +0000
  • c0766c9870 added option to /MT flag Taku Kudo 2023-04-02 16:56:20 +0000