Commit Graph

  • e7b5260e4a Merge pull request #955 from pnacht/pinned-pip Taku Kudo 2024-01-03 12:29:39 +0900
  • 2909148446 Merge pull request #957 from google/dependabot/github_actions/github-actions-bcafe21e81 Taku Kudo 2024-01-03 12:29:14 +0900
  • b888bd7295 Bump the github-actions group with 2 updates dependabot[bot] 2024-01-01 15:40:30 +0000
  • eecae396a0 Set up dependabot to keep them updated Pedro Kaj Kjellerup Nacht 2023-11-17 19:38:36 +0000
  • e31c147bb9 Hash-pin CI/CD pip dependencies Pedro Kaj Kjellerup Nacht 2023-11-17 16:21:35 +0000
  • 3c2fc666dd Update common.h Taku Kudo 2023-12-24 02:00:56 +0900
  • 96aabaef96 add set_min_log_level function to python to change the loglevel from python wrapper. Taku Kudo 2023-12-23 09:28:40 +0000
  • bd3925a12e Merge pull request #936 from google/dependabot/github_actions/github-actions-a69c9d1746 Taku Kudo 2023-12-23 17:17:09 +0900
  • b937146233 Merge pull request #938 from pnacht/token-permissions Taku Kudo 2023-12-23 17:16:39 +0900
  • a5262b57eb Merge pull request #947 from chenqy4933/master Taku Kudo 2023-12-23 17:16:25 +0900
  • 6b32c01286 merges internal changes to github external repos Taku Kudo 2023-12-23 07:20:11 +0000
  • 20556863ed Refactor spm_encode_main Vadim Markovtsev 2023-12-19 16:49:03 +0100
  • d5e6d0389e Process the remainder of the input file Vadim Markovtsev 2023-12-19 15:50:53 +0100
  • fc1584ea36 fix(cmake): fix android build error ChenQiyou 2023-12-05 10:17:59 +0800
  • 3ced0ec995 wheel.yml: Update cibuildwheel to pass in MacOS Pedro Kaj Kjellerup Nacht 2023-11-28 13:46:20 +0000
  • ce412d7c16 Add log lines instead of assert Razieh Behjati 2023-11-18 00:20:52 +0000
  • ad56e74c40 Set minimal permissions for workflows Pedro Kaj Kjellerup Nacht 2023-11-17 21:38:13 +0000
  • 14f5e57dda Bump the github-actions group with 1 update dependabot[bot] 2023-11-17 03:49:37 +0000
  • 022f8c3fed Merge pull request #934 from pnacht/pinned-gha Taku Kudo 2023-11-17 12:49:04 +0900
  • 61265db0c5 Fixes after review Razieh Behjati 2023-11-15 14:36:08 +0000
  • c4bd5ea721 Use std::shared_ptr and a slow-down mechanism Razieh Behjati 2023-11-14 18:52:30 +0000
  • 02ea8ed099 Add dependabot to monitor GHA Pedro Kaj Kjellerup Nacht 2023-11-13 20:50:39 +0000
  • eaf71c2d85 Hash-pin GHA Pedro Kaj Kjellerup Nacht 2023-11-13 20:49:20 +0000
  • aa566d3632 Add ReadLineStdin to allow reading from stdin Razieh Behjati 2023-11-13 07:41:29 +0000
  • 1bce0469fb Fix a race condition issue yiyangh-ps 2023-11-03 16:07:10 +0800
  • d16e5da6fe Read from stdin Razieh Behjati 2023-10-30 12:41:22 +0000
  • 7f16648613 Refactor encoding using MixedTextCodeIterator Yiyang Hao 2023-10-31 15:50:51 +0000
  • 752a0c0c1e Remove 0x04 from encoding sequence Yiyang Hao 2023-10-26 15:33:57 +0000
  • a237ae4098 Remove unused eos/bos/verbatim_control_char Yiyang Hao 2023-10-25 17:06:11 +0000
  • d092158de6 Fix build command Yiyang Hao 2023-10-25 17:05:56 +0000
  • 515e65b532 Support mixed code-text format Yiyang Hao 2023-10-25 17:00:03 +0000
  • dde84ebc7d Process lines fewer than 1000 Yiyang Hao 2023-10-23 09:21:11 +0000
  • f5f0392bc7 Refactor code Yiyang Hao 2023-10-31 13:09:05 +0000
  • 89859589d9 handle 0x01~0x04 delimiters Yiyang Hao 2023-10-26 15:30:12 +0000
  • dc661b824d Merge pull request #10 from poolsideai/devdocs rbehjati 2023-11-01 14:40:26 +0000
  • f0821a6bc0 Add development documentation Razieh Behjati 2023-10-31 16:50:58 +0000
  • 5cec47dbaa Fixing tests (#6) Kuba Podgórski 2023-10-28 20:44:46 +0200
  • ff48f834dd Fixing tests - step 1 Kuba Podgórski 2023-10-28 01:41:12 +0200
  • 3a7cebc401 Process lines fewer than 1000 Yiyang Hao 2023-10-23 09:21:11 +0000
  • 90db116878 Revert adding variable because it is automatically unset for some cmake reason Christopher Hong 2023-10-19 17:13:42 -0400
  • 47e1f4691b final fixes Christopher Hong 2023-10-19 17:08:01 -0400
  • 0403ababe2 fix minloglevel to spm_minloglevel Christopher Hong 2023-10-19 16:43:07 -0400
  • b76b1eed19 Fix defaults Christopher Hong 2023-10-19 16:32:04 -0400
  • c4308be3be Change minloglevel to spm_minloglevel to avoid global namespace conflict with other open source google libraries Christopher Hong 2023-10-17 18:32:17 -0400
  • b1a0293a9b Add missing external absl lib dep Christopher Hong 2023-10-13 21:29:43 -0400
  • 7211a84c7b fix namespace Christopher Hong 2023-10-12 19:09:42 -0400
  • e3f1bff839 fix includes Christopher Hong 2023-10-12 19:07:58 -0400
  • 678810df64 fix include Christopher Hong 2023-10-12 19:05:58 -0400
  • b6b46759c3 use spm_absl for random headers Christopher Hong 2023-10-12 18:12:21 -0400
  • a342cce3e7 fix Christopher Hong 2023-10-12 17:06:10 -0400
  • 8ae6d43e72 Include directories from absl package Christopher Hong 2023-10-12 16:54:12 -0400
  • c0f838b3b9 fix else if Christopher Hong 2023-10-12 16:46:49 -0400
  • c585cb6059 cleanup Christopher Hong 2023-10-12 16:37:10 -0400
  • 90acc12100 automated replacement of third_party/absl with absl in includes Christopher Hong 2023-10-12 12:48:41 -0400
  • 5054d1d8fb Move absl into internal directory to facilitate include space separation in third party libraries Christopher Hong 2023-10-12 12:19:50 -0400
  • 0417dcb84c change include_directories to add third_party Christopher Hong 2023-10-12 12:16:36 -0400
  • 7c0f780f08 Allow external absl, change semantics for internal/external protobuf Christopher Hong 2023-10-12 11:59:08 -0400
  • d1a4bb4c12 Add some install instructions for finding the build path. Joe Rowell 2023-10-05 11:53:19 +0100
  • 159efa2d65 Make it compile Rohit Jain 2023-09-18 12:42:02 -0700
  • 4ca953c2cd Recompile protobuf files with protoc Rohit Jain 2023-09-18 12:10:37 -0700
  • 362d1c2e17 Merge branch 'gmaster' into rjai/update Rohit Jain 2023-09-18 12:05:53 -0700
  • 55a74850a9 improve a comment Quentin Carbonneaux 2023-08-31 04:37:42 -0700
  • 7c73e86159 support overriding the model add_dummy_prefix setting Quentin Carbonneaux 2023-08-31 04:02:33 -0700
  • 8cbdf13794 Improves the thread utilization in batch encoding/decoding Taku Kudo 2023-08-05 09:01:02 +0000
  • fb6f8e408d Fix compilation on MSVC 2022 XapaJIaMnu 2023-07-13 00:26:04 +0100
  • 635fe8423a Upgrade the sentencepiece_model_pb2.py and sentencepiece.py Taku Kudo 2023-07-01 02:38:33 +0000
  • 0386f3a9f4 v5 Johannes Thiem 2023-06-07 18:26:15 +0200
  • 315d9e28cf v4 Johannes Thiem 2023-06-07 18:25:31 +0200
  • 4295c14e09 v3 Johannes Thiem 2023-06-07 16:20:42 +0000
  • 40afb53a2c v2 Johannes Thiem 2023-06-07 15:29:35 +0000
  • eb802c17de v1 Johannes Thiem 2023-06-07 15:23:36 +0000
  • f6d4ed8473 v3 Johannes Thiem 2023-06-07 14:19:49 +0000
  • 272de6a2f8 v2 Johannes Thiem 2023-06-07 13:01:31 +0000
  • 8d928ac999 Merge pull request #1 from Fovty/v1 Johannes Thiem 2023-06-07 14:00:37 +0200
  • a152e17f8d Update README.md Johannes Thiem 2023-06-07 14:00:24 +0200
  • 150ce9d591 Fix encoding non-verbatim text Vadim Markovtsev 2023-05-31 16:52:41 +0200
  • ba480318ee Balance sorting and freq computation Vadim Markovtsev 2023-05-29 16:50:32 +0200
  • 17acb6d1e7 Optimize the parallel BPE iterations Vadim Markovtsev 2023-05-29 14:37:53 +0200
  • b6c76660bc Avoid the second overhead of GetCachedPairSymbol() Vadim Markovtsev 2023-05-29 11:39:01 +0200
  • cc8c56d0ab Make bpe position updates parallel Vadim Markovtsev 2023-05-28 23:38:58 +0200
  • ebd85f3089 Add cache only mode in GetPairSymbol Vadim Markovtsev 2023-05-27 00:24:45 +0200
  • aeb2bd785b Polish UpdateActiveSymbols Vadim Markovtsev 2023-05-27 00:13:19 +0200
  • f637bf041d Parallelize bpe pairs construction Vadim Markovtsev 2023-05-26 19:04:35 +0200
  • cc6a20faad Switch to 32-bit bpe pointers Vadim Markovtsev 2023-05-26 15:48:28 +0200
  • 429c830b18 Successfully load 80% of the Pile Vadim Markovtsev 2023-05-26 12:34:19 +0200
  • 8e28f03be2 Initialize bpe symbols in parallel Vadim Markovtsev 2023-05-26 11:42:38 +0200
  • 4f984b0583 Fix memory pressure in sorting final merged bpe cache Vadim Markovtsev 2023-05-26 10:30:38 +0200
  • 28f7bf106c Reduce memory pressure in bpe trainer symbols construction Vadim Markovtsev 2023-05-26 10:05:08 +0200
  • e8f8d3baf8 Implement merging cached bpe frequency dicts Vadim Markovtsev 2023-05-25 20:09:42 +0200
  • 7b694e4bdb Merge pull request #867 from vmarkovtsev/patch-1 Taku Kudo 2023-05-25 16:31:30 +0900
  • 4d156ed8cb Cache sentence frequencies to a file Vadim Markovtsev 2023-05-24 20:00:23 +0200
  • cb22883439 Merge pull request #870 from ryandesign/ryandesign-protobuf-lite Taku Kudo 2023-05-25 01:27:11 +0900
  • 581efe6ed9 Link external protobuf statically Vadim Markovtsev 2023-05-24 16:20:28 +0200
  • d020431d36 Indicate progress in spm_encode Vadim Markovtsev 2023-05-23 21:36:32 +0200
  • 27f78bdf49 Parallel spm_encode Vadim Markovtsev 2023-05-23 20:18:18 +0200
  • 6faa2d5966 Get rid of TBB to sort in parallel Vadim Markovtsev 2023-05-23 16:38:48 +0200
  • 20f12dc460 Fix not adding all-whitespace pairs on --verbatim_control_char Vadim Markovtsev 2023-05-23 13:32:42 +0200
  • 8f26fad99e Ensure that Encode/Decode restores whitespace in the code Vadim Markovtsev 2023-05-23 00:33:57 +0200
  • 8d932e562b Support --verbatim_control_char to mark source code sentences Vadim Markovtsev 2023-05-22 16:59:28 +0200
  • 41835971b7 Fix pkg-config file to avoid overlinking Ryan Schmidt 2023-05-21 13:38:08 -0500