Commit Graph

1464 Commits

Author SHA1 Message Date
austinvhuang
dcba6c0f16 id marshalling (WIP) 2021-04-11 10:51:38 -04:00
austinvhuang
83d68af8d4 prototype cleanup 2021-04-11 10:51:38 -04:00
austinvhuang
7f1c9918b6 marshall token values 2021-04-11 10:51:38 -04:00
austinvhuang
34534dd456 test getTokens array of strings return value 2021-04-11 10:51:38 -04:00
austinvhuang
29b1e0f9ce working pre-tokenizer and roberta post processor 2021-04-11 10:51:38 -04:00
austinvhuang
68a2b9409e remove incomplete implementations 2021-04-11 10:51:37 -04:00
austinvhuang
069f260c7e compilation cleanup 2021-04-11 10:51:37 -04:00
austinvhuang
8d9aee0287 extending rust lib 2021-04-11 10:51:37 -04:00
austinvhuang
193bfaacfd start work on tokenizer model artifact 2021-04-11 10:51:37 -04:00
austinvhuang
a54aef4247 add readme and makefile 2021-04-11 10:51:37 -04:00
austinvhuang
25c062acea basic object marshalling 2021-04-11 10:51:37 -04:00
austinvhuang
f2594c95f9 format haskell shim 2021-04-11 10:51:37 -04:00
austinvhuang
3309bb27ea test calling into rust from haskell 2021-04-11 10:51:37 -04:00
Anthony MOI
32b3b7a0f2 Python - Prepare for release 0.10.2 2021-04-05 16:47:55 -04:00
Anthony MOI
c3b3b29039 Rust - Add another test for Metaspace deserialization 2021-04-05 16:05:48 -04:00
Anthony MOI
e1627654b4 Fix Clippy warnings for Rust 1.51 2021-04-05 16:05:48 -04:00
Anthony MOI
659a835d04 Python - Accept kwargs in Metaspace constructor
This is mainly for backward compatibility with Metaspace objects that used to contain a `str_rep` field
2021-04-05 16:05:48 -04:00
Anthony MOI
a891e29c02 Rust - Remove str_rep from Metaspace serialization 2021-04-05 16:05:48 -04:00
Anthony MOI
0fe9214f44 Fix BPE continuing_subword_prefix 2021-03-18 14:39:52 -04:00
Anthony MOI
f5e9bb89b7 Fix offsets for Precompiled corner case 2021-03-16 15:04:42 -04:00
Anthony MOI
f12be3030f Try with ubuntu 18.04 2021-03-16 12:32:06 -04:00
Anthony MOI
53ab5a470c Allow unnecessary_wraps for node bindings 2021-03-16 12:32:06 -04:00
Anthony MOI
56a9196030 Fix clippy warnings 2021-03-16 12:32:06 -04:00
Anthony MOI
ee95e7f0cd Actually fix the link to pepy.tech on downloads badge 2021-02-09 21:05:52 -05:00
Anthony MOI
1321dcf143 Hotfix link to pepy.tech on downloads badge 2021-02-09 21:04:56 -05:00
Anthony MOI
bc8bbf637a Prepare for python v0.10.1 (#625) 2021-02-08 11:45:56 -05:00
Anthony MOI
d96442cbe8 Python - Prepare for release 0.10.1rc1 (#622) 2021-02-04 10:37:00 -05:00
Anthony MOI
57200144ca Python - Fix ByteLevel instantiation from state (#621) 2021-02-04 10:16:05 -05:00
Anthony MOI
324cb8d380 CI - Fix conda build
Stealing this from the master @LysandreJik
2021-02-04 10:12:30 -05:00
Anthony MOI
a8f756494e Improve Model serialization/deserialization (#620) 2021-02-04 09:59:18 -05:00
Anthony MOI
ce9325b714 Update README.md
Fix #609
2021-02-03 15:54:01 -05:00
Anthony MOI
6a29dbc070 Doc - Hotfix training from iterators tutorial 2021-02-03 15:50:09 -05:00
db22cb6315 Python - Fix Normalizer.normalize with PyNormalizedStringRefMut 2021-02-03 15:48:53 -05:00
Anthony MOI
355315e8d3 Rust - Fix offsets produced by Precompiled Normalizer 2021-02-03 15:46:45 -05:00
Anthony MOI
2c711d45ce CI - Force pyarrow<3.0.0 for now 2021-02-03 12:44:46 -05:00
Anthony MOI
a350ec3e72 Rust - Fix a bug in the Metaspace PreTokenizer 2021-02-03 12:44:46 -05:00
Anthony MOI
96b9972842 Fix SentencePiece tokenizers conversion 2021-02-03 12:44:46 -05:00
Anthony MOI
fc0a50a272 Update doc for Python 0.10.0 2021-01-12 16:47:56 -05:00
Anthony MOI
719bea76b9 Python - Prepare for release 0.10.0 2021-01-12 16:34:04 -05:00
devfon
b9c6bea75e Add fuse_unk option to SentencePieceBPETokenizer (#574)
* Add fuse_unk option to SentencePieceBPETokenizer
* Fix style
Co-authored-by: Anthony MOI <m.anthony.moi@gmail.com>
2021-01-12 16:07:59 -05:00
Anthony MOI
91dae1de15 Doc - Add documentation for training from iterators 2021-01-12 15:51:38 -05:00
François Garillot
7bee825238 Cleans up a few pattern-matches into their Option/Result equivalent 2021-01-12 15:48:15 -05:00
Anthony MOI
cca5d43038 Python - Fix breaking change in Model.save 2021-01-11 16:09:19 -05:00
Anthony MOI
49d11b1f69 Python - Add components getter/setters to BaseTokenizer 2021-01-11 16:08:38 -05:00
Anthony MOI
65b91966f7 Fix import Formatter with new serde 2021-01-11 15:55:49 -05:00
Anthony MOI
1990f51b9f Simplify Whitespace pre_tokenizer 2021-01-11 15:55:49 -05:00
Anthony MOI
d94fa220b6 Python - Add train_from_iterator to implementations 2021-01-07 09:02:20 -05:00
Anthony MOI
817c5ad317 Fix clippy warnings for rust 1.49 2021-01-06 15:03:33 -05:00
Anthony MOI
5938a12b3f Python - Improve training with iterators 2021-01-06 11:38:43 -05:00
Julien Chaumond
dad8d6249e rm extraneous </a> (#573) 2021-01-06 11:37:37 -05:00