austinvhuang | dcba6c0f16 | id marshalling (WIP) | 2021-04-11 10:51:38 -04:00
austinvhuang | 83d68af8d4 | prototype cleanup | 2021-04-11 10:51:38 -04:00
austinvhuang | 7f1c9918b6 | marshall token values | 2021-04-11 10:51:38 -04:00
austinvhuang | 34534dd456 | test getTokens array of strings return value | 2021-04-11 10:51:38 -04:00
austinvhuang | 29b1e0f9ce | working pre-tokenizer and roberta post processor | 2021-04-11 10:51:38 -04:00
austinvhuang | 68a2b9409e | remove incomplete implementations | 2021-04-11 10:51:37 -04:00
austinvhuang | 069f260c7e | compilation cleanup | 2021-04-11 10:51:37 -04:00
austinvhuang | 8d9aee0287 | extending rust lib | 2021-04-11 10:51:37 -04:00
austinvhuang | 193bfaacfd | start work on tokenizer model artifact | 2021-04-11 10:51:37 -04:00
austinvhuang | a54aef4247 | add readme and makefile | 2021-04-11 10:51:37 -04:00
austinvhuang | 25c062acea | basic object marshalling | 2021-04-11 10:51:37 -04:00
austinvhuang | f2594c95f9 | format haskell shim | 2021-04-11 10:51:37 -04:00
austinvhuang | 3309bb27ea | test calling into rust from haskell | 2021-04-11 10:51:37 -04:00
Anthony MOI | 32b3b7a0f2 | Python - Prepare for release 0.10.2 | 2021-04-05 16:47:55 -04:00
Anthony MOI | c3b3b29039 | Rust - Add another test for Metaspace deserialization | 2021-04-05 16:05:48 -04:00
Anthony MOI | e1627654b4 | Fix Clippy warnings for Rust 1.51 | 2021-04-05 16:05:48 -04:00
Anthony MOI | 659a835d04 | Python - Accept kwargs in Metaspace constructor. This is mainly for backward compatibility with Metaspace objects that used to contain a `str_rep` field | 2021-04-05 16:05:48 -04:00
Anthony MOI | a891e29c02 | Rust - Remove str_rep from Metaspace serialization | 2021-04-05 16:05:48 -04:00
Anthony MOI | 0fe9214f44 | Fix BPE continuing_subword_prefix | 2021-03-18 14:39:52 -04:00
Anthony MOI | f5e9bb89b7 | Fix offsets for Precompiled corner case | 2021-03-16 15:04:42 -04:00
Anthony MOI | f12be3030f | Try with ubuntu 18.04 | 2021-03-16 12:32:06 -04:00
Anthony MOI | 53ab5a470c | Allow unnecessary_wraps for node bindings | 2021-03-16 12:32:06 -04:00
Anthony MOI | 56a9196030 | Fix clippy warnings | 2021-03-16 12:32:06 -04:00
Anthony MOI | ee95e7f0cd | Actually fix the link to pepy.tech on downloads badge | 2021-02-09 21:05:52 -05:00
Anthony MOI | 1321dcf143 | Hotfix link to pepy.tech on downloads badge | 2021-02-09 21:04:56 -05:00
Anthony MOI | bc8bbf637a | Prepare for python v0.10.1 (#625) | 2021-02-08 11:45:56 -05:00
Anthony MOI | d96442cbe8 | Python - Prepare for release 0.10.1rc1 (#622) | 2021-02-04 10:37:00 -05:00
Anthony MOI | 57200144ca | Python - Fix ByteLevel instantiation from state (#621) | 2021-02-04 10:16:05 -05:00
Anthony MOI | 324cb8d380 | CI - Fix conda build. Stealing this from the master @LysandreJik | 2021-02-04 10:12:30 -05:00
Anthony MOI | a8f756494e | Improve Model serialization/deserialization (#620) | 2021-02-04 09:59:18 -05:00
Anthony MOI | ce9325b714 | Update README.md. Fix #609 | 2021-02-03 15:54:01 -05:00
Anthony MOI | 6a29dbc070 | Doc - Hotfix training from iterators tutorial | 2021-02-03 15:50:09 -05:00
Anthony MOI | db22cb6315 | Python - Fix Normalizer.normalize with PyNormalizedStringRefMut | 2021-02-03 15:48:53 -05:00
Anthony MOI | 355315e8d3 | Rust - Fix offsets produced by Precompiled Normalizer | 2021-02-03 15:46:45 -05:00
Anthony MOI | 2c711d45ce | CI - Force pyarrow<3.0.0 for now | 2021-02-03 12:44:46 -05:00
Anthony MOI | a350ec3e72 | Rust - Fix a bug in the Metaspace PreTokenizer | 2021-02-03 12:44:46 -05:00
Anthony MOI | 96b9972842 | Fix SentencePiece tokenizers conversion | 2021-02-03 12:44:46 -05:00
Anthony MOI | fc0a50a272 | Update doc for Python 0.10.0 | 2021-01-12 16:47:56 -05:00
Anthony MOI | 719bea76b9 | Python - Prepare for release 0.10.0 | 2021-01-12 16:34:04 -05:00
devfon | b9c6bea75e | Add fuse_unk option to SentencePieceBPETokenizer (#574): add fuse_unk option to SentencePieceBPETokenizer; fix style. Co-authored-by: Anthony MOI <m.anthony.moi@gmail.com> | 2021-01-12 16:07:59 -05:00
Anthony MOI | 91dae1de15 | Doc - Add documentation for training from iterators | 2021-01-12 15:51:38 -05:00
François Garillot | 7bee825238 | Cleans up a few pattern-matches into their Option/Result equivalent | 2021-01-12 15:48:15 -05:00
Anthony MOI | cca5d43038 | Python - Fix breaking change in Model.save | 2021-01-11 16:09:19 -05:00
Anthony MOI | 49d11b1f69 | Python - Add components getter/setters to BaseTokenizer | 2021-01-11 16:08:38 -05:00
Anthony MOI | 65b91966f7 | Fix import Formatter with new serde | 2021-01-11 15:55:49 -05:00
Anthony MOI | 1990f51b9f | Simplify Whitespace pre_tokenizer | 2021-01-11 15:55:49 -05:00
Anthony MOI | d94fa220b6 | Python - Add train_from_iterator to implementations | 2021-01-07 09:02:20 -05:00
Anthony MOI | 817c5ad317 | Fix clippy warnings for rust 1.49 | 2021-01-06 15:03:33 -05:00
Anthony MOI | 5938a12b3f | Python - Improve training with iterators | 2021-01-06 11:38:43 -05:00
Julien Chaumond | dad8d6249e | rm extraneous </a> (#573) | 2021-01-06 11:37:37 -05:00