Commit Graph

235 Commits

Author SHA1 Message Date
Guillaume B
d0fc3ff40d Addition of multi-label classification prediction method for sequence classification pipeline
Version update
2020-05-20 17:51:29 +02:00
Guillaume B
33a623e54d Support of rust_tokenizers 3.1.0 2020-05-19 18:44:11 +02:00
guillaume-be
6ccf92fedb Merge pull request #37 from guillaume-be/qa_optimization
Qa optimization
2020-05-16 07:58:00 +00:00
Guillaume B
b37ab31c66 Constant batch size for question answering, answers respect original input order 2020-05-16 09:42:45 +02:00
Guillaume B
2775ab1c1e handling of answers shorter than top_k 2020-05-16 09:29:25 +02:00
guillaume-be
75aeefde65 Merge pull request #36 from guillaume-be/question_answering_long_contexts
Updated QA allowing for long contexts
2020-05-15 20:25:28 +00:00
Guillaume B
32d0f7e221 Updated batching 2020-05-15 22:10:32 +02:00
Guillaume B
eee7ed4d42 Updated QA allowing for long contexts 2020-05-15 22:01:52 +02:00
guillaume-be
16753ea8fb Merge pull request #33 from guillaume-be/token_entity_parsing
Token entity parsing
2020-05-13 15:11:09 +00:00
Guillaume B
3945416192 Updated token consolidation avoiding copy 2020-05-11 23:24:33 +02:00
Guillaume B
b31a569e50 Added documentation 2020-05-11 22:05:11 +02:00
Guillaume B
0bbb47d1db Added options for label consolidation for sub tokens 2020-05-11 16:40:25 +02:00
Guillaume B
6d61074f7f Updated consolidation and documentation 2020-05-10 12:27:42 +02:00
Guillaume B
1a9e315edf Added sub-token consolidation 2020-05-10 12:05:13 +02:00
Guillaume B
9d3a944051 Updated token classification pipeline to use next tokenization features, reference to original text 2020-05-10 11:38:28 +02:00
guillaume-be
705489169f Merge pull request #32 from proycon/offsets
Adapted to API changes in offset-aware rust-tokenizer
2020-05-10 08:30:47 +00:00
Guillaume B
a95dd15e2d Updated dependencies
Bumped version number
2020-05-10 10:15:59 +02:00
Guillaume B
bb94510075 Updated integration tests and formatting 2020-05-09 22:21:58 +02:00
Maarten van Gompel
e9c55e29b7 Adapted to API changes in offset-aware rust-tokenizer (guillaume-be/rust-tokenizers#14, guillaume-be/rust-tokenizers#19) 2020-05-09 20:16:42 +02:00
guillaume-be
ddb90c7199 Merge pull request #31 from guillaume-be/generic_sequence_classification
Generic sequence classification
2020-05-07 17:57:20 +00:00
Guillaume B
1de940932f Addition of Electra to generic pipeline options and to token classification 2020-05-07 18:11:15 +02:00
Guillaume B
0489adafbc Shared generic pipeline components (config, tokenizers) moved to common module 2020-05-07 18:03:25 +02:00
Guillaume B
053413fcbc Addition of generic classification model (following generic token classification pattern) 2020-05-07 17:26:20 +02:00
guillaume-be
3af309cbac Merge pull request #29 from proycon/tokclasspipeline
More generic token classification pipeline supporting multiple models
2020-05-07 14:30:22 +00:00
Maarten van Gompel
b840cbf57a Removed word_index and continuation for now (and the decoding part responsible for it), will be reintroduced once the tokenizers provide the offset information (guillaume-be/rust-tokenizers#14), issue #29 2020-05-07 14:21:22 +02:00
Maarten van Gompel
e044a9cbd9 Implemented a leaner NERModel that simply delegates to the new token_classification pipeline, but retains all backward-compatibility #29 (+ fixed tests and added docs) 2020-05-07 11:44:55 +02:00
Maarten van Gompel
bb23326ad0 Removed character offset computation to align with the original text post-prediction (also removes the 'space' attribute for now), should be solved at tokenisation time instead #29 2020-05-07 11:44:32 +02:00
Maarten van Gompel
e023420014 Removed unneeded LabelMapping abstraction and adapted DistilBert's maps to use i64 instead of i32 #29 2020-05-07 11:44:32 +02:00
Maarten van Gompel
ff0e8e2504 Implementing a more generic token classification pipeline with support for multiple model-types and less NER-centric naming #29 2020-05-07 11:44:32 +02:00
guillaume-be
18b43c1093 Merge pull request #28 from guillaume-be/updated_gpt2_decoding
Updated dependencies to use latest tokenization crate version
2020-05-03 19:25:09 +00:00
Guillaume B
1ddd1d9f51 Updated dependencies to use latest tokenization crate version 2020-05-03 20:52:24 +02:00
guillaume-be
d3a6a204dc Merge pull request #26 from guillaume-be/electra_implementation
Electra implementation
2020-05-03 13:42:17 +00:00
Guillaume B
83e43ffcd5 Updated resource list 2020-05-03 14:59:13 +02:00
Guillaume B
139eecace7 updated documentation 2020-05-03 13:44:49 +02:00
Guillaume B
c6f5cdd859 Updated documentation 2020-05-03 13:37:18 +02:00
Guillaume B
9f608f4374 Addition of tests for Electra 2020-05-03 11:47:16 +02:00
Guillaume B
029d4bd47c Addition of Electra resources 2020-05-03 10:05:32 +02:00
Guillaume B
5a1c1ae7a0 Addition of ElectraDiscriminator 2020-05-03 09:46:59 +02:00
Guillaume B
1c1f91bcdf Merge remote-tracking branch 'remotes/origin/master' into electra_implementation 2020-05-02 13:58:27 +02:00
guillaume-be
9f4afc62ac Merge pull request #25 from guillaume-be/additional_model_downloads
Additional model downloads
2020-05-02 11:52:35 +00:00
Guillaume B
cc8e02f03d Updated README 2020-05-02 11:25:19 +02:00
Guillaume B
b00bd8f97f Added additional GPT2 resources and license information 2020-05-02 10:43:53 +02:00
Guillaume B
75ed1f864b Updated resources to use cloudfront endpoints 2020-05-02 10:04:02 +02:00
Guillaume B
1cfef470d7 Updated version number 2020-05-02 09:57:37 +02:00
Guillaume B
e30f8a6b11 Updated cache directory setting 2020-05-02 09:12:26 +02:00
Guillaume B
259d30f58d Added possibility to define cache directory 2020-05-02 09:07:43 +02:00
Guillaume B
2c25dc1650 Addition of ElectraForTokenClassification 2020-05-01 10:00:47 +02:00
Guillaume B
4334fa1758 Addition of ElectraForMaskedLM 2020-05-01 08:47:52 +02:00
Guillaume B
5bec2548c1 Addition of Electra generator and discriminator heads 2020-04-29 18:59:37 +02:00
Guillaume B
45eeb7ae5b ElectraModel implementation, weights loaded, forward pass 2020-04-29 18:30:36 +02:00