guillaume-be
5d2b107e99
Keyword/Keyphrase extraction ( #295 )
...
* stop word tokenizer implementation
* Addition of all-mini-lm-l6-v2
* initial implementation of keyword scorer
* Cosine Similarity keyword extraction
* Added lower case parsing from tokenizer config for sentence embeddings
* Initial draft of pipeline complete
* Addition of Maximal Marginal relevance scorer
* Addition of Max Sum scorer
* Lowercase and ngrams handling
* Improved n-gram handling
* Skip n-grams containing stopwords
* Fixed short sentence input and added documentation
* Updated documentation and defaults, added example
* Addition of tests for keywords extractions
* Updated changelog
* Fixed Clippy warnings
2022-11-13 08:51:10 +00:00
guillaume-be
c6771d3992
Update to tch=0.9.0 ( #293 )
...
* Fixed short sentence input and added documentation
* Fixed Clippy warnings
* Updated CI Python version
* cleaner dim specification
2022-11-07 17:45:52 +00:00
guillaume-be
340be36ed9
Mixed resources ( #291 )
...
* made `merges` resource optional for all pipelines
* allow mixing local and remote resources for pipelines
* Updated changelog
* Fixed Clippy warnings
2022-10-30 07:39:52 +00:00
guillaume-be
cce1e2707d
Prepare for 0.19 release ( #272 )
2022-07-25 06:36:02 +01:00
guillaume-be
a1595e6dfd
Updated sentence embeddings example ( #263 )
...
* Added conversion information for Distil-based sentence embedding models
* Fix Clippy warnings
2022-07-03 08:48:31 +01:00
Romain Leroux
4d8a298586
Add sbert implementation for inference ( #250 )
...
* Add sbert implementation for inference
* Fix clippy warnings
* Refactor sentence embeddings into a dedicated pipeline
* Add output_attentions and output_hidden_states to T5Config
* Improve sentence_embeddings implementation
* Dedicated tokenizer config for strip_accents and add_prefix_space
* Rename forward to encode_as_tensor
* Remove _conf from Dense layer
* Add sentence embeddings docs
* Addition of remote resources and tests update
* Merge feature branch and fix doctests
* Add SentenceEmbeddingsBuilder<Remote> and improve remote resources
* Use tch::no_grad in sentence embeddings
* Updated changelog, registration of sentence embeddings integration tests
Co-authored-by: Guillaume Becquin <guillaume.becquin@gmail.com>
2022-06-21 20:24:09 +01:00
Jonas Hedman Engström
9b22c2482a
Refactor: Feature gate remote resource ( #223 )
...
* get_local_path as trait LocalPathProvider
* Remove config default impls
* Feature gate RemoteResource
* translation_builder refactoring to have remote fetching grouped
* Include dirs crate in remote feature gate
* Examples fixes
* Benches fixes
* Tests fix
* Remove Box from constructor parameters
* Fix examples no-Box
* Fix benches no-Box
* Fix tests no-Box
* Fix doc comment code
* Fix documentation `Resource` -> `ResourceProvider`
* moved remote local at same level
* moved ResourceProvider to resources mod
Co-authored-by: Guillaume Becquin <guillaume.becquin@gmail.com>
2022-02-25 21:24:03 +00:00
Flix
23c5d9112a
add async example, documentation and fix clippy ( #217 )
2022-01-30 11:51:58 +00:00
guillaume-be
e71712816e
Addition of DeBERTa MNLI example
2021-12-12 20:14:36 +01:00
guillaume-be
4175942cc4
Fixed Clippy warnings ( #204 )
2021-12-09 09:33:27 +01:00
Guillaume Becquin
d84b2819d9
Merge remote-tracking branch 'origin/master' into entity_consolidation
...
# Conflicts:
# src/pipelines/ner.rs
2021-11-20 11:03:05 +01:00
Guillaume Becquin
61e5d2d563
Addition of FNet model resource for sentiment analysis and registration in pipelines
2021-11-13 09:39:57 +01:00
Guillaume Becquin
73f017d0f7
Merge remote-tracking branch 'origin/master' into kind_reword
...
# Conflicts:
# Cargo.toml
# src/t5/layer_norm.rs
2021-11-09 16:00:21 +01:00
sftse
e297f395af
Make generics less generic. ( #189 )
...
* Make generics less generic. Fix examples, tests and docs.
* Address outstanding issues
* Take less ownership where possible
* Fixup some clippy warnings
* Updated tokenizer crate version
Co-authored-by: Guillaume Becquin <guillaume.becquin@gmail.com>
2021-11-07 09:42:56 +01:00
Guillaume Becquin
de89e2d165
Updated XLNet for FP16 compatibility
2021-10-06 17:52:25 +02:00
Guillaume Becquin
889f509e6c
Updated T5 for FP16 compatibility
2021-10-05 18:23:34 +02:00
Guillaume Becquin
fc2b2972f9
Updated Albert for Half precision support
2021-09-30 16:04:42 +02:00
Guillaume Becquin
72fabcdbd1
Updated GPT-Neo, working half precision greedy generation
2021-09-26 11:20:05 +02:00
guillaume-be
cb6bc34eb4
Updated borrowing for XLNet, integration tests
2021-08-20 11:08:37 +02:00
guillaume-be
3ff5199376
Updated offsets fixing overlapping spans
2021-08-18 09:07:23 +02:00
guillaume-be
9cadc5d15f
Tested and fixed POS tagging on long inputs (requiring breaking input in multiple features)
2021-08-17 09:53:26 +02:00
Guillaume B
466c6b6922
Updated doctests
2021-07-28 18:10:20 +02:00
Guillaume B
ce90d8901d
Updated examples and integration tests
2021-07-11 11:13:00 +02:00
Guillaume B
89b3a327fa
Moved builder to own module, simplified Marian resource retrieval
2021-07-11 10:08:27 +02:00
Guillaume B
5dc7f33c39
Added MBart50 and M2M100 to supported translation models
2021-07-10 11:34:51 +02:00
Guillaume B
3b72b7cc9b
Added example for translation builder, language checks for MBart and M2M100
2021-07-10 10:32:31 +02:00
Guillaume B
450fe0d533
Merge branch 'm2m100_implementation' into translation_rework
2021-07-09 15:41:28 +02:00
Guillaume B
85c05cbe13
Updated Marian translation example
2021-07-07 15:54:28 +02:00
Guillaume B
1c375a817e
Use of new language enum in TranslationModel
2021-07-04 12:56:34 +02:00
Guillaume B
58eef0785f
Merged master changes
2021-06-28 18:57:27 +02:00
Guillaume B
0b2e339e87
Merge remote-tracking branch 'origin/master' into m2m100_implementation
...
# Conflicts:
# Cargo.toml
2021-06-28 18:53:46 +02:00
Guillaume B
2f6b26bb88
Addition of tests for M2M100
2021-06-27 18:18:41 +02:00
Guillaume B
f024350dee
Fixed various documentation typos
2021-06-26 11:07:17 +02:00
Guillaume B
9a04d1527a
Working example for M2M100 Translation
2021-06-26 10:49:42 +02:00
Guillaume B
f29e02ecbc
Addition of TextOutput and IndicesOutput, updated pipelines and tests
2021-06-16 18:15:22 +02:00
Guillaume B
c40a218b37
Initial implementation of scores output
2021-06-15 19:09:20 +02:00
Guillaume B
5907b7d954
Updated documentation, cleaned examples, added integration tests
2021-06-06 13:01:33 +02:00
Guillaume B
d401fea891
Updated tests and docstrings
2021-06-03 10:17:52 +02:00
Guillaume B
a9518c94fa
Addition of GPT-Neo 2.7B pretrained weights, added example, updated changelog
2021-05-06 16:57:12 +02:00
Guillaume B
71c196b0ce
Updated documentation, fixed Clippy warnings
2021-04-08 14:50:02 +02:00
Guillaume B
4c4ef41a80
Fixed Marian to be compatible with BART refactoring
2021-03-27 18:36:42 +01:00
Guillaume B
5e6b84a7a0
Updated BART embeddings for compatibility with Marian model
2021-03-27 18:03:45 +01:00
Guillaume B
32537da610
Fixed attention value calculation
2021-03-27 16:51:10 +01:00
Guillaume B
b6f722984b
updated attention reshape method
2021-03-27 16:05:06 +01:00
Guillaume B
c378b02bbe
BART refactoring (initial draft)
2021-03-27 15:55:25 +01:00
Guillaume B
d5321a8940
Updated README and documentation
2021-03-20 17:03:21 +01:00
Guillaume B
65da7afbb6
Update punctuation POS tags with low score
2021-03-15 16:41:00 +01:00
Guillaume B
02819c0a71
initial version of POS tagging pipeline
2021-03-12 09:30:21 +01:00
Guillaume B
6a6bd74533
Fixed Clippy warnings
2021-02-21 08:56:36 +01:00
Guillaume B
545d52ec9d
Longformer integration tests
2021-02-16 10:16:09 +01:00