rust-bert/examples
Matt Weber ba57704c6f
Introduce in-memory resource abstraction (#375)
* Introduce in-memory resource abstraction

This follows from discussion in #366.

The goal of this change is to allow for weights to be loaded from a copy
of `rust_model.ot` that is already present in memory. There are two ways
in which that data might be present:

1. As a `HashMap<String, Tensor>` from previous interaction with `tch`
2. As a contiguous buffer of the file data

Either mechanism might be preferable depending on how user code uses the
model data. In some sense, a provider based on the second option is
mostly a convenience that spares the user a direct
`tch::nn::VarStore::load_from_stream` interaction.
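
As a rough sketch of the second option (std-only, no `tch` types): a contiguous in-memory copy of the file contents can be wrapped in a reader, which is the shape a stream-based loader such as `load_from_stream` consumes.

```rust
use std::io::{Cursor, Read};

// Sketch: `Cursor` turns an owned byte buffer into a `Read`er, so the
// in-memory file copy can be fed to any stream-based loader without
// touching the filesystem.
fn read_all(buffer: Vec<u8>) -> Vec<u8> {
    let mut reader = Cursor::new(buffer);
    let mut out = Vec::new();
    reader
        .read_to_end(&mut out)
        .expect("reading from an in-memory cursor cannot fail");
    out
}
```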

I've changed the definition of the `ResourceProvider` trait to require
that it be both `Send` and `Sync`. There are currently certain contexts
where `dyn ResourceProvider + Send` is required, but in theory before
this change an implementation might not be `Send` (or `Sync`). The
existing providers are both `Send` and `Sync`, and it seems reasonable
(if technically incorrect) for user code to assume this to be true. I
don't see a downside to making this explicit, but that part of this
change might be better suited for separate discussion. I am not trying
to sneak it in.
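
To illustrate the effect of the supertrait bounds (with hypothetical names, not the actual rust-bert definitions): once `Send + Sync` are supertraits, every implementor is guaranteed shareable across threads, so a plain trait object needs no extra `+ Send` at the use site.

```rust
use std::sync::Arc;
use std::thread;

// Sketch: `Send + Sync` as supertraits means `Arc<dyn Provider>` can be
// moved into a worker thread without any additional bounds.
trait Provider: Send + Sync {
    fn name(&self) -> &'static str;
}

struct LocalProvider;

impl Provider for LocalProvider {
    fn name(&self) -> &'static str {
        "local"
    }
}

// Compiles precisely because `dyn Provider` is `Send + Sync` by definition.
fn name_on_worker(provider: Arc<dyn Provider>) -> &'static str {
    thread::spawn(move || provider.name()).join().unwrap()
}
```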

The `enum Resource` data type abstracts over the possible ways a
`ResourceProvider` might represent an underlying resource. Without it,
implementing `load_weights` similarly to how it works now would require
either calling different trait methods until one succeeded, or adding an
`as_any` method and downcasting. Both options seemed less preferable to
creating a wrapper.
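
A minimal sketch of the idea (illustrative types, not the exact rust-bert definitions): a single enum lets the loading code match on whichever form the provider returns, instead of probing several trait methods or downcasting through an `as_any` escape hatch.

```rust
use std::path::PathBuf;

// Sketch: one enum covers both ways a provider can hand back a resource.
enum Resource<'a> {
    PathResource(PathBuf),
    Buffer(&'a [u8]),
}

// Loading code branches once on the representation it was given.
fn load_strategy(resource: &Resource) -> &'static str {
    match resource {
        Resource::PathResource(_) => "load from disk",
        Resource::Buffer(_) => "load from memory",
    }
}
```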

While it would be possible to replace all calls to `get_local_path` with
the `get_resource` API, removal of the existing function would be a very
big breaking change. As such, this change also introduces
`RustBertError::UnsupportedError` to allow for the different methods to
coexist. An alternative would be for the new `ResourceProvider`s to
write their resources to a temporary disk location and return an
appropriate path, but that is counter to the purpose of the new
`ResourceProvider`s and so I chose not to implement that.
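
The coexistence looks roughly like this (a sketch with stand-in types; the error enum below is a placeholder for `RustBertError::UnsupportedError`): a purely in-memory provider keeps the legacy path-based method compiling by returning an error rather than a path.

```rust
use std::path::PathBuf;

// Stand-in for `RustBertError::UnsupportedError`.
#[derive(Debug, PartialEq)]
enum Error {
    Unsupported,
}

struct InMemoryProvider {
    data: Vec<u8>,
}

impl InMemoryProvider {
    // Legacy API: there is no on-disk location to hand back, so the
    // provider reports the operation as unsupported.
    fn get_local_path(&self) -> Result<PathBuf, Error> {
        Err(Error::Unsupported)
    }

    // New API: callers read the weights directly from memory.
    fn get_bytes(&self) -> &[u8] {
        &self.data
    }
}
```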

* - Add `impl<T: ResourceProvider + ?Sized> ResourceProvider for Box<T>`
- Remove `Resource::NamedTensors`
- Change `BufferResource` to contain a `&[u8]` rather than `Vec<u8>`
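
The `Box` forwarding impl in the first bullet can be sketched as follows (simplified trait, illustrative names): a blanket impl lets APIs written against the trait also accept boxed trait objects without re-wrapping each concrete provider.

```rust
trait Provider {
    fn id(&self) -> u32;
}

// Blanket impl: any boxed provider (including `Box<dyn Provider>`,
// thanks to `?Sized`) forwards to the inner value.
impl<T: Provider + ?Sized> Provider for Box<T> {
    fn id(&self) -> u32 {
        (**self).id()
    }
}

struct LocalFiles;

impl Provider for LocalFiles {
    fn id(&self) -> u32 {
        7
    }
}

fn provider_id(provider: &dyn Provider) -> u32 {
    provider.id()
}
```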

* Further rework proposal for resources

* Use mutable references and locks

* Make model resources mutable in tests/examples

* Remove unnecessary mutability and TensorResource references

* Add `BufferResource` example

---------

Co-authored-by: Guillaume Becquin <guillaume.becquin@gmail.com>
2023-05-26 18:23:28 +01:00
async-sentiment.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
buffer_resource.rs Introduce in-memory resource abstraction (#375) 2023-05-26 18:23:28 +01:00
codebert.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
conversation.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
generation_gpt2.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
generation_gpt_neo.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
generation_gptj.rs Generation traits simplification (#339) 2023-03-17 16:21:37 +00:00
generation_reformer.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
generation_xlnet.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
keyword_extraction.rs Keyword/Keyphrase extraction (#295) 2022-11-13 08:51:10 +00:00
masked_language.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
named_entities_recognition.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
natural_language_inference_deberta.rs Introduce in-memory resource abstraction (#375) 2023-05-26 18:23:28 +01:00
part_of_speech_tagging.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
question_answering_bert.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
question_answering_longformer.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
question_answering_squad.rs Updated documentation, cleaned examples, added integration tests 2021-06-06 13:01:33 +02:00
question_answering.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
sentence_embeddings_local.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
sentence_embeddings.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
sentiment_analysis_fnet.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
sentiment_analysis_sst2.rs Fixed Clippy warnings (#204) 2021-12-09 09:33:27 +01:00
sentiment_analysis.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
sequence_classification_multilabel.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
sequence_classification.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
summarization_bart.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
summarization_pegasus.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
summarization_prophetnet.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
summarization_t5.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
token_classification.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
translation_builder.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
translation_m2m100.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
translation_marian.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
translation_mbart.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
translation_t5.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
zero_shot_classification.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00