rust-bert/examples
Matt Weber ba57704c6f
Introduce in-memory resource abstraction (#375)
* Introduce in-memory resource abstraction

This follows from discussion in #366.

The goal of this change is to allow for weights to be loaded from a copy
of `rust_model.ot` that is already present in memory. There are two ways
in which that data might be present:

1. As a `HashMap<String, Tensor>` from previous interaction with `tch`
2. As a contiguous buffer of the file data

Either mechanism might be preferable depending on how user code uses the
model data. In some sense, a provider based on the second option is
mostly a convenience that spares the user a direct
`tch::nn::VarStore::load_from_stream` interaction.
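
As a rough sketch of the second option (std-only, no `tch` types): a contiguous in-memory copy of the file contents can be wrapped in a reader, which is the shape a stream-based loader such as `load_from_stream` consumes.

```rust
use std::io::{Cursor, Read};

// Sketch: `Cursor` turns an owned byte buffer into a `Read`er, so the
// in-memory file copy can be fed to any stream-based loader without
// touching the filesystem.
fn read_all(buffer: Vec<u8>) -> Vec<u8> {
    let mut reader = Cursor::new(buffer);
    let mut out = Vec::new();
    reader
        .read_to_end(&mut out)
        .expect("reading from an in-memory cursor cannot fail");
    out
}
```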

I've changed the definition of the `ResourceProvider` trait to require
that it be both `Send` and `Sync`. There are currently certain contexts
where `dyn ResourceProvider + Send` is required, but in theory before
this change an implementation might not be `Send` (or `Sync`). The
existing providers are both `Send` and `Sync`, and it seems reasonable
(if technically incorrect) for user code to assume this to be true. I
don't see a downside to making this explicit, but that part of this
change might be better suited for separate discussion. I am not trying
to sneak it in.
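
To illustrate the effect of the supertrait bounds (with hypothetical names, not the actual rust-bert definitions): once `Send + Sync` are supertraits, every implementor is guaranteed shareable across threads, so a plain trait object needs no extra `+ Send` at the use site.

```rust
use std::sync::Arc;
use std::thread;

// Sketch: `Send + Sync` as supertraits means `Arc<dyn Provider>` can be
// moved into a worker thread without any additional bounds.
trait Provider: Send + Sync {
    fn name(&self) -> &'static str;
}

struct LocalProvider;

impl Provider for LocalProvider {
    fn name(&self) -> &'static str {
        "local"
    }
}

// Compiles precisely because `dyn Provider` is `Send + Sync` by definition.
fn name_on_worker(provider: Arc<dyn Provider>) -> &'static str {
    thread::spawn(move || provider.name()).join().unwrap()
}
```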

The `enum Resource` data type abstracts over the possible ways a
`ResourceProvider` might represent an underlying resource. Without it,
implementing `load_weights` similarly to how it works now would require
either calling different trait methods until one succeeded, or adding an
`as_any` method and downcasting. Both options seemed less preferable to
creating a wrapper.
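
A minimal sketch of the idea (illustrative types, not the exact rust-bert definitions): a single enum lets the loading code match on whichever form the provider returns, instead of probing several trait methods or downcasting through an `as_any` escape hatch.

```rust
use std::path::PathBuf;

// Sketch: one enum covers both ways a provider can hand back a resource.
enum Resource<'a> {
    PathResource(PathBuf),
    Buffer(&'a [u8]),
}

// Loading code branches once on the representation it was given.
fn load_strategy(resource: &Resource) -> &'static str {
    match resource {
        Resource::PathResource(_) => "load from disk",
        Resource::Buffer(_) => "load from memory",
    }
}
```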

While it would be possible to replace all calls to `get_local_path` with
the `get_resource` API, removal of the existing function would be a very
big breaking change. As such, this change also introduces
`RustBertError::UnsupportedError` to allow for the different methods to
coexist. An alternative would be for the new `ResourceProvider`s to
write their resources to a temporary disk location and return an
appropriate path, but that is counter to the purpose of the new
`ResourceProvider`s and so I chose not to implement that.
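
The coexistence looks roughly like this (a sketch with stand-in types; the error enum below is a placeholder for `RustBertError::UnsupportedError`): a purely in-memory provider keeps the legacy path-based method compiling by returning an error rather than a path.

```rust
use std::path::PathBuf;

// Stand-in for `RustBertError::UnsupportedError`.
#[derive(Debug, PartialEq)]
enum Error {
    Unsupported,
}

struct InMemoryProvider {
    data: Vec<u8>,
}

impl InMemoryProvider {
    // Legacy API: there is no on-disk location to hand back, so the
    // provider reports the operation as unsupported.
    fn get_local_path(&self) -> Result<PathBuf, Error> {
        Err(Error::Unsupported)
    }

    // New API: callers read the weights directly from memory.
    fn get_bytes(&self) -> &[u8] {
        &self.data
    }
}
```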

* - Add `impl<T: ResourceProvider + ?Sized> ResourceProvider for Box<T>`
- Remove `Resource::NamedTensors`
- Change `BufferResource` to contain a `&[u8]` rather than `Vec<u8>`
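
The `Box` forwarding impl in the first bullet can be sketched as follows (simplified trait, illustrative names): a blanket impl lets APIs written against the trait also accept boxed trait objects without re-wrapping each concrete provider.

```rust
trait Provider {
    fn id(&self) -> u32;
}

// Blanket impl: any boxed provider (including `Box<dyn Provider>`,
// thanks to `?Sized`) forwards to the inner value.
impl<T: Provider + ?Sized> Provider for Box<T> {
    fn id(&self) -> u32 {
        (**self).id()
    }
}

struct LocalFiles;

impl Provider for LocalFiles {
    fn id(&self) -> u32 {
        7
    }
}

fn provider_id(provider: &dyn Provider) -> u32 {
    provider.id()
}
```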

* Further rework proposal for resources

* Use mutable references and locks

* Make model resources mutable in tests/examples

* Remove unnecessary mutability and TensorResource references

* Add `BufferResource` example

---------

Co-authored-by: Guillaume Becquin <guillaume.becquin@gmail.com>
2023-05-26 18:23:28 +01:00
async-sentiment.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
buffer_resource.rs Introduce in-memory resource abstraction (#375) 2023-05-26 18:23:28 +01:00
codebert.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
conversation.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
generation_gpt2.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
generation_gpt_neo.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
generation_gptj.rs Generation traits simplification (#339) 2023-03-17 16:21:37 +00:00
generation_reformer.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
generation_xlnet.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
keyword_extraction.rs Keyword/Keyphrase extraction (#295) 2022-11-13 08:51:10 +00:00
masked_language.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
named_entities_recognition.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
natural_language_inference_deberta.rs Introduce in-memory resource abstraction (#375) 2023-05-26 18:23:28 +01:00
part_of_speech_tagging.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
question_answering_bert.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
question_answering_longformer.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
question_answering_squad.rs Updated documentation, cleaned examples, added integration tests 2021-06-06 13:01:33 +02:00
question_answering.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
sentence_embeddings_local.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
sentence_embeddings.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
sentiment_analysis_fnet.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
sentiment_analysis_sst2.rs Fixed Clippy warnings (#204) 2021-12-09 09:33:27 +01:00
sentiment_analysis.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
sequence_classification_multilabel.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
sequence_classification.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
summarization_bart.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
summarization_pegasus.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
summarization_prophetnet.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
summarization_t5.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
token_classification.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
translation_builder.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
translation_m2m100.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
translation_marian.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
translation_mbart.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
translation_t5.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00
zero_shot_classification.rs Tokenizer special token map update (#330) 2023-01-30 17:53:18 +00:00