
rust-bert

Rust-native BERT implementation, ported from Huggingface's Transformers library. It uses the tch-rs crate for the Libtorch bindings and rust-tokenizers for pre-processing, and supports multithreaded tokenization and GPU inference.

The following model architectures are currently implemented: DistilBERT, BERT and RoBERTa. The available task-specific heads cover:

• Masked LM
• Sequence classification
• Token classification
• Question answering
• Multiple choices

An example of sentiment classification is provided:

    let device = Device::cuda_if_available();
    let sentiment_classifier = SentimentClassifier::new(vocab_path,
                                                        config_path,
                                                        weights_path, device)?;
                                                        
    let input = [
        "Probably my all-time favorite movie, a story of selflessness, sacrifice and dedication to a noble cause, but it's not preachy or boring.",
        "This film tried to be too many things all at once: stinging political satire, Hollywood blockbuster, sappy romantic comedy, family values promo...",
        "If you like original gut wrenching laughter you will like this movie. If you are young or old then you will love this movie, hell even my mom liked it.",
    ];

    let output = sentiment_classifier.predict(input.to_vec());

(Example inputs courtesy of IMDb: http://www.imdb.com)

Output:

[
    Sentiment { polarity: Positive, score: 0.9981985493795946 },
    Sentiment { polarity: Negative, score: 0.9927982091903687 },
    Sentiment { polarity: Positive, score: 0.9997248985164333 }
]
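
The snippet above leaves out imports and the location of the model files. For reference, a minimal self-contained program could look like the sketch below; the use path of SentimentClassifier, the file names under ~/rustbert, and passing owned PathBuf values to the constructor are assumptions made for illustration, so adjust them to your local setup (see the Setup section below for where the files come from).

    // Minimal end-to-end sketch of the sentiment example above.
    // NOTE: the module path of SentimentClassifier and the file names under
    // ~/rustbert are assumptions -- adjust them to match your local setup.
    use std::path::PathBuf;
    use tch::Device;
    use rust_bert::pipelines::sentiment::SentimentClassifier; // assumed module path

    fn main() {
        // Files produced by the setup steps described below (names are placeholders)
        let home = PathBuf::from(std::env::var("HOME").expect("HOME not set"));
        let vocab_path = home.join("rustbert/vocab.txt");
        let config_path = home.join("rustbert/config.json");
        let weights_path = home.join("rustbert/model.ot");

        // Use the GPU when available, otherwise fall back to the CPU
        let device = Device::cuda_if_available();
        let sentiment_classifier =
            SentimentClassifier::new(vocab_path, config_path, weights_path, device)
                .expect("could not load the sentiment model");

        let input = [
            "Probably my all-time favorite movie, a story of selflessness, sacrifice and dedication to a noble cause, but it's not preachy or boring.",
        ];

        // predict returns one Sentiment (polarity + score) per input sentence
        let output = sentiment_classifier.predict(input.to_vec());
        println!("{:?}", output);
    }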

Setup

The model configuration and vocabulary are downloaded directly from Huggingface's repository.

The model weights need to be converted to a binary format that can be read by Libtorch (the original .pth files are pickled and cannot be used directly). A Python script that downloads the required files and runs the necessary conversion steps is provided; a small sanity check for the converted weights is sketched after the setup steps below.

  1. Compile the package: cargo build --release
  2. Download the model files & perform the necessary conversions
    • Set up a virtual environment and install the dependencies listed in requirements.txt
    • Run the conversion script: python ./utils/download-dependencies.py. The dependencies will be downloaded to the user's home directory, under ~/rustbert
  3. Run the example: cargo run --release
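
If inference fails at load time, it can help to first check that step 2 actually produced a weights file Libtorch can read. The sketch below assumes only that the converted weights sit under ~/rustbert (the file name is a placeholder) and uses Tensor::load_multi from tch-rs, the same crate the library builds on, to read the named tensors back.

    // Optional sanity check after step 2: make sure the converted weights can be
    // read back by Libtorch. The file name under ~/rustbert is a placeholder --
    // use whatever name the conversion step actually produced.
    use std::path::PathBuf;
    use tch::Tensor;

    fn main() {
        let weights = PathBuf::from(std::env::var("HOME").expect("HOME not set"))
            .join("rustbert/model.ot"); // placeholder file name

        let named_tensors = Tensor::load_multi(&weights)
            .expect("weights could not be read by Libtorch");
        println!("Read {} named tensors from {:?}", named_tensors.len(), weights);
    }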