Cross-platform C++ library focused on optimized machine translation on consumer-grade devices.

Bergamot Translator

Bergamot Translator provides a unified API for neural machine translation (based on the Marian NMT framework), in line with the Bergamot project, which focuses on improving client-side machine translation in a web browser.

Build Instructions

$ git clone --recursive https://github.com/browsermt/bergamot-translator  # --recursive fetches the 3rd_party submodules
$ cd bergamot-translator
$ mkdir build
$ cd build
$ cmake ../
$ make -j

Usage

Bergamot Translator

The build generates a library that can be linked into any project. All public header files are located in the src folder.
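Since the project builds with CMake, one way to consume it is to vendor a checkout and link against its target. The sketch below is a hypothetical consumer CMakeLists.txt; the target name bergamot-translator is an assumption — verify it against this project's CMakeLists.txt before use.

```cmake
# Hypothetical consumer CMakeLists.txt (a sketch, not the project's official
# integration recipe). The linked target name `bergamot-translator` is an
# assumption; check this repository's CMakeLists.txt for the actual name.
cmake_minimum_required(VERSION 3.12)
project(my-translator-app CXX)

# Path to a (recursive) checkout of this repository.
add_subdirectory(bergamot-translator)

add_executable(my-app main.cpp)
target_link_libraries(my-app PRIVATE bergamot-translator)
```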

service-cli

An executable, service-cli, is generated by the build in the app folder and provides a command-line interface to the underlying translator. The models required to run it are available at data.statmt.org/bergamot/models/. The following example uses an English to German tiny11 student model from that location:

MODEL_DIR=... # path to where the model-files are.
ARGS=(
    -m $MODEL_DIR/model.intgemm.alphas.bin # Path to model file.
    --vocabs 
        $MODEL_DIR/vocab.deen.spm # source-vocabulary
        $MODEL_DIR/vocab.deen.spm # target-vocabulary

    # The following increases speed through one-best-decoding, shortlist and quantization.
    --beam-size 1 --skip-cost --shortlist $MODEL_DIR/lex.s2t.gz 50 50 --int8shiftAlphaAll 

    # Number of CPU threads (workers to launch). Parallelizes over cores and improves speed.
    --cpu-threads 4

    # Caps on the number of tokens per input sentence and per batch.
    --max-input-sentence-tokens 1024 --max-input-tokens 1024 

    # Three sentence-splitting modes are supported:
    #   - sentence: one sentence per line.
    #   - paragraph: one paragraph per line.
    #   - wrapped text: paragraphs are separated by an empty line.

    --ssplit-mode paragraph 

)

./app/service-cli "${ARGS[@]}" < path-to-input-file
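The ARGS pattern above relies on two bash features: comments are allowed between array elements, and "${ARGS[@]}" expands each element as a separate word. A minimal, self-contained illustration (the flag values here are placeholders, not real model files):

```shell
#!/usr/bin/env bash
# Comments may appear between array elements; they are not stored in the array.
ARGS=(
    -m model.bin      # placeholder model path
    --cpu-threads 4   # placeholder worker count
)

# "${ARGS[@]}" expands to one word per element, preserving the grouping.
echo "${ARGS[@]}"
# → -m model.bin --cpu-threads 4
```

Quoting the expansion as "${ARGS[@]}" matters: unquoted, elements containing spaces would be split further; with "${ARGS[*]}", all elements would collapse into a single word.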