Cross-platform C++ library focusing on optimized machine translation on consumer-grade devices.

Bergamot Translator

Bergamot Translator provides a unified API for neural machine translation (based on the Marian NMT framework), developed as part of the Bergamot project, which focuses on improving client-side machine translation in a web browser.

Build Instructions

$ git clone https://github.com/browsermt/bergamot-translator
$ cd bergamot-translator
$ mkdir build
$ cd build
$ cmake ../
$ make -j

Usage

Bergamot Translator

The build generates a library that can be linked into any project. All public header files are located in the src folder.
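Consuming the library from another CMake project could look like the following sketch. This is an assumption, not the project's documented integration path; in particular, the target name `bergamot-translator` is a guess — check this repository's CMakeLists.txt for the actual library target it defines.

```cmake
# Sketch: consuming the library from a downstream CMake project,
# assuming the repository is vendored (e.g. as a git submodule).
cmake_minimum_required(VERSION 3.5)
project(my-translator-app CXX)

# Build bergamot-translator as part of this project.
add_subdirectory(bergamot-translator)

add_executable(my-app main.cpp)
# Public headers live in bergamot-translator/src.
target_include_directories(my-app PRIVATE bergamot-translator/src)
# Target name is an assumption; use the target exported by the library's build.
target_link_libraries(my-app PRIVATE bergamot-translator)
```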

service-cli

An executable, service-cli, is generated by the build in the app folder and provides a command-line interface to the underlying translator. The models required to run it are available at data.statmt.org/bergamot/models/. The following example uses an English-to-German tiny11 student model:

MODEL_DIR=... # path to where the model-files are.
ARGS=(
    -m $MODEL_DIR/model.intgemm.alphas.bin # Path to model file.
    --vocabs 
        $MODEL_DIR/vocab.deen.spm # source-vocabulary
        $MODEL_DIR/vocab.deen.spm # target-vocabulary

    # The following options increase speed through one-best decoding, a lexical
    # shortlist and 8-bit quantization.
    --beam-size 1 --skip-cost --shortlist $MODEL_DIR/lex.s2t.gz 50 50 --int8shiftAlphaAll

    # Number of CPU threads (workers) to launch. Parallelizes over cores and improves speed.
    # A value of 0 runs single-threaded, without launching any worker threads.
    --cpu-threads 4

    # Maximum size of a sentence allowed. If a sentence is above this length,
    # it's broken into pieces of less than or equal to this size.
    --max-length-break 1024  

    # Maximum number of tokens that fit in a batch. The optimal value for this
    # parameter depends on the hardware and can be found by benchmarking
    # different values.
    --mini-batch-words 1024

    # Three sentence-splitting modes are supported:
    #   - sentence: one sentence per line.
    #   - paragraph: one paragraph per line.
    #   - wrapped_text: paragraphs are separated by an empty line.
    --ssplit-mode paragraph
)

./app/service-cli "${ARGS[@]}" < path-to-input-file
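How the input file should be framed depends on the --ssplit-mode chosen above. A small sketch for paragraph mode (the file name is a placeholder, and the final command assumes the ARGS array and build layout from the example):

```shell
# With --ssplit-mode paragraph, each input line is treated as one paragraph
# and is sentence-split internally. Two lines -> two paragraphs:
printf 'First sentence. Second sentence.\nA new paragraph.\n' > input.txt

# Then translate it (assumes the build and ARGS from the example above):
# ./app/service-cli "${ARGS[@]}" < input.txt
```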