Mirror of https://github.com/browsermt/bergamot-translator.git (synced 2024-10-26 05:43:59 +03:00)
Cross-platform C++ library focusing on optimized machine translation on consumer-grade devices.
Bergamot Translator
Bergamot Translator provides a unified API for neural machine translation built on the Marian NMT framework, in line with the Bergamot project's focus on improving client-side machine translation in a web browser.
Build Instructions
$ git clone https://github.com/browsermt/bergamot-translator
$ cd bergamot-translator
$ mkdir build
$ cd build
$ cmake ../
$ make -j
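With GNU make, `make -j` without a number places no limit on the number of parallel jobs, which can exhaust memory on smaller machines. The following is a minimal sketch of one way to pick a bounded job count. The helper name `detect_jobs` and the fallback value of 2 are assumptions made for illustration and are not part of this repository.

```shell
# Hypothetical helper (not part of bergamot-translator): print a job count
# to pass to make.
detect_jobs() {
  # nproc is available with GNU coreutils; sysctl hw.ncpu covers macOS/BSD;
  # fall back to 2 if neither command works.
  nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 2
}

# Usage, from the build directory, in place of the last step above:
#   make -j"$(detect_jobs)"
```

Capping the job count at the number of cores keeps the build parallel while avoiding an unbounded number of compiler processes.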
Usage
Bergamot Translator
The build generates a library that can be linked into any project. All public header files are in the src folder.
service-cli
The build also generates an executable, service-cli, in the app folder. It provides a command-line interface to the underlying translator. The models required to run it are available at data.statmt.org/bergamot/models/.
The following example uses an English-to-German tiny11 student model:
MODEL_DIR=... # Path to the directory containing the model files.
ARGS=(
    -m "$MODEL_DIR/model.intgemm.alphas.bin" # Path to the model file.
    --vocabs
        "$MODEL_DIR/vocab.deen.spm" # Source vocabulary.
        "$MODEL_DIR/vocab.deen.spm" # Target vocabulary.
    # The following options increase speed through one-best decoding, a shortlist and quantization.
    --beam-size 1 --skip-cost --shortlist "$MODEL_DIR/lex.s2t.gz" 50 50 --int8shiftAlphaAll
    # Number of CPU threads (workers to launch). Parallelizes over cores and improves speed.
    --cpu-threads 4
    # Hyperparameters: the number of tokens to account for in a batch, and the maximum number of tokens in a sentence.
    --max-input-sentence-tokens 1024 --max-input-tokens 1024
    # Three modes are supported:
    #   - sentence: one sentence per line.
    #   - paragraph: one paragraph per line.
    #   - wrapped text: paragraphs are separated by an empty line.
    --ssplit-mode paragraph
)
./app/service-cli "${ARGS[@]}" < path-to-input-file
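The example above reads three files from the model directory: model.intgemm.alphas.bin, vocab.deen.spm and lex.s2t.gz. As a minimal sketch, a pre-flight check along the following lines could confirm that they exist before launching the translator. The function name `check_model_dir` is hypothetical and not part of this repository, and the file list applies only to the model used in this example.

```shell
# Hypothetical helper (not part of bergamot-translator): verify that the
# model directory holds the three files the example passes to service-cli.
check_model_dir() {
  dir="$1"
  missing=0
  for f in model.intgemm.alphas.bin vocab.deen.spm lex.s2t.gz; do
    if [ ! -f "$dir/$f" ]; then
      # Report each missing file on stderr and remember the failure.
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  # Exit status 0 if every file was found, 1 otherwise.
  return "$missing"
}

# Usage:
#   check_model_dir "$MODEL_DIR" && ./app/service-cli "${ARGS[@]}" < path-to-input-file
```

Checking up front gives a clear message per missing file instead of relying on whatever error the translator reports when a path is wrong.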