# Bergamot Translator
Bergamot Translator provides a unified API for neural machine translation built on the [Marian NMT](https://marian-nmt.github.io/) framework, in line with the goals of the [Bergamot](https://browser.mt/) project, which focuses on improving client-side machine translation in a web browser.
## Build Instructions
```
$ git clone https://github.com/browsermt/bergamot-translator
$ cd bergamot-translator
$ mkdir build
$ cd build
$ cmake ../
$ make -j
```
## Usage
### Bergamot Translator
The build generates a library that can be linked into other projects. All public header files are in the `src` folder.
### `service-cli`
An executable, `service-cli`, is generated by the build in the `app` folder and
provides a command-line interface to the underlying translator. The models
required to run it are available at
[data.statmt.org/bergamot/models/](http://data.statmt.org/bergamot/models/).
The following example uses an English-to-German tiny11 student model, available
at:
* [data.statmt.org/bergamot/models/deen/ende.student.tiny11.tar.gz](http://data.statmt.org/bergamot/models/deen/ende.student.tiny11.tar.gz)
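
Fetching and unpacking the archive might look like the following minimal sketch. It assumes `curl` and `tar` are installed; `DOWNLOAD_DIR` and `fetch_model` are hypothetical names chosen for illustration, and the layout of the unpacked archive is not assumed, so the function only lists what was extracted.

```bash
# Archive URL copied from the link above.
MODEL_URL="http://data.statmt.org/bergamot/models/deen/ende.student.tiny11.tar.gz"

# Hypothetical download location; any writable directory works.
DOWNLOAD_DIR="${DOWNLOAD_DIR:-$PWD/models}"

# Local archive path: the download directory plus the file name from the URL.
ARCHIVE="$DOWNLOAD_DIR/${MODEL_URL##*/}"

# fetch_model is an illustrative helper, not part of bergamot-translator.
fetch_model() {
    mkdir -p "$DOWNLOAD_DIR" || return 1
    # -f: fail on HTTP errors, -L: follow redirects, -o: write to a file.
    curl -fL -o "$ARCHIVE" "$MODEL_URL" || return 1
    # Unpack next to the archive.
    tar -xzf "$ARCHIVE" -C "$DOWNLOAD_DIR" || return 1
    # Show what was extracted so MODEL_DIR can be set accordingly.
    ls "$DOWNLOAD_DIR"
}
```

Run `fetch_model`, then point `MODEL_DIR` at the extracted directory that contains `model.intgemm.alphas.bin`. With the model in place, the translator can be invoked as follows: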
```bash
MODEL_DIR=... # Path to the directory containing the model files.
ARGS=(
    -m "$MODEL_DIR/model.intgemm.alphas.bin" # Path to the model file.
    --vocabs
        "$MODEL_DIR/vocab.deen.spm" # Source vocabulary.
        "$MODEL_DIR/vocab.deen.spm" # Target vocabulary.

    # The following options increase speed through one-best decoding, a shortlist, and quantization.
    --beam-size 1 --skip-cost --shortlist "$MODEL_DIR/lex.s2t.gz" 50 50 --int8shiftAlphaAll

    # Number of CPU threads (workers to launch). Parallelizes over cores and improves speed.
    --cpu-threads 4

    # Maximum number of tokens in a sentence, and maximum number of tokens in a batch.
    --max-input-sentence-tokens 1024 --max-input-tokens 1024

    # Three modes are supported:
    #   - sentence: One sentence per line.
    #   - paragraph: One paragraph per line.
    #   - wrapped text: Paragraphs are separated by an empty line.
    --ssplit-mode paragraph
)
./app/service-cli "${ARGS[@]}" < path-to-input-file
```
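Before launching the translator, it can help to confirm that the model directory contains the files the arguments above refer to, and to prepare an input file that matches the chosen `--ssplit-mode`. The sketch below is one way to do that. `check_model_dir` and the `input.en.txt` file name are hypothetical and not part of bergamot-translator; the required file names are taken from the `ARGS` array above.

```bash
# Files referenced by the ARGS array above.
REQUIRED_FILES="model.intgemm.alphas.bin vocab.deen.spm lex.s2t.gz"

# check_model_dir DIR: succeed only if every required file exists in DIR.
# Illustrative helper, not part of bergamot-translator.
check_model_dir() {
    for f in $REQUIRED_FILES; do
        if [ ! -f "$1/$f" ]; then
            echo "missing: $1/$f" >&2
            return 1
        fi
    done
}

# Hypothetical input file for --ssplit-mode paragraph: one paragraph per line.
INPUT_FILE="${INPUT_FILE:-input.en.txt}"
printf '%s\n' \
    'This is the first paragraph. It has two sentences.' \
    'This is the second paragraph.' > "$INPUT_FILE"
```

The two pieces can then be combined with the command above, for example `check_model_dir "$MODEL_DIR" && ./app/service-cli "${ARGS[@]}" < "$INPUT_FILE"`.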