Cross-platform C++ library focusing on optimized machine translation on consumer-grade devices.
Jerin Philip b86c76b004
Faithful to source-structure translation (#115)
* First draft of faithful translation

* Comments explaining pre and post

* Comments on response_builder

* Updating bergamot-translator-tests with new outputs

* Cosmetic changes in response target text construction

* Replacing &(x[0]) -> x.data() to avoid illegal indices

* Removing nullptr given both branches init pointer with legal values

* pre, post -> gap(i) addressing review comments

Functions which were pre and post before are subsumed by gap(i), and the
algorithm in ResponseBuilder adjusted to fix.

`x = nullptr` is back, should be harmless.

* Updating brt with paragraph outputs

* Bumping brt with updated outputs, buffer text at begin as well

* Bumping BRT with sync after bytearray collapse merge

* Pointing BRT to main after merge

Co-authored-by: Nikolay Bogoychev <nheart@gmail.com>
2021-05-06 16:19:27 +01:00
.github/workflows Making bytearray a commandline switch (#127) 2021-05-06 00:26:03 +01:00
3rd_party Full windows support with ssplit from browsermt, not a fork (#109) 2021-05-01 00:29:23 +01:00
app Making bytearray a commandline switch (#127) 2021-05-06 00:26:03 +01:00
bergamot-translator-tests@9209aa51e7 Faithful to source-structure translation (#115) 2021-05-06 16:19:27 +01:00
doc Marian submodule update (#74) 2021-04-01 16:29:02 +01:00
src Faithful to source-structure translation (#115) 2021-05-06 16:19:27 +01:00
wasm Extension desired changes (#129) 2021-05-04 14:25:12 +02:00
.gitignore Merge remote-tracking branch 'origin/wasm-integration' into jp/absorb-batch-translator 2021-02-17 13:08:58 +00:00
.gitmodules Moving small tests to GitHub CI (#93) 2021-04-16 11:58:53 +01:00
BERGAMOT_VERSION Marian compatible documentation tooling (#67) 2021-03-24 17:00:53 +00:00
CMakeLists.txt Full windows support with ssplit from browsermt, not a fork (#109) 2021-05-01 00:29:23 +01:00
Doxyfile.in Marian compatible documentation tooling (#67) 2021-03-24 17:00:53 +00:00
LICENSE Initial commit 2020-10-19 13:49:38 +02:00
README.md Improved wasm scripts and README (#128) 2021-05-04 10:18:45 +01:00

Bergamot Translator

Bergamot Translator provides a unified API for neural machine translation based on the Marian NMT framework, in line with the Bergamot project, which focuses on improving client-side machine translation in web browsers.

Build Instructions

Build Natively

  1. Clone the repository using these instructions:

    git clone https://github.com/browsermt/bergamot-translator
    cd bergamot-translator
    
  2. Compile

    Create a folder in which to build all the artifacts (build-native in this case) and compile in that folder:

    mkdir build-native
    cd build-native
    cmake ../
    make -j
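The native build steps above can be wrapped in a small bash function. This is a minimal sketch, not a script shipped with the repository: the function name build_native and its two arguments are hypothetical, and it assumes bash with git, cmake and make on PATH. One deliberate change from the commands above: the job count comes from getconf instead of the unbounded make -j, which can exhaust memory on small machines.

```bash
#!/usr/bin/env bash
# Sketch of the native build steps as a bash function.
# build_native and its arguments are hypothetical, not part of the repository.
# Assumes git, cmake and make are on PATH.

build_native() {
  local src_dir="${1:-bergamot-translator}"
  local build_dir="${2:-$src_dir/build-native}"

  # Step 1: clone only if the source directory does not exist yet.
  if [ ! -d "$src_dir" ]; then
    git clone https://github.com/browsermt/bergamot-translator "$src_dir" || return
  fi

  # cmake runs from inside the build folder, so give it an absolute source path.
  local abs_src
  abs_src="$(cd "$src_dir" && pwd)" || return

  # One make job per online CPU instead of the unbounded `make -j`.
  local jobs
  jobs="$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"

  # Step 2: configure and compile in a subshell so the caller's directory is unchanged.
  mkdir -p "$build_dir" || return
  (
    cd "$build_dir" &&
      cmake "$abs_src" &&
      make -j"$jobs"
  )
}
```

After sourcing the file, running build_native from the parent directory of the clone reproduces steps 1 and 2 with the default folder names.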
    

Build WASM

Compiling for the first time

  1. Download and install Emscripten using the following instructions:

    • Get the latest SDK: git clone https://github.com/emscripten-core/emsdk.git
    • Enter the cloned directory: cd emsdk
    • Install the latest SDK tools: ./emsdk install latest
    • Activate the latest SDK tools: ./emsdk activate latest
    • Activate path variables: source ./emsdk_env.sh
  2. Clone the repository using these instructions:

    git clone https://github.com/browsermt/bergamot-translator
    cd bergamot-translator
    
  3. Download files (only required if you want to perform inference using build artifacts)

    This step packages the vocabulary files into the wasm binary, which is required only if you want to perform inference. The compilation commands will preload these files into Emscripten's virtual file system.

    If you want to package Bergamot project-specific files, follow these instructions:

    git clone --depth 1 --branch main --single-branch https://github.com/mozilla-applied-ml/bergamot-models
    mkdir models
    cp -rf bergamot-models/prod/* models
    gunzip models/*/*
    find models \( -type f -name "model*" -or -type f -name "lex*" \) -delete
    
  4. Compile

    1. Create a folder in which to build all the artifacts (build-wasm in this case):

      mkdir build-wasm
      cd build-wasm
      
    2. Compile the artifacts

      • If you want to package files into the wasm binary, execute the following commands (replace FILES_TO_PACKAGE with the directory containing all the files to be packaged):

        emcmake cmake -DCOMPILE_WASM=on -DPACKAGE_DIR=FILES_TO_PACKAGE ../
        emmake make -j
        

        For example, to package the Bergamot project-specific files downloaded in step 3 above, replace FILES_TO_PACKAGE with ../models.

      • If you don't want to package any files into the wasm binary, execute the following commands:

        emcmake cmake -DCOMPILE_WASM=on ../
        emmake make -j
        

      The wasm artifacts (.js and .wasm files) will be available in the build directory ("build-wasm" in this case).

    3. Enable the SIMD wormhole via the Wasm instantiation API in the generated artifacts:

      bash ../wasm/patch-artifacts-enable-wormhole.sh
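The first-time WASM steps can be collected into a few bash functions. This is a sketch under stated assumptions, not a script shipped with the repository: setup_emsdk, prepare_models and build_wasm are hypothetical names, and the code assumes bash with git, gzip, find and the Emscripten tools available, plus the layout used above (commands run from the repository root, build-wasm inside it, models copied to ./models). Two small deviations are marked in the comments: only *.gz files are handed to gunzip, and make gets a bounded job count.

```bash
#!/usr/bin/env bash
# Sketch of the first-time WASM build as bash functions.
# Function names and arguments are hypothetical, not part of the repository.

# Step 1: clone emsdk if needed, install and activate the latest SDK, then load
# its environment into the current shell (emsdk_env.sh must be sourced, not run).
setup_emsdk() {
  local emsdk_dir="${1:-emsdk}"
  if [ ! -d "$emsdk_dir" ]; then
    git clone https://github.com/emscripten-core/emsdk.git "$emsdk_dir" || return
  fi
  local prev_dir="$PWD"
  cd "$emsdk_dir" || return
  ./emsdk install latest && ./emsdk activate latest && source ./emsdk_env.sh
  local status=$?
  cd "$prev_dir" || return
  return "$status"
}

# Step 3: copy the production models and keep only the vocabulary files.
prepare_models() {
  local models_repo="${1:-bergamot-models}"
  local dest="${2:-models}"
  if [ ! -d "$models_repo" ]; then
    git clone --depth 1 --branch main --single-branch \
      https://github.com/mozilla-applied-ml/bergamot-models "$models_repo" || return
  fi
  mkdir -p "$dest" || return
  cp -rf "$models_repo"/prod/* "$dest" || return
  # Unlike `gunzip models/*/*`, only *.gz files are passed to gunzip.
  find "$dest" -type f -name '*.gz' -exec gunzip -f {} + || return
  # Delete model and lexical shortlist files; the vocabularies remain.
  find "$dest" \( -type f -name 'model*' -o -type f -name 'lex*' \) -delete
}

# Step 4: run from the repository root. PACKAGE_DIR is optional and passed through
# unchanged, so give it relative to build-wasm as above (e.g. ../models).
build_wasm() {
  local package_dir="${1:-}"
  local cmake_args=(-DCOMPILE_WASM=on)
  if [ -n "$package_dir" ]; then
    cmake_args+=("-DPACKAGE_DIR=$package_dir")
  fi
  # One make job per online CPU instead of the unbounded `make -j`.
  local jobs
  jobs="$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"
  mkdir -p build-wasm || return
  (
    cd build-wasm &&
      emcmake cmake "${cmake_args[@]}" ../ &&
      emmake make -j"$jobs" &&
      bash ../wasm/patch-artifacts-enable-wormhole.sh
  )
}
```

A typical sequence from the repository root, after sourcing the file, would be setup_emsdk "$HOME/emsdk", then prepare_models, then build_wasm ../models (or build_wasm with no argument to skip packaging).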
      

Recompiling

As long as you don't update any submodule, just follow steps 4.ii and 4.iii (compiling the artifacts and enabling the SIMD wormhole) to recompile.
If you update a submodule, execute the following command before steps 4.ii and 4.iii:

git submodule update --init --recursive
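Whether the submodule command is needed can be checked automatically: git submodule status prefixes a submodule's line with '-' if it is not initialized, '+' if the checked-out commit differs from the one recorded in the superproject, and 'U' if it has merge conflicts. The sketch below uses that to decide, then repeats steps 4.ii and 4.iii. It is an illustration only: the function names are hypothetical, it assumes it is run from the repository root with an existing build-wasm folder, and extra cmake flags such as -DPACKAGE_DIR=../models are passed as arguments.

```bash
#!/usr/bin/env bash
# Sketch of the recompile flow. Function names are hypothetical.

# Succeeds if any submodule line from `git submodule status` starts with
# '-' (not initialized), '+' (different commit checked out) or 'U' (conflicts).
# Without pipefail, a failing git command counts as "no update needed".
submodules_need_update() {
  git submodule status --recursive | grep -q '^[-+U]'
}

recompile_wasm() {
  if submodules_need_update; then
    git submodule update --init --recursive || return
  fi
  # Steps 4.ii and 4.iii, reusing the existing build-wasm folder.
  # Any arguments (e.g. -DPACKAGE_DIR=../models) are forwarded to cmake.
  (
    cd build-wasm &&
      emcmake cmake -DCOMPILE_WASM=on "$@" ../ &&
      emmake make -j"$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)" &&
      bash ../wasm/patch-artifacts-enable-wormhole.sh
  )
}
```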

How to use

Using Native version

The build generates a library that can be integrated into any project. All the public header files are in the src folder.
A short example of how to use the APIs is provided in the app/main.cpp file.

Using WASM version

Please follow the README inside the wasm folder of this repository, which demonstrates how to use the translator in JavaScript.