Cross-platform C++ library focusing on optimized machine translation on consumer-grade devices.
Andre Barbosa 63120c174e
QualityEstimation: Preliminary Implementation (#197)
Unifies quality estimation behind an interface and refactors the previously
available quality scores to fit this interface. Adds a new class of model,
powered by Logistic Regression, as an implementation of said interface.
QE now provides annotations on words, using rule-based algorithms that map
subwords to words by scanning for space characters.

QualityEstimation
-----------------

Implementations of QE are bound together by a `QualityEstimator`
interface.

1. The log-probabilities from the machine-translation model, re-interpreted
   as quality scores, are now packaged as an implementation of QualityEstimator.

2. A Logistic-Regression-based model is added. This class of models is
   trained in a supervised fashion on scores labeled by human annotators.
   Handcrafted features (number of words, log-probs from the MT model and
   statistics over the sequence) are used as the numeric features.
   LogisticRegressor and Matrix (to hold the features) are added.
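The interface and its two implementations can be sketched roughly as follows. This is a minimal illustration, not the library's code: the names `QualityEstimatorSketch`, `LogProbsEstimator`, `LogisticRegressionEstimator` and `SentenceLogProbs`, the use of the mean subword log-probability as the log-prob score, and the exact feature set (word count, mean and minimum log-probability) are all assumptions.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical input: per-subword log-probabilities of one translated sentence
// plus the number of words they map to. Not the library's actual type.
struct SentenceLogProbs {
  std::vector<float> subwordLogProbs;
  std::size_t numWords;
};

// Hypothetical interface in the spirit of QualityEstimator.
class QualityEstimatorSketch {
 public:
  virtual ~QualityEstimatorSketch() = default;
  // Returns a sentence-level quality score.
  virtual float score(const SentenceLogProbs &s) const = 0;
};

// Implementation 1: reinterpret the MT model's log-probabilities as quality
// (here: the mean subword log-probability, an assumed aggregation).
class LogProbsEstimator : public QualityEstimatorSketch {
 public:
  float score(const SentenceLogProbs &s) const override {
    if (s.subwordLogProbs.empty()) return 0.0f;
    float sum = std::accumulate(s.subwordLogProbs.begin(), s.subwordLogProbs.end(), 0.0f);
    return sum / static_cast<float>(s.subwordLogProbs.size());
  }
};

// Implementation 2: logistic regression over handcrafted features.
class LogisticRegressionEstimator : public QualityEstimatorSketch {
 public:
  LogisticRegressionEstimator(std::vector<float> weights, float bias)
      : weights_(std::move(weights)), bias_(bias) {}

  // Assumed feature set: [number of words, mean log-prob, min log-prob].
  static std::vector<float> features(const SentenceLogProbs &s) {
    float mean = LogProbsEstimator().score(s);
    float minLogProb = s.subwordLogProbs.empty()
                           ? 0.0f
                           : *std::min_element(s.subwordLogProbs.begin(), s.subwordLogProbs.end());
    return {static_cast<float>(s.numWords), mean, minLogProb};
  }

  // sigmoid(bias + w . x): a score in (0, 1).
  float score(const SentenceLogProbs &s) const override {
    std::vector<float> x = features(s);
    float z = bias_;
    for (std::size_t i = 0; i < x.size() && i < weights_.size(); ++i) {
      z += weights_[i] * x[i];
    }
    return 1.0f / (1.0f + std::exp(-z));
  }

 private:
  std::vector<float> weights_;
  float bias_;
};
```

Because both classes share one interface, a caller can hold a `std::unique_ptr<QualityEstimatorSketch>` and stay agnostic of which model produced the score.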

Which implementation is instantiated is decided by the `AlignedMemory` supplied
(whether loaded from the file system or passed in as a parameter). An empty
AlignedMemory yields quality scores from the NMT model, while an AlignedMemory
containing the weights of a trained logistic-regression model in binary format
leads to an additional pass through that model to provide more refined scores.
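The dispatch on the supplied memory can be sketched as below. It is an illustration under stated assumptions: a `std::vector<char>` stands in for `AlignedMemory`, the binary weight format is assumed to be a raw array of native-endian `float`s, and `createEstimator` and the estimator structs are hypothetical names rather than the library's API.

```cpp
#include <cstring>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for AlignedMemory: a plain byte buffer.
using ByteBuffer = std::vector<char>;

// Minimal hypothetical hierarchy, just enough to show the dispatch.
struct Estimator {
  virtual ~Estimator() = default;
  virtual const char *name() const = 0;
};

struct LogProbsEstimator : Estimator {
  const char *name() const override { return "logprobs"; }
};

struct LogisticRegressionEstimator : Estimator {
  std::vector<float> coefficients;  // assumed layout of the binary weights
  explicit LogisticRegressionEstimator(std::vector<float> c) : coefficients(std::move(c)) {}
  const char *name() const override { return "logistic-regression"; }
};

// Empty memory -> scores from the NMT log-probs; non-empty memory -> parse
// logistic-regression weights (assumed: a raw array of floats).
std::unique_ptr<Estimator> createEstimator(const ByteBuffer &memory) {
  if (memory.empty()) {
    return std::make_unique<LogProbsEstimator>();
  }
  if (memory.size() % sizeof(float) != 0) {
    throw std::runtime_error("QE weights buffer is not a whole number of floats");
  }
  std::vector<float> coefficients(memory.size() / sizeof(float));
  std::memcpy(coefficients.data(), memory.data(), memory.size());
  return std::make_unique<LogisticRegressionEstimator>(std::move(coefficients));
}
```

Keying the choice on the memory contents keeps the call site identical whether or not a QE model ships with the translation model.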

Both of the above now transform subwords into "words" using a heuristic
algorithm that scans for spaces. This allows the client to work with "words"
to denote quality instead of subwords, as words are more meaningful to
the user.
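A minimal sketch of the space-scanning heuristic follows. It assumes each target subword's surface text is available with its leading space attached (as with SentencePiece-style pieces) and that a word's score is the mean of its subwords' log-probabilities; `Subword`, `WordQuality` and `subwordsToWords` are hypothetical names, and the library may well operate on byte ranges of the target text instead.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical: one target subword with its surface text and log-probability.
struct Subword {
  std::string text;  // assumed to include a leading space when it starts a word
  float logProb;
};

// Hypothetical: a word-level quality annotation.
struct WordQuality {
  std::string word;
  float score;  // mean of the word's subword log-probabilities
};

// Groups subwords into words: a subword beginning with a space starts a new
// word; any other subword is appended to the current word.
std::vector<WordQuality> subwordsToWords(const std::vector<Subword> &subwords) {
  std::vector<WordQuality> words;
  std::vector<std::size_t> counts;  // subwords per word, for averaging
  for (const Subword &sw : subwords) {
    bool leadingSpace = !sw.text.empty() && sw.text[0] == ' ';
    if (words.empty() || leadingSpace) {
      // Start a new word, dropping the separating space.
      words.push_back({leadingSpace ? sw.text.substr(1) : sw.text, 0.0f});
      counts.push_back(0);
    } else {
      words.back().word += sw.text;
    }
    words.back().score += sw.logProb;
    counts.back() += 1;
  }
  for (std::size_t i = 0; i < words.size(); ++i) {
    words[i].score /= static_cast<float>(counts[i]);
  }
  return words;
}
```

For example, the subwords "Hel", "lo" and " world" become the two words "Hello" and "world".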

Testing
-------

1. BRT now has two new test apps to check the QE outputs in the text domain
  (covers the subword-to-word mapping) and the numbers domain (covers quality
  scores). These are tested with en-et models, for which QualityEstimation is
  now available, on a new input to avoid architecture/compiler issues.
2. A unit test for the LogisticRegression model is added.


Docs
----

Doxygen now uses MathJax to correctly render the explanations of the
reductions in the Logistic Regression computation that make it more
efficient.
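As an illustration of the kind of reduction such documentation can explain (an assumption, not taken from the library's docs): if the features x_i are standardized with per-feature mean μ_i and standard deviation s_i before a logistic regression with weights w_i and bias b, the standardization can be folded into the coefficients once, so inference needs only a single dot product. The symbols here are ours:

```latex
\hat{y} = \sigma\!\left(b + \sum_{i=1}^{n} w_i \,\frac{x_i - \mu_i}{s_i}\right)
        = \sigma\!\left(b' + \sum_{i=1}^{n} w'_i \, x_i\right),
\qquad w'_i = \frac{w_i}{s_i},
\qquad b' = b - \sum_{i=1}^{n} \frac{w_i \,\mu_i}{s_i},
\qquad \sigma(z) = \frac{1}{1 + e^{-z}}
```

The identity follows from splitting each term w_i (x_i - μ_i) / s_i into (w_i / s_i) x_i - w_i μ_i / s_i and collecting the constant parts into b'.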

Co-authored-by: Felipe C. Dos Santos <felipe.santos.k@gmail.com>
Co-authored-by: Jerin Philip <jerinphilip@live.in>
2021-09-16 16:28:40 +01:00
.circleci Circle CI wasm artifacts for non-wormhole builds 2021-08-31 17:01:52 +02:00
.github/workflows Add a clang-tidy run (#214) 2021-08-13 16:26:44 +01:00
3rd_party Updated marian submodule to latest commit of master 2021-08-27 09:07:06 +02:00
app Change ResponseBuilder to accept callback instead of future (#142) 2021-07-05 14:51:01 +01:00
bergamot-translator-tests@53c6e42a97 QualityEstimation: Preliminary Implementation (#197) 2021-09-16 16:28:40 +01:00
cmake CMake fixes: Generate project.h in binary dir, fix GetVersionFromFile for use as submodule. (#193) 2021-06-09 10:12:00 +01:00
doc QualityEstimation: Preliminary Implementation (#197) 2021-09-16 16:28:40 +01:00
scripts/ci Enabling ccache on github builds for Ubuntu (#95) 2021-05-17 11:42:47 +01:00
src QualityEstimation: Preliminary Implementation (#197) 2021-09-16 16:28:40 +01:00
wasm Wasm test page using web workers now (#218) 2021-08-26 15:22:52 +02:00
.clang-format Adding clang-format and updating existing sources to adhere (#151) 2021-05-19 21:50:21 +01:00
.clang-format-ignore Adding clang-format and updating existing sources to adhere (#151) 2021-05-19 21:50:21 +01:00
.clang-tidy Add a clang-tidy run (#214) 2021-08-13 16:26:44 +01:00
.gitignore QualityEstimation: Preliminary Implementation (#197) 2021-09-16 16:28:40 +01:00
.gitmodules Moving small tests to GitHub CI (#93) 2021-04-16 11:58:53 +01:00
BERGAMOT_VERSION Corrected the version number 2021-05-18 12:17:56 +02:00
build-wasm.sh Circle CI wasm artifacts for non-wormhole builds 2021-08-31 17:01:52 +02:00
CMakeLists.txt Wasm builds without SharedArrayBuffer 2021-08-27 09:07:06 +02:00
Doxyfile.in QualityEstimation: Preliminary Implementation (#197) 2021-09-16 16:28:40 +01:00
LICENSE Initial commit 2020-10-19 13:49:38 +02:00
README.md Added build instructions to run on other browsers 2021-08-11 13:28:15 +02:00
run-clang-format.py Adding clang-format and updating existing sources to adhere (#151) 2021-05-19 21:50:21 +01:00

Bergamot Translator


Bergamot Translator provides a unified API for neural machine translation based on the Marian NMT framework, in accordance with the Bergamot project, which focuses on improving client-side machine translation in web browsers.

Build Instructions

Build Natively

Create a folder where you want to build all the artifacts (build-native in this case) and compile:

mkdir build-native
cd build-native
cmake ../
make -j2

Build WASM

Prerequisite

Building for WASM requires the Emscripten toolchain. It can be downloaded and installed using the following instructions:

  • Get the latest SDK: git clone https://github.com/emscripten-core/emsdk.git
  • Enter the cloned directory: cd emsdk
  • Install the SDK tools (version 2.0.9): ./emsdk install 2.0.9
  • Activate the SDK tools (version 2.0.9): ./emsdk activate 2.0.9
  • Set up the path variables: source ./emsdk_env.sh

Compile

To build a version that translates faster in the Firefox Nightly browser, follow these instructions:

  1. Create a folder where you want to build all the artifacts (build-wasm in this case) and compile:

    mkdir build-wasm
    cd build-wasm
    emcmake cmake -DCOMPILE_WASM=on ../
    emmake make -j2

    The wasm artifacts (.js and .wasm files) will be available in the build directory ("build-wasm" in this case).

  2. Enable the SIMD Wormhole via the Wasm instantiation API in the generated artifacts:

    bash ../wasm/patch-artifacts-enable-wormhole.sh

To build a version that runs on all browsers (including Firefox Nightly) but translates more slowly, follow these instructions:

  1. Create a folder where you want to build all the artifacts (build-wasm in this case) and compile:

    mkdir build-wasm
    cd build-wasm
    emcmake cmake -DCOMPILE_WASM=on -DWORMHOLE=off ../
    emmake make -j2

Recompiling

As long as you don't update any submodule, just follow the Compile steps.
If you update a submodule, execute the following command in the repository root folder before executing the Compile steps:

git submodule update --init --recursive

How to use

Using Native version

The build generates a library that can be integrated into any project. All the public header files are in the src folder.
A short example of how to use the APIs is provided in the app/main.cpp file.

Using WASM version

Please follow the README in the wasm folder of this repository, which demonstrates how to use the translator in JavaScript.