63120c174e
Unifies quality estimation with an interface, refactors previously available quality scores to fit this interface. Adds a new class of model with Logistic Regression powering the predictions as an implementation of said interface. QE now provides annotations on words using subwords to word rule-based algorithms working with space characters. QualityEstimation ----------------- Implementations of QE are bound together by a `QualityEstimator` Interface. 1. The log-probabilities from the machine-translation model re-interpreted as quality scores are crafted as an implementation of QualityEstimator. 2. A Logistic-Regression based model is added. This class of models is trained supervised with scores labeled by a human annotator. Handcrafted features - number of words, log probs from MT model and statistics over the sequence are used to generate the numeric features. LogisticRegressor, Matrix (to hold features) are added. The creation of an instance is switched by the `AlignedMemory` supplied (be it loaded from the file-system or supplied as a parameter). An empty AlignedMemory leads to quality scores from NMT while supplying weights of a trained logistic-regression model in binary format as the contents lead to an additional pass through the said model to provide more refined scores. Both the above now transform subwords into "words" using a heuristic algorithm, scanning for spaces. This allows the client to work with "words" to denote quality instead of subwords, as the former is more sensible to the user. Testing ------- 1. BRT now has two new test apps to check the QE outputs in text (covers subword to words) and numbers domain (covers quality scores). These are tested with en-et models for which QualityEstimation is available now, on a new input to avoid architecture/compiler issues. 2. Unit test for LogisticRegression model is added. Docs ---- Doxygen now supports MathJax properly to render explanations for Logistic Regressions' reductions in place to make computation more efficient correctly. Co-authored-by: Felipe C. Dos Santos <felipe.santos.k@gmail.com> Co-authored-by: Jerin Philip <jerinphilip@live.in> |
||
---|---|---|
.circleci | ||
.github/workflows | ||
3rd_party | ||
app | ||
bergamot-translator-tests@53c6e42a97 | ||
cmake | ||
doc | ||
scripts/ci | ||
src | ||
wasm | ||
.clang-format | ||
.clang-format-ignore | ||
.clang-tidy | ||
.gitignore | ||
.gitmodules | ||
BERGAMOT_VERSION | ||
build-wasm.sh | ||
CMakeLists.txt | ||
Doxyfile.in | ||
LICENSE | ||
README.md | ||
run-clang-format.py |
Bergamot Translator
Bergamot translator provides a unified API for (Marian NMT framework based) neural machine translation functionality in accordance with the Bergamot project that focuses on improving client-side machine translation in a web browser.
Build Instructions
Build Natively
Create a folder where you want to build all the artifacts (build-native
in this case) and compile
mkdir build-native
cd build-native
cmake ../
make -j2
Build WASM
Prerequisite
Building on wasm requires Emscripten toolchain. It can be downloaded and installed using following instructions:
- Get the latest sdk:
git clone https://github.com/emscripten-core/emsdk.git
- Enter the cloned directory:
cd emsdk
- Install the lastest sdk tools:
./emsdk install 2.0.9
- Activate the latest sdk tools:
./emsdk activate 2.0.9
- Activate path variables:
source ./emsdk_env.sh
Compile
To build a version that translates with higher speeds on Firefox Nightly browser, follow these instructions:
-
Create a folder where you want to build all the artifacts (
build-wasm
in this case) and compilemkdir build-wasm cd build-wasm emcmake cmake -DCOMPILE_WASM=on ../ emmake make -j2
The wasm artifacts (.js and .wasm files) will be available in the build directory ("build-wasm" in this case).
-
Enable SIMD Wormhole via Wasm instantiation API in generated artifacts
bash ../wasm/patch-artifacts-enable-wormhole.sh
To build a version that runs on all browsers (including Firefox Nightly) but translates slowly, follow these instructions:
- Create a folder where you want to build all the artifacts (
build-wasm
in this case) and compilemkdir build-wasm cd build-wasm emcmake cmake -DCOMPILE_WASM=on -DWORMHOLE=off ../ emmake make -j2
Recompiling
As long as you don't update any submodule, just follow Compile steps.
If you update a submodule, execute following command in repository root folder before executing
Compile steps.
git submodule update --init --recursive
How to use
Using Native version
The builds generate library that can be integrated to any project. All the public header files are specified in src
folder.
A short example of how to use the APIs is provided in app/main.cpp
file.
Using WASM version
Please follow the README
inside the wasm
folder of this repository that demonstrates how to use the translator in JavaScript.