Bergamot Translator
Bergamot translator provides a unified API for (Marian NMT framework based) neural machine translation functionality in accordance with the Bergamot project that focuses on improving client-side machine translation in a web browser.
Build Instructions
Build Natively
Create a folder where you want to build all the artifacts (build-native in this case) and compile:
mkdir build-native
cd build-native
cmake ../
make -j2
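The `-j2` above fixes the number of parallel compile jobs. A minimal sketch of picking the job count from the machine instead is shown below; the helper name `build_jobs` is hypothetical and not part of this repository, and it assumes either `nproc` (Linux) or `sysctl` (macOS) is available, falling back to 2 otherwise.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): print a job count for
# `make -j`, falling back to 2 (the value used above) when the CPU count
# cannot be detected.
build_jobs() {
  local n
  n="$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 2)"
  # Guard against empty or non-numeric output.
  case "$n" in
    ''|*[!0-9]*) n=2 ;;
  esac
  echo "$n"
}

# Possible usage from the repository root:
#   mkdir -p build-native && cd build-native
#   cmake ../ && make -j"$(build_jobs)"
```

More parallel jobs generally need more memory during compilation, so a smaller value such as the original `-j2` may still be the safer choice on constrained machines.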
Build WASM
Prerequisite
Building for WASM requires the Emscripten toolchain. It can be downloaded and installed using the following instructions:
- Get the latest sdk:
git clone https://github.com/emscripten-core/emsdk.git
- Enter the cloned directory:
cd emsdk
- Install the sdk tools (version 2.0.9):
./emsdk install 2.0.9
- Activate the sdk tools (version 2.0.9):
./emsdk activate 2.0.9
- Activate path variables:
source ./emsdk_env.sh
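After these steps it can be worth confirming that the active compiler matches the version installed above. The sketch below is one way to do that; the function names are hypothetical, and it assumes that `emcc --version` prints a version number of the form `x.y.z` on its first line.

```shell
#!/usr/bin/env bash
# Version installed and activated in the steps above.
EXPECTED_EMSDK_VERSION="2.0.9"

# Hypothetical helper: read `emcc --version` output on stdin and print the
# first x.y.z token found in it (assumed to be the Emscripten version).
emcc_version_from() {
  grep -Eo '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Hypothetical helper: succeed only if the version read from stdin equals
# EXPECTED_EMSDK_VERSION.
check_emcc_version() {
  local found
  found="$(emcc_version_from)"
  [ "$found" = "$EXPECTED_EMSDK_VERSION" ]
}

# Possible usage, after `source ./emsdk_env.sh`:
#   emcc --version | check_emcc_version || echo "unexpected emcc version" >&2
```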
Compile
- Create a folder where you want to build all the artifacts (build-wasm in this case) and compile:
mkdir build-wasm
cd build-wasm
emcmake cmake -DCOMPILE_WASM=on ../
emmake make -j2
The wasm artifacts (.js and .wasm files) will be available in the build directory ("build-wasm" in this case).
- Enable SIMD Wormhole via Wasm instantiation API in generated artifacts:
bash ../wasm/patch-artifacts-enable-wormhole.sh
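A quick sanity check after the build is to confirm that the build directory actually contains .js and .wasm files. The following is a minimal sketch under that assumption; `has_wasm_artifacts` is a hypothetical name, the search is recursive because the exact location of the artifacts inside the build directory is not stated here, and the check is coarse (it does not verify which .js or .wasm file was found).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given directory (default: build-wasm)
# contains at least one .js file and at least one .wasm file, at any depth.
has_wasm_artifacts() {
  local dir="${1:-build-wasm}"
  local js wasm
  js="$(find "$dir" -type f -name '*.js' 2>/dev/null | head -n 1)"
  wasm="$(find "$dir" -type f -name '*.wasm' 2>/dev/null | head -n 1)"
  [ -n "$js" ] && [ -n "$wasm" ]
}

# Possible usage from the repository root:
#   has_wasm_artifacts build-wasm || echo "no wasm artifacts found" >&2
```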
Recompiling
As long as you don't update any submodule, just follow the Compile steps.
If you update a submodule, execute the following command in the repository root folder before executing the Compile steps:
git submodule update --init --recursive
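Whether the update is needed can be read from `git submodule status`, which prefixes each line with `-` for an uninitialized submodule, `+` when the checked-out commit differs from the one recorded in the containing repository, and `U` for merge conflicts. A minimal sketch of that check follows; the function name `needs_submodule_update` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `git submodule status` output on stdin and
# succeed if any submodule is uninitialized (-), out of sync (+) or has
# merge conflicts (U). Lines for up-to-date submodules start with a space.
needs_submodule_update() {
  grep -q '^[-+U]'
}

# Possible usage from the repository root:
#   if git submodule status --recursive | needs_submodule_update; then
#     git submodule update --init --recursive
#   fi
```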
How to use
Using Native version
The build generates a library that can be integrated into any project. All the public header files are located in the src folder.
A short example of how to use the APIs is provided in the app/main.cpp file.
Using WASM version
Please follow the README inside the wasm folder of this repository, which demonstrates how to use the translator in JavaScript.