mirror of https://github.com/browsermt/bergamot-translator.git synced 2024-08-15 16:40:26 +03:00

Cross-platform C++ library focused on optimized machine translation on consumer-grade devices.




Bergamot Translator

Bergamot Translator provides a unified API for neural machine translation built on the Marian NMT framework. It follows the goals of the Bergamot project, which focuses on improving client-side machine translation in a web browser.

Build Instructions

Build Natively

Create a folder where you want to build all the artifacts (build-native in this case) and compile:

```
mkdir build-native
cd build-native
cmake ../
make -j2
```

Build WASM

Prerequisite

Building for WASM requires the Emscripten toolchain. It can be downloaded and installed using the following instructions:

Get the latest sdk: git clone https://github.com/emscripten-core/emsdk.git
Enter the cloned directory: cd emsdk
Install the SDK tools (version 2.0.9): ./emsdk install 2.0.9
Activate the same version of the SDK tools: ./emsdk activate 2.0.9
Activate path variables: source ./emsdk_env.sh
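
The steps above can be collected into one shell function, which is convenient for CI or a fresh machine. This is only a sketch: `setup_emsdk` is a hypothetical helper name, and the default clone directory `emsdk` is an assumption. The version defaults to the 2.0.9 used in the steps above.

```shell
# Sketch: install and activate a pinned Emscripten SDK version.
# setup_emsdk is a hypothetical helper, not part of this repository.
setup_emsdk() {
  local version="${1:-2.0.9}"   # version from the steps above
  local dir="${2:-emsdk}"       # clone destination (assumption)
  local start_dir="$PWD"

  # Clone only if the directory does not exist yet.
  if [ ! -d "$dir" ]; then
    git clone https://github.com/emscripten-core/emsdk.git "$dir" || return 1
  fi

  cd "$dir" || return 1
  ./emsdk install "$version" && ./emsdk activate "$version" \
    || { cd "$start_dir"; return 1; }

  # '.' is the portable spelling of 'source'; it exports PATH and
  # related variables into the current shell.
  . ./emsdk_env.sh

  # Return to where the caller started.
  cd "$start_dir" || return 1
}
```

Because the function sources `emsdk_env.sh`, it must be called in the same shell that will later run the build, for example `setup_emsdk 2.0.9`.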

Compile

Create a folder where you want to build all the artifacts (build-wasm in this case) and compile:
```
mkdir build-wasm
cd build-wasm
emcmake cmake -DCOMPILE_WASM=on ../
emmake make -j2
```
The wasm artifacts (.js and .wasm files) will be available in the build directory ("build-wasm" in this case).
To enable SIMD Wormhole via the Wasm instantiation API in the generated artifacts, run the following from the build directory:
```
bash ../wasm/patch-artifacts-enable-wormhole.sh
```
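
Two small checks can make the WASM build easier to debug: confirming that the Emscripten wrappers are on the PATH before compiling, and listing the generated artifacts afterwards. The sketch below assumes nothing about the artifact names beyond the `.js` and `.wasm` extensions mentioned above. Both function names are hypothetical and not part of this repository.

```shell
# Hypothetical helpers; the names are illustrative only.

# Succeed only if the Emscripten wrappers used in the Compile step are
# available (they usually are missing when emsdk_env.sh was not sourced
# in the current shell).
have_emscripten() {
  command -v emcmake >/dev/null 2>&1 && command -v emmake >/dev/null 2>&1
}

# Print the .js and .wasm files located directly inside a build directory.
# Defaults to "build-wasm", the folder name used above.
list_wasm_artifacts() {
  find "${1:-build-wasm}" -maxdepth 1 -type f \
    \( -name '*.js' -o -name '*.wasm' \) | sort
}
```

For example, run `have_emscripten || echo "Emscripten not found; source emsdk_env.sh first"` before the Compile step, and `list_wasm_artifacts build-wasm` from the repository root after it.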

Recompiling

As long as you don't update any submodule, just follow the Compile steps.
If you update a submodule, run the following command in the repository root folder before following the Compile steps:

```
git submodule update --init --recursive
```
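
If it is unclear whether a submodule has changed, `git submodule status` can tell. It prefixes an entry with `-` when the submodule is not initialized, `+` when the checked-out commit differs from the recorded one, and `U` when there are merge conflicts. A minimal sketch follows; `submodules_need_update` is a hypothetical helper name.

```shell
# Sketch: succeed if any submodule is uninitialized ('-'), out of sync
# with the commit recorded in the parent repository ('+'), or in a
# merge conflict ('U').
submodules_need_update() {
  git submodule status --recursive | grep -q '^[-+U]'
}

# Possible usage, from the repository root:
#   if submodules_need_update; then
#     git submodule update --init --recursive
#   fi
```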

How to use

Using Native version

The build generates a library that can be integrated into any project. All the public header files are in the src folder.
A short example of how to use the API is provided in the app/main.cpp file.

Using WASM version

Please follow the README inside the wasm folder of this repository, which demonstrates how to use the translator in JavaScript.