Merge remote-tracking branch 'origin/wasm-integration' into jp/absorb-batch-translator

Jerin Philip 2021-02-22 16:33:46 +00:00
commit fd9e79a817
7 changed files with 88 additions and 189 deletions

108
README.md

@ -16,64 +16,78 @@ make -j
```
### Build WASM
#### Compiling for the first time
To compile WASM, first download and install Emscripten using the following instructions:
1. Download and install Emscripten using the following instructions:
* Get the latest sdk: `git clone https://github.com/emscripten-core/emsdk.git`
* Enter the cloned directory: `cd emsdk`
* Install the latest sdk tools: `./emsdk install latest`
* Activate the latest sdk tools: `./emsdk activate latest`
* Activate path variables: `source ./emsdk_env.sh`
1. Get the latest sdk: `git clone https://github.com/emscripten-core/emsdk.git`
2. Enter the cloned directory: `cd emsdk`
3. Install the latest sdk tools: `./emsdk install latest`
4. Activate the latest sdk tools: `./emsdk activate latest`
5. Activate path variables: `source ./emsdk_env.sh`
2. Clone the repository and checkout the appropriate branch using these instructions:
```bash
git clone https://github.com/browsermt/bergamot-translator
cd bergamot-translator
git checkout -b wasm-integration origin/wasm-integration
git submodule update --init --recursive
```
After the successful installation of Emscripten, perform these steps:
3. Download files (only required if you want to package files in the wasm binary)
This step is only required if you want to package files (e.g. models, vocabularies)
into the wasm binary. Otherwise, skip this step.
The build preloads the files in Emscripten's virtual file system.
To package the bergamot project-specific models, follow these instructions:
```bash
mkdir models
git clone https://github.com/mozilla-applied-ml/bergamot-models
cp -rf bergamot-models/* models
gunzip models/*/*
```
4. Compile
1. Create a folder where you want to build all the artefacts (`build-wasm` in this case)
```bash
mkdir build-wasm
cd build-wasm
```
2. Compile the artefacts
* If you want to package files into the wasm binary, execute the following commands (replace `FILES_TO_PACKAGE` with the path of the
directory containing the files to be packaged):
```bash
emcmake cmake -DCOMPILE_WASM=on -DPACKAGE_DIR=FILES_TO_PACKAGE ../
emmake make -j
```
e.g. To package the bergamot project-specific models (downloaded in step 3 above),
replace `FILES_TO_PACKAGE` with `../models`.
* If you don't want to package any files into the wasm binary, execute the following commands:
```bash
emcmake cmake -DCOMPILE_WASM=on ../
emmake make -j
```
The artefacts (.js and .wasm files) will be available in the `wasm` folder of the build directory (`build-wasm` in this case).
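As a quick sanity check after the build, the following sketch reports which expected artefacts are missing. The `check_artefacts` helper is hypothetical and not part of the repository; the file names are the ones used by the `cp` commands in the test page's server script in this commit.

```bash
#!/bin/bash
# Hypothetical helper (not part of the repository): report which expected
# build artefacts are missing from a directory.
# Usage: check_artefacts DIR FILE...
check_artefacts() {
  local dir="$1"; shift
  local missing=0
  local name
  for name in "$@"; do
    if [ ! -f "${dir}/${name}" ]; then
      echo "missing: ${dir}/${name}"
      missing=1
    fi
  done
  # Return 0 only if every listed file exists.
  return "${missing}"
}

# Assumes the script is run from the repository root after building in build-wasm.
# A .data file is only expected when files were packaged via PACKAGE_DIR,
# so it is not checked here.
if check_artefacts build-wasm/wasm \
    bergamot-translator-worker.js \
    bergamot-translator-worker.wasm; then
  echo "all artefacts present"
fi
```

If files were packaged, `bergamot-translator-worker.data` can be appended to the list of names.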
#### Recompiling
As long as you don't update any submodule, just follow the steps in `4.ii` to recompile.\
If you update a submodule, execute the following command before following the steps in `4.ii` to recompile.
```bash
git clone --recursive https://github.com/browsermt/bergamot-translator
cd bergamot-translator
git checkout wasm-integration
git submodule update --recursive
mkdir build-wasm
cd build-wasm
emcmake cmake -DCOMPILE_WASM=on ../
emmake make -j
git submodule update --init --recursive
```
It should generate the artefacts (.js and .wasm files) in the `wasm` folder inside the build directory (`build-wasm` in this case).
Download the models from `https://github.com/mozilla-applied-ml/bergamot-models`, and place the ones you want to package in a folder called `models`.
The build also allows packaging files into the wasm binary (i.e. preloading them in Emscripten's virtual file system) using the cmake
option `PACKAGE_DIR`. The compile command below packages all the files in the `PACKAGE_DIR` directory (in this case, your models) into the wasm binary.
```bash
emcmake cmake -DCOMPILE_WASM=on -DPACKAGE_DIR=/repo/models ../
```
Files packaged this way are preloaded in the root of the virtual file system.
To package the set of files expected by the test page:
```bash
mkdir models
git clone https://github.com/motin/bergamot-models
cp -r bergamot-models/* models
gunzip models/*/*
```
After Editing Files:
```bash
emmake make -j
```
After Adding/Removing Files:
```bash
emcmake cmake -DCOMPILE_WASM=on ../
emmake make -j
```
## How to use
### Using Native version
The build generates a library that can be integrated into any project. All the public header files are in the `src` folder. A short example of how to use the APIs is provided in the `app/main.cpp` file.
The build generates a library that can be integrated into any project. All the public header files are in the `src` folder.\
A short example of how to use the APIs is provided in the `app/main.cpp` file.
### Using WASM version


@ -1,55 +0,0 @@
# -*- mode: makefile-gmake; indent-tabs-mode: true; tab-width: 4 -*-
SHELL = bash
PWD = $(shell pwd)
WASM_IMAGE = local/bergamot-translator-build-wasm
all: wasm-image compile-wasm
# Build the Docker image for WASM builds
wasm-image:
docker build -t local/bergamot-translator-build-wasm ./wasm/
# Commands for compilation:
cmake_cmd = cmake
wasm_cmake_cmd = ${cmake_cmd}
wasm_cmake_cmd += -DCOMPILE_WASM=on
wasm_cmake_cmd += -DProtobuf_INCLUDE_DIR=/usr/opt/protobuf-wasm-lib/dist/include
wasm_cmake_cmd += -DProtobuf_LIBRARY=/usr/opt/protobuf-wasm-lib/dist/lib/libprotobuf.a
wasm_cmake_cmd += -DPACKAGE_DIR=/repo/models
make_cmd = make
#make_cmd += VERBOSE=1
# ... and running things on Docker
docker_mounts = ${PWD}/..:/repo
docker_mounts += ${HOME}/.ccache:/.ccache
run_on_docker = docker run --rm
run_on_docker += $(addprefix -v, ${docker_mounts})
run_on_docker += ${INTERACTIVE_DOCKER_SESSION}
${HOME}/.ccache:
mkdir -p $@
# Remove the bergamot-translator WASM build dir, forcing a clean compilation attempt
clean-wasm: BUILD_DIR = /repo/build-wasm-docker
clean-wasm: ${HOME}/.ccache
${run_on_docker} ${WASM_IMAGE} bash -c '(rm -rf ${BUILD_DIR} || true)'
# Compile bergamot-translator to WASM
compile-wasm: BUILD_DIR = /repo/build-wasm-docker
compile-wasm: ${HOME}/.ccache
${run_on_docker} ${WASM_IMAGE} bash -c 'mkdir -p ${BUILD_DIR} && \
cd ${BUILD_DIR} && \
(emcmake ${wasm_cmake_cmd} .. && \
(emmake ${make_cmd}) || \
rm CMakeCache.txt)'
# Start interactive shells for development / debugging purposes
native-shell: INTERACTIVE_DOCKER_SESSION = -it
native-shell:
${run_on_docker} ${NATIVE_IMAGE} bash
wasm-shell: INTERACTIVE_DOCKER_SESSION = -it
wasm-shell:
${run_on_docker} ${WASM_IMAGE} bash


@ -1,27 +0,0 @@
## WASM
Prepare docker image for WASM compilation:
```bash
make wasm-image
```
Compile to wasm:
```bash
make compile-wasm
```
## Debugging
Remove the bergamot-translator WASM build dir, forcing the next compilation attempt to start from scratch:
```bash
make clean-wasm
```
Enter a docker container shell for manually running commands:
```bash
make wasm-shell
```


@ -1,36 +0,0 @@
FROM emscripten/emsdk:2.0.9
# Install specific version of CMake
WORKDIR /usr
RUN wget https://github.com/Kitware/CMake/releases/download/v3.17.2/cmake-3.17.2-Linux-x86_64.tar.gz -qO-\
| tar xzf - --strip-components 1
# Install Python and Java (needed for Closure Compiler minification)
RUN apt-get update \
&& apt-get install -y \
python3 \
default-jre
# Deps to compile protobuf from source + the protoc binary which we need natively
RUN apt-get update -y && apt-get --no-install-recommends -y install \
protobuf-compiler \
autoconf \
autotools-dev \
automake \
autogen \
libtool && ln -s /usr/bin/libtoolize /usr/bin/libtool \
&& mkdir -p /usr/opt \
&& cd /usr/opt \
&& git clone https://github.com/menduz/protobuf-wasm-lib
RUN cd /usr/opt/protobuf-wasm-lib \
&& /bin/bash -c "BRANCH=v3.6.1 ./prepare.sh"
RUN cd /usr/opt/protobuf-wasm-lib/protobuf \
&& bash -x ../build.sh
RUN cp /usr/bin/protoc /usr/opt/protobuf-wasm-lib/dist/bin/protoc
RUN apt-get --no-install-recommends -y install \
libprotobuf-dev
# Necessary for benchmarking
RUN pip3 install sacrebleu


@ -16,7 +16,8 @@ target_compile_options(bergamot-translator-worker PRIVATE ${WASM_COMPILE_FLAGS})
set(LINKER_FLAGS "--bind -s ASSERTIONS=0 -s DISABLE_EXCEPTION_CATCHING=1 -s FORCE_FILESYSTEM=1 -s ALLOW_MEMORY_GROWTH=1 -s NO_DYNAMIC_EXECUTION=1")
if (NOT PACKAGE_DIR STREQUAL "")
set(LINKER_FLAGS "${LINKER_FLAGS} --preload-file ${PACKAGE_DIR}@/")
get_filename_component(REALPATH_PACKAGE_DIR ${PACKAGE_DIR} REALPATH BASE_DIR ${CMAKE_BINARY_DIR})
set(LINKER_FLAGS "${LINKER_FLAGS} --preload-file ${REALPATH_PACKAGE_DIR}@/")
endif()
set_target_properties(bergamot-translator-worker PROPERTIES


@ -1,13 +1,14 @@
## Using Bergamot Translator in JavaScript
The example file `bergamot.html` in the folder `test_page` demonstrates how to use the bergamot translator in JavaScript via a `<script>` tag.
This example assumes that files were packaged in wasm binary.
A brief summary is here though:
Please note that everything below assumes that the [bergamot project specific model files](https://github.com/mozilla-applied-ml/bergamot-models) were packaged into the wasm binary (using the compile instructions given in the top-level README).
### Using JS APIs
```js
// The model configuration as YAML formatted string. For available configuration options, please check: https://marian-nmt.github.io/docs/cmd/marian-decoder/
// This example captures the most relevant options: model file, vocabulary files and shortlist file
const modelConfig = "{\"models\":[\"/model.npz\"],\"vocabs\":[\"/vocab.esen.spm\",\"/vocab.esen.spm\"],\"shortlist\":[\"/lex.s2t\"],\"beam-size\":1}";
const modelConfig = "{\"models\":[\"/esen/model.esen.npz\"],\"vocabs\":[\"/esen/vocab.esen.spm\",\"/esen/vocab.esen.spm\"],\"shortlist\":[\"/esen/lex.esen.s2t\"],\"beam-size\":1}";
// Instantiate the TranslationModel
const model = new Module.TranslationModel(modelConfig);
@ -33,29 +34,30 @@ request.delete();
input.delete();
```
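The paths in `modelConfig` above are paths inside Emscripten's virtual file system. Because the packaged directory is preloaded at the virtual root, each of them should correspond to a file under the directory passed as `PACKAGE_DIR`. The sketch below illustrates that mapping and checks the host files before packaging. The `virtual_to_host` helper is hypothetical, and `models` is assumed to be the packaged directory, relative to the repository root.

```bash
#!/bin/bash
# Hypothetical helper (not part of the repository): map a path inside the
# virtual file system back to the host file that would be packaged,
# assuming PACKAGE_DIR is preloaded at the virtual root ("@/").
# Usage: virtual_to_host PACKAGE_DIR VIRTUAL_PATH
virtual_to_host() {
  local package_dir="${1%/}"   # strip one trailing slash, if any
  local virtual_path="$2"      # e.g. /esen/model.esen.npz
  echo "${package_dir}${virtual_path}"
}

# Virtual paths taken from the modelConfig example above.
for virtual_path in /esen/model.esen.npz /esen/vocab.esen.spm /esen/lex.esen.s2t; do
  host_path="$(virtual_to_host models "${virtual_path}")"
  if [ -f "${host_path}" ]; then
    echo "ok: ${host_path}"
  else
    echo "not found: ${host_path}"
  fi
done
```

Any path reported as not found would also be absent from the virtual file system at runtime, under the assumptions stated above.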
You can also see everything in action by following the next steps:
### Demo (see everything in action)
* Start the test webserver (ensure you have the latest nodejs installed)
```bash
cd test_page
bash start_server.sh
```
```bash
cd test_page
bash start_server.sh
```
* Open any of the browsers below
* Firefox Nightly +87: make sure the following prefs are on (about:config)
```
dom.postMessage.sharedArrayBuffer.bypassCOOP_COEP.insecure.enabled = true
javascript.options.wasm_simd = true
javascript.options.wasm_simd_wormhole = true
```
```
dom.postMessage.sharedArrayBuffer.bypassCOOP_COEP.insecure.enabled = true
javascript.options.wasm_simd = true
javascript.options.wasm_simd_wormhole = true
```
* Chrome Canary +90: start with the following argument
```
--js-flags="--experimental-wasm-simd"
```
```
--js-flags="--experimental-wasm-simd"
```
* Browse to the following page:
```
http://localhost:8000/bergamot.html
```
```
http://localhost:8000/bergamot.html
```
* Run some translations:
* Choose a model and press `Load Model`


@ -1,8 +1,8 @@
#!/bin/bash
cp ../../build-wasm-docker/wasm/bergamot-translator-worker.data .
cp ../../build-wasm-docker/wasm/bergamot-translator-worker.js .
cp ../../build-wasm-docker/wasm/bergamot-translator-worker.wasm .
cp ../../build-wasm-docker/wasm/bergamot-translator-worker.worker.js .
cp ../../build-wasm/wasm/bergamot-translator-worker.data .
cp ../../build-wasm/wasm/bergamot-translator-worker.js .
cp ../../build-wasm/wasm/bergamot-translator-worker.wasm .
cp ../../build-wasm/wasm/bergamot-translator-worker.worker.js .
npm install
node bergamot-httpserver.js