More portable WASM demo (#437)

* Replace most of the wasm demo page with code from the firefox extension
  This code should be more generic and copy/pastable into other projects. Maybe one day it will be an npm package?
* Fix Ukrainian model support
* Add quality estimation output
  Automatically enabled when the model(s) support it
* Little "Translating…" indicator
* Don't make Safari fail on something tiny
* Rewire lots of async state to be able to predictably know when the translator is working or not
  Previously so much was lazy loaded that it was not easy to catch lack of SIMD support. Now I can just enable the interface only after it has properly loaded.
* No need for a two-stage setup for the worker.
  Just promise to call `initialize()`!
* More (correct) types and comments for code
* Keyboard shortcuts for input area for bold, italic and underline.
  Enough to demo mark-up translation
* Fix `delete()`
* Move javascript glue code into its own npm package
* Add nodejs support and test to package
* More stand-alone build command
  …for now, not really used by anything I think
* Ignore build packages
* Use local filesystem for build so it is automatically cached
* fix overflow on demo page
  But this might break the mobile demo? I'll have to check into that
* Bring back integrity check, except for NodeJS for now
* Make `build` part of `prepare` so we always make sure we build a complete package
* Move worker code into its own folder
  This way I can mark it as a commonjs module, which will cause nodejs to treat the files the same as WebWorkers do right now. Firefox doesn't implement `{type: 'module'}` yet for WebWorkers.
* Add README
* Fix paths
* Add npm publish automation
* Make sure webpack ignores node compatibility code
* Add missing webpack:ignore around a worker
* Default to getting models from S3
* Separate "loading" and "translating" indicators
* Bump npm package version
* Add credits
* Don't block on the worker loading
* Not just Mozilla, but Bergamot!
* Make individual translation requests cancelable
* Swap button turns vertically when in skyscraper mode
* Make it easier to debug errors from inside the worker
* Don't bork on deleting a failed worker
* Don't bork on calling translate() with a failed worker
* Handle compilation error with more grace
* `contenteditable=true` seems to work better with some browser extensions
  Looking at you, Vimium!
* Clean up abort promise
* Bump npm package version
* Remove `workerUrl` option in favour of better webpack support
  With that option it was hard for Webpack to figure out dependencies, and it did not enter my worker script for rewriting. With the hardcoded url it does, and with a bit of `new webpack.DefinePlugin({'typeof self': JSON.stringify('object')}),` we can have webpack remove node-specific code on build!
* Bump version
  Minor API change hehe

Co-authored-by: Nikolay Bogoychev <nheart@gmail.com>
2024-07-14 17:00:28 +03:00 · 2023-01-18 19:41:39 +00:00 · 2023-01-18 19:41:39 +00:00 · 8d5f877596
commit 8d5f877596
parent 2834f046dc
16 changed files with 1959 additions and 495 deletions
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -281,6 +281,29 @@ jobs:
                ${{github.workspace}}/build-wasm/bergamot-translator-worker.wasm
                ${{github.workspace}}/build-wasm/bergamot-translator-worker.js.bak

+    
+    upload-wasm:
+      name: "Upload node package to NPM"
+      runs-on: ubuntu-latest
+      if: ${{ startsWith(github.ref, 'refs/tags/v') }}
+      needs: [build-wasm]
+      steps:
+      - name: Download artifacts
+        uses: actions/download-artifact@v2
+        with:
+          name: wasm-artefacts
+          path: wasm/module/worker
+
+      - uses: actions/setup-node@v3
+        with:
+          node-version: '18.x'
+          registry-url: 'https://registry.npmjs.org'
+      - run: npm ci
+      - run: npm publish
+        env:
+          NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
+
+

  # Try to upload a release using https://github.com/marvinpinto/actions/issues/177#issuecomment-917605585 as a model
    release-latest:
--- a/.gitignore
+++ b/.gitignore
@@ -19,7 +19,8 @@ _deps
 wasm/test_page/node_modules
 build-wasm
 models
-wasm/test_page/js/bergamot-translator-worker.*
+wasm/module/worker/bergamot-translator-worker.*
+wasm/module/browsermt-bergamot-translator-*.tgz

 # VSCode
 .vscode
--- a/wasm/module/README.md
+++ b/wasm/module/README.md
@@ -0,0 +1,238 @@
+# Installation
+
+```bash
+npm install @browsermt/bergamot-translator
+```
+
+# Quick start
+
+```js
+import {BatchTranslator} from "@browsermt/bergamot-translator/translator.js";
+
+const translator = new BatchTranslator();
+
+const response = await translator.translate({
+  from: "en",
+  to: "es",
+  text: "Hello <em>world</em>!",
+  html: true
+});
+
+console.log(response.target.text);
+
+// Stops worker threads
+translator.delete();
+```
+
+# Throughput vs Latency
+
+This package comes with two translator implementations:
+
+- [LatencyOptimisedTranslator](#latencyoptimisedtranslator) is more useful for an interactive session, say like Google Translate, where you're only working on translating one input at a time.
+- [BatchTranslator](#batchtranslator) is optimised for processing a large number of translations as fast as possible (but individual translations might take some time), e.g. translating a large number of strings or all paragraphs in a document.
+
+## LatencyOptimisedTranslator
+
+Translator best suited for interactive usage. Runs with a single worker thread and a batch-size of 1 to give you a response as quickly as possible. It will cancel any pending translations that aren't currently being processed if you submit a new one.
+
+```js
+const translator = new LatencyOptimisedTranslator({
+  pivotLanguage?: string?,
+  registryUrl?: string,
+  workerUrl?: string,
+  downloadTimeout?: number,
+  cacheSize?: number,
+  useNativeIntGemm?: boolean,
+})
+```
+
+- `pivotLanguage` - language code for the language to use as an intermediate if there is no direct translation model available. Defaults to `"en"`. Set to `null` to disable pivoting.
+- `registryUrl` - url to a list of models and their paths. Defaults to `https://bergamot.s3.amazonaws.com/models/index.json`.
+- `workerUrl` - url to `translator-worker.js`. Defaults to `"worker/translator-worker.js"` relative to the path of `translator.js`.
+- `downloadTimeout` - Maximum time we're attempting to download model files before failing. Defaults to `60000` or 60 seconds. Set to `0` to disable.
+- `cacheSize` - Maximum number of sentences kept in the translation cache (per worker; workers do not share their cache). This is an ideal maximum as the cache is a hash-map; in practice about 1/3 is occupied. If set to `0`, the translation cache is disabled (the default).
+- `useNativeIntGemm` - Try to link to native IntGEMM implementation when loading the WASM binary. This is only implemented in the privileged extension context of Firefox Nightly. If it fails, it will always fall back to the included implementation. Defaults to `false`.
+
+### translate()
+
+```js
+const {request, target: {text:string}} = await translator.translate({
+  from: string,
+  to: string,
+  text: string,
+  html?: boolean,
+  qualityScores?: boolean
+})
+```
+
+Submits a translation request. Multiple of these are processed in a batch. A batch will be started the next tick (if there is a worker available).
+
+- `from` - language code of the source language, e.g. `"de"`
+- `to` - language code of the target language, e.g. `"en"`
+- `text` - string of text to translate, e.g. `"Hallo Welt!"`
+- `html` - boolean indicating whether `text` contains just plain text or HTML
+- `qualityScores` - whether to calculate quality scores. Not all models support this, and you need to load a separate quality scores model file for it. Quality scores are returned as `<font x-bergamot-sentence-quality="">` and `<font x-bergamot-word-quality="">` wrapped around sentences and words in the output. When enabled, the output is always HTML, regardless of whether the input was.
+
+Returns:
+
+A promise to a translation response object, with `target.text` being the text or HTML of the translated output, and `request` a reference to the original translation request.
+
+### delete()
+
+```js
+translator.delete()
+```
+
+Cancels all pending requests with a `CancelledError` and terminates the worker immediately. This will free all the resources used.
+
+In a nodejs context you'll need to call this, otherwise your script won't exit because the translator will still be listening for messages from the worker.
+
+## BatchTranslator
+
+```js
+const translator = new BatchTranslator({
+  pivotLanguage?: string?,
+  registryUrl?: string,
+  workerUrl?: string,
+  downloadTimeout?: number,
+  cacheSize?: number,
+  useNativeIntGemm?: boolean,
+  workers?: number,
+  batchSize?: number,
+})
+```
+
+General translator options:
+
+See [LatencyOptimisedTranslator](#latencyoptimisedtranslator).
+
+BatchTranslator-specific options:
+
+- `workers` - Number of worker threads. These are full-on instances of the translator, with their own copy of the model loaded. This is an upper bound. If not that many workers can be fed, it won't create new ones. Minimally 1. Default is `1`.
+- `batchSize` - Number of translation requests per batch. All sentences from all translation requests are packed into a bunch of matrix operations. With a larger batch size the translator has more material to find ideal sets of sentences for filling the matrix. However, you'll only get the results for each of the requests in a batch once the whole batch is finished. Defaults to 8.
+
+### translate()
+
+```js
+const {target: {text:string}} = await translator.translate({
+  from: string,
+  to: string,
+  text: string,
+  html?: boolean,
+  qualityScores?: boolean,
+  priority?: number
+})
+```
+
+Submits a translation request. Multiple of these are processed in a batch. A batch will be started the next tick (if there is a worker available).
+
+- (See [LatencyOptimisedTranslator.translate()](#translate) for most options)
+- `priority` - When grouping translation requests into batches to give to workers, requests with a lower number are considered first. For example, if you're translating a web page, you can give requests of parts that are in the current frame a lower number to make sure they're processed first.
+
+### remove()
+
+```js
+translator.remove(request => {
+  // true deletes the request from the queue.
+  return true;
+})
+```
+
+Removes requests from the translation queue, i.e. only when they haven't been sent to a worker yet.
+
+The filter function should return a truthy value for each request that should be cancelled. Their promises are rejected with a `CancelledError`.
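The queue semantics described for `priority` and `remove()` can be tried in isolation. The `RequestQueue` class below is a hypothetical sketch and not the package's implementation; it assumes a default priority of 0 and only illustrates that lower numbers are served first and that removed requests reject with a `CancelledError`.

```javascript
// Hypothetical sketch, not the package's actual queue.
class CancelledError extends Error {}

class RequestQueue {
  #queue = [];

  // Queue a request. The returned promise settles when something (a worker,
  // not modelled here) accepts it, or when remove() rejects it.
  enqueue(request) {
    return new Promise((accept, reject) => {
      this.#queue.push({request, accept, reject});
      // Lower priority number first. Array.prototype.sort is stable, so
      // requests with equal priority keep their submission order.
      this.#queue.sort((a, b) => (a.request.priority ?? 0) - (b.request.priority ?? 0));
    });
  }

  // Take up to `size` entries off the front of the queue for one batch.
  takeBatch(size) {
    return this.#queue.splice(0, size);
  }

  // Reject and drop every queued request for which `filter` returns truthy.
  remove(filter) {
    const kept = [];
    for (const entry of this.#queue) {
      if (filter(entry.request))
        entry.reject(new CancelledError('removed from queue'));
      else
        kept.push(entry);
    }
    this.#queue = kept;
  }
}
```

Requests that were already handed to a batch are no longer in the queue, which matches the note above that `remove()` only affects requests that have not been sent to a worker yet.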
+
+
+### delete()
+
+```js
+translator.delete()
+```
+
+Cancels all pending requests with a `CancelledError` and terminates all workers immediately. This will free all the resources used.
+
+
+# Models
+
+Both translators accept a backing as the second constructor argument, which tells them where to get model data and the translation engine implementation from. They default to `TranslatorBacking`, which gets its models from the same repository as [firefox-translations](https://github.com/mozilla/firefox-translations).
+
+To customize the model source, extend `TranslatorBacking` and reimplement the `loadModelRegistery` (spelled this way in the code) and `loadTranslationModel` methods.
+
+`loadModelRegistery()` must return a promise to a list that looks like `{from: string, to: string, ...}[]`. The `from` and `to` keys are used as the key for model selection.
+
+`loadTranslationModel()` should return a promise with ArrayBuffers for `model`, `shortlist`, `vocabs`, and optionally `qualityModel`. It can include a `config` object as well.
+
+Example of an alternative implementation that loads models from data.statmt.org, i.e. the same as [translateLocally](https://translateLocally.com):
+
+```js
+class CustomBacking extends TranslatorBacking {
+    async loadModelRegistery() {
+        const response = await fetch('https://translatelocally.com/models.json');
+        const {models} = await response.json();
+
+        // Add 'from' and 'to' keys for each model. Since theoretically a model
+        // can have multiple 'from' keys in TranslateLocally, we do a little
+        // product here.
+        return models.reduce((list, model) => {
+            try {
+                const [to] = Intl.getCanonicalLocales(model.trgTag);
+                for (let from of Intl.getCanonicalLocales(Object.keys(model.srcTags))) {
+                    list.push({from, to, model});
+                }
+            } catch (err) {
+                console.log('Skipped model', model, 'because', err);
+            }
+
+            return list;
+        }, []);
+    }
+
+    async loadTranslationModel({from, to}) {
+        // Find that model in the registry which will tell us about its files
+        const entries = (await this.registry).filter(model => model.from === from && model.to === to);
+
+        // Prefer tiny models above non-tiny ones
+        entries.sort(({model: a}, {model: b}) => (a.shortName.indexOf('tiny') === -1 ? 1 : 0) - (b.shortName.indexOf('tiny') === -1 ? 1 : 0));
+
+        if (!entries.length)
+            throw new Error(`No model for '${from}' -> '${to}'`);
+
+        const entry = entries[0].model;
+
+        const response = await fetch(entry.url, {
+            integrity: `sha256-${entry.checksum}`
+        });
+
+        // pako from https://www.npmjs.com/package/pako
+        const archive = pako.inflate(await response.arrayBuffer());
+
+        // untar from https://www.npmjs.com/package/js-untar
+        const files = await untar(archive.buffer);
+
+        const find = (filename) => {
+            const found = files.find(file => file.name.match(/(?:^|\/)([^\/]+)$/)[1] === filename)
+            if (found === undefined)
+                throw new Error(`Could not find '${filename}' in model archive`);
+            return found;
+        };
+
+        // YAML.parse is found in worker/translator-worker.js
+        const config = YAML.parse(find('config.intgemm8bitalpha.yml').readAsString());
+
+        const model = find(config.models[0]).buffer;
+
+        const vocabs = config.vocabs.map(vocab => find(vocab).buffer);
+
+        const shortlist = find(config.shortlist[0]).buffer;
+
+        // Return the buffers
+        return {model, vocabs, shortlist, config};
+    }
+}
+
+const translator = new BatchTranslator(options, new CustomBacking(options));
+```
+
+# Supported languages
+
+See https://github.com/mozilla/firefox-translations-models#currently-supported-languages. You may need to set the `registryUrl` option to point to the latest release.
--- a/wasm/module/main.js
+++ b/wasm/module/main.js
@@ -0,0 +1,21 @@
+import * as readline from 'node:readline/promises';
+import {stdin, stdout} from 'node:process';
+import {BatchTranslator} from "./translator.js";
+
+const rl = readline.createInterface({input: stdin, output: stdout});
+
+const translator = new BatchTranslator();
+
+for await (const line of rl) {
+	const response = await translator.translate({
+		from: "en",
+		to: "es",
+		text: line,
+		html: false,
+		qualityScores: false
+	});
+
+	console.log(response.target.text);
+}
+
+translator.delete();
--- a/wasm/module/package.json
+++ b/wasm/module/package.json
@@ -0,0 +1,39 @@
+{
+  "name": "@browsermt/bergamot-translator",
+  "version": "0.4.9",
+  "description": "Cross platform C++ library focusing on optimized machine translation on the consumer-grade device.",
+  "homepage": "https://github.com/browsermt/bergamot-translator#readme",
+  "repository": {
+    "type": "git",
+    "url": "git+ssh://git@github.com/browsermt/bergamot-translator.git"
+  },
+  "keywords": [
+    "machine",
+    "translation"
+  ],
+  "author": "",
+  "license": "MPL-2.0",
+  "bugs": {
+    "url": "https://github.com/browsermt/bergamot-translator/issues"
+  },
+  "type": "module",
+  "main": "translator.js",
+  "files": [
+    "worker/bergamot-translator-worker.js",
+    "worker/bergamot-translator-worker.wasm",
+    "worker/translator-worker.js",
+    "translator.js",
+    "main.js"
+  ],
+  "config": {
+    "emscripten_version": "3.1.8"
+  },
+  "scripts": {
+    "prepare": "test -f worker/bergamot-translator-worker.wasm || npm run build",
+    "build": "mkdir -p ../../build-wasm && docker run --rm -v $(realpath ../../):/src -v $(realpath ../../build-wasm):/build -v $(pwd)/worker:/dst -w /build emscripten/emsdk:$npm_package_config_emscripten_version sh -c \"emcmake cmake -DCOMPILE_WASM=on -DWORMHOLE=off /src && emmake make -j2 && cp bergamot-translator-worker.wasm bergamot-translator-worker.js /dst\"",
+    "test": "echo \"Hello world!\" | node main.js"
+  }
+}
--- a/wasm/module/translator.js
+++ b/wasm/module/translator.js
@@ -0,0 +1,879 @@
+/**
+ * @typedef {Object} TranslationRequest
+ * @property {String} from
+ * @property {String} to
+ * @property {String} text
+ * @property {Boolean} html
+ * @property {Integer?} priority
+ */
+
+/**
+ * @typedef {Object} TranslationResponse
+ * @property {TranslationRequest} request
+ * @property {{text: string}} target
+ */
+
+/**
+ * NodeJS compatibility, a thin WebWorker layer around node:worker_threads.
+ */
+if (!(typeof window !== 'undefined' && window.Worker)) {
+    globalThis.Worker = class {
+        #worker;
+
+        constructor(url) {
+            // Load node:worker_threads lazily. A failed import rejects this
+            // promise instead of leaving it pending forever.
+            this.#worker = import(/* webpackIgnore: true */ 'node:worker_threads')
+                .then(({Worker}) => new Worker(url));
+        }
+
+        addEventListener(eventName, callback) {
+            this.#worker.then(worker => worker.on(eventName, (data) => callback({data})));
+        }
+
+        postMessage(message) {
+            this.#worker.then(worker => worker.postMessage(message));
+        }
+
+        terminate() {
+            this.#worker.then(worker => worker.terminate());
+        }
+    }
+}
+
+/**
+ * Thrown when a pending translation is replaced by another newer pending
+ * translation.
+ */
+export class SupersededError extends Error {}
+
+
+/**
+ * Thrown when a translation was removed from the queue.
+ */
+export class CancelledError extends Error {}
+
+
+/**
+ * Wrapper around bergamot-translator loading and model management.
+ */
+export class TranslatorBacking {
+    
+    /**
+     * @param {{
+     *  cacheSize?: number,
+     *  useNativeIntGemm?: boolean,
+     *  downloadTimeout?: number,
+     *  registryUrl?: string,
+     *  pivotLanguage?: string?,
+     *  onerror?: (err: Error) => void
+     * }} options
+     */
+    constructor(options) {
+        this.options = options || {};
+
+        this.registryUrl = this.options.registryUrl || 'https://bergamot.s3.amazonaws.com/models/index.json';
+
+        this.downloadTimeout = 'downloadTimeout' in this.options ? parseInt(this.options.downloadTimeout, 10) : 60000;
+
+        /**
+         * registry of all available models and their urls
+         * @type {Promise<Model[]>}
+         */
+        this.registry = this.loadModelRegistery();
+
+        /**
+         * Map of downloaded model data files as buffers per model.
+         * @type {Map<{from:string,to:string}, Promise<Map<string,ArrayBuffer>>>}
+         */
+        this.buffers = new Map();
+
+        /**
+         * @type {string?}
+         */
+        this.pivotLanguage = 'pivotLanguage' in this.options ? options.pivotLanguage : 'en';
+        
+        /**
+         * A map of language-pairs to a list of models you need for it.
+         * @type {Map<{from:string,to:string}, Promise<{from:string,to:string}[]>>}
+         */
+        this.models = new Map();
+
+        /**
+         * Error handler for all errors that are async, not tied to a specific
+         * call and that are unrecoverable.
+         * @type {(error: Error)}
+         */
+        this.onerror = this.options.onerror || (err => console.error('WASM Translation Worker error:', err));
+    }
+
+    /**
+     * Loads a worker thread, and wraps it in a message passing proxy. I.e. it
+     * exposes the entire interface of TranslationWorker here, and all calls
+     * to it are async. Do note that you can only pass arguments that survive
+     * being copied into a message. 
+     * @return {Promise<{worker:Worker, exports:Proxy<TranslationWorker>}>}
+     */
+    async loadWorker() {
+        const worker = new Worker(new URL('./worker/translator-worker.js', import.meta.url));
+
+        /**
+         * Incremental counter to derive request/response ids from.
+         */
+        let serial = 0;
+
+        /**
+         * Map of pending requests
+         * @type {Map<number,{accept:(any), reject:(Error)}>}
+         */
+        const pending = new Map();
+
+        // Function to send requests
+        const call = (name, ...args) => new Promise((accept, reject) => {
+            const id = ++serial;
+            pending.set(id, {
+                accept,
+                reject,
+                callsite: { // for debugging which call caused the error
+                    message: `${name}(${args.map(arg => String(arg)).join(', ')})`,
+                    stack: new Error().stack
+                }
+            });
+            worker.postMessage({id, name, args});
+        });
+
+        // … receive responses
+        worker.addEventListener('message', function({data: {id, result, error}}) {
+            if (!pending.has(id)) {
+                console.debug('Received message with unknown id:', arguments[0]);
+                throw new Error(`BergamotTranslator received response from worker to unknown call '${id}'`);
+            }
+
+            const {accept, reject, callsite} = pending.get(id);
+            pending.delete(id);
+
+            if (error !== undefined)
+                reject(Object.assign(new Error(), error, {
+                    message: error.message + ` (response to ${callsite.message})`,
+                    stack: error.stack ? `${error.stack}\n${callsite.stack}` : callsite.stack
+                }));
+            else
+                accept(result);
+        });
+
+        // … and general errors
+        worker.addEventListener('error', this.onerror.bind(this));
+
+        // Await initialisation. This will also nicely error out if the WASM
+        // runtime fails to load.
+        await call('initialize', this.options);
+
+        /**
+         * Little wrapper around the message passing api of Worker to make it
+         * easy to await a response to a sent message. This wraps the worker in
+         * a Proxy so you can treat it as if it is an instance of the
+         * TranslationWorker class that lives inside the worker. All function
+         * calls to it are transparently passed through the message passing
+         * channel.
+         */
+        return {
+            worker,
+            exports: new Proxy({}, {
+                get(target, name, receiver) {
+                    // Prevent this object from being marked "then-able"
+                    if (name !== 'then')
+                        return (...args) => call(name, ...args);
+                }
+            })
+        };
+    }
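The request/response bookkeeping in `loadWorker()` can be tried without a real worker. In the sketch below, `FakeWorker` and `wrapWorker` are hypothetical names introduced for illustration: `FakeWorker` stands in for a WebWorker that only knows an `add` method, and `wrapWorker` repeats the serial-id, pending-map and Proxy pattern in a reduced form (no callsite tracking, no `initialize` call).

```javascript
// Hypothetical stand-in for a Worker: answers `add`, fails on anything else.
class FakeWorker {
  #listeners = [];

  addEventListener(type, listener) {
    if (type === 'message')
      this.#listeners.push(listener);
  }

  postMessage({id, name, args}) {
    // Reply asynchronously, like a real worker would.
    queueMicrotask(() => {
      const data = name === 'add'
        ? {id, result: args[0] + args[1]}
        : {id, error: {message: `unknown method ${name}`}};
      this.#listeners.forEach(listener => listener({data}));
    });
  }
}

function wrapWorker(worker) {
  let serial = 0;            // counter to derive request ids from
  const pending = new Map(); // id -> {accept, reject}

  worker.addEventListener('message', ({data: {id, result, error}}) => {
    const entry = pending.get(id);
    if (!entry)
      return; // response to a call we do not know about
    pending.delete(id);
    if (error !== undefined)
      entry.reject(new Error(error.message));
    else
      entry.accept(result);
  });

  const call = (name, ...args) => new Promise((accept, reject) => {
    const id = ++serial;
    pending.set(id, {accept, reject});
    worker.postMessage({id, name, args});
  });

  // Any property access yields a function that forwards the call, except
  // `then`, so that awaiting the proxy itself does not trigger a call.
  return new Proxy({}, {
    get(target, name) {
      if (name !== 'then')
        return (...args) => call(name, ...args);
    }
  });
}
```

With this in place, `await wrapWorker(new FakeWorker()).add(2, 3)` goes through `postMessage` and resolves once the matching response id arrives.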
+
+    /**
+     * Loads the model registry from `this.registryUrl` and reformats it into
+     * a list that is a bit easier to use, and future-proofed to be swapped
+     * out with a TranslateLocally type registry.
+     * @return {Promise<{
+     *   from: string,
+     *   to: string,
+     *   files: {
+     *     [part:string]: {
+     *       name: string,
+     *       size: number,
+     *       expectedSha256Hash: string
+     *     }
+     *   }
+     * }[]>}
+     */
+    async loadModelRegistery() {
+        const response = await fetch(this.registryUrl, {credentials: 'omit'});
+        const registry = await response.json();
+
+        // Add 'from' and 'to' keys for each model.
+        return Array.from(Object.entries(registry), ([key, files]) => {
+            return {
+                from: key.substring(0, 2),
+                to: key.substring(2, 4),
+                files
+            }
+        });
+    }
+
+    /**
+     * Gets or loads translation model data. Caching wrapper around
+     * `loadTranslationModel()`.
+     * @param {{from:string, to:string}}
+     * @return {Promise<{
+     *   model: ArrayBuffer,
+     *   vocabs: ArrayBuffer[],
+     *   shortlist: ArrayBuffer,
+     *   qualityModel: ArrayBuffer?,
+     *   config: Object?
+     * }>}
+     */
+    getTranslationModel({from, to}, options) {
+        const key = JSON.stringify({from, to});
+
+        if (!this.buffers.has(key)) {
+            const promise = this.loadTranslationModel({from, to}, options);
+
+            // set the promise so we return the same promise when it's still pending
+            this.buffers.set(key, promise);
+
+            // But if loading fails, remove the promise again so we can try again later
+            promise.catch(err => this.buffers.delete(key))
+        }
+
+        return this.buffers.get(key);
+    }
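The caching in `getTranslationModel()` stores the promise rather than the result, so concurrent callers share one download, and a failed download is evicted so a later call can retry. A minimal sketch of that pattern, with `cachedLoader` as a hypothetical helper name that is not part of this module:

```javascript
// Hypothetical helper: wraps an async `load(params)` function in a
// promise cache keyed by the JSON form of `params`.
function cachedLoader(load) {
  const cache = new Map();

  return (params) => {
    // Note: JSON.stringify is order-sensitive, so callers should build
    // `params` with a consistent key order.
    const key = JSON.stringify(params);

    if (!cache.has(key)) {
      const promise = load(params);

      // Store the promise itself so calls made while it is still pending
      // get the same promise back.
      cache.set(key, promise);

      // If loading fails, forget the promise so a later call retries.
      promise.catch(() => cache.delete(key));
    }

    return cache.get(key);
  };
}
```

Callers still receive the rejected promise for the failed attempt; only subsequent calls start a fresh load.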
+
+    /**
+     * Downloads a translation model and returns a set of
+     * ArrayBuffers. These can then be passed to a TranslationWorker thread
+     * to instantiate a TranslationModel inside the WASM vm.
+     * @param {{from:string, to:string}}
+     * @param {{signal:AbortSignal?}?}
+     * @return {Promise<{
+     *   model: ArrayBuffer,
+     *   vocabs: ArrayBuffer[],
+     *   shortlist: ArrayBuffer,
+     *   qualityModel: ArrayBuffer?,
+     *   config: Object?
+     * }>}
+     */
+    async loadTranslationModel({from, to}, options) {
+        performance.mark(`loadTranslationModule.${JSON.stringify({from, to})}`);
+
+        // Find that model in the registry which will tell us about its files
+        const entries = (await this.registry).filter(model => model.from == from && model.to == to);
+
+        if (!entries.length)
+            throw new Error(`No model for '${from}' -> '${to}'`);
+
+        const files = entries[0].files;
+
+        // Assigned inside the promise executor below, because that is the
+        // only place where `reject` is in scope.
+        let abort;
+
+        // Promise that resolves (or rejects really) when the abort signal hits
+        const escape = new Promise((accept, reject) => {
+            abort = () => reject(new CancelledError('abort signal'));
+            if (options?.signal)
+                options.signal.addEventListener('abort', abort);
+        });
+
+        // Download all files mentioned in the registry entry. Race the promise
+        // of all fetch requests, and a promise that rejects on the abort signal
+        const buffers = Object.fromEntries(await Promise.race([
+            Promise.all(Object.entries(files).map(async ([part, file]) => {
+                // Special case where qualityModel is not part of the model, and this
+                // should also catch the `config` case.
+                if (file === undefined || file.name === undefined)
+                    return [part, null];
+
+                try {
+                    return [part, await this.fetch(file.name, file.expectedSha256Hash, options)];
+                } catch (cause) {
+                    throw new Error(`Could not fetch ${file.name} for ${from}->${to} model`, {cause});
+                }
+            })),
+            escape
+        ]));
+
+        // Nothing to abort now, clean up abort promise
+        if (options?.signal)
+            options.signal.removeEventListener('abort', abort);
+
+        performance.measure('loadTranslationModel', `loadTranslationModule.${JSON.stringify({from, to})}`);
+
+        let vocabs = [];
+
+        if (buffers.vocab)
+            vocabs = [buffers.vocab]
+        else if (buffers.trgvocab && buffers.srcvocab)
+            vocabs = [buffers.srcvocab, buffers.trgvocab]
+        else
+            throw new Error(`Could not identify vocab files for ${from}->${to} model among: ${Array.from(Object.keys(files)).join(' ')}`);
+
+        let config = {};
+
+        // For the Ukrainian models we need to override the gemm-precision
+        if (files.model.name.endsWith('intgemm8.bin'))
+            config['gemm-precision'] = 'int8shiftAll';
+
+        // If quality estimation is used, we need to turn off skip-cost. Turning
+        // this off causes quite the slowdown.
+        if (files.qualityModel)
+            config['skip-cost'] = false;
+
+        // Allow the registry to also specify marian configuration parameters
+        if (files.config)
+            Object.assign(config, files.config);
+
+        // Translate to generic bergamot-translator format that also supports
+        // separate vocabularies for input & output language, and calls 'lex'
+        // a more descriptive 'shortlist'.
+        return {
+            model: buffers.model,
+            shortlist: buffers.lex,
+            vocabs,
+            qualityModel: buffers.qualityModel,
+            config
+        };
+    }
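`loadTranslationModel()` races the downloads against a promise that rejects when the abort signal fires. The same idea as a stand-alone sketch, where `raceWithAbort` is a hypothetical helper and `CancelledError` is redeclared locally so the snippet runs on its own; unlike the method above it also handles a signal that is already aborted and removes the listener on failure as well as on success:

```javascript
class CancelledError extends Error {}

// Hypothetical helper: settle with `promise`, or reject with a
// CancelledError as soon as `signal` aborts, whichever happens first.
function raceWithAbort(promise, signal) {
  if (!signal)
    return promise;

  let abort;
  const escape = new Promise((accept, reject) => {
    // `reject` is only in scope here, so the handler is created here too.
    abort = () => reject(new CancelledError('abort signal'));
    if (signal.aborted)
      abort();
    else
      signal.addEventListener('abort', abort);
  });

  // Whatever the outcome, stop listening to the signal afterwards.
  return Promise.race([promise, escape])
    .finally(() => signal.removeEventListener('abort', abort));
}
```

Note that the race only stops the caller from waiting; the underlying work has to observe the signal itself (as `fetch()` does through its `signal` option) to actually stop.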
+
+    /**
+     * Helper to download file from the web. Verifies the checksum.
+     * @param {string} url
+     * @param {string?} checksum sha256 checksum as hexadecimal string
+     * @param {{signal:AbortSignal}?} extra fetch options
+     * @returns {Promise<ArrayBuffer>}
+     */
+    async fetch(url, checksum, extra) {
+        // Rig up a timeout cancel signal for our fetch
+        const controller = new AbortController();
+        const abort = () => controller.abort();
+
+        const timeout = this.downloadTimeout ? setTimeout(abort, this.downloadTimeout) : null;
+
+        try {
+            // Also maintain the original abort signal
+            if (extra?.signal)
+                extra.signal.addEventListener('abort', abort);
+
+            const options = {
+                credentials: 'omit',
+                signal: controller.signal,
+            };
+
+            if (checksum)
+                options['integrity'] = `sha256-${this.hexToBase64(checksum)}`;
+
+            // Disable the integrity check for NodeJS because of
+            // https://github.com/nodejs/undici/issues/1594
+            if (typeof window === 'undefined')
+                delete options['integrity'];
+
+            // Start downloading the url, using the hex checksum to ask
+            // `fetch()` to verify the download using subresource integrity 
+            const response = await fetch(url, options);
+
+            // Finish downloading (or crash due to timeout)
+            return await response.arrayBuffer();
+
+        } finally {
+            if (timeout)
+                clearTimeout(timeout);
+
+            if (extra?.signal)
+                extra.signal.removeEventListener('abort', abort);
+        }
+    }
+
+    /**
+     * Converts the hexadecimal hashes from the registry to something we can use with
+     * the fetch() method.
+     */
+    hexToBase64(hexstring) {
+        return btoa(hexstring.match(/\w{2}/g).map(function(a) {
+            return String.fromCharCode(parseInt(a, 16));
+        }).join(""));
+    }
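The conversion exists because the registry stores SHA-256 checksums as hexadecimal strings, while the `integrity` option of `fetch()` expects `sha256-` followed by the base64 form of the same bytes. A small stand-alone sketch (`integrityFor` is a hypothetical helper name; `btoa` is assumed to be available as a global, which holds in browsers and in recent NodeJS versions):

```javascript
// Convert a hexadecimal digest into the base64 encoding of the same bytes.
function hexToBase64(hex) {
  const bytes = hex.match(/[0-9a-f]{2}/gi).map(pair => parseInt(pair, 16));
  return btoa(String.fromCharCode(...bytes));
}

// Hypothetical helper: build the value for fetch()'s `integrity` option.
function integrityFor(hexChecksum) {
  return `sha256-${hexToBase64(hexChecksum)}`;
}

// SHA-256 digest of the empty input, in hex.
const emptyDigest = 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855';

console.log(integrityFor(emptyDigest));
// prints "sha256-47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU="
```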
+
+    /**
+     * Poorly named method that gives you a list of models to translate from
+     * one language into the other. Generally this will be the same as you
+     * just put in if there is a direct model, but it could return a list of
+     * two models if you need to pivot through a third language.
+     * Returns just [{from:str,to:str}...]. To be used something like this:
+     * ```
+     * const models = await this.getModels({from, to});
+     * for (const {from, to} of models) {
+     *   const buffers = await this.loadTranslationModel({from,to});
+     *   [TranslationWorker].loadTranslationModel({from,to}, buffers)
+     * }
+     * ```
+     * @returns {Promise<TranslationModel[]>}
+     */
+    getModels({from, to}) {
+        const key = JSON.stringify({from, to});
+
+        // Note that the `this.models` map stores Promises. This so that
+        // multiple calls to `getModels` that ask for the same model will
+        // return the same promise, and the actual lookup is only done once.
+        // The lookup is async because we need to await `this.registry`
+        if (!this.models.has(key))
+            this.models.set(key, this.findModels(from, to));
+
+        return this.models.get(key);
+    }
+
+    /**
+     * Find model (or model pair) to translate from `from` to `to`.
+     * @param {string} from
+     * @param {string} to
+     * @returns {Promise<TranslationModel[]>}
+     */
+    async findModels(from, to) {
+        const registry = await this.registry;
+
+        let direct = [], outbound = [], inbound = [];
+
+        registry.forEach(model => {
+            if (model.from === from && model.to === to)
+                direct.push(model);
+            else if (model.from === from && model.to === this.pivotLanguage)
+                outbound.push(model);
+            else if (model.to === to && model.from === this.pivotLanguage)
+                inbound.push(model);
+        });
+
+        if (direct.length)
+            return [direct[0]];
+
+        if (outbound.length && inbound.length)
+            return [outbound[0], inbound[0]];
+
+        throw new Error(`No model available to translate from '${from}' to '${to}'`);
+    }
+}
+
+/**
+ * Translator balancing between throughput and latency. Can use multiple worker
+ * threads.
+ */
+export class BatchTranslator {
+    /**
+     * @param {{
+     *  cacheSize?: number,
+     *  useNativeIntGemm?: boolean,
+     *  workers?: number,
+     *  batchSize?: number,
+     *  downloadTimeout?: number,
+     *  workerUrl?: string,
+     *  registryUrl?: string,
+     *  pivotLanguage?: string?,
+     *  onerror?: (err: Error) => void
+     * }} options
+     */
+    constructor(options, backing) {
+        if (!backing)
+            backing = new TranslatorBacking(options);
+
+        this.backing = backing;
+
+        /**
+         * @type {Array<{idle:boolean, worker:Worker, exports:Proxy}>} List of active workers
+         * (and a flag to mark them idle or not)
+         */
+        this.workers = [];
+
+        /**
+         * Maximum number of workers
+         * @type {number} 
+         */
+        this.workerLimit = Math.max(options?.workers || 0, 1);
+
+        /**
+         * List of batches we push() to & shift() from using `enqueue`.
+         * @type {{
+         *    id: number,
+         *    key: string,
+         *    priority: number,
+         *    models: TranslationModel[],
+         *    requests: Array<{
+         *      request: TranslationRequest,
+         *      resolve: (response: TranslationResponse),
+         *      reject: (error: Error)
+         *    }>
+         * }[]}
+         */
+        this.queue = [];
+
+        /**
+         * Batch serial number, to help keep track of batches when debugging
+         * @type {Number}
+         */
+        this.batchSerial = 0;
+
+        /**
+         * Number of requests in a batch before it is ready to be translated in
+         * a single call. Bigger is better for throughput (better matrix packing)
+         * but worse for latency since you'll have to wait for the entire batch
+         * to be translated.
+         * @type {Number}
+         */
+        this.batchSize = Math.max(options?.batchSize || 8, 1);
+
+        this.onerror = options?.onerror || (err => console.error('WASM Translation Worker error:', err));
+    }
+    
+    /**
+     * Destructor that stops and cleans up.
+     */
+    async delete() {
+        // Empty the queue
+        this.remove(() => true);
+
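+        // Terminate the workers. A worker that is still loading (or failed
+        // to load) is only a placeholder without a `worker` property.
+        this.workers.forEach(({worker}) => worker?.terminate());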
+    }
+
+    /**
+     * Makes sure queued work gets sent to a worker. Defers the actual work
+     * with `setTimeout()` so batches have a chance to fill up first. Will keep
+     * calling itself as long as there is work in the queue, but it does not
+     * hurt to call it multiple times. This function always returns immediately.
+     */
+    notify() {
+        setTimeout(async () => {
+            // Is there work to be done?
+            if (!this.queue.length)
+                return;
+
+            // Find an idle worker
+            let worker = this.workers.find(worker => worker.idle);
+
+            // No worker free, but space for more?
+            if (!worker && this.workers.length < this.workerLimit) {
+                try {
+                    // Claim a place in the workers array (but mark it busy so
+                    // it doesn't get used by any other `notify()` calls).
+                    const placeholder = {idle: false};
+                    this.workers.push(placeholder);
+
+                    // adds `worker` and `exports` props
+                    Object.assign(placeholder, await this.backing.loadWorker());
+
+                    // At this point we know our new worker will be usable.
+                    worker = placeholder;
+                } catch (e) {
+                    this.onerror(new Error(`Could not initialise translation worker: ${e.message}`));
+                }
+            }
+
+            // If no worker, that's the end of it.
+            if (!worker)
+                return;
+
+            // Claim a batch. We may have awaited `loadWorker()` above, so
+            // another `notify()` call could have emptied the queue since the
+            // check at the beginning of this function. Release the worker if
+            // there is nothing left to do.
+            const batch = this.queue.shift();
+            if (!batch) {
+                worker.idle = true;
+                return;
+            }
+
+            // Put this worker to work, marking as busy
+            worker.idle = false;
+            try {
+                await this.consumeBatch(batch, worker.exports);
+            } catch (e) {
+                batch.requests.forEach(({reject}) => reject(e));
+            }
+            worker.idle = true;
+
+            // Is there more work to be done? Schedule another round.
+            if (this.queue.length)
+                this.notify();
+        });
+    }
+
+    /**
+     * The only real public call you need!
+     * ```
+     * const {target: {text:string}} = await this.translate({
+     *   from: 'de',
+     *   to: 'en',
+     *   text: 'Hallo Welt!',
+     *   html: false, // optional
+     *   priority: 0 // optional, like `nice` lower numbers are translated first
+     * })
+     * ```
+     * @param {TranslationRequest} request
+     * @returns {Promise<TranslationResponse>}
+     */
+    translate(request) {
+        const {from, to, priority} = request;
+
+        return new Promise(async (resolve, reject) => {
+            try {
+                // Batching key: only requests with the same key can be batched
+                // together. Think same translation model, same options.
+                const key = JSON.stringify({from, to});
+
+                // (Fetching models first because if we would do it between looking
+                // for a batch and making a new one, we end up with a race condition.)
+                const models = await this.backing.getModels(request);
+                
+                // Put the request and its callbacks into a fitting batch
+                this.enqueue({key, models, request, resolve, reject, priority});
+
+                // Tell a worker to pick up the work at some point.
+                this.notify();
+            } catch (e) {
+                reject(e);
+            }
+        });
+    }
+
+    /**
+     * Prune pending requests by testing whether each of them is still
+     * relevant. Used to prune translation requests from tabs that got
+     * closed.
+     * @param {(request:TranslationRequest) => boolean} filter evaluates to true if request should be removed
+     */
+    remove(filter) {
+        const queue = this.queue;
+
+        this.queue = [];
+
+        queue.forEach(batch => {
+            batch.requests.forEach(({request, resolve, reject}) => {
+                if (filter(request)) {
+                    // Add error.request property to match response.request for
+                    // a resolve() callback. Pretty useful if you don't want to
+                    // do all kinds of Function.bind() dances.
+                    reject(Object.assign(new CancelledError('removed by filter'), {request}));
+                    return;
+                }
+
+                this.enqueue({
+                    key: batch.key,
+                    priority: batch.priority,
+                    models: batch.models,
+                    request,
+                    resolve,
+                    reject
+                });
+            });
+        });
+    }
+
+    /**
+     * Internal function used to put a request in a batch that still has space.
+     * Also responsible for keeping the batches in order of priority. Called by
+     * `translate()` but also used when filtering pending requests.
+     * @param {{request:TranslationRequest, models:TranslationModel[], key:String, priority:Number?, resolve:(TranslationResponse)=>any, reject:(Error)=>any}}
+     */
+    enqueue({key, models, request, resolve, reject, priority}) {
+        if (priority === undefined)
+            priority = 0;
+        // Find a batch in the queue that we can add to
+        // (TODO: can we search backwards? that would speed things up)
+        let batch = this.queue.find(batch => {
+            return batch.key === key
+                && batch.priority === priority
+                && batch.requests.length < this.batchSize;
+        });
+
+        // No batch or full batch? Queue up a new one
+        if (!batch) {
+            batch = {id: ++this.batchSerial, key, priority, models, requests: []};
+            this.queue.push(batch);
+            this.queue.sort((a, b) => a.priority - b.priority);
+        }
+
+        batch.requests.push({request, resolve, reject});
+    }
+
+    /**
+     * Internal method that uses a worker thread to process a batch. You can
+     * wait for the batch to be done by awaiting this call. Only reuse the
+     * worker after that, otherwise you'll just clog up its message queue.
+     */
+    async consumeBatch(batch, worker) {
+        performance.mark('BergamotBatchTranslator.start');
+
+        // Make sure the worker has all necessary models loaded. If not, tell it
+        // first to load them.
+        await Promise.all(batch.models.map(async ({from, to}) => {
+            if (!await worker.hasTranslationModel({from, to})) {
+                const buffers = await this.backing.getTranslationModel({from, to});
+                await worker.loadTranslationModel({from, to}, buffers);
+            }
+        }));
+
+        // Call the worker to translate. Only sending the actually necessary
+        // parts of the batch to avoid trying to send things that don't survive
+        // the message passing API between this thread and the worker thread.
+        const responses = await worker.translate({
+            models: batch.models.map(({from, to}) => ({from, to})),
+            texts: batch.requests.map(({request: {text, html, qualityScores}}) => ({
+                text: text.toString(),
+                html: !!html,
+                qualityScores: !!qualityScores
+            }))
+        });
+
+        // Responses are in! Connect them back to their requests and call their
+        // callbacks.
+        batch.requests.forEach(({request, resolve, reject}, i) => {
+            // TODO: look at response.ok and reject() if it is false
+            resolve({
+                request, // Include request for easy reference? Will allow you
+                         // to specify custom properties and use that to link
+                         // request & response back to each other.
+                ...responses[i] // {target: {text: String}}
+            });
+        });
+        
+        performance.measure('BergamotBatchTranslator', 'BergamotBatchTranslator.start');
+    }
+}
+
+
+/**
+ * Translator optimised for interactive use.
+ */
+export class LatencyOptimisedTranslator {
+    /**
+     * @type {TranslatorBacking}
+     */
+    backing;
+
+    /**
+     * @type {Promise<{idle:boolean, worker:Worker, exports:Proxy<TranslationWorker>}>}
+     */
+    worker;
+
+    /**
+     * @type {{request: TranslationRequest, accept:(TranslationResponse), reject:(Error)} | null}
+     */
+    pending;
+
+    /**
+     * @param {{
+     *  cacheSize?: number,
+     *  useNativeIntGemm?: boolean,
+     *  downloadTimeout?: number,
+     *  workerUrl?: string,
+     *  registryUrl?: string,
+     *  pivotLanguage?: string?
+     * }} options
+     */
+    constructor(options, backing) {
+        if (!backing)
+            backing = new TranslatorBacking(options);
+
+        this.backing = backing;
+
+        // Exposing the this.loadWorker() returned promise through this.worker
+        // so that you can use that to catch any errors that happened during
+        // loading.
+        this.worker = this.backing.loadWorker().then(worker => ({...worker, idle:true}));
+    }
+
+    /**
+     * Destructor that stops and cleans up.
+     */
+    async delete() {
+        // Cancel pending translation
+        if (this.pending) {
+            this.pending.reject(new CancelledError('translator got deleted'));
+            this.pending = null;
+        }
+
+        // Terminate the worker. Errors (e.g. the worker never loaded) are
+        // deliberately ignored.
+        try {
+            const {worker} = await this.worker;
+            worker.terminate();
+        } catch (e) {
+            // Nothing to terminate
+        } finally {
+            this.worker = null;
+        }
+    }
+    
+    /**
+     * Sets `request` as the next translation to process. If there was already
+     * a translation waiting to be processed, its promise is rejected with a
+     * SupersededError.
+     * @param {TranslationRequest} request
+     * @param {{signal?: AbortSignal}} [options] abort signal to cancel this request
+     * @return {Promise<TranslationResponse>}
+     */
+    translate(request, options) {
+        if (this.pending)
+            this.pending.reject(new SupersededError());
+        
+        return new Promise((accept, reject) => {
+            const pending = {request, accept, reject, options};
+
+            if (options?.signal) {
+                options.signal.addEventListener('abort', e => {
+                    reject(new CancelledError('abort signal'));
+                    if (this.pending === pending)
+                        this.pending = null;
+                });
+            }
+
+            this.pending = pending;
+            this.notify();
+        });
+    }
+    
+    notify() {
+        setTimeout(async () => {
+            if (!this.pending)
+                return;
+
+            // Catch errors such as the worker not working
+            try {
+                // Possibly wait for the worker to finish loading. After it loaded
+                // these calls are pretty much instantaneous.
+                const worker = await this.worker;
+
+                // Is another notify() call hogging the worker? Then stop.
+                if (!worker.idle)
+                    return;
+
+                // Claim the pending translation request.
+                const {request, accept, reject, options} = this.pending;
+                this.pending = null;
+
+                // Mark the worker as occupied
+                worker.idle = false;
+                    
+                try {
+                    const models = await this.backing.getModels(request);
+
+                    await Promise.all(models.map(async ({from, to}) => {
+                        if (!await worker.exports.hasTranslationModel({from, to})) {
+                            const buffers = await this.backing.getTranslationModel({from, to}, {signal: options?.signal});
+                            await worker.exports.loadTranslationModel({from, to}, buffers);
+                        }
+                    }));
+
+                    const {text, html, qualityScores} = request;
+                    const responses = await worker.exports.translate({
+                        models: models.map(({from,to}) => ({from, to})),
+                        texts: [{text, html, qualityScores}]
+                    });
+
+                    accept({request, ...responses[0]});
+                } catch (e) {
+                    reject(e);
+                }
+
+                worker.idle = true;
+
+                // Is there more work to be done? Schedule another round.
+                if (this.pending)
+                    this.notify();
+            } catch (e) {
+                this.backing.onerror(e);
+            }
+        });
+    }
+}
--- a/wasm/module/worker/package.json
+++ b/wasm/module/worker/package.json
@@ -0,0 +1,3 @@
+{
+	"type": "commonjs"
+}
--- a/wasm/module/worker/translator-worker.js
+++ b/wasm/module/worker/translator-worker.js
@@ -0,0 +1,475 @@
+/**
+ * Wrapper around the dirty bits of Bergamot's WASM bindings.
+ */
+
+// Global because importScripts is global.
+var Module = {};
+
+/**
+ * node.js compatibility: Fake GlobalWorkerScope that emulates being inside a
+ * WebWorker
+ */
+if (typeof self === 'undefined') {
+    global.Module = Module;
+
+    global.self = new class GlobalWorkerScope {
+        /** @type {import("node:worker_threads").MessagePort} */
+        #port;
+
+        constructor() {
+            const {parentPort} = require(/* webpackIgnore: true */ 'node:worker_threads');
+            this.#port = parentPort;
+        }
+
+        /**
+         * Add event listener to listen for messages posted to the worker.
+         * @param {string} eventName
+         * @param {(object)} callback
+         */
+        addEventListener(eventName, callback) {
+            this.#port.on(eventName, (data) => callback({data}));
+        }
+
+        /**
+         * Post message outside, to the owner of the Worker.
+         * @param {any} message
+         */
+        postMessage(message) {
+            this.#port.postMessage(message);
+        }
+
+        /**
+         * @param {...string} scripts - Paths to scripts to import in that order
+         */
+        importScripts(...scripts) {
+            const {readFileSync} = require(/* webpackIgnore: true */ 'node:fs');
+            const {join} = require(/* webpackIgnore: true */ 'node:path');
+            for (let pathname of scripts) {
+                const script = readFileSync(join(__dirname, pathname), {encoding: 'utf-8'});
+                eval.call(global, script);
+            }
+        }
+
+        /**
+         * Adds support for local file urls. `URL` objects with the `file:`
+         * protocol are read from disk, anything else is passed to `fetch()`.
+         * @param {URL | string} url - `file:` url, or anything `fetch()` accepts
+         * @param {object?} options - See `fetch()` options
+         * @return {Promise<Response>}
+         */
+        async fetch(url, options) {
+            if (url.protocol === 'file:') {
+                const {readFile} = require(/* webpackIgnore: true */ 'node:fs/promises');
+                const buffer = await readFile(url.pathname);
+                const blob = new Blob([buffer]);
+                return new Response(blob, {
+                    status: 200,
+                    statusText: 'OK',
+                    headers: {
+                        'Content-Type': 'application/wasm',
+                        'Content-Length': blob.size.toString()
+                    }
+                });
+            }
+
+            return await fetch(url, options);
+        }
+
+        get location() {
+            return new URL(`file://${__filename}`);
+        }
+    }
+}
+
+class YAML {
+    /**
+     * Parses YAML into dictionary. Does not interpret types, all values are a
+     * string or a list of strings. No support for objects other than the top
+     * level.
+     * @param {string} yaml
+     * @return {{[string]: string | string[]}}
+     */
+    static parse(yaml) {
+        const out = {};
+
+        yaml.split('\n').reduce((key, line, i) => {
+            let match;
+            if (match = line.match(/^\s*-\s+(.+?)$/)) {
+                if (!Array.isArray(out[key]))
+                    out[key] = out[key].trim() ? [out[key]] : [];
+                out[key].push(match[1].trim());
+            }
+            else if (match = line.match(/^\s*([A-Za-z0-9_][A-Za-z0-9_-]*):\s*(.*)$/)) {
+                key = match[1];
+                out[key] = match[2].trim();
+            }
+            else if (!line.trim()) {
+                // whitespace, ignore
+            }
+            else {
+                throw Error(`Could not parse line ${i+1}: "${line}"`);
+            }
+            return key;
+        }, null);
+
+        return out;
+    }
+
+    /**
+     * Turns an object into a YAML string. No support for nested objects, only simple
+     * types and lists of simple types.
+     * @param {{[string]: string | number | boolean | string[]}} data
+     * @return {string}
+     */
+    static stringify(data) {
+        return Object.entries(data).reduce((str, [key, value]) => {
+            let valstr = '';
+            if (Array.isArray(value))
+                valstr = value.map(val => `\n  - ${val}`).join('');
+            else if (typeof value === 'number' || typeof value === 'boolean' || value.match(/^\d*(\.\d+)?$/))
+                valstr = `${value}`;
+            else
+                valstr = `${value}`; // Quote?
+
+            return `${str}${key}: ${valstr}\n`;
+        }, '');
+    }
+}
+
+/**
+ * Wrapper around the bergamot-translator exported module that hides the need
+ * of working with C++ style data structures and does model management.
+ */
+class BergamotTranslatorWorker {
+    /**
+     * Map of expected symbol -> name of fallback symbol for functions that can
+     * be swizzled for a faster implementation. Firefox Nightly makes use of
+     * this.
+     */
+    static GEMM_TO_FALLBACK_FUNCTIONS_MAP = {
+        'int8_prepare_a': 'int8PrepareAFallback',
+        'int8_prepare_b': 'int8PrepareBFallback',
+        'int8_prepare_b_from_transposed': 'int8PrepareBFromTransposedFallback',
+        'int8_prepare_b_from_quantized_transposed': 'int8PrepareBFromQuantizedTransposedFallback',
+        'int8_prepare_bias': 'int8PrepareBiasFallback',
+        'int8_multiply_and_add_bias': 'int8MultiplyAndAddBiasFallback',
+        'int8_select_columns_of_b': 'int8SelectColumnsOfBFallback'
+    };
+
+    /**
+     * Name of module exported by Firefox Nightly that exports an optimised
+     * implementation of the symbols mentioned above.
+     */
+    static NATIVE_INT_GEMM = 'mozIntGemm';
+
+    /**
+     * Empty because we can't do async constructors yet. It is the
+     * responsibility of whoever owns this WebWorker to call `initialize()`.
+     */
+    constructor(options) {}
+
+    /**
+     * Instantiates a new translation worker with optional options object.
+     * If this call succeeds, the WASM runtime is loaded and ready.
+     * 
+     * Available options are:
+     *   useNativeIntGemm: {true | false} defaults to false. If true, it will
+     *                     attempt to link to the intgemm module available in
+     *                     Firefox Nightly which makes translations much faster.
+     *          cacheSize: {Number} defaults to 0 which disables translation
+     *                     cache entirely. Note that this is a theoretical
+     *                     upper bound. In practice it will use about a third of
+     *                     the cache specified here. 2^14 is not a bad starting
+     *                     value.
+     * @param {{useNativeIntGemm: boolean, cacheSize: number}} options
+     */
+    async initialize(options) {
+        this.options = options || {};
+        this.models = new Map(); // Map<string, TranslationModel>
+        this.module = await this.loadModule();
+        this.service = await this.loadTranslationService();
+    }
+
+    /**
+     * Tries to load native IntGEMM module for bergamot-translator. If that
+     * fails because it or any of the expected functions is not available, it
+     * falls back to using the naive implementations that come with the wasm
+     * binary itself through `linkFallbackIntGemm()`.
+     * @param {{env: {memory: WebAssembly.Memory}}} info
+     * @return {{[method:string]: (...any) => any}}
+     */
+    linkNativeIntGemm(info) {
+        if (!WebAssembly['mozIntGemm']) {
+            console.warn('Native gemm requested but not available, falling back to embedded gemm');
+            return this.linkFallbackIntGemm(info);
+        }
+
+        const instance = new WebAssembly.Instance(WebAssembly['mozIntGemm'](), {
+            '': {memory: info['env']['memory']}
+        });
+
+        if (!Array.from(Object.keys(BergamotTranslatorWorker.GEMM_TO_FALLBACK_FUNCTIONS_MAP)).every(fun => instance.exports[fun])) {
+            console.warn('Native gemm is missing expected functions, falling back to embedded gemm');
+            return this.linkFallbackIntGemm(info);
+        }
+
+        return instance.exports;
+    }
+
+    /**
+     * Links intgemm functions that are already available in the wasm binary,
+     * but just exports them under the name that is expected by
+     * bergamot-translator.
+     * @param {{env: {memory: WebAssembly.Memory}}} info
+     * @return {{[method:string]: (...any) => any}}
+     */
+    linkFallbackIntGemm(info) {
+        const mapping = Object.entries(BergamotTranslatorWorker.GEMM_TO_FALLBACK_FUNCTIONS_MAP).map(([key, name]) => {
+            return [key, (...args) => Module['asm'][name](...args)]
+        });
+
+        return Object.fromEntries(mapping);
+    }
+
+    /**
+     * Internal method. Reads and instantiates the WASM binary. Returns a
+     * promise for the exported Module object that contains all the classes
+     * and functions exported by bergamot-translator.
+     * @return {Promise<BergamotTranslator>}
+     */
+    loadModule() {
+        return new Promise(async (resolve, reject) => {
+            try {
+                const response = await self.fetch(new URL('./bergamot-translator-worker.wasm', self.location));
+
+                Object.assign(Module, {
+                    instantiateWasm: (info, accept) => {
+                        try {
+                            WebAssembly.instantiateStreaming(response, {
+                                ...info,
+                                'wasm_gemm': this.options.useNativeIntGemm
+                                    ? this.linkNativeIntGemm(info)
+                                    : this.linkFallbackIntGemm(info)
+                            }).then(({instance}) => accept(instance)).catch(reject);
+                        } catch (err) {
+                            reject(err);
+                        }
+                        return {};
+                    },
+                    onRuntimeInitialized: () => {
+                        resolve(Module);
+                    }
+                });
+
+                // Emscripten glue code. Webpack et al. should not mangle the `Module` property name!
+                self.Module = Module;
+                self.importScripts('bergamot-translator-worker.js');
+            } catch (err) {
+                reject(err);
+            }
+        });
+    }
+
+    /**
+     * Internal method. Instantiates a BlockingService()
+     * @return {BergamotTranslator.BlockingService}
+     */
+    loadTranslationService() {
+        return new this.module.BlockingService({
+            cacheSize: Math.max(this.options.cacheSize || 0, 0)
+        });
+    }
+
+    /**
+     * Returns whether a model has already been loaded in this worker. Not
+     * async itself; callers get a promise via the message passing interface.
+     * @param {{from:string, to:string}}
+     * @return {boolean}
+     */
+    hasTranslationModel({from,to}) {
+        const key = JSON.stringify({from,to});
+        return this.models.has(key);
+    }
+
+    /**
+     * Loads a translation model from a set of file buffers. After this, the
+     * model is available to translate with and `hasTranslationModel()` will
+     * return true for this pair.
+     * @param {{from:string, to:string}}
+     * @param {{
+     *   model: ArrayBuffer,
+     *   shortlist: ArrayBuffer,
+     *   vocabs: ArrayBuffer[],
+     *   qualityModel: ArrayBuffer?,
+     *   config?: {
+     *     [key:string]: string
+     *   }
+     * }} buffers
+     */ 
+    loadTranslationModel({from, to}, buffers) {
+        // This because service_bindings.cpp:prepareVocabsSmartMemories :(
+        const uniqueVocabs = buffers.vocabs.filter((vocab, index, vocabs) => {
+            return !vocabs.slice(0, index).includes(vocab);
+        });
+
+        const [modelMemory, shortlistMemory, qualityModel, ...vocabMemory] = [
+            this.prepareAlignedMemoryFromBuffer(buffers.model, 256),
+            this.prepareAlignedMemoryFromBuffer(buffers.shortlist, 64),
+            buffers.qualityModel // optional quality model
+                ? this.prepareAlignedMemoryFromBuffer(buffers.qualityModel, 64)
+                : null,
+            ...uniqueVocabs.map(vocab => this.prepareAlignedMemoryFromBuffer(vocab, 64))
+        ];
+
+        const vocabs = new this.module.AlignedMemoryList();
+        vocabMemory.forEach(vocab => vocabs.push_back(vocab));
+
+        // Defaults
+        let modelConfig = YAML.parse(`
+            beam-size: 1
+            normalize: 1.0
+            word-penalty: 0
+            cpu-threads: 0
+            gemm-precision: int8shiftAlphaAll
+            skip-cost: true
+        `);
+
+        if (buffers.config)
+            Object.assign(modelConfig, buffers.config);
+
+        // WASM marian is only compiled with support for shiftedAll.
+        if (modelConfig['gemm-precision'] === 'int8')
+            modelConfig['gemm-precision'] = 'int8shiftAll';
+
+        // Override these
+        Object.assign(modelConfig, YAML.parse(`
+            alignment: soft
+            quiet: true
+            quiet-translation: true
+            max-length-break: 128
+            mini-batch-words: 1024
+            workspace: 128
+            max-length-factor: 2.0
+        `));
+
+        const key = JSON.stringify({from,to});
+        this.models.set(key, new this.module.TranslationModel(YAML.stringify(modelConfig), modelMemory, shortlistMemory, vocabs, qualityModel));
+    }
+
+    /**
+     * Frees up memory used by old translation model. Does nothing if model is
+     * already deleted.
+     * @param {{from:string, to:string}}
+     */
+    freeTranslationModel({from, to}) {
+        const key = JSON.stringify({from,to});
+        
+        if (!this.models.has(key))
+            return;
+        
+        const model = this.models.get(key);
+        this.models.delete(key);
+
+        model.delete();
+    }
+
+    /**
+     * Internal function. Copies the data from an ArrayBuffer into memory that
+     * can be used inside the WASM vm by Marian.
+     * @param {ArrayBuffer} buffer
+     * @param {number} alignmentSize
+     * @return {BergamotTranslator.AlignedMemory}
+     */
+    prepareAlignedMemoryFromBuffer(buffer, alignmentSize) {
+        const bytes = new Int8Array(buffer);
+        const memory = new this.module.AlignedMemory(bytes.byteLength, alignmentSize);
+        memory.getByteArrayView().set(bytes);
+        return memory;
+    }
+
+    /**
+     * Public. Does actual translation work. You have to make sure that the
+     * models necessary for translating text are already loaded before calling
+     * this method. Returns the translation responses in the order of `texts`.
+     * @param {{models: {from:string, to:string}[], texts: {text: string, html: boolean, qualityScores: boolean}[]}}
+     * @return {{target: {text: string}}[]}
+     */
+    translate({models, texts}) {
+        // Convert texts array into a std::vector<std::string>.
+        let input = new this.module.VectorString();
+        texts.forEach(({text}) => input.push_back(text));
+
+        // Extracts the per-text options (html, qualityScores) into ResponseOptions objects
+        let options = new this.module.VectorResponseOptions();
+        texts.forEach(({html, qualityScores}) => options.push_back({alignment: false, html, qualityScores}));
+
+        // Turn our model names into a list of TranslationModel pointers
+        const translationModels = models.map(({from,to}) => {
+            const key = JSON.stringify({from,to});
+            return this.models.get(key);
+        });
+
+        // translate the input, which is a vector<String>; the result is a vector<Response>
+        const responses = models.length > 1
+            ? this.service.translateViaPivoting(...translationModels, input, options)
+            : this.service.translate(...translationModels, input, options);
+        
+        input.delete();
+        options.delete();
+
+        // Convert the Response WASM wrappers into native JavaScript types we
+        // can send over the 'wire' (message passing) in the same format as we
+        // use in bergamot-translator.
+        const translations = texts.map((_, i) => ({
+            target: {
+                text: responses.get(i).getTranslatedText()
+            }
+        }));
+
+        responses.delete();
+
+        return translations;
+    }
+}
+
+/**
+ * Copies an Error into a generic object, because you can't put an Error
+ * object in a message but you can post a generic object.
+ * @param {Error} error
+ * @return {{
+ *  name: string?,
+ *  message: string?,
+ *  stack: string?
+ * }}
+ */
+function cloneError(error) {
+    return {
+        name: error.name,
+        message: error.message,
+        stack: error.stack
+    };
+}
+
+// (The constructor doesn't really do anything; `initialize()` has to be called
+// before the worker can be used. That happens from outside the worker.)
+const worker = new BergamotTranslatorWorker();
+
+self.addEventListener('message', async function({data: {id, name, args}}) {
+    if (!id)
+        console.error('Received message without id', arguments[0]);
+
+    try {
+        if (typeof worker[name] !== 'function')
+            throw TypeError(`worker[${name}] is not a function`);
+
+        // Using `Promise.resolve` to await any promises that worker[name]
+        // possibly returns.
+        const result = await Promise.resolve(Reflect.apply(worker[name], worker, args));
+        self.postMessage({id, result});
+    } catch (error) {
+        self.postMessage({
+            id,
+            error: cloneError(error)
+        })
+    }
+});
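
A note on the message protocol: the listener above expects `{id, name, args}` and replies with either `{id, result}` or `{id, error}`. A minimal sketch of how a caller might pair replies with requests follows. It assumes a hypothetical `createClient` helper and any `port` object that has `postMessage` and `addEventListener` (such as a `Worker`); it is an illustration, not the package's actual client code.

```javascript
// Hypothetical caller-side helper: pairs each request with the reply that
// carries the same id.
function createClient(port) {
  const pending = new Map(); // id -> {resolve, reject}
  let serial = 0;

  port.addEventListener('message', ({data: {id, result, error}}) => {
    const request = pending.get(id);
    if (request === undefined)
      return; // Reply to an unknown or already settled request

    pending.delete(id);

    if (error !== undefined)
      // Rebuild an Error from the plain object made by cloneError()
      request.reject(Object.assign(new Error(error.message), error));
    else
      request.resolve(result);
  });

  return function call(name, ...args) {
    return new Promise((resolve, reject) => {
      // Ids start at 1 because the worker treats a falsy id as missing
      const id = ++serial;
      pending.set(id, {resolve, reject});
      port.postMessage({id, name, args});
    });
  };
}
```

With a real worker this could be used as `const call = createClient(worker); await call('translate', {models, texts});`, where the names passed to `call` have to match methods on the worker object.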
--- a/wasm/test_page/css/index.css
+++ b/wasm/test_page/css/index.css
@@ -14,20 +14,33 @@ body {
  padding: 1rem;
 }

+[hidden] {
+  display: none;
+}
+
 .app {
  padding: 1rem;
  display: grid;
-  grid: "from swap to" 1fr "status status status" auto / 1fr auto 1fr;
+  grid: "from swap to" auto "credits credits credits" min-content / 1fr auto 1fr;
  grid-gap: 1rem;
  overflow: hidden;
-  min-height: 400px;
+  min-height: 100%;
  max-width: 1024px;
-  margin: 1em auto;
+  margin: 0 auto;
+}
+
+.swap::before {
+  display: inline-block;
+  content: '↔️';
 }

@media screen and (max-width: 640px) {
  .app {
-    grid: "from from" auto "status swap" auto "to to" auto / 1fr;
+    grid: "from from" auto "swap swap" auto "to to" auto "credits credits" auto / 1fr;
+  }
+
+  .swap::before {
+    content: '↕️';
  }
 }

@@ -35,6 +48,8 @@ body {
  display: grid;
  grid-template-rows: auto 1fr;
  grid-gap: 1rem;
+  max-height: 100%;
+  overflow: hidden;
 }

 label {
@@ -67,19 +82,25 @@ label {
  font-size: 1.1rem;
 }

-#status {
-  grid-area: status;
-  text-align: center;
-  align-self: center;
+.credits {
+  grid-area: credits;
 }

-textarea, .output-area {
+.credits img {
+  float: left;
+  margin: 1em 0;
+}
+
+textarea, [contenteditable], .output-area {
  padding: 1rem;
  font-family: sans-serif;
  font-size: 1rem;
  resize: none;
  border-radius: 2px;
  border: 1px solid #ccc;
+  min-height: 100px;
+  max-height: 100%;
+  overflow: auto;
 }

 button {
@@ -96,6 +117,7 @@ button:hover {

 #output {
  background-color: #f4f4f4;
+  position: relative;
 }

 .output-area [x-bergamot-word-score].bad {
@@ -115,4 +137,32 @@ button:hover {

 .output-area [x-bergamot-sentence-index].highlight-sentence {
  background: rgba(255, 255, 128, 0.8);
-}
+}
+
+.app.translating #output::after {
+  position: absolute;
+  bottom: 4px;
+  right: 4px;
+  content: 'Translating…';
+}
+
+/* Loading indicator takes priority, so it is placed below the .translating selector */
+.app.loading #output::after {
+  position: absolute;
+  bottom: 4px;
+  right: 4px;
+  content: 'Loading translation model…';
+}
+
+.app {
+  position: relative;
+}
+
+#unsupported-browser {
+  position: absolute;
+  top: 0;
+  left: 0;
+  width: 100%;
+  height: 100%;
+  background: white;
+}
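
A note on the `.bad` class styled in this stylesheet: it is toggled by `addQualityIndicators()` in `js/index.js`, which compares each score against ln(0.5) and shows `Math.exp(score)` in the tooltip. That suggests the scores are natural-log probabilities. Under that assumption, the arithmetic can be sketched as follows; `describeScore` is a hypothetical helper used only for illustration and is not part of the demo.

```javascript
// Assumption: scores are natural-log probabilities, so ln(0.5) marks 50%.
const THRESHOLD = Math.log(0.5);

// Hypothetical helper mirroring what the demo does per element.
function describeScore(logProbability) {
  return {
    // Same formatting as the tooltip text in the demo
    probability: Math.exp(logProbability).toFixed(2),
    // Same comparison as the one that toggles the `bad` class
    bad: logProbability < THRESHOLD
  };
}

console.log(describeScore(Math.log(0.25))); // { probability: '0.25', bad: true }
console.log(describeScore(0));              // { probability: '1.00', bad: false }
```

A score exactly at the threshold is not marked as bad, because the comparison is a strict less-than.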
--- a/wasm/test_page/index.html
+++ b/wasm/test_page/index.html
@@ -1,7 +1,7 @@
 <!DOCTYPE html>
 <html>
  <head>
-    <title>Mozilla Translations</title>
+    <title>Bergamot Translations</title>
    <link rel="stylesheet" href="css/index.css" />
    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
    <meta
@@ -16,9 +16,9 @@
          From
          <select id="lang-from" name="from" class="lang-select"></select>
        </label>
-        <textarea id="input" name="input"></textarea>
+        <div id="input" contenteditable="true"></div>
      </div>
-      <button class="swap" title="swap">↔️</button>
+      <button class="swap" title="swap"></button>
      <div class="panel panel--to">
        <label>
          To
@@ -26,8 +26,16 @@
        </label>
        <div id="output" class="output-area"></div>
      </div>
-      <div class="footer" id="status"></div>
+      <div id="unsupported-browser" hidden>
+        <p>Your CPU or browser is not able to run Bergamot translator.</p>
+        <p>Try using Firefox or a Chromium-based browser with <a href="https://webassembly.org/roadmap/">Fixed-width SIMD support</a>.</p>
+        <p>If you already are, you might be using a CPU that does not have support for SSE4.1 instructions.</p>
+      </div>
+      <footer class="credits">
+        <img src="logos.png" alt="Logos of the OPUS project, the Bergamot project and the European Union.">
+        <p>This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825303.</p>
+      </footer>
    </div>
-    <script src="js/index.js"></script>
+    <script type="module" src="js/index.js"></script>
  </body>
 </html>
--- a/wasm/test_page/js/index.js
+++ b/wasm/test_page/js/index.js
@@ -1,156 +1,215 @@
-let worker;
-let modelRegistry;
+import {LatencyOptimisedTranslator, TranslatorBacking, CancelledError, SupersededError} from '../node_modules/@browsermt/bergamot-translator/translator.js';

-const $ = selector => document.querySelector(selector);
-const status = message => ($("#status").innerText = message);
-
-const langFrom = $("#lang-from");
-const langTo = $("#lang-to");
-
-if (window.Worker) {
-  worker = new Worker("js/worker.js");
-  worker.postMessage(["import"]);
+function $(selector) {
+  return document.querySelector(selector);
 }

-document.querySelector("#input").addEventListener("keyup", function (event) {
-  translateCall();
-});
+function $$(selector) {
+  return document.querySelectorAll(selector);
+}

-const _prepareTranslateOptions = (paragraphs) => {
-  const translateOptions = [];
-  paragraphs.forEach(paragraph => {
-    // Each option object can be different for each entry. But to keep the test page simple,
-    // we just keep all the options same (specifically avoiding parsing the input to determine
-    // html/non-html text)
-    translateOptions.push({"isQualityScores": true, "isHtml": true});
-  });
-  return translateOptions;
-};
-
-const textToHTML = (text) => {
+function encodeHTML(text) {
  const div = document.createElement('div');
  div.appendChild(document.createTextNode(text));
  return div.innerHTML;
-};
+}

-const translateCall = () => {
-  const text = document.querySelector("#input").value;
-  if (!text.trim().length) return;
-
-  const paragraphs = text.split(/\n+/).map(textToHTML); // escape HTML 
-  const translateOptions = _prepareTranslateOptions(paragraphs);
-  const lngFrom = langFrom.value;
-  const lngTo = langTo.value;
-  worker.postMessage(["translate", lngFrom, lngTo, paragraphs, translateOptions]);
-};
-
-const addQualityClasses = (root) => {
-  // You can do this wit CSS variables, calc() and min/max, but JS is just easier
-
-  root.querySelectorAll('[x-bergamot-sentence-score]').forEach(el => {
+function addQualityIndicators() {
+  $$('#output [x-bergamot-sentence-score]').forEach(el => {
    // The threshold is ln(0.5) (https://github.com/browsermt/bergamot-translator/pull/370#issuecomment-1058123399)
-    el.classList.toggle('bad', parseFloat(el.getAttribute('x-bergamot-sentence-score')) < -0.6931);
+    el.classList.toggle('bad', parseFloat(el.getAttribute('x-bergamot-sentence-score')) < Math.log(0.5));
  });

-  root.querySelectorAll('[x-bergamot-word-score]').forEach(el => {
+  $$('#output [x-bergamot-word-score]').forEach(el => {
    // The threshold is ln(0.5) (https://github.com/browsermt/bergamot-translator/pull/370#issuecomment-1058123399)
-    el.classList.toggle('bad', parseFloat(el.getAttribute('x-bergamot-word-score')) < -0.6931);
+    el.classList.toggle('bad', parseFloat(el.getAttribute('x-bergamot-word-score')) < Math.log(0.5));
  });

  // Add tooltips to each (sub)word with sentence and word score.
-  root.querySelectorAll('[x-bergamot-sentence-score] > [x-bergamot-word-score]').forEach(el => {
+  $$('#output [x-bergamot-sentence-score] > [x-bergamot-word-score]').forEach(el => {
    const sentenceScore = parseFloat(el.parentNode.getAttribute('x-bergamot-sentence-score'));
    const wordScore = parseFloat(el.getAttribute('x-bergamot-word-score'));
-    el.title = `Sentence: ${sentenceScore}  Word: ${wordScore}`;
+    el.title = `Sentence: ${Math.exp(sentenceScore).toFixed(2)}  Word: ${Math.exp(wordScore).toFixed(2)}`;
  });
 }

-worker.onmessage = function (e) {
-  if (e.data[0] === "translate_reply" && e.data[1]) {
-    // Clear output of previous translation
-    document.querySelector("#output").innerHTML = '';
-
-    // Add each translation in its own div to have a known root in which the
-    // sentence ids are unique. Used for highlighting sentences.
-    e.data[1].forEach(translatedHTML => {
-      const translation = document.createElement('div');
-      translation.classList.add('translation');
-      translation.innerHTML = translatedHTML;
-      addQualityClasses(translation);
-      document.querySelector("#output").appendChild(translation);
-    });
-  } else if (e.data[0] === "load_model_reply" && e.data[1]) {
-    status(e.data[1]);
-    translateCall();
-  } else if (e.data[0] === "import_reply" && e.data[1]) {
-    modelRegistry = e.data[1];
-    init();
-  }
-};
-
-const loadModel = () => {
-  const lngFrom = langFrom.value;
-  const lngTo = langTo.value;
-  if (lngFrom !== lngTo) {
-    status(`Installing model...`);
-    console.log(`Loading model '${lngFrom}${lngTo}'`);
-    worker.postMessage(["load_model", lngFrom, lngTo]);
-  } else {
-    const input = textToHTML(document.querySelector("#input").value);
-    document.querySelector("#output").innerHTML = input;
-  }
-};
-
-langFrom.addEventListener("change", e => {
-  loadModel();
-});
-
-langTo.addEventListener("change", e => {
-  loadModel();
-});
-
-$(".swap").addEventListener("click", e => {
-  [langFrom.value, langTo.value] = [langTo.value, langFrom.value];
-  $("#input").value = $("#output").innerText;
-  loadModel();
-});
-
-$('#output').addEventListener('mouseover', e => {
-  const root = e.target.closest('.translation');
-  const sentence = e.target.parentNode.hasAttribute('x-bergamot-sentence-index') ? e.target.parentNode.getAttribute('x-bergamot-sentence-index') : null;  
-  document.querySelectorAll('#output font[x-bergamot-sentence-index]').forEach(el => {
-    el.classList.toggle('highlight-sentence', el.getAttribute('x-bergamot-sentence-index') === sentence && el.closest('.translation') === root);
+function highlightSentence(element) {
+  const sentence = element.parentNode.hasAttribute('x-bergamot-sentence-index')
+    ? element.parentNode.getAttribute('x-bergamot-sentence-index')
+    : null;
+  $$('#output font[x-bergamot-sentence-index]').forEach(el => {
+    el.classList.toggle('highlight-sentence', el.getAttribute('x-bergamot-sentence-index') === sentence);
  })
-})
+}

-function init() {
-  // Populate langs
-  const langs = Array.from(new Set(Object.keys(modelRegistry).reduce((acc, key) => acc.concat([key.substr(0, 2), key.substr(2, 2)]), [])));
-  const langNames = new Intl.DisplayNames(undefined, {type: "language"});
+/**
+ * Very minimal WYSIWYG editor. Just keyboard shortcuts for the IYKYK crowd.
+ */
+class Editor {
+  constructor(root) {
+    this.isApple = window.navigator.platform.startsWith('Mac');

-  // Sort languages by display name
-  langs.sort((a, b) => langNames.of(a).localeCompare(langNames.of(b)));
+    this.root = root;
+    this.root.addEventListener('keydown', this.onkeydown.bind(this));

-  // Populate the dropdowns 
-  langs.forEach(code => {
-    const name = langNames.of(code);
-    langFrom.innerHTML += `<option value="${code}">${name}</option>`;
-    langTo.innerHTML += `<option value="${code}">${name}</option>`;
+    this.mapping = {
+      "b": "bold",
+      "i": "italic",
+      "u": "underline",
+    };
+  }
+
+  onkeydown(event) {
+    if (!(this.isApple ? event.metaKey : event.ctrlKey))
+      return;
+
+    if (!(event.key in this.mapping))
+      return;
+
+    document.execCommand(this.mapping[event.key], false, null);
+
+    event.preventDefault();
+  }
+}
+
+async function main() {
+  const options = {
+    cacheSize: 2 ** 13, // 8192
+    downloadTimeout: null // Disable timeout
+  };
+  
+  const backing = new TranslatorBacking(options);
+
+  let pending = 0; // Number of pending requests
+
+  // Patch the fetch() function to track number of pending requests
+  backing.fetch = async function(...args) {
+    try {
+      $('.app').classList.toggle('loading', ++pending > 0);
+      return await TranslatorBacking.prototype.fetch.call(backing, ...args);
+    } finally {
+      $('.app').classList.toggle('loading', --pending > 0);
+    }
+  };
+
+  // Wait for the language model registry to load. Once it is loaded, use
+  // it to fill the "from" and "to" language selection dropdowns.
+  await backing.registry.then(models => {
+    const names = new Intl.DisplayNames(['en'], {type: 'language'});
+
+    ['from', 'to'].forEach(field => {
+      const languages = new Set(models.map(model => model[field]));
+      const select = $(`#lang-${field}`);
+
+      const pairs = Array.from(languages, code => ({code, name: names.of(code)}));
+      
+      pairs.sort(({name: a}, {name: b}) => a.localeCompare(b));
+
+      pairs.forEach(({name, code}) => {
+        select.add(new Option(name, code));
+      })
+    });
+
+    $('#lang-from').value = 'en';
+    $('#lang-to').value = 'es';
  });

-  // try to guess input language from user agent
-  let myLang = navigator.language;
-  if (myLang) {
-    myLang = myLang.split("-")[0];
-    let langIndex = langs.indexOf(myLang);
-    if (langIndex > -1) {
-      console.log("guessing input language is", myLang);
-      langFrom.value = myLang;
+  // Intentionally do this after querying backing.registry to make sure that
+  // that request is fired off first. Now we can start thinking about loading
+  // the WASM binary etc.
+  const translator = new LatencyOptimisedTranslator(options, backing);
+
+  let abortController = new AbortController();
+
+  const translate = async () => {
+    try {
+      const from = $('#lang-from').value;
+      const to = $('#lang-to').value;
+      
+      // Querying models to see whether quality estimation is supported by all
+      // of them.
+      const models = await backing.getModels({from, to});
+      const qualityScores = models.every(model => 'qualityModel' in model.files);
+
+      $('.app').classList.add('translating');
+
+      const response = await translator.translate({
+        from,
+        to,
+        text: $('#input').innerHTML,
+        html: true,
+        qualityScores
+      }, {signal: abortController.signal});
+
+      $('#output').innerHTML = response.target.text;
+      $('#output').classList.toggle('has-quality-scores', qualityScores);
+
+      if (qualityScores)
+        addQualityIndicators();
+
+    } catch (error) {
+      // Ignore errors caused by changing the language pair (which triggers abort())
+      if (error.constructor === CancelledError) {
+        return;
+      }
+      
+      // Ignore 'errors' caused by typing too fast or by changing the language
+      // pair while a translation was still in progress (or being loaded)
+      if (error.constructor === SupersededError)
+        return;
+
+      // Ignore errors caused by selecting a bad pair (e.g. en -> en)
+      if (error.message.startsWith('No model available to translate from'))
+        return;
+
+      alert(`Error during translation: ${error}\n\n${error.stack}`);
+    } finally {
+      const worker = await Promise.race([translator.worker, Promise.resolve(null)]);
+      $('.app').classList.toggle('translating', worker === null || !worker.idle);
    }
  }

-  // find first output lang that *isn't* input language
-  langTo.value = langs.find(code => code !== langFrom.value);
-  // load this model
-  loadModel();
+  const reset = async () => {
+    // Cancel any pending loading/translation
+    abortController.abort();
+
+    // Reset abort controller to a fresh un-aborted one
+    abortController = new AbortController();
+
+    // Clear output to make it clearer that something is happening
+    $('#output').innerHTML = '';
+
+    // Immediately start loading the new selection
+    translate();
+  }
+
+  $('button.swap').addEventListener('click', () => {
+    const tmp = $('#lang-from').value;
+    $('#lang-from').value = $('#lang-to').value;
+    $('#lang-to').value = tmp;
+    translate();
+  })
+
+  // Simple WYSIWYG controls
+  const editor = new Editor($('#input'));
+
+  // Translate on any change
+  $('#input').addEventListener('input', translate);
+  $('#lang-from').addEventListener('input', reset);
+  $('#lang-to').addEventListener('input', reset);
+
+  // Hook up sentence boundary highlighting if that information is available.
+  $('#output').addEventListener('mouseover', (e) => highlightSentence(e.target))
+
+  // Wait for bergamot-translator to load. This could throw a CompileError
+  // which we want to catch so we can show "oh noes browser not supported!"
+  translator.worker.catch(error => {
+    // Catch CompileErrors because for those we know what to do.
+    if (error.name === 'CompileError')
+      $('#unsupported-browser').hidden = false;
+    else
+      throw error;
+  });
 }
+
+main();
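
A note on `reset()` in the script above: it replaces the `AbortController` right after calling `abort()`. The reason is that an aborted signal stays aborted, so reusing the old controller would cancel every later request immediately. A minimal sketch of that pattern in isolation follows; the `Resettable` class is a hypothetical name used only for illustration.

```javascript
// Hypothetical wrapper around the abort-then-replace pattern.
class Resettable {
  constructor() {
    this.controller = new AbortController();
  }

  // Signal to pass along with each new request
  get signal() {
    return this.controller.signal;
  }

  // Cancel everything started with the current signal, then start afresh
  reset() {
    this.controller.abort();
    this.controller = new AbortController();
  }
}

const requests = new Resettable();
const before = requests.signal;
requests.reset();
console.log(before.aborted);          // true
console.log(requests.signal.aborted); // false
```

Requests that were handed the old signal see the abort, while anything started after `reset()` gets a fresh signal.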
--- a/wasm/test_page/js/worker.js
+++ b/wasm/test_page/js/worker.js
@@ -1,352 +0,0 @@
-// All variables specific to translation service
-var translationService = undefined;
-
-// Model registry
-let modelRegistry = undefined;
-
-// A map of language-pair to TranslationModel object
-var languagePairToTranslationModels = new Map();
-
-const BERGAMOT_TRANSLATOR_MODULE = "bergamot-translator-worker.js";
-const MODEL_REGISTRY = "../models/registry.json";
-const MODEL_ROOT_URL = "../models/";
-const PIVOT_LANGUAGE = 'en';
-
-// Information corresponding to each file type
-const fileInfo = [
-  {"type": "model", "alignment": 256},
-  {"type": "lex", "alignment": 64},
-  {"type": "vocab", "alignment": 64},
-  {"type": "qualityModel", "alignment": 64}
-];
-
-const encoder = new TextEncoder(); // string to utf-8 converter
-const decoder = new TextDecoder(); // utf-8 to string converter
-
-const start = Date.now();
-let moduleLoadStart;
-var Module = {
-  preRun: [function() {
-    log(`Time until Module.preRun: ${(Date.now() - start) / 1000} secs`);
-    moduleLoadStart = Date.now();
-  }],
-  onRuntimeInitialized: async function() {
-    log(`Wasm Runtime initialized Successfully (preRun -> onRuntimeInitialized) in ${(Date.now() - moduleLoadStart) / 1000} secs`);
-    const response = await fetch(MODEL_REGISTRY);
-    modelRegistry = await response.json();
-    postMessage([`import_reply`, modelRegistry]);
-  }
-};
-
-const log = (message) => {
-  console.debug(message);
-}
-
-onmessage = async function(e) {
-  const command = e.data[0];
-  log(`Message '${command}' received from main script`);
-  let result = "";
-  if (command === 'import') {
-      importScripts(BERGAMOT_TRANSLATOR_MODULE);
-  } else if (command === 'load_model') {
-      let start = Date.now();
-      let from = e.data[1];
-      let to = e.data[2];
-      try {
-        await constructTranslationService();
-        await constructTranslationModel(from, to);
-        log(`Model '${from}${to}' successfully constructed. Time taken: ${(Date.now() - start) / 1000} secs`);
-        result = "Model successfully loaded";
-      } catch (error) {
-        log(`Model '${from}${to}' construction failed: '${error.message}'`);
-        result = "Model loading failed";
-      }
-      log(`'${command}' command done, Posting message back to main script`);
-      postMessage([`${command}_reply`, result]);
-  } else if (command === 'translate') {
-      const from = e.data[1];
-      const to = e.data[2];
-      const input = e.data[3];
-      const translateOptions = e.data[4];
-      let inputWordCount = 0;
-      let inputBlockElements = 0;
-      input.forEach(sentence => {
-        inputWordCount += sentence.trim().split(" ").filter(word => word.trim() !== "").length;
-        inputBlockElements++;
-      })
-      let start = Date.now();
-      try {
-        log(`Blocks to translate: ${inputBlockElements}`);
-        result = translate(from, to, input, translateOptions);
-        const secs = (Date.now() - start) / 1000;
-        log(`Translation '${from}${to}' Successful. Speed: ${Math.round(inputWordCount / secs)} WPS (${inputWordCount} words in ${secs} secs)`);
-      } catch (error) {
-        log(`Error: ${error.message}`);
-      }
-      log(`'${command}' command done, Posting message back to main script`);
-      postMessage([`${command}_reply`, result]);
-  }
-}
-
-// Instantiates the Translation Service
-const constructTranslationService = async () => {
-  if (!translationService) {
-    var translationServiceConfig = {cacheSize: 20000};
-    log(`Creating Translation Service with config: ${translationServiceConfig}`);
-    translationService = new Module.BlockingService(translationServiceConfig);
-    log(`Translation Service created successfully`);
-  }
-}
-
-// Constructs translation model(s) for the source and target language pair (using
-// pivoting if required).
-const constructTranslationModel = async (from, to) => {
-  // Delete all previously constructed translation models and clear the map
-  languagePairToTranslationModels.forEach((value, key) => {
-    log(`Destructing model '${key}'`);
-    value.delete();
-  });
-  languagePairToTranslationModels.clear();
-
-  if (_isPivotingRequired(from, to)) {
-    // Pivoting requires 2 translation models
-    const languagePairSrcToPivot = _getLanguagePair(from, PIVOT_LANGUAGE);
-    const languagePairPivotToTarget = _getLanguagePair(PIVOT_LANGUAGE, to);
-    await Promise.all([_constructTranslationModelHelper(languagePairSrcToPivot),
-                      _constructTranslationModelHelper(languagePairPivotToTarget)]);
-  }
-  else {
-    // Non-pivoting case requires only 1 translation model
-    await _constructTranslationModelHelper(_getLanguagePair(from, to));
-  }
-}
-
-// Translates text from source language to target language (via pivoting if necessary).
-const translate = (from, to, input, translateOptions) => {
-  let vectorResponseOptions, vectorSourceText, vectorResponse;
-  try {
-    // Prepare the arguments (vectorResponseOptions and vectorSourceText (vector<string>)) of Translation API and call it.
-    // Result is a vector<Response> where each of its item corresponds to one item of vectorSourceText in the same order.
-    vectorResponseOptions = _prepareResponseOptions(translateOptions);
-    vectorSourceText = _prepareSourceText(input);
-
-    if (_isPivotingRequired(from, to)) {
-      // Translate via pivoting
-      const translationModelSrcToPivot = _getLoadedTranslationModel(from, PIVOT_LANGUAGE);
-      const translationModelPivotToTarget = _getLoadedTranslationModel(PIVOT_LANGUAGE, to);
-      vectorResponse = translationService.translateViaPivoting(translationModelSrcToPivot,
-                                                              translationModelPivotToTarget,
-                                                              vectorSourceText,
-                                                              vectorResponseOptions);
-    }
-    else {
-      // Translate without pivoting
-      const translationModel = _getLoadedTranslationModel(from, to);
-      vectorResponse = translationService.translate(translationModel, vectorSourceText, vectorResponseOptions);
-    }
-
-    // Parse all relevant information from vectorResponse
-    const listTranslatedText = _parseTranslatedText(vectorResponse);
-    const listSourceText = _parseSourceText(vectorResponse);
-    const listTranslatedTextSentences = _parseTranslatedTextSentences(vectorResponse);
-    const listSourceTextSentences = _parseSourceTextSentences(vectorResponse);
-
-    log(`Source text: ${listSourceText}`);
-    log(`Translated text: ${listTranslatedText}`);
-    log(`Translated sentences: ${JSON.stringify(listTranslatedTextSentences)}`);
-    log(`Source sentences: ${JSON.stringify(listSourceTextSentences)}`);
-
-    return listTranslatedText;
-  } finally {
-    // Necessary clean up
-    if (vectorSourceText != null) vectorSourceText.delete();
-    if (vectorResponseOptions != null) vectorResponseOptions.delete();
-    if (vectorResponse != null) vectorResponse.delete();
-  }
-}
-
-// Downloads file from a url and returns the array buffer
-const _downloadAsArrayBuffer = async(url) => {
-  const response = await fetch(url);
-  if (!response.ok) {
-    throw Error(`Downloading ${url} failed: HTTP ${response.status} - ${response.statusText}`);
-  }
-  return response.arrayBuffer();
-}
-
-// Constructs and initializes the AlignedMemory from the array buffer and alignment size
-const _prepareAlignedMemoryFromBuffer = async (buffer, alignmentSize) => {
-  var byteArray = new Int8Array(buffer);
-  var alignedMemory = new Module.AlignedMemory(byteArray.byteLength, alignmentSize);
-  const alignedByteArrayView = alignedMemory.getByteArrayView();
-  alignedByteArrayView.set(byteArray);
-  return alignedMemory;
-}
-
-async function prepareAlignedMemory(file, languagePair) {
-  const fileName = `${MODEL_ROOT_URL}/${languagePair}/${modelRegistry[languagePair][file.type].name}`;
-  const buffer = await _downloadAsArrayBuffer(fileName);
-  const alignedMemory = await _prepareAlignedMemoryFromBuffer(buffer, file.alignment);
-  log(`"${file.type}" aligned memory prepared. Size:${alignedMemory.size()} bytes, alignment:${file.alignment}`);
-  return alignedMemory;
-}
-
-const _constructTranslationModelHelper = async (languagePair) => {
-  log(`Constructing translation model ${languagePair}`);
-
-  /*Set the Model Configuration as YAML formatted string.
-    For available configuration options, please check: https://marian-nmt.github.io/docs/cmd/marian-decoder/
-    Vocab files are re-used in both translation directions.
-    DO NOT CHANGE THE SPACES BETWEEN EACH ENTRY OF CONFIG
-  */
-  const modelConfig = `beam-size: 1
-normalize: 1.0
-word-penalty: 0
-max-length-break: 128
-mini-batch-words: 1024
-workspace: 128
-max-length-factor: 2.0
-skip-cost: false
-cpu-threads: 0
-quiet: true
-quiet-translation: true
-gemm-precision: int8shiftAlphaAll
-alignment: soft
-`;
-
-  const promises = [];
-  fileInfo.filter(file => modelRegistry[languagePair].hasOwnProperty(file.type))
-  .map((file) => {
-      promises.push(prepareAlignedMemory(file, languagePair));
-  });
-
-  const alignedMemories = await Promise.all(promises);
-
-  log(`Translation Model config: ${modelConfig}`);
-  log(`Aligned memory sizes: Model:${alignedMemories[0].size()} Shortlist:${alignedMemories[1].size()} Vocab:${alignedMemories[2].size()}`);
-  const alignedVocabMemoryList = new Module.AlignedMemoryList();
-  alignedVocabMemoryList.push_back(alignedMemories[2]);
-  let translationModel;
-  if (alignedMemories.length === fileInfo.length) {
-    log(`QE:${alignedMemories[3].size()}`);
-    translationModel = new Module.TranslationModel(modelConfig, alignedMemories[0], alignedMemories[1], alignedVocabMemoryList, alignedMemories[3]);
-  }
-  else {
-    translationModel = new Module.TranslationModel(modelConfig, alignedMemories[0], alignedMemories[1], alignedVocabMemoryList, null);
-  }
-  languagePairToTranslationModels.set(languagePair, translationModel);
-}
-
-const _isPivotingRequired = (from, to) => {
-  return (from !== PIVOT_LANGUAGE) && (to !== PIVOT_LANGUAGE);
-}
-
-const _getLanguagePair = (srcLang, tgtLang) => {
-  return `${srcLang}${tgtLang}`;
-}
-
-const _getLoadedTranslationModel = (srcLang, tgtLang) => {
-  const languagePair = _getLanguagePair(srcLang, tgtLang);
-  if (!languagePairToTranslationModels.has(languagePair)) {
-    throw Error(`Translation model '${languagePair}' not loaded`);
-  }
-  return languagePairToTranslationModels.get(languagePair);
-}
-
-const _parseTranslatedText = (vectorResponse) => {
-  const result = [];
-  for (let i = 0; i < vectorResponse.size(); i++) {
-    const response = vectorResponse.get(i);
-    result.push(response.getTranslatedText());
-  }
-  return result;
-}
-
-const _parseTranslatedTextSentences = (vectorResponse) => {
-  const result = [];
-  for (let i = 0; i < vectorResponse.size(); i++) {
-    const response = vectorResponse.get(i);
-    result.push(_getTranslatedSentences(response));
-  }
-  return result;
-}
-
-const _parseSourceText = (vectorResponse) => {
-  const result = [];
-  for (let i = 0; i < vectorResponse.size(); i++) {
-    const response = vectorResponse.get(i);
-    result.push(response.getOriginalText());
-  }
-  return result;
-}
-
-const _parseSourceTextSentences = (vectorResponse) => {
-  const result = [];
-  for (let i = 0; i < vectorResponse.size(); i++) {
-    const response = vectorResponse.get(i);
-    result.push(_getSourceSentences(response));
-  }
-  return result;
-}
-
-const _prepareResponseOptions = (translateOptions) => {
-  let vectorResponseOptions = new Module.VectorResponseOptions;
-  translateOptions.forEach(translateOption => {
-    vectorResponseOptions.push_back({
-      qualityScores: translateOption["isQualityScores"],
-      alignment: true,
-      html: translateOption["isHtml"]
-    });
-  });
-  if (vectorResponseOptions.size() == 0) {
-    vectorResponseOptions.delete();
-    throw Error(`No Translation Options provided`);
-  }
-  return vectorResponseOptions;
-}
-
-const _prepareSourceText = (input) => {
-  let vectorSourceText = new Module.VectorString;
-  input.forEach(paragraph => {
-    // prevent empty paragraph - it breaks the translation
-    if (paragraph.trim() === "") {
-      return;
-    }
-    vectorSourceText.push_back(paragraph.trim())
-  })
-  if (vectorSourceText.size() == 0) {
-    vectorSourceText.delete();
-    throw Error(`No text provided to translate`);
-  }
-  return vectorSourceText;
-}
-
-const _getTranslatedSentences = (response) => {
-  const sentences = [];
-  const text = response.getTranslatedText();
-  for (let sentenceIndex = 0; sentenceIndex < response.size(); sentenceIndex++) {
-    const utf8SentenceByteRange = response.getTranslatedSentence(sentenceIndex);
-    sentences.push(_getSubString(text, utf8SentenceByteRange));
-  }
-  return sentences;
-}
-
-const _getSourceSentences = (response) => {
-  const sentences = [];
-  const text = response.getOriginalText();
-  for (let sentenceIndex = 0; sentenceIndex < response.size(); sentenceIndex++) {
-    const utf8SentenceByteRange = response.getSourceSentence(sentenceIndex);
-    sentences.push(_getSubString(text, utf8SentenceByteRange));
-  }
-  return sentences;
-}
-
-/*
- * Returns a substring of text (a string). The substring is represented by
- * byteRange (begin and end indices) within the utf-8 encoded version of the text.
- */
-const _getSubString = (text, utf8ByteRange) => {
-  const textUtf8ByteView = encoder.encode(text);
-  const substringUtf8ByteView = textUtf8ByteView.subarray(utf8ByteRange.begin, utf8ByteRange.end);
-  return decoder.decode(substringUtf8ByteView);
-}
--- a/wasm/test_page/logos.png
+++ b/wasm/test_page/logos.png
--- a/wasm/test_page/package-lock.json
+++ b/wasm/test_page/package-lock.json
@@ -5,11 +5,21 @@
  "packages": {
    "": {
      "dependencies": {
+        "@browsermt/bergamot-translator": "file:../module",
        "cors": "^2.8.5",
        "express": "^4.18.2",
        "nocache": "^2.1.0"
      }
    },
+    "../module": {
+      "name": "@browsermt/bergamot-translator",
+      "version": "0.4.8",
+      "license": "MPL-2.0"
+    },
+    "node_modules/@browsermt/bergamot-translator": {
+      "resolved": "../module",
+      "link": true
+    },
    "node_modules/accepts": {
      "version": "1.3.8",
      "resolved": "https://registry.npmjs.org/accepts/-/accepts-1.3.8.tgz",
@@ -616,6 +626,9 @@
    }
  },
  "dependencies": {
+    "@browsermt/bergamot-translator": {
+      "version": "file:../module"
+    },
    "accepts": {
      "version": "1.3.8",
      "resolved": "https://registry.npmjs.org/accepts/-/accepts-1.3.8.tgz",
--- a/wasm/test_page/package.json
+++ b/wasm/test_page/package.json
@@ -1,7 +1,14 @@
 {
  "dependencies": {
+    "@browsermt/bergamot-translator": "file:../module",
    "cors": "^2.8.5",
    "express": "^4.18.2",
    "nocache": "^2.1.0"
+  },
+  "config": {
+    "port": 80
+  },
+  "scripts": {
+    "start": "node ./bergamot-httpserver.js $npm_package_config_port 1 0"
  }
 }
--- a/wasm/test_page/start_server.sh
+++ b/wasm/test_page/start_server.sh
@@ -24,7 +24,7 @@ fi
 # Prepare a list of all wasm artifacts to be copied and copy them to the destination folder
 ARTIFACTS_BASE_NAME="bergamot-translator-worker"
 ARTIFACTS="$1/$ARTIFACTS_BASE_NAME.js $1/$ARTIFACTS_BASE_NAME.wasm"
-ARTIFACTS_DESTINATION_FOLDER=$SCRIPT_ABSOLUTE_PATH/js
+ARTIFACTS_DESTINATION_FOLDER=$SCRIPT_ABSOLUTE_PATH/../module/worker

 for i in $ARTIFACTS; do
    [ -f "$i" ] || break