Commit Graph

91 Commits

Author SHA1 Message Date
Nikolay Bogoychev
534ed37a3d
Remove wormhole references (#459)
* Remove wormhole references

* Remove more references to the WORMHOLE

* Update marian to wormhole removed marian

* Whoops

---------

Co-authored-by: Jelmer van der Linde <jelmer@ikhoefgeen.nl>
2023-08-14 15:22:54 +01:00
Graeme Nail
4b0da8d434
Enables model ensembles (#450)
* Enables model ensembles

Adds the ability to use ensembles of models. This supports ensembles of
binary- or npz-format models, as well as mixtures of both.

When all models in the ensembles are of binary format, the load from
memory path is used. Otherwise, they are loaded via the file system.
Enable log-level debug for output related to this.

* Fix formatting

* Fix WASM bindings for MemoryBundle

For now, this does not support ensembles.

* Remove shared_ptr wrapping the AlignedMemory of models.

* Fix formatting
2023-08-01 19:35:11 +01:00
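The load-path rule described in the ensembles commit (#450) can be sketched roughly as follows. This is only an illustration: the helper name `chooseLoadPath` is hypothetical, and inferring the model format from the file extension is an assumption, not something the commit message states.

```javascript
// Hypothetical helper: decide how an ensemble of models would be loaded.
// Assumption: the model format is inferred from the file extension
// (.bin for binary, anything else such as .npz for non-binary).
function chooseLoadPath(modelPaths) {
  if (modelPaths.length === 0) {
    throw new Error("an ensemble needs at least one model");
  }
  // Only when every model is binary can the load-from-memory path be used.
  const allBinary = modelPaths.every((p) => p.toLowerCase().endsWith(".bin"));
  return allBinary ? "memory" : "filesystem";
}

console.log(chooseLoadPath(["model1.bin", "model2.bin"])); // "memory"
console.log(chooseLoadPath(["model1.bin", "model2.npz"])); // "filesystem"
```

A single non-binary model is enough to send the whole ensemble through the file system, which matches the "mixtures of both" case in the commit message.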
Jelmer
8d5f877596
More portable WASM demo (#437)
* Replace most of the wasm demo page with code from the firefox extension

This code should be more generic and copy/pastable into other projects. Maybe one day it will be an npm package?

* Fix Ukrainian model support

* Add quality estimation output

Automatically enabled when the model(s) support it

* Little "Translating…" indicator

* Don't make Safari fail on something tiny

* Rewire lots of async state to be able to predictably know when the translator is working or not

Previously so much was lazy loaded that it was not easy to catch lack of SIMD support. Now I can just enable the interface only after it has properly loaded.

* No need for a two-stage setup for the worker. Just promise to call `initialize()`!

* More (correct) types and comments for code

* Keyboard shortcuts for input area for bold, italic and underline.

Enough to demo mark-up translation

* Fix `delete()`

* Move javascript glue code into its own npm package

* Add nodejs support and test to package

* More stand-alone build command

…for now, not really used by anything I think

* Ignore build packages

* Use local filesystem for build so it is automatically cached

* fix overflow on demo page

But this might break the mobile demo? I'll have to check into that

* Bring back integrity check, except for NodeJS for now

* Make `build` part of `prepare` so we always make sure we build a complete package

* Move worker code into its own folder

This way I can mark it as a CommonJS module, which makes nodejs treat the files the same way WebWorkers do right now. Firefox doesn't implement `{type: 'module'}` for WebWorkers yet.

* Add README

* Fix paths

* Add npm publish automation

* Make sure webpack ignores node compatibility code

* Add missing webpack:ignore around a worker

* Default to getting models from S3

* Separate "loading" and "translating" indicators

* Bump npm package version

* Add credits

* Don't block on the worker loading

* Not just Mozilla, but Bergamot!

* Make individual translation requests cancelable

* Swap button turns vertically when in skyscraper mode

* Make it easier to debug errors from inside the worker

* Don't bork on deleting a failed worker

* Don't bork on calling translate() with a failed worker

* Handle compilation error with more grace

* `contenteditable=true` seems to work better with some browser extensions

Looking at you, Vimium!

* Clean up abort promise

* Bump npm package version

* Remove `workerUrl` option in favour of better webpack support

With that option it was hard for Webpack to figure out dependencies, and it did not enter my worker script for rewriting. With the hardcoded url it does, and with a bit of `new webpack.DefinePlugin({'typeof self': JSON.stringify('object')}),` we can have webpack remove node-specific code on build!

* Bump version

Minor API change hehe

Co-authored-by: Nikolay Bogoychev <nheart@gmail.com>
2023-01-18 19:41:39 +00:00
Jelmer
2834f046dc
Expand the node-test.js example code with documentation (#434)
* Expand the node-test.js example code with documentation

Is there a better way to document code than by providing an annotated & working example of it? Just listing all the exposed methods feels like giving people a box of bricks and expecting them to build a house with it.

* Use @Jerin's feedback to simplify node-test.js explanations

* Use native `console.assert` instead

See #426 for an explanation

* Fix comment

Co-authored-by: Nikolay Bogoychev <nheart@gmail.com>
2023-01-18 19:09:47 +00:00
dependabot[bot]
620c8b00ec
Bump qs and express in /wasm/test_page (#444)
Bumps [qs](https://github.com/ljharb/qs) to 6.11.0 and updates ancestor dependency [express](https://github.com/expressjs/express). These dependencies need to be updated together.


Updates `qs` from 6.7.0 to 6.11.0
- [Release notes](https://github.com/ljharb/qs/releases)
- [Changelog](https://github.com/ljharb/qs/blob/main/CHANGELOG.md)
- [Commits](https://github.com/ljharb/qs/compare/v6.7.0...v6.11.0)

Updates `express` from 4.17.1 to 4.18.2
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/master/History.md)
- [Commits](https://github.com/expressjs/express/compare/4.17.1...4.18.2)

---
updated-dependencies:
- dependency-name: qs
  dependency-type: indirect
- dependency-name: express
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-01-18 15:30:58 +00:00
Jerin Philip
8771078177
Basic HTML property testing for WebAssembly (#425)
Import
https://gist.github.com/jelmervdl/a4c8b6b92ad88a885e1cbd51c6ad4902 and
attach it to CI. NodeJS-14 fails when trying to use the WebAssembly
binary, so we use an independently set-up node-16. This paves the way
for more complicated testing of the WebAssembly bindings in the future.
2022-06-21 14:07:17 +01:00
Abhishek Aggarwal
e34420647d
Upgrade emsdk to 3.1.8 (#414)
* Rework WASM compilation options

Necessary to work with newer versions of emscripten that are more picky about which option goes to the compiler, and which to the linker. Also took the opportunity to remove the need for the patching of the bergamot-translation-worker.js file, this can now easily be done through supported apis. Furthermore, I tried to downsize the generated javascript and wasm code a bit.

Initial estimates show that bergamot-translator compiled with emscripten 3.0.0 runs at about 3x the speed of 2.0.9 (when using embedded intgemm). Speed-up when using mozIntGemm is less dramatic.

* Updated marian-dev submodule
* Revert changes specific to patching external gemm modules for wasm
* Better Compilation and Link flags

 - Added "-O3" optimization flag for linking as well
 - "-g2" only for release and debug builds
 - "-g1" for release builds
 - Replaced deprecated "--bind" flag with "-lembind"
 - Removed redundant link flag

* Upgraded emsdk to 3.1.8
* Enclosed EXPORTED_FUNCTIONS values in a list
* Fixed the remaining 2.0.9 reference in circle ci build script
* Updated README

Co-authored-by: Jelmer van der Linde <jelmer@ikhoefgeen.nl>
2022-04-20 00:39:32 +01:00
Jerin Philip
46882e7cfe
JS: Fix swap button on test-page (#388) 2022-03-24 15:05:45 +00:00
Jelmer
ed3160524d
JS: Update languages & use Intl API for their display names (#379)
Got the languages from registry.json, including non-prod models. 
Code now calls into `Intl.DisplayNames()`[1] to make life easier.

[1] (http://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/DisplayNames/DisplayNames)
2022-03-23 12:14:51 +00:00
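As a rough illustration of the `Intl.DisplayNames` usage mentioned in #379, the sketch below turns language codes into display names. The list of codes and the helper name `languageLabel` are made up for the example; they are not taken from registry.json or the test page.

```javascript
// Illustrative language codes; not read from registry.json.
const codes = ["de", "es", "uk"];

// Intl.DisplayNames is a standard ECMAScript Internationalization API.
const displayNames = new Intl.DisplayNames(["en"], { type: "language" });

// Hypothetical helper: fall back to the raw code if no name is available.
function languageLabel(code) {
  return displayNames.of(code) ?? code;
}

for (const code of codes) {
  console.log(`${code}: ${languageLabel(code)}`);
}
```

The exact names returned depend on the locale data shipped with the runtime, so the fallback to the raw code keeps the dropdown usable either way.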
Abhishek Aggarwal
0a52a6d405
JS: Using supervised QE models for available language pairs (#378)
* JS: Refactored model loading
 - Passing single vocab memory via JS
* JS: Use supervised QE models when available
* Ran clang format
2022-03-15 15:55:28 +01:00
Abhishek Aggarwal
2c0e65c2ec
JS: Reuse Model registry from firefox-translation-models for test page (#377)
* JS: Reuse Model registry from firefox-translation-models repo for test page

 - https://github.com/mozilla/firefox-translations-models/blob/main/registry.json
   is reused
 - Removed existing registry
2022-03-14 18:05:22 +01:00
Abhishek Aggarwal
89a96bf71e
Use right range and threshold for showing "bad" words/sentences (#370)
* Use ln(0.5) as the threshold
* Use right range for showing "bad" words/sentences
2022-03-03 17:24:32 +01:00
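The ln(0.5) threshold from #370 could look something like the sketch below. It assumes the scores are log-probabilities (as suggested by #247) and that a score below the threshold counts as "bad"; the function names are hypothetical and the sample scores are invented.

```javascript
// ln(0.5), roughly -0.693: a probability of 0.5 expressed as a log-probability.
const BAD_THRESHOLD = Math.log(0.5);

// Assumption: scores are log-probabilities, so lower means less confident.
function isBad(logProb) {
  return logProb < BAD_THRESHOLD;
}

// Hypothetical helper: return the words whose score falls below the threshold.
function badWords(words, scores) {
  return words.filter((_, i) => isBad(scores[i]));
}

// Invented example data.
const words = ["this", "is", "unclear"];
const scores = [Math.log(0.9), Math.log(0.8), Math.log(0.3)];
console.log(badWords(words, scores)); // [ 'unclear' ]
```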
Jelmer
fe3f3982de
Embed quality-scores as HTML tag attributes (#358)
Quality scores for HTML translation exposed as <font
x-bergamot-sentence-score=""> and <font x-bergamot-word-score=""> tags
in the HTML output. While this increases the size of the HTML returned,
the resulting rendered HTML can easily be styled to show the scores.
With Javascript or CSS, developers can easily have some interface based
on these extra attributes.

Also includes updates to the test page to show a proof-of-concept 
demonstration.

Fixes: #355
2022-02-25 22:01:32 +00:00
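To show how a consumer might read the score attributes introduced in #358, here is a minimal sketch that pulls them out of an HTML string with a regular expression. The sample markup (including how the tags nest) and the helper name `extractScores` are assumptions for illustration; in a browser, a DOM query would be the more natural choice.

```javascript
// Hypothetical helper: collect numeric values of one score attribute
// from <font ...> tags in translated HTML.
function extractScores(html, attribute) {
  const pattern = new RegExp(`<font\\s[^>]*${attribute}="([^"]*)"`, "g");
  const scores = [];
  for (const match of html.matchAll(pattern)) {
    scores.push(Number.parseFloat(match[1]));
  }
  return scores;
}

// Invented sample output; the real nesting may differ.
const html =
  '<font x-bergamot-sentence-score="-0.25">' +
  '<font x-bergamot-word-score="-0.1">Hello</font> ' +
  '<font x-bergamot-word-score="-0.9">world</font>' +
  "</font>";

console.log(extractScores(html, "x-bergamot-word-score"));     // [ -0.1, -0.9 ]
console.log(extractScores(html, "x-bergamot-sentence-score")); // [ -0.25 ]
```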
Jerin Philip
96b0f82343
Simplify cache config and bind for use in JS (#359)
Deprecates cacheEnabled parameter to be replaced with cacheSize=0.
Python bindings, Documentation in comments and tests updated to reflect
this change.

Exposes the fields corresponding to cache via embind as a value object.
The equivalent object-based syntax in worker.js allows propagation
from JS.

Fixes: #351
See also: mozilla/firefox-translations#96
2022-02-23 13:25:12 +00:00
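One way a caller could migrate from the deprecated `cacheEnabled` flag to `cacheSize=0` is sketched below. The helper `normalizeCacheConfig` and the shape of the config object are hypothetical; only the two field names come from the commit message.

```javascript
// Hypothetical migration helper: drop the deprecated cacheEnabled flag
// and express "cache disabled" as cacheSize = 0 instead.
function normalizeCacheConfig(config) {
  const { cacheEnabled, ...rest } = config;
  if (cacheEnabled === false) {
    return { ...rest, cacheSize: 0 };
  }
  // cacheEnabled true or absent: keep whatever cacheSize was given.
  return rest;
}

console.log(normalizeCacheConfig({ cacheEnabled: false, cacheSize: 2000 }));
console.log(normalizeCacheConfig({ cacheEnabled: true, cacheSize: 2000 }));
```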
Abhishek Aggarwal
2844cedb0d
JS: Refactoring wasm test page (#354)
* Free all the objects properly that were constructed for translation api
* Refactored pivot detection mechanism
2022-02-17 14:16:26 +01:00
Abhishek Aggarwal
c76e630e00
JS/WASM: Passing ResponseOptions for every item for translation batch api (#348)
- Now translate() JS API accepts ResponseOptions per batch item

 - Fixed the logic to create vector<ResponseOption>
2022-02-14 13:16:33 +01:00
Jerin Philip
ec469193c6
Allow per-input options (#346)
Changes signature of BlockingService::{translate,pivot}Multiple
functions to take per input options, so a mix of HTML and plaintext
can be sent from the extension. Templating over testing is adjusted
to allow for continuous evaluations by modifying the test code.

Updates WebAssembly bindings to reflect the change in signature
and the javascript test-page to work with the new bindings.

This change lacks an accompanying test specific to the mixed HTML
and plaintext inputs.

Fixes: #345
See also: mozilla/firefox-translations#94
Co-authored-by: Jelmer van der Linde <jelmer@ikhoefgeen.nl>
2022-02-11 13:06:26 +00:00
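The per-input options idea from #346 can be pictured as pairing each input with its own options object, so HTML and plaintext can share a batch. The sketch below is not the actual bindings API: `buildBatch` and the `html` field are assumptions used only to illustrate the pairing.

```javascript
// Hypothetical helper: pair every input text with its own options.
function buildBatch(texts, options) {
  if (texts.length !== options.length) {
    throw new Error("each input needs exactly one options object");
  }
  return texts.map((text, i) => ({ text, options: options[i] }));
}

// A mixed batch: one HTML input and one plaintext input.
const batch = buildBatch(
  ["<p>Hello <b>world</b></p>", "Hello world"],
  [{ html: true }, { html: false }]
);
console.log(batch);
```

Checking the lengths up front keeps a mismatched batch from silently translating an input with the wrong options.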
Jelmer
80bd4e7651
Print errors by default in WASM build (#343)
* Remove BadHTML exception in favour of ABORT macro
   `ABORT()` gives us readable error messages, even when exception support is disabled.
* Control marian exception global setting in tests through fixture
* WASM: construct BlockingService with critical logging by default
   This log level is only used by ABORT()

See also: 
- mozilla/firefox-translations#65, 
- mozilla/firefox-translations#68
- mozilla/firefox-translations#70 
- mozilla/firefox-translations#56
2022-02-09 12:54:36 +00:00
Abhishek Aggarwal
6b2a855234
JS/WASM: Re-enable importing optimized gemm module for (#336)
- Re-enabled the code that imports optimized gemm module
   for wasm when available
2022-02-07 16:55:31 +01:00
Abhishek Aggarwal
d95b014562
Wasm/JS: Pivot translation API JS binding and test page update (#327) 2022-02-02 17:01:23 +01:00
Abhishek Aggarwal
8884b39055
Disabled importing optimized gemm module (#282)
- Until the optimized gemm module stops requiring
   Shared Array Buffer, we can't really use it in
   Firefox
2021-12-17 17:39:43 +01:00
Abhishek Aggarwal
feb9c90429
Additional logs in JS translation worker (#277)
- Print source text received in the response
- Print no. of block elements in the input
2021-12-14 21:52:00 +01:00
Abhishek Aggarwal
e75a9e1da3
More robust logic to import wasm gemm (#276)
- Import optimized gemm implementation only if all the necessary functions
   are provided by it, otherwise use the fallback gemm
2021-12-14 16:39:19 +01:00
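The "all necessary functions or fall back" check from #276 might be sketched like this. The function names in `REQUIRED` and the helper `selectGemm` are placeholders, not the real gemm interface.

```javascript
// Placeholder names; the real list of required gemm functions is not shown here.
const REQUIRED = ["prepareA", "prepareB", "multiply"];

// Hypothetical helper: use the optimized module only if it is complete.
function selectGemm(optimized, fallback, requiredNames) {
  const complete =
    optimized != null &&
    requiredNames.every((name) => typeof optimized[name] === "function");
  return complete
    ? { name: "optimized", impl: optimized }
    : { name: "fallback", impl: fallback };
}

const fallback = { prepareA() {}, prepareB() {}, multiply() {} };
const partial = { prepareA() {}, multiply() {} }; // missing prepareB

console.log(selectGemm(partial, fallback, REQUIRED).name);   // "fallback"
console.log(selectGemm(fallback, fallback, REQUIRED).name);  // "optimized"
console.log(selectGemm(undefined, fallback, REQUIRED).name); // "fallback"
```

Treating a partially implemented module the same as a missing one avoids mixing functions from two implementations.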
Abhishek Aggarwal
8e79897f30
Updated configuration for html text translation to work in wasm test page (#269)
* Updated translator configuration in wasm test page
 - Added alignment: soft

* Set ResponseOptions::alignment to "true"
 - Had to be set for html text translation to work
2021-12-01 11:32:51 +01:00
Kenneth Heafield
40366162d8
HTML input (#253)
Co-authored-by: Jelmer van der Linde <jelmer@ikhoefgeen.nl>
Co-authored-by: Abhishek Aggarwal <aaggarwal@mozilla.com>
2021-11-25 13:57:50 +00:00
Abhishek Aggarwal
2b1b0531ff
Import optimized gemm implementation (when available) for wasm target (#265)
* Enable importing optimized gemm module for wasm

 - Updated emscripten generated JS code to
   -- import and use the optimized gemm module when available, otherwise
     use fallback gemm implementation

* Added logging for gemm implementation being used for wasm target
2021-11-17 09:18:55 +01:00
Abhishek Aggarwal
f9e55b3cd8
Make script run from any directory (#262)
* Make script run from any directory
2021-11-15 22:30:52 +01:00
Jerin Philip
0bb8095bca
Deprecate hardAlignment in favour of softAlignment (#250) 2021-11-01 19:21:28 +00:00
Abhishek Aggarwal
c5bc3f5191
Update config "skip-cost" to enable log probabilities for QE scores (#247)
- Updated wasm test page
2021-11-01 13:06:23 +01:00
Abhishek Aggarwal
d0d08c0f54
JS bindings for Quality Estimation (#239)
* Quality Score bindings complete
* Updated wasm test page to test the bindings
  - Word and sentence scores can be seen in browser console
2021-10-27 19:26:55 +02:00
Abhishek Aggarwal
c5167b3d8c
Import matrix-multiply from a separate wasm module (#232)
* Updated marian-dev submodule
* Import wasm gemm from a separate wasm module
 - The fallback implementation of gemm is currently being imported dynamically
   for wasm target
* Updated CI scripts and README to import GEMM from a separate wasm module
* Setting model config to int8shiftAlphaAll in wasm test page
2021-10-27 11:54:39 +02:00
Abhishek Aggarwal
a0cb1e4b3d
Wasm test page UI for translating b/w non-English language pairs (#231)
* Updated Wasm test page UI for translating b/w non-English language pairs
* Both "from" and "to" language dropdowns now allow non-English languages
2021-10-19 14:40:54 +02:00
Abhishek Aggarwal
c7b626dfd0
Adapted wasm test page for new Service interface (#224)
- The new interface now supports running multiple TranslationModels
2021-09-28 15:53:02 +05:30
Jerin Philip
cf541c68f9
Multiple TranslationModels Implementation (#210)
For outbound translation, we require having multiple models in the
inventory at the same time and abstracting the "how-to-translate" 
using a model out.

Reorganization: TranslationModel + Service. The new entity which
contains everything required to translate in one direction is
`TranslationModel`. The how-to-translate blocking single-threaded mode
of operation or async multi-threaded mode of operation is decoupled as
`BlockingService` and `AsyncService`. There is a new regression-test
using multiple models in conjunction added, also serving as
a demonstration for using multiple models in Outbound Translation.

WASM: Because WebAssembly cannot use threads, it uses
`BlockingService`. Bindings are provided with a new API to work with a
Service, and multiple TranslationModels which the client (JS extension)
can inventory and maintain.  Ownership of a given `TranslationModel` is
shared while translations using the model are active in the internal
mechanism.

Config-Parsing: So far bergamot-translator has been hijacking marian's
config-parsing mechanisms. However, in order to support multiple models,
it has become impractical to continue this approach and a new
config-parsing that is bergamot specific is provisioned for
command-line applications constituting tests. The original marian
config-parsing tooling is only associated with a subset of
`TranslationModel` now. The new config-parsing for the library manages
workers and other common options (tentatively).

Known issue: workspaces are placed inefficiently, leading to more
memory usage than necessary. This is to be fixed in a later pull
request, trickling down from marian-dev.

This PR also brings in BRT changes which fix speed-tests that were
broken and also fixes some QE outputs which were different due to not
using shortlist.
2021-09-21 18:10:40 +01:00
Abhishek Aggarwal
b64ffce496
Wasm test page using web workers now (#218) 2021-08-26 15:22:52 +02:00
Abhishek Aggarwal
5a8fe209ce Wasm: Enabled sentence byte ranges in the wasm test page
- Use JS bindings to print all sentences individually on
   console
2021-07-19 12:06:22 +02:00
Abhishek Aggarwal
7052722cd2 JS bindings to return sentence byte ranges 2021-07-19 12:06:22 +02:00
Abhishek Aggarwal
b00116cb94
Refactor wasm bindings to use consistent interface names as in native (#195)
* Refactored wasm bindings code
 - Replaced TranslationModel, TranslationRequest and TranslationResult
    with Service, ResponseOptions and Response
 - Corresponding documentation changes
 - Names of the bindings files changed
 - Moved Vector<Response> definition in Response specific bindings
   file
2021-06-15 16:02:14 +02:00
Abhishek Aggarwal
16eb47f47e
Generating cmake configured project version (.js) file in build folder (#194)
- Earlier this file was being generated in folder containing
   actual sources

 - Fixes https://github.com/browsermt/bergamot-translator/issues/161
2021-06-09 13:57:23 +01:00
Jerin Philip
330840338c
Including WASM documentation in sphinx build toc (#176) 2021-06-01 12:39:28 +01:00
Jerin Philip
9dcf6ab665
Adding clang-format and updating existing sources to adhere (#151)
* Adding a first version of clang-format

* Adding run-clang-format.py

* Adding coding styles to workflow

* Fix indentation on coding-styles workflow

* run-clang-format.'py'

* -style -> --style in python

* Updating ColumnLimit: 120

* Format update with clang-format

* Revert "Format update with clang-format"

This reverts commit 5340b19eae.

* Apply update after sync

* Removing a few empty lines

* Removing one more empty line

* Removing empty in workflow file

* Updating README with coding style instructions

* clang-format-* provided in this repository doc update

Co-authored-by: Nikolay Bogoychev <nheart@gmail.com>
2021-05-19 21:50:21 +01:00
Jerin Philip
269edc7ce5
Collapsing TranslationRequest -> ResponseOptions (#139) 2021-05-18 14:25:25 +01:00
Abhishek Aggarwal
2e5880d3d4 Modified wasm cmake file to include version information in built artifacts 2021-05-17 19:34:58 +02:00
Qianqian Zhu
6c7e6156ab
Bundle AlignedMemory inputs with MemoryBundle (#147) 2021-05-13 13:18:08 +01:00
Abhishek Aggarwal
6c063c607e Updated CMakeLists.txt to remove packaging steps for wasm compilation
- Removed PACKAGE_DIR cmake option
 - Removed Workerfs, FORCE_FILESYSTEM=1 in wasm builds
   -- File system support is not needed any more (since model,
     shortlist and vocabs are being passed as bytes now)
2021-05-12 16:23:09 +02:00
Abhishek Aggarwal
e0b9bad058 Updated wasm README to update for passing vocabs as bytes
- Updated Using JS APIs section to pass vocabs as bytes
2021-05-12 16:23:09 +02:00
Abhishek Aggarwal
d7cb859ab7 Refactoring TranslationModelBindings class
- typedef AlignedMemory for code readability

 - Added documentation for one of the binding functions
2021-05-12 07:32:42 +02:00
Abhishek Aggarwal
5025285e5c Updated wasm test page to pass vocabulary files as bytes 2021-05-12 07:32:42 +02:00
Abhishek Aggarwal
9f78985e45 JS bindings for vocabularies as bytes 2021-05-12 07:32:42 +02:00
Abhishek Aggarwal
331216e017 Enable Debugging information in wasm module builds
- Added "-g2" flag during linking step
2021-05-11 18:50:55 +02:00