12 Commits

Author SHA1 Message Date
Greg Tatum
47024ec7a3
Add more things to the gitignore that are not being ignored (#462) 2023-08-16 15:35:26 +01:00
Jelmer
8d5f877596
More portable WASM demo (#437)
* Replace most of the wasm demo page with code from the firefox extension

This code should be more generic and copy/pastable into other projects. Maybe one day it will be an npm package?

* Fix Ukrainian model support

* Add quality estimation output

Automatically enabled when the model(s) support it

* Little "Translating…" indicator

* Don't make Safari fail on something tiny

* Rewire lots of async state to be able to predictably know when the translator is working or not

Previously so much was lazy loaded that it was not easy to catch lack of SIMD support. Now I can just enable the interface only after it has properly loaded.

* No need for a two-stage setup for the worker. Just promise to call `initialize()`!

* More (correct) types and comments for code

* Keyboard shortcuts for input area for bold, italic and underline.

Enough to demo mark-up translation

* Fix `delete()`

* Move javascript glue code into its own npm package

* Add nodejs support and test to package

* More stand-alone build command

…for now, not really used by anything I think

* Ignore build packages

* Use local filesystem for build so it is automatically cached

* fix overflow on demo page

But this might break the mobile demo? I'll have to check into that

* Bring back integrity check, except for NodeJS for now

* Make `build` part of `prepare` so we always make sure we build a complete package

* Move worker code into its own folder

This way I can mark it as a commonjs module, which will help make nodejs treat the files the same way WebWorkers do right now. Firefox doesn't implement `{type: 'module'}` yet for WebWorkers.

* Add README

* Fix paths

* Add npm publish automation

* Make sure webpack ignores node compatibility code

* Add missing webpack:ignore around a worker

* Default to getting models from S3

* Separate "loading" and "translating" indicators

* Bump npm package version

* Add credits

* Don't block on the worker loading

* Not just Mozilla, but Bergamot!

* Make individual translation requests cancelable

* Swap button turns vertically when in skyscraper mode

* Make it easier to debug errors from inside the worker

* Don't bork on deleting a failed worker

* Don't bork on calling translate() with a failed worker

* Handle compilation error with more grace

* `contenteditable=true` seems to work better with some browser extensions

Looking at you, Vimium!

* Clean up abort promise

* Bump npm package version

* Remove `workerUrl` option in favour of better webpack support

With that option it was hard for Webpack to figure out dependencies, and it did not enter my worker script for rewriting. With the hardcoded url it does, and with a bit of `new webpack.DefinePlugin({'typeof self': JSON.stringify('object')}),` we can have webpack remove node-specific code on build!

* Bump version

Minor API change hehe

Co-authored-by: Nikolay Bogoychev <nheart@gmail.com>
2023-01-18 19:41:39 +00:00
Jelmer
e061b5613e
Treat most HTML elements as word-breaking (#286) 2022-01-16 10:26:40 +00:00
Andre Barbosa
63120c174e
QualityEstimation: Preliminary Implementation (#197)
Unifies quality estimation behind an interface and refactors the previously
available quality scores to fit it. Adds a new class of model, powered by
Logistic Regression, as another implementation of that interface. QE now
annotates words rather than subwords, using rule-based subword-to-word
algorithms that work with space characters.

QualityEstimation
-----------------

Implementations of QE are bound together by a `QualityEstimator`
Interface. 

1. The log-probabilities from the machine-translation model re-interpreted
   as quality scores are crafted as an implementation of QualityEstimator.

2. A Logistic-Regression based model is added. This class of models is
   trained in a supervised fashion on scores labeled by a human annotator.
   Handcrafted features (number of words, log probs from the MT model, and
   statistics over the sequence) provide the numeric inputs.
   LogisticRegressor and Matrix (to hold features) are added.

The creation of an instance is switched by the `AlignedMemory` supplied
(be it loaded from the file-system or supplied as a parameter). An empty
AlignedMemory leads to quality scores from NMT, while supplying the
weights of a trained logistic-regression model in binary format as the
contents leads to an additional pass through that model to provide more
refined scores.
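
A minimal sketch of that switch, written in JavaScript for illustration
only. The names `createQualityEstimator`, `model.weights` and
`model.intercept`, as well as the four features, are assumptions made for
this example and are not taken from the actual sources; `null` stands in
for an empty AlignedMemory.

```javascript
// Hypothetical sketch only; names and features are illustrative and are
// not taken from the bergamot-translator sources.

function mean(xs) {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

function sigmoid(z) {
  return 1 / (1 + Math.exp(-z));
}

// `model` stands in for the contents of the AlignedMemory: null means
// "empty", otherwise it holds an intercept and one weight per feature.
function createQualityEstimator(model) {
  if (model === null) {
    // No weights supplied: reuse the MT model's log-probabilities as scores.
    return (wordLogProbs) => mean(wordLogProbs);
  }
  return (wordLogProbs, sentenceLogProbs) => {
    // Assumed features: word mean, word minimum, sentence mean, subword count.
    const features = [
      mean(wordLogProbs),
      Math.min(...wordLogProbs),
      mean(sentenceLogProbs),
      wordLogProbs.length,
    ];
    // Weighted sum of the features plus the intercept, squashed to (0, 1).
    const z = features.reduce(
      (acc, f, i) => acc + model.weights[i] * f,
      model.intercept
    );
    return sigmoid(z);
  };
}
```

In this sketch the two estimators return values on different scales (a
mean log-probability versus a value between 0 and 1); whether the real
implementation normalises them is not stated here.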

Both the above now transform subwords into "words" using a heuristic
algorithm, scanning for spaces. This allows the client to work with "words"
to denote quality instead of subwords, as the former is more sensible to
the user.
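
The space-scanning heuristic can be sketched roughly as follows, in
JavaScript for illustration only. It assumes that subwords arrive as plain
strings whose concatenation is the sentence, that a subword starting with
a space opens a new word, and that a word's score is the mean of its
subword log-probs. `groupSubwordsIntoWords` is a hypothetical name, not a
function from the sources.

```javascript
// Hypothetical sketch only; not the bergamot-translator implementation.

function mean(xs) {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// Groups subwords into words by scanning for leading spaces and averages
// the subword log-probs of each word.
function groupSubwordsIntoWords(subwords, logProbs) {
  const groups = [];
  for (let i = 0; i < subwords.length; i++) {
    const piece = subwords[i];
    // A leading space (or the very first subword) starts a new word.
    if (groups.length === 0 || piece.startsWith(' ')) {
      groups.push({ text: '', logProbs: [] });
    }
    const current = groups[groups.length - 1];
    current.text += piece;
    current.logProbs.push(logProbs[i]);
  }
  return groups
    .map((g) => ({ text: g.text.trim(), score: mean(g.logProbs) }))
    // Drop groups that consisted of whitespace only.
    .filter((w) => w.text.length > 0);
}

const words = groupSubwordsIntoWords(
  [' Hel', 'lo', ' wor', 'ld', '!'],
  [-0.1, -0.3, -0.2, -0.4, -0.6]
);
console.log(words.map((w) => w.text)); // [ 'Hello', 'world!' ]
```

Note that punctuation stays attached to the preceding word under this
assumption, because only spaces are treated as boundaries.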

Testing
-------

1. BRT now has two new test apps to check the QE outputs in text
  (covers subword to words) and numbers domain (covers quality scores).
  These are tested with en-et models for which QualityEstimation is
  available now, on a new input to avoid architecture/compiler issues.
2. Unit test for LogisticRegression model is added.


Docs
----

Doxygen now supports MathJax properly, so the explanations of the
reductions used to make the Logistic Regression computation more
efficient render correctly.

Co-authored-by: Felipe C. Dos Santos <felipe.santos.k@gmail.com>
Co-authored-by: Jerin Philip <jerinphilip@live.in>
2021-09-16 16:28:40 +01:00
abhi-agg
c64deb50a8
Imported CI scripts from mozilla/bergamot-translator-old (#1)
* CircleCI config, docs and badge

* Increase CircleCI RAM from 4gb to 16gb

Co-authored-by: Motin <motin@motin.eu>
2021-03-10 09:30:39 -08:00
Jerin Philip
10dcb8f548 Merge remote-tracking branch 'origin/wasm-integration' into jp/absorb-batch-translator
Merging wasm-integration. Single thread codepath seems functional.
Multithreading is broken.
2021-02-17 13:08:58 +00:00
Motin
49ad6514ae Add reproducible docker-based builds + let test page use these by default 2021-02-15 11:27:47 +02:00
Motin
7030fa0157 Ignore test page bundled artifacts 2021-02-15 11:25:13 +02:00
Motin
e50dd0909f Ignore contents in models directory 2021-02-15 11:23:08 +02:00
Andre Natal
1e413f71cd Including a more elaborate test page and a node webserver that sends the proper CORS headers and wasm MIME type 2021-02-13 18:23:25 -08:00
Jerin Philip
38e8b3cd6d Updates: marian-dev, ssplit for marian-decoder-new
Updates marian-dev and ssplit submodules to point to the upstream
commits which implement the following:

 - marian-dev: encodeWithByteRanges(...) to get source token byte-ranges
 - ssplit: Has a trivial sentencesplitter functionality implemented, and
   now is faster to benchmark with marian-decoder.

This enables a marian-decoder replacement written through ssplit in this
source to be benchmarked constantly with existing marian-decoder.

Nits: Removes logging introduced for multiple workers, and respective
log statements.
2021-02-12 14:23:24 +00:00
Jerin Philip
e75bd7eb57 Adding vim temporary files to .gitignore 2021-01-22 11:31:20 +00:00