Commit Graph

275 Commits

Author SHA1 Message Date
Motin
0f8f8e026a
Pin emsdk version to the same one used in Circle CI (#165) 2021-05-20 08:59:30 +03:00
Jerin Philip
9dcf6ab665
Adding clang-format and updating existing sources to adhere (#151)
* Adding a first version of clang-format

* Adding run-clang-format.py

* Adding coding styles to workflow

* Fix indentation on coding-styles workflow

* run-clang-format.'py'

* -style -> --style in python

* Updating ColumnLimit: 120

* Format update with clang-format

* Revert "Format update with clang-format"

This reverts commit 5340b19eae.

* Apply update after sync

* Removing a few empty lines

* Removing one more empty line

* Removing empty in workflow file

* Updating README with coding style instructions

* clang-format-* provided in this repository doc update

Co-authored-by: Nikolay Bogoychev <nheart@gmail.com>
2021-05-19 21:50:21 +01:00
Qianqian Zhu
7ad8d0a04d
initialise MemoryBundle members (#167) 2021-05-19 20:11:20 +01:00
Kenneth Heafield
89bd47342b
Use binary lexical shortlist in documentation (#152)
* Use binary lexical shortlist in documentation

* MKL/AppleAccelerate note

Co-authored-by: Nikolay Bogoychev <nheart@gmail.com>
Co-authored-by: Jerin Philip <jphilip@ed.ac.uk>
2021-05-19 10:44:32 +01:00
Kenneth Heafield
b25f223fe4
Rewriting batching for threadsafety (#155)
This does make the batcher a critical section across job submission and
cleaving though.  If that becomes a problem, we should go back to
incoming and outgoing queues with a batcher thread.

Also removes blocking mode from native compiles.

Note that translateMultiple no longer guarantees great batching.  Guess
we could lease the mutex from ThreadsafeBatcher and create a session.

There is the risk that one sentence comes in at a time and each thread
grabs one sentence at a time instead of better batching.  Not sure what
to do about that other than some sort of Nagle algorithm.

Due to non-deterministic batching, even with one thread, the regression
tests will go haywire.
2021-05-18 16:11:14 +01:00
Jerin Philip
269edc7ce5
Collapsing TranslationRequest -> ResponseOptions (#139) 2021-05-18 14:25:25 +01:00
abhi-agg
8b621de358
Merge pull request #159 from mozilla/main
Merge histories across bergamot-translator forks
2021-05-18 13:53:26 +02:00
abhi-agg
813e81c10c
Merge branch 'main' into main 2021-05-18 13:53:12 +02:00
Nikolay Bogoychev
10131c731a
Marian submodule with unified loading (#157) 2021-05-18 12:45:22 +01:00
Motin
1c40cc8289
Merge branch 'main' into main 2021-05-18 13:44:08 +03:00
Abhishek Aggarwal
7a973df74d Corrected the version number
- To be in sync with versioning in mozilla/bergamot-translator repo
2021-05-18 12:17:56 +02:00
Abhishek Aggarwal
b73714e222 Merge remote-tracking branch 'upstream/main' into main
- Sync with upstream (https://github.com/browsermt/bergamot-translator)
2021-05-18 08:48:41 +02:00
Abhishek Aggarwal
067076fbc1 Bumped version to 0.3.0
- This brings the version info in sync with the various releases
   of the extension
2021-05-17 19:34:58 +02:00
Abhishek Aggarwal
0ad583cc34 Generate project version file for native builds
- The header file exposes a function that provides version information
   for native binaries
2021-05-17 19:34:58 +02:00
Abhishek Aggarwal
2e5880d3d4 Modified wasm cmake file to include version information in built artifacts 2021-05-17 19:34:58 +02:00
Abhishek Aggarwal
c44868e1fd Import GetVersionFromFile cmake file in root level CMakeLists.txt 2021-05-17 19:34:58 +02:00
Abhishek Aggarwal
c1ef6f2bcb Added cmake file to compute version information
- Reads BERGAMOT_VERSION file for generating various strings
   for versioning
2021-05-17 19:34:58 +02:00
Kenneth Heafield
3e70587672
Rewrite annotation class to remove corner cases (#135) 2021-05-17 16:42:18 +01:00
Qianqian Zhu
5bd1fc6b83
Refactor vocabs in Service (#143)
Co-authored-by: Nikolay Bogoychev <nheart@gmail.com>
2021-05-17 13:09:03 +01:00
Jerin Philip
77424a3df1
Enabling ccache on github builds for Ubuntu (#95)
* CI Changes to add tiny regression tests

* Adding an inspect cache step

* Removing ccache, pursue in another

* Incorporating Nick's changes through submodule merge

* Submodule now points to master

* Restoring ccache enabled workflow file

* Restoring ccache enabled CMakeLists

* cache -> ccache typo fix

* Moving CCACHE setup to GitHub runner file

* Find also uses CCACHE dir

* Updating CMakeLists not to override env

* Cache compiler binary's contents

* Changing a few names to trigger new build; Testing cache looks fun

* USE_CCACHE=on, -L for inspection

* Adding a ccache_cmd, but will only use in next commit

* Using ccache_cmd

* Removing "

* Adding compiler hash script

* Bunch of absolute paths

* GITHUB_WORKSPACE typo

* Nah, I'll keep -L and trigger another build

* Trying something with compiler hash on cache key backup as well

* builtin, bash it seems

* Empty commit #1

* Move ccache stats to after compile

* Reshuffling ccache vars

* No comments

* Updates to Github output set syntax

* Empty Commit 1

* Empty Commit 2

* Empty commit 3

* /bin/bash -> bash; ccache_cmd for consistency

* Adding ccache -s before and after build

* Adding comments to compiler-hash script

* Let's build cached and non-cached variants together for comparison

* Fixing quotes, /bin/bash -> bash

* Minor var/env adjustment

* Adding ccache -z before the job

* Reverting CMakeLists.txt without CCACHE

* Switching to CMAKE_LANG_COMPILER_LAUNCHER instead of CMakeLists.txt rule

* 5G -> 1G cache size

* 1G -> 2G; Hyperparameter tuning
2021-05-17 11:42:47 +01:00
Qianqian Zhu
6c7e6156ab
Bundle AlignedMemory inputs with MemoryBundle (#147) 2021-05-13 13:18:08 +01:00
Abhishek Aggarwal
6c063c607e Updated CMakeLists.txt to remove packaging steps for wasm compilation
- Removed PACKAGE_DIR cmake option
 - Removed Workerfs, FORCE_FILESYSTEM=1 in wasm builds
   -- File system support is not needed any more (since model,
     shortlist and vocabs are being passed as bytes now)
2021-05-12 16:23:09 +02:00
Abhishek Aggarwal
0189500160 Updated README to remove packaging steps for wasm compilation
- We don't need to package model, shortlist or vocab files into wasm
   binary at build time
2021-05-12 16:23:09 +02:00
Abhishek Aggarwal
e0b9bad058 Updated wasm README to update for passing vocabs as bytes
- Updated Using JS APIs section to pass vocabs as bytes
2021-05-12 16:23:09 +02:00
Abhishek Aggarwal
8a6c7b44a3 Avoid packaging vocab files into wasm binary in CI builds
- We don't need to package vocab files into wasm binary any more
   as a sync with upstream enabled passing vocabs as bytes
2021-05-12 09:55:49 +02:00
Abhishek Aggarwal
451ab047ff Merge remote-tracking branch 'upstream/main' into main 2021-05-12 08:53:25 +02:00
Abhishek Aggarwal
d7cb859ab7 Refactoring TranslationModelBindings class
- typedef AlignedMemory for code readability

 - Added documentation for one of the binding functions
2021-05-12 07:32:42 +02:00
Abhishek Aggarwal
5025285e5c Updated wasm test page to pass vocabulary files as bytes 2021-05-12 07:32:42 +02:00
Abhishek Aggarwal
9f78985e45 JS bindings for vocabularies as bytes 2021-05-12 07:32:42 +02:00
Abhishek Aggarwal
331216e017 Enable Debugging information in wasm module builds
- Added "-g2" flag during linking step
2021-05-11 18:50:55 +02:00
Abhishek Aggarwal
ce576c27f1 Export "addOnPreMain" function from wasm module
- This is required in the extension while using wasm module in a worker environment
2021-05-11 18:50:55 +02:00
Kenneth Heafield
ce01de939d
Change USE_WASM_COMPATIBLE_SOURCE=OFF by default on native, force on for WASM (#138)
* Change WASM_COMPATIBLE_SOURCE=OFF by default

The default was WASM_COMPATIBLE_SOURCE=ON COMPILE_WASM=OFF, which is a
testing configuration, not a sensible default for native or wasm.

* Always USE_WASM_COMPATIBLE_SOURCE with COMPILE_WASM

* Set CMP0077 to fix variable handling
2021-05-10 12:28:37 +02:00
Jerin Philip
354e7ac6be
Remove unused types TokenRanges, SentenceTokenRanges, UPtr (#137) 2021-05-09 13:42:57 +01:00
Nikolay Bogoychev
87adb5d60a Target master of ssplit-cpp 2021-05-07 18:41:08 +01:00
Jerin Philip
bef12765ad
Minor rename: sentence_ranges -> annotation (#134) 2021-05-07 18:38:27 +01:00
Nikolay Bogoychev
21c1cae472
Update ssplit submodule, removing absl (#132)
* Update ssplit submodule, removing absl

* Fix ssplit variables

* Update ssplit branch

* Fix emscripten compilation

* Update tests
2021-05-07 17:58:58 +01:00
Qianqian Zhu
5b02008a97
Enable vocabs pass as byte arrays (#122)
* first attempt to enable vocabs pass as byte arrays

* pass vocabs bytes as AlignedMemory

* add vocabIndices to avoid double loading

* small fix on parameter names and documentation

* fix windows build plus tiny update on documentation

* update marian-dev submodule

* move validate model bytearray in BatchTranslator

* small refactors on validateBinaryModel()

* switch vocab memories to std::vector<marian::Ptr<AlignedMemory>>

* update marian-dev submodule

* replace marian::Ptr to std::shared_ptr for vocab memories

* add note for vocab memories
2021-05-07 14:54:48 +01:00
Jerin Philip
b86c76b004
Faithful to source-structure translation (#115)
* First draft of faithful translation

* Comments explaining pre and post

* Comments on response_builder

* Updating bergamot-translator-tests with new outputs

* Cosmetic changes in response target text construction

* Replacing &(x[0]) -> x.data() to avoid illegal indices

* Removing nullptr given both branches init pointer with legal values

* pre, post -> gap(i) addressing review comments

Functions which were pre and post before are subsumed by gap(i), and the
algorithm in ResponseBuilder adjusted to fix.

`x = nullptr` is back, should be harmless.

* Updating brt with paragraph outputs

* Bumping brt with updated outputs, buffer text at begin as well

* Bumping BRT with sync after bytearray collapse merge

* Pointing BRT to main after merge

Co-authored-by: Nikolay Bogoychev <nheart@gmail.com>
2021-05-06 16:19:27 +01:00
Jerin Philip
bc2e4eee5c
Making bytearray a commandline switch (#127)
* Adding bytearray option

* collapse intermediate for bytearray apps

* Removing service-cli-bytearray

* Removing the bergamot bytearray app

* Bumping updates to brt collapsing apps

* Reasonable defaults and hard check when cmd enabled

* Update documentation for flags

* Bump brt with MKL check and skip

* Bumping BRT with MKL_FOUND instead of USE_MKL

* Bumping BRT with no mkl enforce

* Bumping BRT with ssse3 output

* Let's try disabling OpenBLAS

* Trying to disable apple accelerate

* Using WASM compatible BLAS can enable intgemm

* Adding a CMake -L to see what exactly is the diff

* Revert "Let's try disabling OpenBLAS"

This reverts commit 9a6b9bc53b.

* Revert "Using WASM compatible BLAS can enable intgemm"

This reverts commit 936a592e18.

* Restricting mac tests through tags and on GitHub CI

* Using only check-bytearray

* Bumping BRT with change of default behaviour
2021-05-06 00:26:03 +01:00
Kenneth Heafield
c61b2bdd10
Fix busy loop in windows (#131)
* Fix busy loop in windows

* Nick wants the while loop gone

* Fix continue leftover

Co-authored-by: Nikolay Bogoychev <nheart@gmail.com>
2021-05-06 00:21:50 +01:00
Motin
743ebcd3bc Extension desired changes (#129)
* Enable worker file system
* Avoid node.js-code in emscripten glue-code
2021-05-04 14:53:05 +02:00
Motin
a63533b241
Extension desired changes (#129)
* Enable worker file system
* Avoid node.js-code in emscripten glue-code
2021-05-04 14:25:12 +02:00
Abhishek Aggarwal
c478a626a8 Updating ci scripts for the latest upstream changes
- The upstream browsermt/bergamot-translator builds the wasm artifacts
   in top level build folder now
2021-05-04 13:53:33 +02:00
Abhishek Aggarwal
d8f7e51792 Minor README change
- Changed "browsermt" to "mozilla"
2021-05-04 13:53:33 +02:00
Abhishek Aggarwal
ec3a785d17 Merge remote-tracking branch 'upstream/main' into main
- Sync with upstream (https://github.com/browsermt/bergamot-translator)
2021-05-04 12:09:26 +02:00
abhi-agg
8de368c166
Improved wasm scripts and README (#128) 2021-05-04 10:18:45 +01:00
abhi-agg
1a4add19da
Improve script to patch wasm artifacts and load EN->DE vocabulary in wasm test (#125)
* Improved script that patches wasm artifacts to enable wormhole

 - Made the regex pattern ignore multiple whitespaces b/w words of
   the matching pattern

* Fix for loading EN->DE vocabularies in wasm test page

 - Loading vocabularies for EN->DE was failing because of
   the new structure of bergamot-models
2021-05-03 16:13:43 +01:00
Jerin Philip
36b3c7291a
WASM Bindings collapse (#87)
* Safe transfer of bindings through typedefs

* Removing Translation* files and bringing in counterparts

* Remove previously commented out code

* Removing commented out include

* Absorb Translation* documentation

Co-authored-by: abhi-agg <66322306+abhi-agg@users.noreply.github.com>
2021-05-03 13:41:37 +01:00
Abhishek Aggarwal
4908e4019e Updated wasm/README file with instructions for byte loading APIs 2021-05-03 10:03:17 +02:00
Abhishek Aggarwal
f3a257d40b Enabled gemm-precision in wasm test page
- This increases the inference speed while providing
   models as bytes to the translation engine
   (it wasn't needed while providing models as files)
2021-05-03 10:03:17 +02:00