Commit Graph

403 Commits

Author SHA1 Message Date
Abhishek Aggarwal
0189500160 Updated README to remove packaging steps for wasm compilation
- We don't need to package model, shortlist or vocab files into wasm
   binary at build time
2021-05-12 16:23:09 +02:00
Abhishek Aggarwal
e0b9bad058 Updated wasm README to update for passing vocabs as bytes
- Updated Using JS APIs section to pass vocabs as bytes
2021-05-12 16:23:09 +02:00
Abhishek Aggarwal
8a6c7b44a3 Avoid packaging vocab files into wasm binary in CI builds
- We don't need to package vocab files into wasm binary any more
   as a sync with upstream enabled passing vocabs as bytes
2021-05-12 09:55:49 +02:00
Abhishek Aggarwal
451ab047ff Merge remote-tracking branch 'upstream/main' into main 2021-05-12 08:53:25 +02:00
Abhishek Aggarwal
d7cb859ab7 Refactoring TranslationModelBindings class
- typedef AlignedMemory for code readability
- Added documentation for one of the binding functions
2021-05-12 07:32:42 +02:00
Abhishek Aggarwal
5025285e5c Updated wasm test page to pass vocabulary files as bytes 2021-05-12 07:32:42 +02:00
Abhishek Aggarwal
9f78985e45 JS bindings for vocabularies as bytes 2021-05-12 07:32:42 +02:00
Abhishek Aggarwal
331216e017 Enable Debugging information in wasm module builds
- Added "-g2" flag during the linking step
2021-05-11 18:50:55 +02:00
Abhishek Aggarwal
ce576c27f1 Export "addOnPreMain" function from wasm module
- This is required in the extension while using wasm module in a worker environment
2021-05-11 18:50:55 +02:00
Kenneth Heafield
ce01de939d
Change USE_WASM_COMPATIBLE_SOURCE=OFF by default on native, force on for WASM (#138)
* Change WASM_COMPATIBLE_SOURCE=OFF by default

The default was WASM_COMPATIBLE_SOURCE=ON COMPILE_WASM=OFF which is a
testing configuration, not a sensible default for native or wasm.

* Always USE_WASM_COMPATIBLE_SOURCE with COMPILE_WASM

* Set CMP0077 to fix variable handling
2021-05-10 12:28:37 +02:00
Jerin Philip
354e7ac6be
Remove unused types TokenRanges, SentenceTokenRanges, UPtr (#137) 2021-05-09 13:42:57 +01:00
Nikolay Bogoychev
87adb5d60a Target master of ssplit-cpp 2021-05-07 18:41:08 +01:00
Jerin Philip
bef12765ad
Minor rename: sentence_ranges -> annotation (#134) 2021-05-07 18:38:27 +01:00
Nikolay Bogoychev
21c1cae472
Update ssplit submodule, removing absl (#132)
* Update ssplit submodule, removing absl

* Fix ssplit variables

* Update ssplit branch

* Fix emscripten compilation

* Update tests
2021-05-07 17:58:58 +01:00
Qianqian Zhu
5b02008a97
Enable passing vocabs as byte arrays (#122)
* first attempt to enable passing vocabs as byte arrays

* pass vocabs bytes as AlignedMemory

* add vocabIndices to avoid double loading

* small fix on parameter names and documentation

* fix windows build plus tiny update on documentation

* update marian-dev submodule

* move validate model bytearray in BatchTranslator

* small refactors on validateBinaryModel()

* switch vocab memories to std::vector<marian::Ptr<AlignedMemory>>

* update marian-dev submodule

* replace marian::Ptr to std::shared_ptr for vocab memories

* add note for vocab memories
2021-05-07 14:54:48 +01:00
Jerin Philip
b86c76b004
Faithful to source-structure translation (#115)
* First draft of faithful translation

* Comments explaining pre and post

* Comments on response_builder

* Updating bergamot-translator-tests with new outputs

* Cosmetic changes in response target text construction

* Replacing &(x[0]) -> x.data() to avoid illegal indices

* Removing nullptr given both branches init pointer with legal values

* pre, post -> gap(i) addressing review comments

Functions which were pre and post before are subsumed by gap(i), and the
algorithm in ResponseBuilder adjusted to fix.

`x = nullptr` is back, should be harmless.

* Updating brt with paragraph outputs

* Bumping brt with updated outputs, buffer text at begin as well

* Bumping BRT with sync after bytearray collapse merge

* Pointing BRT to main after merge

Co-authored-by: Nikolay Bogoychev <nheart@gmail.com>
2021-05-06 16:19:27 +01:00
Jerin Philip
bc2e4eee5c
Making bytearray a commandline switch (#127)
* Adding bytearray option

* collapse intermediate for bytearray apps

* Removing service-cli-bytearray

* Removing the bergamot bytearray app

* Bumping updates to brt collapsing apps

* Reasonable defaults and hard check when cmd enabled

* Update documentation for flags

* Bump brt with MKL check and skip

* Bumping BRT with MKL_FOUND instead of USE_MKL

* Bumping BRT with no mkl enforce

* Bumping BRT with ssse3 output

* Let's try disabling OpenBLAS

* Trying to disable apple accelerate

* Using WASM compatible BLAS can enable intgemm

* Adding a CMake -L to see what exactly is the diff

* Revert "Let's try disabling OpenBLAS"

This reverts commit 9a6b9bc53b.

* Revert "Using WASM compatible BLAS can enable intgemm"

This reverts commit 936a592e18.

* Restricting mac tests through tags and on GitHub CI

* Using only check-bytearray

* Bumping BRT with change of default behaviour
2021-05-06 00:26:03 +01:00
Kenneth Heafield
c61b2bdd10
Fix busy loop in windows (#131)
* Fix busy loop in windows

* Nick wants the while loop gone

* Fix continue leftover

Co-authored-by: Nikolay Bogoychev <nheart@gmail.com>
2021-05-06 00:21:50 +01:00
Motin
743ebcd3bc Extension desired changes (#129)
* Enable worker file system
* Avoid node.js-code in emscripten glue-code
2021-05-04 14:53:05 +02:00
Motin
a63533b241
Extension desired changes (#129)
* Enable worker file system
* Avoid node.js-code in emscripten glue-code
2021-05-04 14:25:12 +02:00
Abhishek Aggarwal
c478a626a8 Updating ci scripts for the latest upstream changes
- The upstream browsermt/bergamot-translator builds the wasm artifacts
   in top level build folder now
2021-05-04 13:53:33 +02:00
Abhishek Aggarwal
d8f7e51792 Minor README change
- Changed "browsermt" to "mozilla"
2021-05-04 13:53:33 +02:00
Abhishek Aggarwal
ec3a785d17 Merge remote-tracking branch 'upstream/main' into main
- Sync with upstream (https://github.com/browsermt/bergamot-translator)
2021-05-04 12:09:26 +02:00
abhi-agg
8de368c166
Improved wasm scripts and README (#128) 2021-05-04 10:18:45 +01:00
abhi-agg
1a4add19da
Improve script to patch wasm artifacts and load EN->DE vocabulary in wasm test (#125)
* Improved script that patches wasm artifacts to enable wormhole

 - Made the regex pattern ignore multiple whitespaces b/w words of
   the matching pattern

* Fix for loading EN->DE vocabularies in wasm test page

 - Loading vocabularies for EN->DE was failing because of
   the new structure of bergamot-models
2021-05-03 16:13:43 +01:00
Jerin Philip
36b3c7291a
WASM Bindings collapse (#87)
* Safe transfer of bindings through typedefs

* Removing Translation* files and bringing in counterparts

* Remove previously commented out code

* Removing commented out include

* Absorb Translation* documentation

Co-authored-by: abhi-agg <66322306+abhi-agg@users.noreply.github.com>
2021-05-03 13:41:37 +01:00
Abhishek Aggarwal
4908e4019e Updated wasm/README file with instructions for byte loading APIs 2021-05-03 10:03:17 +02:00
Abhishek Aggarwal
f3a257d40b Enabled gemm-precision in wasm test page
- This increases the inference speed while providing
   models as bytes to the translation engine
   (it wasn't needed while providing models as files)
2021-05-03 10:03:17 +02:00
Nikolay Bogoychev
d82e01eda4
Full windows support with ssplit from browsermt, not a fork (#109)
* Update marian-dev to the newest mac version

* Attempt windows workflow

* force workflow rerun

* Separate id

* Attempt 3 at github action

* Marian dev submodule now compiles with apple clang

* Updated ssplit version to something more recent

* Attempt to fix compile on wasm

* Do not compile subproject tests

* Fix emscripten compilation on Mac

* 99% on the way to windows compile

* Try with a different generator

* Build release not debug

* Revert CMakeLists.txt hacks

* Fix sse2 compilation failure

* MSVC settings for WIN32

* Add nodefaultlib LIBCMT

* Do not compile ssplit.cpp as it contains sys/mman.h

* Revert ab56b9aa4f

* Update paths

* Set the build type to release if not set previously

* Attempt to build release with the windows workflow

* Attempt 5 at VS studio release build

* Attempt 6 at getting release build on MSVC generator

* The windows build is debug at the moment...

* fix ssplit for ubuntu 16.04

* Fix compilation with clang

* Compile on ubuntu16.04

* Explain what is going on

* Updated ssplit and workflow
2021-05-01 00:29:23 +01:00
Nikolay Bogoychev
e286533164 Update to marian-dev master 2021-04-30 22:34:44 +01:00
Abhishek Aggarwal
2788116f8b Better error logging for wasm test page 2021-04-30 09:09:41 +02:00
Abhishek Aggarwal
3525af6a45 Make wasm test page work with bergamot-models repository
- bergamot-models now contains lexical shortlist bin files as well
2021-04-30 09:09:41 +02:00
abhi-agg
de0abfd795
JS bindings for loading model and shortlist files as bytes (#117)
* Bindings to load model and shortlist files as bytes
* Modified wasm test page for byte based loading of files
* Updates wasm README for byte loading based usage of TranslationModel
2021-04-29 12:04:04 +02:00
abhi-agg
e5ec5bdd33
Control validating the config options via a boolean flag (#116)
* Control validating the config options via a boolean flag

 - parseOptions() function now validates the parsed options
   based on the validate argument

* Minor syntactic fix
2021-04-29 09:38:09 +01:00
Jerin Philip
4be96a97d7 Handle empty translation requests
Fixes https://github.com/browsermt/bergamot-translator/issues/101.
ResponseBuilder is called with empty histories to trigger a valid but
mostly-empty response.
2021-04-28 10:13:45 +02:00
Jerin Philip
fa2003e70d
Cleanup API: Refactor request on-complete transition (#80) 2021-04-27 15:56:39 +01:00
Nikolay Bogoychev
fdf9e66cef
Windows workflows and mac framework accelerate (#108)
Windows still failing but getting closer
2021-04-26 18:59:20 +01:00
abhi-agg
7d2e74f3c0
Changed underlying template parameter of AlignedMemory class (#111)
- AlignedMemory is AlignedVector<char> now instead of AlignedVector<const void*>
 - This solves the issue of allocating 8x of the actual required memory for
   loading files as bytes
2021-04-26 16:26:27 +01:00
Nikolay Bogoychev
fc6976ae29
Remove dead code (#107)
Co-authored-by: Kenneth Heafield <kpu@users.noreply.github.com>
2021-04-22 17:29:22 +01:00
Kenneth Heafield
1184875cc9
Windows PCQueue support without Boost (#106) 2021-04-22 16:01:39 +01:00
Jerin Philip
c00c263f8f
Moving small tests to GitHub CI (#93)
Adds regression-tests to the workflow for native minimal/custom marian and full builds. 

Co-authored-by: abhi-agg <66322306+abhi-agg@users.noreply.github.com>
2021-04-16 11:58:53 +01:00
Nikolay Bogoychev
f1fc4f8041
Fix the target_include_directories (#98) 2021-04-14 14:53:35 +02:00
Abhishek Aggarwal
a7f6bb51d9 Minor cleanup in build-wasm.sh file 2021-04-14 14:51:08 +02:00
Abhishek Aggarwal
1574a4586c Merge remote-tracking branch 'upstream/main' into upstream-sync 2021-04-14 14:36:17 +02:00
Abhishek Aggarwal
f5dffeb5ca Downgraded resource class to 'medium' for circle ci
- Also restricted parallel make compilation to 3
2021-04-14 14:19:13 +02:00
Nikolay Bogoychev
e4b58357db
Clarify misleading comment (#99) 2021-04-14 09:56:07 +01:00
Jerin Philip
3daa024eb3
Strengthen the Annotation class: Handle empty sentences and tests (#85)
* Changing Annotation to adhere to [begin, end)

* Stronger unit tests on sentences + num words, num sentences

* Hotfix with empty string view from EOS

* No more absolving empty-sentence; Added tests now defined behaviour

* Uncommenting important section in unit test

* Ensure empty string view default, beginning at end so marker points

* Further strengthen and comment unit-tests, mark exactly where empty sentence is happening

* Review comments: Dummy sentence + docs

- What should be a simple fast accessor is turning into compute.
  Normally the way to deal with this, for better or worse, is to put 0 at
  the beginning of sentenceEndIds_.

- Indices into what? Mentioned to be flatByteRanges_.

* Documentation updates

* More changes to docs

Co-authored-by: abhi-agg <66322306+abhi-agg@users.noreply.github.com>
2021-04-12 17:05:23 +01:00
Nikolay Bogoychev
b345b0e035
Rudimentary validator for binary files (#94)
* Rudimentary validator for binary files
2021-04-12 13:46:47 +02:00
Nikolay Bogoychev
5e15d73b7e
Consistent api usage (#91)
* Consistent api between the two versions of the executables in app folder

* Remove shared ptrs
2021-04-09 10:34:24 +02:00
Jerin Philip
b71b3a18d8
Removes vocabs and propagates fixes for breaks (#79)
* Removes vocabs and propagates fixes for breaks

* Prettify diff: Undoing comment shuffles due to merge conflict edits

* 20% of time actual work, 80% prettifying diff

* Histories members -> poof!

We however have Histories in constructor, which we will remove out of the way
soon.

Co-authored-by: Kenneth Heafield <kpu@users.noreply.github.com>
2021-04-07 12:15:46 +01:00