407 Commits

Author SHA1 Message Date
Qianqian Zhu
5bd1fc6b83
Refactor vocabs in Service (#143)
Co-authored-by: Nikolay Bogoychev <nheart@gmail.com>
2021-05-17 13:09:03 +01:00
Jerin Philip
77424a3df1
Enabling ccache on github builds for Ubuntu (#95)
* CI Changes to add tiny regression tests

* Adding an inspect cache step

* Removing ccache, pursue in another

* Incorporating Nick's changes through submodule merge

* Submodule now points to master

* Restoring ccache enabled workflow file

* Restoring ccache enabled CMakeLists

* cache -> ccache typo fix

* Moving CCACHE setup to GitHub runner file

* Find also uses CCACHE dir

* Updating CMakeLists not to override env

* Cache compiler binary's contents

* Changing a few names to trigger new build; Testing cache looks fun

* USE_CCACHE=on, -L for inspection

* Adding a ccache_cmd, but will only use in next commit

* Using ccache_cmd

* Removing "

* Adding compiler hash script

* Bunch of absolute paths

* GITHUB_WORKSPACE typo

* Nah, I'll keep -L and trigger another build

* Trying something with compiler hash on cache key backup as well

* builtin, bash it seems

* Empty commit #1

* Move ccache stats to after compile

* Reshuffling ccache vars

* No comments

* Updates to Github output set syntax

* Empty Commit 1

* Empty Commit 2

* Empty commit 3

* /bin/bash -> bash; ccache_cmd for consistency

* Adding ccache -s before and after build

* Adding comments to compiler-hash script

* Let's build cached and non-cached variants together for comparison

* Fixing quotes, /bin/bash -> bash

* Minor var/env adjustment

* Adding ccache -z before the job

* Reverting CMakeLists.txt without CCACHE

* Switching to CMAKE_LANG_COMPILER_LAUNCHER instead of CMakeLists.txt rule

* 5G -> 1G cache size

* 1G -> 2G; Hyperparameter tuning
2021-05-17 11:42:47 +01:00
Qianqian Zhu
6c7e6156ab
Bundle AlignedMemory inputs with MemoryBundle (#147) 2021-05-13 13:18:08 +01:00
Abhishek Aggarwal
6c063c607e Updated CMakeLists.txt to remove packaging steps for wasm compilation
- Removed PACKAGE_DIR cmake option
 - Removed Workerfs, FORCE_FILESYSTEM=1 in wasm builds
   -- File system support is not needed any more (since model,
     shortlist and vocabs are being passed as bytes now)
2021-05-12 16:23:09 +02:00
Abhishek Aggarwal
0189500160 Updated README to remove packaging steps for wasm compilation
- We don't need to package model, shortlist or vocab files into wasm
   binary at build time
2021-05-12 16:23:09 +02:00
Abhishek Aggarwal
e0b9bad058 Updated wasm README to update for passing vocabs as bytes
- Updated Using JS APIs section to pass vocabs as bytes
2021-05-12 16:23:09 +02:00
Abhishek Aggarwal
8a6c7b44a3 Avoid packaging vocab files into wasm binary in CI builds
- We don't need to package vocab files into wasm binary any more
   as a sync with upstream enabled passing vocabs as bytes
2021-05-12 09:55:49 +02:00
Abhishek Aggarwal
451ab047ff Merge remote-tracking branch 'upstream/main' into main 2021-05-12 08:53:25 +02:00
Abhishek Aggarwal
d7cb859ab7 Refactoring TranslationModelBindings class
- typedef AlignedMemory for code readability

 - Added documentation for one of the binding function
2021-05-12 07:32:42 +02:00
Abhishek Aggarwal
5025285e5c Updated wasm test page to pass vocabulary files as bytes 2021-05-12 07:32:42 +02:00
Abhishek Aggarwal
9f78985e45 JS bindings for vocabularies as bytes 2021-05-12 07:32:42 +02:00
Abhishek Aggarwal
331216e017 Enable Debugging information in wasm module builds
- Added "-g2" flag during linking step
2021-05-11 18:50:55 +02:00
Abhishek Aggarwal
ce576c27f1 Export "addOnPreMain" function from wasm module
- This is required in the extension while using wasm module in a worker environment
2021-05-11 18:50:55 +02:00
Kenneth Heafield
ce01de939d
Change USE_WASM_COMPATIBLE_SOURCE =OFF by default on native, force on for WASM (#138)
* Change WASM_COMPATIBLE_SOURCE=OFF by default

The default was WASM_COMPATIBLE_SOURCE=ON COMPILE_WASM=OFF which is a
testing configuration, not a sensible default for native or wasm.

* Always USE_WASM_COMPATIBLE_SOURCE with COMPILE_WASM

* Set CMP0077 to fix variable handling
2021-05-10 12:28:37 +02:00
Jerin Philip
354e7ac6be
Remove unused types TokenRanges, SentenceTokenRanges, UPtr (#137) 2021-05-09 13:42:57 +01:00
Nikolay Bogoychev
87adb5d60a Target master of ssplit-cpp 2021-05-07 18:41:08 +01:00
Jerin Philip
bef12765ad
Minor rename: sentence_ranges -> annotation (#134) 2021-05-07 18:38:27 +01:00
Nikolay Bogoychev
21c1cae472
Update ssplit submodule, removing absl (#132)
* Update ssplit submodule, removing absl

* Fix ssplit variables

* Update ssplit branch

* Fix emscripten compilation

* Update tests
2021-05-07 17:58:58 +01:00
Qianqian Zhu
5b02008a97
Enable vocabs pass as byte arrays (#122)
* first attempt to enable vocabs pass as byte arrays

* pass vocabs bytes as AlignedMemory

* add vocabIndices to avoid double loading

* small fix on parameter names and documentation

* fix windows build plus tiny update on documentation

* update marian-dev submodule

* move validate model bytearray in BatchTranslator

* small refactors on validateBinaryModel()

* switch vocab memories to std::vector<marian::Ptr<AlignedMemory>>

* update marian-dev submodule

* replace marian::Ptr to std::shared_ptr for vocab memories

* add note for vocab memories
2021-05-07 14:54:48 +01:00
Jerin Philip
b86c76b004
Faithful to source-structure translation (#115)
* First draft of faithful translation

* Comments explaining pre and post

* Comments on response_builder

* Updating bergamot-translator-tests with new outputs

* Cosmetic changes in response target text construction

* Replacing &(x[0]) -> x.data() to avoid illegal indices

* Removing nullptr given both branches init pointer with legal values

* pre, post -> gap(i) addressing review comments

Functions which were pre and post before are subsumed by gap(i), and the
algorithm in ResponseBuilder adjusted to fix.

`x = nullptr` is back, should be harmless.

* Updating brt with paragraph outputs

* Bumping brt with updated outputs, buffer text at begin as well

* Bumping BRT with sync after bytearray collapse merge

* Pointing BRT to main after merge

Co-authored-by: Nikolay Bogoychev <nheart@gmail.com>
2021-05-06 16:19:27 +01:00
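
The `&(x[0])` -> `x.data()` replacement mentioned in the commit above can be illustrated with a minimal sketch. This is not code from the repository: `countNonZero` is a hypothetical helper and `std::vector<char>` is an assumed stand-in for whatever container the commit touched. The point is only that `x[0]` indexes an element that does not exist when the container is empty, while `x.data()` may be called on an empty container as long as the result is not dereferenced.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the repository): counts non-zero bytes in a
// buffer by walking the raw storage pointer.
inline std::size_t countNonZero(const std::vector<char> &x) {
  // x.data() is valid to call even when x is empty; the loop below then
  // runs zero times, so the pointer is never dereferenced.
  // Writing &(x[0]) here would index element 0, which is undefined
  // behavior for an empty vector.
  const char *p = x.data();
  std::size_t n = 0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    if (p[i] != 0) ++n;
  }
  assert(n <= x.size());  // sanity check on the result
  return n;
}
```

For a non-empty vector both spellings yield the same address, so the change only matters for the empty case.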
Jerin Philip
bc2e4eee5c
Making bytearray a commandline switch (#127)
* Adding bytearray option

* collapse intermediate for bytearray apps

* Removing service-cli-bytearray

* Removing the bergamot bytearray app

* Bumping updates to brt collapsing apps

* Reasonable defaults and hard check when cmd enabled

* Update documentation for flags

* Bump brt with MKL check and skip

* Bumping BRT with MKL_FOUND instead of USE_MKL

* Bumping BRT with no mkl enforce

* Bumping BRT with ssse3 output

* Let's try disabling OpenBLAS

* Trying to disable apple accelerate

* Using WASM compatible BLAS can enable intgemm

* Adding a CMake -L to see what exactly is the diff

* Revert "Let's try disabling OpenBLAS"

This reverts commit 9a6b9bc53b.

* Revert "Using WASM compatible BLAS can enable intgemm"

This reverts commit 936a592e18.

* Restricting mac tests through tags and on GitHub CI

* Using only check-bytearray

* Bumping BRT with change of default behaviour
2021-05-06 00:26:03 +01:00
Kenneth Heafield
c61b2bdd10
Fix busy loop in windows (#131)
* Fix busy loop in windows

* Nick wants the while loop gone

* Fix continue leftover

Co-authored-by: Nikolay Bogoychev <nheart@gmail.com>
2021-05-06 00:21:50 +01:00
Motin
743ebcd3bc Extension desired changes (#129)
* Enable worker file system
* Avoid node.js-code in emscripten glue-code
2021-05-04 14:53:05 +02:00
Motin
a63533b241
Extension desired changes (#129)
* Enable worker file system
* Avoid node.js-code in emscripten glue-code
2021-05-04 14:25:12 +02:00
Abhishek Aggarwal
c478a626a8 Updating ci scripts for the latest upstream changes
- The upstream browsermt/bergamot-translator builds the wasm artifacts
   in top level build folder now
2021-05-04 13:53:33 +02:00
Abhishek Aggarwal
d8f7e51792 Minor README change
- Changed "browsermt" to "mozilla"
2021-05-04 13:53:33 +02:00
Abhishek Aggarwal
ec3a785d17 Merge remote-tracking branch 'upstream/main' into main
- Sync with upstream (https://github.com/browsermt/bergamot-translator)
2021-05-04 12:09:26 +02:00
abhi-agg
8de368c166
Improved wasm scripts and README (#128) 2021-05-04 10:18:45 +01:00
abhi-agg
1a4add19da
Improve script to patch wasm artifacts and load EN->DE vocabulary in wasm test (#125)
* Improved script that patches wasm artifacts to enable wormhole

 - Made the regex pattern ignore multiple whitespaces b/w words of
   the matching pattern

* Fix for loading EN->DE vocabularies in wasm test page

 - Loading vocabularies for EN->DE was failing because of
   the new structure of bergamot-models
2021-05-03 16:13:43 +01:00
Jerin Philip
36b3c7291a
WASM Bindings collapse (#87)
* Safe transfer of bindings through typedefs

* Removing Translation* files and bringing in counterparts

* Remove previously commented out code

* Removing commented out include

* Absorb Translation* documentation

Co-authored-by: abhi-agg <66322306+abhi-agg@users.noreply.github.com>
2021-05-03 13:41:37 +01:00
Abhishek Aggarwal
4908e4019e Updated wasm/README file with instructions for byte loading APIs 2021-05-03 10:03:17 +02:00
Abhishek Aggarwal
f3a257d40b Enabled gemm-precision in wasm test page
- This increases the inference speed while providing
   models as bytes to the translation engine
   (it wasn't needed while providing models as files)
2021-05-03 10:03:17 +02:00
Nikolay Bogoychev
d82e01eda4
Full windows support with ssplit from browsermt, not a fork (#109)
* Update marian-dev to the newest mac version

* Attempt windows workflow

* force workflow rerun

* Separate id

* Attempt 3 at github action

* Marian dev submodule now compiles with apple clang

* Updated ssplit version to something more recent

* Attempt to fix compile on wasm

* Do not compile subproject tests

* Fix emscripten compilation on Mac

* 99% on the way to windows compile

* Try with a different generator

* Build release not debug

* Revert CMakeLists.txt hacks

* Fix sse2 compilation failure

* MSVC settings for WIN32

* Add nodefaultlib LIBCMT

* Do not compile ssplit.cpp as it contains sys/mman.h

* Revert ab56b9aa4f

* Update paths

* Set the build type to release if not set previously

* Attempt to build release with the windows workflow

* Attempt 5 at VS studio release build

* Attempt 6 at getting release build on MSVC generator

* The windows build is debug at the moment...

* fix ssplit for ubuntu 16.04

* Fix compilation with clang

* Compile on ubuntu16.04

* Explain what is going on

* Updated ssplit and workflow
2021-05-01 00:29:23 +01:00
Nikolay Bogoychev
e286533164 Update to marian-dev master 2021-04-30 22:34:44 +01:00
Abhishek Aggarwal
2788116f8b Better error logging for wasm test page 2021-04-30 09:09:41 +02:00
Abhishek Aggarwal
3525af6a45 Make wasm test page work with bergamot-models repository
- bergamot-models now contains lexical shortlist bin files as well
2021-04-30 09:09:41 +02:00
abhi-agg
de0abfd795
JS bindings for loading model and shortlist files as bytes (#117)
* Bindings to load model and shortlist files as bytes
* Modified wasm test page for byte based loading of files
* Updates wasm README for byte loading based usage of TranslationModel
2021-04-29 12:04:04 +02:00
abhi-agg
e5ec5bdd33
Control validating the config options via a boolean flag (#116)
* Control validating the config options via a boolean flag

 - parseOptions() function now validates the parsed options
   based on the validate argument

* Minor syntactic fix
2021-04-29 09:38:09 +01:00
Jerin Philip
4be96a97d7 Handle empty translation requests
Fixes https://github.com/browsermt/bergamot-translator/issues/101.
ResponseBuilder is called with empty histories to trigger a valid but
mostly-empty response.
2021-04-28 10:13:45 +02:00
Jerin Philip
fa2003e70d
Cleanup API: Refactor request on-complete transition (#80) 2021-04-27 15:56:39 +01:00
Nikolay Bogoychev
fdf9e66cef
Windows workflows and mac framework accelerate (#108)
Windows still failing but getting closer
2021-04-26 18:59:20 +01:00
abhi-agg
7d2e74f3c0
Changed underlying template parameter of AlignedMemory class (#111)
- AlignedMemory is AlignedVector<char> now instead of AlignedVector<const void*>
 - This solves the issue of allocating 8x of the actual required memory for
   loading files as bytes
2021-04-26 16:26:27 +01:00
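
One plausible reading of the 8x figure in the commit above is element size: a buffer sized with one element per file byte reserves `count * sizeof(T)` bytes, and on a typical 64-bit platform `sizeof(const void*)` is 8 while `sizeof(char)` is 1. The sketch below is an assumption-laden illustration, not the repository's code: `std::vector` stands in for `AlignedVector`, and `bytesReserved` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the repository): reports how many bytes a
// buffer occupies when it is sized with one element per byte of the file.
template <class T>
std::size_t bytesReserved(std::size_t fileSizeInBytes) {
  std::vector<T> buffer(fileSizeInBytes);  // one element per file byte
  assert(buffer.size() == fileSizeInBytes);
  return buffer.size() * sizeof(T);        // total storage for the elements
}
```

With this helper, `bytesReserved<char>(n)` is `n`, while `bytesReserved<const void*>(n)` is `n * sizeof(void*)`, which is eight times larger wherever pointers are 8 bytes wide.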
Nikolay Bogoychev
fc6976ae29
Remove dead code (#107)
Co-authored-by: Kenneth Heafield <kpu@users.noreply.github.com>
2021-04-22 17:29:22 +01:00
Kenneth Heafield
1184875cc9
Windows PCQueue support without Boost (#106) 2021-04-22 16:01:39 +01:00
Jerin Philip
c00c263f8f
Moving small tests to GitHub CI (#93)
Adds regression-tests to the workflow for native minimal/custom marian and full builds. 

Co-authored-by: abhi-agg <66322306+abhi-agg@users.noreply.github.com>
2021-04-16 11:58:53 +01:00
Nikolay Bogoychev
f1fc4f8041
Fix the target_include_directories (#98) 2021-04-14 14:53:35 +02:00
Abhishek Aggarwal
a7f6bb51d9 Minor cleanup in build-wasm.sh file 2021-04-14 14:51:08 +02:00
Abhishek Aggarwal
1574a4586c Merge remote-tracking branch 'upstream/main' into upstream-sync 2021-04-14 14:36:17 +02:00
Abhishek Aggarwal
f5dffeb5ca Downgraded resource class to 'medium' for circle ci
- Also restricted parallel make compilation to 3
2021-04-14 14:19:13 +02:00
Nikolay Bogoychev
e4b58357db
Clarify misleading comment (#99) 2021-04-14 09:56:07 +01:00