Commit Graph

242 Commits

Author SHA1 Message Date
Abhishek Aggarwal
c1ef6f2bcb Added cmake file to compute version information
- Reads BERGAMOT_VERSION file for generating various strings
   for versioning
2021-05-17 19:34:58 +02:00
Kenneth Heafield
3e70587672
Rewrite annotation class to remove corner cases (#135) 2021-05-17 16:42:18 +01:00
Qianqian Zhu
5bd1fc6b83
Refactor vocabs in Service (#143)
Co-authored-by: Nikolay Bogoychev <nheart@gmail.com>
2021-05-17 13:09:03 +01:00
Jerin Philip
77424a3df1
Enabling ccache on github builds for Ubuntu (#95)
* CI Changes to add tiny regression tests

* Adding an inspect cache step

* Removing ccache, pursue in another

* Incorporating Nick's changes through submodule merge

* Submodule now points to master

* Restoring ccache enabled workflow file

* Restoring ccache enabled CMakeLists

* cache -> ccache typo fix

* Moving CCACHE setup to GitHub runner file

* Find also uses CCACHE dir

* Updating CMakeLists not to override env

* Cache compiler binary's contents

* Changing a few names to trigger new build; Testing cache looks fun

* USE_CCACHE=on, -L for inspection

* Adding a ccache_cmd, but will only use in next commit

* Using ccache_cmd

* Removing "

* Adding compiler hash script

* Bunch of absolute paths

* GITHUB_WORKSPACE typo

* Nah, I'll keep -L and trigger another build

* Trying something with compiler hash on cache key backup as well

* builtin, bash it seems

* Empty commit #1

* Move ccache stats to after compile

* Reshuffling ccache vars

* No comments

* Updates to Github output set syntax

* Empty Commit 1

* Empty Commit 2

* Empty commit 3

* /bin/bash -> bash; ccache_cmd for consistency

* Adding ccache -s before and after build

* Adding comments to compiler-hash script

* Let's build cached and non-cached variants together for comparison

* Fixing quotes, /bin/bash -> bash

* Minor var/env adjustment

* Adding ccache -z before the job

* Reverting CMakeLists.txt without CCACHE

* Switching to CMAKE_LANG_COMPILER_LAUNCHER instead of CMakeLists.txt rule

* 5G -> 1G cache size

* 1G -> 2G; Hyperparameter tuning
2021-05-17 11:42:47 +01:00
Qianqian Zhu
6c7e6156ab
Bundle AlignedMemory inputs with MemoryBundle (#147) 2021-05-13 13:18:08 +01:00
Abhishek Aggarwal
6c063c607e Updated CMakeLists.txt to remove packaging steps for wasm compilation
- Removed PACKAGE_DIR cmake option
 - Removed Workerfs, FORCE_FILESYSTEM=1 in wasm builds
   -- File system support is not needed any more (since model,
     shortlist and vocabs are being passed as bytes now)
2021-05-12 16:23:09 +02:00
Abhishek Aggarwal
0189500160 Updated README to remove packaging steps for wasm compilation
- We don't need to package model, shortlist or vocab files into wasm
   binary at build time
2021-05-12 16:23:09 +02:00
Abhishek Aggarwal
e0b9bad058 Updated wasm README to update for passing vocabs as bytes
- Updated Using JS APIs section to pass vocabs as bytes
2021-05-12 16:23:09 +02:00
Abhishek Aggarwal
d7cb859ab7 Refactoring TranslationModelBindings class
- typedef AlignedMemory for code readability

 - Added documentation for one of the binding functions
2021-05-12 07:32:42 +02:00
Abhishek Aggarwal
5025285e5c Updated wasm test page to pass vocabulary files as bytes 2021-05-12 07:32:42 +02:00
Abhishek Aggarwal
9f78985e45 JS bindings for vocabularies as bytes 2021-05-12 07:32:42 +02:00
Abhishek Aggarwal
331216e017 Enable Debugging information in wasm module builds
- Added "-g2" flag during linking step
2021-05-11 18:50:55 +02:00
Abhishek Aggarwal
ce576c27f1 Export "addOnPreMain" function from wasm module
- This is required in the extension while using wasm module in a worker environment
2021-05-11 18:50:55 +02:00
Kenneth Heafield
ce01de939d
Change USE_WASM_COMPATIBLE_SOURCE=OFF by default on native, force on for WASM (#138)
* Change WASM_COMPATIBLE_SOURCE=OFF by default

The default was WASM_COMPATIBLE_SOURCE=ON COMPILE_WASM=OFF which is a
testing configuration, not a sensible default for native or wasm.

* Always USE_WASM_COMPATIBLE_SOURCE with COMPILE_WASM

* Set CMP0077 to fix variable handling
2021-05-10 12:28:37 +02:00
Jerin Philip
354e7ac6be
Remove unused types TokenRanges, SentenceTokenRanges, UPtr (#137) 2021-05-09 13:42:57 +01:00
Nikolay Bogoychev
87adb5d60a Target master of ssplit-cpp 2021-05-07 18:41:08 +01:00
Jerin Philip
bef12765ad
Minor rename: sentence_ranges -> annotation (#134) 2021-05-07 18:38:27 +01:00
Nikolay Bogoychev
21c1cae472
Update ssplit submodule, removing absl (#132)
* Update ssplit submodule, removing absl

* Fix ssplit variables

* Update ssplit branch

* Fix emscripten compilation

* Update tests
2021-05-07 17:58:58 +01:00
Qianqian Zhu
5b02008a97
Enable vocabs pass as byte arrays (#122)
* first attempt to enable vocabs pass as byte arrays

* pass vocabs bytes as AlignedMemory

* add vocabIndices to avoid double loading

* small fix on parameter names and documentation

* fix windows build plus tiny update on documentation

* update marian-dev submodule

* move validate model bytearray in BatchTranslator

* small refactors on validateBinaryModel()

* switch vocab memories to std::vector<marian::Ptr<AlignedMemory>>

* update marian-dev submodule

* replace marian::Ptr to std::shared_ptr for vocab memories

* add note for vocab memories
2021-05-07 14:54:48 +01:00
Jerin Philip
b86c76b004
Faithful to source-structure translation (#115)
* First draft of faithful translation

* Comments explaining pre and post

* Comments on response_builder

* Updating bergamot-translator-tests with new outputs

* Cosmetic changes in response target text construction

* Replacing &(x[0]) -> x.data() to avoid illegal indices

* Removing nullptr given both branches init pointer with legal values

* pre, post -> gap(i) addressing review comments

Functions which were pre and post before are subsumed by gap(i), and the
algorithm in ResponseBuilder adjusted to fix.

`x = nullptr` is back, should be harmless.

* Updating brt with paragraph outputs

* Bumping brt with updated outputs, buffer text at begin as well

* Bumping BRT with sync after bytearray collapse merge

* Pointing BRT to main after merge

Co-authored-by: Nikolay Bogoychev <nheart@gmail.com>
2021-05-06 16:19:27 +01:00
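
The `&(x[0]) -> x.data()` replacement in the commit above can be illustrated with a minimal sketch. It assumes `x` is a `std::vector` (the commit message does not say which container is involved); `sumBytes` and `sumVector` are hypothetical helpers, not functions from bergamot-translator.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: sums `size` bytes starting at `begin`.
inline std::size_t sumBytes(const unsigned char* begin, std::size_t size) {
  // A null pointer is only acceptable when there is nothing to read.
  assert(begin != nullptr || size == 0);
  std::size_t total = 0;
  for (std::size_t i = 0; i < size; ++i) {
    total += begin[i];
  }
  return total;
}

// Hypothetical helper: passes the vector's storage to sumBytes.
inline std::size_t sumVector(const std::vector<unsigned char>& x) {
  // x.data() is well-defined even when x is empty. Writing &(x[0]) instead
  // would index element 0 of an empty vector, which is undefined behaviour.
  return sumBytes(x.data(), x.size());
}
```

With this shape, an empty input is handled without any special case at the call site.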
Jerin Philip
bc2e4eee5c
Making bytearray a commandline switch (#127)
* Adding bytearray option

* collapse intermediate for bytearray apps

* Removing service-cli-bytearray

* Removing the bergamot bytearray app

* Bumping updates to brt collapsing apps

* Reasonable defaults and hard check when cmd enabled

* Update documentation for flags

* Bump brt with MKL check and skip

* Bumping BRT with MKL_FOUND instead of USE_MKL

* Bumping BRT with no mkl enforce

* Bumping BRT with ssse3 output

* Let's try disabling OpenBLAS

* Trying to disable apple accelerate

* Using WASM compatible BLAS can enable intgemm

* Adding a CMake -L to see what exactly is the diff

* Revert "Let's try disabling OpenBLAS"

This reverts commit 9a6b9bc53b.

* Revert "Using WASM compatible BLAS can enable intgemm"

This reverts commit 936a592e18.

* Restricting mac tests through tags and on GitHub CI

* Using only check-bytearray

* Bumping BRT with change of default behaviour
2021-05-06 00:26:03 +01:00
Kenneth Heafield
c61b2bdd10
Fix busy loop in windows (#131)
* Fix busy loop in windows

* Nick wants the while loop gone

* Fix continue leftover

Co-authored-by: Nikolay Bogoychev <nheart@gmail.com>
2021-05-06 00:21:50 +01:00
Motin
a63533b241
Extension desired changes (#129)
* Enable worker file system
* Avoid node.js-code in emscripten glue-code
2021-05-04 14:25:12 +02:00
abhi-agg
8de368c166
Improved wasm scripts and README (#128) 2021-05-04 10:18:45 +01:00
abhi-agg
1a4add19da
Improve script to patch wasm artifacts and load EN->DE vocabulary in wasm test (#125)
* Improved script that patches wasm artifacts to enable wormhole

 - Made the regex pattern ignore multiple whitespaces b/w words of
   the matching pattern

* Fix for loading EN->DE vocabularies in wasm test page

 - Loading vocabularies for EN->DE was failing because of
   the new structure of bergamot-models
2021-05-03 16:13:43 +01:00
Jerin Philip
36b3c7291a
WASM Bindings collapse (#87)
* Safe transfer of bindings through typedefs

* Removing Translation* files and bringing in counterparts

* Remove previously commented out code

* Removing commented out include

* Absorb Translation* documentation

Co-authored-by: abhi-agg <66322306+abhi-agg@users.noreply.github.com>
2021-05-03 13:41:37 +01:00
Abhishek Aggarwal
4908e4019e Updated wasm/README file with instructions for byte loading APIs 2021-05-03 10:03:17 +02:00
Abhishek Aggarwal
f3a257d40b Enabled gemm-precision in wasm test page
- This increases the inference speed while providing
   models as bytes to the translation engine
   (it wasn't needed while providing models as files)
2021-05-03 10:03:17 +02:00
Nikolay Bogoychev
d82e01eda4
Full windows support with ssplit from browsermt, not a fork (#109)
* Update marian-dev to the newest mac version

* Attempt windows workflow

* force workflow rerun

* Separate id

* Attempt 3 at github action

* Marian dev submodule now compiles with apple clang

* Updated ssplit version to something more recent

* Attempt to fix compile on wasm

* Do not compile subproject tests

* Fix emscripten compilation on Mac

* 99% on the way to windows compile

* Try with a different generator

* Build release not debug

* Revert CMakeLists.txt hacks

* Fix sse2 compilation failure

* MSVC settings for WIN32

* Add nodefaultlib LIBCMT

* Do not compile ssplit.cpp as it contains sys/mman.h

* Revert ab56b9aa4f

* Update paths

* Set the build type to release if not set previously

* Attempt to build release with the windows workflow

* Attempt 5 at VS studio release build

* Attempt 6 at getting release build on MSVC generator

* The windows build is debug at the moment...

* fix ssplit for ubuntu 16.04

* Fix compilation with clang

* Compile on ubuntu16.04

* Explain what is going on

* Updated ssplit and workflow
2021-05-01 00:29:23 +01:00
Nikolay Bogoychev
e286533164 Update to marian-dev master 2021-04-30 22:34:44 +01:00
Abhishek Aggarwal
2788116f8b Better error logging for wasm test page 2021-04-30 09:09:41 +02:00
Abhishek Aggarwal
3525af6a45 Make wasm test page work with bergamot-models repository
- bergamot-models now contains lexical shortlist bin files as well
2021-04-30 09:09:41 +02:00
abhi-agg
de0abfd795
JS bindings for loading model and shortlist files as bytes (#117)
* Bindings to load model and shortlist files as bytes
* Modified wasm test page for byte based loading of files
* Updates wasm README for byte loading based usage of TranslationModel
2021-04-29 12:04:04 +02:00
abhi-agg
e5ec5bdd33
Control validating the config options via a boolean flag (#116)
* Control validating the config options via a boolean flag

 - parseOptions() function now validates the parsed options
   based on the validate argument

* Minor syntactic fix
2021-04-29 09:38:09 +01:00
Jerin Philip
4be96a97d7 Handle empty translation requests
Fixes https://github.com/browsermt/bergamot-translator/issues/101.
ResponseBuilder is called with empty histories to trigger a valid but
mostly-empty response.
2021-04-28 10:13:45 +02:00
Jerin Philip
fa2003e70d
Cleanup API: Refactor request on-complete transition (#80) 2021-04-27 15:56:39 +01:00
Nikolay Bogoychev
fdf9e66cef
Windows workflows and mac framework accelerate (#108)
Windows still failing but getting closer
2021-04-26 18:59:20 +01:00
abhi-agg
7d2e74f3c0
Changed underlying template parameter of AlignedMemory class (#111)
- AlignedMemory is AlignedVector<char> now instead of AlignedVector<const void*>
 - This solves the issue of allocating 8x of the actual required memory for
   loading files as bytes
2021-04-26 16:26:27 +01:00
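
A rough illustration of where the 8x figure in the commit above could come from, assuming a 64-bit platform where `sizeof(const void*)` is 8 and assuming the file size in bytes was used as the element count. `bytesAllocated` is a hypothetical helper, not part of bergamot-translator.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical helper: bytes that a container holding `count` elements of
// type T has to allocate for its storage.
template <typename T>
inline std::size_t bytesAllocated(std::size_t count) {
  // Guard against overflow of count * sizeof(T).
  assert(count <= std::numeric_limits<std::size_t>::max() / sizeof(T));
  return count * sizeof(T);
}
```

For a file of N bytes, `bytesAllocated<char>(N)` is N, while `bytesAllocated<const void*>(N)` is N times the pointer size, which is 8N on typical 64-bit targets.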
Nikolay Bogoychev
fc6976ae29
Remove dead code (#107)
Co-authored-by: Kenneth Heafield <kpu@users.noreply.github.com>
2021-04-22 17:29:22 +01:00
Kenneth Heafield
1184875cc9
Windows PCQueue support without Boost (#106) 2021-04-22 16:01:39 +01:00
Jerin Philip
c00c263f8f
Moving small tests to GitHub CI (#93)
Adds regression-tests to the workflow for native minimal/custom marian and full builds. 

Co-authored-by: abhi-agg <66322306+abhi-agg@users.noreply.github.com>
2021-04-16 11:58:53 +01:00
Nikolay Bogoychev
f1fc4f8041
Fix the target_include_directories (#98) 2021-04-14 14:53:35 +02:00
Nikolay Bogoychev
e4b58357db
Clarify misleading comment (#99) 2021-04-14 09:56:07 +01:00
Jerin Philip
3daa024eb3
Strengthen the Annotation class: Handle empty sentences and tests (#85)
* Changing Annotation to adhere to [begin, end)

* Stronger unit tests on sentences + num words, num sentences

* Hotfix with empty string view from EOS

* No more absolving empty-sentence; Added tests now defined behaviour

* Uncommenting important section in unit test

* Ensure empty string view default, beginning at end so marker points

* Further strengthen and comment unit-tests, mark exactly where empty sentence is happening

* Review comments: Dummy sentence + docs

- What should be a simple fast accessor is turning into compute.
  Normally the way to deal with this, for better or worse, is to put 0 at
  the beginning of sentenceEndIds_.

- Indices into what? Mentioned to be flatByteRanges_.

* Documentation updates

* More changes to docs

Co-authored-by: abhi-agg <66322306+abhi-agg@users.noreply.github.com>
2021-04-12 17:05:23 +01:00
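
The review comment about putting 0 at the beginning of `sentenceEndIds_` can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual Annotation class: `AnnotationSketch`, `ByteRange`, `addSentence` and `numWords` are hypothetical names, and only `sentenceEndIds_` and `flatByteRanges_` are taken from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Half-open byte range [begin, end) into the source text.
struct ByteRange {
  std::size_t begin;
  std::size_t end;
};

// Hypothetical sketch of the storage layout discussed in the review.
class AnnotationSketch {
 public:
  // The leading 0 is a sentinel: sentence i always spans
  // [sentenceEndIds_[i], sentenceEndIds_[i + 1]) in flatByteRanges_,
  // so sentence 0 needs no special case.
  AnnotationSketch() : sentenceEndIds_(1, 0) {}

  // Appends one sentence; an empty `words` records an empty sentence.
  void addSentence(const std::vector<ByteRange>& words) {
    flatByteRanges_.insert(flatByteRanges_.end(), words.begin(), words.end());
    sentenceEndIds_.push_back(flatByteRanges_.size());
  }

  std::size_t numSentences() const { return sentenceEndIds_.size() - 1; }

  // Plain subtraction of two stored indices, no branching.
  std::size_t numWords(std::size_t sentenceIdx) const {
    assert(sentenceIdx < numSentences());
    return sentenceEndIds_[sentenceIdx + 1] - sentenceEndIds_[sentenceIdx];
  }

 private:
  std::vector<ByteRange> flatByteRanges_;    // word ranges of all sentences, concatenated
  std::vector<std::size_t> sentenceEndIds_;  // indices into flatByteRanges_
};
```

Because the indices are half-open, an empty sentence is simply two equal consecutive entries in `sentenceEndIds_`.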
Nikolay Bogoychev
b345b0e035
Rudimentary validator for binary files (#94)
* Rudimentary validator for binary files
2021-04-12 13:46:47 +02:00
Nikolay Bogoychev
5e15d73b7e
Consistent api usage (#91)
* Consistent api between the two versions of the executables in app folder

* Remove shared ptrs
2021-04-09 10:34:24 +02:00
Jerin Philip
b71b3a18d8
Removes vocabs and propagates fixes for breaks (#79)
* Removes vocabs and propagates fixes for breaks

* Prettify diff: Undoing comment shuffles due to merge conflict edits

* 20% of time actual work, 80% prettifying diff

* Histories members -> poof!

We however have Histories in constructor, which we will remove out of the way
soon.

Co-authored-by: Kenneth Heafield <kpu@users.noreply.github.com>
2021-04-07 12:15:46 +01:00
Kenneth Heafield
27a3a3253f
Make AlignedMemory the means of passing in memory (#86) 2021-04-06 13:23:55 +01:00
Qianqian Zhu
f654ab0f71
Enable binary shortlist loading from bytebuffer (#69)
Contains "hack" that must go immediately by editing TranslationModel, to come in following commit.  

* add shortlist_memory and update service-cli-bytearray test

* update marian-dev

* address review comments

* fix compilation and test failures and further address review comments

* small update on marian-dev (based on browsermt/marian-dev PR#28)

* update marian-dev with upstream

* code refactoring according to review

* fix marian-dev submodule conflicts

* switch MemoryGift to AlignedVector

* copy aligned.h from kpu/intgemm for AlignedVector

* changes based on memory ownership and AlignedVector

* fix BatchTranslator inits

* small fixes according to review comments

* update submodule marian-dev to master

* update submodule marian-dev with upstream

Co-authored-by: Kenneth Heafield <kpu@users.noreply.github.com>
2021-04-01 19:36:07 +01:00
abhi-agg
2e5daac978
Marian submodule update (#74)
* Updated marian-dev submodule

 - cmake changes required after the submodule update

* Added workflows for building custom marian on mac and ubuntu

* Renamed cmake option

 - Renamed USE_WASM_COMPATIBLE_SOURCES to USE_WASM_COMPATIBLE_SOURCE
 - Use proper compile definitions
2021-04-01 16:29:02 +01:00