Commit Graph

231 Commits

Author SHA1 Message Date
Abhishek Aggarwal
ec3a785d17 Merge remote-tracking branch 'upstream/main' into main
- Sync with upstream (https://github.com/browsermt/bergamot-translator)
2021-05-04 12:09:26 +02:00
abhi-agg
8de368c166
Improved wasm scripts and README (#128) 2021-05-04 10:18:45 +01:00
abhi-agg
1a4add19da
Improve script to patch wasm artifacts and load EN->DE vocabulary in wasm test (#125)
* Improved script that patches wasm artifacts to enable wormhole

 - Made the regex pattern tolerate multiple whitespace characters between
   words of the matching pattern

* Fix for loading EN->DE vocabularies in wasm test page

 - Loading vocabularies for EN->DE was failing because of
   the new structure of bergamot-models
2021-05-03 16:13:43 +01:00
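The whitespace-tolerant matching described in the commit above can be illustrated with a minimal sketch. The patch script itself is not shown in this log, so this is an illustration in C++ rather than the script's own code. The helper names `whitespaceTolerantPattern` and `containsPhrase` are hypothetical, and the words are assumed to contain no regex metacharacters.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: joins the words with \s+ so that any run of
// whitespace (spaces, tabs, newlines) between two words still matches.
// Assumption: the words contain no regex metacharacters.
inline std::regex whitespaceTolerantPattern(const std::vector<std::string>& words) {
  assert(!words.empty());
  std::string pattern = words[0];
  for (std::size_t i = 1; i < words.size(); ++i) {
    pattern += "\\s+";
    pattern += words[i];
  }
  return std::regex(pattern);
}

// Returns true if the words occur in order in `text`, separated by one or
// more whitespace characters.
inline bool containsPhrase(const std::string& text, const std::vector<std::string>& words) {
  return std::regex_search(text, whitespaceTolerantPattern(words));
}
```

With the words `load` and `module`, the phrase is found in text where the two are separated by several spaces or a newline, but not where they are run together.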
Jerin Philip
36b3c7291a
WASM Bindings collapse (#87)
* Safe transfer of bindings through typedefs

* Removing Translation* files and bringing in counterparts

* Remove previously commented out code

* Removing commented out include

* Absorb Translation* documentation

Co-authored-by: abhi-agg <66322306+abhi-agg@users.noreply.github.com>
2021-05-03 13:41:37 +01:00
Abhishek Aggarwal
4908e4019e Updated wasm/README file with instructions for byte loading APIs 2021-05-03 10:03:17 +02:00
Abhishek Aggarwal
f3a257d40b Enabled gemm-precision in wasm test page
- This increases the inference speed while providing
   models as bytes to the translation engine
   (it wasn't needed while providing models as files)
2021-05-03 10:03:17 +02:00
Nikolay Bogoychev
d82e01eda4
Full windows support with ssplit from browsermt, not a fork (#109)
* Update marian-dev to the newest mac version

* Attempt windows workflow

* force workflow rerun

* Separate id

* Attempt 3 at github action

* Marian dev submodule now compiles with apple clang

* Updated ssplit version to something more recent

* Attempt to fix compile on wasm

* Do not compile subproject tests

* Fix emscripten compilation on Mac

* 99% on the way to windows compile

* Try with a different generator

* Build release not debug

* Revert CMakeLists.txt hacks

* Fix sse2 compilation failure

* MSVC settings for WIN32

* Add nodefaultlib LIBCMT

* Do not compile ssplit.cpp as it contains sys/mman.h

* Revert ab56b9aa4f

* Update paths

* Set the build type to release if not set previously

* Attempt to build release with the windows workflow

* Attempt 5 at VS studio release build

* Attempt 6 at getting release build on MSVC generator

* The windows build is debug at the moment...

* fix ssplit for ubuntu 16.04

* Fix compilation with clang

* Compile on ubuntu16.04

* Explain what is going on

* Updated ssplit and workflow
2021-05-01 00:29:23 +01:00
Nikolay Bogoychev
e286533164 Update to marian-dev master 2021-04-30 22:34:44 +01:00
Abhishek Aggarwal
2788116f8b Better error logging for wasm test page 2021-04-30 09:09:41 +02:00
Abhishek Aggarwal
3525af6a45 Make wasm test page work with bergamot-models repository
- bergamot-models now contains lexical shortlist bin files as well
2021-04-30 09:09:41 +02:00
abhi-agg
de0abfd795
JS bindings for loading model and shortlist files as bytes (#117)
* Bindings to load model and shortlist files as bytes
* Modified wasm test page for byte based loading of files
* Updates wasm README for byte loading based usage of TranslationModel
2021-04-29 12:04:04 +02:00
abhi-agg
e5ec5bdd33
Control validating the config options via a boolean flag (#116)
* Control validating the config options via a boolean flag

 - parseOptions() function now validates the parsed options
   based on the validate argument

* Minor syntactic fix
2021-04-29 09:38:09 +01:00
Jerin Philip
4be96a97d7 Handle empty translation requests
Fixes https://github.com/browsermt/bergamot-translator/issues/101.
ResponseBuilder is called with empty histories to trigger a valid but
mostly-empty response.
2021-04-28 10:13:45 +02:00
Jerin Philip
fa2003e70d
Cleanup API: Refactor request on-complete transition (#80) 2021-04-27 15:56:39 +01:00
Nikolay Bogoychev
fdf9e66cef
Windows workflows and mac framework accelerate (#108)
Windows still failing but getting closer
2021-04-26 18:59:20 +01:00
abhi-agg
7d2e74f3c0
Changed underlying template parameter of AlignedMemory class (#111)
- AlignedMemory is AlignedVector<char> now instead of AlignedVector<const void*>
 - This solves the issue of allocating 8x the memory actually required for
   loading files as bytes
2021-04-26 16:26:27 +01:00
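The 8x figure in the commit above is consistent with element size: a container sized by a file's byte count reserves `sizeof(T)` bytes per element, which is 1 for `char` and typically 8 for a pointer on 64-bit platforms. The following is a rough sketch of that arithmetic, not the project's code. `std::vector` stands in for `AlignedVector` (alignment is not modelled), and `bytesAllocated` and `overallocationFactor` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: bytes occupied by a container of `count` elements of T.
// std::vector is a stand-in for AlignedVector; alignment is not modelled.
template <typename T>
std::size_t bytesAllocated(std::size_t count) {
  std::vector<T> storage(count);
  return storage.size() * sizeof(T);
}

// Ratio between the memory reserved with pointer-sized elements and with
// char elements when both containers are sized with the file's byte count.
inline std::size_t overallocationFactor(std::size_t fileSizeInBytes) {
  assert(fileSizeInBytes > 0);
  return bytesAllocated<const void*>(fileSizeInBytes) /
         bytesAllocated<char>(fileSizeInBytes);
}
```

The factor equals `sizeof(const void*)`, so it depends on the target: a 32-bit target would over-allocate by 4x rather than 8x.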
Nikolay Bogoychev
fc6976ae29
Remove dead code (#107)
Co-authored-by: Kenneth Heafield <kpu@users.noreply.github.com>
2021-04-22 17:29:22 +01:00
Kenneth Heafield
1184875cc9
Windows PCQueue support without Boost (#106) 2021-04-22 16:01:39 +01:00
Jerin Philip
c00c263f8f
Moving small tests to GitHub CI (#93)
Adds regression-tests to the workflow for native minimal/custom marian and full builds. 

Co-authored-by: abhi-agg <66322306+abhi-agg@users.noreply.github.com>
2021-04-16 11:58:53 +01:00
Nikolay Bogoychev
f1fc4f8041
Fix the target_include_directories (#98) 2021-04-14 14:53:35 +02:00
Abhishek Aggarwal
a7f6bb51d9 Minor cleanup in build-wasm.sh file 2021-04-14 14:51:08 +02:00
Abhishek Aggarwal
1574a4586c Merge remote-tracking branch 'upstream/main' into upstream-sync 2021-04-14 14:36:17 +02:00
Abhishek Aggarwal
f5dffeb5ca Downgraded resource class to 'medium' for circle ci
- Also restricted parallel make compilation to 3
2021-04-14 14:19:13 +02:00
Nikolay Bogoychev
e4b58357db
Clarify misleading comment (#99) 2021-04-14 09:56:07 +01:00
Jerin Philip
3daa024eb3
Strengthen the Annotation class: Handle empty sentences and tests (#85)
* Changing Annotation to adhere to [begin, end)

* Stronger unit tests on sentences + num words, num sentences

* Hotfix with empty string view from EOS

* No more absolving empty-sentence; Added tests now defined behaviour

* Uncommenting important section in unit test

* Ensure empty string view default, beginning at end so marker points

* Further strengthen and comment unit-tests, mark exactly where empty sentence is happening

* Review comments: Dummy sentence + docs

- What should be a simple fast accessor is turning into compute.
  Normally the way to deal with this, for better or worse, is to put 0 at
  the beginning of sentenceEndIds_.

- Indices into what? Mentioned to be flatByteRanges_.

* Documentation updates

* More changes to docs

Co-authored-by: abhi-agg <66322306+abhi-agg@users.noreply.github.com>
2021-04-12 17:05:23 +01:00
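The half-open `[begin, end)` convention and the leading 0 in `sentenceEndIds_` discussed in the commit above can be sketched as follows. This is a simplified stand-in, not the project's Annotation class: the member names `flatByteRanges_` and `sentenceEndIds_` come from the commit message, while `AnnotationSketch` and its methods are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Half-open byte range [begin, end) into the underlying text.
struct ByteRange {
  std::size_t begin;
  std::size_t end;
  std::size_t size() const { return end - begin; }
};

// Hypothetical, simplified stand-in for the Annotation class.
class AnnotationSketch {
 public:
  // The leading 0 is a sentinel: sentence i spans
  // [sentenceEndIds_[i], sentenceEndIds_[i + 1]) in flatByteRanges_,
  // so the accessors need no special case for the first sentence.
  AnnotationSketch() { sentenceEndIds_.push_back(0); }

  // Appends a sentence; an empty vector records an empty sentence.
  void addSentence(const std::vector<ByteRange>& words) {
    flatByteRanges_.insert(flatByteRanges_.end(), words.begin(), words.end());
    sentenceEndIds_.push_back(flatByteRanges_.size());
  }

  std::size_t numSentences() const { return sentenceEndIds_.size() - 1; }

  std::size_t numWords(std::size_t sentenceIdx) const {
    assert(sentenceIdx < numSentences());
    return sentenceEndIds_[sentenceIdx + 1] - sentenceEndIds_[sentenceIdx];
  }

  ByteRange word(std::size_t sentenceIdx, std::size_t wordIdx) const {
    assert(wordIdx < numWords(sentenceIdx));
    return flatByteRanges_[sentenceEndIds_[sentenceIdx] + wordIdx];
  }

 private:
  std::vector<ByteRange> flatByteRanges_;     // word ranges of all sentences, flattened
  std::vector<std::size_t> sentenceEndIds_;   // indices into flatByteRanges_
};
```

An empty sentence simply repeats the previous end index, so `numWords` returns 0 for it without any extra branching.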
Nikolay Bogoychev
b345b0e035
Rudimentary validator for binary files (#94)
* Rudimentary validator for binary files
2021-04-12 13:46:47 +02:00
Nikolay Bogoychev
5e15d73b7e
Consistent api usage (#91)
* Consistent api between the two versions of the executables in app folder

* Remove shared ptrs
2021-04-09 10:34:24 +02:00
Jerin Philip
b71b3a18d8
Removes vocabs and propagates fixes for breaks (#79)
* Removes vocabs and propagates fixes for breaks

* Prettify diff: Undoing comment shuffles due to merge conflict edits

* 20% of time actual work, 80% prettifying diff

* Histories members -> poof!

We however have Histories in constructor, which we will remove out of the way
soon.

Co-authored-by: Kenneth Heafield <kpu@users.noreply.github.com>
2021-04-07 12:15:46 +01:00
Kenneth Heafield
27a3a3253f
Make AlignedMemory the means of passing in memory (#86) 2021-04-06 13:23:55 +01:00
Qianqian Zhu
f654ab0f71
Enable binary shortlist loading from bytebuffer (#69)
Contains "hack" that must go immediately by editing TranslationModel, to come in following commit.  

* add shortlist_memory and update service-cli-bytearray test

* update marian-dev

* address review comments

* fix compilation and test failures and further address review comments

* small update on marian-dev (based on browsermt/marian-dev PR#28)

* update marian-dev with upstream

* code refactoring according to review

* fix marian-dev submodule conflicts

* switch MemoryGift to AlignedVector

* copy aligned.h from kpu/intgemm for AlignedVector

* changes based on memory ownership and AlignedVector

* fix BatchTranslator inits

* small fixes according to review comments

* update submodule marian-dev to master

* update submodule marian-dev with upstream

Co-authored-by: Kenneth Heafield <kpu@users.noreply.github.com>
2021-04-01 19:36:07 +01:00
abhi-agg
2e5daac978
Marian submodule update (#74)
* Updated marian-dev submodule

 - cmake changes required after the submodule update

* Added workflows for building custom marian on mac and ubuntu

* Renamed cmake option

 - Renamed USE_WASM_COMPATIBLE_SOURCES to USE_WASM_COMPATIBLE_SOURCE
 - Use proper compile definitions
2021-04-01 16:29:02 +01:00
Kenneth Heafield
3068ed58ff
Explicitly install gcc version and use 8 (#81)
* Try to fix gcc missingness in CI
2021-04-01 13:56:16 +01:00
evgeny pavlov
47db7e2b3e Change bergamot app to process stdin texts 2021-04-01 13:22:19 +02:00
Jerin Philip
bfb5e78602
Alignments + weak quality scores capability in Service (#46)
* Draft adjustments to API

* Adjustments to docs

* Let's call the word + sentence ranges annotations

* Editing confusing comment on size()

* Fixing compilation for template adjustments for SentenceRanges

* string_view template hacks

This commit shifts AnnotatedBlob into a templated type and gets the
troubled part to compile. All to manage absl::string_view and
std::string_view.

Objective: marian::bergamot stays C++ 11 to pluck and put in marian
code, bergamot-translator somehow flexes C++17. Simplify development in
one place.

* Fixing the wiring: Gets source to build

Runtime errors exist, but AnnotatedBlobs are consistent.

* Bugfix: Matching old-state after factoring AnnotatedBlob in

* Removing vocabs_ from Response.

(For the umpteenth time).

* Alignment API ready in marian::bergamot::Response

* Wiring alignments up to TranslationResult

* Adjustment to get alignments; bergamot-translator-app has alignments available

* Accessing words instead of Ids

This code sets up access of word string_views from annotations instead
of printing Ids. However, we have segfault. This is likely due to
targetRanges not being set, pending from
https://github.com/browsermt/bergamot-translator/issues/25.

Could also be a rogue EOS token which we're filtering for in string_view
annotations, but not so in alignments.

* Switching to browsermt/marian-dev@jp/decode-string-view for targetTokenRanges

* Target word byte range annotations available

Issues corresponding to #25 should be resolved. There is still a
segfault. Could be due to EOS. Pending investigation.

* Bugfix: Tokens for alignments are now through.

Was not EOS.

* browsermt/marian-dev@master

ByteRange changes work downstream and have been merged to master.
Updating submodule to point to master.

* Style and documentation enhancements: response.cpp

* Style and documentation enhancements: TranslationResult.h

* Descriptions for SentenceRanges templating

* Switching to marian-dev@wasm-sync

* AnnotatedBlob can be copy-ctord/copy-assigned

* TranslationResult: Empty ctor + WASM Bindings

Allows empty construction of TranslationResult. Using this empty
constructor, WASM bindings are adjusted. Unsure of the results, maybe
@abhi-agg can test.

* Cosmetic: SentenceRangesT -> Annotation

- SentenceRangesT is renamed to AnnotationT;
- Further comments to explain heavily templated files.

* Response: Cleaning up unused members and adding docs

* Adding quality scores - attempt

* Stub QualityScores

This adjustment adds capability to get "scores", which should
potentially indicate how confident the model is (at least relative
within a target sentence). This enables writing the code forward for
TranslationResult, and provides an example quality score people can be
pointed at.

- These are not between [0,1] yet.
- In addition, guards to check out-of-bounds access have been placed so
  illegal accesses are caught early on during development.

* Removing token debug statements

* Reworking Annotation without templates

https://github.com/mozilla/bergamot-translator/issues/8 provides
ByteRanges.

- This ByteRange data-type is used in Annotation and converted
  to marian::string_view(=absl::string-view) on demand.
- Since Annotation[using ByteRange] is not bound to anything else, it
  can be unit tested. A unit test is added (originally to test
  independently for integration after).
- Annotation with ByteRange is now propagated across marian::bergamot
  and functionality matched to how it was previously working.

This eliminates the string-view conversion and template code.

* Nit: Removing std::endl flushes

* Bring TranslationResult and Response closer

Helps https://github.com/browsermt/bergamot-translator/issues/53.

In preparation, the data-export types for Quality and Alignment are
pushed down to Response from TranslationResult and computed during
construction. This brings TranslationResult closer to Response, paving
the way to avoid having two TranslationResults.

histories_ only remain for marian-decoder replacement usage, which can
be removed in a separate PR.

* Clean up hacks originally added for a unit-test to compile

* Moving Annotation functions to cpp and documenting header file

* Shifting alignments, qualityScore testing capability into main-mts

* Restore Unified API files to previous state

* Adaptations to fix Response with Quality, Alignments to connect to old Unified API

* Missing reset on TranslationResultBindings

* Cleaning up Response documentation to reflect newer code

* Minor adjustments to get build back after main sync

* Marian seems to make available Catch somehow

* Disable COMPILE_BERGAMOT_TESTS for WASM

* Add COMPILE_BERGAMOT_TESTS as a CMakeDependent option

* Use the COMPILE_TESTS flag instead to skip macos.yml

* Trigger unit-tests on GitHub runners for Annotation

* Reordering enable_testing() to before inclusion of test directory

* doc constructs required to operate with alignments

Documents with doxygen compatible documentation for Response,
AnnotatedBlob, Annotation, ByteRange.

Incorporates doxygen compatible documentation for

* Updates ByteRange consistent with general C++

Also little documentation enhancements in the process.

* Updating marian-dev@9337105

* Copy-paste documentation because lazy

* Turn off autoformat and manually edit to fix style changes

* AnnotatedBlob -> AnnotatedText; blob -> text

* text.text in test app renamed

* text of text -> blob of text in places of documentation
2021-03-31 17:41:36 +01:00
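The "Reworking Annotation without templates" step in the commit above stores byte offsets and produces the text for a range only on demand. A minimal sketch of that idea follows. It differs from the commit in one labelled way: the commit converts to `marian::string_view`, whereas this sketch returns a `std::string` copy to stay within the C++11 standard library. The function name `textOf` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Half-open byte range [begin, end) into some source text.
struct ByteRange {
  std::size_t begin;
  std::size_t end;
};

// Hypothetical helper: materialises the text for a range only when asked.
// Returns a copy; the commit describes a non-owning string_view instead.
inline std::string textOf(const std::string& source, const ByteRange& range) {
  // Guard against out-of-bounds access so mistakes surface during development.
  assert(range.begin <= range.end && range.end <= source.size());
  return source.substr(range.begin, range.end - range.begin);
}
```

Because a `ByteRange` holds plain offsets rather than pointers into a particular string, ranges can be built and checked in a unit test without a translation model, which is the testability benefit the commit message points to.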
Abhishek Aggarwal
e0dca1ba1b Renamed github workflow files
- Naming follows
   <target-arch>-<nature-of-marian>-<runner-os>

   (wasm|native)-(full_marian|custom_marian)-(ubuntu|mac)
2021-03-26 10:02:13 +01:00
Abhishek Aggarwal
fdbce5705b Update marian-dev submodule to master
- Earlier it was using 'wasm' branch
 - CMakefile changes
 - Github workflow change
2021-03-26 10:02:13 +01:00
Abhishek Aggarwal
f38a0bfbcc Remove AbstractTranslationModel class and its references 2021-03-25 10:04:47 +01:00
Jerin Philip
a3250b401f
Marian compatible documentation tooling (#67)
Adds doxygen configurations, additional sphinx which consumes the doxygen files to generate developer API, compatible with marian-nmt/marian-dev.
2021-03-24 17:00:53 +00:00
abhi-agg
12e9232066
Patch WASM artifacts to run optimized (wormhole enabled) inference (#68)
* A script to patch the wasm artifacts to use wormhole via
   APIs that instantiate WASM module
* Updated README
* Load just production ready models
* Shallow clone bergamot-models repo since it has such a large history
* Improved wasm test_page
 - test page can load all 5 language pairs
 - Use intgemm.alpha* models
* Refactor the code that patches wasm artifacts to enable wormhole

Co-authored-by: Andre Natal <anatal@gmail.com>
Co-authored-by: Motin <motin@motin.eu>
2021-03-24 17:10:42 +01:00
Jerin Philip
34228d37bf
Collapse Service into one class instead of three (#62)
* Merging two Services

* Moving stop() logic to destructor

* We have WITH_PTHREADS back

* string based constructor on Service

* Removing now empty service_base.* files

* Hiding away pcqueue_ construction

Ugliest ifdefs I have done in my life.

* Another ifdef to hide pcqueue header file

* Missing semicolons in WITH_PTHREADS path

* Fixing async_translate residue argument from copy

* Adding comments

* Initialize batchtranslator only at one place

To reduce tax for bytebuffer loads, initialize batchtranslator only at
one place.

* #ifdef WITH_PTHREADS -> #ifndef WASM_HIDE_THREADS

The sane platform (non-WASM) is the default. This only hides threads from
the compilation path; it does not switch pthreads (-lpthread) on or off.

* Review comments: Rearranging destructor, fix wrong comment

* Move loadVocabularies to service.cpp and put in anonymous namespace

* Prettifying diff: Removing unwanted empty lines

* Indicate in comments multithreaded has numWorkers translators

* Typo fix: bergamot_translator -> bergamot-translator

* Safety guards to avoid pcqueue illegal init

* Add WASM_HIDE_THREADS as a global WASM_COMPILE_FLAG

* Compile Defs: WASM_HIDE_THREADS -> __EMSCRIPTEN__

* Removing dead CMakeLists.txt code following __EMSCRIPTEN__

* Compile defs: __EMSCRIPTEN__ -> WASM
2021-03-23 16:36:13 +00:00
Nikolay Bogoychev
d75dd85def
Load model as a byte array (#55)
* Switch to wasm branch for this example

* Load marian model from a byte array

* Sanitise executable names

* Change marian branch

* Update marian branch that loads binary models

* Example of loading model as a byte array

* Add the byte array loading files

* Die on misaligned memory

* Remove the unused argument

* Allow loading without a ptr parameter so that we don't break emc workflow
2021-03-22 14:22:56 +00:00
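The "Die on misaligned memory" step in the commit above could look roughly like the check below. This is a sketch under assumptions, not the project's code: the function names `isAligned` and `requireAligned` are hypothetical, and the alignment the model loader actually requires is not stated in this log, so it is left as a parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Returns true if `ptr` is a multiple of `alignment`, which must be a
// non-zero power of two.
inline bool isAligned(const void* ptr, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (reinterpret_cast<std::uintptr_t>(ptr) & (alignment - 1)) == 0;
}

// Hypothetical guard: aborts the process if the buffer is misaligned,
// otherwise hands the pointer back unchanged.
inline const void* requireAligned(const void* ptr, std::size_t alignment) {
  if (!isAligned(ptr, alignment)) {
    std::fprintf(stderr, "Buffer is not aligned to %lu bytes\n",
                 static_cast<unsigned long>(alignment));
    std::abort();
  }
  return ptr;
}
```

Failing fast here trades recoverability for a clear error at load time, instead of a harder-to-diagnose crash later inside code that assumes aligned access.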
Abhishek Aggarwal
bf28edad82 Improved wasm test_page
- test page can load all 5 language pairs
 - Use intgemm.alpha* models
 - start_server.sh script automatically enables simdwormhole via
   APIs that instantiate WASM module
2021-03-11 14:38:16 +01:00
Motin
d1ecd007a6 Shallow clone bergamot-models repo since it has such a large history 2021-03-11 14:38:16 +01:00
Andre Natal
a2d6650097 Patch to load just production ready models 2021-03-11 14:38:16 +01:00
Abhishek Aggarwal
6e7b7c71ec Updates README for enabling simdwormhole in WASM APIs 2021-03-11 13:44:14 +01:00
Abhishek Aggarwal
4f124e7976 Enabled simdwormhole in github workflows 2021-03-11 13:44:14 +01:00
Abhishek Aggarwal
8c8913e2ef Use intgemm models in wasm test_page 2021-03-11 13:44:14 +01:00
Andre Natal
e96d7047a7 Enable SIMD wormhole in circle ci build scripts
- Modify the APIs that compile & instantiate WASM module
2021-03-11 13:44:14 +01:00
Jerin Philip
f89c989b44 apt-update for ubuntu github actions 2021-03-11 11:30:21 +01:00
abhi-agg
c64deb50a8
Imported CI scripts from mozilla/bergamot-translator-old (#1)
* CircleCI config, docs and badge

* Increase CircleCI RAM from 4gb to 16gb

Co-authored-by: Motin <motin@motin.eu>
2021-03-10 09:30:39 -08:00