
44 Commits

Author SHA1 Message Date
Abhishek Aggarwal
e34420647d
Upgrade emsdk to 3.1.8 (#414)
* Rework WASM compilation options

Necessary to work with newer versions of emscripten, which are pickier about which options go to the compiler and which go to the linker. Also took the opportunity to remove the need to patch the bergamot-translation-worker.js file; this can now easily be done through supported APIs. Furthermore, I tried to reduce the size of the generated JavaScript and wasm code a bit.

Initial estimates show that bergamot-translator compiled with emscripten 3.0.0 runs at about 3x the speed of 2.0.9 (when using embedded intgemm). Speed-up when using mozIntGemm is less dramatic.

* Updated marian-dev submodule
* Revert changes specific to patching external gemm modules for wasm
* Better Compilation and Link flags

 - Added "-O3" optimization flag for linking as well
 - "-g2" only for release and debug builds
 - "-g1" for release builds
 - Replaced deprecated "--bind" flag with "-lembind"
 - Removed redundant link flag

* Upgraded emsdk to 3.1.8
* Enclosed EXPORTED_FUNCTIONS values in a list
* Fixed the remaining 2.0.9 reference in circle ci build script
* Updated README

Co-authored-by: Jelmer van der Linde <jelmer@ikhoefgeen.nl>
2022-04-20 00:39:32 +01:00
Jerin Philip
c0f311a8c0
Batteries included python package (#310)
Imports python bindings and associated sources incubated in
https://github.com/jerinphilip/lemonade to bergamot-translator. Adds
 a pybind11 dependency for python bindings.

Following the import, the python build is integrated into the existing
CMake-based build system here. A command-line application, provided
through python, can fetch and prepare models from model repositories
(like browsermt/students or OPUS).

Wheels built for a few common operating systems are provided via GitHub
releases through automated actions configured to run at tagged semantic
versions and pushes to main.

The documentation for python is also integrated into our existing
documentation setup. The previous documentation GitHub action is now
configured to run after the python builds on Ubuntu 18.04 with Python 3.7,
in order to pick up the bergamot module packaged as a wheel and build
the sphinx documentation using that python module.

Formatting checks (black, and isort with the black profile) and a pytype
type checker are configured for the python component residing in this repository.
2022-01-26 20:33:43 +00:00
Jerin Philip
495f98dd0d
Speed up Windows CI with ccache (#308)
Use https://github.com/cristianadam/ccache/releases/ to speed up windows
compilation.

Remove /Zi as it is unsupported by ccache at the moment. This is a debug
flag that was removed in upstream marian-dev
(https://github.com/browsermt/marian-dev/pull/43). However, the bergamot
CMakeLists.txt, which was originally taken from marian, still kept it
under MSVC.
2022-01-22 18:41:04 +00:00
Jerin Philip
3883dd1971
cache: threadsafety-fixes; optional stats collection (#245)
* Make stats (hits, misses) atomic, guarding them when the mutex is split into multiple buckets
* Use a compile-time switch for cache stats collection, bound to the COMPILE_TESTS cmake variable
* -DENABLE_CACHE_STATS is on if COMPILE_TESTS, otherwise optional
* Make a stats() call without enabling the build option a fatal abort
2022-01-02 12:33:30 +00:00
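The cache change in #245 combines atomic hit/miss counters with a compile-time switch and a fatal abort when stats are requested from a build without collection. The C++ sketch below is a minimal illustration of that idea under stated assumptions. StatsCollector and CacheStats are hypothetical names, not bergamot-translator's actual classes. ENABLE_CACHE_STATS defaults to 1 here only so the snippet compiles on its own; compiling with -DENABLE_CACHE_STATS=0 exercises the disabled path.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical sketch of the #245 idea, not the real bergamot-translator cache.
// Defaulted to 1 here so the example is self-contained.
#ifndef ENABLE_CACHE_STATS
#define ENABLE_CACHE_STATS 1
#endif

struct CacheStats {
  std::size_t hits;
  std::size_t misses;
};

class StatsCollector {
 public:
  // Lookups may hold *different* bucket mutexes at the same time, so no single
  // lock protects these counters; they are atomic instead.
  void recordHit() {
#if ENABLE_CACHE_STATS
    hits_.fetch_add(1, std::memory_order_relaxed);
#endif
  }

  void recordMiss() {
#if ENABLE_CACHE_STATS
    misses_.fetch_add(1, std::memory_order_relaxed);
#endif
  }

  CacheStats stats() const {
#if ENABLE_CACHE_STATS
    return {hits_.load(), misses_.load()};
#else
    // Asking for stats from a build without collection is a fatal error.
    std::fprintf(stderr, "cache stats were not enabled at build time\n");
    std::abort();
#endif
  }

 private:
#if ENABLE_CACHE_STATS
  std::atomic<std::size_t> hits_{0};
  std::atomic<std::size_t> misses_{0};
#endif
};
```

Relaxed ordering is enough in this sketch because the counters are independent tallies that are only read as a snapshot.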
Nikolay Bogoychev
8563f0856f
Proper arch setting on win32 (#275)
* Proper arch detection on win32

* Whoops
2021-12-14 23:53:53 +00:00
Abhishek Aggarwal
cafb65e0b5 Wasm builds without SharedArrayBuffer 2021-08-27 09:07:06 +02:00
Jerin Philip
e9e5ac6782
Partial test-apps and tolerance in evaluations (#184)
* Partial test applications

Previously service-cli was used to generate output and accomplish
regression testing for all of: (1) translated-text (2) alignment tokens
+ scores (3) quality scores (4) indirectly annotation and tokenizations.

The --mode native now only outputs the translated text, faithful to the
input source read from stdin.

Test apps are separated into testing only individual functionalities.
This can help in independently testing ssplit-cpp, quality-scores for
the quality estimation implementation etc.

Separating numbers and text has the advantage that each can be compared
with a tolerance: BLEU for text and some allowed error rates for
numbers.

* Removing #mac tag

* Moving test apps to src/tests

* Tests are always on for CI

Unit tests are turned off based on WASM_COMPATIBLE_SOURCES.

* Fixing WASM_COMPATIBLE_SOURCE -> USE_WASM_COMPATIBLE_SOURCE

* Workaround for now; CMakeLists.txt horrors are starting to bite

* BRT: use bergamot-test instead of bergamot now

* This should fix issues: CMakeLists.txt has so many paths

* Casing to camelCase and removing legacyServiceCli

* removing leftover service-cli declaration, some doc updates

* #pragma once is starting to look easier

* All the more reasons to do #pragma once

* Updating marian-dev with intgemm::kCPU print, resolved from INTGEMM_CPUID

* BRT: Use --gemm-highest-arch instead of python script

* Adding intgemm resolve here, where always(?) have intgemm on?

* intgemm-resolve in default binary directory

* BRT: Update to use intgemm-resolve

* marian-dev: Reset to without --gemm-highest-precision

Co-authored-by: Kenneth Heafield <kpu@users.noreply.github.com>
2021-06-14 15:02:42 +01:00
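#184 compares numeric outputs such as quality scores against expected values using an allowed error rate instead of exact matching. The C++ sketch below shows one way such a check could work. The withinErrorRate helper, the absolute tolerance and the error-rate threshold are assumptions made for illustration; this is not the project's test code.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: true when the fraction of values that differ from the
// expected ones by more than absTol is at most maxErrorRate.
bool withinErrorRate(const std::vector<double>& expected,
                     const std::vector<double>& actual,
                     double absTol, double maxErrorRate) {
  if (expected.size() != actual.size()) return false;  // shape mismatch fails
  if (expected.empty()) return true;                   // nothing to compare

  std::size_t mismatches = 0;
  for (std::size_t i = 0; i < expected.size(); ++i) {
    if (std::fabs(expected[i] - actual[i]) > absTol) ++mismatches;
  }
  double rate = static_cast<double>(mismatches) / expected.size();
  return rate <= maxErrorRate;
}
```

A single noisy score then fails a test only if it pushes the overall mismatch rate past the threshold. Translated text is handled separately via BLEU.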
Abhishek Aggarwal
c44868e1fd Import GetVersionFromFile cmake file in root level CMakeLists.txt 2021-05-17 19:34:58 +02:00
Abhishek Aggarwal
6c063c607e Updated CMakeLists.txt to remove packaging steps for wasm compilation
- Removed PACKAGE_DIR cmake option
 - Removed Workerfs, FORCE_FILESYSTEM=1 in wasm builds
   -- File system support is not needed any more (since model,
     shortlist and vocabs are being passed as bytes now)
2021-05-12 16:23:09 +02:00
Kenneth Heafield
ce01de939d
Change USE_WASM_COMPATIBLE_SOURCE=OFF by default on native, force on for WASM (#138)
* Change WASM_COMPATIBLE_SOURCE=OFF by default

The default was WASM_COMPATIBLE_SOURCE=ON COMPILE_WASM=OFF, which is a
testing configuration, not a sensible default for native or wasm.

* Always USE_WASM_COMPATIBLE_SOURCE with COMPILE_WASM

* Set CMP0077 to fix variable handling
2021-05-10 12:28:37 +02:00
Nikolay Bogoychev
d82e01eda4
Full windows support with ssplit from browsermt, not a fork (#109)
* Update marian-dev to the newest mac version

* Attempt windows workflow

* force workflow rerun

* Separate id

* Attempt 3 at github action

* Marian dev submodule now compiles with apple clang

* Updated ssplit version to something more recent

* Attempt to fix compile on wasm

* Do not compile subproject tests

* Fix emscripten compilation on Mac

* 99% on the way to windows compile

* Try with a different generator

* Build release not debug

* Revert CMakeLists.txt hacks

* Fix sse2 compilation failure

* MSVC settings for WIN32

* Add nodefaultlib LIBCMT

* Do not compile ssplit.cpp as it contains sys/mman.h

* Revert ab56b9aa4f

* Update paths

* Set the build type to release if not set previously

* Attempt to build release with the windows workflow

* Attempt 5 at Visual Studio release build

* Attempt 6 at getting release build on MSVC generator

* The windows build is debug at the moment...

* fix ssplit for ubuntu 16.04

* Fix compilation with clang

* Compile on ubuntu16.04

* Explain what is going on

* Updated ssplit and workflow
2021-05-01 00:29:23 +01:00
abhi-agg
2e5daac978
Marian submodule update (#74)
* Updated marian-dev submodule

 - cmake changes required after the submodule update

* Added workflows for building custom marian on mac and ubuntu

* Renamed cmake option

 - Renamed USE_WASM_COMPATIBLE_SOURCES to USE_WASM_COMPATIBLE_SOURCE
 - Use proper compile definitions
2021-04-01 16:29:02 +01:00
Jerin Philip
bfb5e78602
Alignments + weak quality scores capability in Service (#46)
* Draft adjustments to API

* Adjustments to docs

* Let's call the word + sentence ranges annotations

* Editing confusing comment on size()

* Fixing compilation for template adjustments for SentenceRanges

* string_view template hacks

This commit shifts AnnotatedBlob into a templated type and gets the
troubled part to compile. All to manage absl::string_view and
std::string_view.

Objective: marian::bergamot stays C++11 so it can be plucked and put in
marian code, while bergamot-translator flexes C++17. Simplify development in
one place.

* Fixing the wiring: Gets source to build

Runtime errors exist, but AnnotatedBlobs are consistent.

* Bugfix: Matching old-state after factoring AnnotatedBlob in

* Removing vocabs_ from Response.

(For the umpteenth time).

* Alignment API ready in marian::bergamot::Response

* Wiring alignments up to TranslationResult

* Adjustment to get alignments; bergamot-translator-app has alignments available

* Accessing words instead of Ids

This code sets up access of word string_views from annotations instead
of printing Ids. However, we have segfault. This is likely due to
targetRanges not being set, pending from
https://github.com/browsermt/bergamot-translator/issues/25.

Could also be a rogue EOS token which we're filtering for in string_view
annotations, but not so in alignments.

* Switching to browsermt/marian-dev@jp/decode-string-view for targetTokenRanges

* Target word byte range annotations available

Issues corresponding to #25 should be resolved. There is still a
segfault. Could be due to EOS. Pending investigation.

* Bugfix: Tokens for alignments are now through.

Was not EOS.

* browsermt/marian-dev@master

ByteRange changes work downstream and have been merged to master.
Updating submodule to point to master.

* Style and documentation enhancements: response.cpp

* Style and documentation enhancements: TranslationResult.h

* Descriptions for SentenceRanges templating

* Switching to marian-dev@wasm-sync

* AnnotatedBlob can be copy-ctord/copy-assigned

* TranslationResult: Empty ctor + WASM Bindings

Allows empty construction of TranslationResult. Using this empty
constructor, WASM bindings are adjusted. Unsure of the results, maybe
@abhi-agg can test.

* Cosmetic: SentenceRangesT -> Annotation

- SentenceRangesT is renamed to AnnotationT;
- Further comments to explain heavily templated files.

* Response: Cleaning up unused members and adding docs

* Adding quality scores - attempt

* Stub QualityScores

This adjustment adds the capability to get "scores", which should
potentially indicate how confident the model is (at least relatively,
within a target sentence). This enables writing the code forward for
TranslationResult, and provides an example quality score people can be
pointed at.

- These are not between [0,1] yet.
- In addition, guards to check out-of-bounds access have been placed so
  illegal accesses are caught early on during development.

* Removing token debug statements

* Reworking Annotation without templates

https://github.com/mozilla/bergamot-translator/issues/8 provides
ByteRanges.

- This ByteRange data-type is used in Annotation and converted
  to marian::string_view(=absl::string_view) on demand.
- Since Annotation[using ByteRange] is not bound to anything else, it
  can be unit tested. A unit test is added (originally to test
  independently for integration after).
- Annotation with ByteRange is now propagated across marian::bergamot
  and functionality matched to how it was previously working.

This eliminates the string-view conversion and template code.

* Nit: Removing std::endl flushes

* Bring TranslationResult and Response closer

Helps https://github.com/browsermt/bergamot-translator/issues/53.

In preparation, the data-export types for Quality and Alignment are
pushed down to Response from TranslationResult and computed during
construction. This brings TranslationResult closer to Response, paving
the way to avoid having two TranslationResults.

histories_ only remain for marian-decoder replacement usage, which can
be removed in a separate PR.

* Clean up hacks originally added for a unit-test to compile

* Moving Annotation functions to cpp and documenting header file

* Shifting alignments, qualityScore testing capability into main-mts

* Restore Unified API files to previous state

* Adaptations to fix Response with Quality, Alignments to connect to old Unified API

* Missing reset on TranslationResultBindings

* Cleaning up Response documentation to reflect newer code

* Minor adjustments to get build back after main sync

* Marian seems to make available Catch somehow

* Disable COMPILE_BERGAMOT_TESTS for WASM

* Add COMPILE_BERGAMOT_TESTS as a CMakeDependent option

* Use the COMPILE_TESTS flag instead to skip macos.yml

* Trigger unit-tests on GitHub runners for Annotation

* Reordering enable_testing() to before inclusion of test directory

* doc constructs required to operate with alignments

Documents with doxygen compatible documentation for Response,
AnnotatedBlob, Annotation, ByteRange.

Incorporates doxygen compatible documentation for

* Updates ByteRange consistent with general C++

Also small documentation enhancements in the process.

* Updating marian-dev@9337105

* Copy-paste documentation because lazy

* Turn off autoformat and manually edit to fix style changes

* AnnotatedBlob -> AnnotatedText; blob -> text

* text.text in test app renamed

* text of text -> blob of text in places of documentation
2021-03-31 17:41:36 +01:00
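#46 reworks Annotation to store plain ByteRange offsets, convert them to a string view only on demand, and guard against out-of-bounds access. The C++17 sketch below shows that shape under stated assumptions. It uses std::string_view in place of marian::string_view. The member and function names (addSentence, numWords, word, asView) are hypothetical and do not reflect the actual bergamot-translator API.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Byte offsets [begin, end) into some text; not tied to any string object.
struct ByteRange {
  std::size_t begin;
  std::size_t end;
  std::size_t size() const { return end - begin; }
};

// Minimal annotation: word ranges grouped by sentence (hypothetical interface).
class Annotation {
 public:
  void addSentence(std::vector<ByteRange> words) {
    sentences_.push_back(std::move(words));
  }
  std::size_t numSentences() const { return sentences_.size(); }
  // at() throws std::out_of_range, so bad indices are caught early.
  std::size_t numWords(std::size_t s) const { return sentences_.at(s).size(); }
  ByteRange word(std::size_t s, std::size_t w) const {
    return sentences_.at(s).at(w);
  }

 private:
  std::vector<std::vector<ByteRange>> sentences_;
};

// Convert a range to a view only when the caller needs the characters.
std::string_view asView(const std::string& text, ByteRange r) {
  if (r.begin > r.end || r.end > text.size()) {
    throw std::out_of_range("ByteRange outside text");
  }
  return std::string_view(text).substr(r.begin, r.size());
}
```

Because the offsets are not bound to any particular string, Annotation can be unit tested with no text at all. This matches the reasoning given in the "Reworking Annotation without templates" item.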
Abhishek Aggarwal
fdbce5705b Update marian-dev submodule to master
- Earlier it was using 'wasm' branch
 - CMakefile changes
 - Github workflow change
2021-03-26 10:02:13 +01:00
Abhishek Aggarwal
0be73705d9 Fixed native builds while using wasm compatible sources
- main-mts and marian-decoder-new can't be used because
   they use the multi-threaded variant of the Service class
2021-02-26 14:55:30 +01:00
Abhishek Aggarwal
4d4acf6b8b Cleanup CMakeLists.txt
- Renamed USE_WASM_COMPATIBLE_MARIAN to USE_WASM_COMPATIBLE_SOURCES
 - Removed COMPILE_THREAD_VARIANT cmake option and removed
   corresponding compile definition
 - Updated workflows and READMEs accordingly
2021-02-26 14:17:48 +01:00
Abhishek Aggarwal
c2b1c6eab4 Use system installed PCRE2 for builds using full blown marian
- USE_INTERNAL_PCRE2 is ON for custom marian builds while OFF
   for full marian builds
2021-02-24 20:02:48 +01:00
Abhishek Aggarwal
4369a56f90 Enable building marian executables for vanilla marian builds
- COMPILE_LIBRARY_ONLY is set to ON only for wasm compatible marian
   builds
2021-02-23 18:15:33 +01:00
Abhishek Aggarwal
415d16bd1d Single cmake option to enable/disable wasm compatible marian compilation
- USE_WASM_COMPATIBLE_MARIAN=off will start using vanilla Marian
   i.e. with full threading support, with exceptions, with MKL

 - Changed the relevant documentation
2021-02-23 16:15:05 +01:00
Abhishek Aggarwal
458176c050 Enable building pcre2 from sources for ssplit submodule
- USE_INTERNAL_PCRE2 is set to ON
 - Sentence splitting is working (tested it via wasm test page)
2021-02-22 18:51:48 +01:00
Jerin Philip
10dcb8f548 Merge remote-tracking branch 'origin/wasm-integration' into jp/absorb-batch-translator
Merging wasm-integration. Single thread codepath seems functional.
Multithreading is broken.
2021-02-17 13:08:58 +00:00
Jerin Philip
44a44fa156 CMake build with submodule recursive clones 2021-02-17 11:48:00 +00:00
Abhishek Aggarwal
c5c5339489 Re-enable simd shuffle pattern for intgemm compilation 2021-02-15 17:18:59 +01:00
Abhishek Aggarwal
3607523c24 Enabled COMPILE_WITHOUT_EXCEPTIONS for marian submodule 2021-02-15 16:54:50 +01:00
Motin
9a5cf30bbb Revert "Enabled simd shuffle pattern for intgemm compilation"
This reverts commit 3dd7a60b35.
2021-02-15 15:03:00 +02:00
Motin
9a5ae9568e Turn off assertions and disable exception catching for wasm builds 2021-02-15 14:24:59 +02:00
Abhishek Aggarwal
3dd7a60b35 Enabled simd shuffle pattern for intgemm compilation
- WORMHOLE cmake option is set to ON when compiling for WASM
 - WASM module might not run on Chrome
2021-02-15 12:58:18 +01:00
Abhishek Aggarwal
28dcf55b41 Improved cmake to use wasm compilation flags across project 2021-02-12 11:36:33 +01:00
Abhishek Aggarwal
ff95e37f89 Improved cmake option PACKAGE_DIR 2021-02-11 23:52:37 +01:00
Abhishek Aggarwal
de501e8f96 Added JS binding files and cmake infrastructure to build them
- Added "wasm" folder
 - Contains README file as well
2021-02-11 23:36:29 +01:00
Abhishek Aggarwal
74b06d863e Add wasm folder to compile JS bindings 2021-02-11 19:09:30 +01:00
Abhishek Aggarwal
23a9527824 Source code changes to compile the project without threads
- Set COMPILE_THREAD_VARIANT cmake option to ON to compile
   multithreaded variant of the project
2021-02-11 16:57:14 +01:00
Abhishek Aggarwal
838547e4d5 Set cmake options of marian properly for this project 2021-02-11 15:42:18 +01:00
Abhishek Aggarwal
b73d4f4cc2 Set cmake option to compile marian library only
- Set COMPILE_LIBRARY_ONLY to ON for marian library
2021-02-11 15:37:38 +01:00
Abhishek Aggarwal
9747d9ba83 Add cmake option to compile project on WASM
- Set cmake option COMPILE_WASM to ON to compile the project
   on WASM
2021-02-11 15:34:27 +01:00
Abhishek Aggarwal
9a54d2116c Updated marian-dev submodule
- Switch to "wasm" branch of browsermt/marian-dev
2021-02-08 13:46:59 +01:00
Jerin Philip
2929077324 Reordering git submodule update before includes 2021-02-02 14:41:26 +00:00
Jerin Philip
548c8880ff CMake updates submodules 2021-02-02 14:39:19 +00:00
Abhishek Aggarwal
026f1af887 Removed redundant lines from CMakeFile 2021-01-26 10:46:35 +01:00
Jerin Philip
bde9094728 Updating CMakeLists to build main
CMakeLists have been modified with the necessary includes to add
browsermt/mts@nuke files to the bergamot-translator library. In
addition, adds the ssplit dependency, corresponding includes.

Intel MKL fails on compilation, unable to find libraries. To solve this,
3rd_party/CMakeLists.txt is modified with @ug's fixes to propagate
variables (EXT_LIBS, etc) at a library level.
2021-01-20 19:52:34 +00:00
Abhishek Aggarwal
f8c9a6b0cc Added an application showing usage of bergamot translator
- 'app' folder contains the application
 - The application uses dummy requests and responses for now
2020-11-16 15:44:02 +01:00
Abhishek Aggarwal
358d76871f Small change: Added newline endings 2020-11-11 17:18:12 +01:00
Abhishek Aggarwal
a220f915fc Compile marian submodule in the project
- marian compiles successfully and is ready to be used
   in the project
2020-11-11 16:19:54 +01:00
Abhishek Aggarwal
e5f3d51eff Basic skeleton code for the Unified API specification
- Contains classes for the API specification (doc/Unified_API.md)
 - Things to be changed/decided later:
     Use of std::string_view to represent ranges
     Adding Alignment information
     Basic Setters and Getters for some of the classes
2020-11-03 10:23:05 +01:00