Commit Graph

446 Commits

Author SHA1 Message Date
Abhishek Aggarwal
b00116cb94
Refactor wasm bindings to use consistent interface names as in native (#195)
* Refactored wasm bindings code
 - Replaced TranslationModel, TranslationRequest and TranslationResult
    with Service, ResponseOptions and Response
 - Corresponding documentation changes
 - Names of the bindings files changed
 - Moved Vector<Response> definition in Response specific bindings
   file
2021-06-15 16:02:14 +02:00
Jerin Philip
4b014665ba
Removing alignments and quality-scores test-code (#196)
* Removing alignments and quality-scores test-code
* BRT: Update to main
2021-06-14 18:40:41 +01:00
Jerin Philip
e9e5ac6782
Partial test-apps and tolerance in evaluations (#184)
* Partial test applications

Previously service-cli was used to generate output and accomplish
regression testing for all of: (1) translated-text (2) alignment tokens
+ scores (3) quality scores (4) indirectly annotation and tokenizations.

With --mode native, the output is now only the translated text, faithful
to the source, for the input read from stdin.

Test apps are separated into testing only individual functionalities.
This can help in independently testing ssplit-cpp, quality-scores for
the quality estimation implementation etc.

Separating numbers and text has the advantage that each can be compared
with its own tolerance: BLEU for text and an allowed error rate for
numbers.

* Removing #mac tag

* Moving test apps to src/tests

* Tests are always on for CI

Unit tests are turned off looking for WASM_COMPATIBLE_SOURCES.

* Fixing WASM_COMPATIBLE_SOURCE -> USE_WASM_COMPATIBLE_SOURCE

* Workaround for now; CMakeLists.txt horrors are starting to bite

* BRT: use bergamot-test instead of bergamot now

* This should fix issues: CMakeLists.txt has so many paths

* Casing to camelCase and removing legacyServiceCli

* removing leftover service-cli declaration, some doc updates

* #pragma once is starting to look easier

* All the more reasons to do #pragma once

* Updating marian-dev with intgemm::kCPU print, resolved from INTGEMM_CPUID

* BRT: Use --gemm-highest-arch instead of python script

* Adding intgemm resolve here, where always(?) have intgemm on?

* intgemm-resolve in default binary directory

* BRT: Update to use intgemm-resolve

* marian-dev: Reset to without --gemm-highest-precision

Co-authored-by: Kenneth Heafield <kpu@users.noreply.github.com>
2021-06-14 15:02:42 +01:00
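The tolerance idea described in e9e5ac6782 (comparing numeric outputs with an allowed error instead of byte-for-byte) can be sketched as below. This is a minimal illustration, not code from the repository: the function name `scoresWithinTolerance` and the choice of an absolute per-element tolerance are assumptions, and the BLEU comparison for text is not shown.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from bergamot-translator): returns true when both
// score sequences have the same length and every pair of values differs by
// at most `tolerance` in absolute value.
bool scoresWithinTolerance(const std::vector<double>& expected,
                           const std::vector<double>& actual,
                           double tolerance) {
  assert(tolerance >= 0.0);  // a negative tolerance would reject everything
  if (expected.size() != actual.size()) return false;
  for (std::size_t i = 0; i < expected.size(); ++i) {
    if (std::fabs(expected[i] - actual[i]) > tolerance) return false;
  }
  return true;
}
```

A regression test built this way could accept small numeric drift between platforms while still failing when the number of scores changes.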
Abhishek Aggarwal
16eb47f47e
Generating cmake configured project version (.js) file in build folder (#194)
- Earlier this file was being generated in folder containing
   actual sources

 - Fixes https://github.com/browsermt/bergamot-translator/issues/161
2021-06-09 13:57:23 +01:00
Jerin Philip
3039dea34b
Fixing if syntax with YAML var substitution (#188) 2021-06-09 10:21:23 +01:00
Jerin Philip
dc2fb3d64e
CMake fixes: Generate project.h in binary dir, fix GetVersionFromFile for use as submodule. (#193)
* Use CMAKE_CURRENT_SOURCE_DIR instead of CMAKE_SOURCE_DIR for project bound version string

* marian-dev cmake fix

* Generate project.h in binary dir

* We don't want people asking about extra spaces
2021-06-09 10:12:00 +01:00
Abhishek Aggarwal
3e46e3391c Consistent EMSDK version and parallel make jobs in README and github actions
- Set EMSDK version to 2.0.9 to make it consistent
   everywhere in repo
 - Set parallel make jobs to 2
2021-06-09 11:10:10 +02:00
Jerin Philip
d39e0277c6
Replace resize with possible negative range with pop_back() (#189) 2021-06-05 00:28:53 +01:00
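One plausible reading of d39e0277c6 is sketched here, assuming the original code computed a new size by subtracting from `size()`. Because `std::vector::size()` is unsigned, such a subtraction wraps around to a huge value instead of going negative, while removing elements with `pop_back()` behind an `empty()` check cannot overshoot. The helper name `dropTrailing` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from bergamot-translator): drop up to `count`
// trailing elements. Writing v.resize(v.size() - count) instead would wrap
// around to a very large size_t whenever count > v.size().
template <class T>
void dropTrailing(std::vector<T>& v, std::size_t count) {
  while (count > 0 && !v.empty()) {
    v.pop_back();
    --count;
  }
  assert(count == 0 || v.empty());  // we stopped for one of these two reasons
}
```

Asking for more elements than the vector holds simply leaves it empty.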
Jerin Philip
71a62405e7
Update native (ubuntu, mac) workflows with ccache (#181)
* Matrix is now more organized; Ubuntu 20.04-gcc9.3 and Ubuntu-18.04-gcc7.5 are added.
* ccache is extended to macOS, and brings CI run times down to <5m when
  ccache works.
* The compiler hash scripts are gone; ccache already covers most ground
  by default, so the shell script is unnecessary. In preprocessor mode,
  ccache hashes the output of running the compiler with -E, which
  includes the necessary information. See ccache-docs: How the cache works.
* If BRT fails, the final 20 lines of test*.log are printed so the
  failure can be inspected without downloading artifacts.
* Pull request on any branch triggers workflow.
* Push on main and ci-sandbox triggers workflow.
2021-06-04 11:52:36 +01:00
Kenneth Heafield
5f0d3963e2
Remove addSentenceWithPriority (#186) 2021-06-04 00:09:20 +01:00
Jerin Philip
73228bbb4a
Updating marian-dev: intgemm with env variable matmul switches (#187) 2021-06-03 21:01:26 +01:00
Jerin Philip
330840338c
Including WASM documentation in sphinx build toc (#176) 2021-06-01 12:39:28 +01:00
Jerin Philip
ceaf21a532
Deploy generated documentation only if browsermt (#179) 2021-06-01 11:00:53 +01:00
Jerin Philip
5d3ec9c0a9
Single executable (#175)
* Collapsing executables

* Adding new test executable

* Deleting old executable sources

* Updating brt to operate with modes

* cli-framework -> cli

* Updating workflows to check for bergamot instead of bergamot-translator-app

* Adding documentation

* Making fn pure virtual

* Shuffling apps into app namespace, alongside class documentation

* Include app folder in documentation

* BRT update service-cli -> native

* parser.h: service-cli -> native

* Updates to marian-integration.md

* Cleanup: Remove templates, interface proper

* change 4 to 2 cores for build instructions

* service-cli -> native

* Commenting the string constructor explanation

* Not doing halfway interface / inheritance

* Nick hates state, let's try this one

* Revert "Nick hates state, let's try this one"

This reverts commit e56db9f474.

* class -> struct before trying std::function stuff

* oop -> functional?

* Hints on what is happening

* app::ftable -> app::REGISTRY

* We have if-else and functions now.

And we won't have test apps.

* Doc linking to usage examples in brt

* Remove unordered_map

* Documentation updates

* Fix warning
2021-05-31 14:44:59 +01:00
Jerin Philip
eb579ed26f
Updating marian dev RelwithDebInfo -> Release (#178)
* Updating marian dev RelwithDebInfo -> Release

* Updating submodule to point to master
2021-05-27 10:51:53 +01:00
Qianqian Zhu
8bec1b7b6b
Fix failures when loading text shortlist (#154) 2021-05-25 12:05:16 +01:00
Jerin Philip
576afae6b3
Adding documentation action (#168)
Adds a GitHub workflow that builds documentation from sources through doxygen through sphinx on push to the main branch or on push of any semantic version tags. The built documentation is deployed at https://github.com/browsermt/docs@gh-pages, which is rendered at https://browser.mt/docs/<suffix>, where <suffix> is 'main' or a tag vM.m.p corresponding to a semantic version.

On pull requests, artifacts are uploaded for reviewers to inspect if need be.
2021-05-25 11:10:56 +01:00
Jerin Philip
22a1b9113e
Remove O(N^2) reallocation (#171) 2021-05-22 00:04:49 +01:00
Jerin Philip
f1253720a8
Bumping BRT for hotfixes (#169)
* Bumping BRT for hotfixes

* updating brt to point to main
2021-05-20 12:49:44 +01:00
Nikolay Bogoychev
4f8050be64 Update tests 2021-05-20 11:03:10 +01:00
Motin
4b177d57e4
GitHub action to push browsermt/main branch to mozilla/bergamot-translator every hour (#160)
* Create push-browsermt-main-to-mozilla-main.yml

* Update .github/workflows/push-browsermt-main-to-mozilla-main.yml

Co-authored-by: Graeme <graemenail@gmail.com>

* Tweaks

* Fix yaml syntax

* Parametrized the workflow based on @jerinphilip's example

Co-authored-by: Graeme <graemenail@gmail.com>
2021-05-20 09:33:58 +03:00
Motin
0f8f8e026a
Pin emsdk version to the same one used in Circle CI (#165) 2021-05-20 08:59:30 +03:00
Jerin Philip
9dcf6ab665
Adding clang-format and updating existing sources to adhere (#151)
* Adding a first version of clang-format

* Adding run-clang-format.py

* Adding coding styles to workflow

* Fix indentation on coding-styles workflow

* run-clang-format.'py'

* -style -> --style in python

* Updating ColumnLimit: 120

* Format update with clang-format

* Revert "Format update with clang-format"

This reverts commit 5340b19eae.

* Apply update after sync

* Removing a few empty lines

* Removing one more empty line

* Removing empty in workflow file

* Updating README with coding style instructions

* clang-format-* provided in this repository doc update

Co-authored-by: Nikolay Bogoychev <nheart@gmail.com>
2021-05-19 21:50:21 +01:00
Qianqian Zhu
7ad8d0a04d
initialise MemoryBundle members (#167) 2021-05-19 20:11:20 +01:00
Kenneth Heafield
89bd47342b
Use binary lexical shortlist in documentation (#152)
* Use binary lexical shortlist in documentation

* MKL/AppleAccelerate note

Co-authored-by: Nikolay Bogoychev <nheart@gmail.com>
Co-authored-by: Jerin Philip <jphilip@ed.ac.uk>
2021-05-19 10:44:32 +01:00
Kenneth Heafield
b25f223fe4
Rewriting batching for threadsafety (#155)
This does make the batcher a critical section across job submission and
cleaving though.  If that becomes a problem, we should go back to
incoming and outgoing queues with a batcher thread.

Also removes blocking mode from native compiles.

Note that translateMultiple no longer guarantees great batching.  Guess
we could lease the mutex from ThreadsafeBatcher and create a session.

There is the risk that one sentence comes in at a time and each thread
grabs one sentence at a time instead of better batching.  Not sure what
to do about that other than some sort of Nagle algorithm.

Due to non-deterministic batching, even with one thread, the regression
tests will go haywire.
2021-05-18 16:11:14 +01:00
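The design described in b25f223fe4, where job submission and batch cleaving share one critical section, can be sketched as follows. This is a toy under stated assumptions, not the repository's ThreadsafeBatcher: the class name `ToyThreadsafeBatcher`, its methods, and the use of plain strings as sentences are all hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Toy illustration only: a single mutex guards both submission and cleaving.
class ToyThreadsafeBatcher {
 public:
  explicit ToyThreadsafeBatcher(std::size_t maxBatchSize)
      : maxBatchSize_(maxBatchSize) {
    assert(maxBatchSize_ > 0);
  }

  // Called by client threads to enqueue one sentence.
  void submit(std::string sentence) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      pending_.push_back(std::move(sentence));
    }
    ready_.notify_one();
  }

  // Wakes all waiting workers; cleave() then returns empty once drained.
  void shutdown() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      shutdown_ = true;
    }
    ready_.notify_all();
  }

  // Called by worker threads. Blocks until something is pending or shutdown
  // was requested, then takes at most maxBatchSize_ sentences.
  std::vector<std::string> cleave() {
    std::unique_lock<std::mutex> lock(mutex_);
    ready_.wait(lock, [this] { return shutdown_ || !pending_.empty(); });
    std::vector<std::string> batch;
    while (!pending_.empty() && batch.size() < maxBatchSize_) {
      batch.push_back(std::move(pending_.front()));
      pending_.pop_front();
    }
    return batch;
  }

 private:
  std::size_t maxBatchSize_;
  std::mutex mutex_;
  std::condition_variable ready_;
  std::deque<std::string> pending_;
  bool shutdown_ = false;
};
```

The sketch also shows the risk the commit message raises: a worker wakes as soon as a single sentence is pending, so when sentences trickle in one at a time each `cleave()` call may return a batch of size one.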
Jerin Philip
269edc7ce5
Collapsing TranslationRequest -> ResponseOptions (#139) 2021-05-18 14:25:25 +01:00
abhi-agg
8b621de358
Merge pull request #159 from mozilla/main
Merge histories across bergamot-translator forks
2021-05-18 13:53:26 +02:00
abhi-agg
813e81c10c
Merge branch 'main' into main 2021-05-18 13:53:12 +02:00
Nikolay Bogoychev
10131c731a
Marian submodule with unified loading (#157) 2021-05-18 12:45:22 +01:00
Motin
1c40cc8289
Merge branch 'main' into main 2021-05-18 13:44:08 +03:00
Abhishek Aggarwal
7a973df74d Corrected the version number
- To be in sync with versioning in mozilla/bergamot-translator repo
2021-05-18 12:17:56 +02:00
Abhishek Aggarwal
b73714e222 Merge remote-tracking branch 'upstream/main' into main
- Sync with upstream (https://github.com/browsermt/bergamot-translator)
2021-05-18 08:48:41 +02:00
Abhishek Aggarwal
067076fbc1 Bumped version to 0.3.0
- This brings the version info in sync with the various releases
   of extension
2021-05-17 19:34:58 +02:00
Abhishek Aggarwal
0ad583cc34 Generate project version file for native builds
- The header file exposes a function that provides version information
   for native binaries
2021-05-17 19:34:58 +02:00
Abhishek Aggarwal
2e5880d3d4 Modified wasm cmake file to include version information in built artifacts 2021-05-17 19:34:58 +02:00
Abhishek Aggarwal
c44868e1fd Import GetVersionFromFile cmake file in root level CMakeLists.txt 2021-05-17 19:34:58 +02:00
Abhishek Aggarwal
c1ef6f2bcb Added cmake file to compute version information
- Reads BERGAMOT_VERSION file for generating various strings
   for versioning
2021-05-17 19:34:58 +02:00
Kenneth Heafield
3e70587672
Rewrite annotation class to remove corner cases (#135) 2021-05-17 16:42:18 +01:00
Qianqian Zhu
5bd1fc6b83
Refactor vocabs in Service (#143)
Co-authored-by: Nikolay Bogoychev <nheart@gmail.com>
2021-05-17 13:09:03 +01:00
Jerin Philip
77424a3df1
Enabling ccache on github builds for Ubuntu (#95)
* CI Changes to add tiny regression tests

* Adding an inspect cache step

* Removing ccache, pursue in another

* Incorporating Nick's changes through submodule merge

* Submodule now points to master

* Restoring ccache enabled workflow file

* Restoring ccache enabled CMakeLists

* cache -> ccache typo fix

* Moving CCACHE setup to GitHub runner file

* Find also uses CCACHE dir

* Updating CMakeLists not to override env

* Cache compiler binary's contents

* Changing a few names to trigger new build; Testing cache looks fun

* USE_CCACHE=on, -L for inspection

* Adding a ccache_cmd, but will only use in next commit

* Using ccache_cmd

* Removing "

* Adding compiler hash script

* Bunch of absolute paths

* GITHUB_WORKSPACE typo

* Nah, I'll keep -L and trigger another build

* Trying something with compiler hash on cache key backup as well

* builtin, bash it seems

* Empty commit #1

* Move ccache stats to after compile

* Reshuffling ccache vars

* No comments

* Updates to Github output set syntax

* Empty Commit 1

* Empty Commit 2

* Empty commit 3

* /bin/bash -> bash; ccache_cmd for consistency

* Adding ccache -s before and after build

* Adding comments to compiler-hash script

* Let's build cached and non-cached variants together for comparison

* Fixing quotes, /bin/bash -> bash

* Minor var/env adjustment

* Adding ccache -z before the job

* Reverting CMakeLists.txt without CCACHE

* Switching to CMAKE_LANG_COMPILER_LAUNCHER instead of CMakeLists.txt rule

* 5G -> 1G cache size

* 1G -> 2G; Hyperparameter tuning
2021-05-17 11:42:47 +01:00
Qianqian Zhu
6c7e6156ab
Bundle AlignedMemory inputs with MemoryBundle (#147) 2021-05-13 13:18:08 +01:00
Abhishek Aggarwal
6c063c607e Updated CMakeLists.txt to remove packaging steps for wasm compilation
- Removed PACKAGE_DIR cmake option
 - Removed Workerfs, FORCE_FILESYSTEM=1 in wasm builds
   -- File system support is not needed any more (since model,
     shortlist and vocabs are being passed as bytes now)
2021-05-12 16:23:09 +02:00
Abhishek Aggarwal
0189500160 Updated README to remove packaging steps for wasm compilation
- We don't need to package model, shortlist or vocab files into wasm
   binary at build time
2021-05-12 16:23:09 +02:00
Abhishek Aggarwal
e0b9bad058 Updated wasm README to update for passing vocabs as bytes
- Updated Using JS APIs section to pass vocabs as bytes
2021-05-12 16:23:09 +02:00
Abhishek Aggarwal
8a6c7b44a3 Avoid packaging vocab files into wasm binary in CI builds
- We don't need to package vocab files into wasm binary any more
   as a sync with upstream enabled passing vocabs as bytes
2021-05-12 09:55:49 +02:00
Abhishek Aggarwal
451ab047ff Merge remote-tracking branch 'upstream/main' into main 2021-05-12 08:53:25 +02:00
Abhishek Aggarwal
d7cb859ab7 Refactoring TranslationModelBindings class
- typedef AlignedMemory for code readability

 - Added documentation for one of the binding functions
2021-05-12 07:32:42 +02:00
Abhishek Aggarwal
5025285e5c Updated wasm test page to pass vocabulary files as bytes 2021-05-12 07:32:42 +02:00
Abhishek Aggarwal
9f78985e45 JS bindings for vocabularies as bytes 2021-05-12 07:32:42 +02:00