* Remove wormhole references
* Remove more references to the WORMHOLE
* Update marian to a version with wormhole removed
* Whoops
---------
Co-authored-by: Jelmer van der Linde <jelmer@ikhoefgeen.nl>
* Replace most of the wasm demo page with code from the firefox extension
This code should be more generic and copy/pastable into other projects. Maybe one day it will be an npm package?
* Fix Ukrainian model support
* Add quality estimation output
Automatically enabled when the model(s) support it
* Little "Translating…" indicator
* Don't make Safari fail on something tiny
* Rewire lots of async state to be able to predictably know when the translator is working or not
Previously so much was lazy loaded that it was not easy to catch lack of SIMD support. Now I can just enable the interface only after it has properly loaded.
* No need for a two-stage setup for the worker. Just promise to call `initialize()`!
* More (correct) types and comments for code
* Keyboard shortcuts for input area for bold, italic and underline.
Enough to demo mark-up translation
* Fix `delete()`
* Move javascript glue code into its own npm package
* Add nodejs support and test to package
* More stand-alone build command
…for now, not really used by anything I think
* Ignore build packages
* Use local filesystem for build so it is automatically cached
* fix overflow on demo page
But this might break the mobile demo? I'll have to check into that
* Bring back integrity check, except for NodeJS for now
* Make `build` part of `prepare` so we always make sure we build a complete package
* Move worker code into its own folder
This way I can mark it as a commonjs module, which will make nodejs treat the files the same way WebWorkers do right now. Firefox doesn't implement `{type: 'module'}` yet for WebWorkers.
* Add README
* Fix paths
* Add npm publish automation
* Make sure webpack ignores node compatibility code
* Add missing webpack:ignore around a worker
* Default to getting models from S3
* Separate "loading" and "translating" indicators
* Bump npm package version
* Add credits
* Don't block on the worker loading
* Not just Mozilla, but Bergamot!
* Make individual translation requests cancelable
* Swap button turns vertically when in skyscraper mode
* Make it easier to debug errors from inside the worker
* Don't bork on deleting a failed worker
* Don't bork on calling translate() with a failed worker
* Handle compilation error with more grace
* `contenteditable=true` seems to work better with some browser extensions
Looking at you, Vimium!
* Clean up abort promise
* Bump npm package version
* Remove `workerUrl` option in favour of better webpack support
With that option it was hard for Webpack to figure out dependencies, and it did not enter my worker script for rewriting. With the hardcoded url it does, and with a bit of `new webpack.DefinePlugin({'typeof self': JSON.stringify('object')}),` we can have webpack remove node-specific code on build!
* Bump version
Minor API change hehe
Co-authored-by: Nikolay Bogoychev <nheart@gmail.com>
* Expand the node-test.js example code with documentation
Is there a better way to document code than by providing an annotated & working example of it? Just listing all the exposed methods feels like giving people a box of bricks and expecting them to build a house with it.
* Use @Jerin's feedback to simplify node-test.js explanations
* Use native `console.assert` instead
See #426 for an explanation
* Fix comment
Co-authored-by: Nikolay Bogoychev <nheart@gmail.com>
* ARM Support using ruy and simd_utils
* Adding ARM build on GitHub CI
* Add workflow and successful build
ssplit-cpp modified to get cross compiled android on GitHub CI working.
* Client side fixes for int8 no shift on ARM [python]
* Revert "Client side fixes for int8 no shift on ARM [python]"
This reverts commit 020af05a8b.
* moving int8shift no-op inside the library
* Bump 3rd-party/marian-dev
* update the marian branch test
* arm backend works
* Latest and greatest clang-format
Co-authored-by: Jerin Philip <jerinphilip@live.in>
* Remove trailing whitespace
* Additional MacOS wheels: Wheels for python 3.6 to 3.10 with a
minimum target of MacOS 10.9
* Install bergamot package from wheel directory
* Remove no-index as we need dependencies
Import
https://gist.github.com/jelmervdl/a4c8b6b92ad88a885e1cbd51c6ad4902 and
attach it to CI. NodeJS-14 fails when trying to use the WebAssembly
binary, so we use an independently set up node-16. This paves the way for
more complicated testing of WebAssembly bindings in the future.
The old GitHub CI jobs that built wheels explicitly on Ubuntu and MacOS have
been removed in favour of the more portable pypa specified builds. These
wheels should work just as well across a wider range of distributions.
pybind11:CMakeLists.txt requires Development.Module instead of
Development.* to avoid Embed from getting in the way of manylinux
builds.
manylinux_x86_64 builds are added for cp3.6 - 3.10. The linux build
uses an old image via docker. Since the docker images are able to use
a shared ccache folder, the build is quite fast on warm starts.
ccache usage in setup.py is now triggered by an environment variable.
This keeps builds from failing when ccache is not present.
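A minimal sketch of that opt-in, not the actual setup.py code: the variable name `BERGAMOT_USE_CCACHE` and the helper `ccache_cmake_args` are hypothetical, and the sketch assumes the launcher is passed to CMake through the standard `CMAKE_<LANG>_COMPILER_LAUNCHER` variables.

```python
import os
import shutil


def ccache_cmake_args(environ=None, which=shutil.which):
    """Return CMake launcher flags when ccache is requested and installed."""
    environ = os.environ if environ is None else environ
    # BERGAMOT_USE_CCACHE is a hypothetical variable name for this sketch.
    requested = environ.get("BERGAMOT_USE_CCACHE", "").lower() in {"1", "on", "true"}
    if not requested or which("ccache") is None:
        # Fall back to a plain build instead of failing.
        return []
    return [
        "-DCMAKE_C_COMPILER_LAUNCHER=ccache",
        "-DCMAKE_CXX_COMPILER_LAUNCHER=ccache",
    ]
```

With the variable unset, or with ccache missing from `PATH`, the function returns an empty list and the build proceeds without a launcher.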
On tag pushes corresponding to versions, CI is configured to deliver
built wheels to PyPI, reading from repository secrets.
Improves setup.py including documentation and some formatting, and
additional links to source.
Fixes: #315
* Rework WASM compilation options
Necessary to work with newer versions of emscripten that are more picky about which option goes to the compiler, and which to the linker. Also took the opportunity to remove the need for the patching of the bergamot-translation-worker.js file, this can now easily be done through supported apis. Furthermore, I tried to downsize the generated javascript and wasm code a bit.
Initial estimates show that bergamot-translator compiled with emscripten 3.0.0 runs at about 3x the speed of 2.0.9 (when using embedded intgemm). Speed-up when using mozIntGemm is less dramatic.
* Updated marian-dev submodule
* Revert changes specific to patching external gemm modules for wasm
* Better Compilation and Link flags
- Added "-O3" optimization flag for linking as well
- "-g2" only for release and debug builds
- "-g1" for release builds
- Replaced deprecated "--bind" flag with "-lembind"
- Removed redundant link flag
* Upgraded emsdk to 3.1.8
* Enclosed EXPORTED_FUNCTIONS values in a list
* Fixed the remaining 2.0.9 reference in circle ci build script
* Updated README
Co-authored-by: Jelmer van der Linde <jelmer@ikhoefgeen.nl>
* Use a more vanilla windows workflow from translateLocally, remove the
complicated lukka/*. Also removes port overrides in the overall upgrade.
* Disable vcpkg binary caching
* Remove PCRE library hacks after upstream ssplit improvements
Merges the former wasm.yml into python.yml; the new file is
renamed to build.yml.
python.yml already had version and pre-release jobs. These jobs
downloaded artefacts from previously run jobs (python wheel builds). The
newly attached emscripten build now uploads artefacts of a WebAssembly
binary and javascript file which are fed into the release and
pre-release jobs in addition to the existing python builds.
Enables ccache for emscripten. The configuration uses pyodide as a
reference (https://github.com/pyodide/pyodide/pull/1805).
The two workflows that ran on macOS and Ubuntu are reduced to one on
Ubuntu. As emscripten and its target are cross-platform, and macOS runners
are limited, it makes sense to remove the macOS one.
Upload artefact enabled in preparation for a release action to be
scheduled which will upload the bergamot*.wasm and bergamot*.js for
consumption.
Imports python bindings and associated sources incubated in
https://github.com/jerinphilip/lemonade to bergamot-translator. Adds
a pybind11 dependency for python bindings.
Following the import, the python build is integrated into the existing
CMake based build system here. There is a command-line application
provided through python which provides the ability to fetch and prepare
models from model-repositories (like browsermt/students or OPUS).
Wheels built for a few common operating systems are provided via GitHub
releases through automated actions configured to run at tagged semantic
versions and pushes to main.
The documentation for python is also integrated into our existing
documentation setup. The previous documentation GitHub action is now
configured to run after the python builds on Ubuntu 18.04 with Python 3.7,
so that it can pick up the bergamot module packaged as a wheel and build
the sphinx documentation that uses the python module.
Formatting checks with black and isort (with profile black), and a pytype
type check, are configured for the python component residing in this repository.
* Convert marian-integration markdown to rst
* Convert native run into a script, include in rst
* Check with CI that the native running example works without fail
* Updated marian-dev submodule
* Import wasm gemm from a separate wasm module
- The fallback implementation of gemm is currently being imported dynamically
for wasm target
* Updated CI scripts and README to import GEMM from a separate wasm module
* Setting model config to int8shiftAlphaAll in wasm test page
Adds a clang-tidy run in addition to the existing clang-format checks.
The clang-tidy checks are not enforced, but are potentially useful to
point to during review.
A cmake change has caused vcpkg to fail without much of an error message,
which is causing windows workflow runs to fail. Details in the following
link:
* https://github.com/microsoft/vcpkg/issues/18718
To fix, we're going with a version bump in vcpkg. Seeing that run-vcpkg
also seems to have gotten an update, updating run-vcpkg from 7.3 to 7.4
Playing with fire: vcpkg master commit
* Change ResponseBuilder to accept callback
Breaks things everywhere, now we follow the compiler to fix and convert
the std::future -> callback.
* More std::future -> callback
* std::future out of service.{h,cpp}
* compile is working, so is callback
* Some reshuffling of args
* Fixing merge error
* Fixing signature conflicts out of merge
* Fixing that test duct-taping future
* Minor adjustment to get that future back
* Add documentation for the new callback function
* Applying clang-format after update
* Using default responseOptions
* Remove future references from documentation
* translateMultiple only for WASM (#177)
* BRT: update to main; fresh-failures hopefully
* Converting test translateFromStdin to use callback
* BRT: Add fresh #native and #wasm tags
* future from promise, fix error
* Adding #native to GitHub CI
Co-authored-by: Nikolay Bogoychev <nheart@gmail.com>
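To illustrate the shape of the std::future -> callback change, here is a small Python analogue rather than the C++ code itself. `translate_async` is a hypothetical stand-in for a callback-style API (it just upper-cases its input), and `translate_future` shows how a caller that still wants a future can get one back by completing it from the callback.

```python
import threading
from concurrent.futures import Future


def translate_async(text, callback):
    """Hypothetical callback-style API: works in a thread, then calls callback(result)."""
    worker = threading.Thread(target=lambda: callback(text.upper()))
    worker.start()
    return worker


def translate_future(text):
    """Wrap the callback API so callers that still want a future can have one."""
    future = Future()
    # The callback fulfils the future, much like a promise would.
    translate_async(text, future.set_result)
    return future
```

Usage: `translate_future("hallo").result(timeout=5)` blocks until the callback has fired, while `translate_async("hallo", print)` returns immediately.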
* Partial test applications
Previously service-cli was used to generate output and accomplish
regression testing for all of: (1) translated-text (2) alignment tokens
+ scores (3) quality scores (4) indirectly annotation and tokenizations.
The --mode native now only outputs the translated text, faithful to the
source, for the input given on stdin.
Test apps are separated into testing only individual functionalities.
This can help in independently testing ssplit-cpp, quality-scores for
the quality estimation implementation etc.
Separating numbers and text has the advantage of being able to compare
each with its own tolerance, using BLEU (text) and some allowed error-rates
(numbers).
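A rough sketch of comparing the two kinds of output separately, under stated assumptions: the function names and tolerance values are hypothetical, and `difflib.SequenceMatcher` is used here as a simple stand-in for BLEU, not as BLEU itself.

```python
import difflib
import math


def numbers_match(expected, actual, rel_tol=1e-3, abs_tol=1e-6):
    """True when both sequences have equal length and agree within tolerance."""
    return len(expected) == len(actual) and all(
        math.isclose(e, a, rel_tol=rel_tol, abs_tol=abs_tol)
        for e, a in zip(expected, actual)
    )


def text_similarity(expected, actual):
    """Token-level similarity in [0, 1]; a stand-in for BLEU, not BLEU itself."""
    return difflib.SequenceMatcher(None, expected.split(), actual.split()).ratio()
```

A regression test could then accept numeric output when `numbers_match` holds and text output when `text_similarity` stays above a chosen threshold.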
* Removing #mac tag
* Moving test apps to src/tests
* Tests are always on for CI
Unit tests are turned off looking for WASM_COMPATIBLE_SOURCES.
* Fixing WASM_COMPATIBLE_SOURCE -> USE_WASM_COMPATIBLE_SOURCE
* Workaround for now; CMakeLists.txt horrors are starting to bite
* BRT: use bergamot-test instead of bergamot now
* This should fix issues: CMakeLists.txt has so many paths
* Casing to camelCase and removing legacyServiceCli
* removing leftover service-cli declaration, some doc updates
* #pragma once is starting to look easier
* All the more reasons to do #pragma once
* Updating marian-dev with intgemm::kCPU print, resolved from INTGEMM_CPUID
* BRT: Use --gemm-highest-arch instead of python script
* Adding intgemm resolve here, where always(?) have intgemm on?
* intgemm-resolve in default binary directory
* BRT: Update to use intgemm-resolve
* marian-dev: Reset to without --gemm-highest-precision
Co-authored-by: Kenneth Heafield <kpu@users.noreply.github.com>
* Matrix is now more organized; Ubuntu 20.04-gcc9.3 and Ubuntu-18.04-gcc7.5 are added.
* ccache is extended to MacOS, and brings down CI run times to <5m when
ccache works.
* The compiler hash scripts are gone, ccache already covers most ground
by default. The shell script is unnecessary. In preprocessor mode, ccache
hashes the output of running the compiler with -E, which includes the
necessary information. See ccache-docs: How the cache works.
* If BRT fails, it prints the final 20 lines of the test*.log to inspect
what's going wrong without having to download artifacts.
* Pull request on any branch triggers workflow.
* Push on main and ci-sandbox triggers workflow.
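The "final 20 lines of the log" step above can be sketched in a few lines. This is an illustration only, not what the CI does; the helper name `tail` is hypothetical.

```python
from collections import deque


def tail(path, n=20):
    """Return the last n lines of a text file without holding the whole file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        # A bounded deque keeps only the most recent n lines while iterating.
        return list(deque(handle, maxlen=n))
```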
* Collapsing executables
* Adding new test executable
* Deleting old executable sources
* Updating brt to operate with modes
* cli-framework -> cli
* Updating workflows to check for bergamot instead of bergamot-translator-app
* Adding documentation
* Making fn pure virtual
* Shuffling apps into app namespace, alongside class documentation
* Include app folder in documentation
* BRT update service-cli -> native
* parser.h: service-cli -> native
* Updates to marian-integration.md
* Cleanup: Remove templates, interface proper
* change 4 to 2 cores for build instructions
* service-cli -> native
* Commenting the string constructor explanation
* Not doing halfway interface / inheritance
* Nick hates state, let's try this one
* Revert "Nick hates state, let's try this one"
This reverts commit e56db9f474.
* class -> struct before trying std::function stuff
* oop -> functional?
* Hints on what is happening
* app::ftable -> app::REGISTRY
* We have if-else and functions now.
And we won't have test apps.
* Doc linking to usage examples in brt
* Remove unordered_map
* Documentation updates
* Fix warning
Adds a GitHub workflow that builds documentation from sources through doxygen through sphinx on push to the main branch or on push of any semantic version tags. The built documentation is deployed at https://github.com/browsermt/docs@gh-pages, which is rendered at https://browser.mt/docs/<suffix>, where <suffix> is 'main' or a tag vM.m.p corresponding to a semantic version.
On pull requests, artifacts are uploaded for reviewers to inspect if need be.
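As a sketch of the `<suffix>` rule described above, assuming the workflow sees Git refs in the `refs/heads/...` and `refs/tags/...` form; the function name `docs_suffix` and the exact regular expression are hypothetical.

```python
import re

# Matches tags of the form vM.m.p, e.g. refs/tags/v0.4.1
SEMVER_TAG = re.compile(r"^refs/tags/(v\d+\.\d+\.\d+)$")


def docs_suffix(git_ref):
    """Map a Git ref to the docs sub-path, or None if the ref is not deployed."""
    if git_ref == "refs/heads/main":
        return "main"
    match = SEMVER_TAG.match(git_ref)
    return match.group(1) if match else None
```

Any other branch or tag maps to `None`, meaning nothing is deployed for it.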
* Adding a first version of clang-format
* Adding run-clang-format.py
* Adding coding styles to workflow
* Fix indentation on coding-styles workflow
* run-clang-format.'py'
* -style -> --style in python
* Updating ColumnLimit: 120
* Format update with clang-format
* Revert "Format update with clang-format"
This reverts commit 5340b19eae.
* Apply update after sync
* Removing a few empty lines
* Removing one more empty line
* Removing empty in workflow file
* Updating README with coding style instructions
* clang-format-* provided in this repository doc update
Co-authored-by: Nikolay Bogoychev <nheart@gmail.com>
* CI Changes to add tiny regression tests
* Adding an inspect cache step
* Removing ccache, pursue in another
* Incorporating Nick's changes through submodule merge
* Submodule now points to master
* Restoring ccache enabled workflow file
* Restoring ccache enabled CMakeLists
* cache -> ccache typo fix
* Moving CCACHE setup to GitHub runner file
* Find also uses CCACHE dir
* Updating CMakeLists not to override env
* Cache compiler binary's contents
* Changing a few names to trigger new build; Testing cache looks fun
* USE_CCACHE=on, -L for inspection
* Adding a ccache_cmd, but will only use in next commit
* Using ccache_cmd
* Removing "
* Adding compiler hash script
* Bunch of absolute paths
* GITHUB_WORKSPACE typo
* Nah, I'll keep -L and trigger another build
* Trying something with compiler hash on cache key backup as well
* builtin, bash it seems
* Empty commit #1
* Move ccache stats to after compile
* Reshuffling ccache vars
* No comments
* Updates to Github output set syntax
* Empty Commit 1
* Empty Commit 2
* Empty commit 3
* /bin/bash -> bash; ccache_cmd for consistency
* Adding ccache -s before and after build
* Adding comments to compiler-hash script
* Let's build cached and non-cached variants together for comparison
* Fixing quotes, /bin/bash -> bash
* Minor var/env adjustment
* Adding ccache -z before the job
* Reverting CMakeLists.txt without CCACHE
* Switching to CMAKE_<LANG>_COMPILER_LAUNCHER instead of CMakeLists.txt rule
* 5G -> 1G cache size
* 1G -> 2G; Hyperparameter tuning
* Adding bytearray option
* collapse intermediate for bytearray apps
* Removing service-cli-bytearray
* Removing the bergamot bytearray app
* Bumping updates to brt collapsing apps
* Reasonable defaults and hard check when cmd enabled
* Update documentation for flags
* Bump brt with MKL check and skip
* Bumping BRT with MKL_FOUND instead of USE_MKL
* Bumping BRT with no mkl enforce
* Bumping BRT with ssse3 output
* Let's try disabling OpenBLAS
* Trying to disable apple accelerate
* Using WASM compatible BLAS can enable intgemm
* Adding a CMake -L to see what exactly is the diff
* Revert "Let's try disabling OpenBLAS"
This reverts commit 9a6b9bc53b.
* Revert "Using WASM compatible BLAS can enable intgemm"
This reverts commit 936a592e18.
* Restricting mac tests through tags and on GitHub CI
* Using only check-bytearray
* Bumping BRT with change of default behaviour
* Update marian-dev to the newest mac version
* Attempt windows workflow
* force workflow rerun
* Separate id
* Attempt 3 at github action
* Marian dev submodule now compiles with apple clang
* Updated ssplit version to something more recent
* Attempt to fix compile on wasm
* Do not compile subproject tests
* Fix emscripten compilation on Mac
* 99% on the way to windows compile
* Try with a different generator
* Build release not debug
* Revert CMakeLists.txt hacks
* Fix sse2 compilation failure
* MSVC settings for WIN32
* Add nodefaultlib LIBCMT
* Do not compile ssplit.cpp as it contains sys/mman.h
* Revert ab56b9aa4f
* Update paths
* Set the build type to release if not set previously
* Attempt to build release with the windows workflow
* Attempt 5 at VS studio release build
* Attempt 6 at getting release build on MSVC generator
* The windows build is debug at the moment...
* fix ssplit for ubuntu 16.04
* Fix compilation with clang
* Compile on ubuntu16.04
* Explain what is going on
* Updated ssplit and workflow
Adds regression-tests to the workflow for native minimal/custom marian and full builds.
Co-authored-by: abhi-agg <66322306+abhi-agg@users.noreply.github.com>
* Updated marian-dev submodule
- cmake changes required after the submodule update
* Added workflows for building custom marian on mac and ubuntu
* Renamed cmake option
- Renamed USE_WASM_COMPATIBLE_SOURCES to USE_WASM_COMPATIBLE_SOURCE
- Use proper compile definitions