Commit Graph

68 Commits

Author SHA1 Message Date
dependabot[bot]
321be8ae04
Bump 3rd_party/marian-dev from 780df27 to 11c6ae7 (#466)
Bumps [3rd_party/marian-dev](https://github.com/browsermt/marian-dev) from `780df27` to `11c6ae7`.
- [Commits](780df2708e...11c6ae7c46)

---
updated-dependencies:
- dependency-name: 3rd_party/marian-dev
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-09-20 08:10:18 +01:00
dependabot[bot]
0b069acce6
Bump 3rd_party/marian-dev from 300a50f to 780df27 (#464)
Bumps [3rd_party/marian-dev](https://github.com/browsermt/marian-dev) from `300a50f` to `780df27`.
- [Commits](300a50f425...780df2708e)

---
updated-dependencies:
- dependency-name: 3rd_party/marian-dev
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-09-11 08:20:47 +01:00
Nikolay Bogoychev
534ed37a3d
Remove wormhole references (#459)
* Remove wormhole references

* Remove more references to the WORMHOLE

* Update marian to wormhole removed marian

* Whoops

---------

Co-authored-by: Jelmer van der Linde <jelmer@ikhoefgeen.nl>
2023-08-14 15:22:54 +01:00
dependabot[bot]
ca954670aa
Bump 3rd_party/marian-dev from aa0221e to 8dbde0f (#458)
Bumps [3rd_party/marian-dev](https://github.com/browsermt/marian-dev) from `aa0221e` to `8dbde0f`.
- [Commits](aa0221e687...8dbde0fd8e)

---
updated-dependencies:
- dependency-name: 3rd_party/marian-dev
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-08-11 15:04:27 +01:00
dependabot[bot]
2bdc493df3
Bump 3rd_party/ssplit-cpp from ad2c5a5 to a311f98 (#456)
Bumps [3rd_party/ssplit-cpp](https://github.com/browsermt/ssplit-cpp) from `ad2c5a5` to `a311f98`.
- [Commits](ad2c5a52a5...a311f9865a)

---
updated-dependencies:
- dependency-name: 3rd_party/ssplit-cpp
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-08-08 10:37:24 +03:00
dependabot[bot]
e333208cb9
Bump 3rd_party/marian-dev from 6a6bbb6 to aa0221e (#452)
Bumps [3rd_party/marian-dev](https://github.com/browsermt/marian-dev) from `6a6bbb6` to `aa0221e`.
- [Commits](6a6bbb6278...aa0221e687)

---
updated-dependencies:
- dependency-name: 3rd_party/marian-dev
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-31 15:26:44 +01:00
XapaJIaMnu
eaa2562fe0 Sentencepiece windows compilation 2023-07-13 00:14:13 +01:00
XapaJIaMnu
ada8c39224 Fix compilation on newer gcc 2023-06-06 17:04:49 +01:00
dependabot[bot]
b3d36bca90
Bump 3rd_party/marian-dev from 8ceb051 to bb65f47 (#447)
Bumps [3rd_party/marian-dev](https://github.com/browsermt/marian-dev) from `8ceb051` to `bb65f47`.
- [Commits](8ceb051b7f...bb65f473d5)

---
updated-dependencies:
- dependency-name: 3rd_party/marian-dev
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-05-10 16:07:24 +01:00
dependabot[bot]
eb0fe1b583
Bump 3rd_party/marian-dev from 69e27d2 to 8ceb051 (#446)
Bumps [3rd_party/marian-dev](https://github.com/browsermt/marian-dev) from `69e27d2` to `8ceb051`.
- [Release notes](https://github.com/browsermt/marian-dev/releases)
- [Commits](69e27d2984...8ceb051b7f)

---
updated-dependencies:
- dependency-name: 3rd_party/marian-dev
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-05-04 10:55:15 +01:00
Nikolay Bogoychev
1ba7461a36 Fix compilation on x86 2023-01-19 10:06:57 +00:00
Nikolay Bogoychev
7d24908959 Apply security update and formatting 2023-01-18 16:46:07 +00:00
Nikolay Bogoychev
6f2659fe59
Arm updated (#443)
* ARM Support using ruy and simd_utils

* Adding ARM build on GitHub CI

* Add workflow and successful build

ssplit-cpp modified to get the cross-compiled Android build working on GitHub CI.

* Client side fixes for int8 no shift on ARM [python]

* Revert "Client side fixes for int8 no shift on ARM [python]"

This reverts commit 020af05a8b.

* moving int8shift no-op inside the library

* Bump 3rd-party/marian-dev

* update the marian branch test

* arm backend works

* Latest and greatest clang-format

Co-authored-by: Jerin Philip <jerinphilip@live.in>
2023-01-18 16:31:36 +00:00
Nikolay Bogoychev
6cefc4302d Latest and greatest clang-format 2023-01-18 12:48:53 +00:00
dependabot[bot]
ad781656fe
Bump 3rd_party/marian-dev from 199201e to e88c1aa (#416)
Bumps [3rd_party/marian-dev](https://github.com/browsermt/marian-dev) from `199201e` to `e88c1aa`.
- [Release notes](https://github.com/browsermt/marian-dev/releases)
- [Commits](199201eb89...e88c1aa5d5)

---
updated-dependencies:
- dependency-name: 3rd_party/marian-dev
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-18 16:17:53 +01:00
Abhishek Aggarwal
e34420647d
Upgrade emsdk to 3.1.8 (#414)
* Rework WASM compilation options

Necessary to work with newer versions of emscripten, which are more picky about which options go to the compiler and which to the linker. Also took the opportunity to remove the need for patching the bergamot-translation-worker.js file; this can now easily be done through supported APIs. Furthermore, I tried to downsize the generated JavaScript and wasm code a bit.

Initial estimates show that bergamot-translator compiled with emscripten 3.0.0 runs at about 3x the speed of 2.0.9 (when using embedded intgemm). Speed-up when using mozIntGemm is less dramatic.

* Updated marian-dev submodule
* Revert changes specific to patching external gemm modules for wasm
* Better Compilation and Link flags

 - Added "-O3" optimization flag for linking as well
 - "-g2" only for release and debug builds
 - "-g1" for release builds
 - Replaced deprecated "--bind" flag with "-lembind"
 - Removed redundant link flag

* Upgraded emsdk to 3.1.8
* Enclosed EXPORTED_FUNCTIONS values in a list
* Fixed the remaining 2.0.9 reference in circle ci build script
* Updated README

Co-authored-by: Jelmer van der Linde <jelmer@ikhoefgeen.nl>
2022-04-20 00:39:32 +01:00
dependabot[bot]
f18a8835fa
Bump 3rd_party/ssplit-cpp from a08d6bc to 49fde6d (#408)
Bumps [3rd_party/ssplit-cpp](https://github.com/browsermt/ssplit-cpp) from `a08d6bc` to `49fde6d`.
- [Release notes](https://github.com/browsermt/ssplit-cpp/releases)
- [Commits](a08d6bce20...49fde6df7e)

---
updated-dependencies:
- dependency-name: 3rd_party/ssplit-cpp
  dependency-type: direct:production
...
2022-04-14 11:25:51 +01:00
dependabot[bot]
409b7d2265
Bump 3rd_party/marian-dev from 7e67124 to 844800e (#382)
Bumps [3rd_party/marian-dev](https://github.com/browsermt/marian-dev) from `7e67124` to `844800e`.
- [Release notes](https://github.com/browsermt/marian-dev/releases)
- [Commits](7e67124ae0...844800efcc)

---
updated-dependencies:
- dependency-name: 3rd_party/marian-dev
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-03-18 11:27:53 +00:00
dependabot[bot]
22d6bc07e7
Bump 3rd_party/marian-dev from 08b1544 to 7e67124 (#372)
Bumps [3rd_party/marian-dev](https://github.com/browsermt/marian-dev) from `08b1544` to `7e67124`.
- [Commits](08b1544636...7e67124ae0)

---
updated-dependencies:
- dependency-name: 3rd_party/marian-dev
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-03-09 08:00:28 +00:00
Jerin Philip
c0f311a8c0
Batteries included python package (#310)
Imports python bindings and associated sources incubated in
https://github.com/jerinphilip/lemonade to bergamot-translator. Adds
a pybind11 dependency for python bindings.

Following the import, the python build is integrated into the existing
CMake based build system here. A command-line application provided
through python can fetch and prepare models from model-repositories
(like browsermt/students or OPUS).

Wheels built for a few common operating systems are provided via GitHub
releases through automated actions configured to run at tagged semantic
versions and pushes to main.

The documentation for python is also integrated into our existing
documentation setup. The previous documentation GitHub action is now
configured to run after the python builds on Ubuntu 18.04 with Python 3.7,
in order to pick up the bergamot module packaged as a wheel and build the
sphinx documentation using the python module.

Formatting checks with black and isort (with profile black) and a pytype type
checker are configured for the python component residing in this repository.
2022-01-26 20:33:43 +00:00
Nikolay Bogoychev
8563f0856f
Proper arch setting on win32 (#275)
* Proper arch detection on win32

* Whoops
2021-12-14 23:53:53 +00:00
Abhishek Aggarwal
e8fd01e9f4 Updated marian-dev submodule 2021-11-30 17:19:42 +01:00
Jerin Philip
5a693b7eec
Fixes windows workflow for PCRE2 (#260) 2021-11-05 20:48:28 +00:00
Jerin Philip
fa4efb483b
Update ssplit cpp, pcre2 source compile to fix broken builds (#258)
* Update ssplit cpp, pcre2 source compile to fix tests

* Syncing with browsermt/ssplit-cpp

* Removing accidental binary inclusion

* Removing brt accidental update by git add -u

* Fix windows workflow, vcpkg is broken use our cmake route

* [ssplit-cpp] Try searching different library names for Windows
2021-11-05 16:46:03 +00:00
Abhishek Aggarwal
7693a1d007
Updated marian submodule (#256) 2021-11-03 13:54:48 +01:00
Jerin Philip
806169c822
Recover logging (#226) 2021-11-01 16:31:01 +00:00
Jerin Philip
9b443997e2
EXCLUDE_FROM_ALL for marian and ssplit-cpp 3rd-party libraries (#243) 2021-10-31 12:33:42 +00:00
Jerin Philip
47e57c95a6
[ssplit-cpp] Enable position independent library when compiled from sources (#240) 2021-10-29 13:40:28 +01:00
Abhishek Aggarwal
c5167b3d8c
Import matrix-multiply from a separate wasm module (#232)
* Updated marian-dev submodule
* Import wasm gemm from a separate wasm module
 - The fallback implementation of gemm is currently being imported dynamically
   for wasm target
* Updated CI scripts and README to import GEMM from a separate wasm module
* Setting model config to int8shiftAlphaAll in wasm test page
2021-10-27 11:54:39 +02:00
Abhishek Aggarwal
ff391c6f00 Updated marian submodule to latest commit of master 2021-08-27 09:07:06 +02:00
Jerin Philip
13a1fe870f
Load sentence-splitter (non-breaking prefixes) from ByteArray
Service now allows loading Sentence-Splitter (non-breaking prefix file) from ByteArray. Behaviour is consistent with the rest of the ByteArray loads (model, shortlist): the ByteArray is used if it is non-empty, and loading otherwise falls back to the file-path.

Adds regression test to check if source-sentences in constructed Response match expected behaviour when the non-breaking-prefixes file is provided. 

Bonus refactoring to remove an extra layer that existed for no reason.
2021-06-21 18:53:30 +01:00
Jerin Philip
e9e5ac6782
Partial test-apps and tolerance in evaluations (#184)
* Partial test applications

Previously service-cli was used to generate output and accomplish
regression testing for all of: (1) translated-text (2) alignment tokens
+ scores (3) quality scores (4) indirectly annotation and tokenizations.

The --mode native now only outputs the translated text, faithful to the
source, for the input read from stdin.

Test apps are separated into testing only individual functionalities.
This can help in independently testing ssplit-cpp, quality-scores for
the quality estimation implementation etc.

Separating numbers and text has the advantage of being able to compare
each with its own tolerance: BLEU for text and some allowed error-rates
for numbers.

* Removing #mac tag

* Moving test apps to src/tests

* Tests are always on for CI

Unit tests are turned off looking for WASM_COMPATIBLE_SOURCES.

* Fixing WASM_COMPATIBLE_SOURCE -> USE_WASM_COMPATIBLE_SOURCE

* Workaround for now; CMakeLists.txt horrors are starting to bite

* BRT: use bergamot-test instead of bergamot now

* This should fix issues: CMakeLists.txt has so many paths

* Casing to camelCase and removing legacyServiceCli

* removing leftover service-cli declaration, some doc updates

* #pragma once is starting to look easier

* All the more reasons to do #pragma once

* Updating marian-dev with intgemm::kCPU print, resolved from INTGEMM_CPUID

* BRT: Use --gemm-highest-arch instead of python script

* Adding intgemm resolve here, where always(?) have intgemm on?

* intgemm-resolve in default binary directory

* BRT: Update to use intgemm-resolve

* marian-dev: Reset to without --gemm-highest-precision

Co-authored-by: Kenneth Heafield <kpu@users.noreply.github.com>
2021-06-14 15:02:42 +01:00
Jerin Philip
dc2fb3d64e
CMake fixes: Generate project.h in binary dir, fix GetVersionFromFile for use as submodule. (#193)
* Use CMAKE_CURRENT_SOURCE_DIR instead of CMAKE_SOURCE_DIR for project bound version string

* marian-dev cmake fix

* Generate project.h in binary dir

* We don't want people asking about extra spaces
2021-06-09 10:12:00 +01:00
Jerin Philip
73228bbb4a
Updating marian-dev: intgemm with env variable matmul switches (#187) 2021-06-03 21:01:26 +01:00
Jerin Philip
eb579ed26f
Updating marian dev RelWithDebInfo -> Release (#178)
* Updating marian dev RelWithDebInfo -> Release

* Updating submodule to point to master
2021-05-27 10:51:53 +01:00
Nikolay Bogoychev
10131c731a
Marian submodule with unified loading (#157) 2021-05-18 12:45:22 +01:00
Kenneth Heafield
ce01de939d
Change USE_WASM_COMPATIBLE_SOURCE=OFF by default on native, force on for WASM (#138)
* Change WASM_COMPATIBLE_SOURCE=OFF by default

The default was WASM_COMPATIBLE_SOURCE=ON COMPILE_WASM=OFF which is a
testing configuration, not a sensible default for native or wasm.

* Always USE_WASM_COMPATIBLE_SOURCE with COMPILE_WASM

* Set CMP0077 to fix variable handling
2021-05-10 12:28:37 +02:00
Nikolay Bogoychev
87adb5d60a Target master of ssplit-cpp 2021-05-07 18:41:08 +01:00
Nikolay Bogoychev
21c1cae472
Update ssplit submodule, removing absl (#132)
* Update ssplit submodule, removing absl

* Fix ssplit variables

* Update ssplit branch

* Fix emscripten compilation

* Update tests
2021-05-07 17:58:58 +01:00
Qianqian Zhu
5b02008a97
Enable vocabs pass as byte arrays (#122)
* first attempt to enable vocabs pass as byte arrays

* pass vocabs bytes as AlignedMemory

* add vocabIndices to avoid double loading

* small fix on parameter names and documentation

* fix windows build plus tiny update on documentation

* update marian-dev submodule

* move validate model bytearray in BatchTranslator

* small refactors on validateBinaryModel()

* switch vocab memories to std::vector<marian::Ptr<AlignedMemory>>

* update marian-dev submodule

* replace marian::Ptr to std::shared_ptr for vocab memories

* add note for vocab memories
2021-05-07 14:54:48 +01:00
Nikolay Bogoychev
d82e01eda4
Full windows support with ssplit from browsermt, not a fork (#109)
* Update marian-dev to the newest mac version

* Attempt windows workflow

* force workflow rerun

* Separate id

* Attempt 3 at github action

* Marian dev submodule now compiles with apple clang

* Updated ssplit version to something more recent

* Attempt to fix compile on wasm

* Do not compile subproject tests

* Fix emscripten compilation on Mac

* 99% on the way to windows compile

* Try with a different generator

* Build release not debug

* Revert CMakeLists.txt hacks

* Fix sse2 compilation failure

* MSVC settings for WIN32

* Add nodefaultlib LIBCMT

* Do not compile ssplit.cpp as it contains sys/mman.h

* Revert ab56b9aa4f

* Update paths

* Set the build type to release if not set previously

* Attempt to build release with the windows workflow

* Attempt 5 at VS studio release build

* Attempt 6 at getting release build on MSVC generator

* The windows build is debug at the moment...

* fix ssplit for ubuntu 16.04

* Fix compilation with clang

* Compile on ubuntu16.04

* Explain what is going on

* Updated ssplit and workflow
2021-05-01 00:29:23 +01:00
Nikolay Bogoychev
e286533164 Update to marian-dev master 2021-04-30 22:34:44 +01:00
Nikolay Bogoychev
fdf9e66cef
Windows workflows and mac framework accelerate (#108)
Windows still failing but getting closer
2021-04-26 18:59:20 +01:00
abhi-agg
2e5daac978
Marian submodule update (#74)
* Updated marian-dev submodule

 - cmake changes required after the submodule update

* Added workflows for building custom marian on mac and ubuntu

* Renamed cmake option

 - Renamed USE_WASM_COMPATIBLE_SOURCES to USE_WASM_COMPATIBLE_SOURCE
 - Use proper compile definitions
2021-04-01 16:29:02 +01:00
Abhishek Aggarwal
fdbce5705b Update marian-dev submodule to master
- Earlier it was using 'wasm' branch
 - CMakefile changes
 - Github workflow change
2021-03-26 10:02:13 +01:00
Nikolay Bogoychev
d75dd85def
Load model as a byte array (#55)
* Switch to wasm branch for this example

* Load marian model from a byte array

* Sanitise executable names

* Change marian branch

* Update marian branch that loads binary models

* Example of loading model as a byte array

* Add the byte array loading files

* Die on misaligned memory

* Remove the unused argument

* Allow loading without a ptr parameter so that we don't break emc workflow
2021-03-22 14:22:56 +00:00
Abhishek Aggarwal
d3ef1a9bc3 Updated marian submodule
- This fixes the binary model loading problem for wasm
2021-03-10 15:50:27 +01:00
Ulrich Germann
f17f02a544
Update submodule ssplit-cpp 2021-03-03 11:48:56 +01:00
Abhishek Aggarwal
b845ed3693 Update marian submodule
- Fixes the compilation while building with full blown marian
2021-02-24 19:54:38 +01:00
Abhishek Aggarwal
5dcbb721fa Update ssplit submodule to master branch
- This submodule brings pcre2 lib compiled from sources
2021-02-22 18:03:53 +01:00