Commit Graph

20 Commits

Author SHA1 Message Date
Jerin Philip
13443352c0
Docs: Pin Jinja2 to last known working version (#389)
Fixes the docs workflow, which has been failing since pip started picking up Jinja 3.20.
We only need >=2.3; this pins it to 3.0.3, the version with which builds were last successful.
2022-03-24 19:26:20 +00:00
Jerin Philip
c0f311a8c0
Batteries included python package (#310)
Imports python bindings and associated sources incubated in
https://github.com/jerinphilip/lemonade to bergamot-translator. Adds
a pybind11 dependency for python bindings.

Following the import, the python build is integrated into the existing 
CMake based build system here. There is a command-line application 
provided through python which provides the ability to fetch and prepare 
models from model-repositories (like browsermt/students or OPUS).

Wheels built for a few common operating systems are provided via GitHub
releases through automated actions configured to run at tagged semantic
versions and pushes to main.

The documentation for python is also integrated into our existing
documentation setup. The previous documentation GitHub action is now
configured to run after the python builds on Ubuntu 18.04 with Python 3.7,
in order to pick up the bergamot module packaged as a wheel and build the
sphinx documentation using the python module.

Formatting checks with black and isort (profile black), and a pytype type
checker, are configured for the python component residing in this repository.
2022-01-26 20:33:43 +00:00
Jerin Philip
71b84b7c72
CI guaranteed example documentation (#300)
* Convert marian-integration markdown to rst
* Convert native run into a script, include in rst
* Check with CI that the native running example works without fail
2022-01-06 19:10:57 +00:00
Jerin Philip
571d312930
Constrain mistune to fix docs CI (#278) 2021-12-14 16:34:30 +00:00
Andre Barbosa
63120c174e
QualityEstimation: Preliminary Implementation (#197)
Unifies quality estimation behind an interface and refactors the previously
available quality scores to fit it. Adds a new class of model, with Logistic
Regression powering the predictions, as an implementation of said interface.
QE now provides annotations on words, using rule-based subword-to-word
algorithms that work with space characters.

QualityEstimation
-----------------

Implementations of QE are bound together by a `QualityEstimator`
Interface. 

1. The log-probabilities from the machine-translation model re-interpreted
   as quality scores are crafted as an implementation of QualityEstimator.

2. A Logistic-Regression based model is added. This class of models is
   trained supervised with scores labeled by a human annotator.
   Handcrafted features - number of words, log probs from MT model and 
   statistics over the sequence are used to generate the numeric features.
   LogisticRegressor, Matrix (to hold features) are added.
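
The scoring step in (2) can be sketched as below. This is a minimal
illustration, not the LogisticRegressor added in this change: the feature
set, the function names `word_features` and `logistic_score`, and the weights
are assumptions made for the example.

```python
import math
from typing import List, Sequence


def word_features(subword_log_probs: Sequence[float]) -> List[float]:
    """Hypothetical per-word features built from MT log-probabilities."""
    count = len(subword_log_probs)
    mean = sum(subword_log_probs) / count
    # Assumed features: mean log-prob, worst log-prob, number of subwords.
    return [mean, min(subword_log_probs), float(count)]


def logistic_score(
    features: Sequence[float], weights: Sequence[float], bias: float
) -> float:
    """Standard logistic regression: sigmoid of the weighted feature sum."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    # Numerically stable sigmoid, avoids overflow for large negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

In a trained model the weights and bias would come from supervised training
on human-labeled scores, as described above.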

The creation of an instance is switched by the `AlignedMemory` supplied
(be it loaded from the file-system or supplied as a parameter). An empty
AlignedMemory leads to quality scores from NMT while supplying weights
of a trained logistic-regression model in binary format as the contents
lead to an additional pass through the said model to provide more
refined scores.
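
A rough sketch of this switch follows. It uses `bytes` as a stand-in for
`AlignedMemory`, and the binary layout (little-endian float32 weights
followed by a bias) is purely an assumption for illustration; the actual
format is not described here. All class and function names are hypothetical.

```python
import math
import struct
from typing import List, Sequence


class LogProbEstimator:
    """Re-interprets MT log-probabilities as quality scores."""

    def score(self, words: Sequence[Sequence[float]]) -> List[float]:
        return [sum(lp) / len(lp) for lp in words]


class LogisticRegressionEstimator:
    """Extra pass through a logistic-regression model (assumed features)."""

    def __init__(self, weights: Sequence[float], bias: float) -> None:
        self.weights = list(weights)
        self.bias = bias

    def score(self, words: Sequence[Sequence[float]]) -> List[float]:
        scores = []
        for lp in words:
            features = [sum(lp) / len(lp), float(len(lp))]
            z = self.bias + sum(w * x for w, x in zip(self.weights, features))
            # Numerically stable sigmoid.
            s = 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))
            scores.append(s)
        return scores


def create_quality_estimator(memory: bytes):
    """Empty memory selects NMT log-probs; otherwise load model weights."""
    if len(memory) == 0:
        return LogProbEstimator()
    if len(memory) % 4 != 0 or len(memory) < 8:
        raise ValueError("expected float32 weights followed by a bias")
    # Assumed layout: N little-endian float32 weights, then one bias.
    values = struct.unpack(f"<{len(memory) // 4}f", memory)
    *weights, bias = values
    return LogisticRegressionEstimator(weights, bias)
```

The point of the sketch is only that the caller supplies one buffer and gets
back an object with the same `score` interface either way.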

Both the above now transform subwords into "words" using a heuristic
algorithm, scanning for spaces. This allows the client to work with "words"
to denote quality instead of subwords, as the former is more sensible to
the user.
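
One way such a space-scanning heuristic could look is sketched below. It
assumes each subword is available as its surface string and that a leading
space marks the start of a new word; the real rules may differ, and the
names `group_subwords` and `word_scores` are hypothetical.

```python
from typing import List, Sequence


def group_subwords(subwords: Sequence[str]) -> List[List[int]]:
    """Group subword indices into words, splitting where a piece starts with a space."""
    words: List[List[int]] = []
    for index, piece in enumerate(subwords):
        if not words or piece.startswith(" "):
            words.append([index])  # a new word begins here
        else:
            words[-1].append(index)  # continuation of the current word
    return words


def word_scores(groups: Sequence[Sequence[int]], scores: Sequence[float]) -> List[float]:
    """Average the subword scores within each word."""
    return [sum(scores[i] for i in group) / len(group) for group in groups]
```

For example, the pieces `["Hel", "lo", " wor", "ld"]` fall into two groups,
one per word, and each word receives the mean of its subword scores.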

Testing
-------

1. BRT now has two new test apps to check the QE outputs in text
  (covers subword to words) and numbers domain (covers quality scores).
  These are tested with en-et models for which QualityEstimation is
  available now, on a new input to avoid architecture/compiler issues.
2. Unit test for LogisticRegression model is added.


Docs
----

Doxygen now supports MathJax properly, so that the explanations of the
reductions applied to Logistic Regression to make computation more
efficient render correctly.

Co-authored-by: Felipe C. Dos Santos <felipe.santos.k@gmail.com>
Co-authored-by: Jerin Philip <jerinphilip@live.in>
2021-09-16 16:28:40 +01:00
Jerin Philip
330840338c
Including WASM documentation in sphinx build toc (#176) 2021-06-01 12:39:28 +01:00
Jerin Philip
5d3ec9c0a9
Single executable (#175)
* Collapsing executables

* Adding new test executable

* Deleting old executable sources

* Updating brt to operate with modes

* cli-framework -> cli

* Updating workflows to check for bergamot instead of bergamot-translator-app

* Adding documentation

* Making fn pure virtual

* Shuffling apps into app namespace, alongside class documentation

* Include app folder in documentation

* BRT update service-cli -> native

* parser.h: service-cli -> native

* Updates to marian-integration.md

* Cleanup: Remove templates, interface proper

* change 4 to 2 cores for build instructions

* service-cli -> native

* Commenting the string constructor explanation

* Not doing halfway interface / inheritance

* Nick hates state, let's try this one

* Revert "Nick hates state, let's try this one"

This reverts commit e56db9f474.

* class -> struct before trying std::function stuff

* oop -> functional?

* Hints on what is happening

* app::ftable -> app::REGISTRY

* We have if-else and functions now.

And we won't have test apps.

* Doc linking to usage examples in brt

* Remove unordered_map

* Documentation updates

* Fix warning
2021-05-31 14:44:59 +01:00
Jerin Philip
9dcf6ab665
Adding clang-format and updating existing sources to adhere (#151)
* Adding a first version of clang-format

* Adding run-clang-format.py

* Adding coding styles to workflow

* Fix indentation on coding-styles workflow

* run-clang-format.'py'

* -style -> --style in python

* Updating ColumnLimit: 120

* Format update with clang-format

* Revert "Format update with clang-format"

This reverts commit 5340b19eae.

* Apply update after sync

* Removing a few empty lines

* Removing one more empty line

* Removing empty in workflow file

* Updating README with coding style instructions

* clang-format-* provided in this repository doc update

Co-authored-by: Nikolay Bogoychev <nheart@gmail.com>
2021-05-19 21:50:21 +01:00
Kenneth Heafield
89bd47342b
Use binary lexical shortlist in documentation (#152)
* Use binary lexical shortlist in documentation

* MKL/AppleAccelerate note

Co-authored-by: Nikolay Bogoychev <nheart@gmail.com>
Co-authored-by: Jerin Philip <jphilip@ed.ac.uk>
2021-05-19 10:44:32 +01:00
Abhishek Aggarwal
1574a4586c Merge remote-tracking branch 'upstream/main' into upstream-sync 2021-04-14 14:36:17 +02:00
abhi-agg
2e5daac978
Marian submodule update (#74)
* Updated marian-dev submodule

 - cmake changes required after the submodule update

* Added workflows for building custom marian on mac and ubuntu

* Renamed cmake option

 - Renamed USE_WASM_COMPATIBLE_SOURCES to USE_WASM_COMPATIBLE_SOURCE
 - Use proper compile definitions
2021-04-01 16:29:02 +01:00
Jerin Philip
a3250b401f
Marian compatible documentation tooling (#67)
Adds doxygen configurations and an additional sphinx setup which consumes the doxygen files to generate developer API documentation, compatible with marian-nmt/marian-dev.
2021-03-24 17:00:53 +00:00
abhi-agg
c64deb50a8
Imported CI scripts from mozilla/bergamot-translator-old (#1)
* CircleCI config, docs and badge

* Increase CircleCI RAM from 4gb to 16gb

Co-authored-by: Motin <motin@motin.eu>
2021-03-10 09:30:39 -08:00
Abhishek Aggarwal
4d4acf6b8b Cleanup CMakeFiles.txt
- Renamed USE_WASM_COMPATIBLE_MARIAN to USE_WASM_COMPATIBLE_SOURCES
 - Removed COMPILE_THREAD_VARIANT cmake option and removed
   corresponding compile definition
 - Updated workflows and READMEs accordingly
2021-02-26 14:17:48 +01:00
Abhishek Aggarwal
415d16bd1d Single cmake option to enable/disable wasm compatible marian compilation
- USE_WASM_COMPATIBLE_MARIAN=off will start using vanilla Marian
   i.e. with full threading support, with exceptions, with MKL

 - Changed the relevant documentation
2021-02-23 16:15:05 +01:00
Jerin Philip
d249dcbfaa Build doc updated with wasm-branch compatible command 2021-02-17 21:15:35 +00:00
Jerin Philip
47b9db0c45 Documentation formatting/syntax fix 2021-02-17 13:35:10 +00:00
Jerin Philip
72848ba0f6 Fixes UEdin builds after wasm-integration merge
A bug which crept in during manual merge is now fixed. PCItem -> Batch
on a PCQueue.

docs/marian-integration.md provides instructions to compile successfully
for multithread.
2021-02-17 13:28:58 +00:00
Jerin Philip
10dcb8f548 Merge remote-tracking branch 'origin/wasm-integration' into jp/absorb-batch-translator
Merging wasm-integration. Single thread codepath seems functional.
Multithreading is broken.
2021-02-17 13:08:58 +00:00
abhi-agg
ef2323c952
Unified api draft (#1)
* Changed README file

 - Added a short introduction of this repository
 - More updates to come later

* First draft of the unified API
2020-10-29 09:17:32 +01:00