Commit Graph

29 Commits

Author SHA1 Message Date
Nikolay Bogoychev
5e15d73b7e
Consistent api usage (#91)
* Consistent api between the two versions of the executables in app folder

* Remove shared ptrs
2021-04-09 10:34:24 +02:00
Jerin Philip
b71b3a18d8
Removes vocabs and propagates fixes for breaks (#79)
* Removes vocabs and propagates fixes for breaks

* Prettify diff: Undoing comment shuffles due to merge conflict edits

* 20% of time actual work, 80% prettifying diff

* Histories members -> poof!

We however have Histories in constructor, which we will remove out of the way
soon.

Co-authored-by: Kenneth Heafield <kpu@users.noreply.github.com>
2021-04-07 12:15:46 +01:00
Kenneth Heafield
27a3a3253f
Make AlignedMemory the means of passing in memory (#86) 2021-04-06 13:23:55 +01:00
Qianqian Zhu
f654ab0f71
Enable binary shortlist loading from bytebuffer (#69)
Contains a "hack" that must be removed promptly by editing TranslationModel; that change comes in a following commit.

* add shortlist_memory and update service-cli-bytearray test

* update marian-dev

* address review comments

* fix compilation and test failures and further address review comments

* small update on marian-dev (based on browsermt/marian-dev PR#28)

* update marian-dev with upstream

* code refactoring according to review

* fix marian-dev submodule conflicts

* switch MemoryGift to AlignedVector

* copy aligned.h from kpu/intgemm for AlignedVector

* changes based on memory ownership and AlignedVector

* fix BatchTranslator inits

* small fixes according to review comments

* update submodule marian-dev to master

* update submodule marian-dev with upstream

Co-authored-by: Kenneth Heafield <kpu@users.noreply.github.com>
2021-04-01 19:36:07 +01:00
abhi-agg
2e5daac978
Marian submodule update (#74)
* Updated marian-dev submodule

 - cmake changes required after the submodule update

* Added workflows for building custom marian on mac and ubuntu

* Renamed cmake option

 - Renamed USE_WASM_COMPATIBLE_SOURCES to USE_WASM_COMPATIBLE_SOURCE
 - Use proper compile definitions
2021-04-01 16:29:02 +01:00
evgeny pavlov
47db7e2b3e Change bergamot app to process stdin texts 2021-04-01 13:22:19 +02:00
Jerin Philip
bfb5e78602
Alignments + weak quality scores capability in Service (#46)
* Draft adjustments to API

* Adjustments to docs

* Let's call the word + sentence ranges annotations

* Editing confusing comment on size()

* Fixing compilation for template adjustments for SentenceRanges

* string_view template hacks

This commit shifts AnnotatedBlob into a templated type and gets the
troubled part to compile. All to manage absl::string_view and
std::string_view.

Objective: marian::bergamot stays C++ 11 to pluck and put in marian
code, bergamot-translator somehow flexes C++17. Simplify development in
one place.

* Fixing the wiring: Gets source to build

Runtime errors exist, but AnnotatedBlobs are consistent.

* Bugfix: Matching old-state after factoring AnnotatedBlob in

* Removing vocabs_ from Response.

(For the umpteenth time).

* Alignment API ready in marian::bergamot::Response

* Wiring alignments up to TranslationResult

* Adjustment to get alignments; bergamot-translator-app has alignments available

* Accessing words instead of Ids

This code sets up access of word string_views from annotations instead
of printing Ids. However, we have a segfault. This is likely due to
targetRanges not being set, pending from
https://github.com/browsermt/bergamot-translator/issues/25.

Could also be a rogue EOS token which we're filtering for in string_view
annotations, but not so in alignments.

* Switching to browsermt/marian-dev@jp/decode-string-view for targetTokenRanges

* Target word byte range annotations available

Issues corresponding to #25 should be resolved. There is still a
segfault. Could be due to EOS. Pending investigation.

* Bugfix: Tokens for alignments are now through.

Was not EOS.

* browsermt/marian-dev@master

ByteRange changes work downstream and have been merged to master.
Updating submodule to point to master.

* Style and documentation enhancements: response.cpp

* Style and documentation enhancements: TranslationResult.h

* Descriptions for SentenceRanges templating

* Switching to marian-dev@wasm-sync

* AnnotatedBlob can be copy-ctord/copy-assigned

* TranslationResult: Empty ctor + WASM Bindings

Allows empty construction of TranslationResult. Using this empty
constructor, WASM bindings are adjusted. Unsure of the results, maybe
@abhi-agg can test.

* Cosmetic: SentenceRangesT -> Annotation

- SentenceRangesT is renamed to AnnotationT;
- Further comments to explain heavily templated files.

* Response: Cleaning up unused members and adding docs

* Adding quality scores - attempt

* Stub QualityScores

This adjustment adds the capability to get "scores", which should
potentially indicate how confident the translation is (at least
relatively, within a target sentence). This enables writing the code
forward for TranslationResult, and provides an example quality score
people can be pointed at.

- These are not between [0,1] yet.
- In addition, guards to check out-of-bounds access have been placed so
  illegal accesses are caught early on during development.

* Removing token debug statements

* Reworking Annotation without templates

https://github.com/mozilla/bergamot-translator/issues/8 provides
ByteRanges.

- This ByteRange data-type is used in Annotation and converted
  to marian::string_view(=absl::string_view) on demand.
- Since Annotation[using ByteRange] is not bound to anything else, it
  can be unit tested. A unit test is added (originally to test
  independently for integration after).
- Annotation with ByteRange is now propagated across marian::bergamot
  and functionality matched to how it was previously working.

This eliminates the string-view conversion and template code.

* Nit: Removing std::endl flushes

* Bring TranslationResult and Response closer

Helps https://github.com/browsermt/bergamot-translator/issues/53.

In preparation, the data-export types for Quality and Alignment are
pushed down to Response from TranslationResult and computed during
construction. This brings TranslationResult closer to Response, paving
way to avoid having two TranslationResults.

histories_ only remain for marian-decoder replacement usage, which can
be removed in a separate PR.

* Clean up hacks originally added for a unit-test to compile

* Moving Annotation functions to cpp and documenting header file

* Shifting alignments, qualityScore testing capability into main-mts

* Restore Unified API files to previous state

* Adaptations to fix Response with Quality, Alignments to connect to old Unified API

* Missing reset on TranslationResultBindings

* Cleaning up Response documentation to reflect newer code

* Minor adjustments to get build back after main sync

* Marian seems to make available Catch somehow

* Disable COMPILE_BERGAMOT_TESTS for WASM

* Add COMPILE_BERGAMOT_TESTS as a CMakeDependent option

* Use the COMPILE_TESTS flag instead to skip macos.yml

* Trigger unit-tests on GitHub runners for Annotation

* Reordering enable_testing() to before inclusion of test directory

* doc constructs required to operate with alignments

Documents with doxygen compatible documentation for Response,
AnnotatedBlob, Annotation, ByteRange.

Incorporates doxygen compatible documentation for

* Updates ByteRange consistent with general C++

Also little documentation enhancements in the process.

* Updating marian-dev@9337105

* Copy-paste documentation because lazy

* Turn off autoformat and manually edit to fix style changes

* AnnotatedBlob -> AnnotatedText; blob -> text

* text.text in test app renamed

* text of text -> blob of text in places of documentation
2021-03-31 17:41:36 +01:00
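
As an illustration of the "Reworking Annotation without templates" step in the commit above (#46), here is a minimal C++17 sketch of storing byte offsets and converting them to a string view on demand. It is not the project's code. The member names (addSentence, word), the free function asView, and the use of std::string_view in place of marian::string_view are assumptions made for illustration. Only the names ByteRange and Annotation come from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Half-open byte interval [begin, end) into some source text.
struct ByteRange {
  std::size_t begin;
  std::size_t end;
  std::size_t size() const { return end - begin; }
};

// Hypothetical, simplified stand-in for the Annotation in the commit message.
// It stores only offsets, so it is not tied to where the text lives in memory.
class Annotation {
 public:
  // Record the word ranges of one sentence.
  void addSentence(const std::vector<ByteRange> &words) {
    sentences_.push_back(words);
  }

  std::size_t numSentences() const { return sentences_.size(); }

  std::size_t numWords(std::size_t sentenceIdx) const {
    assert(sentenceIdx < sentences_.size());  // catch illegal access early
    return sentences_[sentenceIdx].size();
  }

  ByteRange word(std::size_t sentenceIdx, std::size_t wordIdx) const {
    assert(sentenceIdx < sentences_.size());
    assert(wordIdx < sentences_[sentenceIdx].size());
    return sentences_[sentenceIdx][wordIdx];
  }

 private:
  std::vector<std::vector<ByteRange>> sentences_;
};

// Convert a range to a view on demand, against whichever string currently
// owns the bytes. The view is valid only while `text` is alive and unmodified.
inline std::string_view asView(const std::string &text, ByteRange range) {
  assert(range.begin <= range.end && range.end <= text.size());
  return std::string_view(text.data() + range.begin, range.size());
}
```

Because only offsets are stored, the same Annotation can be applied to a copy of the text, which is one reason a type like this can be unit tested on its own.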
Abhishek Aggarwal
f38a0bfbcc Remove AbstractTranslationModel class and its references 2021-03-25 10:04:47 +01:00
Jerin Philip
34228d37bf
Collapse Service into one class instead of three (#62)
* Merging two Services

* Moving stop() logic to destructor

* We have WITH_PTHREADS back

* string based constructor on Service

* Removing now empty service_base.* files

* Hiding away pcqueue_ construction

Ugliest ifdefs I have done in my life.

* Another ifdef to hide pcqueue header file

* Missing semicolons in WITH_PTHREADS path

* Fixing async_translate residue argument from copy

* Adding comments

* Initialize batchtranslator only at one place

To reduce tax for bytebuffer loads, initialize batchtranslator only at
one place.

* \#ifdef WITH_PTHREADS -> #ifndef WASM_HIDE_THREADS

The sane platform (non-WASM) is the default. This only hides threads from
the compilation path; it does not switch pthreads (-lpthread) on or off.

* Review comments: Rearranging destructor, fix wrong comment

* Move loadVocabularies to service.cpp and put in anonymous namespace

* Prettifying diff: Removing unwanted empty lines

* Indicate in comments multithreaded has numWorkers translators

* Typo fix: bergamot_translator -> bergamot-translator

* Safety guards to avoid pcqueue illegal init

* Add WASM_HIDE_THREADS as a global WASM_COMPILE_FLAG

* Compile Defs: WASM_HIDE_THREADS -> __EMSCRIPTEN__

* Removing dead CMakeLists.txt code following __EMSCRIPTEN__

* Compile defs: __EMSCRIPTEN__ -> WASM
2021-03-23 16:36:13 +00:00
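
The preprocessor-guard steps in the commit above (#62) hide the threaded code path from WASM builds. The following is a minimal sketch of that idea, assuming a compile definition named WASM as in the last bullet of the commit message. The class layout, the workerCount() accessor, and the empty worker body are hypothetical and do not reproduce the project's Service class.

```cpp
#include <cstddef>
#include <vector>
#ifndef WASM
#include <thread>  // only pulled in when threads are part of the build
#endif

// Hypothetical single class that serves both builds. When WASM is defined,
// no thread-related code is compiled at all.
class Service {
 public:
  explicit Service(std::size_t numWorkers) {
#ifndef WASM
    for (std::size_t i = 0; i < numWorkers; ++i) {
      // A real worker would pull batches from a queue; this one exits at once.
      workers_.emplace_back([] {});
    }
#else
    (void)numWorkers;  // single-threaded build: nothing to spawn
#endif
  }

  // Stop logic lives in the destructor, as in the commit message.
  ~Service() {
#ifndef WASM
    for (auto &worker : workers_) worker.join();
#endif
  }

  Service(const Service &) = delete;
  Service &operator=(const Service &) = delete;

  std::size_t workerCount() const {
#ifndef WASM
    return workers_.size();
#else
    return 0;
#endif
  }

 private:
#ifndef WASM
  std::vector<std::thread> workers_;
#endif
};
```

Building with -DWASM removes every reference to std::thread, so the same source compiles for a target without thread support.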
Nikolay Bogoychev
d75dd85def
Load model as a byte array (#55)
* Switch to wasm branch for this example

* Load marian model from a byte array

* Sanitise executable names

* Change marian branch

* Update marian branch that loads binary models

* Example of loading model as a byte array

* Add the byte array loading files

* Die on misaligned memory

* Remove the unused argument

* Allow loading without a ptr parameter so that we don't break emc workflow
2021-03-22 14:22:56 +00:00
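
"Die on misaligned memory" in the commit above (#55) suggests a pointer-alignment check before a byte array is used as a model. A minimal sketch of such a check follows. The function names, the choice to throw an exception rather than abort, and the 64-byte alignment mentioned in the note below are assumptions. The commit message does not state them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// True when `ptr` is a multiple of `alignment`.
// `alignment` must be a non-zero power of two.
inline bool isAligned(const void *ptr, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (reinterpret_cast<std::uintptr_t>(ptr) & (alignment - 1)) == 0;
}

// Hypothetical guard: refuse to continue with a misaligned buffer.
// Throwing is used here so the failure can be observed in a test;
// a real implementation might abort instead.
inline const void *requireAligned(const void *ptr, std::size_t alignment) {
  if (!isAligned(ptr, alignment)) {
    throw std::runtime_error("model memory is not aligned as required");
  }
  return ptr;
}
```

A caller would pass the start of the model buffer together with the alignment the backend expects, for example requireAligned(buffer, 64) if 64-byte alignment were required.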
Abhishek Aggarwal
0be73705d9 Fixed native builds while using wasm compatible sources
- main-mts and marian-decoder-new can't be used because
   they use the multi-threaded variant of the Service class
2021-02-26 14:55:30 +01:00
Jerin Philip
e1b74bccab Reverting moot COMPILE_WASM guards in app folder 2021-02-26 11:42:23 +00:00
Jerin Philip
cd01d7552a ServiceBase -> [NonThreadedService, Service]
Through inheritance, a non-threaded and a multithreaded Service are
created, both derived from the same ServiceBase class, which holds the
common elements.

In preparation to solve SIGSEGV in #41. First inspections gave aborts in
the thread part, and repeated SIGSEGVs in the lock policies of shared
pointers even in non-threaded paths.

Solving this first, to avoid ifdef or tricky paths. The non-threaded
implementation is not included in WASM builds at all, by separating out
the single-threaded logic. DRY is achieved through inheritance and
operator overloading.
2021-02-25 23:11:09 +00:00
Jerin Philip
10dcb8f548 Merge remote-tracking branch 'origin/wasm-integration' into jp/absorb-batch-translator
Merging wasm-integration. Single thread codepath seems functional.
Multithreading is broken.
2021-02-17 13:08:58 +00:00
Jerin Philip
65e7406970 Comments and lazy stuff to response 2021-02-16 17:00:53 +00:00
Jerin Philip
370e9e2fb6 {translation_result -> response}.h; propagates; 2021-02-14 20:37:46 +00:00
Jerin Philip
0fc6105df4 No more two TranslationResults (sort-of)
To avoid confusion, this commit renames
marian::bergamot::TranslationResult -> marian::bergamot::Response.
Usages of marian::bergamot::TranslationResults are updated across the
source to be consistent with the change and get source back working.
2021-02-14 20:27:53 +00:00
Jerin Philip
5bd4a1a3c0 Refactor: marian-TranslationResult and associated
marian-TranslationResult has more guards in place. Switching to a
construction on demand model for sentenceMappings. These changes
propagate to bergamot translation results.

Integration broke with the change in marian's internals, which are
updated accordingly to get back functionality.

Changes revealed a few bugs, which are fixed:

- ConfigParser already discovered in wasm-integration
  (a06530e92b).
- Lambda captures and undefined values in DeviceId
2021-02-14 20:05:02 +00:00
Jerin Philip
38e8b3cd6d Updates: marian-dev, ssplit for marian-decoder-new
Updates marian-dev and ssplit submodules to point to the upstream
commits which implement the following:

 - marian-dev: encodeWithByteRanges(...) to get source token byte-ranges
 - ssplit: Has a trivial sentencesplitter functionality implemented, and
   now is faster to benchmark with marian-decoder.

This enables a marian-decoder replacement written through ssplit in this
source to be benchmarked constantly with existing marian-decoder.

Nits: Removes logging introduced for multiple workers, and respective
log statements.
2021-02-12 14:23:24 +00:00
Abhishek Aggarwal
584700ce91 Changed translate() API from non-blocking to blocking
- Can be changed back to non-blocking once blocking API
   becomes integrable via WASM port in browser
2021-02-10 11:15:16 +01:00
Jerin Philip
e76a602dc7 Removing config file printing 2021-01-28 21:44:05 +00:00
Jerin Philip
9a17f365c6 Fix for garbled output through cli.
string_view requires that the original source string be transferred
all the way from input to the service and back to TranslationResult. This
constraint was violated in several places by the existence of a
copy constructor. The issue is fixed by deleting the copy constructors
and copy-assignment operators in marian::bergamot::TranslationResult and
UnifiedAPI::TranslationResult, which exposed a few occurrences of
such copies. These are replaced with move semantics. In addition, the
future is set and read using move semantics at the moment. The default
move constructors didn't seem to work, so they're made explicit for
TranslationResults.

This commit additionally packs a few deletions and improvements made to
improve structure (textops.cpp, batcher.cpp) along the process of
inspecting and fixing the garbled outputs. They are kept, in the
interest of time, rather than engineering prettified atomic commits.

Combinations of the following commits in jp/string-view-bug
[acfc92 78a588 12d91b 00a277 919e2f 9d3a46 b7e39b 18f67b bf667c]
2021-01-26 21:18:15 +00:00
Abhishek Aggarwal
0d16b1957f Improved main.cpp file
- Print original and translated text
 - Just add 2 vector entries for texts
2021-01-26 14:49:28 +01:00
Abhishek Aggarwal
b49f2c1af3 Cleanup TranslationModelConfiguration to std::string change in API
- Provide yaml formatted string as model configuration
 - Remove redundant files
2021-01-26 11:13:41 +01:00
Jerin Philip
08a7358c3d Integrating marian-translator through API
Using std::string for config. Now capable of launching marian translator
through API interface. There's a sketchy workaround to convert a string
config to marian::Options, with an added note.
2021-01-25 22:11:38 +00:00
Jerin Philip
69adc7af77 Changing code-style to clang-format-google 2021-01-24 21:46:47 +00:00
Jerin Philip
37143933a1 CMakeLists improvements
Only the bergamot-translator library should be linked to main target
Any other library (marian ${MARIAN_CUDA_LIB} ${EXT_LIBS} ssplit
pcrecpp.a pcre.a) should be linked to bergamot-translator target inside
src/translator folder.
2021-01-22 11:29:32 +00:00
Jerin Philip
54a6c6ce80 Moving main (mts) to app/
This commit moves the example test code main-mts into the app folder,
updating CMakeLists accordingly.
2021-01-20 21:18:20 +00:00
Abhishek Aggarwal
f8c9a6b0cc Added an application showing usage of bergamot translator
- 'app' folder contains the application
 - The application uses dummy requests and responses for now
2020-11-16 15:44:02 +01:00