Commit Graph

209 Commits

Author SHA1 Message Date
Abhishek Aggarwal
9747d9ba83 Add cmake option to compile project on WASM
- Set cmake option COMPILE_WASM to ON to compile the project
   on WASM
2021-02-11 15:34:27 +01:00
Abhishek Aggarwal
a2d3269344 Updated ssplit submodule 2021-02-10 11:27:16 +01:00
Abhishek Aggarwal
584700ce91 Changed translate() API from non-blocking to blocking
- Can be changed back to non-blocking once blocking API
   becomes integrable via WASM port in browser
2021-02-10 11:15:16 +01:00
Abhishek Aggarwal
5683168a8d Updated ssplit submodule to a different repository
- Added abhi-agg/ssplit-cpp
 - Added its wasm branch in bergamot-translator
 - Native builds of bergamot-translator are successful
   -- Sentence splitting is NOT WORKING
   -- Only translation is working
2021-02-10 10:33:01 +01:00
Abhishek Aggarwal
47b4bae268 Changed encodePreservingSource -> encodeWithByteRanges
- This change happened because marian submodule changed
   this name

 - Native builds are working fine
   -- bergamot-translator-app output is consistent
2021-02-09 15:37:29 +01:00
Abhishek Aggarwal
9a54d2116c Updated marian-dev submodule
- Switch to "wasm" branch of browsermt/marian-dev
2021-02-08 13:46:59 +01:00
Jerin Philip
2929077324 Reordering git submodule update before includes 2021-02-02 14:41:26 +00:00
Jerin Philip
548c8880ff CMake updates submodules 2021-02-02 14:39:19 +00:00
Jerin Philip
e76a602dc7 Removing config file printing 2021-01-28 21:44:05 +00:00
Jerin Philip
9a17f365c6 Fix for garbled output through cli.
A requirement for string_view is that the original source string be
transferred all the way from input to service and back to
TranslationResult. This constraint was violated in several places by the
existence of a copy-constructor. The issue is fixed by deleting the copy
constructor and copy-assignment operator in
marian::bergamot::TranslationResult and UnifiedAPI::TranslationResult,
which exposed a few occurrences of such copies. These were replaced with
move semantics. In addition, the future is currently set and read using
move semantics. The default move-constructor didn't seem to be working,
so move constructors are made explicit for TranslationResults.

This commit additionally packs a few deletions and improvements made to
improve structure (textops.cpp, batcher.cpp) in the process of
inspecting and fixing the garbled outputs. They are kept here, in the
interest of time, rather than split into tidy atomic commits.

Combination of the following commits in jp/string-view-bug
[acfc92 78a588 12d91b 00a277 919e2f 9d3a46 b7e39b 18f67b bf667c]
2021-01-26 21:18:15 +00:00
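
The lifetime constraint described in the commit above can be illustrated with a minimal sketch. `Result`, `addSentence` and `sentence` are hypothetical names chosen for illustration and are not the project's actual classes or methods. The sketch assumes one plausible reason a defaulted move could misbehave (not stated in the commit): a short `std::string` may keep its characters inside the object itself, so moving it can change the buffer address and leave views pointing at the old location. The explicit move constructor therefore rebuilds each view against the new buffer.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in for a result type that owns the source text and
// exposes sentence views into it. Not the project's TranslationResult.
class Result {
 public:
  explicit Result(std::string source) : source_(std::move(source)) {}

  // A copy would duplicate the string but keep views into the original
  // buffer, so copying is disabled.
  Result(const Result&) = delete;
  Result& operator=(const Result&) = delete;

  // Explicit move: record each view as (offset, length), move the string,
  // then rebuild the views against the buffer now owned by *this.
  Result(Result&& other) {
    const char* oldBase = other.source_.data();
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    for (std::string_view v : other.sentences_) {
      ranges.emplace_back(static_cast<std::size_t>(v.data() - oldBase),
                          v.size());
    }
    source_ = std::move(other.source_);
    other.sentences_.clear();
    for (const auto& range : ranges) {
      sentences_.emplace_back(source_.data() + range.first, range.second);
    }
  }
  Result& operator=(Result&&) = delete;  // omitted to keep the sketch short

  // Registers a sentence as a byte range of the owned source string.
  // The caller is assumed to pass a range that lies inside the source.
  void addSentence(std::size_t begin, std::size_t length) {
    sentences_.emplace_back(source_.data() + begin, length);
  }

  const std::string& source() const { return source_; }
  std::string_view sentence(std::size_t i) const { return sentences_.at(i); }

 private:
  std::string source_;
  std::vector<std::string_view> sentences_;
};
```

With this shape, a `Result` can be handed from one owner to the next via `std::move` and its sentence views remain valid, while an accidental copy fails at compile time instead of producing garbled output at run time. Storing byte ranges instead of views would be an alternative design that avoids the rebuild step.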
Abhishek Aggarwal
0d16b1957f Improved main.cpp file
- Print original and translated text
 - Just add 2 vector entries for texts
2021-01-26 14:49:28 +01:00
Abhishek Aggarwal
b49f2c1af3 Cleanup TranslationModelConfiguration to std::string change in API
- Provide yaml formatted string as model configuration
 - Remove redundant files
2021-01-26 11:13:41 +01:00
Abhishek Aggarwal
026f1af887 Removed redundant lines from CMakeFile 2021-01-26 10:46:35 +01:00
Jerin Philip
08a7358c3d Integrating marian-translator through API
Using std::string for config. Now capable of launching marian translator
through API interface. There's a sketchy workaround to convert a string
config to marian::Options, with an added note.
2021-01-25 22:11:38 +00:00
Jerin Philip
69adc7af77 Changing code-style to clang-format-google 2021-01-24 21:46:47 +00:00
Jerin Philip
cd025e9f65 CI scripts: master -> main 2021-01-23 14:39:08 +00:00
Jerin Philip
7e2eb02e18 CI and Associated Changes
Enables Mac and Ubuntu CPU only builds through GitHub CI. CI scripts are
copied from marian-dev with necessary changes.

3rd-party/marian-dev is modified (changes to half_float) to meet C++17
requirements.
2021-01-23 13:34:04 +00:00
Jerin Philip
988e76baf9 Removing Exception to fix Apple compile 2021-01-22 15:13:30 +00:00
Abhishek Aggarwal
1c3b656852 Removed a redundant directory inclusion in CMakeFile 2021-01-22 15:53:19 +01:00
Abhishek Aggarwal
c8fc004452 Improved 3rd party header inclusion and library linking 2021-01-22 15:47:36 +01:00
Jerin Philip
3b6b9cd2bf Updating README.md with instructions to run service-cli 2021-01-22 11:51:49 +00:00
Jerin Philip
e75bd7eb57 Adding vim temporary files to .gitignore 2021-01-22 11:31:20 +00:00
Jerin Philip
37143933a1 CMakeLists improvements
Only the bergamot-translator library should be linked to the main target.
Any other library (marian ${MARIAN_CUDA_LIB} ${EXT_LIBS} ssplit
pcrecpp.a pcre.a) should be linked to the bergamot-translator target
inside the src/translator folder.
2021-01-22 11:29:32 +00:00
Jerin Philip
80125e2789 Removing unused variable in batch_translator 2021-01-21 14:54:30 +00:00
Jerin Philip
12e7e2c650 Fixing compile error, need tests, CI 2021-01-21 14:54:09 +00:00
Jerin Philip
9b18bd9ffc MTranslationResult, more comments 2021-01-21 02:03:47 +00:00
Jerin Philip
ea1a628cd2 Neaten TextProcessor, add a bit of docs.
- Truncating long sentences into those of a specified length for faster
  processing is now a separate function, for improved readability.
- Changes push_back -> emplace_back in places to avoid copies.
- query_to_segments is renamed as process.
- Comments are added in an attempt to bring some sanity.
2021-01-21 01:31:29 +00:00
Jerin Philip
4640ae4091 Fixes copying around vocabs
Vocabs were earlier loaded in each thread and copied several times.
Modified this so they are loaded only once in Service and a reference is
used consistently later on.

This change makes Tokenizer as a class rather moot, as there's only one
private member and a function. Moved this into TextProcessor.
SentenceSplitter, however, remains a separate class.

utils.{h,cpp} had only a single loadVocabularies function, which
is at the moment required only in Service. Making loadVocabularies a
function inside Service and getting rid of utils.*.
2021-01-21 00:29:53 +00:00
Jerin Philip
d6ec007df9 TranslationResult Docs
Removed Alignments, too many questions and no concrete answers. Better
off removing unused code. History is kept for now, for internal use.
2021-01-20 21:58:13 +00:00
Jerin Philip
caa03e1d9f Removing unused timer.h 2021-01-20 21:21:43 +00:00
Jerin Philip
54a6c6ce80 Moving main (mts) to app/
Commit moves the example test-code main-mts into the app folder,
updating CMakeLists accordingly.
2021-01-20 21:18:20 +00:00
Jerin Philip
d3c707f735 Enhancing service.h further 2021-01-20 21:11:27 +00:00
Jerin Philip
b3f1905a12 Adding documentation and example to service.h 2021-01-20 20:56:50 +00:00
Jerin Philip
b25b2276e3 Undoing LineSplitter, reverting SentenceSplitter.
A faster linesplitter added for benchmarks is removed in favour of @ug's
ssplit-cpp.
NOTE: ssplit-cpp's regex based implementation is slow for one-line
parses, which ideally needs to be improved in upstream ssplit-cpp to
trivially reduce to a faster newline character based split.
2021-01-20 20:11:07 +00:00
Jerin Philip
bde9094728 Updating CMakeLists to build main
CMakeLists have been modified with the necessary includes to add
browsermt/mts@nuke files to the bergamot-translator library. In
addition, adds the ssplit dependency and the corresponding includes.

Intel MKL fails on compilation, unable to find libraries. To solve this,
3rd_party/CMakeLists.txt is modified with @ug's fixes to propagate
variables (EXT_LIBS, etc) at a library level.
2021-01-20 19:52:34 +00:00
Jerin Philip
d786f2554e Bumping marian with sentencepiece capable fork
Modifications to SentencePiece are necessary to provide token level
string_views. This commit changes marian to an alternate branch which
has the feature incorporated.
2021-01-20 19:14:40 +00:00
Jerin Philip
601bd52716 Import sources from mts adaptation
This first commit imports files from mts which was repurposed for bergamot translator
from https://github.com/browsermt/mts/tree/nuke.
2021-01-20 19:08:46 +00:00
abhi-agg
0200843ed7
Merge pull request #7 from browsermt/application
Updated README and Added a simple Application
2020-12-11 14:44:55 +01:00
abhi-agg
fd897dc4ec
Merge pull request #6 from browsermt/api
Use marian::Options class internally for configuration options
2020-11-26 10:11:54 +01:00
Abhishek Aggarwal
f8c9a6b0cc Added an application showing usage of bergamot translator
- 'app' folder contains the application
 - The application uses dummy requests and responses for now
2020-11-16 15:44:02 +01:00
Abhishek Aggarwal
9478a54628 Improved 3rd party header inclusion
- Inclusion now contains explicit names of the 3rd party
   libraries
2020-11-16 15:14:50 +01:00
Abhishek Aggarwal
cd505c9286 Updated README with 'Build' and 'Use' instructions 2020-11-16 13:09:42 +01:00
Abhishek Aggarwal
ce7312cfd4 Added basic skeleton for Adaptor class
- The class adapts the TranslationModelConfiguration to marian::Options
 - Returns a dummy marian::Options for now
2020-11-12 11:17:34 +01:00
Abhishek Aggarwal
59c940090b Use marian::Options class internally for configuration options
- Marian uses Options class everywhere as configuration options

 - Owing to this project's heavy dependency on Marian:
   -- Made the internal implementation files of the project work
      with marian::Options instead of TranslationModelConfiguration
   -- An Adaptor class to adapt TranslationModelConfiguration
      to marian::Options will be added in following commit
2020-11-12 11:04:19 +01:00
abhi-agg
2c1515313e
Merge pull request #5 from browsermt/api
Separated the public includes of the project from implementation
2020-11-11 19:44:50 +01:00
Abhishek Aggarwal
210c5a466a Separated the public includes of the project from implementation
- All interfaces are present in ROOT/src
2020-11-11 17:52:27 +01:00
abhi-agg
77abbfa9c7
Merge pull request #4 from browsermt/api
Compile marian submodule in the project
2020-11-11 17:25:48 +01:00
Abhishek Aggarwal
358d76871f Small change: Added New line endings 2020-11-11 17:18:12 +01:00
Abhishek Aggarwal
36911d39d5 Link marian library in the project 2020-11-11 16:24:50 +01:00
Abhishek Aggarwal
a220f915fc Compile marian submodule in the project
- marian compiles successfully and is ready to be used
   in the project
2020-11-11 16:19:54 +01:00