BatchTranslators are now held in Service. Worker threads are kept
separate and are constructed via lambdas. The BatchTranslator class and
its member functions are retained (probably a matter of taste).
This should hopefully resolve the complaints in #10.
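
A minimal sketch of the arrangement described above, assuming a
simplified BatchTranslator: every name and member below is illustrative
and not the actual bergamot-translator API. Service owns the translators
and starts one thread per translator from a lambda. The loop index is
captured by value, since capturing it by reference would leave the
worker reading an undefined index.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a translator bound to one device.
class BatchTranslator {
public:
  explicit BatchTranslator(std::size_t deviceId) : deviceId_(deviceId) {}
  void run() { ran_ = true; }  // placeholder for the translation loop
  std::size_t deviceId() const { return deviceId_; }
  bool ran() const { return ran_; }

private:
  std::size_t deviceId_;
  bool ran_ = false;
};

// Hypothetical Service that holds the translators and their threads.
class Service {
public:
  explicit Service(std::size_t numWorkers) {
    // Build every translator first so the vector never reallocates
    // while worker threads are running.
    translators_.reserve(numWorkers);
    for (std::size_t i = 0; i < numWorkers; ++i) translators_.emplace_back(i);

    workers_.reserve(numWorkers);
    for (std::size_t i = 0; i < numWorkers; ++i) {
      // `i` is captured by value: each worker keeps its own index.
      workers_.emplace_back([this, i] { translators_[i].run(); });
    }
  }

  ~Service() { join(); }

  // Waits for all workers; safe to call more than once.
  void join() {
    for (auto& worker : workers_) {
      if (worker.joinable()) worker.join();
    }
  }

  std::size_t numWorkers() const { return translators_.size(); }

  const BatchTranslator& translator(std::size_t i) const {
    assert(i < translators_.size());
    return translators_[i];
  }

private:
  std::vector<BatchTranslator> translators_;
  std::vector<std::thread> workers_;
};
```

Constructing `Service service(3);` and calling `service.join()` leaves
each of the three translators marked as having run on its own index.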
To avoid confusion, this commit renames
marian::bergamot::TranslationResult -> marian::bergamot::Response.
Usages of marian::bergamot::TranslationResult are updated across the
source to be consistent with the change and to get the source working
again.
marian-TranslationResult has more guards in place. Switching to a
construction-on-demand model for sentenceMappings. These changes
propagate to bergamot translation results.
Integration broke with the change in marian's internals and is
updated accordingly to restore functionality.
The changes revealed a few bugs, which are fixed:
- ConfigParser bug, already discovered in wasm-integration
(a06530e92b).
- Lambda captures and undefined values in DeviceId.
Batch Ids cannot be set by outside classes to values < 0.
Batch.Id_ =
  -1 : Poison, for use in PCQueue.
   0 : Default constructed, invalid batch.
  >0 : Legit batch.
Book-keeping for batch metrics (maxLength, numTokens, etc.) and logging
are now moved to Batch. Batch is now a class instead of a struct, with
accessors controlling how members can be modified to suit the above.
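
The id convention and the book-keeping above could look roughly like
the following sketch. Apart from Id_, maxLength and numTokens, the
member and function names here (setId, addSentence, poison, ...) are
assumptions made for illustration and may not match the actual class.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical sketch of Batch as a class with guarded members.
class Batch {
public:
  Batch() = default;  // Id_ == 0: default constructed, invalid batch.

  // Only this factory can produce the poison batch (Id_ == -1).
  static Batch poison() {
    Batch batch;
    batch.Id_ = -1;
    return batch;
  }

  // Outside callers cannot set a negative id; returns false if they try.
  bool setId(int id) {
    if (id < 0) return false;
    Id_ = id;
    return true;
  }

  // Adds one sentence by token count and updates the batch metrics.
  void addSentence(std::size_t tokens) {
    assert(tokens > 0);
    ++numSentences_;
    numTokens_ += tokens;
    maxLength_ = std::max(maxLength_, tokens);
  }

  int id() const { return Id_; }
  bool isPoison() const { return Id_ == -1; }
  bool isValid() const { return Id_ > 0; }
  std::size_t size() const { return numSentences_; }
  std::size_t numTokens() const { return numTokens_; }
  std::size_t maxLength() const { return maxLength_; }

private:
  int Id_ = 0;
  std::size_t numSentences_ = 0;
  std::size_t numTokens_ = 0;
  std::size_t maxLength_ = 0;
};
```

Keeping the members private means a consumer can rely on isPoison() and
isValid() without worrying that another class overwrote Id_ directly.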
The objective is to move the batching mechanism, and the queueing of
requests to generate batches, into a different thread. This commit
prepares for that functionality.
First, PCItem is, from the looks of it, a *Batch*, and is renamed to
reflect that. Hopefully there are no naming conflicts with marian.
BatchTranslator now translates a "Batch" instead of a
vector<RequestSentence>. Additional data members are set up in Batch to
enable development.
Workflows previously in Service that fit better in Batcher are moved
there, in preparation for moving Batcher and the enqueuing of a request
into a new thread, making it non-blocking. This will allow Service to
queue requests into the batcher thread and return without waiting until
the full request is queued.
Batcher now has a path with and without pcqueue.
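
A rough sketch of the queue-based path, assuming a simple blocking
queue in place of the real PCQueue and plain integer ids in place of
Batch objects. BlockingQueue and runPipeline are hypothetical names
used only for illustration. A batcher thread produces batches and then
a poison value (-1); the worker consumes until it sees the poison.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal blocking queue standing in for a PCQueue.
template <class T>
class BlockingQueue {
public:
  void produce(T item) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      items_.push(std::move(item));
    }
    ready_.notify_one();
  }

  // Blocks until an item is available, then removes and returns it.
  T consume() {
    std::unique_lock<std::mutex> lock(mutex_);
    ready_.wait(lock, [this] { return !items_.empty(); });
    T item = std::move(items_.front());
    items_.pop();
    return item;
  }

private:
  std::mutex mutex_;
  std::condition_variable ready_;
  std::queue<T> items_;
};

// Queues batch ids 1..numBatches from a batcher thread, followed by the
// poison id -1; a worker thread collects ids until it reads the poison.
inline std::vector<int> runPipeline(int numBatches) {
  assert(numBatches >= 0);
  BlockingQueue<int> queue;
  std::vector<int> translated;

  std::thread batcher([&queue, numBatches] {
    for (int id = 1; id <= numBatches; ++id) queue.produce(id);
    queue.produce(-1);  // poison: tells the worker to stop
  });
  std::thread worker([&queue, &translated] {
    for (;;) {
      int id = queue.consume();
      if (id == -1) break;
      translated.push_back(id);
    }
  });

  batcher.join();
  worker.join();
  return translated;
}
```

With a single producer and a single consumer the ids arrive in the
order they were queued, so the caller that starts the batcher thread
does not need to wait for each batch to be formed.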
Updates marian-dev and ssplit submodules to point to the upstream
commits which implement the following:
- marian-dev: encodeWithByteRanges(...) to get source token byte-ranges
- ssplit: Has a trivial sentence-splitter functionality implemented, and
is now faster to benchmark with marian-decoder.
This enables a marian-decoder replacement written through ssplit in this
source to be benchmarked constantly against the existing marian-decoder.
Nits: Removes logging introduced for multiple workers, and respective
log statements.
- Added abhi-agg/ssplit-cpp
- Added its wasm branch in bergamot-translator
- Native builds of bergamot-translator are successful
-- Sentence splitting is NOT WORKING
-- Only translation is working
The requirement for string_view is that the original source string be
transferred all the way from input, to service, and back to
TranslationResult. This constraint was violated in several places
through the existence of a copy constructor. The issue is fixed by
deleting the copy constructor and copy assignment operator in
marian::bergamot::TranslationResult and UnifiedAPI::TranslationResult,
which exposed a few occurrences of such copies. These are replaced with
move semantics. In addition, the future is currently set and read using
move semantics. The default move constructor didn't seem to work, so
move constructors are made explicit for TranslationResults.
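
A simplified sketch of a move-only result handed through a
promise/future pair. This is not the actual TranslationResult: the
members, the sentence() accessor and the roundTrip() helper are
assumptions for illustration. It stores byte ranges and builds each
string_view on demand, because moving a std::string may relocate its
characters in implementations that use a small-string buffer, which
would leave stored views dangling.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a translation result.
class TranslationResult {
public:
  using Range = std::pair<std::size_t, std::size_t>;  // (begin, length)

  TranslationResult(std::string source, std::vector<Range> ranges)
      : source_(std::move(source)), ranges_(std::move(ranges)) {}

  // Copies are forbidden so the source string is never duplicated.
  TranslationResult(const TranslationResult&) = delete;
  TranslationResult& operator=(const TranslationResult&) = delete;

  // Explicit move operations.
  TranslationResult(TranslationResult&& other) noexcept
      : source_(std::move(other.source_)), ranges_(std::move(other.ranges_)) {}
  TranslationResult& operator=(TranslationResult&& other) noexcept {
    source_ = std::move(other.source_);
    ranges_ = std::move(other.ranges_);
    return *this;
  }

  std::size_t numSentences() const { return ranges_.size(); }

  // Builds the view on demand; valid while this object is alive and unmoved.
  std::string_view sentence(std::size_t i) const {
    assert(i < ranges_.size());
    return std::string_view(source_).substr(ranges_[i].first, ranges_[i].second);
  }

private:
  std::string source_;
  std::vector<Range> ranges_;
};

static_assert(!std::is_copy_constructible<TranslationResult>::value,
              "TranslationResult must not be copyable");

// Sets and gets the result through a promise/future pair using moves only.
inline TranslationResult roundTrip(TranslationResult result) {
  std::promise<TranslationResult> promise;
  std::future<TranslationResult> future = promise.get_future();
  promise.set_value(std::move(result));
  return future.get();
}
```

Any accidental copy of such a result fails to compile, which is how
deleting the copy operations surfaces the offending call sites.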
This commit additionally packs a few deletions and improvements made to
improve structure (textops.cpp, batcher.cpp) while inspecting and
fixing the garbled outputs. They are kept, in the interest of time,
rather than engineering prettified atomic commits.
Combination of the following commits in jp/string-view-bug:
[acfc92 78a588 12d91b 00a277 919e2f 9d3a46 b7e39b 18f67b bf667c]