* Add GitHub workflows
* Workflows with CMake compilation on Windows
* Ubuntu workflow with Boost
* Ignore warnings from Boost
* Compile unit tests on Windows
* Disable cpuinfo tools if compiled with Ninja
* Use a separate CMakeSettings.json for CI
* Disable CMake debug output
* Fix unit tests compilation with Ninja Release
* Use FBGEMM in Windows workflow; add comments
* Fix C4706 warning
* Update CHANGELOG
* Run Windows build on pull requests
* Compile SentencePiece statically in Windows workflow
* Add GitHub workflow on macOS
* Address review comments
* Disable C4702 globally, not only in Debug
* Update CHANGELOG and workflow names
* Update VERSION
* Add tuple nodes via views and trickery
* Add `topk` operator, currently unused outside unit tests
* Add `abs` operator, currently unused outside unit tests
* Change return type of `Node::allocate()` to `void`. It used to return the number of allocated elements, but the return value was not used anywhere. To avoid future confusion between elements and bytes, it has been removed for now.
* Fix server build with current boost, move simple-websocket-server to submodule
* Change submodule to marian-nmt/Simple-WebSocket-Server
* Update submodule simple-websocket-server
Co-authored-by: Gleb Tv <glebtv@gmail.com>
* Add basic support for TSV inputs
* Fix mini-batch-fit for TSV inputs
* Abort if shuffling data from stdin
* Fix terminating training with data from STDIN
* Allow creating vocabs from TSV files
* Add comments; clean creation of vocabs from TSV files
* Guess --tsv-size based on the model type
* Add shortcut for STDIN inputs
* Rename --tsv-size to --tsv-fields
* Allow only one 'stdin' in --train-sets
* Properly create separate vocabularies from a TSV file
* Clearer logging message
* Add error message for wrong number of valid sets if --tsv is used
* Use --no-shuffle instead of --shuffle in the error message
* Fix continuing training from STDIN
* Update CHANGELOG
* Support both 'stdin' and '-'
* Guess --tsv-fields from dim-vocabs if special:model.yml is available
* Update error messages
* Move variable outside the loop
* Refactor utils::splitTsv; add unit tests
* Support '-' as stdin; refactor; add comments
* Abort if excessive field(s) in the TSV input
* Add a TODO on passing one vocab with fully-tied embeddings
* Remove the unit test with excessive tab-separated fields
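The TSV splitting rules above (a fixed number of tab-separated fields per line, aborting on excessive fields) can be illustrated with a minimal sketch. `splitTsvLine` is a hypothetical helper, not Marian's `utils::splitTsv`; rejecting lines with too few fields is an assumption of this sketch, since the commits only mention excessive fields.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not Marian's utils::splitTsv): splits one TSV line into
// exactly `numFields` fields, e.g. numFields = value of --tsv-fields.
std::vector<std::string> splitTsvLine(const std::string& line, std::size_t numFields) {
  std::vector<std::string> fields;
  std::size_t start = 0;
  while(true) {
    std::size_t tab = line.find('\t', start);
    if(tab == std::string::npos) {
      fields.push_back(line.substr(start));  // last field runs to the end of the line
      break;
    }
    fields.push_back(line.substr(start, tab - start));
    start = tab + 1;
  }
  if(fields.size() > numFields)
    throw std::runtime_error("Excessive field(s) in the TSV input");
  if(fields.size() < numFields)  // assumption: too few fields is also treated as an error
    throw std::runtime_error("Missing field(s) in the TSV input");
  return fields;
}
```

Spaces inside a field are preserved; only tabs separate fields.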
The previous mechanism to remove empty inputs does not play well with batch purging (removal of finished sentences). Now we reuse the batch purging mechanism to get rid of empty inputs: for every source batch entry that is empty, we force EOS for all beam entries of the corresponding batch entry, and the purging then takes care of the rest. The forced EOS gets probability 1, i.e. a log-probability of log(1) = 0.
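A minimal sketch of the forcing step described above, assuming a hypothetical nested-vector layout `logProbs[batch][beam][vocab]` and a hypothetical function name; Marian performs this on its own tensor types. Every beam entry of an empty source entry can only extend with EOS at log-probability 0, so the entry finishes and is then purged.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical layout: logProbs[b][k][w] is the log-probability of word w
// for beam entry k of batch entry b.
using Scores = std::vector<std::vector<std::vector<float>>>;

// Forces EOS for all beam entries of batch entries whose source is empty.
void forceEosForEmptyInputs(Scores& logProbs,
                            const std::vector<bool>& sourceIsEmpty,
                            std::size_t eosId) {
  const float minusInf = -std::numeric_limits<float>::infinity();
  for(std::size_t b = 0; b < logProbs.size(); ++b) {
    if(!sourceIsEmpty[b])
      continue;  // non-empty inputs are decoded normally
    for(auto& beamEntry : logProbs[b])
      for(std::size_t w = 0; w < beamEntry.size(); ++w)
        beamEntry[w] = (w == eosId) ? 0.0f : minusInf;  // log(1) for EOS, log(0) otherwise
  }
}
```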
Splitting the header file into a header and a *.cu file comes at the price of having to include explicit specializations for combinations of types, as already done for element.inc and add.inc. No code changes otherwise.
Add CMake options to disable specific compute capabilities.
With `make -j16`, compilation now takes about 6 minutes instead of 7. Selecting only SM70 during compilation brings it down to 3 minutes.
* Downgrade NCCL to 2.3.7 as 2.4.2 is buggy (hangs with larger models)
* Actually enable gradient-checkpointing, the previous option was inactive
* Clean up training-only options that should not be displayed for decoder and scorer
* Re-enable conversion to FP16 if element types are compatible (belong to the same type class)
* Fix a few typos and add more verbose log messages.
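The rule "convert only if element types belong to the same type class" can be sketched as a simple compile-time check. The enums and function names below are hypothetical illustrations, not Marian's actual `Type`/`TypeClass` definitions.

```cpp
// Hypothetical type classes and element types, for illustration only.
enum class TypeClass { SignedInt, UnsignedInt, Float };
enum class ElementType { Int8, Int16, UInt8, UInt16, Float16, Float32 };

// Maps each element type to its type class.
constexpr TypeClass typeClassOf(ElementType t) {
  switch(t) {
    case ElementType::Int8:
    case ElementType::Int16: return TypeClass::SignedInt;
    case ElementType::UInt8:
    case ElementType::UInt16: return TypeClass::UnsignedInt;
    case ElementType::Float16:
    case ElementType::Float32:
    default: return TypeClass::Float;
  }
}

// Conversion (e.g. FP32 -> FP16) is allowed only within one type class.
constexpr bool canConvert(ElementType from, ElementType to) {
  return typeClassOf(from) == typeClassOf(to);
}

static_assert(canConvert(ElementType::Float32, ElementType::Float16), "float -> float is compatible");
static_assert(!canConvert(ElementType::Int8, ElementType::Float16), "int -> float is not");
```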
* Add printing word level scores
* Add option --no-spm-decode
* Fix precision for word-level scores
* Fix getting the no-spm-decode option
* Update CHANGELOG
* Add comments and refactor
* Print word-level scores next to other scores in an n-best list
* Remove --word-scores from marian-scorer
* Add --no-spm-decode only if compiled with SentencePiece
* Add comments
* Print word scores before model scores in n-best lists
* Update VERSION
Co-authored-by: Marcin Junczys-Dowmunt <Marcin.JunczysDowmunt@microsoft.com>
This implements Sequential Unlikelihood Training from https://arxiv.org/abs/1908.04319
* Implementation as an expensive multi-op; a special node is in progress.
* Fix the gather operator to work in batched cases
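The core of the unlikelihood objective in the cited paper is a penalty of -log(1 - p(c)) for every negative candidate c at each step. The sketch below computes only that term on plain vectors, with the candidate sets passed in explicitly (how candidates are selected, token-level or sequence-level, is left out). The function name and data layout are hypothetical; the per-candidate lookup plays the role of the gather mentioned above.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// probs[t][w]: model probability of word w at step t (hypothetical layout).
// negatives[t]: indices of the negative candidates C_t at step t.
// Returns sum over t and c in C_t of -log(1 - p(c)).
float unlikelihoodLoss(const std::vector<std::vector<float>>& probs,
                       const std::vector<std::vector<std::size_t>>& negatives) {
  float loss = 0.0f;
  for(std::size_t t = 0; t < probs.size(); ++t) {
    for(std::size_t c : negatives[t]) {
      float p = probs[t][c];   // gather the candidate's probability
      loss -= std::log1p(-p);  // -log(1 - p); becomes +inf if p == 1
    }
  }
  return loss;
}
```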
This fixes a number of bugs in our GPU reduce kernels that would manifest mainly for larger matrices and during back-prop. We also drop support for CUDA 8.0 to be able to take advantage of new GPU primitives introduced by NVIDIA in CUDA 9.0.
This PR introduces batch-purging in Marian, i.e. whenever a virtual beam becomes inactive (empty), the entire batch entry that corresponds to that beam can be removed from the encoder and decoder neural states. The CPU-side beam search keeps track of the hypotheses as before, but needs to map between original and shifted batch indices.
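The index bookkeeping behind batch purging can be sketched as follows, under the assumption that states are plain `[batch x dim]` row vectors and that `active[i]` refers to row i of the current, possibly already purged, states. All names are hypothetical. Each purge step records which rows to keep and composes the shifted-to-original mapping, so hypotheses can still be reported against the original batch index.

```cpp
#include <cstddef>
#include <vector>

// Result of one purging step (hypothetical structure).
struct PurgeResult {
  std::vector<std::size_t> keptRows;           // rows of the current states to keep
  std::vector<std::size_t> shiftedToOriginal;  // original batch index of each remaining row
};

// `shiftedToOriginal[i]` maps current row i to its original batch index;
// `active[i]` tells whether the beam for current row i is still non-empty.
PurgeResult purge(const std::vector<std::size_t>& shiftedToOriginal,
                  const std::vector<bool>& active) {
  PurgeResult r;
  for(std::size_t row = 0; row < active.size(); ++row) {
    if(active[row]) {
      r.keptRows.push_back(row);
      r.shiftedToOriginal.push_back(shiftedToOriginal[row]);
    }
  }
  return r;
}

// Keeps only the selected rows of a [batch x dim] state (an index_select on rows).
template <typename T>
std::vector<std::vector<T>> selectRows(const std::vector<std::vector<T>>& state,
                                       const std::vector<std::size_t>& rows) {
  std::vector<std::vector<T>> out;
  out.reserve(rows.size());
  for(std::size_t row : rows)
    out.push_back(state[row]);
  return out;
}
```

Applying `purge` repeatedly (second step on the output of the first) keeps the mapping to original indices correct even though row positions keep shifting.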
In FastOpt we do not want to use locking during access, but without locking, reference counting is not thread-safe. We now use std::unique_ptr to const objects or const references everywhere. This fixes random segfaults with multi-GPU training. @TODO: clean up option merging to make options generally immutable.
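The ownership pattern described above can be sketched with a hypothetical read-only node type (not FastOpt's actual class). Children are owned through `std::unique_ptr<const ...>` and handed out as const references, so reads do not touch any reference count, unlike copying a `std::shared_ptr`. The sketch assumes all `addChild` calls happen before the structure is shared between threads; concurrent calls to the const accessors afterwards do not modify anything.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical immutable-after-construction options node.
class ConstNode {
public:
  explicit ConstNode(std::string value) : value_(std::move(value)) {}

  // Build phase only (single-threaded): takes ownership of a const child.
  void addChild(const std::string& key, std::unique_ptr<const ConstNode> child) {
    children_[key] = std::move(child);
  }

  const std::string& value() const { return value_; }

  // Read phase: returns a const reference, so no reference count is modified.
  const ConstNode& operator[](const std::string& key) const { return *children_.at(key); }

private:
  std::string value_;
  std::map<std::string, std::unique_ptr<const ConstNode>> children_;
};
```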