* allow choosing fine-grained CPU intrinsics as CMake options
* inform user that e.g. -DCOMPILE_AVX2=off will be ignored with -march=native if there is compiler support
This PR adds cleaner suppression of unwanted output words. We identified a situation where SPM with byte-fallback can generate random bytes with output-sampling.
That is particularly harmful when such a random byte happens to be a newline symbol. Here we suppress newlines in the output unless explicitly wanted.
* unit tests for binary file operations
* adjust changelog
* Set file_ in TemporaryFile for MSVC
Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>
Adds intgemm as a module for Marian. Intgemm is @kpu's 8/16-bit gemm library with support for architectures from SSE2 to AVX512VNNI.
Removes outdated integer code, related to the --optimize option
Co-authored-by: Kenneth Heafield <github@kheafield.com>
Co-authored-by: Kenneth Heafield <kpu@users.noreply.github.com>
Co-authored-by: Ulrich Germann <ugermann@inf.ed.ac.uk>
Co-authored-by: Marcin Junczys-Dowmunt <marcinjd@microsoft.com>
Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>
* copy changes from commit 4df92f2
* add comments for better understanding
* restore the newline at the end of file and add these changes to changelog.md
* support for Apple Accelerate
* add a CMake flag to use Apple Accelerate as the BLAS library.
* rename USE_ACCELERATE to USE_APPLE_ACCELERATE
* add comment with more info on Accelerate
* link to the Apple documentation on Accelerate.
* This PR adds training of embedding spaces with better separation based on https://arxiv.org/abs/2007.01852
* We can now train with in-batch negative examples or a handful of hand-constructed negative examples provided in a tsv-file.
Adds e.g. --logical-epoch 1Gt (or other units), which alters the way the epoch counter is displayed. The actual underlying counter in the form of data passes is not changed. This is essentially a logging change that will now display the epoch as a fractional multiple of the chosen unit.
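The display arithmetic can be sketched in a few lines of Python. This is only an illustration, not Marian's code: the function names are hypothetical, and it assumes the `after` figure in the training log is the running target-label count, which is consistent with the log excerpt in this description.

```python
def logical_epoch(labels_seen: int, unit_labels: int) -> float:
    """Epoch shown in the log: labels seen so far as a multiple of the chosen unit."""
    return labels_seen / unit_labels


def format_epoch(labels_seen: int, unit_labels: int) -> str:
    """Render the epoch with four decimals, as in the log excerpt."""
    return f"Ep. {logical_epoch(labels_seen, unit_labels):.4f}"


UNIT_100MT = 100_000_000  # --logical-epoch 100Mt

# Label counts copied from the "after ..." column of the log excerpt.
for labels in (486_015_051, 488_025_047, 490_015_891):
    print(format_epoch(labels, UNIT_100MT))
```

With these inputs the sketch reproduces the `Ep.` values in the excerpt (4.8602, 4.8803, 4.9002).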
Example for `--logical-epoch 100Mt`:
```
[2020-11-02 04:14:16] Ep. 4.8602 : Up. 16755 : Sen. 1,088,000 : Cost 1.17630422 * 1,993,304 @ 31,985 after 486,015,051 : Time 61.36s : 32483.55 words/s
[2020-11-02 04:15:18] Ep. 4.8803 : Up. 16825 : Sen. 1,162,648 : Cost 1.17474616 * 2,009,996 @ 37,740 after 488,025,047 : Time 61.88s : 32480.17 words/s
[2020-11-02 04:16:19] Ep. 4.9002 : Up. 16893 : Sen. 1,235,200 : Cost 1.17799997 * 1,990,844 @ 26,173 after 490,015,891 : Time 60.47s : 32920.16 words/s
```
Replace `--after-batches N` and `--after-epochs N` with `--after Nu/Ne`, which allows specifying updates, epochs, or target labels with units, e.g.:
* `--after 30Gt` or `--after 50ku` or `--after 10e`
* Multiple criteria can also be combined: `--after 30Gt,50ku,10e` stops training as soon as the first of them is reached
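A rough Python sketch of how such a specification could be parsed and checked follows. It is not Marian's parser: the names `parse_after` and `should_stop` are hypothetical, and the sketch assumes integer values with only the prefixes (`k`, `M`, `G`) and units (`t`, `u`, `e`) that appear in this description.

```python
import re

# Assumed multipliers and units, limited to those used in the examples above.
_PREFIX = {"": 1, "k": 10**3, "M": 10**6, "G": 10**9}
_UNIT = {"t": "labels", "u": "updates", "e": "epochs"}
_PATTERN = re.compile(r"^(\d+)([kMG]?)([tue])$")


def parse_after(spec: str) -> dict:
    """Turn e.g. '30Gt,50ku,10e' into {'labels': ..., 'updates': ..., 'epochs': ...}."""
    limits = {}
    for part in spec.split(","):
        match = _PATTERN.match(part.strip())
        if match is None:
            raise ValueError(f"cannot parse stopping criterion: {part!r}")
        number, prefix, unit = match.groups()
        limits[_UNIT[unit]] = int(number) * _PREFIX[prefix]
    return limits


def should_stop(limits: dict, progress: dict) -> bool:
    """Stop as soon as any single criterion has been reached."""
    return any(progress.get(name, 0) >= limit for name, limit in limits.items())


limits = parse_after("30Gt,50ku,10e")
print(limits)
print(should_stop(limits, {"labels": 1_000_000, "updates": 50_000, "epochs": 2}))
```

Keeping the limits in one dictionary makes the "whichever comes first" rule a single `any(...)` over all criteria.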
Changes default `cost-type` from `ce-mean` to `ce-sum` and turns `display-label-counts` on by default.
* Fixes reductions into scalars for <= 32 input elements. Only affects reductions where 0 is not the identity
* Update CHANGELOG.md
* Adds space before "?"
* Adds comment explaining increase in margin for reduction tests. Adds axis comment to argument to reduce functions. Adds more tests for small reduction operators
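Why the identity element matters for small reductions can be illustrated with a toy Python model. The commit does not describe the cause of the bug, so this is only a sketch under assumptions: a hypothetical 32-lane tree reduction in which unused lanes are padded with a fill value. Padding with 0 is harmless for sums, but gives wrong results for products or maxima.

```python
import operator

LANES = 32  # hypothetical lane count, chosen to match the "<= 32 elements" case


def padded_reduce(values, op, pad):
    """Tree-reduce up to LANES values, filling unused lanes with `pad`."""
    if len(values) > LANES:
        raise ValueError("this toy model handles at most LANES values")
    lanes = list(values) + [pad] * (LANES - len(values))
    width = LANES // 2
    while width >= 1:
        # Each step combines lane i with lane i + width, halving the active lanes.
        lanes = [op(lanes[i], lanes[i + width]) for i in range(width)]
        width //= 2
    return lanes[0]


print(padded_reduce([2, 3, 4], operator.add, 0))           # sum: 0 is the identity
print(padded_reduce([2, 3, 4], operator.mul, 0))           # product padded with 0
print(padded_reduce([2, 3, 4], operator.mul, 1))           # product padded with 1
print(padded_reduce([-3, -1, -2], max, 0))                 # max padded with 0
print(padded_reduce([-3, -1, -2], max, float("-inf")))     # max padded with -inf
```

Only the calls that pad with the true identity of the operation (0 for sum, 1 for product, negative infinity for max) return the expected result.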
This PR enables final post-processing of a full transformer stack for correct prenorm behavior.
See issues #715 and #699.
List of changes:
* Add final post-processing in encoder and decoder if requested with `--transformer-postprocess-top`. It can take combinations of `d`, `n`, `a`. Using `a` adds a skip connection from the bottom of the stack.
* Add `--task transformer-base-prenorm` and `--task transformer-big-prenorm`, which correspond to `--task transformer-base --transformer-preprocess n --transformer-postprocess da --transformer-postprocess-top n`.
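The effect of the three operation strings can be sketched with a toy stack in plain Python. This is not Marian's implementation: it assumes `d` is dropout (a no-op here, as at inference time), `a` adds a skip connection, and `n` is layer normalization without learned scale or bias. The function names are hypothetical.

```python
import math


def layer_norm(x, eps=1e-9):
    """Normalize a vector to zero mean and unit variance (no learned parameters)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def process(ops, x, skip):
    """Apply an operation string such as 'n' or 'da' to x; `skip` is the residual input."""
    for op in ops:
        if op == "d":
            pass  # dropout: identity in this inference-style sketch
        elif op == "a":
            x = [a + b for a, b in zip(x, skip)]
        elif op == "n":
            x = layer_norm(x)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return x


def run_stack(x, sublayers, preprocess="n", postprocess="da", postprocess_top="n"):
    """Pre-norm stack: normalize before each sublayer, add the residual after it,
    then apply the final post-processing once on top of the whole stack."""
    bottom = x  # input to the stack, used if postprocess_top contains 'a'
    for sublayer in sublayers:
        layer_input = x
        hidden = sublayer(process(preprocess, x, layer_input))
        x = process(postprocess, hidden, layer_input)
    return process(postprocess_top, x, bottom)


double = lambda h: [2.0 * v for v in h]  # stand-in for attention or feed-forward
print(run_stack([1.0, 3.0], [double]))                      # normalized output
print(run_stack([1.0, 3.0], [double], postprocess_top=""))  # no final normalization
```

Without the final `n`, the output of a pre-norm stack is the un-normalized sum of residuals, which is the situation the top-level post-processing is meant to address.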
* Return exit code 15 (SIGTERM) after SIGTERM.
When marian receives signal SIGTERM and exits gracefully (save model & exit),
it should then exit with a non-zero exit code, to signal to any parent process
that it did not exit "naturally".
* Added explanatory comment about exiting marian_train with non-zero status after SIGTERM.
* Bug fix: better handling of SIGTERM for graceful shutdown during training.
Prior to this bug fix, BatchGenerator::fetchBatches, which runs in a separate
thread, would ignore SIGTERM during training. Training uses a custom signal handler
for SIGTERM, which simply sets a global flag, to enable graceful shutdown (i.e.,
saving models and the current state of training before shutting down).
The changes in this commit also facilitate custom handling of other signals in the
future by providing a general signal handler for all signals with a signal number
below 32 (setSignalFlag) and a generic flag checking function (getSignalFlag(sig))
for checking such flags.
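
The pattern described in these commits can be sketched in Python as follows. This is an illustration only, not Marian's C++ code: `set_signal_flag` and `get_signal_flag` mirror the names setSignalFlag and getSignalFlag, while `fetch_batches`, `train`, and `install_handlers` are hypothetical stand-ins. The handler only records the signal, the batch-fetching thread polls the flag, and the training function returns exit code 15 after a graceful shutdown.

```python
import signal
import threading

MAX_SIGNAL = 32  # flags are kept for signal numbers below 32
_signal_flags = [False] * MAX_SIGNAL


def set_signal_flag(sig, frame=None):
    """Generic handler: record that `sig` was received and return immediately."""
    if 0 < sig < MAX_SIGNAL:
        _signal_flags[sig] = True


def get_signal_flag(sig):
    """Check whether `sig` has been received."""
    return 0 < sig < MAX_SIGNAL and _signal_flags[sig]


def fetch_batches(items, out, batch_size=2):
    """Worker-thread body: group items into batches, stopping once SIGTERM is flagged."""
    batch = []
    for item in items:
        if get_signal_flag(signal.SIGTERM):
            return
        batch.append(item)
        if len(batch) == batch_size:
            out.append(batch)
            batch = []
    if batch:
        out.append(batch)


def train(items):
    """Return (exit_code, number_of_batches); exit code 15 means stopped by SIGTERM."""
    batches = []
    worker = threading.Thread(target=fetch_batches, args=(items, batches))
    worker.start()
    worker.join()
    if get_signal_flag(signal.SIGTERM):
        # A real trainer would save the model and training state here.
        return int(signal.SIGTERM), len(batches)
    return 0, len(batches)


def install_handlers(signals=(signal.SIGTERM,)):
    """Register the generic handler; must be called from the main thread."""
    for sig in signals:
        signal.signal(sig, set_signal_flag)
```

A caller would run `install_handlers()` once at startup and pass the first element of `train(...)` to `sys.exit`, so that a parent process can distinguish a SIGTERM-induced shutdown from a normal finish.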