This PR refactors the training graph groups and optimizers to enable and simplify fp16 support.
Deprecates old unused graph groups and fixes a couple of MPI issues.
Adds intgemm as a module for Marian. Intgemm is @kpu's 8/16-bit GEMM library with support for architectures from SSE2 to AVX512VNNI.
Removes outdated integer code related to the --optimize option.
Co-authored-by: Kenneth Heafield <github@kheafield.com>
Co-authored-by: Kenneth Heafield <kpu@users.noreply.github.com>
Co-authored-by: Ulrich Germann <ugermann@inf.ed.ac.uk>
Co-authored-by: Marcin Junczys-Dowmunt <marcinjd@microsoft.com>
Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>
* copy changes from commit 4df92f2
* add comments for better understanding
* restore the newline at the end of file and add these changes to changelog.md
* support for Apple Accelerate
* add a CMake flag to use Apple Accelerate as the BLAS library.
* rename USE_ACCELERATE to USE_APPLE_ACCELERATE
* add comment with more info on Accelerate
* link to the Apple documentation on Accelerate.
- Updates sentencepiece to the newest version (removes dependency on protobuf)
- Enables SentencePiece compilation by default, since there is no longer a dependency on protobuf.
* This PR adds training of embedding spaces with better separation based on https://arxiv.org/abs/2007.01852
* We can now train with in-batch negative examples or a handful of hand-constructed negative examples provided in a tsv-file.
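
As a rough illustration of what "in-batch negative examples" means, here is a minimal sketch of a generic in-batch softmax cross-entropy over sentence embeddings. It is not taken from this PR and may differ from the loss Marian actually implements (for instance, it has no margin term); the function name `in_batch_negative_loss` and the plain-list embeddings are hypothetical.

```python
import math

def in_batch_negative_loss(src, tgt):
    """Mean cross-entropy where tgt[i] is the positive for src[i]
    and every other target in the batch serves as a negative.
    Generic sketch only; not Marian's implementation."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    total = 0.0
    for i, s in enumerate(src):
        # Similarity of source i against every target in the batch.
        scores = [dot(s, t) for t in tgt]
        # Numerically stable log-sum-exp over the batch.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(x - m) for x in scores))
        # Negative log-probability of the aligned (positive) target.
        total += log_z - scores[i]
    return total / len(src)

aligned = in_batch_negative_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = in_batch_negative_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

The loss is lower when each source embedding is closest to its own target, which is the separation effect the bullets above describe.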
Adds `--logical-epoch`, e.g. `--logical-epoch 1Gt` (or other units), which alters the way the epoch counter is displayed. The actual underlying counter, in the form of data passes, is not changed. This is essentially a logging change: the epoch is now displayed as a fractional multiple of the chosen unit.
Example for `--logical-epoch 100Mt`:
```
[2020-11-02 04:14:16] Ep. 4.8602 : Up. 16755 : Sen. 1,088,000 : Cost 1.17630422 * 1,993,304 @ 31,985 after 486,015,051 : Time 61.36s : 32483.55 words/s
[2020-11-02 04:15:18] Ep. 4.8803 : Up. 16825 : Sen. 1,162,648 : Cost 1.17474616 * 2,009,996 @ 37,740 after 488,025,047 : Time 61.88s : 32480.17 words/s
[2020-11-02 04:16:19] Ep. 4.9002 : Up. 16893 : Sen. 1,235,200 : Cost 1.17799997 * 1,990,844 @ 26,173 after 490,015,891 : Time 60.47s : 32920.16 words/s
```
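
The displayed epoch in the log lines above can be reproduced with simple arithmetic. The sketch below assumes that the number following `after` is the cumulative target-label count and that the prefixes are decimal (M = 10^6), which is consistent with the sample output. The helper `logical_epoch` is hypothetical and is not Marian's code.

```python
import re

# Assumed decimal prefixes; they match the sample log for 100Mt.
PREFIXES = {"": 1, "k": 10**3, "M": 10**6, "G": 10**9}

def logical_epoch(labels_seen, unit):
    """Express progress as a fractional multiple of a label unit such as '100Mt'."""
    m = re.fullmatch(r"(\d+)([kMG]?)t", unit)
    if m is None:
        raise ValueError(f"this sketch only handles target-label units: {unit!r}")
    unit_size = int(m.group(1)) * PREFIXES[m.group(2)]
    return labels_seen / unit_size

# Cumulative label counts copied from the three log lines.
for labels in (486_015_051, 488_025_047, 490_015_891):
    print(f"Ep. {logical_epoch(labels, '100Mt'):.4f}")
```

Rounded to four decimals, the three values match the `Ep.` column of the log (4.8602, 4.8803, 4.9002).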
Replaces `--after-batches N` and `--after-epochs N` with `--after Nu/Ne`, which allows specifying updates, epochs, or target labels with units, e.g.:
* `--after 30Gt` or `--after 50ku` or `--after 10e`
* Multiple criteria can also be combined: `--after 30Gt,50ku,10e` stops training at whichever criterion is reached first
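
The combined form can be read as "stop once any limit is reached". A minimal sketch of that logic follows, assuming decimal prefixes and the unit letters `t` (target labels), `u` (updates) and `e` (epochs) as used above. The names `parse_after` and `should_stop` are hypothetical and do not reflect Marian's actual parser.

```python
import re

# Assumed decimal prefixes and unit letters, as used in the examples above.
SCALE = {"": 1, "k": 10**3, "M": 10**6, "G": 10**9}
UNIT_NAMES = {"t": "labels", "u": "updates", "e": "epochs"}

def parse_after(spec):
    """Parse a spec such as '30Gt,50ku,10e' into a dict of limits."""
    limits = {}
    for item in spec.split(","):
        m = re.fullmatch(r"(\d+)([kMG]?)([tue])", item.strip())
        if m is None:
            raise ValueError(f"cannot parse stopping criterion: {item!r}")
        number, prefix, unit = m.groups()
        limits[UNIT_NAMES[unit]] = int(number) * SCALE[prefix]
    return limits

def should_stop(progress, limits):
    """Return True as soon as any configured limit has been reached."""
    return any(progress.get(name, 0) >= limit for name, limit in limits.items())

limits = parse_after("30Gt,50ku,10e")
print(limits)
print(should_stop({"labels": 2_000_000_000, "updates": 50_000, "epochs": 3}, limits))
```

In the usage line, the update limit is hit first, so training would stop even though the label and epoch limits have not been reached.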
Changes default `cost-type` from `ce-mean` to `ce-sum` and turns `display-label-counts` on by default.