This PR reverts changes to transformer caching (public PR https://github.com/marian-nmt/marian-dev/pull/881).
The reverted change seems to cause catastrophic memory leaks or incorrect de-allocation during decoding.
This PR adds an Azure Pipeline for running regression tests on an Azure Hosted GPU Pool.
It currently runs on Ubuntu 18.04 with GCC 8 and CUDA 11.1, using a single Nvidia M60 GPU device (Maxwell).
The pipeline needs to be started manually: go to "Pipelines", then "Marian GPU Pool", click "Run pipeline", select the branch, click "Run".
This PR introduces small fixes around fp16 training and batch fitting:
* Multi-loss casts each loss to the type of the first loss before accumulation (previously aborted because of the missing cast)
* Throw `ShapeSizeException` if the total expanded shape size exceeds the maximum int value (2^31-1)
* During mini-batch-fitting, catch `ShapeSizeException` and use another sizing hint. Outside mini-batch-fitting the exception still aborts.
* Negative `--workspace -N` value allocates workspace as total available GPU memory minus N megabytes.
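
The shape-size check and the negative workspace value can be illustrated with a minimal sketch. This is not Marian's code: the `ShapeSizeException` struct below is a stand-in for the class named above, and `checkedElements`, `batchFits`, `workspaceBytes`, the `availableBytes` parameter, and the 1 MB = 1024 * 1024 bytes convention are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Stand-in for the exception named in the PR (hypothetical definition).
struct ShapeSizeException : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Multiply the (non-negative) dimensions in 64-bit arithmetic and throw
// as soon as the running product no longer fits into a 32-bit int.
inline int checkedElements(const std::vector<int>& dims) {
  std::int64_t total = 1;
  for(int d : dims) {
    total *= d;
    if(total > std::numeric_limits<int>::max())
      throw ShapeSizeException("Total shape size exceeds 2^31-1");
  }
  return static_cast<int>(total);
}

// During batch fitting an oversized shape is treated as "does not fit",
// so the caller can move on to another sizing hint instead of aborting.
inline bool batchFits(const std::vector<int>& dims) {
  try {
    checkedElements(dims);
    return true;
  } catch(const ShapeSizeException&) {
    return false;
  }
}

// Non-negative value: workspace of that many megabytes.
// Negative value -N: everything available minus N megabytes (clamped at 0).
inline std::size_t workspaceBytes(long workspaceMB, std::size_t availableBytes) {
  const std::size_t MB = 1024 * 1024;  // assumed megabyte convention
  if(workspaceMB >= 0)
    return static_cast<std::size_t>(workspaceMB) * MB;
  const std::size_t reserve = static_cast<std::size_t>(-workspaceMB) * MB;
  return reserve < availableBytes ? availableBytes - reserve : 0;
}
```

In this sketch, calling `checkedElements` directly corresponds to the "abort" path, while `batchFits` corresponds to the fitting loop that swallows the exception and tries a smaller batch.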
This PR sets the default parameters for cost-scaling to 8.f 10000 1.f 8.f, i.e. when cost-scaling is enabled, scale by 8 and do not try to scale up or down automatically. This seems more stable than the variable cost-scaling with larger numbers that was the default before.
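
To show why these defaults amount to a fixed scale, here is a hedged sketch of a dynamic cost-scaling update. The reading of the four values as (initial factor, update frequency, multiplier, minimum factor) is an assumption, and the `CostScaling` struct is hypothetical rather than Marian's implementation.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical dynamic cost-scaling state; field meanings are assumed.
struct CostScaling {
  float factor;          // current scaling factor applied to the cost
  std::size_t freq;      // overflow-free updates required before scaling up
  float multiplier;      // factor used when scaling up or down
  float minimum;         // lower bound for the scaling factor
  std::size_t clean;     // overflow-free updates seen since the last change

  CostScaling(float f, std::size_t fr, float m, float mn)
      : factor(f), freq(fr), multiplier(m), minimum(mn), clean(0) {}

  // Call once per update; `overflow` marks a gradient with NaN/Inf values.
  void update(bool overflow) {
    if(overflow) {
      factor = std::max(minimum, factor / multiplier);  // scale down
      clean = 0;
    } else if(++clean >= freq) {
      factor *= multiplier;                             // scale up
      clean = 0;
    }
  }
};
```

With a multiplier of 1.f and a minimum equal to the initial factor, both branches leave `factor` at 8.f, which matches the "do not scale up or down" behaviour described above.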
This PR fixes case augmentation with multi-threaded reading. The solution is to not look at `iterator::pos_` during lazy processing, but to pass the position as an argument to the lazy function instead.
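
The idea can be sketched as follows, assuming a simplified reader; `Reader`, `next`, and the `process` callback are hypothetical names and do not mirror Marian's iterator classes. The position is copied into the deferred task when the task is created, so evaluating it later (possibly on another thread, after the reader has advanced) still sees the right line number.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reader that hands out deferred per-line processing tasks.
struct Reader {
  std::vector<std::string> lines;
  std::size_t pos_;  // advances as lines are handed out

  explicit Reader(std::vector<std::string> l) : lines(std::move(l)), pos_(0) {}

  // The task captures the line and its position by value instead of
  // reading pos_ when it is eventually executed.
  std::function<std::string()> next(
      const std::function<std::string(const std::string&, std::size_t)>& process) {
    const std::size_t pos = pos_++;
    const std::string line = lines.at(pos);
    return [process, line, pos]() { return process(line, pos); };
  }
};
```

Had the task read `pos_` at execution time, every deferred task would observe whatever position the reader had reached by then, which is the kind of mismatch the fix avoids.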
This replaces dense alignment storage and training with a sparse representation. Training speed with guided alignment now nearly matches normal training speed, regaining about 25% speed.
This is the first of two PRs. The next one will introduce a new guided-alignment training scheme with better alignment accuracy.
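
As a rough illustration of the difference between the two representations, the sketch below stores only the non-zero alignment points and sums a cross-entropy term over those points alone. The `AlignPoint` struct, the row-major `[tgt][src]` layout, and the loss form are assumptions for illustration, not the code in this PR.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One non-zero entry of an alignment matrix (hypothetical layout).
struct AlignPoint {
  std::size_t src;
  std::size_t tgt;
  float prob;
};

// Convert a dense row-major [tgt][src] matrix into its non-zero points.
inline std::vector<AlignPoint> toSparse(const std::vector<float>& dense,
                                        std::size_t srcLen,
                                        std::size_t tgtLen) {
  std::vector<AlignPoint> points;
  for(std::size_t t = 0; t < tgtLen; ++t)
    for(std::size_t s = 0; s < srcLen; ++s)
      if(dense[t * srcLen + s] != 0.f)
        points.push_back({s, t, dense[t * srcLen + s]});
  return points;
}

// Cross-entropy between the sparse reference alignment and a dense
// attention matrix; only the stored points contribute to the sum.
inline float sparseCrossEntropy(const std::vector<AlignPoint>& points,
                                const std::vector<float>& attention,
                                std::size_t srcLen) {
  float loss = 0.f;
  for(const AlignPoint& p : points)
    loss -= p.prob * std::log(attention[p.tgt * srcLen + p.src]);
  return loss;
}
```

Since word alignments usually contain only a few points per target word, the work per sentence in this sketch grows with the number of alignment points rather than with source length times target length.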
* Add -DDETERMINISTIC=ON/OFF flag to CMake
* Use -DDETERMINISTIC=on in GitHub/Azure workflows
Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>
* More examples for MLP layers and docs about RNN layers
* Docs about embedding layer and more doxygen code docs
* Add layer and factors docs into index.rst
* Update layer documentation
* Fix typos
Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>
Co-authored-by: Graeme Nail <graemenail.work@gmail.com>
* Add MMAP as an option
* Use io::isBin
* Allow getYamlFromModel from an Item vector
* ScorerWrapper can now load on to a graph from Item vector
The interface IEncoderDecoder can now call graph loads directly from an
Item Vector.
* Translator loads model before creating scorers
Scorers are created from an Item vector
* Replace model-config try-catch with check using IsNull
* Prefer empty() over size()
* load by items should be pure virtual
* Stepwise forward load to encdec
* nematus can load from items
* amun can load from items
* loadItems in TranslateService
* Remove logging
* Remove by filename scorer functions
* Replace by filename createScorer
* Explicitly provide default value for get model-mmap
* CLI option for model-mmap only for translation and CPU compile
* Ensure model-mmap option is CPU only
* Remove move on temporary object
* Reinstate log messages for model loading in Amun / Nematus
* Add log messages for model loading in scorers
Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>