Added `--valid-reset-all`, which works like `--valid-reset-stalled` but also resets the last best saved validation metrics. This is useful when the validation sets change during continued training.
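A minimal sketch of the assumed difference between the two flags. It uses a hypothetical `ValidatorState` struct rather than Marian's actual validator classes, and it assumes a higher-is-better metric:

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical per-metric validation state; field names are illustrative,
// not Marian's actual internals. Assumes higher scores are better.
struct ValidatorState {
  float lastBest = std::numeric_limits<float>::lowest(); // best score seen so far
  std::size_t stalled = 0;                               // validations without improvement
};

// Assumed semantics: --valid-reset-stalled clears only the stall counter,
// while --valid-reset-all additionally forgets the last best metric, so a
// new validation set is not compared against scores from the old one.
void resetOnResume(ValidatorState& s, bool resetStalled, bool resetAll) {
  if(resetStalled || resetAll)
    s.stalled = 0;
  if(resetAll)
    s.lastBest = std::numeric_limits<float>::lowest();
}
```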
Added new regression test: https://github.com/marian-nmt/marian-regression-tests/pull/89
The best-deep alias in Marian is currently broken because it does not set the model type, and the default type, `amun`, is incompatible with multiple layers. This commit simply adds the type to the best-deep alias entry.
* This adds GPU-side hashing of tensors (a hash based on MurmurHash3).
* The hash is used to check whether parameters across nodes have diverged; if they have, all parameters and optimizer shards are resynced. Previously, a resync happened every N (100 or 200) updates; now it can be skipped if nothing has diverged.
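The divergence check can be sketched as follows. This is a CPU stand-in, not the actual GPU kernel: `hashTensor` and `needsResync` are hypothetical names, only the `fmix64` finalizer is taken from MurmurHash3, and the way per-element hashes are combined here is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// MurmurHash3's 64-bit finalizer (fmix64), used here as a bit mixer.
inline uint64_t fmix64(uint64_t k) {
  k ^= k >> 33;
  k *= 0xff51afd7ed558ccdULL;
  k ^= k >> 33;
  k *= 0xc4ceb9fe1a85ec53ULL;
  k ^= k >> 33;
  return k;
}

// Hypothetical stand-in for the tensor hash: mixes each float's bit pattern
// together with its index and sums the results (order-independent).
uint64_t hashTensor(const std::vector<float>& values) {
  uint64_t h = 0;
  for(std::size_t i = 0; i < values.size(); ++i) {
    uint32_t bits;
    std::memcpy(&bits, &values[i], sizeof(bits)); // reinterpret float bits safely
    h += fmix64((static_cast<uint64_t>(i) << 32) | bits);
  }
  return fmix64(h ^ values.size());
}

// One hash per node (e.g. gathered from all nodes); resync only if any differ.
bool needsResync(const std::vector<uint64_t>& nodeHashes) {
  if(nodeHashes.empty())
    return false;
  for(uint64_t h : nodeHashes)
    if(h != nodeHashes.front())
      return true;
  return false;
}
```

Combining per-element hashes with a commutative sum keeps the result independent of reduction order, which is one reason such a scheme maps naturally onto a parallel GPU reduction.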
Enables loading model checkpoints on the main node only and distributing them to the other nodes via MPI.
Until now, the checkpoint needed to be present in the same location on all nodes. That could be achieved either by writing to a shared filesystem (problematic due to poor syncing) or by manually copying it to the same local location on each node, e.g. /tmp (while writing only happened to one main location).
Now, Marian can resume training from a single location on the main node; the remaining nodes do not need access to it. For example, a local /tmp on the main node can be used, and race conditions on shared storage are avoided.
This also avoids creating log files on more than one node. This part is a bit wonky, as it is done via an environment variable lookup.
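A sketch of the kind of environment lookup this implies. The variable names are assumptions (`OMPI_COMM_WORLD_RANK` is set by Open MPI, `PMI_RANK` by MPICH-style launchers); the PR does not say which variables Marian checks, and `isMainProcess` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdlib>
#include <initializer_list>
#include <string>

// Hypothetical helper: decide whether this process should create log files.
// Assumption: the launcher exports the process rank in one of these variables.
bool isMainProcess() {
  for(const char* name : {"OMPI_COMM_WORLD_RANK", "PMI_RANK"}) {
    if(const char* value = std::getenv(name))
      return std::string(value) == "0"; // only rank 0 logs
  }
  return true; // no MPI launcher detected: treat as a single, main process
}
```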
This PR introduces small fixes around fp16 training and batch fitting:
* Multi-loss casts each loss to the type of the first loss before accumulation (previously this aborted due to a missing cast)
* Throw `ShapeSizeException` if the total expanded shape size exceeds the maximum int value (2^31-1)
* During mini-batch fitting, catch `ShapeSizeException` and try another sizing hint; outside mini-batch fitting, it still aborts.
* A negative `--workspace -N` value allocates the workspace as the total available GPU memory minus N megabytes.
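The size check, its use during batch fitting, and the negative-workspace rule can be sketched like this. All function names (`shapeElements`, `fits`, `workspaceMB`) are hypothetical, and the local `ShapeSizeException` only stands in for Marian's class of the same name:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Stand-in for Marian's ShapeSizeException.
struct ShapeSizeException : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Multiply (positive) dimensions in 64-bit and reject totals above 2^31-1.
// Checking after every step keeps the 64-bit product from overflowing.
int shapeElements(const std::vector<int>& dims) {
  int64_t total = 1;
  for(int d : dims) {
    total *= d;
    if(total > std::numeric_limits<int>::max())
      throw ShapeSizeException("shape size exceeds int capacity");
  }
  return static_cast<int>(total);
}

// During batch fitting, an oversized candidate just counts as "does not fit",
// so the caller can try another sizing hint instead of aborting.
bool fits(const std::vector<int>& dims, int budget) {
  try {
    return shapeElements(dims) <= budget;
  } catch(const ShapeSizeException&) {
    return false;
  }
}

// --workspace rule: N >= 0 reserves N MB; N < 0 reserves all but |N| MB.
std::size_t workspaceMB(long long requested, std::size_t availableMB) {
  if(requested >= 0)
    return static_cast<std::size_t>(requested);
  std::size_t keepFree = static_cast<std::size_t>(-requested);
  return availableMB > keepFree ? availableMB - keepFree : 0;
}
```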
This PR sets the default cost-scaling parameters to `8.f 10000 1.f 8.f`, i.e. scale the cost by 8 and do not try to automatically scale it up or down. This seems more stable than the variable cost scaling with larger numbers that was the default before.
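A sketch of how such a four-value setting could drive dynamic cost scaling. It assumes the values mean initial factor, update frequency, multiplier, and minimum factor (an assumption; `marian --help` has the authoritative description), and `CostScaler` is a hypothetical type. With a multiplier of 1, both growth and back-off leave the factor at 8:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical cost scaler; defaults mirror "8.f 10000 1.f 8.f" under the
// assumed meaning: factor, frequency, multiplier, minimum factor.
struct CostScaler {
  float factor = 8.f;
  std::size_t freq = 10000;
  float multiplier = 1.f;
  float minimum = 8.f;
  std::size_t goodSteps = 0;

  // Call once per update with whether the scaled gradients overflowed.
  void update(bool overflow) {
    if(overflow) {
      factor = std::max(minimum, factor / multiplier); // back off, not below minimum
      goodSteps = 0;
    } else if(++goodSteps % freq == 0) {
      factor *= multiplier; // grow after `freq` consecutive clean updates
    }
  }
};
```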
This PR fixes case augmentation with multi-threaded reading. The solution is to not look at `iterator::pos_` during lazy processing, but rather to pass the position as an argument to the lazy function.
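The idea can be sketched with a hypothetical lazy step that receives the position as an argument, so its result no longer depends on where a shared iterator happens to be when the lambda finally runs. `LazyFn`, `makeCaseAugmenter`, and the uppercase-every-N-lines rule are illustrative only, not Marian's actual augmentation policy:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>

// The position is captured when the line is read and passed in explicitly,
// instead of being read from a shared iterator that another thread may advance.
using LazyFn = std::function<std::string(const std::string& line, std::size_t pos)>;

// Illustrative augmentation: uppercase every `every`-th line, decided by pos.
LazyFn makeCaseAugmenter(std::size_t every) {
  return [every](const std::string& line, std::size_t pos) {
    if(every == 0 || pos % every != 0)
      return line;
    std::string out = line;
    for(char& c : out)
      c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return out;
  };
}
```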
This replaces dense alignment storage and training with a sparse representation. Training speed with guided alignment now nearly matches normal training speed, regaining about 25%.
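A minimal sketch of the dense-to-sparse conversion, assuming a row-major `tgtLen x srcLen` matrix; `AlignPoint` and `toSparse` are hypothetical names, not Marian's actual types:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One nonzero alignment link: source position, target position, weight.
struct AlignPoint {
  std::size_t srcPos;
  std::size_t tgtPos;
  float prob;
};

// Keep only the nonzero cells of a dense (tgtLen x srcLen, row-major) matrix.
std::vector<AlignPoint> toSparse(const std::vector<float>& dense,
                                 std::size_t srcLen, std::size_t tgtLen) {
  std::vector<AlignPoint> sparse;
  for(std::size_t t = 0; t < tgtLen; ++t)
    for(std::size_t s = 0; s < srcLen; ++s) {
      float p = dense[t * srcLen + s];
      if(p != 0.f)
        sparse.push_back({s, t, p});
    }
  return sparse;
}
```

Word alignments typically contain roughly one link per target word, so the sparse form stores on the order of `tgtLen` entries instead of `srcLen * tgtLen`.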
This is no. 1 of 2 PRs. The next one will introduce a new guided-alignment training scheme with better alignment accuracy.
* Add MMAP as an option
* Use io::isBin
* Allow getYamlFromModel from an Item vector
* ScorerWrapper can now load on to a graph from Item vector
The interface IEncoderDecoder can now call graph loads directly from an Item vector.
* Translator loads model before creating scorers
Scorers are created from an Item vector
* Replace model-config try-catch with check using IsNull
* Prefer `empty()` over `size()`
* load by items should be pure virtual
* Stepwise forward load to encdec
* nematus can load from items
* amun can load from items
* loadItems in TranslateService
* Remove logging
* Remove by filename scorer functions
* Replace by filename createScorer
* Explicitly provide default value for get model-mmap
* CLI option for model-mmap only for translation and CPU compile
* Ensure model-mmap option is CPU only
* Remove move on temporary object
* Reinstate log messages for model loading in Amun / Nematus
* Add log messages for model loading in scorers
Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>
macOS is weird: its CPU flags are split across two separate fields returned by the sysctl interface. To get around this, we need to test both of them.
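A portable sketch of the check, operating on the two field strings once they have been read (on macOS, e.g. via `sysctlbyname`). That the two fields are `machdep.cpu.features` and `machdep.cpu.leaf7_features` is an assumption here, and `hasToken` and `cpuHasFlag` are hypothetical helpers:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// True if `flag` appears as a whole whitespace-separated token in `fields`.
bool hasToken(const std::string& fields, const std::string& flag) {
  std::istringstream in(fields);
  std::string token;
  while(in >> token)
    if(token == flag)
      return true;
  return false;
}

// A flag may live in either field (e.g. AVX2 in the leaf-7 list), so check both.
bool cpuHasFlag(const std::string& features, const std::string& leaf7Features,
                const std::string& flag) {
  return hasToken(features, flag) || hasToken(leaf7Features, flag);
}
```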
Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>
* Add GCC 11 support
Some C++ Standard Library headers have been changed to no longer include other headers that they do not need to depend on. As a result, C++ programs that used standard library components without including the right headers will no longer compile.
The following headers are used less widely in libstdc++ and may need to be included explicitly when compiled with GCC 11:
* `<limits>` (for `std::numeric_limits`)
* `<memory>` (for `std::unique_ptr`, `std::shared_ptr` etc.)
* `<utility>` (for `std::pair`, `std::tuple_size`, `std::index_sequence` etc.)
* `<thread>` (for members of namespace `std::this_thread`)
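A small example that includes each header explicitly for the components it uses, instead of relying on transitive includes that GCC 11's libstdc++ no longer provides; `demoExplicitIncludes` is a hypothetical function for illustration:

```cpp
#include <cassert>
#include <chrono>   // std::chrono::milliseconds
#include <limits>   // std::numeric_limits
#include <memory>   // std::unique_ptr
#include <thread>   // std::this_thread::sleep_for
#include <utility>  // std::pair, std::make_pair

// Uses one component from each header listed above.
std::pair<int, int> demoExplicitIncludes() {
  std::this_thread::sleep_for(std::chrono::milliseconds(1));
  std::unique_ptr<int> boxed(new int(42));
  return std::make_pair(std::numeric_limits<int>::max(), *boxed);
}
```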
Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>
* Attempt to validate task alias
* Validate allowed options for --task alias
* Update comment in aliases.cpp
* Show allowed values for alias
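A sketch of validating an alias value and listing the allowed values in the error message. The set below is an illustrative subset, not the full list of `--task` values, and `validateTaskAlias` is a hypothetical name:

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>

// Illustrative subset of allowed --task values; not exhaustive.
const std::set<std::string> kAllowedTasks = {"best-deep", "transformer-base", "transformer-big"};

// Reject unknown values and show the allowed ones (std::set keeps them sorted).
void validateTaskAlias(const std::string& value) {
  if(kAllowedTasks.count(value))
    return;
  std::string allowed;
  for(const std::string& name : kAllowedTasks)
    allowed += (allowed.empty() ? "" : ", ") + name;
  throw std::invalid_argument("Unknown value '" + value + "' for --task; allowed values: " + allowed);
}
```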
Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>
* concatenation combining option added when embedding using factors
* crossMask not used by default
* added an option to better clarify when choosing factor predictor options
* fixed bug when choosing re-embedding option and not setting embedding size
* avoid unnecessary string copy
* Check in factors documentation
* Fix duplication in merge
* Self-referential repository
* change --factors-predictor to --lemma-dependency. Default behaviour changed.
* factor related options are now stored with the model
* Update doc/factors.md
* add backward compatibility for the target factors
* Move backward compatibility checks for factors to happen after the model.npz config is loaded
* Add explicit error msg if using concat on target
* Update func comments. Fix spaces
* Add Marian version requirement
* delete experimental code
Co-authored-by: Pedro Coelho <pedrodiascoelho97@gmail.com>
Co-authored-by: Pedro Coelho <pedro.coelho@unbabel.com>
Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>
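The difference between summing factor embeddings (assumed here to be the existing default) and the new concatenation option can be sketched on plain vectors. `combineEmbeddings` is a hypothetical helper, and the sketch ignores the restriction noted above that concatenation is rejected for target factors:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Vec = std::vector<float>;

// Sum: all embeddings must share one size. Concat: sizes may differ and the
// result size is the total of all sizes.
Vec combineEmbeddings(const Vec& lemma, const std::vector<Vec>& factors, bool concat) {
  Vec out = lemma;
  for(const Vec& f : factors) {
    if(concat) {
      out.insert(out.end(), f.begin(), f.end());
    } else {
      if(f.size() != out.size())
        throw std::invalid_argument("sum combining requires equal embedding sizes");
      for(std::size_t i = 0; i < out.size(); ++i)
        out[i] += f[i];
    }
  }
  return out;
}
```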
This change allows marking SentenceTuples as 'altered' if they were generated or modified internally by data augmentation in a way that impacts processing. In particular, for such sentence tuples we do not want to set guided alignments, since the externally provided guided alignments might no longer be correct after the alteration.
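A minimal sketch of the rule, using a simplified, hypothetical `SentenceTuple` with an `altered` flag rather than Marian's real class:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Links = std::vector<std::pair<std::size_t, std::size_t>>; // (src, tgt) word links

// Hypothetical, simplified sentence tuple.
struct SentenceTuple {
  std::vector<std::string> streams;
  bool altered = false; // set when internal data augmentation changed the tuple
  Links alignment;
};

// Attach external guided alignments only to unaltered tuples, since augmentation
// may have invalidated the provided word positions.
void maybeSetAlignment(SentenceTuple& tuple, Links links) {
  if(!tuple.altered)
    tuple.alignment = std::move(links);
}
```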