Commit Graph

218 Commits

Author SHA1 Message Date
Marcin Junczys-Dowmunt
4d3702c4ec Merged PR 25950: Add missing defaults for concatenated factors
This PR adds missing default values for concatenated factors.
2022-10-06 05:53:16 +00:00
Marcin Junczys-Dowmunt
1e92cff93d Merged PR 25919: Sync with public master - no review required
Sync with public master, checking compilation, regression tests etc.
2022-10-04 00:42:52 +00:00
Marcin Junczys-Dowmunt
2c55cdb3c0 Merged PR 25889: Fixes bad memory access problem in hashing
Fix bad memory access problem in hashing by using the graph allocator
2022-09-29 19:01:49 +00:00
Marcin Junczys-Dowmunt
2cd3055d76 Merged PR 25836: Check via hashing if re-syncing in local mode is required
* This adds GPU-side hashing to tensors (a hash based on MurmurHash3)
* The hash is used to check whether parameters across nodes have diverged; if so, all parameters and optimizer shards are resynced. Previously, a resync happened every N (100 or 200) updates. Now it can be skipped if nothing has diverged.
2022-09-27 18:40:53 +00:00
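The divergence check described in PR 25836 can be sketched roughly as follows. This is a minimal CPU-side illustration, not Marian's implementation: SHA-256 from the standard library stands in for the MurmurHash3-based GPU hash, and the names `tensor_hash` and `needs_resync` are hypothetical.

```python
import hashlib
import struct

def tensor_hash(values):
    """Hash a flat list of floats via their float32 byte representation."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return hashlib.sha256(raw).hexdigest()  # stand-in for a MurmurHash3-based hash

def needs_resync(per_node_params):
    """Return True if any node's parameter hash differs from the first node's."""
    hashes = [tensor_hash(p) for p in per_node_params]
    return any(h != hashes[0] for h in hashes[1:])

# Two nodes holding identical parameters: no resync required.
in_sync = [[0.5, -1.25, 3.0], [0.5, -1.25, 3.0]]
# The second node has drifted in one value: resync required.
diverged = [[0.5, -1.25, 3.0], [0.5, -1.25, 3.25]]

print(needs_resync(in_sync), needs_resync(diverged))
```

Comparing one short hash per node is much cheaper than exchanging the full parameter set, which is why the periodic resync can be skipped when the hashes agree.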
Marcin Junczys-Dowmunt
7d2045a907 Merged PR 25686: Loading checkpoints from main node only via MPI
Enables loading of model checkpoints from main node only via MPI.

Until now the checkpoint needed to be present in the same location on all nodes. That could be done either via writing to a shared filesystem (problematic due to bad syncing) or by manual copying to the same local location, e.g. /tmp on each node (while writing only happened to one main location).

Now, marian can resume training from only one location on the main node. The remaining nodes do not need to have access. E.g. local /tmp on the main node can be used, or race conditions on shared storage are avoided.

Also avoids creating files for logging on more than one node. This is a bit wonky, done via environment variable lookup.
2022-09-21 20:39:54 +00:00
Marcin Junczys-Dowmunt
f3e1efe731 merge with internal master 2022-05-26 06:28:06 -07:00
Marcin Junczys-Dowmunt
1a74358277 Merged PR 23429: Small fixes around fp16 training and batch fitting
This PR introduces small fixes around fp16 training and batch fitting:
* Multi-loss casts type to first loss-type before accumulation (aborted before due to missing cast)
* Throw `ShapeSizeException` if total expanded shape size exceeds numeric capacity of the maximum int value (2^31-1)
* During mini-batch-fitting, catch `ShapeSizeException` and use another sizing hint. Aborts outside mini-batch-fitting.
* Negative `--workspace -N` value allocates workspace as total available GPU memory minus N megabytes.
2022-04-11 20:19:58 +00:00
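The shape-size check and the negative `--workspace` rule from PR 23429 can be illustrated with a small sketch. The exception class here is only a Python stand-in for the `ShapeSizeException` named above, and the helper names `checked_shape_size` and `workspace_mb` are hypothetical.

```python
import math

INT_MAX = 2**31 - 1  # maximum int value quoted in the PR description

class ShapeSizeException(Exception):
    """Python stand-in for the exception named in the PR description."""

def checked_shape_size(dims):
    """Return the number of elements, or raise if it exceeds INT_MAX."""
    size = math.prod(dims)
    if size > INT_MAX:
        raise ShapeSizeException(f"shape {dims} has {size} elements (> {INT_MAX})")
    return size

def workspace_mb(requested_mb, free_gpu_mb):
    """A negative request -N means: all available memory minus N megabytes."""
    if requested_mb < 0:
        return free_gpu_mb + requested_mb
    return requested_mb

print(checked_shape_size([2, 3, 4]))   # small shape, passes the check
print(workspace_mb(-2000, 16000))      # leaves 2000 MB free on a 16000 MB device
```

During mini-batch fitting, a caller would catch the exception and retry with a smaller sizing hint; anywhere else the exception would be allowed to abort.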
Marcin Junczys-Dowmunt
d5c7372a67 Merged PR 23407: Fix incorrect/missing gradient accumulation for affine biases
This PR fixes incorrect/missing gradient accumulation with delay > 1 or large effective batch size of biases of affine operations.
2022-04-08 16:00:04 +00:00
Artur Nowakowski
23c36ec1a3
Fixed fp16 training/inference with factors-combine concat (#926) 2022-03-22 10:07:41 +00:00
Marcin Junczys-Dowmunt
16bfa0c913 Merged PR 23094: Adapt --cost-scaling to more stable setting
This PR sets default parameters for cost-scaling to 8.f 10000 1.f 8.f, i.e. scale by 8 and do not try to automatically scale up or down. This seems more stable than the variable cost-scaling with larger numbers that was the default before.
2022-03-16 14:44:17 +00:00
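Why a fixed scale helps in fp16 can be shown with a minimal sketch. It only illustrates the general idea of cost scaling (multiply before rounding to half precision, divide afterwards in higher precision); the gradient value is made up for the example and `to_fp16` is a hypothetical helper, not part of Marian.

```python
import struct

SCALE = 8.0  # the fixed scaling factor from the PR description

def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

tiny_grad = 2e-8  # illustrative gradient below the fp16 subnormal range

unscaled = to_fp16(tiny_grad)        # underflows to zero in fp16
scaled = to_fp16(tiny_grad * SCALE)  # survives as a subnormal fp16 value
recovered = scaled / SCALE           # unscaled again in higher precision

print(unscaled, scaled, recovered)
```

With a constant factor there is no schedule that can oscillate, which matches the stability argument made in the PR.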
Marcin Junczys-Dowmunt
310d2f42f6 Merged PR 22939: Fix case augmentation with multi-threaded reading
This PR fixes case augmentation with multi-threaded reading. The solution is to not look at iterator::pos_ in lazy processing, rather pass it as an argument to the lazy function.
2022-03-07 16:57:32 +00:00
Graeme Nail
601c9ac980
Detect fortran_order in npz (#911)
* Fix fortran_order parsing
* Abort on non row-major NPZ entries
* Update CHANGELOG
* Update VERSION

Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>
2022-02-15 13:22:49 +00:00
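A rough sketch of the `fortran_order` check from #911, written against the NPY header layout (magic string, version bytes, little-endian header length, then a Python dict literal). The helper names are hypothetical, and `build_npy_header` produces a simplified, unpadded header purely for the example.

```python
import ast
import struct

MAGIC = b"\x93NUMPY"

def build_npy_header(descr, fortran_order, shape):
    """Build a simplified version-1.0 NPY header (no padding, no array data)."""
    meta = {"descr": descr, "fortran_order": fortran_order, "shape": shape}
    header = repr(meta).encode("latin1")
    return MAGIC + bytes([1, 0]) + struct.pack("<H", len(header)) + header

def read_fortran_order(blob):
    """Parse the header dict and return its fortran_order flag."""
    if blob[:6] != MAGIC:
        raise ValueError("not an NPY entry")
    if blob[6] == 1:   # version 1.x: 2-byte header length
        (hlen,) = struct.unpack("<H", blob[8:10])
        start = 10
    else:              # version 2.x and later: 4-byte header length
        (hlen,) = struct.unpack("<I", blob[8:12])
        start = 12
    meta = ast.literal_eval(blob[start:start + hlen].decode("latin1"))
    return bool(meta["fortran_order"])

def require_row_major(blob):
    """Abort (here: raise) on entries that are not stored row-major."""
    if read_fortran_order(blob):
        raise ValueError("column-major (fortran_order) entries are not supported")

row_major = build_npy_header("<f4", False, (2, 3))
col_major = build_npy_header("<f4", True, (2, 3))
require_row_major(row_major)  # passes silently
```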
Nikolay Bogoychev
8a9580b329
update the intgemm version to upstream (#924)
Some data types got upper cased, that's why there is a larger diff than expected

Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>
2022-02-15 11:18:29 +00:00
Marcin Junczys-Dowmunt
b0275e7754 merge with internal master 2022-02-11 06:03:16 -08:00
Marcin Junczys-Dowmunt
4b51dcbd06 Merged PR 22524: Optimize guided alignment training speed via sparse alignments - part 1
This replaces dense alignment storage and training with a sparse representation. Training speed with guided alignment now nearly matches normal training speed, regaining about 25% speed.

This is no. 1 of 2 PRs. The next one will introduce a new guided-alignment training scheme with better alignment accuracy.
2022-02-11 13:50:47 +00:00
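The difference between dense and sparse alignment storage can be sketched as below. This assumes alignments arrive as `src-tgt` index pairs on one line; the function names and the dense layout are illustrative and do not describe Marian's internal data structures.

```python
def parse_sparse(line):
    """Parse 'src-tgt' pairs into a list of (src, tgt) index tuples."""
    return [tuple(int(i) for i in pair.split("-")) for pair in line.split()]

def to_dense(points, src_len, tgt_len):
    """Expand sparse points into a tgt_len x src_len matrix of 0.0 / 1.0."""
    dense = [[0.0] * src_len for _ in range(tgt_len)]
    for src, tgt in points:
        dense[tgt][src] = 1.0
    return dense

points = parse_sparse("0-0 1-2 2-1")
dense = to_dense(points, src_len=3, tgt_len=3)

# The sparse form stores 3 entries; the dense form stores 9 cells.
print(len(points), sum(len(row) for row in dense))
```

The dense form grows with the product of source and target length, while the sparse form grows only with the number of aligned word pairs, which is the kind of saving the PR refers to.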
Marcin Junczys-Dowmunt
3b21ff39c5 update VERSION and CHANGELOG 2022-02-10 08:35:49 -08:00
Marcin Junczys-Dowmunt
b3feecc82b Merged PR 22483: Make C++17 the official standard for Marian
Make C++17 the official standard for Marian
2022-02-10 16:34:23 +00:00
Graeme Nail
4d44627f26
PyYaml safe_load instead of load (#913)
* pyyaml safe_load instead of load
* Update CHANGELOG
2022-02-10 11:20:27 +00:00
Marcin Junczys-Dowmunt
f00d062189 update VERSION and CHANGELOG - Release 1.11.0 2022-02-08 08:40:33 -08:00
Marcin Junczys-Dowmunt
3cf9e83bac resolve conflicts 2022-02-06 12:33:58 -08:00
Graeme Nail
b29cc07a95
Scorer model loading (#860)
* Add MMAP as an option
* Use io::isBin
* Allow getYamlFromModel from an Item vector
* ScorerWrapper can now load on to a graph from Item vector
The interface IEncoderDecoder can now call graph loads directly from an
Item Vector.
* Translator loads model before creating scorers
Scorers are created from an Item vector
* Replace model-config try-catch with check using IsNull
* Prefer empty vs size
* load by items should be pure virtual
* Stepwise forward load to encdec
* nematus can load from items
* amun can load from items
* loadItems in TranslateService
* Remove logging
* Remove by filename scorer functions
* Replace by filename createScorer
* Explicitly provide default value for get model-mmap
* CLI option for model-mmap only for translation and CPU compile
* Ensure model-mmap option is CPU only
* Remove move on temporary object
* Reinstate log messages for model loading in Amun / Nematus
* Add log messages for model loading in scorers

Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>
2022-01-18 12:58:52 +00:00
Nikolay Bogoychev
e26e5b6faf
Use apple accelerate on MacOs by default (#897) 2021-12-16 15:07:34 +00:00
Nikolay Bogoychev
e8a1a2530f
Fix AVX2+ detection on Mac (#895)
MacOS is weird and its CPU flags are separated in two separate fields returned by the sysctl interface. To get around this, we need to test both of them, so here goes

Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>
2021-12-07 17:47:33 +00:00
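The two-field check described in #895 amounts to looking for a flag in the union of both sysctl results. In this sketch the two strings are illustrative sample values rather than real sysctl output, and `has_flag` is a hypothetical helper.

```python
def has_flag(flag, *fields):
    """Return True if the flag appears as a token in any of the given fields."""
    tokens = set()
    for field in fields:
        tokens.update(field.upper().split())
    return flag.upper() in tokens

# Illustrative sample values for the two separate fields.
basic_features = "FPU VME SSE4.2 AVX1.0"
extended_features = "BMI1 AVX2 BMI2"

print(has_flag("AVX2", basic_features))                     # first field only
print(has_flag("AVX2", basic_features, extended_features))  # both fields
```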
Marcin Junczys-Dowmunt
bbc673c50f update CHANGELOG and VERSION 2021-11-24 18:42:14 -08:00
Nikolay Bogoychev
ab6b826083
Add GCC 11 support (#888)
* Add GCC 11 support

Some C++ Standard Library headers have been changed to no longer include other headers that they don't need to depend on. As such, C++ programs that used standard library components without including the right headers will no longer compile.
The following headers are used less widely in libstdc++ and may need to be included explicitly when compiled with GCC 11:

<limits> (for std::numeric_limits)
<memory> (for std::unique_ptr, std::shared_ptr etc.)
<utility> (for std::pair, std::tuple_size, std::index_sequence etc.)
<thread> (for members of namespace std::this_thread.)

Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>
2021-11-23 10:13:29 +00:00
Nikolay Bogoychev
1adf80b7c9
Task alias validation during training mode (#886)
* Attempt to validate task alias
* Validate allowed options for --task alias
* Update comment in aliases.cpp
* Show allowed values for alias

Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>
2021-11-22 19:19:58 +00:00
David Meikle
3b4e943cda
Added pragma to ignore unused-private-field error on elementType_ which failed in macOS (#872)
Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>
2021-11-22 12:22:06 +00:00
Kenneth Heafield
4dd30b5065
Factor concatenation improvements and documentation (#748)
* concatenation combining option added when embedding using factors
* crossMask not used by default
* added an option to better clarify when choosing factor predictor options
* fixed bug when choosing re-embedding option and not setting embedding size
* avoid unnecessary string copy
* Check in factors documentation
* Fix duplication in merge
* Self-referential repository
* change --factors-predictor to --lemma-dependency. Default behaviour changed.
* factor related options are now stored with the model
* Update doc/factors.md
* add backward compatibility for the target factors
* Move backward compatibility checks for factors to happen after the model.npz config is loaded
* Add explicit error msg if using concat on target
* Update func comments. Fix spaces
* Add Marian version requirement
* delete experimental code

Co-authored-by: Pedro Coelho <pedrodiascoelho97@gmail.com>
Co-authored-by: Pedro Coelho <pedro.coelho@unbabel.com>
Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>
2021-09-08 14:02:21 +01:00
Rohit Jain
056c4bef5b Merged PR 19860: Case augmented data, if not using factored vocab must not set guided alignments
This change allows marking SentenceTuples as 'altered', if they were generated or modified by data augmentation internally in such a way so as to impact processing. In particular, for such sentence tuples, we do not want to try setting guided alignments if the externally provided guided alignments might no longer be correct after that alteration.
2021-07-17 23:03:16 +00:00
Martin Junczys-Dowmunt
8e88071ae8 Merged PR 19842: Adapt LSH to work with Leaf
Small changes to make the LSH work with Leaf server and QuickSand.
2021-07-16 20:04:16 +00:00
Qianqian Zhu
42f0b8b74b
Binary shortlist (#856)
Co-authored-by: Kenneth Heafield <github@kheafield.com>
2021-07-10 22:56:58 -07:00
Marcin Junczys-Dowmunt
3a478fc47d update version and changelog 2021-07-09 13:46:18 -07:00
Martin Junczys-Dowmunt
fc0f41f24a Merged PR 19597: Enable mpi wrapper to use size larger than MAX_INT
Enable mpi wrapper to use size larger than MAX_INT.
2021-06-28 23:15:23 +00:00
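One common way to move more than MAX_INT elements through an interface whose count argument is a 32-bit int is to split the transfer into chunks. The PR description does not say how the wrapper does it, so the sketch below is an assumption about the general technique, with a hypothetical `chunk_ranges` helper.

```python
INT_MAX = 2**31 - 1

def chunk_ranges(total, limit=INT_MAX):
    """Yield (offset, count) pairs covering `total` items with count <= limit."""
    offset = 0
    while offset < total:
        count = min(limit, total - offset)
        yield offset, count
        offset += count

# Small limit to make the chunking visible.
small = list(chunk_ranges(10, limit=4))
# A transfer larger than INT_MAX needs more than one call.
large = list(chunk_ranges(5_000_000_000))

print(small, len(large))
```

Each `(offset, count)` pair would correspond to one call of the underlying communication routine on a slice of the buffer.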
Roman Grundkiewicz
6e87f16e48 Merged PR 18763: Fix adding new validation metrics with --valid-reset-stalled
This fixes a bug that's been discovered recently by checking if a validator exists before resetting its stalled validations.
Regression test for it is in: https://github.com/marian-nmt/marian-regression-tests/pull/80
2021-05-26 06:12:33 +00:00
Marcin Junczys-Dowmunt
3133a9b27b resolve conflict 2021-05-24 11:19:20 -07:00
Marcin Junczys-Dowmunt
84a20f65a1 Merge branch 'master' into pmaster 2021-05-24 11:17:53 -07:00
Marcin Junczys-Dowmunt
8b818b7c07 Avoid Ampere misalignment issue 2021-05-17 13:25:13 -07:00
Nikolay Bogoychev
379212b75c
Enable compute86 where supported (#863)
* Enable compute86 where supported
2021-05-04 12:36:10 +01:00
Kenneth Heafield
36b4b69d7b
Remove unused memoized_ variable (#852) 2021-04-28 13:28:50 +01:00
Roman Grundkiewicz
49e379bba5 Merged PR 18612: Early stopping on first, all, or any validation metrics
Adds `--early-stopping-on first|all|any`, which controls whether early stopping takes into account only the first, all, or any of the validation metrics.

Feature request: https://github.com/marian-nmt/marian-dev/issues/850
Regression tests: https://github.com/marian-nmt/marian-regression-tests/pull/79
2021-04-26 11:51:43 +00:00
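The three modes of `--early-stopping-on` can be read as the following decision rule. This is a sketch of the semantics as described above, assuming each validation metric keeps a count of stalled validations that is compared against a patience value; `should_stop` and its arguments are hypothetical names.

```python
def should_stop(stalled, patience, mode="first"):
    """Decide on early stopping from per-metric stalled-validation counts."""
    exceeded = [count >= patience for count in stalled]
    if mode == "first":
        return exceeded[0]    # only the first metric matters
    if mode == "all":
        return all(exceeded)  # every metric must have stalled
    if mode == "any":
        return any(exceeded)  # one stalled metric is enough
    raise ValueError(f"unknown mode: {mode}")

# First metric still improving, second metric stalled for 5 validations.
stalled = [0, 5]
for mode in ("first", "all", "any"):
    print(mode, should_stop(stalled, patience=5, mode=mode))
```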
Marcin Junczys-Dowmunt
309bd748ab Merge branch 'master' of github.com:marian-nmt/marian-dev into pmaster 2021-04-21 05:13:58 +00:00
Marcin Junczys-Dowmunt
3e51ff3872 fix depth-scaling in FFN 2021-04-20 15:50:53 +00:00
Kenneth Heafield
bb6092da2b
Compute tensor size using integers (#851) 2021-04-14 08:48:51 -07:00
Marcin Junczys-Dowmunt
ed29048004 Merge branch 'master' of vs-ssh.visualstudio.com:v3/machinetranslation/Marian/marian-dev 2021-04-11 04:29:46 +00:00
Marcin Junczys-Dowmunt
ea55722372 Merge branch 'pmaster' 2021-04-11 04:29:17 +00:00
huangjq0617
a7c3a0b2ef
fix beam_search ABORT when enable openmp and OMP_NUM_THREADS > 1 (#767) 2021-04-10 21:28:04 -07:00
Martin Junczys-Dowmunt
caddad90cd Merged PR 18505: RMSNorm on GPU
Support for RMSNorm as drop-in replacement for LayerNorm from _Biao Zhang; Rico Sennrich (2019). Root Mean Square Layer Normalization_. Enabled in Transformer model via `--transformer-postprocess dar` instead of `dan`.
2021-04-10 15:28:38 +00:00
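For reference, the difference between RMSNorm and LayerNorm can be sketched in plain Python. This follows the formulation in the cited paper (divide by the root mean square, no mean subtraction and no bias); the placement of `eps` and the function names are assumptions, and this is unrelated to the GPU kernel added in the PR.

```python
import math

def rms_norm(x, gamma, eps=1e-9):
    """Scale x by the inverse root mean square, then by the gain gamma."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for v, g in zip(x, gamma)]

def layer_norm(x, gamma, beta, eps=1e-9):
    """Subtract the mean, divide by the standard deviation, apply gain and bias."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [g * (v - mean) / math.sqrt(var + eps) + b
            for v, g, b in zip(x, gamma, beta)]

x = [3.0, 4.0]
y_rms = rms_norm(x, gamma=[1.0, 1.0])
y_ln = layer_norm(x, gamma=[1.0, 1.0], beta=[0.0, 0.0])

print(y_rms, y_ln)
```

RMSNorm skips the mean computation and the bias term, so it needs fewer reductions per vector than LayerNorm.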
Marcin Junczys-Dowmunt
6435c6f1ce synced with public master 2021-04-09 16:12:34 +00:00
Marcin Junczys-Dowmunt
be65065623
Allow choosing fine-grained CPU intrinsics as CMake options (#849)
* allow choosing fine-grained CPU intrinsics as CMake options
* inform user that e.g. -DCOMPILE_AVX2=off will be ignored with -march=native if there is compiler support
2021-04-09 09:02:34 -07:00
rhenry-nv
fddd0e0661
Adds better Affine support for GPUs when using CUDA 11. Introduces a new bias addition kernel for CUDA < 11 (#778)
Co-authored-by: Marcin Junczys-Dowmunt <marcinjd@microsoft.com>
2021-04-08 21:46:27 -07:00