Commit Graph

3992 Commits

Author SHA1 Message Date
Marcin Junczys-Dowmunt
4ffd292881 Merge branch 'master' into pmaster 2023-02-20 12:15:33 -08:00
Marcin Junczys-Dowmunt
031dbb3266 Merged PR 27804: Fallback to old LSH code for MSVC due to bad loop unrolling
The Visual Studio compiler optimizes and unrolls loops less effectively than gcc, which results in much slower LSH code, since that code was written to explicitly take advantage of loop unrolling at compile time.

Added an #ifdef to fall back to old LSH code on MSVC.
2023-02-13 15:44:19 +00:00
Marcin Junczys-Dowmunt
9ad5203ca2 Merged PR 26476: Sanitize guided-alignment with case-augmentation (still somewhat wonky)
This fixes the blow-ups when using case-augmentation with guided-alignment. However, this particular combination is still not recommended; results will be unreliable.
2023-02-11 16:35:29 +00:00
Varun Mathur
4f145c450f Merged PR 26311: [FSM] make model loading lock non-static
make lock non-static
2023-02-10 16:34:37 +00:00
Roman Grundkiewicz
ee50d4aaea Merged PR 27051: Add an option for completely resetting validation metrics
Added `--valid-reset-all`, which works like `--valid-reset-stalled` but also resets the last best saved validation metrics. This is useful when the validation sets change for continued training.

Added new regression test: https://github.com/marian-nmt/marian-regression-tests/pull/89
2022-12-20 17:56:10 +00:00
dependabot[bot]
36349645b8
Bump src/3rd_party/sentencepiece from 31ac8e8 to 8dc9172 (#970)
Bumps [src/3rd_party/sentencepiece](https://github.com/marian-nmt/sentencepiece) from `31ac8e8` to `8dc9172`.
- [Release notes](https://github.com/marian-nmt/sentencepiece/releases)
- [Commits](31ac8e8876...8dc9172f88)

---
updated-dependencies:
- dependency-name: src/3rd_party/sentencepiece
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-11-19 08:36:06 +00:00
Nikolay Bogoychev
07a2ac8126
best-deep alias broken (#968)
The best-deep alias in marian is currently broken, because it doesn't set the model type and the default is `amun`, which is incompatible with multiple layers. This commit just adds the type to the best-deep alias entry.
2022-11-02 11:16:14 +00:00
Marcin Junczys-Dowmunt
4d3702c4ec Merged PR 25950: Add missing defaults for concatenated factors
This PR adds missing default values for concatenated factors.
2022-10-06 05:53:16 +00:00
Marcin Junczys-Dowmunt
1e92cff93d Merged PR 25919: Sync with public master - no review required
Sync with public master, checking compilation, regression tests etc.
2022-10-04 00:42:52 +00:00
Marcin Junczys-Dowmunt
2c55cdb3c0 Merged PR 25889: Fixes bad memory access problem in hashing
Fix bad memory access problem in hashing by using the graph allocator
2022-09-29 19:01:49 +00:00
Marcin Junczys-Dowmunt
2cd3055d76 Merged PR 25836: Check via hashing if re-syncing in local mode is required
* This adds GPU-side hashing to tensors (a hash based on MurmurHash3)
* The hash is used to check if parameters across nodes have diverged; if they have, all parameters and optimizer shards are resynced. Previously, a resync happened every N (100 or 200) updates. Now it can be skipped if nothing diverged.
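
The divergence check described above can be sketched roughly as follows. This is a minimal CPU-side illustration, not the code from the PR: it uses `zlib.crc32` from the Python standard library as a stand-in for the MurmurHash3-based GPU hash, and the function names `tensor_hash` and `needs_resync` are hypothetical.

```python
import struct
import zlib

def tensor_hash(values):
    """Hash a flat list of floats via their float32 byte representation.
    crc32 is only a stand-in here for the MurmurHash3-based hash in the PR."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return zlib.crc32(raw)

def needs_resync(per_node_params):
    """Return True if any node's parameter hash differs from the first node's."""
    hashes = [tensor_hash(p) for p in per_node_params]
    return any(h != hashes[0] for h in hashes[1:])

# Two nodes with identical parameters: the resync can be skipped.
in_sync = [[0.1, 0.2, 0.3], [0.1, 0.2, 0.3]]
# One node has drifted slightly: a resync is required.
diverged = [[0.1, 0.2, 0.3], [0.1, 0.2, 0.30001]]

print(needs_resync(in_sync))
print(needs_resync(diverged))
```

Comparing one small hash per node avoids exchanging the full parameter tensors just to find out that nothing changed.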
2022-09-27 18:40:53 +00:00
Marcin Junczys-Dowmunt
1f2929d528 Merged PR 25733: Fused inplace ReLU and Dropout in transformer FFN layer
* First attempt at fused inplace ReLU and Dropout in transformer FFN layer
* Adds optional output projection to SSRU.

For large FFN blocks with dropout, this gives about a 20-25% speed improvement during training.
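
As a rough illustration of what "fused" and "inplace" mean here, the sketch below applies ReLU and (inverted) dropout in a single pass over a buffer instead of two passes with an intermediate result. It is plain Python over a list, assumes `0 <= drop_prob < 1`, and the function name is hypothetical; the actual change operates on GPU tensors.

```python
import random

def relu_dropout_inplace(x, drop_prob, rng):
    """Apply ReLU and inverted dropout to the list x in one in-place pass."""
    keep_prob = 1.0 - drop_prob
    for i, v in enumerate(x):
        if v <= 0.0 or rng.random() < drop_prob:
            x[i] = 0.0            # clipped by ReLU or dropped
        else:
            x[i] = v / keep_prob  # rescale survivors so the expected value is unchanged
    return x

rng = random.Random(1234)
activations = [-1.0, 2.0, 0.0, 3.5]
relu_dropout_inplace(activations, drop_prob=0.5, rng=rng)
print(activations)
```

Doing both operations in one pass touches each element once and needs no temporary buffer, which is where a fused kernel saves time and memory.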
2022-09-26 20:17:33 +00:00
Marcin Junczys-Dowmunt
7d2045a907 Merged PR 25686: Loading checkpoints from main node only via MPI
Enables loading of model checkpoints from main node only via MPI.

Until now the checkpoint needed to be present in the same location on all nodes. That could be done either via writing to a shared filesystem (problematic due to bad syncing) or by manual copying to the same local location, e.g. /tmp on each node (while writing only happened to one main location).

Now, marian can resume training from only one location on the main node. The remaining nodes do not need to have access. E.g. local /tmp on the main node can be used, and race conditions on shared storage are avoided.

Also avoids creating files for logging on more than one node. This is a bit wonky, done via environment variable lookup.
2022-09-21 20:39:54 +00:00
Marcin Junczys-Dowmunt
76964791ad Merged PR 23767: More principled sampling and force-decoding
This PR adds correct force-decoding and more principled sampling. Both should now work with ensembles, batches, and beam search.
2022-09-16 22:53:08 +00:00
Roman Grundkiewicz
a47912d9f1 Merged PR 25518: Upgrade Azure Pipelines to macos-12
macos-10.15 will become unsupported in December 2022. Changes:
* Upgrade Azure DevOps to macos-12
* Pull https://github.com/marian-nmt/sentencepiece/pull/14
* Fix clang 13 errors as in https://github.com/marian-nmt/marian-dev/pull/939
2022-09-15 06:18:42 +00:00
Marcin Junczys-Dowmunt
042ed8f2e2 Merged PR 24072: Revert changes to transformer caching
This PR reverts changes to transformer caching (public PR https://github.com/marian-nmt/marian-dev/pull/881)

It seems to cause catastrophic memory leaks or incorrect de-allocation during decoding.
2022-05-30 07:27:15 +00:00
Marcin Junczys-Dowmunt
f3e1efe731 merge with internal master 2022-05-26 06:28:06 -07:00
Marcin Junczys-Dowmunt
e4f3d0f740 add fallback option for sampling, for back-compat 2022-05-09 13:28:28 -07:00
Marcin Junczys-Dowmunt
1a74358277 Merged PR 23429: Small fixes around fp16 training and batch fitting
This PR introduces small fixes around fp16 training and batch fitting:
* Multi-loss casts type to first loss-type before accumulation (aborted before due to missing cast)
* Throw `ShapeSizeException` if total expanded shape size exceeds numeric capacity of the maximum int value (2^31-1)
* During mini-batch-fitting, catch `ShapeSizeException` and use another sizing hint. Aborts outside mini-batch-fitting.
* Negative `--workspace -N` value allocates workspace as total available GPU memory minus N megabytes.
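
The shape-size check and the negative workspace rule from the list above can be sketched as follows. This is an illustration under assumptions: `ShapeSizeException` is re-declared here as a plain Python exception, and `checked_elements`, `fits`, and `workspace_mb` are hypothetical helper names, not Marian functions.

```python
INT32_MAX = 2**31 - 1

class ShapeSizeException(Exception):
    """Stand-in for the exception named in the commit message."""

def checked_elements(shape):
    """Return the total element count, raising if it exceeds 2^31-1."""
    total = 1
    for dim in shape:
        total *= dim
    if total > INT32_MAX:
        raise ShapeSizeException(f"shape {shape} has {total} elements")
    return total

def fits(shape):
    """During batch fitting, treat an oversized shape as 'does not fit'
    so that another sizing hint can be tried instead of aborting."""
    try:
        checked_elements(shape)
        return True
    except ShapeSizeException:
        return False

def workspace_mb(requested_mb, available_mb):
    """A negative request -N means 'all available memory minus N megabytes'."""
    if requested_mb < 0:
        return available_mb + requested_mb
    return requested_mb

print(fits((64, 512, 1024)))
print(fits((65536, 65536)))
print(workspace_mb(-2000, 16000))
```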
2022-04-11 20:19:58 +00:00
Marcin Junczys-Dowmunt
d5c7372a67 Merged PR 23407: Fix incorrect/missing gradient accumulation for affine biases
This PR fixes incorrect/missing gradient accumulation for the biases of affine operations when training with delay > 1 or a large effective batch size.
2022-04-08 16:00:04 +00:00
Artur Nowakowski
23c36ec1a3
Fixed fp16 training/inference with factors-combine concat (#926) 2022-03-22 10:07:41 +00:00
dependabot[bot]
78bef7aeba
Bump src/3rd_party/sentencepiece from c307b87 to 5312a30 (#927)
Bumps [src/3rd_party/sentencepiece](https://github.com/marian-nmt/sentencepiece) from `c307b87` to `5312a30`.
- [Release notes](https://github.com/marian-nmt/sentencepiece/releases)
- [Commits](c307b874de...5312a306c4)

---
updated-dependencies:
- dependency-name: src/3rd_party/sentencepiece
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-03-22 10:06:11 +00:00
Marcin Junczys-Dowmunt
16bfa0c913 Merged PR 23094: Adapt --cost-scaling to more stable setting
This PR sets the default parameters for cost-scaling to 8.f 10000 1.f 8.f, i.e. costs are scaled by a constant factor of 8 with no attempt to automatically scale up or down. This seems more stable than the variable cost-scaling with larger numbers that was the default before.
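
To show why a constant scale of 8 can matter in fp16 training, here is a small sketch that simulates half-precision rounding with the standard library's `struct` format `"e"`. It only illustrates the general cost-scaling idea (multiply the cost before the backward pass, divide the gradient afterwards); the helper names are hypothetical and nothing here is Marian code.

```python
import struct

def to_fp16(x):
    """Round a Python float to IEEE half precision and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

COST_SCALE = 8.0  # constant factor from the commit message; never adjusted

def fp16_gradient(true_grad, scale):
    """Simulate a gradient computed in fp16 from a cost multiplied by `scale`,
    then unscaled in full precision before the optimizer update."""
    scaled = to_fp16(true_grad * scale)
    return scaled / scale

tiny = 1e-8                                # below the smallest fp16 subnormal
unscaled = fp16_gradient(tiny, 1.0)        # underflows to zero
scaled = fp16_gradient(tiny, COST_SCALE)   # survives as a subnormal value

print(unscaled, scaled)
```

The surviving value is coarse, but a non-zero gradient still contributes to the update, whereas an underflowed one is lost entirely.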
2022-03-16 14:44:17 +00:00
Marcin Junczys-Dowmunt
310d2f42f6 Merged PR 22939: Fix case augmentation with multi-threaded reading
This PR fixes case augmentation with multi-threaded reading. The solution is to not look at iterator::pos_ in lazy processing, rather pass it as an argument to the lazy function.
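
The difference between reading a shared position at evaluation time and passing it in as an argument can be sketched like this. The example is single-threaded and only mimics the effect of deferred evaluation; `state`, `make_lazy_bad`, and `make_lazy_good` are hypothetical names, not the identifiers used in Marian.

```python
def make_lazy_bad(reader_state, line):
    # Reads the shared cursor when evaluated; it may have moved by then.
    return lambda: (reader_state["pos"], line)

def make_lazy_good(line):
    # The position is supplied by the caller as an argument.
    return lambda pos: (pos, line)

lines = ["a", "b", "c"]
state = {"pos": 0}
bad, good = [], []
for line in lines:
    bad.append(make_lazy_bad(state, line))
    good.append((state["pos"], make_lazy_good(line)))  # remember the position now
    state["pos"] += 1

# Evaluation happens later, after the reader has advanced.
bad_results = [f() for f in bad]
good_results = [f(pos) for pos, f in good]

print(bad_results)
print(good_results)
```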
2022-03-07 16:57:32 +00:00
Marcin Junczys-Dowmunt
adaaf087e4 better error message 2022-02-16 13:20:48 -08:00
Graeme Nail
601c9ac980
Detect fortran_order in npz (#911)
* Fix fortran_order parsing
* Abort on non row-major NPZ entries
* Update CHANGELOG
* Update VERSION
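
For context, each array in an NPZ archive is an NPY entry whose header is a Python-literal dict containing a `fortran_order` key. The sketch below parses that header with the standard library and rejects column-major entries, in the spirit of the change above. It is not the C++ code from the PR; `read_npy_header`, `require_row_major`, and `fake_npy` are hypothetical helpers, and only format versions 1.x and 2.x are handled.

```python
import ast
import struct

MAGIC = b"\x93NUMPY"

def read_npy_header(data):
    """Parse the header dict of a version 1.x or 2.x NPY byte string."""
    if data[:6] != MAGIC:
        raise ValueError("not an NPY entry")
    major = data[6]
    if major == 1:
        (hlen,) = struct.unpack("<H", data[8:10])   # 2-byte header length
        start = 10
    else:
        (hlen,) = struct.unpack("<I", data[8:12])   # 4-byte header length
        start = 12
    return ast.literal_eval(data[start:start + hlen].decode("latin1"))

def require_row_major(data):
    """Abort on column-major entries instead of silently misreading them."""
    header = read_npy_header(data)
    if header["fortran_order"]:
        raise ValueError("column-major (fortran_order=True) entries are not supported")
    return header

def fake_npy(fortran_order):
    """Build a header-only NPY byte string for demonstration purposes."""
    text = repr({"descr": "<f4", "fortran_order": fortran_order, "shape": (2, 3)})
    header = text.encode("latin1") + b"\n"
    return MAGIC + bytes([1, 0]) + struct.pack("<H", len(header)) + header

print(require_row_major(fake_npy(False))["shape"])
```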

Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>
2022-02-15 13:22:49 +00:00
Nikolay Bogoychev
8a9580b329
update the intgemm version to upstream (#924)
Some data types got upper cased, that's why there is a larger diff than expected

Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>
2022-02-15 11:18:29 +00:00
Marcin Junczys-Dowmunt
4b51dcbd06 Merged PR 22524: Optimize guided alignment training speed via sparse alignments - part 1
This replaces dense alignment storage and training with a sparse representation. Training speed with guided alignment now nearly matches normal training speed, regaining about 25% speed.
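
The idea behind a sparse alignment representation can be sketched as follows: only the non-zero alignment points are stored and visited when computing the cost. The cross-entropy-style cost, the `eps` smoothing term, and all function names are assumptions made for this illustration, not details taken from the PR.

```python
import math

def dense_to_sparse(dense):
    """Keep only non-zero alignment points as (target_pos, source_pos, weight)."""
    return [(t, s, w)
            for t, row in enumerate(dense)
            for s, w in enumerate(row)
            if w != 0.0]

def cost_dense(dense, attention, eps=1e-6):
    """Visit every cell of the target-by-source matrix."""
    return -sum(w * math.log(attention[t][s] + eps)
                for t, row in enumerate(dense)
                for s, w in enumerate(row))

def cost_sparse(sparse, attention, eps=1e-6):
    """Visit only the stored alignment points."""
    return -sum(w * math.log(attention[t][s] + eps) for t, s, w in sparse)

dense = [[1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0]]
attention = [[0.7, 0.2, 0.1],
             [0.1, 0.3, 0.6]]
sparse = dense_to_sparse(dense)

print(sparse)
print(math.isclose(cost_dense(dense, attention), cost_sparse(sparse, attention)))
```

Since a word alignment has only a few points per target word, the sparse form does work proportional to the number of points rather than to the full matrix size.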

This is no. 1 of 2 PRs. The next one will introduce a new guided-alignment training scheme with better alignment accuracy.
2022-02-11 13:50:47 +00:00
Marcin Junczys-Dowmunt
b3feecc82b Merged PR 22483: Make C++17 the official standard for Marian
Make C++17 the official standard for Marian
2022-02-10 16:34:23 +00:00
Marcin Junczys-Dowmunt
e6dbacb310 Merged PR 22490: Faster LSH top-k for CPU
This PR replaces the top-k search from FAISS on the CPU with a more specialized version for discrete distances in sub-linear time.
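
One way discrete distances can be exploited for top-k search is bucket selection: with Hamming distances over a fixed number of bits, candidates can be grouped by distance and read off in order without a general sort. The sketch below shows only that idea. It is not the algorithm from the PR and makes no claim about its complexity; all names are hypothetical.

```python
def hamming(a, b):
    """Number of differing bits between two integer hash codes."""
    return bin(a ^ b).count("1")

def top_k_by_buckets(query, codes, k, bits):
    """Return indices of the k codes closest to query in Hamming distance."""
    if k <= 0:
        return []
    # One bucket per possible distance value 0..bits.
    buckets = [[] for _ in range(bits + 1)]
    for idx, code in enumerate(codes):
        buckets[hamming(query, code)].append(idx)
    result = []
    for bucket in buckets:          # nearest buckets first
        for idx in bucket:
            result.append(idx)
            if len(result) == k:
                return result
    return result

codes = [0b1111, 0b0001, 0b0000, 0b0011]
print(top_k_by_buckets(0b0000, codes, k=2, bits=4))
```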
2022-02-10 16:30:21 +00:00
Marcin Junczys-Dowmunt
05ba9e4c31
add -DDETERMINISTIC=ON/OFF flag (#912)
* Add -DDETERMINISTIC=ON/OFF flag to CMake
* Use -DDETERMINISTIC=on in GitHub/Azure workflows

Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>
2022-02-08 10:57:20 +00:00
Marcin Junczys-Dowmunt
a365bb5ce9 fix server behaviour 2022-02-07 08:09:54 -08:00
Marcin Junczys-Dowmunt
3cf9e83bac resolve conflicts 2022-02-06 12:33:58 -08:00
Marcin Junczys-Dowmunt
8da539e835 merged with master 2022-02-06 12:00:48 -08:00
Roman Grundkiewicz
266b931daa
Update list of contributors (#906) 2022-01-30 20:11:38 +00:00
Roman Grundkiewicz
07c39c7d76
Cherry picked cleaning/refactoring patches (#905)
Cherry-picked updates from pull request #457

Co-authored-by: Mateusz Chudyk <mateuszchudyk@gmail.com>
2022-01-28 14:16:41 +00:00
Qianqian Zhu
71b5454b9e
Layer documentation (#892)
* More examples for MLP layers and docs about RNN layers
* Docs about embedding layer and more doxygen code docs
* Add layer and factors docs into index.rst
* Update layer documentation
* Fix typos

Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>
Co-authored-by: Graeme Nail <graemenail.work@gmail.com>
2022-01-26 15:17:38 +00:00
Graeme Nail
894a07ad5b
Improve checks on transformer cache (#881)
* Fix caching in transformer attention
* Move hash specialization
* Swap comments to doxygen
* Include string header
2022-01-24 15:28:13 +00:00
Graeme Nail
b29cc07a95
Scorer model loading (#860)
* Add MMAP as an option
* Use io::isBin
* Allow getYamlFromModel from an Item vector
* ScorerWrapper can now load on to a graph from Item vector
The interface IEncoderDecoder can now call graph loads directly from an Item vector.
* Translator loads model before creating scorers
Scorers are created from an Item vector
* Replace model-config try-catch with check using IsNull
* Prefer empty vs size
* load by items should be pure virtual
* Stepwise forward load to encdec
* nematus can load from items
* amun can load from items
* loadItems in TranslateService
* Remove logging
* Remove by filename scorer functions
* Replace by filename createScorer
* Explicitly provide default value for get model-mmap
* CLI option for model-mmap only for translation and CPU compile
* Ensure model-mmap option is CPU only
* Remove move on temporary object
* Reinstate log messages for model loading in Amun / Nematus
* Add log messages for model loading in scorers

Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>
2022-01-18 12:58:52 +00:00
Qianqian Zhu
cd9afea8d3
Documentation about how to write code documentation (#891)
* add initial guidelines of code documentation
* fix math formula not displayed in Sphinx
* remove @name tags which cannot be extracted by exhale and cause function signature errors
* fix markdown ref warning and update markdown parser in sphinx
* more about doxygen: add Doxygen commands and math formulas
* move code doc guide to a new .rst file
* add formula image
* Set myst-parser version appropriate for the requested sphinx version
* Update documentation on how to write Doxygen comments
* Add new section to the documentation index
* Sphinx 2.4.4 requires myst-parser 0.14
* complete code doc guide and small fixes on reStructuredText formats
* More about reStructuredText
* Update badges on the documentation frontpage

Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>
2021-12-07 15:10:46 +00:00
Marcin Junczys-Dowmunt
e8ea37cd5b Merged PR 21648: Allow for dynamic gradient scaling to fade out after N updates
Allow for dynamic gradient scaling to fade out after N updates
2021-12-06 23:20:44 +00:00
Marcin Junczys-Dowmunt
8b8d1b11e2 Merged PR 21553: Parallelize data reading for training
This parallelizes data reading. On very fast GPUs and with small models training speed can be starved by too slow batch creation. Use --data-threads 8 or more, by default currently set to 1 for backcompat.
2021-11-25 02:33:49 +00:00
Nikolay Bogoychev
ab6b826083
Add GCC 11 support (#888)
* Add GCC 11 support

Some C++ Standard Library headers have been changed to no longer include other headers that they do not need to depend on. As such, C++ programs that used standard library components without including the right headers will no longer compile.
The following headers are used less widely in libstdc++ and may need to be included explicitly when compiled with GCC 11:

<limits> (for std::numeric_limits)
<memory> (for std::unique_ptr, std::shared_ptr etc.)
<utility> (for std::pair, std::tuple_size, std::index_sequence etc.)
<thread> (for members of namespace std::this_thread.)

Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>
2021-11-23 10:13:29 +00:00
Nikolay Bogoychev
1adf80b7c9
Task alias validation during training mode (#886)
* Attempt to validate task alias
* Validate allowed options for --task alias
* Update comment in aliases.cpp
* Show allowed values for alias

Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>
2021-11-22 19:19:58 +00:00
David Meikle
3b4e943cda
Added pragma to ignore unused-private-field error on elementType_ which failed in macOS (#872)
Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>
2021-11-22 12:22:06 +00:00
Marcin Junczys-Dowmunt
c85d060848 Merged PR 20729: Add top-k sampling
This adds Top-K sampling to Marian and extends the --output-sampling option to take arguments
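
Top-K sampling in general restricts sampling to the K highest-scoring candidates and renormalizes over them. A minimal sketch of the technique follows; the function name and the `temperature` parameter are assumptions for illustration and say nothing about which arguments `--output-sampling` actually accepts.

```python
import math
import random

def top_k_sample(logits, k, rng, temperature=1.0):
    """Sample an index from the k highest-scoring entries of logits."""
    # Indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    scaled = [logits[i] / temperature for i in top]
    # Softmax over the kept entries only (shifted by the max for stability).
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]

rng = random.Random(42)
logits = [0.1, 2.5, -1.0, 1.7, 0.3]
samples = [top_k_sample(logits, k=2, rng=rng) for _ in range(10)]
print(samples)  # only indices 1 and 3 can appear
```

With `k=1` this reduces to greedy selection of the best-scoring entry.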
2021-11-22 03:32:54 +00:00
Marcin Junczys-Dowmunt
1404201926 Merged PR 21151: Cleaning up fp16 behavior
This PR improves clipping and pruning behavior of NaNs and Infs during fp16 training, ultimately avoiding the underflow problems that we were facing so far.
2021-10-26 20:25:39 +00:00
Hieu Hoang
2d79ad02bb Merged PR 20933: beam & batch works for non-factored models 2021-10-13 20:20:14 +00:00
Marcin Junczys-Dowmunt
03fe175876 Merged PR 20879: Adjustable ffn width and depth in transformer decoder 2021-09-28 17:19:07 +00:00
Marcin Junczys-Dowmunt
d796a3c3b7 Merged PR 20839: Do not ignore ignoreEOS for spm decoding
With final space this eliminates trailing whitespace caused by appending EOS
2021-09-28 17:17:12 +00:00