Commit Graph

143 Commits

Author SHA1 Message Date
Marcin Junczys-Dowmunt
65bf82ffce
version 1.12.0 (#980) 2023-02-21 17:56:29 +00:00
Roman Grundkiewicz
ee50d4aaea Merged PR 27051: Add an option for completely resetting validation metrics
Added `--valid-reset-all`, which works like `--valid-reset-stalled` but also resets the last best saved validation metrics. This is useful when the validation sets change for continued training.

Added new regression test: https://github.com/marian-nmt/marian-regression-tests/pull/89
2022-12-20 17:56:10 +00:00
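The difference between the two reset flags can be pictured with a small state machine. This is a minimal sketch, not Marian's validator code. The `ValidatorState` struct, the `ResetMode` enum and the `applyReset`/`update` helpers are hypothetical, and the sketch assumes a metric where higher is better.

```cpp
#include <cassert>
#include <limits>

// Hypothetical per-metric state kept across training restarts (higher score = better).
struct ValidatorState {
  float lastBest = std::numeric_limits<float>::lowest();  // best score seen so far
  int stalled = 0;                                         // validations without improvement
};

// Hypothetical modes mirroring --valid-reset-stalled and --valid-reset-all.
enum class ResetMode { None, Stalled, All };

void applyReset(ValidatorState& s, ResetMode mode) {
  if (mode == ResetMode::Stalled || mode == ResetMode::All)
    s.stalled = 0;  // both flags clear the early-stopping counter
  if (mode == ResetMode::All)
    s.lastBest = std::numeric_limits<float>::lowest();  // only "reset all" forgets the best score
}

// Records a new validation score; returns true if it is a new best.
bool update(ValidatorState& s, float score) {
  if (score > s.lastBest) {
    s.lastBest = score;
    s.stalled = 0;
    return true;
  }
  ++s.stalled;
  return false;
}
```

With a new, harder validation set, the first score may fall below the old best. Resetting only the stalled counter would count it as a stall. Resetting everything lets it become the new best.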
Marcin Junczys-Dowmunt
4d3702c4ec Merged PR 25950: Add missing defaults for concatenated factors
This PR adds missing default values for concatenated factors.
2022-10-06 05:53:16 +00:00
Marcin Junczys-Dowmunt
1e92cff93d Merged PR 25919: Sync with public master - no review required
Sync with public master, checking compilation, regression tests etc.
2022-10-04 00:42:52 +00:00
Marcin Junczys-Dowmunt
2c55cdb3c0 Merged PR 25889: Fixes bad memory access problem in hashing
Fix bad memory access problem in hashing by using the graph allocator
2022-09-29 19:01:49 +00:00
Marcin Junczys-Dowmunt
2cd3055d76 Merged PR 25836: Check via hashing if re-syncing in local mode is required
* This adds GPU-side hashing of tensors (a hash based on MurmurHash3)
* The hash is used to check whether parameters across nodes have diverged; if so, all parameters and optimizer shards are resynced. Previously a resync happened every N (100 or 200) updates; now it can be skipped if nothing has diverged.
2022-09-27 18:40:53 +00:00
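The divergence check amounts to hashing each node's parameter buffer and comparing the results. The sketch below is a CPU-only illustration under stated assumptions. It uses the standard MurmurHash3_x86_32 variant, whereas the PR only says the GPU hash is "based on" MurmurHash3, so the exact scheme may differ. `needsResync` is a hypothetical helper, and gathering one hash per node (e.g. via MPI) is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <vector>

// 32-bit rotate left, as used by MurmurHash3.
inline uint32_t rotl32(uint32_t x, int r) { return (x << r) | (x >> (32 - r)); }

// MurmurHash3_x86_32 over a raw byte buffer. Blocks are read in host byte order,
// which matches the reference implementation on little-endian machines.
uint32_t murmur3_32(const void* key, std::size_t len, uint32_t seed) {
  const uint8_t* data = static_cast<const uint8_t*>(key);
  const std::size_t nblocks = len / 4;
  const uint32_t c1 = 0xcc9e2d51u, c2 = 0x1b873593u;
  uint32_t h1 = seed;
  for (std::size_t i = 0; i < nblocks; ++i) {
    uint32_t k1;
    std::memcpy(&k1, data + i * 4, 4);  // unaligned-safe 4-byte load
    k1 *= c1; k1 = rotl32(k1, 15); k1 *= c2;
    h1 ^= k1; h1 = rotl32(h1, 13); h1 = h1 * 5 + 0xe6546b64u;
  }
  const uint8_t* tail = data + nblocks * 4;
  uint32_t k1 = 0;
  switch (len & 3) {  // remaining 1-3 bytes
    case 3: k1 ^= uint32_t(tail[2]) << 16; [[fallthrough]];
    case 2: k1 ^= uint32_t(tail[1]) << 8; [[fallthrough]];
    case 1: k1 ^= tail[0];
            k1 *= c1; k1 = rotl32(k1, 15); k1 *= c2; h1 ^= k1;
  }
  h1 ^= uint32_t(len);
  // Final avalanche (fmix32).
  h1 ^= h1 >> 16; h1 *= 0x85ebca6bu; h1 ^= h1 >> 13; h1 *= 0xc2b2ae35u; h1 ^= h1 >> 16;
  return h1;
}

// Hypothetical check: given one parameter hash per node, resync only if any two differ.
bool needsResync(const std::vector<uint32_t>& hashesPerNode) {
  return std::adjacent_find(hashesPerNode.begin(), hashesPerNode.end(),
                            std::not_equal_to<uint32_t>()) != hashesPerNode.end();
}
```

Equal hashes do not strictly prove equal parameters, but with a 32-bit hash an accidental match after real divergence is very unlikely. This makes skipping the periodic full resync a reasonable trade-off.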
Marcin Junczys-Dowmunt
7d2045a907 Merged PR 25686: Loading checkpoints from main node only via MPI
Enables loading of model checkpoints from main node only via MPI.

Until now the checkpoint needed to be present in the same location on all nodes. That could be achieved either by writing to a shared filesystem (problematic due to bad syncing) or by manually copying it to the same local location, e.g. /tmp, on each node (while writing only happened to one main location).

Now, marian can resume training from a single location on the main node, and the remaining nodes do not need access to it. For example, local /tmp on the main node can be used, and race conditions on shared storage are avoided.

Also avoids creating log files on more than one node. This is a bit wonky, as it is done via an environment variable lookup.
2022-09-21 20:39:54 +00:00
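The PR does not say which environment variable is consulted. The sketch below makes an assumption: it reads the rank variables that common MPI launchers export, `OMPI_COMM_WORLD_RANK` (Open MPI) and `PMI_RANK` (MPICH-style launchers). This lets a process decide whether it is the main node before MPI is initialized. Both helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <initializer_list>

// Hypothetical helper: guesses this process's MPI rank from launcher-provided
// environment variables (assumed names; Marian's actual lookup may differ).
int rankFromEnvironment() {
  for (const char* var : {"OMPI_COMM_WORLD_RANK", "PMI_RANK"}) {
    if (const char* value = std::getenv(var))
      return std::atoi(value);
  }
  return 0;  // not started by a known MPI launcher: treat as the main node
}

// Only the main node (rank 0) creates log files.
bool shouldWriteLogFile() { return rankFromEnvironment() == 0; }
```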
Marcin Junczys-Dowmunt
1a74358277 Merged PR 23429: Small fixes around fp16 training and batch fitting
This PR introduces small fixes around fp16 training and batch fitting:
* Multi-loss casts to the type of the first loss before accumulation (previously this aborted due to a missing cast)
* Throw `ShapeSizeException` if the total expanded shape size exceeds the maximum int value (2^31-1)
* During mini-batch fitting, catch `ShapeSizeException` and use another sizing hint; outside mini-batch fitting, abort.
* A negative `--workspace -N` value allocates the workspace as the total available GPU memory minus N megabytes.
2022-04-11 20:19:58 +00:00
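The shape-size check and the negative workspace rule are both simple arithmetic. The following is a sketch under assumptions. `ShapeSizeException` is named in the PR, but this definition is a stand-in. `shapeElements`, `fits` and `workspaceMB` are hypothetical helpers. Treating an oversized candidate as "does not fit" is only one way to picture "use another sizing hint".

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Stand-in for the ShapeSizeException named in the PR.
struct ShapeSizeException : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Multiplies the dimensions in 64-bit arithmetic and throws once the element
// count exceeds what an int can hold (2^31-1).
int shapeElements(const std::vector<int>& dims) {
  int64_t total = 1;
  for (int d : dims) {
    assert(d >= 0);
    total *= d;  // cannot overflow int64: total <= INT_MAX before each multiply
    if (total > std::numeric_limits<int>::max())
      throw ShapeSizeException("expanded shape size exceeds the maximum int value");
  }
  return static_cast<int>(total);
}

// Hypothetical batch-fitting probe: an oversized shape is reported as "does not fit"
// instead of aborting, so the caller can try a smaller candidate.
bool fits(const std::vector<int>& dims, int capacity) {
  try {
    return shapeElements(dims) <= capacity;
  } catch (const ShapeSizeException&) {
    return false;
  }
}

// Workspace in MB: non-negative values are used as-is; -N means
// "available GPU memory minus N MB" (availableMB would come from the CUDA runtime).
int64_t workspaceMB(int64_t requested, int64_t availableMB) {
  return requested >= 0 ? requested : availableMB + requested;
}
```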
Marcin Junczys-Dowmunt
d5c7372a67 Merged PR 23407: Fix incorrect/missing gradient accumulation for affine biases
This PR fixes incorrect or missing gradient accumulation for the biases of affine operations when using a delay > 1 or a large effective batch size.
2022-04-08 16:00:04 +00:00
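The commit does not describe the fix itself. For orientation only, the sketch below shows what correct accumulation of a bias gradient looks like. For `y = x*W + b`, the bias gradient is the column-wise sum of `dL/dy` over the batch. With delayed updates, each micro-batch must add to the accumulator rather than overwrite it. `accumulateBiasGrad` and the `Matrix` alias are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // rows = batch entries, cols = output dim

// Adds the column-wise sum of gradY (dL/dy) into gradB. Because it accumulates,
// calling it once per micro-batch yields the gradient of the full delayed update.
void accumulateBiasGrad(const Matrix& gradY, std::vector<float>& gradB) {
  for (const auto& row : gradY) {
    assert(row.size() == gradB.size());
    for (std::size_t j = 0; j < row.size(); ++j)
      gradB[j] += row[j];
  }
}
```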
Marcin Junczys-Dowmunt
16bfa0c913 Merged PR 23094: Adapt --cost-scaling to more stable setting
This PR sets the default parameters for cost-scaling to 8.f 10000 1.f 8.f, i.e. scale the cost by 8 and do not try to automatically scale up or down. This seems more stable than the variable cost-scaling with larger numbers that was the default before.
2022-03-16 14:44:17 +00:00
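The effect of the new default can be seen with a generic dynamic cost (loss) scaler. The sketch assumes the four values mean initial factor, update frequency, multiplier and minimum factor. That reading is consistent with the message ("scale by 8 and do not try to automatically scale up or down"), but this is not Marian's code. With a multiplier of 1, neither growth nor back-off changes the factor. `CostScaler` is hypothetical.

```cpp
#include <cassert>

// Hypothetical dynamic cost scaler. Constructor arguments assume the order
// (initial factor, update frequency, multiplier, minimum factor).
class CostScaler {
public:
  CostScaler(float factor, int freq, float multiplier, float minFactor)
      : factor_(factor), freq_(freq), multiplier_(multiplier), minFactor_(minFactor) {}

  float factor() const { return factor_; }

  // Call once per update; overflow = scaled gradients contained inf/nan.
  void update(bool overflow) {
    if (overflow) {
      factor_ /= multiplier_;  // back off (no-op when multiplier_ == 1)
      if (factor_ < minFactor_) factor_ = minFactor_;
      goodSteps_ = 0;
    } else if (++goodSteps_ % freq_ == 0) {
      factor_ *= multiplier_;  // grow after freq_ clean steps (no-op when multiplier_ == 1)
    }
  }

private:
  float factor_;
  int freq_;
  float multiplier_;
  float minFactor_;
  long goodSteps_ = 0;
};
```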
Marcin Junczys-Dowmunt
310d2f42f6 Merged PR 22939: Fix case augmentation with multi-threaded reading
This PR fixes case augmentation with multi-threaded reading. The solution is to not look at iterator::pos_ during lazy processing, but to pass it as an argument to the lazy function instead.
2022-03-07 16:57:32 +00:00
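The idea behind the fix is that a lazily evaluated function should not read mutable iterator state when it finally runs. By then, and possibly on another thread, the iterator may have advanced. The sketch below binds the position when the line is read. The transform that upper-cases every other line is a hypothetical stand-in for case augmentation, and all names are illustrative.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical lazy transform: the line position is an explicit argument rather than
// something read from shared iterator state at evaluation time.
using LazyTransform = std::function<std::string(const std::string& line, std::size_t pos)>;

// Stand-in for case augmentation: upper-case lines at even positions.
std::string upperCaseEveryOther(const std::string& line, std::size_t pos) {
  if (pos % 2 != 0) return line;
  std::string out = line;
  for (char& c : out) c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
  return out;
}

// Captures the position at read time, producing a self-contained job that can run later.
std::function<std::string()> makeJob(const LazyTransform& f, std::string line, std::size_t pos) {
  return [f, line = std::move(line), pos] { return f(line, pos); };
}
```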
Marcin Junczys-Dowmunt
4b51dcbd06 Merged PR 22524: Optimize guided alignment training speed via sparse alignments - part 1
This replaces dense alignment storage and training with a sparse representation. Training speed with guided alignment now nearly matches normal training speed, regaining about 25% speed.

This is no. 1 of 2 PRs. The next one will introduce a new guided-alignment training scheme with better alignment accuracy.
2022-02-11 13:50:47 +00:00
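A sparse representation keeps only the actual alignment links instead of a full source-by-target matrix per sentence. The sketch illustrates why this saves work. It assumes a cross-entropy-style guided-alignment cost and hypothetical types, so it is not the PR's implementation. The cost loop touches only the stored links, so its work scales with the number of links rather than with `srcLen * tgtLen`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sparse alignment entry: target word `tgt` aligns to source word `src`.
struct AlignmentLink {
  std::size_t src, tgt;
  float weight;
};

// Dense attention: attention[t][s] = weight that target position t puts on source position s.
using Attention = std::vector<std::vector<float>>;

// Cross-entropy-style cost evaluated only at the aligned positions.
float sparseAlignmentCost(const Attention& attention, const std::vector<AlignmentLink>& links) {
  float cost = 0.f;
  for (const auto& l : links)
    cost -= l.weight * std::log(attention[l.tgt][l.src]);
  return cost;
}
```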
Marcin Junczys-Dowmunt
3b21ff39c5 update VERSION and CHANGELOG 2022-02-10 08:35:49 -08:00
Marcin Junczys-Dowmunt
b3feecc82b Merged PR 22483: Make C++17 the official standard for Marian
Make C++17 the official standard for Marian
2022-02-10 16:34:23 +00:00
Marcin Junczys-Dowmunt
f00d062189 update VERSION and CHANGELOG - Release 1.11.0 2022-02-08 08:40:33 -08:00
Marcin Junczys-Dowmunt
3cf9e83bac resolve conflicts 2022-02-06 12:33:58 -08:00
Roman Grundkiewicz
3b458b044e
Update VERSION 2022-01-24 15:28:37 +00:00
Roman Grundkiewicz
b64e258bda
Update VERSION 2022-01-18 12:59:37 +00:00
Roman Grundkiewicz
c84599d08a
Update VERSION 2021-12-16 15:07:55 +00:00
Marcin Junczys-Dowmunt
bbc673c50f update CHANGELOG and VERSION 2021-11-24 18:42:14 -08:00
Nikolay Bogoychev
ab6b826083
Add GCC 11 support (#888)
* Add GCC 11 support

Some C++ Standard Library headers have been changed to no longer include other headers that they do need to depend on. As such, C++ programs that used standard library components without including the right headers will no longer compile.
The following headers are used less widely in libstdc++ and may need to be included explicitly when compiled with GCC 11:

<limits> (for std::numeric_limits)
<memory> (for std::unique_ptr, std::shared_ptr etc.)
<utility> (for std::pair, std::tuple_size, std::index_sequence etc.)
<thread> (for members of namespace std::this_thread.)

Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>
2021-11-23 10:13:29 +00:00
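In practice, the fix for this class of GCC 11 errors is to include each header directly wherever its components are used. The snippet below is a generic illustration rather than code from #888. The helper functions are hypothetical and exist only to exercise each header.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <limits>   // std::numeric_limits
#include <memory>   // std::unique_ptr, std::make_unique
#include <thread>   // std::this_thread
#include <utility>  // std::pair, std::index_sequence

// Each header above is included explicitly instead of relying on another
// standard header to pull it in transitively.
int maxInt() { return std::numeric_limits<int>::max(); }

std::unique_ptr<int> makeBox(int v) { return std::make_unique<int>(v); }

std::pair<int, int> ordered(int a, int b) {
  return a < b ? std::make_pair(a, b) : std::make_pair(b, a);
}

template <std::size_t... I>
constexpr std::size_t sumIndices(std::index_sequence<I...>) { return (0 + ... + I); }

void briefPause() { std::this_thread::sleep_for(std::chrono::milliseconds(1)); }
```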
Martin Junczys-Dowmunt
8e88071ae8 Merged PR 19842: Adapt LSH to work with Leaf
Small changes to make the LSH work with Leaf server and QuickSand.
2021-07-16 20:04:16 +00:00
Marcin Junczys-Dowmunt
3a478fc47d update version and changelog 2021-07-09 13:46:18 -07:00
Martin Junczys-Dowmunt
fc0f41f24a Merged PR 19597: Enable mpi wrapper to use size larger than MAX_INT
Enable mpi wrapper to use size larger than MAX_INT.
2021-06-28 23:15:23 +00:00
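Classic MPI calls take their element count as an `int`, so a buffer with more than MAX_INT elements has to be transferred in pieces. This is a sketch of that chunking, assuming a wrapper that splits the transfer. `chunkedCall` is hypothetical, and `IntCountCall` is a stand-in for an MPI send or broadcast of `count` elements starting at `offset`. The PR may solve the problem differently.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstddef>
#include <functional>

// Stand-in for an MPI call whose count argument is an int.
using IntCountCall = std::function<void(std::size_t offset, int count)>;

// Splits a transfer of `total` elements into calls of at most `maxChunk` (<= INT_MAX) elements.
void chunkedCall(std::size_t total, const IntCountCall& call,
                 std::size_t maxChunk = static_cast<std::size_t>(INT_MAX)) {
  assert(maxChunk > 0 && maxChunk <= static_cast<std::size_t>(INT_MAX));
  for (std::size_t offset = 0; offset < total; offset += maxChunk) {
    std::size_t n = std::min(maxChunk, total - offset);
    call(offset, static_cast<int>(n));  // safe: n <= INT_MAX
  }
}
```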
Roman Grundkiewicz
fe74576dc3
Update VERSION 2021-05-04 12:36:37 +01:00
Marcin Junczys-Dowmunt
1c8ee95a54 update version 2021-04-21 05:14:36 +00:00
Marcin Junczys-Dowmunt
8a53b761d5 update version 2021-04-11 04:30:35 +00:00
Martin Junczys-Dowmunt
caddad90cd Merged PR 18505: RMSNorm on GPU
Support for RMSNorm as a drop-in replacement for LayerNorm from _Biao Zhang; Rico Sennrich (2019). Root Mean Square Layer Normalization_. Enabled in the Transformer model via `--transformer-postprocess dar` instead of `dan`.
2021-04-10 15:28:38 +00:00
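RMSNorm rescales a vector by its root mean square and a learned gain, y_i = g_i * x_i / sqrt(mean(x^2) + eps). Unlike LayerNorm, it does not subtract the mean. This is a minimal CPU sketch of the paper's formula, not Marian's GPU kernel. The optional bias term is omitted, and `eps = 1e-6` is an assumed value.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// RMSNorm over one vector: no mean subtraction, only division by the root mean square.
std::vector<float> rmsNorm(const std::vector<float>& x, const std::vector<float>& gain,
                           float eps = 1e-6f) {  // eps is an assumed value
  assert(x.size() == gain.size() && !x.empty());
  float meanSquare = 0.f;
  for (float v : x) meanSquare += v * v;
  meanSquare /= static_cast<float>(x.size());
  const float invRms = 1.f / std::sqrt(meanSquare + eps);
  std::vector<float> y(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) y[i] = gain[i] * x[i] * invRms;
  return y;
}
```

Skipping the mean computation is the reason RMSNorm is cheaper than LayerNorm. The paper reports comparable quality.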
Marcin Junczys-Dowmunt
fdf9fe7d4a
Update VERSION 2021-04-09 09:03:39 -07:00
Marcin Junczys-Dowmunt
a17ee300f4
Create VERSION 2021-04-08 21:48:01 -07:00
Marcin Junczys-Dowmunt
bfa6180033 Revert "remove TC_MALLOC from optional dependencies (#840)"
This reverts commit 096c48e51c.
2021-04-08 07:30:38 +00:00
Martin Junczys-Dowmunt
7d1f941242 Merged PR 18309: Cleaner suppression of unwanted output words
This PR adds cleaner suppression of unwanted output words. We identified a situation where SPM with byte-fallback can generate random bytes with output sampling.

That is particularly harmful when such a random byte happens to be a newline symbol. Here we suppress newlines in the output unless explicitly wanted.
2021-03-26 16:17:12 +00:00
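A common way to make a token impossible to emit is to set its logit to negative infinity before the softmax, so its probability becomes exactly zero for both argmax and sampling. The sketch assumes this technique. The PR does not say how Marian implements it, and the function names and token ids are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Masks suppressed vocabulary ids (e.g. a newline byte piece) so they can never be chosen.
void suppressTokens(std::vector<float>& logits, const std::vector<std::size_t>& suppressedIds) {
  const float negInf = -std::numeric_limits<float>::infinity();
  for (std::size_t id : suppressedIds)
    if (id < logits.size()) logits[id] = negInf;
}

// Numerically stable softmax; assumes at least one entry is not suppressed.
std::vector<float> softmax(const std::vector<float>& logits) {
  assert(!logits.empty());
  const float maxLogit = *std::max_element(logits.begin(), logits.end());
  std::vector<float> probs(logits.size());
  float sum = 0.f;
  for (std::size_t i = 0; i < logits.size(); ++i) {
    probs[i] = std::exp(logits[i] - maxLogit);  // exp(-inf) == 0 for masked entries
    sum += probs[i];
  }
  for (float& p : probs) p /= sum;
  return probs;
}
```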
Nikolay Bogoychev
ffd997e360
Properly copy the entire vector in the int16_t case (#845)
Fixes #842 #843 #844
2021-03-23 14:32:01 -07:00
Young Jin Kim
b36d0bbbab
Fix FBGEMM build with gcc 9.3+ (#836) 2021-03-22 11:13:40 -07:00
Marcin Junczys-Dowmunt
0394d2cdbe
Display decoder speed statistics with --stat-freq N (#841)
Display decoder time statistics if requested
2021-03-22 08:58:04 -07:00
Marcin Junczys-Dowmunt
096c48e51c
remove TC_MALLOC from optional dependencies (#840)
There seems to be no benefit from TC_MALLOC any more, hence removing.
2021-03-22 08:02:04 -07:00
Roman Grundkiewicz
c89efbe919
Update VERSION 2021-03-19 15:56:37 +00:00
Roman Grundkiewicz
c724837ab3
Update VERSION 2021-03-19 13:20:31 +00:00
Marcin Junczys-Dowmunt
272096c1d1 sync public and internal master 2021-03-18 03:41:24 +00:00
Marcin Junczys-Dowmunt
8f73923d31 increase version and update changelog 2021-03-18 03:34:44 +00:00
Roman Grundkiewicz
bb92b817dd
Update VERSION 2021-03-12 11:58:53 +00:00
Roman Grundkiewicz
db771e09bd
Update VERSION 2021-03-03 10:21:06 +00:00
Roman Grundkiewicz
d92b74f67a
Update simple websocket server (#823)
* Update simple-websocket-server submodule
* Update VERSION
2021-03-02 17:48:19 +00:00
Roman Grundkiewicz
6810afae36
Update VERSION 2021-03-02 08:42:55 +00:00
Roman Grundkiewicz
8155d232db Update CHANGELOG and VERSION 2021-02-28 09:08:50 +00:00
Roman Grundkiewicz
6627134064
Update VERSION 2021-02-22 13:01:38 +00:00
Marcin Junczys-Dowmunt
6f6d484665 increase version to 1.10.0 2021-02-06 15:35:16 -08:00
Roman Grundkiewicz
024de9a4ad
Update VERSION 2021-01-25 14:39:51 +00:00
Marcin Junczys-Dowmunt
18fd50df85 Bump up version 2021-01-24 16:03:49 -08:00
Roman Grundkiewicz
c1c4af08a9
Bump version 2021-01-07 10:42:24 +00:00