Roman Grundkiewicz
b6581c4c44
Merged PR 26667: Update examples submodule to fix vulnerability issues
...
Updating examples submodule using [protobuf 3.20.2](https://github.com/marian-nmt/marian-examples/pull/29) to fix recent [vulnerability issues](https://machinetranslation.visualstudio.com/MachineTranslation/_componentGovernance/mtmain/alert/8035094?typeId=14698327&pipelinesTrackingFilter=0).
Related work items: #134319
2022-11-23 19:16:44 +00:00
Roman Grundkiewicz
c79dc80a2f
Merged PR 26617: Update regression-tests & fix CI pipelines
...
Update regression-tests & fix CI pipelines
2022-11-20 13:31:10 +00:00
Roman Grundkiewicz
be1ee3fa94
Merged PR 26318: Fix incorrect envvar name in Azure Pipeline
...
Fix incorrect environment variable name for SAS token in Windows tests
2022-11-01 10:07:40 +00:00
Roman Grundkiewicz
a6de1b781c
Merged PR 26271: Update CI pipeline triggers
...
Updates to the CI triggers:
- Stop running parallel CI runs, i.e. if a pipeline is running, it must finish before new runs are started.
- Exclude paths to files that are not related or critical to the codebase
- Download MKL from a mirror hosting server
2022-11-01 06:26:56 +00:00
Marcin Junczys-Dowmunt
4d3702c4ec
Merged PR 25950: Add missing defaults for concatenated factors
...
This PR adds missing default values for concatenated factors.
2022-10-06 05:53:16 +00:00
Marcin Junczys-Dowmunt
1e92cff93d
Merged PR 25919: Sync with public master - no review required
...
Sync with public master, checking compilation, regression tests etc.
2022-10-04 00:42:52 +00:00
Marcin Junczys-Dowmunt
2c55cdb3c0
Merged PR 25889: Fixes bad memory access problem in hashing
...
Fix bad memory access problem in hashing by using the graph allocator
2022-09-29 19:01:49 +00:00
Marcin Junczys-Dowmunt
2cd3055d76
Merged PR 25836: Check via hashing if re-syncing in local mode is required
...
* This adds GPU-side hashing to tensors (a hash based on MurmurHash3)
* The hash is used to check if parameters across nodes have diverged; if so, all parameters and optimizer shards are resynced. Previously, a resync happened every N (100 or 200) updates. Now it can be skipped if nothing diverged.
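A minimal CPU-side sketch of the divergence check described above. It is an illustration only: `hashlib.blake2b` stands in for the GPU-side MurmurHash3-based hash, and the names `tensor_hash` and `needs_resync` are hypothetical, not Marian functions.

```python
import hashlib
import struct


def tensor_hash(values):
    """Hash a flat list of floats via their float32 byte representation.

    Assumption: blake2b is only a stand-in for the GPU-side hash in the PR.
    """
    payload = struct.pack(f"<{len(values)}f", *values)
    return hashlib.blake2b(payload, digest_size=8).hexdigest()


def needs_resync(per_node_params):
    """Return True if any node's parameter hash differs from node 0's.

    Expects at least one node. Only the short hashes would need to be
    exchanged between nodes, not the parameters themselves.
    """
    hashes = [tensor_hash(params) for params in per_node_params]
    return any(h != hashes[0] for h in hashes[1:])


if __name__ == "__main__":
    in_sync = [[0.5, -1.25, 3.0], [0.5, -1.25, 3.0]]
    diverged = [[0.5, -1.25, 3.0], [0.5, -1.25, 3.0001]]
    print(needs_resync(in_sync), needs_resync(diverged))
```

The point of the design is that comparing a few bytes per node is cheap, so the expensive full resync only runs when the hashes disagree.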
2022-09-27 18:40:53 +00:00
Marcin Junczys-Dowmunt
1f2929d528
Merged PR 25733: Fused inplace ReLU and Dropout in transformer FFN layer
...
* First attempt at fused inplace ReLU and Dropout in transformer FFN layer
* Adds optional output projection to SSRU.
For large FFN blocks with dropout, this gives about a 20-25% speed improvement during training.
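To illustrate what "fused inplace" means here, a small sketch that applies ReLU and dropout in one pass over the same buffer instead of two passes with an intermediate result. The function name and the inverted-dropout scaling by 1/(1-p) are assumptions for illustration; the actual GPU kernel is not shown in this log.

```python
import random


def relu_dropout_inplace(values, drop_prob, rng):
    """Apply ReLU and dropout to `values` in a single in-place pass.

    Assumptions: inverted dropout scaling and 0 <= drop_prob < 1.
    """
    keep_scale = 1.0 / (1.0 - drop_prob)
    for i, x in enumerate(values):
        keep = rng.random() >= drop_prob
        # One write per element: no separate ReLU output buffer is needed.
        values[i] = x * keep_scale if (x > 0.0 and keep) else 0.0
    return values


if __name__ == "__main__":
    activations = [-1.0, 2.0, 0.0, 3.5]
    print(relu_dropout_inplace(activations, 0.5, random.Random(0)))
```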
2022-09-26 20:17:33 +00:00
Marcin Junczys-Dowmunt
cfc33f5498
only use tcmalloc_minimal
2022-09-22 15:11:33 -07:00
Marcin Junczys-Dowmunt
7d2045a907
Merged PR 25686: Loading checkpoints from main node only via MPI
...
Enables loading of model checkpoints from main node only via MPI.
Until now the checkpoint needed to be present in the same location on all nodes. That could be done either via writing to a shared filesystem (problematic due to bad syncing) or by manual copying to the same local location, e.g. /tmp on each node (while writing only happened to one main location).
Now, marian can resume training from only one location on the main node. The remaining nodes do not need to have access. E.g. local /tmp on the main node can be used, or race conditions on shared storage are avoided.
Also avoids creating files for logging on more than one node. This is a bit wonky, done via environment variable lookup.
2022-09-21 20:39:54 +00:00
Marcin Junczys-Dowmunt
76964791ad
Merged PR 23767: More principled sampling and force-decoding
...
This PR adds correct force-decoding and more principled sampling; both should now work for ensembles, batches and with beam search.
2022-09-16 22:53:08 +00:00
Roman Grundkiewicz
e13053a6f2
Merged PR 25698: Install Python 3.8 on GPU pool
...
Python >= 3.8 is required for numpy >= 1.22, which is the minimum version without vulnerability issues.
2022-09-16 09:30:10 +00:00
Roman Grundkiewicz
6f7766f837
Merged PR 25465: Choose top checkpoints from train.log for averaging
...
Added `--from-log logfile N metric asc|desc` option to `average.py`, which selects top N checkpoint paths from the provided train.log file according to the selected metric. Last 3 arguments to this option are optional. If the last argument is omitted, "asc" is assumed for perplexity and "desc" for other metrics.
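A rough sketch of the selection logic described above, not the code in `average.py`. The validation line format matched by `VALID_RE` is an assumption about what train.log contains, and `top_updates` is a hypothetical helper; it returns update numbers rather than checkpoint paths, since the checkpoint naming scheme is not stated here.

```python
import re

# Assumed log line shape, e.g.:
# "[valid] Ep. 1 : Up. 2000 : perplexity : 10.1 : new best"
VALID_RE = re.compile(r"\[valid\] Ep\. \d+ : Up\. (\d+) : (\S+) : ([-+0-9.eE]+)")


def top_updates(log_lines, n, metric, order=None):
    """Return the update numbers of the n best validation scores."""
    if order is None:
        # Default from the description: lower is better only for perplexity.
        order = "asc" if metric == "perplexity" else "desc"
    scores = []
    for line in log_lines:
        match = VALID_RE.search(line)
        if match and match.group(2) == metric:
            scores.append((float(match.group(3)), int(match.group(1))))
    scores.sort(reverse=(order == "desc"))
    return [update for _, update in scores[:n]]


if __name__ == "__main__":
    log = [
        "[valid] Ep. 1 : Up. 1000 : perplexity : 12.5 : new best",
        "[valid] Ep. 1 : Up. 2000 : perplexity : 10.1 : new best",
        "[valid] Ep. 2 : Up. 3000 : perplexity : 11.0 : stalled 1 times",
    ]
    print(top_updates(log, 2, "perplexity"))
```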
2022-09-15 06:19:18 +00:00
Roman Grundkiewicz
a47912d9f1
Merged PR 25518: Upgrade Azure Pipelines to macos-12
...
macos-10.15 will become unsupported in December 2022. Changes:
* Upgrade Azure DevOps to macos-12
* Pull https://github.com/marian-nmt/sentencepiece/pull/14
* Fix clang 13 errors as in https://github.com/marian-nmt/marian-dev/pull/939
2022-09-15 06:18:42 +00:00
Roman Grundkiewicz
5d466bc367
Merged PR 25507: Upgrade Azure Pipelines to ubuntu-20.04
...
Ubuntu-18.04 will not be supported after October 2022.
2022-09-02 05:55:20 +00:00
Alex Muzio
a90950ea25
Merged PR 25154: Add model shapes flag to model_info.py script
...
Add a `--matrix-shapes` flag to the model_info.py script that prints the shapes of the model matrices.
This will print something like:
```
...
encoder_l6_ffn_W1 (1024, 4096)
encoder_l6_ffn_W2 (4096, 1024)
encoder_l6_ffn_b1 (1, 4096)
encoder_l6_ffn_b2 (1, 1024)
encoder_l6_ffn_ffn_ln_bias (1, 1024)
encoder_l6_ffn_ffn_ln_scale (1, 1024)
encoder_l6_self_Wk (1024, 1024)
encoder_l6_self_Wo (1024, 1024)
encoder_l6_self_Wo_ln_bias (1, 1024)
encoder_l6_self_Wo_ln_scale (1, 1024)
encoder_l6_self_Wq (1024, 1024)
encoder_l6_self_Wv (1024, 1024)
encoder_l6_self_bk (1, 1024)
encoder_l6_self_bo (1, 1024)
encoder_l6_self_bq (1, 1024)
encoder_l6_self_bv (1, 1024)
special:model.yml (1264,)
```
2022-08-10 22:23:47 +00:00
Roman Grundkiewicz
c5081df93f
Merged PR 24111: Remove external reference to Docker images
...
The reference to docker.io triggers a security warning (https://eng.ms/docs/more/containers-secure-supply-chain), making our pipelines flash orange, which hides the real status of regression testing. This PR simply replaces the external reference with an internal mirror (https://eng.ms/docs/more/containers-secure-supply-chain/approved-images).
2022-05-31 15:31:39 +00:00
Marcin Junczys-Dowmunt
042ed8f2e2
Merged PR 24072: Revert changes to transformer caching
...
This PR reverts changes to transformer caching (public PR https://github.com/marian-nmt/marian-dev/pull/881).
The reverted changes seem to cause catastrophic memory leaks or incorrect de-allocation during decoding.
2022-05-30 07:27:15 +00:00
Marcin Junczys-Dowmunt
f3e1efe731
merge with internal master
2022-05-26 06:28:06 -07:00
Graeme Nail
95720ae19f
Update NVIDIA CUDA signing key for CI; fix for building docs ( #932 )
...
* Update NVIDIA CUDA signing key for CI
* Constrain Jinja2 to build docs
2022-05-18 11:11:28 +01:00
Roman Grundkiewicz
704a323142
Merged PR 22799: Running regression tests on Azure Pipelines
...
This PR adds an Azure Pipeline for running regression tests on an Azure Hosted GPU Pool.
It currently runs on Ubuntu 18.04, GCC 8, CUDA 11.1, a single Nvidia M60 GPU device (Maxwell).
The pipeline needs to be started manually: go to "Pipelines", then "Marian GPU Pool", click "Run pipeline", select the branch, click "Run".
2022-05-13 07:30:36 +00:00
Roman Grundkiewicz
e0e3287a3b
Merged PR 23840: Update CUDA installation script for Ubuntu
...
Updates CUDA deb/key fetching
https://developer.nvidia.com/blog/updating-the-cuda-linux-gpg-repository-key/
2022-05-12 16:23:58 +00:00
Marcin Junczys-Dowmunt
e4f3d0f740
add fallback option for sampling, for back-compat
2022-05-09 13:28:28 -07:00
Marcin Junczys-Dowmunt
1a74358277
Merged PR 23429: Small fixes around fp16 training and batch fitting
...
This PR introduces small fixes around fp16 training and batch fitting:
* Multi-loss casts type to first loss-type before accumulation (aborted before due to missing cast)
* Throw `ShapeSizeException` if total expanded shape size exceeds numeric capacity of the maximum int value (2^31-1)
* During mini-batch-fitting, catch `ShapeSizeException` and use another sizing hint. Aborts outside mini-batch-fitting.
* Negative `--workspace -N` value allocates workspace as total available GPU memory minus N megabytes.
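The size check and the negative workspace rule above can be sketched as follows. This is a Python illustration under stated assumptions: `checked_size`, `fits_during_batch_fitting` and `workspace_mb` are hypothetical names, and only `ShapeSizeException` and the 2^31-1 limit come from the description.

```python
from math import prod

INT32_MAX = 2**31 - 1


class ShapeSizeException(Exception):
    """Raised when the total number of elements exceeds the int range."""


def checked_size(shape):
    """Return the element count of `shape`, or raise if it exceeds 2^31-1."""
    total = prod(shape)
    if total > INT32_MAX:
        raise ShapeSizeException(f"shape {shape} has {total} elements")
    return total


def fits_during_batch_fitting(shape):
    """During mini-batch fitting the exception is caught, not fatal."""
    try:
        checked_size(shape)
        return True
    except ShapeSizeException:
        return False  # caller would try another sizing hint


def workspace_mb(requested_mb, available_mb):
    """A negative request -N means 'all available memory minus N megabytes'."""
    return available_mb + requested_mb if requested_mb < 0 else requested_mb


if __name__ == "__main__":
    print(fits_during_batch_fitting((65536, 65536)), workspace_mb(-2000, 16000))
```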
2022-04-11 20:19:58 +00:00
Roman Grundkiewicz
1e4e1014ed
Merged PR 23415: Set Windows image back to windows-2019
...
This should resolve latest issues with Windows checks.
2022-04-08 17:15:56 +00:00
Marcin Junczys-Dowmunt
d5c7372a67
Merged PR 23407: Fix incorrect/missing gradient accumulation for affine biases
...
This PR fixes incorrect/missing gradient accumulation for biases of affine operations when training with delay > 1 or a large effective batch size.
2022-04-08 16:00:04 +00:00
Artur Nowakowski
23c36ec1a3
Fixed fp16 training/inference with factors-combine concat ( #926 )
2022-03-22 10:07:41 +00:00
dependabot[bot]
78bef7aeba
Bump src/3rd_party/sentencepiece from c307b87 to 5312a30 ( #927 )
...
Bumps [src/3rd_party/sentencepiece](https://github.com/marian-nmt/sentencepiece) from `c307b87` to `5312a30`.
- [Release notes](https://github.com/marian-nmt/sentencepiece/releases)
- [Commits](c307b874de...5312a306c4)
---
updated-dependencies:
- dependency-name: src/3rd_party/sentencepiece
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-03-22 10:06:11 +00:00
dependabot[bot]
75a7a1dfd2
Bump regression-tests from 88e6382 to 4fa9ff5 ( #929 )
...
Bumps [regression-tests](https://github.com/marian-nmt/marian-regression-tests) from `88e6382` to `4fa9ff5`.
- [Release notes](https://github.com/marian-nmt/marian-regression-tests/releases)
- [Commits](88e6382241...4fa9ff55af)
---
updated-dependencies:
- dependency-name: regression-tests
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-03-22 08:40:11 +00:00
dependabot[bot]
c809843f14
Bump examples from 6d5921c to 29f4f7c ( #928 )
...
Bumps [examples](https://github.com/marian-nmt/marian-examples) from `6d5921c` to `29f4f7c`.
- [Release notes](https://github.com/marian-nmt/marian-examples/releases)
- [Commits](6d5921cc7d...29f4f7c380)
---
updated-dependencies:
- dependency-name: examples
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-03-22 08:38:30 +00:00
Marcin Junczys-Dowmunt
16bfa0c913
Merged PR 23094: Adapt --cost-scaling to more stable setting
...
This PR sets the default parameters for cost-scaling to 8.f 10000 1.f 8.f, i.e. when scaling, scale by 8 and do not try to automatically scale up or down. This seems more stable than the variable cost-scaling with larger numbers that was the default before.
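A sketch of why these defaults amount to a fixed scale of 8. It assumes the four values mean scaling factor, update frequency, multiplier and minimum factor; that reading, and the `CostScaler` class itself, are illustrative assumptions rather than Marian's implementation.

```python
class CostScaler:
    """Toy dynamic cost scaler; parameter meanings are assumed."""

    def __init__(self, factor=8.0, frequency=10000, multiplier=1.0, minimum=8.0):
        self.factor = factor
        self.frequency = frequency
        self.multiplier = multiplier
        self.minimum = minimum
        self.good_steps = 0

    def update(self, overflow):
        """Adjust the factor after one update and return the new value."""
        if overflow:
            # Scale down on overflow, but never below the minimum factor.
            self.factor = max(self.minimum, self.factor / self.multiplier)
            self.good_steps = 0
        else:
            self.good_steps += 1
            if self.good_steps % self.frequency == 0:
                self.factor *= self.multiplier
        return self.factor


if __name__ == "__main__":
    scaler = CostScaler()  # 8.0, 10000, 1.0, 8.0
    for _ in range(20000):
        scaler.update(overflow=False)
    print(scaler.factor)  # the multiplier of 1.0 leaves the factor unchanged
```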
2022-03-16 14:44:17 +00:00
Marcin Junczys-Dowmunt
310d2f42f6
Merged PR 22939: Fix case augmentation with multi-threaded reading
...
This PR fixes case augmentation with multi-threaded reading. The solution is to not look at iterator::pos_ in lazy processing, rather pass it as an argument to the lazy function.
2022-03-07 16:57:32 +00:00
Marcin Junczys-Dowmunt
adaaf087e4
better error message
2022-02-16 13:20:48 -08:00
Graeme Nail
601c9ac980
Detect fortran_order in npz ( #911 )
...
* Fix fortran_order parsing
* Abort on non row-major NPZ entries
* Update CHANGELOG
* Update VERSION
Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>
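For illustration, a standard-library sketch of the kind of check described above: it reads the header of each NPY entry in an NPZ archive and rejects column-major (`fortran_order: True`) arrays. The function names are hypothetical and this is not the C++ code from the PR; it relies only on the documented NPY header layout.

```python
import ast
import struct
import zipfile

NPY_MAGIC = b"\x93NUMPY"


def read_npy_header(stream):
    """Parse the header of one NPY stream and return it as a dict."""
    if stream.read(6) != NPY_MAGIC:
        raise ValueError("not an NPY stream")
    major, _minor = stream.read(2)
    # Version 1.x stores the header length in 2 bytes, later versions in 4.
    if major == 1:
        (header_len,) = struct.unpack("<H", stream.read(2))
    else:
        (header_len,) = struct.unpack("<I", stream.read(4))
    # The header is a Python dict literal with descr, fortran_order, shape.
    return ast.literal_eval(stream.read(header_len).decode("latin1"))


def check_row_major(npz_file):
    """Return {entry name: shape}; raise if any entry is column-major."""
    shapes = {}
    with zipfile.ZipFile(npz_file) as archive:
        for name in archive.namelist():
            with archive.open(name) as entry:
                header = read_npy_header(entry)
            if header["fortran_order"]:
                raise ValueError(f"{name}: column-major arrays are not supported")
            shapes[name] = header["shape"]
    return shapes
```

Called as `check_row_major("model.npz")` (a placeholder path), this would either return the entry shapes or fail on the first non row-major entry.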
2022-02-15 13:22:49 +00:00
dependabot[bot]
58c4576e5d
Bump regression-tests from da95717 to 88e6382 ( #923 )
...
Bumps [regression-tests](https://github.com/marian-nmt/marian-regression-tests) from `da95717` to `88e6382`.
- [Release notes](https://github.com/marian-nmt/marian-regression-tests/releases)
- [Commits](da95717d41...88e6382241)
---
updated-dependencies:
- dependency-name: regression-tests
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-02-15 11:21:14 +00:00
Nikolay Bogoychev
8a9580b329
update the intgemm version to upstream ( #924 )
...
Some data types were upper-cased, which is why the diff is larger than expected.
Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>
2022-02-15 11:18:29 +00:00
Marcin Junczys-Dowmunt
b8bf086b10
move regression-tests pointer
2022-02-11 06:04:38 -08:00
Marcin Junczys-Dowmunt
b0275e7754
merge with internal master
2022-02-11 06:03:16 -08:00
Marcin Junczys-Dowmunt
4b51dcbd06
Merged PR 22524: Optimize guided alignment training speed via sparse alignments - part 1
...
This replaces dense alignment storage and training with a sparse representation. Training speed with guided alignment now nearly matches normal training speed, regaining about 25% speed.
This is no. 1 of 2 PRs. The next one will introduce a new guided-alignment training scheme with better alignment accuracy.
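As a rough picture of the dense versus sparse distinction, the sketch below converts a dense alignment matrix into a list of non-zero entries. The layout (rows as source positions, columns as target positions, triples of source, target, weight) is an assumption for illustration and not necessarily the representation used in the PR.

```python
def dense_to_sparse(matrix):
    """Keep only non-zero cells as (source_pos, target_pos, weight) triples."""
    return [
        (src, trg, weight)
        for src, row in enumerate(matrix)
        for trg, weight in enumerate(row)
        if weight != 0.0
    ]


if __name__ == "__main__":
    dense = [
        [0.0, 1.0, 0.0],
        [0.5, 0.0, 0.0],
        [0.0, 0.0, 1.0],
    ]
    sparse = dense_to_sparse(dense)
    # Word alignments are mostly zeros, so the sparse form stores far fewer
    # values than the full source-by-target matrix.
    print(len(sparse), "of", sum(len(row) for row in dense), "cells stored")
```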
2022-02-11 13:50:47 +00:00
Marcin Junczys-Dowmunt
3b21ff39c5
update VERSION and CHANGELOG
2022-02-10 08:35:49 -08:00
Marcin Junczys-Dowmunt
b3feecc82b
Merged PR 22483: Make C++17 the official standard for Marian
...
Make C++17 the official standard for Marian
2022-02-10 16:34:23 +00:00
Marcin Junczys-Dowmunt
e6dbacb310
Merged PR 22490: Faster LSH top-k for CPU
...
This PR replaces the top-k search from FAISS on the CPU with a more specialized version for discrete distances in sub-linear time.
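One way to exploit discrete distances is bucket selection: Hamming distances between LSH codes are small integers, so candidates can be grouped by distance instead of being kept in a heap. The sketch below shows that idea only; it is not claimed to be the algorithm from the PR, `hamming_top_k` is a hypothetical name, and this simple version still scans every code once.

```python
def hamming_top_k(query, codes, k, nbits):
    """Return up to k (index, distance) pairs with the smallest Hamming distance.

    `query` and `codes` are integers holding nbits-bit binary codes.
    """
    buckets = [[] for _ in range(nbits + 1)]
    for idx, code in enumerate(codes):
        distance = bin(query ^ code).count("1")  # popcount of differing bits
        buckets[distance].append(idx)
    result = []
    # Walk buckets from the smallest distance upward until k items are found.
    for distance, bucket in enumerate(buckets):
        for idx in bucket:
            result.append((idx, distance))
            if len(result) == k:
                return result
    return result


if __name__ == "__main__":
    print(hamming_top_k(0b0000, [0b1111, 0b0001, 0b0011, 0b0000], 2, 4))
```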
2022-02-10 16:30:21 +00:00
dependabot[bot]
8fd553e582
Bump examples from 6d5921c to 0ca966e ( #919 )
...
Bumps [examples](https://github.com/marian-nmt/marian-examples) from `6d5921c` to `0ca966e`.
- [Release notes](https://github.com/marian-nmt/marian-examples/releases)
- [Commits](6d5921cc7d...0ca966eadd)
---
updated-dependencies:
- dependency-name: examples
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-02-10 14:03:37 +00:00
Roman Grundkiewicz
17e55f5a7d
Update VERSION
2022-02-10 11:20:47 +00:00
Graeme Nail
4d44627f26
PyYaml safe_load instead of load ( #913 )
...
* pyyaml safe_load instead of load
* Update CHANGELOG
2022-02-10 11:20:27 +00:00
dependabot[bot]
a492bc57d2
Bump regression-tests from 0716f4e to f7971b7 ( #918 )
...
Bumps [regression-tests](https://github.com/marian-nmt/marian-regression-tests) from `0716f4e` to `f7971b7`.
- [Release notes](https://github.com/marian-nmt/marian-regression-tests/releases)
- [Commits](0716f4e012...f7971b790a)
---
updated-dependencies:
- dependency-name: regression-tests
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-02-10 10:28:04 +00:00
Roman Grundkiewicz
73f1899307
Add dependabot for git submodules ( #916 )
2022-02-10 10:25:08 +00:00
Roman Grundkiewicz
b97645846a
Update release workflow ( #915 )
...
* Add CUDA 11.x to Windows installation script
* Update release.yml workflow
2022-02-09 18:56:56 +00:00
Graeme Nail
bcf29b8cd2
Update acknowledgements ( #914 )
2022-02-09 17:05:48 +00:00