marian

mirror of https://github.com/marian-nmt/marian.git synced 2024-11-03 20:13:47 +03:00

Author	SHA1	Message	Date
Marcin Junczys-Dowmunt	1a74358277	Merged PR 23429: Small fixes around fp16 training and batch fitting This PR introduces small fixes around fp16 training and batch fitting: * Multi-loss casts type to first loss-type before accumulation (aborted before due to missing cast) * Throw `ShapeSizeException` if total expanded shape size exceeds numeric capacity of the maximum int value (2^31-1) * During mini-batch-fitting, catch `ShapeSizeException` and use another sizing hint. Aborts outside mini-batch-fitting. * Negative `--workspace -N` value allocates workspace as total available GPU memory minus N megabytes.	2022-04-11 20:19:58 +00:00
Roman Grundkiewicz	1e4e1014ed	Merged PR 23415: Set Windows image back to windows-2019 This should resolve latest issues with Windows checks.	2022-04-08 17:15:56 +00:00
Marcin Junczys-Dowmunt	d5c7372a67	Merged PR 23407: Fix incorrect/missing gradient accumulation for affine biases This PR fixes incorrect/missing gradient accumulation with delay > 1 or large effective batch size of biases of affine operations.	2022-04-08 16:00:04 +00:00
Marcin Junczys-Dowmunt	16bfa0c913	Merged PR 23094: Adapt --cost-scaling to more stable setting This PR sets default parameters for cost-scaling to 8.f 10000 1.f 8.f, i.e. when scaling, scale by 8 and do not try to automatically scale up or down. This seems more stable than variable cost-scaling with larger numbers that was the default before.	2022-03-16 14:44:17 +00:00
Marcin Junczys-Dowmunt	310d2f42f6	Merged PR 22939: Fix case augmentation with multi-threaded reading This PR fixes case augmentation with multi-threaded reading. The solution is to not look at iterator::pos_ in lazy processing, rather pass it as an argument to the lazy function.	2022-03-07 16:57:32 +00:00
Marcin Junczys-Dowmunt	adaaf087e4	better error message	2022-02-16 13:20:48 -08:00
Marcin Junczys-Dowmunt	b8bf086b10	move regression-tests pointer	2022-02-11 06:04:38 -08:00
Marcin Junczys-Dowmunt	b0275e7754	merge with internal master	2022-02-11 06:03:16 -08:00
Marcin Junczys-Dowmunt	4b51dcbd06	Merged PR 22524: Optimize guided alignment training speed via sparse alignments - part 1 This replaces dense alignment storage and training with a sparse representation. Training speed with guided alignment now nearly matches normal training speed, regaining about 25% speed. This is no. 1 of 2 PRs. The next one will introduce a new guided-alignment training scheme with better alignment accuracy.	2022-02-11 13:50:47 +00:00
Marcin Junczys-Dowmunt	3b21ff39c5	update VERSION and CHANGELOG	2022-02-10 08:35:49 -08:00
Marcin Junczys-Dowmunt	b3feecc82b	Merged PR 22483: Make C++17 the official standard for Marian	2022-02-10 16:34:23 +00:00
Marcin Junczys-Dowmunt	e6dbacb310	Merged PR 22490: Faster LSH top-k for CPU This PR replaces the top-k search from FAISS on the CPU with a more specialized version for discrete distances in sub-linear time.	2022-02-10 16:30:21 +00:00
dependabot[bot]	8fd553e582	Bump examples from `6d5921c` to `0ca966e` (#919) Bumps [examples](https://github.com/marian-nmt/marian-examples) from `6d5921c` to `0ca966e`. - [Release notes](https://github.com/marian-nmt/marian-examples/releases) - [Commits](`6d5921cc7d...0ca966eadd`) --- updated-dependencies: - dependency-name: examples dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-02-10 14:03:37 +00:00
Roman Grundkiewicz	17e55f5a7d	Update VERSION	2022-02-10 11:20:47 +00:00
Graeme Nail	4d44627f26	PyYaml safe_load instead of load (#913) * pyyaml safe_load instead of load * Update CHANGELOG	2022-02-10 11:20:27 +00:00
dependabot[bot]	a492bc57d2	Bump regression-tests from `0716f4e` to `f7971b7` (#918) Bumps [regression-tests](https://github.com/marian-nmt/marian-regression-tests) from `0716f4e` to `f7971b7`. - [Release notes](https://github.com/marian-nmt/marian-regression-tests/releases) - [Commits](`0716f4e012...f7971b790a`) --- updated-dependencies: - dependency-name: regression-tests dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-02-10 10:28:04 +00:00
Roman Grundkiewicz	73f1899307	Add dependabot for git submodules (#916)	2022-02-10 10:25:08 +00:00
Roman Grundkiewicz	b97645846a	Update release workflow (#915) * Add CUDA 11.x to Windows installation script * Update release.yml workflow	2022-02-09 18:56:56 +00:00
Graeme Nail	bcf29b8cd2	Update acknowledgements (#914)	2022-02-09 17:05:48 +00:00
Marcin Junczys-Dowmunt	f00d062189	update VERSION and CHANGELOG - Release 1.11.0	2022-02-08 08:40:33 -08:00
Graeme Nail	8e659bb5c0	Document Structure (#910) * Add architectural outline * Update index	2022-02-08 10:58:09 +00:00
Marcin Junczys-Dowmunt	05ba9e4c31	add -DDETERMINISTIC=ON/OFF flag (#912) * Add -DDETERMINISTIC=ON/OFF flag to CMake * Use -DDETERMINISTIC=on in GitHub/Azure workflows Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>	2022-02-08 10:57:20 +00:00
Marcin Junczys-Dowmunt	a365bb5ce9	fix server behaviour	2022-02-07 08:09:54 -08:00
Marcin Junczys-Dowmunt	aafe8fb5ca	update regression tests pointer	2022-02-07 02:36:20 -08:00
Marcin Junczys-Dowmunt	3cf9e83bac	resolve conflicts	2022-02-06 12:33:58 -08:00
Marcin Junczys-Dowmunt	8da539e835	merged with master	2022-02-06 12:00:48 -08:00
Roman Grundkiewicz	266b931daa	Update list of contributors (#906)	2022-01-30 20:11:38 +00:00
Roman Grundkiewicz	07c39c7d76	Cherry picked cleaning/refactoring patches (#905) Cherry-picked updates from pull request #457 Co-authored-by: Mateusz Chudyk <mateuszchudyk@gmail.com>	2022-01-28 14:16:41 +00:00
Qianqian Zhu	71b5454b9e	Layer documentation (#892) * More examples for MLP layers and docs about RNN layers * Docs about embedding layer and more doxygen code docs * Add layer and factors docs into index.rst * Update layer documentation * Fix typos Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com> Co-authored-by: Graeme Nail <graemenail.work@gmail.com>	2022-01-26 15:17:38 +00:00
Roman Grundkiewicz	3b458b044e	Update VERSION	2022-01-24 15:28:37 +00:00
Graeme Nail	894a07ad5b	Improve checks on transformer cache (#881) * Fix caching in transformer attention * Move hash specialization * Swap comments to doxygen * Include string header	2022-01-24 15:28:13 +00:00
Roman Grundkiewicz	b64e258bda	Update VERSION	2022-01-18 12:59:37 +00:00
Graeme Nail	b29cc07a95	Scorer model loading (#860) * Add MMAP as an option * Use io::isBin * Allow getYamlFromModel from an Item vector * ScorerWrapper can now load on to a graph from Item vector The interface IEncoderDecoder can now call graph loads directly from an Item Vector. * Translator loads model before creating scorers Scorers are created from an Item vector * Replace model-config try-catch with check using IsNull * Prefer empty vs size * load by items should be pure virtual * Stepwise forward load to encdec * nematus can load from items * amun can load from items * loadItems in TranslateService * Remove logging * Remove by filename scorer functions * Replace by filename createScorer * Explicitly provide default value for get model-mmap * CLI option for model-mmap only for translation and CPU compile * Ensure model-mmap option is CPU only * Remove move on temporary object * Reinstate log messages for model loading in Amun / Nematus * Add log messages for model loading in scorers Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>	2022-01-18 12:58:52 +00:00
Roman Grundkiewicz	c84599d08a	Update VERSION	2021-12-16 15:07:55 +00:00
Nikolay Bogoychev	e26e5b6faf	Use Apple Accelerate on macOS by default (#897)	2021-12-16 15:07:34 +00:00
Nikolay Bogoychev	e8a1a2530f	Fix AVX2+ detection on Mac (#895) macOS is weird and its CPU flags are split across two separate fields returned by the sysctl interface. To get around this, we need to test both of them, so here goes Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>	2021-12-07 17:47:33 +00:00
Qianqian Zhu	cd9afea8d3	Documentation about how to write code documentation (#891) * add initial guidelines of code documentation * fix math formula not displayed in Sphinx * remove @name tags which cannot be extracted by exhale and cause function signature errors * fix markdown ref warning and update markdown parser in sphinx * more about doxygen: add Doxygen commands and math formulas * move code doc guide to a new .rst file * add formula image * Set myst-parser version appropriate for the requested sphinx version * Update documentation on how to write Doxygen comments * Add new section to the documentation index * Sphinx 2.4.4 requires myst-parser 0.14 * complete code doc guide and small fixes on reStructuredText formats * More about reStructuredText * Update badges on the documentation frontpage Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>	2021-12-07 15:10:46 +00:00
Marcin Junczys-Dowmunt	e8ea37cd5b	Merged PR 21648: Allow for dynamic gradient scaling to fade out after N updates	2021-12-06 23:20:44 +00:00
Graeme Nail	c64cb2990e	Constrain version of mistune to before v2 in GitHub CI Documentation builds (#894)	2021-12-06 14:06:14 +00:00
Marcin Junczys-Dowmunt	bbc673c50f	update CHANGELOG and VERSION	2021-11-24 18:42:14 -08:00
Marcin Junczys-Dowmunt	8b8d1b11e2	Merged PR 21553: Parallelize data reading for training This parallelizes data reading. On very fast GPUs and with small models training speed can be starved by too slow batch creation. Use --data-threads 8 or more, by default currently set to 1 for backcompat.	2021-11-25 02:33:49 +00:00
Nikolay Bogoychev	ab6b826083	Add GCC 11 support (#888) * Add GCC 11 support Some C++ Standard Library headers have been changed to no longer include other headers that they do need to depend on. As such, C++ programs that used standard library components without including the right headers will no longer compile. The following headers are used less widely in libstdc++ and may need to be included explicitly when compiled with GCC 11: <limits> (for std::numeric_limits) <memory> (for std::unique_ptr, std::shared_ptr etc.) <utility> (for std::pair, std::tuple_size, std::index_sequence etc.) <thread> (for members of namespace std::this_thread.) Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>	2021-11-23 10:13:29 +00:00
Nikolay Bogoychev	1adf80b7c9	Task alias validation during training mode (#886) * Attempt to validate task alias * Validate allowed options for --task alias * Update comment in aliases.cpp * Show allowed values for alias Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>	2021-11-22 19:19:58 +00:00
Roman Grundkiewicz	3d15cd3d20	Update submodule regression-tests	2021-11-22 06:41:16 -08:00
David Meikle	3b4e943cda	Added pragma to ignore unused-private-field error on elementType_ which failed in macOS (#872) Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>	2021-11-22 12:22:06 +00:00
Marcin Junczys-Dowmunt	c85d060848	Merged PR 20729: Add top-k sampling This adds Top-K sampling to Marian and extends the --output-sampling option to take arguments	2021-11-22 03:32:54 +00:00
Roman Grundkiewicz	2bdfbd3f02	Update badges in README.md	2021-11-21 17:06:01 +00:00
Marcin Junczys-Dowmunt	1404201926	Merged PR 21151: Cleaning up fp16 behavior This PR improves clipping and pruning behavior of NaNs and Infs during fp16 training, ultimately avoiding the underflow problems that we were facing so far.	2021-10-26 20:25:39 +00:00
Roman Grundkiewicz	7f06f3c5d2	Merged PR 21166: Keep building on macOS-10.15 Marian does not compile on macOS 11.6, so the build has stopped working due to an upgrade from macOS-10.15 to macOS 11.6 in Azure Pipelines: https://github.com/actions/virtual-environments/issues/4060 This PR explicitly sets macOS 10.15 in the workflow.	2021-10-26 11:20:41 +00:00
Hieu Hoang	2d79ad02bb	Merged PR 20933: beam & batch works for non-factored models	2021-10-13 20:20:14 +00:00

4874 Commits