Fast Neural Machine Translation in C++
Marcin Junczys-Dowmunt 7d2045a907 Merged PR 25686: Loading checkpoints from main node only via MPI
Enables loading of model checkpoints from main node only via MPI.

Until now, the checkpoint needed to be present in the same location on all nodes. That could be achieved either by writing to a shared filesystem (problematic due to bad syncing) or by manually copying it to the same local location, e.g. /tmp, on each node (while writing only happened to one main location).

Now, Marian can resume training from a single location on the main node. The remaining nodes do not need access to it. For example, the local /tmp on the main node can be used, and race conditions on shared storage are avoided.

Also avoids creating files for logging on more than one node. This is a bit wonky, done via environment variable lookup.
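
The environment variable lookup mentioned above can be illustrated with a minimal sketch. It is not Marian's actual code: the helper names `rankFromEnvironment` and `shouldCreateLogFile` are hypothetical, and the choice of `OMPI_COMM_WORLD_RANK` (a variable that Open MPI sets for each launched process) is an assumption about which variable would be consulted.

```cpp
#include <cstdlib>

// Hypothetical helper, not part of Marian: reads the process rank from an
// environment variable set by the MPI launcher. The default variable name is
// an assumption; Open MPI exports OMPI_COMM_WORLD_RANK to each process.
// Returns 0 when the variable is unset or is not a non-negative integer,
// so a plain single-process run is treated as the main process.
inline int rankFromEnvironment(const char* name = "OMPI_COMM_WORLD_RANK") {
  const char* value = std::getenv(name);
  if (value == nullptr)
    return 0;
  char* end = nullptr;
  const long rank = std::strtol(value, &end, 10);
  if (end == value || *end != '\0' || rank < 0)
    return 0;
  return static_cast<int>(rank);
}

// Hypothetical policy: only the process with rank 0 creates a log file.
inline bool shouldCreateLogFile() {
  return rankFromEnvironment() == 0;
}
```

A caller would check `shouldCreateLogFile()` before opening the log file and fall back to console-only logging on all other ranks. Reading the rank from the environment needs no MPI call, which is presumably why it is described as a bit wonky: it depends on the launcher exporting the variable.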
2022-09-21 20:39:54 +00:00
.github Add dependabot for git submodules (#916) 2022-02-10 10:25:08 +00:00
cmake Fix AVX2+ detection on Mac (#895) 2021-12-07 17:47:33 +00:00
contrib Merged PR 24111: Remove external reference to Docker images 2022-05-31 15:31:39 +00:00
doc Update NVIDIA CUDA signing key for CI; fix for building docs (#932) 2022-05-18 11:11:28 +01:00
examples@29f4f7c380 Bump examples from 6d5921c to 29f4f7c (#928) 2022-03-22 08:38:30 +00:00
regression-tests@4fa9ff55af Bump regression-tests from 88e6382 to 4fa9ff5 (#929) 2022-03-22 08:40:11 +00:00
scripts Merged PR 25465: Choose top checkpoints from train.log for averaging 2022-09-15 06:19:18 +00:00
src Merged PR 25686: Loading checkpoints from main node only via MPI 2022-09-21 20:39:54 +00:00
vs Merged PR 19904: Update instructions for building on Windows 2021-07-22 16:44:35 +00:00
.clang-format Update formatting 2021-03-08 03:09:03 -08:00
.gitattributes revisited fillBatches() and optimized it a little; 2018-10-08 13:29:16 -07:00
.gitignore Add option for printing CMake cached variables (#583) 2020-03-10 10:29:50 -07:00
.gitmodules Integrate intgemm into marian (#595) 2021-01-24 16:02:30 -08:00
azure-pipelines.yml Merged PR 25518: Upgrade Azure Pipelines to macos-12 2022-09-15 06:18:42 +00:00
azure-regression-tests.yml Merged PR 25698: Install Python 3.8 on GPU pool 2022-09-16 09:30:10 +00:00
CHANGELOG.md Merged PR 25686: Loading checkpoints from main node only via MPI 2022-09-21 20:39:54 +00:00
CMakeLists.txt Merged PR 22483: Make C++17 the official standard for Marian 2022-02-10 16:34:23 +00:00
CMakeSettings.json Merged PR 18232: Update VS CMake builds and scripts 2021-03-19 08:27:34 +00:00
CONTRIBUTING.md Add templates for GitHub issues and pull requests 2020-03-16 20:10:18 -07:00
Doxyfile.in Add graph documentations (#788) 2021-02-28 08:07:19 +00:00
LICENSE.md Update LICENSE.md 2017-02-27 01:16:42 +00:00
README.md Update acknowledgements (#914) 2022-02-09 17:05:48 +00:00
VERSION Merged PR 25686: Loading checkpoints from main node only via MPI 2022-09-21 20:39:54 +00:00

Marian


Marian is an efficient Neural Machine Translation framework written in pure C++ with minimal dependencies.

Named in honour of Marian Rejewski, a Polish mathematician and cryptologist.

Main features:

  • Efficient pure C++ implementation
  • Fast multi-GPU training and GPU/CPU translation
  • State-of-the-art NMT architectures: deep RNN and transformer
  • Permissive open source license (MIT)

If you use this, please cite:

Marcin Junczys-Dowmunt, Roman Grundkiewicz, Tomasz Dwojak, Hieu Hoang, Kenneth Heafield, Tom Neckermann, Frank Seide, Ulrich Germann, Alham Fikri Aji, Nikolay Bogoychev, André F. T. Martins, Alexandra Birch (2018). Marian: Fast Neural Machine Translation in C++ (http://www.aclweb.org/anthology/P18-4020)

@InProceedings{mariannmt,
    title     = {Marian: Fast Neural Machine Translation in {C++}},
    author    = {Junczys-Dowmunt, Marcin and Grundkiewicz, Roman and
                 Dwojak, Tomasz and Hoang, Hieu and Heafield, Kenneth and
                 Neckermann, Tom and Seide, Frank and Germann, Ulrich and
                 Fikri Aji, Alham and Bogoychev, Nikolay and
                 Martins, Andr\'{e} F. T. and Birch, Alexandra},
    booktitle = {Proceedings of ACL 2018, System Demonstrations},
    pages     = {116--121},
    publisher = {Association for Computational Linguistics},
    year      = {2018},
    month     = {July},
    address   = {Melbourne, Australia},
    url       = {http://www.aclweb.org/anthology/P18-4020}
}

Amun

Amun, the handwritten decoder for RNN models compatible with Marian and Nematus, has been superseded by the Marian decoder. Its code remains available in a separate repository: https://github.com/marian-nmt/amun

Website

More information is available at https://marian-nmt.github.io

Acknowledgements

The development of Marian received funding from the European Union's Horizon 2020 Research and Innovation Programme under grant agreements 688139 (SUMMA; 2016-2019), 645487 (Modern MT; 2015-2017), 644333 (TraMOOC; 2015-2017), 644402 (HiML; 2015-2017), 825303 (Bergamot; 2019-2021), the European Union's Connecting Europe Facility project 2019-EU-IA-0045 (User-focused Marian; 2020-2022), the Amazon Academic Research Awards program, the World Intellectual Property Organization, and is based upon work supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via contract #FA8650-17-C-9117.

This software contains source code provided by NVIDIA Corporation.