mirror of
https://github.com/marian-nmt/marian.git
synced 2024-09-17 09:47:34 +03:00
7d2045a907
Enables loading of model checkpoints from the main node only via MPI. Until now, the checkpoint needed to be present in the same location on all nodes. That could be done either by writing to a shared filesystem (problematic due to bad syncing) or by manually copying it to the same local location, e.g. /tmp on each node (while writing only happened to one main location). Now, marian can resume training from only one location on the main node. The remaining nodes do not need to have access. E.g. local /tmp on the main node can be used, and race conditions on shared storage are avoided. Also avoids creating files for logging on more than one node. This is a bit wonky, done via environment variable lookup. |
||
---|---|---|
.. | ||
iris | ||
mnist | ||
CMakeLists.txt | ||
README.md |
Marian examples
Examples are enabled with the CMake option -DCOMPILE_EXAMPLES=ON.
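As a rough sketch, the option can be passed at configure time in an out-of-source build. The build directory name, the starting directory (the root of a Marian checkout), and the job count are assumptions for illustration; only the -DCOMPILE_EXAMPLES=ON flag comes from the text above.

```shell
# Assumption: run from the root of a Marian checkout.
# "build" is an arbitrary name for the out-of-source build directory.
mkdir -p build
cd build

# Configure with the examples enabled (flag taken from this README).
cmake .. -DCOMPILE_EXAMPLES=ON

# Build; the job count of 4 is an arbitrary choice.
make -j4
```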
MNIST
You will need MNIST data for training and testing. Download the data with the script src/examples/mnist/download.sh or provide paths to the files with the --train-sets and --valid-sets options.