Mirror of https://github.com/marian-nmt/marian.git
Synced 2024-11-05 01:31:46 +03:00
d5c7372a67
This PR fixes incorrect or missing gradient accumulation for the biases of affine operations when delay > 1 or the effective batch size is large.
2 lines
8 B
Plaintext
v1.11.6