marian/VERSION
Marcin Junczys-Dowmunt d5c7372a67 Merged PR 23407: Fix incorrect/missing gradient accumulation for affine biases
This PR fixes incorrect/missing gradient accumulation with delay > 1 or large effective batch size of biases of affine operations.
2022-04-08 16:00:04 +00:00

2 lines
8 B
Plaintext