Avoid Ampere misalignment issue

2024-09-17 09:47:34 +03:00 · 2021-05-17 13:25:13 -07:00
commit 8b818b7c07
parent 49e379bba5
3 changed files with 3 additions and 2 deletions
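
The check that this change tightens can be sketched as follows. This is a minimal illustration, not the code in prod.cpp: only the constant REQUIRED_BIAS_ALIGNMENT and its new value of 16 come from the diff, while the helpers isAligned and canUseFusedBiasEpilogue are hypothetical names introduced here. The assumption, taken from the source comment, is that the fused cublasLt bias epilogue is used only when the bias pointer meets the required alignment, and that otherwise a plain matmul is followed by a custom epilogue kernel.

```cpp
#include <cassert>
#include <cstdint>

// Value taken from the diff (previously 8).
static constexpr int REQUIRED_BIAS_ALIGNMENT = 16;

// Hypothetical helper: true if ptr is a multiple of `alignment` bytes.
// `alignment` is assumed to be a positive power of two.
static bool isAligned(const void* ptr, int alignment) {
  assert(alignment > 0 && (alignment & (alignment - 1)) == 0);
  const auto address = reinterpret_cast<std::uintptr_t>(ptr);
  return (address & static_cast<std::uintptr_t>(alignment - 1)) == 0;
}

// Hypothetical decision function: true means the fused bias epilogue
// may be used; false means fall back to matmul plus a separate
// bias-add kernel, as the comment in the source describes.
static bool canUseFusedBiasEpilogue(const void* bias) {
  return isAligned(bias, REQUIRED_BIAS_ALIGNMENT);
}
```

Under this sketch, a float pointer that sits 8 bytes past a 16-byte boundary passed the old check with a requirement of 8 but is rejected with a requirement of 16, so it takes the fallback path.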
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -41,6 +41,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
 - Broken links to MNIST data sets

 ### Changed
+- Set REQUIRED_BIAS_ALIGNMENT = 16 in tensors/gpu/prod.cpp to avoid memory-misalignment on certain Ampere GPUs.
 - For BUILD_ARCH != native enable all intrinsics types by default, can be disabled like this: -DCOMPILE_AVX512=off
 - Moved FBGEMM pointer to commit c258054 for gcc 9.3+ fix
 - Change compile options a la -DCOMPILE_CUDA_SM35 to -DCOMPILE_KEPLER, -DCOMPILE_MAXWELL,
--- a/2
+++ b/2
@@ -1 +1 @@
-Subproject commit 1afd4eb1014ac451c6a3d6f9b5d34c322902e624
+Subproject commit 7d612ca5e4b27a76f92584dad76d240e34f216d0
--- a/src/tensors/gpu/prod.cpp
+++ b/src/tensors/gpu/prod.cpp
@@ -22,7 +22,7 @@ namespace gpu {
 // It seems that the bias must be 8 byte aligned for the cublasLt epilogue to work. Therefore,
 // if the bias pointer is not 8 byte aligned, we do a normal matmul in cublasLt and invoke a 
 // custom epilogue kernel.
-static constexpr int REQUIRED_BIAS_ALIGNMENT = 8;  
+static constexpr int REQUIRED_BIAS_ALIGNMENT = 16; // @TODO: MJD: changed this to 16 to avoid alignment error on A100. Seems to work fine.

 // Used to set preferences for cublasLt to filter out algos if matrices do not meet default 256 byte alignment
 int getAlignmentUpTo256(const void *ptr) {