Avoid Ampere misalignment issue

Marcin Junczys-Dowmunt 2021-05-17 13:25:13 -07:00
parent 49e379bba5
commit 8b818b7c07
3 changed files with 3 additions and 2 deletions


@@ -41,6 +41,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
- Broken links to MNIST data sets
### Changed
+- Set REQUIRED_BIAS_ALIGNMENT = 16 in tensors/gpu/prod.cpp to avoid memory-misalignment on certain Ampere GPUs.
- For BUILD_ARCH != native enable all intrinsics types by default, can be disabled like this: -DCOMPILE_AVX512=off
- Moved FBGEMM pointer to commit c258054 for gcc 9.3+ fix
- Change compile options a la -DCOMPILE_CUDA_SM35 to -DCOMPILE_KEPLER, -DCOMPILE_MAXWELL,
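
The effect of raising REQUIRED_BIAS_ALIGNMENT can be illustrated with a minimal sketch. It assumes, following the comment in tensors/gpu/prod.cpp, that a bias pointer meeting the required alignment goes through the fused cublasLt epilogue and any other pointer falls back to a plain matmul plus a separate epilogue kernel. The names `isAligned`, `BiasPath` and `chooseBiasPath` are hypothetical and do not come from the Marian source. No cublasLt call is made here.

```cpp
#include <cassert>
#include <cstdint>

// Value of the constant after this commit (it was 8 before).
static constexpr int REQUIRED_BIAS_ALIGNMENT = 16;

// Hypothetical helper: true if ptr is a multiple of `alignment` bytes.
// `alignment` must be a non-zero power of two.
static bool isAligned(const void* ptr, std::uintptr_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (reinterpret_cast<std::uintptr_t>(ptr) & (alignment - 1)) == 0;
}

// Hypothetical labels for the two code paths described in the source comment.
enum class BiasPath {
  FusedEpilogue,   // bias is added inside the cublasLt matmul epilogue
  SeparateKernel   // plain matmul, then a custom kernel adds the bias
};

// Hypothetical dispatcher: picks the path from the bias pointer's alignment.
static BiasPath chooseBiasPath(const void* bias) {
  return isAligned(bias, REQUIRED_BIAS_ALIGNMENT) ? BiasPath::FusedEpilogue
                                                  : BiasPath::SeparateKernel;
}
```

For a 16-byte-aligned `float` buffer `buf`, the pointer `buf + 2` is 8-byte aligned but not 16-byte aligned. Under the old value of 8 such a pointer would have taken the fused path. With the value set to 16 it takes the fallback.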

@@ -1 +1 @@
-Subproject commit 1afd4eb1014ac451c6a3d6f9b5d34c322902e624
+Subproject commit 7d612ca5e4b27a76f92584dad76d240e34f216d0


@@ -22,7 +22,7 @@ namespace gpu {
// It seems that the bias must be 8 byte aligned for the cublasLt epilogue to work. Therefore,
// if the bias pointer is not 8 byte aligned, we do a normal matmul in cublasLt and invoke a
// custom epilogue kernel.
-static constexpr int REQUIRED_BIAS_ALIGNMENT = 8;
+static constexpr int REQUIRED_BIAS_ALIGNMENT = 16; // @TODO: MJD: changed this to 16 to avoid alignment error on A100. Seems to work fine.
// Used to set preferences for cublasLt to filter out algos if matrices do not meet default 256 byte alignment
int getAlignmentUpTo256(const void *ptr) {