The changes proposed in this pull request:
* Added regression tests using internal models to Azure Pipelines on both Windows and Ubuntu
* Created https://machinetranslation.visualstudio.com/Marian/_git/marian-prod-tests (more tests will be added over time)
* Made regression test outputs (all `.log`, `.out`, `.diff` files) available for inspection as a downloadable artifact.
* Made `--build-info` option available in CMake-based Windows builds
Warning: I tried to handle multiple cases, but some regression tests may occasionally fail, especially those using AVX2 or AVX512 models, because their outputs depend on the system and CPU. I think it is better to merge this now, then monitor the stability of the tests, add variations of the expected outputs where necessary, and improve the coverage and stability of the regression tests over time.
* allow choosing fine-grained CPU intrinsics via CMake options
* inform the user that, e.g., -DCOMPILE_AVX2=off will be ignored with -march=native if there is compiler support for it
Install Boost manually in all workflows, because it was recently removed from the Azure/GitHub-hosted runners. This should fix the recent failures of the Marian CI builds.
This PR refactors the training graph groups and optimizers to enable and simplify fp16 support.
Deprecates old unused graph groups and fixes a couple of MPI issues.
Adds intgemm as a module for Marian. Intgemm is @kpu's 8/16-bit GEMM library with support for architectures from SSE2 to AVX512VNNI.
Removes outdated integer code related to the --optimize option.
Co-authored-by: Kenneth Heafield <github@kheafield.com>
Co-authored-by: Kenneth Heafield <kpu@users.noreply.github.com>
Co-authored-by: Ulrich Germann <ugermann@inf.ed.ac.uk>
Co-authored-by: Marcin Junczys-Dowmunt <marcinjd@microsoft.com>
Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>
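To illustrate what an 8-bit GEMM computes, here is a minimal scalar C++ sketch, assuming symmetric per-matrix quantization to int8 with int32 accumulation. It is not intgemm's API or memory layout: the helpers `maxAbs`, `quantize` and `gemmInt8` are hypothetical, and intgemm runs the inner loop with SIMD instructions rather than a scalar loop.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Largest absolute value in x; used to choose a quantization scale.
float maxAbs(const std::vector<float>& x) {
  float m = 0.0f;
  for (float v : x) m = std::max(m, std::fabs(v));
  return m;
}

// Quantize to int8 with one scale per matrix: q = clamp(round(x * scale), -127, 127).
std::vector<int8_t> quantize(const std::vector<float>& x, float scale) {
  std::vector<int8_t> q(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    float v = std::round(x[i] * scale);
    v = std::min(127.0f, std::max(-127.0f, v));
    q[i] = static_cast<int8_t>(v);
  }
  return q;
}

// C (M x N) = A (M x K) * B (K x N), all row-major.
// Products are accumulated in int32, then rescaled back to float.
std::vector<float> gemmInt8(const std::vector<int8_t>& A, const std::vector<int8_t>& B,
                            int M, int K, int N, float scaleA, float scaleB) {
  assert(A.size() == static_cast<std::size_t>(M) * K);
  assert(B.size() == static_cast<std::size_t>(K) * N);
  std::vector<float> C(static_cast<std::size_t>(M) * N);
  for (int i = 0; i < M; ++i) {
    for (int j = 0; j < N; ++j) {
      int32_t acc = 0;
      for (int k = 0; k < K; ++k)
        acc += static_cast<int32_t>(A[i * K + k]) * static_cast<int32_t>(B[k * N + j]);
      C[i * N + j] = static_cast<float>(acc) / (scaleA * scaleB);
    }
  }
  return C;
}
```

With scales such as `127.0f / maxAbs(A)` and `127.0f / maxAbs(B)`, the result stays within a small quantization error of the float product. A single scale per matrix keeps the sketch short, at the cost of accuracy when values vary widely in magnitude.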
* support for Apple Accelerate
* add a CMake flag to use Apple Accelerate as the BLAS library.
* rename USE_ACCELERATE to USE_APPLE_ACCELERATE
* add comment with more info on Accelerate
* link to the Apple documentation on Accelerate.
- Updates SentencePiece to the newest version (removes the dependency on protobuf)
- Enable SentencePiece compilation by default, since it no longer depends on protobuf.
- Add installation targets (enabled by GENERATE_MARIAN_INSTALL_TARGETS; default: OFF to preserve CMake 3.5.1 compatibility)
- Add COMPILE_LIBRARY_ONLY option (default: OFF) to exclude in-source executables from the build
- Compiler warning flags are no longer exported as part of the public link interface; they are applied only privately when building Marian itself
- Always set CPUINFO_BUILD_TOOLS=OFF when building fbgemm, not just for MSVC builds
Related work items: #108034
This PR updates the Windows build via CMake and the build instructions. Together with https://github.com/marian-nmt/marian-dev/pull/676, this should make the build fully workable, including CUDA, FBGEMM, SentencePiece, unit tests, and marian-server.
List of changes:
- Fixing compilation of marian-server on Windows via CMake
- Updating vs/CheckDeps.bat
- zlib no longer needs to be installed as it is included in 3rd_party
- Installing Boost 1.72, since newer versions are not supported
- Installing minimal required Boost components in CheckDeps.bat
- Installing protobuf in CheckDeps.bat
- Updating CMakeSettings.json
- Updating vs/README.md
- Development notes extracted to vs/NOTES.md
I did not update or test the CUDA build because I do not have a machine for that, but AFAIK it works properly.
This reimplements the LASER encoder from:
```
Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond
Mikel Artetxe, Holger Schwenk
https://arxiv.org/abs/1812.10464
```
and adds functionality to embed sentences with any Marian encoder, not only LASER. It also includes some early attempts to train a transformer model with an encoder-decoder bottleneck. This is quite early code, so some code duplication is to be expected. Nevertheless, it is functional, and I would like to have it in master since we will gradually put it into production in various places. I will make the code "nicer" as we go along.
* adds the correct definitions for FINTEGER for Linux and Windows
* moves things around a bit in the CMakeLists.txt files to keep things more local
* fixes a few more warnings in 3rd party code
This PR adds the FAISS LSH index to Marian as a CPU-side optimization of the output layer of transformer models via KNN search.
* It allows replacing the potentially harmful shortlist with an ML-free approximation of the final matrix multiplication. As the number of neighbors k and the hash size in bits increase, the approximation becomes more accurate but also slower. For a model dimension of 512, the sweet spot seems to be around k=100-150 and nbits=1024-1536.
* Enable during CPU-side decoding via `--output-approx-knn k nbits`
* Add a `lambda` node that allows creating new custom nodes, currently most useful on the CPU.
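The idea behind the KNN approximation can be sketched as follows, assuming random-hyperplane (sign) LSH: each output-embedding row and the decoder state are hashed to `nbits` sign bits, and only the `k` rows with the smallest Hamming distance are kept as candidates for exact dot products. This is a simplified illustration, not FAISS's `IndexLSH` implementation or Marian's code; `SignLSH`, `hamming` and `knnCandidates` are hypothetical names. Sign hashes approximate the angle between vectors rather than the raw inner product, which is one reason the result is only an approximation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

using Vec = std::vector<float>;
using Hash = std::vector<bool>;

// Hypothetical random-hyperplane LSH: bit b is the sign of <plane_b, v>.
struct SignLSH {
  std::vector<Vec> planes;  // nbits Gaussian hyperplanes of dimension dim

  SignLSH(int dim, int nbits, unsigned seed) {
    std::mt19937 gen(seed);
    std::normal_distribution<float> normal(0.0f, 1.0f);
    planes.assign(nbits, Vec(dim));
    for (Vec& p : planes)
      for (float& x : p) x = normal(gen);
  }

  Hash hash(const Vec& v) const {
    assert(planes.empty() || v.size() == planes[0].size());
    Hash bits(planes.size());
    for (std::size_t b = 0; b < planes.size(); ++b)
      bits[b] = std::inner_product(v.begin(), v.end(), planes[b].begin(), 0.0f) >= 0.0f;
    return bits;
  }
};

// Number of differing bits between two hashes of equal length.
int hamming(const Hash& a, const Hash& b) {
  assert(a.size() == b.size());
  int d = 0;
  for (std::size_t i = 0; i < a.size(); ++i) d += (a[i] != b[i]) ? 1 : 0;
  return d;
}

// Ids of the k vocabulary rows whose hashes are closest to the state's hash
// (ties broken by smaller id). Exact logits would then be computed only for these.
std::vector<int> knnCandidates(const SignLSH& lsh, const std::vector<Hash>& vocabHashes,
                               const Vec& state, int k) {
  const Hash h = lsh.hash(state);
  std::vector<int> dist(vocabHashes.size());
  for (std::size_t i = 0; i < vocabHashes.size(); ++i) dist[i] = hamming(h, vocabHashes[i]);

  std::vector<int> ids(vocabHashes.size());
  std::iota(ids.begin(), ids.end(), 0);
  k = std::min(k, static_cast<int>(ids.size()));
  std::partial_sort(ids.begin(), ids.begin() + k, ids.end(), [&](int a, int b) {
    return dist[a] != dist[b] ? dist[a] < dist[b] : a < b;
  });
  ids.resize(k);
  return ids;
}
```

In this sketch, a larger `nbits` makes the Hamming distance a finer estimate of the angle, and a larger `k` keeps more candidates for exact scoring, which mirrors the accuracy/speed trade-off described above.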
This does not introduce any new functionality; it just moves code around so that future PRs are easier to compare. Moves old GraphGroup code to training/deprecated. Once it is clear that nothing in there is worth saving, it will be deleted.
Replace -Ofast with -O3 and make sure finite-math optimizations (-ffinite-math-only) are turned off.
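As a hypothetical illustration of why finite-math optimizations are risky: -Ofast turns on -ffast-math, which includes -ffinite-math-only and lets the compiler assume that no value is NaN or infinite. Code that uses -inf as a mask value and guards against it with `std::isinf`, as in the sketch below, may then have its guard optimized away. The function `logSumExp` is illustrative only and is not taken from Marian.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Numerically stable log-sum-exp where masked positions hold -infinity.
// Under -ffinite-math-only the compiler may assume std::isinf(m) is always
// false, so the guard below could be removed and the function could return NaN.
float logSumExp(const std::vector<float>& x) {
  float m = -std::numeric_limits<float>::infinity();
  for (float v : x) m = std::fmax(m, v);
  if (std::isinf(m)) return m;  // all entries masked: avoid (-inf) - (-inf) = NaN
  float s = 0.0f;
  for (float v : x) s += std::exp(v - m);  // exp(-inf) == 0 for masked entries
  return m + std::log(s);
}
```

Compiled with plain -O3, the guard and the `exp(-inf) == 0` behavior are preserved.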
* Compile Marian on macOS with Clang (two linker errors left)
* macOS has a different definition for unsigned long
* Find OpenBLAS on macOS
* Fix a typo in the BLAS detection
* Simplify and add comments
* Refactor CPU allocation code. Do not fall back to malloc
* Fix compilation warning on gcc
* Refactor memory allocation
* Make things compile with clang-8 with fewer warnings.
* Eliminate clang warnings when compiling examples and when compiling without MKL
* added USE_MKL option to compile without MKL for debugging even when MKL is installed
* fixed issues with compiling examples with clang
* Fix compile errors with clang in src/tests.
* Fix missing whitespace in error message in src/tests/sqlite.cpp.
* Responding to Frank Seide's code review.
* Eliminate clang warnings when compiling with -DUSE_FBGEMM=on.
* Fix compilation on gcc 8
* Get Marian to compile with Clang-10.
* Fix Clang-8 warnings when compiling with marian-server
* Add more comments and explicit unsigned long long for Windows
* Pull in fbgemm that supports mac
* Fix warning flags order in CMakeLists.txt
Co-authored-by: Kenneth Heafield <kpu@users.noreply.github.com>
Co-authored-by: Ulrich Germann <ulrich.germann@gmail.com>
Co-authored-by: Roman Grundkiewicz <romang@amu.edu.pl>
Splits the header file into a header and a *.cu file. This comes at the price of having to include specializations for the combinations of types used, as is already done for element.inc and add.inc. No code changes otherwise.
Add CMake options to disable specific compute capabilities.
When run with `make -j16`, this compiles in about 6 minutes instead of 7. Selecting only SM70 during compilation brings the time down to 3 minutes.
This fixes a number of bugs in our GPU reduce kernels that would manifest mainly for larger matrices and during back-prop. We also drop support for CUDA 8.0 to be able to take advantage of new GPU primitives introduced by NVIDIA in CUDA 9.0.
* Use ccache only when requested via cmake -DUSE_CCACHE=on
* Add link to https://ccache.dev in comment about using ccache.
* Issue success / missing ccache message when ccache is requested during the CCACHE run
* Issue cmake warning instead of cmake status message when use of ccache is requested but ccache cannot be found.