Replace `--after-batches N` and `--after-epochs N` with `--after Nu/Ne`, which allows specifying updates, epochs, or target labels with units, e.g.:
* `--after 30Gt` or `--after 50ku` or `--after 10e`
* Multiple criteria can also be combined: `--after 30Gt,50ku,10e`; training stops when whichever criterion is hit first
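The whichever-hits-first semantics can be sketched as follows. This is a minimal illustration, not Marian's actual implementation; the parser and function names here are hypothetical:

```python
import re

# Hypothetical sketch: u = updates, e = epochs, t = target labels,
# with optional SI-style multipliers k/M/G as in `30Gt` or `50ku`.
MULTIPLIERS = {"": 1, "k": 10**3, "M": 10**6, "G": 10**9}

def parse_after(spec):
    """Parse e.g. '30Gt,50ku,10e' into {'t': 30e9, 'u': 50e3, 'e': 10}."""
    criteria = {}
    for part in spec.split(","):
        m = re.fullmatch(r"(\d+)([kMG]?)([uet])", part)
        if not m:
            raise ValueError(f"bad criterion: {part}")
        number, mult, unit = m.groups()
        criteria[unit] = int(number) * MULTIPLIERS[mult]
    return criteria

def should_stop(criteria, updates, epochs, labels):
    """Stop as soon as any single criterion is reached."""
    state = {"u": updates, "e": epochs, "t": labels}
    return any(state[unit] >= limit for unit, limit in criteria.items())
```

For example, `parse_after("30Gt,50ku,10e")` yields three independent limits, and `should_stop` returns true once any one of them is reached, regardless of the others.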
Changes default `cost-type` from `ce-mean` to `ce-sum` and turns `display-label-counts` on by default.
* Fixes reductions into scalars for <= 32 input elements. This only affects reductions where 0 is not the identity element (e.g. `min`, `max`, or `prod`)
* Update CHANGELOG.md
* Adds space before "?"
* Adds a comment explaining the increased margin for reduction tests, an axis comment on the argument to reduce functions, and more tests for small reduction operators
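The identity issue above can be illustrated outside Marian: padding a partially filled block with 0 is harmless for a sum (0 is its identity) but corrupts reductions like `min` or `prod`. A hypothetical numpy sketch, not the actual CUDA kernel:

```python
import numpy as np

def padded_reduce(x, op, pad_value, block=32):
    """Pad x up to a full block with pad_value, then reduce --
    mimicking a fixed-width reduction over <= 32 elements."""
    padded = np.concatenate([x, np.full(block - len(x), pad_value)])
    return op(padded)

x = np.array([3.0, 5.0, 2.0])

# Sum: 0 is the identity, so zero-padding is safe.
assert padded_reduce(x, np.sum, pad_value=0.0) == 10.0

# Min: 0 is NOT the identity; zero-padding gives the wrong answer...
assert padded_reduce(x, np.min, pad_value=0.0) == 0.0  # wrong, true min is 2.0
# ...the correct identity for min is +inf.
assert padded_reduce(x, np.min, pad_value=np.inf) == 2.0
```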
A few updates to Azure Pipelines:
* Adding CPU-only and GPU-only builds on Ubuntu
* Compiling Marian statically in some of the Ubuntu builds
* Ubuntu build with minimum supported versions of CMake (3.5.1), gcc (5.5), CUDA (10.0 due to GCC 5.5), no MKL
* Compiling marian-server with Boost 1.72 on Windows builds
* Minor clean up
- Add installation targets (enabled by GENERATE_MARIAN_INSTALL_TARGETS; default: OFF to preserve CMake 3.5.1 compatibility)
- Add COMPILE_LIBRARY_ONLY option (default: OFF) to exclude in-source executables from the build
- Compiler warning flags are no longer exported as part of the public link interface; they are now applied only privately during the build
- Always set CPUINFO_BUILD_TOOLS=OFF when building fbgemm, not just for MSVC builds
Related work items: #108034
This PR enables final post-processing of a full transformer stack for correct prenorm behavior.
See issues: #715 and #699.
List of changes:
Add final post-processing in the encoder and decoder when requested with `--transformer-postprocess-top`. It can take combinations of `d`, `n`, and `a`; using `a` adds a skip connection from the bottom of the stack.
Add `--task transformer-base-prenorm` and `--task transformer-big-prenorm`, which correspond to `--task transformer-base --transformer-preprocess n --transformer-postprocess da --transformer-postprocess-top n`.
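The alias expansion described above can be sketched as a simple mapping. This is illustrative only and not how Marian resolves `--task` internally:

```python
# Hypothetical expansion of the new prenorm task aliases into the
# underlying transformer options, as described in this PR.
PRENORM_OPTIONS = {
    "transformer-preprocess": "n",       # layer norm before each sublayer
    "transformer-postprocess": "da",     # dropout + residual add after
    "transformer-postprocess-top": "n",  # final norm on top of the stack
}

def expand_task(task):
    """Map e.g. 'transformer-base-prenorm' to its base task plus prenorm options."""
    if task.endswith("-prenorm"):
        base = task[: -len("-prenorm")]  # e.g. "transformer-base"
        return {"task": base, **PRENORM_OPTIONS}
    return {"task": task}
```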
A few improvements to Azure Pipelines:
- Disabling build on Ubuntu 20.04 due to [issues with FBGEMM and GCC 9+](https://github.com/marian-nmt/marian-dev/issues/709)
- Replacing Invoke-WebRequest with wget.exe
- Cleaning environment variables
Adds an `--output-omit-bias` option, which allows training an output layer without a bias vector. This is expected to be useful with `--output-approx-knn` during decoding, as the LSH-based k-NN search then exactly approximates the correct top-k values for decoding; the bias otherwise adds a shift. In first experiments, omitting the output bias does not seem to result in any performance loss.
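Why the bias matters for LSH-based retrieval can be shown with a toy example: a k-NN search over the output matrix approximates the top-k of `W @ x`, but the true decoding scores are `W @ x + b`, so a bias can reorder the top-k. A hypothetical numpy illustration:

```python
import numpy as np

# Toy output layer: 4 "vocabulary" rows, 2-dimensional hidden state.
W = np.array([[1.0, 0.0],
              [0.9, 0.1],
              [0.0, 1.0],
              [0.1, 0.9]])
b = np.array([0.0, 0.5, 0.0, 0.0])  # bias shifts row 1 upward
x = np.array([1.0, 0.0])

scores_no_bias = W @ x           # what k-NN search over W sees
scores_with_bias = W @ x + b     # what decoding actually uses

# Without a bias the two rankings agree, so the k-NN top-1 is exact...
assert np.argmax(scores_no_bias) == 0
# ...but the bias promotes row 1, so the k-NN top-1 would be wrong.
assert np.argmax(scores_with_bias) == 1
```

With the bias removed, the k-NN result over `W` alone is the true top-k, which is the motivation for omitting it.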