# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Added
- Added support for CUBLAS_TENSOR_OP_MATH mode for cublas in cuda 9.0

## [1.0.0] - 2017-11-13

### Added
- Multi-gpu validation, scorer and in-training translation
- summary-mode for scorer
- New "transformer" model based on [Attention is all you need](https://arxiv.org/abs/1706.03762)
- Options specific for the transformer model
- Linear learning rate warmup with and without initial value
- Cyclic learning rate warmup
- More options for learning rate decay, including: optimizer history reset, repeated warmup
- Continuous inverted square root decay of learning rate (`--lr-decay-inv-sqrt`) based on number of updates
- Exposed optimizer parameters (e.g. momentum etc. for Adam)
- Version of deep RNN-based models compatible with Nematus (`--type nematus`)
- Synchronous SGD training for multi-gpu (enable with `--sync-sgd`)
- Dynamic construction of complex models with different encoders and decoders, currently only available through the C++ API
- Option `--quiet` to suppress output to stderr
- Option to choose different variants of optimization criterion: mean cross-entropy, perplexity, cross-entropy sum
- In-process translation for validation, uses the same memory as training
- Label Smoothing
- CHANGELOG.md
- CONTRIBUTING.md
- Swish activation function default for Transformer (https://arxiv.org/pdf/1710.05941.pdf); see the sketch below
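
For reference, Swish from the linked paper is `f(x) = x * sigmoid(beta * x)`; with `beta = 1` this reduces to `x * sigmoid(x)`. A minimal illustrative sketch of that formula, not taken from the Marian code base:

```cpp
#include <cmath>

// Swish activation, f(x) = x * sigmoid(x) (the beta = 1 case from the paper).
// Illustrative sketch only, not Marian's actual implementation.
inline float swish(float x) {
  return x / (1.0f + std::exp(-x));  // equals x * sigmoid(x)
}
```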

### Changed
- Changed shape organization to follow numpy.
- Changed option `--moving-average` to `--exponential-smoothing` and inverted the formula to `s_t = (1 - \alpha) * s_{t-1} + \alpha * x_t`; `\alpha` is now `1e-4` by default (see the sketch after this list)
- Got rid of thrust for compile-time mathematical expressions
- Changed boolean option `--normalize` to `--normalize [arg=1] (=0)`. New behaviour is backwards-compatible and can also be specified as `--normalize=0.6`
- Renamed "s2s" binary to "marian-decoder"
- Renamed "rescorer" binary to "marian-scorer"
- Renamed "server" binary to "marian-server"
- Renamed option `--dynamic-batching` to `--mini-batch-fit`
- Unified cross-entropy-based validation, which now supports perplexity and other CE variants
- Changed `--normalize (bool)` to `--normalize (float)arg`, allowing the length normalization weight to be changed as `score / pow(length, arg)`.
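
To make the `--exponential-smoothing` update above concrete, here is a minimal sketch of one smoothing step; `smoothed` and `params` are hypothetical names for the running average and the current parameter values, not Marian's actual API:

```cpp
#include <cstddef>
#include <vector>

// One exponential-smoothing step as stated above:
// s_t = (1 - alpha) * s_{t-1} + alpha * x_t, with alpha = 1e-4 by default.
// Hypothetical names and signature; not taken from the Marian code base.
void exponentialSmoothingStep(std::vector<float>& smoothed,
                              const std::vector<float>& params,
                              float alpha = 1e-4f) {
  for (std::size_t i = 0; i < params.size(); ++i)
    smoothed[i] = (1.0f - alpha) * smoothed[i] + alpha * params[i];
}
```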

### Removed
- Temporarily removed gradient dropping (`--drop-rate X`) until refactoring.