# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Added
- Added support for CUBLAS_TENSOR_OP_MATH mode for cublas in cuda 9.0

## [1.0.0] - 2017-11-13

### Added
- Multi-gpu validation, scorer and in-training translation
- summary-mode for scorer
- New "transformer" model based on [Attention is all you need](https://arxiv.org/abs/1706.03762)
- Options specific for the transformer model
- Linear learning rate warmup with and without initial value
- Cyclic learning rate warmup
- More options for learning rate decay, including: optimizer history reset, repeated warmup
- Continuous inverted square root decay of learning rate (`--lr-decay-inv-sqrt`) based on number of updates
- Exposed optimizer parameters (e.g. momentum etc. for Adam)
- Version of deep RNN-based models compatible with Nematus (`--type nematus`)
- Synchronous SGD training for multi-gpu (enable with `--sync-sgd`)
- Dynamic construction of complex models with different encoders and decoders, currently only available through the C++ API
- Option `--quiet` to suppress output to stderr
- Option to choose different variants of optimization criterion: mean cross-entropy, perplexity, cross-entropy sum
- In-process translation for validation, uses the same memory as training
- Label Smoothing
- CHANGELOG.md
- CONTRIBUTING.md
- Swish activation function default for Transformer (https://arxiv.org/pdf/1710.05941.pdf); see the sketch below
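
For reference, Swish from the linked paper is `f(x) = x * sigmoid(beta * x)`; with `beta = 1` this reduces to `x * sigmoid(x)`. A minimal illustrative sketch of that formula, not taken from the Marian code base:

```cpp
#include <cmath>

// Swish activation, f(x) = x * sigmoid(x) (the beta = 1 case from the paper).
// Illustrative sketch only, not Marian's actual implementation.
inline float swish(float x) {
  return x / (1.0f + std::exp(-x));  // equals x * sigmoid(x)
}
```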

### Changed
- Changed shape organization to follow numpy.
- Changed option `--moving-average` to `--exponential-smoothing` and inverted the formula to `s_t = (1 - \alpha) * s_{t-1} + \alpha * x_t`; `\alpha` is now `1e-4` by default (see the sketch after this list)
- Got rid of thrust for compile-time mathematical expressions
- Changed boolean option `--normalize` to `--normalize [arg=1] (=0)`. New behaviour is backwards-compatible and can also be specified as `--normalize=0.6`
- Renamed "s2s" binary to "marian-decoder"
- Renamed "rescorer" binary to "marian-scorer"
- Renamed "server" binary to "marian-server"
- Renamed option `--dynamic-batching` to `--mini-batch-fit`
- Unified cross-entropy-based validation, which now supports perplexity and other CE variants
- Changed `--normalize (bool)` to `--normalize (float)arg`, allowing the length normalization weight to be changed as `score / pow(length, arg)`.
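
To make the `--exponential-smoothing` update above concrete, here is a minimal sketch of one smoothing step; `smoothed` and `params` are hypothetical names for the running average and the current parameter values, not Marian's actual API:

```cpp
#include <cstddef>
#include <vector>

// One exponential-smoothing step as stated above:
// s_t = (1 - alpha) * s_{t-1} + alpha * x_t, with alpha = 1e-4 by default.
// Hypothetical names and signature; not taken from the Marian code base.
void exponentialSmoothingStep(std::vector<float>& smoothed,
                              const std::vector<float>& params,
                              float alpha = 1e-4f) {
  for (std::size_t i = 0; i < params.size(); ++i)
    smoothed[i] = (1.0f - alpha) * smoothed[i] + alpha * params[i];
}
```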

### Removed
- Temporarily removed gradient dropping (`--drop-rate X`) until refactoring.