subword-nmt/CHANGELOG.md

CHANGELOG
---------

v0.3.8:
  - multiprocessing support (get_vocab and apply_bpe)
  - progress bar for learn_bpe
  - seed parameter for deterministic BPE dropout
  - ignore some unicode line separators which would crash subword-nmt

v0.3.7:
  - BPE dropout (Provilkov et al., 2019)
  - more efficient glossaries (https://github.com/rsennrich/subword-nmt/pull/69)

v0.3.6:
  - fix to subword-bpe command encoding

v0.3.5:
  - fix to subword-bpe command under Python 2
  - wider support of --total-symbols argument

v0.3.4:
  - segment_tokens method to improve library usability (https://github.com/rsennrich/subword-nmt/pull/52)
  - support regex glossaries (https://github.com/rsennrich/subword-nmt/pull/56)
  - allow unicode separators (https://github.com/rsennrich/subword-nmt/pull/57)
  - new option --total-symbols in learn-bpe (commit 61ad8)
  - fix documentation (best practices) (https://github.com/rsennrich/subword-nmt/pull/60)

v0.3:
 - library is now installable via pip
 - fix occasional problems with UTF-8 whitespace and new lines in learn_bpe and apply_bpe.
   - do not silently convert UTF-8 newline characters into "\n"
   - do not silently convert UTF-8 whitespace characters into " "
   - UTF-8 whitespace and newline characters are now considered part of a word, and segmented by BPE

v0.2:
 - different, more consistent handling of end-of-word token (commit a749a7) (https://github.com/rsennrich/subword-nmt/issues/19)
 - allow passing of vocabulary and frequency threshold to apply_bpe.py, preventing the production of OOV (or rare) subword units (commit a00db)
 - made learn_bpe.py deterministic (commit 4c54e)
 - various changes to make handling of UTF more consistent between Python versions
 - new command line arguments for apply_bpe.py:
   - '--glossaries' to prevent given strings from being affected by BPE
   - '--merges' to apply a subset of learned BPE operations
 - new command line arguments for learn_bpe.py:
   - '--dict-input': rather than raw text file, interpret input as a frequency dictionary (as created by get_vocab.py).


v0.1:
 - consistent cross-version unicode handling
 - all scripts are now deterministic
changelog 2017-04-21 13:25:06 +03:00			`CHANGELOG`
			`---------`

progress bar and version bump 2021-12-08 13:01:01 +03:00			`v0.3.8:`
			`- multiprocessing support (get_vocab and apply_bpe)`
			`- progress bar for learn_bpe`
			`- seed parameter for deterministic BPE dropout`
			`- ignore some unicode line separators which would crash subword-nmt`

version bump 2019-11-25 19:54:19 +03:00			`v0.3.7:`
			`- BPE dropout (Provilkov et al., 2019)`
			`- more efficient glossaries (https://github.com/rsennrich/subword-nmt/pull/69)`

version bump 2018-12-11 17:46:24 +03:00			`v0.3.6:`
			`- fix to subword-bpe command encoding`

fix subword-bpe learn-bpe in Python 2 fixes regression from commit 06352. Error was: AttributeError: 'Namespace' object has no attribute 'separator' 2018-09-17 13:53:36 +03:00			`v0.3.5:`
			`- fix to subword-bpe command under Python 2`
			`- wider support of --total-symbols argument`

version bump 2018-08-17 15:49:09 +03:00			`v0.3.4:`
			`- segment_tokens method to improve library usability (https://github.com/rsennrich/subword-nmt/pull/52)`
			`- support regex glossaries (https://github.com/rsennrich/subword-nmt/pull/56)`
			`- allow unicode separators (https://github.com/rsennrich/subword-nmt/pull/57)`
			`- new option --total-symbols in learn-bpe (commit 61ad8)`
			`- fix documentation (best practices) (https://github.com/rsennrich/subword-nmt/pull/60)`

create symlink in old script location (with deprecation warning) 2018-05-16 16:35:47 +03:00			`v0.3:`
			`- library is now installable via pip`
			`- fix occasional problems with UTF-8 whitespace and new lines in learn_bpe and apply_bpe.`
			`- do not silently convert UTF-8 newline characters into "\n"`
			`- do not silently convert UTF-8 whitespace characters into " "`
			`- UTF-8 whitespace and newline characters are now considered part of a word, and segmented by BPE`

changelog 2017-04-21 13:25:06 +03:00			`v0.2:`
update changelog 2018-03-28 11:19:42 +03:00			`- different, more consistent handling of end-of-word token (commit a749a7) (https://github.com/rsennrich/subword-nmt/issues/19)`
changelog 2017-04-21 13:25:06 +03:00			`- allow passing of vocabulary and frequency threshold to apply_bpe.py, preventing the production of OOV (or rare) subword units (commit a00db)`
update changelog 2018-03-28 11:19:42 +03:00			`- made learn_bpe.py deterministic (commit 4c54e)`
			`- various changes to make handling of UTF more consistent between Python versions`
			`- new command line arguments for apply_bpe.py:`
			`- '--glossaries' to prevent given strings from being affected by BPE`
			`- '--merges' to apply a subset of learned BPE operations`
			`- new command line arguments for learn_bpe.py:`
			`- '--dict-input': rather than raw text file, interpret input as a frequency dictionary (as created by get_vocab.py).`
changelog 2017-04-21 13:25:06 +03:00

			`v0.1:`
			`- consistent cross-version unicode handling`
			`- all scripts are now deterministic`