Commit Graph

137 Commits

Author SHA1 Message Date
Rico Sennrich
7bae758b2e progress bar and version bump 2021-12-08 11:01:01 +01:00
Rico Sennrich
823c880e4b
Merge pull request #100 from VProv/master
Add note about BPE-Dropout during training
2021-02-15 10:10:57 +01:00
Ivan Provilkov
fa326d431c
Add note about BPE-Dropout during training 2021-02-15 11:59:34 +03:00
Rico Sennrich
234923ed53
Merge pull request #94 from yimmon/master
Add parallel support (--num-workers)
2020-06-19 13:36:20 +02:00
yimmon
aa03c3afde Add parallel support (--num-workers) 2020-06-18 03:01:02 +08:00
yimmon
bce5f8abcf Add parallel support (--num-workers) 2020-06-18 01:04:38 +08:00
Rico Sennrich
ef69c1eb27
Merge pull request #92 from yimmon/pr
solve unicode errors when reading the bpe codes file
2020-06-15 10:11:31 +02:00
yimmon
f170245089 solve unicode errors when reading the bpe codes file 2020-06-14 15:20:29 +08:00
Rico Sennrich
1b63d67083
Merge pull request #91 from noe/master
Add --seed command-line argument to apply-bpe
2020-05-22 21:35:51 +02:00
Noe Casas
d80d3dd67c Update readme with the new --seed command-line argument 2020-05-21 12:52:27 +02:00
Noe Casas
61a15e6ae3 Add --seed command-line argument to apply-bpe to make --dropout reproducible 2020-05-21 12:51:01 +02:00
Rico Sennrich
75a69fc153 add some more umlauts to tests to check behavior in different locales 2020-02-21 17:39:42 +01:00
Rico Sennrich
4cac90b5c2
Create pythonpublish.yml
automatically publish PyPI package on release
2020-01-03 12:29:17 +01:00
Rico Sennrich
8735d8b084 version bump 2019-11-25 17:54:19 +01:00
Rico Sennrich
5c7b56ea97 apply BPE dropout on list, not set of symbol pairs (in line with what Provilkov et al. did)
simplify and optimize apply_bpe code
2019-11-14 15:14:39 +01:00
Rico Sennrich
c96ed1740c
Merge pull request #81 from kweonwooj/master
apply bpe-dropout in subword-nmt cli mode
2019-11-07 09:39:01 +01:00
Kweonwoo Jung
f7c03abf79 apply bpe-dropout in subword-nmt cli mode 2019-11-07 13:30:05 +09:00
Rico Sennrich
a40db4510c documentation 2019-10-30 09:07:54 +01:00
Rico Sennrich
c4aa49a086 BPE dropout (Provilkov et al., 2019) 2019-10-30 08:59:25 +01:00
Rico Sennrich
18a5c87046
Merge pull request #70 from alvations/patch-4
Use a single regex match with optional operator
2019-01-14 16:13:57 +00:00
Rico Sennrich
e99c89b671
Merge pull request #69 from alvations/patch-3
re.split can catch groups and save the delimiter
2019-01-14 16:12:35 +00:00
alvations
6728e93e3f
Cast filter generator to list for Python3 2019-01-14 23:12:35 +08:00
alvations
f4f430acaf re.split can catch groups and save the delimiter 2019-01-14 23:05:08 +08:00
alvations
8a94d6e6bf
added missing parameter 2019-01-14 22:53:07 +08:00
alvations
ee99a507f3
Use a single regex match with optional operator 2019-01-14 15:42:59 +08:00
Rico Sennrich
9ba2fbfab8
Update README.md 2019-01-11 09:32:00 +00:00
Rico Sennrich
fabe72d4f6 version bump 2018-12-11 14:47:15 +00:00
Rico Sennrich
955abfe7e5 enable encoding fix in subword-bpe
relevant code was not run because subword_bpe.py is never executed as a script.
2018-11-12 17:56:02 +00:00
Rico Sennrich
d21ced8f86 fix subword-bpe learn-bpe in Python 2
fixes regression from commit 06352. Error was:
AttributeError: 'Namespace' object has no attribute 'separator'
2018-09-17 11:57:06 +01:00
Rico Sennrich
6e67561a68
Merge pull request #62 from bastings/master
pass `total_symbols` to learn_bpe
2018-08-23 10:38:12 +01:00
Joost Bastings
bdcf459c27
pass total_symbols to learn_bpe
pass `total_symbols` to learn_bpe when using the `subword-nmt learn-bpe` command
2018-08-22 22:09:08 +02:00
Rico Sennrich
73a6e55d5b support argument --total-symbols in learn_joint_bpe_and_vocab 2018-08-20 12:07:45 +01:00
Rico Sennrich
5700db410d version bump 2018-08-17 13:49:09 +01:00
Rico Sennrich
36bfdd3a7a fix best practice instructions
thx to @bastings.
2018-08-17 13:42:14 +01:00
Rico Sennrich
45ff8c5f30
Merge pull request #57 from jsenellart/fix_unicode_separator
enable unicode separator/glossaries in cli
2018-07-18 08:15:03 +10:00
Jean A. Senellart
8450bd3231 condition parameter conversion to python 2 2018-07-18 07:36:11 +10:00
Jean Senellart
d92491ff12
Merge branch 'master' into fix_unicode_separator 2018-07-18 07:25:48 +10:00
Rico Sennrich
06352533dd enable unicode separators in Python2
thanks @jsenellart
2018-07-17 16:40:51 +10:00
Jean A. Senellart
a36b489094 same for glossaries 2018-07-13 04:23:54 +09:00
Jean A. Senellart
9df8997c78 enable unicode separator 2018-07-12 11:52:30 +09:00
Rico Sennrich
f8086c2fa6
Merge pull request #56 from Proyag/master
Extending --glossaries to handle regex
2018-07-10 17:17:09 +01:00
Proyag
ba1db43457 add unittest (and fix python3 integer division in unittest) 2018-07-09 11:12:25 +02:00
Proyag
c06e87d396 handle regex as glossaries 2018-07-09 11:12:17 +02:00
Rico Sennrich
48ba99e657 fix typo in previous commit 2018-06-28 11:48:40 +01:00
Rico Sennrich
61ad855cf0 new option --total-symbols in learn-bpe
redefines "--symbols" to be the number of merge operations,
minus the character vocabulary size, so that "--symbols" becomes
an estimate of the final symbol vocabulary size.

thx @phikoehn
2018-06-28 11:43:56 +01:00
Rico Sennrich
71b22d1a99
Merge pull request #52 from en-dash/master
Improve library usability
2018-06-06 21:34:56 +08:00
Lenz
7e336e0e1f new method segment_tokens that takes and returns a list 2018-06-05 23:13:51 +03:00
Lenz
d643c5ff9a fix: spurious .format() operation 2018-06-05 23:06:43 +03:00
Rico Sennrich
8012fd6607 fix pip package with Python3 2018-05-21 10:53:59 +01:00
Rico Sennrich
1df7c24a37 proper markdown on PyPi 2018-05-17 13:28:21 +01:00