29 Commits

Author SHA1 Message Date
Rico Sennrich
2d5a3ecdbc remove subword marker at end-of-line 2017-04-07 15:13:26 +02:00
Rico Sennrich
fb526f1b00 rename --is-dict to --dict-input 2017-02-27 15:57:11 +00:00
Martin Boyanov
f37902dec6 Allow passing in a word-count file instead of iterating through the whole dataset 2017-02-25 14:17:56 +02:00
Rico Sennrich
4c54e1df2e make max deterministic by using symbol pair as secondary sort key 2017-02-22 13:58:21 +00:00
Rico Sennrich
669255833f acknowledgements 2017-02-20 10:54:15 +00:00
Rico Sennrich
9f23b0171a consistently use UTF-8 across python versions and environment variables 2017-02-10 11:11:45 +00:00
Rico Sennrich
6a953fd54a Merge branch 'unicode' 2017-02-10 11:07:08 +00:00
Rico Sennrich
c82604aa57 consistent, cross-version unicode handling 2017-01-10 14:52:42 +00:00
Rico Sennrich
269f18593e Merge pull request #10 from rmeertens/master
using python3 print function
2016-11-08 09:09:06 +00:00
roland
d796a78a17 using python3 print function 2016-11-08 10:00:31 +01:00
Rico Sennrich
e68bd0582f frequency threshold for learning
Closes #8
2016-10-17 16:36:33 +01:00
Rico Sennrich
ec5c7b009c comments/whitespace 2016-09-06 14:09:41 +01:00
aagohary
4c3a3b3176 fixed the encoding issue with applying bpe with non-utf8 locale 2016-09-06 13:43:32 +01:00
Rico Sennrich
3004836285 update reference 2016-06-01 14:49:14 +01:00
Rico Sennrich
5d2d3758ad break condition for toy example 2016-03-03 16:39:34 +00:00
Rico Sennrich
d0c78f57c8 add toy implementation of BPE as documentation 2016-03-03 11:17:36 +00:00
Rico Sennrich
962c445819 x2 speedup on Python 2.X (use PyPy for best speed) 2016-02-15 10:44:28 +00:00
Rico Sennrich
3380810b2e escape backslashes in replacement. fixes #5. 2016-01-29 10:53:31 +00:00
Rico Sennrich
e4c38f2f30 Merge pull request #4 from He-Ro/fix-short-references
Fixes #3
2016-01-29 10:02:28 +00:00
Hendrik Rosendahl
19634c38c2 Fixes #3
Add test before division to check if total reference count is greater than zero
2016-01-29 10:23:00 +01:00
Rico Sennrich
8cb41a4c39 command line option for verbosity 2015-12-07 11:25:57 +00:00
Rico Sennrich
84e7411928 implementation of chrF (for evaluation) 2015-11-27 14:48:15 +00:00
Rico Sennrich
d822ce6744 Merge pull request #2 from kyunghyuncho/master
strip line before writing
2015-11-26 09:42:59 +00:00
Kyunghyun Cho
54e52cf3e3 removed debug 2015-11-25 19:01:39 -05:00
Kyunghyun Cho
81947f9907 correct strip 2015-11-25 19:01:04 -05:00
Kyunghyun Cho
b1e99d9829 use strip 2015-11-25 18:43:44 -05:00
Rico Sennrich
3028cc660d allow use of apply_bpe as a class 2015-11-24 12:14:25 +00:00
Rico Sennrich
15f43f2afe fix rare problem with pruned statistics
if the pruned stats are empty, we need to go back to full statistics
2015-10-29 16:45:00 +00:00
Rico Sennrich
83b1847647 initial commit 2015-09-01 11:48:49 +01:00