Hieu Hoang
d30a1d51c8
Merge pull request #220 from wwaites/master
...
flag to turn off sentence splitter from emitting <P>
2020-02-27 18:55:17 -08:00
William Waites
696a5d9833
flag to turn off sentence splitter from emitting <P>
2020-02-26 14:08:26 +00:00
Kenneth Heafield
22923ddcf0
Revert "line buffering for tokeniser and truecaser"
...
This reverts commit 691717c425.
2020-02-20 09:52:08 +00:00
Hieu Hoang
3c881255b1
Merge pull request #219 from wwaites/master
...
line buffering for tokeniser and truecaser
2020-02-19 10:35:29 -08:00
William Waites
691717c425
line buffering for tokeniser and truecaser
2020-02-17 14:29:24 +00:00
Hieu Hoang
4c5e89f075
Merge pull request #218 from veer66/master
...
Add AARCH64 support
2020-01-22 11:30:03 -08:00
Vee Satayamas
5694efe10b
Add AARCH64 support
2020-01-16 09:13:03 +00:00
Hieu Hoang
e4a52f14e4
Merge pull request #217 from moses-smt/alvations-patch-2
...
Proper spacing for sent-split perl script
2020-01-05 19:46:25 -08:00
alvations
d03df21e88
Proper spacing
2020-01-06 11:43:31 +08:00
Hieu Hoang
f46ee7c5ac
get rid of boost thread local code
2020-01-05 18:56:49 -08:00
Hieu Hoang
745e03b4fc
use c++11 thread local construct instead of boost
2020-01-05 18:09:57 -08:00
Hieu Hoang
fdabcd34f8
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2020-01-05 17:29:14 -08:00
Hieu Hoang
afb353b430
limit thread queue to x2 number of threads
2020-01-05 17:29:04 -08:00
Hieu Hoang
25ec481655
Merge pull request #216 from HjalmarrSv/patch-1
...
Modernized
2020-01-02 03:38:55 +00:00
HjalmarrSv
fa747062dc
Modernized
...
I wanted to properly parse links on https://dumps.wikimedia.org/mirrors.html when the page is copied as text.
My proposed changes do the job.
Basically, I changed line 5 by replacing the + at the end of the line with *(\/)?
The pipe symbol could lead to crashes, which is why I broke line 5 up into three lines. After reading various posts, I suggest not using the pipe (|).
2019-12-17 20:40:51 +01:00
Barry Haddow
a89691fee3
attempt to handle Korean better; only consider horizontal space in final split
2019-12-16 15:52:45 +00:00
Barry Haddow
2cff8ff6dd
split word on any type of space
2019-12-09 17:04:09 +00:00
Hieu Hoang
41b31167fd
Merge pull request #215 from moses-smt/alvations-patch-normalization
...
Single quotes should be escaped as single quotes.
2019-11-24 18:38:05 -08:00
alvations
f6d7adde15
Single quotes should be escaped as single quotes.
2019-11-25 10:10:40 +08:00
Barry Haddow
74d54b54c3
2 letter codes
2019-11-08 15:36:22 +00:00
Barry Haddow
1037070026
support for several Indic languages
2019-11-08 14:56:58 +00:00
Barry Haddow
b1163966b1
initial hi non-breaking prefixes
2019-11-05 16:59:40 +00:00
Barry Haddow
61b1d06570
list items
2019-11-05 16:52:50 +00:00
Barry Haddow
4da86c360f
rupees
2019-11-05 16:02:19 +00:00
Barry Haddow
56b2bad907
fix abbrev rule
2019-11-05 15:58:07 +00:00
Barry Haddow
3910cd6c46
devanagari fix
2019-10-31 21:28:43 +00:00
Barry Haddow
2affb9b624
reorganise indic support
2019-10-31 16:50:17 +00:00
Barry Haddow
d708e26b60
use block notation for indic scripts
2019-10-31 16:12:59 +00:00
Barry Haddow
0fef8ebf4c
fix nbp
2019-10-31 16:08:56 +00:00
Barry Haddow
b1d9fb6d75
full cjk test
2019-10-28 09:53:45 +00:00
Barry Haddow
8ebebbc680
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2019-10-28 09:48:40 +00:00
Hieu Hoang
286188b82a
Merge pull request #214 from JetRunner/patch-1
...
Fix the incorrect processing considering fullwidth number character
2019-10-18 19:54:46 -07:00
Kevin Canwen Xu
5d3331b922
Update replace-unicode-punctuation.perl
2019-10-14 16:33:58 +08:00
alvations
555829a771
Undoing 0578892581
...
It causes abbreviations ending with a full stop not to be split, e.g.:
> The restructuring of IBM was essential to enable it organisationally to take up the responsibilities entrusted in the role with the recent changes in the policy and legislations, revised charter of function of IBM and the new activities and initiatives undertaken by IBM. IBM is also engaged in handholding the States for auction of mineral blocks for greater transparency in allocation of mineral concessions.
2019-10-01 05:27:06 +08:00
Barry Haddow
486dce270f
debug
2019-09-30 16:58:21 +01:00
Barry Haddow
9bffde57ba
revert 05788925
2019-09-30 16:53:06 +01:00
Barry Haddow
257d7e5e66
enable custom non breaking prefixes
2019-09-30 16:52:24 +01:00
Barry Haddow
01a8ec41e8
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2019-09-30 15:33:33 +01:00
Barry Haddow
768944d851
do not add spaces in cjk
2019-09-30 15:33:26 +01:00
Hieu Hoang
b21b071a66
Merge pull request #213 from titsuki/enable-strict
...
Enable use strict pragma
2019-09-25 01:04:13 +01:00
titsuki
490dc3996a
Enable use strict pragma
2019-09-23 15:40:13 +09:00
Hieu Hoang
fd06cdf026
Merge pull request #212 from moses-smt/alvations-patch-regexes
...
The dot before an acronym should be optional.
2019-09-04 08:07:06 +01:00
alvations
0578892581
The dot before an acronym should be optional.
2019-09-04 14:16:41 +08:00
Hieu Hoang
9f08d77b0d
Merge pull request #211 from achimr/master
...
Support for Urdu in sentence splitter
2019-08-21 22:05:45 +01:00
Achim Ruopp
7ad5ffa0c0
Support for Urdu in sentence splitter
2019-07-10 10:48:32 -04:00
Hieu Hoang
158d252389
tweak readme
2019-06-08 18:22:39 +01:00
Hieu Hoang
c0545019eb
Merge pull request #210 from mjpost/patch-1
...
escape angle brackets
2019-04-27 21:23:50 +01:00
Matt Post
63c450b401
escape angle brackets
...
The script doesn't escape angle brackets which can result in bad SGML / XML output. This fixes that, although ideally, this should be implemented with a proper parser and dumper.
2019-04-26 14:24:07 -04:00
Hieu Hoang
187a75cb55
Merge pull request #209 from joelb-git/multi-bleu-detok-non-ascii-fix
...
Fix non-ASCII lowercasing
2019-03-01 23:26:12 +00:00
Joel Barry
fdb7384d3d
Fix non-ASCII lowercasing
2019-02-27 10:17:29 -05:00