Commit Graph

17445 Commits

Author SHA1 Message Date
Anoop Kunchukuttan (STC INDIA)
33da1af73a added new pipelines 2020-09-03 12:55:22 +05:30
Anoop Kunchukuttan (STC INDIA)
5f1c2c2d80 Update azure-pipelines.yml for Azure Pipelines 2020-09-02 10:28:44 +00:00
Anoop Kunchukuttan (STC INDIA)
5ab839aa3e Update azure-pipelines.yml for Azure Pipelines 2020-09-02 10:28:05 +00:00
Anoop Kunchukuttan (STC INDIA)
fc1e484a8f Update azure-pipelines.yml for Azure Pipelines 2020-09-02 10:24:07 +00:00
Anoop Kunchukuttan (STC INDIA)
53ea29008f Updated azure-pipelines.yml 2020-09-02 10:20:39 +00:00
Anoop Kunchukuttan (STC INDIA)
04fe2f4c25 Set up CI with Azure Pipelines
[skip ci]
2020-09-02 10:17:33 +00:00
Hieu Hoang
83baf4daec add MSPT files 2020-08-05 10:23:03 -07:00
Hieu Hoang
e99d9a0d21 Merge github.com:hieuhoang/msmoses 2020-08-05 10:22:40 -07:00
Hieu Hoang
96fd873594 start MSPT 2020-08-05 10:07:49 -07:00
Kenneth Heafield
78ca5f3cc5 Allow Arabic letters to begin a fa sentence 2020-08-03 21:51:09 +01:00
Hieu Hoang
d65d392d46
Merge pull request #222 from cristinae/patch-1
adding rules for Catalan
2020-07-31 09:31:17 -07:00
Cristina España i Bonet
8d78dae634
adding rules for Catalan
special characters within words and contractions closer to French than to English
2020-07-31 15:22:47 +02:00
Barry Haddow
47915b561f escape ampersands 2020-06-30 08:10:56 +01:00
Hieu Hoang
d90a8df862
Merge pull request #221 from HjalmarrSv/master
Added some for sv
2020-06-01 17:19:36 -07:00
HjalmarrSv
b7038d5f24
Merge pull request #1 from HjalmarrSv/HjalmarrSv-patch-1
Update nonbreaking_prefix.sv
2020-05-23 17:46:04 +02:00
HjalmarrSv
da3768a296
Update nonbreaking_prefix.sv
Added Å Ä Ö, which are not unusual initials in names, e.g. Åke, Ärling, Östen.
Added some new entries, mostly variations on the existing ones. Both a dot after each letter (or pair) and a dot only after the last letter are accepted forms. A couple of decades ago, there had to be a space after the dot, which explains the third form.
The file for sv is much more useful with these few additions, although it is still far from complete.
Removed: G (occurred twice).
In this list there is one item that is also a word, even when case is kept: tom.
If all words are in lower case, then tex, mao, and tom (again) may be confused with names, and iaf, etc with named entities.
2020-05-23 17:43:33 +02:00
Kenneth Heafield
89b9b4fba2 sentence splitter -k option to keep line boundaries 2020-03-19 15:44:41 +00:00
Kenneth Heafield
0a892749bc Add Pashto ؟ as a sentence splitting character 2020-03-19 12:06:50 +00:00
Hieu Hoang
d30a1d51c8
Merge pull request #220 from wwaites/master
flag to turn off sentence splitter from emitting <P>
2020-02-27 18:55:17 -08:00
William Waites
696a5d9833 flag to turn off sentence splitter from emitting <P> 2020-02-26 14:08:26 +00:00
Kenneth Heafield
22923ddcf0 Revert "line buffering for tokeniser and truecaser"
This reverts commit 691717c425.
2020-02-20 09:52:08 +00:00
Hieu Hoang
3c881255b1
Merge pull request #219 from wwaites/master
line buffering for tokeniser and truecaser
2020-02-19 10:35:29 -08:00
William Waites
691717c425 line buffering for tokeniser and truecaser 2020-02-17 14:29:24 +00:00
Hieu Hoang
4c5e89f075
Merge pull request #218 from veer66/master
Add AARCH64 support
2020-01-22 11:30:03 -08:00
Vee Satayamas
5694efe10b Add AARCH64 support 2020-01-16 09:13:03 +00:00
Hieu Hoang
e4a52f14e4
Merge pull request #217 from moses-smt/alvations-patch-2
Proper spacing for sent-split perl script
2020-01-05 19:46:25 -08:00
alvations
d03df21e88
Proper spacing 2020-01-06 11:43:31 +08:00
Hieu Hoang
f46ee7c5ac get rid of boost thread local code 2020-01-05 18:56:49 -08:00
Hieu Hoang
745e03b4fc use c++11 thread local construct instead of boost 2020-01-05 18:09:57 -08:00
Hieu Hoang
fdabcd34f8 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2020-01-05 17:29:14 -08:00
Hieu Hoang
afb353b430 limit thread queue to x2 number of threads 2020-01-05 17:29:04 -08:00
Hieu Hoang
25ec481655
Merge pull request #216 from HjalmarrSv/patch-1
Modernized
2020-01-02 03:38:55 +00:00
HjalmarrSv
fa747062dc
Modernized
I wanted to properly parse links on https://dumps.wikimedia.org/mirrors.html when the page is copied as text.
My proposed changes do the job.
Basically, I replaced the + at the end of line 5 with *(\/)?
The pipe symbol could lead to crashes, which is why I broke line 5 up into three lines. After reading various posts, I suggest not using the pipe (|).
2019-12-17 20:40:51 +01:00
Barry Haddow
a89691fee3 attempt to handle Korean better; only consider horizontal space in final split 2019-12-16 15:52:45 +00:00
Barry Haddow
2cff8ff6dd split word on any type of space 2019-12-09 17:04:09 +00:00
Hieu Hoang
41b31167fd
Merge pull request #215 from moses-smt/alvations-patch-normalization
Single quotes should be escaped as single quotes.
2019-11-24 18:38:05 -08:00
alvations
f6d7adde15
Single quotes should be escaped as single quotes. 2019-11-25 10:10:40 +08:00
Barry Haddow
74d54b54c3 2 letter codes 2019-11-08 15:36:22 +00:00
Barry Haddow
1037070026 support for several Indic languages 2019-11-08 14:56:58 +00:00
Barry Haddow
b1163966b1 initial hi non-breaking prefixes 2019-11-05 16:59:40 +00:00
Barry Haddow
61b1d06570 list items 2019-11-05 16:52:50 +00:00
Barry Haddow
4da86c360f rupees 2019-11-05 16:02:19 +00:00
Barry Haddow
56b2bad907 fix abbrev rule 2019-11-05 15:58:07 +00:00
Barry Haddow
3910cd6c46 devanagari fix 2019-10-31 21:28:43 +00:00
Barry Haddow
2affb9b624 reorganise indic support 2019-10-31 16:50:17 +00:00
Barry Haddow
d708e26b60 use block notation for indic scripts 2019-10-31 16:12:59 +00:00
Barry Haddow
0fef8ebf4c fix nbp 2019-10-31 16:08:56 +00:00
Barry Haddow
b1d9fb6d75 full cjk test 2019-10-28 09:53:45 +00:00
Barry Haddow
8ebebbc680 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2019-10-28 09:48:40 +00:00
Hieu Hoang
286188b82a
Merge pull request #214 from JetRunner/patch-1
Fix the incorrect processing considering fullwidth number character
2019-10-18 19:54:46 -07:00