mirror of
https://github.com/moses-smt/mosesdecoder.git
synced 2024-12-29 06:52:34 +03:00
104 lines
1.5 KiB
Plaintext
#Anything in this file, followed by a period (and an upper-case word), does NOT indicate an end-of-sentence marker.
#Special cases are included for prefixes that ONLY appear before 0-9 numbers.

#Any single upper-case letter followed by a period is not a sentence ender (excluding I occasionally, but we leave it in).
#Upper-case letters are usually initials in a name.
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Y
Z
Á
É
Í
Ó
Ö
Ő
Ú
Ü
Ű

#List of titles and other common abbreviations. These are often followed by upper-case names, but do not indicate sentence breaks.
Dr
dr
kb
Kb
vö
Vö
pl
Pl
ca
Ca
min
Min
max
Max
ún
Ún
prof
Prof
de
De
du
Du
Szt
St

#Numbers only. These are treated as non-breaking prefixes only when followed by a numeric sequence.
# Add #NUMERIC_ONLY# after the word to enable this behavior.
#This case is mostly for the English "No.", which can either be a sentence of its own or,
#if followed by a number, a non-breaking prefix.

# Month name abbreviations
jan #NUMERIC_ONLY#
Jan #NUMERIC_ONLY#
Feb #NUMERIC_ONLY#
feb #NUMERIC_ONLY#
márc #NUMERIC_ONLY#
Márc #NUMERIC_ONLY#
ápr #NUMERIC_ONLY#
Ápr #NUMERIC_ONLY#
máj #NUMERIC_ONLY#
Máj #NUMERIC_ONLY#
jún #NUMERIC_ONLY#
Jún #NUMERIC_ONLY#
Júl #NUMERIC_ONLY#
júl #NUMERIC_ONLY#
aug #NUMERIC_ONLY#
Aug #NUMERIC_ONLY#
Szept #NUMERIC_ONLY#
szept #NUMERIC_ONLY#
okt #NUMERIC_ONLY#
Okt #NUMERIC_ONLY#
nov #NUMERIC_ONLY#
Nov #NUMERIC_ONLY#
dec #NUMERIC_ONLY#
Dec #NUMERIC_ONLY#

# Other abbreviations
tel #NUMERIC_ONLY#
Tel #NUMERIC_ONLY#
Fax #NUMERIC_ONLY#
fax #NUMERIC_ONLY#
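
# Illustration only, not part of the prefix data. The comments in this file describe two
# kinds of entries: plain prefixes, which never end a sentence, and #NUMERIC_ONLY# prefixes,
# which are non-breaking only before a number. The Python sketch below shows one possible
# way a consumer could read the list and apply that rule. It is a simplified assumption,
# not the Moses sentence splitter itself; the function names parse_prefixes and
# is_sentence_end are hypothetical, and the sketch ignores the upper-case check on the
# following word that the first comment in this file mentions.

```python
NUMERIC_ONLY = "#NUMERIC_ONLY#"


def parse_prefixes(text):
    """Split the file's text into (always, numeric_only) prefix sets."""
    always, numeric_only = set(), set()
    for raw in text.splitlines():
        line = raw.strip()
        # Blank lines and whole-line comments carry no prefix.
        if not line or line.startswith("#"):
            continue
        if line.endswith(NUMERIC_ONLY):
            numeric_only.add(line[: -len(NUMERIC_ONLY)].strip())
        else:
            always.add(line)
    return always, numeric_only


def is_sentence_end(token, next_token, always, numeric_only):
    """Return True if the period ending `token` is taken to close a sentence."""
    if not token.endswith("."):
        return False
    prefix = token[:-1]
    # Plain prefixes (initials, titles, abbreviations) never end a sentence.
    if prefix in always:
        return False
    # Numeric-only prefixes are non-breaking only when a digit follows.
    if prefix in numeric_only and next_token[:1].isdigit():
        return False
    return True


# A three-line excerpt in the same format as this file.
sample = "#comment\nDr\njan #NUMERIC_ONLY#\n"
always, numeric_only = parse_prefixes(sample)

print(is_sentence_end("Dr.", "Kovács", always, numeric_only))   # False
print(is_sentence_end("jan.", "5", always, numeric_only))       # False
print(is_sentence_end("jan.", "Péter", always, numeric_only))   # True
```

# Keeping the two sets separate mirrors the distinction the file itself draws: an entry
# such as "jan" still allows a sentence break unless the next token starts with a digit.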