#Anything in this file, followed by a period (and an upper-case word), does NOT
#indicate an end-of-sentence marker. Special cases are included for prefixes
#that ONLY appear before 0-9 numbers.

#This list is compiled from the omorfi <http://code.google.com/p/omorfi> database
#by Tommi A Pirinen.


#any single upper case letter followed by a period is not a sentence ender
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Y
Z
Å
Ä
Ö

#List of titles. These are often followed by upper-case names, but do not indicate sentence breaks
alik
alil
amir
apul
apul.prof
arkkit
ass
assist
dipl
dipl.arkkit
dipl.ekon
dipl.ins
dipl.kielenk
dipl.kirjeenv
dipl.kosm
dipl.urk
dos
erikoiseläinl
erikoishammasl
erikoisl
erikoist
ev.luutn
evp
fil
ft
hallinton
hallintot
hammaslääket
jatk
jääk
kansaned
kapt
kapt.luutn
kenr
kenr.luutn
kenr.maj
kers
kirjeenv
kom
kom.kapt
komm
konst
korpr
luutn
maist
maj
Mr
Mrs
Ms
M.Sc
neuv
nimim
Ph.D
prof
puh.joht
pääll
res
san
siht
suom
sähköp
säv
toht
toim
toim.apul
toim.joht
toim.siht
tuom
ups
vänr
vääp
ye.ups
ylik
ylil
ylim
ylimatr
yliop
yliopp
ylip
yliv

#misc - odd period-ending items that NEVER indicate breaks (p.m. does NOT fall
#into this category - it sometimes ends a sentence)
e.g
ent
esim
huom
i.e
ilm
l
mm
myöh
nk
nyk
par
po
t
v