mosesdecoder/scripts/share/nonbreaking_prefixes
HjalmarrSv da3768a296
Update nonbreaking_prefix.sv
Added Å Ä Ö, which are not unusual initials in names, e.g. Åke, Ärling, Östen.
Added some new abbreviations, mostly variations on the existing ones. Two forms are accepted: a dot after each letter (or pair), and a dot only after the last letter. A couple of decades ago there had to be a space after the dot, which explains the third form.
The file for sv is much more useful with these few additions, although it is still far from complete.
Removed: G (occurred twice).
In this list there is one item that is also a word, even when case is kept: tom.
If all words are lowercased, then tex, mao, and tom (again) may be confused with names, and iaf, etc with named entities.
2020-05-23 17:43:33 +02:00
nonbreaking_prefix.as 2 letter codes 2019-11-08 15:36:22 +00:00
nonbreaking_prefix.bn support for several Indic languages 2019-11-08 14:56:58 +00:00
nonbreaking_prefix.ca dos2unix everything 2015-08-23 19:00:19 +04:00
nonbreaking_prefix.cs czech prefixes 2012-07-14 03:32:10 +02:00
nonbreaking_prefix.de lock m_vocab variable access in Encode() and Lookup(). Other functions are still not threadsafe 2012-06-26 13:33:34 -04:00
nonbreaking_prefix.el Update nonbreaking_prefix.el 2013-10-07 20:51:02 +03:00
nonbreaking_prefix.en rupees 2019-11-05 16:02:19 +00:00
nonbreaking_prefix.es lock m_vocab variable access in Encode() and Lookup(). Other functions are still not threadsafe 2012-06-26 13:33:34 -04:00
nonbreaking_prefix.et support for several Indic languages 2019-11-08 14:56:58 +00:00
nonbreaking_prefix.fi fix location and remove english notes 2014-09-04 16:01:10 +01:00
nonbreaking_prefix.fr Single lower-case letter French word 2016-07-31 14:56:37 +02:00
nonbreaking_prefix.ga Create nonbreaking_prefix.ga 2015-09-23 14:35:18 +01:00
nonbreaking_prefix.gu support for several Indic languages 2019-11-08 14:56:58 +00:00
nonbreaking_prefix.hi support for several Indic languages 2019-11-08 14:56:58 +00:00
nonbreaking_prefix.hu dos2unix everything 2015-08-23 19:00:19 +04:00
nonbreaking_prefix.is lock m_vocab variable access in Encode() and Lookup(). Other functions are still not threadsafe 2012-06-26 13:33:34 -04:00
nonbreaking_prefix.it lock m_vocab variable access in Encode() and Lookup(). Other functions are still not threadsafe 2012-06-26 13:33:34 -04:00
nonbreaking_prefix.kn support for several Indic languages 2019-11-08 14:56:58 +00:00
nonbreaking_prefix.lt More abbreviations for Lithuanian. 2017-01-04 23:52:28 -06:00
nonbreaking_prefix.lv Hungarian and Latvian non-breaking prefix files 2013-03-18 17:17:35 -04:00
nonbreaking_prefix.ml support for several Indic languages 2019-11-08 14:56:58 +00:00
nonbreaking_prefix.mni support for several Indic languages 2019-11-08 14:56:58 +00:00
nonbreaking_prefix.mr support for several Indic languages 2019-11-08 14:56:58 +00:00
nonbreaking_prefix.nl lock m_vocab variable access in Encode() and Lookup(). Other functions are still not threadsafe 2012-06-26 13:33:34 -04:00
nonbreaking_prefix.or 2 letter codes 2019-11-08 15:36:22 +00:00
nonbreaking_prefix.pa support for several Indic languages 2019-11-08 14:56:58 +00:00
nonbreaking_prefix.pl chmod 2014-12-11 16:54:19 +00:00
nonbreaking_prefix.pt lock m_vocab variable access in Encode() and Lookup(). Other functions are still not threadsafe 2012-06-26 13:33:34 -04:00
nonbreaking_prefix.ro lock m_vocab variable access in Encode() and Lookup(). Other functions are still not threadsafe 2012-06-26 13:33:34 -04:00
nonbreaking_prefix.ru Fixed bug in tokenizer.perl where comma separated lists of single 2013-08-16 14:39:50 -04:00
nonbreaking_prefix.sk lock m_vocab variable access in Encode() and Lookup(). Other functions are still not threadsafe 2012-06-26 13:33:34 -04:00
nonbreaking_prefix.sl dos2unix everything 2015-08-23 19:00:19 +04:00
nonbreaking_prefix.sv Update nonbreaking_prefix.sv 2020-05-23 17:43:33 +02:00
nonbreaking_prefix.ta support for several Indic languages 2019-11-08 14:56:58 +00:00
nonbreaking_prefix.te support for several Indic languages 2019-11-08 14:56:58 +00:00
nonbreaking_prefix.yue Create a Cantonese version, distinct from Mandarin. 2017-01-05 12:53:21 -06:00
nonbreaking_prefix.zh Create a Cantonese version, distinct from Mandarin. 2017-01-05 12:53:21 -06:00
README.txt move notice about czech prefixes to share/README 2014-08-06 15:03:37 +01:00

The language suffix of each file name (for example, sv in nonbreaking_prefix.sv) is a language code. The codes can be found here:

http://www.loc.gov/standards/iso639-2/php/code_list.php

This code includes data from Daniel Naber's Language Tools (Czech abbreviations).
This code includes data from the Czech Wiktionary (also Czech abbreviations).
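
As a rough illustration of how a prefix file might be located by its language suffix and consulted, here is a minimal Python sketch. It is not the tokenizer's actual code. The helper names (prefix_file_name, parse_prefixes, is_nonbreaking) and the SAMPLE text are hypothetical, and the sketch assumes the usual layout of these files: one prefix per line without its trailing dot, lines starting with # treated as comments, and an optional #NUMERIC_ONLY# marker for prefixes that only protect a following number.

```python
from typing import Dict

NUMERIC_MARKER = "#NUMERIC_ONLY#"

# Hypothetical sample, not the real contents of nonbreaking_prefix.sv.
SAMPLE = """\
# single-letter initials
Å
# abbreviations, in two dotted forms
t.ex
tex
No #NUMERIC_ONLY#
"""


def prefix_file_name(lang: str) -> str:
    """Build the file name for a language suffix, e.g. 'sv'."""
    return f"nonbreaking_prefix.{lang}"


def parse_prefixes(text: str) -> Dict[str, bool]:
    """Map each prefix to True if it is marked numeric-only, else False."""
    prefixes: Dict[str, bool] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        numeric_only = line.endswith(NUMERIC_MARKER)
        if numeric_only:
            line = line[: -len(NUMERIC_MARKER)].strip()
        prefixes[line] = numeric_only
    return prefixes


def is_nonbreaking(token: str, prefixes: Dict[str, bool]) -> bool:
    """True if token is a listed prefix plus a dot (numeric-only ones excluded)."""
    if not token.endswith("."):
        return False
    # Only an explicit False (listed, not numeric-only) counts here.
    return prefixes.get(token[:-1]) is False


if __name__ == "__main__":
    table = parse_prefixes(SAMPLE)
    print(prefix_file_name("sv"))
    for tok in ["Å.", "t.ex.", "tex.", "No.", "hej."]:
        print(tok, is_nonbreaking(tok, table))
```

The lookup is case-sensitive on purpose: as the commit message above notes, lowercasing everything makes entries such as tex or tom easier to confuse with ordinary words and names.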