mosesdecoder

mirror of https://github.com/moses-smt/mosesdecoder.git synced 2024-09-11 11:25:40 +03:00

HjalmarrSv da3768a296 Update nonbreaking_prefix.sv Added Å, Ä, and Ö, which are not unusual initials in names, e.g. Åke, Ärling, Östen. Added some new entries, mostly variations on the existing ones. Both a dot after each letter (or pair) and a dot only after the last letter are accepted forms. A couple of decades ago there had to be a space after the dot, which explains the third form. The file for sv is much more useful with these few additions, although it is still far from complete. Removed: G (occurred twice). One item in this list is also a word, even when case is kept: tom. If all words are lowercase, then tex, mao, and tom (again) may be confused with names, and iaf, etc with named entities.		2020-05-23 17:43:33 +02:00
nonbreaking_prefix.as	2 letter codes	2019-11-08 15:36:22 +00:00
nonbreaking_prefix.bn	support for several Indic languages	2019-11-08 14:56:58 +00:00
nonbreaking_prefix.ca	dos2unix everything	2015-08-23 19:00:19 +04:00
nonbreaking_prefix.cs	czech prefixes	2012-07-14 03:32:10 +02:00
nonbreaking_prefix.de	lock m_vocab variable access in Encode() and Lookup(). Other functions are still not threadsafe	2012-06-26 13:33:34 -04:00
nonbreaking_prefix.el	Update nonbreaking_prefix.el	2013-10-07 20:51:02 +03:00
nonbreaking_prefix.en	rupees	2019-11-05 16:02:19 +00:00
nonbreaking_prefix.es	lock m_vocab variable access in Encode() and Lookup(). Other functions are still not threadsafe	2012-06-26 13:33:34 -04:00
nonbreaking_prefix.et	support for several Indic languages	2019-11-08 14:56:58 +00:00
nonbreaking_prefix.fi	fix location and remove english notes	2014-09-04 16:01:10 +01:00
nonbreaking_prefix.fr	Single lower-case letter French word	2016-07-31 14:56:37 +02:00
nonbreaking_prefix.ga	Create nonbreaking_prefix.ga	2015-09-23 14:35:18 +01:00
nonbreaking_prefix.gu	support for several Indic languages	2019-11-08 14:56:58 +00:00
nonbreaking_prefix.hi	support for several Indic languages	2019-11-08 14:56:58 +00:00
nonbreaking_prefix.hu	dos2unix everything	2015-08-23 19:00:19 +04:00
nonbreaking_prefix.is	lock m_vocab variable access in Encode() and Lookup(). Other functions are still not threadsafe	2012-06-26 13:33:34 -04:00
nonbreaking_prefix.it	lock m_vocab variable access in Encode() and Lookup(). Other functions are still not threadsafe	2012-06-26 13:33:34 -04:00
nonbreaking_prefix.kn	support for several Indic languages	2019-11-08 14:56:58 +00:00
nonbreaking_prefix.lt	More abbreviations for Lithuanian.	2017-01-04 23:52:28 -06:00
nonbreaking_prefix.lv	Hungarian and Latvian non-breaking prefix files	2013-03-18 17:17:35 -04:00
nonbreaking_prefix.ml	support for several Indic languages	2019-11-08 14:56:58 +00:00
nonbreaking_prefix.mni	support for several Indic languages	2019-11-08 14:56:58 +00:00
nonbreaking_prefix.mr	support for several Indic languages	2019-11-08 14:56:58 +00:00
nonbreaking_prefix.nl	lock m_vocab variable access in Encode() and Lookup(). Other functions are still not threadsafe	2012-06-26 13:33:34 -04:00
nonbreaking_prefix.or	2 letter codes	2019-11-08 15:36:22 +00:00
nonbreaking_prefix.pa	support for several Indic languages	2019-11-08 14:56:58 +00:00
nonbreaking_prefix.pl	chmod	2014-12-11 16:54:19 +00:00
nonbreaking_prefix.pt	lock m_vocab variable access in Encode() and Lookup(). Other functions are still not threadsafe	2012-06-26 13:33:34 -04:00
nonbreaking_prefix.ro	lock m_vocab variable access in Encode() and Lookup(). Other functions are still not threadsafe	2012-06-26 13:33:34 -04:00
nonbreaking_prefix.ru	Fixed bug in tokenizer.perl where comma separated lists of single	2013-08-16 14:39:50 -04:00
nonbreaking_prefix.sk	lock m_vocab variable access in Encode() and Lookup(). Other functions are still not threadsafe	2012-06-26 13:33:34 -04:00
nonbreaking_prefix.sl	dos2unix everything	2015-08-23 19:00:19 +04:00
nonbreaking_prefix.sv	Update nonbreaking_prefix.sv	2020-05-23 17:43:33 +02:00
nonbreaking_prefix.ta	support for several Indic languages	2019-11-08 14:56:58 +00:00
nonbreaking_prefix.te	support for several Indic languages	2019-11-08 14:56:58 +00:00
nonbreaking_prefix.yue	Create a Cantonese version, distinct from Mandarin.	2017-01-05 12:53:21 -06:00
nonbreaking_prefix.zh	Create a Cantonese version, distinct from Mandarin.	2017-01-05 12:53:21 -06:00
README.txt	move notice about czech prefixes to share/README	2014-08-06 15:03:37 +01:00

README.txt

The suffix of each nonbreaking_prefix file is a language code. The codes are listed here:

http://www.loc.gov/standards/iso639-2/php/code_list.php
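
The naming convention above (nonbreaking_prefix.<language code>) can be sketched as a small lookup. This is a minimal illustration, not the code Moses itself uses: the helper names `prefix_file_for` and `available_languages` are hypothetical, and the fallback to the English file for an unknown code is an assumption made for this example.

```python
from pathlib import Path


def available_languages(directory: Path) -> list[str]:
    """List the language codes that have a prefix file in `directory`."""
    # The code is everything after the first dot, e.g. "sv" or "yue".
    return sorted(
        path.name.split(".", 1)[1]
        for path in directory.glob("nonbreaking_prefix.*")
    )


def prefix_file_for(lang: str, directory: Path) -> Path:
    """Return the prefix file for `lang`.

    Assumption for this sketch: fall back to the English file
    when no file exists for the requested code.
    """
    candidate = directory / f"nonbreaking_prefix.{lang}"
    if candidate.is_file():
        return candidate
    return directory / "nonbreaking_prefix.en"
```

Calling `prefix_file_for("sv", Path("."))` from inside this directory would resolve to nonbreaking_prefix.sv, while a code with no file of its own would resolve to the English file under the assumed fallback.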

This code includes data from Daniel Naber's Language Tools (Czech abbreviations).
This code includes data from the Czech Wiktionary (also Czech abbreviations).