mosesdecoder

mirror of https://github.com/moses-smt/mosesdecoder.git synced 2024-12-27 05:55:02 +03:00

Author	SHA1	Message	Date
Rico Sennrich	4ca730a67c	improve bilingualLM alignment heuristics consistency	2014-11-26 10:32:41 +00:00
Rico Sennrich	ee759bfede	move bilingual-lm training scripts	2014-11-26 10:32:37 +00:00
Tomáš Musil	4cb81e3093	lmtype now preferred as symbolic name	2014-11-24 12:20:36 +01:00
Hieu Hoang	c0be182bfa	makemteval and small change to tokenizer. /Tom Hoar and Tomas Fulajtar	2014-11-21 13:55:13 +00:00
XapaJIaMnu	52c520c042	Resolve merge conflicts	2014-11-20 15:50:32 +00:00
Hieu Hoang	e27f6b0120	Merge branch 'master' of github.com:moses-smt/mosesdecoder	2014-11-15 14:32:49 +00:00
Hieu Hoang	67ad197d5a	take out PYTHONIOENCODING=utf-8. Rely on Rico's python changes	2014-11-15 14:32:31 +00:00
XapaJIaMnu	a343837095	Add option to choose activation function during nplm training	2014-11-15 11:54:47 +00:00
Rico Sennrich	b0b5eef0c6	fix metric interpolation with mert	2014-11-14 14:35:32 +00:00
Hieu Hoang	acd3ac964a	set PYTHONIOENCODING=utf-8 before running merge_alignment.py	2014-11-14 14:34:31 +00:00
Hieu Hoang	1c27e05a06	softlink for moses_chart	2014-11-14 13:56:56 +00:00
XapaJIaMnu	d5567b6cfb	Training: Do the preparation step ourselves. No validation support yet. No decoder support yet.	2014-11-13 16:14:17 +00:00
Rico Sennrich	8fd3be9e4e	add EOS token </s> to each sentence	2014-11-13 16:14:16 +00:00
Rico Sennrich	f26fc251d5	sort vocab by frequency	2014-11-13 16:14:16 +00:00
XapaJIaMnu	bb70f60f67	grrr	2014-11-13 16:14:16 +00:00
XapaJIaMnu	e330ab35d5	Short option must be only one letter	2014-11-13 16:14:16 +00:00
XapaJIaMnu	a74105ea7d	Fix a wrong condition	2014-11-13 16:14:16 +00:00
XapaJIaMnu	e54c171850	Make it optional to prepare the validation set	2014-11-13 16:14:16 +00:00
XapaJIaMnu	a300824bd1	Add optional validation during training	2014-11-13 16:14:16 +00:00
XapaJIaMnu	0451142ece	Add null token normalization for models to be used with the chart decoder.	2014-11-13 16:13:38 +00:00
XapaJIaMnu	aae894fe6b	Add null token in vocabulary during construction	2014-11-13 16:13:38 +00:00
XapaJIaMnu	b4f51c05d1	Add option to reduce the ngrams from already prepared .ngrams file to train a model with smaller number of ngrams	2014-11-13 16:13:38 +00:00
XapaJIaMnu	fbac0ae418	Make sure we always have unk in the vocabulary, otherwise we get off-by-one indexes during decoding	2014-11-13 15:51:48 +00:00
XapaJIaMnu	961578286f	Forgot to close a file...	2014-11-13 15:51:48 +00:00
XapaJIaMnu	1bac666e5f	Fix small oversights	2014-11-13 15:51:48 +00:00
XapaJIaMnu	617ef015df	Extend train_nplm with various options	2014-11-13 15:51:48 +00:00
Nikolay Bogoychev	2b2766cce8	For GPU training one thread is optimal	2014-11-13 15:51:48 +00:00
Abmayne	4af68a0d1a	Barry's training scripts with some minor changes by me	2014-11-13 15:51:48 +00:00
Phil Williams	59a1ce7380	substitute-filtered-tables.perl: check for RuleTable feature	2014-11-06 11:14:51 +00:00
Phil Williams	5240c430ce	Merge s2t branch. This adds a new string-to-tree decoder, which can be enabled with the -s2t option. It's intended to be faster and simpler than the generic chart decoder, and is designed to support lattice input (still WIP). For an en-de system trained on WMT14 data, it's approximately 40% faster in practice. For background information, see the decoding section of the EMNLP tutorial on syntax-based MT: http://www.emnlp2014.org/tutorials/5_notes.pdf Some features are not implemented yet, including support for internal tree structure and soft source-syntactic constraints.	2014-11-04 13:13:56 +00:00
mjdenkowski	40e8f2eca0	Hypergraph output	2014-11-03 09:16:12 -05:00
Hieu Hoang	7ca5e4fbc8	blame stats!	2014-10-31 01:07:33 +00:00
Hieu Hoang	834a89d96b	utf8 encoding /Tomas Fulajtar	2014-10-24 07:33:48 -07:00
Rico Sennrich	df74aa3e89	use short names for sparse features to save disk space and I/O when tuning	2014-10-17 10:36:51 +01:00
Hieu Hoang	44ce4b361a	reduce lmplz memory consumption in recaser	2014-10-14 17:52:47 +01:00
Hieu Hoang	fe266260fb	Merge branch 'master' of github.com:moses-smt/mosesdecoder	2014-10-14 16:01:26 +01:00
Hieu Hoang	6c9c3e1741	portable call to bash /Paul Guyot	2014-10-14 16:01:15 +01:00
Philipp Koehn	2638ff0480	added thot to EMS	2014-10-14 10:13:16 -04:00
Phil Williams	07dbd191ed	analysis.perl: update regexp for current trace format	2014-10-13 10:55:07 +01:00
mjdenkowski	a1f561ac31	Only update dynamic models	2014-10-10 15:09:53 -04:00
Philipp Koehn	34cc9461fb	More Penn Tree Bank compliance (code by Maria Nadejde and Philip Williams)	2014-10-10 16:51:32 +01:00
Philipp Koehn	1741bba750	Penn Tree Bank compliant versions of preprocessing	2014-10-10 16:49:06 +01:00
Rico Sennrich	f63807f957	more robust regex	2014-09-30 15:43:38 +01:00
Rico Sennrich	84ad576750	explicitly set BLEU as default scorer (for return-best-dev) (evaluator doesn't accept --scconfig without --sctype)	2014-09-24 14:47:58 +01:00
Hieu Hoang	610090c2ed	don't run truecase trainer unless it's asked for	2014-09-23 21:50:53 +01:00
Rico Sennrich	59cd4be2c9	don't use optimizer-specific options in extractor/evaluator	2014-09-22 10:49:20 +01:00
Rico Sennrich	d39cbca0b9	(optionally) use n-best file for evaluator/return-best-dev; this adds support for metrics that rely on alignment / trees	2014-09-22 10:49:20 +01:00
Rico Sennrich	3d00e5dc8c	basic support for more metrics with kbmira; metrics need getReferenceLength (for background smoothing) to work with kbmira	2014-09-22 10:49:20 +01:00
Philipp Koehn	ab90efe4af	allow specification of default weights	2014-09-22 05:28:57 +01:00
Philipp Koehn	e9db2fe4aa	Merge branch 'master' of git://github.com/moses-smt/mosesdecoder	2014-09-21 06:04:22 +01:00

1818 Commits