Commit Graph

1553 Commits

Author SHA1 Message Date
phikoehn
79a2c98ff7 better ems support for different binarizers and reordering models 2013-08-25 20:30:37 +01:00
Jeremy Gwinnup
a5fb4d1550 Fixed bug in tokenizer.perl where comma-separated lists of single
characters aren't handled correctly

input> A,B,C,D,E,F

yielded> A, B,C , D,E , F

now yields> A, B, C, D, E, F

Updated Russian nonbreaking prefixes list with capital letters
2013-08-16 14:39:50 -04:00
phikoehn
b368085609 xml constraint 2013-08-15 11:46:45 +01:00
Hieu Hoang
249aefc0c3 Merge github.com:moses-smt/mosesdecoder into hieu_opt_input 2013-08-12 13:04:29 +01:00
Hieu Hoang
02c7af3fb8 Mira changes. Manually applied Eva's patch 2013-08-12 13:03:26 +01:00
Hieu Hoang
ac50d9f349 Merge /home/hieu/workspace/github/mosesdecoder into hieu_opt_input 2013-08-01 16:55:46 +01:00
Hieu Hoang
f234aa203f number recognizer treats each word as atomic: it replaces all of the word or nothing at all. Recognizer is designed to be run after the text has been tokenized, not before 2013-08-01 16:55:11 +01:00
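A minimal Python sketch of the whole-word behaviour described in this commit, for illustration only. The actual recognizer is a Perl script whose number grammar is not shown in this log, so `NUMBER_RE` and `replace_numbers` are hypothetical names. The `@num@` placeholder comes from commit 78381d0213 further down. Matching each whitespace-separated token in full means a token is either replaced entirely or left alone. That is also why the input should already be tokenized: an untokenized `1,200.` keeps its trailing period and is not recognized.

```python
import re

# Hypothetical number grammar: optional sign, digits, then optional ','/'.' groups.
# The real recognizer's patterns may differ.
NUMBER_RE = re.compile(r"[+-]?\d+(?:[.,]\d+)*")
PLACEHOLDER = "@num@"


def replace_numbers(tokenized_line: str) -> str:
    """Replace tokens that are entirely numbers; leave all other tokens untouched."""
    out = []
    for token in tokenized_line.split():
        # fullmatch treats the token as atomic: "12abc" is never partially rewritten.
        out.append(PLACEHOLDER if NUMBER_RE.fullmatch(token) else token)
    return " ".join(out)


print(replace_numbers("costs 1,200 . not 12abc"))  # costs @num@ . not 12abc
print(replace_numbers("costs 1,200."))             # costs 1,200.
```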
Hieu Hoang
bb1f49e10c Merge /home/hieu/workspace/github/mosesdecoder into hieu_opt_input 2013-07-31 14:29:18 +01:00
Rico Sennrich
d0e2c43011 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2013-07-30 17:18:32 +02:00
Rico Sennrich
29cde2a204 allow overriding table filtering in config (required for multimodelcounts) 2013-07-30 16:46:23 +02:00
Hieu Hoang
2792291849 change default for PhrasePenalty to match what it was before 2013-07-30 15:44:30 +01:00
Hieu Hoang
b05a443f36 correct arguments to substitute-filtered-tables-and-weights.perl 2013-07-30 11:14:17 +01:00
Hieu Hoang
66349474d4 correct arguments to substitute-filtered-tables-and-weights.perl 2013-07-30 11:12:49 +01:00
Hieu Hoang
52ff87a6ce phrase penalty now has its own ff. No longer in the phrase table 2013-07-29 17:51:20 +01:00
Hieu Hoang
9cdcf713a6 phrase penalty now has its own ff. No longer in the phrase table 2013-07-29 12:55:44 +01:00
Hieu Hoang
2980dde163 add script to strip xml 2013-07-29 11:44:00 +01:00
Hieu Hoang
4bc7ce99ed add script to strip xml 2013-07-29 11:27:13 +01:00
Hieu Hoang
ff4c4339bd if using CreateOnDiskPt to binarize pt, use processLexicalTable to binarize reordering table. HACK 2013-07-28 12:00:35 +01:00
Hieu Hoang
abe90b5af7 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2013-07-27 04:19:16 +01:00
Hieu Hoang
9dab7950fa move closing of filtered file before binarizing. Otherwise file not flushed, causes error in binarizing 2013-07-27 04:18:50 +01:00
Hieu Hoang
4ea975e108 move closing of filtered file before binarizing. Otherwise file not flushed, causes error in binarizing 2013-07-27 03:58:27 +01:00
Hieu Hoang
90651a9b22 move closing of filtered file before binarizing. Otherwise file not flushed, causes error in binarizing 2013-07-27 03:57:05 +01:00
Hieu Hoang
6be88d3f44 update filtering script to correctly binarize pb models with CreateOnDiskPt 2013-07-26 15:12:15 +01:00
Hieu Hoang
efd9789577 update filtering script to correctly binarize pb models with CreateOnDiskPt 2013-07-26 15:06:07 +01:00
Hieu Hoang
a3e3289b08 In corpus mode, replace number with number symbol 2013-07-25 15:54:47 +01:00
Hieu Hoang
e2c2bc59f1 beautify 2013-07-25 15:23:05 +01:00
Hieu Hoang
78381d0213 @NUM@ --> @num@, in case a recaser is used 2013-07-25 15:16:15 +01:00
Hieu Hoang
d0172ed5cd create script to convert phrase-table with alignment in Moses' dead-end format to standard format 2013-07-25 12:56:20 +01:00
Hieu Hoang
018998247a create script to convert phrase-table with alignment in Moses' dead-end format to standard format 2013-07-25 12:52:05 +01:00
Barry Haddow
f79746b3c2 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2013-07-24 20:49:59 +01:00
Hieu Hoang
6fc21a32fc Merge branch 'master' of github.com:moses-smt/mosesdecoder 2013-07-24 19:01:57 +01:00
Hieu Hoang
c104dee3b2 merge glue grammars, rather than writing them all to the same file. Required by Phil Williams & others when doing syntax extraction 2013-07-24 19:01:46 +01:00
Achim Ruopp
1813f9784b Additional factoring to allow more NE recognizers; bug fixes 2013-07-24 12:44:53 -04:00
Barry Haddow
46ee1ca42d More lattice fixes squashed by merge 2013-07-24 16:09:32 +01:00
Barry Haddow
0ce50a4c70 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2013-07-24 15:58:08 +01:00
Phil Williams
1238041f98 Add option to do Penn Treebank style tokenization
tokenizer.perl and detokenizer.perl now have an option called -penn
which does Penn Treebank-like tokenization (English only).  This is
useful if your pipeline involves processing the corpus with tools
trained on PTB-tokenized text.

Unlike PTB, the tokenizer splits on slashes (e.g. "Monday/Tuesday"
becomes "Monday", "@/@", "Tuesday").  If using parse-de-berkeley.perl,
the option -split-slash re-joins tokens that are separated by slashes
for parsing then splits them afterwards.
2013-07-24 13:41:21 +01:00
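The slash handling described in this commit can be sketched in Python as a pair of token-list functions. This illustrates the behaviour stated in the commit message and is not the Perl code in tokenizer.perl or parse-de-berkeley.perl. `split_slashes` and `rejoin_slashes` are hypothetical names.

```python
SLASH_TOKEN = "@/@"


def split_slashes(tokens):
    """Split tokens on '/', emitting the marker '@/@' in place of each slash."""
    out = []
    for tok in tokens:
        if "/" in tok and tok != "/":
            for i, part in enumerate(tok.split("/")):
                if i > 0:
                    out.append(SLASH_TOKEN)
                if part:  # skip empty pieces, e.g. from "a//b"
                    out.append(part)
        else:
            out.append(tok)
    return out


def rejoin_slashes(tokens):
    """Glue each '@/@' marker and its neighbours back into one token (e.g. before parsing)."""
    out = []
    glue_next = False
    for tok in tokens:
        if tok == SLASH_TOKEN:
            if out:
                out[-1] += "/"
            else:
                out.append("/")
            glue_next = True
        elif glue_next:
            out[-1] += tok
            glue_next = False
        else:
            out.append(tok)
    return out


print(split_slashes(["on", "Monday/Tuesday"]))  # ['on', 'Monday', '@/@', 'Tuesday']
print(rejoin_slashes(split_slashes(["on", "Monday/Tuesday"])))  # ['on', 'Monday/Tuesday']
```

Round-tripping through both functions restores `Monday/Tuesday`. A token that starts with a slash and follows another token would be glued onto its left neighbour, an edge case the real scripts may handle differently.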
Barry Haddow
d5e40a5b08 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2013-07-24 11:38:23 +01:00
Nadir Durrani
30544ae17e Sample Config File 2013-07-23 12:29:23 +01:00
Nadir Durrani
61e56ecdcd Sample Config File 2013-07-23 12:18:57 +01:00
Barry Haddow
50de0e06d1 Generate correct ini file for lattices 2013-07-23 11:46:37 +01:00
Barry Haddow
8ed8bcafc2 Merge branch 'master' of github.com:moses-smt/mosesdecoder 2013-07-23 11:21:47 +01:00
Barry Haddow
887d5dad62 Restore EMS lattice fixes, squashed by merge. 2013-07-23 10:38:11 +01:00
Phil Williams
91cc7c329e parse-de-berkeley.perl: escape @ characters in input 2013-07-23 10:22:56 +01:00
Barry Haddow
ecc6c7177c Reinstate lattice fixes squashed by merge 2013-07-22 17:25:01 +01:00
unknown
54eb50523b Converted into modulino; added support for French numbers 2013-07-19 14:41:01 -04:00
Hieu Hoang
17366b9d7e add mada wrapper from Phi's collection 2013-07-13 22:42:28 +01:00
Nadir Durrani
418abf42fa Merge branch 'nadir_osm' 2013-07-09 11:44:14 +01:00
Hieu Hoang
2203bb3284 beautify 2013-07-08 19:08:31 +01:00
Nadir Durrani
deae3ac7b9 OSM entries in train-model.perl, experiment.* 2013-07-07 13:05:09 +01:00
Nadir Durrani
d2bc6a2584 In EMS 2013-07-04 19:58:19 +01:00
Hieu Hoang
0e46cd377c Merge branch 'master' into nadir_osm 2013-07-03 20:24:20 +01:00
Nadir Durrani
fbdb07a94c EMS 2013-07-03 10:54:38 +01:00
Nadir Durrani
82d6105f05 OSM Training Script 2013-07-02 13:59:47 +01:00
Hieu Hoang
4e4cf1e313 script to replace numbers with placeholder. /Achim Ruopp 2013-07-01 23:00:59 +01:00
Wilker Aziz
f3cd72537c Merge branch 'master' of https://github.com/moses-smt/mosesdecoder 2013-06-24 15:39:18 +01:00
Wilker Aziz
2c19238c24 Patching up the suffix array wrappers 2013-06-24 15:38:10 +01:00
Wilker Aziz
b49e6a162f Wrapper to lmplz 2013-06-24 12:20:20 +01:00
Hieu Hoang
a85f819a53 superseded 2013-06-24 11:33:11 +01:00
phikoehn
f5b8c47a2e Merge branch 'master' of ssh://github.com/moses-smt/mosesdecoder 2013-06-23 17:19:37 +01:00
phikoehn
164b06cd7e debugging 2013-06-23 17:19:22 +01:00
Hieu Hoang
dc33fa3d3d redo parsing of feature function parameters 2013-06-20 12:50:41 +01:00
Hieu Hoang
029110c245 change table-limit specification to new format 2013-06-14 10:09:06 +01:00
Hieu Hoang
abe6bb7c22 refactor parsing of feature function args 2013-06-10 18:11:55 +01:00
phikoehn
54f2ea07bd handle sparse features in translation table 2013-06-09 20:00:19 +01:00
phikoehn
ce372477c9 conversion script from Moses V1.0 moses.ini files to current format - may need some further tweaking 2013-06-09 14:28:56 +01:00
phikoehn
2e8fbe77a2 corrected example files 2013-06-08 14:45:55 +01:00
phikoehn
730da7edec sparse feature specification bug fix 2013-06-08 13:39:15 +01:00
Hieu Hoang
a974bbafac Merge pull request #37 from neubig/ems-interpolate-scientific
Fixed crash in interpolation for small lambdas
2013-06-07 16:24:32 -07:00
Rico Sennrich
8581fb9518 fix (minor) unicode warning, and update permissions 2013-06-03 13:48:31 +02:00
Graham Neubig
33d5aac6af Fixed crash in interpolation for small lambdas
The EMS crashed when interpolating language models if the ideal lambdas included numbers so small that they required scientific notation (e.g. 1.332e-07). This patch adds "e" and "-" to the characters accepted in numbers to fix this problem.
2013-06-01 12:37:24 +09:00
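A Python sketch of the failure and the fix described in this commit. The EMS code is Perl, and its exact pattern and input format are not shown in the log, so the comma-separated weight list, `OLD_NUMBER`, `NEW_NUMBER` and `parse_lambdas` are assumptions made for illustration. The point is only that a character class of digits and `.` rejects `1.332e-07`, while adding `e` and `-` accepts it.

```python
import re

# Before the fix (assumed): only digits and '.' count as part of a number.
OLD_NUMBER = re.compile(r"[\d.]+")
# After the fix: 'e' and '-' are also accepted, so scientific notation parses.
NEW_NUMBER = re.compile(r"[\d.e-]+")


def parse_lambdas(text, pattern):
    """Parse comma-separated interpolation weights, rejecting tokens the pattern refuses."""
    values = []
    for tok in text.split(","):
        tok = tok.strip()
        if not pattern.fullmatch(tok):
            raise ValueError(f"unexpected lambda value: {tok!r}")
        values.append(float(tok))  # float() still rejects malformed strings like "e-"
    return values


print(parse_lambdas("0.999999,1.332e-07", NEW_NUMBER))  # [0.999999, 1.332e-07]
```

The widened character class is permissive (it also matches strings such as `e-`), so the sketch relies on `float()` to reject malformed values.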
phikoehn
68501f5a36 bug fix with weight substitution 2013-05-31 12:27:35 +01:00
Hieu Hoang
f83622b0b7 figure out which feature function to apply at which decode step. Book-keeping 2013-05-30 17:16:10 +01:00
Hieu Hoang
8f7c12ef40 beautify 2013-05-29 18:19:06 +01:00
Hieu Hoang
6249432407 beautify 2013-05-29 18:16:15 +01:00
Hieu Hoang
beaf295741 beautify 2013-05-27 17:47:54 +01:00
Hieu Hoang
6996973a56 beautify 2013-05-27 17:42:27 +01:00
phikoehn
8944ea541a fast align parameter 2013-05-25 23:20:27 +01:00
phikoehn
542cd72c63 moved config creation back into train-model.perl 2013-05-19 03:28:02 +01:00
Hieu Hoang
0596ba4245 carry [weight-file] from tuned ini 2013-05-17 18:23:55 +01:00
Hieu Hoang
11632e298e add substitute-filtered-tables-and-weights.perl for applying filter for evaluation step 2013-05-17 16:13:24 +01:00
Hieu Hoang
42c292765a add substitute-filtered-tables-and-weights.perl for applying filter for evaluation step 2013-05-17 13:28:21 +01:00
phikoehn
4cdffc8a89 fixes for sparse feature handling 2013-05-17 08:37:29 +01:00
phikoehn
13991fc88f added specification to example config files for fast align 2013-05-17 06:42:54 +01:00
Hieu Hoang
35d37a91a1 Don't print 'sparse' for sparse feature functions. All feature functions can contain dense and sparse features 2013-05-16 23:36:59 +01:00
Barry Haddow
585786d26b can specify location of create-ini 2013-05-16 19:34:56 +01:00
Hieu Hoang
f96a82d26c add normalize-punctuation.perl, from WMT 2013-05-16 17:03:37 +01:00
Hieu Hoang
8dd84d7a40 change integration of sparse features with EMS to account for new weights format 2013-05-16 15:38:05 +01:00
Hieu Hoang
7522f3963c change PhraseDictionaryTreeAdaptor --> PhraseDictionaryBinary 2013-05-15 09:35:28 +01:00
Barry Haddow
a4ce50f2fb Fix for cygwin 2013-05-14 08:54:29 +01:00
phikoehn
41da5b2760 Merge branch 'master' of git://github.com/moses-smt/mosesdecoder 2013-05-12 08:16:22 +01:00
Hieu Hoang
40bc98df56 filter Memory --> OnDisk and Binary 2013-05-11 21:57:03 +01:00
Hieu Hoang
51fc6fdcb6 filter Memory --> OnDisk and Binary 2013-05-11 21:46:45 +01:00
Hieu Hoang
a8f4e2c8fe changes for cruise control 2013-05-10 15:43:49 +01:00
Hieu Hoang
e2f2aff94a merged. Mostly by discarding new changes 2013-05-03 14:36:39 +01:00
Barry Haddow
8a965cd62e Fixes to binarize-all 2013-05-03 10:15:37 +01:00
Barry Haddow
8993339df4 Make sure tuning uses filtered config when available. 2013-05-02 18:50:21 +01:00
Barry Haddow
cf47ad132c Ability to specify number of conf net weights 2013-05-02 18:50:03 +01:00
Hieu Hoang
929b153216 merge 2013-05-02 17:59:36 +01:00
Barry Haddow
48fe0610ef Merge branch 'master' of github.com:moses-smt/mosesdecoder
Conflicts:
	scripts/training/filter-model-given-input.pl
2013-05-02 17:02:51 +01:00
Barry Haddow
5eebb9538e Enable skipping of filtering in EMS
Use 'binarize-all = path-to-binarize-model.perl'
2013-05-02 15:15:52 +01:00