phikoehn
79a2c98ff7
better ems support for different binarizers and reordering models
2013-08-25 20:30:37 +01:00
Jeremy Gwinnup
a5fb4d1550
Fixed bug in tokenizer.perl where comma separated lists of single characters aren't handled correctly
input> A,B,C,D,E,F
yielded> A, B,C , D,E , F
now yields> A, B, C, D, E, F
Updated Russian nonbreaking prefixes list with capital letters
2013-08-16 14:39:50 -04:00
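One plausible cause of the "yielded" output above is a regex that consumes the character after the comma. That character then cannot start the next match, so every second comma is skipped. This is an assumption: the commit does not say how the bug was fixed. The Python sketch below uses simplified, hypothetical patterns rather than tokenizer.perl's actual rules. It reproduces the `B,C` and `D,E` pattern and shows a lookahead-based fix that still leaves digit groups such as `5,300` intact.

```python
import re

# Simplified, hypothetical stand-ins for a comma-splitting rule. \D approximates
# "not a number character"; neither pattern is copied from tokenizer.perl.
CONSUMING = re.compile(r"(\D),(\D)")    # consumes the character after the comma
LOOKAHEAD = re.compile(r"(\D),(?=\D)")  # only peeks at the character after the comma


def split_commas_buggy(text):
    # The right-hand character is consumed by each match, so in "A,B,C" the "B"
    # is used up and the comma after it is never examined.
    return CONSUMING.sub(r"\1 , \2", text)


def split_commas_fixed(text):
    # The lookahead leaves the right-hand character in place for the next match.
    return LOOKAHEAD.sub(r"\1 , ", text)


print(split_commas_buggy("A,B,C,D,E,F"))   # A , B,C , D,E , F
print(split_commas_fixed("A,B,C,D,E,F"))   # A , B , C , D , E , F
print(split_commas_fixed("5,300 apples"))  # 5,300 apples
```

A lookahead is one common way to make adjacent matches share a character. Perl regexes support the same `(?=...)` syntax.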
phikoehn
b368085609
xml constraint
2013-08-15 11:46:45 +01:00
Hieu Hoang
249aefc0c3
Merge github.com:moses-smt/mosesdecoder into hieu_opt_input
2013-08-12 13:04:29 +01:00
Hieu Hoang
02c7af3fb8
Mira changes. Manually applied Eva's patch
2013-08-12 13:03:26 +01:00
Hieu Hoang
ac50d9f349
Merge /home/hieu/workspace/github/mosesdecoder into hieu_opt_input
2013-08-01 16:55:46 +01:00
Hieu Hoang
f234aa203f
number recognizer treats each word as atomic: it replaces all of the word or nothing at all. The recognizer is designed to be run after the text has been tokenized, not before
2013-08-01 16:55:11 +01:00
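The whole-token rule can be sketched in Python as follows. The number pattern and the `replace_numbers` helper are hypothetical and are not the recognizer's actual code. Only the `@num@` placeholder comes from this log. The input is assumed to be already tokenized and whitespace-separated, as the commit requires.

```python
import re

# Hypothetical number pattern: optional sign, digits with optional thousands
# separators, optional decimal part. Not the recognizer's real regex.
NUMBER = re.compile(r"[+-]?\d+(?:,\d{3})*(?:\.\d+)?")


def replace_numbers(tokenized_line, placeholder="@num@"):
    """Replace tokens that are entirely numbers; leave mixed tokens untouched."""
    out = []
    for token in tokenized_line.split():
        # fullmatch: the whole token must be a number, otherwise keep it as-is
        out.append(placeholder if NUMBER.fullmatch(token) else token)
    return " ".join(out)


print(replace_numbers("pay 1,500.25 euros for item A42"))  # pay @num@ euros for item A42
```

Matching the whole token avoids producing half-replaced words such as `A@num@`.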
Hieu Hoang
bb1f49e10c
Merge /home/hieu/workspace/github/mosesdecoder into hieu_opt_input
2013-07-31 14:29:18 +01:00
Rico Sennrich
d0e2c43011
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2013-07-30 17:18:32 +02:00
Rico Sennrich
29cde2a204
allow overriding table filtering in config (required for multimodelcounts)
2013-07-30 16:46:23 +02:00
Hieu Hoang
2792291849
change default for PhrasePenalty to match what it was before
2013-07-30 15:44:30 +01:00
Hieu Hoang
b05a443f36
correct arguments to substitute-filtered-tables-and-weights.perl
2013-07-30 11:14:17 +01:00
Hieu Hoang
66349474d4
correct arguments to substitute-filtered-tables-and-weights.perl
2013-07-30 11:12:49 +01:00
Hieu Hoang
52ff87a6ce
phrase penalty now has its own ff. No longer in the phrase table
2013-07-29 17:51:20 +01:00
Hieu Hoang
9cdcf713a6
phrase penalty now has its own ff. No longer in the phrase table
2013-07-29 12:55:44 +01:00
Hieu Hoang
2980dde163
add script to strip xml
2013-07-29 11:44:00 +01:00
Hieu Hoang
4bc7ce99ed
add script to strip xml
2013-07-29 11:27:13 +01:00
Hieu Hoang
ff4c4339bd
if using CreateOnDiskPt to binarize pt, use processLexicalTable to binarize reordering table. HACK
2013-07-28 12:00:35 +01:00
Hieu Hoang
abe90b5af7
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2013-07-27 04:19:16 +01:00
Hieu Hoang
9dab7950fa
move closing of filtered file before binarizing. Otherwise the file is not flushed, which causes an error in binarizing
2013-07-27 04:18:50 +01:00
Hieu Hoang
4ea975e108
move closing of filtered file before binarizing. Otherwise the file is not flushed, which causes an error in binarizing
2013-07-27 03:58:27 +01:00
Hieu Hoang
90651a9b22
move closing of filtered file before binarizing. Otherwise the file is not flushed, which causes an error in binarizing
2013-07-27 03:57:05 +01:00
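The ordering problem behind these commits is general. Buffered writes may not reach disk until the handle is flushed or closed, so a binarizer that opens the file earlier can see truncated content. Below is a minimal Python sketch of the safe ordering. `filter_table` and `binarize` are hypothetical stand-ins, not functions from the filtering script. The phrase-table lines are toy data in the ` ||| `-separated style.

```python
import os
import tempfile


def filter_table(lines, keep, path):
    """Write the lines that pass `keep` to `path`, closing the file before returning."""
    with open(path, "w", encoding="utf-8") as out:  # flushed and closed when the block exits
        for line in lines:
            if keep(line):
                out.write(line + "\n")


def binarize(path):
    """Stand-in for the binarizer: it only sees bytes that are already on disk."""
    with open(path, encoding="utf-8") as f:
        return f.read().splitlines()


tmp = os.path.join(tempfile.mkdtemp(), "filtered.txt")
filter_table(["a ||| x", "b ||| y", "c ||| z"], lambda l: not l.startswith("b"), tmp)
print(binarize(tmp))  # ['a ||| x', 'c ||| z']
```

The `with` block guarantees the close happens before `binarize` runs. This is the same ordering the commits restore by moving the close ahead of the binarizing step.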
Hieu Hoang
6be88d3f44
update filtering script to correctly binarize pb models with CreateOnDiskPt
2013-07-26 15:12:15 +01:00
Hieu Hoang
efd9789577
update filtering script to correctly binarize pb models with CreateOnDiskPt
2013-07-26 15:06:07 +01:00
Hieu Hoang
a3e3289b08
In corpus mode, replace number with number symbol
2013-07-25 15:54:47 +01:00
Hieu Hoang
e2c2bc59f1
beautify
2013-07-25 15:23:05 +01:00
Hieu Hoang
78381d0213
@NUM@ --> @num@. In case using recaser
2013-07-25 15:16:15 +01:00
Hieu Hoang
d0172ed5cd
create script to convert phrase-table with alignment in Moses' dead-end format to standard format
2013-07-25 12:56:20 +01:00
Hieu Hoang
018998247a
create script to convert phrase-table with alignment in Moses' dead-end format to standard format
2013-07-25 12:52:05 +01:00
Barry Haddow
f79746b3c2
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2013-07-24 20:49:59 +01:00
Hieu Hoang
6fc21a32fc
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2013-07-24 19:01:57 +01:00
Hieu Hoang
c104dee3b2
merge glue grammars, rather than writing them all to the same file. Required by Phil Williams & others when doing syntax extraction
2013-07-24 19:01:46 +01:00
Achim Ruopp
1813f9784b
Additional factoring to allow more NE recognizers; bug fixes
2013-07-24 12:44:53 -04:00
Barry Haddow
46ee1ca42d
More lattice fixes squashed by merge
2013-07-24 16:09:32 +01:00
Barry Haddow
0ce50a4c70
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2013-07-24 15:58:08 +01:00
Phil Williams
1238041f98
Add option to do Penn Treebank style tokenization
tokenizer.perl and detokenizer.perl now have an option called -penn
which does Penn Treebank-like tokenization (English only). This is
useful if your pipeline involves processing the corpus with tools
trained on PTB-tokenized text.
Unlike PTB, the tokenizer splits on slashes (e.g. "Monday/Tuesday"
becomes "Monday", "@/@", "Tuesday"). If using parse-de-berkeley.perl,
the option -split-slash re-joins tokens that are separated by slashes
for parsing then splits them afterwards.
2013-07-24 13:41:21 +01:00
Barry Haddow
d5e40a5b08
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2013-07-24 11:38:23 +01:00
Nadir Durrani
30544ae17e
Sample Config File
2013-07-23 12:29:23 +01:00
Nadir Durrani
61e56ecdcd
Sample Config File
2013-07-23 12:18:57 +01:00
Barry Haddow
50de0e06d1
Generate correct ini file for lattices
2013-07-23 11:46:37 +01:00
Barry Haddow
8ed8bcafc2
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2013-07-23 11:21:47 +01:00
Barry Haddow
887d5dad62
Restore EMS lattice fixes, squashed by merge.
2013-07-23 10:38:11 +01:00
Phil Williams
91cc7c329e
parse-de-berkeley.perl: escape @ characters in input
2013-07-23 10:22:56 +01:00
Barry Haddow
ecc6c7177c
Reinstate lattice fixes squashed by merge
2013-07-22 17:25:01 +01:00
unknown
54eb50523b
Converted into modulino; added support for French numbers
2013-07-19 14:41:01 -04:00
Hieu Hoang
17366b9d7e
add mada wrapper from Phi's collection
2013-07-13 22:42:28 +01:00
Nadir Durrani
418abf42fa
Merge branch 'nadir_osm'
2013-07-09 11:44:14 +01:00
Hieu Hoang
2203bb3284
beautify
2013-07-08 19:08:31 +01:00
Nadir Durrani
deae3ac7b9
OSM entries in train-model.perl, experiment.*
2013-07-07 13:05:09 +01:00
Nadir Durrani
d2bc6a2584
In EMS
2013-07-04 19:58:19 +01:00
Hieu Hoang
0e46cd377c
Merge branch 'master' into nadir_osm
2013-07-03 20:24:20 +01:00
Nadir Durrani
fbdb07a94c
EMS
2013-07-03 10:54:38 +01:00
Nadir Durrani
82d6105f05
OSM Training Script
2013-07-02 13:59:47 +01:00
Hieu Hoang
4e4cf1e313
script to replace numbers with placeholder. /Achim Ruopp
2013-07-01 23:00:59 +01:00
Wilker Aziz
f3cd72537c
Merge branch 'master' of https://github.com/moses-smt/mosesdecoder
2013-06-24 15:39:18 +01:00
Wilker Aziz
2c19238c24
Patching up the suffix array wrappers
2013-06-24 15:38:10 +01:00
Wilker Aziz
b49e6a162f
Wrapper to lmplz
2013-06-24 12:20:20 +01:00
Hieu Hoang
a85f819a53
superseded
2013-06-24 11:33:11 +01:00
phikoehn
f5b8c47a2e
Merge branch 'master' of ssh://github.com/moses-smt/mosesdecoder
2013-06-23 17:19:37 +01:00
phikoehn
164b06cd7e
debugging
2013-06-23 17:19:22 +01:00
Hieu Hoang
dc33fa3d3d
redo parsing of feature function parameters
2013-06-20 12:50:41 +01:00
Hieu Hoang
029110c245
change table-limit specification to new format
2013-06-14 10:09:06 +01:00
Hieu Hoang
abe6bb7c22
refactor parsing of feature function args
2013-06-10 18:11:55 +01:00
phikoehn
54f2ea07bd
handle sparse features in translation table
2013-06-09 20:00:19 +01:00
phikoehn
ce372477c9
conversion script from Moses V1.0 moses.ini files to current format - may need some further tweaking
2013-06-09 14:28:56 +01:00
phikoehn
2e8fbe77a2
corrected example files
2013-06-08 14:45:55 +01:00
phikoehn
730da7edec
sparse feature specification bug fix
2013-06-08 13:39:15 +01:00
Hieu Hoang
a974bbafac
Merge pull request #37 from neubig/ems-interpolate-scientific
Fixed crash in interpolation for small lambdas
2013-06-07 16:24:32 -07:00
Rico Sennrich
8581fb9518
fix (minor) unicode warning, and update permissions
2013-06-03 13:48:31 +02:00
Graham Neubig
33d5aac6af
Fixed crash in interpolation for small lambdas
The EMS crashed when interpolating language models if the ideal lambdas included numbers so small that they required scientific notation (e.g. 1.332e-07). This patch adds "e" and "-" to the characters accepted in numbers to fix this problem
2013-06-01 12:37:24 +09:00
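The shape of this fix, widening the set of characters a weight may contain, looks like this in Python. The two regexes, the comma-separated input format, and `parse_lambdas` are illustrative assumptions. The EMS code itself is not shown in this log.

```python
import re

OLD = re.compile(r"^[0-9.]+$")    # hypothetical pre-fix check: digits and dots only
NEW = re.compile(r"^[0-9.e-]+$")  # same check with "e" and "-" added, as the commit describes


def parse_lambdas(text):
    """Parse comma-separated interpolation weights, accepting scientific notation."""
    weights = []
    for field in text.split(","):
        field = field.strip()
        if not NEW.match(field):
            raise ValueError(f"not a number: {field!r}")
        # float() still rejects malformed strings such as "1-e" that pass the character check
        weights.append(float(field))
    return weights


print(OLD.match("1.332e-07") is None)          # True: the old check rejects the tiny lambda
print(parse_lambdas("0.7, 0.3, 1.332e-07"))    # [0.7, 0.3, 1.332e-07]
```

A character-class check alone is permissive. Letting a real number parser make the final decision avoids accepting strings that only look numeric.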
phikoehn
68501f5a36
bug fix with weight substitution
2013-05-31 12:27:35 +01:00
Hieu Hoang
f83622b0b7
figure out which feature function to apply at which decode step. Book-keeping
2013-05-30 17:16:10 +01:00
Hieu Hoang
8f7c12ef40
beautify
2013-05-29 18:19:06 +01:00
Hieu Hoang
6249432407
beautify
2013-05-29 18:16:15 +01:00
Hieu Hoang
beaf295741
beautify
2013-05-27 17:47:54 +01:00
Hieu Hoang
6996973a56
beautify
2013-05-27 17:42:27 +01:00
phikoehn
8944ea541a
fast align parameter
2013-05-25 23:20:27 +01:00
phikoehn
542cd72c63
moved config creation back into train-model.perl
2013-05-19 03:28:02 +01:00
Hieu Hoang
0596ba4245
carry [weight-file] from tuned ini
2013-05-17 18:23:55 +01:00
Hieu Hoang
11632e298e
add substitute-filtered-tables-and-weights.perl for applying filter for evaluation step
2013-05-17 16:13:24 +01:00
Hieu Hoang
42c292765a
add substitute-filtered-tables-and-weights.perl for applying filter for evaluation step
2013-05-17 13:28:21 +01:00
phikoehn
4cdffc8a89
fixes for sparse feature handling
2013-05-17 08:37:29 +01:00
phikoehn
13991fc88f
added specification to example config files for fast align
2013-05-17 06:42:54 +01:00
Hieu Hoang
35d37a91a1
Don't print 'sparse' for sparse feature functions. All feature functions can contain dense and sparse features
2013-05-16 23:36:59 +01:00
Barry Haddow
585786d26b
can specify location of create-ini
2013-05-16 19:34:56 +01:00
Hieu Hoang
f96a82d26c
add normalize-punctuation.perl, from WMT
2013-05-16 17:03:37 +01:00
Hieu Hoang
8dd84d7a40
change integration of sparse features with EMS to account for new weights format
2013-05-16 15:38:05 +01:00
Hieu Hoang
7522f3963c
change PhraseDictionaryTreeAdaptor --> PhraseDictionaryBinary
2013-05-15 09:35:28 +01:00
Barry Haddow
a4ce50f2fb
Fix for cygwin
2013-05-14 08:54:29 +01:00
phikoehn
41da5b2760
Merge branch 'master' of git://github.com/moses-smt/mosesdecoder
2013-05-12 08:16:22 +01:00
Hieu Hoang
40bc98df56
filter Memory --> OnDisk and Binary
2013-05-11 21:57:03 +01:00
Hieu Hoang
51fc6fdcb6
filter Memory --> OnDisk and Binary
2013-05-11 21:46:45 +01:00
Hieu Hoang
a8f4e2c8fe
changes for cruise control
2013-05-10 15:43:49 +01:00
Hieu Hoang
e2f2aff94a
merged. Mostly by discarding new changes
2013-05-03 14:36:39 +01:00
Barry Haddow
8a965cd62e
Fixes to binarize-all
2013-05-03 10:15:37 +01:00
Barry Haddow
8993339df4
Make sure tuning uses filtered config when available.
2013-05-02 18:50:21 +01:00
Barry Haddow
cf47ad132c
Ability to specify number of conf net weights
2013-05-02 18:50:03 +01:00
Hieu Hoang
929b153216
merge
2013-05-02 17:59:36 +01:00
Barry Haddow
48fe0610ef
Merge branch 'master' of github.com:moses-smt/mosesdecoder
Conflicts:
scripts/training/filter-model-given-input.pl
2013-05-02 17:02:51 +01:00
Barry Haddow
5eebb9538e
Enable skipping of filtering in EMS
Use 'binarize-all = path-to-binarize-model.perl'
2013-05-02 15:15:52 +01:00