Hieu Hoang
a3e3289b08
In corpus mode, replace number with number symbol
2013-07-25 15:54:47 +01:00
Hieu Hoang
76a9730ca8
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2013-07-25 15:23:12 +01:00
Barry Haddow
7081f06413
Fixes to the shared build
2013-07-25 15:24:34 +01:00
Hieu Hoang
e2c2bc59f1
beautify
2013-07-25 15:23:05 +01:00
Hieu Hoang
78381d0213
@NUM@ --> @num@, in case a recaser is used
2013-07-25 15:16:15 +01:00
Phil Williams
f0b603e6b5
extract-ghkm: write glue grammars for all sentence offsets
extract-parallel now merges separate glue grammars, so remove
previous workaround.
2013-07-25 13:53:32 +01:00
Hieu Hoang
d0172ed5cd
create script to convert phrase-table with alignment in Moses' dead-end format to standard format
2013-07-25 12:56:20 +01:00
Hieu Hoang
018998247a
create script to convert phrase-table with alignment in Moses' dead-end format to standard format
2013-07-25 12:52:05 +01:00
Hieu Hoang
c0aba71c79
bug processing unknown word with digits
2013-07-25 08:41:59 +01:00
Barry Haddow
f79746b3c2
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2013-07-24 20:49:59 +01:00
Hieu Hoang
6fc21a32fc
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2013-07-24 19:01:57 +01:00
Hieu Hoang
c104dee3b2
merge glue grammars, rather than writing them all to the same file. Required by Phil Williams & others when doing syntax extraction
2013-07-24 19:01:46 +01:00
Achim Ruopp
1813f9784b
Additional factoring to allow more NE recognizers; bug fixes
2013-07-24 12:44:53 -04:00
Barry Haddow
46ee1ca42d
More lattice fixes squashed by merge
2013-07-24 16:09:32 +01:00
Barry Haddow
0ce50a4c70
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2013-07-24 15:58:08 +01:00
Phil Williams
1238041f98
Add option to do Penn Treebank style tokenization
tokenizer.perl and detokenizer.perl now have an option called -penn
which does Penn Treebank-like tokenization (English only). This is
useful if your pipeline involves processing the corpus with tools
trained on PTB-tokenized text.
Unlike PTB, the tokenizer splits on slashes (e.g. "Monday/Tuesday"
becomes "Monday", "@/@", "Tuesday"). If using parse-de-berkeley.perl,
the option -split-slash re-joins tokens that are separated by slashes
for parsing then splits them afterwards.
2013-07-24 13:41:21 +01:00
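The slash handling described in the commit above (1238041f98) can be sketched roughly as follows. This is a minimal Python illustration, not the Perl code in tokenizer.perl or parse-de-berkeley.perl: the function names `split_slashes` and `rejoin_slashes` are hypothetical, and the choice to leave tokens with leading, trailing, or doubled slashes untouched is an assumption made for the example.

```python
def split_slashes(tokens):
    """Split each token on '/' and emit '@/@' in place of the slash.

    Assumption for this sketch: tokens with an empty piece around a slash
    (e.g. '/usr', 'a//b') are passed through unchanged.
    """
    out = []
    for tok in tokens:
        parts = tok.split("/")
        if len(parts) == 1 or "" in parts:
            out.append(tok)
            continue
        for i, part in enumerate(parts):
            if i > 0:
                out.append("@/@")
            out.append(part)
    return out


def rejoin_slashes(tokens):
    """Undo split_slashes: glue 'X', '@/@', 'Y' back into 'X/Y'."""
    out = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "@/@" and out and i + 1 < len(tokens):
            # Attach the token after the marker to the previous token.
            out[-1] = out[-1] + "/" + tokens[i + 1]
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


if __name__ == "__main__":
    split = split_slashes(["on", "Monday/Tuesday"])
    print(split)
    print(rejoin_slashes(split))
```

Re-joining before parsing and splitting again afterwards, as the -split-slash option is described as doing, corresponds to calling `rejoin_slashes` on the parser input and `split_slashes` on its output.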
Kenneth Heafield
71ae8c9d19
LM/Factory.cpp -> FF/Factory.cpp oops
2013-07-24 12:13:11 +01:00
Ian Johnson
68779c66b9
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2013-07-24 11:52:21 +01:00
Ian Johnson
08f64dea28
Arrow pipeline submodules now use https protocol.
2013-07-24 11:52:14 +01:00
Barry Haddow
d5e40a5b08
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2013-07-24 11:38:23 +01:00
Phil Williams
b5584fdecf
extract-ghkm: workaround for extract-parallel issue
Don't write glue grammar or unknown word label files unless the sentence
offset is 0. This prevents multiple instances of extract-ghkm writing
to the same two files when extract-parallel is used.
TODO Better solutions might be:
1. modify extract-parallel so that it only configures one instance of
   extract-ghkm to write the glue / unknown-lhs files (like the current
   workaround, this assumes file chunks are representative of the whole)
2. add multithreading support directly to extract-ghkm
3. write distinct output files for each extract-ghkm instance and
   combine them on completion
2013-07-23 14:55:16 +01:00
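Option 3 in the TODO above (distinct output files per instance, combined on completion) could look something like the sketch below. It is only an illustration under stated assumptions: it treats each per-instance file as line-oriented text in which exact duplicate lines can be dropped, and the function name `merge_rule_files` is hypothetical. It does not reproduce what extract-parallel or extract-ghkm actually do.

```python
def merge_rule_files(paths, out_path):
    """Concatenate line-oriented files, keeping the first copy of each line.

    Assumption for this sketch: rules are one per line and an exact
    duplicate line carries no extra information.
    Returns the number of distinct lines written.
    """
    seen = set()
    with open(out_path, "w", encoding="utf-8") as out:
        for path in paths:
            with open(path, encoding="utf-8") as f:
                for line in f:
                    line = line.rstrip("\n")
                    if line and line not in seen:
                        seen.add(line)
                        out.write(line + "\n")
    return len(seen)
```

Keeping first-seen order makes the merged file independent of how many chunks the corpus was split into, as long as the chunk files are passed in a fixed order.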
Hieu Hoang
e6a3df7e97
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2013-07-23 13:12:30 +01:00
Hieu Hoang
206b165d14
randlm compile with refactored code. No regression tests yet
2013-07-23 12:56:35 +01:00
Hieu Hoang
9b9e8cc759
eclipse file with randlm
2013-07-23 12:41:02 +01:00
Nadir Durrani
30544ae17e
Sample Config File
2013-07-23 12:29:23 +01:00
Nadir Durrani
61e56ecdcd
Sample Config File
2013-07-23 12:18:57 +01:00
Barry Haddow
50de0e06d1
Generate correct ini file for lattices
2013-07-23 11:46:37 +01:00
Barry Haddow
8ed8bcafc2
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2013-07-23 11:21:47 +01:00
Barry Haddow
887d5dad62
Restore EMS lattice fixes, squashed by merge.
2013-07-23 10:38:11 +01:00
Phil Williams
91cc7c329e
parse-de-berkeley.perl: escape @ characters in input
2013-07-23 10:22:56 +01:00
Hieu Hoang
1e906bea73
add ControlRecombination feature function
2013-07-23 01:38:08 +01:00
Hieu Hoang
42c1c908a5
add ControlRecombination feature function
2013-07-23 01:32:25 +01:00
Barry Haddow
ecc6c7177c
Reinstate lattice fixes squashed by merge
2013-07-22 17:25:01 +01:00
Hieu Hoang
2590601708
add ControlRecombination feature function
2013-07-20 23:41:49 +01:00
Hieu Hoang
a098227abe
add ControlRecombination feature function
2013-07-20 23:10:50 +01:00
Hieu Hoang
96da822861
Don't deprecate lmodel-oov-feature
2013-07-20 17:20:12 +01:00
Hieu Hoang
b6f8e3c383
Don't mix old and new ini file format
2013-07-20 17:08:03 +01:00
Hieu Hoang
5b7a9af588
refactor RandLM. Compiles with eclipse but not with bjam
2013-07-20 00:19:04 +01:00
Hieu Hoang
d4e641de80
eclipse
2013-07-19 23:19:17 +01:00
Hieu Hoang
11666a8359
RandLM is currently broken
2013-07-19 22:39:20 +01:00
Achim Ruopp
3a668aaccf
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2013-07-19 15:56:04 -04:00
unknown
54eb50523b
Converted into modulino; added support for French numbers
2013-07-19 14:41:01 -04:00
Hieu Hoang
4a4b1a168d
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2013-07-19 18:52:54 +01:00
Kenneth Heafield
2f6e669fb9
Merge branch 'master' of github.com:moses-smt/mosesdecoder
2013-07-19 18:50:29 +01:00
Kenneth Heafield
e1a2b2f0c9
Reduce scope of lm dependency
2013-07-19 18:50:12 +01:00
Hieu Hoang
c77ec1b904
beautify
2013-07-19 13:56:02 +01:00
Hieu Hoang
a95127b972
add default weights for feature functions that aren't tuneable, eg. OOV feature
2013-07-19 13:24:05 +01:00
Hieu Hoang
8a28178339
add default weights for feature functions that aren't tuneable, eg. OOV feature
2013-07-19 11:35:50 +01:00
Hieu Hoang
24a9a7949e
eclipse
2013-07-19 09:37:33 +01:00
Kenneth Heafield
b5e6b9c959
Factory
2013-07-18 22:54:52 +01:00