Commit Graph

2110 Commits

Author SHA1 Message Date
heafield
cb848f41b3 Fix corner case in trie builder context merging
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3890 1f5c12ca-751b-0410-a591-d2e778427230
2011-02-21 17:15:24 +00:00
hieuhoang1972
948d916ca0 xcode
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3889 1f5c12ca-751b-0410-a591-d2e778427230
2011-02-20 07:28:01 +00:00
bhaddow
c86e6b38b3 add new nonbreaking prefixes
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3884 1f5c12ca-751b-0410-a591-d2e778427230
2011-02-17 21:51:17 +00:00
bhaddow
6b8415bffb Write alignment info through OutputCollector so it gets ordered
correctly when run with multiple threads.


git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3882 1f5c12ca-751b-0410-a591-d2e778427230
2011-02-16 16:50:55 +00:00
phkoehn
df901e7ce6 added files from Tom Hoar
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3881 1f5c12ca-751b-0410-a591-d2e778427230
2011-02-16 10:44:26 +00:00
bojar
76174ccd4b mark web/bin/detokenizer.perl as outdated
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3880 1f5c12ca-751b-0410-a591-d2e778427230
2011-02-14 13:35:04 +00:00
bojar
26ccace946 Czech detokenization
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3879 1f5c12ca-751b-0410-a591-d2e778427230
2011-02-14 13:32:41 +00:00
maurocettolo
4c6dfbddc3 minor changes to make Moses compliant with IRSTLM toolkit (release 5.60.01)
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3878 1f5c12ca-751b-0410-a591-d2e778427230
2011-02-11 11:32:35 +00:00
heafield
fb02a67afb Fix segfaults (or at least one of them)
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3877 1f5c12ca-751b-0410-a591-d2e778427230
2011-02-11 01:51:30 +00:00
ales-t
e922c159b6 Alignment points are also created for unknown source words.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3876 1f5c12ca-751b-0410-a591-d2e778427230
2011-02-08 18:04:09 +00:00
ales-t
83e2406f42 Word alignment output also works with MBR decoding.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3875 1f5c12ca-751b-0410-a591-d2e778427230
2011-02-08 17:15:50 +00:00
bhaddow
47df5fd51c Add triples with default values if insufficient number are supplied. Note
that min and max are no longer used, and should be removed at some point.


git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3874 1f5c12ca-751b-0410-a591-d2e778427230
2011-02-08 16:26:17 +00:00
pjwilliams
d4359f9875 If Boost is available, use per-sentence object pools to allocate ProcessedRule
and WordConsumed objects (which are used to store rule table lookup state).
Large numbers of these objects are used during decoding and this can
significantly improve performance, especially for multithreaded decoding,
though at the cost of increased total memory use.

The ./configure option --disable-boost-pool can be used to disable this
feature if memory is tight.  This currently only affects moses_chart with
in-memory rule tables.



git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3873 1f5c12ca-751b-0410-a591-d2e778427230
2011-02-07 15:43:19 +00:00
bhaddow
6221d2a558 Patch to add covered to osgx from Dennis Mehay
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3872 1f5c12ca-751b-0410-a591-d2e778427230
2011-02-04 16:21:57 +00:00
heafield
fccfd85c6e Option for null context in n-gram query, use tab for delimiter
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3871 1f5c12ca-751b-0410-a591-d2e778427230
2011-02-04 15:38:47 +00:00
maurocettolo
8fcd76f2fc made handling of chunk LM compatible with recent efficiency updates to IRSTLM toolkit by Nicola
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3870 1f5c12ca-751b-0410-a591-d2e778427230
2011-02-04 07:42:56 +00:00
phkoehn
4e72cd91be added decoding-graph-backoff, still experimenting with it
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3869 1f5c12ca-751b-0410-a591-d2e778427230
2011-02-03 13:41:44 +00:00
bhaddow
7651f3bb42 Fix command line parsing, update for new ttable format
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3868 1f5c12ca-751b-0410-a591-d2e778427230
2011-02-03 09:55:08 +00:00
bojar
0bc0ece594 Ales Tamchyna's printing of alignments (-print-alignment-info did nothing)
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3867 1f5c12ca-751b-0410-a591-d2e778427230
2011-02-03 09:08:42 +00:00
bojar
72945c543e a script to convert AT&T FSA to 'python lattice format' that Moses reads
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3866 1f5c12ca-751b-0410-a591-d2e778427230
2011-02-03 08:32:44 +00:00
pjwilliams
93b0a15a2d Add --with-tcmalloc option to ./configure to enable linking against Google's
TCMalloc library.  Currently assumes tcmalloc is installed in a standard
system directory.



git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3865 1f5c12ca-751b-0410-a591-d2e778427230
2011-02-02 20:06:52 +00:00
hieuhoang1972
0eed5716b7 get rid of linked trans opt
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3864 1f5c12ca-751b-0410-a591-d2e778427230
2011-02-02 11:24:19 +00:00
hieuhoang1972
e087e78df9 get rid of linked trans opt
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3863 1f5c12ca-751b-0410-a591-d2e778427230
2011-02-02 11:06:19 +00:00
hieuhoang1972
840f3915ce add xml markup back to regression testing
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3862 1f5c12ca-751b-0410-a591-d2e778427230
2011-02-01 18:30:01 +00:00
hieuhoang1972
db404d0fc0 pass regression. Not sure why it passed before; wasn't beam threshold implemented?
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3861 1f5c12ca-751b-0410-a591-d2e778427230
2011-02-01 18:17:44 +00:00
hieuhoang1972
c6e0391b21 spans must be in consistent format start-end, not start,end
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3860 1f5c12ca-751b-0410-a591-d2e778427230
2011-02-01 15:17:13 +00:00
heafield
66a76ac134 kenlm:
Fix ./configure failing to find lm/model.hh (introduced in 3849)
Remove some cruft from read_arpa
Avoid some error messages inside progress bars
FilePiece correctness (did not impact existing code)



git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3859 1f5c12ca-751b-0410-a591-d2e778427230
2011-01-28 19:44:48 +00:00
hieuhoang1972
abacb9166a xcode
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3857 1f5c12ca-751b-0410-a591-d2e778427230
2011-01-28 14:57:55 +00:00
heafield
87f15593da Remove vestigial len parameter from language model calls
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3856 1f5c12ca-751b-0410-a591-d2e778427230
2011-01-27 19:01:45 +00:00
pjwilliams
967b7be213 Support for multithreading in moses_chart (-threads option). This hasn't
been thoroughly tested yet, so don't be surprised if it breaks.  Verbose
output will be scrambled.



git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3853 1f5c12ca-751b-0410-a591-d2e778427230
2011-01-26 13:15:13 +00:00
heafield
fa04e673bf Minor fixes: unused parameter, factor optional components into a central header.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3849 1f5c12ca-751b-0410-a591-d2e778427230
2011-01-26 01:19:11 +00:00
redpony
eddb28e0ce facilitate programmatic creation of word lattices
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3848 1f5c12ca-751b-0410-a591-d2e778427230
2011-01-25 20:08:29 +00:00
heafield
22ce1d2f19 kenlm update
- Fix case where "foo bar baz" appears but "bar baz" does not.  Previously probing silently returned the wrong answer and trie silently broke.  
- More aggressive recombination: if "baz quux" is never followed by any word, then do not include "bar" in the state.  
- kenlm assumes that "foo bar" is present if "foo bar baz" is.  This is now checked.  
- Binary format version number bump because the format has changed to support the above.  
- Lower memory consumption trie building.  But it will take longer, to ensure correct handling of blanks and aggressive recombination.
- Fix progress bar newlines on trie building.

Agrees with SRI's 1-best outputs on the WMT 10 evaluation set.  



git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3847 1f5c12ca-751b-0410-a591-d2e778427230
2011-01-25 19:11:48 +00:00
pjwilliams
8051c5ad35 Use TranslationTask objects to perform sentence decoding in moses-chart.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3846 1f5c12ca-751b-0410-a591-d2e778427230
2011-01-25 17:15:30 +00:00
rsennrich
ec00f9a916 fix to MERT: disable normalization when optimizing subset of features.
before, active features were normalized to 1; optimizing one feature would always set it to 1, preventing any real optimization.

git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3845 1f5c12ca-751b-0410-a591-d2e778427230
2011-01-25 16:10:47 +00:00
pjwilliams
99bbfe938b Use OutputCollector to write moses-chart output.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3843 1f5c12ca-751b-0410-a591-d2e778427230
2011-01-25 15:17:17 +00:00
pjwilliams
67b30ea0c7 Move sentence-specific rule lookup state out of PhraseDictionarySCFG and
PhraseDictionaryOnDisk and into ChartRuleLookupManager.



git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3842 1f5c12ca-751b-0410-a591-d2e778427230
2011-01-24 19:14:19 +00:00
bhaddow
7b6503680a Shortcut when trans opts cache is size 0. Avoids potential uninitialised read.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3825 1f5c12ca-751b-0410-a591-d2e778427230
2011-01-21 21:25:14 +00:00
hieuhoang1972
96bd3a164d vs.net
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3802 1f5c12ca-751b-0410-a591-d2e778427230
2011-01-17 16:19:33 +00:00
hieuhoang1972
fe0d53b73f vs.net
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3801 1f5c12ca-751b-0410-a591-d2e778427230
2011-01-17 15:48:46 +00:00
bhaddow
a2bde7a16e Make sure internal libraries and paths go before boost.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3800 1f5c12ca-751b-0410-a591-d2e778427230
2011-01-15 22:41:24 +00:00
bhaddow
a9cd71628a Change of boost macros - please make sure your favourite configuration still works
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3799 1f5c12ca-751b-0410-a591-d2e778427230
2011-01-13 23:38:48 +00:00
suzyh
994ccb12c2 Fix to EMS call of multi-bleu evaluation script
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3798 1f5c12ca-751b-0410-a591-d2e778427230
2011-01-13 04:53:09 +00:00
pjwilliams
d20667a46d Faster lookup for rules with source and/or target syntax labels (in-memory rule
table only).


git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3797 1f5c12ca-751b-0410-a591-d2e778427230
2011-01-13 00:25:10 +00:00
heafield
a596b48971 Fix --enable-shared compilation.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3796 1f5c12ca-751b-0410-a591-d2e778427230
2011-01-11 19:32:59 +00:00
rafpayen
98495fc02c give absolute path for glue-grammar instead of ./model
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3793 1f5c12ca-751b-0410-a591-d2e778427230
2011-01-07 13:16:15 +00:00
nicolabertoldi
dbad1bb7aa now mert-moses.pl correctly calls Moses for generating nbest
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3782 1f5c12ca-751b-0410-a591-d2e778427230
2010-12-15 14:49:34 +00:00
nicolabertoldi
ab2185c4a5 more robust behavior of qsub-wrapper.pl
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3781 1f5c12ca-751b-0410-a591-d2e778427230
2010-12-15 14:47:51 +00:00
pjwilliams
3dec57a518 When scoring phrase pairs, store copies of the active pairs' PHRASE objects
instead of inserting them into a PhraseTable.  In a test on a 21GB
target-syntax extract file, this reduced user time from 195 to 120 mins.


git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3777 1f5c12ca-751b-0410-a591-d2e778427230
2010-12-14 23:49:57 +00:00
pjwilliams
627d8edf8e Fix bug affecting Good-Turing discounting: repeated phrase pairs were always
contributing a count of 1 because PhraseAlignment::addToCount() was looking
for counts in the fifth column, not the fourth.


git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3775 1f5c12ca-751b-0410-a591-d2e778427230
2010-12-14 16:31:53 +00:00
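
Note on 627d8edf8e: the kind of off-by-one column bug described in that message can be sketched as below. This is only an illustration, not the Moses code: the ` ||| ` delimiter, the field layout, the fallback count of 1, and the names parse_count and total_count are all assumptions made for the example.

```python
# Hypothetical sketch: field layout, delimiter and function names are
# assumptions for illustration, not taken from the Moses sources.
DELIM = " ||| "


def parse_count(line, count_column=3):
    """Return the count stored in one delimited phrase-pair line.

    Falls back to 1.0 when the requested column is absent, which is how
    a wrong column index can silently turn every count into 1.
    """
    fields = line.rstrip("\n").split(DELIM)
    if len(fields) > count_column:
        return float(fields[count_column])
    return 1.0


def total_count(lines, count_column=3):
    """Sum the counts of all lines, reading them from count_column."""
    return sum(parse_count(line, count_column) for line in lines)


lines = [
    "das haus ||| the house ||| 0-0 1-1 ||| 3",
    "das haus ||| the house ||| 0-0 1-1 ||| 2",
]

print(total_count(lines, count_column=3))  # fourth column: 5.0
print(total_count(lines, count_column=4))  # fifth column (the bug): 2.0
```

With the fifth column selected, each repeated pair contributes 1 regardless of its stored count, which would skew any count-of-counts statistics that Good-Turing discounting relies on.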