56 Commits

Author SHA1 Message Date
heafield
3274f72bfb More documentation
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3951 1f5c12ca-751b-0410-a591-d2e778427230
2011-04-19 15:17:01 +00:00
heafield
a3385c3905 Chris Dyer wanted #include <cstddef> for newer g++
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3945 1f5c12ca-751b-0410-a591-d2e778427230
2011-04-06 21:14:24 +00:00
heafield
a7584f57d5 Don't malloc non-PODs without calling their constructor. Also, exception safety.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3942 1f5c12ca-751b-0410-a591-d2e778427230
2011-04-02 01:57:22 +00:00
hieuhoang1972
0fdde952bc compiles with clang++
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3940 1f5c12ca-751b-0410-a591-d2e778427230
2011-04-01 23:31:11 +00:00
hieuhoang1972
adc2ac2c6a xcode
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3938 1f5c12ca-751b-0410-a591-d2e778427230
2011-03-30 20:31:09 +00:00
heafield
02c767a16f Hieu's opinion is to keep the standalone test shell scripts.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3937 1f5c12ca-751b-0410-a591-d2e778427230
2011-03-29 15:07:57 +00:00
heafield
5059b5ab01 Fix compiler warning
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3935 1f5c12ca-751b-0410-a591-d2e778427230
2011-03-22 02:05:38 +00:00
heafield
17ad255f70 Fix handling of last vocab word when <unk> is missing and a trie is being built.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3934 1f5c12ca-751b-0410-a591-d2e778427230
2011-03-21 19:51:08 +00:00
heafield
4674d9bcfd Change default <unk> to -100.0. Fix some exception printing.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3933 1f5c12ca-751b-0410-a591-d2e778427230
2011-03-21 14:40:21 +00:00
heafield
98d4d36a49 Fix compiler warning
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3915 1f5c12ca-751b-0410-a591-d2e778427230
2011-03-07 22:51:25 +00:00
heafield
b2b2688a74 Make kenlm compile with icc by changing exception handling
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3909 1f5c12ca-751b-0410-a591-d2e778427230
2011-03-01 17:29:37 +00:00
heafield
7b5d0234c6 More error handling:
Throwing an exception when <s> or </s> is missing is optional, but it is the default.
If a binary file makes it to the ARPA parser (somebody gzipped a binary file or passed it to build_binary), the message is more informative.



git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3905 1f5c12ca-751b-0410-a591-d2e778427230
2011-02-24 19:37:39 +00:00
heafield
6f63bb4161 Prevent people from loading partially built binary files. Partially built files made with the old build_binary will still load, but any partial files made with the new build_binary (this revision) will throw an error on load.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3904 1f5c12ca-751b-0410-a591-d2e778427230
2011-02-24 17:11:53 +00:00
heafield
5f0eacce4b Apparently some systems (including those at IRST) don't print exceptions that work their way up
to main.  Do this.  


git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3895 1f5c12ca-751b-0410-a591-d2e778427230
2011-02-23 17:59:56 +00:00
heafield
cb848f41b3 Fix corner case in trie builder context merging
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3890 1f5c12ca-751b-0410-a591-d2e778427230
2011-02-21 17:15:24 +00:00
heafield
fb02a67afb Fix segfaults (or at least one of them)
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3877 1f5c12ca-751b-0410-a591-d2e778427230
2011-02-11 01:51:30 +00:00
heafield
fccfd85c6e Option for null context in n-gram query, use tab for delimiter
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3871 1f5c12ca-751b-0410-a591-d2e778427230
2011-02-04 15:38:47 +00:00
heafield
66a76ac134 kenlm:
Fix ./configure failing to find lm/model.hh, introduced in 3849
Remove some cruft from read_arpa
Avoid some error messages inside progress bars
FilePiece correctness (did not impact existing code)



git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3859 1f5c12ca-751b-0410-a591-d2e778427230
2011-01-28 19:44:48 +00:00
hieuhoang1972
abacb9166a xcode
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3857 1f5c12ca-751b-0410-a591-d2e778427230
2011-01-28 14:57:55 +00:00
heafield
fa04e673bf Minor fixes: unused parameter, factor optional components into a central header.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3849 1f5c12ca-751b-0410-a591-d2e778427230
2011-01-26 01:19:11 +00:00
redpony
eddb28e0ce facilitate programmatic creation of word lattices
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3848 1f5c12ca-751b-0410-a591-d2e778427230
2011-01-25 20:08:29 +00:00
heafield
22ce1d2f19 kenlm update
- Fix case where "foo bar baz" appears but "bar baz" does not.  Previously probing silently returned the wrong answer and trie silently broke.  
- More aggressive recombination: if "baz quux" is never followed by any word, then do not include "bar" in the state.  
- kenlm assumes that "foo bar" is present if "foo bar baz" is.  This is now checked.  
- Binary format version number bump because the format has changed to support the above.  
- Lower memory consumption in trie building.  But it will take longer, to ensure correct handling of blanks and aggressive recombination.  
- Fix progress bar newlines on trie building.

Agrees with SRI's 1-best outputs on the WMT 10 evaluation set.  



git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3847 1f5c12ca-751b-0410-a591-d2e778427230
2011-01-25 19:11:48 +00:00
heafield
a596b48971 Fix --enable-shared compilation.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3796 1f5c12ca-751b-0410-a591-d2e778427230
2011-01-11 19:32:59 +00:00
heafield
915cb22b6a Make tests run on OS X. This was an issue with the test (and its use of popen) not with the code.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3767 1f5c12ca-751b-0410-a591-d2e778427230
2010-12-09 21:58:54 +00:00
heafield
9062e3b73b KenLM update: allow user to specify data structure and parameters on command line to
build_binary.  Also some minor bugfixes.  



git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3762 1f5c12ca-751b-0410-a591-d2e778427230
2010-12-08 03:15:37 +00:00
heafield
eabc137306 Make kenlm tests compile on more systems by adding headers.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3738 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-28 21:16:06 +00:00
heafield
0d8b62791e kenlm update.
Improved portability:
Hopefully handle big endian architectures (trie will fail at runtime if this isn't working yet).  
Remove dependence on err.h.
Handle some Solaris weirdness wrt mmap and strerror.  
Clean up murmur_hash header.  

Add comparison and ZeroRemaining requested by Chris Dyer.  

More number parsing in FilePiece.  



git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3737 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-28 02:54:56 +00:00
heafield
82f29bfc16 Chris Dyer says this should make things compile better on OS X.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3694 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-10 02:05:51 +00:00
hieuhoang1972
3b6b002df8 xcode
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3691 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-09 13:25:09 +00:00
hieuhoang1972
9a72825d29 mac compile
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3689 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-08 16:09:04 +00:00
heafield
2784923899 Rename a bunch of kenlm files. A ./regenerate-makefiles.sh is required.
Make loading with MAP_POPULATE on Linux and read on other OSes the default.
Use LM #9 for lazy loading, as recommended by other devs.  
Slightly faster trie.  



git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3688 1f5c12ca-751b-0410-a591-d2e778427230
2010-11-06 00:40:16 +00:00
heafield
bf88f87d78 Fix return value of FilePiece::ReadLine at end of file. Did not impact existing kenlm (since
they don't read to the end of file) but will impact future versions.  


git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3682 1f5c12ca-751b-0410-a591-d2e778427230
2010-10-29 17:53:19 +00:00
heafield
c12c2c59d2 Autodetect model from binary format.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3675 1f5c12ca-751b-0410-a591-d2e778427230
2010-10-28 01:05:04 +00:00
hieuhoang1972
eb374bf082 cygwin build
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3674 1f5c12ca-751b-0410-a591-d2e778427230
2010-10-27 20:47:28 +00:00
hieuhoang1972
735d5b682f xcode
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3673 1f5c12ca-751b-0410-a591-d2e778427230
2010-10-27 18:54:50 +00:00
heafield
614d6002a6 Integrate heafield-refactorlm. Faster kenlm with new binary format. Stateful language model
framework.  



git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3671 1f5c12ca-751b-0410-a591-d2e778427230
2010-10-27 17:50:40 +00:00
hieuhoang1972
46b59cbdd7 xcode
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3667 1f5c12ca-751b-0410-a591-d2e778427230
2010-10-27 10:20:33 +00:00
heafield
64cfacd1bd Backporting FilePiece leaked scoped_FILE, but only into the test.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3665 1f5c12ca-751b-0410-a591-d2e778427230
2010-10-27 04:04:23 +00:00
hieuhoang1972
34e7c43114 xcode
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3664 1f5c12ca-751b-0410-a591-d2e778427230
2010-10-27 03:14:11 +00:00
heafield
d1b1b4f34c Tom from Precision Translation Tools reports that IRST doesn't generate a blank line after each block. Removed this
requirement from the parser.  


git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3657 1f5c12ca-751b-0410-a591-d2e778427230
2010-10-26 14:04:32 +00:00
heafield
8d0d44f5cd Support gzipped ARPA files. Progress bar tweak. Test fixes. Holding off on the big change for now.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3643 1f5c12ca-751b-0410-a591-d2e778427230
2010-10-23 05:21:10 +00:00
heafield
e65ecd0632 Put official website in README
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3631 1f5c12ca-751b-0410-a591-d2e778427230
2010-10-19 15:55:49 +00:00
hieuhoang1972
5f6baa9021 xcode
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3627 1f5c12ca-751b-0410-a591-d2e778427230
2010-10-18 05:47:07 +00:00
hieuhoang1972
7463257be5 gcc 3.4
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3613 1f5c12ca-751b-0410-a591-d2e778427230
2010-10-11 13:36:40 +00:00
hieuhoang1972
e504b797b2 xcode
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3599 1f5c12ca-751b-0410-a591-d2e778427230
2010-10-01 00:21:27 +00:00
heafield
770df2a92d Unbodge kenlm by moving compilation to kenlm/ instead of kenlm/lm. Changing the headers every
time I copied to Moses was getting annoying.  



git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3587 1f5c12ca-751b-0410-a591-d2e778427230
2010-09-28 16:26:55 +00:00
heafield
d1a7c636ac Jon Clark complained that IRSTLM puts 0.0 backoff for n-grams of longest order and that I throw
an error.  



git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3578 1f5c12ca-751b-0410-a591-d2e778427230
2010-09-27 18:38:21 +00:00
heafield
e6184ae947 Updates to kenlm:
Kludged and slow interface requested by Hieu because apparently Moses can't store language model state.  
Separate files for ARPA reading, vocabulary, and weights.  
Remove build shell scripts that won't work after Hieu changed the header file layout.  



git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3572 1f5c12ca-751b-0410-a591-d2e778427230
2010-09-27 03:46:44 +00:00
heafield
61f5472f1c Avoid some unused parameter complaints and force automake dependencies to fix parallel make.
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3571 1f5c12ca-751b-0410-a591-d2e778427230
2010-09-27 00:57:11 +00:00
hieuhoang1972
a82c2d5531 ken lm integration
git-svn-id: https://mosesdecoder.svn.sourceforge.net/svnroot/mosesdecoder/trunk@3569 1f5c12ca-751b-0410-a591-d2e778427230
2010-09-26 17:02:53 +00:00