mosesdecoder

mirror of https://github.com/moses-smt/mosesdecoder.git synced 2024-12-28 14:32:38 +03:00

Author	SHA1	Message	Date
Jeroen Vermeulen	09c982c1de	Remove bad initialization. Setting lastLine[0] when lastLine is empty probably doesn't do anything, but in C++11 is definitely undefined. The value wasn't used anyway!	2015-05-01 18:42:04 +07:00
Jeroen Vermeulen	eca5824100	Remove trailing whitespace in C++ files.	2015-04-30 12:05:11 +07:00
Jeroen Vermeulen	616b589da3	Fix a bunch of compiler warnings. Warnings are useful, but only if there are few!	2015-04-29 21:18:51 +07:00
Jeroen Vermeulen	fc810e363e	Remove conflicting definition of isNonTerminal. This only affects configurations where inline functions become regular, non-weak symbols, leading to link conflicts. The extra definition was not used anywhere. The removed definition was probably less efficient. However the only functional difference was that it returned false for the empty nonterminal, i.e. "[]".	2015-04-22 10:43:15 +07:00
Jeroen Vermeulen	32722ab5b1	Support tokenize(const std::string &) as well. Convenience wrapper: the actual function takes a const char[], but many of the call sites want to pass a string and have to call its c_str() first.	2015-04-22 10:35:18 +07:00
Jeroen Vermeulen	b2d821a141	Unify tokenize() into util, and unit-test it. The duplicate definition works fine in environments where the inline definition becomes a weak symbol in the object file, but if it gets generated as a regular definition, the duplicate definition causes link problems. In most call sites the return value could easily be made const, which gives both the reader and the compiler a bit more certainty about the code's intentions. In theory this may help performance, but it's mainly for clarity. The comments are based on reverse-engineering, and the unit tests are based on the comments. It's possible that some of what's in there is not essential, in which case, don't feel bad about changing it! I left a third identical definition in place, though I updated it with my changes to avoid creeping divergence, and noted the duplication in a comment. It would be nice to get rid of this definition as well, but it'd introduce headers from the main Moses tree into biconcor, which may be against policy.	2015-04-22 09:59:05 +07:00
Matthias Huck	633e7be8f0	integer overflows in Good-Turing discounting	2015-03-30 17:42:55 +01:00
Jeroen Vermeulen	c634f6ee5b	Remove some unused variables. This silences a few more compiler warnings.	2015-03-30 10:26:39 +07:00
Jeroen Vermeulen	789a2e2bc3	Fix some compile warnings (gcc 4.9.2). Mostly signed/unsigned comparisons and reordered member initializations; also a few unused variables. There are more, but if I chip away at them for a while, who knows, it may catch on and warnings may eventually become socially stigmatizing. :)	2015-03-29 18:10:51 +07:00
Jeroen Vermeulen	9852a0c2ff	Modernize "C" includes in phrase-extract. This is one of those little chores in managing a long-lived C++ project: standard C headers like stdio.h and math.h now have their own place in the C++ standard as cstdio, cmath, and so on. In this branch the #include names are updated for the phrase-extract/ subdirectory; more branches to follow. C++11 adds cstdint, but to support compilation with the previous standard, that change is left for later.	2015-03-28 19:56:20 +07:00
Matthias Huck	534a894c0b	glue rules with stripped BitPar labels	2015-03-10 22:02:21 +00:00
Matthias Huck	01bed83cf9	GHKM extraction: option to strip non-terminal labels from BitPar syntactic parses right during extraction (i.e., remove any suffix starting with a hyphen from the label)	2015-03-10 21:25:32 +00:00
Hieu Hoang	ad73919979	merge with private branch	2015-03-10 15:28:45 +00:00
Phil Williams	9e88f794e6	Add phrase-extract/postprocess-egret-forests. This performs some minor transformations to Egret forests: escaping of Moses special characters; removal of "^g" suffixes from constituent labels; and marking of slash/hyphen split points (using @ characters).	2015-03-10 13:51:30 +00:00
Matthias Huck	25f5470216	GHKM: write target parts-of-speech as a factor	2015-03-09 21:54:03 +00:00
Matthias Huck	524ed4406e	pragma once	2015-03-09 21:44:54 +00:00
Matthias Huck	559077f6f8	some moderate modifications in phrase-extract/score-main.cpp (e.g., use Moses::Scan<>() rather than atof()/atoi())	2015-03-09 18:49:32 +00:00
Matthias Huck	973fd98052	conservative update of some old code in phrase-extract/consolidate-main.cpp	2015-03-09 18:47:28 +00:00
Matthias Huck	0c79e19ff9	consolidate properties: fixing bug from commit `b08d3ed`	2015-03-09 18:44:02 +00:00
Hieu Hoang	b08d3ed0fe	merge with private branch. Add --Count arg	2015-03-09 00:47:51 +00:00
Matthias Huck	99b8f65fb1	GHKM: POS factor in glue rules: target side only	2015-03-06 16:47:44 +00:00
Matthias Huck	aa077ab66c	GHKM extraction / consolidate: write most frequent POS sequence from property to factor (for usage with a POS LM)	2015-03-05 22:25:32 +00:00
Matthias Huck	773a16b5fd	POS property in glue rules	2015-03-04 23:05:45 +00:00
Matthias Huck	638e9c3f60	POS property: map tags to indices in consolidate	2015-03-04 22:48:34 +00:00
Matthias Huck	06e87d851e	GHKM: extract POS phrase property (from preterminals in the syntactic parse tree)	2015-03-04 21:40:56 +00:00
Phil Williams	0346fbb138	filter-rule-table: stopgap (non-) filtering for T2S/SCFG	2015-02-23 11:27:20 +00:00
Hieu Hoang	32de075022	beautify	2015-02-19 12:27:23 +00:00
Phil Williams	e1d60211a4	filter-rule-table: comments + minor clean-up.	2015-02-11 12:03:27 +00:00
Phil Williams	02f5ada680	filter-rule-table: support for "hierarchical" and "s2t" model types. Output should match filter-rule-table.py, but filtering is faster. Some rough timings (filter-rule-table.py vs. this version): System A, 0h 13m vs. 0h 04m; System B, 18h 03m vs. 0h 51m. System A is WMT14, en-de, string-to-tree (32M rules, 3,000 test sentences); System B is WMT14, cs-en, string-to-tree (293M rules, 13,071 test sentences).	2015-02-10 15:11:10 +00:00
Hieu Hoang	70e8eb54ce	Using boost for prefix/suffix checks /Jeroen Vermeulen	2015-02-05 16:23:47 +00:00
Philipp Koehn	f69c1dab02	more efficient default recaser training	2015-02-04 09:18:09 +00:00
Phil Williams	6b9da6c585	filter-rule-table: merge changes from t2s branch (still WIP)	2015-02-03 11:33:10 +00:00
Matthias Huck	9987beb453	SoftSourceSyntacticConstraintsFeature: Now for both non-terminals (as before) _and_ terminals. Also added score components based on relative frequency. (TODO: logprobs right now; are plain probabilities better?)	2015-01-23 18:41:18 +00:00
Matthias Huck	b50c197313	forgot to check this in some time ago	2015-01-20 21:41:41 +00:00
Matthias Huck	a6c09e57d0	domain features in GHKM extraction	2015-01-20 21:36:55 +00:00
Hieu Hoang	b50b3164fa	beautify	2015-01-15 11:18:39 +00:00
Hieu Hoang	6289b39fd8	update extract-mixed-syntax	2015-01-15 09:53:57 +00:00
Hieu Hoang	6d61db28fa	use astyle 2.01. It's on Edinburgh server and doesn't screw up enum	2015-01-14 19:21:11 +00:00
Hieu Hoang	05ead45e71	beautify	2015-01-14 11:07:42 +00:00
Matthias Huck	168118d252	PhraseOrientationFeature efficiency improvement	2015-01-09 14:03:18 +00:00
Phil Williams	7cc75a0fa1	score-stsg: add --TreeScore option	2014-12-30 18:57:23 +00:00
Philipp Koehn	831f947874	long overdue feature: do not produce very low scoring translation table entries that are never used and just gum up the works	2014-12-21 01:14:42 +00:00
Nicola Bertoldi	e4eb201c52	merged master into dynamic-models and solved conflicts	2014-12-13 12:52:47 +01:00
Phil Williams	b9a382aa78	Add filter-rule-table. This will eventually replace filter-rule-table.py. At the moment it can only filter rule tables where the source side is an STSG fragment and the test sentences have parse trees.	2014-12-07 14:56:48 +00:00
Phil Williams	60e56efc6b	phrase-extract: add syntax-common sub-library, and remove some (near-)duplicate code from pcfg-common and score-stsg.	2014-12-07 14:27:51 +00:00
Phil Williams	a2708b8431	relax-parse: fix hang. SyntaxTree::Parse() would enter a very long loop due to an uninitialized member variable.	2014-12-07 12:56:41 +00:00
Hieu Hoang	4b10c59bea	add OutputSearchGraphHypergraph() to API framework. Move m_source to BaseManager	2014-12-05 21:33:59 +00:00
Matthias Huck	7a299de66b	avoid necessity of masking "{{" in the data	2014-12-04 15:54:05 +00:00
Matthias Huck	24a8a6a511	PhraseOrientationFeature	2014-12-03 20:04:26 +00:00
Hieu Hoang	49a2ff1faa	Merge branch 'merge-cmd'	2014-12-02 19:09:34 +00:00


260 Commits