Commit Graph

82 Commits

Author SHA1 Message Date
Matthias Huck
dda3f1867c Hiero phrase orientation 2016-01-06 18:52:14 +00:00
Matthias Huck
bd3f573452 Hiero phrase orientation 2015-12-10 12:56:37 +00:00
Phil Williams
3c86649e34 Fix bug in STSG rule scoring 2015-09-25 12:11:23 +01:00
Hieu Hoang
e0d2af268c eclipse 2015-08-11 13:10:38 +04:00
Phil Williams
e7228ec9fb extract-ghkm: minor refactoring 2015-07-06 14:41:34 +01:00
Phil Williams
44372d7787 extract-ghkm: fix a couple of exception-related issues 2015-07-06 12:05:41 +01:00
Phil Williams
c6a3d8e54a Ongoing moses/phrase-extract refactoring 2015-06-04 16:54:31 +01:00
MosesAdmin
5696a59ae4 daily automatic beautifier 2015-06-04 13:41:46 +01:00
Phil Williams
ed321791a7 Ongoing moses/phrase-extract refactoring 2015-06-03 11:10:45 +01:00
Phil Williams
2e21f051f2 Ongoing moses/phrase-extract refactoring 2015-06-03 10:05:36 +01:00
Phil Williams
2f04d4a56e Ongoing moses/phrase-extract refactoring 2015-06-02 15:23:41 +01:00
Phil Williams
0c61970ac7 Ongoing moses/phrase-extract refactoring 2015-06-02 13:56:03 +01:00
Phil Williams
d3fb4a8002 Ongoing moses/phrase-extract refactoring 2015-06-02 10:16:42 +01:00
Phil Williams
8a9505d72f Ongoing moses/phrase-extract refactoring 2015-06-01 16:54:12 +01:00
Phil Williams
f37415a259 Ongoing moses/phrase-extract refactoring 2015-06-01 16:40:35 +01:00
Phil Williams
f61091e38d Ongoing moses/phrase-extract refactoring 2015-06-01 14:23:25 +01:00
Phil Williams
c754aef37a Oops. Fix compile error. 2015-06-01 08:45:04 +01:00
Phil Williams
985e7bbfc3 Ongoing moses/phrase-extract refactoring 2015-05-29 20:57:25 +01:00
Phil Williams
2f735998ca Rename MosesTraining::SyntaxTree to MosesTraining::SyntaxNodeCollection
This is the first step in a small-scale refactoring effort that will touch a
lot of the syntax-related code in moses/phrase-extract.  The end goals are:

  - a storage mechanism for general attribute/value pairs in XML-style
    tree / lattice input.  E.g. the "pcfg-score" and "semantic-role"
    attributes in:

     <tree label="PRP" pcfg-score="1.0" semantic-role="AGENT"> I </tree>

  - consolidation of the various near-duplicate Tree / XmlTreeParser classes
    that have accumulated over the years (my fault)

  - general de-crufting
2015-05-29 18:46:02 +01:00
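The first goal in the commit message above, storing general attribute/value pairs from XML-style input, could look roughly like the sketch below. It is not the Moses implementation: `parseAttributes` is a hypothetical helper that reads the `key="value"` pairs from a single opening tag and keeps them in a `std::map`, and it assumes well-formed, double-quoted attribute values.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper (not Moses code): collects key="value" pairs from the
// text of one opening tag, e.g. <tree label="PRP" pcfg-score="1.0">.
// Assumes double-quoted values that do not themselves contain ="
std::map<std::string, std::string> parseAttributes(const std::string &tag) {
  std::map<std::string, std::string> attrs;
  std::size_t pos = 0;
  while (true) {
    const std::size_t eq = tag.find("=\"", pos);
    if (eq == std::string::npos) break;  // no further attributes
    // The key is the run of characters between the previous separator and '='.
    std::size_t keyStart = tag.find_last_of(" \t<", eq);
    keyStart = (keyStart == std::string::npos) ? 0 : keyStart + 1;
    assert(eq >= keyStart);  // key range is never negative
    const std::size_t valStart = eq + 2;
    const std::size_t valEnd = tag.find('"', valStart);
    if (valEnd == std::string::npos) break;  // unterminated value: stop
    attrs[tag.substr(keyStart, eq - keyStart)] =
        tag.substr(valStart, valEnd - valStart);
    pos = valEnd + 1;
  }
  return attrs;
}
```

Applied to the opening tag from the commit message, the map would hold `label`, `pcfg-score`, and `semantic-role` as keys, so later code can look up any attribute by name without a dedicated field per attribute.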
Jeroen Vermeulen
a25193cc5d Fix a lot of lint, mostly trailing whitespace.
This is lint reported by the new lint-checking functionality in beautify.py.
(We can change to a different lint checker if we have a better one, but it
would probably still flag these same problems.)

Lint checking can help a lot, but only if we get the lint under control.
2015-05-17 20:04:04 +07:00
Hieu Hoang
cc8c6b7b10 beautify 2015-05-02 11:45:24 +01:00
Jeroen Vermeulen
eca5824100 Remove trailing whitespace in C++ files. 2015-04-30 12:05:11 +07:00
Jeroen Vermeulen
32722ab5b1 Support tokenize(const std::string &) as well.
Convenience wrapper: the actual function takes a const char[], but many of
the call sites want to pass a string and have to call its c_str() first.
2015-04-22 10:35:18 +07:00
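The convenience wrapper described in the commit above can be sketched as an overload that forwards to the character-array version. The namespace `sketch`, the whitespace-splitting rule (spaces and tabs, empty tokens skipped), and the function bodies are assumptions made for illustration; only the two parameter types come from the commit message.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace sketch {  // hypothetical namespace, not the Moses util namespace

// Core function: takes a NUL-terminated character array.
// Assumption: tokens are separated by spaces or tabs, empty tokens dropped.
inline std::vector<std::string> tokenize(const char input[]) {
  assert(input != nullptr);
  std::vector<std::string> tokens;
  std::string current;
  for (const char *p = input; *p != '\0'; ++p) {
    if (*p == ' ' || *p == '\t') {
      if (!current.empty()) {
        tokens.push_back(current);
        current.clear();
      }
    } else {
      current.push_back(*p);
    }
  }
  if (!current.empty()) tokens.push_back(current);
  return tokens;
}

// Convenience wrapper: lets call sites pass a std::string directly
// instead of calling c_str() themselves.
inline std::vector<std::string> tokenize(const std::string &input) {
  return tokenize(input.c_str());
}

}  // namespace sketch
```

A call site holding a `std::string line` can then write `const std::vector<std::string> toks = sketch::tokenize(line);`. Both functions are marked `inline` so that a single header definition can be shared across translation units.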
Jeroen Vermeulen
b2d821a141 Unify tokenize() into util, and unit-test it.
The duplicate definition works fine in environments where the inline
definition becomes a weak symbol in the object file, but if it gets
generated as a regular definition, the duplicate definition causes link
problems.

In most call sites the return value could easily be made const, which
gives both the reader and the compiler a bit more certainty about the code's
intentions.  In theory this may help performance, but it's mainly for clarity.

The comments are based on reverse-engineering, and the unit tests are based
on the comments.  It's possible that some of what's in there is not essential,
in which case, don't feel bad about changing it!

I left a third identical definition in place, though I updated it with my
changes to avoid creeping divergence, and noted the duplication in a comment.
It would be nice to get rid of this definition as well, but it'd introduce
headers from the main Moses tree into biconcor, which may be against policy.
2015-04-22 09:59:05 +07:00
Matthias Huck
534a894c0b glue rules with stripped BitPar labels 2015-03-10 22:02:21 +00:00
Matthias Huck
01bed83cf9 GHKM extraction: option to strip non-terminal labels from BitPar syntactic parses right during extraction (i.e., remove any suffix starting with a hyphen from the label) 2015-03-10 21:25:32 +00:00
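The label stripping described in the commit above (remove any suffix starting with a hyphen) might be sketched as follows. `stripLabelSuffix` is a hypothetical name, not the function used in extract-ghkm, and the guard that leaves a label unchanged when it begins with a hyphen is an added assumption so that no label becomes empty.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not the extract-ghkm code): drops everything from the
// first hyphen onwards, e.g. "NP-SB" becomes "NP".
// Assumption: a label whose first character is a hyphen is left unchanged.
std::string stripLabelSuffix(const std::string &label) {
  const std::size_t hyphen = label.find('-');
  if (hyphen == std::string::npos || hyphen == 0) {
    return label;  // nothing to strip, or stripping would empty the label
  }
  const std::string stripped = label.substr(0, hyphen);
  assert(!stripped.empty());  // guaranteed by the hyphen == 0 guard above
  return stripped;
}
```

Under these assumptions, labels that differ only in their hyphenated suffix collapse to the same non-terminal symbol, which reduces the size of the non-terminal set seen during rule extraction.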
Matthias Huck
25f5470216 GHKM: write target parts-of-speech as a factor 2015-03-09 21:54:03 +00:00
Matthias Huck
99b8f65fb1 GHKM: POS factor in glue rules: target side only 2015-03-06 16:47:44 +00:00
Matthias Huck
aa077ab66c GHKM extraction / consolidate: write most frequent POS sequence from property to factor (for usage with a POS LM) 2015-03-05 22:25:32 +00:00
Matthias Huck
773a16b5fd POS property in glue rules 2015-03-04 23:05:45 +00:00
Matthias Huck
06e87d851e GHKM: extract POS phrase property (from preterminals in the syntactic parse tree) 2015-03-04 21:40:56 +00:00
Matthias Huck
9987beb453 SoftSourceSyntacticConstraintsFeature: Now for both non-terminals (as before) _and_ terminals.
Also added score components based on relative frequency.
(TODO: logprobs right now; are plain probabilities better?)
2015-01-23 18:41:18 +00:00
Matthias Huck
a6c09e57d0 domain features in GHKM extraction 2015-01-20 21:36:55 +00:00
Hieu Hoang
6d61db28fa use astyle 2.01. It's on Edinburgh server and doesn't screw up enum 2015-01-14 19:21:11 +00:00
Hieu Hoang
05ead45e71 beautify 2015-01-14 11:07:42 +00:00
Matthias Huck
168118d252 PhraseOrientationFeature efficiency improvement 2015-01-09 14:03:18 +00:00
Nicola Bertoldi
e4eb201c52 merged master into dynamic-models and solved conflicts 2014-12-13 12:52:47 +01:00
Matthias Huck
24a8a6a511 PhraseOrientationFeature 2014-12-03 20:04:26 +00:00
Hieu Hoang
ba7afba9f6 move n-best code for phrase-based from IOWrapper to ChartManager 2014-12-02 17:40:53 +00:00
Phil Williams
ef1262a17f extract-ghkm: change STSG output format. 2014-11-21 15:46:12 +00:00
Phil Williams
c46fb10ec7 extract-ghkm: add --STSG option 2014-11-21 11:30:29 +00:00
Matthias Huck
0fd987a8c6 avoid necessity of masking "{{" in the data 2014-11-12 18:28:59 +00:00
Phil Williams
a5d803ee14 extract-ghkm: add -T2S option 2014-11-12 14:03:24 +00:00
Phil Williams
05ecc914c2 Fix a few more compiler warnings (from Clang mostly). 2014-10-10 15:47:53 +01:00
Matthias Huck
5ac6c42508 PhraseOrientationFeature: bugfixes 2014-09-13 00:20:17 +01:00
Matthias Huck
0cf0d595d3 GHKM glue grammar: Orientation phrase property 2014-09-12 17:30:03 +01:00
Matthias Huck
63316960a1 GHKM glue grammar: print word alignment links for <s> and </s>,
SSTART and SEND in internal tree structure
2014-09-12 17:18:31 +01:00
Matthias Huck
1523f3315d PhraseOrientationFeature for chart-based decoding: a first simple version,
with lots of log output
2014-09-12 13:51:04 +01:00
Matthias Huck
33992f9af5 uninitialized variables and double include 2014-08-08 16:27:17 +01:00
Paul Baltescu
d75c4e1ae5 OxLM integration. 2014-08-08 01:18:05 +01:00