12 Commits

Author SHA1 Message Date
Phil Williams
985e7bbfc3 Ongoing moses/phrase-extract refactoring 2015-05-29 20:57:25 +01:00
Phil Williams
2f735998ca Rename MosesTraining::SyntaxTree to MosesTraining::SyntaxNodeCollection
This is the first step in a small-scale refactoring effort that will touch a
lot of the syntax-related code in moses/phrase-extract.  The end goals are:

  - a storage mechanism for general attribute/value pairs in XML-style
    tree / lattice input.  E.g. the "pcfg-score" and "semantic-role"
    attributes in:

     <tree label="PRP" pcfg-score="1.0" semantic-role="AGENT"> I </tree>

  - consolidation of the various near-duplicate Tree / XmlTreeParser classes
    that have accumulated over the years (my fault)

  - general de-crufting
2015-05-29 18:46:02 +01:00
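
The first end goal in the commit message above, storing general attribute/value pairs from XML-style tree input, could be sketched roughly as follows. This is only an illustration: `SketchNode`, `GetAttribute`, and `MakeExampleNode` are hypothetical names and are not taken from the Moses source. The sketch assumes attributes are kept as plain strings in a map keyed by attribute name.

```cpp
#include <map>
#include <string>

// Hypothetical node type for illustration only; this is NOT the actual
// MosesTraining class. It assumes every attribute value is stored as a string.
struct SketchNode {
  std::string label;
  std::map<std::string, std::string> attributes;

  // Returns true and copies the value into `value` if `name` is present;
  // returns false and leaves `value` untouched otherwise.
  bool GetAttribute(const std::string &name, std::string &value) const {
    std::map<std::string, std::string>::const_iterator it =
        attributes.find(name);
    if (it == attributes.end()) {
      return false;
    }
    value = it->second;
    return true;
  }
};

// Builds the node from the commit message's example:
//   <tree label="PRP" pcfg-score="1.0" semantic-role="AGENT"> I </tree>
inline SketchNode MakeExampleNode() {
  SketchNode node;
  node.label = "PRP";
  node.attributes["pcfg-score"] = "1.0";
  node.attributes["semantic-role"] = "AGENT";
  return node;
}
```

Keeping the values as strings means new attributes need no change to the node type; any numeric interpretation (for example of "pcfg-score") would be left to the caller.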
Jeroen Vermeulen
eca5824100 Remove trailing whitespace in C++ files. 2015-04-30 12:05:11 +07:00
Jeroen Vermeulen
32722ab5b1 Support tokenize(const std::string &) as well.
Convenience wrapper: the actual function takes a const char[], but many of
the call sites want to pass a string and have to call its c_str() first.
2015-04-22 10:35:18 +07:00
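
The convenience wrapper described in the commit above can be sketched as a one-line overload that forwards to the `const char[]` version. The `tokenize(const char *)` body below is a simplified stand-in that splits on spaces and tabs; it is an assumption made for illustration, not the actual implementation in util, and the return type is likewise assumed.

```cpp
#include <string>
#include <vector>

// Simplified stand-in for the real tokenize(): splits on spaces and tabs.
// The actual function's behaviour and return type may differ.
inline std::vector<std::string> tokenize(const char *input)
{
  std::vector<std::string> tokens;
  std::string current;
  for (const char *p = input; *p != '\0'; ++p) {
    if (*p == ' ' || *p == '\t') {
      // A separator ends the current token, if there is one.
      if (!current.empty()) {
        tokens.push_back(current);
        current.clear();
      }
    } else {
      current += *p;
    }
  }
  if (!current.empty()) {
    tokens.push_back(current);
  }
  return tokens;
}

// Convenience overload: lets call sites pass a std::string directly
// instead of calling c_str() themselves.
inline std::vector<std::string> tokenize(const std::string &input)
{
  return tokenize(input.c_str());
}
```

With the overload in place, a call site can write `const std::vector<std::string> toks = tokenize(line);` for a `std::string line`, without the explicit `c_str()` call.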
Jeroen Vermeulen
b2d821a141 Unify tokenize() into util, and unit-test it.
The duplicate definition works fine in environments where the inline
definition becomes a weak symbol in the object file, but if it gets
generated as a regular definition, the duplicate definition causes link
problems.

In most call sites the return value could easily be made const, which
gives both the reader and the compiler a bit more certainty about the code's
intentions.  In theory this may help performance, but it's mainly for clarity.

The comments are based on reverse-engineering, and the unit tests are based
on the comments.  It's possible that some of what's in there is not essential,
in which case, don't feel bad about changing it!

I left a third identical definition in place, though I updated it with my
changes to avoid creeping divergence, and noted the duplication in a comment.
It would be nice to get rid of this definition as well, but it'd introduce
headers from the main Moses tree into biconcor, which may be against policy.
2015-04-22 09:59:05 +07:00
Phil Williams
60e56efc6b phrase-extract: add syntax-common sub-library
And remove some (near-)duplicate code from pcfg-common and score-stsg.
2014-12-07 14:27:51 +00:00
Rico Sennrich
ba52fa163b use &#124; as default escape sequence for "|" (for consistency with tokenizer.perl) 2014-03-21 19:19:03 +00:00
Hieu Hoang
6249432407 beautify 2013-05-29 18:16:15 +01:00
Kenneth Heafield
7d692496c3 More little jamfile changes 2012-11-12 16:57:56 +00:00
Kenneth Heafield
d74b784ad2 And pcfg-common too... 2012-11-12 16:53:42 +00:00
Hieu Hoang
121e258e84 namespace all classes in mert directory 2012-06-30 21:39:10 +01:00
Hieu Hoang
4eef94b121 move c++ code out of /script/ to / 2012-05-31 17:24:06 +01:00