19 Commits

Author SHA1 Message Date
Phil Williams
c1142741a1 relax-parse: write node attributes to output 2015-07-17 14:11:56 +01:00
Phil Williams
8653bd8159 Ongoing moses/phrase-extract refactoring 2015-06-03 14:20:00 +01:00
Phil Williams
ed321791a7 Ongoing moses/phrase-extract refactoring 2015-06-03 11:10:45 +01:00
Phil Williams
2f735998ca Rename MosesTraining::SyntaxTree to MosesTraining::SyntaxNodeCollection
This is the first step in a small-scale refactoring effort that will touch a
lot of the syntax-related code in moses/phrase-extract.  The end goals are:

  - a storage mechanism for general attribute/value pairs in XML-style
    tree / lattice input.  E.g. the "pcfg-score" and "semantic-role"
    attributes in:

     <tree label="PRP" pcfg-score="1.0" semantic-role="AGENT"> I </tree>

  - consolidation of the various near-duplicate Tree / XmlTreeParser classes
    that have accumulated over the years (my fault)

  - general de-crufting
2015-05-29 18:46:02 +01:00
Jeroen Vermeulen
32722ab5b1 Support tokenize(const std::string &) as well.
Convenience wrapper: the actual function takes a const char[], but many of
the call sites want to pass a string and have to call its c_str() first.
2015-04-22 10:35:18 +07:00
Jeroen Vermeulen
b2d821a141 Unify tokenize() into util, and unit-test it.
The duplicate definition works fine in environments where the inline
definition becomes a weak symbol in the object file, but if it gets
generated as a regular definition, the duplicate definition causes link
problems.

In most call sites the return value could easily be made const, which
gives both the reader and the compiler a bit more certainty about the code's
intentions.  In theory this may help performance, but it's mainly for clarity.

The comments are based on reverse-engineering, and the unit tests are based
on the comments.  It's possible that some of what's in there is not essential,
in which case, don't feel bad about changing it!

I left a third identical definition in place, though I updated it with my
changes to avoid creeping divergence, and noted the duplication in a comment.
It would be nice to get rid of this definition as well, but it'd introduce
headers from the main Moses tree into biconcor, which may be against policy.
2015-04-22 09:59:05 +07:00
Hieu Hoang
6d61db28fa use astyle 2.01. It's on Edinburgh server and doesn't screw up enum 2015-01-14 19:21:11 +00:00
Nicola Bertoldi
e4eb201c52 merged master into dynamic-models and solved conflicts 2014-12-13 12:52:47 +01:00
Phil Williams
60e56efc6b phrase-extract: add syntax-common sub-library
And remove some (near-)duplicate code from pcfg-common and score-stsg.
2014-12-07 14:27:51 +00:00
Matthias Huck
19a5ef4a1a relax-parse: use cin.peek()
Hope this eliminates some weird behavior
2014-07-17 20:19:28 +01:00
Hieu Hoang
29d83d94b1 delete any mention of SAFE_GETLINE so it doesn't reappear 2014-06-08 17:18:07 +01:00
Hieu Hoang
cb94a3181b use standard c++ getline instead of old Moses SAFE_GETLINE 2014-06-08 16:23:14 +01:00
Nicola Bertoldi
614d7a0376 beautify 2013-08-11 23:43:26 +02:00
Hieu Hoang
310b26f989 beautify 2013-07-08 20:52:14 +01:00
Hieu Hoang
3eba5782c2 beautify 2013-07-08 20:25:47 +01:00
Hieu Hoang
dc33fa3d3d redo parsing of feature function parameters 2013-06-20 12:50:41 +01:00
Hieu Hoang
abe6bb7c22 refactor parsing of feature function args 2013-06-10 18:11:55 +01:00
Phil Williams
139148bc8f extract-ghkm and friends: don't unescape special characters
Don't unescape special characters when reading XML parse trees in
extract-ghkm, extract-rules, and relax-parse.
2012-12-17 20:08:02 +00:00
Kenneth Heafield
62d37fa2b6 Refactor phrase-extract/Jamfile 2012-11-12 14:17:48 +00:00