Final commit in this round of refactoring (which started with commit
2f735998...). The main changes are:
- a general storage mechanism for attribute/value pairs in XML-style
  tree / lattice input. E.g. the "pcfg-score" and "semantic-role"
  attributes in:
    <tree label="PRP" pcfg-score="1.0" semantic-role="AGENT"> I </tree>
- consolidation of the various near-duplicate Tree / XmlTreeParser classes
  that have accumulated over the years (my fault)
- miscellaneous de-crufting
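As a rough illustration of the attribute/value storage idea, here is a minimal sketch. The names TreeNode, GetAttribute and MakeExampleNode are hypothetical and are not the classes touched by this commit. The assumption is that everything other than the label is kept as string pairs in a map.

```cpp
#include <map>
#include <string>

// Hypothetical node type for illustration only (children omitted).
struct TreeNode {
  std::string label;
  // Every attribute other than "label", stored as name -> value strings.
  std::map<std::string, std::string> attributes;
};

// Returns the value stored under `name`, or `fallback` if it is absent.
inline std::string GetAttribute(const TreeNode &node,
                                const std::string &name,
                                const std::string &fallback = "") {
  std::map<std::string, std::string>::const_iterator it =
      node.attributes.find(name);
  return it == node.attributes.end() ? fallback : it->second;
}

// Builds the node from the example element in the message above.
inline TreeNode MakeExampleNode() {
  TreeNode node;
  node.label = "PRP";
  node.attributes["pcfg-score"] = "1.0";
  node.attributes["semantic-role"] = "AGENT";
  return node;
}
```

Keeping values as strings leaves interpretation (e.g. parsing "1.0" as a score) to whichever component consumes the attribute.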
This is the first step in a small-scale refactoring effort that will touch a
lot of the syntax-related code in moses/phrase-extract. The end goals are:
- a storage mechanism for general attribute/value pairs in XML-style
  tree / lattice input. E.g. the "pcfg-score" and "semantic-role"
  attributes in:
    <tree label="PRP" pcfg-score="1.0" semantic-role="AGENT"> I </tree>
- consolidation of the various near-duplicate Tree / XmlTreeParser classes
  that have accumulated over the years (my fault)
- general de-crufting
This is a common convention: when a program gets a dash as the path of a
file that it should write, it writes to standard output instead.
Enhances portability to systems that don't have /dev/stdout.
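A minimal sketch of the dash convention, assuming a hypothetical OutputTarget helper (not the code changed here): the file is opened only when the path is something other than "-", and callers write through a plain std::ostream either way.

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Hypothetical helper: owns a file stream only when the path is not "-".
class OutputTarget {
 public:
  explicit OutputTarget(const std::string &path)
      : m_useStdout(path == "-") {
    if (!m_useStdout) {
      m_file.open(path.c_str());
    }
  }

  bool IsStdout() const { return m_useStdout; }

  // Callers write to this stream without caring where it ends up.
  std::ostream &Stream() {
    if (m_useStdout) {
      return std::cout;
    }
    return m_file;
  }

 private:
  bool m_useStdout;
  std::ofstream m_file;
};
```

With this shape, OutputTarget("-").Stream() writes to standard output without ever touching /dev/stdout.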
This is lint reported by the new lint-checking functionality in beautify.py.
(We can change to a different lint checker if we have a better one, but it
would probably still flag these same problems.)
Lint checking can help a lot, but only if we get the lint under control.
This only affects configurations where inline functions become regular,
non-weak symbols, leading to link conflicts. The extra definition was not
used anywhere.
The removed definition was probably less efficient. However, the only
functional difference was that it returned false for the empty nonterminal,
i.e. "[]".
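To make the behavioural difference concrete, here is a sketch with two hypothetical stand-ins. The function names and bodies are assumptions for illustration, not the actual definitions; they differ only in how they treat "[]".

```cpp
#include <string>

// Hypothetical stand-in for the retained definition: any token wrapped in
// square brackets counts, including the empty nonterminal "[]".
inline bool IsNonTerminalKept(const std::string &word) {
  return word.size() >= 2 && word[0] == '[' &&
         word[word.size() - 1] == ']';
}

// Hypothetical stand-in for the removed definition: it additionally
// requires at least one character between the brackets.
inline bool IsNonTerminalRemoved(const std::string &word) {
  return word.size() > 2 && word[0] == '[' &&
         word[word.size() - 1] == ']';
}
```

Both agree on inputs such as "[NP]" and "the"; only "[]" is classified differently.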
The duplicate definition works fine in environments where the inline
definition becomes a weak symbol in the object file, but if it gets
generated as a regular definition, the duplicate definition causes link
problems.
In most call sites the return value could easily be made const, which
gives both the reader and the compiler a bit more certainty about the code's
intentions. In theory this may help performance, but it's mainly for clarity.
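One reading of this change, sketched with hypothetical helpers (SplitOnWhitespace and CountTokens are not from the codebase): the call site binds the returned value to a const local because it only reads from it.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper that returns its result by value.
inline std::vector<std::string> SplitOnWhitespace(const std::string &line) {
  std::vector<std::string> tokens;
  std::istringstream in(line);
  std::string token;
  while (in >> token) {
    tokens.push_back(token);
  }
  return tokens;
}

inline std::size_t CountTokens(const std::string &line) {
  // const: this call site only reads the result, and now says so.
  const std::vector<std::string> tokens = SplitOnWhitespace(line);
  return tokens.size();
}
```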
The comments are based on reverse-engineering, and the unit tests are based
on the comments. It's possible that some of what's in there is not essential,
in which case, don't feel bad about changing it!
I left a third identical definition in place, though I updated it with my
changes to avoid creeping divergence, and noted the duplication in a comment.
It would be nice to get rid of this definition as well, but it'd introduce
headers from the main Moses tree into biconcor, which may be against policy.
Mostly signed/unsigned comparisons and reordered member
initializations; also a few unused variables.
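For reference, a small sketch of the warning-free form of the two most common cases. The Scores class is hypothetical: its initializers follow member declaration order, and the loop index is unsigned to match the size it is compared against.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical class for illustration.
class Scores {
 public:
  // Initializers are listed in declaration order (m_values, then m_count),
  // which is the order in which members are actually initialized.
  explicit Scores(const std::vector<int> &values)
      : m_values(values), m_count(values.size()) {}

  int Sum() const {
    int total = 0;
    // A std::size_t index avoids comparing a signed int with an unsigned size.
    for (std::size_t i = 0; i < m_count; ++i) {
      total += m_values[i];
    }
    return total;
  }

 private:
  std::vector<int> m_values;
  std::size_t m_count;
};
```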
There are more, but if I chip away at them for a while, who knows, it
may catch on and warnings may eventually become socially stigmatizing.
:)
This is one of those little chores in managing a long-lived C++
project: standard C headers like stdio.h and math.h now have their own
place in the C++ standard as cstdio and cmath respectively, and so on.
In this branch the #include names are updated for the phrase-extract/
subdirectory; more branches to follow.
C++11 adds cstdint, but to support compilation with the previous
standard, that change is left for later.
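A short sketch of what the updated includes look like in use. The helper functions are hypothetical; the point is only the header names and the std:: qualification that goes with them.

```cpp
#include <cmath>    // instead of <math.h>
#include <cstdio>   // instead of <stdio.h>
#include <cstring>  // instead of <string.h>

// Hypothetical helpers for illustration.
inline double Hypotenuse(double a, double b) {
  return std::sqrt(a * a + b * b);
}

inline std::size_t LabelLength(const char *label) {
  return std::strlen(label);
}

// Writes the label to the given C stream; returns a negative value on error.
inline int PrintLabel(std::FILE *out, const char *label) {
  return std::fputs(label, out);
}
```

The c-prefixed headers declare these names in namespace std, so qualifying the calls keeps the code independent of whether the global-namespace versions are also visible.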
This performs some minor transformations to Egret forests: escaping of
Moses special characters; removal of "^g" suffixes from constituent labels;
and marking of slash/hyphen split points (using @ characters).
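A sketch of the first two transformations, with hypothetical function names. The escape table is an assumption based on the usual Moses escaping of special characters, not taken from this commit. The split-point marking is left out because the exact output format is not described here.

```cpp
#include <string>

// Escapes characters that Moses treats specially. The mapping is assumed
// to follow the usual Moses escaping conventions.
inline std::string EscapeMosesSpecialChars(const std::string &in) {
  std::string out;
  for (std::string::size_type i = 0; i < in.size(); ++i) {
    switch (in[i]) {
      case '&':  out += "&amp;";  break;
      case '|':  out += "&#124;"; break;
      case '<':  out += "&lt;";   break;
      case '>':  out += "&gt;";   break;
      case '\'': out += "&apos;"; break;
      case '"':  out += "&quot;"; break;
      case '[':  out += "&#91;";  break;
      case ']':  out += "&#93;";  break;
      default:   out += in[i];
    }
  }
  return out;
}

// Removes a trailing "^g" from a constituent label, if present.
inline std::string StripGSuffix(const std::string &label) {
  const std::string suffix = "^g";
  if (label.size() >= suffix.size() &&
      label.compare(label.size() - suffix.size(), suffix.size(),
                    suffix) == 0) {
    return label.substr(0, label.size() - suffix.size());
  }
  return label;
}
```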