mosesdecoder/phrase-extract/tables-core.h
Jeroen Vermeulen b2d821a141 Unify tokenize() into util, and unit-test it.
The duplicate definition works fine in environments where the inline
definition becomes a weak symbol in the object file, but if it gets
generated as a regular definition, the duplicate definition causes link
problems.

In most call sites the return value could easily be made const, which
gives both the reader and the compiler a bit more certainty about the code's
intentions.  In theory this may help performance, but it's mainly for clarity.

The comments are based on reverse-engineering, and the unit tests are based
on the comments.  It's possible that some of what's in there is not essential,
in which case, don't feel bad about changing it!

I left a third identical definition in place, though I updated it with my
changes to avoid creeping divergence, and noted the duplication in a comment.
It would be nice to get rid of this definition as well, but it'd introduce
headers from the main Moses tree into biconcor, which may be against policy.
2015-04-22 09:59:05 +07:00


// $Id$
#ifndef _TABLES_H
#define _TABLES_H
#include <iostream>
#include <fstream>
#include <cassert>
#include <cstdlib>
#include <string>
#include <queue>
#include <map>
#include <utility>
#include <vector>
#include <cmath>
namespace MosesTraining
{

typedef std::string WORD;
typedef unsigned int WORD_ID;

// Two-way mapping between words and integer IDs.
class Vocabulary
{
public:
  std::map<WORD, WORD_ID> lookup;  // word -> ID
  std::vector< WORD > vocab;       // ID -> word
  WORD_ID storeIfNew( const WORD& );
  WORD_ID getWordID( const WORD& );
  // No bounds check: id must be a valid index into vocab.
  inline WORD &getWord( const WORD_ID id ) {
    return vocab[ id ];
  }
};

typedef std::vector< WORD_ID > PHRASE;
typedef unsigned int PHRASE_ID;

// Two-way mapping between phrases (sequences of word IDs) and integer IDs.
class PhraseTable
{
public:
  std::map< PHRASE, PHRASE_ID > lookup;  // phrase -> ID
  std::vector< PHRASE > phraseTable;     // ID -> phrase
  PHRASE_ID storeIfNew( const PHRASE& );
  PHRASE_ID getPhraseID( const PHRASE& );
  void clear();
  // No bounds check: id must be a valid index into phraseTable.
  inline PHRASE &getPhrase( const PHRASE_ID id ) {
    return phraseTable[ id ];
  }
};

typedef std::vector< std::pair< PHRASE_ID, double > > PHRASEPROBVEC;

class TTable
{
public:
  std::map< PHRASE_ID, std::vector< std::pair< PHRASE_ID, double > > > ttable;
  std::map< PHRASE_ID, std::vector< std::pair< PHRASE_ID, std::vector< double > > > > ttableMulti;
};

class DTable
{
public:
  std::map< int, double > dtable;
  void init();
  void load( const std::string& );
  double get( int );
};

} // namespace MosesTraining
#endif
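
The header only declares `storeIfNew` and `getWordID`; their definitions live in a source file that is not shown here. As a rough illustration of the map-plus-vector interning pattern that `Vocabulary` and `PhraseTable` appear to follow, here is a minimal, self-contained sketch. `SketchVocabulary` and all of its members are hypothetical names invented for this example, and the behaviour shown (IDs assigned densely from 0 in insertion order, bounds-checked reverse lookup) is an assumption, not the Moses implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for MosesTraining::Vocabulary, for illustration only.
class SketchVocabulary
{
public:
  typedef std::string Word;
  typedef unsigned int WordId;

  // Return the ID already assigned to `word`, or assign the next free ID.
  // Assumption: IDs are dense and start at 0, in insertion order.
  WordId storeIfNew(const Word& word) {
    std::map<Word, WordId>::const_iterator it = lookup_.find(word);
    if (it != lookup_.end()) {
      return it->second;
    }
    const WordId id = static_cast<WordId>(vocab_.size());
    vocab_.push_back(word);
    lookup_[word] = id;
    return id;
  }

  // Reverse lookup. Uses at(), so an invalid ID throws std::out_of_range;
  // the header's getWord() uses unchecked operator[] instead.
  const Word& getWord(WordId id) const {
    return vocab_.at(id);
  }

  // Number of distinct words stored so far.
  std::size_t size() const {
    return vocab_.size();
  }

private:
  std::map<Word, WordId> lookup_;  // word -> ID
  std::vector<Word> vocab_;        // ID -> word
};
```

Storing the same word twice yields the same ID, so callers can compare integer IDs instead of strings. The sketch keeps its containers private and returns a const reference, which is one way to apply the commit message's point about const return values; the real classes expose their members publicly.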