*`VWFeatureSourceExternalFeatures column=0`: When used with `-inputtype 5` (`TabbedSentence`), this supplies additional features to VW. The input is a tab-separated file: the first column is the usual input sentence, and all other columns can hold meta-data. The `column` parameter is zero-based and counts from the first column after the input sentence, so `column=0` selects the second field of each line.
*`VWFeatureSourceIndicator`: Adds a feature for the whole source phrase.
*`VWFeatureSourcePhraseInternal`: Adds a separate feature for every word of the source phrase.
*`VWFeatureSourceWindow size=3`: Adds source words in a window of size 3 before and after the source phrase as features. These do not overlap with `VWFeatureSourcePhraseInternal`.
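The column indexing and the window extraction described in the list above can be illustrated with a small Python sketch. This is not the Moses implementation: the function names `external_column` and `window_words`, the example data, and the (offset, word) representation are assumptions made for illustration only, and the actual feature strings written by Moses may look different.

```python
def external_column(line, column):
    """Return the meta-data field selected by `column` from a tab-separated line.

    Field 0 of the line is the input sentence, so column=0 selects field 1.
    (Hypothetical helper, for illustration only.)
    """
    fields = line.rstrip("\n").split("\t")
    return fields[1 + column]


def window_words(sentence, start, end, size):
    """Return (offset, word) pairs for up to `size` words on each side of the
    source phrase sentence[start:end]. Words inside the phrase are excluded,
    which is why the window does not overlap with phrase-internal features.
    (Hypothetical helper, for illustration only.)
    """
    before = [(i - start, sentence[i])
              for i in range(max(0, start - size), start)]
    after = [(i - end + 1, sentence[i])
             for i in range(end, min(len(sentence), end + size))]
    return before + after


# Made-up tab-separated input: sentence, then two meta-data columns.
line = "das ist ein Haus\tnews\tdoc42\n"
print(external_column(line, 0))  # news

# Source phrase "fox jumps" spans positions 3..4 of this sentence.
sentence = "the quick brown fox jumps over the lazy dog".split()
print(window_words(sentence, 3, 5, 3))
```

At a sentence boundary the window is simply truncated, so a phrase at the start of the sentence only yields context words to its right.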
To train a classifier, run `vwtrainer` (a limited version of the `moses` binary). Configure your features in the `moses.ini` file (see above) and set the `train` flag:
The `path` variable points to the file (prefix) where features will be written. Currently, each thread writes to a separate file (this may change): `features.txt.1`, `features.txt.2`, etc.
`vwtrainer` creates the translation option collection for each input sentence but does not run decoding. Therefore, you probably want to disable expensive feature functions such as the language model (LM score is not used by VW features at the moment).
Currently, classification is implemented using VW's `csoaa_ldf` scheme with quadratic features, which take the product of the source namespace (`s`, containing label-independent features) and the target namespace (`t`, containing label-dependent features).
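To make the namespace product concrete, the following Python sketch pairs every feature in `s` with every feature in `t` and formats one training example in the multi-line style that VW accepts for label-dependent features. It is a sketch under assumptions: the feature names are invented, the `^` separator is only a display notation, and placing the label-independent features on a `shared` line is an assumption about how the feature files are laid out, so check the files written by `vwtrainer` for the exact format.

```python
from itertools import product


def quadratic_features(source_feats, target_feats):
    """Pair every source-namespace feature with every target-namespace feature."""
    return [f"{s}^{t}" for s, t in product(source_feats, target_feats)]


def ldf_example(source_feats, candidates):
    """Format one multi-line example: a shared line for namespace `s`, then
    one `label:cost |t ...` line per candidate translation, ending with a
    blank line. `candidates` is a list of (cost, target_features) pairs.
    """
    lines = ["shared |s " + " ".join(source_feats)]
    for label, (cost, feats) in enumerate(candidates, start=1):
        lines.append(f"{label}:{cost} |t " + " ".join(feats))
    return "\n".join(lines) + "\n\n"


# Invented feature names, for illustration only.
source = ["sind=das_Haus", "sw-1=ist"]
candidates = [(0.0, ["tind=the_house"]), (1.0, ["tind=the_home"])]

print(ldf_example(source, candidates), end="")
print(quadratic_features(source, candidates[0][1]))
```

Because the source features are label-independent, they only help distinguish candidates once they are combined with target features, which is what the pairing provides.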
To train VW in this setting, use the command:
cat features.txt.* | vw --hash all --noconstant -b 26 -q st --csoaa_ldf mc -f classifier1.vw