Mirror of https://github.com/moses-smt/mosesdecoder.git (synced 2024-12-29 06:52:34 +03:00)
1dce55f413
Some differences from the Moses tokenizer: fraction characters count as numbers, _ handling, and URLs. Currently 3x slower than Perl :'(. Looking to make it faster by composing regex substitutions. TODO: eliminate sprintf and fixed-size buffers.

| File |
|---|
| Jamfile |
| tokenizer_main.cpp |
| tokenizer.cpp |
| tokenizer.h |
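
The TODO in the commit note about removing sprintf and fixed-size buffers might look roughly like the sketch below. It is only an illustration: `numeric_entity` is a hypothetical helper, not a function from tokenizer.cpp, and the assumption is that the existing code formats short strings such as character references into a `char` array. Appending to a `std::string` avoids both the format call and the buffer-size guess.

```cpp
#include <string>

// Hypothetical helper for illustration; not taken from tokenizer.cpp.
// Builds a decimal character reference such as "&#124;" by appending to a
// std::string, so no sprintf call or fixed-size char buffer is involved.
std::string numeric_entity(unsigned long code_point) {
    std::string out = "&#";
    out += std::to_string(code_point);  // the string grows as needed
    out += ';';
    return out;
}
```

Calling `numeric_entity(124)` returns `"&#124;"`, the character reference for `|`. The returned string owns its storage, so callers do not need to size a buffer in advance.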