Jeroen Vermeulen
eca5824100
Remove trailing whitespace in C++ files.
2015-04-30 12:05:11 +07:00
Hieu Hoang
0c58e19491
uncomment lines that clang choked on. Now works
2015-04-02 22:39:44 +04:00
akimbal1
ad70c9a35d
resolve conflicts
2015-04-02 14:15:15 -04:00
akimbal1
b4e24a2fb8
compile with clang 3.3 x86_64, no warnings
2015-04-02 14:07:23 -04:00
Hieu Hoang
d71e516176
make it compile on osx/clang
2015-04-02 21:30:47 +04:00
akimbal1
8cea968067
handle asian stock tickers better
2015-04-01 18:49:51 -04:00
akimbal1
d4ef9ce106
make -a work more like the perl tokenizer
2015-04-01 18:26:19 -04:00
akimbal1
2e39e829bf
splitter and tokenizer tweaks, multithreading tokenizer
2015-04-01 15:49:32 -04:00
akimbal1
fd596b1972
splitter tweaks
2015-04-01 02:21:03 -04:00
akimbal1
3db8c87c7c
add -B option
2015-03-31 22:03:32 -04:00
akimbal1
9aa73eed4f
add splitter
2015-03-31 21:53:14 -04:00
akimbal1
1b9da3bb04
draft splitter
2015-03-19 01:02:18 -04:00
akimbal1
915c29b0dd
detokenization fixes and features
2015-02-15 17:19:47 -05:00
akimbal1
eff60db207
stop treating dash like hyphen
2015-02-15 00:23:29 -05:00
akimbal1
6352dc773c
closer match to perl tokenizer
2015-02-14 23:37:44 -05:00
akimbal1
362e6a9374
remove spurious endl
2015-02-02 15:57:04 -05:00
akimbal1
8ea1c9fd40
alignment for hieu
2015-02-02 12:55:21 -05:00
Hieu Hoang
884a0b1c90
forgot to add Parameters.cpp. Change c++11 to c++0x to support older compilers (on Ubuntu 12.04 etc).
2015-01-30 17:45:20 +00:00
Hieu Hoang
1dea58e945
separate parameters into its own class
2015-01-25 15:02:33 +00:00
Hieu Hoang
5d2b0224d6
Jamfile for tokenizer
2015-01-25 14:00:35 +00:00
akimbal1
d38dcd89bb
add glib-2.0 for better unicodification and faster implementation
2015-01-23 13:35:09 -05:00
Kenneth Heafield
e30065072e
C++ tokenizer based on RE2. Not by me.
...
Some differences from Moses tokenizer: fraction characters count as numbers, _ handling, URLs
Currently 3x slower than perl :'(. Looking to make it faster by composing regex substitutions.
TODO eliminate sprintf and fixed-size buffers.
2015-01-21 12:23:44 -05:00