Commit Graph

22 Commits

Author SHA1 Message Date
Jeroen Vermeulen
eca5824100 Remove trailing whitespace in C++ files. 2015-04-30 12:05:11 +07:00
Hieu Hoang
0c58e19491 uncomment out lines that clang choked on. Now works 2015-04-02 22:39:44 +04:00
akimbal1
ad70c9a35d resolve conflicts 2015-04-02 14:15:15 -04:00
akimbal1
b4e24a2fb8 compile with clang 3.3 x86_64, no warnings 2015-04-02 14:07:23 -04:00
Hieu Hoang
d71e516176 make it compile on osx/clang 2015-04-02 21:30:47 +04:00
akimbal1
8cea968067 handle asian stock tickers better 2015-04-01 18:49:51 -04:00
akimbal1
d4ef9ce106 make -a work more like the perl tokenizer 2015-04-01 18:26:19 -04:00
akimbal1
2e39e829bf splitter and tokenizer tweaks, multithreading tokenizer 2015-04-01 15:49:32 -04:00
akimbal1
fd596b1972 splitter tweaks 2015-04-01 02:21:03 -04:00
akimbal1
3db8c87c7c add -B option 2015-03-31 22:03:32 -04:00
akimbal1
9aa73eed4f add splitter 2015-03-31 21:53:14 -04:00
akimbal1
1b9da3bb04 draft splitter 2015-03-19 01:02:18 -04:00
akimbal1
915c29b0dd detokenization fixes and features 2015-02-15 17:19:47 -05:00
akimbal1
eff60db207 stop treating dash like hyphen 2015-02-15 00:23:29 -05:00
akimbal1
6352dc773c closer match to perl tokenizer 2015-02-14 23:37:44 -05:00
akimbal1
362e6a9374 remove spurious endl 2015-02-02 15:57:04 -05:00
akimbal1
8ea1c9fd40 alignment for hieu 2015-02-02 12:55:21 -05:00
Hieu Hoang
884a0b1c90 forgot to add Parameters.cpp. Change c++11 to c++0x to support older compilers (on Ubuntu 12.04 etc). 2015-01-30 17:45:20 +00:00
Hieu Hoang
1dea58e945 separate parameters into its own class 2015-01-25 15:02:33 +00:00
Hieu Hoang
5d2b0224d6 Jamfile for tokenizer 2015-01-25 14:00:35 +00:00
akimbal1
d38dcd89bb add glib-2.0 for better unicodification and faster implementation 2015-01-23 13:35:09 -05:00
Kenneth Heafield
e30065072e C++ tokenizer based on RE2. Not by me.
Some differences from Moses tokenizer:  fraction characters count as numbers, _ handling, URLs
Currently 3x slower than perl :'(.  Looking to make it faster by composing regex substitutions.
TODO eliminate sprintf and fixed-size buffers.
2015-01-21 12:23:44 -05:00