Matt Wells
|
7df7fbe721
|
support CONNECT for the gb squid proxy
|
2014-10-02 12:36:43 -07:00 |
|
mwells
|
42b891219d
|
several fixes for floater proxy through squid proxy.
gb needs to act like squid for the rendering machines so
it can do crawl delay backoff and load balancing over the
floaters.
|
2014-10-02 02:08:38 -07:00 |
|
mwells
|
c2f98a81b6
|
fix floater bug from reading hashtable off disk.
force use of floaters if !useRobots and it is a diffbot crawl.
|
2014-09-26 15:30:42 -07:00 |
|
mwells
|
082b39e027
|
turn off images for qa tests.
fix loop stuff some more. seems to be slower
|
2014-09-10 14:13:39 -07:00 |
|
mwells
|
8f14207fc9
|
fix core dump in qa testing
|
2014-09-10 08:08:02 -07:00 |
|
mwells
|
caee238c46
|
fixes to make it easier to compile on Mac OS X.
|
2014-08-28 12:55:02 -07:00 |
|
mwells
|
d5ef8a36e7
|
fix crawldelay bug. we were ignoring it.
|
2014-08-27 17:19:13 -07:00 |
|
mwells
|
6a28250e94
|
get qa test working after nyt bug fix
|
2014-08-06 16:00:25 -07:00 |
|
mwells
|
947be58f10
|
Merge branch 'diffbot-testing' into testing
Conflicts:
HttpRequest.cpp
Msg13.cpp
XmlDoc.cpp
|
2014-08-05 17:19:53 -07:00 |
|
mwells
|
cc1ceaaac2
|
fix nyt.com cookie redir bug.
fixed bug when POSTing injection request with multipart/form-data.
|
2014-08-05 17:04:11 -07:00 |
|
mwells
|
e66e7e5d11
|
undid some log debug msg stuff
|
2014-07-12 17:02:45 -07:00 |
|
mwells
|
2f8207ccf7
|
qa fixes
|
2014-07-11 19:07:49 -07:00 |
|
mwells
|
5f26918910
|
lots of bug fixes. more qa fixes.
|
2014-07-11 08:00:30 -07:00 |
|
Matt Wells
|
0ecc7933d6
|
qa test for squid/sections
|
2014-07-10 16:28:24 -07:00 |
|
mwells
|
05fcef9651
|
more vote infusion and squid proxy fixes.
|
2014-07-09 14:57:58 -07:00 |
|
mwells
|
d4218e01d7
|
inject docs that come through our squid proxy
|
2014-07-09 12:25:23 -07:00 |
|
mwells
|
d7b67f21e7
|
return error if we get CONNECT requests. we don't
handle those because we can't cache them or inject
the sectiondb voting info into their tags because they
are encrypted from us.
|
2014-07-09 11:06:46 -07:00 |
|
mwells
|
d9ae010371
|
shard gbfacetstr:gbxpathsitehash123456 terms by termid for speed.
got them working again multicasting a msg 0x39 to the appropriate shard.
set special msg39request flag for better performance for those guys.
|
2014-07-07 12:32:27 -07:00 |
|
mwells
|
6434e5cc04
|
Merge branch 'testing' into diffbot-matt
Conflicts:
Errno.cpp
Errno.h
Parms.h
|
2014-07-07 09:49:59 -07:00 |
|
mwells
|
05065f7f8c
|
treat http status 999 as forbidden.
|
2014-07-07 09:46:24 -07:00 |
|
mwells
|
aeae6bb1a5
|
qa test updates
|
2014-07-06 15:04:21 -07:00 |
|
mwells
|
92799ef393
|
add support for tunnelling https fetch
through an http proxy using the CONNECT
method. needs more debugging.
|
2014-07-01 10:43:52 -06:00 |
|
mwells
|
9249564191
|
now floaters are working pretty well
|
2014-06-30 16:26:10 -06:00 |
|
mwells
|
df8b9bd01a
|
more fixes for section markup proxy
|
2014-06-12 15:28:03 -07:00 |
|
mwells
|
20c4ac4205
|
got it marking up html now with sectiondb stats.
seems to work ok.
|
2014-06-12 14:42:08 -07:00 |
|
mwells
|
ea90e7f755
|
more fixes for sectiondb markup code
|
2014-06-12 13:05:45 -07:00 |
|
mwells
|
e4ce9bc9ac
|
squidproxycache/floaters/sectiondbtagging all compiles.
need to do run-time debugging now.
|
2014-06-11 17:57:28 -07:00 |
|
mwells
|
6f70282ba2
|
almost got sectiondb integration compiling
|
2014-06-11 17:24:58 -07:00 |
|
mwells
|
29e90d1d55
|
squid proxy fixes
|
2014-06-09 16:10:24 -07:00 |
|
mwells
|
5bf3042633
|
fix squid proxy cache key generation
|
2014-06-09 14:37:13 -07:00 |
|
mwells
|
b71ea7f7c6
|
fixes for squid proxy simulator
|
2014-06-09 14:31:48 -07:00 |
|
mwells
|
7d452a766c
|
completed squid proxy simulation code
|
2014-06-09 12:42:05 -07:00 |
|
mwells
|
965d992f98
|
Merge branch 'diffbot-testing' into diffbot-matt
Conflicts:
Msg13.cpp
|
2014-06-06 15:14:41 -07:00 |
|
mwells
|
3f2dcda4e1
|
got new floater/proxy logic compiling.
|
2014-06-06 15:11:51 -07:00 |
|
Matt Wells
|
13243a411c
|
more fixes for fake http reply hack
|
2014-06-05 20:31:49 -07:00 |
|
Matt Wells
|
ce7294e9a9
|
more mem leak fixes for fake
bulk job empty http replies
|
2014-06-05 20:09:12 -07:00 |
|
mwells
|
2c750b2c22
|
Merge branch 'diffbot-testing' into diffbot-matt
|
2014-06-04 13:56:44 -07:00 |
|
mwells
|
d23032241d
|
fix mem leak when downloading images is turned on.
|
2014-06-03 13:26:56 -07:00 |
|
mwells
|
91c7115c73
|
nothing
|
2014-06-03 11:49:21 -07:00 |
|
mwells
|
6dcbc10e92
|
spider proxy updates.
|
2014-06-03 11:38:44 -07:00 |
|
mwells
|
29c1c83967
|
select the proxy later down the pipeline to allow
for cache hits, etc.
|
2014-06-02 15:33:25 -07:00 |
|
mwells
|
5377a7543c
|
more spider proxy bug fixes
|
2014-06-02 15:17:43 -07:00 |
|
mwells
|
ee5af6b30e
|
more spider proxy fixes
|
2014-06-02 14:59:15 -07:00 |
|
mwells
|
ca450e6bbd
|
using msg55 when done downloading through a proxy to record
stats for load balancing on host #0
|
2014-06-02 13:48:33 -07:00 |
|
mwells
|
a811462d5f
|
spider proxy stuff compiles now
|
2014-05-30 15:05:00 -07:00 |
|
Matt Wells
|
d6434191d1
|
nomenclature changes to reduce collisions.
name collection 'qatest123' for doing smoke tests,
not 'test'.
|
2014-03-31 15:02:17 -07:00 |
|
mwells
|
b6e5424e32
|
do not download bulkjob urls in crawlbot.
just return a fake http reply.
however, do use crawl-delay throttling
logic. deduping is already turned off for
bulk jobs so it should be ok.
|
2014-03-21 12:40:38 -07:00 |
|
Matt Wells
|
e351d2a6f1
|
get searching on token working
|
2014-03-06 17:01:41 -08:00 |
|
Matt Wells
|
8aef2ba8a0
|
take out potentially bad robots.txt
filter compression logic.
|
2014-01-28 18:26:16 -08:00 |
|
Matt Wells
|
321fc90ff6
|
fix some cores.
NOTE: emails disabled here... need to fix.
|
2014-01-24 12:07:28 -08:00 |
|