Commit Graph

63 Commits

Author SHA1 Message Date
Matt Wells
7df7fbe721 support the CONNECT for gb squid proxy 2014-10-02 12:36:43 -07:00
mwells
42b891219d several fixes for floater proxy through squid proxy.
gb needs to act like squid for the rendering machines so
it can do crawl delay backoff and load balancing over the
floaters.
2014-10-02 02:08:38 -07:00
mwells
c2f98a81b6 fix floater bug from reading hashtable off disk.
force use floaters if ! useRobots and is diffbot crawl.
2014-09-26 15:30:42 -07:00
mwells
082b39e027 turn off images for qa tests.
fix loop stuff some more. seems to be slower
2014-09-10 14:13:39 -07:00
mwells
8f14207fc9 fix core dump in qa testing 2014-09-10 08:08:02 -07:00
mwells
caee238c46 fixes to make easier to compile on mac os x. 2014-08-28 12:55:02 -07:00
mwells
d5ef8a36e7 fix crawldelay bug. we were ignoring it. 2014-08-27 17:19:13 -07:00
mwells
6a28250e94 get qa test working after nyt bug fix 2014-08-06 16:00:25 -07:00
mwells
947be58f10 Merge branch 'diffbot-testing' into testing
Conflicts:
	HttpRequest.cpp
	Msg13.cpp
	XmlDoc.cpp
2014-08-05 17:19:53 -07:00
mwells
cc1ceaaac2 fix nyt.com cookie redir bug.
fixed bug when POSTing injection request with multipart/form-data.
2014-08-05 17:04:11 -07:00
mwells
e66e7e5d11 undid some log debug msg stuff 2014-07-12 17:02:45 -07:00
mwells
2f8207ccf7 qa fixes 2014-07-11 19:07:49 -07:00
mwells
5f26918910 lots of bug fixes. more qa fixes. 2014-07-11 08:00:30 -07:00
Matt Wells
0ecc7933d6 qa test for squid/sections 2014-07-10 16:28:24 -07:00
mwells
05fcef9651 more vote infusion and squid proxy fixes. 2014-07-09 14:57:58 -07:00
mwells
d4218e01d7 inject docs that come through our squid proxy 2014-07-09 12:25:23 -07:00
mwells
d7b67f21e7 return error if we get CONNECT requests. we don't
handle those because we can't cache them or inject
the sectiondb voting info into their tags because they
are encrypted from us.
2014-07-09 11:06:46 -07:00
mwells
d9ae010371 shard gbfacetstr:gbxpathsitehash123456 terms by termid for speed.
got them working again multicasting a msg 0x39 to the appropriate shard.
set special msg39request flag for better performance for those guys.
2014-07-07 12:32:27 -07:00
mwells
6434e5cc04 Merge branch 'testing' into diffbot-matt
Conflicts:
	Errno.cpp
	Errno.h
	Parms.h
2014-07-07 09:49:59 -07:00
mwells
05065f7f8c treat http status 999 as forbidden. 2014-07-07 09:46:24 -07:00
mwells
aeae6bb1a5 qa test updates 2014-07-06 15:04:21 -07:00
mwells
92799ef393 add support for tunnelling https fetch
through an http proxy using CONNECT
directive. needs more debugging.
2014-07-01 10:43:52 -06:00
mwells
9249564191 now floaters are working pretty well 2014-06-30 16:26:10 -06:00
mwells
df8b9bd01a more fixes for section markup proxy 2014-06-12 15:28:03 -07:00
mwells
20c4ac4205 got it marking up html now with sectiondb stats.
seems to work ok.
2014-06-12 14:42:08 -07:00
mwells
ea90e7f755 more fixes for sectiondb markup code 2014-06-12 13:05:45 -07:00
mwells
e4ce9bc9ac squidproxycache/floaters/sectiondbtagging all compiles.
need to do run-time debugging now.
2014-06-11 17:57:28 -07:00
mwells
6f70282ba2 almost got sectiondb integration compiling 2014-06-11 17:24:58 -07:00
mwells
29e90d1d55 squid proxy fixes 2014-06-09 16:10:24 -07:00
mwells
5bf3042633 fix squid proxy cache key generation 2014-06-09 14:37:13 -07:00
mwells
b71ea7f7c6 fixes for squid proxy simulator 2014-06-09 14:31:48 -07:00
mwells
7d452a766c completed squid proxy simulation code 2014-06-09 12:42:05 -07:00
mwells
965d992f98 Merge branch 'diffbot-testing' into diffbot-matt
Conflicts:
	Msg13.cpp
2014-06-06 15:14:41 -07:00
mwells
3f2dcda4e1 got new floater/proxy logic compiling. 2014-06-06 15:11:51 -07:00
Matt Wells
13243a411c more fixes for fake http reply hack 2014-06-05 20:31:49 -07:00
Matt Wells
ce7294e9a9 more mem leak fixes for fake
bulk job empty http replies
2014-06-05 20:09:12 -07:00
mwells
2c750b2c22 Merge branch 'diffbot-testing' into diffbot-matt 2014-06-04 13:56:44 -07:00
mwells
d23032241d fix mem leak when downloading images is turned on. 2014-06-03 13:26:56 -07:00
mwells
91c7115c73 nothing 2014-06-03 11:49:21 -07:00
mwells
6dcbc10e92 spider proxy updates. 2014-06-03 11:38:44 -07:00
mwells
29c1c83967 select the proxy later down the pipeline to allow
for cache hits, etc.
2014-06-02 15:33:25 -07:00
mwells
5377a7543c more spider proxy bug fixes 2014-06-02 15:17:43 -07:00
mwells
ee5af6b30e more spider proxy fixes 2014-06-02 14:59:15 -07:00
mwells
ca450e6bbd using msg55 when done downloading through a proxy to record
stats for load balancing on host #0
2014-06-02 13:48:33 -07:00
mwells
a811462d5f spider proxy stuff compiles now 2014-05-30 15:05:00 -07:00
Matt Wells
d6434191d1 nomenclature changes to reduce collisions.
name collection 'qatest123' for doing smoke tests,
not 'test'.
2014-03-31 15:02:17 -07:00
mwells
b6e5424e32 do not download bulkjob urls in crawlbot.
just return a fake http reply.
however, do use crawl-delay throttling
logic. deduping is already turned off for
bulk jobs so it should be ok.
2014-03-21 12:40:38 -07:00
Matt Wells
e351d2a6f1 get searching on token working 2014-03-06 17:01:41 -08:00
Matt Wells
8aef2ba8a0 take out potentially bad robots.txt
filter compression logic.
2014-01-28 18:26:16 -08:00
Matt Wells
321fc90ff6 fix some cores.
NOTE: emails disabled here... need to fix.
2014-01-24 12:07:28 -08:00