16 Commits

Author SHA1 Message Date
mwells
c2f98a81b6 fix floater bug from reading hashtable off disk.
force use floaters if ! useRobots and is diffbot crawl.
2014-09-26 15:30:42 -07:00
mwells
6a28250e94 get qa test working after nyt bug fix 2014-08-06 16:00:25 -07:00
mwells
947be58f10 Merge branch 'diffbot-testing' into testing
Conflicts:
	HttpRequest.cpp
	Msg13.cpp
	XmlDoc.cpp
2014-08-05 17:19:53 -07:00
mwells
cc1ceaaac2 fix nyt.com cookie redir bug.
fixed bug when POSTing injection request with multipart/form-data.
2014-08-05 17:04:11 -07:00
mwells
05fcef9651 more vote infusion and squid proxy fixes. 2014-07-09 14:57:58 -07:00
mwells
ea90e7f755 more fixes for sectiondb markup code 2014-06-12 13:05:45 -07:00
mwells
7d452a766c completed squid proxy simulation code 2014-06-09 12:42:05 -07:00
mwells
965d992f98 Merge branch 'diffbot-testing' into diffbot-matt
Conflicts:
	Msg13.cpp
2014-06-06 15:14:41 -07:00
mwells
3f2dcda4e1 got new floater/proxy logic compiling. 2014-06-06 15:11:51 -07:00
Matt Wells
ce7294e9a9 more mem leak fixes for fake
bulk job empty http replies
2014-06-05 20:09:12 -07:00
mwells
ee5af6b30e more spider proxy fixes 2014-06-02 14:59:15 -07:00
mwells
ca450e6bbd using msg55 when done downloading through a proxy to record
stats for load balancing on host #0
2014-06-02 13:48:33 -07:00
mwells
b6e5424e32 do not download bulkjob urls in crawlbot.
just return a fake http reply.
however, do use crawl-delay throttling
logic. deduping is already turned off for
bulk jobs so it should be ok.
2014-03-21 12:40:38 -07:00
Matt Wells
0f3374e3f3 measure crawl delay by default from
start of each download now. it is
a parm in msg13request.
2013-11-26 14:07:28 -08:00
Matt Wells
e8065a0f0a enforce crawl delay perfectly. 2013-11-22 18:26:34 -08:00
Matt Wells
f6e560c1f4 Initial file population. 2013-08-02 13:12:24 -07:00