mwells
|
c2f98a81b6
|
fix floater bug from reading hashtable off disk.
force use of floaters if !useRobots and it is a diffbot crawl.
|
2014-09-26 15:30:42 -07:00 |
|
mwells
|
6a28250e94
|
get qa test working after nyt bug fix
|
2014-08-06 16:00:25 -07:00 |
|
mwells
|
947be58f10
|
Merge branch 'diffbot-testing' into testing
Conflicts:
HttpRequest.cpp
Msg13.cpp
XmlDoc.cpp
|
2014-08-05 17:19:53 -07:00 |
|
mwells
|
cc1ceaaac2
|
fix nyt.com cookie redir bug.
fixed bug when POSTing injection request with multipart/form-data.
|
2014-08-05 17:04:11 -07:00 |
|
mwells
|
05fcef9651
|
more vote infusion and squid proxy fixes.
|
2014-07-09 14:57:58 -07:00 |
|
mwells
|
ea90e7f755
|
more fixes for sectiondb markup code
|
2014-06-12 13:05:45 -07:00 |
|
mwells
|
7d452a766c
|
completed squid proxy simulation code
|
2014-06-09 12:42:05 -07:00 |
|
mwells
|
965d992f98
|
Merge branch 'diffbot-testing' into diffbot-matt
Conflicts:
Msg13.cpp
|
2014-06-06 15:14:41 -07:00 |
|
mwells
|
3f2dcda4e1
|
got new floater/proxy logic compiling.
|
2014-06-06 15:11:51 -07:00 |
|
Matt Wells
|
ce7294e9a9
|
more mem leak fixes for fake bulk job empty http replies
|
2014-06-05 20:09:12 -07:00 |
|
mwells
|
ee5af6b30e
|
more spider proxy fixes
|
2014-06-02 14:59:15 -07:00 |
|
mwells
|
ca450e6bbd
|
using msg55 when done downloading through a proxy to record stats for load balancing on host #0
|
2014-06-02 13:48:33 -07:00 |
|
mwells
|
b6e5424e32
|
do not download bulkjob urls in crawlbot.
just return a fake http reply.
however, do use crawl-delay throttling logic.
deduping is already turned off for bulk jobs so it should be ok.
|
2014-03-21 12:40:38 -07:00 |
|
Matt Wells
|
0f3374e3f3
|
measure crawl delay by default from start of each download now.
it is a parm in msg13request.
|
2013-11-26 14:07:28 -08:00 |
|
Matt Wells
|
e8065a0f0a
|
enforce crawl delay perfectly.
|
2013-11-22 18:26:34 -08:00 |
|
Matt Wells
|
f6e560c1f4
|
Initial file population.
|
2013-08-02 13:12:24 -07:00 |
|