Matt
|
6fc83566e2
|
more fixes
|
2015-02-02 14:06:38 -08:00 |
|
Matt
|
c15bd53e52
|
added support for supplying basic proxy authorization
to spider proxies. username:password@1.2.3.4:80
|
2015-02-02 13:23:38 -08:00 |
|
mwells
|
87285ba3cd
|
use gbmemcpy not memcpy so we can get profiler working again
since memcpy can't be interrupted and backtrace() called.
|
2015-01-13 12:25:42 -07:00 |
|
Matt
|
c03ba31ec2
|
try to reduce log spam
|
2015-01-05 11:03:49 -08:00 |
|
Matt
|
6c5ca9162c
|
quick fix for internal ip bug
|
2014-12-16 13:39:09 -08:00 |
|
Matt
|
329f004e74
|
compiler updates
|
2014-12-10 12:09:04 -08:00 |
|
Matt Wells
|
8e315504a2
|
fix empty rdbcache bug of not enough buf mem.
|
2014-11-27 13:17:00 -08:00 |
|
Matt
|
4e8a42e024
|
text replacements for bad int32_t substitutions
|
2014-11-17 18:24:38 -08:00 |
|
Matt
|
931a1c4bc6
|
good checkpoint. quite a few fixes.
|
2014-11-17 18:13:36 -08:00 |
|
Matt
|
4a0554c76f
|
more 64bit fixes
|
2014-11-14 17:30:32 -08:00 |
|
Matt
|
4c19453ea9
|
working with -m32 for basic testing.
compiles for 64-bit.
|
2014-11-12 11:38:37 -08:00 |
|
Matt
|
96b8197ad3
|
now it compiles with -m32
|
2014-11-10 14:45:11 -08:00 |
|
Matt Wells
|
e7dd8f7956
|
replace long long with int64_t
|
2014-10-30 13:36:39 -06:00 |
|
Mike Tung
|
f14552e194
|
Remove mobile user-agents to prevent fetching mobile version of page.
|
2014-10-13 19:36:34 -07:00 |
|
Matt Wells
|
8bb3545b71
|
emergency fixes for out of sockets core and
get proxy request timing out causing spider to hang bug.
|
2014-10-09 07:20:04 -07:00 |
|
Matt Wells
|
b0974b81fe
|
make it 500 ms
|
2014-10-07 14:44:20 -07:00 |
|
Matt Wells
|
4bdd496db0
|
reduce delay per banned proxy from 2s to 1s
|
2014-10-07 14:43:36 -07:00 |
|
Matt Wells
|
65800b65cf
|
fix so diffbot doesn't timeout due
to large floater/proxy backoff crawl delay.
append &timeout=MAXCRAWLDELAY to diffbot api url.
|
2014-10-07 14:32:38 -07:00 |
|
Matt Wells
|
7df7fbe721
|
support the CONNECT for gb squid proxy
|
2014-10-02 12:36:43 -07:00 |
|
mwells
|
42b891219d
|
several fixes for floater proxy through squid proxy.
gb needs to act like squid for the rendering machines so
it can do crawl delay backoff and load balancing over the
floaters.
|
2014-10-02 02:08:38 -07:00 |
|
mwells
|
c2f98a81b6
|
fix floater bug from reading hashtable off disk.
force use floaters if ! useRobots and is diffbot crawl.
|
2014-09-26 15:30:42 -07:00 |
|
mwells
|
082b39e027
|
turn off images for qa tests.
fix loop stuff some more. seems to be slower
|
2014-09-10 14:13:39 -07:00 |
|
mwells
|
8f14207fc9
|
fix core dump in qa testing
|
2014-09-10 08:08:02 -07:00 |
|
mwells
|
caee238c46
|
fixes to make it easier to compile on mac os x.
|
2014-08-28 12:55:02 -07:00 |
|
mwells
|
d5ef8a36e7
|
fix crawldelay bug. we were ignoring it.
|
2014-08-27 17:19:13 -07:00 |
|
mwells
|
6a28250e94
|
get qa test working after nyt bug fix
|
2014-08-06 16:00:25 -07:00 |
|
mwells
|
947be58f10
|
Merge branch 'diffbot-testing' into testing
Conflicts:
HttpRequest.cpp
Msg13.cpp
XmlDoc.cpp
|
2014-08-05 17:19:53 -07:00 |
|
mwells
|
cc1ceaaac2
|
fix nyt.com cookie redir bug.
fixed bug when POSTing injection request with multipart/form-data.
|
2014-08-05 17:04:11 -07:00 |
|
mwells
|
e66e7e5d11
|
undid some log debug msg stuff
|
2014-07-12 17:02:45 -07:00 |
|
mwells
|
2f8207ccf7
|
qa fixes
|
2014-07-11 19:07:49 -07:00 |
|
mwells
|
5f26918910
|
lots of bug fixes. more qa fixes.
|
2014-07-11 08:00:30 -07:00 |
|
Matt Wells
|
0ecc7933d6
|
qa test for squid/sections
|
2014-07-10 16:28:24 -07:00 |
|
mwells
|
05fcef9651
|
more vote infusion and squid proxy fixes.
|
2014-07-09 14:57:58 -07:00 |
|
mwells
|
d4218e01d7
|
inject docs that come through our squid proxy
|
2014-07-09 12:25:23 -07:00 |
|
mwells
|
d7b67f21e7
|
return error if we get CONNECT requests. we don't
handle those because we can't cache them or inject
the sectiondb voting info into their tags because they
are encrypted from us.
|
2014-07-09 11:06:46 -07:00 |
|
mwells
|
d9ae010371
|
shard gbfacetstr:gbxpathsitehash123456 terms by termid for speed.
got them working again multicasting a msg 0x39 to the appropriate shard.
set special msg39request flag for better performance for those guys.
|
2014-07-07 12:32:27 -07:00 |
|
mwells
|
6434e5cc04
|
Merge branch 'testing' into diffbot-matt
Conflicts:
Errno.cpp
Errno.h
Parms.h
|
2014-07-07 09:49:59 -07:00 |
|
mwells
|
05065f7f8c
|
treat http status 999 as forbidden.
|
2014-07-07 09:46:24 -07:00 |
|
mwells
|
aeae6bb1a5
|
qa test updates
|
2014-07-06 15:04:21 -07:00 |
|
mwells
|
92799ef393
|
add support for tunnelling https fetch
through an http proxy using CONNECT
directive. needs more debugging.
|
2014-07-01 10:43:52 -06:00 |
|
mwells
|
9249564191
|
now floaters are working pretty well
|
2014-06-30 16:26:10 -06:00 |
|
mwells
|
df8b9bd01a
|
more fixes for section markup proxy
|
2014-06-12 15:28:03 -07:00 |
|
mwells
|
20c4ac4205
|
got it marking up html now with sectiondb stats.
seems to work ok.
|
2014-06-12 14:42:08 -07:00 |
|
mwells
|
ea90e7f755
|
more fixes for sectiondb markup code
|
2014-06-12 13:05:45 -07:00 |
|
mwells
|
e4ce9bc9ac
|
squidproxycache/floaters/sectiondbtagging all compiles.
need to do run-time debugging now.
|
2014-06-11 17:57:28 -07:00 |
|
mwells
|
6f70282ba2
|
almost got sectiondb integration compiling
|
2014-06-11 17:24:58 -07:00 |
|
mwells
|
29e90d1d55
|
squid proxy fixes
|
2014-06-09 16:10:24 -07:00 |
|
mwells
|
5bf3042633
|
fix squid proxy cache key generation
|
2014-06-09 14:37:13 -07:00 |
|
mwells
|
b71ea7f7c6
|
fixes for squid proxy simulator
|
2014-06-09 14:31:48 -07:00 |
|
mwells
|
7d452a766c
|
completed squid proxy simulation code
|
2014-06-09 12:42:05 -07:00 |
|