Commit Graph

81 Commits

Author SHA1 Message Date
Matt
6fc83566e2 more fixes 2015-02-02 14:06:38 -08:00
Matt
c15bd53e52 added support for supplying basic proxy authorization
to spider proxies. username:password@1.2.3.4:80
2015-02-02 13:23:38 -08:00
mwells
87285ba3cd use gbmemcpy not memcpy so we can get profiler working again
since memcpy can't be interrupted and backtrace() called.
2015-01-13 12:25:42 -07:00
Matt
c03ba31ec2 try to reduce log spam 2015-01-05 11:03:49 -08:00
Matt
6c5ca9162c quick fix for internal ip bug 2014-12-16 13:39:09 -08:00
Matt
329f004e74 compiler updates 2014-12-10 12:09:04 -08:00
Matt Wells
8e315504a2 fix empty rdbcache bug of not enough buf mem. 2014-11-27 13:17:00 -08:00
Matt
4e8a42e024 text replacements for bad int32_t substitutions 2014-11-17 18:24:38 -08:00
Matt
931a1c4bc6 good checkpoint. quite a few fixes. 2014-11-17 18:13:36 -08:00
Matt
4a0554c76f more 64bit fixes 2014-11-14 17:30:32 -08:00
Matt
4c19453ea9 working with -m32 for basic testing.
compiles for 64-bit.
2014-11-12 11:38:37 -08:00
Matt
96b8197ad3 now it compiles with -m32 2014-11-10 14:45:11 -08:00
Matt Wells
e7dd8f7956 replace long long with int64_t 2014-10-30 13:36:39 -06:00
Mike Tung
f14552e194 Remove mobile user-agents to prevent fetching mobile version of page. 2014-10-13 19:36:34 -07:00
Matt Wells
8bb3545b71 emergency fixes for out of sockets core and
get proxy request timing out causing spider to hang bug.
2014-10-09 07:20:04 -07:00
Matt Wells
b0974b81fe make it 500 ms 2014-10-07 14:44:20 -07:00
Matt Wells
4bdd496db0 reduce delay per banned proxy from 2s to 1s 2014-10-07 14:43:36 -07:00
Matt Wells
65800b65cf fix so diffbot doesn't timeout due
to large floater/proxy backoff crawl delay.
append &timeout=MAXCRAWLDELAY to diffbot api url.
2014-10-07 14:32:38 -07:00
Matt Wells
7df7fbe721 support the CONNECT for gb squid proxy 2014-10-02 12:36:43 -07:00
mwells
42b891219d several fixes for floater proxy through squid proxy.
gb needs to act like squid for the rendering machines so
it can do crawl delay backoff and load balancing over the
floaters.
2014-10-02 02:08:38 -07:00
mwells
c2f98a81b6 fix floater bug from reading hashtable off disk.
force use floaters if ! useRobots and is diffbot crawl.
2014-09-26 15:30:42 -07:00
mwells
082b39e027 turn off images for qa tests.
fix loop stuff some more. seems to be slower
2014-09-10 14:13:39 -07:00
mwells
8f14207fc9 fix core dump in qa testing 2014-09-10 08:08:02 -07:00
mwells
caee238c46 fixes to make easier to compile on mac os x. 2014-08-28 12:55:02 -07:00
mwells
d5ef8a36e7 fix crawldelay bug. we were ignoring it. 2014-08-27 17:19:13 -07:00
mwells
6a28250e94 get qa test working after nyt bug fix 2014-08-06 16:00:25 -07:00
mwells
947be58f10 Merge branch 'diffbot-testing' into testing
Conflicts:
	HttpRequest.cpp
	Msg13.cpp
	XmlDoc.cpp
2014-08-05 17:19:53 -07:00
mwells
cc1ceaaac2 fix nyt.com cookie redir bug.
fixed bug when POSTing injection request with multipart/form-data.
2014-08-05 17:04:11 -07:00
mwells
e66e7e5d11 undid some log debug msg stuff 2014-07-12 17:02:45 -07:00
mwells
2f8207ccf7 qa fixes 2014-07-11 19:07:49 -07:00
mwells
5f26918910 lots of bug fixes. more qa fixes. 2014-07-11 08:00:30 -07:00
Matt Wells
0ecc7933d6 qa test for squid/sections 2014-07-10 16:28:24 -07:00
mwells
05fcef9651 more vote infusion and squid proxy fixes. 2014-07-09 14:57:58 -07:00
mwells
d4218e01d7 inject docs that come through our squid proxy 2014-07-09 12:25:23 -07:00
mwells
d7b67f21e7 return error if we get CONNECT requests. we don't
handle those because we can't cache them or inject
the sectiondb voting info into their tags because they
are encrypted from us.
2014-07-09 11:06:46 -07:00
mwells
d9ae010371 shard gbfacetstr:gbxpathsitehash123456 terms by termid for speed.
got them working again multicasting a msg 0x39 to the appropriate shard.
set special msg39request flag for better performance for those guys.
2014-07-07 12:32:27 -07:00
mwells
6434e5cc04 Merge branch 'testing' into diffbot-matt
Conflicts:
	Errno.cpp
	Errno.h
	Parms.h
2014-07-07 09:49:59 -07:00
mwells
05065f7f8c treat http status 999 as forbidden. 2014-07-07 09:46:24 -07:00
mwells
aeae6bb1a5 qa test updates 2014-07-06 15:04:21 -07:00
mwells
92799ef393 add support for tunnelling https fetch
through an http proxy using CONNECT
directive. needs more debugging.
2014-07-01 10:43:52 -06:00
mwells
9249564191 now floaters are working pretty well 2014-06-30 16:26:10 -06:00
mwells
df8b9bd01a more fixes for section markup proxy 2014-06-12 15:28:03 -07:00
mwells
20c4ac4205 got it marking up html now with sectiondb stats.
seems to work ok.
2014-06-12 14:42:08 -07:00
mwells
ea90e7f755 more fixes for sectiondb markup code 2014-06-12 13:05:45 -07:00
mwells
e4ce9bc9ac squidproxycache/floaters/sectiondbtagging all compiles.
need to do run-time debugging now.
2014-06-11 17:57:28 -07:00
mwells
6f70282ba2 almost got sectiondb integration compiling 2014-06-11 17:24:58 -07:00
mwells
29e90d1d55 squid proxy fixes 2014-06-09 16:10:24 -07:00
mwells
5bf3042633 fix squid proxy cache key generation 2014-06-09 14:37:13 -07:00
mwells
b71ea7f7c6 fixes for squid proxy simulator 2014-06-09 14:31:48 -07:00
mwells
7d452a766c completed squid proxy simulation code 2014-06-09 12:42:05 -07:00