988 Commits

Author SHA1 Message Date
Matt Wells
609a344a57 fix counting bug in array parms 2014-02-11 22:28:04 -07:00
Matt Wells
9a76ff2531 minor parm updates 2014-02-11 20:50:36 -07:00
Matt Wells
c9be18615c more parm saving fixes 2014-02-10 22:04:22 -07:00
Matt Wells
2efbb602df fix saving parms bug 2014-02-10 21:52:29 -07:00
Matt Wells
953b7c558d parm updates 2014-02-10 21:45:03 -07:00
Matt Wells
c041d47a0c html formatting updates 2014-02-10 00:15:04 -07:00
Matt Wells
b309d84245 html updates 2014-02-09 23:19:43 -07:00
Matt Wells
9f0d2ad82e parm updates 2014-02-09 23:05:36 -07:00
Matt Wells
cdf2550136 more parm fixes 2014-02-09 22:51:16 -07:00
Matt Wells
c2c3fe993c parm fixes for basic pages 2014-02-09 22:25:08 -07:00
Matt Wells
d2b473e554 checkpoint 2014-02-09 19:09:44 -07:00
Matt Wells
91ea5384a6 formatting changes 2014-02-09 16:57:39 -07:00
Matt Wells
ecdd167d9b code checkpoint 2014-02-09 16:41:43 -07:00
Matt Wells
f420bd2769 checkpoint 2014-02-09 15:09:48 -07:00
Matt Wells
c9ef525338 code checkpoint 2014-02-09 12:55:45 -07:00
Matt Wells
6c9a44367f code checkpoint 2014-02-09 12:38:40 -07:00
Matt Wells
e60576c8eb another code checkpoint 2014-02-08 22:57:30 -07:00
Matt Wells
156b50240a code checkpoint 2014-02-08 16:24:33 -07:00
Matt Wells
e593b6e1de basic controls code checkpoint. 2014-02-08 15:10:06 -07:00
Matt Wells
dabd691626 basic admin controls page structure 2014-02-08 00:34:45 -07:00
Matt Wells
fc47c18aec new printadmintop functionality. 2014-02-07 23:08:04 -07:00
Matt Wells
258e3cba0d fix maxtocrawl limit thing 2014-02-04 09:25:27 -07:00
Matt Wells
17fff243f9 add connectips back. call them adminIps this time.
if your ip is on the list then you have admin
access. cookie tokens will come later/soon.
2014-02-03 20:47:48 -07:00
Matt Wells
5ea852dac3 fix core when thread fails to spawn. 2014-02-03 07:27:32 -07:00
Matt Wells
b46da4c192 prevent msg20/tagdb lookup socket jam up.
throttle back max outstanding msg20s (summary generations)
based on used udp sockets.
2014-02-03 07:09:29 -07:00
Matt Wells
56adb2ee8c nomenclature. url filters -> spider scheduler 2014-02-02 17:00:11 -07:00
Matt Wells
10235bb840 fix add url and cached page getting 2014-02-02 16:49:31 -07:00
Matt Wells
7bf8a2ac49 do not let glibc do malloc checks; we do those ourselves. 2014-02-02 13:41:59 -07:00
Matt Wells
4be68fdaa6 set safebuf::m_buf to null in destructor 2014-02-02 12:16:11 -07:00
Matt Wells
0df697e56a fix keep alive loop code to bail out if it
fails to bind to the socket, as well as on quick cores.
2014-02-02 12:11:18 -07:00
Matt Wells
f58a94a8cc fix diffbot url bug 2014-02-02 11:53:10 -07:00
Matt Wells
93021b2f13 Merge branch 'diffbot'
Conflicts:

	Collectiondb.cpp
	Spider.cpp
	Spider.h
2014-02-01 11:31:00 -07:00
Matt Wells
095c47f181 Merge branch 'diffbot'
Conflicts:

	Collectiondb.cpp
	Spider.cpp
	Spider.h
2014-02-01 11:28:31 -07:00
Matt Wells
4346fcee29 added recovery mode display in hosts table 2014-02-01 10:16:46 -08:00
Matt Wells
4d2eafe39b added some repair logic for 0001.dat files.
turn off spiderdb disk cache for now.
2014-02-01 10:14:25 -08:00
Matt Wells
10d0e9f52b Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot 2014-01-31 14:54:23 -08:00
Matt Wells
392d043bd8 undo canonical deduping.
added dump round stats when uploading
json files.
2014-01-31 14:53:49 -08:00
Matt Wells
6e9b4f8ca2 fix core 2014-01-30 22:03:12 -07:00
Matt Wells
e8a6d8f345 fix another core from freeing a crawl info reply
with the wrong byte size.
2014-01-30 20:16:41 -08:00
Matt Wells
09fd98c95b Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot 2014-01-30 19:57:07 -08:00
Matt Wells
7107f730d0 fix another core from deleting a coll
and deleting a spidercoll in progress.
2014-01-30 19:56:43 -08:00
Matt Wells
4a1ad74f79 test fix for keep alive infinite loop bug. 2014-01-30 14:16:16 -08:00
Matt Wells
83e291f12b fix infinite keep alive restart bug some more 2014-01-30 14:12:32 -08:00
Matt Wells
03aa7842d0 do not enter into an infinite keep alive restart loop. 2014-01-30 14:40:03 -07:00
Matt Wells
40f373c9e0 Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot 2014-01-30 13:11:48 -08:00
Matt Wells
95a47a776e image updates 2014-01-30 13:11:26 -08:00
Matt Wells
8bdb9d1a3e doc updates per john on how we dedup 2014-01-30 10:57:49 -08:00
Matt Wells
8876dae984 added and fixed support for <link href=xxx rel=canonical>.
treat those as simplified meta redirects.
updated spider dedup documentation in the developer.html file.
2014-01-30 10:37:59 -08:00
Matt Wells
6a45e42128 added ability to treat <link xyz.com rel=canonical> as meta redirects.
should help us dedup.
added a function to do looser deduping of spider pages, although it is currently
not enabled; we are still using the stricter one.
added documentation on how we dedup to developer.html for jon to
take a look at.
2014-01-30 10:04:09 -08:00
Matt Wells
6af9441818 change deduping logic to be first come, first
served, but site rank trumps. fixed bug from the
previous fix.
2014-01-29 16:14:42 -08:00