Matt Wells
609a344a57
fix counting bug in array parms
2014-02-11 22:28:04 -07:00
Matt Wells
9a76ff2531
minor parm updates
2014-02-11 20:50:36 -07:00
Matt Wells
c9be18615c
more parm saving fixes
2014-02-10 22:04:22 -07:00
Matt Wells
2efbb602df
fix saving parms bug
2014-02-10 21:52:29 -07:00
Matt Wells
953b7c558d
parm updates
2014-02-10 21:45:03 -07:00
Matt Wells
c041d47a0c
html formatting updates
2014-02-10 00:15:04 -07:00
Matt Wells
b309d84245
html updates
2014-02-09 23:19:43 -07:00
Matt Wells
9f0d2ad82e
parm updates
2014-02-09 23:05:36 -07:00
Matt Wells
cdf2550136
more parm fixes
2014-02-09 22:51:16 -07:00
Matt Wells
c2c3fe993c
parm fixes for basic pages
2014-02-09 22:25:08 -07:00
Matt Wells
d2b473e554
checkpoint
2014-02-09 19:09:44 -07:00
Matt Wells
91ea5384a6
formatting changes
2014-02-09 16:57:39 -07:00
Matt Wells
ecdd167d9b
code checkpoint
2014-02-09 16:41:43 -07:00
Matt Wells
f420bd2769
checkpoint
2014-02-09 15:09:48 -07:00
Matt Wells
c9ef525338
code checkpoint
2014-02-09 12:55:45 -07:00
Matt Wells
6c9a44367f
code checkpoint
2014-02-09 12:38:40 -07:00
Matt Wells
e60576c8eb
another code checkpoint
2014-02-08 22:57:30 -07:00
Matt Wells
156b50240a
code checkpoint
2014-02-08 16:24:33 -07:00
Matt Wells
e593b6e1de
basic controls code checkpoint.
2014-02-08 15:10:06 -07:00
Matt Wells
dabd691626
basic admin controls page structure
2014-02-08 00:34:45 -07:00
Matt Wells
fc47c18aec
new printadmintop functionality.
2014-02-07 23:08:04 -07:00
Matt Wells
258e3cba0d
fix maxtocrawl limit thing
2014-02-04 09:25:27 -07:00
Matt Wells
17fff243f9
add connectips back. call them adminIps this time.
if your ip is on the list then you have admin
access. cookie tokens will come later/soon.
2014-02-03 20:47:48 -07:00
Matt Wells
5ea852dac3
fix core when thread fails to spawn.
2014-02-03 07:27:32 -07:00
Matt Wells
b46da4c192
prevent msg20/tagdb lookup socket jam up.
throttle back max outstanding msg20s (summary generations)
based on used udp sockets.
2014-02-03 07:09:29 -07:00
Matt Wells
56adb2ee8c
nomenclature. url filters -> spider scheduler
2014-02-02 17:00:11 -07:00
Matt Wells
10235bb840
fix add url and cached page getting
2014-02-02 16:49:31 -07:00
Matt Wells
7bf8a2ac49
do not let glibc do malloc checks, we do that.
2014-02-02 13:41:59 -07:00
Matt Wells
4be68fdaa6
set safebuf::m_buf to null in destructor
2014-02-02 12:16:11 -07:00
Matt Wells
0df697e56a
fix keep alive loop code to bail out if
fails to bind to socket as well as quick cores.
2014-02-02 12:11:18 -07:00
Matt Wells
f58a94a8cc
fix diffbot url bug
2014-02-02 11:53:10 -07:00
Matt Wells
93021b2f13
Merge branch 'diffbot'
Conflicts:
Collectiondb.cpp
Spider.cpp
Spider.h
2014-02-01 11:31:00 -07:00
Matt Wells
095c47f181
Merge branch 'diffbot'
Conflicts:
Collectiondb.cpp
Spider.cpp
Spider.h
2014-02-01 11:28:31 -07:00
Matt Wells
4346fcee29
added recovery mode display in hosts table
2014-02-01 10:16:46 -08:00
Matt Wells
4d2eafe39b
added some repair logic for 0001.dat files.
turn off spiderdb disk cache for now.
2014-02-01 10:14:25 -08:00
Matt Wells
10d0e9f52b
Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot
2014-01-31 14:54:23 -08:00
Matt Wells
392d043bd8
undo canonical deduping.
added dump round stats when uploading
json files.
2014-01-31 14:53:49 -08:00
Matt Wells
6e9b4f8ca2
fix core
2014-01-30 22:03:12 -07:00
Matt Wells
e8a6d8f345
fix another core from freeing wrong byte sized
crawl info reply.
2014-01-30 20:16:41 -08:00
Matt Wells
09fd98c95b
Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot
2014-01-30 19:57:07 -08:00
Matt Wells
7107f730d0
fix another core from deleting a coll
and deleting a spidercoll in progress.
2014-01-30 19:56:43 -08:00
Matt Wells
4a1ad74f79
test fix for keep alive infinite loop bug.
2014-01-30 14:16:16 -08:00
Matt Wells
83e291f12b
fix infinite keep alive restart bug some more
2014-01-30 14:12:32 -08:00
Matt Wells
03aa7842d0
do not enter into an infinite keep alive restart loop.
2014-01-30 14:40:03 -07:00
Matt Wells
40f373c9e0
Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot
2014-01-30 13:11:48 -08:00
Matt Wells
95a47a776e
image updates
2014-01-30 13:11:26 -08:00
Matt Wells
8bdb9d1a3e
doc updates per john on how we dedup
2014-01-30 10:57:49 -08:00
Matt Wells
8876dae984
added and fixed support for <link href=xxx rel=canonical>.
treat those as simplified meta redirects.
updated spider dedup documentation in developer.html file.
2014-01-30 10:37:59 -08:00
Matt Wells
6a45e42128
added ability to treat <link xyz.com rel=canonical> as meta redirects.
should help us dedup.
added a function to do looser deduping of spider pages. it is currently
not enabled; we are still using the stricter one.
added documentation on how we dedup to developer.html for jon to
take a look at.
2014-01-30 10:04:09 -08:00
Matt Wells
6af9441818
change deduping logic to be first come first
served, but site rank trumps. fixed bug from
fix before.
2014-01-29 16:14:42 -08:00