Matt Wells
a81f2145bd
fix sendmail ip to 127.0.0.1
2014-05-16 08:08:20 -07:00
Matt Wells
7ca1e8e790
gb.conf update for new parm
2014-05-12 10:53:22 -07:00
Matt Wells
3b97682cc3
more bool query fixes
2014-03-18 10:44:56 -07:00
Matt Wells
6e23d37e47
Merge branch 'diffbot' into diffbot-testing
2014-03-17 17:27:28 -07:00
Matt Wells
edbd61b0c5
thread fixes. if pthread_create fails then
keep thread queue and just return. will try to
relaunch later. do not count delete keys towards
shard rebalance count.
2014-03-15 20:07:02 -07:00
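The thread fix above (if pthread_create fails, keep the queue and return, then relaunch later) can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not the Gigablast code: Job, ThreadLauncher, spawnDetached and launchQueued are hypothetical names, and the spawn function is injectable only so the failure path can be exercised without actually exhausting threads.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical unit of work waiting for a thread: start routine and argument.
struct Job {
    void *(*func)(void *);
    void *arg;
};

// Default spawner: starts one detached thread for the job.
// Returns 0 on success or pthread_create()'s error number on failure.
static int spawnDetached(Job &job) {
    pthread_t tid;
    int rc = pthread_create(&tid, nullptr, job.func, job.arg);
    if (rc != 0) return rc;
    pthread_detach(tid);  // nobody joins these threads
    return 0;
}

// Hypothetical launcher: jobs sit in a FIFO queue until a thread starts.
// If spawning fails (e.g. EAGAIN), the job stays at the front of the queue
// and launchQueued() returns early, so a later call can retry it.
class ThreadLauncher {
public:
    using SpawnFn = int (*)(Job &);

    explicit ThreadLauncher(SpawnFn spawn = spawnDetached) : m_spawn(spawn) {}

    void enqueue(Job job) {
        assert(job.func != nullptr && "job needs a start routine");
        m_queue.push_back(job);
    }

    // Launches queued jobs in order and returns how many were started.
    int launchQueued() {
        int launched = 0;
        while (!m_queue.empty()) {
            if (m_spawn(m_queue.front()) != 0) {
                break;  // keep the queue intact; relaunch on a later call
            }
            m_queue.pop_front();
            ++launched;
        }
        return launched;
    }

    std::size_t pending() const { return m_queue.size(); }

private:
    SpawnFn m_spawn;
    std::deque<Job> m_queue;
};
```

What triggers the later retry (a timer, the next queued request, etc.) is not stated in the log; the sketch only shows that a failed launch loses nothing.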
mwells
7812f5c746
more bool fixes. still needs a little more work
2014-03-13 13:54:23 -07:00
Matt Wells
68a14de031
security admin fixes
2014-02-12 00:36:09 -07:00
Matt Wells
c9be18615c
more parm saving fixes
2014-02-10 22:04:22 -07:00
Matt Wells
ecdd167d9b
code checkpoint
2014-02-09 16:41:43 -07:00
Matt Wells
dabd691626
basic admin controls page structure
2014-02-08 00:34:45 -07:00
Matt Wells
239811b024
take out confusing function no longer used
2014-01-28 11:10:59 -08:00
Matt Wells
8a9b1f7a19
added diffbot retry rules.
added maxTotalSpiders parm for
all colls to follow.
tried to fix msg 0x00 socket jam up.
2014-01-22 19:57:38 -08:00
Matt Wells
443bb26f01
disk page cache back on
2014-01-21 19:03:47 -08:00
Matt Wells
33c5d9c07f
the rdb tree often has invalid collection
numbers in it, so fix our counting algo in case
the collection rec no longer exists!
2014-01-21 19:01:44 -08:00
Matt Wells
e6eb9003b5
more formatting
2014-01-19 01:09:38 -08:00
Matt Wells
1d6ba52dcd
list collections in sidebar.
2014-01-09 21:13:41 -08:00
Matt Wells
6660dca57c
default parm updates
2014-01-09 20:07:19 -08:00
Matt Wells
c596b6c5a6
default gb.conf update
2014-01-09 19:59:02 -08:00
Matt Wells
d76e7a9c8e
highlight non-default value parms.
2014-01-09 19:37:17 -08:00
Matt Wells
2ac8ff2952
compile regex so it's case-sensitive
2013-12-23 09:30:35 -08:00
Matt Wells
6f2e552bcd
fix core in linked list of msg13requests in
case one gets freed
2013-12-20 11:26:46 -08:00
Matt Wells
144e2c898e
save resources by not doing reads
on an empty doledb priority.
stop saving allSpidersOn and Off parms.
2013-12-08 14:07:31 -07:00
Matt Wells
3cc300bf03
spider log debug msg fix.
boost max cpu threads to 10, since machines
usually have many cores.
2013-11-22 14:17:10 -08:00
Matt Wells
43e40208b8
Merge branch 'master' into diffbot
Conflicts:
SafeBuf.cpp
SafeBuf.h
SearchInput.cpp
XmlDoc.cpp
2013-11-20 15:51:58 -08:00
Matt Wells
245264c2c9
fix respider frequency bug.
2013-10-21 15:06:23 -07:00
mwells
11897f09da
turn off log debug msg.
2013-10-16 16:24:08 -06:00
mwells
6052f60c48
speed up dirty word detection since we added a bunch
of new dirty words/phrases.
2013-10-15 22:41:31 -07:00
mwells
2bb8b818d6
more bug fixes with notification system.
2013-10-09 16:28:15 -06:00
mwells
c1c5c4e3d0
send notifications if no urls available
for immediate spidering.
2013-10-09 15:24:35 -06:00
mwells
259ec08e09
email hook now works but you have to
supply the IP address of your sendmail
server and it has to allow email
forwarding from host #0's IP. specify
the sendmail server's IP in the Master
Controls.
2013-10-02 09:36:44 -06:00
mwells
20952eedbe
customizable api list in url filters
2013-09-30 09:18:22 -06:00
mwells
0edcbcc7d8
printlocktable() function
2013-09-29 10:20:14 -06:00
mwells
9bf8bf7712
add spider reply even on g_errno now, with an error
code of EINTERNAL in the spider reply.
no longer just sit on the lock. this was blocking
an entire ip while we sat on the lock for 3 hrs.
also only do read rate timeouts if there was at least
one byte read. this was causing the diffbot reply to
hit a read rate timeout after just 60 seconds even though
its timeout was specified as 90 seconds.
2013-09-29 09:22:20 -06:00
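A minimal sketch of the read-rate timeout rule from the commit above, assuming a hypothetical ReadState record with millisecond timestamps (the actual socket fields are not shown in the log). The stall check is only armed after the first byte arrives, so a reply that is slow to start is bounded by the overall timeout instead.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-socket read state; timestamps are in milliseconds.
struct ReadState {
    int64_t bytesRead;   // total bytes received so far
    int64_t lastReadMs;  // time of the most recent read that returned data
    int64_t startMs;     // time the request was sent
};

// Overall timeout: applies whether or not any data has arrived.
bool overallTimedOut(const ReadState &s, int64_t nowMs, int64_t timeoutMs) {
    return nowMs - s.startMs >= timeoutMs;
}

// Read-rate (stall) timeout: only armed once at least one byte has been
// read, so a reply that is slow to start is bounded by the overall
// timeout rather than cut off early by this check.
bool readRateTimedOut(const ReadState &s, int64_t nowMs, int64_t readRateMs) {
    assert(nowMs >= s.lastReadMs && "clock went backwards");
    if (s.bytesRead == 0) return false;
    return nowMs - s.lastReadMs >= readRateMs;
}
```

With a 60-second read-rate limit and a 90-second overall timeout, a reply that sends nothing for its first 60 seconds is no longer dropped at that point; only the 90-second limit applies until data starts flowing.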
mwells
c216f7b2a7
use 48 bit url hash for lock keys again.
query reindex recs can just use their
prob docids as fake uh48s. we need it so we
can avoid the fakedb record and just use
the spider reply to trigger a 5-second
lock expiration. a little simpler. added
logdebugspiderwait for waiting tree debugging.
fixed per ip spider limiting. fixed losing
spiders down blackhole from updateCrawlInfo.
check UrlLock::m_confirmed when counting outstanding
spiders on one ip, since we may have a lock on one host
but not be granted it on all! it calls
confirmLockAcquisition() when the lock is fully granted
so it can set UrlLock::m_confirmed.
2013-09-29 00:09:46 -06:00
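The lock-key change above can be illustrated with a small C++ sketch. It assumes, without confirmation from the log, that a uh48 is the low 48 bits of a 64-bit url hash; lockKeyFromUrlHash, lockKeyFromDocId and UH48_MASK are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Mask that keeps the low 48 bits of a 64-bit value.
constexpr uint64_t UH48_MASK = (1ULL << 48) - 1;

// Lock key for an ordinary spider request: assumed to be the low 48 bits
// of the url's 64-bit hash.
uint64_t lockKeyFromUrlHash(uint64_t urlHash64) {
    return urlHash64 & UH48_MASK;
}

// Query reindex recs have no url to hash, so their probable docid is
// used directly as a fake uh48. It must already fit in 48 bits.
uint64_t lockKeyFromDocId(uint64_t docId) {
    assert(docId <= UH48_MASK && "docid must fit in 48 bits");
    return docId;
}
```

Because both kinds of request end up with a key in the same 48-bit space, one lock table can serve both without a separate fakedb record, which matches the simplification the commit describes.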
mwells
5884951190
only do certain things if running
on a machine in Matt Wells' datacenter,
like fan switching based on temps
or printing seo links. made seo functions
weak, overridable placeholder stubs so that if
seo.o is linked in, its definitions override them.
include the seo.o object if the seo.cpp file exists
for automatic seo module building and linking.
2013-09-28 13:43:56 -06:00
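The weak-stub technique mentioned above relies on a GCC/Clang extension: a function marked `__attribute__((weak))` is replaced at link time by any strong definition of the same symbol, such as one in seo.o. The sketch below uses a hypothetical hook, printSeoLinks, since the real seo function names are not listed in the log.

```cpp
#include <cassert>

// Hypothetical seo hook. Marked weak (GCC/Clang extension): if another
// object file, such as seo.o, provides an ordinary (strong) definition of
// this same function, the linker uses that one and this stub is dropped.
__attribute__((weak)) bool printSeoLinks(char *buf, int bufSize) {
    assert((buf != nullptr || bufSize <= 0) && "need a buffer to write into");
    // Placeholder behavior without the seo module: print nothing.
    if (bufSize > 0) buf[0] = '\0';
    return false;
}
```

Building without seo.o calls the stub; linking in an object that defines `bool printSeoLinks(char *, int)` normally makes every caller use that definition instead, with no `#ifdef` needed at the call sites.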
mwells
e3c4ce189a
fixed cores. fixed json.
2013-09-26 14:28:04 -06:00
mwells
0039b23064
almost done with json api.
2013-09-25 15:37:20 -06:00
mwells
1d92004e06
fix spider flow debug msgs
2013-09-25 12:07:11 -06:00
mwells
8461e33b53
fixed more spider bugs.
2013-09-23 21:26:27 -07:00
mwells
b90ef3de0d
more spider fixes. right after getting lock,
use msg12 to remove rec from doledb/doleiptable
and add 0 entry to waiting table so doledb is
again immediately repopulated with that firstIp
so we can spider multiple urls from the same ip
at the same time.
2013-09-23 20:25:28 -06:00
mwells
7c31ecff4a
fixed fakedb key support.
2013-09-23 15:16:23 -06:00
mwells
4d33737ac1
fakedb fixes
2013-09-23 08:19:54 -07:00
Matt Wells
6af02119a1
use cookies to display url filters table.
2013-09-18 13:50:55 -07:00
Matt Wells
487d3f0a0e
fix url filters bugs.
2013-09-18 11:02:09 -07:00
Matt Wells
78a334198b
Merge branch 'master' into diffbot
2013-09-16 09:05:37 -07:00
Matt Wells
e6f87f5049
do not send email alerts to sysadmin@gigablast.
2013-09-16 08:10:18 -07:00
Matt Wells
6b330da240
cleanup warnings in log.
2013-09-13 14:37:35 -07:00
Matt Wells
19056fc3f2
show "processed" instead of "matched".
other fixes for spider stats. add
new crawl stats. attempts and successes.
2013-09-13 11:51:55 -07:00
mwells
37a6549a58
updates to developer.html developer
documentation. removed a lot of obsolete
information. still needs more work.
2013-08-21 13:09:55 -06:00
mwells
24af21394d
dns ip fix in gb.conf.
2013-08-19 15:25:37 -06:00