Commit Graph

24 Commits

Author SHA1 Message Date
mwells
7a0971ca39 fix nasty spider bug that was not prioritizing things right.
fixed image debug logging.
2014-05-10 10:07:37 -07:00
Matt Wells
75032da5b9 fix pagination for &stream=1
2014-04-22 11:18:21 -07:00
mwells
72dc660598 Merge branch 'testing' into diffbot-matt
Conflicts:
	Collectiondb.cpp
	HttpRequest.h
	PageBasic.cpp
	coll.main.0/coll.conf
2014-04-09 11:18:39 -07:00
Matt Wells
8aa0662a27 Merge branch 'diffbot' into testing
Conflicts:
	Make.depend
	PageResults.cpp
	Parms.cpp
	Spider.cpp
	Spider.h
	gb.conf
2014-03-08 09:38:44 -07:00
Matt Wells
c143ee1fba fix core when creating a new collection because
we incremented m_numRecs but did not grow the ptr buffer.
also added support for localgb.conf so we can use that
instead of gb.conf to avoid git push/pull conflicts.
2014-03-07 09:05:14 -08:00
Matt Wells
3b0a571cea fix security system to actually work now
2014-02-12 00:06:00 -07:00
Matt Wells
156b50240a code checkpoint
2014-02-08 16:24:33 -07:00
Matt Wells
17fff243f9 add connectips back. call them adminIps this time.
if your ip is on the list then you have admin
access. cookie tokens will come later/soon.
2014-02-03 20:47:48 -07:00
Matt Wells
8a9b1f7a19 added diffbot retry rules.
added maxTotalSpiders parm for
all colls to follow.
tried to fix msg 0x00 socket jam up.
2014-01-22 19:57:38 -08:00
Matt Wells
dba382f7f7 added max cpu merge threads parm and defaulted to 10
up from 2 for better disk reading latencies.
2014-01-21 13:11:53 -08:00
Matt Wells
fc17521697 Merge branch 'master' into diffbot
Conflicts:
	Hostdb.cpp
	Makefile
	PageResults.cpp
	PageRoot.cpp
	Pages.cpp
	Rdb.cpp
	SearchInput.cpp
	SearchInput.h
	Spider.cpp
	Spider.h
	XmlDoc.cpp
2013-10-16 14:28:42 -07:00
mwells
3374ce450a fix a couple catdb generation bugs.
MAX_CATIDS violation causing corruption.
not saving catdb tree to catdb-saved.dat
causing missing catdb recs.
2013-10-12 20:33:04 -07:00
mwells
e1bde7b7fe fixed bug of getting lock from the
wrong group.
2013-10-04 12:42:01 -06:00
mwells
10dad2e6bd fixed bug of not removing spider lock
in addSpiderReply() because isAssignedToUs()
was there.
2013-10-03 10:45:19 -06:00
mwells
259ec08e09 email hook now works but you have to
supply the IP address of your sendmail
server and it has to allow email
forwarding from host #0's IP. specify
the sendmail server's IP in the Master
Controls.
2013-10-02 09:36:44 -06:00
mwells
c216f7b2a7 use 48 bit url hash for lock keys again.
query reindex recs can just use their
prob docids as fake uh48s. we need it so we
can avoid the fakedb record and just use
the spider reply to trigger a 5-second
lock expiration. a little simpler. added
logdebugspiderwait for waiting tree debugging.
fixed per ip spider limiting. fixed losing
spiders down blackhole from updateCrawlInfo.
check UrlLock::m_confirmed when counting outstanding
spiders on one ip since may have a lock on one host
but not get granted on all! it calls
confirmLockAcquisition() when it gets fully granted
the lock so it can set UrlLock::m_confirmed.
2013-09-29 00:09:46 -06:00
Matt Wells
c0f1330d70 Merge branch 'master' into diffbot
Conflicts:
	HttpServer.cpp
	Makefile
	PageGet.cpp
	Pages.h
	SafeBuf.h
2013-09-28 13:13:12 -07:00
mwells
5884951190 only do certain things if running
on a machine in matt wells datacenter.
like fan switching based on temps,
or printing seo links. made seo functions
weak overridable placeholder stubs so if
seo.o is linked in it will override.
include seo.o object if seo.cpp file exists
for automatic seo module building and linking.
2013-09-28 13:43:56 -06:00
mwells
b90ef3de0d more spider fixes. right after getting lock,
use msg12 to remove rec from doledb/doleiptable
and add 0 entry to waiting table so doledb is
again immediately repopulated with that firstIp
so we can spider multiple urls from the same ip
at the same time.
2013-09-23 20:25:28 -06:00
Matt Wells
c77453348f Merge branch 'master' into diffbot
Conflicts:
	SearchInput.cpp
	XmlDoc.cpp
2013-09-18 09:23:48 -07:00
mwells
119a4c0c22 fix adult content detector
2013-09-17 23:53:17 -06:00
Matt Wells
a034604cef clean up to remove g_conf.m_useDiffbot
2013-09-16 15:00:43 -07:00
Matt Wells
5dc7bd2ab4 integrate diffbot from svn back into git.
2013-09-13 09:23:18 -07:00
Matt Wells
f6e560c1f4 Initial file population.
2013-08-02 13:12:24 -07:00