14 Commits

Author SHA1 Message Date
Matt Wells
fc17521697 Merge branch 'master' into diffbot
Conflicts:
	Hostdb.cpp
	Makefile
	PageResults.cpp
	PageRoot.cpp
	Pages.cpp
	Rdb.cpp
	SearchInput.cpp
	SearchInput.h
	Spider.cpp
	Spider.h
	XmlDoc.cpp
2013-10-16 14:28:42 -07:00
mwells
3374ce450a fix a couple catdb generation bugs.
MAX_CATIDS violation causing corruption.
not saving catdb tree to catdb-saved.dat
causing missing catdb recs.
2013-10-12 20:33:04 -07:00
mwells
e1bde7b7fe fixed bug of getting lock from the
wrong group.
2013-10-04 12:42:01 -06:00
mwells
10dad2e6bd fixed bug of not removing spider lock
in addSpiderReply() because isAssignedToUs()
was there.
2013-10-03 10:45:19 -06:00
mwells
259ec08e09 email hook now works but you have to
supply the IP address of your sendmail
server and it has to allow email
forwarding from host #0's IP. specify
the sendmail server's IP in the Master
Controls.
2013-10-02 09:36:44 -06:00
mwells
c216f7b2a7 use 48 bit url hash for lock keys again.
query reindex recs can just use their
probable docids as fake uh48s. we need it so we
can avoid the fakedb record and just use
the spider reply to trigger a 5-second
lock expiration. a little simpler. added
logdebugspiderwait for waiting tree debugging.
fixed per ip spider limiting. fixed losing
spiders down blackhole from updateCrawlInfo.
check UrlLock::m_confirmed when counting outstanding
spiders on one ip, since we may have a lock on one host
but not get granted on all! it calls
confirmLockAcquisition() when it gets fully granted
the lock so it can set UrlLock::m_confirmed.
2013-09-29 00:09:46 -06:00
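The lock-key and per-ip counting changes described in c216f7b2a7 can be pictured with a minimal sketch. This is not the repository's code: only UrlLock::m_confirmed is named in the commit message, while the m_uh48 and m_firstIp fields, the toUh48() and countConfirmedOnIp() helpers, and the choice of the low 48 bits are all assumptions made for illustration.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified lock record. Only m_confirmed is named in the
// commit message; m_uh48 and m_firstIp are assumptions for illustration.
struct UrlLock {
    uint64_t m_uh48;      // 48-bit url hash used as the lock key
    uint32_t m_firstIp;   // ip the url is assigned to
    bool     m_confirmed; // set once the lock has been granted by all hosts
};

// Keep only the low 48 bits of a 64-bit url hash (assumed bit layout).
inline uint64_t toUh48(uint64_t urlHash64) {
    return urlHash64 & 0x0000FFFFFFFFFFFFULL;
}

// Count outstanding spiders on one ip, skipping locks that are held on
// some host but have not been fully granted yet.
inline int countConfirmedOnIp(const std::vector<UrlLock>& locks,
                              uint32_t firstIp) {
    int n = 0;
    for (const UrlLock& lock : locks)
        if (lock.m_firstIp == firstIp && lock.m_confirmed) ++n;
    return n;
}
```

Counting only confirmed locks means a partially granted lock does not use up a slot in the per-ip limit.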
Matt Wells
c0f1330d70 Merge branch 'master' into diffbot
Conflicts:

	HttpServer.cpp
	Makefile
	PageGet.cpp
	Pages.h
	SafeBuf.h
2013-09-28 13:13:12 -07:00
mwells
5884951190 only do certain things if running
on a machine in matt wells datacenter.
like fan switching based on temps,
or printing seo links. made seo functions
weak overridable placeholder stubs so if
seo.o is linked in it will override.
include seo.o object if seo.cpp file exists
for automatic seo module building and linking.
2013-09-28 13:43:56 -06:00
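The "weak overridable placeholder stubs" idea in 5884951190 might look roughly like the sketch below. The function names isSeoModuleLinked() and getSeoLinks() are hypothetical, since the commit message does not name the real seo functions, and __attribute__((weak)) is a GCC/Clang extension rather than standard C++.

```cpp
#include <string>

// Hypothetical placeholder names; the real seo function names are not
// given in the commit message. A weak definition is used only when no
// other object file (for example an optional seo.o) supplies a strong
// definition of the same symbol.
__attribute__((weak)) bool isSeoModuleLinked() {
    return false; // default stub: no seo module present
}

__attribute__((weak)) std::string getSeoLinks(const std::string& url) {
    (void)url;            // the stub ignores its argument
    return std::string(); // default stub: print nothing
}
```

If a seo.o with non-weak definitions of the same functions is added to the link, the linker picks those instead and callers need no changes.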
mwells
b90ef3de0d more spider fixes. right after getting lock,
use msg12 to remove rec from doledb/doleiptable
and add 0 entry to waiting table so doledb is
again immediately repopulated with that firstIp
so we can spider multiple urls from the same ip
at the same time.
2013-09-23 20:25:28 -06:00
Matt Wells
c77453348f Merge branch 'master' into diffbot
Conflicts:
	SearchInput.cpp
	XmlDoc.cpp
2013-09-18 09:23:48 -07:00
mwells
119a4c0c22 fix adult content detector
2013-09-17 23:53:17 -06:00
Matt Wells
a034604cef clean up to remove g_conf.m_useDiffbot
2013-09-16 15:00:43 -07:00
Matt Wells
5dc7bd2ab4 integrate diffbot from svn back into git.
2013-09-13 09:23:18 -07:00
Matt Wells
f6e560c1f4 Initial file population.
2013-08-02 13:12:24 -07:00