Commit Graph

48 Commits

Author SHA1 Message Date
mwells
61b4ec4ca6 added some qa testing logic. qa.cpp. 2014-04-05 11:33:42 -07:00
Matt Wells
8aa0662a27 Merge branch 'diffbot' into testing
Conflicts:
	Make.depend
	PageResults.cpp
	Parms.cpp
	Spider.cpp
	Spider.h
	gb.conf
2014-03-08 09:38:44 -07:00
Matt Wells
a6b7e088f5 take out tfndb, unused. fix core
from diffbot url too long.
2014-02-26 01:07:13 -08:00
Matt Wells
9a76ff2531 minor parm updates 2014-02-11 20:50:36 -07:00
Matt Wells
d2b473e554 checkpoint 2014-02-09 19:09:44 -07:00
Matt Wells
ecdd167d9b code checkpoint 2014-02-09 16:41:43 -07:00
Matt Wells
c9ef525338 code checkpoint 2014-02-09 12:55:45 -07:00
Matt Wells
6c9a44367f code checkpoint 2014-02-09 12:38:40 -07:00
Matt Wells
e593b6e1de basic controls code checkpoint. 2014-02-08 15:10:06 -07:00
Matt Wells
5c8b9af1d3 fix rdbcache corruption from -O2 compile bug.
fix too many spiders per ip bug!
2014-02-05 16:58:21 -08:00
Matt Wells
8a49e87a61 got code with shard rebalancing compiling.
now we store a "sharded by termid" bit in posdb
key for checksums, etc keys that are not sharded
by docid. save having to do disk seeks on every
host in the cluster to do a dup check, etc.
2014-01-11 16:08:42 -08:00
Matt Wells
1d6ba52dcd list collections in sidebar. 2014-01-09 21:13:41 -08:00
mwells
82494baa89 move CollectionRec stuff into Collectiondb files
for simplicity.
2013-12-10 15:28:04 -08:00
mwells
f2d5661965 parmdb overhaul. support collection add/del
sync when host comes back online. use udp not tcp.
host #0 can now handle a new incoming request while
a parm change is currently outstanding.
all missed "command" parms will be received when a dead host
comes back online, too, like a tight merge for instance.
does not use msg4, uses msg3e and msg3f for syncing and
sending parms.
2013-12-10 13:09:55 -08:00
Matt Wells
5e4b5a112c Merge branch 'master' into diffbot
Conflicts:
	PageResults.cpp
	Threads.cpp
	XmlDoc.cpp
	XmlDoc.h
2013-12-07 11:34:26 -07:00
Matt Wells
c50ef1954f show admin controls on serps if ip is local.
fixed up the "reindex" page for deleting/reindexing
search results for a given query.
2013-12-06 09:48:30 -07:00
Matt Wells
39f8dc646b default gigabits on for my copy. 2013-12-01 15:07:06 -07:00
Matt Wells
879cd588e0 use -DPTHREADS not _PTHREADS_ 2013-11-19 00:49:43 -08:00
Matt Wells
e909b85638 Merge branch 'master' into diffbot 2013-11-19 00:45:49 -08:00
mwells
64910ee991 fix oops 2013-11-18 22:32:00 -07:00
Matt Wells
4e71bc0698 use pthreads again until we can verify the
stability of the new clone approach.
2013-11-18 22:23:38 -07:00
Matt Wells
25dd764dac Merge branch 'master' into diffbot
Conflicts:
	Makefile
	PageResults.cpp
2013-11-18 16:59:33 -08:00
Matt Wells
3df310d3ec take out -lpthread. don't need it. 2013-11-17 22:25:19 -07:00
Matt Wells
5022ea4d6e try ditching pthreads and using straight-up errno.
it seems perhaps each clone() gets its own copy of
errno now?
2013-11-17 19:43:20 -07:00
Matt Wells
e27646c088 cleanup fixes. 2013-11-15 15:01:56 -07:00
Matt Wells
5e30728a3a new graphic icons. minor clean ups. 2013-11-15 14:47:05 -07:00
Matt Wells
21a6b070a7 added X-referring-url: X-anchor-text: and
X-surrounding-text: to diffbot http request header.
2013-10-31 11:44:09 -07:00
Matt Wells
d0ddfb7d7d would block when deleting or resetting
a collection when the rdb tree is saving to
disk. keeps retrying every 100ms since it
modifies the tree.
2013-10-30 13:12:46 -07:00
Matt Wells
84a3aded94 spider round updates correction 2013-10-17 17:18:05 -07:00
Matt Wells
df7fd21253 spider rounds update. 2013-10-17 17:17:19 -07:00
Matt Wells
fc17521697 Merge branch 'master' into diffbot
Conflicts:
	Hostdb.cpp
	Makefile
	PageResults.cpp
	PageRoot.cpp
	Pages.cpp
	Rdb.cpp
	SearchInput.cpp
	SearchInput.h
	Spider.cpp
	Spider.h
	XmlDoc.cpp
2013-10-16 14:28:42 -07:00
mwells
3db726c22e take out references to AdultBit.cpp,
since it is no longer used.
2013-10-14 23:21:58 -07:00
mwells
80918ca6e3 remove old libplotter references
and files.
2013-10-13 23:48:07 -07:00
mwells
6d5643e185 json parsing 2013-10-11 16:14:26 -06:00
mwells
6c2c9f7774 trying to bring back dmoz integration. 2013-10-02 22:34:21 -06:00
mwells
43e4c939eb Merge branch 'master' into diffbot
Conflicts:
	Make.depend
2013-10-02 13:15:07 -06:00
Matt Wells
c911a606c9 renamed matches.h and matches.cpp to
matches2.h and matches2.cpp to avoid potential
confusion with Matches.h and Matches.cpp files.
2013-10-01 07:58:24 -07:00
mwells
d11e9520bd couple fixes to makefile etc. 2013-09-28 16:37:39 -06:00
Matt Wells
c0f1330d70 Merge branch 'master' into diffbot
Conflicts:
	HttpServer.cpp
	Makefile
	PageGet.cpp
	Pages.h
	SafeBuf.h
2013-09-28 13:13:12 -07:00
mwells
5884951190 only do certain things if running
on a machine in matt wells datacenter.
like fan switching based on temps,
or printing seo links. made seo functions
weak overridable placeholder stubs so if
seo.o is linked in it will override.
include seo.o object if seo.cpp file exists
for automatic seo module building and linking.
2013-09-28 13:43:56 -06:00
mwells
fd081478de fix crawlbot to work on a distributed network
as far as adding/deleting/resetting colls
and updating parms. ideally we'd have a Colldb
Rdb where each key was a parm. that would make
syncing easier if a host went down, then it would
get the negative/positive colldb parm keys later.
so it could sync up on all your operations as long
as all your operations in terms of adding and deleting
database key/value pairs.
2013-09-26 22:41:05 -06:00
mwells
f34a7f44ab compiler flag fix for xmldoc.o 2013-09-16 22:35:16 -06:00
Matt Wells
6b330da240 cleanup warnings in log. 2013-09-13 14:37:35 -07:00
Matt Wells
a412c798bf Merge branch 'master' into diffbot
Conflicts:
	PageResults.cpp
2013-09-13 09:24:28 -07:00
Matt Wells
5dc7bd2ab4 integrate diffbot from svn back into git. 2013-09-13 09:23:18 -07:00
mwells
34b6d3e74a fixed some cores. brought in fixes from
old repo.
2013-09-08 16:16:13 -06:00
Matt Wells
94e6492916 removed MAX_COLL_RECS so we can have unlimited
collections, really limited by the sizeof(collnum_t) only now,
which is 16bits, 15bits unsigned, which is the limitation.
can always expand this so we can have more than 32k collections.
2013-08-30 16:20:38 -07:00
Matt Wells
f6e560c1f4 Initial file population. 2013-08-02 13:12:24 -07:00
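
Note on commit 94e6492916: the message says the number of collections is now limited only by sizeof(collnum_t), described as 16 bits with 15 usable, hence "32k collections". The arithmetic can be sketched as below. This is a minimal illustration, not code from the repository: it assumes collnum_t is a signed 16-bit integer and that negative values are never used as collection numbers, neither of which the log confirms. The names kMaxCollnum, kMaxCollections and isValidCollnum are hypothetical.

```cpp
#include <cstdint>
#include <limits>

// Assumption: collnum_t is a signed 16-bit integer, as the commit
// message ("16bits, 15bits unsigned") suggests.
using collnum_t = std::int16_t;

// Largest representable collection number: 2^15 - 1.
constexpr long kMaxCollnum = std::numeric_limits<collnum_t>::max();

// Count of distinct non-negative collection numbers (0 through 32767),
// assuming negative values are reserved and never assigned.
constexpr long kMaxCollections = kMaxCollnum + 1;

// True if n fits in the non-negative range of collnum_t.
constexpr bool isValidCollnum(long n) {
    return n >= 0 && n <= kMaxCollnum;
}
```

Under these assumptions the ceiling is 32768 collections. Widening the type (for example to a 32-bit integer) would raise it, which matches the message's remark that the limit "can always" be expanded.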