Commit Graph

81 Commits

Author SHA1 Message Date
Matt Wells
d6434191d1 nomenclature changes to reduce collisions.
name collection 'qatest123' for doing smoke tests,
not 'test'.
2014-03-31 15:02:17 -07:00
Matt Wells
5057fdaf14 aesthetic cleanups 2014-03-16 17:12:04 -07:00
Matt Wells
edbd61b0c5 thread fixes. if pthread_create fails then
keep thread queue and just return. will try to
relaunch later. do not count delete keys towards
shard rebalance count.
2014-03-15 20:07:02 -07:00
Matt Wells
1f162ce7b2 update localhosts.conf too 2014-03-14 19:20:23 -07:00
Matt Wells
27e8e810d2 use collnum instead of coll string.
more stable since resetting collections
keeps string the same but changes the collnum.
2014-03-06 15:48:11 -08:00
Matt Wells
f11e25024a Merge branch 'diffbot' into diffbot-testing 2014-02-26 20:34:06 -08:00
Matt Wells
a6b7e088f5 take out tfndb, unused. fix core
from diffbot url too long.
2014-02-26 01:07:13 -08:00
Matt Wells
94a55bf9a6 fixes for new link info code so it doesn't
bottleneck. got EFENCE_SIZE working so we
can use efence on large allocs only so we don't
go oom using it. might help finding some of
the out of bounds writing going on.
2014-02-25 10:55:05 -08:00
Matt Wells
7806a8a68c fix excessive dupcache deduping. 2014-02-05 13:41:15 -08:00
Matt Wells
7bf8a2ac49 do not let glibc do malloc checks, we do that. 2014-02-02 13:41:59 -07:00
Matt Wells
0df697e56a fix keep alive loop code to bail out if
fails to bind to socket as well as quick cores.
2014-02-02 12:11:18 -07:00
Matt Wells
4346fcee29 added recovery mode display in hosts table 2014-02-01 10:16:46 -08:00
Matt Wells
4a1ad74f79 test fix for keep alive infinite loop bug. 2014-01-30 14:16:16 -08:00
Matt Wells
83e291f12b fix infinite keep alive restart bug some more 2014-01-30 14:12:32 -08:00
Matt Wells
03aa7842d0 do not enter into an infinite keep alive restart loop. 2014-01-30 14:40:03 -07:00
Matt Wells
b40f393f4c fix a couple cores related to deleting collections
in progress. support termlist dump with terms
containing colons.
2014-01-29 15:56:07 -08:00
Matt Wells
7b424a6236 always use kstart.
fixed restrictDomain bug of not saving parm.
sped up csv download around 2x.
2014-01-28 14:37:21 -08:00
Matt Wells
8f39c41962 just print out cached page straight, it is
just the diffbot json reply pretty much
verbatim, except for being tokenized.
should no longer escape forward slashes.
2014-01-28 11:04:53 -08:00
Matt Wells
474676010c fix gb install 1-15 logic 2014-01-27 14:28:48 -08:00
Matt Wells
bc78b21dc6 for json docs only give them a single
xmlnode in the Xml.cpp class. hopefully
will not get "malformed sections" error
anymore. i think that was a result of the
json having html tags in it and making
unnested html structures which the
sections class did not like.
TODO: probably do this for CT_TEXT etc.
as well.
2014-01-25 08:17:38 -08:00
Matt Wells
321fc90ff6 fix some cores.
NOTE: emails disabled here... need to fix.
2014-01-24 12:07:28 -08:00
Matt Wells
5c9b688f72 spiderdb fixes for injections 2014-01-19 14:33:27 -08:00
Matt Wells
36b93a1e92 minor cmdline fixes 2014-01-18 21:26:59 -08:00
Matt Wells
4606e88721 code cleanups.
xmldoc::injectDoc(), and it'll
add a SpiderRequest as well.
better collectiondb init code.
2014-01-18 21:19:26 -08:00
Matt Wells
f9d0a02dbe test and get gbparenturl: query working. 2014-01-18 09:28:58 -08:00
Matt Wells
16f8af0d57 added awesome streaming mode support
to tcpserver.cpp for sending back
json objects as we get them from shards.
and as we get them in small pieces so we
don't go oom. made that code much simpler
and more reliable in the long run.
2014-01-17 16:26:17 -08:00
Matt Wells
01a3282020 fix problem scanning spiderdb.
move dedup spiderdb code to
RdbMerge.cpp where it really should be.
2014-01-16 17:04:08 -08:00
Matt Wells
883487889d make gb install only have 10 outstanding per an ip
since ssh seems to close connections if you have more
than 12 out.
2014-01-15 14:41:30 -08:00
Matt Wells
6de7abf6ba display fixes.
./gb installgb and ./gb installgb2 now install 'gb'
if 'gb.new' is not present.
2014-01-11 17:16:20 -08:00
Matt Wells
8a49e87a61 got code with shard rebalancing compiling.
now we store a "sharded by termid" bit in posdb
key for checksums, etc keys that are not sharded
by docid. save having to do disk seeks on every
host in the cluster to do a dup check, etc.
2014-01-11 16:08:42 -08:00
Matt Wells
1d6ba52dcd list collections in sidebar. 2014-01-09 21:13:41 -08:00
Matt Wells
ebdf1f638a fix ./gb installgb2 to be semi-sequential 2014-01-09 13:25:45 -08:00
Matt Wells
47327a0c41 Merge branch 'master' into diffbot 2014-01-09 13:07:59 -08:00
Matt Wells
70f8c416de allow collections to be added when no colls exist.
fixed gb start2 etc. to be sequential.
2014-01-09 13:07:16 -08:00
Matt Wells
161a5c5d6b logging cleanups 2014-01-09 12:38:38 -08:00
Matt Wells
5007dc8e0c fix core in gb seektest 2014-01-09 11:17:05 -07:00
Matt Wells
909022642d Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot 2014-01-07 12:10:59 -08:00
Matt Wells
e366c12470 Merge branch 'master' into diffbot
Conflicts:
	Collectiondb.cpp
	Msg13.cpp
	Parms.cpp
	Spider.h
2014-01-07 12:09:11 -08:00
Matt Wells
4f64677b4f get new global preemptive cache
logic compiling, with section voting
stats.
2014-01-05 11:51:09 -08:00
mwells
9bf49884b9 fix compiler warning 2014-01-02 01:35:52 -07:00
Matt Wells
7df2111ceb fixed 'gb inject titledb-DIR newhosts.conf' command
for populating an index from titledb files in DIR
and transmitting to appropriate host in newhosts.conf.
also prettied up the gb -h output to use a formatting
function.
2014-01-02 01:20:08 -07:00
Matt Wells
935a4faccf fixed './gb inject titledb newhosts.conf'
You have to be in working directory of the instance
whose cached pages (titlerecs) you want to inject
into the new cluster defined by newhosts.conf.
2014-01-01 22:04:26 -07:00
Matt Wells
d8a9a3f4e3 fix parm sync code some more.
added localhosts.conf to the 'gb install' dist.
2013-12-27 14:00:37 -08:00
Matt Wells
958becbdf0 fix parm checksum for syncing parms.
was not using gbstrlen() for strings.
2013-12-27 11:56:20 -08:00
Matt Wells
9b080ff89c more parmdb bug fixes 2013-12-16 13:36:31 -08:00
Matt Wells
9be1ab6323 more parmdb fixes 2013-12-16 12:20:13 -08:00
Matt Wells
0615acff17 zero out url filters checkboxes on submit 2013-12-16 11:03:40 -08:00
mwells
f2d5661965 parmdb overhaul. support collection add/del
sync when host comes back online. use udp not tcp.
host #0 can now handle a new incoming request while
a parm change is currently outstanding.
all missed "command" parms will be received when a dead host
comes back online, too, like a tight merge for instance.
does not use msg4, uses msg3e and msg3f for syncing and
sending parms.
2013-12-10 13:09:55 -08:00
mwells
0e47d48d8c test commit 2013-12-10 13:02:52 -08:00
Matt Wells
06edfddf31 a bunch of bug fixes, mostly spider related.
also some for pagereindex.
2013-12-07 21:56:37 -07:00