Matt Wells
321fc90ff6
fix some cores.
NOTE: emails disabled here... need to fix.
2014-01-24 12:07:28 -08:00
Matt Wells
5c9b688f72
spiderdb fixes for injections
2014-01-19 14:33:27 -08:00
Matt Wells
36b93a1e92
minor cmdline fixes
2014-01-18 21:26:59 -08:00
Matt Wells
4606e88721
code cleanups.
xmldoc::injectDoc(), and it'll
add a SpiderRequest as well.
better collectiondb init code.
2014-01-18 21:19:26 -08:00
Matt Wells
f9d0a02dbe
test and get gbparenturl: query working.
2014-01-18 09:28:58 -08:00
Matt Wells
16f8af0d57
added awesome streaming mode support
to tcpserver.cpp for sending back
json objects as we get them from shards.
and as we get them in small pieces so we
don't go oom. made that code much simpler
and more reliable in the long run.
2014-01-17 16:26:17 -08:00
Matt Wells
01a3282020
fix problem scanning spiderdb.
move dedup spiderdb code to
RdbMerge.cpp where it really should be.
2014-01-16 17:04:08 -08:00
Matt Wells
883487889d
make gb install only have 10 outstanding per ip
since ssh seems to close connections if you have more
than 12 out.
2014-01-15 14:41:30 -08:00
Matt Wells
6de7abf6ba
display fixes.
./gb installgb and ./gb installgb2 now install 'gb'
if 'gb.new' is not present.
2014-01-11 17:16:20 -08:00
Matt Wells
8a49e87a61
got code with shard rebalancing compiling.
now we store a "sharded by termid" bit in posdb
key for checksums, etc keys that are not sharded
by docid. save having to do disk seeks on every
host in the cluster to do a dup check, etc.
2014-01-11 16:08:42 -08:00
Matt Wells
1d6ba52dcd
list collections in sidebar.
2014-01-09 21:13:41 -08:00
Matt Wells
ebdf1f638a
fix ./gb installgb2 to be semi-sequential
2014-01-09 13:25:45 -08:00
Matt Wells
47327a0c41
Merge branch 'master' into diffbot
2014-01-09 13:07:59 -08:00
Matt Wells
70f8c416de
allow collections to be added when no colls exist.
fixed gb start2 etc. to be sequential.
2014-01-09 13:07:16 -08:00
Matt Wells
161a5c5d6b
logging cleanups
2014-01-09 12:38:38 -08:00
Matt Wells
5007dc8e0c
fix core in gb seektest
2014-01-09 11:17:05 -07:00
Matt Wells
909022642d
Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot
2014-01-07 12:10:59 -08:00
Matt Wells
e366c12470
Merge branch 'master' into diffbot
Conflicts:
Collectiondb.cpp
Msg13.cpp
Parms.cpp
Spider.h
2014-01-07 12:09:11 -08:00
Matt Wells
4f64677b4f
get new global preemptive cache
logic compiling, with section voting
stats.
2014-01-05 11:51:09 -08:00
mwells
9bf49884b9
fix compiler warning
2014-01-02 01:35:52 -07:00
Matt Wells
7df2111ceb
fixed 'gb inject titledb-DIR newhosts.conf' command
for populating an index from titledb files in DIR
and transmitting to appropriate host in newhosts.conf.
also prettied up the gb -h output to use a formatting
function.
2014-01-02 01:20:08 -07:00
Matt Wells
935a4faccf
fixed './gb inject titledb newhosts.conf'
You have to be in working directory of the instance
whose cached pages (titlerecs) you want to inject
into the new cluster defined by newhosts.conf.
2014-01-01 22:04:26 -07:00
Matt Wells
d8a9a3f4e3
fix parm sync code some more.
added localhosts.conf to the 'gb install' dist.
2013-12-27 14:00:37 -08:00
Matt Wells
958becbdf0
fix parm checksum for syncing parms.
was not using gbstrlen() for strings.
2013-12-27 11:56:20 -08:00
Matt Wells
9b080ff89c
more parmdb bug fixes
2013-12-16 13:36:31 -08:00
Matt Wells
9be1ab6323
more parmdb fixes
2013-12-16 12:20:13 -08:00
Matt Wells
0615acff17
zero out url filters checkboxes on submit
2013-12-16 11:03:40 -08:00
mwells
f2d5661965
parmdb overhaul. support collection add/del
sync when host comes back online. use udp not tcp.
host #0 can now handle a new incoming request while
a parm change is currently outstanding.
all missed "command" parms will be received when a dead host
comes back online, too, like a tight merge for instance.
does not use msg4, uses msg3e and msg3f for syncing and
sending parms.
2013-12-10 13:09:55 -08:00
mwells
0e47d48d8c
test commit
2013-12-10 13:02:52 -08:00
Matt Wells
06edfddf31
a bunch of bug fixes, mostly spider related.
also some for pagereindex.
2013-12-07 21:56:37 -07:00
Matt Wells
c669f8c138
fix file descriptor leak in Dir class.
try to fix core from Thread getting SIGALRM.
try to set NOFILES to 1024 at startup in case
more are allowed.
2013-11-19 13:41:56 -08:00
Matt Wells
e909b85638
Merge branch 'master' into diffbot
2013-11-19 00:45:49 -08:00
Matt Wells
9c62ab362c
Revert "use scp not rcp for administrative cmds"
This reverts commit cc1d117e55.
2013-11-19 00:19:21 -07:00
Matt Wells
25dd764dac
Merge branch 'master' into diffbot
Conflicts:
Makefile
PageResults.cpp
2013-11-18 16:59:33 -08:00
Matt Wells
cc1d117e55
use scp not rcp for administrative cmds
like './gb installgb'
most ppl do not have rcp on their system
any more.
2013-11-17 20:49:38 -07:00
Matt Wells
d0ddfb7d7d
would block when deleting or resetting
a collection when the rdb tree is saving to
disk. keeps retrying every 100ms since it
modifies the tree.
2013-10-30 13:12:46 -07:00
Matt Wells
240da39873
Merge branch 'master' into diffbot
2013-10-25 12:32:02 -07:00
Matt Wells
a2d54b0d08
nothing. merge test.
2013-10-22 21:53:07 -07:00
Matt Wells
b589b17e63
fix collection resetting.
2013-10-18 15:21:00 -07:00
Matt Wells
fc17521697
Merge branch 'master' into diffbot
Conflicts:
Hostdb.cpp
Makefile
PageResults.cpp
PageRoot.cpp
Pages.cpp
Rdb.cpp
SearchInput.cpp
SearchInput.h
Spider.cpp
Spider.h
XmlDoc.cpp
2013-10-16 14:28:42 -07:00
mwells
3374ce450a
fix a couple catdb generation bugs.
MAX_CATIDS violation causing corruption.
not saving catdb tree to catdb-saved.dat
causing missing catdb recs.
2013-10-12 20:33:04 -07:00
mwells
0de777d80d
parser fixes
2013-10-11 17:35:12 -06:00
mwells
ea859ef685
added 'gb emailmandrill' for testing.
got it working. it posts json, not url encoded.
2013-10-09 17:35:51 -06:00
mwells
d464066da4
use catdb/ not cat/
2013-10-04 22:39:41 -06:00
Matt Wells
fe97e08281
move from groups to shards. got rid of annoying
groupid bit mask thing.
2013-10-04 16:18:56 -07:00
Matt Wells
3fa0ad5786
fix './gb install' cmd to install the new files.
2013-10-04 14:04:47 -07:00
mwells
6c2c9f7774
trying to bring back dmoz integration.
2013-10-02 22:34:21 -06:00
Matt Wells
c0f1330d70
Merge branch 'master' into diffbot
Conflicts:
HttpServer.cpp
Makefile
PageGet.cpp
Pages.h
SafeBuf.h
2013-09-28 13:13:12 -07:00
mwells
5884951190
only do certain things if running
on a machine in matt wells datacenter.
like fan switching based on temps,
or printing seo links. made seo functions
weak overridable placeholder stubs so if
seo.o is linked in it will override.
include seo.o object if seo.cpp file exists
for automatic seo module building and linking.
2013-09-28 13:43:56 -06:00
mwells
40192249f9
spider speedups and fixes.
2013-09-25 11:58:03 -06:00
Matt Wells
a412c798bf
Merge branch 'master' into diffbot
Conflicts:
PageResults.cpp
2013-09-13 09:24:28 -07:00
Matt Wells
5dc7bd2ab4
integrate diffbot from svn back into git.
2013-09-13 09:23:18 -07:00
mwells
34b6d3e74a
fixed some cores. brought in fixes from
old repo.
2013-09-08 16:16:13 -06:00
mwells
5e0a53b909
minor print change
2013-08-31 10:57:36 -06:00
mwells
af46945403
show more info when dumping doledb.
2013-08-31 10:55:05 -06:00
Matt Wells
94e6492916
removed MAX_COLL_RECS so we can have unlimited
collections, now limited only by sizeof(collnum_t),
which is 16 bits, 15 of them usable for non-negative values.
can always expand this so we can have more than 32k collections.
2013-08-30 16:20:38 -07:00
mwells
7d3cc672c8
use ./gb blaster -u <fileofurls> to just inject urls,
but use -i to also add the outlinks to spiderdb.
2013-08-19 16:33:27 -06:00
mwells
eb4758b565
fix init error when injecting file of urls.
2013-08-19 13:34:47 -06:00
mwells
2c83b96ba4
Added support for 'gb blaster -i <fileofurls> <maxThreads>' to
inject/index a file of urls. Committing older work for
compare.html that shows differences between gigablast and solr,
but has a lot of blanks.
2013-08-19 13:26:46 -06:00
mwells
4092177e5f
added injectme3 file and documentation into compare.html
to describe how to inject a file of concatenated HTML
documents into gb. Still have to find out how to do that
in SOLR and elasticsearch for comparison.
2013-08-17 11:02:26 -06:00
Matt Wells
f6e560c1f4
Initial file population.
2013-08-02 13:12:24 -07:00