Commit Graph

362 Commits

Author SHA1 Message Date
mwells
af46945403 show more info when dumping doledb. 2013-08-31 10:55:05 -06:00
Matt Wells
9696c7936a Merge branch 'master' into diffbot 2013-08-30 16:33:00 -07:00
Matt Wells
94e6492916 removed MAX_COLL_RECS so we can have unlimited
collections, really limited by the sizeof(collnum_t) only now,
which is 16 bits (15 bits excluding the sign bit), which is the limitation.
can always expand this so we can have more than 32k collections.
2013-08-30 16:20:38 -07:00
mwells
f6bcaeb76a minor fix. 2013-08-30 00:16:30 -06:00
mwells
900bbf8fba try to fix the bug of the spiders kinda getting
stuck and not spidering to their max potential
because of doledb record annihilations at the
top of the spider priority queue in spiderdb of
SpiderRequests. was causing lots of re-reads
in Msg5.cpp of doledb, like over 300 rounds,
very slow.
2013-08-29 21:59:02 -06:00
mwells
2e9c8f7c6e Merge branch 'master' of github.com:gigablast/open-source-search-engine 2013-08-29 21:17:46 -06:00
mwells
84fae9a3c6 Fix issue of reading spiderrequests from
doledb at the very first key in spiderdb.
causes lots of positive/negative key annihilations.
we end up re-reading like 300 times in some
cases just to get a url from a doledb priority.
2013-08-29 21:16:59 -06:00
mwells
ca2a024d04 fixed up thread/spider log msgs.
fixed core from calling fprintf in
alarm signal missed quickpoll handler.
2013-08-29 21:15:42 -06:00
mwells
e925012dce change a couple of possible reserved names in C++
to non-reserved names. #define _ADDRESS_H_ to
_GB_ADDRESS_H_ etc.
2013-08-28 22:59:01 -06:00
mwells
82ee2dfed7 fix cores when spider is unzipping
gzipped web pages.
2013-08-28 22:49:22 -06:00
mwells
80179525c1 when using pthreads block SIGIO
so it does not silently kill the gb
process because we no longer have a handler
for it because it was bogging down the cpu
because it went off every time a udp datagram
was sent/received and it seemed to have a ton
of overhead with it. SIGIO used to be sent
when the signal queue was full so we'd resort
to polling the file descriptors, so i'm not
sure how this will affect us. also updated
Threads.cpp to use getpidtid() instead of
getpid() to get the thread id when using
pthreads, not the process id. using pthreads
is now default behaviour even though they suck.
we used to use clone() but the newer stuff doesn't
allow us to override errno_location anymore.
2013-08-21 15:01:26 -06:00
mwells
6332de2daf added link to compare.html comparison to SOLR
into documentation.
2013-08-21 13:14:17 -06:00
mwells
37a6549a58 updates to developer.html developer
documentation. removed a lot of obsolete
information. still needs more work.
2013-08-21 13:09:55 -06:00
mwells
8971d9b932 comment out urldb from developer.html
since no longer used.
2013-08-21 08:59:51 -06:00
mwells
6cf0497c2c added a little posdb documentation to
developer.html. posdb replaced indexdb
as the new index because it has word
position info as well as word field info.
2013-08-21 08:40:28 -06:00
mwells
a2a57addd9 try fixing the cpu being slammed in the
sigiohandler. seems like signals meaning
might have changed in the kernel, etc.
over the years. fixed Loop.cpp.
2013-08-20 14:12:44 -06:00
mwells
a270a9bc91 updated README.md to reference compare.html 2013-08-19 17:20:30 -06:00
mwells
7d3cc672c8 use ./gb blaster -u <fileofurls> to just inject urls,
but use -i to also add the outlinks to spiderdb.
2013-08-19 16:33:27 -06:00
mwells
3550bf2d8a compare.html update. 2013-08-19 16:21:01 -06:00
mwells
95a020574c set spiderlinks=1 when doing
./gb blaster -i <fileofurls> to
index/inject a file of urls so that
we add the outlinks to spiderdb. this will
slow things down a little since we will have
to do a dns lookup of the subdomain of each
outlink, unless it is cached.
2013-08-19 16:15:58 -06:00
mwells
72d7e42497 added a quick start note to admin.html. 2013-08-19 15:34:07 -06:00
mwells
24af21394d dns ip fix in gb.conf. 2013-08-19 15:25:37 -06:00
mwells
e9297df240 listen on DNS port 5998 not 6000. 6000 seemed
to cause issues on a particular install for
some reason.
2013-08-19 15:02:27 -06:00
mwells
71aa03ab5d little admin.html update. 2013-08-19 13:45:43 -06:00
mwells
eb4758b565 fix init error when injecting file of urls. 2013-08-19 13:34:47 -06:00
mwells
2c83b96ba4 Added support for 'gb blaster -i <fileofurls> <maxThreads>' to
inject/index a file of urls. Committing older work for
compare.html that shows differences between gigablast and solr,
but has a lot of blanks.
2013-08-19 13:26:46 -06:00
mwells
5facc7d859 add injection timing stat point to compare.html 2013-08-17 11:06:24 -06:00
mwells
4092177e5f added injectme3 file and documentation into compare.html
to describe how to inject a file of concatenated HTML
documents into gb. Still have to find out how to do that
in SOLR and elasticsearch for comparison.
2013-08-17 11:02:26 -06:00
Matt Wells
c0e1216022 stub for compare.html 2013-08-17 09:07:48 -07:00
Matt Wells
f7f377a1f7 fixed a core dump in proxy.cpp.
make doubly sure protocol
is not 1.1 since we have keep alives disabled
we need to force protocol to 1.0.
2013-08-17 08:58:28 -07:00
Matt Wells
410604c388 minor edits 2013-08-17 08:48:07 -07:00
Matt Wells
5bac648cc9 start up the gigablast blog again. 2013-08-17 08:44:32 -07:00
Matt Wells
5a2cb35e6c added gb.conf.txt and hosts.conf.txt for
display from admin.html.
2013-08-11 00:45:27 -07:00
Matt Wells
e42afac9d8 admin.html documentation updates. 2013-08-11 00:32:53 -07:00
Matt Wells
834128a076 Fixed heap breaches caused by our built-in
electric fence code from death queries.
Use HTTP/1.0 not 1.1 since we disabled keep-alive
support a long time ago.
2013-08-10 09:51:14 -07:00
mwells
651b899453 oops, wrong sign direction. 2013-08-09 22:14:13 -06:00
Matt Wells
9b94e0feac fix core from huge death query. 2013-08-09 21:05:38 -07:00
Matt Wells
ef333f8937 fix bug in user accounting system summary
stats recs. when sumbuf reallocated the
hashtable of ptrs could be invalidated. so
use offsets, not ptrs, in the hash table.
2013-08-09 16:40:12 -07:00
mwells
4f4047a3ad new Make.depend. 2013-08-09 17:13:45 -06:00
Matt Wells
dbefaec10d Merge branch 'master' of git@github.com:gigablast/open-source-search-engine 2013-08-09 16:04:00 -07:00
Matt Wells
77325fe5fa Fixed a couple mostly proxy-related cores. 2013-08-09 16:03:48 -07:00
Steve Cook
f15beeb297 Removed another unused variable (tidy make output) 2013-08-09 12:37:25 -06:00
Matt Wells
5a5050d8d9 extend copyright line in LICENSE file. 2013-08-09 09:05:09 -07:00
Matt Wells
14688f1a7b remove more temp files 2013-08-09 08:54:45 -07:00
Matt Wells
c386f0d1f0 remove temp file. 2013-08-09 08:53:36 -07:00
Matt Wells
2c7caf0653 Merge branch 'master' of git@github.com:gigablast/open-source-search-engine 2013-08-09 08:52:23 -07:00
Matt Wells
76cf68f7b1 Fixed some bugs. 2013-08-09 08:52:15 -07:00
mwells
fd88c6c3b2 add contact info to README.md. 2013-08-08 15:31:20 -06:00
mwells
0b94b31fbc Fix potential core issue in proxy. 2013-08-08 15:14:36 -06:00
mwells
7d54efef09 remove file addsinprogress.dat 2013-08-08 14:45:16 -06:00