mwells
af46945403
show more info when dumping doledb.
2013-08-31 10:55:05 -06:00
Matt Wells
9696c7936a
Merge branch 'master' into diffbot
2013-08-30 16:33:00 -07:00
Matt Wells
94e6492916
removed MAX_COLL_RECS so we can have unlimited
collections, really limited by the sizeof(collnum_t) only now,
which is 16 bits (15 bits for non-negative values), which is the limitation.
can always expand this so we can have more than 32k collections.
2013-08-30 16:20:38 -07:00
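The 32k figure in commit 94e6492916 follows from the width of the collection-number type. A minimal sketch of that arithmetic, assuming `collnum_t` is a signed 16-bit integer (the commit message implies this but does not show the typedef); `maxCollections()` is a hypothetical helper, not a function from the codebase:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Assumption: collnum_t is a signed 16-bit integer, as the commit message implies.
typedef int16_t collnum_t;

// Hypothetical helper: how many distinct non-negative collnum_t values exist.
// With 15 usable bits this is 2^15.
inline long maxCollections() {
    return static_cast<long>(std::numeric_limits<collnum_t>::max()) + 1;
}
```

Widening the typedef (for example to a 32-bit type) would raise this ceiling, which is the expansion the commit message alludes to.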
mwells
f6bcaeb76a
minor fix.
2013-08-30 00:16:30 -06:00
mwells
900bbf8fba
try to fix the bug of the spiders kinda getting
stuck and not spidering to their max potential
because of doledb record annihilations at the
top of the spider priority queue in spiderdb of
SpiderRequests. was causing lots of re-reads
in Msg5.cpp of doledb, like over 300 rounds,
very slow.
2013-08-29 21:59:02 -06:00
mwells
2e9c8f7c6e
Merge branch 'master' of github.com:gigablast/open-source-search-engine
2013-08-29 21:17:46 -06:00
mwells
84fae9a3c6
Fix issue of reading spiderrequests from
doledb at the very first key in spiderdb.
causes lots of positive/negative key annihilations.
we end up re-reading like 300 times in some
cases just to get a url from a doledb priority.
2013-08-29 21:16:59 -06:00
mwells
ca2a024d04
fixed up thread/spider log msgs.
fixed core from calling fprintf in
alarm signal missed quickpoll handler.
2013-08-29 21:15:42 -06:00
mwells
e925012dce
change a couple of possible reserved names in C++
to non-reserved names. #define _ADDRESS_H_ to
_GB_ADDRESS_H_ etc.
2013-08-28 22:59:01 -06:00
mwells
82ee2dfed7
fix cores when spider is unzipping
gzipped web pages.
2013-08-28 22:49:22 -06:00
mwells
80179525c1
when using pthreads block SIGIO
so it does not silently kill the gb
process because we no longer have a handler
for it because it was bogging down the cpu
because it went off every time a udp datagram
was sent/received and it seemed to have a ton
of overhead with it. SIGIO used to be sent
when the signal queue was full so we'd resort
to polling the file descriptors, so i'm not
sure how this will affect us. also updated
Threads.cpp to use getpidtid() instead of
getpid() to get the thread id when using
pthreads, not the process id. using pthreads
is now default behaviour even though they suck.
we used to use clone() but the newer stuff doesn't
allow us to override errno_location anymore.
2013-08-21 15:01:26 -06:00
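Commit 80179525c1 describes blocking SIGIO once its handler was removed, so that the signal's default action does not terminate the process. A minimal sketch of how that could look with POSIX `pthread_sigmask`; the helper names `blockSigio()` and `sigioIsBlocked()` are hypothetical and this is not the actual code from Loop.cpp or Threads.cpp:

```cpp
#include <cassert>
#include <pthread.h>
#include <signal.h>

// Hypothetical helper: block SIGIO in the calling thread.
// Threads created afterwards inherit this signal mask.
// Returns 0 on success, an error number otherwise.
inline int blockSigio() {
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGIO);
    return pthread_sigmask(SIG_BLOCK, &set, nullptr);
}

// Hypothetical helper: report whether SIGIO is blocked in the calling thread.
// Passing a null set only queries the current mask without changing it.
inline bool sigioIsBlocked() {
    sigset_t current;
    sigemptyset(&current);
    pthread_sigmask(SIG_BLOCK, nullptr, &current);
    return sigismember(&current, SIGIO) == 1;
}
```

Calling `blockSigio()` in the main thread before any worker threads are started means every thread shares the same mask. On older toolchains this may need to be compiled with `-pthread`.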
mwells
6332de2daf
added link to compare.html comparison to SOLR
into documentation.
2013-08-21 13:14:17 -06:00
mwells
37a6549a58
updates to developer.html developer
documentation. removed a lot of obsolete
information. still needs more work.
2013-08-21 13:09:55 -06:00
mwells
8971d9b932
comment out urldb from developer.html
since no longer used.
2013-08-21 08:59:51 -06:00
mwells
6cf0497c2c
added a little posdb documentation to
developer.html. posdb replaced indexdb
as the new index because it has word
position info as well as word field info.
2013-08-21 08:40:28 -06:00
mwells
a2a57addd9
try fixing the cpu being slammed in the
sigiohandler. seems like signals meaning
might have changed in the kernel, etc.
over the years. fixed Loop.cpp.
2013-08-20 14:12:44 -06:00
mwells
a270a9bc91
updated README.md to reference compare.html
2013-08-19 17:20:30 -06:00
mwells
7d3cc672c8
use ./gb blaster -u <fileofurls> to just inject urls,
but use -i to also add the outlinks to spiderdb.
2013-08-19 16:33:27 -06:00
mwells
3550bf2d8a
compare.html update.
2013-08-19 16:21:01 -06:00
mwells
95a020574c
set spiderlinks=1 when doing
./gb blaster -i <fileofurls> to
index/inject a file of urls so that
we add the outlinks to spiderdb. this will
slow things down a little since we will have
to do a dns lookup of the subdomain of each
outlink, unless it is cached.
2013-08-19 16:15:58 -06:00
mwells
72d7e42497
added a quick start note to admin.html.
2013-08-19 15:34:07 -06:00
mwells
24af21394d
dns ip fix in gb.conf.
2013-08-19 15:25:37 -06:00
mwells
e9297df240
listen on DNS port 5998 not 6000. 6000 seemed
to cause issues on a particular install for
some reason.
2013-08-19 15:02:27 -06:00
mwells
71aa03ab5d
little admin.html update.
2013-08-19 13:45:43 -06:00
mwells
eb4758b565
fix init error when injecting file of urls.
2013-08-19 13:34:47 -06:00
mwells
2c83b96ba4
Added support for 'gb blaster -i <fileofurls> <maxThreads>' to
inject/index a file of urls. Committing older work for
compare.html that shows differences between gigablast and solr,
but has a lot of blanks.
2013-08-19 13:26:46 -06:00
mwells
5facc7d859
add injection timing stat point to compare.html
2013-08-17 11:06:24 -06:00
mwells
4092177e5f
added injectme3 file and documentation into compare.html
to describe how to inject a file of concatenated HTML
documents into gb. Still have to find out how to do that
in SOLR and elasticsearch for comparison.
2013-08-17 11:02:26 -06:00
Matt Wells
c0e1216022
stub for compare.html
2013-08-17 09:07:48 -07:00
Matt Wells
f7f377a1f7
fixed a core dump in proxy.cpp.
make doubly sure protocol
is not 1.1 since we have keep alives disabled
we need to force protocol to 1.0.
2013-08-17 08:58:28 -07:00
Matt Wells
410604c388
minor edits
2013-08-17 08:48:07 -07:00
Matt Wells
5bac648cc9
start up the gigablast blog again.
2013-08-17 08:44:32 -07:00
Matt Wells
5a2cb35e6c
added gb.conf.txt and hosts.conf.txt for
display from admin.html.
2013-08-11 00:45:27 -07:00
Matt Wells
e42afac9d8
admin.html documentation updates.
2013-08-11 00:32:53 -07:00
Matt Wells
834128a076
Fixed heap breaches caused by our built-in
electric fence code from death queries.
Use HTTP/1.0 not 1.1 since we disabled keep-alive
support a long time ago.
2013-08-10 09:51:14 -07:00
mwells
651b899453
oops, wrong sign direction.
2013-08-09 22:14:13 -06:00
Matt Wells
9b94e0feac
fix core from huge death query.
2013-08-09 21:05:38 -07:00
Matt Wells
ef333f8937
fix bug in user accounting system summary
stats recs. when sumbuf reallocated the
hashtable of ptrs could be invalidated. so
use offsets, not ptrs, in the hash table.
2013-08-09 16:40:12 -07:00
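The fix in commit ef333f8937 (store offsets rather than pointers, because a reallocating buffer invalidates pointers into it) can be illustrated with a small sketch. `SummaryBuf` and its members are hypothetical stand-ins built on the standard library; they are not the accounting system's actual buffer or hash table classes:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a growable summary buffer plus a lookup table.
// The table maps a key to the byte OFFSET of its record, never to a pointer,
// so entries stay valid when the buffer grows and moves in memory.
struct SummaryBuf {
    std::vector<char> buf;
    std::unordered_map<std::string, std::size_t> offsets;

    // Append a NUL-terminated record and remember where it starts.
    void add(const std::string &key, const std::string &rec) {
        offsets[key] = buf.size();
        buf.insert(buf.end(), rec.begin(), rec.end());
        buf.push_back('\0');
    }

    // Resolve the offset against the buffer's current base address.
    // Returns nullptr if the key is unknown.
    const char *get(const std::string &key) const {
        auto it = offsets.find(key);
        return it == offsets.end() ? nullptr : buf.data() + it->second;
    }
};
```

Had the table stored `buf.data() + offset` at insertion time instead, any later `add()` that triggered a reallocation would leave those stored pointers dangling.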
mwells
4f4047a3ad
new Make.depend.
2013-08-09 17:13:45 -06:00
Matt Wells
dbefaec10d
Merge branch 'master' of git@github.com:gigablast/open-source-search-engine
2013-08-09 16:04:00 -07:00
Matt Wells
77325fe5fa
Fixed a couple mostly proxy-related cores.
2013-08-09 16:03:48 -07:00
Steve Cook
f15beeb297
Removed another unused variable (tidy make output)
2013-08-09 12:37:25 -06:00
Matt Wells
5a5050d8d9
extend copyright line in LICENSE file.
2013-08-09 09:05:09 -07:00
Matt Wells
14688f1a7b
remove more temp files
2013-08-09 08:54:45 -07:00
Matt Wells
c386f0d1f0
remove temp file.
2013-08-09 08:53:36 -07:00
Matt Wells
2c7caf0653
Merge branch 'master' of git@github.com:gigablast/open-source-search-engine
2013-08-09 08:52:23 -07:00
Matt Wells
76cf68f7b1
Fixed some bugs.
2013-08-09 08:52:15 -07:00
mwells
fd88c6c3b2
add contact info to README.md.
2013-08-08 15:31:20 -06:00
mwells
0b94b31fbc
Fix potential core issue in proxy.
2013-08-08 15:14:36 -06:00
mwells
7d54efef09
remove file addsinprogress.dat
2013-08-08 14:45:16 -06:00