Matt Wells
f9f73dae65
fixed core from null json
2013-12-14 15:19:52 -07:00
Matt Wells
d85dbfb8e7
do not use safebuf in thread
2013-12-12 10:15:02 -07:00
Matt Wells
6f6c4aed84
minor admin.html edit.
2013-12-10 10:39:38 -07:00
Matt Wells
1a7d5e389b
very minor admin.html edit
2013-12-10 00:56:56 -07:00
Matt Wells
ec2254d8ed
added multi language support note to admin.html
2013-12-09 23:18:33 -07:00
Matt Wells
f7e7acb398
minor log msg updates.
updated admin.html to give some performance and
storage capacity info.
2013-12-09 23:16:24 -07:00
Matt Wells
0dcd1211d3
new opensource icon.
2013-12-08 19:47:39 -07:00
Matt Wells
92e3d841a6
minor update
2013-12-08 19:28:45 -07:00
Matt Wells
12404b4f85
doc updates
2013-12-08 19:26:48 -07:00
Matt Wells
25dd764dac
Merge branch 'master' into diffbot
Conflicts:
Makefile
PageResults.cpp
2013-11-18 16:59:33 -08:00
Matt Wells
e27646c088
cleanup fixes.
2013-11-15 15:01:56 -07:00
Matt Wells
5e30728a3a
new graphic icons. minor clean ups.
2013-11-15 14:47:05 -07:00
Matt Wells
afb5a2be64
Merge branch 'master' into diffbot
2013-11-06 10:18:04 -08:00
Matt Wells
5a5973a47f
privacy.html update
2013-11-05 09:33:42 -07:00
Matt Wells
e4cce243de
minor documentation updates
2013-10-30 19:43:35 -07:00
Matt Wells
240da39873
Merge branch 'master' into diffbot
2013-10-25 12:32:02 -07:00
Matt Wells
033a6ec578
update robots.txt
2013-10-21 22:11:11 -07:00
Matt Wells
dae005e4ae
ensure dmoz info valid when making titlerec
2013-10-16 14:53:48 -07:00
Matt Wells
fc17521697
Merge branch 'master' into diffbot
Conflicts:
Hostdb.cpp
Makefile
PageResults.cpp
PageRoot.cpp
Pages.cpp
Rdb.cpp
SearchInput.cpp
SearchInput.h
Spider.cpp
Spider.h
XmlDoc.cpp
2013-10-16 14:28:42 -07:00
mwells
876af6d8c6
dmoz support is now updated and re-integrated.
2013-10-13 16:53:28 -07:00
mwells
b60bdcc038
documentation updates. fixed sd=0.
2013-10-13 14:24:41 -07:00
mwells
c949bfe315
ignore certain errors and index the doc anyway
so we at least have it in our dmoz index with its
designated title and summary from dmoz.
2013-10-13 00:02:25 -07:00
mwells
63c7764cd1
c=dmoz3 to c=dmoz
2013-10-06 17:12:45 -07:00
mwells
183b7c372e
make sections grow dynamically so we do not
OOM when trying to index a gbdmoz.urls.txt.* file
which can be 25MB.
2013-10-06 11:04:10 -06:00
mwells
612f2872f7
use addurl to add the gbdmoz url
files to gigablast. it should index
just those dmoz urls, and not spider their links.
it should ignore external errors like
ETCPTIMEDOUT when indexing so it will be
identical to dmoz.
2013-10-05 23:22:51 -06:00
mwells
d464066da4
use catdb/ not cat/
2013-10-04 22:39:41 -06:00
mwells
71d5d05f7c
use catdb/ subdir not cat/ for consistency.
2013-10-04 21:35:13 -06:00
mwells
78c4bda368
fix dmozparse urldump -s bugs
for dumping out urls in dmoz.
2013-10-04 00:00:26 -06:00
mwells
a0c79932bb
catdb is now generated successfully.
2013-10-02 23:36:49 -06:00
mwells
43e4c939eb
Merge branch 'master' into diffbot
Conflicts:
Make.depend
2013-10-02 13:15:07 -06:00
mwells
c03e862b99
use a better version of hosts.conf where we
specify the working directory for each host
entry. then we can use the exact same hosts.conf
file for each gb instance rather than having to
change the single "working-dir:" directive for
each instance, in the case where they each have
a different working directory.
2013-10-02 13:11:58 -06:00
Matt Wells
c0f1330d70
Merge branch 'master' into diffbot
Conflicts:
HttpServer.cpp
Makefile
PageGet.cpp
Pages.h
SafeBuf.h
2013-09-28 13:13:12 -07:00
mwells
5a52072888
recommended SSDs for optimal performance in admin.html.
2013-09-28 14:02:02 -06:00
mwells
f043cc67e4
Merge branch 'master' into diffbot
Conflicts:
Spider.cpp
2013-09-26 22:43:27 -06:00
Matt Wells
fbd62cecba
updated compilation instructions. need
to apt-get install gcc-multilib.
2013-09-20 10:06:01 -07:00
Matt Wells
47465f6d90
more fixes. trying to fix spiders to
spider multiple urls from same ip...
2013-09-19 11:13:40 -07:00
Matt Wells
5deda56ede
minor documentation updates.
2013-09-15 22:16:14 -07:00
Matt Wells
3fdbae4b05
admin.html documentation update.
2013-09-15 22:05:01 -07:00
mwells
2211881e59
take apt-get install ssl stuff out of admin.html
installation instructions since we supply the
ssl headers now.
2013-09-15 18:27:47 -06:00
mwells
6332de2daf
added link to compare.html comparison to SOLR
into documentation.
2013-08-21 13:14:17 -06:00
mwells
37a6549a58
updates to developer.html developer
documentation. removed a lot of obsolete
information. still needs more work.
2013-08-21 13:09:55 -06:00
mwells
8971d9b932
comment out urldb from developer.html
since no longer used.
2013-08-21 08:59:51 -06:00
mwells
6cf0497c2c
added a little posdb documentation to
developer.html. posdb replaced indexdb
as the new index because it has word
position info as well as word field info.
2013-08-21 08:40:28 -06:00
mwells
7d3cc672c8
use ./gb blaster -u <fileofurls> to just inject urls,
but use -i to also add the outlinks to spiderdb.
2013-08-19 16:33:27 -06:00
mwells
3550bf2d8a
compare.html update.
2013-08-19 16:21:01 -06:00
mwells
72d7e42497
added a quick start note to admin.html.
2013-08-19 15:34:07 -06:00
mwells
71aa03ab5d
little admin.html update.
2013-08-19 13:45:43 -06:00
mwells
2c83b96ba4
Added support for 'gb blaster -i <fileofurls> <maxThreads>' to
inject/index a file of urls. Committing older work for
compare.html that shows differences between gigablast and solr,
but has a lot of blanks.
2013-08-19 13:26:46 -06:00
mwells
5facc7d859
add injection timing stat point to compare.html
2013-08-17 11:06:24 -06:00
mwells
4092177e5f
added injectme3 file and documentation into compare.html
to describe how to inject a file of concatenated HTML
documents into gb. Still have to find out how to do that
in SOLR and elasticsearch for comparison.
2013-08-17 11:02:26 -06:00