Commit Graph

58 Commits

Author SHA1 Message Date
Matt Wells
f9f73dae65 fixed core from null json 2013-12-14 15:19:52 -07:00
Matt Wells
d85dbfb8e7 do not use safebuf in thread 2013-12-12 10:15:02 -07:00
Matt Wells
6f6c4aed84 minor admin.html edit. 2013-12-10 10:39:38 -07:00
Matt Wells
1a7d5e389b very minor admin.html edit 2013-12-10 00:56:56 -07:00
Matt Wells
ec2254d8ed added multi language support note to admin.html 2013-12-09 23:18:33 -07:00
Matt Wells
f7e7acb398 minor log msg updates.
updated admin.html to give some performance and
storage capacity info.
2013-12-09 23:16:24 -07:00
Matt Wells
0dcd1211d3 new opensource icon. 2013-12-08 19:47:39 -07:00
Matt Wells
92e3d841a6 minor update 2013-12-08 19:28:45 -07:00
Matt Wells
12404b4f85 doc updates 2013-12-08 19:26:48 -07:00
Matt Wells
25dd764dac Merge branch 'master' into diffbot
Conflicts:
	Makefile
	PageResults.cpp
2013-11-18 16:59:33 -08:00
Matt Wells
e27646c088 cleanup fixes. 2013-11-15 15:01:56 -07:00
Matt Wells
5e30728a3a new graphic icons. minor clean ups. 2013-11-15 14:47:05 -07:00
Matt Wells
afb5a2be64 Merge branch 'master' into diffbot 2013-11-06 10:18:04 -08:00
Matt Wells
5a5973a47f privacy.html update 2013-11-05 09:33:42 -07:00
Matt Wells
e4cce243de minor documentation updates 2013-10-30 19:43:35 -07:00
Matt Wells
240da39873 Merge branch 'master' into diffbot 2013-10-25 12:32:02 -07:00
Matt Wells
033a6ec578 update robots.txt 2013-10-21 22:11:11 -07:00
Matt Wells
dae005e4ae ensure dmoz info valid when making titlerec 2013-10-16 14:53:48 -07:00
Matt Wells
fc17521697 Merge branch 'master' into diffbot
Conflicts:
	Hostdb.cpp
	Makefile
	PageResults.cpp
	PageRoot.cpp
	Pages.cpp
	Rdb.cpp
	SearchInput.cpp
	SearchInput.h
	Spider.cpp
	Spider.h
	XmlDoc.cpp
2013-10-16 14:28:42 -07:00
mwells
876af6d8c6 dmoz support is now updated and re-integrated. 2013-10-13 16:53:28 -07:00
mwells
b60bdcc038 documentation updates. fixed sd=0. 2013-10-13 14:24:41 -07:00
mwells
c949bfe315 ignore certain errors and index the doc anyway
so we at least have it in our dmoz index with its
designated title and summary from dmoz.
2013-10-13 00:02:25 -07:00
mwells
63c7764cd1 c=dmoz3 to c=dmoz 2013-10-06 17:12:45 -07:00
mwells
183b7c372e make sections grow dynamically so we do not
OOM when trying to index a gbdmoz.urls.txt.* file
which can be 25MB.
2013-10-06 11:04:10 -06:00
mwells
612f2872f7 use addurl to add the gbdmoz url
files to gigablast. it should index
just those dmoz urls, and not spider their links.
it should ignore external errors like
ETCPTIMEDOUT when indexing so it will be
identical to dmoz.
2013-10-05 23:22:51 -06:00
mwells
d464066da4 use catdb/ not cat/ 2013-10-04 22:39:41 -06:00
mwells
71d5d05f7c use catdb/ subdir not cat/ for consistency. 2013-10-04 21:35:13 -06:00
mwells
78c4bda368 fix dmozparse urldump -s bugs
for dumping out urls in dmoz.
2013-10-04 00:00:26 -06:00
mwells
a0c79932bb catdb is now generated successfully. 2013-10-02 23:36:49 -06:00
mwells
43e4c939eb Merge branch 'master' into diffbot
Conflicts:
	Make.depend
2013-10-02 13:15:07 -06:00
mwells
c03e862b99 use a better version of hosts.conf where we
specify the working directory for each host
entry. then we can use the exact same hosts.conf
file for each gb instance rather than having to
change the single "working-dir:" directive for
each instance, in the case where they each have
a different working directory.
2013-10-02 13:11:58 -06:00
Matt Wells
c0f1330d70 Merge branch 'master' into diffbot
Conflicts:
	HttpServer.cpp
	Makefile
	PageGet.cpp
	Pages.h
	SafeBuf.h
2013-09-28 13:13:12 -07:00
mwells
5a52072888 recommended SSDs for optimal performance in admin.html. 2013-09-28 14:02:02 -06:00
mwells
f043cc67e4 Merge branch 'master' into diffbot
Conflicts:
	Spider.cpp
2013-09-26 22:43:27 -06:00
Matt Wells
fbd62cecba updated compilation instructions. need
to apt-get install gcc-multilib.
2013-09-20 10:06:01 -07:00
Matt Wells
47465f6d90 more fixes. trying to fix spiders to
spider multiple urls from same ip...
2013-09-19 11:13:40 -07:00
Matt Wells
5deda56ede minor documentation updates. 2013-09-15 22:16:14 -07:00
Matt Wells
3fdbae4b05 admin.html documentation update. 2013-09-15 22:05:01 -07:00
mwells
2211881e59 take apt-get install ssl stuff out of admin.html
installation instructions since we supply the
ssl headers now.
2013-09-15 18:27:47 -06:00
mwells
6332de2daf added link to compare.html comparison to SOLR
into documentation.
2013-08-21 13:14:17 -06:00
mwells
37a6549a58 updates to developer.html developer
documentation. removed a lot of obsolete
information. still needs more work.
2013-08-21 13:09:55 -06:00
mwells
8971d9b932 comment out urldb from developer.html
since no longer used.
2013-08-21 08:59:51 -06:00
mwells
6cf0497c2c added a little posdb documentation to
developer.html. posdb replaced indexdb
as the new index because it has word
position info as well as word field info.
2013-08-21 08:40:28 -06:00
mwells
7d3cc672c8 use ./gb blaster -u <fileofurls> to just inject urls,
but use -i to also add the outlinks to spiderdb.
2013-08-19 16:33:27 -06:00
mwells
3550bf2d8a compare.html update. 2013-08-19 16:21:01 -06:00
mwells
72d7e42497 added a quick start note to admin.html. 2013-08-19 15:34:07 -06:00
mwells
71aa03ab5d little admin.html update. 2013-08-19 13:45:43 -06:00
mwells
2c83b96ba4 Added support for 'gb blaster -i <fileofurls> <maxThreads>' to
inject/index a file of urls. Committing older work for
compare.html that shows differences between gigablast and solr,
but has a lot of blanks.
2013-08-19 13:26:46 -06:00
mwells
5facc7d859 add injection timing stat point to compare.html 2013-08-17 11:06:24 -06:00
mwells
4092177e5f added injectme3 file and documentation into compare.html
to describe how to inject a file of concatenated HTML
documents into gb. Still have to find out how to do that
in SOLR and elasticsearch for comparison.
2013-08-17 11:02:26 -06:00