Nov 20 2017 -- A distributed open source search engine and spider/crawler written in C/C++ for Linux on Intel/AMD. From gigablast dot com, which has binaries for download. See the README.md file at the very bottom of this page for instructions.
Matt Wells fe8ebd23a3 added simplified redirect urls to spiderdb
as a new spiderrequest. made XmlDoc::getLinks()
call m_links.set(redirUrl.getUrl()) so that it is
treated like an outlink on the page and gets added
from addOutlinkSpiderRecsToMetaList().
2013-10-17 12:06:12 -07:00
antiword-dir Initial file population. 2013-08-02 13:12:24 -07:00
coll.main.0 crawlbot api work. 2013-10-15 11:54:54 -06:00
html Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
openssl we already include our own 32-bit 2013-09-15 18:25:49 -06:00
ucdata Initial file population. 2013-08-02 13:12:24 -07:00
Abbreviations.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Abbreviations.h change a couple of possible reserved names in C++ 2013-08-28 22:59:01 -06:00
Accessdb.cpp fix addColl() logic for collectionless rdbs 2013-10-16 14:38:09 -07:00
Accessdb.h Initial file population. 2013-08-02 13:12:24 -07:00
Address.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Address.h change a couple of possible reserved names in C++ 2013-08-28 22:59:01 -06:00
addtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Ads.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Ads.h Initial file population. 2013-08-02 13:12:24 -07:00
AdultBit.cpp Initial file population. 2013-08-02 13:12:24 -07:00
AdultBit.h Initial file population. 2013-08-02 13:12:24 -07:00
animate.cpp Initial file population. 2013-08-02 13:12:24 -07:00
antiword Initial file population. 2013-08-02 13:12:24 -07:00
AutoBan.cpp various fixes. 2013-09-16 10:16:49 -07:00
AutoBan.h Initial file population. 2013-08-02 13:12:24 -07:00
badcattable.dat Initial file population. 2013-08-02 13:12:24 -07:00
BigFile.cpp Initial file population. 2013-08-02 13:12:24 -07:00
BigFile.h Initial file population. 2013-08-02 13:12:24 -07:00
Bits.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Bits.h Initial file population. 2013-08-02 13:12:24 -07:00
blaster.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Blaster.cpp use ./gb blaster -u <fileofurls> to just inject urls, 2013-08-19 16:33:27 -06:00
Blaster.h use ./gb blaster -u <fileofurls> to just inject urls, 2013-08-19 16:33:27 -06:00
bmptopnm Initial file population. 2013-08-02 13:12:24 -07:00
Cachedb.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Cachedb.h Initial file population. 2013-08-02 13:12:24 -07:00
camsort.cpp Initial file population. 2013-08-02 13:12:24 -07:00
catcountry.dat Initial file population. 2013-08-02 13:12:24 -07:00
Catdb.cpp fix addColl() logic for collectionless rdbs 2013-10-16 14:38:09 -07:00
Catdb.h Initial file population. 2013-08-02 13:12:24 -07:00
Categories.cpp documentation updates. fixed sd=0. 2013-10-13 14:24:41 -07:00
Categories.h documentation updates. fixed sd=0. 2013-10-13 14:24:41 -07:00
CatRec.cpp fix a couple catdb generation bugs. 2013-10-12 20:33:04 -07:00
CatRec.h Initial file population. 2013-08-02 13:12:24 -07:00
character-sets Initial file population. 2013-08-02 13:12:24 -07:00
check_unicode.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Clusterdb.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Clusterdb.h Initial file population. 2013-08-02 13:12:24 -07:00
Collectiondb.cpp fix crawlbot bugs 2013-10-16 12:12:22 -07:00
Collectiondb.h removed MAX_COLL_RECS so we can have unlimited 2013-08-30 16:20:38 -07:00
CollectionRec.cpp customizable api list in url filters 2013-09-30 09:18:22 -06:00
CollectionRec.h crawlbot api work. 2013-10-15 11:54:54 -06:00
Conf.cpp Merge branch 'master' into diffbot 2013-09-28 13:13:12 -07:00
Conf.h Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
convert.cpp Initial file population. 2013-08-02 13:12:24 -07:00
CountryCode.cpp Initial file population. 2013-08-02 13:12:24 -07:00
CountryCode.h Initial file population. 2013-08-02 13:12:24 -07:00
create_ucd_tables.cpp Initial file population. 2013-08-02 13:12:24 -07:00
DailyMerge.cpp Fixed some bugs. 2013-08-09 08:52:15 -07:00
DailyMerge.h Initial file population. 2013-08-02 13:12:24 -07:00
DataFeed.cpp Initial file population. 2013-08-02 13:12:24 -07:00
DataFeed.h Initial file population. 2013-08-02 13:12:24 -07:00
Datedb.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Datedb.h move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Dates.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Dates.h Initial file population. 2013-08-02 13:12:24 -07:00
Diff.cpp Fixed some bugs. 2013-08-09 08:52:15 -07:00
Diff.h Initial file population. 2013-08-02 13:12:24 -07:00
Dir.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Dir.h Initial file population. 2013-08-02 13:12:24 -07:00
DiskPageCache.cpp Initial file population. 2013-08-02 13:12:24 -07:00
DiskPageCache.h Initial file population. 2013-08-02 13:12:24 -07:00
dlstubs.c Initial file population. 2013-08-02 13:12:24 -07:00
dmozparse.cpp add support for noindex meta tag. 2013-10-12 22:50:23 -07:00
Dns.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Dns.h Initial file population. 2013-08-02 13:12:24 -07:00
DnsProtocol.h Initial file population. 2013-08-02 13:12:24 -07:00
dnstest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Domains.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Domains.h Initial file population. 2013-08-02 13:12:24 -07:00
dumpcore.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Entities.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Entities.h Initial file population. 2013-08-02 13:12:24 -07:00
Errno.cpp add spider reply even on g_errno now with an error 2013-09-29 09:22:20 -06:00
Errno.h add spider reply even on g_errno now with an error 2013-09-29 09:22:20 -06:00
Events.h Initial file population. 2013-08-02 13:12:24 -07:00
Facebook.cpp fix addColl() logic for collectionless rdbs 2013-10-16 14:38:09 -07:00
Facebook.h Initial file population. 2013-08-02 13:12:24 -07:00
fastIndexTable.cpp Initial file population. 2013-08-02 13:12:24 -07:00
fctypes.cpp fix core from calling a gettime related 2013-09-06 15:39:53 -06:00
fctypes.h Initial file population. 2013-08-02 13:12:24 -07:00
File.cpp couple fixes to makefile etc. 2013-09-28 16:37:39 -06:00
File.h Initial file population. 2013-08-02 13:12:24 -07:00
filterquerylogs.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Flags.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Flags.h Initial file population. 2013-08-02 13:12:24 -07:00
gb-include.h Initial file population. 2013-08-02 13:12:24 -07:00
gb.conf speed up dirty word detection since we added a bunch 2013-10-15 22:41:31 -07:00
gb.pem so we have spider https sites add 2013-10-13 00:15:39 -07:00
gbfilter.cpp Initial file population. 2013-08-02 13:12:24 -07:00
gbtitletest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
geneaology.cpp Initial file population. 2013-08-02 13:12:24 -07:00
generateSuperMergeCode.cpp Initial file population. 2013-08-02 13:12:24 -07:00
geo_ip_table.cpp Initial file population. 2013-08-02 13:12:24 -07:00
geo_ip_table.h Initial file population. 2013-08-02 13:12:24 -07:00
GeoIP_internal.h Initial file population. 2013-08-02 13:12:24 -07:00
GeoIP.c Initial file population. 2013-08-02 13:12:24 -07:00
GeoIP.h Initial file population. 2013-08-02 13:12:24 -07:00
GeoIPCity.c Initial file population. 2013-08-02 13:12:24 -07:00
GeoIPCity.h Initial file population. 2013-08-02 13:12:24 -07:00
getsample.cpp Initial file population. 2013-08-02 13:12:24 -07:00
giftopnm Initial file population. 2013-08-02 13:12:24 -07:00
hash.cpp Initial file population. 2013-08-02 13:12:24 -07:00
hash.h get "&site=abc.com+xyz.com"... working to restrict 2013-09-15 20:16:48 -07:00
HashTable.cpp Initial file population. 2013-08-02 13:12:24 -07:00
HashTable.h Initial file population. 2013-08-02 13:12:24 -07:00
HashTableT.cpp Initial file population. 2013-08-02 13:12:24 -07:00
HashTableT.h Initial file population. 2013-08-02 13:12:24 -07:00
HashTableX.cpp spider speedups and fixes. 2013-09-25 11:58:03 -06:00
HashTableX.h spider speedups and fixes. 2013-09-25 11:58:03 -06:00
hashtest2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
hashtest3.cpp Initial file population. 2013-08-02 13:12:24 -07:00
hashtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Highlight.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Highlight.h Initial file population. 2013-08-02 13:12:24 -07:00
Hostdb.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Hostdb.h fix another bug from shard change. 2013-10-04 16:49:50 -07:00
hosts.conf have to use different ports if multiple gb 2013-10-02 16:12:17 -07:00
hosts.cpp Initial file population. 2013-08-02 13:12:24 -07:00
HttpMime.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
HttpMime.h update the dirty word list. but we still 2013-10-15 01:01:19 -07:00
HttpRequest.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
HttpRequest.h Merge branch 'master' into diffbot 2013-09-28 13:13:12 -07:00
HttpServer.cpp added 'gb emailmandrill' for testing. 2013-10-09 17:35:51 -06:00
HttpServer.h add sendEmailThroughMandrill() to send 2013-10-08 18:01:38 -07:00
iana_charset.cpp new crawlbot api. not backwards compatible any more. 2013-09-17 10:25:54 -07:00
iana_charset.h new crawlbot api. not backwards compatible any more. 2013-09-17 10:25:54 -07:00
iconv.h Initial file population. 2013-08-02 13:12:24 -07:00
Images.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Images.h Initial file population. 2013-08-02 13:12:24 -07:00
Indexdb.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Indexdb.h move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
IndexList.cpp Initial file population. 2013-08-02 13:12:24 -07:00
IndexList.h Initial file population. 2013-08-02 13:12:24 -07:00
IndexReadInfo.cpp Initial file population. 2013-08-02 13:12:24 -07:00
IndexReadInfo.h Initial file population. 2013-08-02 13:12:24 -07:00
IndexTable2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
IndexTable2.h Initial file population. 2013-08-02 13:12:24 -07:00
IndexTable.cpp Initial file population. 2013-08-02 13:12:24 -07:00
IndexTable.h Initial file population. 2013-08-02 13:12:24 -07:00
injectme3 added injectme3 file and documentation into compare.html 2013-08-17 11:02:26 -06:00
injector.cpp Initial file population. 2013-08-02 13:12:24 -07:00
iostream.h Initial file population. 2013-08-02 13:12:24 -07:00
ip.cpp Initial file population. 2013-08-02 13:12:24 -07:00
ip.h Initial file population. 2013-08-02 13:12:24 -07:00
ipconfig.cpp fixed some cores. brought in fixes from 2013-09-08 16:16:13 -06:00
Iso8859.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Iso8859.h Initial file population. 2013-08-02 13:12:24 -07:00
jointest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
jpegtopnm Initial file population. 2013-08-02 13:12:24 -07:00
Json.cpp fix crawlbot bugs 2013-10-16 12:12:22 -07:00
Json.h json indexing/hashing updates. 2013-10-16 15:41:12 -07:00
keepalive.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Lang.cpp comment updates 2013-10-15 23:13:50 -07:00
Lang.h Initial file population. 2013-08-02 13:12:24 -07:00
LangList.cpp Initial file population. 2013-08-02 13:12:24 -07:00
LangList.h Initial file population. 2013-08-02 13:12:24 -07:00
Language.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Language.h Initial file population. 2013-08-02 13:12:24 -07:00
LanguageIdentifier.cpp Initial file population. 2013-08-02 13:12:24 -07:00
LanguageIdentifier.h Initial file population. 2013-08-02 13:12:24 -07:00
LanguagePages.cpp Initial file population. 2013-08-02 13:12:24 -07:00
LanguagePages.h Initial file population. 2013-08-02 13:12:24 -07:00
libc.a Initial file population. 2013-08-02 13:12:24 -07:00
libcrypto.a Initial file population. 2013-08-02 13:12:24 -07:00
libgcc.a Initial file population. 2013-08-02 13:12:24 -07:00
libiconv.a Initial file population. 2013-08-02 13:12:24 -07:00
libiconv.la Initial file population. 2013-08-02 13:12:24 -07:00
libm.a Initial file population. 2013-08-02 13:12:24 -07:00
libpthread.a Initial file population. 2013-08-02 13:12:24 -07:00
libssl.a Initial file population. 2013-08-02 13:12:24 -07:00
libstdc++.a Initial file population. 2013-08-02 13:12:24 -07:00
libz.a Initial file population. 2013-08-02 13:12:24 -07:00
LICENSE exclude events and seo functionality. 2013-09-08 17:07:42 -06:00
Linkdb.cpp fix mem leak of LinkInfo. 2013-10-16 17:17:28 -07:00
Linkdb.h fix mem leak of LinkInfo. 2013-10-16 17:17:28 -07:00
LinkedList.h Initial file population. 2013-08-02 13:12:24 -07:00
linkspam.cpp renamed matches.h and matches.cpp to 2013-10-01 07:58:24 -07:00
linkspam.h Initial file population. 2013-08-02 13:12:24 -07:00
Log.cpp fixed up thread/spider log msgs. 2013-08-29 21:15:42 -06:00
Log.h Initial file population. 2013-08-02 13:12:24 -07:00
Loop.cpp cleanup warnings in log. 2013-09-13 14:37:35 -07:00
Loop.h Initial file population. 2013-08-02 13:12:24 -07:00
looptest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
main.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Make.depend json indexing/hashing updates. 2013-10-16 15:41:12 -07:00
Makefile Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
malloc.c Initial file population. 2013-08-02 13:12:24 -07:00
matches2.cpp speed up dirty word detection since we added a bunch 2013-10-15 22:41:31 -07:00
matches2.h renamed matches.h and matches.cpp to 2013-10-01 07:58:24 -07:00
Matches.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Matches.h Initial file population. 2013-08-02 13:12:24 -07:00
Mem.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Mem.h fix typo 2013-09-08 19:51:57 -07:00
membustest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MemPool.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MemPool.h Initial file population. 2013-08-02 13:12:24 -07:00
MemPoolTree.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MemPoolTree.h Initial file population. 2013-08-02 13:12:24 -07:00
memtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
mergetest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MetaContainer.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MetaContainer.h Initial file population. 2013-08-02 13:12:24 -07:00
Mime.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Mime.h Initial file population. 2013-08-02 13:12:24 -07:00
mixfile.cpp Initial file population. 2013-08-02 13:12:24 -07:00
mmseg.h Initial file population. 2013-08-02 13:12:24 -07:00
monitor.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Monitordb.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Monitordb.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg0.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg0.h move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg1.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Msg1.h log fixes for debugging. try to 2013-10-02 22:37:20 -06:00
Msg1f.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg1f.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg2.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg2.h new &sites=xyz.com+abc.com+... functionality compiles ok. 2013-09-15 18:14:32 -06:00
Msg2a.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Msg2a.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg2b.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg2b.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg3.cpp get "&site=abc.com+xyz.com"... working to restrict 2013-09-15 20:16:48 -07:00
Msg3.h almost done adding support for whitelists. 2013-09-15 15:15:56 -06:00
Msg3a.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg3a.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg3e.cpp fix infinite loop from json parsing and 2013-09-27 17:52:36 -06:00
Msg3e.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg4.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg4.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg5.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg5.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg6b.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg6b.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg8b.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Msg8b.h Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Msg9b.cpp fix a couple catdb generation bugs. 2013-10-12 20:33:04 -07:00
Msg9b.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg13.cpp fix core when getting new spider reply 2013-10-04 20:44:29 -07:00
Msg13.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg17.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg17.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg20.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg20.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg22.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg22.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg24.cpp new Make.depend. 2013-08-09 17:13:45 -06:00
Msg28.cpp fix core from (broad)casting valueless cgi field. 2013-10-03 14:51:59 -07:00
Msg28.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg30.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg30.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg35.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg35.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg36.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg36.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg37.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg37.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg39.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg39.h new &sites=xyz.com+abc.com+... functionality compiles ok. 2013-09-15 18:14:32 -06:00
Msg40.cpp trying to bring back dmoz integration. 2013-10-02 22:34:21 -06:00
Msg40.h trying to bring back dmoz integration. 2013-10-02 22:34:21 -06:00
Msg40Cache.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg40Cache.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg42.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg42.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg51.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg51.h Initial file population. 2013-08-02 13:12:24 -07:00
Msgaa.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msgaa.h Initial file population. 2013-08-02 13:12:24 -07:00
MsgC.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
MsgC.h Initial file population. 2013-08-02 13:12:24 -07:00
Msge0.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msge0.h Initial file population. 2013-08-02 13:12:24 -07:00
Msge1.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msge1.h Initial file population. 2013-08-02 13:12:24 -07:00
Multicast.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Multicast.h move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
mysynonyms.txt Initial file population. 2013-08-02 13:12:24 -07:00
numwords.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageAddColl.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageAddUrl.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
PageCatdb.cpp trying to bring back dmoz integration. 2013-10-02 22:34:21 -06:00
PageCrawlBot.cpp fix mem leak of LinkInfo. 2013-10-16 17:17:28 -07:00
PageCrawlBot.h got email and url notification code compiling. 2013-10-01 15:14:39 -06:00
PageDirectory.cpp fix dup bug. 2013-10-13 16:06:38 -07:00
PageEvents.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageGet.cpp show cached json objects as application/json 2013-10-16 17:54:17 -07:00
PageHosts.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
PageIndexdb.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
PageInject.cpp crawlbot fixes. 2013-10-15 16:31:59 -07:00
PageInject.h crawlbot fixes. 2013-10-15 16:31:59 -07:00
PageLogin.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageLogView.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageNetTest.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
PageNetTest.h Initial file population. 2013-08-02 13:12:24 -07:00
PageOverview.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
PageParser.cpp fixed some cores. brought in fixes from 2013-09-08 16:16:13 -06:00
PageParser.h Initial file population. 2013-08-02 13:12:24 -07:00
PagePerf.cpp half way done fixing performance graph. 2013-10-13 22:02:21 -07:00
PageReindex.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageReindex.h Initial file population. 2013-08-02 13:12:24 -07:00
PageResults.cpp make : into . for indexing json names. 2013-10-16 17:43:46 -07:00
PageResults.h added searchbox for dmoz pages/sites. 2013-10-13 15:45:12 -07:00
PageRoot.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Pages.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Pages.h got email and url notification code compiling. 2013-10-01 15:14:39 -06:00
PageSockets.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageSpam.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageStats.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
PageStatsdb.cpp fix potential problem of tons of points in 2013-10-14 22:52:29 -07:00
PageSubmit.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageThesaurus.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageThreads.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageTitledb.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
PageTurk.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageTurk.h Initial file population. 2013-08-02 13:12:24 -07:00
Parms.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Parms.h code checkpoint 2013-10-14 13:00:05 -06:00
parse_iana_charsets.pl Initial file population. 2013-08-02 13:12:24 -07:00
pdftohtml use the "onsite" keyword in your url filters 2013-09-06 09:37:17 -06:00
Phrases.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Phrases.h Initial file population. 2013-08-02 13:12:24 -07:00
PingServer.cpp added 'gb emailmandrill' for testing. 2013-10-09 17:35:51 -06:00
PingServer.h email and webhook alerts when spider runs out of urls 2013-10-09 11:42:56 -07:00
Placedb.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Placedb.h Initial file population. 2013-08-02 13:12:24 -07:00
pngtopnm Initial file population. 2013-08-02 13:12:24 -07:00
pnmscale Initial file population. 2013-08-02 13:12:24 -07:00
Pops.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Pops.h Initial file population. 2013-08-02 13:12:24 -07:00
porter.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Pos.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Pos.h Initial file population. 2013-08-02 13:12:24 -07:00
Posdb.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Posdb.h speed up whitelist hashtable like 20x 2013-09-15 21:10:53 -07:00
postalCodes.txt Initial file population. 2013-08-02 13:12:24 -07:00
PostQueryRerank.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PostQueryRerank.h Initial file population. 2013-08-02 13:12:24 -07:00
ppmtojpeg Initial file population. 2013-08-02 13:12:24 -07:00
ppthtml Initial file population. 2013-08-02 13:12:24 -07:00
Process.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Process.h Initial file population. 2013-08-02 13:12:24 -07:00
Profiler.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Profiler.h Initial file population. 2013-08-02 13:12:24 -07:00
Proxy.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Proxy.h Initial file population. 2013-08-02 13:12:24 -07:00
pstotext Initial file population. 2013-08-02 13:12:24 -07:00
QAClient.cpp Initial file population. 2013-08-02 13:12:24 -07:00
QAClient.h Initial file population. 2013-08-02 13:12:24 -07:00
quarantine.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Query.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Query.h Initial file population. 2013-08-02 13:12:24 -07:00
Rdb.cpp fix addColl() logic for collectionless rdbs 2013-10-16 14:38:09 -07:00
Rdb.h Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
RdbBase.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
RdbBase.h Initial file population. 2013-08-02 13:12:24 -07:00
RdbBuckets.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RdbBuckets.h Initial file population. 2013-08-02 13:12:24 -07:00
RdbCache.cpp removed MAX_COLL_RECS so we can have unlimited 2013-08-30 16:20:38 -07:00
RdbCache.h removed MAX_COLL_RECS so we can have unlimited 2013-08-30 16:20:38 -07:00
RdbDump.cpp take out log msg 2013-10-09 11:51:39 -07:00
RdbDump.h Initial file population. 2013-08-02 13:12:24 -07:00
RdbList.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RdbList.h Initial file population. 2013-08-02 13:12:24 -07:00
RdbMap.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RdbMap.h Initial file population. 2013-08-02 13:12:24 -07:00
RdbMem.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RdbMem.h Initial file population. 2013-08-02 13:12:24 -07:00
RdbMerge.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RdbMerge.h Initial file population. 2013-08-02 13:12:24 -07:00
RdbScan.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RdbScan.h Initial file population. 2013-08-02 13:12:24 -07:00
rdbtest2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
rdbtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RdbTree.cpp quite a few fixes. something still 2013-09-27 21:00:40 -06:00
RdbTree.h quite a few fixes. something still 2013-09-27 21:00:40 -06:00
README.md updated README.md to reference compare.html 2013-08-19 17:20:30 -06:00
readRec.cpp Initial file population. 2013-08-02 13:12:24 -07:00
reindex2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Repair.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Repair.h Initial file population. 2013-08-02 13:12:24 -07:00
RequestTable.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RequestTable.h Initial file population. 2013-08-02 13:12:24 -07:00
rescue.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Revdb.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Revdb.h Initial file population. 2013-08-02 13:12:24 -07:00
rmbots.cpp Initial file population. 2013-08-02 13:12:24 -07:00
SafeBuf.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
SafeBuf.h Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
SafeList.h Initial file population. 2013-08-02 13:12:24 -07:00
Sanity.h Initial file population. 2013-08-02 13:12:24 -07:00
Scores.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Scores.h Initial file population. 2013-08-02 13:12:24 -07:00
Scraper.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Scraper.h Initial file population. 2013-08-02 13:12:24 -07:00
SearchInput.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
SearchInput.h Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Sections.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Sections.h make sections grow dynamically so we do not 2013-10-06 11:04:10 -06:00
seektest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
seo.h Initial file population. 2013-08-02 13:12:24 -07:00
SiteGetter.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
SiteGetter.h Initial file population. 2013-08-02 13:12:24 -07:00
sleepandlog.cpp Initial file population. 2013-08-02 13:12:24 -07:00
sort.cpp Initial file population. 2013-08-02 13:12:24 -07:00
sort.h Initial file population. 2013-08-02 13:12:24 -07:00
Speller.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Speller.h Initial file population. 2013-08-02 13:12:24 -07:00
Spider.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Spider.h Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Stats.cpp start using html div graph for 2013-10-14 20:35:45 -07:00
Stats.h remove old libplotter references 2013-10-13 23:48:07 -07:00
Statsdb.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Statsdb.h fix potential problem of tons of points in 2013-10-14 22:52:29 -07:00
StopWords.cpp Initial file population. 2013-08-02 13:12:24 -07:00
StopWords.h Initial file population. 2013-08-02 13:12:24 -07:00
streambuf.h Initial file population. 2013-08-02 13:12:24 -07:00
Strings.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Strings.h Initial file population. 2013-08-02 13:12:24 -07:00
Summary.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Summary.h renamed matches.h and matches.cpp to 2013-10-01 07:58:24 -07:00
superMergeTest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
supported_charsets.cpp Initial file population. 2013-08-02 13:12:24 -07:00
supported_charsets.txt Initial file population. 2013-08-02 13:12:24 -07:00
Syncdb.cpp fix addColl() logic for collectionless rdbs 2013-10-16 14:38:09 -07:00
Syncdb.h Initial file population. 2013-08-02 13:12:24 -07:00
Synonyms.cpp fix core from hashtablex::set() not getting 2013-09-15 21:15:58 -07:00
Synonyms.h Initial file population. 2013-08-02 13:12:24 -07:00
Tagdb.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Tagdb.h Initial file population. 2013-08-02 13:12:24 -07:00
TcpServer.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
TcpServer.h Initial file population. 2013-08-02 13:12:24 -07:00
TcpSocket.h integrate diffbot from svn back into git. 2013-09-13 09:23:18 -07:00
test2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_convert.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_hash.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_norm.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_parser2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_parser.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_unicode.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Test.cpp email and webhook alerts when spider runs out of urls 2013-10-09 11:42:56 -07:00
Test.h Initial file population. 2013-08-02 13:12:24 -07:00
testfloats.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Tfndb.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Tfndb.h Initial file population. 2013-08-02 13:12:24 -07:00
Thesaurus.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Thesaurus.h Initial file population. 2013-08-02 13:12:24 -07:00
Threads.cpp cleanup warnings in log. 2013-09-13 14:37:35 -07:00
Threads.h when using pthreads block SIGIO 2013-08-21 15:01:26 -06:00
threadtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
thunder.cpp Initial file population. 2013-08-02 13:12:24 -07:00
tifftopnm Initial file population. 2013-08-02 13:12:24 -07:00
Timedb.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Timedb.h Initial file population. 2013-08-02 13:12:24 -07:00
Timer.h Initial file population. 2013-08-02 13:12:24 -07:00
Title.cpp Merge branch 'master' into diffbot 2013-09-16 09:05:37 -07:00
Title.h Initial file population. 2013-08-02 13:12:24 -07:00
Titledb.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Titledb.h move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
TopTree.cpp Initial file population. 2013-08-02 13:12:24 -07:00
TopTree.h Initial file population. 2013-08-02 13:12:24 -07:00
treetest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
TuringTest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
TuringTest.h Initial file population. 2013-08-02 13:12:24 -07:00
Turkdb.cpp Initial file population. 2013-08-02 13:12:24 -07:00
types.h fix compiler warning in types.h. 2013-09-08 20:00:52 -06:00
UCNormalizer.cpp Initial file population. 2013-08-02 13:12:24 -07:00
UCNormalizer.h Initial file population. 2013-08-02 13:12:24 -07:00
UCPropTable.cpp Initial file population. 2013-08-02 13:12:24 -07:00
UCPropTable.h Initial file population. 2013-08-02 13:12:24 -07:00
UCWordIterator.cpp Initial file population. 2013-08-02 13:12:24 -07:00
UCWordIterator.h Initial file population. 2013-08-02 13:12:24 -07:00
UdpProtocol.h Initial file population. 2013-08-02 13:12:24 -07:00
UdpServer.cpp fix core from trying to get the time 2013-09-01 12:55:22 -06:00
UdpServer.h Initial file population. 2013-08-02 13:12:24 -07:00
UdpSlot.cpp Initial file population. 2013-08-02 13:12:24 -07:00
UdpSlot.h Initial file population. 2013-08-02 13:12:24 -07:00
udptest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Unicode.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Unicode.h Initial file population. 2013-08-02 13:12:24 -07:00
UnicodeProperties.cpp Initial file population. 2013-08-02 13:12:24 -07:00
UnicodeProperties.h Initial file population. 2013-08-02 13:12:24 -07:00
unifiedDict.txt Initial file population. 2013-08-02 13:12:24 -07:00
uniq2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Url.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Url.h Initial file population. 2013-08-02 13:12:24 -07:00
urlinfo.cpp just ignore all urls with # (hashtag) in them 2013-10-03 23:33:55 -06:00
Users.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Users.h Initial file population. 2013-08-02 13:12:24 -07:00
ValidPointer.cpp Initial file population. 2013-08-02 13:12:24 -07:00
ValidPointer.h Initial file population. 2013-08-02 13:12:24 -07:00
Vector.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Vector.h Initial file population. 2013-08-02 13:12:24 -07:00
Weights.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Weights.h Initial file population. 2013-08-02 13:12:24 -07:00
Wiki.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Wiki.h Initial file population. 2013-08-02 13:12:24 -07:00
wikititles.txt.part1 Initial file population. 2013-08-02 13:12:24 -07:00
wikititles.txt.part2 Initial file population. 2013-08-02 13:12:24 -07:00
wiktionary-buf.txt Initial file population. 2013-08-02 13:12:24 -07:00
wiktionary-lang.txt Initial file population. 2013-08-02 13:12:24 -07:00
wiktionary-syns.dat Initial file population. 2013-08-02 13:12:24 -07:00
Wiktionary.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Wiktionary.h Initial file population. 2013-08-02 13:12:24 -07:00
Words.cpp speed up whitelist hashtable like 20x 2013-09-15 21:10:53 -07:00
Words.h Initial file population. 2013-08-02 13:12:24 -07:00
Xml.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Xml.h Initial file population. 2013-08-02 13:12:24 -07:00
XmlDoc.cpp added simplified redirect urls to spiderdb 2013-10-17 12:06:12 -07:00
XmlDoc.h added simplified redirect urls to spiderdb 2013-10-17 12:06:12 -07:00
XmlNode.cpp Initial file population. 2013-08-02 13:12:24 -07:00
XmlNode.h Initial file population. 2013-08-02 13:12:24 -07:00
zconf.h Initial file population. 2013-08-02 13:12:24 -07:00
zlib.h Initial file population. 2013-08-02 13:12:24 -07:00

open-source-search-engine

An open source web and enterprise search engine. It can be seen in action at http://www.gigablast.com/

RUNNING GIGABLAST

See html/admin.html for all administrative documentation including the quick start instructions.

Alternatively, visit http://www.gigablast.com/admin.html

See html/compare.html for a comparison of Gigablast to Solr. The comparison is very sparse right now, but it does include some useful commands.

CODE ARCHITECTURE

See html/developer.html for all code documentation.

Alternatively, visit http://www.gigablast.com/developer.html

CONTACT

Contact me at mattdwells@hotmail.com for feature requests or general help. I will work for free on good use cases.