Nov 20 2017 -- A distributed open source search engine and spider/crawler written in C/C++ for Linux on Intel/AMD. Binaries are available for download from gigablast dot com. See the README.md file at the very bottom of this page for instructions.
Matt Wells b83dd59913 fix bug when we nuke a collnum from a tree right in the middle of when saving rdb trees in process.cpp. 2013-10-30 12:27:08 -07:00
antiword-dir Initial file population. 2013-08-02 13:12:24 -07:00
coll.main.0 added "retrictDomain" parm which defaults to 1. 2013-10-29 09:31:57 -07:00
html Merge branch 'master' into diffbot 2013-10-25 12:32:02 -07:00
openssl we already include our own 32-bit 2013-09-15 18:25:49 -06:00
ucdata Initial file population. 2013-08-02 13:12:24 -07:00
Abbreviations.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Abbreviations.h change a couple of possible reserved names in C++ 2013-08-28 22:59:01 -06:00
Accessdb.cpp fix addColl() logic for collectionless rdbs 2013-10-16 14:38:09 -07:00
Accessdb.h Initial file population. 2013-08-02 13:12:24 -07:00
Address.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Address.h change a couple of possible reserved names in C++ 2013-08-28 22:59:01 -06:00
addtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Ads.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Ads.h Initial file population. 2013-08-02 13:12:24 -07:00
AdultBit.cpp Initial file population. 2013-08-02 13:12:24 -07:00
AdultBit.h Initial file population. 2013-08-02 13:12:24 -07:00
animate.cpp Initial file population. 2013-08-02 13:12:24 -07:00
antiword Initial file population. 2013-08-02 13:12:24 -07:00
AutoBan.cpp various fixes. 2013-09-16 10:16:49 -07:00
AutoBan.h Initial file population. 2013-08-02 13:12:24 -07:00
badcattable.dat Initial file population. 2013-08-02 13:12:24 -07:00
BigFile.cpp Initial file population. 2013-08-02 13:12:24 -07:00
BigFile.h Initial file population. 2013-08-02 13:12:24 -07:00
Bits.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Bits.h Initial file population. 2013-08-02 13:12:24 -07:00
blaster.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Blaster.cpp use ./gb blaster -u <fileofurls> to just inject urls, 2013-08-19 16:33:27 -06:00
Blaster.h use ./gb blaster -u <fileofurls> to just inject urls, 2013-08-19 16:33:27 -06:00
bmptopnm Initial file population. 2013-08-02 13:12:24 -07:00
Cachedb.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Cachedb.h Initial file population. 2013-08-02 13:12:24 -07:00
camsort.cpp Initial file population. 2013-08-02 13:12:24 -07:00
catcountry.dat Initial file population. 2013-08-02 13:12:24 -07:00
Catdb.cpp fix addColl() logic for collectionless rdbs 2013-10-16 14:38:09 -07:00
Catdb.h Initial file population. 2013-08-02 13:12:24 -07:00
Categories.cpp documentation updates. fixed sd=0. 2013-10-13 14:24:41 -07:00
Categories.h documentation updates. fixed sd=0. 2013-10-13 14:24:41 -07:00
CatRec.cpp fix a couple catdb generation bugs. 2013-10-12 20:33:04 -07:00
CatRec.h Initial file population. 2013-08-02 13:12:24 -07:00
character-sets Initial file population. 2013-08-02 13:12:24 -07:00
check_unicode.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Clusterdb.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Clusterdb.h Initial file population. 2013-08-02 13:12:24 -07:00
Collectiondb.cpp fix bug when we nuke a collnum 2013-10-30 12:27:08 -07:00
Collectiondb.h removed MAX_COLL_RECS so we can have unlimited 2013-08-30 16:20:38 -07:00
CollectionRec.cpp better crawl status reporting. 2013-10-30 10:00:46 -07:00
CollectionRec.h better crawl status reporting. 2013-10-30 10:00:46 -07:00
Conf.cpp Merge branch 'master' into diffbot 2013-09-28 13:13:12 -07:00
Conf.h Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
convert.cpp Initial file population. 2013-08-02 13:12:24 -07:00
CountryCode.cpp Initial file population. 2013-08-02 13:12:24 -07:00
CountryCode.h Initial file population. 2013-08-02 13:12:24 -07:00
create_ucd_tables.cpp Initial file population. 2013-08-02 13:12:24 -07:00
DailyMerge.cpp Fixed some bugs. 2013-08-09 08:52:15 -07:00
DailyMerge.h Initial file population. 2013-08-02 13:12:24 -07:00
DataFeed.cpp Initial file population. 2013-08-02 13:12:24 -07:00
DataFeed.h Initial file population. 2013-08-02 13:12:24 -07:00
Datedb.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Datedb.h move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Dates.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Dates.h Initial file population. 2013-08-02 13:12:24 -07:00
Diff.cpp Fixed some bugs. 2013-08-09 08:52:15 -07:00
Diff.h Initial file population. 2013-08-02 13:12:24 -07:00
Dir.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Dir.h Initial file population. 2013-08-02 13:12:24 -07:00
DiskPageCache.cpp Initial file population. 2013-08-02 13:12:24 -07:00
DiskPageCache.h Initial file population. 2013-08-02 13:12:24 -07:00
dlstubs.c Initial file population. 2013-08-02 13:12:24 -07:00
dmozparse.cpp add support for noindex meta tag. 2013-10-12 22:50:23 -07:00
Dns.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Dns.h Initial file population. 2013-08-02 13:12:24 -07:00
DnsProtocol.h Initial file population. 2013-08-02 13:12:24 -07:00
dnstest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Domains.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Domains.h Initial file population. 2013-08-02 13:12:24 -07:00
dumpcore.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Entities.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Entities.h Initial file population. 2013-08-02 13:12:24 -07:00
Errno.cpp add spider reply even on g_errno now with an error 2013-09-29 09:22:20 -06:00
Errno.h add spider reply even on g_errno now with an error 2013-09-29 09:22:20 -06:00
Events.h Initial file population. 2013-08-02 13:12:24 -07:00
Facebook.cpp fix addColl() logic for collectionless rdbs 2013-10-16 14:38:09 -07:00
Facebook.h Initial file population. 2013-08-02 13:12:24 -07:00
fastIndexTable.cpp Initial file population. 2013-08-02 13:12:24 -07:00
fctypes.cpp fix core from calling a gettime related 2013-09-06 15:39:53 -06:00
fctypes.h Initial file population. 2013-08-02 13:12:24 -07:00
File.cpp couple fixes to makefile etc. 2013-09-28 16:37:39 -06:00
File.h Initial file population. 2013-08-02 13:12:24 -07:00
filterquerylogs.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Flags.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Flags.h Initial file population. 2013-08-02 13:12:24 -07:00
gb-include.h Initial file population. 2013-08-02 13:12:24 -07:00
gb.conf fix respider frequency bug. 2013-10-21 15:06:23 -07:00
gb.pem so we have spider https sites add 2013-10-13 00:15:39 -07:00
gbfilter.cpp Initial file population. 2013-08-02 13:12:24 -07:00
gbtitletest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
geneaology.cpp Initial file population. 2013-08-02 13:12:24 -07:00
generateSuperMergeCode.cpp Initial file population. 2013-08-02 13:12:24 -07:00
geo_ip_table.cpp Initial file population. 2013-08-02 13:12:24 -07:00
geo_ip_table.h Initial file population. 2013-08-02 13:12:24 -07:00
GeoIP_internal.h Initial file population. 2013-08-02 13:12:24 -07:00
GeoIP.c Initial file population. 2013-08-02 13:12:24 -07:00
GeoIP.h Initial file population. 2013-08-02 13:12:24 -07:00
GeoIPCity.c Initial file population. 2013-08-02 13:12:24 -07:00
GeoIPCity.h Initial file population. 2013-08-02 13:12:24 -07:00
getsample.cpp Initial file population. 2013-08-02 13:12:24 -07:00
giftopnm Initial file population. 2013-08-02 13:12:24 -07:00
hash.cpp Initial file population. 2013-08-02 13:12:24 -07:00
hash.h get "&site=abc.com+xyz.com"... working to restrict 2013-09-15 20:16:48 -07:00
HashTable.cpp Initial file population. 2013-08-02 13:12:24 -07:00
HashTable.h Initial file population. 2013-08-02 13:12:24 -07:00
HashTableT.cpp Initial file population. 2013-08-02 13:12:24 -07:00
HashTableT.h Initial file population. 2013-08-02 13:12:24 -07:00
HashTableX.cpp spider speedups and fixes. 2013-09-25 11:58:03 -06:00
HashTableX.h spider speedups and fixes. 2013-09-25 11:58:03 -06:00
hashtest2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
hashtest3.cpp Initial file population. 2013-08-02 13:12:24 -07:00
hashtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Highlight.cpp trying to fix json decoding bug. 2013-10-24 17:55:01 -07:00
Highlight.h trying to fix json decoding bug. 2013-10-24 17:55:01 -07:00
Hostdb.cpp num-mirrors: updates 2013-10-24 14:59:35 -07:00
Hostdb.h fix another bug from shard change. 2013-10-04 16:49:50 -07:00
hosts.conf minor msg update 2013-10-29 15:26:32 -07:00
hosts.cpp Initial file population. 2013-08-02 13:12:24 -07:00
HttpMime.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
HttpMime.h update the dirty word list. but we still 2013-10-15 01:01:19 -07:00
HttpRequest.cpp made webhook return the crawl name 2013-10-28 22:03:10 -07:00
HttpRequest.h Merge branch 'master' into diffbot 2013-09-28 13:13:12 -07:00
HttpServer.cpp /v2/bulk api fixes 2013-10-22 18:51:09 -07:00
HttpServer.h add sendEmailThroughMandrill() to send 2013-10-08 18:01:38 -07:00
iana_charset.cpp new crawlbot api. not backwards compatible any more. 2013-09-17 10:25:54 -07:00
iana_charset.h new crawlbot api. not backwards compatible any more. 2013-09-17 10:25:54 -07:00
iconv.h Initial file population. 2013-08-02 13:12:24 -07:00
Images.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Images.h Initial file population. 2013-08-02 13:12:24 -07:00
Indexdb.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Indexdb.h move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
IndexList.cpp Initial file population. 2013-08-02 13:12:24 -07:00
IndexList.h Initial file population. 2013-08-02 13:12:24 -07:00
IndexReadInfo.cpp Initial file population. 2013-08-02 13:12:24 -07:00
IndexReadInfo.h Initial file population. 2013-08-02 13:12:24 -07:00
IndexTable2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
IndexTable2.h Initial file population. 2013-08-02 13:12:24 -07:00
IndexTable.cpp Initial file population. 2013-08-02 13:12:24 -07:00
IndexTable.h Initial file population. 2013-08-02 13:12:24 -07:00
injectme3 added injectme3 file and documentation into compare.html 2013-08-17 11:02:26 -06:00
injector.cpp Initial file population. 2013-08-02 13:12:24 -07:00
iostream.h Initial file population. 2013-08-02 13:12:24 -07:00
ip.cpp Initial file population. 2013-08-02 13:12:24 -07:00
ip.h Initial file population. 2013-08-02 13:12:24 -07:00
ipconfig.cpp fixed some cores. brought in fixes from 2013-09-08 16:16:13 -06:00
Iso8859.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Iso8859.h Initial file population. 2013-08-02 13:12:24 -07:00
jointest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
jpegtopnm Initial file population. 2013-08-02 13:12:24 -07:00
Json.cpp a few bug fixes 2013-10-17 18:59:00 -07:00
Json.h a few bug fixes 2013-10-17 18:59:00 -07:00
keepalive.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Lang.cpp comment updates 2013-10-15 23:13:50 -07:00
Lang.h Initial file population. 2013-08-02 13:12:24 -07:00
LangList.cpp Initial file population. 2013-08-02 13:12:24 -07:00
LangList.h Initial file population. 2013-08-02 13:12:24 -07:00
Language.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Language.h Initial file population. 2013-08-02 13:12:24 -07:00
LanguageIdentifier.cpp Initial file population. 2013-08-02 13:12:24 -07:00
LanguageIdentifier.h Initial file population. 2013-08-02 13:12:24 -07:00
LanguagePages.cpp Initial file population. 2013-08-02 13:12:24 -07:00
LanguagePages.h Initial file population. 2013-08-02 13:12:24 -07:00
libc.a Initial file population. 2013-08-02 13:12:24 -07:00
libcrypto.a Initial file population. 2013-08-02 13:12:24 -07:00
libgcc.a Initial file population. 2013-08-02 13:12:24 -07:00
libiconv.a Initial file population. 2013-08-02 13:12:24 -07:00
libiconv.la Initial file population. 2013-08-02 13:12:24 -07:00
libm.a Initial file population. 2013-08-02 13:12:24 -07:00
libpthread.a Initial file population. 2013-08-02 13:12:24 -07:00
libssl.a Initial file population. 2013-08-02 13:12:24 -07:00
libstdc++.a Initial file population. 2013-08-02 13:12:24 -07:00
libz.a Initial file population. 2013-08-02 13:12:24 -07:00
LICENSE exclude events and seo functionality. 2013-09-08 17:07:42 -06:00
Linkdb.cpp fix mem leak of LinkInfo. 2013-10-16 17:17:28 -07:00
Linkdb.h fix mem leak of LinkInfo. 2013-10-16 17:17:28 -07:00
LinkedList.h Initial file population. 2013-08-02 13:12:24 -07:00
linkspam.cpp renamed matches.h and matches.cpp to 2013-10-01 07:58:24 -07:00
linkspam.h Initial file population. 2013-08-02 13:12:24 -07:00
Log.cpp fixed up thread/spider log msgs. 2013-08-29 21:15:42 -06:00
Log.h Initial file population. 2013-08-02 13:12:24 -07:00
Loop.cpp cleanup warnings in log. 2013-09-13 14:37:35 -07:00
Loop.h Initial file population. 2013-08-02 13:12:24 -07:00
looptest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
main.cpp Merge branch 'master' into diffbot 2013-10-25 12:32:02 -07:00
Make.depend fix collection resetting. 2013-10-18 15:21:00 -07:00
Makefile spider round updates correction 2013-10-17 17:18:05 -07:00
malloc.c Initial file population. 2013-08-02 13:12:24 -07:00
matches2.cpp dirty word detector revisions. we need 2013-10-16 20:19:49 -07:00
matches2.h renamed matches.h and matches.cpp to 2013-10-01 07:58:24 -07:00
Matches.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Matches.h Initial file population. 2013-08-02 13:12:24 -07:00
Mem.cpp fix crawl round end detection etc. 2013-10-23 15:53:59 -07:00
Mem.h fix typo 2013-09-08 19:51:57 -07:00
membustest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MemPool.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MemPool.h Initial file population. 2013-08-02 13:12:24 -07:00
MemPoolTree.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MemPoolTree.h Initial file population. 2013-08-02 13:12:24 -07:00
memtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
mergetest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MetaContainer.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MetaContainer.h Initial file population. 2013-08-02 13:12:24 -07:00
Mime.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Mime.h Initial file population. 2013-08-02 13:12:24 -07:00
mixfile.cpp Initial file population. 2013-08-02 13:12:24 -07:00
mmseg.h Initial file population. 2013-08-02 13:12:24 -07:00
monitor.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Monitordb.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Monitordb.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg0.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg0.h move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg1.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Msg1.h log fixes for debugging. try to 2013-10-02 22:37:20 -06:00
Msg1f.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg1f.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg2.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg2.h new &sites=xyz.com+abc.com+... functionality compiles ok. 2013-09-15 18:14:32 -06:00
Msg2a.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Msg2a.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg2b.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg2b.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg3.cpp get "&site=abc.com+xyz.com"... working to restrict 2013-09-15 20:16:48 -07:00
Msg3.h almost done adding support for whitelists. 2013-09-15 15:15:56 -06:00
Msg3a.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg3a.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg3e.cpp fix infinite loop from json parsing and 2013-09-27 17:52:36 -06:00
Msg3e.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg4.cpp fix double round increment bug. 2013-10-24 14:05:39 -07:00
Msg4.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg5.cpp try to fix core from spiderdb scan coming back to 2013-10-29 16:51:21 -07:00
Msg5.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg6b.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg6b.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg8b.cpp fixes when crawling on distributed 2x2 2013-10-25 14:54:24 -07:00
Msg8b.h Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Msg9b.cpp fix a couple catdb generation bugs. 2013-10-12 20:33:04 -07:00
Msg9b.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg13.cpp fix core when getting new spider reply 2013-10-04 20:44:29 -07:00
Msg13.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg17.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg17.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg20.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg20.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg22.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg22.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg24.cpp new Make.depend. 2013-08-09 17:13:45 -06:00
Msg28.cpp fix core from (broad)casting valueless cgi field. 2013-10-03 14:51:59 -07:00
Msg28.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg30.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg30.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg35.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg35.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg36.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg36.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg37.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg37.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg39.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg39.h new &sites=xyz.com+abc.com+... functionality compiles ok. 2013-09-15 18:14:32 -06:00
Msg40.cpp trying to bring back dmoz integration. 2013-10-02 22:34:21 -06:00
Msg40.h trying to bring back dmoz integration. 2013-10-02 22:34:21 -06:00
Msg40Cache.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg40Cache.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg42.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg42.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg51.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg51.h Initial file population. 2013-08-02 13:12:24 -07:00
Msgaa.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msgaa.h Initial file population. 2013-08-02 13:12:24 -07:00
MsgC.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
MsgC.h Initial file population. 2013-08-02 13:12:24 -07:00
Msge0.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msge0.h Initial file population. 2013-08-02 13:12:24 -07:00
Msge1.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msge1.h Initial file population. 2013-08-02 13:12:24 -07:00
Multicast.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Multicast.h move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
mysynonyms.txt Initial file population. 2013-08-02 13:12:24 -07:00
numwords.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageAddColl.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageAddUrl.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
PageCatdb.cpp trying to bring back dmoz integration. 2013-10-02 22:34:21 -06:00
PageCrawlBot.cpp better crawl status reporting. 2013-10-30 10:00:46 -07:00
PageCrawlBot.h added "seeds" to json reply. store seed urls 2013-10-21 17:35:14 -07:00
PageDirectory.cpp fix dup bug. 2013-10-13 16:06:38 -07:00
PageEvents.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageGet.cpp trying to fix json decoding bug. 2013-10-24 17:55:01 -07:00
PageHosts.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
PageIndexdb.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
PageInject.cpp crawlbot fixes. 2013-10-15 16:31:59 -07:00
PageInject.h crawlbot fixes. 2013-10-15 16:31:59 -07:00
PageLogin.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageLogView.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageNetTest.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
PageNetTest.h Initial file population. 2013-08-02 13:12:24 -07:00
PageOverview.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
PageParser.cpp fix collection resetting. 2013-10-18 15:21:00 -07:00
PageParser.h Initial file population. 2013-08-02 13:12:24 -07:00
PagePerf.cpp half way done fixing performance graph. 2013-10-13 22:02:21 -07:00
PageReindex.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageReindex.h Initial file population. 2013-08-02 13:12:24 -07:00
PageResults.cpp trying to fix json decoding bug. 2013-10-24 17:55:01 -07:00
PageResults.h added searchbox for dmoz pages/sites. 2013-10-13 15:45:12 -07:00
PageRoot.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Pages.cpp /v2/bulk api fixes 2013-10-22 18:51:09 -07:00
Pages.h got email and url notification code compiling. 2013-10-01 15:14:39 -06:00
PageSockets.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageSpam.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageStats.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
PageStatsdb.cpp fix potential problem of tons of points in 2013-10-14 22:52:29 -07:00
PageSubmit.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageThesaurus.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageThreads.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageTitledb.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
PageTurk.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageTurk.h Initial file population. 2013-08-02 13:12:24 -07:00
Parms.cpp added "retrictDomain" parm which defaults to 1. 2013-10-29 09:31:57 -07:00
Parms.h code checkpoint 2013-10-14 13:00:05 -06:00
parse_iana_charsets.pl Initial file population. 2013-08-02 13:12:24 -07:00
pdftohtml use the "onsite" keyword in your url filters 2013-09-06 09:37:17 -06:00
Phrases.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Phrases.h Initial file population. 2013-08-02 13:12:24 -07:00
PingServer.cpp better crawl status reporting. 2013-10-30 10:00:46 -07:00
PingServer.h better crawl status reporting. 2013-10-30 10:00:46 -07:00
Placedb.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Placedb.h Initial file population. 2013-08-02 13:12:24 -07:00
pngtopnm Initial file population. 2013-08-02 13:12:24 -07:00
pnmscale Initial file population. 2013-08-02 13:12:24 -07:00
Pops.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Pops.h Initial file population. 2013-08-02 13:12:24 -07:00
porter.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Pos.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Pos.h Initial file population. 2013-08-02 13:12:24 -07:00
Posdb.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Posdb.h speed up whitelist hashtable like 20x 2013-09-15 21:10:53 -07:00
postalCodes.txt Initial file population. 2013-08-02 13:12:24 -07:00
PostQueryRerank.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PostQueryRerank.h Initial file population. 2013-08-02 13:12:24 -07:00
ppmtojpeg Initial file population. 2013-08-02 13:12:24 -07:00
ppthtml Initial file population. 2013-08-02 13:12:24 -07:00
Process.cpp fix bug when we nuke a collnum 2013-10-30 12:27:08 -07:00
Process.h Initial file population. 2013-08-02 13:12:24 -07:00
Profiler.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Profiler.h Initial file population. 2013-08-02 13:12:24 -07:00
Proxy.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Proxy.h Initial file population. 2013-08-02 13:12:24 -07:00
pstotext Initial file population. 2013-08-02 13:12:24 -07:00
QAClient.cpp Initial file population. 2013-08-02 13:12:24 -07:00
QAClient.h Initial file population. 2013-08-02 13:12:24 -07:00
quarantine.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Query.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Query.h Initial file population. 2013-08-02 13:12:24 -07:00
Rdb.cpp fix bug when we nuke a collnum 2013-10-30 12:27:08 -07:00
Rdb.h fix bug when we nuke a collnum 2013-10-30 12:27:08 -07:00
RdbBase.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
RdbBase.h Initial file population. 2013-08-02 13:12:24 -07:00
RdbBuckets.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RdbBuckets.h Initial file population. 2013-08-02 13:12:24 -07:00
RdbCache.cpp now we can reset collection mid stream 2013-10-18 17:49:36 -07:00
RdbCache.h removed MAX_COLL_RECS so we can have unlimited 2013-08-30 16:20:38 -07:00
RdbDump.cpp Merge branch 'master' into diffbot 2013-10-25 12:32:02 -07:00
RdbDump.h Initial file population. 2013-08-02 13:12:24 -07:00
RdbList.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RdbList.h Initial file population. 2013-08-02 13:12:24 -07:00
RdbMap.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RdbMap.h Initial file population. 2013-08-02 13:12:24 -07:00
RdbMem.cpp track down some nasty cores. fix 2013-10-29 16:37:14 -07:00
RdbMem.h now we can reset collection mid stream 2013-10-18 17:49:36 -07:00
RdbMerge.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RdbMerge.h Initial file population. 2013-08-02 13:12:24 -07:00
RdbScan.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RdbScan.h Initial file population. 2013-08-02 13:12:24 -07:00
rdbtest2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
rdbtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RdbTree.cpp track down some nasty cores. fix 2013-10-29 16:37:14 -07:00
RdbTree.h fix a couple collection related bugs 2013-10-21 11:38:33 -07:00
README.md updated README.md to reference compare.html 2013-08-19 17:20:30 -06:00
readRec.cpp Initial file population. 2013-08-02 13:12:24 -07:00
reindex2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Repair.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Repair.h Initial file population. 2013-08-02 13:12:24 -07:00
RequestTable.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RequestTable.h Initial file population. 2013-08-02 13:12:24 -07:00
rescue.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Revdb.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Revdb.h Initial file population. 2013-08-02 13:12:24 -07:00
rmbots.cpp Initial file population. 2013-08-02 13:12:24 -07:00
SafeBuf.cpp fix that json RE-encoding bug 2013-10-24 18:09:35 -07:00
SafeBuf.h Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
SafeList.h Initial file population. 2013-08-02 13:12:24 -07:00
Sanity.h Initial file population. 2013-08-02 13:12:24 -07:00
Scores.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Scores.h Initial file population. 2013-08-02 13:12:24 -07:00
Scraper.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Scraper.h Initial file population. 2013-08-02 13:12:24 -07:00
SearchInput.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
SearchInput.h Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Sections.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Sections.h make sections grow dynamically so we do not 2013-10-06 11:04:10 -06:00
seektest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
seo.h Initial file population. 2013-08-02 13:12:24 -07:00
SiteGetter.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
SiteGetter.h Initial file population. 2013-08-02 13:12:24 -07:00
sleepandlog.cpp Initial file population. 2013-08-02 13:12:24 -07:00
sort.cpp Initial file population. 2013-08-02 13:12:24 -07:00
sort.h Initial file population. 2013-08-02 13:12:24 -07:00
Speller.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Speller.h Initial file population. 2013-08-02 13:12:24 -07:00
Spider.cpp fix compiler error 2013-10-30 10:06:54 -07:00
Spider.h better crawl status reporting. 2013-10-30 10:00:46 -07:00
Stats.cpp start using html div graph for 2013-10-14 20:35:45 -07:00
Stats.h remove old libplotter references 2013-10-13 23:48:07 -07:00
Statsdb.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Statsdb.h fix potential problem of tons of points in 2013-10-14 22:52:29 -07:00
StopWords.cpp Initial file population. 2013-08-02 13:12:24 -07:00
StopWords.h Initial file population. 2013-08-02 13:12:24 -07:00
streambuf.h Initial file population. 2013-08-02 13:12:24 -07:00
Strings.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Strings.h Initial file population. 2013-08-02 13:12:24 -07:00
Summary.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Summary.h renamed matches.h and matches.cpp to 2013-10-01 07:58:24 -07:00
superMergeTest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
supported_charsets.cpp Initial file population. 2013-08-02 13:12:24 -07:00
supported_charsets.txt Initial file population. 2013-08-02 13:12:24 -07:00
Syncdb.cpp fix addColl() logic for collectionless rdbs 2013-10-16 14:38:09 -07:00
Syncdb.h Initial file population. 2013-08-02 13:12:24 -07:00
Synonyms.cpp fix core from hashtablex::set() not getting 2013-09-15 21:15:58 -07:00
Synonyms.h Initial file population. 2013-08-02 13:12:24 -07:00
Tagdb.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Tagdb.h Initial file population. 2013-08-02 13:12:24 -07:00
TcpServer.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
TcpServer.h Initial file population. 2013-08-02 13:12:24 -07:00
TcpSocket.h integrate diffbot from svn back into git. 2013-09-13 09:23:18 -07:00
test2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_convert.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_hash.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_norm.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_parser2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_parser.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_unicode.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Test.cpp email and webhook alerts when spider runs out of urls 2013-10-09 11:42:56 -07:00
Test.h Initial file population. 2013-08-02 13:12:24 -07:00
testfloats.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Tfndb.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Tfndb.h Initial file population. 2013-08-02 13:12:24 -07:00
Thesaurus.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Thesaurus.h Initial file population. 2013-08-02 13:12:24 -07:00
Threads.cpp cleanup warnings in log. 2013-09-13 14:37:35 -07:00
Threads.h when using pthreads block SIGIO 2013-08-21 15:01:26 -06:00
threadtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
thunder.cpp Initial file population. 2013-08-02 13:12:24 -07:00
tifftopnm Initial file population. 2013-08-02 13:12:24 -07:00
Timedb.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Timedb.h Initial file population. 2013-08-02 13:12:24 -07:00
Timer.h Initial file population. 2013-08-02 13:12:24 -07:00
Title.cpp Merge branch 'master' into diffbot 2013-09-16 09:05:37 -07:00
Title.h Initial file population. 2013-08-02 13:12:24 -07:00
Titledb.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Titledb.h move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
TopTree.cpp Initial file population. 2013-08-02 13:12:24 -07:00
TopTree.h Initial file population. 2013-08-02 13:12:24 -07:00
treetest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
TuringTest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
TuringTest.h Initial file population. 2013-08-02 13:12:24 -07:00
Turkdb.cpp Initial file population. 2013-08-02 13:12:24 -07:00
types.h fix compiler warning in types.h. 2013-09-08 20:00:52 -06:00
UCNormalizer.cpp Initial file population. 2013-08-02 13:12:24 -07:00
UCNormalizer.h Initial file population. 2013-08-02 13:12:24 -07:00
UCPropTable.cpp Initial file population. 2013-08-02 13:12:24 -07:00
UCPropTable.h Initial file population. 2013-08-02 13:12:24 -07:00
UCWordIterator.cpp Initial file population. 2013-08-02 13:12:24 -07:00
UCWordIterator.h Initial file population. 2013-08-02 13:12:24 -07:00
UdpProtocol.h Initial file population. 2013-08-02 13:12:24 -07:00
UdpServer.cpp fix core from trying to get the time 2013-09-01 12:55:22 -06:00
UdpServer.h Initial file population. 2013-08-02 13:12:24 -07:00
UdpSlot.cpp Initial file population. 2013-08-02 13:12:24 -07:00
UdpSlot.h Initial file population. 2013-08-02 13:12:24 -07:00
udptest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Unicode.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Unicode.h Initial file population. 2013-08-02 13:12:24 -07:00
UnicodeProperties.cpp Initial file population. 2013-08-02 13:12:24 -07:00
UnicodeProperties.h Initial file population. 2013-08-02 13:12:24 -07:00
unifiedDict.txt Initial file population. 2013-08-02 13:12:24 -07:00
uniq2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Url.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Url.h Initial file population. 2013-08-02 13:12:24 -07:00
urlinfo.cpp just ignore all urls with # (hashtag) in them 2013-10-03 23:33:55 -06:00
Users.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Users.h Initial file population. 2013-08-02 13:12:24 -07:00
ValidPointer.cpp Initial file population. 2013-08-02 13:12:24 -07:00
ValidPointer.h Initial file population. 2013-08-02 13:12:24 -07:00
Vector.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Vector.h Initial file population. 2013-08-02 13:12:24 -07:00
Weights.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Weights.h Initial file population. 2013-08-02 13:12:24 -07:00
Wiki.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Wiki.h Initial file population. 2013-08-02 13:12:24 -07:00
wikititles.txt.part1 Initial file population. 2013-08-02 13:12:24 -07:00
wikititles.txt.part2 Initial file population. 2013-08-02 13:12:24 -07:00
wiktionary-buf.txt Initial file population. 2013-08-02 13:12:24 -07:00
wiktionary-lang.txt Initial file population. 2013-08-02 13:12:24 -07:00
wiktionary-syns.dat Initial file population. 2013-08-02 13:12:24 -07:00
Wiktionary.cpp remove debug point. 2013-10-20 10:25:26 -07:00
Wiktionary.h Initial file population. 2013-08-02 13:12:24 -07:00
Words.cpp speed up whitelist hashtable like 20x 2013-09-15 21:10:53 -07:00
Words.h Initial file population. 2013-08-02 13:12:24 -07:00
Xml.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Xml.h Initial file population. 2013-08-02 13:12:24 -07:00
XmlDoc.cpp track down some nasty cores. fix 2013-10-29 16:37:14 -07:00
XmlDoc.h just selecting a url to crawl should 2013-10-28 22:38:15 -07:00
XmlNode.cpp Initial file population. 2013-08-02 13:12:24 -07:00
XmlNode.h Initial file population. 2013-08-02 13:12:24 -07:00
zconf.h Initial file population. 2013-08-02 13:12:24 -07:00
zlib.h Initial file population. 2013-08-02 13:12:24 -07:00

open-source-search-engine

An open source web and enterprise search engine, as seen at http://www.gigablast.com/

RUNNING GIGABLAST

See html/admin.html for all administrative documentation including the quick start instructions.

Alternatively, visit http://www.gigablast.com/admin.html

See html/compare.html for a comparison of Gigablast to Solr. The comparison is still sparse, but it does include some useful commands.

CODE ARCHITECTURE

See html/developer.html for all code documentation.

Alternatively, visit http://www.gigablast.com/developer.html

CONTACT

Contact me with feature requests or for help in general. I will work for free on good use cases. Email: mattdwells@hotmail.com