Nov 20 2017 -- A distributed open-source search engine and spider/crawler written in C/C++ for Linux on Intel/AMD. From gigablast dot com, which has binaries for download. See the README.md file at the very bottom of this page for instructions.
mwells 72dc660598 Merge branch 'testing' into diffbot-matt
Conflicts:
	Collectiondb.cpp
	HttpRequest.h
	PageBasic.cpp
	coll.main.0/coll.conf
2014-04-09 11:18:39 -07:00
antiword-dir Initial file population. 2013-08-02 13:12:24 -07:00
coll.main.0 Merge branch 'testing' into diffbot-matt 2014-04-09 11:18:39 -07:00
html add diffbot support to admin doc 2014-04-07 14:24:52 -07:00
openssl we already include our own 32-bit 2013-09-15 18:25:49 -06:00
ucdata Initial file population. 2013-08-02 13:12:24 -07:00
Abbreviations.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Abbreviations.h change a couple of possible reserved names in C++ 2013-08-28 22:59:01 -06:00
Accessdb.cpp fix addColl() logic for collectionless rdbs 2013-10-16 14:38:09 -07:00
Accessdb.h Initial file population. 2013-08-02 13:12:24 -07:00
Address.cpp Merge branch 'testing' into diffbot-testing 2014-03-10 12:08:23 -07:00
Address.h use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
addtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Ads.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Ads.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
AdultBit.cpp Initial file population. 2013-08-02 13:12:24 -07:00
AdultBit.h Initial file population. 2013-08-02 13:12:24 -07:00
animate.cpp Initial file population. 2013-08-02 13:12:24 -07:00
antiword Initial file population. 2013-08-02 13:12:24 -07:00
AutoBan.cpp fix a few minor bugs. 2014-03-16 10:34:58 -07:00
AutoBan.h Initial file population. 2013-08-02 13:12:24 -07:00
badcattable.dat Initial file population. 2013-08-02 13:12:24 -07:00
BigFile.cpp thread fixes. if pthread_create fails then 2014-03-15 20:07:02 -07:00
BigFile.h forgot to push the .h files 2013-12-07 22:12:48 -07:00
Bits.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Bits.h Initial file population. 2013-08-02 13:12:24 -07:00
blaster.cpp fixed bugs with advanced.html advanced search page. 2013-11-17 14:58:47 -07:00
Blaster.cpp for json docs only give them a single 2014-01-25 08:17:38 -08:00
Blaster.h use ./gb blaster -u <fileofurls> to just inject urls, 2013-08-19 16:33:27 -06:00
bmptopnm Initial file population. 2013-08-02 13:12:24 -07:00
Cachedb.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Cachedb.h Initial file population. 2013-08-02 13:12:24 -07:00
camsort.cpp Initial file population. 2013-08-02 13:12:24 -07:00
catcountry.dat Initial file population. 2013-08-02 13:12:24 -07:00
Catdb.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Catdb.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Categories.cpp documentation updates. fixed sd=0. 2013-10-13 14:24:41 -07:00
Categories.h documentation updates. fixed sd=0. 2013-10-13 14:24:41 -07:00
CatRec.cpp fix a couple catdb generation bugs. 2013-10-12 20:33:04 -07:00
CatRec.h Initial file population. 2013-08-02 13:12:24 -07:00
character-sets Initial file population. 2013-08-02 13:12:24 -07:00
check_unicode.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Clusterdb.cpp thread fixes. if pthread_create fails then 2014-03-15 20:07:02 -07:00
Clusterdb.h Initial file population. 2013-08-02 13:12:24 -07:00
Collectiondb.cpp Merge branch 'testing' into diffbot-matt 2014-04-09 11:18:39 -07:00
Collectiondb.h more updates 2014-04-09 11:03:31 -07:00
Conf.cpp Merge branch 'diffbot' into testing 2014-03-08 09:38:44 -07:00
Conf.h Merge branch 'testing' into diffbot-matt 2014-04-09 11:18:39 -07:00
convert.cpp Initial file population. 2013-08-02 13:12:24 -07:00
CountryCode.cpp fix pagecrawlbot.cpp to support &c=token-name. 2014-01-22 23:40:38 -08:00
CountryCode.h fix pagecrawlbot.cpp to support &c=token-name. 2014-01-22 23:40:38 -08:00
create_ucd_tables.cpp Initial file population. 2013-08-02 13:12:24 -07:00
DailyMerge.cpp Fixed some bugs. 2013-08-09 08:52:15 -07:00
DailyMerge.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
DataFeed.cpp Initial file population. 2013-08-02 13:12:24 -07:00
DataFeed.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Datedb.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Datedb.h move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Dates.cpp nomenclature changes to reduce collissions. 2014-03-31 15:02:17 -07:00
Dates.h Initial file population. 2013-08-02 13:12:24 -07:00
Diff.cpp Fixed some bugs. 2013-08-09 08:52:15 -07:00
Diff.h Initial file population. 2013-08-02 13:12:24 -07:00
Dir.cpp more parmdb fixes 2013-12-16 15:39:24 -08:00
Dir.h fix file descriptor leak in Dir class. 2013-11-19 13:41:56 -08:00
DiskPageCache.cpp thread fixes. if pthread_create fails then 2014-03-15 20:07:02 -07:00
DiskPageCache.h tons of changes from live github on neo. 2014-01-17 21:01:43 -08:00
dlstubs.c Initial file population. 2013-08-02 13:12:24 -07:00
dmozparse.cpp add support for noindex meta tag. 2013-10-12 22:50:23 -07:00
Dns.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Dns.h Initial file population. 2013-08-02 13:12:24 -07:00
DnsProtocol.h Initial file population. 2013-08-02 13:12:24 -07:00
dnstest.cpp nomenclature changes to reduce collissions. 2014-03-31 15:02:17 -07:00
Domains.cpp move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Domains.h Initial file population. 2013-08-02 13:12:24 -07:00
dumpcore.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Entities.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Entities.h Initial file population. 2013-08-02 13:12:24 -07:00
Errno.cpp more updates 2014-04-09 11:03:31 -07:00
Errno.h more updates 2014-04-09 11:03:31 -07:00
errnotest.cpp errno test update 2013-11-19 00:10:10 -07:00
Events.h Initial file population. 2013-08-02 13:12:24 -07:00
Facebook.cpp fix addColl() logic for collectionless rdbs 2013-10-16 14:38:09 -07:00
Facebook.h new graphic icons. minor clean ups. 2013-11-15 14:47:05 -07:00
fastIndexTable.cpp Initial file population. 2013-08-02 13:12:24 -07:00
fctypes.cpp for json docs only give them a single 2014-01-25 08:17:38 -08:00
fctypes.h got code with shard rebalancing compiling. 2014-01-11 16:08:42 -08:00
File.cpp log cleanups mostly. 2013-12-18 10:57:18 -08:00
File.h tons of changes from live github on neo. 2014-01-17 21:01:43 -08:00
filterquerylogs.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Flags.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Flags.h Initial file population. 2013-08-02 13:12:24 -07:00
gb-include.h Initial file population. 2013-08-02 13:12:24 -07:00
gb.conf more bool query fixes 2014-03-18 10:44:56 -07:00
gb.pem so we have spider https sites add 2013-10-13 00:15:39 -07:00
gbfilter.cpp Initial file population. 2013-08-02 13:12:24 -07:00
gbtitletest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
geneaology.cpp Initial file population. 2013-08-02 13:12:24 -07:00
generateSuperMergeCode.cpp Initial file population. 2013-08-02 13:12:24 -07:00
geo_ip_table.cpp Initial file population. 2013-08-02 13:12:24 -07:00
geo_ip_table.h Initial file population. 2013-08-02 13:12:24 -07:00
GeoIP_internal.h Initial file population. 2013-08-02 13:12:24 -07:00
GeoIP.c Initial file population. 2013-08-02 13:12:24 -07:00
GeoIP.h Initial file population. 2013-08-02 13:12:24 -07:00
GeoIPCity.c Initial file population. 2013-08-02 13:12:24 -07:00
GeoIPCity.h Initial file population. 2013-08-02 13:12:24 -07:00
getsample.cpp Initial file population. 2013-08-02 13:12:24 -07:00
giftopnm Initial file population. 2013-08-02 13:12:24 -07:00
hash.cpp Initial file population. 2013-08-02 13:12:24 -07:00
hash.h get "&site=abc.com+xyz.com"... working to restrict 2013-09-15 20:16:48 -07:00
HashTable.cpp fix core from last push. 2013-12-09 14:21:46 -07:00
HashTable.h mem labelling fixes. 2013-12-09 14:05:02 -07:00
HashTableT.cpp Initial file population. 2013-08-02 13:12:24 -07:00
HashTableT.h Initial file population. 2013-08-02 13:12:24 -07:00
HashTableX.cpp quite a few fixes to the quota system, cleanups etc. 2014-01-18 16:23:13 -08:00
HashTableX.h only skip checking to spider a url of its 2014-03-03 13:22:27 -08:00
hashtest2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
hashtest3.cpp Initial file population. 2013-08-02 13:12:24 -07:00
hashtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Highlight.cpp Merge branch 'master' into diffbot 2013-12-07 11:34:26 -07:00
Highlight.h trying to fix json decoding bug. 2013-10-24 17:55:01 -07:00
Hostdb.cpp Merge branch 'testing' into diffbot-matt 2014-04-09 11:18:39 -07:00
Hostdb.h create hosts.conf into cwd if not there. 2014-04-06 21:12:52 -07:00
hosts.cpp Initial file population. 2013-08-02 13:12:24 -07:00
HttpMime.cpp mem labelling fixes. 2013-12-09 14:05:02 -07:00
HttpMime.h update the dirty word list. but we still 2013-10-15 01:01:19 -07:00
HttpRequest.cpp parm updates 2014-02-10 21:45:03 -07:00
HttpRequest.h Merge branch 'testing' into diffbot-matt 2014-04-09 11:18:39 -07:00
HttpServer.cpp fix a few minor bugs. 2014-03-16 10:34:58 -07:00
HttpServer.h use &format=0 1 or 2 for html/xml/json now. 2013-11-08 18:00:30 -08:00
iana_charset.cpp Merge branch 'diffbot' into diffbot-testing 2013-12-16 11:06:11 -08:00
iana_charset.h Merge branch 'diffbot' into diffbot-testing 2013-12-16 11:06:11 -08:00
iconv.h Initial file population. 2013-08-02 13:12:24 -07:00
Images.cpp nomenclature changes to reduce collissions. 2014-03-31 15:02:17 -07:00
Images.h use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Indexdb.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Indexdb.h use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
IndexList.cpp Initial file population. 2013-08-02 13:12:24 -07:00
IndexList.h Initial file population. 2013-08-02 13:12:24 -07:00
IndexReadInfo.cpp Initial file population. 2013-08-02 13:12:24 -07:00
IndexReadInfo.h Initial file population. 2013-08-02 13:12:24 -07:00
IndexTable2.cpp move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
IndexTable2.h Initial file population. 2013-08-02 13:12:24 -07:00
IndexTable.cpp Initial file population. 2013-08-02 13:12:24 -07:00
IndexTable.h Initial file population. 2013-08-02 13:12:24 -07:00
injectme3 added injectme3 file and documentation into compare.html 2013-08-17 11:02:26 -06:00
injector.cpp Initial file population. 2013-08-02 13:12:24 -07:00
iostream.h Initial file population. 2013-08-02 13:12:24 -07:00
ip.cpp fix old bug. 2014-01-10 18:52:47 -07:00
ip.h Initial file population. 2013-08-02 13:12:24 -07:00
ipconfig.cpp fixed some cores. brought in fixes from 2013-09-08 16:16:13 -06:00
Iso8859.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Iso8859.h Initial file population. 2013-08-02 13:12:24 -07:00
jointest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
jpegtopnm Initial file population. 2013-08-02 13:12:24 -07:00
Json.cpp fix bug of gbmin, gbmax etc. not working. 2014-03-26 11:56:06 -07:00
Json.h fixed contenthash32 logic for json objects. 2014-02-05 13:22:03 -08:00
keepalive.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Lang.cpp comment updates 2013-10-15 23:13:50 -07:00
Lang.h Initial file population. 2013-08-02 13:12:24 -07:00
LangList.cpp code cleanups. 2014-01-18 21:19:26 -08:00
LangList.h Initial file population. 2013-08-02 13:12:24 -07:00
Language.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Language.h Initial file population. 2013-08-02 13:12:24 -07:00
LanguageIdentifier.cpp Initial file population. 2013-08-02 13:12:24 -07:00
LanguageIdentifier.h Initial file population. 2013-08-02 13:12:24 -07:00
LanguagePages.cpp Initial file population. 2013-08-02 13:12:24 -07:00
LanguagePages.h Initial file population. 2013-08-02 13:12:24 -07:00
libc.a Initial file population. 2013-08-02 13:12:24 -07:00
libcrypto.a Initial file population. 2013-08-02 13:12:24 -07:00
libgcc.a Initial file population. 2013-08-02 13:12:24 -07:00
libiconv.a Initial file population. 2013-08-02 13:12:24 -07:00
libiconv.la Initial file population. 2013-08-02 13:12:24 -07:00
libm.a Initial file population. 2013-08-02 13:12:24 -07:00
libpthread.a Initial file population. 2013-08-02 13:12:24 -07:00
libssl.a Initial file population. 2013-08-02 13:12:24 -07:00
libstdc++.a Initial file population. 2013-08-02 13:12:24 -07:00
libz.a Initial file population. 2013-08-02 13:12:24 -07:00
LICENSE code checkpoint. time slicing, faster spider code 2014-02-04 17:34:43 -08:00
Linkdb.cpp nomenclature changes to reduce collissions. 2014-03-31 15:02:17 -07:00
Linkdb.h fixes for new link info code so it doesn't 2014-02-25 10:55:05 -08:00
LinkedList.h Initial file population. 2013-08-02 13:12:24 -07:00
linkspam.cpp renamed matches.h and matches.cpp to 2013-10-01 07:58:24 -07:00
linkspam.h Initial file population. 2013-08-02 13:12:24 -07:00
Log.cpp create hosts.conf into cwd if not there. 2014-04-06 21:12:52 -07:00
Log.h create hosts.conf into cwd if not there. 2014-04-06 21:12:52 -07:00
Loop.cpp tons of changes from live github on neo. 2014-01-17 21:01:43 -08:00
Loop.h Initial file population. 2013-08-02 13:12:24 -07:00
looptest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
main.cpp Merge branch 'testing' into diffbot-matt 2014-04-09 11:18:39 -07:00
Make.depend Merge branch 'diffbot' into testing 2014-03-08 09:38:44 -07:00
Makefile added some qa testing logic. qa.cpp. 2014-04-05 11:33:42 -07:00
malloc.c Initial file population. 2013-08-02 13:12:24 -07:00
matches2.cpp dirty word detector revisions. we need 2013-10-16 20:19:49 -07:00
matches2.h renamed matches.h and matches.cpp to 2013-10-01 07:58:24 -07:00
Matches.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Matches.h Initial file population. 2013-08-02 13:12:24 -07:00
Mem.cpp create hosts.conf into cwd if not there. 2014-04-06 21:12:52 -07:00
Mem.h fix typo 2013-09-08 19:51:57 -07:00
membustest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MemPool.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MemPool.h Initial file population. 2013-08-02 13:12:24 -07:00
MemPoolTree.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MemPoolTree.h Initial file population. 2013-08-02 13:12:24 -07:00
memtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
mergetest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MetaContainer.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MetaContainer.h Initial file population. 2013-08-02 13:12:24 -07:00
Mime.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Mime.h Initial file population. 2013-08-02 13:12:24 -07:00
mixfile.cpp Initial file population. 2013-08-02 13:12:24 -07:00
mmseg.h Initial file population. 2013-08-02 13:12:24 -07:00
monitor.cpp nomenclature changes to reduce collissions. 2014-03-31 15:02:17 -07:00
Monitordb.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Monitordb.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg0.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msg0.h use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msg1.cpp get searching on token working 2014-03-06 17:01:41 -08:00
Msg1.h use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msg1f.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg1f.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg2.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msg2.h use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msg2a.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Msg2a.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg2b.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg2b.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg3.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msg3.h use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msg3a.cpp get searching on token working 2014-03-06 17:01:41 -08:00
Msg3a.h get searching on token working 2014-03-06 17:01:41 -08:00
Msg3e.cpp fix infinite loop from json parsing and 2013-09-27 17:52:36 -06:00
Msg3e.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg4.cpp nomenclature changes to reduce collissions. 2014-03-31 15:02:17 -07:00
Msg4.h code checkpoint 2014-02-09 12:38:40 -07:00
Msg5.cpp fix critical title alloc/free bug 2014-03-28 08:01:01 -07:00
Msg5.h use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msg6b.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg6b.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg8b.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msg8b.h Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Msg9b.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msg9b.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg13.cpp nomenclature changes to reduce collissions. 2014-03-31 15:02:17 -07:00
Msg13.h do not download bulkjob urls in crawlbot. 2014-03-21 12:40:38 -07:00
Msg17.cpp first compiled stab at multi collection searching. 2014-03-06 10:45:13 -08:00
Msg17.h first compiled stab at multi collection searching. 2014-03-06 10:45:13 -08:00
Msg20.cpp first compiled stab at multi collection searching. 2014-03-06 10:45:13 -08:00
Msg20.h get searching on token working 2014-03-06 17:01:41 -08:00
Msg22.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msg22.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg24.cpp new Make.depend. 2013-08-09 17:13:45 -06:00
Msg28.cpp fix core from (broad)casting valueless cgi field. 2013-10-03 14:51:59 -07:00
Msg28.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg30.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg30.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Msg35.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg35.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg36.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msg36.h use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msg37.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msg37.h use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msg39.cpp more bool query fixes. 2014-03-20 10:03:25 -07:00
Msg39.h use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msg40.cpp fix core &token= core 2014-03-13 07:57:06 -07:00
Msg40.h get searching on token working 2014-03-06 17:01:41 -08:00
Msg40Cache.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg40Cache.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg42.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg42.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg51.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msg51.h use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msgaa.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msgaa.h Initial file population. 2013-08-02 13:12:24 -07:00
MsgC.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
MsgC.h Initial file population. 2013-08-02 13:12:24 -07:00
Msge0.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msge0.h use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msge1.cpp nomenclature changes to reduce collissions. 2014-03-31 15:02:17 -07:00
Msge1.h Initial file population. 2013-08-02 13:12:24 -07:00
Multicast.cpp a lot of times rdb tree has invalid collection 2014-01-21 19:01:44 -08:00
Multicast.h move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
mysynonyms.txt Initial file population. 2013-08-02 13:12:24 -07:00
numwords.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageAddColl.cpp more spiderdb spider request fixes 2014-01-19 18:00:56 -08:00
PageAddUrl.cpp updates 2014-03-12 08:09:45 -07:00
PageBasic.cpp Merge branch 'testing' into diffbot-matt 2014-04-09 11:18:39 -07:00
PageCatdb.cpp fix a few minor bugs. 2014-03-16 10:34:58 -07:00
PageCrawlBot.cpp more misc updates. 2014-04-05 18:09:04 -07:00
PageCrawlBot.h checkpoint 2014-02-09 15:09:48 -07:00
PageDirectory.cpp code checkpoint. time slicing, faster spider code 2014-02-04 17:34:43 -08:00
PageEvents.cpp fix a few minor bugs. 2014-03-16 10:34:58 -07:00
PageGet.cpp fix security system to actually work now 2014-02-12 00:06:00 -07:00
PageHosts.cpp host table cleanups 2014-03-16 17:14:47 -07:00
PageIndexdb.cpp fix a few minor bugs. 2014-03-16 10:34:58 -07:00
PageInject.cpp aesthetic cleanups 2014-03-16 17:12:04 -07:00
PageInject.h code cleanups. 2014-01-18 21:19:26 -08:00
PageLogView.cpp new printadmintop functionality. 2014-02-07 23:08:04 -07:00
PageNetTest.cpp move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
PageNetTest.h Initial file population. 2013-08-02 13:12:24 -07:00
PageOverview.cpp fix a few minor bugs. 2014-03-16 10:34:58 -07:00
PageParser.cpp nomenclature changes to reduce collissions. 2014-03-31 15:02:17 -07:00
PageParser.h more fixes for new boolean logic. 2014-03-13 13:09:33 -07:00
PagePerf.cpp took out pagecount table. just hafta scan 2014-01-19 20:34:38 -08:00
PageReindex.cpp fix query reindex some more 2014-03-11 14:46:49 -07:00
PageReindex.h fix query reindex some more 2014-03-11 14:46:49 -07:00
PageResults.cpp more updates 2014-04-09 11:03:31 -07:00
PageResults.h specify &header=1 explicitly to get json serp header 2014-03-05 07:41:59 -08:00
PageRoot.cpp more misc updates. 2014-04-05 18:09:04 -07:00
Pages.cpp Merge branch 'testing' into diffbot-matt 2014-04-09 11:18:39 -07:00
Pages.h Merge branch 'testing' into diffbot-matt 2014-04-09 11:18:39 -07:00
PageSockets.cpp fix a few minor bugs. 2014-03-16 10:34:58 -07:00
PageSpam.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageStats.cpp fix a few minor bugs. 2014-03-16 10:34:58 -07:00
PageStatsdb.cpp formatting 2014-01-19 12:37:37 -08:00
PageSubmit.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageThesaurus.cpp fix a few minor bugs. 2014-03-16 10:34:58 -07:00
PageThreads.cpp formatting fixes 2014-01-19 00:57:20 -08:00
PageTitledb.cpp fix security system to actually work now 2014-02-12 00:06:00 -07:00
PageTurk.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageTurk.h Initial file population. 2013-08-02 13:12:24 -07:00
Parms.cpp Merge branch 'testing' into diffbot-matt 2014-04-09 11:18:39 -07:00
Parms.h more updates 2014-04-09 11:03:31 -07:00
parse_iana_charsets.pl move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
pdftohtml use the "onsite" keyword in your url filters 2013-09-06 09:37:17 -06:00
Phrases.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Phrases.h Initial file population. 2013-08-02 13:12:24 -07:00
PingServer.cpp Merge branch 'diffbot' into testing 2014-03-08 09:38:44 -07:00
PingServer.h added emergency msg box on all admin pages 2014-01-11 20:14:44 -08:00
Placedb.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Placedb.h Initial file population. 2013-08-02 13:12:24 -07:00
pngtopnm Initial file population. 2013-08-02 13:12:24 -07:00
pnmscale Initial file population. 2013-08-02 13:12:24 -07:00
Pops.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Pops.h Initial file population. 2013-08-02 13:12:24 -07:00
porter.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Pos.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Pos.h Initial file population. 2013-08-02 13:12:24 -07:00
Posdb.cpp more bool query fixes. 2014-03-20 10:03:25 -07:00
Posdb.h more bool query fixes. 2014-03-20 10:03:25 -07:00
postalCodes.txt Initial file population. 2013-08-02 13:12:24 -07:00
PostQueryRerank.cpp handle a bunch of oom conditions that 2013-11-20 10:14:02 -07:00
PostQueryRerank.h Initial file population. 2013-08-02 13:12:24 -07:00
ppmtojpeg Initial file population. 2013-08-02 13:12:24 -07:00
Process.cpp tuning the rebalance loop 2014-03-15 14:56:11 -07:00
Process.h tuning the rebalance loop 2014-03-15 14:56:11 -07:00
Profiler.cpp create hosts.conf into cwd if not there. 2014-04-06 21:12:52 -07:00
Profiler.h Initial file population. 2013-08-02 13:12:24 -07:00
Proxy.cpp nomenclature changes to reduce collissions. 2014-03-31 15:02:17 -07:00
Proxy.h Initial file population. 2013-08-02 13:12:24 -07:00
pstotext Initial file population. 2013-08-02 13:12:24 -07:00
qa.cpp added some qa testing logic. qa.cpp. 2014-04-05 11:33:42 -07:00
QAClient.cpp Initial file population. 2013-08-02 13:12:24 -07:00
QAClient.h Initial file population. 2013-08-02 13:12:24 -07:00
quarantine.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Query.cpp more bool query fixes. 2014-03-20 10:03:25 -07:00
Query.h more bool query fixes. 2014-03-20 10:03:25 -07:00
Rdb.cpp fix core 2014-03-18 11:12:50 -07:00
Rdb.h use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
RdbBase.cpp thread fixes. if pthread_create fails then 2014-03-15 20:07:02 -07:00
RdbBase.h tuning the rebalance loop 2014-03-15 14:56:11 -07:00
RdbBuckets.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RdbBuckets.h Initial file population. 2013-08-02 13:12:24 -07:00
RdbCache.cpp fix some cores. use olddoc contenthash 2014-02-07 18:28:09 -08:00
RdbCache.h removed MAX_COLL_RECS so we can have unlimited 2013-08-30 16:20:38 -07:00
RdbDump.cpp tons of changes from live github on neo. 2014-01-17 21:01:43 -08:00
RdbDump.h if coll is deleted or reset in a middle of a dump 2013-12-25 17:12:09 -08:00
RdbList.cpp tuning the rebalance loop 2014-03-15 14:56:11 -07:00
RdbList.h checkpoint for faster spider code. 2014-02-04 16:15:27 -08:00
RdbMap.cpp tons of changes from live github on neo. 2014-01-17 21:01:43 -08:00
RdbMap.h tons of changes from live github on neo. 2014-01-17 21:01:43 -08:00
RdbMem.cpp track down some nasty cores. fix 2013-10-29 16:37:14 -07:00
RdbMem.h now we can reset collection mid stream 2013-10-18 17:49:36 -07:00
RdbMerge.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
RdbMerge.h if coll is deleted or reset in a middle of a dump 2013-12-25 17:12:09 -08:00
RdbScan.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RdbScan.h Initial file population. 2013-08-02 13:12:24 -07:00
rdbtest2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
rdbtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RdbTree.cpp get streaming time sliced results working 2014-02-06 14:25:44 -08:00
RdbTree.h fix annoying rdbtree pos/neg key counting issue 2014-01-11 18:04:28 -08:00
README.md Update README.md 2013-11-16 20:14:06 -08:00
readRec.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Rebalance.cpp tuning the rebalance loop 2014-03-15 14:56:11 -07:00
Rebalance.h tight merge during rebalance to save 2014-03-14 23:37:30 -07:00
reindex2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Repair.cpp fix a few minor bugs. 2014-03-16 10:34:58 -07:00
Repair.h Initial file population. 2013-08-02 13:12:24 -07:00
RequestTable.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RequestTable.h Initial file population. 2013-08-02 13:12:24 -07:00
rescue.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Revdb.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Revdb.h Initial file population. 2013-08-02 13:12:24 -07:00
rmbots.cpp Initial file population. 2013-08-02 13:12:24 -07:00
SafeBuf.cpp index numbers as integers too, not just floats 2014-02-06 20:57:54 -08:00
SafeBuf.h added some qa testing logic. qa.cpp. 2014-04-05 11:33:42 -07:00
SafeList.h Initial file population. 2013-08-02 13:12:24 -07:00
Sanity.h Initial file population. 2013-08-02 13:12:24 -07:00
Scores.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Scores.h Initial file population. 2013-08-02 13:12:24 -07:00
Scraper.cpp take out datedb. no longer used. we store 2014-01-09 13:39:28 -08:00
Scraper.h Initial file population. 2013-08-02 13:12:24 -07:00
SearchInput.cpp hack about 35%ish done 2014-04-08 19:34:43 -07:00
SearchInput.h added some qa testing logic. qa.cpp. 2014-04-05 11:33:42 -07:00
Sections.cpp more misc updates. 2014-04-05 18:09:04 -07:00
Sections.h get new global preemptive cache 2014-01-05 11:51:09 -08:00
seektest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
seo.h Initial file population. 2013-08-02 13:12:24 -07:00
SiteGetter.cpp nomenclature changes to reduce collissions. 2014-03-31 15:02:17 -07:00
SiteGetter.h use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
sleepandlog.cpp Initial file population. 2013-08-02 13:12:24 -07:00
sort.cpp Initial file population. 2013-08-02 13:12:24 -07:00
sort.h Initial file population. 2013-08-02 13:12:24 -07:00
Speller.cpp clean up logging so i can see what's going on 2013-12-10 16:41:30 -08:00
Speller.h Initial file population. 2013-08-02 13:12:24 -07:00
Spider.cpp more misc updates. 2014-04-05 18:09:04 -07:00
Spider.h more new site list api fixes 2014-03-09 18:15:57 -07:00
Stats.cpp formatting 2014-01-19 12:37:37 -08:00
Stats.h more formatting 2014-01-19 01:09:38 -08:00
Statsdb.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Statsdb.h fix potential problem of tons of points in 2013-10-14 22:52:29 -07:00
StopWords.cpp update common word list 2013-12-01 15:19:33 -07:00
StopWords.h Initial file population. 2013-08-02 13:12:24 -07:00
streambuf.h Initial file population. 2013-08-02 13:12:24 -07:00
Strings.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Strings.h Initial file population. 2013-08-02 13:12:24 -07:00
Summary.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Summary.h renamed matches.h and matches.cpp to 2013-10-01 07:58:24 -07:00
superMergeTest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
supported_charsets.cpp Initial file population. 2013-08-02 13:12:24 -07:00
supported_charsets.txt Initial file population. 2013-08-02 13:12:24 -07:00
Syncdb.cpp minor fix 2014-03-07 08:07:09 -08:00
Syncdb.h Initial file population. 2013-08-02 13:12:24 -07:00
Synonyms.cpp now we index all numbers that have field names 2013-11-08 16:16:13 -08:00
Synonyms.h now we index all numbers that have field names 2013-11-08 16:16:13 -08:00
Tagdb.cpp nomenclature changes to reduce collissions. 2014-03-31 15:02:17 -07:00
Tagdb.h use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
TcpServer.cpp daemonize on ./gb 0 etc. 2014-04-06 15:57:38 -07:00
TcpServer.h fixes for streaming mode. 2014-02-06 16:28:42 -08:00
TcpSocket.h send single space to socket if not streaming 2014-02-13 08:45:13 -08:00
test2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_convert.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_hash.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_norm.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_parser2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_parser.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_unicode.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Test.cpp nomenclature changes to reduce collissions. 2014-03-31 15:02:17 -07:00
Test.h Initial file population. 2013-08-02 13:12:24 -07:00
testfloats.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Tfndb.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Tfndb.h code cleanups. 2014-01-18 21:19:26 -08:00
Thesaurus.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Thesaurus.h Initial file population. 2013-08-02 13:12:24 -07:00
Threads.cpp daemonize on ./gb 0 etc. 2014-04-06 15:57:38 -07:00
Threads.h daemonize on ./gb 0 etc. 2014-04-06 15:57:38 -07:00
threadtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
thunder.cpp Initial file population. 2013-08-02 13:12:24 -07:00
tifftopnm Initial file population. 2013-08-02 13:12:24 -07:00
Timedb.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Timedb.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Timer.h Initial file population. 2013-08-02 13:12:24 -07:00
Title.cpp fix critical title alloc/free bug 2014-03-28 08:01:01 -07:00
Title.h fix core from getting title of json object 2014-02-28 08:18:09 -08:00
Titledb.cpp thread fixes. if pthread_create fails then 2014-03-15 20:07:02 -07:00
Titledb.h code cleanups. 2014-01-18 21:19:26 -08:00
TopTree.cpp index numbers as integers too, not just floats 2014-02-06 20:57:54 -08:00
TopTree.h more fixes for new boolean logic. 2014-03-13 13:09:33 -07:00
treetest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
TuringTest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
TuringTest.h Initial file population. 2013-08-02 13:12:24 -07:00
Turkdb.cpp Initial file population. 2013-08-02 13:12:24 -07:00
types.h fix compiler warning in types.h. 2013-09-08 20:00:52 -06:00
UCNormalizer.cpp code cleanups. 2014-01-18 21:19:26 -08:00
UCNormalizer.h Initial file population. 2013-08-02 13:12:24 -07:00
UCPropTable.cpp Initial file population. 2013-08-02 13:12:24 -07:00
UCPropTable.h Initial file population. 2013-08-02 13:12:24 -07:00
UCWordIterator.cpp Initial file population. 2013-08-02 13:12:24 -07:00
UCWordIterator.h Initial file population. 2013-08-02 13:12:24 -07:00
UdpProtocol.h Initial file population. 2013-08-02 13:12:24 -07:00
UdpServer.cpp Merge branch 'diffbot' into testing 2014-03-08 09:38:44 -07:00
UdpServer.h rebalancer working pretty well now 2014-01-15 19:08:47 -08:00
UdpSlot.cpp Initial file population. 2013-08-02 13:12:24 -07:00
UdpSlot.h Initial file population. 2013-08-02 13:12:24 -07:00
udptest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Unicode.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Unicode.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
UnicodeProperties.cpp code cleanups. 2014-01-18 21:19:26 -08:00
UnicodeProperties.h Initial file population. 2013-08-02 13:12:24 -07:00
unifiedDict.txt Initial file population. 2013-08-02 13:12:24 -07:00
uniq2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Url.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Url.h Initial file population. 2013-08-02 13:12:24 -07:00
urlinfo.cpp fixed data corruption bug. m_finalCrawlDelay 2013-11-27 14:18:15 -08:00
Users.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Users.h Initial file population. 2013-08-02 13:12:24 -07:00
ValidPointer.cpp Initial file population. 2013-08-02 13:12:24 -07:00
ValidPointer.h Initial file population. 2013-08-02 13:12:24 -07:00
Vector.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Vector.h Initial file population. 2013-08-02 13:12:24 -07:00
Weights.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Weights.h Initial file population. 2013-08-02 13:12:24 -07:00
Wiki.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Wiki.h Initial file population. 2013-08-02 13:12:24 -07:00
wikititles.txt.part1 Initial file population. 2013-08-02 13:12:24 -07:00
wikititles.txt.part2 Initial file population. 2013-08-02 13:12:24 -07:00
wiktionary-buf.txt Initial file population. 2013-08-02 13:12:24 -07:00
wiktionary-lang.txt Initial file population. 2013-08-02 13:12:24 -07:00
wiktionary-syns.dat Initial file population. 2013-08-02 13:12:24 -07:00
Wiktionary.cpp Merge branch 'master' into diffbot 2013-11-20 15:51:58 -08:00
Wiktionary.h Initial file population. 2013-08-02 13:12:24 -07:00
Words.cpp fixed bugs in sort by prices, etc. 2013-11-11 18:58:45 -08:00
Words.h Initial file population. 2013-08-02 13:12:24 -07:00
Xml.cpp for json docs only give them a single 2014-01-25 08:17:38 -08:00
Xml.h for json docs only give them a single 2014-01-25 08:17:38 -08:00
XmlDoc.cpp more updates 2014-04-09 11:03:31 -07:00
XmlDoc.h hack about 35%ish done 2014-04-08 19:34:43 -07:00
XmlNode.cpp fixed cdata parsing issue 2013-12-19 16:04:53 -08:00
XmlNode.h Initial file population. 2013-08-02 13:12:24 -07:00
zconf.h Initial file population. 2013-08-02 13:12:24 -07:00
zlib.h Initial file population. 2013-08-02 13:12:24 -07:00

open-source-search-engine

An open source web and enterprise search engine, as seen at http://www.gigablast.com/.

RUNNING GIGABLAST

See html/admin.html for all administrative documentation including the quick start instructions.

Alternatively, visit http://www.gigablast.com/admin.html

See html/compare.html for a comparison of Gigablast to SOLR. The comparison is still sparse, but it does include some useful commands.

CODE ARCHITECTURE

See html/developer.html for all code documentation.

Alternatively, visit http://www.gigablast.com/developer.html

CONTACT

Contact me with feature requests or for help in general. I will work for free on good use cases. Email: mattdwells@hotmail.com