Nov 20 2017 -- A distributed open source search engine and spider/crawler written in C/C++ for Linux on Intel/AMD. From gigablast dot com, which has binaries for download. See the README.md file at the very bottom of this page for instructions.
Matt Wells b393a1bbbe Merge branch 'testing' into diffbot-matt
Conflicts:
	Errno.cpp
	Errno.h
2014-07-10 10:06:55 -07:00
antiword-dir Initial file population. 2013-08-02 13:12:24 -07:00
diffbot-widget widget updates 2014-04-21 09:21:28 -07:00
html fix dmoz building. 2014-07-05 22:20:15 -07:00
openssl we already include our own 32-bit 2013-09-15 18:25:49 -06:00
ucdata Initial file population. 2013-08-02 13:12:24 -07:00
Abbreviations.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Abbreviations.h change a couple of possible reserved names in C++ 2013-08-28 22:59:01 -06:00
Accessdb.cpp fix addColl() logic for collectionless rdbs 2013-10-16 14:38:09 -07:00
Accessdb.h Initial file population. 2013-08-02 13:12:24 -07:00
Address.cpp squidproxycache/floaters/sectiondbtagging all compiles. 2014-06-11 17:57:28 -07:00
Address.h use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
addtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Ads.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Ads.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
AdultBit.cpp Initial file population. 2013-08-02 13:12:24 -07:00
AdultBit.h Initial file population. 2013-08-02 13:12:24 -07:00
animate.cpp Initial file population. 2013-08-02 13:12:24 -07:00
antiword fix ulimit and antiword bugs 2014-06-18 04:06:20 -07:00
AutoBan.cpp fix a few minor bugs. 2014-03-16 10:34:58 -07:00
AutoBan.h Initial file population. 2013-08-02 13:12:24 -07:00
badcattable.dat Initial file population. 2013-08-02 13:12:24 -07:00
BigFile.cpp thread fixes. if pthread_create fails then 2014-03-15 20:07:02 -07:00
BigFile.h make code compile cleaner. 2014-06-07 14:11:12 -07:00
Bits.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Bits.h Initial file population. 2013-08-02 13:12:24 -07:00
blaster2.cpp cygwin updates 2014-06-07 14:58:57 -07:00
Blaster.cpp for json docs only give them a single 2014-01-25 08:17:38 -08:00
Blaster.h use ./gb blaster -u <fileofurls> to just inject urls, 2013-08-19 16:33:27 -06:00
bmptopnm Initial file population. 2013-08-02 13:12:24 -07:00
Cachedb.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Cachedb.h Initial file population. 2013-08-02 13:12:24 -07:00
camsort.cpp Initial file population. 2013-08-02 13:12:24 -07:00
catcountry.dat Initial file population. 2013-08-02 13:12:24 -07:00
Catdb.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Catdb.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Categories.cpp documentation updates. fixed sd=0. 2013-10-13 14:24:41 -07:00
Categories.h documentation updates. fixed sd=0. 2013-10-13 14:24:41 -07:00
CatRec.cpp fix a couple catdb generation bugs. 2013-10-12 20:33:04 -07:00
CatRec.h Initial file population. 2013-08-02 13:12:24 -07:00
changelog use changelog in binary packages 2014-07-05 18:51:49 -07:00
character-sets Initial file population. 2013-08-02 13:12:24 -07:00
check_unicode.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Clusterdb.cpp thread fixes. if pthread_create fails then 2014-03-15 20:07:02 -07:00
Clusterdb.h Initial file population. 2013-08-02 13:12:24 -07:00
Collectiondb.cpp Merge branch 'testing' into diffbot-matt 2014-07-10 10:06:55 -07:00
Collectiondb.h Merge branch 'testing' into diffbot-matt 2014-07-10 10:06:55 -07:00
Conf.cpp some setup for qaspider() 2014-07-08 20:33:13 -07:00
Conf.h Merge branch 'testing' into diffbot-matt 2014-07-09 10:23:27 -07:00
control.deb package bldg updates 2014-06-16 21:50:32 -06:00
convert.cpp Initial file population. 2013-08-02 13:12:24 -07:00
copyright.head package bldg updates 2014-06-16 21:50:32 -06:00
copyright.tail package bldg updates 2014-06-16 21:50:32 -06:00
CountryCode.cpp make code compile cleaner. 2014-06-07 14:11:12 -07:00
CountryCode.h fix pagecrawlbot.cpp to support &c=token-name. 2014-01-22 23:40:38 -08:00
create_ucd_tables.cpp Initial file population. 2013-08-02 13:12:24 -07:00
DailyMerge.cpp Fixed some bugs. 2013-08-09 08:52:15 -07:00
DailyMerge.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
DataFeed.cpp Initial file population. 2013-08-02 13:12:24 -07:00
DataFeed.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Datedb.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Datedb.h move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Dates.cpp Merge branch 'testing' into diffbot-matt 2014-07-07 09:49:59 -07:00
Dates.h timezone fix for atotime1() et al 2014-07-02 14:06:43 -07:00
Diff.cpp Fixed some bugs. 2013-08-09 08:52:15 -07:00
Diff.h Initial file population. 2013-08-02 13:12:24 -07:00
Dir.cpp more parmdb fixes 2013-12-16 15:39:24 -08:00
Dir.h fix file descriptor leak in Dir class. 2013-11-19 13:41:56 -08:00
DiskPageCache.cpp nothing 2014-06-18 08:09:02 -06:00
DiskPageCache.h tons of changes from live github on neo. 2014-01-17 21:01:43 -08:00
dlstubs.c Initial file population. 2013-08-02 13:12:24 -07:00
dmozparse.cpp fix dmoz building. 2014-07-05 22:20:15 -07:00
Dns.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Dns.h Initial file population. 2013-08-02 13:12:24 -07:00
DnsProtocol.h Initial file population. 2013-08-02 13:12:24 -07:00
dnstest.cpp nomenclature changes to reduce collissions. 2014-03-31 15:02:17 -07:00
Domains.cpp move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Domains.h Initial file population. 2013-08-02 13:12:24 -07:00
dumpcore.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Entities.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Entities.h Initial file population. 2013-08-02 13:12:24 -07:00
Errno.cpp Merge branch 'testing' into diffbot-matt 2014-07-10 10:06:55 -07:00
Errno.h Merge branch 'testing' into diffbot-matt 2014-07-10 10:06:55 -07:00
errnotest.cpp errno test update 2013-11-19 00:10:10 -07:00
Events.h Initial file population. 2013-08-02 13:12:24 -07:00
Facebook.cpp fix addColl() logic for collectionless rdbs 2013-10-16 14:38:09 -07:00
Facebook.h fixes for page inject 2014-06-15 08:26:27 -07:00
fastIndexTable.cpp Initial file population. 2013-08-02 13:12:24 -07:00
fctypes.cpp updates to compile cleanly on redhat. 2014-05-24 23:58:12 -04:00
fctypes.h updates to compile cleanly on redhat. 2014-05-24 23:58:12 -04:00
File.cpp File::set() fix for //'s 2014-06-08 15:24:30 -07:00
File.h tons of changes from live github on neo. 2014-01-17 21:01:43 -08:00
filterquerylogs.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Flags.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Flags.h Initial file population. 2013-08-02 13:12:24 -07:00
gb-1.0.spec make it so we don't need --nodeps with 2014-05-25 22:08:46 -04:00
gb-include.h compiler cleanups for cygwin compile 2014-06-07 14:20:04 -07:00
gb.deb.rules if netpbm pkg already installed use it. 2014-07-06 09:54:28 -07:00
gb.pem so we have spider https sites add 2013-10-13 00:15:39 -07:00
gbfilter.cpp Initial file population. 2013-08-02 13:12:24 -07:00
gbtitletest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
geneaology.cpp Initial file population. 2013-08-02 13:12:24 -07:00
generateSuperMergeCode.cpp Initial file population. 2013-08-02 13:12:24 -07:00
geo_ip_table.cpp Initial file population. 2013-08-02 13:12:24 -07:00
geo_ip_table.h Initial file population. 2013-08-02 13:12:24 -07:00
GeoIP_internal.h Initial file population. 2013-08-02 13:12:24 -07:00
GeoIP.c Initial file population. 2013-08-02 13:12:24 -07:00
GeoIP.h Initial file population. 2013-08-02 13:12:24 -07:00
GeoIPCity.c Initial file population. 2013-08-02 13:12:24 -07:00
GeoIPCity.h Initial file population. 2013-08-02 13:12:24 -07:00
getsample.cpp Initial file population. 2013-08-02 13:12:24 -07:00
giftopnm Initial file population. 2013-08-02 13:12:24 -07:00
hash.cpp Initial file population. 2013-08-02 13:12:24 -07:00
hash.h get "&site=abc.com+xyz.com"... working to restrict 2013-09-15 20:16:48 -07:00
HashTable.cpp fix core from last push. 2013-12-09 14:21:46 -07:00
HashTable.h mem labelling fixes. 2013-12-09 14:05:02 -07:00
HashTableT.cpp Initial file population. 2013-08-02 13:12:24 -07:00
HashTableT.h Initial file population. 2013-08-02 13:12:24 -07:00
HashTableX.cpp quite a few fixes to the quota system, cleanups etc. 2014-01-18 16:23:13 -08:00
HashTableX.h new facet crap compiling now. 2014-06-20 12:28:50 -07:00
hashtest2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
hashtest3.cpp Initial file population. 2013-08-02 13:12:24 -07:00
hashtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Highlight.cpp Merge branch 'master' into diffbot 2013-12-07 11:34:26 -07:00
Highlight.h trying to fix json decoding bug. 2013-10-24 17:55:01 -07:00
Hostdb.cpp shard gbfacetstr:gbxpathsitehash123456 terms by termid for speed. 2014-07-07 12:32:27 -07:00
Hostdb.h shard gbfacetstr:gbxpathsitehash123456 terms by termid for speed. 2014-07-07 12:32:27 -07:00
hosts.cpp Initial file population. 2013-08-02 13:12:24 -07:00
HttpMime.cpp Merge branch 'testing' into diffbot-matt 2014-07-09 10:23:27 -07:00
HttpMime.h Merge branch 'testing' into diffbot-matt 2014-06-13 11:00:09 -07:00
HttpRequest.cpp return error if we get CONNECT requests. we don't 2014-07-09 11:06:46 -07:00
HttpRequest.h Merge branch 'testing' into diffbot-matt 2014-06-27 17:17:14 -07:00
HttpServer.cpp Merge branch 'testing' into diffbot-matt 2014-07-10 10:06:55 -07:00
HttpServer.h Merge branch 'testing' into diffbot-matt 2014-07-10 10:06:55 -07:00
iana_charset.cpp merge diffbot-testing 2014-04-09 20:10:30 -07:00
iana_charset.h merge diffbot-testing 2014-04-09 20:10:30 -07:00
iconv.h Initial file population. 2013-08-02 13:12:24 -07:00
Images.cpp if netpbm pkg already installed use it. 2014-07-06 09:54:28 -07:00
Images.h support og:image images. allow user to 2014-07-04 15:33:27 -07:00
Indexdb.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Indexdb.h retry if too man docids deduped when &stream=1 2014-05-01 17:07:31 -07:00
IndexList.cpp Initial file population. 2013-08-02 13:12:24 -07:00
IndexList.h Initial file population. 2013-08-02 13:12:24 -07:00
IndexReadInfo.cpp Initial file population. 2013-08-02 13:12:24 -07:00
IndexReadInfo.h Initial file population. 2013-08-02 13:12:24 -07:00
IndexTable2.cpp move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
IndexTable2.h Initial file population. 2013-08-02 13:12:24 -07:00
IndexTable.cpp Initial file population. 2013-08-02 13:12:24 -07:00
IndexTable.h Initial file population. 2013-08-02 13:12:24 -07:00
init.gb.conf minor make install changes 2014-05-22 18:46:38 -07:00
injectme3 added injectme3 file and documentation into compare.html 2013-08-17 11:02:26 -06:00
injector.cpp Initial file population. 2013-08-02 13:12:24 -07:00
iostream.h Initial file population. 2013-08-02 13:12:24 -07:00
ip.cpp fix old bug. 2014-01-10 18:52:47 -07:00
ip.h Initial file population. 2013-08-02 13:12:24 -07:00
ipconfig.cpp fixed some cores. brought in fixes from 2013-09-08 16:16:13 -06:00
Iso8859.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Iso8859.h Initial file population. 2013-08-02 13:12:24 -07:00
jointest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
jpegtopnm Initial file population. 2013-08-02 13:12:24 -07:00
Json.cpp fix json parser core from bad json. 2014-06-16 06:56:16 -07:00
Json.h v3 support for tokenized diffbot replies 2014-05-12 16:13:24 -07:00
keepalive.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Lang.cpp comment updates 2013-10-15 23:13:50 -07:00
Lang.h when user searches for a word without the 2014-06-01 09:37:00 -07:00
LangList.cpp code cleanups. 2014-01-18 21:19:26 -08:00
LangList.h Initial file population. 2013-08-02 13:12:24 -07:00
Language.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Language.h Initial file population. 2013-08-02 13:12:24 -07:00
LanguageIdentifier.cpp Initial file population. 2013-08-02 13:12:24 -07:00
LanguageIdentifier.h Initial file population. 2013-08-02 13:12:24 -07:00
LanguagePages.cpp Initial file population. 2013-08-02 13:12:24 -07:00
LanguagePages.h Initial file population. 2013-08-02 13:12:24 -07:00
libc.a Initial file population. 2013-08-02 13:12:24 -07:00
libcrypto.a turn off hearbeats when compiling openssl libs 2014-04-22 16:39:40 -07:00
libgcc.a Initial file population. 2013-08-02 13:12:24 -07:00
libiconv.a Initial file population. 2013-08-02 13:12:24 -07:00
libiconv.la Initial file population. 2013-08-02 13:12:24 -07:00
libjpeg.so.62 thumbnail generation support back in. 2014-04-24 10:13:45 -07:00
libm.a Initial file population. 2013-08-02 13:12:24 -07:00
libnetpbm.so.10 thumbnail generation support back in. 2014-04-24 10:13:45 -07:00
libpng12.so.0 thumbnail generation support back in. 2014-04-24 10:13:45 -07:00
libpthread.a Initial file population. 2013-08-02 13:12:24 -07:00
libssl.a turn off hearbeats when compiling openssl libs 2014-04-22 16:39:40 -07:00
libstdc++.a Initial file population. 2013-08-02 13:12:24 -07:00
libtiff.so.4 thumbnail generation support back in. 2014-04-24 10:13:45 -07:00
libz.a Initial file population. 2013-08-02 13:12:24 -07:00
libz.so.1 thumbnail generation support back in. 2014-04-24 10:13:45 -07:00
LICENSE license fix 2014-06-16 13:52:51 -07:00
Linkdb.cpp disambiguate error msg 2014-05-26 10:46:10 -07:00
Linkdb.h fixes for new link info code so it doesn't 2014-02-25 10:55:05 -08:00
LinkedList.h Initial file population. 2013-08-02 13:12:24 -07:00
linkspam.cpp renamed matches.h and matches.cpp to 2013-10-01 07:58:24 -07:00
linkspam.h Initial file population. 2013-08-02 13:12:24 -07:00
Log.cpp try to fix msg22 based cores 2014-05-14 07:46:32 -07:00
Log.h keep thumbnail gen msgs in the log file 2014-07-04 08:34:42 -07:00
Loop.cpp cygwin updates 2014-06-07 14:37:21 -07:00
Loop.h updates to compile cleanly on redhat. 2014-05-24 23:58:12 -04:00
looptest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
main.cpp Merge branch 'testing' into diffbot-matt 2014-07-09 10:23:27 -07:00
Make.depend Merge branch 'testing' into diffbot-matt 2014-06-27 17:17:14 -07:00
Makefile Merge branch 'testing' into diffbot-matt 2014-07-07 09:49:59 -07:00
malloc.c Initial file population. 2013-08-02 13:12:24 -07:00
matches2.cpp dirty word detector revisions. we need 2013-10-16 20:19:49 -07:00
matches2.h renamed matches.h and matches.cpp to 2013-10-01 07:58:24 -07:00
Matches.cpp fix annoying bug when adding new parms. 2014-06-10 12:29:50 -07:00
Matches.h Initial file population. 2013-08-02 13:12:24 -07:00
Mem.cpp Merge branch 'testing' into diffbot-matt 2014-06-27 17:17:14 -07:00
Mem.h fix getPitPosLL() error causing 2014-05-28 07:35:05 -07:00
membustest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MemPool.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MemPool.h Initial file population. 2013-08-02 13:12:24 -07:00
MemPoolTree.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MemPoolTree.h Initial file population. 2013-08-02 13:12:24 -07:00
memtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
mergetest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MetaContainer.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MetaContainer.h Initial file population. 2013-08-02 13:12:24 -07:00
Mime.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Mime.h Initial file population. 2013-08-02 13:12:24 -07:00
mixfile.cpp Initial file population. 2013-08-02 13:12:24 -07:00
mmseg.h Initial file population. 2013-08-02 13:12:24 -07:00
monitor.cpp nomenclature changes to reduce collissions. 2014-03-31 15:02:17 -07:00
Monitordb.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Monitordb.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg0.cpp retry if too man docids deduped when &stream=1 2014-05-01 17:07:31 -07:00
Msg0.h retry if too man docids deduped when &stream=1 2014-05-01 17:07:31 -07:00
Msg1.cpp get searching on token working 2014-03-06 17:01:41 -08:00
Msg1.h use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msg1f.cpp if logging to stderr then return err when trying to 2014-07-05 14:16:33 -07:00
Msg1f.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg2.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msg2.h use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msg2a.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Msg2a.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg2b.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg2b.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg3.cpp minor print fix 2014-04-26 13:41:08 -07:00
Msg3.h use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msg3a.cpp Merge branch 'testing' into diffbot-matt 2014-07-08 09:58:54 -07:00
Msg3a.h shard gbfacetstr:gbxpathsitehash123456 terms by termid for speed. 2014-07-07 12:32:27 -07:00
Msg3e.cpp fix infinite loop from json parsing and 2013-09-27 17:52:36 -06:00
Msg3e.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg4.cpp nomenclature changes to reduce collissions. 2014-03-31 15:02:17 -07:00
Msg4.h code checkpoint 2014-02-09 12:38:40 -07:00
Msg5.cpp fix statsdb/graph page 2014-07-04 09:53:42 -07:00
Msg5.h use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msg6b.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg6b.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg8b.cpp fix errors from restarting collection and 2014-04-10 23:59:30 -07:00
Msg8b.h Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Msg9b.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msg9b.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg13.cpp more vote infusion and squid proxy fixes. 2014-07-09 14:57:58 -07:00
Msg13.h more vote infusion and squid proxy fixes. 2014-07-09 14:57:58 -07:00
Msg17.cpp first compiled stab at multi collection searching. 2014-03-06 10:45:13 -08:00
Msg17.h first compiled stab at multi collection searching. 2014-03-06 10:45:13 -08:00
Msg20.cpp inject docs that come through our squid proxy 2014-07-09 12:25:23 -07:00
Msg20.h send back facet field/value pairs in msg20reply 2014-07-08 14:22:55 -07:00
Msg22.cpp fixed nasty bug of resetting RdbBases for 2014-06-09 10:16:29 -07:00
Msg22.h try to fix msg22 core some more 2014-05-14 08:16:47 -07:00
Msg24.cpp new Make.depend. 2013-08-09 17:13:45 -06:00
Msg28.cpp fix core from (broad)casting valueless cgi field. 2013-10-03 14:51:59 -07:00
Msg28.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg30.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg30.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Msg35.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg35.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg36.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msg36.h retry if too man docids deduped when &stream=1 2014-05-01 17:07:31 -07:00
Msg37.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msg37.h use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msg39.cpp section voting markup updates 2014-07-08 11:14:45 -07:00
Msg39.h Merge branch 'testing' into diffbot-matt 2014-07-08 09:58:54 -07:00
Msg40.cpp added a few new search parms that can be used 2014-07-08 07:01:51 -07:00
Msg40.h fix debug print statements 2014-07-01 11:46:01 -07:00
Msg40Cache.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg40Cache.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg42.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg42.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg51.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msg51.h use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msgaa.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msgaa.h Initial file population. 2013-08-02 13:12:24 -07:00
MsgC.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
MsgC.h Initial file population. 2013-08-02 13:12:24 -07:00
Msge0.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msge0.h use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msge1.cpp nomenclature changes to reduce collissions. 2014-03-31 15:02:17 -07:00
Msge1.h Initial file population. 2013-08-02 13:12:24 -07:00
Multicast.cpp a lot of times rdb tree has invalid collection 2014-01-21 19:01:44 -08:00
Multicast.h shard gbfacetstr:gbxpathsitehash123456 terms by termid for speed. 2014-07-07 12:32:27 -07:00
mysynonyms.txt Initial file population. 2013-08-02 13:12:24 -07:00
numwords.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageAddColl.cpp added a clone coll page. for after creation cloning. 2014-07-09 07:54:29 -07:00
PageAddUrl.cpp more api work 2014-07-10 07:10:49 -07:00
PageBasic.cpp Merge branch 'testing' into diffbot-matt 2014-07-10 10:06:55 -07:00
PageCatdb.cpp fix a few minor bugs. 2014-03-16 10:34:58 -07:00
PageCrawlBot.cpp show actual diffbot error in urls.csv. 2014-07-02 11:53:24 -07:00
PageCrawlBot.h printCrawlDetailsInJson signature without version 2014-05-28 10:41:32 -07:00
PageDirectory.cpp fix dmoz building. 2014-07-05 22:20:15 -07:00
PageEvents.cpp fixes for page inject 2014-06-15 08:26:27 -07:00
PageGet.cpp sectioning stuff working halfway decent. 2014-07-07 16:46:38 -07:00
PageHosts.cpp host table cleanups 2014-03-16 17:14:47 -07:00
PageIndexdb.cpp fix a few minor bugs. 2014-03-16 10:34:58 -07:00
PageInject.cpp inject docs that come through our squid proxy 2014-07-09 12:25:23 -07:00
PageInject.h inject docs that come through our squid proxy 2014-07-09 12:25:23 -07:00
PageLogView.cpp new printadmintop functionality. 2014-02-07 23:08:04 -07:00
PageNetTest.cpp move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
PageNetTest.h Initial file population. 2013-08-02 13:12:24 -07:00
PageOverview.cpp fix a few minor bugs. 2014-03-16 10:34:58 -07:00
PageParser.cpp fix for searching for query pipe operator in quotes. 2014-06-03 13:08:35 -07:00
PageParser.h more fixes for new boolean logic. 2014-03-13 13:09:33 -07:00
PagePerf.cpp took out pagecount table. just hafta scan 2014-01-19 20:34:38 -08:00
PageReindex.cpp fix page reindex bugs. 2014-07-02 17:13:37 -07:00
PageReindex.h fix query reindex some more 2014-03-11 14:46:49 -07:00
PageResults.cpp Merge branch 'testing' into diffbot-matt 2014-07-09 10:23:27 -07:00
PageResults.h get html head and tail working again now. 2014-06-21 21:07:38 -07:00
PageRoot.cpp fix dmoz building. 2014-07-05 22:20:15 -07:00
Pages.cpp Merge branch 'testing' into diffbot-matt 2014-07-10 10:06:55 -07:00
Pages.h Merge branch 'testing' into diffbot-matt 2014-07-10 10:06:55 -07:00
PageSockets.cpp fix a few minor bugs. 2014-03-16 10:34:58 -07:00
PageSpam.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageStats.cpp fix a few minor bugs. 2014-03-16 10:34:58 -07:00
PageStatsdb.cpp more api updates 2014-07-05 10:16:21 -07:00
PageSubmit.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageThesaurus.cpp fix a few minor bugs. 2014-03-16 10:34:58 -07:00
PageThreads.cpp formatting fixes 2014-01-19 00:57:20 -08:00
PageTitledb.cpp try to start indexing spider replies 2014-05-09 11:18:24 -07:00
PageTurk.cpp fixes for page inject 2014-06-15 08:26:27 -07:00
PageTurk.h Initial file population. 2013-08-02 13:12:24 -07:00
Parms.cpp Merge branch 'testing' into diffbot-matt 2014-07-10 10:06:55 -07:00
Parms.h Merge branch 'testing' into diffbot-matt 2014-07-10 10:06:55 -07:00
parse_iana_charsets.pl move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
pdftohtml use statically compiled pdftohtml 2014-04-24 08:31:52 -07:00
Phrases.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Phrases.h Initial file population. 2013-08-02 13:12:24 -07:00
PingServer.cpp make code compile cleaner. 2014-06-07 14:11:12 -07:00
PingServer.h added emergency msg box on all admin pages 2014-01-11 20:14:44 -08:00
Placedb.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Placedb.h Initial file population. 2013-08-02 13:12:24 -07:00
pngtopnm Initial file population. 2013-08-02 13:12:24 -07:00
pnmscale Initial file population. 2013-08-02 13:12:24 -07:00
Pops.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Pops.h Initial file population. 2013-08-02 13:12:24 -07:00
porter.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Pos.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Pos.h Initial file population. 2013-08-02 13:12:24 -07:00
Posdb.cpp added gbequalint: query operator for showing 2014-07-07 17:40:49 -07:00
Posdb.h make sectiondb stats just a special case of facets 2014-06-17 16:39:02 -06:00
postalCodes.txt Initial file population. 2013-08-02 13:12:24 -07:00
PostQueryRerank.cpp beginning of total parm overhaul. 2014-06-12 21:27:06 -07:00
PostQueryRerank.h Initial file population. 2013-08-02 13:12:24 -07:00
ppmtojpeg Initial file population. 2013-08-02 13:12:24 -07:00
Process.cpp Merge branch 'diffbot-testing' into diffbot-matt 2014-06-27 17:23:03 -07:00
Process.h work on make install. 2014-05-11 12:48:56 -07:00
Profiler.cpp cygwin cleanups 2014-06-07 15:59:32 -07:00
Profiler.h Initial file population. 2013-08-02 13:12:24 -07:00
Proxy.cpp spider proxy updates 2014-06-02 13:18:18 -07:00
Proxy.h Initial file population. 2013-08-02 13:12:24 -07:00
pstotext Initial file population. 2013-08-02 13:12:24 -07:00
qa.cpp Merge branch 'testing' into diffbot-matt 2014-07-09 10:23:27 -07:00
QAClient.cpp Initial file population. 2013-08-02 13:12:24 -07:00
QAClient.h Initial file population. 2013-08-02 13:12:24 -07:00
quarantine.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Query.cpp added gbequalint: query operator for showing 2014-07-07 17:40:49 -07:00
Query.h added gbequalint: query operator for showing 2014-07-07 17:40:49 -07:00
Rdb.cpp fix merging getting clogged by so many 2014-06-05 21:27:33 -07:00
Rdb.h make trash dir for image thumbs automatically 2014-04-29 17:01:48 -06:00
RdbBase.cpp fixed nasty bug of resetting RdbBases for 2014-06-09 10:16:29 -07:00
RdbBase.h fixed nasty bug of resetting RdbBases for 2014-06-09 10:16:29 -07:00
RdbBuckets.cpp log msg cleanups 2014-05-11 21:55:44 -07:00
RdbBuckets.h Initial file population. 2013-08-02 13:12:24 -07:00
RdbCache.cpp fix some cores. use olddoc contenthash 2014-02-07 18:28:09 -08:00
RdbCache.h removed MAX_COLL_RECS so we can have unlimited 2013-08-30 16:20:38 -07:00
RdbDump.cpp fix dmoz building. 2014-07-05 22:20:15 -07:00
RdbDump.h fix dmoz building. 2014-07-05 22:20:15 -07:00
RdbList.cpp fix data corruption detection and repair bug. 2014-05-01 10:38:00 -07:00
RdbList.h checkpoint for faster spider code. 2014-02-04 16:15:27 -08:00
RdbMap.cpp tons of changes from live github on neo. 2014-01-17 21:01:43 -08:00
RdbMap.h tons of changes from live github on neo. 2014-01-17 21:01:43 -08:00
RdbMem.cpp track down some nasty cores. fix 2013-10-29 16:37:14 -07:00
RdbMem.h now we can reset collection mid stream 2013-10-18 17:49:36 -07:00
RdbMerge.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
RdbMerge.h if coll is deleted or reset in a middle of a dump 2013-12-25 17:12:09 -08:00
RdbScan.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RdbScan.h Initial file population. 2013-08-02 13:12:24 -07:00
rdbtest2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
rdbtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RdbTree.cpp fix dmoz building. 2014-07-05 22:20:15 -07:00
RdbTree.h fix annoying rdbtree pos/neg key counting issue 2014-01-11 18:04:28 -08:00
README.md Update README.md 2013-11-16 20:14:06 -08:00
readRec.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Rebalance.cpp tuning the rebalance loop 2014-03-15 14:56:11 -07:00
Rebalance.h tight merge during rebalance to save 2014-03-14 23:37:30 -07:00
reindex2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Repair.cpp try to start indexing spider replies 2014-05-09 11:18:24 -07:00
Repair.h Initial file population. 2013-08-02 13:12:24 -07:00
RequestTable.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RequestTable.h Initial file population. 2013-08-02 13:12:24 -07:00
rescue.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Revdb.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Revdb.h Initial file population. 2013-08-02 13:12:24 -07:00
rmbots.cpp Initial file population. 2013-08-02 13:12:24 -07:00
S99gb added S99gb for loading at boot. 2014-06-23 07:32:38 -06:00
SafeBuf.cpp print facets for each search result 2014-07-08 19:38:54 -07:00
SafeBuf.h print facets for each search result 2014-07-08 19:38:54 -07:00
SafeList.h Initial file population. 2013-08-02 13:12:24 -07:00
Sanity.h Initial file population. 2013-08-02 13:12:24 -07:00
Scores.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Scores.h Initial file population. 2013-08-02 13:12:24 -07:00
Scraper.cpp take out datedb. no longer used. we store 2014-01-09 13:39:28 -08:00
Scraper.h Initial file population. 2013-08-02 13:12:24 -07:00
SearchInput.cpp added a few new search parms that can be used 2014-07-08 07:01:51 -07:00
SearchInput.h added a few new search parms that can be used 2014-07-08 07:01:51 -07:00
Sections.cpp do not hash redundant xpaths that have the same inner sentence/alnum 2014-07-09 17:16:01 -07:00
Sections.h do not hash redundant xpaths that have the same inner sentence/alnum 2014-07-09 17:16:01 -07:00
seektest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
seo.h Initial file population. 2013-08-02 13:12:24 -07:00
SiteGetter.cpp nomenclature changes to reduce collissions. 2014-03-31 15:02:17 -07:00
SiteGetter.h use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
sleepandlog.cpp Initial file population. 2013-08-02 13:12:24 -07:00
sort.cpp Initial file population. 2013-08-02 13:12:24 -07:00
sort.h Initial file population. 2013-08-02 13:12:24 -07:00
Speller.cpp work on make install. 2014-05-11 12:48:56 -07:00
Speller.h Initial file population. 2013-08-02 13:12:24 -07:00
Spider.cpp Merge branch 'testing' into diffbot-matt 2014-07-07 09:49:59 -07:00
Spider.h Merge branch 'diffbot-testing' into diffbot-matt 2014-06-27 17:23:03 -07:00
SpiderProxy.cpp now floaters are working pretty well 2014-06-30 16:26:10 -06:00
SpiderProxy.h got new floater/proxy logic compiling. 2014-06-06 15:11:51 -07:00
Stats.cpp formatting 2014-01-19 12:37:37 -08:00
Stats.h more formatting 2014-01-19 01:09:38 -08:00
Statsdb.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Statsdb.h fix potential problem of tons of points in 2013-10-14 22:52:29 -07:00
StopWords.cpp update common word list 2013-12-01 15:19:33 -07:00
StopWords.h Initial file population. 2013-08-02 13:12:24 -07:00
streambuf.h Initial file population. 2013-08-02 13:12:24 -07:00
Strings.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Strings.h Initial file population. 2013-08-02 13:12:24 -07:00
Summary.cpp fix default summary m_displayLen bug 2014-07-04 10:55:46 -07:00
Summary.h get summary "ns" parm and collectionrec 2014-07-03 07:29:44 -07:00
superMergeTest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
supported_charsets.cpp Initial file population. 2013-08-02 13:12:24 -07:00
supported_charsets.txt Initial file population. 2013-08-02 13:12:24 -07:00
Syncdb.cpp Merge branch 'diffbot-testing' into diffbot-matt 2014-06-09 12:42:54 -07:00
Syncdb.h Initial file population. 2013-08-02 13:12:24 -07:00
Synonyms.cpp fix stack smash core. 2014-06-01 10:42:49 -07:00
Synonyms.h fix stack smash core. 2014-06-01 10:42:49 -07:00
Tagdb.cpp nomenclature changes to reduce collissions. 2014-03-31 15:02:17 -07:00
Tagdb.h use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
TcpServer.cpp Merge branch 'testing' into diffbot-matt 2014-07-07 09:49:59 -07:00
TcpServer.h finally got http tunnel logic working. 2014-07-01 16:28:15 -06:00
TcpSocket.h add support for tunnelling https fetch 2014-07-01 10:43:52 -06:00
test2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_convert.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_hash.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_norm.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_parser2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_parser.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_unicode.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Test.cpp fix annoying bug when adding new parms. 2014-06-10 12:29:50 -07:00
Test.h Initial file population. 2013-08-02 13:12:24 -07:00
testfloats.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Tfndb.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Tfndb.h code cleanups. 2014-01-18 21:19:26 -08:00
Thesaurus.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Thesaurus.h Initial file population. 2013-08-02 13:12:24 -07:00
Threads.cpp make code compile cleaner. 2014-06-07 14:11:12 -07:00
Threads.h logging fixes 2014-04-10 23:04:00 -07:00
threadtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
thunder.cpp Initial file population. 2013-08-02 13:12:24 -07:00
tifftopnm Initial file population. 2013-08-02 13:12:24 -07:00
Timedb.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Timedb.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Timer.h Initial file population. 2013-08-02 13:12:24 -07:00
Title.cpp title max len fixes. 2014-07-02 08:03:33 -07:00
Title.h fix core from getting title of json object 2014-02-28 08:18:09 -08:00
Titledb.cpp thread fixes. if pthread_create fails then 2014-03-15 20:07:02 -07:00
Titledb.h code cleanups. 2014-01-18 21:19:26 -08:00
TopTree.cpp index numbers as integers too, not just floats 2014-02-06 20:57:54 -08:00
TopTree.h more fixes for new boolean logic. 2014-03-13 13:09:33 -07:00
treetest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
TuringTest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
TuringTest.h Initial file population. 2013-08-02 13:12:24 -07:00
Turkdb.cpp Initial file population. 2013-08-02 13:12:24 -07:00
types.h updates to compile cleanly on redhat. 2014-05-24 23:58:12 -04:00
UCNormalizer.cpp code cleanups. 2014-01-18 21:19:26 -08:00
UCNormalizer.h Initial file population. 2013-08-02 13:12:24 -07:00
UCPropTable.cpp Initial file population. 2013-08-02 13:12:24 -07:00
UCPropTable.h Initial file population. 2013-08-02 13:12:24 -07:00
UCWordIterator.cpp Initial file population. 2013-08-02 13:12:24 -07:00
UCWordIterator.h Initial file population. 2013-08-02 13:12:24 -07:00
UdpProtocol.h Initial file population. 2013-08-02 13:12:24 -07:00
UdpServer.cpp make code compile cleaner. 2014-06-07 14:11:12 -07:00
UdpServer.h rebalancer working pretty well now 2014-01-15 19:08:47 -08:00
UdpSlot.cpp Initial file population. 2013-08-02 13:12:24 -07:00
UdpSlot.h Initial file population. 2013-08-02 13:12:24 -07:00
udptest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Unicode.cpp when user searches for a word without the 2014-06-01 09:37:00 -07:00
Unicode.h add support for stripping accent marks from greek letters. 2014-05-30 20:09:37 -07:00
UnicodeProperties.cpp code cleanups. 2014-01-18 21:19:26 -08:00
UnicodeProperties.h Initial file population. 2013-08-02 13:12:24 -07:00
unifiedDict.txt Initial file population. 2013-08-02 13:12:24 -07:00
uniq2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Url.cpp fix relative url bug when relative url starts with ? 2014-06-03 10:54:50 -07:00
Url.h Initial file population. 2013-08-02 13:12:24 -07:00
urlinfo.cpp fixed data corruption bug. m_finalCrawlDelay 2013-11-27 14:18:15 -08:00
Users.cpp redhat build updates on fedora 2014-05-25 09:58:07 -04:00
Users.h Initial file population. 2013-08-02 13:12:24 -07:00
ValidPointer.cpp Initial file population. 2013-08-02 13:12:24 -07:00
ValidPointer.h Initial file population. 2013-08-02 13:12:24 -07:00
Vector.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Vector.h Initial file population. 2013-08-02 13:12:24 -07:00
Weights.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Weights.h Initial file population. 2013-08-02 13:12:24 -07:00
Wiki.cpp log msg cleanups 2014-05-11 21:55:44 -07:00
Wiki.h Initial file population. 2013-08-02 13:12:24 -07:00
wikititles.txt.part1 Initial file population. 2013-08-02 13:12:24 -07:00
wikititles.txt.part2 Initial file population. 2013-08-02 13:12:24 -07:00
wiktionary-buf.txt when user searches for a word without the 2014-06-01 09:37:00 -07:00
wiktionary-lang.txt when user searches for a word without the 2014-06-01 09:37:00 -07:00
wiktionary-syns.dat when user searches for a word without the 2014-06-01 09:37:00 -07:00
Wiktionary.cpp fix compiler bug 2014-06-16 11:10:38 -07:00
Wiktionary.h Initial file population. 2013-08-02 13:12:24 -07:00
Words.cpp fixed bugs in sort by prices, etc. 2013-11-11 18:58:45 -08:00
Words.h Initial file population. 2013-08-02 13:12:24 -07:00
Xml.cpp for json docs only give them a single 2014-01-25 08:17:38 -08:00
Xml.h for json docs only give them a single 2014-01-25 08:17:38 -08:00
XmlDoc.cpp fix core when getting facet values in xmldoc.cpp 2014-07-10 10:02:53 -07:00
XmlDoc.h print facets for each search result 2014-07-08 19:38:54 -07:00
XmlNode.cpp fix geth1tag some more. 2014-07-07 08:20:21 -07:00
XmlNode.h Initial file population. 2013-08-02 13:12:24 -07:00
zconf.h Initial file population. 2013-08-02 13:12:24 -07:00
zlib.h Initial file population. 2013-08-02 13:12:24 -07:00

open-source-search-engine

An open source web and enterprise search engine, as can be seen on http://www.gigablast.com/ .

RUNNING GIGABLAST

See html/admin.html for all administrative documentation including the quick start instructions.

Alternatively, visit http://www.gigablast.com/admin.html

See html/compare.html for a comparison of Gigablast to SOLR. The comparison is very sparse right now, but it does include some useful commands.

CODE ARCHITECTURE

See html/developer.html for all code documentation.

Alternatively, visit http://www.gigablast.com/developer.html

CONTACT

Contact me at mattdwells@hotmail.com for feature requests or help in general. I will work for free for good use cases.