Nov 20 2017 -- A distributed open source search engine and spider/crawler written in C/C++ for Linux on Intel/AMD. From gigablast dot com, which has binaries for download. See the README.md file at the very bottom of this page for instructions.
Matt Wells dc5b1408bc threads update for more warning msgs and to save thread enabled status to gb.conf. 2014-10-02 16:11:51 -07:00
antiword-dir Initial file population. 2013-08-02 13:12:24 -07:00
diffbot-widget widget updates 2014-04-21 09:21:28 -07:00
html more updates to cloud code 2014-09-29 18:28:36 -07:00
openssl we already include our own 32-bit 2013-09-15 18:25:49 -06:00
ucdata Initial file population. 2013-08-02 13:12:24 -07:00
Abbreviations.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Abbreviations.h change a couple of possible reserved names in C++ 2013-08-28 22:59:01 -06:00
Accessdb.cpp fix addColl() logic for collectionless rdbs 2013-10-16 14:38:09 -07:00
Accessdb.h Initial file population. 2013-08-02 13:12:24 -07:00
Address.cpp misc/various bug fixes. 2014-08-28 18:07:22 -07:00
Address.h use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
addtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Ads.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Ads.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
AdultBit.cpp Initial file population. 2013-08-02 13:12:24 -07:00
AdultBit.h Initial file population. 2013-08-02 13:12:24 -07:00
animate.cpp Initial file population. 2013-08-02 13:12:24 -07:00
antiword fix ulimit and antiword bugs 2014-06-18 04:06:20 -07:00
AutoBan.cpp fixes to make easier to compile on max os x. 2014-08-28 12:55:02 -07:00
AutoBan.h Initial file population. 2013-08-02 13:12:24 -07:00
badcattable.dat Initial file population. 2013-08-02 13:12:24 -07:00
BigFile.cpp disable threads for disk and intersect/merge by default 2014-10-02 15:24:40 -07:00
BigFile.h make code compile cleaner. 2014-06-07 14:11:12 -07:00
Bits.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Bits.h Initial file population. 2013-08-02 13:12:24 -07:00
blaster2.cpp more core fixes. more stability. 2014-07-16 12:52:51 -07:00
Blaster.cpp for json docs only give them a single 2014-01-25 08:17:38 -08:00
Blaster.h use ./gb blaster -u <fileofurls> to just inject urls, 2013-08-19 16:33:27 -06:00
bmptopnm Initial file population. 2013-08-02 13:12:24 -07:00
Cachedb.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Cachedb.h Initial file population. 2013-08-02 13:12:24 -07:00
camsort.cpp Initial file population. 2013-08-02 13:12:24 -07:00
catcountry.dat Initial file population. 2013-08-02 13:12:24 -07:00
Catdb.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Catdb.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Categories.cpp misc/various bug fixes. 2014-08-28 18:07:22 -07:00
Categories.h documentation updates. fixed sd=0. 2013-10-13 14:24:41 -07:00
CatRec.cpp fix a couple catdb generation bugs. 2013-10-12 20:33:04 -07:00
CatRec.h Initial file population. 2013-08-02 13:12:24 -07:00
changelog version update 2014-09-26 08:52:09 -06:00
character-sets Initial file population. 2013-08-02 13:12:24 -07:00
check_unicode.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Clusterdb.cpp thread fixes. if pthread_create fails then 2014-03-15 20:07:02 -07:00
Clusterdb.h Initial file population. 2013-08-02 13:12:24 -07:00
Collectiondb.cpp Merge branch 'testing' into diffbot-testing 2014-09-30 16:02:07 -07:00
Collectiondb.h Merge branch 'diffbot-testing' into testing 2014-10-01 21:27:18 -07:00
Conf.cpp fix qa test so we can roll out proxy code. 2014-09-30 15:40:02 -07:00
Conf.h disable threads for disk and intersect/merge by default 2014-10-02 15:24:40 -07:00
control.deb package bldg updates 2014-06-16 21:50:32 -06:00
convert.cpp Initial file population. 2013-08-02 13:12:24 -07:00
copyright.head package bldg updates 2014-06-16 21:50:32 -06:00
copyright.tail package bldg updates 2014-06-16 21:50:32 -06:00
CountryCode.cpp misc/various bug fixes. 2014-08-28 18:07:22 -07:00
CountryCode.h fix pagecrawlbot.cpp to support &c=token-name. 2014-01-22 23:40:38 -08:00
create_ucd_tables.cpp Initial file population. 2013-08-02 13:12:24 -07:00
DailyMerge.cpp Fixed some bugs. 2013-08-09 08:52:15 -07:00
DailyMerge.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
DataFeed.cpp Initial file population. 2013-08-02 13:12:24 -07:00
DataFeed.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Datedb.cpp more core stability fixes. prevent core dumps 2014-07-16 12:07:39 -07:00
Datedb.h more core stability fixes. prevent core dumps 2014-07-16 12:07:39 -07:00
Dates.cpp misc/various bug fixes. 2014-08-28 18:07:22 -07:00
Dates.h fixes to make easier to compile on max os x. 2014-08-28 12:55:02 -07:00
Diff.cpp Fixed some bugs. 2013-08-09 08:52:15 -07:00
Diff.h Initial file population. 2013-08-02 13:12:24 -07:00
Dir.cpp use gbsystem() not system() so it can turn off alarms 2014-09-11 05:01:55 -07:00
Dir.h fix file descriptor leak in Dir class. 2013-11-19 13:41:56 -08:00
DiskPageCache.cpp hacked up to debug why we're not getting 2014-08-27 10:37:03 -07:00
DiskPageCache.h tons of changes from live github on neo. 2014-01-17 21:01:43 -08:00
dlstubs.c Initial file population. 2013-08-02 13:12:24 -07:00
dmozparse.cpp fix dmoz building. 2014-07-05 22:20:15 -07:00
Dns.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Dns.h fixes to make easier to compile on max os x. 2014-08-28 12:55:02 -07:00
DnsProtocol.h fixes to make easier to compile on max os x. 2014-08-28 12:55:02 -07:00
dnstest.cpp nomenclature changes to reduce collissions. 2014-03-31 15:02:17 -07:00
Domains.cpp move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Domains.h Initial file population. 2013-08-02 13:12:24 -07:00
dumpcore.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Entities.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Entities.h Initial file population. 2013-08-02 13:12:24 -07:00
Errno.cpp rename admin.html to faq.html etc. file juggling. 2014-08-31 09:51:21 -07:00
Errno.h Merge branch 'testing' into diffbot-matt 2014-07-10 10:06:55 -07:00
errnotest.cpp errno test update 2013-11-19 00:10:10 -07:00
Events.h Initial file population. 2013-08-02 13:12:24 -07:00
Facebook.cpp fix addColl() logic for collectionless rdbs 2013-10-16 14:38:09 -07:00
Facebook.h fixes for page inject 2014-06-15 08:26:27 -07:00
fastIndexTable.cpp Initial file population. 2013-08-02 13:12:24 -07:00
fctypes.cpp Merge branch 'testing' into diffbot-testing 2014-08-29 11:23:13 -07:00
fctypes.h fixes to make easier to compile on max os x. 2014-08-28 12:55:02 -07:00
File.cpp cygwin fixes 2014-09-26 23:04:16 -07:00
File.h tons of changes from live github on neo. 2014-01-17 21:01:43 -08:00
filterquerylogs.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Flags.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Flags.h Initial file population. 2013-08-02 13:12:24 -07:00
gb-1.0.spec make it so we don't need --nodeps with 2014-05-25 22:08:46 -04:00
gb-include.h compiler cleanups for cygwin compile 2014-06-07 14:20:04 -07:00
gb.deb.rules if netpbm pkg already installed use it. 2014-07-06 09:54:28 -07:00
gb.pem so we have spider https sites add 2013-10-13 00:15:39 -07:00
gbfilter.cpp Initial file population. 2013-08-02 13:12:24 -07:00
gbtitletest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
geneaology.cpp Initial file population. 2013-08-02 13:12:24 -07:00
generateSuperMergeCode.cpp Initial file population. 2013-08-02 13:12:24 -07:00
geo_ip_table.cpp Initial file population. 2013-08-02 13:12:24 -07:00
geo_ip_table.h Initial file population. 2013-08-02 13:12:24 -07:00
GeoIP_internal.h Initial file population. 2013-08-02 13:12:24 -07:00
GeoIP.c Initial file population. 2013-08-02 13:12:24 -07:00
GeoIP.h Initial file population. 2013-08-02 13:12:24 -07:00
GeoIPCity.c Initial file population. 2013-08-02 13:12:24 -07:00
GeoIPCity.h Initial file population. 2013-08-02 13:12:24 -07:00
getsample.cpp Initial file population. 2013-08-02 13:12:24 -07:00
giftopnm Initial file population. 2013-08-02 13:12:24 -07:00
hash.cpp undo hashtab change. too much overhead. 2014-09-27 08:39:22 -07:00
hash.h get "&site=abc.com+xyz.com"... working to restrict 2013-09-15 20:16:48 -07:00
HashTable.cpp fix core from last push. 2013-12-09 14:21:46 -07:00
HashTable.h fixes to make easier to compile on max os x. 2014-08-28 12:55:02 -07:00
HashTableT.cpp Initial file population. 2013-08-02 13:12:24 -07:00
HashTableT.h Initial file population. 2013-08-02 13:12:24 -07:00
HashTableX.cpp several fixes for floater proxy through squid proxy. 2014-10-02 02:08:38 -07:00
HashTableX.h fix more bugs in squid proxy implementation. 2014-10-02 11:54:50 -07:00
hashtest2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
hashtest3.cpp Initial file population. 2013-08-02 13:12:24 -07:00
hashtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Highlight.cpp Merge branch 'master' into diffbot 2013-12-07 11:34:26 -07:00
Highlight.h trying to fix json decoding bug. 2013-10-24 17:55:01 -07:00
Hostdb.cpp use gbsystem() not system() so it can turn off alarms 2014-09-11 05:01:55 -07:00
Hostdb.h fix core 2014-09-19 14:00:57 -06:00
hosts.cpp Initial file population. 2013-08-02 13:12:24 -07:00
HttpMime.cpp fix time/date core 2014-09-22 07:00:10 -07:00
HttpMime.h Merge branch 'testing' into diffbot-matt 2014-06-13 11:00:09 -07:00
HttpRequest.cpp support the CONNECT for gb squid proxy 2014-10-02 12:36:43 -07:00
HttpRequest.h fixes for cloud support. 2014-08-31 16:23:11 -07:00
HttpServer.cpp support the CONNECT for gb squid proxy 2014-10-02 12:36:43 -07:00
HttpServer.h added http server compression (gzip) stats. 2014-09-26 11:06:38 -07:00
iana_charset.cpp merge diffbot-testing 2014-04-09 20:10:30 -07:00
iana_charset.h merge diffbot-testing 2014-04-09 20:10:30 -07:00
iconv.h Initial file population. 2013-08-02 13:12:24 -07:00
Images.cpp multiple core fixes 2014-09-22 07:07:40 -07:00
Images.h support og:image images. allow user to 2014-07-04 15:33:27 -07:00
Indexdb.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Indexdb.h retry if too man docids deduped when &stream=1 2014-05-01 17:07:31 -07:00
IndexList.cpp Initial file population. 2013-08-02 13:12:24 -07:00
IndexList.h Initial file population. 2013-08-02 13:12:24 -07:00
IndexReadInfo.cpp Initial file population. 2013-08-02 13:12:24 -07:00
IndexReadInfo.h Initial file population. 2013-08-02 13:12:24 -07:00
IndexTable2.cpp move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
IndexTable2.h Initial file population. 2013-08-02 13:12:24 -07:00
IndexTable.cpp Initial file population. 2013-08-02 13:12:24 -07:00
IndexTable.h Initial file population. 2013-08-02 13:12:24 -07:00
init.gb.conf minor make install changes 2014-05-22 18:46:38 -07:00
injectme3 added injectme3 file and documentation into compare.html 2013-08-17 11:02:26 -06:00
injector.cpp Initial file population. 2013-08-02 13:12:24 -07:00
iostream.h Initial file population. 2013-08-02 13:12:24 -07:00
ip.cpp misc/various bug fixes. 2014-08-28 18:07:22 -07:00
ip.h Initial file population. 2013-08-02 13:12:24 -07:00
ipconfig.cpp fixed some cores. brought in fixes from 2013-09-08 16:16:13 -06:00
Iso8859.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Iso8859.h Initial file population. 2013-08-02 13:12:24 -07:00
jointest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
jpegtopnm Initial file population. 2013-08-02 13:12:24 -07:00
Json.cpp multiple core fixes 2014-09-22 07:07:40 -07:00
Json.h v3 support for tokenized diffbot replies 2014-05-12 16:13:24 -07:00
keepalive.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Lang.cpp fixes to make easier to compile on max os x. 2014-08-28 12:55:02 -07:00
Lang.h when user searches for a word without the 2014-06-01 09:37:00 -07:00
LangList.cpp misc/various bug fixes. 2014-08-28 18:07:22 -07:00
LangList.h Initial file population. 2013-08-02 13:12:24 -07:00
Language.cpp use gbsystem() not system() so it can turn off alarms 2014-09-11 05:01:55 -07:00
Language.h Initial file population. 2013-08-02 13:12:24 -07:00
LanguageIdentifier.cpp more minor bug fixes. 2014-08-28 18:11:07 -07:00
LanguageIdentifier.h Initial file population. 2013-08-02 13:12:24 -07:00
LanguagePages.cpp misc/various bug fixes. 2014-08-28 18:07:22 -07:00
LanguagePages.h Initial file population. 2013-08-02 13:12:24 -07:00
libc.a Initial file population. 2013-08-02 13:12:24 -07:00
libcrypto.a turn off hearbeats when compiling openssl libs 2014-04-22 16:39:40 -07:00
libgcc.a Initial file population. 2013-08-02 13:12:24 -07:00
libiconv.a Initial file population. 2013-08-02 13:12:24 -07:00
libiconv.la Initial file population. 2013-08-02 13:12:24 -07:00
libjpeg.so.62 thumbnail generation support back in. 2014-04-24 10:13:45 -07:00
libm.a Initial file population. 2013-08-02 13:12:24 -07:00
libnetpbm.so.10 thumbnail generation support back in. 2014-04-24 10:13:45 -07:00
libpng12.so.0 thumbnail generation support back in. 2014-04-24 10:13:45 -07:00
libpthread.a Initial file population. 2013-08-02 13:12:24 -07:00
libssl.a turn off hearbeats when compiling openssl libs 2014-04-22 16:39:40 -07:00
libstdc++.a Initial file population. 2013-08-02 13:12:24 -07:00
libtiff.so.4 thumbnail generation support back in. 2014-04-24 10:13:45 -07:00
libz.a Initial file population. 2013-08-02 13:12:24 -07:00
libz.so.1 thumbnail generation support back in. 2014-04-24 10:13:45 -07:00
LICENSE license fix 2014-06-16 13:52:51 -07:00
Linkdb.cpp Merge branch 'master' into testing 2014-09-20 08:26:38 -06:00
Linkdb.h fixes for new link info code so it doesn't 2014-02-25 10:55:05 -08:00
LinkedList.h Initial file population. 2013-08-02 13:12:24 -07:00
linkspam.cpp fixes to make easier to compile on max os x. 2014-08-28 12:55:02 -07:00
linkspam.h Initial file population. 2013-08-02 13:12:24 -07:00
Log.cpp try to fix msg22 based cores 2014-05-14 07:46:32 -07:00
Log.h keep thumbnail gen msgs in the log file 2014-07-04 08:34:42 -07:00
Loop.cpp fix collection swap logic a bunch. seems to work now. 2014-09-29 13:05:20 -07:00
Loop.h fix collection swap logic a bunch. seems to work now. 2014-09-29 13:05:20 -07:00
looptest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
main.cpp various bug fixes. more qa tests. 2014-09-24 20:03:16 -07:00
Make.depend force gb to recompile version every time 2014-09-19 12:23:40 -07:00
Makefile cygwin fixes 2014-09-26 23:04:16 -07:00
malloc.c Initial file population. 2013-08-02 13:12:24 -07:00
matches2.cpp dirty word detector revisions. we need 2013-10-16 20:19:49 -07:00
matches2.h renamed matches.h and matches.cpp to 2013-10-01 07:58:24 -07:00
Matches.cpp qa test fixes 2014-07-15 10:06:33 -07:00
Matches.h Initial file population. 2013-08-02 13:12:24 -07:00
Mem.cpp fix printf compiler warnings 2014-08-28 13:23:46 -07:00
Mem.h fixes to make easier to compile on max os x. 2014-08-28 12:55:02 -07:00
membustest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MemPool.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MemPool.h Initial file population. 2013-08-02 13:12:24 -07:00
MemPoolTree.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MemPoolTree.h Initial file population. 2013-08-02 13:12:24 -07:00
memtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
mergetest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MetaContainer.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MetaContainer.h Initial file population. 2013-08-02 13:12:24 -07:00
Mime.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Mime.h Initial file population. 2013-08-02 13:12:24 -07:00
mixfile.cpp Initial file population. 2013-08-02 13:12:24 -07:00
mmseg.h Initial file population. 2013-08-02 13:12:24 -07:00
monitor.cpp nomenclature changes to reduce collissions. 2014-03-31 15:02:17 -07:00
Monitordb.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Monitordb.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg0.cpp retry if too man docids deduped when &stream=1 2014-05-01 17:07:31 -07:00
Msg0.h retry if too man docids deduped when &stream=1 2014-05-01 17:07:31 -07:00
Msg1.cpp ignore ENOCOLLREC msgs in handleRequest1() in Msg1.cpp. 2014-07-14 12:21:32 -07:00
Msg1.h use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msg1f.cpp if logging to stderr then return err when trying to 2014-07-05 14:16:33 -07:00
Msg1f.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg2.cpp fix section stats display bugs 2014-07-10 15:55:18 -07:00
Msg2.h use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msg2a.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Msg2a.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg2b.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg2b.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg3.cpp minor print fix 2014-04-26 13:41:08 -07:00
Msg3.h use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msg3a.cpp facet text lookup fixes. 2014-07-29 19:32:27 -07:00
Msg3a.h shard gbfacetstr:gbxpathsitehash123456 terms by termid for speed. 2014-07-07 12:32:27 -07:00
Msg3e.cpp fix infinite loop from json parsing and 2013-09-27 17:52:36 -06:00
Msg3e.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg4.cpp get qa tests working again. 2014-09-23 17:48:40 -07:00
Msg4.h upped MAX_SPIDERS from 100 to 300. 2014-09-03 07:25:40 -07:00
Msg5.cpp disable threads for disk and intersect/merge by default 2014-10-02 15:24:40 -07:00
Msg5.h use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msg6b.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg6b.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg8b.cpp fixes to make easier to compile on max os x. 2014-08-28 12:55:02 -07:00
Msg8b.h Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Msg9b.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msg9b.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg13.cpp support the CONNECT for gb squid proxy 2014-10-02 12:36:43 -07:00
Msg13.h fix floater bug from reading hashtable off disk. 2014-09-26 15:30:42 -07:00
Msg17.cpp first compiled stab at multi collection searching. 2014-03-06 10:45:13 -08:00
Msg17.h first compiled stab at multi collection searching. 2014-03-06 10:45:13 -08:00
Msg20.cpp inject docs that come through our squid proxy 2014-07-09 12:25:23 -07:00
Msg20.h fixes for cloud support. 2014-08-31 16:23:11 -07:00
Msg22.cpp fix 2014-09-20 07:59:41 -06:00
Msg22.h try to fix msg22 core some more 2014-05-14 08:16:47 -07:00
Msg24.cpp new Make.depend. 2013-08-09 17:13:45 -06:00
Msg28.cpp fix core from (broad)casting valueless cgi field. 2013-10-03 14:51:59 -07:00
Msg28.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg30.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg30.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Msg35.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg35.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg36.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msg36.h retry if too man docids deduped when &stream=1 2014-05-01 17:07:31 -07:00
Msg37.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msg37.h use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msg39.cpp facet text lookup fixes. 2014-07-29 19:32:27 -07:00
Msg39.h added langw and langwieght to control weight 2014-09-21 18:47:30 -07:00
Msg40.cpp minor fixes 2014-09-27 17:01:16 -07:00
Msg40.h add <omitCount> stuff. fix getDocIds() recalls 2014-09-27 09:56:23 -07:00
Msg40Cache.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg40Cache.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg42.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg42.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg51.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msg51.h use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msgaa.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msgaa.h Initial file population. 2013-08-02 13:12:24 -07:00
MsgC.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
MsgC.h Initial file population. 2013-08-02 13:12:24 -07:00
Msge0.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msge0.h use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Msge1.cpp fixed qa tests when doing it over multi-host cluster 2014-09-16 10:25:45 -06:00
Msge1.h Initial file population. 2013-08-02 13:12:24 -07:00
Multicast.cpp fix not shutting down bug 2014-07-16 13:00:16 -07:00
Multicast.h fix data import function some more. added qa test. 2014-09-24 12:40:39 -07:00
mysynonyms.txt Initial file population. 2013-08-02 13:12:24 -07:00
numwords.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageAddColl.cpp fix qa test so we can roll out proxy code. 2014-09-30 15:40:02 -07:00
PageAddUrl.cpp fixes to make easier to compile on max os x. 2014-08-28 12:55:02 -07:00
PageBasic.cpp print chrome on other pages 2014-09-23 20:59:48 -07:00
PageCatdb.cpp fix a few minor bugs. 2014-03-16 10:34:58 -07:00
PageCrawlBot.cpp Merge branch 'diffbot-testing' into testing 2014-07-28 14:37:44 -07:00
PageCrawlBot.h more api updates 2014-07-13 09:35:44 -07:00
PageDirectory.cpp fix dmoz building. 2014-07-05 22:20:15 -07:00
PageEvents.cpp fixes for cloud support. 2014-08-31 16:23:11 -07:00
PageGet.cpp fix pageget.cpp 2014-10-02 12:45:58 -07:00
PageHosts.cpp update proxy algo so not all proxies get cutoff 2014-09-30 13:08:35 -07:00
PageIndexdb.cpp fixes for cloud support. 2014-08-31 16:23:11 -07:00
PageInject.cpp import fixes 2014-09-25 20:48:34 -07:00
PageInject.h fix data import function some more. added qa test. 2014-09-24 12:40:39 -07:00
PageLogView.cpp new printadmintop functionality. 2014-02-07 23:08:04 -07:00
PageNetTest.cpp move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
PageNetTest.h Initial file population. 2013-08-02 13:12:24 -07:00
PageOverview.cpp fix a few minor bugs. 2014-03-16 10:34:58 -07:00
PageParser.cpp fixes to make easier to compile on max os x. 2014-08-28 12:55:02 -07:00
PageParser.h more fixes for new boolean logic. 2014-03-13 13:09:33 -07:00
PagePerf.cpp took out pagecount table. just hafta scan 2014-01-19 20:34:38 -08:00
PageReindex.cpp added the query reindex smoke test. 2014-09-25 17:44:35 -07:00
PageReindex.h fix query reindex some more 2014-03-11 14:46:49 -07:00
PageResults.cpp added sorting by site # inlinks/pop to menu for testing 2014-10-01 12:09:43 -07:00
PageResults.h get html head and tail working again now. 2014-06-21 21:07:38 -07:00
PageRoot.cpp remove graphix 2014-10-01 20:00:35 -07:00
Pages.cpp threads update for more warning msgs and 2014-10-02 16:11:51 -07:00
Pages.h fixes before smoke testing 2014-09-30 16:22:18 -07:00
PageSockets.cpp put dropped requests in bold red 2014-09-04 11:01:49 -07:00
PageSpam.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageStats.cpp Merge branch 'testing' into diffbot-testing 2014-09-30 16:02:07 -07:00
PageStatsdb.cpp fix core 2014-09-17 16:58:24 -06:00
PageSubmit.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageThesaurus.cpp fix a few minor bugs. 2014-03-16 10:34:58 -07:00
PageThreads.cpp formatting fixes 2014-01-19 00:57:20 -08:00
PageTitledb.cpp fixes for cloud support. 2014-08-31 16:23:11 -07:00
PageTurk.cpp fixes for cloud support. 2014-08-31 16:23:11 -07:00
PageTurk.h Initial file population. 2013-08-02 13:12:24 -07:00
Parms.cpp threads update for more warning msgs and 2014-10-02 16:11:51 -07:00
Parms.h Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing 2014-10-01 19:55:45 -07:00
parse_iana_charsets.pl move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
pdftohtml try new pdftohtml binary 2014-09-26 08:02:17 -07:00
Phrases.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Phrases.h Initial file population. 2013-08-02 13:12:24 -07:00
PingServer.cpp force gb to recompile version every time 2014-09-19 12:23:40 -07:00
PingServer.h added emergency msg box on all admin pages 2014-01-11 20:14:44 -08:00
Placedb.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Placedb.h Initial file population. 2013-08-02 13:12:24 -07:00
pngtopnm Initial file population. 2013-08-02 13:12:24 -07:00
pnmscale Initial file population. 2013-08-02 13:12:24 -07:00
Pops.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Pops.h Initial file population. 2013-08-02 13:12:24 -07:00
porter.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Pos.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Pos.h Initial file population. 2013-08-02 13:12:24 -07:00
Posdb.cpp added the query reindex smoke test. 2014-09-25 17:44:35 -07:00
Posdb.h various bug fixes. more qa tests. 2014-09-24 20:03:16 -07:00
postalCodes.txt Initial file population. 2013-08-02 13:12:24 -07:00
PostQueryRerank.cpp beginning of total parm overhaul. 2014-06-12 21:27:06 -07:00
PostQueryRerank.h Initial file population. 2013-08-02 13:12:24 -07:00
ppmtojpeg Initial file population. 2013-08-02 13:12:24 -07:00
Process.cpp get qa tests working again. 2014-09-23 17:48:40 -07:00
Process.h stage 1 import tool 2014-09-20 16:58:12 -07:00
Profiler.cpp fixes to make easier to compile on max os x. 2014-08-28 12:55:02 -07:00
Profiler.h Initial file population. 2013-08-02 13:12:24 -07:00
Proxy.cpp fixes for cloud support. 2014-08-31 16:23:11 -07:00
Proxy.h Initial file population. 2013-08-02 13:12:24 -07:00
pstotext Initial file population. 2013-08-02 13:12:24 -07:00
qa.cpp fix qa test so we can roll out proxy code. 2014-09-30 15:40:02 -07:00
QAClient.cpp Initial file population. 2013-08-02 13:12:24 -07:00
QAClient.h Initial file population. 2013-08-02 13:12:24 -07:00
quarantine.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Query.cpp add example for gbsortby:sitenuminlinks into syntax page 2014-10-01 12:07:01 -07:00
Query.h added the query reindex smoke test. 2014-09-25 17:44:35 -07:00
Rdb.cpp more collection swapping fixes 2014-09-29 21:52:58 -07:00
Rdb.h fix # docs and recs bug. 2014-08-28 07:45:43 -07:00
RdbBase.cpp get collection/root login system working 2014-09-29 19:56:31 -07:00
RdbBase.h fixed nasty bug of resetting RdbBases for 2014-06-09 10:16:29 -07:00
RdbBuckets.cpp log msg cleanups 2014-05-11 21:55:44 -07:00
RdbBuckets.h Initial file population. 2013-08-02 13:12:24 -07:00
RdbCache.cpp misc/various bug fixes. 2014-08-28 18:07:22 -07:00
RdbCache.h removed MAX_COLL_RECS so we can have unlimited 2013-08-30 16:20:38 -07:00
RdbDump.cpp fix dmoz building. 2014-07-05 22:20:15 -07:00
RdbDump.h fix dmoz building. 2014-07-05 22:20:15 -07:00
RdbList.cpp fix data corruption detection and repair bug. 2014-05-01 10:38:00 -07:00
RdbList.h checkpoint for faster spider code. 2014-02-04 16:15:27 -08:00
RdbMap.cpp fix core from keys out of order when dumping 2014-09-18 12:33:56 -06:00
RdbMap.h tons of changes from live github on neo. 2014-01-17 21:01:43 -08:00
RdbMem.cpp track down some nasty cores. fix 2013-10-29 16:37:14 -07:00
RdbMem.h now we can reset collection mid stream 2013-10-18 17:49:36 -07:00
RdbMerge.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
RdbMerge.h if coll is deleted or reset in a middle of a dump 2013-12-25 17:12:09 -08:00
RdbScan.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RdbScan.h Initial file population. 2013-08-02 13:12:24 -07:00
rdbtest2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
rdbtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RdbTree.cpp Merge branch 'diffbot-testing' into testing 2014-07-22 10:47:33 -07:00
RdbTree.h fix annoying rdbtree pos/neg key counting issue 2014-01-11 18:04:28 -08:00
README.md documentation updates 2014-09-12 03:09:06 -07:00
readRec.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Rebalance.cpp tuning the rebalance loop 2014-03-15 14:56:11 -07:00
Rebalance.h tight merge during rebalance to save 2014-03-14 23:37:30 -07:00
reindex2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Repair.cpp try to start indexing spider replies 2014-05-09 11:18:24 -07:00
Repair.h Initial file population. 2013-08-02 13:12:24 -07:00
RequestTable.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RequestTable.h Initial file population. 2013-08-02 13:12:24 -07:00
rescue.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Revdb.cpp use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
Revdb.h Initial file population. 2013-08-02 13:12:24 -07:00
rmbots.cpp Initial file population. 2013-08-02 13:12:24 -07:00
S99gb added S99gb for loading at boot. 2014-06-23 07:32:38 -06:00
SafeBuf.cpp fixes for cloud support. 2014-08-31 16:23:11 -07:00
SafeBuf.h fixes to make easier to compile on max os x. 2014-08-28 12:55:02 -07:00
SafeList.h Initial file population. 2013-08-02 13:12:24 -07:00
Sanity.h Initial file population. 2013-08-02 13:12:24 -07:00
Scores.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Scores.h Initial file population. 2013-08-02 13:12:24 -07:00
Scraper.cpp take out datedb. no longer used. we store 2014-01-09 13:39:28 -08:00
Scraper.h misc/various bug fixes. 2014-08-28 18:07:22 -07:00
SearchInput.cpp added sorting by site # inlinks/pop to menu for testing 2014-10-01 12:09:43 -07:00
SearchInput.h fix qa test so we can roll out proxy code. 2014-09-30 15:40:02 -07:00
Sections.cpp qa fixes 2014-08-02 09:07:33 -07:00
Sections.h do not hash redundant xpaths that have the same inner sentence/alnum 2014-07-09 17:16:01 -07:00
seektest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
seo.h Initial file population. 2013-08-02 13:12:24 -07:00
SiteGetter.cpp nomenclature changes to reduce collissions. 2014-03-31 15:02:17 -07:00
SiteGetter.h use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
sleepandlog.cpp Initial file population. 2013-08-02 13:12:24 -07:00
sort.cpp Initial file population. 2013-08-02 13:12:24 -07:00
sort.h Initial file population. 2013-08-02 13:12:24 -07:00
Speller.cpp work on make install. 2014-05-11 12:48:56 -07:00
Speller.h Initial file population. 2013-08-02 13:12:24 -07:00
Spider.cpp support the CONNECT for gb squid proxy 2014-10-02 12:36:43 -07:00
Spider.h raised MAX_SPIDERS from 100 to 300. watch out for oom though. 2014-09-03 07:26:17 -07:00
SpiderProxy.cpp fix more bugs in squid proxy implementation. 2014-10-02 11:54:50 -07:00
SpiderProxy.h added parm to reset proxy stats in table. erases 2014-09-30 17:38:59 -07:00
Stats.cpp comment out unused code. make thread cleanups 2014-09-03 09:48:43 -07:00
Stats.h comment out unused code. make thread cleanups 2014-09-03 09:48:43 -07:00
Statsdb.cpp fix graph window some 2014-09-17 06:23:49 -07:00
Statsdb.h fix potential problem of tons of points in 2013-10-14 22:52:29 -07:00
StopWords.cpp update common word list 2013-12-01 15:19:33 -07:00
StopWords.h Initial file population. 2013-08-02 13:12:24 -07:00
streambuf.h Initial file population. 2013-08-02 13:12:24 -07:00
Strings.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Strings.h Initial file population. 2013-08-02 13:12:24 -07:00
Summary.cpp fix empty summary related core. 2014-09-28 14:31:56 -07:00
Summary.h get summary "ns" parm and collectionrec 2014-07-03 07:29:44 -07:00
superMergeTest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
supported_charsets.cpp Initial file population. 2013-08-02 13:12:24 -07:00
supported_charsets.txt Initial file population. 2013-08-02 13:12:24 -07:00
Syncdb.cpp Merge branch 'diffbot-testing' into diffbot-matt 2014-06-09 12:42:54 -07:00
Syncdb.h Initial file population. 2013-08-02 13:12:24 -07:00
Synonyms.cpp syn fix for 'sports' when lang is unknown. we default 2014-09-06 10:49:22 -07:00
Synonyms.h fix stack smash core. 2014-06-01 10:42:49 -07:00
Tagdb.cpp when docid is banned do not print json/xml 2014-09-19 07:27:33 -07:00
Tagdb.h use collnum instead of coll string. 2014-03-06 15:48:11 -08:00
TcpServer.cpp fixed ssl CONNECT reply from being truncated. 2014-10-02 14:47:55 -07:00
TcpServer.h finally got http tunnel logic working. 2014-07-01 16:28:15 -06:00
TcpSocket.h add support for tunnelling https fetch 2014-07-01 10:43:52 -06:00
test2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_convert.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_hash.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_norm.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_parser2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_parser.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_unicode.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Test.cpp fix annoying bug when adding new parms. 2014-06-10 12:29:50 -07:00
Test.h Initial file population. 2013-08-02 13:12:24 -07:00
testfloats.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Tfndb.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Tfndb.h code cleanups. 2014-01-18 21:19:26 -08:00
Thesaurus.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Thesaurus.h Initial file population. 2013-08-02 13:12:24 -07:00
Threads.cpp disable threads for disk and intersect/merge by default 2014-10-02 15:24:40 -07:00
Threads.h comment out unused code. make thread cleanups 2014-09-03 09:48:43 -07:00
threadtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
thunder.cpp Initial file population. 2013-08-02 13:12:24 -07:00
tifftopnm Initial file population. 2013-08-02 13:12:24 -07:00
Timedb.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Timedb.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Timer.h Initial file population. 2013-08-02 13:12:24 -07:00
Title.cpp qa test fixes 2014-07-15 10:06:33 -07:00
Title.h qa test fixes 2014-07-15 10:06:33 -07:00
Titledb.cpp thread fixes. if pthread_create fails then 2014-03-15 20:07:02 -07:00
Titledb.h code cleanups. 2014-01-18 21:19:26 -08:00
TopTree.cpp index numbers as integers too, not just floats 2014-02-06 20:57:54 -08:00
TopTree.h more fixes for new boolean logic. 2014-03-13 13:09:33 -07:00
treetest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
TuringTest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
TuringTest.h Initial file population. 2013-08-02 13:12:24 -07:00
Turkdb.cpp Initial file population. 2013-08-02 13:12:24 -07:00
types.h fixes to make easier to compile on max os x. 2014-08-28 12:55:02 -07:00
UCNormalizer.cpp code cleanups. 2014-01-18 21:19:26 -08:00
UCNormalizer.h Initial file population. 2013-08-02 13:12:24 -07:00
UCPropTable.cpp Initial file population. 2013-08-02 13:12:24 -07:00
UCPropTable.h Initial file population. 2013-08-02 13:12:24 -07:00
UCWordIterator.cpp Initial file population. 2013-08-02 13:12:24 -07:00
UCWordIterator.h Initial file population. 2013-08-02 13:12:24 -07:00
UdpProtocol.h Initial file population. 2013-08-02 13:12:24 -07:00
UdpServer.cpp fix core from too many facet strs 2014-09-21 09:26:13 -07:00
UdpServer.h rebalancer working pretty well now 2014-01-15 19:08:47 -08:00
UdpSlot.cpp fixes to make easier to compile on max os x. 2014-08-28 12:55:02 -07:00
UdpSlot.h hacked up to debug why we're not getting 2014-08-27 10:37:03 -07:00
udptest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Unicode.cpp when user searches for a word without the 2014-06-01 09:37:00 -07:00
Unicode.h new import code copiling. now needs runtime testing and 2014-09-20 20:12:28 -07:00
UnicodeProperties.cpp code cleanups. 2014-01-18 21:19:26 -08:00
UnicodeProperties.h Initial file population. 2013-08-02 13:12:24 -07:00
unifiedDict.txt Initial file population. 2013-08-02 13:12:24 -07:00
uniq2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Url.cpp fix nyt.com cookie redir bug. 2014-08-05 17:04:11 -07:00
Url.h Initial file population. 2013-08-02 13:12:24 -07:00
urlinfo.cpp do not add crazy urls into spiderdb 2014-09-20 08:26:22 -06:00
Users.cpp get collection/root login system working 2014-09-29 19:56:31 -07:00
Users.h Initial file population. 2013-08-02 13:12:24 -07:00
ValidPointer.cpp Initial file population. 2013-08-02 13:12:24 -07:00
ValidPointer.h Initial file population. 2013-08-02 13:12:24 -07:00
Vector.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Vector.h Initial file population. 2013-08-02 13:12:24 -07:00
Version.cpp force gb to recompile version every time 2014-09-19 12:23:40 -07:00
Version.h makefile updates 2014-09-19 13:51:08 -06:00
Weights.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Weights.h Initial file population. 2013-08-02 13:12:24 -07:00
Wiki.cpp log msg cleanups 2014-05-11 21:55:44 -07:00
Wiki.h Initial file population. 2013-08-02 13:12:24 -07:00
wikititles.txt.part1 Initial file population. 2013-08-02 13:12:24 -07:00
wikititles.txt.part2 Initial file population. 2013-08-02 13:12:24 -07:00
wiktionary-buf.txt when user searches for a word without the 2014-06-01 09:37:00 -07:00
wiktionary-lang.txt when user searches for a word without the 2014-06-01 09:37:00 -07:00
wiktionary-syns.dat when user searches for a word without the 2014-06-01 09:37:00 -07:00
Wiktionary.cpp fix compiler bug 2014-06-16 11:10:38 -07:00
Wiktionary.h Initial file population. 2013-08-02 13:12:24 -07:00
Words.cpp fixed bugs in sort by prices, etc. 2013-11-11 18:58:45 -08:00
Words.h Initial file population. 2013-08-02 13:12:24 -07:00
Xml.cpp fix support for indexing xml docs. 2014-09-28 10:43:41 -07:00
Xml.h index xml docs properly like we do json 2014-09-28 09:20:16 -07:00
XmlDoc.cpp fixed ssl CONNECT reply from being truncated. 2014-10-02 14:47:55 -07:00
XmlDoc.h fix gbfacetstr: operator for xml docs 2014-09-28 12:09:04 -07:00
XmlNode.cpp Merge branch 'diffbot-testing' into testing 2014-07-14 18:10:13 -07:00
XmlNode.h added Xml::getCompoundName() 2014-09-28 08:39:46 -07:00
zconf.h Initial file population. 2013-08-02 13:12:24 -07:00
zlib.h Initial file population. 2013-08-02 13:12:24 -07:00

open-source-search-engine

An open source web and enterprise search engine, as seen on http://www.gigablast.com/.

RUNNING GIGABLAST

See html/faq.html for all administrative documentation, including the quick start instructions.

Alternatively, visit http://www.gigablast.com/admin.html

See html/compare.html for a comparison of Gigablast to SOLR. The comparison is very sparse right now, but it does include some useful commands.

CODE ARCHITECTURE

See html/developer.html for all code documentation.

Alternatively, visit http://www.gigablast.com/developer.html

CONTACT

Contact me with feature requests or for help in general. I will work for free on good use cases. Email: mattdwells@hotmail.com