Nov 20 2017 -- A distributed open source search engine and spider/crawler written in C/C++ for Linux on Intel/AMD. It comes from gigablast dot com, which offers binaries for download. See the README.md file at the very bottom of this page for instructions.
antiword-dir Initial file population. 2013-08-02 13:12:24 -07:00
coll.main.0 fix another core from deleting a coll 2014-01-30 19:56:43 -08:00
html image updates 2014-01-30 13:11:26 -08:00
openssl we already include our own 32-bit 2013-09-15 18:25:49 -06:00
ucdata Initial file population. 2013-08-02 13:12:24 -07:00
Abbreviations.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Abbreviations.h change a couple of possible reserved names in C++ 2013-08-28 22:59:01 -06:00
Accessdb.cpp fix addColl() logic for collectionless rdbs 2013-10-16 14:38:09 -07:00
Accessdb.h Initial file population. 2013-08-02 13:12:24 -07:00
Address.cpp a bunch of bug fixes, mostly spider related. 2013-12-07 21:56:37 -07:00
Address.h change a couple of possible reserved names in C++ 2013-08-28 22:59:01 -06:00
addtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Ads.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Ads.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
AdultBit.cpp Initial file population. 2013-08-02 13:12:24 -07:00
AdultBit.h Initial file population. 2013-08-02 13:12:24 -07:00
animate.cpp Initial file population. 2013-08-02 13:12:24 -07:00
antiword Initial file population. 2013-08-02 13:12:24 -07:00
AutoBan.cpp formatting changes 2014-01-19 00:38:02 -08:00
AutoBan.h Initial file population. 2013-08-02 13:12:24 -07:00
badcattable.dat Initial file population. 2013-08-02 13:12:24 -07:00
BigFile.cpp fix a core 2014-01-22 22:26:50 -08:00
BigFile.h forgot to push the .h files 2013-12-07 22:12:48 -07:00
Bits.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Bits.h Initial file population. 2013-08-02 13:12:24 -07:00
blaster.cpp fixed bugs with advanced.html advanced search page. 2013-11-17 14:58:47 -07:00
Blaster.cpp for json docs only give them a single 2014-01-25 08:17:38 -08:00
Blaster.h use ./gb blaster -u <fileofurls> to just inject urls, 2013-08-19 16:33:27 -06:00
bmptopnm Initial file population. 2013-08-02 13:12:24 -07:00
Cachedb.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Cachedb.h Initial file population. 2013-08-02 13:12:24 -07:00
camsort.cpp Initial file population. 2013-08-02 13:12:24 -07:00
catcountry.dat Initial file population. 2013-08-02 13:12:24 -07:00
Catdb.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Catdb.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Categories.cpp documentation updates. fixed sd=0. 2013-10-13 14:24:41 -07:00
Categories.h documentation updates. fixed sd=0. 2013-10-13 14:24:41 -07:00
CatRec.cpp fix a couple catdb generation bugs. 2013-10-12 20:33:04 -07:00
CatRec.h Initial file population. 2013-08-02 13:12:24 -07:00
character-sets Initial file population. 2013-08-02 13:12:24 -07:00
check_unicode.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Clusterdb.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Clusterdb.h Initial file population. 2013-08-02 13:12:24 -07:00
Collectiondb.cpp fix a couple cores related to deleting collections 2014-01-29 15:56:07 -08:00
Collectiondb.h had to add per round page and process counts 2014-01-23 13:23:09 -08:00
Conf.cpp parm simplifcations 2014-01-09 19:00:21 -08:00
Conf.h added diffbot retry rules. 2014-01-22 19:57:38 -08:00
convert.cpp Initial file population. 2013-08-02 13:12:24 -07:00
CountryCode.cpp fix pagecrawlbot.cpp to support &c=token-name. 2014-01-22 23:40:38 -08:00
CountryCode.h fix pagecrawlbot.cpp to support &c=token-name. 2014-01-22 23:40:38 -08:00
create_ucd_tables.cpp Initial file population. 2013-08-02 13:12:24 -07:00
DailyMerge.cpp Fixed some bugs. 2013-08-09 08:52:15 -07:00
DailyMerge.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
DataFeed.cpp Initial file population. 2013-08-02 13:12:24 -07:00
DataFeed.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Datedb.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Datedb.h move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Dates.cpp get new global preemptive cache 2014-01-05 11:51:09 -08:00
Dates.h Initial file population. 2013-08-02 13:12:24 -07:00
Diff.cpp Fixed some bugs. 2013-08-09 08:52:15 -07:00
Diff.h Initial file population. 2013-08-02 13:12:24 -07:00
Dir.cpp more parmdb fixes 2013-12-16 15:39:24 -08:00
Dir.h fix file descriptor leak in Dir class. 2013-11-19 13:41:56 -08:00
DiskPageCache.cpp disk page cache back on 2014-01-21 19:03:47 -08:00
DiskPageCache.h tons of changes from live github on neo. 2014-01-17 21:01:43 -08:00
dlstubs.c Initial file population. 2013-08-02 13:12:24 -07:00
dmozparse.cpp add support for noindex meta tag. 2013-10-12 22:50:23 -07:00
Dns.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Dns.h Initial file population. 2013-08-02 13:12:24 -07:00
DnsProtocol.h Initial file population. 2013-08-02 13:12:24 -07:00
dnstest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Domains.cpp move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Domains.h Initial file population. 2013-08-02 13:12:24 -07:00
dumpcore.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Entities.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Entities.h Initial file population. 2013-08-02 13:12:24 -07:00
Errno.cpp added and fixed support for <link ahref=xxx rel=canonical>. 2014-01-30 10:37:59 -08:00
Errno.h added ability to treat <link xyz.com rel=canoical> as meta redirects. 2014-01-30 10:04:09 -08:00
errnotest.cpp errno test update 2013-11-19 00:10:10 -07:00
Events.h Initial file population. 2013-08-02 13:12:24 -07:00
Facebook.cpp fix addColl() logic for collectionless rdbs 2013-10-16 14:38:09 -07:00
Facebook.h new graphic icons. minor clean ups. 2013-11-15 14:47:05 -07:00
fastIndexTable.cpp Initial file population. 2013-08-02 13:12:24 -07:00
fctypes.cpp for json docs only give them a single 2014-01-25 08:17:38 -08:00
fctypes.h got code with shard rebalancing compiling. 2014-01-11 16:08:42 -08:00
File.cpp log cleanups mostly. 2013-12-18 10:57:18 -08:00
File.h tons of changes from live github on neo. 2014-01-17 21:01:43 -08:00
filterquerylogs.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Flags.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Flags.h Initial file population. 2013-08-02 13:12:24 -07:00
gb-include.h Initial file population. 2013-08-02 13:12:24 -07:00
gb.conf take out confusing function no longer used 2014-01-28 11:10:59 -08:00
gb.pem so we have spider https sites add 2013-10-13 00:15:39 -07:00
gbfilter.cpp Initial file population. 2013-08-02 13:12:24 -07:00
gbtitletest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
geneaology.cpp Initial file population. 2013-08-02 13:12:24 -07:00
generateSuperMergeCode.cpp Initial file population. 2013-08-02 13:12:24 -07:00
geo_ip_table.cpp Initial file population. 2013-08-02 13:12:24 -07:00
geo_ip_table.h Initial file population. 2013-08-02 13:12:24 -07:00
GeoIP_internal.h Initial file population. 2013-08-02 13:12:24 -07:00
GeoIP.c Initial file population. 2013-08-02 13:12:24 -07:00
GeoIP.h Initial file population. 2013-08-02 13:12:24 -07:00
GeoIPCity.c Initial file population. 2013-08-02 13:12:24 -07:00
GeoIPCity.h Initial file population. 2013-08-02 13:12:24 -07:00
getsample.cpp Initial file population. 2013-08-02 13:12:24 -07:00
giftopnm Initial file population. 2013-08-02 13:12:24 -07:00
hash.cpp Initial file population. 2013-08-02 13:12:24 -07:00
hash.h get "&site=abc.com+xyz.com"... working to restrict 2013-09-15 20:16:48 -07:00
HashTable.cpp fix core from last push. 2013-12-09 14:21:46 -07:00
HashTable.h mem labelling fixes. 2013-12-09 14:05:02 -07:00
HashTableT.cpp Initial file population. 2013-08-02 13:12:24 -07:00
HashTableT.h Initial file population. 2013-08-02 13:12:24 -07:00
HashTableX.cpp quite a few fixes to the quota system, cleanups etc. 2014-01-18 16:23:13 -08:00
HashTableX.h get new global preemptive cache 2014-01-05 11:51:09 -08:00
hashtest2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
hashtest3.cpp Initial file population. 2013-08-02 13:12:24 -07:00
hashtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Highlight.cpp Merge branch 'master' into diffbot 2013-12-07 11:34:26 -07:00
Highlight.h trying to fix json decoding bug. 2013-10-24 17:55:01 -07:00
Hostdb.cpp fix another core from freening wrong byte sized 2014-01-30 20:16:41 -08:00
Hostdb.h fix another core from freening wrong byte sized 2014-01-30 20:16:41 -08:00
hosts.conf fix hosts.conf 2013-12-26 09:34:35 -08:00
hosts.cpp Initial file population. 2013-08-02 13:12:24 -07:00
HttpMime.cpp mem labelling fixes. 2013-12-09 14:05:02 -07:00
HttpMime.h update the dirty word list. but we still 2013-10-15 01:01:19 -07:00
HttpRequest.cpp fix a few cores. assume any ip that matches 2014-01-10 18:34:47 -07:00
HttpRequest.h show all crawl details in url webhook 2013-11-07 13:59:43 -08:00
HttpServer.cpp added url download support 2014-01-20 23:17:04 -08:00
HttpServer.h use &format=0 1 or 2 for html/xml/json now. 2013-11-08 18:00:30 -08:00
iana_charset.cpp Merge branch 'diffbot' into diffbot-testing 2013-12-16 11:06:11 -08:00
iana_charset.h Merge branch 'diffbot' into diffbot-testing 2013-12-16 11:06:11 -08:00
iconv.h Initial file population. 2013-08-02 13:12:24 -07:00
Images.cpp got code with shard rebalancing compiling. 2014-01-11 16:08:42 -08:00
Images.h Initial file population. 2013-08-02 13:12:24 -07:00
Indexdb.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Indexdb.h move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
IndexList.cpp Initial file population. 2013-08-02 13:12:24 -07:00
IndexList.h Initial file population. 2013-08-02 13:12:24 -07:00
IndexReadInfo.cpp Initial file population. 2013-08-02 13:12:24 -07:00
IndexReadInfo.h Initial file population. 2013-08-02 13:12:24 -07:00
IndexTable2.cpp move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
IndexTable2.h Initial file population. 2013-08-02 13:12:24 -07:00
IndexTable.cpp Initial file population. 2013-08-02 13:12:24 -07:00
IndexTable.h Initial file population. 2013-08-02 13:12:24 -07:00
injectme3 added injectme3 file and documentation into compare.html 2013-08-17 11:02:26 -06:00
injector.cpp Initial file population. 2013-08-02 13:12:24 -07:00
iostream.h Initial file population. 2013-08-02 13:12:24 -07:00
ip.cpp fix old bug. 2014-01-10 18:52:47 -07:00
ip.h Initial file population. 2013-08-02 13:12:24 -07:00
ipconfig.cpp fixed some cores. brought in fixes from 2013-09-08 16:16:13 -06:00
Iso8859.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Iso8859.h Initial file population. 2013-08-02 13:12:24 -07:00
jointest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
jpegtopnm Initial file population. 2013-08-02 13:12:24 -07:00
Json.cpp tons of changes from live github on neo. 2014-01-17 21:01:43 -08:00
Json.h fix json double decoding issue. no more 2013-11-22 14:16:14 -08:00
keepalive.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Lang.cpp comment updates 2013-10-15 23:13:50 -07:00
Lang.h Initial file population. 2013-08-02 13:12:24 -07:00
LangList.cpp code cleanups. 2014-01-18 21:19:26 -08:00
LangList.h Initial file population. 2013-08-02 13:12:24 -07:00
Language.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Language.h Initial file population. 2013-08-02 13:12:24 -07:00
LanguageIdentifier.cpp Initial file population. 2013-08-02 13:12:24 -07:00
LanguageIdentifier.h Initial file population. 2013-08-02 13:12:24 -07:00
LanguagePages.cpp Initial file population. 2013-08-02 13:12:24 -07:00
LanguagePages.h Initial file population. 2013-08-02 13:12:24 -07:00
libc.a Initial file population. 2013-08-02 13:12:24 -07:00
libcrypto.a Initial file population. 2013-08-02 13:12:24 -07:00
libgcc.a Initial file population. 2013-08-02 13:12:24 -07:00
libiconv.a Initial file population. 2013-08-02 13:12:24 -07:00
libiconv.la Initial file population. 2013-08-02 13:12:24 -07:00
libm.a Initial file population. 2013-08-02 13:12:24 -07:00
libpthread.a Initial file population. 2013-08-02 13:12:24 -07:00
libssl.a Initial file population. 2013-08-02 13:12:24 -07:00
libstdc++.a Initial file population. 2013-08-02 13:12:24 -07:00
libz.a Initial file population. 2013-08-02 13:12:24 -07:00
LICENSE Update LICENSE 2013-12-01 13:43:14 -08:00
Linkdb.cpp for json docs only give them a single 2014-01-25 08:17:38 -08:00
Linkdb.h fix LinkInfo mem leaks 2013-11-16 17:50:32 -08:00
LinkedList.h Initial file population. 2013-08-02 13:12:24 -07:00
linkspam.cpp renamed matches.h and matches.cpp to 2013-10-01 07:58:24 -07:00
linkspam.h Initial file population. 2013-08-02 13:12:24 -07:00
Log.cpp forgot to unlock thread lock 2013-12-15 10:43:34 -07:00
Log.h Initial file population. 2013-08-02 13:12:24 -07:00
Loop.cpp tons of changes from live github on neo. 2014-01-17 21:01:43 -08:00
Loop.h Initial file population. 2013-08-02 13:12:24 -07:00
looptest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
main.cpp test fix for keep alive infinite loop bug. 2014-01-30 14:16:16 -08:00
Make.depend change deduping logic to be first come first 2014-01-29 16:14:42 -08:00
Makefile got code with shard rebalancing compiling. 2014-01-11 16:08:42 -08:00
malloc.c Initial file population. 2013-08-02 13:12:24 -07:00
matches2.cpp dirty word detector revisions. we need 2013-10-16 20:19:49 -07:00
matches2.h renamed matches.h and matches.cpp to 2013-10-01 07:58:24 -07:00
Matches.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Matches.h Initial file population. 2013-08-02 13:12:24 -07:00
Mem.cpp always use kstart. 2014-01-28 14:37:21 -08:00
Mem.h fix typo 2013-09-08 19:51:57 -07:00
membustest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MemPool.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MemPool.h Initial file population. 2013-08-02 13:12:24 -07:00
MemPoolTree.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MemPoolTree.h Initial file population. 2013-08-02 13:12:24 -07:00
memtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
mergetest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MetaContainer.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MetaContainer.h Initial file population. 2013-08-02 13:12:24 -07:00
Mime.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Mime.h Initial file population. 2013-08-02 13:12:24 -07:00
mixfile.cpp Initial file population. 2013-08-02 13:12:24 -07:00
mmseg.h Initial file population. 2013-08-02 13:12:24 -07:00
monitor.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Monitordb.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Monitordb.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg0.cpp got code with shard rebalancing compiling. 2014-01-11 16:08:42 -08:00
Msg0.h move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg1.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Msg1.h log fixes for debugging. try to 2013-10-02 22:37:20 -06:00
Msg1f.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg1f.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg2.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg2.h new &sites=xyz.com+abc.com+... functionality compiles ok. 2013-09-15 18:14:32 -06:00
Msg2a.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Msg2a.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg2b.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg2b.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg3.cpp tons of changes from live github on neo. 2014-01-17 21:01:43 -08:00
Msg3.h almost done adding support for whitelists. 2013-09-15 15:15:56 -06:00
Msg3a.cpp got code with shard rebalancing compiling. 2014-01-11 16:08:42 -08:00
Msg3a.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg3e.cpp fix infinite loop from json parsing and 2013-09-27 17:52:36 -06:00
Msg3e.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg4.cpp fixed bug of not saving waiting trees! 2014-01-23 01:02:11 -08:00
Msg4.h got code with shard rebalancing compiling. 2014-01-11 16:08:42 -08:00
Msg5.cpp change deduping logic to be first come first 2014-01-29 16:14:42 -08:00
Msg5.h fix a couple cores related to deleting collections 2014-01-29 15:56:07 -08:00
Msg6b.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg6b.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg8b.cpp move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Msg8b.h Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Msg9b.cpp fix a couple catdb generation bugs. 2013-10-12 20:33:04 -07:00
Msg9b.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg13.cpp take out potentially bad robots.txt 2014-01-28 18:26:16 -08:00
Msg13.h measure crawl delay by default from 2013-11-26 14:07:28 -08:00
Msg17.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg17.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg20.cpp added ability to treat <link xyz.com rel=canoical> as meta redirects. 2014-01-30 10:04:09 -08:00
Msg20.h get new global preemptive cache 2014-01-05 11:51:09 -08:00
Msg22.cpp fix bugs to try to get sharding working 2014-01-21 13:58:21 -08:00
Msg22.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg24.cpp new Make.depend. 2013-08-09 17:13:45 -06:00
Msg28.cpp fix core from (broad)casting valueless cgi field. 2013-10-03 14:51:59 -07:00
Msg28.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg30.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg30.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Msg35.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg35.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg36.cpp got code with shard rebalancing compiling. 2014-01-11 16:08:42 -08:00
Msg36.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg37.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg37.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg39.cpp Merge branch 'master' into diffbot 2014-01-17 20:17:40 -08:00
Msg39.h new &sites=xyz.com+abc.com+... functionality compiles ok. 2013-09-15 18:14:32 -06:00
Msg40.cpp fixed form input. fixed page parser submission. 2014-01-29 14:10:08 -08:00
Msg40.h added ifdef NEEDSLICENSE 2013-12-01 14:47:08 -07:00
Msg40Cache.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg40Cache.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg42.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg42.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg51.cpp move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Msg51.h Initial file population. 2013-08-02 13:12:24 -07:00
Msgaa.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msgaa.h Initial file population. 2013-08-02 13:12:24 -07:00
MsgC.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
MsgC.h Initial file population. 2013-08-02 13:12:24 -07:00
Msge0.cpp fix a core 2014-01-22 22:26:50 -08:00
Msge0.h Initial file population. 2013-08-02 13:12:24 -07:00
Msge1.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msge1.h Initial file population. 2013-08-02 13:12:24 -07:00
Multicast.cpp a lot of times rdb tree has invalid collection 2014-01-21 19:01:44 -08:00
Multicast.h move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
mysynonyms.txt Initial file population. 2013-08-02 13:12:24 -07:00
numwords.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageAddColl.cpp more spiderdb spider request fixes 2014-01-19 18:00:56 -08:00
PageAddUrl.cpp more bug fixes associated with collections 2014-01-18 11:54:58 -08:00
PageCatdb.cpp more formatting 2014-01-19 11:56:36 -08:00
PageCrawlBot.cpp always use kstart. 2014-01-28 14:37:21 -08:00
PageCrawlBot.h added "seeds" to json reply. store seed urls 2013-10-21 17:35:14 -07:00
PageDirectory.cpp move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
PageEvents.cpp formatting changes 2014-01-19 00:38:02 -08:00
PageGet.cpp for json docs only give them a single 2014-01-25 08:17:38 -08:00
PageHosts.cpp a lot of times rdb tree has invalid collection 2014-01-21 19:01:44 -08:00
PageIndexdb.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
PageInject.cpp formatting 2014-01-19 15:06:02 -08:00
PageInject.h code cleanups. 2014-01-18 21:19:26 -08:00
PageLogin.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageLogView.cpp formatting fixes 2014-01-19 00:57:20 -08:00
PageNetTest.cpp move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
PageNetTest.h Initial file population. 2013-08-02 13:12:24 -07:00
PageOverview.cpp list collections in sidebar. 2014-01-09 21:13:41 -08:00
PageParser.cpp more formatting 2014-01-19 11:56:36 -08:00
PageParser.h Initial file population. 2013-08-02 13:12:24 -07:00
PagePerf.cpp took out pagecount table. just hafta scan 2014-01-19 20:34:38 -08:00
PageReindex.cpp more formatting 2014-01-19 11:56:36 -08:00
PageReindex.h fixed pagereindex. we now add spiderreplies 2013-12-07 10:01:17 -07:00
PageResults.cpp always use kstart. 2014-01-28 14:37:21 -08:00
PageResults.h added searchbox for dmoz pages/sites. 2013-10-13 15:45:12 -07:00
PageRoot.cpp more collection fixes 2014-01-18 12:09:33 -08:00
Pages.cpp fixed form input. fixed page parser submission. 2014-01-29 14:10:08 -08:00
Pages.h more spiderdb spider request fixes 2014-01-19 18:00:56 -08:00
PageSockets.cpp fix msge0 msg0 overload in sockets table 2014-01-22 20:34:55 -08:00
PageSpam.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageStats.cpp fix infinite keep alive restart bug some more 2014-01-30 14:12:32 -08:00
PageStatsdb.cpp formatting 2014-01-19 12:37:37 -08:00
PageSubmit.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageThesaurus.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageThreads.cpp formatting fixes 2014-01-19 00:57:20 -08:00
PageTitledb.cpp move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
PageTurk.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageTurk.h Initial file population. 2013-08-02 13:12:24 -07:00
Parms.cpp added ability to treat <link xyz.com rel=canoical> as meta redirects. 2014-01-30 10:04:09 -08:00
Parms.h fix up round incrementing logic. 2014-01-25 14:35:41 -08:00
parse_iana_charsets.pl move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
pdftohtml use the "onsite" keyword in your url filters 2013-09-06 09:37:17 -06:00
Phrases.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Phrases.h Initial file population. 2013-08-02 13:12:24 -07:00
PingServer.cpp fix repeat rounds sticking bug 2014-01-17 17:17:10 -08:00
PingServer.h added emergency msg box on all admin pages 2014-01-11 20:14:44 -08:00
Placedb.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Placedb.h Initial file population. 2013-08-02 13:12:24 -07:00
pngtopnm Initial file population. 2013-08-02 13:12:24 -07:00
pnmscale Initial file population. 2013-08-02 13:12:24 -07:00
Pops.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Pops.h Initial file population. 2013-08-02 13:12:24 -07:00
porter.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Pos.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Pos.h Initial file population. 2013-08-02 13:12:24 -07:00
Posdb.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Posdb.h got code with shard rebalancing compiling. 2014-01-11 16:08:42 -08:00
postalCodes.txt Initial file population. 2013-08-02 13:12:24 -07:00
PostQueryRerank.cpp handle a bunch of oom conditions that 2013-11-20 10:14:02 -07:00
PostQueryRerank.h Initial file population. 2013-08-02 13:12:24 -07:00
ppmtojpeg Initial file population. 2013-08-02 13:12:24 -07:00
Process.cpp fixed bug of waiting trees not saving. 2014-01-23 01:04:24 -08:00
Process.h parmdb overhaul. support collection add/del 2013-12-10 13:09:55 -08:00
Profiler.cpp formatting fixes 2014-01-19 00:57:20 -08:00
Profiler.h Initial file population. 2013-08-02 13:12:24 -07:00
Proxy.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Proxy.h Initial file population. 2013-08-02 13:12:24 -07:00
pstotext Initial file population. 2013-08-02 13:12:24 -07:00
QAClient.cpp Initial file population. 2013-08-02 13:12:24 -07:00
QAClient.h Initial file population. 2013-08-02 13:12:24 -07:00
quarantine.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Query.cpp quite a few fixes to the quota system, cleanups etc. 2014-01-18 16:23:13 -08:00
Query.h test and get gbparenturl: query working. 2014-01-18 09:28:58 -08:00
Rdb.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Rdb.h code cleanups. 2014-01-18 21:19:26 -08:00
RdbBase.cpp more shard rebalancer fixes 2014-01-22 00:44:33 -08:00
RdbBase.h code cleanups. 2014-01-18 21:19:26 -08:00
RdbBuckets.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RdbBuckets.h Initial file population. 2013-08-02 13:12:24 -07:00
RdbCache.cpp code cleanups. 2014-01-18 21:19:26 -08:00
RdbCache.h removed MAX_COLL_RECS so we can have unlimited 2013-08-30 16:20:38 -07:00
RdbDump.cpp tons of changes from live github on neo. 2014-01-17 21:01:43 -08:00
RdbDump.h if coll is deleted or reset in a middle of a dump 2013-12-25 17:12:09 -08:00
RdbList.cpp try to fix rebalancing some more. 2014-01-21 22:39:01 -08:00
RdbList.h Initial file population. 2013-08-02 13:12:24 -07:00
RdbMap.cpp tons of changes from live github on neo. 2014-01-17 21:01:43 -08:00
RdbMap.h tons of changes from live github on neo. 2014-01-17 21:01:43 -08:00
RdbMem.cpp track down some nasty cores. fix 2013-10-29 16:37:14 -07:00
RdbMem.h now we can reset collection mid stream 2013-10-18 17:49:36 -07:00
RdbMerge.cpp fix problem scanning spiderdb. 2014-01-16 17:04:08 -08:00
RdbMerge.h if coll is deleted or reset in a middle of a dump 2013-12-25 17:12:09 -08:00
RdbScan.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RdbScan.h Initial file population. 2013-08-02 13:12:24 -07:00
rdbtest2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
rdbtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RdbTree.cpp free spidercolls on exit 2014-01-22 23:52:23 -08:00
RdbTree.h fix annoying rdbtree pos/neg key counting issue 2014-01-11 18:04:28 -08:00
README.md Update README.md 2013-11-16 20:14:06 -08:00
readRec.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Rebalance.cpp update "this round" counts to at least 2014-01-23 18:22:13 -08:00
Rebalance.h fix problem scanning spiderdb. 2014-01-16 17:04:08 -08:00
reindex2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Repair.cpp formatting fixes 2014-01-19 00:57:20 -08:00
Repair.h Initial file population. 2013-08-02 13:12:24 -07:00
RequestTable.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RequestTable.h Initial file population. 2013-08-02 13:12:24 -07:00
rescue.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Revdb.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Revdb.h Initial file population. 2013-08-02 13:12:24 -07:00
rmbots.cpp Initial file population. 2013-08-02 13:12:24 -07:00
SafeBuf.cpp take out confusing function no longer used 2014-01-28 11:10:59 -08:00
SafeBuf.h take out confusing function no longer used 2014-01-28 11:10:59 -08:00
SafeList.h Initial file population. 2013-08-02 13:12:24 -07:00
Sanity.h Initial file population. 2013-08-02 13:12:24 -07:00
Scores.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Scores.h Initial file population. 2013-08-02 13:12:24 -07:00
Scraper.cpp take out datedb. no longer used. we store 2014-01-09 13:39:28 -08:00
Scraper.h Initial file population. 2013-08-02 13:12:24 -07:00
SearchInput.cpp move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
SearchInput.h quite a few fixes to the quota system, cleanups etc. 2014-01-18 16:23:13 -08:00
Sections.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Sections.h get new global preemptive cache 2014-01-05 11:51:09 -08:00
seektest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
seo.h Initial file population. 2013-08-02 13:12:24 -07:00
SiteGetter.cpp got code with shard rebalancing compiling. 2014-01-11 16:08:42 -08:00
SiteGetter.h Initial file population. 2013-08-02 13:12:24 -07:00
sleepandlog.cpp Initial file population. 2013-08-02 13:12:24 -07:00
sort.cpp Initial file population. 2013-08-02 13:12:24 -07:00
sort.h Initial file population. 2013-08-02 13:12:24 -07:00
Speller.cpp clean up logging so i can see what's going on 2013-12-10 16:41:30 -08:00
Speller.h Initial file population. 2013-08-02 13:12:24 -07:00
Spider.cpp fix another core from freening wrong byte sized 2014-01-30 20:16:41 -08:00
Spider.h make crawl sync bug fixes. 2014-01-25 13:47:03 -08:00
Stats.cpp formatting 2014-01-19 12:37:37 -08:00
Stats.h more formatting 2014-01-19 01:09:38 -08:00
Statsdb.cpp formatting 2014-01-19 12:37:37 -08:00
Statsdb.h fix potential problem of tons of points in 2013-10-14 22:52:29 -07:00
StopWords.cpp update common word list 2013-12-01 15:19:33 -07:00
StopWords.h Initial file population. 2013-08-02 13:12:24 -07:00
streambuf.h Initial file population. 2013-08-02 13:12:24 -07:00
Strings.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Strings.h Initial file population. 2013-08-02 13:12:24 -07:00
Summary.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Summary.h renamed matches.h and matches.cpp to 2013-10-01 07:58:24 -07:00
superMergeTest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
supported_charsets.cpp Initial file population. 2013-08-02 13:12:24 -07:00
supported_charsets.txt Initial file population. 2013-08-02 13:12:24 -07:00
Syncdb.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Syncdb.h Initial file population. 2013-08-02 13:12:24 -07:00
Synonyms.cpp now we index all numbers that have field names 2013-11-08 16:16:13 -08:00
Synonyms.h now we index all numbers that have field names 2013-11-08 16:16:13 -08:00
Tagdb.cpp more formatting 2014-01-19 11:56:36 -08:00
Tagdb.h fix msge0 msg0 overload in sockets table 2014-01-22 20:34:55 -08:00
TcpServer.cpp fix core on broken pipe when calling 2014-01-23 11:34:49 -08:00
TcpServer.h added awesome streaming mode support 2014-01-17 16:26:17 -08:00
TcpSocket.h fix streaming mode for sending back json 2014-01-17 18:28:17 -08:00
test2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_convert.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_hash.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_norm.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_parser2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_parser.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_unicode.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Test.cpp parmdb updates 2013-12-16 17:07:15 -08:00
Test.h Initial file population. 2013-08-02 13:12:24 -07:00
testfloats.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Tfndb.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Tfndb.h code cleanups. 2014-01-18 21:19:26 -08:00
Thesaurus.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Thesaurus.h Initial file population. 2013-08-02 13:12:24 -07:00
Threads.cpp fix send email notification bug. 2014-01-23 16:59:55 -08:00
Threads.h do not try to join on thread when pthread_create() fails 2013-11-16 18:28:49 -07:00
threadtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
thunder.cpp Initial file population. 2013-08-02 13:12:24 -07:00
tifftopnm Initial file population. 2013-08-02 13:12:24 -07:00
Timedb.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Timedb.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Timer.h Initial file population. 2013-08-02 13:12:24 -07:00
Title.cpp move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Title.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Titledb.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Titledb.h code cleanups. 2014-01-18 21:19:26 -08:00
TopTree.cpp Initial file population. 2013-08-02 13:12:24 -07:00
TopTree.h Initial file population. 2013-08-02 13:12:24 -07:00
treetest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
TuringTest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
TuringTest.h Initial file population. 2013-08-02 13:12:24 -07:00
Turkdb.cpp Initial file population. 2013-08-02 13:12:24 -07:00
types.h fix compiler warning in types.h. 2013-09-08 20:00:52 -06:00
UCNormalizer.cpp code cleanups. 2014-01-18 21:19:26 -08:00
UCNormalizer.h Initial file population. 2013-08-02 13:12:24 -07:00
UCPropTable.cpp Initial file population. 2013-08-02 13:12:24 -07:00
UCPropTable.h Initial file population. 2013-08-02 13:12:24 -07:00
UCWordIterator.cpp Initial file population. 2013-08-02 13:12:24 -07:00
UCWordIterator.h Initial file population. 2013-08-02 13:12:24 -07:00
UdpProtocol.h Initial file population. 2013-08-02 13:12:24 -07:00
UdpServer.cpp added diffbot retry rules. 2014-01-22 19:57:38 -08:00
UdpServer.h rebalancer working pretty well now 2014-01-15 19:08:47 -08:00
UdpSlot.cpp Initial file population. 2013-08-02 13:12:24 -07:00
UdpSlot.h Initial file population. 2013-08-02 13:12:24 -07:00
udptest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Unicode.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Unicode.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
UnicodeProperties.cpp code cleanups. 2014-01-18 21:19:26 -08:00
UnicodeProperties.h Initial file population. 2013-08-02 13:12:24 -07:00
unifiedDict.txt Initial file population. 2013-08-02 13:12:24 -07:00
uniq2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Url.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Url.h Initial file population. 2013-08-02 13:12:24 -07:00
urlinfo.cpp fixed data corruption bug. m_finalCrawlDelay 2013-11-27 14:18:15 -08:00
Users.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Users.h Initial file population. 2013-08-02 13:12:24 -07:00
ValidPointer.cpp Initial file population. 2013-08-02 13:12:24 -07:00
ValidPointer.h Initial file population. 2013-08-02 13:12:24 -07:00
Vector.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Vector.h Initial file population. 2013-08-02 13:12:24 -07:00
Weights.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Weights.h Initial file population. 2013-08-02 13:12:24 -07:00
Wiki.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Wiki.h Initial file population. 2013-08-02 13:12:24 -07:00
wikititles.txt.part1 Initial file population. 2013-08-02 13:12:24 -07:00
wikititles.txt.part2 Initial file population. 2013-08-02 13:12:24 -07:00
wiktionary-buf.txt Initial file population. 2013-08-02 13:12:24 -07:00
wiktionary-lang.txt Initial file population. 2013-08-02 13:12:24 -07:00
wiktionary-syns.dat Initial file population. 2013-08-02 13:12:24 -07:00
Wiktionary.cpp Merge branch 'master' into diffbot 2013-11-20 15:51:58 -08:00
Wiktionary.h Initial file population. 2013-08-02 13:12:24 -07:00
Words.cpp fixed bugs in sort by prices, etc. 2013-11-11 18:58:45 -08:00
Words.h Initial file population. 2013-08-02 13:12:24 -07:00
Xml.cpp for json docs only give them a single 2014-01-25 08:17:38 -08:00
Xml.h for json docs only give them a single 2014-01-25 08:17:38 -08:00
XmlDoc.cpp fix infinite keep alive restart bug some more 2014-01-30 14:12:32 -08:00
XmlDoc.h added ability to treat <link xyz.com rel=canoical> as meta redirects. 2014-01-30 10:04:09 -08:00
XmlNode.cpp fixed cdata parsing issue 2013-12-19 16:04:53 -08:00
XmlNode.h Initial file population. 2013-08-02 13:12:24 -07:00
zconf.h Initial file population. 2013-08-02 13:12:24 -07:00
zlib.h Initial file population. 2013-08-02 13:12:24 -07:00

open-source-search-engine

An open source web and enterprise search engine, as seen on http://www.gigablast.com/.

RUNNING GIGABLAST

See html/admin.html for all administrative documentation including the quick start instructions.

Alternatively, visit http://www.gigablast.com/admin.html

See html/compare.html for a comparison of Gigablast to SOLR. The comparison is very sparse right now, but it does include some useful commands.

CODE ARCHITECTURE

See html/developer.html for all code documentation.

Alternatively, visit http://www.gigablast.com/developer.html

CONTACT

Contact me with feature requests or for help in general. I will work for free on good use cases. Email: mattdwells@hotmail.com