Nov 20 2017 -- A distributed open source search engine and spider/crawler written in C/C++ for Linux on Intel/AMD. Binaries are available for download from gigablast dot com. See the README.md file at the very bottom of this page for instructions.
Matt Wells 5c8b9af1d3 fix rdbcache corruption from -O2 compile bug.
fix too many spiders per ip bug!
2014-02-05 16:58:21 -08:00
antiword-dir Initial file population. 2013-08-02 13:12:24 -07:00
coll.main.0 Merge branch 'diffbot' 2014-02-01 11:28:31 -07:00
html image updates 2014-01-30 13:11:26 -08:00
openssl we already include our own 32-bit 2013-09-15 18:25:49 -06:00
ucdata Initial file population. 2013-08-02 13:12:24 -07:00
Abbreviations.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Abbreviations.h change a couple of possible reserved names in C++ 2013-08-28 22:59:01 -06:00
Accessdb.cpp fix addColl() logic for collectionless rdbs 2013-10-16 14:38:09 -07:00
Accessdb.h Initial file population. 2013-08-02 13:12:24 -07:00
Address.cpp a bunch of bug fixes, mostly spider related. 2013-12-07 21:56:37 -07:00
Address.h change a couple of possible reserved names in C++ 2013-08-28 22:59:01 -06:00
addtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Ads.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Ads.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
AdultBit.cpp Initial file population. 2013-08-02 13:12:24 -07:00
AdultBit.h Initial file population. 2013-08-02 13:12:24 -07:00
animate.cpp Initial file population. 2013-08-02 13:12:24 -07:00
antiword Initial file population. 2013-08-02 13:12:24 -07:00
AutoBan.cpp formatting changes 2014-01-19 00:38:02 -08:00
AutoBan.h Initial file population. 2013-08-02 13:12:24 -07:00
badcattable.dat Initial file population. 2013-08-02 13:12:24 -07:00
BigFile.cpp fix a core 2014-01-22 22:26:50 -08:00
BigFile.h forgot to push the .h files 2013-12-07 22:12:48 -07:00
Bits.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Bits.h Initial file population. 2013-08-02 13:12:24 -07:00
blaster.cpp fixed bugs with advanced.html advanced search page. 2013-11-17 14:58:47 -07:00
Blaster.cpp for json docs only give them a single 2014-01-25 08:17:38 -08:00
Blaster.h use ./gb blaster -u <fileofurls> to just inject urls, 2013-08-19 16:33:27 -06:00
bmptopnm Initial file population. 2013-08-02 13:12:24 -07:00
Cachedb.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Cachedb.h Initial file population. 2013-08-02 13:12:24 -07:00
camsort.cpp Initial file population. 2013-08-02 13:12:24 -07:00
catcountry.dat Initial file population. 2013-08-02 13:12:24 -07:00
Catdb.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Catdb.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Categories.cpp documentation updates. fixed sd=0. 2013-10-13 14:24:41 -07:00
Categories.h documentation updates. fixed sd=0. 2013-10-13 14:24:41 -07:00
CatRec.cpp fix a couple catdb generation bugs. 2013-10-12 20:33:04 -07:00
CatRec.h Initial file population. 2013-08-02 13:12:24 -07:00
character-sets Initial file population. 2013-08-02 13:12:24 -07:00
check_unicode.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Clusterdb.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Clusterdb.h Initial file population. 2013-08-02 13:12:24 -07:00
Collectiondb.cpp Merge branch 'diffbot' 2014-02-01 11:31:00 -07:00
Collectiondb.h Merge branch 'diffbot' 2014-02-01 11:28:31 -07:00
Conf.cpp parm simplifications 2014-01-09 19:00:21 -08:00
Conf.h add connectips back. call them adminIps this time. 2014-02-03 20:47:48 -07:00
convert.cpp Initial file population. 2013-08-02 13:12:24 -07:00
CountryCode.cpp fix pagecrawlbot.cpp to support &c=token-name. 2014-01-22 23:40:38 -08:00
CountryCode.h fix pagecrawlbot.cpp to support &c=token-name. 2014-01-22 23:40:38 -08:00
create_ucd_tables.cpp Initial file population. 2013-08-02 13:12:24 -07:00
DailyMerge.cpp Fixed some bugs. 2013-08-09 08:52:15 -07:00
DailyMerge.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
DataFeed.cpp Initial file population. 2013-08-02 13:12:24 -07:00
DataFeed.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Datedb.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Datedb.h move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Dates.cpp get new global preemptive cache 2014-01-05 11:51:09 -08:00
Dates.h Initial file population. 2013-08-02 13:12:24 -07:00
Diff.cpp Fixed some bugs. 2013-08-09 08:52:15 -07:00
Diff.h Initial file population. 2013-08-02 13:12:24 -07:00
Dir.cpp more parmdb fixes 2013-12-16 15:39:24 -08:00
Dir.h fix file descriptor leak in Dir class. 2013-11-19 13:41:56 -08:00
DiskPageCache.cpp disk page cache back on 2014-01-21 19:03:47 -08:00
DiskPageCache.h tons of changes from live github on neo. 2014-01-17 21:01:43 -08:00
dlstubs.c Initial file population. 2013-08-02 13:12:24 -07:00
dmozparse.cpp add support for noindex meta tag. 2013-10-12 22:50:23 -07:00
Dns.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Dns.h Initial file population. 2013-08-02 13:12:24 -07:00
DnsProtocol.h Initial file population. 2013-08-02 13:12:24 -07:00
dnstest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Domains.cpp move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Domains.h Initial file population. 2013-08-02 13:12:24 -07:00
dumpcore.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Entities.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Entities.h Initial file population. 2013-08-02 13:12:24 -07:00
Errno.cpp added and fixed support for <link ahref=xxx rel=canonical>. 2014-01-30 10:37:59 -08:00
Errno.h added ability to treat <link xyz.com rel=canonical> as meta redirects. 2014-01-30 10:04:09 -08:00
errnotest.cpp errno test update 2013-11-19 00:10:10 -07:00
Events.h Initial file population. 2013-08-02 13:12:24 -07:00
Facebook.cpp fix addColl() logic for collectionless rdbs 2013-10-16 14:38:09 -07:00
Facebook.h new graphic icons. minor clean ups. 2013-11-15 14:47:05 -07:00
fastIndexTable.cpp Initial file population. 2013-08-02 13:12:24 -07:00
fctypes.cpp for json docs only give them a single 2014-01-25 08:17:38 -08:00
fctypes.h got code with shard rebalancing compiling. 2014-01-11 16:08:42 -08:00
File.cpp log cleanups mostly. 2013-12-18 10:57:18 -08:00
File.h tons of changes from live github on neo. 2014-01-17 21:01:43 -08:00
filterquerylogs.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Flags.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Flags.h Initial file population. 2013-08-02 13:12:24 -07:00
gb-include.h Initial file population. 2013-08-02 13:12:24 -07:00
gb.conf fix rdbcache corruption from -O2 compile bug. 2014-02-05 16:58:21 -08:00
gb.pem so we have spider https sites add 2013-10-13 00:15:39 -07:00
gbfilter.cpp Initial file population. 2013-08-02 13:12:24 -07:00
gbtitletest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
geneaology.cpp Initial file population. 2013-08-02 13:12:24 -07:00
generateSuperMergeCode.cpp Initial file population. 2013-08-02 13:12:24 -07:00
geo_ip_table.cpp Initial file population. 2013-08-02 13:12:24 -07:00
geo_ip_table.h Initial file population. 2013-08-02 13:12:24 -07:00
GeoIP_internal.h Initial file population. 2013-08-02 13:12:24 -07:00
GeoIP.c Initial file population. 2013-08-02 13:12:24 -07:00
GeoIP.h Initial file population. 2013-08-02 13:12:24 -07:00
GeoIPCity.c Initial file population. 2013-08-02 13:12:24 -07:00
GeoIPCity.h Initial file population. 2013-08-02 13:12:24 -07:00
getsample.cpp Initial file population. 2013-08-02 13:12:24 -07:00
giftopnm Initial file population. 2013-08-02 13:12:24 -07:00
hash.cpp Initial file population. 2013-08-02 13:12:24 -07:00
hash.h get "&site=abc.com+xyz.com"... working to restrict 2013-09-15 20:16:48 -07:00
HashTable.cpp fix core from last push. 2013-12-09 14:21:46 -07:00
HashTable.h mem labelling fixes. 2013-12-09 14:05:02 -07:00
HashTableT.cpp Initial file population. 2013-08-02 13:12:24 -07:00
HashTableT.h Initial file population. 2013-08-02 13:12:24 -07:00
HashTableX.cpp quite a few fixes to the quota system, cleanups etc. 2014-01-18 16:23:13 -08:00
HashTableX.h code checkpoint. time slicing, faster spider code 2014-02-04 17:34:43 -08:00
hashtest2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
hashtest3.cpp Initial file population. 2013-08-02 13:12:24 -07:00
hashtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Highlight.cpp Merge branch 'master' into diffbot 2013-12-07 11:34:26 -07:00
Highlight.h trying to fix json decoding bug. 2013-10-24 17:55:01 -07:00
Hostdb.cpp fix another core from freeing wrong byte sized 2014-01-30 20:16:41 -08:00
Hostdb.h added recovery mode display in hosts table 2014-02-01 10:16:46 -08:00
hosts.conf fix hosts.conf 2013-12-26 09:34:35 -08:00
hosts.cpp Initial file population. 2013-08-02 13:12:24 -07:00
HttpMime.cpp mem labelling fixes. 2013-12-09 14:05:02 -07:00
HttpMime.h update the dirty word list. but we still 2013-10-15 01:01:19 -07:00
HttpRequest.cpp add connectips back. call them adminIps this time. 2014-02-03 20:47:48 -07:00
HttpRequest.h show all crawl details in url webhook 2013-11-07 13:59:43 -08:00
HttpServer.cpp added url download support 2014-01-20 23:17:04 -08:00
HttpServer.h use &format=0 1 or 2 for html/xml/json now. 2013-11-08 18:00:30 -08:00
iana_charset.cpp Merge branch 'diffbot' into diffbot-testing 2013-12-16 11:06:11 -08:00
iana_charset.h Merge branch 'diffbot' into diffbot-testing 2013-12-16 11:06:11 -08:00
iconv.h Initial file population. 2013-08-02 13:12:24 -07:00
Images.cpp got code with shard rebalancing compiling. 2014-01-11 16:08:42 -08:00
Images.h Initial file population. 2013-08-02 13:12:24 -07:00
Indexdb.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Indexdb.h move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
IndexList.cpp Initial file population. 2013-08-02 13:12:24 -07:00
IndexList.h Initial file population. 2013-08-02 13:12:24 -07:00
IndexReadInfo.cpp Initial file population. 2013-08-02 13:12:24 -07:00
IndexReadInfo.h Initial file population. 2013-08-02 13:12:24 -07:00
IndexTable2.cpp move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
IndexTable2.h Initial file population. 2013-08-02 13:12:24 -07:00
IndexTable.cpp Initial file population. 2013-08-02 13:12:24 -07:00
IndexTable.h Initial file population. 2013-08-02 13:12:24 -07:00
injectme3 added injectme3 file and documentation into compare.html 2013-08-17 11:02:26 -06:00
injector.cpp Initial file population. 2013-08-02 13:12:24 -07:00
iostream.h Initial file population. 2013-08-02 13:12:24 -07:00
ip.cpp fix old bug. 2014-01-10 18:52:47 -07:00
ip.h Initial file population. 2013-08-02 13:12:24 -07:00
ipconfig.cpp fixed some cores. brought in fixes from 2013-09-08 16:16:13 -06:00
Iso8859.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Iso8859.h Initial file population. 2013-08-02 13:12:24 -07:00
jointest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
jpegtopnm Initial file population. 2013-08-02 13:12:24 -07:00
Json.cpp fixed contenthash32 logic for json objects. 2014-02-05 13:22:03 -08:00
Json.h fixed contenthash32 logic for json objects. 2014-02-05 13:22:03 -08:00
keepalive.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Lang.cpp comment updates 2013-10-15 23:13:50 -07:00
Lang.h Initial file population. 2013-08-02 13:12:24 -07:00
LangList.cpp code cleanups. 2014-01-18 21:19:26 -08:00
LangList.h Initial file population. 2013-08-02 13:12:24 -07:00
Language.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Language.h Initial file population. 2013-08-02 13:12:24 -07:00
LanguageIdentifier.cpp Initial file population. 2013-08-02 13:12:24 -07:00
LanguageIdentifier.h Initial file population. 2013-08-02 13:12:24 -07:00
LanguagePages.cpp Initial file population. 2013-08-02 13:12:24 -07:00
LanguagePages.h Initial file population. 2013-08-02 13:12:24 -07:00
libc.a Initial file population. 2013-08-02 13:12:24 -07:00
libcrypto.a Initial file population. 2013-08-02 13:12:24 -07:00
libgcc.a Initial file population. 2013-08-02 13:12:24 -07:00
libiconv.a Initial file population. 2013-08-02 13:12:24 -07:00
libiconv.la Initial file population. 2013-08-02 13:12:24 -07:00
libm.a Initial file population. 2013-08-02 13:12:24 -07:00
libpthread.a Initial file population. 2013-08-02 13:12:24 -07:00
libssl.a Initial file population. 2013-08-02 13:12:24 -07:00
libstdc++.a Initial file population. 2013-08-02 13:12:24 -07:00
libz.a Initial file population. 2013-08-02 13:12:24 -07:00
LICENSE code checkpoint. time slicing, faster spider code 2014-02-04 17:34:43 -08:00
Linkdb.cpp added some repair logic for 0001.dat files. 2014-02-01 10:14:25 -08:00
Linkdb.h fix LinkInfo mem leaks 2013-11-16 17:50:32 -08:00
LinkedList.h Initial file population. 2013-08-02 13:12:24 -07:00
linkspam.cpp renamed matches.h and matches.cpp to 2013-10-01 07:58:24 -07:00
linkspam.h Initial file population. 2013-08-02 13:12:24 -07:00
Log.cpp forgot to unlock thread lock 2013-12-15 10:43:34 -07:00
Log.h Initial file population. 2013-08-02 13:12:24 -07:00
Loop.cpp tons of changes from live github on neo. 2014-01-17 21:01:43 -08:00
Loop.h Initial file population. 2013-08-02 13:12:24 -07:00
looptest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
main.cpp fix excessive dupcache deduping. 2014-02-05 13:41:15 -08:00
Make.depend change deduping logic to be first come first 2014-01-29 16:14:42 -08:00
Makefile fix rdbcache corruption from -O2 compile bug. 2014-02-05 16:58:21 -08:00
malloc.c Initial file population. 2013-08-02 13:12:24 -07:00
matches2.cpp dirty word detector revisions. we need 2013-10-16 20:19:49 -07:00
matches2.h renamed matches.h and matches.cpp to 2013-10-01 07:58:24 -07:00
Matches.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Matches.h Initial file population. 2013-08-02 13:12:24 -07:00
Mem.cpp always use kstart. 2014-01-28 14:37:21 -08:00
Mem.h fix typo 2013-09-08 19:51:57 -07:00
membustest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MemPool.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MemPool.h Initial file population. 2013-08-02 13:12:24 -07:00
MemPoolTree.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MemPoolTree.h Initial file population. 2013-08-02 13:12:24 -07:00
memtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
mergetest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MetaContainer.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MetaContainer.h Initial file population. 2013-08-02 13:12:24 -07:00
Mime.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Mime.h Initial file population. 2013-08-02 13:12:24 -07:00
mixfile.cpp Initial file population. 2013-08-02 13:12:24 -07:00
mmseg.h Initial file population. 2013-08-02 13:12:24 -07:00
monitor.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Monitordb.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Monitordb.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg0.cpp got code with shard rebalancing compiling. 2014-01-11 16:08:42 -08:00
Msg0.h move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg1.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Msg1.h log fixes for debugging. try to 2013-10-02 22:37:20 -06:00
Msg1f.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg1f.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg2.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg2.h new &sites=xyz.com+abc.com+... functionality compiles ok. 2013-09-15 18:14:32 -06:00
Msg2a.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Msg2a.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg2b.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg2b.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg3.cpp tons of changes from live github on neo. 2014-01-17 21:01:43 -08:00
Msg3.h almost done adding support for whitelists. 2013-09-15 15:15:56 -06:00
Msg3a.cpp got code with shard rebalancing compiling. 2014-01-11 16:08:42 -08:00
Msg3a.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg3e.cpp fix infinite loop from json parsing and 2013-09-27 17:52:36 -06:00
Msg3e.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg4.cpp fixed bug of not saving waiting trees! 2014-01-23 01:02:11 -08:00
Msg4.h got code with shard rebalancing compiling. 2014-01-11 16:08:42 -08:00
Msg5.cpp change deduping logic to be first come first 2014-01-29 16:14:42 -08:00
Msg5.h fix a couple cores related to deleting collections 2014-01-29 15:56:07 -08:00
Msg6b.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg6b.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg8b.cpp move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Msg8b.h Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Msg9b.cpp fix a couple catdb generation bugs. 2013-10-12 20:33:04 -07:00
Msg9b.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg13.cpp take out potentially bad robots.txt 2014-01-28 18:26:16 -08:00
Msg13.h measure crawl delay by default from 2013-11-26 14:07:28 -08:00
Msg17.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg17.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg20.cpp code checkpoint. time slicing, faster spider code 2014-02-04 17:34:43 -08:00
Msg20.h code checkpoint. time slicing, faster spider code 2014-02-04 17:34:43 -08:00
Msg22.cpp fix bugs to try to get sharding working 2014-01-21 13:58:21 -08:00
Msg22.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg24.cpp new Make.depend. 2013-08-09 17:13:45 -06:00
Msg28.cpp fix core from (broad)casting valueless cgi field. 2013-10-03 14:51:59 -07:00
Msg28.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg30.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg30.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Msg35.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg35.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg36.cpp got code with shard rebalancing compiling. 2014-01-11 16:08:42 -08:00
Msg36.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg37.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg37.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg39.cpp Merge branch 'diffbot' 2014-02-01 11:28:31 -07:00
Msg39.h new &sites=xyz.com+abc.com+... functionality compiles ok. 2013-09-15 18:14:32 -06:00
Msg40.cpp code checkpoint. time slicing, faster spider code 2014-02-04 17:34:43 -08:00
Msg40.h code checkpoint. time slicing, faster spider code 2014-02-04 17:34:43 -08:00
Msg40Cache.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg40Cache.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg42.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg42.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg51.cpp move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Msg51.h Initial file population. 2013-08-02 13:12:24 -07:00
Msgaa.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msgaa.h Initial file population. 2013-08-02 13:12:24 -07:00
MsgC.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
MsgC.h Initial file population. 2013-08-02 13:12:24 -07:00
Msge0.cpp fix a core 2014-01-22 22:26:50 -08:00
Msge0.h Initial file population. 2013-08-02 13:12:24 -07:00
Msge1.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msge1.h Initial file population. 2013-08-02 13:12:24 -07:00
Multicast.cpp a lot of times rdb tree has invalid collection 2014-01-21 19:01:44 -08:00
Multicast.h move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
mysynonyms.txt Initial file population. 2013-08-02 13:12:24 -07:00
numwords.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageAddColl.cpp more spiderdb spider request fixes 2014-01-19 18:00:56 -08:00
PageAddUrl.cpp more bug fixes associated with collections 2014-01-18 11:54:58 -08:00
PageCatdb.cpp more formatting 2014-01-19 11:56:36 -08:00
PageCrawlBot.cpp show userobots for bulk jobs 2014-02-05 15:45:39 -08:00
PageCrawlBot.h added "seeds" to json reply. store seed urls 2013-10-21 17:35:14 -07:00
PageDirectory.cpp code checkpoint. time slicing, faster spider code 2014-02-04 17:34:43 -08:00
PageEvents.cpp formatting changes 2014-01-19 00:38:02 -08:00
PageGet.cpp for json docs only give them a single 2014-01-25 08:17:38 -08:00
PageHosts.cpp added recovery mode display in hosts table 2014-02-01 10:16:46 -08:00
PageIndexdb.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
PageInject.cpp formatting 2014-01-19 15:06:02 -08:00
PageInject.h code cleanups. 2014-01-18 21:19:26 -08:00
PageLogin.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageLogView.cpp formatting fixes 2014-01-19 00:57:20 -08:00
PageNetTest.cpp move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
PageNetTest.h Initial file population. 2013-08-02 13:12:24 -07:00
PageOverview.cpp list collections in sidebar. 2014-01-09 21:13:41 -08:00
PageParser.cpp more formatting 2014-01-19 11:56:36 -08:00
PageParser.h Initial file population. 2013-08-02 13:12:24 -07:00
PagePerf.cpp took out pagecount table. just hafta scan 2014-01-19 20:34:38 -08:00
PageReindex.cpp more formatting 2014-01-19 11:56:36 -08:00
PageReindex.h fixed pagereindex. we now add spiderreplies 2013-12-07 10:01:17 -07:00
PageResults.cpp core fixes. csv fixes. 2014-02-05 14:56:22 -08:00
PageResults.h code checkpoint. time slicing, faster spider code 2014-02-04 17:34:43 -08:00
PageRoot.cpp Merge branch 'diffbot' 2014-02-01 11:28:31 -07:00
Pages.cpp add connectips back. call them adminIps this time. 2014-02-03 20:47:48 -07:00
Pages.h more spiderdb spider request fixes 2014-01-19 18:00:56 -08:00
PageSockets.cpp fix msge0 msg0 overload in sockets table 2014-01-22 20:34:55 -08:00
PageSpam.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageStats.cpp fix infinite keep alive restart bug some more 2014-01-30 14:12:32 -08:00
PageStatsdb.cpp formatting 2014-01-19 12:37:37 -08:00
PageSubmit.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageThesaurus.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageThreads.cpp formatting fixes 2014-01-19 00:57:20 -08:00
PageTitledb.cpp move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
PageTurk.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageTurk.h Initial file population. 2013-08-02 13:12:24 -07:00
Parms.cpp dup cache fixes. do not add dups to spiderdb either. 2014-02-05 14:09:35 -08:00
Parms.h fix up round incrementing logic. 2014-01-25 14:35:41 -08:00
parse_iana_charsets.pl move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
pdftohtml use the "onsite" keyword in your url filters 2013-09-06 09:37:17 -06:00
Phrases.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Phrases.h Initial file population. 2013-08-02 13:12:24 -07:00
PingServer.cpp added recovery mode display in hosts table 2014-02-01 10:16:46 -08:00
PingServer.h added emergency msg box on all admin pages 2014-01-11 20:14:44 -08:00
Placedb.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Placedb.h Initial file population. 2013-08-02 13:12:24 -07:00
pngtopnm Initial file population. 2013-08-02 13:12:24 -07:00
pnmscale Initial file population. 2013-08-02 13:12:24 -07:00
Pops.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Pops.h Initial file population. 2013-08-02 13:12:24 -07:00
porter.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Pos.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Pos.h Initial file population. 2013-08-02 13:12:24 -07:00
Posdb.cpp core fixes. csv fixes. 2014-02-05 14:56:22 -08:00
Posdb.h time slice checkpoint 2014-02-03 19:17:58 -08:00
postalCodes.txt Initial file population. 2013-08-02 13:12:24 -07:00
PostQueryRerank.cpp handle a bunch of oom conditions that 2013-11-20 10:14:02 -07:00
PostQueryRerank.h Initial file population. 2013-08-02 13:12:24 -07:00
ppmtojpeg Initial file population. 2013-08-02 13:12:24 -07:00
Process.cpp fixed bug of waiting trees not saving. 2014-01-23 01:04:24 -08:00
Process.h parmdb overhaul. support collection add/del 2013-12-10 13:09:55 -08:00
Profiler.cpp formatting fixes 2014-01-19 00:57:20 -08:00
Profiler.h Initial file population. 2013-08-02 13:12:24 -07:00
Proxy.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Proxy.h Initial file population. 2013-08-02 13:12:24 -07:00
pstotext Initial file population. 2013-08-02 13:12:24 -07:00
QAClient.cpp Initial file population. 2013-08-02 13:12:24 -07:00
QAClient.h Initial file population. 2013-08-02 13:12:24 -07:00
quarantine.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Query.cpp time slice checkpoint 2014-02-03 19:17:58 -08:00
Query.h time slice checkpoint 2014-02-03 19:17:58 -08:00
Rdb.cpp dup cache fixes. do not add dups to spiderdb either. 2014-02-05 14:09:35 -08:00
Rdb.h code cleanups. 2014-01-18 21:19:26 -08:00
RdbBase.cpp added some repair logic for 0001.dat files. 2014-02-01 10:14:25 -08:00
RdbBase.h added some repair logic for 0001.dat files. 2014-02-01 10:14:25 -08:00
RdbBuckets.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RdbBuckets.h Initial file population. 2013-08-02 13:12:24 -07:00
RdbCache.cpp code cleanups. 2014-01-18 21:19:26 -08:00
RdbCache.h removed MAX_COLL_RECS so we can have unlimited 2013-08-30 16:20:38 -07:00
RdbDump.cpp tons of changes from live github on neo. 2014-01-17 21:01:43 -08:00
RdbDump.h if coll is deleted or reset in a middle of a dump 2013-12-25 17:12:09 -08:00
RdbList.cpp checkpoint for faster spider code. 2014-02-04 16:15:27 -08:00
RdbList.h checkpoint for faster spider code. 2014-02-04 16:15:27 -08:00
RdbMap.cpp tons of changes from live github on neo. 2014-01-17 21:01:43 -08:00
RdbMap.h tons of changes from live github on neo. 2014-01-17 21:01:43 -08:00
RdbMem.cpp track down some nasty cores. fix 2013-10-29 16:37:14 -07:00
RdbMem.h now we can reset collection mid stream 2013-10-18 17:49:36 -07:00
RdbMerge.cpp fix problem scanning spiderdb. 2014-01-16 17:04:08 -08:00
RdbMerge.h if coll is deleted or reset in a middle of a dump 2013-12-25 17:12:09 -08:00
RdbScan.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RdbScan.h Initial file population. 2013-08-02 13:12:24 -07:00
rdbtest2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
rdbtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RdbTree.cpp free spidercolls on exit 2014-01-22 23:52:23 -08:00
RdbTree.h fix annoying rdbtree pos/neg key counting issue 2014-01-11 18:04:28 -08:00
README.md Update README.md 2013-11-16 20:14:06 -08:00
readRec.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Rebalance.cpp update "this round" counts to at least 2014-01-23 18:22:13 -08:00
Rebalance.h fix problem scanning spiderdb. 2014-01-16 17:04:08 -08:00
reindex2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Repair.cpp formatting fixes 2014-01-19 00:57:20 -08:00
Repair.h Initial file population. 2013-08-02 13:12:24 -07:00
RequestTable.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RequestTable.h Initial file population. 2013-08-02 13:12:24 -07:00
rescue.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Revdb.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Revdb.h Initial file population. 2013-08-02 13:12:24 -07:00
rmbots.cpp Initial file population. 2013-08-02 13:12:24 -07:00
SafeBuf.cpp set safebuf::m_buf to null in destructor 2014-02-02 12:16:11 -07:00
SafeBuf.h Merge branch 'diffbot' 2014-02-01 11:28:31 -07:00
SafeList.h Initial file population. 2013-08-02 13:12:24 -07:00
Sanity.h Initial file population. 2013-08-02 13:12:24 -07:00
Scores.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Scores.h Initial file population. 2013-08-02 13:12:24 -07:00
Scraper.cpp take out datedb. no longer used. we store 2014-01-09 13:39:28 -08:00
Scraper.h Initial file population. 2013-08-02 13:12:24 -07:00
SearchInput.cpp move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
SearchInput.h streaming results code checkpoint. 2014-02-04 17:05:43 -08:00
Sections.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Sections.h get new global preemptive cache 2014-01-05 11:51:09 -08:00
seektest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
seo.h Initial file population. 2013-08-02 13:12:24 -07:00
SiteGetter.cpp got code with shard rebalancing compiling. 2014-01-11 16:08:42 -08:00
SiteGetter.h Initial file population. 2013-08-02 13:12:24 -07:00
sleepandlog.cpp Initial file population. 2013-08-02 13:12:24 -07:00
sort.cpp Initial file population. 2013-08-02 13:12:24 -07:00
sort.h Initial file population. 2013-08-02 13:12:24 -07:00
Speller.cpp clean up logging so i can see what's going on 2013-12-10 16:41:30 -08:00
Speller.h Initial file population. 2013-08-02 13:12:24 -07:00
Spider.cpp fix rdbcache corruption from -O2 compile bug. 2014-02-05 16:58:21 -08:00
Spider.h dup cache fixes. do not add dups to spiderdb either. 2014-02-05 14:09:35 -08:00
Stats.cpp formatting 2014-01-19 12:37:37 -08:00
Stats.h more formatting 2014-01-19 01:09:38 -08:00
Statsdb.cpp formatting 2014-01-19 12:37:37 -08:00
Statsdb.h fix potential problem of tons of points in 2013-10-14 22:52:29 -07:00
StopWords.cpp update common word list 2013-12-01 15:19:33 -07:00
StopWords.h Initial file population. 2013-08-02 13:12:24 -07:00
streambuf.h Initial file population. 2013-08-02 13:12:24 -07:00
Strings.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Strings.h Initial file population. 2013-08-02 13:12:24 -07:00
Summary.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Summary.h renamed matches.h and matches.cpp to 2013-10-01 07:58:24 -07:00
superMergeTest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
supported_charsets.cpp Initial file population. 2013-08-02 13:12:24 -07:00
supported_charsets.txt Initial file population. 2013-08-02 13:12:24 -07:00
Syncdb.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Syncdb.h Initial file population. 2013-08-02 13:12:24 -07:00
Synonyms.cpp now we index all numbers that have field names 2013-11-08 16:16:13 -08:00
Synonyms.h now we index all numbers that have field names 2013-11-08 16:16:13 -08:00
Tagdb.cpp Merge branch 'diffbot' 2014-02-01 11:28:31 -07:00
Tagdb.h fix msge0 msg0 overload in sockets table 2014-01-22 20:34:55 -08:00
TcpServer.cpp undo canonical deduping. 2014-01-31 14:53:49 -08:00
TcpServer.h added awesome streaming mode support 2014-01-17 16:26:17 -08:00
TcpSocket.h fix streaming mode for sending back json 2014-01-17 18:28:17 -08:00
test2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_convert.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_hash.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_norm.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_parser2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_parser.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_unicode.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Test.cpp parmdb updates 2013-12-16 17:07:15 -08:00
Test.h Initial file population. 2013-08-02 13:12:24 -07:00
testfloats.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Tfndb.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Tfndb.h code cleanups. 2014-01-18 21:19:26 -08:00
Thesaurus.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Thesaurus.h Initial file population. 2013-08-02 13:12:24 -07:00
Threads.cpp fix core when thread fails to spawn. 2014-02-03 07:27:32 -07:00
Threads.h do not try to join on thread when pthread_create() fails 2013-11-16 18:28:49 -07:00
threadtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
thunder.cpp Initial file population. 2013-08-02 13:12:24 -07:00
tifftopnm Initial file population. 2013-08-02 13:12:24 -07:00
Timedb.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Timedb.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Timer.h Initial file population. 2013-08-02 13:12:24 -07:00
Title.cpp move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Title.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Titledb.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Titledb.h code cleanups. 2014-01-18 21:19:26 -08:00
TopTree.cpp Initial file population. 2013-08-02 13:12:24 -07:00
TopTree.h Initial file population. 2013-08-02 13:12:24 -07:00
treetest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
TuringTest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
TuringTest.h Initial file population. 2013-08-02 13:12:24 -07:00
Turkdb.cpp Initial file population. 2013-08-02 13:12:24 -07:00
types.h fix compiler warning in types.h. 2013-09-08 20:00:52 -06:00
UCNormalizer.cpp code cleanups. 2014-01-18 21:19:26 -08:00
UCNormalizer.h Initial file population. 2013-08-02 13:12:24 -07:00
UCPropTable.cpp Initial file population. 2013-08-02 13:12:24 -07:00
UCPropTable.h Initial file population. 2013-08-02 13:12:24 -07:00
UCWordIterator.cpp Initial file population. 2013-08-02 13:12:24 -07:00
UCWordIterator.h Initial file population. 2013-08-02 13:12:24 -07:00
UdpProtocol.h Initial file population. 2013-08-02 13:12:24 -07:00
UdpServer.cpp added some repair logic for 0001.dat files. 2014-02-01 10:14:25 -08:00
UdpServer.h rebalancer working pretty well now 2014-01-15 19:08:47 -08:00
UdpSlot.cpp Initial file population. 2013-08-02 13:12:24 -07:00
UdpSlot.h Initial file population. 2013-08-02 13:12:24 -07:00
udptest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Unicode.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Unicode.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
UnicodeProperties.cpp code cleanups. 2014-01-18 21:19:26 -08:00
UnicodeProperties.h Initial file population. 2013-08-02 13:12:24 -07:00
unifiedDict.txt Initial file population. 2013-08-02 13:12:24 -07:00
uniq2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Url.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Url.h Initial file population. 2013-08-02 13:12:24 -07:00
urlinfo.cpp fixed data corruption bug. m_finalCrawlDelay 2013-11-27 14:18:15 -08:00
Users.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Users.h Initial file population. 2013-08-02 13:12:24 -07:00
ValidPointer.cpp Initial file population. 2013-08-02 13:12:24 -07:00
ValidPointer.h Initial file population. 2013-08-02 13:12:24 -07:00
Vector.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Vector.h Initial file population. 2013-08-02 13:12:24 -07:00
Weights.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Weights.h Initial file population. 2013-08-02 13:12:24 -07:00
Wiki.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Wiki.h Initial file population. 2013-08-02 13:12:24 -07:00
wikititles.txt.part1 Initial file population. 2013-08-02 13:12:24 -07:00
wikititles.txt.part2 Initial file population. 2013-08-02 13:12:24 -07:00
wiktionary-buf.txt Initial file population. 2013-08-02 13:12:24 -07:00
wiktionary-lang.txt Initial file population. 2013-08-02 13:12:24 -07:00
wiktionary-syns.dat Initial file population. 2013-08-02 13:12:24 -07:00
Wiktionary.cpp Merge branch 'master' into diffbot 2013-11-20 15:51:58 -08:00
Wiktionary.h Initial file population. 2013-08-02 13:12:24 -07:00
Words.cpp fixed bugs in sort by prices, etc. 2013-11-11 18:58:45 -08:00
Words.h Initial file population. 2013-08-02 13:12:24 -07:00
Xml.cpp for json docs only give them a single 2014-01-25 08:17:38 -08:00
Xml.h for json docs only give them a single 2014-01-25 08:17:38 -08:00
XmlDoc.cpp wait 180 secs for diffbot reply 2014-02-05 15:46:26 -08:00
XmlDoc.h added ability to treat <link xyz.com rel=canoical> as meta redirects. 2014-01-30 10:04:09 -08:00
XmlNode.cpp fixed cdata parsing issue 2013-12-19 16:04:53 -08:00
XmlNode.h Initial file population. 2013-08-02 13:12:24 -07:00
zconf.h Initial file population. 2013-08-02 13:12:24 -07:00
zlib.h Initial file population. 2013-08-02 13:12:24 -07:00

open-source-search-engine

An open source web and enterprise search engine, as seen on http://www.gigablast.com/

RUNNING GIGABLAST

See html/admin.html for all administrative documentation including the quick start instructions.

Alternatively, visit http://www.gigablast.com/admin.html

See html/compare.html for a comparison of Gigablast to SOLR. The comparison is very sparse right now, but it does include some useful commands.

CODE ARCHITECTURE

See html/developer.html for all code documentation.

Alternatively, visit http://www.gigablast.com/developer.html

CONTACT

Contact me at mattdwells@hotmail.com for feature requests or help in general. I will work for free for good use cases.