Nov 20 2017 -- A distributed open source search engine and spider/crawler written in C/C++ for Linux on Intel/AMD. From gigablast dot com, which has binaries for download. See the README.md file at the very bottom of this page for instructions.
Go to file
Matt Wells f8135e628e fall back to hop count if priority
is tied (and both are due to be spidered).
defaults back to breadth first like it was
doing before.
2014-02-16 15:52:08 -08:00
antiword-dir Initial file population. 2013-08-02 13:12:24 -07:00
coll.main.0 Merge branch 'diffbot' 2014-02-01 11:28:31 -07:00
html image updates 2014-01-30 13:11:26 -08:00
openssl we already include our own 32-bit 2013-09-15 18:25:49 -06:00
ucdata Initial file population. 2013-08-02 13:12:24 -07:00
Abbreviations.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Abbreviations.h change a couple of possible reserved names in C++ 2013-08-28 22:59:01 -06:00
Accessdb.cpp fix addColl() logic for collectionless rdbs 2013-10-16 14:38:09 -07:00
Accessdb.h Initial file population. 2013-08-02 13:12:24 -07:00
Address.cpp a bunch of bug fixes, mostly spider related. 2013-12-07 21:56:37 -07:00
Address.h change a couple of possible reserved names in C++ 2013-08-28 22:59:01 -06:00
addtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Ads.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Ads.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
AdultBit.cpp Initial file population. 2013-08-02 13:12:24 -07:00
AdultBit.h Initial file population. 2013-08-02 13:12:24 -07:00
animate.cpp Initial file population. 2013-08-02 13:12:24 -07:00
antiword Initial file population. 2013-08-02 13:12:24 -07:00
AutoBan.cpp formatting changes 2014-01-19 00:38:02 -08:00
AutoBan.h Initial file population. 2013-08-02 13:12:24 -07:00
badcattable.dat Initial file population. 2013-08-02 13:12:24 -07:00
BigFile.cpp fix a core 2014-01-22 22:26:50 -08:00
BigFile.h forgot to push the .h files 2013-12-07 22:12:48 -07:00
Bits.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Bits.h Initial file population. 2013-08-02 13:12:24 -07:00
blaster.cpp fixed bugs with advanced.html advanced search page. 2013-11-17 14:58:47 -07:00
Blaster.cpp for json docs only give them a single 2014-01-25 08:17:38 -08:00
Blaster.h use ./gb blaster -u <fileofurls> to just inject urls, 2013-08-19 16:33:27 -06:00
bmptopnm Initial file population. 2013-08-02 13:12:24 -07:00
Cachedb.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Cachedb.h Initial file population. 2013-08-02 13:12:24 -07:00
camsort.cpp Initial file population. 2013-08-02 13:12:24 -07:00
catcountry.dat Initial file population. 2013-08-02 13:12:24 -07:00
Catdb.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Catdb.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Categories.cpp documentation updates. fixed sd=0. 2013-10-13 14:24:41 -07:00
Categories.h documentation updates. fixed sd=0. 2013-10-13 14:24:41 -07:00
CatRec.cpp fix a couple catdb generation bugs. 2013-10-12 20:33:04 -07:00
CatRec.h Initial file population. 2013-08-02 13:12:24 -07:00
character-sets Initial file population. 2013-08-02 13:12:24 -07:00
check_unicode.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Clusterdb.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Clusterdb.h Initial file population. 2013-08-02 13:12:24 -07:00
Collectiondb.cpp allow coll delete if not the one being repaired 2014-02-16 10:55:34 -08:00
Collectiondb.h EDOCUNCHANGED fixes for diffbot 2014-02-10 16:23:39 -08:00
Conf.cpp parm simplifcations 2014-01-09 19:00:21 -08:00
Conf.h add connectips back. call them adminIps this time. 2014-02-03 20:47:48 -07:00
convert.cpp Initial file population. 2013-08-02 13:12:24 -07:00
CountryCode.cpp fix pagecrawlbot.cpp to support &c=token-name. 2014-01-22 23:40:38 -08:00
CountryCode.h fix pagecrawlbot.cpp to support &c=token-name. 2014-01-22 23:40:38 -08:00
create_ucd_tables.cpp Initial file population. 2013-08-02 13:12:24 -07:00
DailyMerge.cpp Fixed some bugs. 2013-08-09 08:52:15 -07:00
DailyMerge.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
DataFeed.cpp Initial file population. 2013-08-02 13:12:24 -07:00
DataFeed.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Datedb.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Datedb.h move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Dates.cpp get new global preemptive cache 2014-01-05 11:51:09 -08:00
Dates.h Initial file population. 2013-08-02 13:12:24 -07:00
Diff.cpp Fixed some bugs. 2013-08-09 08:52:15 -07:00
Diff.h Initial file population. 2013-08-02 13:12:24 -07:00
Dir.cpp more parmdb fixes 2013-12-16 15:39:24 -08:00
Dir.h fix file descriptor leak in Dir class. 2013-11-19 13:41:56 -08:00
DiskPageCache.cpp disk page cache back on 2014-01-21 19:03:47 -08:00
DiskPageCache.h tons of changes from live github on neo. 2014-01-17 21:01:43 -08:00
dlstubs.c Initial file population. 2013-08-02 13:12:24 -07:00
dmozparse.cpp add support for noindex meta tag. 2013-10-12 22:50:23 -07:00
Dns.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Dns.h Initial file population. 2013-08-02 13:12:24 -07:00
DnsProtocol.h Initial file population. 2013-08-02 13:12:24 -07:00
dnstest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Domains.cpp move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Domains.h Initial file population. 2013-08-02 13:12:24 -07:00
dumpcore.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Entities.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Entities.h Initial file population. 2013-08-02 13:12:24 -07:00
Errno.cpp added and fixed support for <link ahref=xxx rel=canonical>. 2014-01-30 10:37:59 -08:00
Errno.h added ability to treat <link xyz.com rel=canoical> as meta redirects. 2014-01-30 10:04:09 -08:00
errnotest.cpp errno test update 2013-11-19 00:10:10 -07:00
Events.h Initial file population. 2013-08-02 13:12:24 -07:00
Facebook.cpp fix addColl() logic for collectionless rdbs 2013-10-16 14:38:09 -07:00
Facebook.h new graphic icons. minor clean ups. 2013-11-15 14:47:05 -07:00
fastIndexTable.cpp Initial file population. 2013-08-02 13:12:24 -07:00
fctypes.cpp for json docs only give them a single 2014-01-25 08:17:38 -08:00
fctypes.h got code with shard rebalancing compiling. 2014-01-11 16:08:42 -08:00
File.cpp log cleanups mostly. 2013-12-18 10:57:18 -08:00
File.h tons of changes from live github on neo. 2014-01-17 21:01:43 -08:00
filterquerylogs.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Flags.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Flags.h Initial file population. 2013-08-02 13:12:24 -07:00
gb-include.h Initial file population. 2013-08-02 13:12:24 -07:00
gb.conf gb.conf spiders back on 2014-02-05 16:59:06 -08:00
gb.pem so we have spider https sites add 2013-10-13 00:15:39 -07:00
gbfilter.cpp Initial file population. 2013-08-02 13:12:24 -07:00
gbtitletest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
geneaology.cpp Initial file population. 2013-08-02 13:12:24 -07:00
generateSuperMergeCode.cpp Initial file population. 2013-08-02 13:12:24 -07:00
geo_ip_table.cpp Initial file population. 2013-08-02 13:12:24 -07:00
geo_ip_table.h Initial file population. 2013-08-02 13:12:24 -07:00
GeoIP_internal.h Initial file population. 2013-08-02 13:12:24 -07:00
GeoIP.c Initial file population. 2013-08-02 13:12:24 -07:00
GeoIP.h Initial file population. 2013-08-02 13:12:24 -07:00
GeoIPCity.c Initial file population. 2013-08-02 13:12:24 -07:00
GeoIPCity.h Initial file population. 2013-08-02 13:12:24 -07:00
getsample.cpp Initial file population. 2013-08-02 13:12:24 -07:00
giftopnm Initial file population. 2013-08-02 13:12:24 -07:00
hash.cpp Initial file population. 2013-08-02 13:12:24 -07:00
hash.h get "&site=abc.com+xyz.com"... working to restrict 2013-09-15 20:16:48 -07:00
HashTable.cpp fix core from last push. 2013-12-09 14:21:46 -07:00
HashTable.h mem labelling fixes. 2013-12-09 14:05:02 -07:00
HashTableT.cpp Initial file population. 2013-08-02 13:12:24 -07:00
HashTableT.h Initial file population. 2013-08-02 13:12:24 -07:00
HashTableX.cpp quite a few fixes to the quota system, cleanups etc. 2014-01-18 16:23:13 -08:00
HashTableX.h code checkpoint. time slicing, faster spider code 2014-02-04 17:34:43 -08:00
hashtest2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
hashtest3.cpp Initial file population. 2013-08-02 13:12:24 -07:00
hashtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Highlight.cpp Merge branch 'master' into diffbot 2013-12-07 11:34:26 -07:00
Highlight.h trying to fix json decoding bug. 2013-10-24 17:55:01 -07:00
Hostdb.cpp fix another core from freening wrong byte sized 2014-01-30 20:16:41 -08:00
Hostdb.h added recovery mode display in hosts table 2014-02-01 10:16:46 -08:00
hosts.conf fix hosts.conf 2013-12-26 09:34:35 -08:00
hosts.cpp Initial file population. 2013-08-02 13:12:24 -07:00
HttpMime.cpp mem labelling fixes. 2013-12-09 14:05:02 -07:00
HttpMime.h update the dirty word list. but we still 2013-10-15 01:01:19 -07:00
HttpRequest.cpp add connectips back. call them adminIps this time. 2014-02-03 20:47:48 -07:00
HttpRequest.h show all crawl details in url webhook 2013-11-07 13:59:43 -08:00
HttpServer.cpp fix a core from bad return values 2014-02-12 13:21:30 -08:00
HttpServer.h use &format=0 1 or 2 for html/xml/json now. 2013-11-08 18:00:30 -08:00
iana_charset.cpp Merge branch 'diffbot' into diffbot-testing 2013-12-16 11:06:11 -08:00
iana_charset.h Merge branch 'diffbot' into diffbot-testing 2013-12-16 11:06:11 -08:00
iconv.h Initial file population. 2013-08-02 13:12:24 -07:00
Images.cpp got code with shard rebalancing compiling. 2014-01-11 16:08:42 -08:00
Images.h Initial file population. 2013-08-02 13:12:24 -07:00
Indexdb.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Indexdb.h move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
IndexList.cpp Initial file population. 2013-08-02 13:12:24 -07:00
IndexList.h Initial file population. 2013-08-02 13:12:24 -07:00
IndexReadInfo.cpp Initial file population. 2013-08-02 13:12:24 -07:00
IndexReadInfo.h Initial file population. 2013-08-02 13:12:24 -07:00
IndexTable2.cpp move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
IndexTable2.h Initial file population. 2013-08-02 13:12:24 -07:00
IndexTable.cpp Initial file population. 2013-08-02 13:12:24 -07:00
IndexTable.h Initial file population. 2013-08-02 13:12:24 -07:00
injectme3 added injectme3 file and documentation into compare.html 2013-08-17 11:02:26 -06:00
injector.cpp Initial file population. 2013-08-02 13:12:24 -07:00
iostream.h Initial file population. 2013-08-02 13:12:24 -07:00
ip.cpp fix old bug. 2014-01-10 18:52:47 -07:00
ip.h Initial file population. 2013-08-02 13:12:24 -07:00
ipconfig.cpp fixed some cores. brought in fixes from 2013-09-08 16:16:13 -06:00
Iso8859.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Iso8859.h Initial file population. 2013-08-02 13:12:24 -07:00
jointest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
jpegtopnm Initial file population. 2013-08-02 13:12:24 -07:00
Json.cpp fixed contenthash32 logic for json objects. 2014-02-05 13:22:03 -08:00
Json.h fixed contenthash32 logic for json objects. 2014-02-05 13:22:03 -08:00
keepalive.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Lang.cpp comment updates 2013-10-15 23:13:50 -07:00
Lang.h Initial file population. 2013-08-02 13:12:24 -07:00
LangList.cpp code cleanups. 2014-01-18 21:19:26 -08:00
LangList.h Initial file population. 2013-08-02 13:12:24 -07:00
Language.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Language.h Initial file population. 2013-08-02 13:12:24 -07:00
LanguageIdentifier.cpp Initial file population. 2013-08-02 13:12:24 -07:00
LanguageIdentifier.h Initial file population. 2013-08-02 13:12:24 -07:00
LanguagePages.cpp Initial file population. 2013-08-02 13:12:24 -07:00
LanguagePages.h Initial file population. 2013-08-02 13:12:24 -07:00
libc.a Initial file population. 2013-08-02 13:12:24 -07:00
libcrypto.a Initial file population. 2013-08-02 13:12:24 -07:00
libgcc.a Initial file population. 2013-08-02 13:12:24 -07:00
libiconv.a Initial file population. 2013-08-02 13:12:24 -07:00
libiconv.la Initial file population. 2013-08-02 13:12:24 -07:00
libm.a Initial file population. 2013-08-02 13:12:24 -07:00
libpthread.a Initial file population. 2013-08-02 13:12:24 -07:00
libssl.a Initial file population. 2013-08-02 13:12:24 -07:00
libstdc++.a Initial file population. 2013-08-02 13:12:24 -07:00
libz.a Initial file population. 2013-08-02 13:12:24 -07:00
LICENSE code checkpoint. time slicing, faster spider code 2014-02-04 17:34:43 -08:00
Linkdb.cpp added some repair logic for 0001.dat files. 2014-02-01 10:14:25 -08:00
Linkdb.h fix LinkInfo mem leaks 2013-11-16 17:50:32 -08:00
LinkedList.h Initial file population. 2013-08-02 13:12:24 -07:00
linkspam.cpp renamed matches.h and matches.cpp to 2013-10-01 07:58:24 -07:00
linkspam.h Initial file population. 2013-08-02 13:12:24 -07:00
Log.cpp forgot to unlock thread lock 2013-12-15 10:43:34 -07:00
Log.h Initial file population. 2013-08-02 13:12:24 -07:00
Loop.cpp tons of changes from live github on neo. 2014-01-17 21:01:43 -08:00
Loop.h Initial file population. 2013-08-02 13:12:24 -07:00
looptest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
main.cpp fix excessive dupcache deduping. 2014-02-05 13:41:15 -08:00
Make.depend many more fixes for streaming mode 2014-02-06 18:21:22 -08:00
Makefile fix rdbcache corruption from -O2 compile bug. 2014-02-05 16:58:21 -08:00
malloc.c Initial file population. 2013-08-02 13:12:24 -07:00
matches2.cpp dirty word detector revisions. we need 2013-10-16 20:19:49 -07:00
matches2.h renamed matches.h and matches.cpp to 2013-10-01 07:58:24 -07:00
Matches.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Matches.h Initial file population. 2013-08-02 13:12:24 -07:00
Mem.cpp always use kstart. 2014-01-28 14:37:21 -08:00
Mem.h fix typo 2013-09-08 19:51:57 -07:00
membustest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MemPool.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MemPool.h Initial file population. 2013-08-02 13:12:24 -07:00
MemPoolTree.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MemPoolTree.h Initial file population. 2013-08-02 13:12:24 -07:00
memtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
mergetest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MetaContainer.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MetaContainer.h Initial file population. 2013-08-02 13:12:24 -07:00
Mime.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Mime.h Initial file population. 2013-08-02 13:12:24 -07:00
mixfile.cpp Initial file population. 2013-08-02 13:12:24 -07:00
mmseg.h Initial file population. 2013-08-02 13:12:24 -07:00
monitor.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Monitordb.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Monitordb.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg0.cpp got code with shard rebalancing compiling. 2014-01-11 16:08:42 -08:00
Msg0.h move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg1.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Msg1.h log fixes for debugging. try to 2013-10-02 22:37:20 -06:00
Msg1f.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg1f.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg2.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg2.h new &sites=xyz.com+abc.com+... functionality compiles ok. 2013-09-15 18:14:32 -06:00
Msg2a.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Msg2a.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg2b.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg2b.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg3.cpp tons of changes from live github on neo. 2014-01-17 21:01:43 -08:00
Msg3.h almost done adding support for whitelists. 2013-09-15 15:15:56 -06:00
Msg3a.cpp index numbers as integers too, not just floats 2014-02-06 20:57:54 -08:00
Msg3a.h index numbers as integers too, not just floats 2014-02-06 20:57:54 -08:00
Msg3e.cpp fix infinite loop from json parsing and 2013-09-27 17:52:36 -06:00
Msg3e.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg4.cpp fix core from a high priority 2014-02-14 10:51:02 -08:00
Msg4.h got code with shard rebalancing compiling. 2014-01-11 16:08:42 -08:00
Msg5.cpp change deduping logic to be first come first 2014-01-29 16:14:42 -08:00
Msg5.h fix a couple cores related to deleting collections 2014-01-29 15:56:07 -08:00
Msg6b.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg6b.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg8b.cpp move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Msg8b.h Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Msg9b.cpp fix a couple catdb generation bugs. 2013-10-12 20:33:04 -07:00
Msg9b.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg13.cpp take out potentially bad robots.txt 2014-01-28 18:26:16 -08:00
Msg13.h measure crawl delay by default from 2013-11-26 14:07:28 -08:00
Msg17.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg17.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg20.cpp fix out of alloc slots core 2014-02-13 11:21:39 -08:00
Msg20.h code checkpoint. time slicing, faster spider code 2014-02-04 17:34:43 -08:00
Msg22.cpp fix bugs to try to get sharding working 2014-01-21 13:58:21 -08:00
Msg22.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg24.cpp new Make.depend. 2013-08-09 17:13:45 -06:00
Msg28.cpp fix core from (broad)casting valueless cgi field. 2013-10-03 14:51:59 -07:00
Msg28.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg30.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg30.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Msg35.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg35.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg36.cpp got code with shard rebalancing compiling. 2014-01-11 16:08:42 -08:00
Msg36.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg37.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg37.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg39.cpp index numbers as integers too, not just floats 2014-02-06 20:57:54 -08:00
Msg39.h index numbers as integers too, not just floats 2014-02-06 20:57:54 -08:00
Msg40.cpp do not do fuzzy deduping if &icc=1 (include cached copy) 2014-02-13 08:51:03 -08:00
Msg40.h send single space to socket if not streaming 2014-02-13 08:45:13 -08:00
Msg40Cache.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg40Cache.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg42.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg42.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg51.cpp move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Msg51.h Initial file population. 2013-08-02 13:12:24 -07:00
Msgaa.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msgaa.h Initial file population. 2013-08-02 13:12:24 -07:00
MsgC.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
MsgC.h Initial file population. 2013-08-02 13:12:24 -07:00
Msge0.cpp fix a core 2014-01-22 22:26:50 -08:00
Msge0.h Initial file population. 2013-08-02 13:12:24 -07:00
Msge1.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msge1.h Initial file population. 2013-08-02 13:12:24 -07:00
Multicast.cpp a lot of times rdb tree has invalid collection 2014-01-21 19:01:44 -08:00
Multicast.h move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
mysynonyms.txt Initial file population. 2013-08-02 13:12:24 -07:00
numwords.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageAddColl.cpp more spiderdb spider request fixes 2014-01-19 18:00:56 -08:00
PageAddUrl.cpp more bug fixes associated with collections 2014-01-18 11:54:58 -08:00
PageCatdb.cpp more formatting 2014-01-19 11:56:36 -08:00
PageCrawlBot.cpp do not allow spot/seeds to be added to collnum being repaired 2014-02-16 15:18:50 -08:00
PageCrawlBot.h added "seeds" to json reply. store seed urls 2013-10-21 17:35:14 -07:00
PageDirectory.cpp code checkpoint. time slicing, faster spider code 2014-02-04 17:34:43 -08:00
PageEvents.cpp formatting changes 2014-01-19 00:38:02 -08:00
PageGet.cpp for json docs only give them a single 2014-01-25 08:17:38 -08:00
PageHosts.cpp added recovery mode display in hosts table 2014-02-01 10:16:46 -08:00
PageIndexdb.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
PageInject.cpp formatting 2014-01-19 15:06:02 -08:00
PageInject.h code cleanups. 2014-01-18 21:19:26 -08:00
PageLogin.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageLogView.cpp formatting fixes 2014-01-19 00:57:20 -08:00
PageNetTest.cpp move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
PageNetTest.h Initial file population. 2013-08-02 13:12:24 -07:00
PageOverview.cpp list collections in sidebar. 2014-01-09 21:13:41 -08:00
PageParser.cpp more formatting 2014-01-19 11:56:36 -08:00
PageParser.h Initial file population. 2013-08-02 13:12:24 -07:00
PagePerf.cpp took out pagecount table. just hafta scan 2014-01-19 20:34:38 -08:00
PageReindex.cpp more formatting 2014-01-19 11:56:36 -08:00
PageReindex.h fixed pagereindex. we now add spiderreplies 2013-12-07 10:01:17 -07:00
PageResults.cpp more checksum fixes for json. fixes for 2014-02-16 10:46:41 -08:00
PageResults.h send single space to socket if not streaming 2014-02-13 08:45:13 -08:00
PageRoot.cpp Merge branch 'diffbot' 2014-02-01 11:28:31 -07:00
Pages.cpp add connectips back. call them adminIps this time. 2014-02-03 20:47:48 -07:00
Pages.h more spiderdb spider request fixes 2014-01-19 18:00:56 -08:00
PageSockets.cpp fix msge0 msg0 overload in sockets table 2014-01-22 20:34:55 -08:00
PageSpam.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageStats.cpp fix infinite keep alive restart bug some more 2014-01-30 14:12:32 -08:00
PageStatsdb.cpp formatting 2014-01-19 12:37:37 -08:00
PageSubmit.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageThesaurus.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageThreads.cpp formatting fixes 2014-01-19 00:57:20 -08:00
PageTitledb.cpp move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
PageTurk.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageTurk.h Initial file population. 2013-08-02 13:12:24 -07:00
Parms.cpp more checksum fixes for json. fixes for 2014-02-16 10:46:41 -08:00
Parms.h fix up round incrementing logic. 2014-01-25 14:35:41 -08:00
parse_iana_charsets.pl move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
pdftohtml use the "onsite" keyword in your url filters 2013-09-06 09:37:17 -06:00
Phrases.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Phrases.h Initial file population. 2013-08-02 13:12:24 -07:00
PingServer.cpp added recovery mode display in hosts table 2014-02-01 10:16:46 -08:00
PingServer.h added emergency msg box on all admin pages 2014-01-11 20:14:44 -08:00
Placedb.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Placedb.h Initial file population. 2013-08-02 13:12:24 -07:00
pngtopnm Initial file population. 2013-08-02 13:12:24 -07:00
pnmscale Initial file population. 2013-08-02 13:12:24 -07:00
Pops.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Pops.h Initial file population. 2013-08-02 13:12:24 -07:00
porter.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Pos.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Pos.h Initial file population. 2013-08-02 13:12:24 -07:00
Posdb.cpp index numbers as integers too, not just floats 2014-02-06 20:57:54 -08:00
Posdb.h index numbers as integers too, not just floats 2014-02-06 20:57:54 -08:00
postalCodes.txt Initial file population. 2013-08-02 13:12:24 -07:00
PostQueryRerank.cpp handle a bunch of oom conditions that 2013-11-20 10:14:02 -07:00
PostQueryRerank.h Initial file population. 2013-08-02 13:12:24 -07:00
ppmtojpeg Initial file population. 2013-08-02 13:12:24 -07:00
Process.cpp more checksum fixes for json. fixes for 2014-02-16 10:46:41 -08:00
Process.h parmdb overhaul. support collection add/del 2013-12-10 13:09:55 -08:00
Profiler.cpp formatting fixes 2014-01-19 00:57:20 -08:00
Profiler.h Initial file population. 2013-08-02 13:12:24 -07:00
Proxy.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Proxy.h Initial file population. 2013-08-02 13:12:24 -07:00
pstotext Initial file population. 2013-08-02 13:12:24 -07:00
QAClient.cpp Initial file population. 2013-08-02 13:12:24 -07:00
QAClient.h Initial file population. 2013-08-02 13:12:24 -07:00
quarantine.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Query.cpp fix bug in gbminint. 2014-02-06 21:36:47 -08:00
Query.h index numbers as integers too, not just floats 2014-02-06 20:57:54 -08:00
Rdb.cpp more checksum fixes for json. fixes for 2014-02-16 10:46:41 -08:00
Rdb.h code cleanups. 2014-01-18 21:19:26 -08:00
RdbBase.cpp more checksum fixes for json. fixes for 2014-02-16 10:46:41 -08:00
RdbBase.h added some repair logic for 0001.dat files. 2014-02-01 10:14:25 -08:00
RdbBuckets.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RdbBuckets.h Initial file population. 2013-08-02 13:12:24 -07:00
RdbCache.cpp fix some cores. use olddoc contenthash 2014-02-07 18:28:09 -08:00
RdbCache.h removed MAX_COLL_RECS so we can have unlimited 2013-08-30 16:20:38 -07:00
RdbDump.cpp tons of changes from live github on neo. 2014-01-17 21:01:43 -08:00
RdbDump.h if coll is deleted or reset in a middle of a dump 2013-12-25 17:12:09 -08:00
RdbList.cpp checkpoint for faster spider code. 2014-02-04 16:15:27 -08:00
RdbList.h checkpoint for faster spider code. 2014-02-04 16:15:27 -08:00
RdbMap.cpp tons of changes from live github on neo. 2014-01-17 21:01:43 -08:00
RdbMap.h tons of changes from live github on neo. 2014-01-17 21:01:43 -08:00
RdbMem.cpp track down some nasty cores. fix 2013-10-29 16:37:14 -07:00
RdbMem.h now we can reset collection mid stream 2013-10-18 17:49:36 -07:00
RdbMerge.cpp fix problem scanning spiderdb. 2014-01-16 17:04:08 -08:00
RdbMerge.h if coll is deleted or reset in a middle of a dump 2013-12-25 17:12:09 -08:00
RdbScan.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RdbScan.h Initial file population. 2013-08-02 13:12:24 -07:00
rdbtest2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
rdbtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RdbTree.cpp get streaming time sliced results working 2014-02-06 14:25:44 -08:00
RdbTree.h fix annoying rdbtree pos/neg key counting issue 2014-01-11 18:04:28 -08:00
README.md Update README.md 2013-11-16 20:14:06 -08:00
readRec.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Rebalance.cpp update "this round" counts to at least 2014-01-23 18:22:13 -08:00
Rebalance.h fix problem scanning spiderdb. 2014-01-16 17:04:08 -08:00
reindex2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Repair.cpp more checksum fixes for json. fixes for 2014-02-16 10:46:41 -08:00
Repair.h Initial file population. 2013-08-02 13:12:24 -07:00
RequestTable.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RequestTable.h Initial file population. 2013-08-02 13:12:24 -07:00
rescue.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Revdb.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Revdb.h Initial file population. 2013-08-02 13:12:24 -07:00
rmbots.cpp Initial file population. 2013-08-02 13:12:24 -07:00
SafeBuf.cpp index numbers as integers too, not just floats 2014-02-06 20:57:54 -08:00
SafeBuf.h index numbers as integers too, not just floats 2014-02-06 20:57:54 -08:00
SafeList.h Initial file population. 2013-08-02 13:12:24 -07:00
Sanity.h Initial file population. 2013-08-02 13:12:24 -07:00
Scores.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Scores.h Initial file population. 2013-08-02 13:12:24 -07:00
Scraper.cpp take out datedb. no longer used. we store 2014-01-09 13:39:28 -08:00
Scraper.h Initial file population. 2013-08-02 13:12:24 -07:00
SearchInput.cpp move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
SearchInput.h streaming results code checkpoint. 2014-02-04 17:05:43 -08:00
Sections.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Sections.h get new global preemptive cache 2014-01-05 11:51:09 -08:00
seektest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
seo.h Initial file population. 2013-08-02 13:12:24 -07:00
SiteGetter.cpp got code with shard rebalancing compiling. 2014-01-11 16:08:42 -08:00
SiteGetter.h Initial file population. 2013-08-02 13:12:24 -07:00
sleepandlog.cpp Initial file population. 2013-08-02 13:12:24 -07:00
sort.cpp Initial file population. 2013-08-02 13:12:24 -07:00
sort.h Initial file population. 2013-08-02 13:12:24 -07:00
Speller.cpp clean up logging so i can see what's going on 2013-12-10 16:41:30 -08:00
Speller.h Initial file population. 2013-08-02 13:12:24 -07:00
Spider.cpp fall back to hop count if priority 2014-02-16 15:52:08 -08:00
Spider.h fall back to hop count if priority 2014-02-16 15:52:08 -08:00
Stats.cpp formatting 2014-01-19 12:37:37 -08:00
Stats.h more formatting 2014-01-19 01:09:38 -08:00
Statsdb.cpp formatting 2014-01-19 12:37:37 -08:00
Statsdb.h fix potential problem of tons of points in 2013-10-14 22:52:29 -07:00
StopWords.cpp update common word list 2013-12-01 15:19:33 -07:00
StopWords.h Initial file population. 2013-08-02 13:12:24 -07:00
streambuf.h Initial file population. 2013-08-02 13:12:24 -07:00
Strings.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Strings.h Initial file population. 2013-08-02 13:12:24 -07:00
Summary.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Summary.h renamed matches.h and matches.cpp to 2013-10-01 07:58:24 -07:00
superMergeTest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
supported_charsets.cpp Initial file population. 2013-08-02 13:12:24 -07:00
supported_charsets.txt Initial file population. 2013-08-02 13:12:24 -07:00
Syncdb.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Syncdb.h Initial file population. 2013-08-02 13:12:24 -07:00
Synonyms.cpp now we index all numbers that have field names 2013-11-08 16:16:13 -08:00
Synonyms.h now we index all numbers that have field names 2013-11-08 16:16:13 -08:00
Tagdb.cpp Merge branch 'diffbot' 2014-02-01 11:28:31 -07:00
Tagdb.h fix msge0 msg0 overload in sockets table 2014-01-22 20:34:55 -08:00
TcpServer.cpp send single space to socket if not streaming 2014-02-13 08:45:13 -08:00
TcpServer.h fixes for streaming mode. 2014-02-06 16:28:42 -08:00
TcpSocket.h send single space to socket if not streaming 2014-02-13 08:45:13 -08:00
test2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_convert.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_hash.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_norm.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_parser2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_parser.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_unicode.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Test.cpp parmdb updates 2013-12-16 17:07:15 -08:00
Test.h Initial file population. 2013-08-02 13:12:24 -07:00
testfloats.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Tfndb.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Tfndb.h code cleanups. 2014-01-18 21:19:26 -08:00
Thesaurus.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Thesaurus.h Initial file population. 2013-08-02 13:12:24 -07:00
Threads.cpp fix core when thread fails to spawn. 2014-02-03 07:27:32 -07:00
Threads.h do not try to join on thread when pthread_create() fails 2013-11-16 18:28:49 -07:00
threadtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
thunder.cpp Initial file population. 2013-08-02 13:12:24 -07:00
tifftopnm Initial file population. 2013-08-02 13:12:24 -07:00
Timedb.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Timedb.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Timer.h Initial file population. 2013-08-02 13:12:24 -07:00
Title.cpp move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Title.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
Titledb.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Titledb.h code cleanups. 2014-01-18 21:19:26 -08:00
TopTree.cpp index numbers as integers too, not just floats 2014-02-06 20:57:54 -08:00
TopTree.h index numbers as integers too, not just floats 2014-02-06 20:57:54 -08:00
treetest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
TuringTest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
TuringTest.h Initial file population. 2013-08-02 13:12:24 -07:00
Turkdb.cpp Initial file population. 2013-08-02 13:12:24 -07:00
types.h fix compiler warning in types.h. 2013-09-08 20:00:52 -06:00
UCNormalizer.cpp code cleanups. 2014-01-18 21:19:26 -08:00
UCNormalizer.h Initial file population. 2013-08-02 13:12:24 -07:00
UCPropTable.cpp Initial file population. 2013-08-02 13:12:24 -07:00
UCPropTable.h Initial file population. 2013-08-02 13:12:24 -07:00
UCWordIterator.cpp Initial file population. 2013-08-02 13:12:24 -07:00
UCWordIterator.h Initial file population. 2013-08-02 13:12:24 -07:00
UdpProtocol.h Initial file population. 2013-08-02 13:12:24 -07:00
UdpServer.cpp added some repair logic for 0001.dat files. 2014-02-01 10:14:25 -08:00
UdpServer.h rebalancer working pretty well now 2014-01-15 19:08:47 -08:00
UdpSlot.cpp Initial file population. 2013-08-02 13:12:24 -07:00
UdpSlot.h Initial file population. 2013-08-02 13:12:24 -07:00
udptest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Unicode.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Unicode.h move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
UnicodeProperties.cpp code cleanups. 2014-01-18 21:19:26 -08:00
UnicodeProperties.h Initial file population. 2013-08-02 13:12:24 -07:00
unifiedDict.txt Initial file population. 2013-08-02 13:12:24 -07:00
uniq2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Url.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Url.h Initial file population. 2013-08-02 13:12:24 -07:00
urlinfo.cpp fixed data corruption bug. m_finalCrawlDelay 2013-11-27 14:18:15 -08:00
Users.cpp code cleanups. 2014-01-18 21:19:26 -08:00
Users.h Initial file population. 2013-08-02 13:12:24 -07:00
ValidPointer.cpp Initial file population. 2013-08-02 13:12:24 -07:00
ValidPointer.h Initial file population. 2013-08-02 13:12:24 -07:00
Vector.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Vector.h Initial file population. 2013-08-02 13:12:24 -07:00
Weights.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Weights.h Initial file population. 2013-08-02 13:12:24 -07:00
Wiki.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Wiki.h Initial file population. 2013-08-02 13:12:24 -07:00
wikititles.txt.part1 Initial file population. 2013-08-02 13:12:24 -07:00
wikititles.txt.part2 Initial file population. 2013-08-02 13:12:24 -07:00
wiktionary-buf.txt Initial file population. 2013-08-02 13:12:24 -07:00
wiktionary-lang.txt Initial file population. 2013-08-02 13:12:24 -07:00
wiktionary-syns.dat Initial file population. 2013-08-02 13:12:24 -07:00
Wiktionary.cpp Merge branch 'master' into diffbot 2013-11-20 15:51:58 -08:00
Wiktionary.h Initial file population. 2013-08-02 13:12:24 -07:00
Words.cpp fixed bugs in sort by prices, etc. 2013-11-11 18:58:45 -08:00
Words.h Initial file population. 2013-08-02 13:12:24 -07:00
Xml.cpp for json docs only give them a single 2014-01-25 08:17:38 -08:00
Xml.h for json docs only give them a single 2014-01-25 08:17:38 -08:00
XmlDoc.cpp more checksum fixes for json. fixes for 2014-02-16 10:46:41 -08:00
XmlDoc.h fix content hash issues for json. do not 2014-02-15 14:40:56 -08:00
XmlNode.cpp fixed cdata parsing issue 2013-12-19 16:04:53 -08:00
XmlNode.h Initial file population. 2013-08-02 13:12:24 -07:00
zconf.h Initial file population. 2013-08-02 13:12:24 -07:00
zlib.h Initial file population. 2013-08-02 13:12:24 -07:00

open-source-search-engine

An open source web and enterprise search engine. As can be seen on http://www.gigablast.com/ .

RUNNING GIGABLAST

See html/admin.html for all administrative documentation including the quick start instructions.

Alternatively, visit http://www.gigablast.com/admin.html

See html/compare.html for a comparison of Gigablast to SOLR. Although this is very sparse right now, it does include some useful commands.

CODE ARCHITECTURE

See html/developer.html for all code documentation.

Alternatively, visit http://www.gigablast.com/developer.html

CONTACT

Contact me for feature requests or help in general. I will work for free for good use cases. mattdwells@hotmail.com.