Nov 20 2017 -- A distributed open source search engine and spider/crawler written in C/C++ for Linux on Intel/AMD. From gigablast dot com, which has binaries for download. See the README.md file at the very bottom of this page for instructions.
Matt Wells 57eb231a4e do not add timestamps to lastdownload cache if skiphammercheck is true. those are like robots.txt or redirs or root files. 2013-11-26 14:21:17 -08:00
antiword-dir Initial file population. 2013-08-02 13:12:24 -07:00
coll.main.0 fix a few bugs. 2013-11-10 22:11:13 -08:00
html Merge branch 'master' into diffbot 2013-11-18 16:59:33 -08:00
openssl we already include our own 32-bit 2013-09-15 18:25:49 -06:00
ucdata Initial file population. 2013-08-02 13:12:24 -07:00
Abbreviations.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Abbreviations.h change a couple of possible reserved names in C++ 2013-08-28 22:59:01 -06:00
Accessdb.cpp fix addColl() logic for collectionless rdbs 2013-10-16 14:38:09 -07:00
Accessdb.h Initial file population. 2013-08-02 13:12:24 -07:00
Address.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Address.h change a couple of possible reserved names in C++ 2013-08-28 22:59:01 -06:00
addtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Ads.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Ads.h Initial file population. 2013-08-02 13:12:24 -07:00
AdultBit.cpp Initial file population. 2013-08-02 13:12:24 -07:00
AdultBit.h Initial file population. 2013-08-02 13:12:24 -07:00
animate.cpp Initial file population. 2013-08-02 13:12:24 -07:00
antiword Initial file population. 2013-08-02 13:12:24 -07:00
AutoBan.cpp Merge branch 'master' into diffbot 2013-11-20 15:51:58 -08:00
AutoBan.h Initial file population. 2013-08-02 13:12:24 -07:00
badcattable.dat Initial file population. 2013-08-02 13:12:24 -07:00
BigFile.cpp committing an abandoned asyncio project. 2013-11-17 19:15:38 -07:00
BigFile.h committing an abandoned asyncio project. 2013-11-17 19:15:38 -07:00
Bits.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Bits.h Initial file population. 2013-08-02 13:12:24 -07:00
blaster.cpp fixed bugs with advanced.html advanced search page. 2013-11-17 14:58:47 -07:00
Blaster.cpp use ./gb blaster -u <fileofurls> to just inject urls, 2013-08-19 16:33:27 -06:00
Blaster.h use ./gb blaster -u <fileofurls> to just inject urls, 2013-08-19 16:33:27 -06:00
bmptopnm Initial file population. 2013-08-02 13:12:24 -07:00
Cachedb.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Cachedb.h Initial file population. 2013-08-02 13:12:24 -07:00
camsort.cpp Initial file population. 2013-08-02 13:12:24 -07:00
catcountry.dat Initial file population. 2013-08-02 13:12:24 -07:00
Catdb.cpp fix addColl() logic for collectionless rdbs 2013-10-16 14:38:09 -07:00
Catdb.h Initial file population. 2013-08-02 13:12:24 -07:00
Categories.cpp documentation updates. fixed sd=0. 2013-10-13 14:24:41 -07:00
Categories.h documentation updates. fixed sd=0. 2013-10-13 14:24:41 -07:00
CatRec.cpp fix a couple catdb generation bugs. 2013-10-12 20:33:04 -07:00
CatRec.h Initial file population. 2013-08-02 13:12:24 -07:00
character-sets Initial file population. 2013-08-02 13:12:24 -07:00
check_unicode.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Clusterdb.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Clusterdb.h Initial file population. 2013-08-02 13:12:24 -07:00
Collectiondb.cpp fix file descriptor leak in Dir class. 2013-11-19 13:41:56 -08:00
Collectiondb.h support for &restart=1 2013-11-14 14:02:56 -08:00
CollectionRec.cpp rdbbase not fully resetting? it was 2013-11-15 09:01:58 -08:00
CollectionRec.h make getNumSpidersOutPerIp() specific to a coll 2013-11-18 14:13:28 -08:00
Conf.cpp Merge branch 'master' into diffbot 2013-11-18 16:59:33 -08:00
Conf.h Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
convert.cpp Initial file population. 2013-08-02 13:12:24 -07:00
CountryCode.cpp Initial file population. 2013-08-02 13:12:24 -07:00
CountryCode.h Initial file population. 2013-08-02 13:12:24 -07:00
create_ucd_tables.cpp Initial file population. 2013-08-02 13:12:24 -07:00
DailyMerge.cpp Fixed some bugs. 2013-08-09 08:52:15 -07:00
DailyMerge.h Initial file population. 2013-08-02 13:12:24 -07:00
DataFeed.cpp Initial file population. 2013-08-02 13:12:24 -07:00
DataFeed.h Initial file population. 2013-08-02 13:12:24 -07:00
Datedb.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Datedb.h move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Dates.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Dates.h Initial file population. 2013-08-02 13:12:24 -07:00
Diff.cpp Fixed some bugs. 2013-08-09 08:52:15 -07:00
Diff.h Initial file population. 2013-08-02 13:12:24 -07:00
Dir.cpp fix file descriptor leak in Dir class. 2013-11-19 13:41:56 -08:00
Dir.h fix file descriptor leak in Dir class. 2013-11-19 13:41:56 -08:00
DiskPageCache.cpp Initial file population. 2013-08-02 13:12:24 -07:00
DiskPageCache.h Initial file population. 2013-08-02 13:12:24 -07:00
dlstubs.c Initial file population. 2013-08-02 13:12:24 -07:00
dmozparse.cpp add support for noindex meta tag. 2013-10-12 22:50:23 -07:00
Dns.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Dns.h Initial file population. 2013-08-02 13:12:24 -07:00
DnsProtocol.h Initial file population. 2013-08-02 13:12:24 -07:00
dnstest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Domains.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Domains.h Initial file population. 2013-08-02 13:12:24 -07:00
dumpcore.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Entities.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Entities.h Initial file population. 2013-08-02 13:12:24 -07:00
Errno.cpp try to fix json parser overflow error. needs 2013-11-15 11:30:16 -08:00
Errno.h try to fix json parser overflow error. needs 2013-11-15 11:30:16 -08:00
errnotest.cpp errno test update 2013-11-19 00:10:10 -07:00
Events.h Initial file population. 2013-08-02 13:12:24 -07:00
Facebook.cpp fix addColl() logic for collectionless rdbs 2013-10-16 14:38:09 -07:00
Facebook.h new graphic icons. minor clean ups. 2013-11-15 14:47:05 -07:00
fastIndexTable.cpp Initial file population. 2013-08-02 13:12:24 -07:00
fctypes.cpp now we index all numbers that have field names 2013-11-08 16:16:13 -08:00
fctypes.h Initial file population. 2013-08-02 13:12:24 -07:00
File.cpp couple fixes to makefile etc. 2013-09-28 16:37:39 -06:00
File.h Initial file population. 2013-08-02 13:12:24 -07:00
filterquerylogs.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Flags.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Flags.h Initial file population. 2013-08-02 13:12:24 -07:00
gb-include.h Initial file population. 2013-08-02 13:12:24 -07:00
gb.conf spider log debug msg fix. 2013-11-22 14:17:10 -08:00
gb.pem so we have spider https sites add 2013-10-13 00:15:39 -07:00
gbfilter.cpp Initial file population. 2013-08-02 13:12:24 -07:00
gbtitletest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
geneaology.cpp Initial file population. 2013-08-02 13:12:24 -07:00
generateSuperMergeCode.cpp Initial file population. 2013-08-02 13:12:24 -07:00
geo_ip_table.cpp Initial file population. 2013-08-02 13:12:24 -07:00
geo_ip_table.h Initial file population. 2013-08-02 13:12:24 -07:00
GeoIP_internal.h Initial file population. 2013-08-02 13:12:24 -07:00
GeoIP.c Initial file population. 2013-08-02 13:12:24 -07:00
GeoIP.h Initial file population. 2013-08-02 13:12:24 -07:00
GeoIPCity.c Initial file population. 2013-08-02 13:12:24 -07:00
GeoIPCity.h Initial file population. 2013-08-02 13:12:24 -07:00
getsample.cpp Initial file population. 2013-08-02 13:12:24 -07:00
giftopnm Initial file population. 2013-08-02 13:12:24 -07:00
hash.cpp Initial file population. 2013-08-02 13:12:24 -07:00
hash.h get "&site=abc.com+xyz.com"... working to restrict 2013-09-15 20:16:48 -07:00
HashTable.cpp Initial file population. 2013-08-02 13:12:24 -07:00
HashTable.h Initial file population. 2013-08-02 13:12:24 -07:00
HashTableT.cpp Initial file population. 2013-08-02 13:12:24 -07:00
HashTableT.h Initial file population. 2013-08-02 13:12:24 -07:00
HashTableX.cpp make waiting trees grow dynamically to save 2013-11-19 15:23:25 -08:00
HashTableX.h spider speedups and fixes. 2013-09-25 11:58:03 -06:00
hashtest2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
hashtest3.cpp Initial file population. 2013-08-02 13:12:24 -07:00
hashtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Highlight.cpp Merge branch 'master' into diffbot 2013-11-20 15:51:58 -08:00
Highlight.h trying to fix json decoding bug. 2013-10-24 17:55:01 -07:00
Hostdb.cpp num-mirrors: updates 2013-10-24 14:59:35 -07:00
Hostdb.h fix another bug from shard change. 2013-10-04 16:49:50 -07:00
hosts.conf hosts.conf fix 2013-11-22 14:18:03 -08:00
hosts.cpp Initial file population. 2013-08-02 13:12:24 -07:00
HttpMime.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
HttpMime.h update the dirty word list. but we still 2013-10-15 01:01:19 -07:00
HttpRequest.cpp test to make sure diffbot reply contains 2013-11-21 12:37:08 -08:00
HttpRequest.h show all crawl details in url webhook 2013-11-07 13:59:43 -08:00
HttpServer.cpp bulk api nominal updates 2013-11-13 14:30:51 -08:00
HttpServer.h use &format=0 1 or 2 for html/xml/json now. 2013-11-08 18:00:30 -08:00
iana_charset.cpp new crawlbot api. not backwards compatible any more. 2013-09-17 10:25:54 -07:00
iana_charset.h new crawlbot api. not backwards compatible any more. 2013-09-17 10:25:54 -07:00
iconv.h Initial file population. 2013-08-02 13:12:24 -07:00
Images.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Images.h Initial file population. 2013-08-02 13:12:24 -07:00
Indexdb.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Indexdb.h move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
IndexList.cpp Initial file population. 2013-08-02 13:12:24 -07:00
IndexList.h Initial file population. 2013-08-02 13:12:24 -07:00
IndexReadInfo.cpp Initial file population. 2013-08-02 13:12:24 -07:00
IndexReadInfo.h Initial file population. 2013-08-02 13:12:24 -07:00
IndexTable2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
IndexTable2.h Initial file population. 2013-08-02 13:12:24 -07:00
IndexTable.cpp Initial file population. 2013-08-02 13:12:24 -07:00
IndexTable.h Initial file population. 2013-08-02 13:12:24 -07:00
injectme3 added injectme3 file and documentation into compare.html 2013-08-17 11:02:26 -06:00
injector.cpp Initial file population. 2013-08-02 13:12:24 -07:00
iostream.h Initial file population. 2013-08-02 13:12:24 -07:00
ip.cpp Initial file population. 2013-08-02 13:12:24 -07:00
ip.h Initial file population. 2013-08-02 13:12:24 -07:00
ipconfig.cpp fixed some cores. brought in fixes from 2013-09-08 16:16:13 -06:00
Iso8859.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Iso8859.h Initial file population. 2013-08-02 13:12:24 -07:00
jointest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
jpegtopnm Initial file population. 2013-08-02 13:12:24 -07:00
Json.cpp fix json double decoding issue. no more 2013-11-22 14:16:14 -08:00
Json.h fix json double decoding issue. no more 2013-11-22 14:16:14 -08:00
keepalive.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Lang.cpp comment updates 2013-10-15 23:13:50 -07:00
Lang.h Initial file population. 2013-08-02 13:12:24 -07:00
LangList.cpp Initial file population. 2013-08-02 13:12:24 -07:00
LangList.h Initial file population. 2013-08-02 13:12:24 -07:00
Language.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Language.h Initial file population. 2013-08-02 13:12:24 -07:00
LanguageIdentifier.cpp Initial file population. 2013-08-02 13:12:24 -07:00
LanguageIdentifier.h Initial file population. 2013-08-02 13:12:24 -07:00
LanguagePages.cpp Initial file population. 2013-08-02 13:12:24 -07:00
LanguagePages.h Initial file population. 2013-08-02 13:12:24 -07:00
libc.a Initial file population. 2013-08-02 13:12:24 -07:00
libcrypto.a Initial file population. 2013-08-02 13:12:24 -07:00
libgcc.a Initial file population. 2013-08-02 13:12:24 -07:00
libiconv.a Initial file population. 2013-08-02 13:12:24 -07:00
libiconv.la Initial file population. 2013-08-02 13:12:24 -07:00
libm.a Initial file population. 2013-08-02 13:12:24 -07:00
libpthread.a Initial file population. 2013-08-02 13:12:24 -07:00
libssl.a Initial file population. 2013-08-02 13:12:24 -07:00
libstdc++.a Initial file population. 2013-08-02 13:12:24 -07:00
libz.a Initial file population. 2013-08-02 13:12:24 -07:00
LICENSE exclude events and seo functionality. 2013-09-08 17:07:42 -06:00
Linkdb.cpp fix LinkInfo mem leaks 2013-11-16 17:50:32 -08:00
Linkdb.h fix LinkInfo mem leaks 2013-11-16 17:50:32 -08:00
LinkedList.h Initial file population. 2013-08-02 13:12:24 -07:00
linkspam.cpp renamed matches.h and matches.cpp to 2013-10-01 07:58:24 -07:00
linkspam.h Initial file population. 2013-08-02 13:12:24 -07:00
Log.cpp fixed up thread/spider log msgs. 2013-08-29 21:15:42 -06:00
Log.h Initial file population. 2013-08-02 13:12:24 -07:00
Loop.cpp cleanup warnings in log. 2013-09-13 14:37:35 -07:00
Loop.h Initial file population. 2013-08-02 13:12:24 -07:00
looptest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
main.cpp fix file descriptor leak in Dir class. 2013-11-19 13:41:56 -08:00
Make.depend fix json double decoding issue. no more 2013-11-22 14:16:14 -08:00
Makefile use -DPTHREADS not _PTHREADS_ 2013-11-19 00:49:43 -08:00
malloc.c Initial file population. 2013-08-02 13:12:24 -07:00
matches2.cpp dirty word detector revisions. we need 2013-10-16 20:19:49 -07:00
matches2.h renamed matches.h and matches.cpp to 2013-10-01 07:58:24 -07:00
Matches.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Matches.h Initial file population. 2013-08-02 13:12:24 -07:00
Mem.cpp Merge branch 'master' into diffbot 2013-11-20 15:51:58 -08:00
Mem.h fix typo 2013-09-08 19:51:57 -07:00
membustest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MemPool.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MemPool.h Initial file population. 2013-08-02 13:12:24 -07:00
MemPoolTree.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MemPoolTree.h Initial file population. 2013-08-02 13:12:24 -07:00
memtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
mergetest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MetaContainer.cpp Initial file population. 2013-08-02 13:12:24 -07:00
MetaContainer.h Initial file population. 2013-08-02 13:12:24 -07:00
Mime.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Mime.h Initial file population. 2013-08-02 13:12:24 -07:00
mixfile.cpp Initial file population. 2013-08-02 13:12:24 -07:00
mmseg.h Initial file population. 2013-08-02 13:12:24 -07:00
monitor.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Monitordb.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Monitordb.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg0.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg0.h move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg1.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Msg1.h log fixes for debugging. try to 2013-10-02 22:37:20 -06:00
Msg1f.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg1f.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg2.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg2.h new &sites=xyz.com+abc.com+... functionality compiles ok. 2013-09-15 18:14:32 -06:00
Msg2a.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Msg2a.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg2b.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg2b.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg3.cpp get "&site=abc.com+xyz.com"... working to restrict 2013-09-15 20:16:48 -07:00
Msg3.h almost done adding support for whitelists. 2013-09-15 15:15:56 -06:00
Msg3a.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg3a.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg3e.cpp fix infinite loop from json parsing and 2013-09-27 17:52:36 -06:00
Msg3e.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg4.cpp nothing 2013-11-04 14:41:36 -08:00
Msg4.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg5.cpp try to fix core from spiderdb scan coming back to 2013-10-29 16:51:21 -07:00
Msg5.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg6b.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg6b.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg8b.cpp fix compilation error 2013-10-28 08:05:22 -07:00
Msg8b.h Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
Msg9b.cpp fix a couple catdb generation bugs. 2013-10-12 20:33:04 -07:00
Msg9b.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg13.cpp do not add timestamps to lastdownload 2013-11-26 14:21:17 -08:00
Msg13.h measure crawl delay by default from 2013-11-26 14:07:28 -08:00
Msg17.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg17.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg20.cpp Merge branch 'master' into diffbot 2013-11-20 15:51:58 -08:00
Msg20.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg22.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg22.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg24.cpp new Make.depend. 2013-08-09 17:13:45 -06:00
Msg28.cpp fix core from (broad)casting valueless cgi field. 2013-10-03 14:51:59 -07:00
Msg28.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg30.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg30.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg35.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg35.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg36.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg36.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg37.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg37.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg39.cpp Merge branch 'master' into diffbot 2013-11-20 15:51:58 -08:00
Msg39.h new &sites=xyz.com+abc.com+... functionality compiles ok. 2013-09-15 18:14:32 -06:00
Msg40.cpp Merge branch 'master' into diffbot 2013-11-20 15:51:58 -08:00
Msg40.h trying to bring back dmoz integration. 2013-10-02 22:34:21 -06:00
Msg40Cache.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg40Cache.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg42.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msg42.h Initial file population. 2013-08-02 13:12:24 -07:00
Msg51.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Msg51.h Initial file population. 2013-08-02 13:12:24 -07:00
Msgaa.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msgaa.h Initial file population. 2013-08-02 13:12:24 -07:00
MsgC.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
MsgC.h Initial file population. 2013-08-02 13:12:24 -07:00
Msge0.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msge0.h Initial file population. 2013-08-02 13:12:24 -07:00
Msge1.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Msge1.h Initial file population. 2013-08-02 13:12:24 -07:00
Multicast.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Multicast.h move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
mysynonyms.txt Initial file population. 2013-08-02 13:12:24 -07:00
numwords.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageAddColl.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageAddUrl.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
PageCatdb.cpp trying to bring back dmoz integration. 2013-10-02 22:34:21 -06:00
PageCrawlBot.cpp now show json items in csv with aligned columns. 2013-11-20 10:45:10 -08:00
PageCrawlBot.h added "seeds" to json reply. store seed urls 2013-10-21 17:35:14 -07:00
PageDirectory.cpp fix dup bug. 2013-10-13 16:06:38 -07:00
PageEvents.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageGet.cpp fix getTokenizedDiffbotReply() 2013-11-25 13:58:31 -08:00
PageHosts.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
PageIndexdb.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
PageInject.cpp crawlbot fixes. 2013-10-15 16:31:59 -07:00
PageInject.h crawlbot fixes. 2013-10-15 16:31:59 -07:00
PageLogin.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageLogView.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageNetTest.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
PageNetTest.h Initial file population. 2013-08-02 13:12:24 -07:00
PageOverview.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
PageParser.cpp added X-referring-url: X-anchor-text: and 2013-10-31 11:44:09 -07:00
PageParser.h Initial file population. 2013-08-02 13:12:24 -07:00
PagePerf.cpp label the bigger safebuf chunks of mem 2013-11-19 23:53:40 -07:00
PageReindex.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageReindex.h Initial file population. 2013-08-02 13:12:24 -07:00
PageResults.cpp fix json double decoding issue. no more 2013-11-22 14:16:14 -08:00
PageResults.h added searchbox for dmoz pages/sites. 2013-10-13 15:45:12 -07:00
PageRoot.cpp Merge branch 'master' into diffbot 2013-11-18 16:59:33 -08:00
Pages.cpp search results in csv format. 2013-11-12 16:33:45 -08:00
Pages.h got email and url notification code compiling. 2013-10-01 15:14:39 -06:00
PageSockets.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageSpam.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageStats.cpp Merge branch 'master' into diffbot 2013-10-16 14:28:42 -07:00
PageStatsdb.cpp label the bigger safebuf chunks of mem 2013-11-19 23:53:40 -07:00
PageSubmit.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageThesaurus.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageThreads.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageTitledb.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
PageTurk.cpp Initial file population. 2013-08-02 13:12:24 -07:00
PageTurk.h Initial file population. 2013-08-02 13:12:24 -07:00
Parms.cpp search results in csv format. 2013-11-12 16:33:45 -08:00
Parms.h code checkpoint 2013-10-14 13:00:05 -06:00
parse_iana_charsets.pl Initial file population. 2013-08-02 13:12:24 -07:00
pdftohtml use the "onsite" keyword in your url filters 2013-09-06 09:37:17 -06:00
Phrases.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Phrases.h Initial file population. 2013-08-02 13:12:24 -07:00
PingServer.cpp just POST a full request for webhook now 2013-11-07 14:20:15 -08:00
PingServer.h better crawl status reporting. 2013-10-30 10:00:46 -07:00
Placedb.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Placedb.h Initial file population. 2013-08-02 13:12:24 -07:00
pngtopnm Initial file population. 2013-08-02 13:12:24 -07:00
pnmscale Initial file population. 2013-08-02 13:12:24 -07:00
Pops.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Pops.h Initial file population. 2013-08-02 13:12:24 -07:00
porter.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Pos.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Pos.h Initial file population. 2013-08-02 13:12:24 -07:00
Posdb.cpp Merge branch 'master' into diffbot 2013-11-20 15:51:58 -08:00
Posdb.h fixed bugs in sort by prices, etc. 2013-11-11 18:58:45 -08:00
postalCodes.txt Initial file population. 2013-08-02 13:12:24 -07:00
PostQueryRerank.cpp handle a bunch of oom conditions that 2013-11-20 10:14:02 -07:00
PostQueryRerank.h Initial file population. 2013-08-02 13:12:24 -07:00
ppmtojpeg Initial file population. 2013-08-02 13:12:24 -07:00
ppthtml Revert "fix spider "could launch" setting because" 2013-11-06 10:16:46 -08:00
Process.cpp fix a few bugs. 2013-11-10 22:11:13 -08:00
Process.h would block when deleting or resetting 2013-10-30 13:12:46 -07:00
Profiler.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Profiler.h Initial file population. 2013-08-02 13:12:24 -07:00
Proxy.cpp Merge branch 'master' into diffbot 2013-11-18 16:59:33 -08:00
Proxy.h Initial file population. 2013-08-02 13:12:24 -07:00
pstotext Initial file population. 2013-08-02 13:12:24 -07:00
QAClient.cpp Initial file population. 2013-08-02 13:12:24 -07:00
QAClient.h Initial file population. 2013-08-02 13:12:24 -07:00
quarantine.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Query.cpp display json objects that are not in arrays 2013-11-12 13:51:52 -08:00
Query.h display json objects that are not in arrays 2013-11-12 13:51:52 -08:00
Rdb.cpp rdbbase not fully resetting? it was 2013-11-15 09:01:58 -08:00
Rdb.h fix file descriptor leak in Dir class. 2013-11-19 13:41:56 -08:00
RdbBase.cpp log debug update. 2013-11-21 12:37:53 -08:00
RdbBase.h Initial file population. 2013-08-02 13:12:24 -07:00
RdbBuckets.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RdbBuckets.h Initial file population. 2013-08-02 13:12:24 -07:00
RdbCache.cpp Merge branch 'master' into diffbot 2013-11-20 15:51:58 -08:00
RdbCache.h removed MAX_COLL_RECS so we can have unlimited 2013-08-30 16:20:38 -07:00
RdbDump.cpp Merge branch 'master' into diffbot 2013-10-25 12:32:02 -07:00
RdbDump.h Initial file population. 2013-08-02 13:12:24 -07:00
RdbList.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RdbList.h Initial file population. 2013-08-02 13:12:24 -07:00
RdbMap.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RdbMap.h Initial file population. 2013-08-02 13:12:24 -07:00
RdbMem.cpp track down some nasty cores. fix 2013-10-29 16:37:14 -07:00
RdbMem.h now we can reset collection mid stream 2013-10-18 17:49:36 -07:00
RdbMerge.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RdbMerge.h Initial file population. 2013-08-02 13:12:24 -07:00
RdbScan.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RdbScan.h Initial file population. 2013-08-02 13:12:24 -07:00
rdbtest2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
rdbtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RdbTree.cpp make waiting trees grow dynamically to save 2013-11-19 15:23:25 -08:00
RdbTree.h make waiting trees grow dynamically to save 2013-11-19 15:23:25 -08:00
README.md Update README.md 2013-11-16 20:14:06 -08:00
readRec.cpp Initial file population. 2013-08-02 13:12:24 -07:00
reindex2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Repair.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Repair.h Initial file population. 2013-08-02 13:12:24 -07:00
RequestTable.cpp Initial file population. 2013-08-02 13:12:24 -07:00
RequestTable.h Initial file population. 2013-08-02 13:12:24 -07:00
rescue.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Revdb.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Revdb.h Initial file population. 2013-08-02 13:12:24 -07:00
rmbots.cpp Initial file population. 2013-08-02 13:12:24 -07:00
SafeBuf.cpp fix json double decoding issue. no more 2013-11-22 14:16:14 -08:00
SafeBuf.h fix json double decoding issue. no more 2013-11-22 14:16:14 -08:00
SafeList.h Initial file population. 2013-08-02 13:12:24 -07:00
Sanity.h Initial file population. 2013-08-02 13:12:24 -07:00
Scores.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Scores.h Initial file population. 2013-08-02 13:12:24 -07:00
Scraper.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Scraper.h Initial file population. 2013-08-02 13:12:24 -07:00
SearchInput.cpp Merge branch 'master' into diffbot 2013-11-20 15:51:58 -08:00
SearchInput.h Merge branch 'master' into diffbot 2013-11-18 16:59:33 -08:00
Sections.cpp Merge branch 'master' into diffbot 2013-11-20 15:51:58 -08:00
Sections.h make sections grow dynamically so we do not 2013-10-06 11:04:10 -06:00
seektest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
seo.h Initial file population. 2013-08-02 13:12:24 -07:00
SiteGetter.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
SiteGetter.h Initial file population. 2013-08-02 13:12:24 -07:00
sleepandlog.cpp Initial file population. 2013-08-02 13:12:24 -07:00
sort.cpp Initial file population. 2013-08-02 13:12:24 -07:00
sort.h Initial file population. 2013-08-02 13:12:24 -07:00
Speller.cpp Merge branch 'master' into diffbot 2013-11-20 15:51:58 -08:00
Speller.h Initial file population. 2013-08-02 13:12:24 -07:00
Spider.cpp spider log debug msg fix. 2013-11-22 14:17:10 -08:00
Spider.h fix bug of perpetual round incrementing ad nauseam. 2013-11-22 11:14:03 -08:00
Stats.cpp fix graphing bug when graphing performance 2013-11-17 11:48:17 -07:00
Stats.h make performance table taller. we are losing 2013-11-20 10:10:40 -07:00
Statsdb.cpp Merge branch 'master' into diffbot 2013-11-20 15:51:58 -08:00
Statsdb.h fix potential problem of tons of points in 2013-10-14 22:52:29 -07:00
StopWords.cpp Initial file population. 2013-08-02 13:12:24 -07:00
StopWords.h Initial file population. 2013-08-02 13:12:24 -07:00
streambuf.h Initial file population. 2013-08-02 13:12:24 -07:00
Strings.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Strings.h Initial file population. 2013-08-02 13:12:24 -07:00
Summary.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Summary.h renamed matches.h and matches.cpp to 2013-10-01 07:58:24 -07:00
superMergeTest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
supported_charsets.cpp Initial file population. 2013-08-02 13:12:24 -07:00
supported_charsets.txt Initial file population. 2013-08-02 13:12:24 -07:00
Syncdb.cpp fix addColl() logic for collectionless rdbs 2013-10-16 14:38:09 -07:00
Syncdb.h Initial file population. 2013-08-02 13:12:24 -07:00
Synonyms.cpp now we index all numbers that have field names 2013-11-08 16:16:13 -08:00
Synonyms.h now we index all numbers that have field names 2013-11-08 16:16:13 -08:00
Tagdb.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Tagdb.h Initial file population. 2013-08-02 13:12:24 -07:00
TcpServer.cpp fix file descriptor leak in Dir class. 2013-11-19 13:41:56 -08:00
TcpServer.h Initial file population. 2013-08-02 13:12:24 -07:00
TcpSocket.h integrate diffbot from svn back into git. 2013-09-13 09:23:18 -07:00
test2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_convert.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_hash.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_norm.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_parser2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_parser.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test_unicode.cpp Initial file population. 2013-08-02 13:12:24 -07:00
test.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Test.cpp support for &restart=1 2013-11-14 14:02:56 -08:00
Test.h Initial file population. 2013-08-02 13:12:24 -07:00
testfloats.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Tfndb.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Tfndb.h Initial file population. 2013-08-02 13:12:24 -07:00
Thesaurus.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Thesaurus.h Initial file population. 2013-08-02 13:12:24 -07:00
Threads.cpp fix file descriptor leak in Dir class. 2013-11-19 13:41:56 -08:00
Threads.h do not try to join on thread when pthread_create() fails 2013-11-16 18:28:49 -07:00
threadtest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
thunder.cpp Initial file population. 2013-08-02 13:12:24 -07:00
tifftopnm Initial file population. 2013-08-02 13:12:24 -07:00
Timedb.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Timedb.h Initial file population. 2013-08-02 13:12:24 -07:00
Timer.h Initial file population. 2013-08-02 13:12:24 -07:00
Title.cpp fix json double decoding issue. no more 2013-11-22 14:16:14 -08:00
Title.h Initial file population. 2013-08-02 13:12:24 -07:00
Titledb.cpp move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
Titledb.h move from groups to shards. got rid of annoying 2013-10-04 16:18:56 -07:00
TopTree.cpp Initial file population. 2013-08-02 13:12:24 -07:00
TopTree.h Initial file population. 2013-08-02 13:12:24 -07:00
treetest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
TuringTest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
TuringTest.h Initial file population. 2013-08-02 13:12:24 -07:00
Turkdb.cpp Initial file population. 2013-08-02 13:12:24 -07:00
types.h fix compiler warning in types.h. 2013-09-08 20:00:52 -06:00
UCNormalizer.cpp Initial file population. 2013-08-02 13:12:24 -07:00
UCNormalizer.h Initial file population. 2013-08-02 13:12:24 -07:00
UCPropTable.cpp Initial file population. 2013-08-02 13:12:24 -07:00
UCPropTable.h Initial file population. 2013-08-02 13:12:24 -07:00
UCWordIterator.cpp Initial file population. 2013-08-02 13:12:24 -07:00
UCWordIterator.h Initial file population. 2013-08-02 13:12:24 -07:00
UdpProtocol.h Initial file population. 2013-08-02 13:12:24 -07:00
UdpServer.cpp fix core from trying to get the time 2013-09-01 12:55:22 -06:00
UdpServer.h Initial file population. 2013-08-02 13:12:24 -07:00
UdpSlot.cpp Initial file population. 2013-08-02 13:12:24 -07:00
UdpSlot.h Initial file population. 2013-08-02 13:12:24 -07:00
udptest.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Unicode.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Unicode.h Initial file population. 2013-08-02 13:12:24 -07:00
UnicodeProperties.cpp Initial file population. 2013-08-02 13:12:24 -07:00
UnicodeProperties.h Initial file population. 2013-08-02 13:12:24 -07:00
unifiedDict.txt Initial file population. 2013-08-02 13:12:24 -07:00
uniq2.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Url.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Url.h Initial file population. 2013-08-02 13:12:24 -07:00
urlinfo.cpp just ignore all urls with # (hashtag) in them 2013-10-03 23:33:55 -06:00
Users.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Users.h Initial file population. 2013-08-02 13:12:24 -07:00
ValidPointer.cpp Initial file population. 2013-08-02 13:12:24 -07:00
ValidPointer.h Initial file population. 2013-08-02 13:12:24 -07:00
Vector.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Vector.h Initial file population. 2013-08-02 13:12:24 -07:00
Weights.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Weights.h Initial file population. 2013-08-02 13:12:24 -07:00
Wiki.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Wiki.h Initial file population. 2013-08-02 13:12:24 -07:00
wikititles.txt.part1 Initial file population. 2013-08-02 13:12:24 -07:00
wikititles.txt.part2 Initial file population. 2013-08-02 13:12:24 -07:00
wiktionary-buf.txt Initial file population. 2013-08-02 13:12:24 -07:00
wiktionary-lang.txt Initial file population. 2013-08-02 13:12:24 -07:00
wiktionary-syns.dat Initial file population. 2013-08-02 13:12:24 -07:00
Wiktionary.cpp Merge branch 'master' into diffbot 2013-11-20 15:51:58 -08:00
Wiktionary.h Initial file population. 2013-08-02 13:12:24 -07:00
Words.cpp fixed bugs in sort by prices, etc. 2013-11-11 18:58:45 -08:00
Words.h Initial file population. 2013-08-02 13:12:24 -07:00
Xml.cpp Initial file population. 2013-08-02 13:12:24 -07:00
Xml.h Initial file population. 2013-08-02 13:12:24 -07:00
XmlDoc.cpp measure crawl delay by default from 2013-11-26 14:07:28 -08:00
XmlDoc.h crawldelay works now but it measures 2013-11-26 12:58:14 -08:00
XmlNode.cpp Initial file population. 2013-08-02 13:12:24 -07:00
XmlNode.h Initial file population. 2013-08-02 13:12:24 -07:00
zconf.h Initial file population. 2013-08-02 13:12:24 -07:00
zlib.h Initial file population. 2013-08-02 13:12:24 -07:00

open-source-search-engine

An open source web and enterprise search engine, as can be seen at http://www.gigablast.com/.

RUNNING GIGABLAST

See html/admin.html for all administrative documentation including the quick start instructions.

Alternatively, visit http://www.gigablast.com/admin.html

See html/compare.html for a comparison of Gigablast to SOLR. The comparison is very sparse right now, but it does include some useful commands.

CODE ARCHITECTURE

See html/developer.html for all code documentation.

Alternatively, visit http://www.gigablast.com/developer.html

CONTACT

Contact me at mattdwells@hotmail.com for feature requests or help in general. I will work for free on good use cases.