3801 Commits

Author SHA1 Message Date
Dmitry Smirnov
c124eda914 cleanup: remove local zlib. All distros provide zlib1g-dev. 2021-05-05 10:36:21 +10:00
Dmitry Smirnov
3a55b74050 cleanup: removed useless local binaries (libgcc.a libc.a) 2021-05-05 10:24:45 +10:00
Dmitry Smirnov
e18d2396a6 Removed private OpenSSL [hygiene,FTBFS]. All distros provide OpenSSL. 2021-05-05 10:20:11 +10:00
Dmitry Smirnov
7a2bca9649 Compile with "-std=c++98" to fix FTBFS (Closes: #164)
~~~~
Mem.cpp:233:8: error: declaration of ‘void* operator new(size_t) throw (std::bad_alloc)’ has a different exception specifier
  233 | void * operator new (size_t size) throw (std::bad_alloc) {
      |        ^~~~~~~~
~~~~
2021-05-05 10:07:55 +10:00
Gigablast
9146f05574
Merge pull request #157 from shijuraj/master
Merge pull request #1 from gigablast/master
2020-05-04 08:55:25 -06:00
Shijuraj J
c1e5f9fd7f
Merge pull request #1 from gigablast/master
Commits on Jun 02, 2017
2019-09-21 08:24:58 +05:30
Gigablast
4a943f1c79 Merge pull request #136 from vonbetz/master
Fix infinite loop on malformed proxy.
2017-06-02 11:32:56 -06:00
Zak Betz
f10fdada73 Fix infinite loop on malformed proxy. 2017-06-02 11:28:58 -06:00
Matt
3d248732d0 fix to shut up app checker. 2016-11-04 17:28:26 -06:00
Matt
c0b2cdb60a hide the verify disk writes parm, seems to be causing
cores when activated. and shouldn't really need to be used.
is for debugging disk issues.
2016-11-04 17:09:15 -06:00
Matt
8891100c2a fix add url on root page to set collnum properly.
fix Summary::getBestWindow() underrun bug.
2016-04-06 10:31:04 -06:00
Matt
70ca2fe48c update ./gb -h desc for ./gb inject. 2016-04-05 21:06:38 -06:00
Gigablast
f5d0045b43 Merge pull request #82 from vonbetz/testing
Fix for сацминэнергорф --> сацминэнерго.рф in getDisplayUrl(...)
2016-03-29 13:12:56 -06:00
Zak Betz
3c140b87aa Merge branch 'testing' of https://github.com/gigablast/open-source-search-engine into testing 2016-03-29 12:42:05 -06:00
Zak Betz
cf7ec13de6 Fix international domain printing bug. 2016-03-29 12:41:34 -06:00
Matt
33e76af1d1 Merge branch 'testing' 2016-03-29 04:11:30 -06:00
Matt
816d69b34c a lot of bug fixes thanks to isj. 2016-03-29 04:08:17 -06:00
Matt
5072e851b7 fix misspelling 2016-03-28 17:26:40 -06:00
Matt
5935619eb2 hack on parentUrlDocId to the json object dump
of diffbot objects.
2016-03-28 12:39:48 -06:00
Matt
cab6d5c519 fix keysize==8 bug in keycmp 2016-03-28 09:17:01 -06:00
Matt
b65a16caee Merge branch 'diffbot-testing' into testing 2016-03-22 16:25:21 -06:00
Matt Wells
3c743a7d0e allow more docids to be downloaded/served in search results. 2016-03-22 15:24:33 -07:00
Matt Wells
04a8433256 show gbssParentDocId in status doc for children docs,
like diffbot object docs.
2016-03-22 09:00:10 -07:00
Matt Wells
483d69d7f7 added httprequest debug line 2016-03-21 14:46:10 -07:00
Matt Wells
136d23816c fix hashbang properly 2016-03-21 09:29:55 -07:00
Matt
48398d0cd7 Merge branch 'diffbot-testing' into testing 2016-03-20 23:14:26 -06:00
Matt Wells
136b8842db fix more data corruption bugs. hopefully
will dump out all the collections this time and
not leave any in the tree, otherwise, especially if there
are a lot left behind, they get corrupted.
2016-03-20 21:04:01 -07:00
Matt Wells
61ef806dea hash bang fix.
detect more corruption.
don't dump titledb and spiderdb at same time,
seems to reduce corruption in rdbmem.
2016-03-20 12:50:43 -07:00
Matt Wells
fc495a5bf5 fix dump core when collection deleted while dumping 2016-03-18 06:46:38 -07:00
Matt
8922b8e69c Merge branch 'diffbot-testing' into testing 2016-03-17 14:31:22 -06:00
Matt Wells
56bde4c3ef fix the data corruption fix 2016-03-17 13:22:56 -07:00
Matt Wells
8bc653c31c after dump completes scan tree to ensure all nodes
reference secondary mem ptr so they don't get their
data overwritten.
2016-03-17 10:09:49 -07:00
Matt Wells
0caf345850 if running ./gb start and another gb is already bound on the port
then quickly exit(0) and have the bash keep alive loop exit the loop
based on that return value. we can't use ./cleanexit file because it
doesn't get removed and will mess up the main process that is running.
2016-03-16 16:56:48 -07:00
Matt Wells
36fdbf2f5a rename log files in the gb main.cpp code not in the
bash loop. do not rename the log file if failed to start
gb because socket was already bound. prevents us from double
starts moving the log file, which is annoying.
2016-03-16 16:08:08 -07:00
Matt Wells
a2e8a3a1fd use ./cleanexit file to ensure gb doesn't restart
after a graceful exit in the bash keep alive loop.
2016-03-16 14:57:19 -07:00
Matt Wells
7396e57660 show docids of corrupted title recs found.
show key range of each dump to disk.
fix 'sentToDiffbot' bug for unchanged docs in status docs.
make sure firstKeyInQueue is set properly from current key,
so reset list ptr before doing that in RdbDump.cpp.
2016-03-16 13:53:08 -07:00
Matt
5e8c47adfd Merge branch 'diffbot-testing' into testing 2016-03-16 01:14:37 -06:00
Matt Wells
1faff50f5a if msg22a never called to get docid, then
error out.
2016-03-16 00:14:02 -07:00
Matt
c7c8c9e5ad Merge branch 'diffbot-testing' into testing 2016-03-16 00:54:49 -06:00
Matt Wells
0b5f417349 if old title rec was corrupted we would get a random docid
when re-spidering the url causing some chaos. now things
should return to normal and we should overwrite the corrupted
titlerec on the next spidering. also, no longer do robots.txt
titlerec lookups. silly.
2016-03-15 23:26:57 -07:00
Matt Wells
58993dbbf9 do not allow crawlbot seeds to be deduped out 2016-03-15 20:42:28 -07:00
Matt Wells
bf45db6f48 Merge branch 'diffbot-testing' into testing 2016-03-15 15:55:55 -07:00
Matt Wells
8a65d21371 fix the source of lots of corruption in spiderdb and titledb.
rdbmem.cpp was storing in secondary mem which got reset when
dump completed. also do not add keys that are in collnum and
key range of list currently being dumped, return ETRYAGAIN.
added verify writes parm. clean out tree of titledb and spiderdb
corruption on startup.
2016-03-15 15:54:12 -07:00
Matt Wells
0fdbaa4196 makefile optimizations 2016-03-14 16:34:24 -07:00
Matt
0dbc304bbf fix to allow us to gather ip-only url outlinks again 2016-03-14 10:56:33 -06:00
Matt
2c167aada7 fix redirect to self bug that requires setting cookie 2016-03-14 10:33:05 -06:00
Matt Wells
d6fe684b99 fix another core caused by deleted coll 2016-03-07 10:20:25 -08:00
Matt Wells
d4e16a4dab pass a crawlbotnightly smoke 2016-03-04 13:14:28 -08:00
Matt Wells
e75d80abbe ignore meta redirect tags in html comment tags. 2016-02-22 12:41:03 -08:00
Matt Wells
412b04bbd4 fix neverending crawl rounds by only trying each url
once per round. updated url filters.
2016-02-22 09:28:46 -08:00