Commit Graph

2738 Commits

Author SHA1 Message Date
Matt
ef99aabf4d try to fix qainject1 core in qa.cpp 2015-02-17 20:17:59 -07:00
Matt
dce8d9f930 fix qa bug of not resetting s_i.
fix tcpserver.cpp bug of destroying a streaming
socket after what is really not the final write.
2015-02-17 20:10:13 -07:00
Matt
d14cb2d5b0 fix debug log msgs. 2015-02-17 19:15:43 -07:00
Matt
2488c1a338 added proper write callback registration into
TcpServer.cpp so we only register write callbacks
when a non-blocking write does not write all the
bytes requested of it, or when a connection does not
complete. also fixed up the sslHandshake() function
which calls SSL_connect().
2015-02-16 14:48:39 -07:00
Matt
cd9c158199 loop.cpp cleanups.
make it so non-linux os will break out
of the select() loop eventually even if select()
only gets EINTRs all the time. so we can process
shutdown cmd.
save ips.txt again for qatest123 qa collection.
do not use winnerlist cache when we have 'sitepages'
url filter expression. it messes it up.
2015-02-13 12:07:10 -08:00
Matt
b891f2ff22 format updates for qa tool 2015-02-12 17:19:14 -08:00
Matt Wells
596a674c61 fixes for rebuilding the active list
in SpiderLoop class.
2015-02-12 17:00:38 -08:00
Matt
24eac820d5 fixed bad deletenode call causing dups in
winnertree.
2015-02-12 16:12:23 -08:00
Matt
579a08d287 fixed link overflow logic. 2015-02-12 15:03:01 -08:00
Matt
735667be22 fixed Rdb::reclaimMemFromDeletedTreeNodes() 2015-02-12 14:23:16 -08:00
Matt
415c96fc56 added overflow checks to ensure we don't have more
than 10M unique urls for a given "firstip"
queued up to be spidered in spiderdb
that have never been spidered. should prevent us
from having 20GB spiderdbs for spidering those sites
that essentially have an infinite # of urls, black hole
sites, that seems to be plaguing crawls.
2015-02-12 13:41:40 -08:00
Matt
c8fb1af5c4 added tree mem reclaimer for doledb since it
is now a tree-only rdb.
2015-02-12 12:12:25 -08:00
Matt
04cc8adbdd fix &admin=0 so it works again 2015-02-12 11:16:34 -08:00
Matt
c009430b6c more fixes for new spider updates 2015-02-11 21:54:36 -08:00
Matt
b12913ed83 only add urls we should spider to our own doledbtree 2015-02-11 19:27:28 -08:00
Matt
9ea53ed89e bug fixes. spidering seems to work somewhat again. 2015-02-11 19:23:36 -08:00
Matt
30a77dd422 checkpoint on massive spidering speed ups. 2015-02-11 17:55:28 -08:00
Matt
f6723ddaa3 new much faster spider. cache the winner tree
basically. TODO: need to update cache if
new spiderrequests are added that should be
in the cached winner tree.
2015-02-10 21:27:21 -08:00
mwells
5c31dbda9a Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing 2015-02-10 21:07:02 -07:00
mwells
5b538e7cee fix core in linkdb logic 2015-02-10 21:06:47 -07:00
Matt Wells
7909df5b5e Merge branch 'diffbot' into diffbot-testing 2015-02-10 12:21:29 -08:00
Matt Wells
acbf4c582f show sigpipes and sigios for help debugging 2015-02-10 12:20:32 -08:00
Matt
12cdc7c9d4 more spider speed ups based on profiler data.
added Rdb::getCollNumTotalRecs() function.
2015-02-10 12:00:04 -08:00
Matt
4c7ee42dd9 speed up spiderDoledUrls() loop calling of
gettimeofdayInMillisecondsSynced() using
g_clockNeedsUpdate logic.
2015-02-10 11:47:53 -08:00
Matt Wells
18d449c681 show pause message before next round to start msg. 2015-02-09 16:14:59 -08:00
Matt Wells
01687fcb0e fix gb thrutest disk tests 2015-02-09 10:29:08 -08:00
mwells
b40ee75187 fix core from certain queries 2015-02-08 22:06:28 -07:00
Matt
5eeeaef446 do not compile redhat's gb with -static.
even if we yum install the static libs
there's still problems.
2015-02-08 19:43:32 -08:00
mwells
5e752e78ef add 'more from this site' link back to results. 2015-02-08 18:13:48 -07:00
mwells
53bfd960c5 Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing 2015-02-08 16:05:17 -07:00
mwells
bccdd6b65a fix site cluster by default parm bug 2015-02-08 16:05:04 -07:00
Matt
8fff54621c doc updates 2015-02-07 12:13:40 -08:00
Matt
67a143864c take out add gigablast to your browser's search engines for now 2015-02-07 12:10:43 -08:00
Matt
9327ebf61f take out FEED link for now 2015-02-07 12:09:21 -08:00
Matt
afbe35c5a9 Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing 2015-02-07 12:07:52 -08:00
Matt
580736d766 support arc injections 2015-02-07 12:07:42 -08:00
mwells
aff7e49db2 fix case bug 2015-02-06 19:55:45 -07:00
Matt Wells
85b244337c fix parm out of band core. fix hostdb conf symlink bug. 2015-02-06 15:35:00 -08:00
Matt Wells
f2a87358e6 try to speed up threads more 2015-02-05 15:00:18 -08:00
Matt
6c1c2c66c4 added dstart to gb -h help menu 2015-02-05 12:39:13 -08:00
Matt
9f22e268a2 try to fix crawlbot nightly smoke tests 2015-02-05 12:29:43 -08:00
Matt
e9c36d1f75 comment update 2015-02-05 10:35:52 -08:00
mwells
b0f81b848c fix flush bug 2015-02-04 10:13:34 -07:00
Matt
e426877eea make a note of obscure condition 2015-02-03 19:45:18 -08:00
Matt
3e1cc9a450 fix bug of parms being set seemingly at random. 2015-02-03 17:52:44 -08:00
Matt
76ec7f3a4a add # of tcp connections to hosts table 2015-02-03 14:14:17 -08:00
Matt
a6435bb210 miscellaneous spider/injection speedups. 2015-02-03 14:04:53 -08:00
Matt
f70d533525 make threads enabled for disk the default setting
now that creating threads should be much faster.
2015-02-03 13:43:13 -08:00
Matt
93fce690d6 more speedups. do not call sigprocmask in main thread
before pthread_create(). instead call pthread_sigmask()
from thread like we were doing already for SIGINT.
2015-02-03 13:39:23 -08:00
Matt
3badbb69f4 fix injection bug 2015-02-03 13:00:47 -08:00