Matt
ef99aabf4d
try to fix qainject1 core in qa.cpp
2015-02-17 20:17:59 -07:00
Matt
dce8d9f930
fix qa bug of not resetting s_i.
fix tcpserver.cpp bug of destroying a streaming
socket after what is really not the final write.
2015-02-17 20:10:13 -07:00
Matt
d14cb2d5b0
fix debug log msgs.
2015-02-17 19:15:43 -07:00
Matt
2488c1a338
added proper write callback registration into
TcpServer.cpp so we only register write callbacks
when a non-blocking write does not write all the
bytes requested of it, or when a connection does not
complete. also fixed up the sslHandshake() function,
which calls SSL_connect().
2015-02-16 14:48:39 -07:00
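
The idea in the commit above (register a write callback only when a non-blocking write leaves bytes unsent) can be sketched roughly as follows. This is a minimal illustration, not the code from TcpServer.cpp: `WriteResult` and `writeSome` are hypothetical names, and the caller is assumed to do the actual callback registration when `needWriteCallback` is true.

```cpp
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical result type for this sketch; not taken from TcpServer.cpp.
struct WriteResult {
    size_t written;          // bytes the kernel accepted so far
    bool needWriteCallback;  // true only if bytes remain and the fd would block
    bool error;              // true on any other write failure
};

// Try to send len bytes on a non-blocking descriptor.
// The caller registers a write callback only when needWriteCallback is set.
WriteResult writeSome(int fd, const char *buf, size_t len) {
    WriteResult r{0, false, false};
    while (r.written < len) {
        ssize_t n = write(fd, buf + r.written, len - r.written);
        if (n > 0) {
            r.written += static_cast<size_t>(n);
            continue;
        }
        if (n < 0 && errno == EINTR) continue;  // interrupted, just retry
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            r.needWriteCallback = true;         // kernel buffer full, wait for writability
            return r;
        }
        r.error = true;                         // unexpected failure
        return r;
    }
    return r;  // everything was written, no callback needed
}
```

A caller would resume from `buf + written` once the descriptor is reported writable again. A connect that has not completed could be treated the same way, since a pending non-blocking connect is also signalled through writability.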
Matt
cd9c158199
loop.cpp cleanups.
make it so non-linux os will break out
of the select() loop eventually even if select()
only gets EINTRs all the time. so we can process
shutdown cmd.
save ips.txt again for qatest123 qa collection.
do not use winnerlist cache when we have 'sitepages'
url filter expression. it messes it up.
2015-02-13 12:07:10 -08:00
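
One way to make a select() loop exit even when select() keeps returning EINTR, as the commit above describes, is to check a shutdown flag on every pass before retrying. The sketch below is an assumption about the general shape, not the code in loop.cpp: `g_shutdownRequested` and `runSelectLoop` are hypothetical names, and the pass limit exists only to keep the example bounded.

```cpp
#include <cerrno>
#include <csignal>
#include <sys/select.h>

// Hypothetical flag; a signal handler or a shutdown command would set it.
volatile sig_atomic_t g_shutdownRequested = 0;

// Runs at most maxPasses select() passes and returns how many were made.
// The flag is tested at the top of every pass, so a stream of EINTR
// results cannot keep the loop from noticing a shutdown request.
int runSelectLoop(int maxPasses) {
    int passes = 0;
    while (passes < maxPasses) {
        if (g_shutdownRequested) break;   // checked even after an EINTR retry

        struct timeval tv;
        tv.tv_sec = 0;
        tv.tv_usec = 10000;               // 10 ms timeout so the loop wakes up regularly
        int n = select(0, nullptr, nullptr, nullptr, &tv);
        ++passes;

        if (n < 0 && errno == EINTR) continue;  // interrupted: loop back and re-check the flag
        if (n < 0) break;                       // real error, stop the loop
        // n >= 0: ready descriptors would be dispatched here
    }
    return passes;
}
```

The short timeout matters as much as the flag check: without it, a select() call that is never interrupted could block indefinitely and never reach the check.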
Matt
b891f2ff22
format updates for qa tool
2015-02-12 17:19:14 -08:00
Matt Wells
596a674c61
fixes for rebuilding the active list
in SpiderLoop class.
2015-02-12 17:00:38 -08:00
Matt
24eac820d5
fixed bad deletenode call causing dups in
winnertree.
2015-02-12 16:12:23 -08:00
Matt
579a08d287
fixed link overflow logic.
2015-02-12 15:03:01 -08:00
Matt
735667be22
fixed Rdb::reclaimMemFromDeletedTreeNodes()
2015-02-12 14:23:16 -08:00
Matt
415c96fc56
added overflow checks to ensure we don't have more
than 10M unique urls for a given "firstip"
queued up to be spidered in spiderdb
that have never been spidered. should prevent us
from having 20GB spiderdbs for spidering those sites
that essentially have an infinite # of urls (black hole
sites), which seem to be plaguing crawls.
2015-02-12 13:41:40 -08:00
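
A per-"firstip" cap like the one described above could be sketched as a simple counter keyed by IP. This is only an illustration under assumptions: `FirstIpQuota` and its methods are hypothetical names, the real check lives in the spiderdb code and may work differently, and the sketch counts accepted requests, so the caller is assumed to have already deduplicated urls.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical helper: tracks how many never-spidered urls are queued per first IP.
class FirstIpQuota {
public:
    explicit FirstIpQuota(uint64_t maxPerIp) : m_max(maxPerIp) {}

    // Returns true if another url may be queued for this IP,
    // false if the cap has been reached (the overflow case).
    bool tryAdd(uint32_t firstIp) {
        uint64_t &count = m_counts[firstIp];
        if (count >= m_max) return false;
        ++count;
        return true;
    }

    // Call once a queued url has actually been spidered.
    void markSpidered(uint32_t firstIp) {
        auto it = m_counts.find(firstIp);
        if (it == m_counts.end()) return;
        if (it->second > 0) --it->second;
        if (it->second == 0) m_counts.erase(it);
    }

    // Number of queued, never-spidered urls currently counted for this IP.
    uint64_t pending(uint32_t firstIp) const {
        auto it = m_counts.find(firstIp);
        return it == m_counts.end() ? 0 : it->second;
    }

private:
    uint64_t m_max;
    std::unordered_map<uint32_t, uint64_t> m_counts;
};
```

With the limit from the commit message this would be constructed as `FirstIpQuota quota(10000000);`, and any url for which `tryAdd()` returns false would simply not be queued.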
Matt
c8fb1af5c4
added tree mem reclaimer for doledb since it
is now a tree-only rdb.
2015-02-12 12:12:25 -08:00
Matt
04cc8adbdd
fix &admin=0 so it works again
2015-02-12 11:16:34 -08:00
Matt
c009430b6c
more fixes for new spider updates
2015-02-11 21:54:36 -08:00
Matt
b12913ed83
only add urls we should spider to our own doledbtree
2015-02-11 19:27:28 -08:00
Matt
9ea53ed89e
bug fixes. spidering seems to work somewhat again.
2015-02-11 19:23:36 -08:00
Matt
30a77dd422
checkpoint on massive spidering speed ups.
2015-02-11 17:55:28 -08:00
Matt
f6723ddaa3
new much faster spider. cache the winner tree
basically. TODO: need to update cache if
new spiderrequests are added that should be
in the cached winner tree.
2015-02-10 21:27:21 -08:00
mwells
5c31dbda9a
Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing
2015-02-10 21:07:02 -07:00
mwells
5b538e7cee
fix core in linkdb logic
2015-02-10 21:06:47 -07:00
Matt Wells
7909df5b5e
Merge branch 'diffbot' into diffbot-testing
2015-02-10 12:21:29 -08:00
Matt Wells
acbf4c582f
show sigpipes and sigios for help debugging
2015-02-10 12:20:32 -08:00
Matt
12cdc7c9d4
more spider speed ups based on profiler data.
added Rdb::getCollNumTotalRecs() function.
2015-02-10 12:00:04 -08:00
Matt
4c7ee42dd9
speed up spiderDoledUrls() loop calling of
gettimeofdayInMillisecondsSynced() using
g_clockNeedsUpdate logic.
2015-02-10 11:47:53 -08:00
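
The speed-up above appears to avoid a clock call on every loop iteration by reusing a cached timestamp until a flag says it is stale. A rough sketch of that pattern follows. Only the name `g_clockNeedsUpdate` comes from the commit message; how the flag is set in the real code is not stated there, `getTimeMsCached` is a hypothetical stand-in for the synced clock function, and `std::chrono::steady_clock` is used here purely for illustration.

```cpp
#include <chrono>
#include <cstdint>

// Flag name taken from the commit message; in this sketch something else
// (for example a periodic timer) is assumed to set it back to true.
bool g_clockNeedsUpdate = true;

// Last value read from the clock, in milliseconds.
static int64_t s_cachedNowMs = 0;

// Hypothetical stand-in for the real time function: reads the clock only
// when the flag is set, otherwise returns the cached value.
int64_t getTimeMsCached() {
    if (g_clockNeedsUpdate) {
        auto sinceEpoch = std::chrono::steady_clock::now().time_since_epoch();
        s_cachedNowMs =
            std::chrono::duration_cast<std::chrono::milliseconds>(sinceEpoch).count();
        g_clockNeedsUpdate = false;  // cache is fresh until someone flags it again
    }
    return s_cachedNowMs;
}
```

The trade-off is precision: callers in a tight loop all see the same timestamp until the flag is raised again, which is acceptable only when millisecond-exact times are not needed inside the loop.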
Matt Wells
18d449c681
show pause message before the "next round to start" msg.
2015-02-09 16:14:59 -08:00
Matt Wells
01687fcb0e
fix gb thrutest disk tests
2015-02-09 10:29:08 -08:00
mwells
b40ee75187
fix core from certain queries
2015-02-08 22:06:28 -07:00
Matt
5eeeaef446
do not compile redhat's gb with -static.
even if we yum install the static libs
there are still problems.
2015-02-08 19:43:32 -08:00
mwells
5e752e78ef
add 'more from this site' link back to results.
2015-02-08 18:13:48 -07:00
mwells
53bfd960c5
Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing
2015-02-08 16:05:17 -07:00
mwells
bccdd6b65a
fix site cluster by default parm bug
2015-02-08 16:05:04 -07:00
Matt
8fff54621c
doc updates
2015-02-07 12:13:40 -08:00
Matt
67a143864c
take out add gigablast to your browser's search engines for now
2015-02-07 12:10:43 -08:00
Matt
9327ebf61f
take out FEED link for now
2015-02-07 12:09:21 -08:00
Matt
afbe35c5a9
Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing
2015-02-07 12:07:52 -08:00
Matt
580736d766
support arc injections
2015-02-07 12:07:42 -08:00
mwells
aff7e49db2
fix case bug
2015-02-06 19:55:45 -07:00
Matt Wells
85b244337c
fix parm out of band core. fix hostdb conf symlink bug.
2015-02-06 15:35:00 -08:00
Matt Wells
f2a87358e6
try to speed up threads more
2015-02-05 15:00:18 -08:00
Matt
6c1c2c66c4
added dstart to gb -h help menu
2015-02-05 12:39:13 -08:00
Matt
9f22e268a2
try to fix crawlbot nightly smoke tests
2015-02-05 12:29:43 -08:00
Matt
e9c36d1f75
comment update
2015-02-05 10:35:52 -08:00
mwells
b0f81b848c
fix flush bug
2015-02-04 10:13:34 -07:00
Matt
e426877eea
make a note of obscure condition
2015-02-03 19:45:18 -08:00
Matt
3e1cc9a450
fix bug of parms being set seemingly at random.
2015-02-03 17:52:44 -08:00
Matt
76ec7f3a4a
add # of tcp connections to hosts table
2015-02-03 14:14:17 -08:00
Matt
a6435bb210
miscellaneous spider/injection speedups.
2015-02-03 14:04:53 -08:00
Matt
f70d533525
make threads enabled for disk the default setting
now that creating threads should be much faster.
2015-02-03 13:43:13 -08:00
Matt
93fce690d6
more speedups. do not call sigprocmask() in main thread
before pthread_create(). instead call pthread_sigmask()
from the thread like we were already doing for SIGINT.
2015-02-03 13:39:23 -08:00
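
The change above moves signal masking out of the thread-creation path: instead of the main thread calling sigprocmask() around every pthread_create(), the new thread adjusts its own mask with pthread_sigmask(). A minimal sketch of that arrangement follows. `ThreadReport`, `workerStart` and `spawnWorker` are hypothetical names, and SIGINT is used because it is the one signal the commit message names; which other signals the real code blocks is not stated.

```cpp
#include <csignal>
#include <pthread.h>

// Hypothetical struct used only so the example can report what the thread saw.
struct ThreadReport {
    bool sigintBlocked;
};

// Thread entry point: the worker blocks SIGINT in its own mask,
// so the main thread does not have to touch its mask before pthread_create().
static void *workerStart(void *arg) {
    ThreadReport *report = static_cast<ThreadReport *>(arg);

    sigset_t block;
    sigemptyset(&block);
    sigaddset(&block, SIGINT);
    pthread_sigmask(SIG_BLOCK, &block, nullptr);

    // Read the mask back to confirm the change applies to this thread.
    sigset_t current;
    pthread_sigmask(SIG_BLOCK, nullptr, &current);
    report->sigintBlocked = (sigismember(&current, SIGINT) == 1);
    return nullptr;
}

// Creates the worker without any sigprocmask() calls in the caller,
// then waits for it to finish. Returns false if creation or join fails.
bool spawnWorker(ThreadReport *report) {
    pthread_t tid;
    if (pthread_create(&tid, nullptr, workerStart, report) != 0) return false;
    return pthread_join(tid, nullptr) == 0;
}
```

One consequence of this design is a short window between thread start and the pthread_sigmask() call in which the new thread still has the creator's mask, so a signal could in principle be delivered to it during that window.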
Matt
3badbb69f4
fix injection bug
2015-02-03 13:00:47 -08:00