Commit Graph

23 Commits

Author SHA1 Message Date
Matt Wells
060f7da967 fix data corruption detection and repair bug.
do not core on corrupt http reply missing \0.
just set the g_errno to ECORRUPTDATA.
give more informative corruption log msgs.
2014-05-01 10:38:00 -07:00
Matt Wells
f3c06ced57 try to fix core from deleting coll 2014-04-25 11:52:17 -07:00
Matt Wells
9c8410767d fix critical title alloc/free bug
in title.cpp.
2014-03-28 08:01:01 -07:00
Matt Wells
5ca411e3e2 tuning the rebalance loop 2014-03-15 14:56:11 -07:00
Matt Wells
bd4484db3c Merge branch 'testing' into diffbot-testing 2014-03-10 12:08:23 -07:00
Matt Wells
27e8e810d2 use collnum instead of coll string.
more stable since resetting collections
keeps string the same but changes the collnum.
2014-03-06 15:48:11 -08:00
Matt Wells
fc47c18aec new printadmintop functionality. 2014-02-07 23:08:04 -07:00
Matt Wells
6af9441818 change deduping logic to be first come first
served, but site rank trumps. fixed bug from
fix before.
2014-01-29 16:14:42 -08:00
Matt Wells
b40f393f4c fix a couple cores related to deleting collections
in progress. support termlist dump with terms
containing colons.
2014-01-29 15:56:07 -08:00
Matt Wells
3a6a271dd9 make crawl sync bug fixes.
fix Puz crawl from dying out on host 9
because spider reply did not resuscitate waiting
tree for its ip.
fix mike's zola crawl with a repeat of 3 days
from not incrementing the round because it had
maxrounds 0, which means to ignore... assume 0
means to ignore now. send out 0xc1 crawl info
requests to even dead hosts so we can at least use
their last known good info.
2014-01-25 13:47:03 -08:00
Matt Wells
066d910934 try to fix rebalancing some more. 2014-01-21 22:39:01 -08:00
Matt Wells
33c5d9c07f a lot of times rdb tree has invalid collection
numbers in it so fix our counting algo in case
the collection rec no longer exists!
2014-01-21 19:01:44 -08:00
Matt Wells
5b7170e8c6 Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot
Conflicts:
	Json.cpp
	PageAddUrl.cpp
	PageStats.cpp
	Spider.cpp
2014-01-17 21:07:08 -08:00
Matt Wells
4e803210ee tons of changes from live github on neo.
lots of core fixes.
took out ppthtml powerpoint convert, it hangs.
dynamic rdbmap to save memory per coll.
fixed disk page cache logic and brought it
back.
2014-01-17 21:01:43 -08:00
Matt Wells
980d63632a more msg5 re-read fixes.
stop re-reading if increasing minrecsizes did nothing.
fix tight merges so they work over all colls.
fix merge counting to be fast and not loop over
all rdbbases which could be thousands.
add num mirrors to rebalance.txt.
fix updateCrawlInfo to wait for all replies. critical error!
2014-01-16 13:38:22 -08:00
Matt Wells
ae3aa445e8 rebalancer working pretty well now 2014-01-15 19:08:47 -08:00
Matt Wells
6ba3936d0b various core fixes. need to fix
json parser mem allocation right though.
Added dynamic rdb map ptr allocation
to save memory when you have thousands
of collections.
2014-01-09 11:34:52 -08:00
mwells
82494baa89 move CollectionRec stuff into Collectiondb files
for simplicity.
2013-12-10 15:28:04 -08:00
Matt Wells
a1ac5a5348 try to fix core from spiderdb scan coming back to
no collection record b/c user deleted it.
2013-10-29 16:51:21 -07:00
Matt Wells
fe97e08281 move from groups to shards. got rid of annoying
groupid bit mask thing.
2013-10-04 16:18:56 -07:00
mwells
e71266e2db fix data downloading for large files 2013-09-30 13:48:37 -06:00
mwells
84fae9a3c6 Fix issue of reading spiderrequests from
doledb at the very first key in spiderdb.
causes lots of positive/negative key annihilations.
we end up re-reading like 300 times in some
cases just to get a url from a doledb priority.
2013-08-29 21:16:59 -06:00
Matt Wells
f6e560c1f4 Initial file population. 2013-08-02 13:12:24 -07:00