Commit Graph

38 Commits

Author SHA1 Message Date
Matt Wells
edbd61b0c5 thread fixes. if pthread_create fails then
keep thread queue and just return. will try to
relaunch later. do not count delete keys towards
shard rebalance count.
2014-03-15 20:07:02 -07:00
Matt Wells
5ca411e3e2 tuning the rebalance loop 2014-03-15 14:56:11 -07:00
Matt Wells
86147fe22c tight merge during rebalance to save
disk space, so neg recs annihilate pos recs.
2014-03-14 23:37:30 -07:00
Matt Wells
82ac3fab6c merge fixes 2014-03-14 22:15:08 -07:00
Matt Wells
553aefdb55 keep files tightly merged when doing rebalance
to avoid running out of disk space
2014-03-14 19:19:41 -07:00
Matt Wells
27e8e810d2 use collnum instead of coll string.
more stable since resetting collections
keeps string the same but changes the collnum.
2014-03-06 15:48:11 -08:00
Matt Wells
a6b7e088f5 take out tfndb, unused. fix core
from diffbot url too long.
2014-02-26 01:07:13 -08:00
Matt Wells
32526a9b25 more checksum fixes for json. fixes for
repair/rebuild procedure.
2014-02-16 10:46:41 -08:00
Matt Wells
b634d06287 fix some cores. use olddoc contenthash
for msg13 call for EDOCUNCHANGED errors.
2014-02-07 18:28:09 -08:00
Matt Wells
4d2eafe39b added some repair logic for 0001.dat files.
turn off spiderdb disk cache for now.
2014-02-01 10:14:25 -08:00
Matt Wells
a4be05d8d0 more shard rebalancer fixes 2014-01-22 00:44:33 -08:00
Matt Wells
066d910934 try to fix rebalancing some more. 2014-01-21 22:39:01 -08:00
Matt Wells
33c5d9c07f a lot of times rdb tree has invalid collection
numbers in it so fix our counting algo in case
the collection rec no longer exists!
2014-01-21 19:01:44 -08:00
Matt Wells
45cb5c9a0c fix bugs to try to get sharding working
on crawlbot today
2014-01-21 13:58:21 -08:00
Matt Wells
4606e88721 code cleanups.
xmldoc::injectDoc(), and it'll
add a SpiderRequest as well.
better collectiondb init code.
2014-01-18 21:19:26 -08:00
Matt Wells
980d63632a more msg5 re-read fixes.
stop re-reading if increasing minrecsizes did nothing.
fix tight merges so they work over all colls.
fix merge counting to be fast and not loop over
all rdbbases which could be thousands.
add num mirrors to rebalance.txt.
fix updateCrawlInfo to wait for all replies. critical error!
2014-01-16 13:38:22 -08:00
Matt Wells
299a208253 reduce log spam 2014-01-11 16:49:43 -08:00
Matt Wells
d8554bfb0f update default parm settings. 2014-01-09 13:22:51 -08:00
Matt Wells
6ba3936d0b various core fixes. need to fix
json parser mem allocation right though.
Added dynamic rdb map ptr allocation
to save memory when you have thousands
of collections.
2014-01-09 11:34:52 -08:00
Matt Wells
048b715962 if coll is deleted or reset in the middle of a dump
or merge then stop the dump/merge with ENOCOLLREC
error. avoid calling "base->" functions since it
could be NULL if deleted.
2013-12-25 17:12:09 -08:00
Matt Wells
a0ceade641 fix oom doleiptable using too much mem
so bulk job went oom
2013-12-18 17:20:53 -08:00
Matt Wells
1b5057ad42 log cleanups mostly.
took out disk page cache,
kinda buggy... need to fix at some point.
2013-12-18 10:57:18 -08:00
Matt Wells
2cd53386ad parm updates 2013-12-17 09:51:08 -08:00
Matt Wells
3f19ece776 parmdb updates 2013-12-16 17:07:15 -08:00
Matt Wells
7b768d4b86 Merge branch 'diffbot' into diffbot-testing
Conflicts:
	iana_charset.cpp
	iana_charset.h
2013-12-12 13:01:49 -08:00
Matt Wells
16e91375f4 bring in changes from live beta from ~/github.
limit spiders to 50, not 500 to prevent oom.
resume killed merges that had num files shrunk even
if down to one file. show collnum in spider queue.
remove back-to-back whitespace, and make all space
a ' ' for getting the doc checksum for deduping.
2013-12-12 12:58:58 -08:00
mwells
76bb3d05e1 clean up logging so i can see what's going on 2013-12-10 16:41:30 -08:00
mwells
82494baa89 move CollectionRec stuff into Collectiondb files
for simplicity.
2013-12-10 15:28:04 -08:00
Matt Wells
3353a90a85 fix resuming a killed merge condition. 2013-12-08 15:50:45 -07:00
Matt Wells
a2e52a5dc3 little fix 2013-12-08 10:15:54 -07:00
Matt Wells
65e75167e3 limit posdb merging to 8 files max.
added some more url filters documentation.
2013-12-08 09:41:05 -07:00
Matt Wells
2a5d92a639 log debug update. 2013-11-21 12:37:53 -08:00
Matt Wells
c669f8c138 fix file descriptor leak in Dir class.
try to fix core from Thread getting SIGALRM.
try to set NOFILES to 1024 at startup in case
more are allowed.
2013-11-19 13:41:56 -08:00
Matt Wells
a31b13ad61 fix a few bugs. 2013-11-13 13:27:22 -08:00
Matt Wells
396a88799a fix bad bug of basically emptying out all our data
on auto-save!
2013-11-06 19:49:20 -08:00
Matt Wells
fe97e08281 move from groups to shards. got rid of annoying
groupid bit mask thing.
2013-10-04 16:18:56 -07:00
Matt Wells
94e6492916 removed MAX_COLL_RECS so we can have unlimited
collections, really limited by the sizeof(collnum_t) only now,
which is 16 bits (15 bits for the non-negative range), which is the limitation.
can always expand this so we can have more than 32k collections.
2013-08-30 16:20:38 -07:00
Matt Wells
f6e560c1f4 Initial file population. 2013-08-02 13:12:24 -07:00