Matt Wells
56af753c3e
fixed nasty bug of resetting RdbBases for
random collnums, causing data loss and corruption.
2014-06-09 10:16:29 -07:00
Matt Wells
172d7071a7
fix to rename tagdb0000.002.dat
2014-06-05 22:21:41 -07:00
Matt Wells
8ac691f324
fix merging getting clogged by so many
collections trying to merge tagdb at once
2014-06-05 21:27:33 -07:00
Matt Wells
7b4b8b27bd
more debug msgs
2014-06-05 14:58:20 -07:00
Matt Wells
1fe2c94322
add some debug notes
2014-06-05 12:26:06 -07:00
Matt Wells
4298e4e752
sanity checks for debugging duplicate
titledb file bug.
2014-06-04 12:15:12 -07:00
Matt Wells
edbd61b0c5
thread fixes. if pthread_create fails then
keep thread queue and just return. will try to
relaunch later. do not count delete keys towards
shard rebalance count.
2014-03-15 20:07:02 -07:00
Matt Wells
5ca411e3e2
tuning the rebalance loop
2014-03-15 14:56:11 -07:00
Matt Wells
86147fe22c
tight merge during rebalance to save
disk space, so neg recs annihilate pos recs.
2014-03-14 23:37:30 -07:00
Matt Wells
82ac3fab6c
merge fixes
2014-03-14 22:15:08 -07:00
Matt Wells
553aefdb55
keep files tightly merged when doing rebalance
to avoid running out of disk space
2014-03-14 19:19:41 -07:00
Matt Wells
27e8e810d2
use collnum instead of coll string.
more stable since resetting collections
keeps string the same but changes the collnum.
2014-03-06 15:48:11 -08:00
Matt Wells
a6b7e088f5
take out tfndb, unused. fix core
from diffbot url too long.
2014-02-26 01:07:13 -08:00
Matt Wells
32526a9b25
more checksum fixes for json. fixes for
repair/rebuild procedure.
2014-02-16 10:46:41 -08:00
Matt Wells
b634d06287
fix some cores. use olddoc contenthash
for msg13 call for EDOCUNCHANGED errors.
2014-02-07 18:28:09 -08:00
Matt Wells
4d2eafe39b
added some repair logic for 0001.dat files.
turn off spiderdb disk cache for now.
2014-02-01 10:14:25 -08:00
Matt Wells
a4be05d8d0
more shard rebalancer fixes
2014-01-22 00:44:33 -08:00
Matt Wells
066d910934
try to fix rebalancing some more.
2014-01-21 22:39:01 -08:00
Matt Wells
33c5d9c07f
a lot of times rdb tree has invalid collection
numbers in it so fix our counting algo in case
the collection rec no longer exists!
2014-01-21 19:01:44 -08:00
Matt Wells
45cb5c9a0c
fix bugs to try to get sharding working
on crawlbot today
2014-01-21 13:58:21 -08:00
Matt Wells
4606e88721
code cleanups.
xmldoc::injectDoc(), and it'll
add a SpiderRequest as well.
better collectiondb init code.
2014-01-18 21:19:26 -08:00
Matt Wells
980d63632a
more msg5 re-read fixes.
stop re-reading if increasing minrecsizes did nothing.
fix tight merges so they work over all colls.
fix merge counting to be fast and not loop over
all rdbbases which could be thousands.
add num mirrors to rebalance.txt.
fix updateCrawlInfo to wait for all replies. critical error!
2014-01-16 13:38:22 -08:00
Matt Wells
299a208253
reduce log spam
2014-01-11 16:49:43 -08:00
Matt Wells
d8554bfb0f
update default parm settings.
2014-01-09 13:22:51 -08:00
Matt Wells
6ba3936d0b
various core fixes. need to fix
json parser mem allocation right though.
Added dynamic rdb map ptr allocation
to save memory when you have thousands
of collections.
2014-01-09 11:34:52 -08:00
Matt Wells
048b715962
if coll is deleted or reset in the middle of a dump
or merge then stop the dump/merge with ENOCOLLREC
error. avoid calling "base->" functions since it
could be NULL if deleted.
2013-12-25 17:12:09 -08:00
Matt Wells
a0ceade641
fix oom doleiptable using too much mem
so bulk job went oom
2013-12-18 17:20:53 -08:00
Matt Wells
1b5057ad42
log cleanups mostly.
took out disk page cache,
kinda buggy... need to fix at some point.
2013-12-18 10:57:18 -08:00
Matt Wells
2cd53386ad
parm updates
2013-12-17 09:51:08 -08:00
Matt Wells
3f19ece776
parmdb updates
2013-12-16 17:07:15 -08:00
Matt Wells
7b768d4b86
Merge branch 'diffbot' into diffbot-testing
Conflicts:
iana_charset.cpp
iana_charset.h
2013-12-12 13:01:49 -08:00
Matt Wells
16e91375f4
bring in changes from live beta from ~/github.
limit spiders to 50, not 500 to prevent oom.
resume killed merges that had num files shrunk even
if down to one file. show collnum in spider queue.
remove back-to-back whitespace, and make all space
a ' ' for getting the doc checksum for deduping.
2013-12-12 12:58:58 -08:00
mwells
76bb3d05e1
clean up logging so i can see what's going on
2013-12-10 16:41:30 -08:00
mwells
82494baa89
move CollectionRec stuff into Collectiondb files
for simplicity.
2013-12-10 15:28:04 -08:00
Matt Wells
3353a90a85
fix resuming a killed merge condition.
2013-12-08 15:50:45 -07:00
Matt Wells
a2e52a5dc3
little fix
2013-12-08 10:15:54 -07:00
Matt Wells
65e75167e3
limit posdb merging to 8 files max.
added some more url filters documentation.
2013-12-08 09:41:05 -07:00
Matt Wells
2a5d92a639
log debug update.
2013-11-21 12:37:53 -08:00
Matt Wells
c669f8c138
fix file descriptor leak in Dir class.
try to fix core from Thread getting SIGALRM.
try to set NOFILES to 1024 at startup in case
more are allowed.
2013-11-19 13:41:56 -08:00
Matt Wells
a31b13ad61
fix a few bugs.
2013-11-13 13:27:22 -08:00
Matt Wells
396a88799a
fix bad bug of basically emptying out all our data
on auto-save!
2013-11-06 19:49:20 -08:00
Matt Wells
fe97e08281
move from groups to shards. got rid of annoying
groupid bit mask thing.
2013-10-04 16:18:56 -07:00
Matt Wells
94e6492916
removed MAX_COLL_RECS so we can have unlimited
collections, really limited by the sizeof(collnum_t) only now,
which is 16 bits, 15 bits unsigned, which is the limitation.
can always expand this so we can have more than 32k collections.
2013-08-30 16:20:38 -07:00
Matt Wells
f6e560c1f4
Initial file population.
2013-08-02 13:12:24 -07:00