Matt Wells
136b8842db
fix more data corruption bugs. hopefully
will dump out all the collections this time and
not leave any in the tree, otherwise, especially if there
are a lot left behind, they get corrupted.
2016-03-20 21:04:01 -07:00
Matt Wells
61ef806dea
hash bang fix.
detect more corruption.
don't dump titledb and spiderdb at same time,
seems to reduce corruption in rdbmem.
2016-03-20 12:50:43 -07:00
Matt Wells
fc495a5bf5
fix dump core when collection deleted while dumping
2016-03-18 06:46:38 -07:00
Matt Wells
8bc653c31c
after dump completes scan tree to ensure all nodes
reference secondary mem ptr so they don't get their
data overwritten.
2016-03-17 10:09:49 -07:00
Matt Wells
8a65d21371
fix the source of lots of corruption in spiderdb and titledb.
rdbmem.cpp was storing in secondary mem which got reset when
dump completed. also do not add keys that are in collnum and
key range of list currently being dumped, return ETRYAGAIN.
added verify writes parm. clean out tree of titledb and spiderdb
corruption on startup.
2016-03-15 15:54:12 -07:00
Matt Wells
1d2dfe1456
bring back max doc len parms.
index gbssIsContentTruncated field.
fix 30-day wait for >= 3 errors.
fix gbss formatting some more.
2016-02-08 14:10:04 -08:00
Matt Wells
b049554531
added some more quickpolls.
improved heartbeat log msg.
timed pthread_join.
brought back max heart beat delay parm.
2015-12-04 09:02:03 -08:00
Matt Wells
b8d57dcd3a
fix bug of dumping too many files to disk and not
being able to merge, and corrupting RdbBase::m_files[]
array and associated arrays.
2015-11-17 09:52:41 -08:00
Matt
37cc4f2ba8
Merge branch 'diffbot-testing' into testing
2015-11-09 11:13:42 -07:00
Matt Wells
3db9ae5d4d
rebuild fix
2015-11-07 13:14:38 -08:00
Matt
e57e3481b4
fix innerloop strangeness when counting keys in buckets
2015-10-14 13:52:42 -06:00
Matt
100888d691
fix file/dir creation permissions bugs
2015-09-21 12:44:41 -06:00
Matt
74cde33a3a
just use the user's umask val for all file/dir creation
2015-09-21 11:33:38 -06:00
Matt
ce7b06fc4d
all files made are now group writable.
if you don't like that then you can make
a special group and set the directory just
group writable for that group using chmod g+s <dir>.
2015-09-21 11:19:34 -06:00
Matt
09de59f026
do not store cblock, etc. tags into tagdb to save
disk space. added tagdb file cache for better performance,
less disk accesses. will help reduce disk load.
put file cache sizes in master controls and if they change
then update the cache size dynamically.
2015-09-10 12:46:00 -06:00
Matt Wells
3a522f52d3
no longer use read size based thread queues.
re-added merge disk read thread queue.
fixed attemptMergeAll().
2015-09-04 16:14:18 -07:00
Matt
a1a38bd2b2
fix attempt merge some more
2015-09-02 07:21:32 -06:00
Matt
a9854394ef
attempt merge clusterdb forgotten
2015-09-01 09:16:18 -06:00
Matt Wells
6e0dfd5a23
fix merge attempts
2015-09-01 01:07:43 -07:00
Matt Wells
c766e40357
set g_errno to ENOCOLLREC if getRdbBase() returns null.
2015-08-25 11:41:17 -07:00
Matt Wells
e140b001d8
try merging 1000 collections per call to preserve cpu
2015-08-25 08:25:55 -07:00
Matt
49e9f5a827
fixes for umsg00 electric fence.
take out catdb/statsdb merging attempts.
2015-08-24 11:35:33 -06:00
Matt Wells
bb16341f51
try to fix core dumps. not sure how
mem is getting corrupted.
2015-08-22 08:52:28 -07:00
Matt Wells
74ec812959
try to fix core from adding a file that already exists.
just return an error now. hopefully merge will try again later.
also core if you try to write recs to an rdbmap that
has already had its memory footprint reduced so we can find
that overrun bug.
2015-08-21 14:00:40 -07:00
Matt Wells
f8fb266844
fix new merging algo.
2015-08-16 10:11:21 -07:00
Matt Wells
178721d35b
speed up getFileSize() by using stat() func again.
despam logs at startup.
do not perm check every coll dir, only first 100, on
startup to make things faster.
2015-08-15 22:21:15 -07:00
Matt
bff643b555
use a linked list of merge candidates to
make attemptMergeAll() much much faster.
2015-08-15 19:26:37 -06:00
Matt
d9422d8b0e
get rid of limits on file sizes. dynamically allocate
file names and fixed-size File array in BigFile class. should
save gigabytes of memory in many-collection systems with
1+ million files or so.
2015-08-14 20:14:50 -06:00
Matt
a1ed368d82
bring back max mem control into master controls.
it's useful to limit per process mem usage to prevent
oom killer because we can't save if we get killed.
overhaul diskpagecache to just use rdbcache. much simpler
and faster, but disabled for now until debugged more.
reduce min files to merge for crawlbot collections so
they stay more tightly merged to conserve fds and mem.
improved logDebugDisk msgs.
overhauled File.cpp fd pool. now it is way faster and
doesn't use any extra mem. much simpler too. although
could be sped up a little by using a linked list, but
probably is not significant enough to warrant doing right now.
increase mem ptr table from 3M to 8M slots. should really make
dynamic though. fix core from null msg20s[0]->m_r.
only call attemptMergeAll once every 60 seconds really.
do not attempt merge if already merging.
2015-08-14 12:58:54 -06:00
Matt Wells
f8047ac5ef
speed up Rdb::attemptMergeAll() because it is a problem
according to the profiler when we got tens of thousands
of collections.
2015-08-08 12:27:18 -07:00
Matt Wells
d050fb81b5
fix rebuild code to rebuild spider status docs in index,
and to remove them from titledb if user has disabled
'index spider replies' in the spider controls to save disk.
made them off by default for now since they use some disk.
2015-06-16 16:29:26 -06:00
Matt Wells
b08d12a11e
fix cores associated with new spider status docs.
2015-04-07 10:33:54 -07:00
Matt
90456222b6
now we add the spider status docs as json documents.
so you can facet/sortby the various fields, etc.
2015-03-19 16:17:36 -06:00
Matt
7cf549bf2a
fix spider request overflow/dropping algo.
2015-03-10 13:07:00 -07:00
Matt Wells
eccb969e5b
put in some fixes to deal with doledb tree
that seems to have m_data[i] and m_data[j]
pointing to the same thing. wtf? anyway,
deal with that. it should fix the tree or
something automatically at startup?
2015-03-08 20:36:13 -07:00
Matt Wells
79879976fa
try to fix a couple cores. one when parsing
bad json. the other in reclaiming doledb tree mem.
2015-03-08 08:56:10 -07:00
mwells
ada18e648b
try to fix core in reclaiming doledb mem
2015-02-20 08:11:49 -07:00
Matt
24eac820d5
fixed bad deletenode call causing dups in
winnertree.
2015-02-12 16:12:23 -08:00
Matt
735667be22
fixed Rdb::reclaimMemFromDeletedTreeNodes()
2015-02-12 14:23:16 -08:00
Matt
c8fb1af5c4
added tree mem reclaimer for doledb since it
is now a tree-only rdb.
2015-02-12 12:12:25 -08:00
Matt
f6723ddaa3
new much faster spider. cache the winner tree
basically. TODO: need to update cache if
new spiderrequests are added that should be
in the cached winner tree.
2015-02-10 21:27:21 -08:00
Matt
12cdc7c9d4
more spider speed ups based on profiler data.
added Rdb::getCollNumTotalRecs() function.
2015-02-10 12:00:04 -08:00
mwells
87285ba3cd
use gbmemcpy not memcpy so we can get profiler working again
since memcpy can't be interrupted and backtrace() called.
2015-01-13 12:25:42 -07:00
Matt Wells
f52e163fb0
fix a couple bugs.
added out of sync indicator.
2014-12-17 14:28:32 -08:00
Matt
2977845375
simplify Inlinks class in LinkInfo.cpp.
fix some more 64-bit related cores.
2014-11-18 16:50:31 -08:00
Matt
931a1c4bc6
good checkpoint. quite a few fixes.
2014-11-17 18:13:36 -08:00
Matt
4c19453ea9
working with -m32 for basic testing.
compiles for 64-bit.
2014-11-12 11:38:37 -08:00
Matt
96b8197ad3
now it compiles with -m32
2014-11-10 14:45:11 -08:00
Matt Wells
e7dd8f7956
replace long long with int64_t
2014-10-30 13:36:39 -06:00
Matt Wells
98ce40967f
more collection swapping fixes
2014-09-29 21:52:58 -07:00