131 Commits

Author SHA1 Message Date
Matt Wells
136b8842db fix more data corruption bugs. hopefully
will dump out all the collections this time and
not leave any in the tree, otherwise, especially if there
are a lot left behind, they get corrupted.
2016-03-20 21:04:01 -07:00
Matt Wells
61ef806dea hash bang fix.
detect more corruption.
don't dump titledb and spiderdb at same time,
seems to reduce corruption in rdbmem.
2016-03-20 12:50:43 -07:00
Matt Wells
fc495a5bf5 fix dump core when collection deleted while dumping 2016-03-18 06:46:38 -07:00
Matt Wells
8bc653c31c after dump completes scan tree to ensure all nodes
reference secondary mem ptr so they don't get their
data overwritten.
2016-03-17 10:09:49 -07:00
Matt Wells
8a65d21371 fix the source of lots of corruption in spiderdb and titledb.
rdbmem.cpp was storing in secondary mem which got reset when
dump completed. also do not add keys that are in collnum and
key range of list currently being dumped, return ETRYAGAIN.
added verify writes parm. clean out tree of titledb and spiderdb
corruption on startup.
2016-03-15 15:54:12 -07:00
Matt Wells
1d2dfe1456 bring back max doc len parms.
index gbssIsContentTruncated field.
fix 30-day wait for >= 3 errors.
fix gbss formatting some more.
2016-02-08 14:10:04 -08:00
Matt Wells
b049554531 added some more quickpolls.
improved heartbeat log msg.
timed pthread_join.
brought back max heart beat delay parm.
2015-12-04 09:02:03 -08:00
Matt Wells
b8d57dcd3a fix bug of dumping too many files to disk and not
being able to merge, and corrupting RdbBase::m_files[]
array and associated arrays.
2015-11-17 09:52:41 -08:00
Matt
37cc4f2ba8 Merge branch 'diffbot-testing' into testing 2015-11-09 11:13:42 -07:00
Matt Wells
3db9ae5d4d rebuild fix 2015-11-07 13:14:38 -08:00
Matt
e57e3481b4 fix innerloop strangeness when counting keys in buckets 2015-10-14 13:52:42 -06:00
Matt
100888d691 fix file/dir creation permissions bugs 2015-09-21 12:44:41 -06:00
Matt
74cde33a3a just use the user's umask val for all file/dir creation 2015-09-21 11:33:38 -06:00
Matt
ce7b06fc4d all files made are now group writable.
if you don't like that then you can make
a special group and set the directory just
group writable for that group using chmod g+s <dir>.
2015-09-21 11:19:34 -06:00
Matt
09de59f026 do not store cblock, etc. tags into tagdb to save
disk space. added tagdb file cache for better performance,
fewer disk accesses. will help reduce disk load.
put file cache sizes in master controls and if they change
then update the cache size dynamically.
2015-09-10 12:46:00 -06:00
Matt Wells
3a522f52d3 no longer use read size based thread queues.
re-added merge disk read thread queue.
fixed attemptMergeAll().
2015-09-04 16:14:18 -07:00
Matt
a1a38bd2b2 fix attempt merge some more 2015-09-02 07:21:32 -06:00
Matt
a9854394ef attempt merge clusterdb forgotten 2015-09-01 09:16:18 -06:00
Matt Wells
6e0dfd5a23 fix merge attempts 2015-09-01 01:07:43 -07:00
Matt Wells
c766e40357 set g_errno to ENOCOLLREC if getRdbBase() returns null. 2015-08-25 11:41:17 -07:00
Matt Wells
e140b001d8 try merging 1000 collections per call to preserve cpu 2015-08-25 08:25:55 -07:00
Matt
49e9f5a827 fixes for umsg00 electric fence.
take out catdb/statsdb merging attempts.
2015-08-24 11:35:33 -06:00
Matt Wells
bb16341f51 try to fix core dumps. not sure how
mem is getting corrupted.
2015-08-22 08:52:28 -07:00
Matt Wells
74ec812959 try to fix core from adding a file that already exists.
just return an error now. hopefully merge will try again later.
also core if you try to write recs to an rdbmap that
has already had its memory footprint reduced so we can find
that overrun bug.
2015-08-21 14:00:40 -07:00
Matt Wells
f8fb266844 fix new merging algo. 2015-08-16 10:11:21 -07:00
Matt Wells
178721d35b speed up getFileSize() by using stat() func again.
despam logs at startup.
do not perm check every coll dir, only first 100, on
startup to make things faster.
2015-08-15 22:21:15 -07:00
Matt
bff643b555 use a linked list of merge candidates to
make attemptMergeAll() much much faster.
2015-08-15 19:26:37 -06:00
Matt
d9422d8b0e get rid of limits on file sizes. dynamically allocate
file names and fixed-size File array in BigFile class. should
save gigabytes of memory in many-collection systems with
1+ million files or so.
2015-08-14 20:14:50 -06:00
Matt
a1ed368d82 bring back max mem control into master controls.
it's useful to limit per process mem usage to prevent
oom killer because we can't save if we get killed.
overhaul diskpagecache to just use rdbcache. much simpler
and faster, but disabled for now until debugged more.
reduce min files to merge for crawlbot collections so
they stay more tightly merged to conserve fds and mem.
improved logDebugDisk msgs.
overhauled File.cpp fd pool. now it is way faster and
doesn't use any extra mem. much simpler too. although
could be sped up a little by using a linked list, but
probably is not significant enough to warrant doing right now.
increase mem ptr table from 3M to 8M slots. should really make
dynamic though. fix core from null msg20s[0]->m_r.
only call attemptMergeAll once every 60 seconds really.
do not attempt merge if already merging.
2015-08-14 12:58:54 -06:00
Matt Wells
f8047ac5ef speed up Rdb::attemptMergeAll() because it is a problem
according to the profiler when we have tens of thousands
of collections.
2015-08-08 12:27:18 -07:00
Matt Wells
d050fb81b5 fix rebuild code to rebuild spider status docs in index,
and to remove them from titledb if user has disabled
'index spider replies' in the spider controls to save disk.
made them off by default for now since they use some disk.
2015-06-16 16:29:26 -06:00
Matt Wells
b08d12a11e fix cores associated with new spider status docs. 2015-04-07 10:33:54 -07:00
Matt
90456222b6 now we add the spider status docs as json documents.
so you can facet/sortby the various fields, etc.
2015-03-19 16:17:36 -06:00
Matt
7cf549bf2a fix spider request overflow/dropping algo. 2015-03-10 13:07:00 -07:00
Matt Wells
eccb969e5b put in some fixes to deal with doledb tree
that seems to have m_data[i] and m_data[j]
pointing to the same thing. wtf? anyway,
deal with that. it should fix the tree or
something automatically at startup?
2015-03-08 20:36:13 -07:00
Matt Wells
79879976fa try to fix a couple cores. one when parsing
bad json. the other in reclaiming doledb tree mem.
2015-03-08 08:56:10 -07:00
mwells
ada18e648b try to fix core in reclaiming doledb mem 2015-02-20 08:11:49 -07:00
Matt
24eac820d5 fixed bad deletenode call causing dups in
winnertree.
2015-02-12 16:12:23 -08:00
Matt
735667be22 fixed Rdb::reclaimMemFromDeletedTreeNodes() 2015-02-12 14:23:16 -08:00
Matt
c8fb1af5c4 added tree mem reclaimer for doledb since it
is now a tree-only rdb.
2015-02-12 12:12:25 -08:00
Matt
f6723ddaa3 new much faster spider. cache the winner tree
basically. TODO: need to update cache if
new spiderrequests are added that should be
in the cached winner tree.
2015-02-10 21:27:21 -08:00
Matt
12cdc7c9d4 more spider speed ups based on profiler data.
added Rdb::getCollNumTotalRecs() function.
2015-02-10 12:00:04 -08:00
mwells
87285ba3cd use gbmemcpy not memcpy so we can get profiler working again
since memcpy can't be interrupted and backtrace() called.
2015-01-13 12:25:42 -07:00
Matt Wells
f52e163fb0 fix a couple bugs.
added out of sync indicator.
2014-12-17 14:28:32 -08:00
Matt
2977845375 simplify Inlinks class in LinkInfo.cpp.
fix some more 64-bit related cores.
2014-11-18 16:50:31 -08:00
Matt
931a1c4bc6 good checkpoint. quite a few fixes. 2014-11-17 18:13:36 -08:00
Matt
4c19453ea9 working with -m32 for basic testing.
compiles for 64-bit.
2014-11-12 11:38:37 -08:00
Matt
96b8197ad3 now it compiles with -m32 2014-11-10 14:45:11 -08:00
Matt Wells
e7dd8f7956 replace long long with int64_t 2014-10-30 13:36:39 -06:00
Matt Wells
98ce40967f more collection swapping fixes 2014-09-29 21:52:58 -07:00