Commit Graph

117 Commits

Author SHA1 Message Date
Matt
09de59f026 do not store cblock, etc. tags into tagdb to save
disk space. added tagdb file cache for better performance,
less disk accesses. will help reduce disk load.
put file cache sizes in master controls and if they change
then update the cache size dynamically.
2015-09-10 12:46:00 -06:00
Matt Wells
3a522f52d3 no longer use read size based thread queues.
re-added merge disk read thread queue.
fixed attemptMergeAll().
2015-09-04 16:14:18 -07:00
Matt
a1a38bd2b2 fix attempt merge some more 2015-09-02 07:21:32 -06:00
Matt
a9854394ef attempt merge clusterdb forgotten 2015-09-01 09:16:18 -06:00
Matt Wells
6e0dfd5a23 fix merge attempts 2015-09-01 01:07:43 -07:00
Matt Wells
c766e40357 set g_errno to ENOCOLLREC if getRdbBase() returns null. 2015-08-25 11:41:17 -07:00
Matt Wells
e140b001d8 try merging 1000 collections per call to preserve cpu 2015-08-25 08:25:55 -07:00
Matt
49e9f5a827 fixes for umsg00 electric fence.
take out catdb/statsdb merging attempts.
2015-08-24 11:35:33 -06:00
Matt Wells
bb16341f51 try to fix core dumps. not sure how
mem is getting corrupted.
2015-08-22 08:52:28 -07:00
Matt Wells
74ec812959 try to fix core from adding a file that already exists.
just return an error now. hopefully merge will try again later.
also core if you try to write recs to an rdbmap that
has already had its memory footprint reduced so we can find
that overrun bug.
2015-08-21 14:00:40 -07:00
Matt Wells
f8fb266844 fix new merging algo. 2015-08-16 10:11:21 -07:00
Matt Wells
178721d35b speed up getFileSize() by using stat() func again.
despam logs at startup.
do not perm check every coll dir, only first 100, on
startup to make things faster.
2015-08-15 22:21:15 -07:00
Matt
bff643b555 use a linked list of merge candidates to
make attemptMergeAll() much much faster.
2015-08-15 19:26:37 -06:00
Matt
d9422d8b0e get rid of limits on file sizes. dynamically allocate
file names and fixed-size File array in BigFile class. should
save gigabytes of memory in many-collection systems with
1+ million files or so.
2015-08-14 20:14:50 -06:00
Matt
a1ed368d82 bring back max mem control into master controls.
it's useful to limit per process mem usage to prevent
oom killer because we can't save if we get killed.
overhaul diskpagecache to just use rdbcache. much simpler
and faster, but disabled for now until debugged more.
reduce min files to merge for crawlbot collections so
they stay more tightly merged to conserve fds and mem.
improved logDebugDisk msgs.
overhauled File.cpp fd pool. now it is way faster and
doesn't use any extra mem. much simpler too. although
could be sped up a little by using a linked list, but
probably is not significant enough to warrant doing right now.
increase mem ptr table from 3M to 8M slots. should really make
dynamic though. fix core from null msg20s[0]->m_r.
only call attemptMergeAll once every 60 seconds really.
do not attempt merge if already merging.
2015-08-14 12:58:54 -06:00
Matt Wells
f8047ac5ef speed up Rdb::attemptMergeAll() because it is a problem
according to the profiler when we got tens of thousands
of collections.
2015-08-08 12:27:18 -07:00
Matt Wells
d050fb81b5 fix rebuild code to rebuild spider status docs in index,
and to remove them from titledb if user has disabled
'index spider replies' in the spider controls to save disk.
made them off by default for now since they use some disk.
2015-06-16 16:29:26 -06:00
Matt Wells
b08d12a11e fix cores associated with new spider status docs. 2015-04-07 10:33:54 -07:00
Matt
90456222b6 now we add the spider status docs as json documents.
so you can facet/sortby the various fields, etc.
2015-03-19 16:17:36 -06:00
Matt
7cf549bf2a fix spider request overflow/dropping algo. 2015-03-10 13:07:00 -07:00
Matt Wells
eccb969e5b put in some fixes to deal with doledb tree
that seems to have m_data[i] and m_data[j]
pointing to the same thing. wtf? anyway,
deal with that. it should fix the tree or
something automatically at startup?
2015-03-08 20:36:13 -07:00
Matt Wells
79879976fa try to fix a couple cores. one when parsing
bad json. the other in reclaiming doledb tree mem.
2015-03-08 08:56:10 -07:00
mwells
ada18e648b try to fix core in reclaiming doledb mem 2015-02-20 08:11:49 -07:00
Matt
24eac820d5 fixed bad deletenode call causing dups in
winnertree.
2015-02-12 16:12:23 -08:00
Matt
735667be22 fixed Rdb::reclaimMemFromDeletedTreeNodes() 2015-02-12 14:23:16 -08:00
Matt
c8fb1af5c4 added tree mem reclaimer for doledb since it
is now a tree-only rdb.
2015-02-12 12:12:25 -08:00
Matt
f6723ddaa3 new much faster spider. cache the winner tree
basically. TODO: need to update cache if
new spiderrequests are added that should be
in the cached winner tree.
2015-02-10 21:27:21 -08:00
Matt
12cdc7c9d4 more spider speed ups based on profiler data.
added Rdb::getCollNumTotalRecs() function.
2015-02-10 12:00:04 -08:00
mwells
87285ba3cd use gbmemcpy not memcpy so we can get profiler working again
since memcpy can't be interrupted and backtrace() called.
2015-01-13 12:25:42 -07:00
Matt Wells
f52e163fb0 fix a couple bugs.
added out of sync indicator.
2014-12-17 14:28:32 -08:00
Matt
2977845375 simplify Inlinks class in LinkInfo.cpp.
fix some more 64-bit related cores.
2014-11-18 16:50:31 -08:00
Matt
931a1c4bc6 good checkpoint. quite a few fixes. 2014-11-17 18:13:36 -08:00
Matt
4c19453ea9 working with -m32 for basic testing.
compiles for 64-bit.
2014-11-12 11:38:37 -08:00
Matt
96b8197ad3 now it compiles with -m32 2014-11-10 14:45:11 -08:00
Matt Wells
e7dd8f7956 replace long long with int64_t 2014-10-30 13:36:39 -06:00
Matt Wells
98ce40967f more collection swapping fixes 2014-09-29 21:52:58 -07:00
Matt Wells
8c6d216a14 lots of fixes for collection swapping. 2014-09-29 20:16:39 -07:00
mwells
bca24fb0e6 fix collection swap logic a bunch. seems to work now. 2014-09-29 13:05:20 -07:00
mwells
257a7e3c10 first stab at swapping out collection recs
to save memory when # of collections is high
2014-09-29 11:37:05 -07:00
mwells
10f897e5be use gbsystem() not system() so it can turn off alarms
since it forks.
2014-09-11 05:01:55 -07:00
mwells
38cef7d52e fix # docs and recs bug. 2014-08-28 07:45:43 -07:00
mwells
c3699f0da5 fix bugs found from qa tests. 2014-08-25 14:34:30 -07:00
mwells
e45c0d32f6 Merge branch 'diffbot-testing' into testing 2014-08-15 17:05:22 -07:00
Matt Wells
2af299da2c various fixes.
prioritize process only urls over crawl urls to get data faster.
do not merge on high negative rec concentration. we need to fix that more.
allow simplified redirs again for custom crawls to avoid too many dups.
raise crawlinfo delay from 1 sec to 5 secs to reduce network usage
for now. add back in injection enabled parm, but hidden.
2014-08-15 10:27:50 -07:00
Matt Wells
d0bc187a77 more core fixes. more stability. 2014-07-16 12:52:51 -07:00
Matt Wells
6b797f5023 more core stability fixes. prevent core dumps 2014-07-16 12:07:39 -07:00
Matt Wells
8ac691f324 fix merging getting clogged by so many
collections trying to merge tagdb at once
2014-06-05 21:27:33 -07:00
Matt Wells
4298e4e752 sanity checks for debugging duplicate
titledb file bug.
2014-06-04 12:15:12 -07:00
mwells
45b8bb3421 log msg cleanups 2014-05-11 21:55:44 -07:00
mwells
6e922722da tree repair logic. 2014-05-10 12:32:01 -07:00