Commit Graph

22 Commits

Author SHA1 Message Date
Matt Wells
178721d35b speed up getFileSize() by using stat() func again.
despam logs at startup.
do not perm check every coll dir, only first 100, on
startup to make things faster.
2015-08-15 22:21:15 -07:00
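The commit above switches getFileSize() back to stat(). Below is a minimal sketch of that technique, assuming a POSIX system. The helper name getFileSizeViaStat is hypothetical and is not Gigablast's actual File::getFileSize(). stat() reads the size from the file's metadata, so it needs no open()/lseek()/close() round trip and never consumes a file descriptor.

```cpp
#include <sys/stat.h>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not the project's real API): returns the size in
// bytes of the file at `path`, or -1 if stat() fails (e.g. the file is missing).
// No file descriptor is opened, which matters when the fd pool is under pressure.
int64_t getFileSizeViaStat(const std::string &path) {
    struct stat st;
    if (::stat(path.c_str(), &st) != 0) {
        return -1;
    }
    return static_cast<int64_t>(st.st_size);
}
```

Returning -1 on failure mirrors a common C-style convention. The caller can inspect errno for the reason.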
Matt
d9422d8b0e get rid of limits on file sizes. dynamically allocate
file names and fixed-size File array in BigFile class. should
save gigabytes of memory in many-collection systems with
1+ million files or so.
2015-08-14 20:14:50 -06:00
Matt
a1ed368d82 bring back max mem control in master controls.
it's useful to limit per-process mem usage so the OOM
killer does not kill us, because we can't save if we get killed.
overhaul diskpagecache to just use rdbcache. much simpler
and faster, but disabled for now until debugged more.
reduce min files to merge for crawlbot collections so
they stay more tightly merged to conserve fds and mem.
improved logDebugDisk msgs.
overhauled File.cpp fd pool. now it is way faster and
doesn't use any extra mem. much simpler too. it could
be sped up a little by using a linked list, but that is
probably not significant enough to warrant doing right now.
increase mem ptr table from 3M to 8M slots. should really make
it dynamic though. fix core from null msg20s[0]->m_r.
only call attemptMergeAll once every 60 seconds really.
do not attempt merge if already merging.
2015-08-14 12:58:54 -06:00
Matt
dfdea910ad fix fix 2015-06-19 08:50:59 -07:00
Matt Wells
0493e7a899 use linked lists for closing least used fds for speed.
right now just log if it differs from current algo.
2015-06-18 09:19:13 -07:00
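The commit above uses linked lists to pick the least recently used fd to close (the log shows this logic was later removed as buggy in 0c88ebba9b). Below is a minimal sketch of the general LRU technique, not the project's code. The class LruFdPool and its methods are hypothetical, and it only reports which file id should be closed rather than calling close() itself. A doubly linked list keeps ids ordered by last use, and a hash map points at each id's list node, so touching a file and finding the victim are both O(1) instead of a scan over every open file.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>

// Hypothetical LRU tracker for open files. Front of m_lru = most recently
// used, back = least recently used (the next one to close).
class LruFdPool {
public:
    explicit LruFdPool(std::size_t maxOpen) : m_maxOpen(maxOpen) {
        assert(maxOpen > 0);
    }

    // Marks fileId as just used. Returns the id of the file the caller
    // should close to stay within maxOpen, or -1 if nothing needs closing.
    long touch(long fileId) {
        auto it = m_pos.find(fileId);
        if (it != m_pos.end()) {
            // Already tracked: move its node to the front. splice() relinks
            // the node without invalidating the iterator stored in m_pos.
            m_lru.splice(m_lru.begin(), m_lru, it->second);
            return -1;
        }
        m_lru.push_front(fileId);
        m_pos[fileId] = m_lru.begin();
        if (m_lru.size() <= m_maxOpen) {
            return -1;
        }
        long victim = m_lru.back();  // least recently used file
        m_pos.erase(victim);
        m_lru.pop_back();
        return victim;
    }

    std::size_t openCount() const { return m_lru.size(); }

private:
    std::size_t m_maxOpen;
    std::list<long> m_lru;
    std::unordered_map<long, std::list<long>::iterator> m_pos;
};
```

Keeping the list and the map in sync on every insert and eviction is the part that is easy to get wrong, which is one plausible source of the kind of corruption the later commit describes.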
Matt
3191980f49 the new urls.csv format is ready.
added url discovered time to gbssdocs so we know when
we first found a url. also added to new urls.csv.
fixed spiderdb list deduping so it no longer discards
the oldest spider request, so we keep our
discovered time intact.
2015-04-15 12:13:27 -06:00
Matt Wells
0c88ebba9b removed buggy close least used linked list logic.
was causing data corruption in reads and writes.
go to urgent shutdown mode if on 10th try so gb
will actually exit. do not startup if there is
critical data corruption.
2015-04-14 15:26:46 -07:00
Matt
13d0361756 try to speed up host #4 on seraph 2015-04-10 09:20:18 -06:00
Matt
bc6f065457 fix getFileSize(). fix warc injector. 2015-01-20 19:12:58 -07:00
mwells
87285ba3cd use gbmemcpy not memcpy so we can get profiler working again
since memcpy can't be interrupted and backtrace() called.
2015-01-13 12:25:42 -07:00
Matt
f488be4ede make new logfile when current logfile hits 1GB.
this will save disk space so we can delete the old
log files that can be many GBs in size.
2015-01-05 11:29:49 -08:00
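The commit above starts a new logfile once the current one hits 1GB. Below is a minimal sketch of size-based log rotation under stated assumptions. The class RotatingLog, the "<base>.<n>" naming scheme, and the byte accounting are hypothetical, not the project's actual logging code. In real use maxBytes would be 1GB. Finished files are never reopened, so old ones can be deleted safely.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <utility>

// Hypothetical rotating logger: appends lines to "<base>.<n>" and opens the
// next numbered file once the current one would exceed maxBytes.
class RotatingLog {
public:
    RotatingLog(std::string base, std::uint64_t maxBytes)
        : m_base(std::move(base)), m_maxBytes(maxBytes) {
        assert(maxBytes > 0);
        openCurrent();
    }

    void write(const std::string &line) {
        std::uint64_t need = line.size() + 1;  // +1 for the trailing '\n'
        // Rotate only if the current file already holds data, so a single
        // line longer than maxBytes is still written instead of looping.
        if (m_written > 0 && m_written + need > m_maxBytes) {
            ++m_index;
            openCurrent();
        }
        m_out << line << '\n';
        m_out.flush();
        m_written += need;
    }

    int fileIndex() const { return m_index; }
    std::string currentPath() const {
        return m_base + "." + std::to_string(m_index);
    }

private:
    void openCurrent() {
        if (m_out.is_open()) {
            m_out.close();
        }
        m_out.open(currentPath(), std::ios::out | std::ios::trunc);
        m_written = 0;
    }

    std::string m_base;
    std::uint64_t m_maxBytes;
    int m_index = 0;
    std::uint64_t m_written = 0;
    std::ofstream m_out;
};
```

Counting bytes in memory avoids a stat() call per log line. The tradeoff is that the count assumes this process is the only writer to the file.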
Matt
4e8a42e024 text replacements for bad int32_t substitutions 2014-11-17 18:24:38 -08:00
Matt
931a1c4bc6 good checkpoint. quite a few fixes. 2014-11-17 18:13:36 -08:00
Matt
96b8197ad3 now it compiles with -m32 2014-11-10 14:45:11 -08:00
Matt Wells
e7dd8f7956 replace long long with int64_t 2014-10-30 13:36:39 -06:00
mwells
e10f6cdd61 cygwin fixes 2014-09-26 23:04:16 -07:00
mwells
778e67130f File::set() fix for //'s 2014-06-08 15:24:30 -07:00
Matt Wells
8ac691f324 fix merging getting clogged by so many
collections trying to merge tagdb at once
2014-06-05 21:27:33 -07:00
Matt Wells
1b5057ad42 log cleanups mostly.
took out disk page cache,
kinda buggy... need to fix at some point.
2013-12-18 10:57:18 -08:00
mwells
76bb3d05e1 clean up logging so i can see what's going on 2013-12-10 16:41:30 -08:00
mwells
d11e9520bd couple fixes to makefile etc. 2013-09-28 16:37:39 -06:00
Matt Wells
f6e560c1f4 Initial file population. 2013-08-02 13:12:24 -07:00