87 Commits

Author SHA1 Message Date
Dmitry Smirnov
b1ace63607 codespell: spelling corrections 2021-05-06 01:52:55 +10:00
Matt Wells
ec5c38bab5 fix urgent merge mode bug some more?
limit spiders to 5 per custom crawl coll per shard.
2015-11-24 08:51:18 -08:00
Matt Wells
b8d57dcd3a fix bug of dumping too many files to disk and not
being able to merge, and corrupting RdbBase::m_files[]
array and associated arrays.
2015-11-17 09:52:41 -08:00
Matt
100888d691 fix file/dir creation permissions bugs 2015-09-21 12:44:41 -06:00
Matt
74cde33a3a just use the user's umask val for all file/dir creation 2015-09-21 11:33:38 -06:00
Matt
ce7b06fc4d all files made are now group writable.
if you don't like that then you can make
a special group and set the directory just
group writable for that group using chmod g+s <dir>.
2015-09-21 11:19:34 -06:00
Matt
09de59f026 do not store cblock, etc. tags into tagdb to save
disk space. added tagdb file cache for better performance,
less disk accesses. will help reduce disk load.
put file cache sizes in master controls and if they change
then update the cache size dynamically.
2015-09-10 12:46:00 -06:00
Matt Wells
647d004c04 fix core from sending a url alert, then customer deleting
collection before email alert reply comes back. then it
comes back to a deleted collrec and cores.
2015-09-08 15:57:46 -07:00
Matt Wells
90f79a31e1 prevent log spam 2015-09-04 16:49:45 -07:00
Matt Wells
3a522f52d3 no longer use read size based thread queues.
re-added merge disk read thread queue.
fixed attemptMergeAll().
2015-09-04 16:14:18 -07:00
Matt Wells
bb16341f51 try to fix core dumps. not sure how
mem is getting corrupted.
2015-08-22 08:52:28 -07:00
Matt Wells
74ec812959 try to fix core from adding a file that already exists.
just return an error now. hopefully merge will try again later.
also core if you try to write recs to an rdbmap that
has already had its memory footprint reduced so we can find
that overrun bug.
2015-08-21 14:00:40 -07:00
Matt Wells
3a67480b63 for BigFile::m_fileBuf array of Files make sure
to clear it for files that do not exist so
File::m_calledSet is false on them. so BigFile::getFile(j)
returns a File ptr whose m_calledSet is false if the file
does not exist on disk. and BigFile::removePart(j) sets
((File *)m_fileBuf.m_bufStart)[j].m_calledSet = false.
2015-08-16 19:40:08 -07:00
Matt Wells
e671be17ca fix log msg 2015-08-16 17:14:21 -07:00
Matt
bff643b555 use a linked list of merge candidates to
make attemptMergeAll() much much faster.
2015-08-15 19:26:37 -06:00
Matt
d9422d8b0e get rid of limits on file sizes. dynamically allocate
file names and fixed-size File array in BigFile class. should
save gigabytes of memory in many-collection systems with
1+ million files or so.
2015-08-14 20:14:50 -06:00
Matt
a1ed368d82 bring back max mem control into master controls.
it's useful to limit per process mem usage to prevent
oom killer because we can't save if we get killed.
overhaul diskpagecache to just use rdbcache. much simpler
and faster, but disabled for now until debugged more.
reduce min files to merge for crawlbot collections so
they stay more tightly merged to conserve fds and mem.
improved logDebugDisk msgs.
overhauled File.cpp fd pool. now it is way faster and
doesn't use any extra mem. much simpler too. although
could be sped up a little by using a linked list, but
probably is not significant enough to warrant doing right now.
increase mem ptr table from 3M to 8M slots. should really make
dynamic though. fix core from null msg20s[0]->m_r.
only call attemptMergeAll once every 60 seconds really.
do not attempt merge if already merging.
2015-08-14 12:58:54 -06:00
Matt Wells
840ca3fea1 fix rdbmap reduce mem thing 2015-08-08 15:43:09 -07:00
Matt Wells
f8047ac5ef speed up Rdb::attemptMergeAll() because it is a problem
according to the profiler when we got tens of thousands
of collections.
2015-08-08 12:27:18 -07:00
Matt Wells
68d04b239f auto move dat/map files we can't regen map for
to trash subdir. later: try to repair them better.
2015-06-17 06:55:49 -07:00
Matt Wells
5f84ad2c5d raise mem table ptrs from 1.2M to 3M. shard 22 was suffering
really slow mem ops because of it.
2015-06-14 11:34:13 -07:00
Matt Wells
0c88ebba9b removed buggy close least used linked list logic.
was causing data corruption in reads and writes.
go to urgent shutdown mode if on 10th try so gb
will actually exit. do not startup if there is
critical data corruption.
2015-04-14 15:26:46 -07:00
Matt Wells
05a66cc367 fix bug of not able to get ip address because
peeksize is too big.
2015-04-07 12:29:19 -07:00
Matt Wells
b08d12a11e fix cores associated with new spider status docs. 2015-04-07 10:33:54 -07:00
Matt Wells
051a8f0ad6 don't attempt merge in quickpoll. just return, do not
core.
2015-03-02 07:26:38 -08:00
mwells
87285ba3cd use gbmemcpy not memcpy so we can get profiler working again
since memcpy can't be interrupted and backtrace() called.
2015-01-13 12:25:42 -07:00
Matt Wells
b693fe1530 fix bugs related to restarting a cored shard
during repair mode. need to be able to resume
repair/rebuild scan.
2015-01-06 11:28:55 -08:00
Matt Wells
d19ee6ceea Merge branch 'diffbot' into diffbot-testing
Conflicts:
	Collectiondb.h
2014-12-11 08:40:55 -08:00
Matt Wells
7d67f104fb emergency fixes 2014-12-11 08:39:26 -08:00
mwells
4f71a95da5 reinstantiate linkdb min files to merge parm. 2014-12-11 07:20:15 -07:00
Matt
5b92b5f6d5 now term freqs are almost exact for qatest123.
sometimes an off by 1 bug. we should really call
msg5 to get the list w/o thread and get a truly
exact term freq for qatest123 for consistency.
that would be in Posdb.cpp::getTermFreq()
2014-11-25 15:54:15 -07:00
Matt
adcef39376 Merge branch 'diffbot-testing' into diffbot-matt
Conflicts:
	Collectiondb.cpp
	Collectiondb.h
	Conf.cpp
	Conf.h
	Msg39.cpp
	PageEvents.cpp
	PageResults.cpp
	PageTurk.cpp
	Pages.cpp
	Parms.cpp
	Posdb.cpp
	Proxy.cpp
	Query.cpp
	Query.h
	RdbBase.cpp
	RdbMap.cpp
	Repair.cpp
	Repair.h
	SafeBuf.cpp
	Spider.cpp
	Tagdb.cpp
	TopTree.cpp
	XmlDoc.cpp
	main.cpp
2014-11-20 16:53:07 -08:00
Matt
bf2013345d fix up diskpagecache. how did it work before
without storing the vfd? because linked list was
over many different vfds, but the map from one diskpage
to a mem offset was specific to each vfd.
2014-11-20 15:05:34 -08:00
Matt
d9f129dcf7 fix "4" bug in RdbBase.cpp 2014-11-18 11:58:53 -08:00
Matt
4e8a42e024 text replacements for bad int32_t substitutions 2014-11-17 18:24:38 -08:00
Matt
931a1c4bc6 good checkpoint. quite a few fixes. 2014-11-17 18:13:36 -08:00
Matt
96b8197ad3 now it compiles with -m32 2014-11-10 14:45:11 -08:00
Matt Wells
444ed14cde reduce mem usage in rdbmap. useful
for when there are thousands of tiny collections.
2014-11-07 08:49:08 -08:00
Matt Wells
e7dd8f7956 replace long long with int64_t 2014-10-30 13:36:39 -06:00
mwells
bca24fb0e6 fix collection swap logic a bunch. seems to work now. 2014-09-29 13:05:20 -07:00
Matt Wells
fbd288d2a5 fix infinite loop when rebalancing.
was trying to read more bytes from titlerec than
what it could support.
2014-09-11 12:11:34 -07:00
mwells
caee238c46 fixes to make easier to compile on mac os x. 2014-08-28 12:55:02 -07:00
Matt Wells
2af299da2c various fixes.
prioritize process only urls over crawl urls to get data faster.
do not merge on high negative rec concentration. we need to fix that more.
allow simplified redirs again for custom crawls to avoid too many dups.
raise crawlinfo delay from 1 sec to 5 secs to reduce network usage
for now. add back in injection enabled parm, but hidden.
2014-08-15 10:27:50 -07:00
Matt Wells
56af753c3e fixed nasty bug of resetting RdbBases for
random collnums, causing data loss and corruption.
2014-06-09 10:16:29 -07:00
Matt Wells
172d7071a7 fix to rename tagdb0000.002.dat 2014-06-05 22:21:41 -07:00
Matt Wells
8ac691f324 fix merging getting clogged by so many
collections trying to merge tagdb at once
2014-06-05 21:27:33 -07:00
Matt Wells
7b4b8b27bd more debug msgs 2014-06-05 14:58:20 -07:00
Matt Wells
1fe2c94322 add some debug notes 2014-06-05 12:26:06 -07:00
Matt Wells
4298e4e752 sanity checks for debugging duplicate
titledb file bug.
2014-06-04 12:15:12 -07:00
Matt Wells
edbd61b0c5 thread fixes. if pthread_create fails then
keep thread queue and just return. will try to
relaunch later. do not count delete keys towards
shard rebalance count.
2014-03-15 20:07:02 -07:00