Commit Graph

261 Commits

Author SHA1 Message Date
Matt Wells
34b33f478a added gb rwtest and exposed seektest and thrutest in gb -h.
use -o sync when mounting ssds to avoid really slow and spiky
linux file/page cache. allow launching of more than 1 non-disk
thread again. should help with unlinking, intersects, etc.
2015-11-30 21:29:17 -07:00
Matt
b92853ae50 update built-in gb cmd line tests for ssd performance. 2015-11-30 18:47:44 -07:00
Matt
c37ab2697e Merge branch 'ia' into testing
Conflicts:
	Parms.cpp
	Threads.cpp
2015-10-12 10:40:16 -06:00
Matt
1708d0608c some fixes for detecting corrupted injection requests.
seems to be very common.
2015-10-07 21:47:10 -06:00
Matt
e2fad81227 Merge branch 'testing' of github.com:gigablast/open-source-search-engine into testing 2015-09-25 08:24:54 -06:00
Matt
ce7b06fc4d all files made are now group writable.
if you don't like that then you can make
a special group and set the directory just
group writable for that group using chmod g+s <dir>.
2015-09-21 11:19:34 -06:00
Matt
786ba76d10 Merge branch 'ia-zak' into testing
Conflicts:
	main.cpp
2015-09-21 10:12:58 -06:00
Zak Betz
519b2c4f42 Fix repeating xn--xn-- when there are spaces in the domain.
Make gb unittest take a name of the unit test to run.
2015-09-14 10:24:22 -06:00
Zak Betz
5622ca47ee Work on non-ascii domain names. It works on correct inputs, but
will crash on some incorrect inputs, so it is forced to be disabled.
2015-09-14 00:34:44 -06:00
Zak Betz
911b2837ca Merge branch 'testing' of https://github.com/gigablast/open-source-search-engine into testing
Conflicts:
	Makefile
	Spider.cpp
2015-09-12 15:51:59 -06:00
Zak Betz
aefd8772cf Merge branch 'ia-zak' of https://github.com/gigablast/open-source-search-engine into ia-zak 2015-09-10 21:32:36 -06:00
Matt
09de59f026 do not store cblock, etc. tags into tagdb to save
disk space. added tagdb file cache for better performance,
less disk accesses. will help reduce disk load.
put file cache sizes in master controls and if they change
then update the cache size dynamically.
2015-09-10 12:46:00 -06:00
Zak Betz
ae58ab98fb Merge branch 'ia-zak' of https://github.com/gigablast/open-source-search-engine into ia-zak 2015-09-01 08:48:38 -06:00
Matt Wells
34a4068ddb fix gb start script 2015-09-01 01:17:14 -07:00
Zak Betz
b199c67355 Merge branch 'ia-zak' of https://github.com/gigablast/open-source-search-engine into ia-zak 2015-08-31 23:19:45 -06:00
Zak Betz
5a7b01585d Work on graph axis autoscaling. 2015-08-31 23:19:28 -06:00
Matt
efa93aad18 prevent double ./gb start calls from messing
things up.
2015-08-31 11:13:33 -06:00
Zak Betz
60c4c5c437 Add nospider and noquery options. 2015-08-25 13:48:20 -06:00
Matt Wells
6c14d659b8 move 2nd occurrence of same collnum_t collection id
on the same shard to the trash/ subdir.
put call to syncParmsWithHost0 in a sleep loop in case
host #0 has error, although the timeout is really high.
2015-08-18 18:59:01 -07:00
Matt Wells
9642947136 fix so host #0 will delete then re-add collections
that use the same collnum but have a different name.
fixed some unlabelled safebufs.
fix core when deleting collnum from tree/buckets that
is higher than Collectiondb.m_numRecs.
fix File::m_filename safebufs that were not freed on exit.
2015-08-18 14:09:16 -07:00
Matt Wells
178721d35b speed up getFileSize() by using stat() func again.
despam logs at startup.
do not perm check every coll dir, only first 100, on
startup to make things faster.
2015-08-15 22:21:15 -07:00
Matt
bff643b555 use a linked list of merge candidates to
make attemptMergeAll() much much faster.
2015-08-15 19:26:37 -06:00
Matt
a1ed368d82 bring back max mem control into master controls.
it's useful to limit per process mem usage to prevent
oom killer because we can't save if we get killed.
overhaul diskpagecache to just use rdbcache. much simpler
and faster, but disabled for now until debugged more.
reduce min files to merge for crawlbot collections so
they stay more tightly merged to conserve fds and mem.
improved logDebugDisk msgs.
overhauled File.cpp fd pool. now it is way faster and
doesn't use any extra mem. much simpler too. although
could be sped up a little by using a linked list, but
probably is not significant enough to warrant doing right now.
increase mem ptr table from 3M to 8M slots. should really make
dynamic though. fix core from null msg20s[0]->m_r.
only call attemptMergeAll once every 60 seconds really.
do not attempt merge if already merging.
2015-08-14 12:58:54 -06:00
Matt
5c67cbe65d undo 2015-08-12 08:43:44 -07:00
Matt
444ebeeb65 one scp install per host 2015-08-12 08:39:01 -07:00
Matt Wells
f8047ac5ef speed up Rdb::attemptMergeAll() because it is a problem
according to the profiler when we got tens of thousands
of collections.
2015-08-08 12:27:18 -07:00
Matt
16fd428887 fix more cores from the dynamic query size changes.
add how many query terms we truncated in the json/xml replies.
document those fields as well.
2015-07-18 14:15:47 -06:00
Matt
815bd7ce0a quite a few bug fixes. 2015-07-02 17:42:05 -06:00
Matt Wells
d050fb81b5 fix rebuild code to rebuild spider status docs in index,
and to remove them from titledb if user has disabled
'index spider replies' in the spider controls to save disk.
made them off by default for now since they use some disk.
2015-06-16 16:29:26 -06:00
Matt Wells
7462b0cd84 gb -h fix 2015-04-22 12:51:32 -06:00
Matt
4a43e1387e better fixes for core from sig alarms 2015-04-13 10:28:43 -06:00
Matt
f5a7423336 fix bug of never calling callback 2015-04-13 09:56:21 -06:00
Matt
2ce107e4be keep track of how many times the host exited/cored as an exponent
to the 'x' in the hosts table. this way we can detect hosts that
have restarted many times and fix them.
2015-04-01 16:28:58 -06:00
Matt
2839c38dac warc injection fixes 2015-03-07 15:01:47 -08:00
Matt Wells
56d65a7c55 adjust dump tagdb cmdline cmd to start at
a specified site to aid us in fixing
sitelinks.txt missing some sites bug.
2015-03-05 19:27:36 -08:00
Matt Wells
93b505e7bb fix isCollAdmin() function to return false
if not using coll passwords. they'll have to
be master admin.
2015-03-02 07:47:05 -08:00
Matt
064d022d6f call mkdir on 'gb install' cmd. 2015-02-25 19:49:36 -07:00
Matt
24eac820d5 fixed bad deletenode call causing dups in
winnertree.
2015-02-12 16:12:23 -08:00
Matt
c009430b6c more fixes for new spider updates 2015-02-11 21:54:36 -08:00
Matt Wells
01687fcb0e fix gb thrutest disk tests 2015-02-09 10:29:08 -08:00
Matt
afbe35c5a9 Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing 2015-02-07 12:07:52 -08:00
Matt
580736d766 support arc injections 2015-02-07 12:07:42 -08:00
Matt
6c1c2c66c4 added dstart to gb -h help menu 2015-02-05 12:39:13 -08:00
Matt
93fce690d6 more speedups. do not call sigprocmask in main thread
before pthread_create(). instead call pthread_sigmask()
from thread like we were doing already for SIGINT.
2015-02-03 13:39:23 -08:00
Matt
79a1d632cd need to have sitelinks.txt present in dir. 2015-01-31 22:58:05 -07:00
Matt
0c3ad724f8 Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing 2015-01-31 15:18:30 -07:00
Matt
cad1d3d076 added support for sitelinks.txt file 2015-01-31 15:18:06 -07:00
Matt
a87b582145 little fix 2015-01-29 19:26:15 -07:00
Matt Wells
ec55540432 fix gb dump sitelinks 2015-01-25 19:33:31 -08:00
Matt Wells
7c4a625779 fix dumping tagdb for sitelinks.txt 2015-01-25 18:04:15 -08:00