Matt Wells
34b33f478a
added gb rwtest and exposed seektest and thrutest in gb -h.
use -o sync when mounting ssds to avoid really slow and spiky
linux file/page cache. allow launching of more than 1 non-disk
thread again. should help with unlinking, intersects, etc.
2015-11-30 21:29:17 -07:00
Matt
b92853ae50
update built-in gb cmd line tests for ssd performance.
2015-11-30 18:47:44 -07:00
Matt
c37ab2697e
Merge branch 'ia' into testing
Conflicts:
Parms.cpp
Threads.cpp
2015-10-12 10:40:16 -06:00
Matt
1708d0608c
some fixes for detecting corrupted injection requests.
seems to be very common.
2015-10-07 21:47:10 -06:00
Matt
e2fad81227
Merge branch 'testing' of github.com:gigablast/open-source-search-engine into testing
2015-09-25 08:24:54 -06:00
Matt
ce7b06fc4d
all files made are now group writable.
if you don't like that then you can make
a special group and set the directory just
group writable for that group using chmod g+s <dir>.
2015-09-21 11:19:34 -06:00
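The commit does not say how gb makes its files group writable. A minimal sketch, assuming it is done by clearing only the "other" write bit in the process umask at startup; `makeNewFilesGroupWritable()` and `effectiveMode()` are hypothetical names used for illustration.

```cpp
#include <sys/stat.h>   // umask(), mode_t, S_I* permission bits
#include <sys/types.h>
#include <cassert>

// Permission bits a new file ends up with: the kernel clears every bit
// that is set in the process umask from the mode passed to open()/mkdir().
mode_t effectiveMode(mode_t requested, mode_t mask) {
    return requested & ~mask;
}

// Hypothetical startup hook (assumption, not gb's code): clear only the
// "other" write bit, so files created with mode 0666 come out 0664
// (group writable) and directories created with 0777 come out 0775.
mode_t makeNewFilesGroupWritable() {
    return umask(S_IWOTH);  // returns the previous mask
}
```

Combined with `chmod g+s <dir>`, which makes new files inherit the directory's group, this lets every member of that group write gb's files.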
Matt
786ba76d10
Merge branch 'ia-zak' into testing
Conflicts:
main.cpp
2015-09-21 10:12:58 -06:00
Zak Betz
519b2c4f42
Fix repeating xn--xn-- when there are spaces in the domain.
Make gb unittest take a name of the unit test to run.
2015-09-14 10:24:22 -06:00
Zak Betz
5622ca47ee
Work on non-ascii domain names. It works on correct inputs, but
will crash on some malformed inputs, so it is force-disabled.
2015-09-14 00:34:44 -06:00
Zak Betz
911b2837ca
Merge branch 'testing' of https://github.com/gigablast/open-source-search-engine into testing
Conflicts:
Makefile
Spider.cpp
2015-09-12 15:51:59 -06:00
Zak Betz
aefd8772cf
Merge branch 'ia-zak' of https://github.com/gigablast/open-source-search-engine into ia-zak
2015-09-10 21:32:36 -06:00
Matt
09de59f026
do not store cblock, etc. tags into tagdb to save
disk space. added tagdb file cache for better performance,
less disk accesses. will help reduce disk load.
put file cache sizes in master controls and if they change
then update the cache size dynamically.
2015-09-10 12:46:00 -06:00
Zak Betz
ae58ab98fb
Merge branch 'ia-zak' of https://github.com/gigablast/open-source-search-engine into ia-zak
2015-09-01 08:48:38 -06:00
Matt Wells
34a4068ddb
fix gb start script
2015-09-01 01:17:14 -07:00
Zak Betz
b199c67355
Merge branch 'ia-zak' of https://github.com/gigablast/open-source-search-engine into ia-zak
2015-08-31 23:19:45 -06:00
Zak Betz
5a7b01585d
Work on graph axis autoscaling.
2015-08-31 23:19:28 -06:00
Matt
efa93aad18
prevent double ./gb start calls from messing
things up.
2015-08-31 11:13:33 -06:00
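The commit does not describe the mechanism used to stop a second `./gb start` from interfering. One common approach, shown here as a hypothetical sketch, is an exclusive non-blocking `flock()` on a lock file; `acquireStartLock()` and the lock path are assumptions, not gb's code.

```cpp
#include <sys/file.h>   // flock(), LOCK_EX, LOCK_NB
#include <fcntl.h>      // open(), O_* flags
#include <unistd.h>     // close()
#include <cassert>

// Try to take an exclusive, non-blocking lock on lockPath.
// Returns the open fd (keep it open for the life of the process; the
// kernel releases the lock when the process exits) or -1 if another
// start already holds the lock or the file cannot be opened.
int acquireStartLock(const char *lockPath) {
    int fd = open(lockPath, O_RDWR | O_CREAT, 0664);
    if (fd < 0) return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {  // EWOULDBLOCK: already running
        close(fd);
        return -1;
    }
    return fd;
}
```

Because the kernel drops a `flock()` lock when the holding process exits, a crashed gb does not block the next start, unlike a plain pid file, which can be left behind after a crash.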
Zak Betz
60c4c5c437
Add nospider and noquery options.
2015-08-25 13:48:20 -06:00
Matt Wells
6c14d659b8
move 2nd occurrence of same collnum_t collection id
on the same shard to the trash/ subdir.
put call to syncParmsWithHost0 in a sleep loop in case
host #0 has error, although the timeout is really high.
2015-08-18 18:59:01 -07:00
Matt Wells
9642947136
fix so host #0 will delete then re-add collections
...
that use the same collnum but have a different name.
fixed some unlabelled safebufs.
fix core when deleting collnum from tree/buckets that
is higher than Collectiondb.m_numRecs.
fix File::m_filename safebufs that were not freed on exit.
2015-08-18 14:09:16 -07:00
Matt Wells
178721d35b
speed up getFileSize() by using stat() func again.
despam logs at startup.
do not perm check every coll dir, only first 100, on
startup to make things faster.
2015-08-15 22:21:15 -07:00
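A minimal sketch of a `stat()`-based size lookup; the function name `getFileSizeViaStat()` is hypothetical, and the commit does not show gb's actual `getFileSize()` signature or what it replaced.

```cpp
#include <sys/stat.h>   // stat(), struct stat
#include <cstdio>
#include <cassert>

// Return the size in bytes of the file at path, or -1 if it cannot be
// stat'ed (e.g. it does not exist). stat() reads the size from the
// inode's metadata, so no file descriptor has to be opened.
long long getFileSizeViaStat(const char *path) {
    struct stat st;
    if (stat(path, &st) != 0) return -1;
    return static_cast<long long>(st.st_size);
}
```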
Matt
bff643b555
use a linked list of merge candidates to
make attemptMergeAll() much much faster.
2015-08-15 19:26:37 -06:00
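A hypothetical sketch of the idea in this commit: keep an intrusive doubly linked list of collections that may need merging, so a scan visits only those instead of every collection. `Coll`, `MergeCandidates` and their members are illustrative names, not gb's.

```cpp
#include <cassert>

// Hypothetical per-collection record. Only collections that might need
// a merge are linked into the candidate list, so a pass over the list
// costs O(candidates) instead of O(all collections).
struct Coll {
    int   id = 0;
    bool  inList = false;
    Coll *prev = nullptr;
    Coll *next = nullptr;
};

struct MergeCandidates {
    Coll *head = nullptr;

    // O(1): e.g. called when a collection gains a new on-disk file.
    void add(Coll *c) {
        if (c->inList) return;          // already a candidate
        c->prev = nullptr;
        c->next = head;
        if (head) head->prev = c;
        head = c;
        c->inList = true;
    }

    // O(1): called once a collection no longer needs merging.
    void remove(Coll *c) {
        if (!c->inList) return;
        if (c->prev) c->prev->next = c->next; else head = c->next;
        if (c->next) c->next->prev = c->prev;
        c->prev = c->next = nullptr;
        c->inList = false;
    }

    // Visit only the candidates; returns how many were visited.
    // fn may remove the node it is given, but no other node.
    template <typename Fn>
    int forEach(Fn fn) {
        int n = 0;
        for (Coll *c = head; c; ) {
            Coll *nxt = c->next;  // saved before fn() can unlink c
            fn(c);
            ++n;
            c = nxt;
        }
        return n;
    }
};
```

Insertion and removal are O(1), so the cost moves from the periodic scan to the moments when a collection gains or loses candidate status.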
Matt
a1ed368d82
bring back max mem control into master controls.
it's useful to limit per process mem usage to prevent
oom killer because we can't save if we get killed.
overhaul diskpagecache to just use rdbcache. much simpler
and faster, but disabled for now until debugged more.
reduce min files to merge for crawlbot collections so
they stay more tightly merged to conserve fds and mem.
improved logDebugDisk msgs.
overhauled File.cpp fd pool. now it is way faster and
doesn't use any extra mem. much simpler too. although
could be sped up a little by using a linked list, but
probably is not significant enough to warrant doing right now.
increase mem ptr table from 3M to 8M slots. should really make
dynamic though. fix core from null msg20s[0]->m_r.
only call attemptMergeAll once every 60 seconds really.
do not attempt merge if already merging.
2015-08-14 12:58:54 -06:00
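The last two items of this commit, running attemptMergeAll at most once every 60 seconds and skipping it while a merge is already in progress, amount to a simple gate. The `MergeThrottle` class below is a hypothetical sketch of that gate, not gb's implementation.

```cpp
#include <cassert>

// Hypothetical guard in front of an attemptMergeAll()-style scan:
// skip the scan if a merge is already running, and otherwise run it
// at most once per intervalSecs.
class MergeThrottle {
public:
    explicit MergeThrottle(long intervalSecs = 60) : m_interval(intervalSecs) {}

    // nowSecs is passed in (rather than read from a clock) so the logic
    // is easy to test; in practice it would be the current time.
    bool shouldAttempt(long nowSecs, bool alreadyMerging) {
        if (alreadyMerging) return false;
        if (m_lastAttempt >= 0 && nowSecs - m_lastAttempt < m_interval)
            return false;
        m_lastAttempt = nowSecs;
        return true;
    }

private:
    long m_interval;
    long m_lastAttempt = -1;   // -1: never attempted yet
};
```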
Matt
5c67cbe65d
undo
2015-08-12 08:43:44 -07:00
Matt
444ebeeb65
one scp install per host
2015-08-12 08:39:01 -07:00
Matt Wells
f8047ac5ef
speed up Rdb::attemptMergeAll() because it is a problem
according to the profiler when we got tens of thousands
of collections.
2015-08-08 12:27:18 -07:00
Matt
16fd428887
fix more cores from the dynamic query size changes.
add how many query terms we truncated in the json/xml replies.
document those fields as well.
2015-07-18 14:15:47 -06:00
Matt
815bd7ce0a
quite a few bug fixes.
2015-07-02 17:42:05 -06:00
Matt Wells
d050fb81b5
fix rebuild code to rebuild spider status docs in index,
and to remove them from titledb if user has disabled
'index spider replies' in the spider controls to save disk.
made them off by default now since they use some disk.
2015-06-16 16:29:26 -06:00
Matt Wells
7462b0cd84
gb -h fix
2015-04-22 12:51:32 -06:00
Matt
4a43e1387e
better fixes for core from sig alarms
2015-04-13 10:28:43 -06:00
Matt
f5a7423336
fix bug of never calling callback
2015-04-13 09:56:21 -06:00
Matt
2ce107e4be
keep track of how many times the host exited/cored as an exponent
to the 'x' in the hosts table. this way we can detect hosts that
have restarted many times and fix them.
2015-04-01 16:28:58 -06:00
Matt
2839c38dac
warc injection fixes
2015-03-07 15:01:47 -08:00
Matt Wells
56d65a7c55
adjust dump tagdb cmdline cmd to start at
a specified site to aid us in fixing
the bug where sitelinks.txt is missing some sites.
2015-03-05 19:27:36 -08:00
Matt Wells
93b505e7bb
fix isCollAdmin() function to return false
if not using coll passwords. they'll have to
be master admin.
2015-03-02 07:47:05 -08:00
Matt
064d022d6f
call mkdir on 'gb install' cmd.
2015-02-25 19:49:36 -07:00
Matt
24eac820d5
fixed bad deletenode call causing dups in
winnertree.
2015-02-12 16:12:23 -08:00
Matt
c009430b6c
more fixes for new spider updates
2015-02-11 21:54:36 -08:00
Matt Wells
01687fcb0e
fix gb thrutest disk tests
2015-02-09 10:29:08 -08:00
Matt
afbe35c5a9
Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing
2015-02-07 12:07:52 -08:00
Matt
580736d766
support arc injections
2015-02-07 12:07:42 -08:00
Matt
6c1c2c66c4
added dstart to gb -h help menu
2015-02-05 12:39:13 -08:00
Matt
93fce690d6
more speedups. do not call sigprocmask in main thread
before pthread_create(). instead call pthread_sigmask()
from thread like we were doing already for SIGINT.
2015-02-03 13:39:23 -08:00
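A sketch of the pattern this commit describes, under the assumption that each new thread blocks the signals it should not handle at the top of its start routine. `workerMain()` and `runWorker()` are hypothetical; SIGINT is the only signal the commit names, and which others gb blocks is not stated.

```cpp
#include <pthread.h>
#include <signal.h>
#include <cassert>

// Hypothetical worker entry point: the thread blocks SIGINT in its own
// mask, so the main thread never has to call sigprocmask() before and
// after pthread_create().
static void *workerMain(void *arg) {
    sigset_t block;
    sigemptyset(&block);
    sigaddset(&block, SIGINT);
    pthread_sigmask(SIG_BLOCK, &block, nullptr);

    // Report back whether SIGINT is now blocked in this thread
    // (a null 'set' argument only queries the current mask).
    sigset_t current;
    pthread_sigmask(SIG_BLOCK, nullptr, &current);
    *static_cast<bool *>(arg) = (sigismember(&current, SIGINT) == 1);

    // ... real work would go here ...
    return nullptr;
}

// Launch one worker and wait for it; returns true if the worker saw
// SIGINT blocked in its own mask.
bool runWorker() {
    bool blocked = false;
    pthread_t tid;
    if (pthread_create(&tid, nullptr, workerMain, &blocked) != 0) return false;
    pthread_join(tid, nullptr);
    return blocked;
}
```

Build with `-pthread`. One trade-off of this design: a process-directed signal that arrives between `pthread_create()` and the thread's own `pthread_sigmask()` call can still be delivered to the new thread, a window the block-in-main approach does not have.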
Matt
79a1d632cd
need to have sitelinks.txt present in dir.
2015-01-31 22:58:05 -07:00
Matt
0c3ad724f8
Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing
2015-01-31 15:18:30 -07:00
Matt
cad1d3d076
added support for sitelinks.txt file
2015-01-31 15:18:06 -07:00
Matt
a87b582145
little fix
2015-01-29 19:26:15 -07:00
Matt Wells
ec55540432
fix gb dump sitelinks
2015-01-25 19:33:31 -08:00
Matt Wells
7c4a625779
fix dumping tagdb for sitelinks.txt
2015-01-25 18:04:15 -08:00