Matt
adbec58f41
fix core from asking for too many docids
2015-08-19 08:53:39 -07:00
Matt Wells
6c14d659b8
move 2nd occurrence of same collnum_t collection id
on the same shard to the trash/ subdir.
put call to syncParmsWithHost0 in a sleep loop in case
host #0 has error, although the timeout is really high.
2015-08-18 18:59:01 -07:00
Matt Wells
9642947136
fix so host #0 will delete then re-add collections
that use the same collnum but have a different name.
fixed some unlabelled safebufs.
fix core when deleting collnum from tree/buckets that
is higher than Collectiondb.m_numRecs.
fix File::m_filename safebufs that were not freed on exit.
2015-08-18 14:09:16 -07:00
Matt Wells
dd9b4e0ca2
fix little core
2015-08-17 15:04:16 -07:00
Matt Wells
30693c3cf7
use setBuf() func instead
2015-08-16 22:19:30 -07:00
Matt Wells
28644f127e
fix problem of saving rdbmap when coring in a malloc/free.
2015-08-16 22:14:53 -07:00
Matt Wells
be1ebfbcd0
do not execute backtrace function if core
was in Mem.cpp basically otherwise we don't save state.
2015-08-16 20:29:14 -07:00
Matt Wells
3a67480b63
for BigFile::m_fileBuf array of Files make sure
to clear it for files that do not exist so
File::m_calledSet is false on them. so BigFile::getFile(j)
returns a File ptr whose m_calledSet is false if the file
does not exist on disk. and BigFile::removePart(j) sets
((File *)m_fileBuf.m_bufStart)[j].m_calledSet = false.
2015-08-16 19:40:08 -07:00
Matt Wells
63c7752734
Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing
2015-08-16 17:14:33 -07:00
Matt Wells
e671be17ca
fix log msg
2015-08-16 17:14:21 -07:00
Matt
b709f736f4
show max mem alloc slots in pagestats.cpp
2015-08-16 17:32:47 -06:00
Matt Wells
ffa6c09c74
fix BigFile::addPart(n) when adding parts out of order.
2015-08-16 15:13:59 -07:00
Matt Wells
f8fb266844
fix new merging algo.
2015-08-16 10:11:21 -07:00
Matt Wells
178721d35b
speed up getFileSize() by using stat() func again.
despam logs at startup.
do not perm check every coll dir, only first 100, on
startup to make things faster.
2015-08-15 22:21:15 -07:00
Matt
bff643b555
use a linked list of merge candidates to
make attemptMergeAll() much much faster.
2015-08-15 19:26:37 -06:00
Matt
d9422d8b0e
get rid of limits on file sizes. dynamically allocate
file names and fixed-size File array in BigFile class. should
save gigabytes of memory in many-collection systems with
1+ million files or so.
2015-08-14 20:14:50 -06:00
Matt
f7f577cf98
the new disk page cache. temporarily disabled.
2015-08-14 15:52:24 -06:00
Matt
0d2aa33afb
undo #define thing
2015-08-14 13:08:11 -06:00
Matt
a1ed368d82
bring back max mem control into master controls.
it's useful to limit per process mem usage to prevent
oom killer because we can't save if we get killed.
overhaul diskpagecache to just use rdbcache. much simpler
and faster, but disabled for now until debugged more.
reduce min files to merge for crawlbot collections so
they stay more tightly merged to conserve fds and mem.
improved logDebugDisk msgs.
overhauled File.cpp fd pool. now it is way faster and
doesn't use any extra mem. much simpler too. although
could be sped up a little by using a linked list, but
probably is not significant enough to warrant doing right now.
increase mem ptr table from 3M to 8M slots. should really make
dynamic though. fix core from null msg20s[0]->m_r.
only call attemptMergeAll once every 60 seconds really.
do not attempt merge if already merging.
2015-08-14 12:58:54 -06:00
Matt
5c67cbe65d
undo
2015-08-12 08:43:44 -07:00
Matt
444ebeeb65
one scp install per host
2015-08-12 08:39:01 -07:00
Matt
5c2a2ce496
fix core
2015-08-12 08:36:23 -07:00
Matt
adc9d3bc89
Merge branch 'testing' into diffbot-testing
2015-08-08 19:22:50 -06:00
Matt
3477d39608
fix cores
2015-08-08 19:22:01 -06:00
Matt Wells
840ca3fea1
fix rdbmap reduce mem thing
2015-08-08 15:43:09 -07:00
Matt Wells
f8047ac5ef
speed up Rdb::attemptMergeAll() because it is a problem
according to the profiler when we got tens of thousands
of collections.
2015-08-08 12:27:18 -07:00
Matt Wells
c2bf461d27
call reduceMemFootprint() after writing rdb map
to save mem immediately rather than on restart of gb
2015-08-08 11:23:14 -07:00
Matt
890170aa90
fix core from archive.org yml file checking.
show site ip in inlinker table for easier spam removal.
2015-08-02 12:50:29 -06:00
Matt
c1ec4dedbb
fix for bad query formation.
text:""foo bar""
2015-08-02 11:34:55 -06:00
Kevin Truong
37591be421
Merge branch 'diffbot-testing' of https://github.com/gigablast/open-source-search-engine
2015-07-31 18:12:56 -07:00
Kevin Truong
b6207ec344
Fixes #3012. Allow facet ranges to work on negative numbers.
2015-07-31 18:11:37 -07:00
Matt
18d1a787bb
fix core dump from meta data in title rec
that was just a \0 from injecting content that way
2015-07-31 18:42:21 -06:00
Matt Wells
e18fca88f4
Merge branch 'diffbot' into diffbot-testing
2015-07-31 08:56:47 -07:00
Matt Wells
85c7fbae70
fix infinite loop bug from EBADRDBID
2015-07-31 08:56:26 -07:00
Matt
5af61ff59a
fix core from boolean queries
2015-07-30 10:21:30 -06:00
Matt
72768c093d
Merge branch 'diffbot-sam' of github.com:gigablast/open-source-search-engine into diffbot-sam
2015-07-23 17:24:41 -06:00
sam
86946392d0
reverted stepping. Useless
2015-07-23 10:53:59 -07:00
Matt
da41d53575
Merge branch 'diffbot-testing' into diffbot-sam
2015-07-23 09:27:00 -06:00
Matt Wells
e165b5d668
speed up bool queries
2015-07-22 13:00:45 -07:00
Matt
090e1b35d5
fix score info reporting for new bool query
min score based on # of query terms contained.
2015-07-20 14:37:37 -06:00
Matt
69c791e5aa
for now at least do not use siterank for ranking
boolean search results.
2015-07-20 11:50:31 -06:00
Matt
1c93a88d82
use the # of matched terms as the score of a doc
when doing a boolean query. later: use proximity
scoring for non-field query terms.
2015-07-20 11:09:56 -06:00
Matt
ff7639e323
do not get synonyms for boolean operators.
just skip synonyms if ignoreWord is set at all.
2015-07-19 13:07:05 -06:00
Matt
646bc91c59
fix more possible unicode errors
2015-07-19 12:05:09 -06:00
Matt
b9fc583cae
fix core
2015-07-18 18:01:11 -06:00
Matt
16fd428887
fix more cores from the dynamic query size changes.
add how many query terms we truncated in the json/xml replies.
document those fields as well.
2015-07-18 14:15:47 -06:00
Matt Wells
dab0726fac
typo fix
2015-07-17 10:43:38 -06:00
Matt
5e7a06229c
print special message if no seeds were able to be crawled.
2015-07-17 08:42:01 -06:00
Matt
7e526863d7
do not include 'diffbot uri' in urls.csv. should
not have been there.
2015-07-16 10:11:04 -06:00
Matt
0d3cfc2796
single words in quotes - keep them in quotes so
we do not get synonym forms
2015-07-15 09:58:25 -06:00