3245 Commits

Author SHA1 Message Date
Matt
adbec58f41 fix core from asking for too many docids 2015-08-19 08:53:39 -07:00
Matt Wells
6c14d659b8 move 2nd occurrence of same collnum_t collection id
on the same shard to the trash/ subdir.
put call to syncParmsWithHost0 in a sleep loop in case
host #0 has error, although the timeout is really high.
2015-08-18 18:59:01 -07:00
Matt Wells
9642947136 fix so host #0 will delete then re-add collections
that use the same collnum but have a different name.
fixed some unlabelled safebufs.
fix core when deleting collnum from tree/buckets that
is higher than Collectiondb.m_numRecs.
fix File::m_filename safebufs that were not freed on exit.
2015-08-18 14:09:16 -07:00
Matt Wells
dd9b4e0ca2 fix little core 2015-08-17 15:04:16 -07:00
Matt Wells
30693c3cf7 use setBuf() func instead 2015-08-16 22:19:30 -07:00
Matt Wells
28644f127e fix problem of saving rdbmap when coring in a malloc/free. 2015-08-16 22:14:53 -07:00
Matt Wells
be1ebfbcd0 do not execute backtrace function if core
was in Mem.cpp basically otherwise we don't save state.
2015-08-16 20:29:14 -07:00
Matt Wells
3a67480b63 for BigFile::m_fileBuf array of Files make sure
to clear it for files that do not exist so
File::m_calledSet is false on them. so BigFile::getFile(j)
returns a File ptr whose m_calledSet is false if the file
does not exist on disk. and BigFile::removePart(j) sets
((File *)m_fileBuf.m_bufStart)[j].m_calledSet = false.
2015-08-16 19:40:08 -07:00
Matt Wells
63c7752734 Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing 2015-08-16 17:14:33 -07:00
Matt Wells
e671be17ca fix log msg 2015-08-16 17:14:21 -07:00
Matt
b709f736f4 show max mem alloc slots in pagestats.cpp 2015-08-16 17:32:47 -06:00
Matt Wells
ffa6c09c74 fix BigFile::addPart(n) when adding parts out of order. 2015-08-16 15:13:59 -07:00
Matt Wells
f8fb266844 fix new merging algo. 2015-08-16 10:11:21 -07:00
Matt Wells
178721d35b speed up getFileSize() by using stat() func again.
despam logs at startup.
do not perm check every coll dir, only first 100, on
startup to make things faster.
2015-08-15 22:21:15 -07:00
Matt
bff643b555 use a linked list of merge candidates to
make attemptMergeAll() much much faster.
2015-08-15 19:26:37 -06:00
Matt
d9422d8b0e get rid of limits on file sizes. dynamically allocate
file names and fixed-size File array in BigFile class. should
save gigabytes of memory in many-collection systems with
1+ million files or so.
2015-08-14 20:14:50 -06:00
Matt
f7f577cf98 the new disk page cache. temporarily disabled. 2015-08-14 15:52:24 -06:00
Matt
0d2aa33afb undo #define thing 2015-08-14 13:08:11 -06:00
Matt
a1ed368d82 bring back max mem control into master controls.
it's useful to limit per process mem usage to prevent
oom killer because we can't save if we get killed.
overhaul diskpagecache to just use rdbcache. much simpler
and faster, but disabled for now until debugged more.
reduce min files to merge for crawlbot collections so
they stay more tightly merged to conserve fds and mem.
improved logDebugDisk msgs.
overhauled File.cpp fd pool. now it is way faster and
doesn't use any extra mem. much simpler too. although
could be sped up a little by using a linked list, but
probably is not significant enough to warrant doing right now.
increase mem ptr table from 3M to 8M slots. should really make
dynamic though. fix core from null msg20s[0]->m_r.
only call attemptMergeAll once every 60 seconds really.
do not attempt merge if already merging.
2015-08-14 12:58:54 -06:00
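The last two items of the commit above (attempt a merge at most once every 60 seconds, and never while a merge is already running) amount to a small guard. The sketch below is only an illustration under assumptions: `MergeThrottle`, `shouldAttempt`, and the field names are hypothetical and are not taken from the Gigablast source.

```cpp
#include <ctime>

// Hypothetical guard, not the actual Rdb code: it only illustrates
// "at most one merge attempt per 60 seconds, and none while merging".
struct MergeThrottle {
    std::time_t lastAttempt = 0;     // time of the last allowed attempt (0 = never)
    bool        merging     = false; // set by the caller while a merge is running

    // Returns true if a merge attempt may start at time `now`,
    // and records `now` as the time of the latest attempt.
    bool shouldAttempt(std::time_t now) {
        if (merging) return false;                       // already merging
        if (lastAttempt != 0 && now - lastAttempt < 60)  // too soon
            return false;
        lastAttempt = now;
        return true;
    }
};
```

A caller would check `shouldAttempt(time(nullptr))` before scanning for merge candidates, set `merging` when a merge starts, and clear it when the merge finishes.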
Matt
5c67cbe65d undo 2015-08-12 08:43:44 -07:00
Matt
444ebeeb65 one scp install per host 2015-08-12 08:39:01 -07:00
Matt
5c2a2ce496 fix core 2015-08-12 08:36:23 -07:00
Matt
adc9d3bc89 Merge branch 'testing' into diffbot-testing 2015-08-08 19:22:50 -06:00
Matt
3477d39608 fix cores 2015-08-08 19:22:01 -06:00
Matt Wells
840ca3fea1 fix rdbmap reduce mem thing 2015-08-08 15:43:09 -07:00
Matt Wells
f8047ac5ef speed up Rdb::attemptMergeAll() because it is a problem
according to the profiler when we got tens of thousands
of collections.
2015-08-08 12:27:18 -07:00
Matt Wells
c2bf461d27 call reduceMemFootprint() after writing rdb map
to save mem immediately rather than on restart of gb
2015-08-08 11:23:14 -07:00
Matt
890170aa90 fix core from archive.org yml file checking.
show site ip in inlinker table for easier spam removal.
2015-08-02 12:50:29 -06:00
Matt
c1ec4dedbb fix for bad query formation.
text:""foo bar""
2015-08-02 11:34:55 -06:00
Kevin Truong
37591be421 Merge branch 'diffbot-testing' of https://github.com/gigablast/open-source-search-engine 2015-07-31 18:12:56 -07:00
Kevin Truong
b6207ec344 Fixes #3012. Allow facet ranges to work on negative numbers. 2015-07-31 18:11:37 -07:00
Matt
18d1a787bb fix core dump from meta data in title rec
that was just a \0 from injecting content that way
2015-07-31 18:42:21 -06:00
Matt Wells
e18fca88f4 Merge branch 'diffbot' into diffbot-testing 2015-07-31 08:56:47 -07:00
Matt Wells
85c7fbae70 fix infinite loop bug from EBADRBDID 2015-07-31 08:56:26 -07:00
Matt
5af61ff59a fix core from boolean queries 2015-07-30 10:21:30 -06:00
Matt
72768c093d Merge branch 'diffbot-sam' of github.com:gigablast/open-source-search-engine into diffbot-sam 2015-07-23 17:24:41 -06:00
sam
86946392d0 reverted stepping. Useless 2015-07-23 10:53:59 -07:00
Matt
da41d53575 Merge branch 'diffbot-testing' into diffbot-sam 2015-07-23 09:27:00 -06:00
Matt Wells
e165b5d668 speed up bool queries 2015-07-22 13:00:45 -07:00
Matt
090e1b35d5 fix score info reporting for new bool query
min score based on # of query terms contained.
2015-07-20 14:37:37 -06:00
Matt
69c791e5aa for now at least do not use siterank for ranking
boolean search results.
2015-07-20 11:50:31 -06:00
Matt
1c93a88d82 use the # of matched terms as the score of a doc
when doing a boolean query. later: use proximity
scoring for non-field query terms.
2015-07-20 11:09:56 -06:00
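The scoring rule described in the commit above (a document's score for a boolean query is the number of query terms it contains) can be sketched as follows. This is an illustration under assumptions, not the project's code: `booleanMatchScore` and its input representation are hypothetical, and the boolean expression itself is assumed to have been evaluated elsewhere.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper, not taken from the Gigablast source.
// termPresent[i] is true when the document contains the i-th query term.
// Only the ranking score is computed here; deciding whether the document
// satisfies the boolean expression is assumed to happen before this call.
int32_t booleanMatchScore(const std::vector<bool> &termPresent) {
    int32_t score = 0;
    for (bool present : termPresent) {
        if (present) ++score;  // one point per matched query term
    }
    return score;
}
```

Under this rule, for a query such as `a OR b OR c`, a document containing two of the three terms scores 2 and ranks above a document containing only one.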
Matt
ff7639e323 do not get synonyms for boolean operators.
just skip synonyms if ignoreWord is set at all.
2015-07-19 13:07:05 -06:00
Matt
646bc91c59 fix more possible unicode errors 2015-07-19 12:05:09 -06:00
Matt
b9fc583cae fix core 2015-07-18 18:01:11 -06:00
Matt
16fd428887 fix more cores from the dynamic query size changes.
add how many query terms we truncated in the json/xml replies.
document those fields as well.
2015-07-18 14:15:47 -06:00
Matt Wells
dab0726fac typo fix 2015-07-17 10:43:38 -06:00
Matt
5e7a06229c print special message if no seeds were able to be crawled. 2015-07-17 08:42:01 -06:00
Matt
7e526863d7 do not include 'diffbot uri' in urls.csv. should
not have been there.
2015-07-16 10:11:04 -06:00
Matt
0d3cfc2796 single words in quotes - keep them in quotes so
we do not get synonym forms
2015-07-15 09:58:25 -06:00