Commit Graph

3450 Commits

Author SHA1 Message Date
Matt
5d07e24c01 use rel no follow switch support. 2015-10-19 10:05:46 -06:00
Matt
fa691bf06c also fix for numbers like for facet termlists 2015-10-10 14:15:09 -06:00
Matt
102199fcd1 fix the fix 2015-10-10 14:01:11 -06:00
Matt
2fac15049e fix halfstopwikibigram bug 2015-10-10 13:15:09 -06:00
Matt
8763fd6d78 make gb shutdown easier (./gb stop) 2015-10-10 11:12:02 -07:00
Matt
f52147bf5d we were allocating too many nodes in top tree. tone that down.
fix bug with verify writes being turned on then off.
2015-10-09 14:30:57 -06:00
Matt
0048ab6be8 fix right 2015-10-08 13:42:42 -07:00
Matt
02140ce00f Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing 2015-10-08 13:41:34 -07:00
Matt
edb9b91fb1 fix blaster2 2015-10-08 13:41:23 -07:00
Matt
dfe1d6c7b3 move a parm down 2015-10-07 16:23:14 -06:00
Matt
a1e876becc add <docScore> to serps 2015-10-07 11:31:45 -06:00
Matt
4600ce0816 fix threads from freezing up just because pthread_create()
had an error. need to return the thread stack.
2015-10-07 07:50:44 -06:00
Matt
cee5d8922a Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing 2015-10-05 18:48:06 -06:00
Matt
5b605624ca fix core dump
average": 3.677707812426755e+26,
2015-10-05 18:47:11 -06:00
Matt
a77c9be5b8 Merge branch 'diffbot-testing' into diffbot-sam 2015-10-05 17:32:21 -06:00
Matt
df1c7f6e0f update qa.cpp syntax test to do &n=100
for gbssStatusCode:0 query
2015-10-05 17:31:35 -06:00
sam
97b9c99bec correcting facet min=0 2015-10-05 16:09:52 -07:00
Matt
9b785a1522 allow more than 2gb of mem to be allocated to hold resulting docids. 2015-10-05 09:35:44 -07:00
Matt
757a44b149 fix facets when doing > 1 split and first
split termlist is empty.
2015-10-05 10:05:08 -06:00
Matt
21b71226a6 remove bad fix 2015-10-05 09:33:02 -06:00
Matt Wells
42cdd5b382 fix msg20 getsummary core 2015-10-02 12:34:08 -07:00
Matt Wells
e4adc99c0c fix empty winner tree bug.
try to improve rdbcache promotion logic for all
caches. -O2 on spider.cpp.
2015-10-02 12:16:48 -07:00
Matt Wells
9daaa4d5af Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing 2015-10-01 19:37:12 -07:00
Matt Wells
9178d67b2f fix churn bug in winnerlistcache in spider.cpp
so do not add the dolebuf list of spiderrequests
back into the cache, but just modify the "jump"
in the first 4 bytes of the cached record. because
when we re-added it back to the cache it created too
much churn and we'd lose cached records unnecessarily.
2015-10-01 19:35:34 -07:00
Matt
e06ae06c23 Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing 2015-09-30 15:27:07 -06:00
Matt
a31c7f5fc8 exiting msg 2015-09-30 15:26:58 -06:00
Matt Wells
06aea41611 show spiderdb scan progress in spider queue for the collection 2015-09-30 13:38:22 -07:00
Matt Wells
b97546f98c do not expand about:blank iframes. 2015-09-30 09:36:04 -07:00
Matt
67fc339953 prevent out of mem core. actually trying to alloc more
than 2GB for search result stuff.
2015-09-26 21:34:07 -07:00
Matt
f0a2f86200 Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing 2015-09-25 08:09:25 -07:00
Matt
55993e58d5 fix cores on gi #0 2015-09-25 08:09:05 -07:00
Matt Wells
2721256c0d show ip port of bad host 2015-09-25 07:47:21 -07:00
Matt Wells
93943a0cab some pages legitimately have no outlinks, no need to think
they were banned.
2015-09-24 14:01:23 -07:00
Matt Wells
6454dad6bf Revert "ignore real root, just use seeds, to detect if banned."
This reverts commit cb60f68e72.
2015-09-24 13:29:20 -07:00
Matt
cb60f68e72 ignore real root, just use seeds, to detect if banned. 2015-09-24 14:15:29 -06:00
Matt Wells
268b21d552 reduce log spam 2015-09-24 11:37:16 -06:00
Matt
9be3f9310e fix annoying core dump for some queries in Posdb.cpp 2015-09-24 11:34:02 -06:00
Matt
8a0461b82f Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing 2015-09-24 09:10:37 -06:00
Matt
d92b153090 added 'verify writes' switch to track down data corruption 2015-09-24 09:10:20 -06:00
Matt Wells
3dcaf414db report bytes saved to disk.
if thread crashes try to dump core.
2015-09-23 15:40:30 -07:00
Matt Wells
98744889e2 do not core if no collrec for msg20 summary request 2015-09-23 14:39:13 -07:00
Matt Wells
ba8ebc7794 Revert "data corruption fixes"
This reverts commit 27172945c7.
2015-09-23 14:38:17 -07:00
Matt Wells
27172945c7 data corruption fixes 2015-09-23 14:34:52 -07:00
Matt
5635695666 oom prevention 2015-09-20 21:42:46 -06:00
Matt Wells
d6d5d10a15 prevent core from bad root title rec 2015-09-20 08:26:00 -07:00
Matt Wells
69a3cb0999 fix corrupt tag with corrupt root title buf
from coring
2015-09-17 21:33:58 -07:00
Matt
13e0ba7bff fix bug of having a meta redirect tag
in <script> tags. we have to use Xml class
to make sure it is a legit refresh tag.
2015-09-16 11:03:38 -06:00
Matt
58e9f56015 never let any diffbot error prevent us from
retrying a url in subsequent crawl rounds.
2015-09-16 10:00:11 -06:00
Matt
bcdecc63c6 expose "urlip" injection parm to provide ip of url
being injected to save gigablast from an ip lookup
if you want.
2015-09-16 09:43:15 -06:00
Matt
f9c4f8fc9a test with 2 first 2015-09-14 19:16:07 -06:00