Matt
5d07e24c01
use rel no follow switch support.
2015-10-19 10:05:46 -06:00
Matt
fa691bf06c
also fix for numbers like for facet termlists
2015-10-10 14:15:09 -06:00
Matt
102199fcd1
fix the fix
2015-10-10 14:01:11 -06:00
Matt
2fac15049e
fix halfstopwikibigram bug
2015-10-10 13:15:09 -06:00
Matt
8763fd6d78
make gb shutdown easier (./gb stop)
2015-10-10 11:12:02 -07:00
Matt
f52147bf5d
we were allocating too many nodes in top tree. tone that down.
fix bug with verify writes being turned on then off.
2015-10-09 14:30:57 -06:00
Matt
0048ab6be8
fix right
2015-10-08 13:42:42 -07:00
Matt
02140ce00f
Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing
2015-10-08 13:41:34 -07:00
Matt
edb9b91fb1
fix blaster2
2015-10-08 13:41:23 -07:00
Matt
dfe1d6c7b3
move a parm down
2015-10-07 16:23:14 -06:00
Matt
a1e876becc
add <docScore> to serps
2015-10-07 11:31:45 -06:00
Matt
4600ce0816
fix threads from freezing up just because pthread_create()
had an error. need to return the thread stack.
2015-10-07 07:50:44 -06:00
Matt
cee5d8922a
Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing
2015-10-05 18:48:06 -06:00
Matt
5b605624ca
fix core dump
average": 3.677707812426755e+26,
2015-10-05 18:47:11 -06:00
Matt
a77c9be5b8
Merge branch 'diffbot-testing' into diffbot-sam
2015-10-05 17:32:21 -06:00
Matt
df1c7f6e0f
update qa.cpp syntax test to do &n=100
for gbssStatusCode:0 query
2015-10-05 17:31:35 -06:00
sam
97b9c99bec
correcting facet min=0
2015-10-05 16:09:52 -07:00
Matt
9b785a1522
allow more than 2gb of mem to be allocated to hold resulting docids.
2015-10-05 09:35:44 -07:00
Matt
757a44b149
fix facets when doing > 1 split and first
split termlist is empty.
2015-10-05 10:05:08 -06:00
Matt
21b71226a6
remove bad fix
2015-10-05 09:33:02 -06:00
Matt Wells
42cdd5b382
fix msg20 getsummary core
2015-10-02 12:34:08 -07:00
Matt Wells
e4adc99c0c
fix empty winner tree bug.
try to improve rdbcache promotion logic for all
caches. -O2 on spider.cpp.
2015-10-02 12:16:48 -07:00
Matt Wells
9daaa4d5af
Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing
2015-10-01 19:37:12 -07:00
Matt Wells
9178d67b2f
fix churn bug in winnerlistcache in spider.cpp
so do not add the dolebuf list of spiderrequests
back into the cache, but just modify the "jump"
in the first 4 bytes of the cached record. because
when we re-added it back to the cache it created too
much churn and we'd lose cached records unnecessarily.
2015-10-01 19:35:34 -07:00
Matt
e06ae06c23
Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing
2015-09-30 15:27:07 -06:00
Matt
a31c7f5fc8
exiting msg
2015-09-30 15:26:58 -06:00
Matt Wells
06aea41611
show spiderdb scan progress in spider queue for the collection
2015-09-30 13:38:22 -07:00
Matt Wells
b97546f98c
do not expand about:blank iframes.
2015-09-30 09:36:04 -07:00
Matt
67fc339953
prevent out of mem core. actually trying to alloc more
than 2GB for search result stuff.
2015-09-26 21:34:07 -07:00
Matt
f0a2f86200
Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing
2015-09-25 08:09:25 -07:00
Matt
55993e58d5
fix cores on gi #0
2015-09-25 08:09:05 -07:00
Matt Wells
2721256c0d
show ip port of bad host
2015-09-25 07:47:21 -07:00
Matt Wells
93943a0cab
some pages legitimately have no outlinks, no need to think
they were banned.
2015-09-24 14:01:23 -07:00
Matt Wells
6454dad6bf
Revert "ignore real root, just use seeds, to detect if banned."
This reverts commit cb60f68e72.
2015-09-24 13:29:20 -07:00
Matt
cb60f68e72
ignore real root, just use seeds, to detect if banned.
2015-09-24 14:15:29 -06:00
Matt Wells
268b21d552
reduce log spam
2015-09-24 11:37:16 -06:00
Matt
9be3f9310e
fix annoying core dump for some queries in Posdb.cpp
2015-09-24 11:34:02 -06:00
Matt
8a0461b82f
Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing
2015-09-24 09:10:37 -06:00
Matt
d92b153090
added 'verify writes' switch to track down data corruption
2015-09-24 09:10:20 -06:00
Matt Wells
3dcaf414db
report bytes saved to disk.
if thread crashes try to dump core.
2015-09-23 15:40:30 -07:00
Matt Wells
98744889e2
do not core if no collrec for msg20 summary request
2015-09-23 14:39:13 -07:00
Matt Wells
ba8ebc7794
Revert "data corruption fixes"
This reverts commit 27172945c7.
2015-09-23 14:38:17 -07:00
Matt Wells
27172945c7
data corruption fixes
2015-09-23 14:34:52 -07:00
Matt
5635695666
oom prevention
2015-09-20 21:42:46 -06:00
Matt Wells
d6d5d10a15
prevent core from bad root title rec
2015-09-20 08:26:00 -07:00
Matt Wells
69a3cb0999
fix corrupt tag with corrupt root title buf
from coring
2015-09-17 21:33:58 -07:00
Matt
13e0ba7bff
fix bug of having a meta redirect tag
in <script> tags. we have to use Xml class
to make sure it is a legit refresh tag.
2015-09-16 11:03:38 -06:00
Matt
58e9f56015
never let any diffbot error prevent us from
retrying a url in subsequent crawl rounds.
2015-09-16 10:00:11 -06:00
Matt
bcdecc63c6
expose "urlip" injection parm to provide ip of url
being injected to save gigablast from an ip lookup
if you want.
2015-09-16 09:43:15 -06:00
Matt
f9c4f8fc9a
test with 2 first
2015-09-14 19:16:07 -06:00