Matt
|
48ac8bf80f
|
fix udp linked list thing again
|
2015-04-13 10:13:59 -06:00 |
|
Matt
|
e6696a6937
|
fix some more
|
2015-04-13 10:08:01 -06:00 |
|
Matt
|
9feb070fe9
|
fix issue of not being able to exit gb when
a disk read retry is taking forever.
|
2015-04-13 10:06:08 -06:00 |
|
Matt
|
f5a7423336
|
fix bug of never calling callback
|
2015-04-13 09:56:21 -06:00 |
|
Matt
|
43ced700d0
|
calls NEWS BLOG
|
2015-04-12 12:33:09 -06:00 |
|
Matt
|
2814e3db37
|
show screenshots
|
2015-04-12 11:52:22 -06:00 |
|
Matt Wells
|
994ba73007
|
log more when doing crawlbottesting-* tests
|
2015-04-12 06:46:33 -07:00 |
|
mwells
|
56a46fd294
|
fix printing of facets when &header=0 so diffbot json output
is still simple and correct.
|
2015-04-10 16:25:38 -06:00 |
|
Matt Wells
|
a891fb7bdc
|
turn on indexing spider status docs for all
diffbot CRAWLS on startup, whether it was off or on
before.
|
2015-04-10 14:35:41 -07:00 |
|
Matt Wells
|
4dce44c976
|
fix log msg
|
2015-04-10 13:31:46 -07:00 |
|
Matt Wells
|
02aa138fbb
|
debug helper msg
|
2015-04-10 13:27:06 -07:00 |
|
Matt
|
13d0361756
|
try to speed up host #4 on seraph
|
2015-04-10 09:20:18 -06:00 |
|
Matt Wells
|
6b139e9eee
|
clarify jam ups
|
2015-04-08 18:30:27 -07:00 |
|
Matt Wells
|
e38cb8c080
|
Merge branch 'diffbot' into diffbot-testing
|
2015-04-08 18:08:10 -07:00 |
|
Matt Wells
|
a7e48515fa
|
fix core when adding gbss for a force deleted doc.
|
2015-04-08 18:07:32 -07:00 |
|
Matt Wells
|
64bae224e0
|
fix core on the GI
|
2015-04-08 16:05:32 -06:00 |
|
Matt Wells
|
97d3b185c1
|
just use INCOMING udp slots/sockets for jam detection.
this will highlight the slow nodes better.
|
2015-04-08 15:52:43 -06:00 |
|
Matt Wells
|
7fd4310106
|
don't include gbss headers if not gbss documents
|
2015-04-07 22:05:11 -07:00 |
|
mwells
|
fea5210906
|
fix infinite loop bug
|
2015-04-07 15:27:26 -06:00 |
|
mwells
|
2997b3bb28
|
fix for skipping dead shards on tagrec lookup
|
2015-04-07 14:36:47 -06:00 |
|
mwells
|
53a2d39afd
|
fix for calling callback of timed-out udp slots
|
2015-04-07 14:18:42 -06:00 |
|
Matt Wells
|
2114c40cda
|
fix not calling callback when udp reply times out.
like for msg39 replies we need to timeout quickly.
|
2015-04-07 12:38:35 -07:00 |
|
Matt Wells
|
05a66cc367
|
fix bug of not being able to get ip address because
peeksize is too big.
|
2015-04-07 12:29:19 -07:00 |
|
Matt Wells
|
b08d12a11e
|
fix cores associated with new spider status docs.
|
2015-04-07 10:33:54 -07:00 |
|
Matt Wells
|
4ed8231222
|
up max vfds again
|
2015-04-07 09:56:04 -06:00 |
|
Matt
|
036fb4e0dc
|
crap, another oopsy fix
|
2015-04-06 14:43:58 -07:00 |
|
Matt
|
bc3335c434
|
more facet counting fixes
|
2015-04-06 14:38:38 -07:00 |
|
Matt
|
74fe3c5866
|
more facet counting fixes
|
2015-04-06 14:02:59 -07:00 |
|
Matt
|
1a262c8254
|
fixed oopsy
|
2015-04-06 13:51:19 -07:00 |
|
Matt Wells
|
8326460e8f
|
fix counting of # docs that have facet field.
|
2015-04-06 14:41:44 -06:00 |
|
Matt Wells
|
330d9a9dbf
|
report max rounds reached, not max to process or crawl reached.
|
2015-04-06 10:24:33 -06:00 |
|
Matt
|
bffaa09599
|
fix for the GI
|
2015-04-06 08:24:00 -06:00 |
|
Matt
|
de187dbb2b
|
documentation fix
|
2015-04-03 16:00:04 -06:00 |
|
Matt
|
8433c49aa9
|
make sure we index a spider status doc for each diffbot
object. that way we can tell if diffbot objects are deduping,
how they are changing over time, etc.
|
2015-04-03 14:59:09 -06:00 |
|
Matt
|
dad1cb15f4
|
fix excessive looping when calling makeCallbacks()
on niceness 1 or above when none are available.
|
2015-04-03 12:12:58 -06:00 |
|
Matt
|
c991a2dcdd
|
try to ameliorate the udp slot jamming issue.
|
2015-04-03 10:43:11 -06:00 |
|
Matt
|
c2567ad244
|
a hopeful fix for host #0 always crashing from
streaming socket timeouts.
|
2015-04-02 15:17:49 -06:00 |
|
Matt
|
2ce107e4be
|
keep track of how many times the host exited/cored as an exponent
to the 'x' in the hosts table. this way we can detect hosts that
have restarted many times and fix them.
|
2015-04-01 16:28:58 -06:00 |
|
Matt
|
e583850e40
|
fix core when searching bogus collection.
|
2015-04-01 15:30:59 -06:00 |
|
Matt
|
94a8210586
|
added CSV to output dropdown. show all json fields
for spider status doc csv files. support spider status
docs in csv output.
|
2015-04-01 13:53:03 -06:00 |
|
Matt
|
f26c9d609b
|
one more qa test fix for spider status docs
|
2015-04-01 12:47:32 -06:00 |
|
Matt
|
5e46262cb2
|
more fixes for qa'ing of new spider status docs
|
2015-04-01 12:03:17 -06:00 |
|
Matt
|
10a31783bb
|
fixes to pass internal qa tests in light
of gbss (spider status doc) changes and other things.
had to make xmldoc.o -O2 instead of -O3 to fix strange bug.
|
2015-04-01 11:20:36 -06:00 |
|
Matt
|
6b293f17e6
|
now show "totalDocsWithField" for each facet, so we know
how many docs had that field, with any particular value,
so we can do tf/idf type things.
|
2015-04-01 09:16:42 -06:00 |
|
mwells
|
47f6d9f414
|
clean out rebuild trees/buckets too
|
2015-03-21 22:42:49 -06:00 |
|
Matt
|
e99b2f0a65
|
added RdbBuckets::cleanBuckets() corresponding to
RdbTree::cleanTree() to remove keys from deleted
collections at startup.
|
2015-03-21 22:28:34 -06:00 |
|
Matt
|
000c5d67e9
|
do not index xml docs' body for custom crawls
or when indexbody is turned off.
|
2015-03-21 09:21:44 -06:00 |
|
Matt Wells
|
9f42a6d5ff
|
fix indexing of spider status docs
|
2015-03-20 18:08:39 -07:00 |
|
Matt Wells
|
7d82a5ca69
|
try to get diffbot reply info first before
making spider status doc
|
2015-03-20 17:44:21 -07:00 |
|
Matt Wells
|
07d13541ed
|
emergency fixes for corrupt tagdb tag id
|
2015-03-20 17:21:52 -07:00 |
|