Commit Graph

2927 Commits

Author SHA1 Message Date
Matt
48ac8bf80f fix udp linked list thing again 2015-04-13 10:13:59 -06:00
Matt
e6696a6937 fix some more 2015-04-13 10:08:01 -06:00
Matt
9feb070fe9 fix issue of not being able to exit gb when a disk read retry is taking forever. 2015-04-13 10:06:08 -06:00
Matt
f5a7423336 fix bug of never calling callback 2015-04-13 09:56:21 -06:00
Matt
43ced700d0 calls NEWS BLOG 2015-04-12 12:33:09 -06:00
Matt
2814e3db37 show screenshots 2015-04-12 11:52:22 -06:00
Matt Wells
994ba73007 log more when doing crawlbottesting-* tests 2015-04-12 06:46:33 -07:00
mwells
56a46fd294 fix printing of facets when &header=0 so diffbot json output is still simple and correct. 2015-04-10 16:25:38 -06:00
Matt Wells
a891fb7bdc turn on indexing spider status docs for all diffbot CRAWLS on startup, whether it was off or on before. 2015-04-10 14:35:41 -07:00
Matt Wells
4dce44c976 fix log msg 2015-04-10 13:31:46 -07:00
Matt Wells
02aa138fbb debug helper msg 2015-04-10 13:27:06 -07:00
Matt
13d0361756 try to speed up host #4 on seraph 2015-04-10 09:20:18 -06:00
Matt Wells
6b139e9eee clarify jam ups 2015-04-08 18:30:27 -07:00
Matt Wells
e38cb8c080 Merge branch 'diffbot' into diffbot-testing 2015-04-08 18:08:10 -07:00
Matt Wells
a7e48515fa fix core when adding gbss for a force deleted doc. 2015-04-08 18:07:32 -07:00
Matt Wells
64bae224e0 fix core on the GI 2015-04-08 16:05:32 -06:00
Matt Wells
97d3b185c1 just use INCOMING udp slots/sockets for jam detection. this will highlight the slow nodes better. 2015-04-08 15:52:43 -06:00
Matt Wells
7fd4310106 don't include gbss headers if not gbss documents 2015-04-07 22:05:11 -07:00
mwells
fea5210906 fix infinite loop bug 2015-04-07 15:27:26 -06:00
mwells
2997b3bb28 fix for skipping dead shards on tag re clookup 2015-04-07 14:36:47 -06:00
mwells
53a2d39afd fix for calling callback of timedout udp slots 2015-04-07 14:18:42 -06:00
Matt Wells
2114c40cda fix not calling callback when udp reply times out. like for msg39 replies we need to timeout quickly. 2015-04-07 12:38:35 -07:00
Matt Wells
05a66cc367 fix bug of not able to get ip address because peeksize is too big. 2015-04-07 12:29:19 -07:00
Matt Wells
b08d12a11e fix cores associated with new spider status docs. 2015-04-07 10:33:54 -07:00
Matt Wells
4ed8231222 upp max vfds again 2015-04-07 09:56:04 -06:00
Matt
036fb4e0dc crap, another oopsy fix 2015-04-06 14:43:58 -07:00
Matt
bc3335c434 more facet counting fixes 2015-04-06 14:38:38 -07:00
Matt
74fe3c5866 more facet counting fixes 2015-04-06 14:02:59 -07:00
Matt
1a262c8254 fixed oopsy 2015-04-06 13:51:19 -07:00
Matt Wells
8326460e8f fix counting of # docs that have facet field. 2015-04-06 14:41:44 -06:00
Matt Wells
330d9a9dbf report max rounds reached, not max to process or crawl reached. 2015-04-06 10:24:33 -06:00
Matt
bffaa09599 fix for the GI 2015-04-06 08:24:00 -06:00
Matt
de187dbb2b documentation fix 2015-04-03 16:00:04 -06:00
Matt
8433c49aa9 make sure we index a spider status doc for each diffbot object. that way we can tell if diffbot objects are deduping, how they are changing over time, etc. 2015-04-03 14:59:09 -06:00
Matt
dad1cb15f4 fix excessive looping when calling makeCallbacks() on niceness 1 or above when none are available. 2015-04-03 12:12:58 -06:00
Matt
c991a2dcdd try to ameliorate the udp slot jamming issue. 2015-04-03 10:43:11 -06:00
Matt
c2567ad244 a hopeful fix for host #0 always crashing from streaming socket timeouts. 2015-04-02 15:17:49 -06:00
Matt
2ce107e4be keep track of how many times the host exited/cored as an exponent to the 'x' in the hosts table. this way we can detect hosts that have restarted many times and fix them. 2015-04-01 16:28:58 -06:00
Matt
e583850e40 fix core when searching bogus collection. 2015-04-01 15:30:59 -06:00
Matt
94a8210586 added CSV to output dropdown. show all json fields for spider status doc csv files. support spider status docs in csv output. 2015-04-01 13:53:03 -06:00
Matt
f26c9d609b one more qa test fix for spider status docs 2015-04-01 12:47:32 -06:00
Matt
5e46262cb2 more fixes for qa'ing of new spider status docs 2015-04-01 12:03:17 -06:00
Matt
10a31783bb fixes to pass internal qa tests in light of gbss (spider status doc) changes and other things. had to make xmldoc.o -O2 instead of -O3 to fix strange bug. 2015-04-01 11:20:36 -06:00
Matt
6b293f17e6 now show "totalDocsWithField" for each facet, so we know how many docs had that field, with any particular value, so we can do tf/idf type things. 2015-04-01 09:16:42 -06:00
mwells
47f6d9f414 clean out rebuild trees/buckets too 2015-03-21 22:42:49 -06:00
Matt
e99b2f0a65 added RdbBuckets::cleanBuckets() corresponding to RdbTree::cleanTree() to remove keys from deleted collections at startup. 2015-03-21 22:28:34 -06:00
Matt
000c5d67e9 do not index xml docs' body for custom crawls or when indexbody is turned off. 2015-03-21 09:21:44 -06:00
Matt Wells
9f42a6d5ff fix indexing of spider status docs 2015-03-20 18:08:39 -07:00
Matt Wells
7d82a5ca69 try to get diffbot reply info first before making spider status doc 2015-03-20 17:44:21 -07:00
Matt Wells
07d13541ed emergency fixes for corrupt tagdb tag id 2015-03-20 17:21:52 -07:00