1210 Commits

Author SHA1 Message Date
mwells
b6e5424e32 do not download bulkjob urls in crawlbot.
just return a fake http reply.
however, do use crawl-delay throttling
logic. deduping is already turned off for
bulk jobs so it should be ok.
2014-03-21 12:40:38 -07:00
Matt Wells
b33121af7d make all field names lower case without
spaces when we hash them to make the
prefixhash. since json names often have
mixed case field names and spaces.
2014-03-20 16:08:02 -07:00
Matt Wells
98a10d4936 Merge branch 'testing' into diffbot-testing 2014-03-20 15:50:49 -07:00
Matt Wells
bbc8fc0c79 always show admin link 2014-03-20 15:48:51 -07:00
Matt Wells
67202f3731 Merge branch 'diffbot' into diffbot-testing 2014-03-20 15:39:03 -07:00
Matt Wells
99bd9319fd temp hack to reduce network comm
between trinity and neo
2014-03-20 15:42:34 -07:00
Matt Wells
5ed19026d9 temp debug comments 2014-03-20 15:33:37 -07:00
Matt Wells
b8d0e95035 Merge branch 'diffbot' into diffbot-testing 2014-03-20 10:26:55 -07:00
mwells
ca0843aa8b more bool query fixes. 2014-03-20 10:03:25 -07:00
mwells
cfbec626e8 more righteous fixes for bool queries 2014-03-19 13:51:32 -07:00
mwells
ab3368b5a0 more bool fixes. not operator support. 2014-03-19 09:38:45 -07:00
mwells
1bb91149d6 more bool fixes 2014-03-18 14:42:50 -07:00
mwells
652892dc10 more bool fixes 2014-03-18 14:37:59 -07:00
mwells
f392826b1e nested bool query fixes 2014-03-18 14:08:59 -07:00
mwells
b7d80fd02d more bool query fixes 2014-03-18 13:41:36 -07:00
mwells
b31eaee9fd simple bool queries work 2014-03-18 12:07:29 -07:00
Matt Wells
d4302e3301 fix core 2014-03-18 11:12:50 -07:00
Matt Wells
3b97682cc3 more bool query fixes 2014-03-18 10:44:56 -07:00
Matt Wells
6e23d37e47 Merge branch 'diffbot' into diffbot-testing 2014-03-17 17:27:28 -07:00
mwells
54cc8088fb more bool query fixes. hopefully this will do it,
but still can do some optimizations for speed.
2014-03-17 17:00:08 -07:00
Matt Wells
9d3c35ad17 nothing 2014-03-17 13:53:19 -07:00
Matt Wells
4abf56a75d cleanups 2014-03-16 18:06:22 -07:00
Matt Wells
d2511d0bef host table cleanups 2014-03-16 17:14:47 -07:00
Matt Wells
5057fdaf14 aesthetic cleanups 2014-03-16 17:12:04 -07:00
Matt Wells
d320bf9d75 spidering back on in main's coll.conf 2014-03-16 15:06:39 -07:00
Matt Wells
c513ad9418 Merge branch 'diffbot' into testing 2014-03-16 14:51:22 -07:00
Matt Wells
acd05aa740 fix a few minor bugs.
/master/->/admin/ and crawl type mismatch.
2014-03-16 10:34:58 -07:00
Matt Wells
edbd61b0c5 thread fixes. if pthread_create fails then
keep thread queue and just return. will try to
relaunch later. do not count delete keys towards
shard rebalance count.
2014-03-15 20:07:02 -07:00
Matt Wells
5ca411e3e2 tuning the rebalance loop 2014-03-15 14:56:11 -07:00
Matt Wells
86147fe22c tight merge during rebalance to save
disk space, so neg recs annihilate pos recs.
2014-03-14 23:37:30 -07:00
Matt Wells
6c704f6fdf Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot 2014-03-14 22:16:40 -07:00
Matt Wells
e37eebd76f when rebalancing wait for merge to complete before scanning
more
2014-03-14 22:16:25 -07:00
Matt Wells
82ac3fab6c merge fixes 2014-03-14 22:15:08 -07:00
Matt Wells
df46a6fc1d Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot-matt 2014-03-14 19:32:10 -07:00
Matt Wells
1f162ce7b2 update localhosts.conf too 2014-03-14 19:20:23 -07:00
Matt Wells
553aefdb55 keep files tightly merged when rebalancing
to avoid running out of disk space
2014-03-14 19:19:41 -07:00
mwells
cb483c42ea more fixes for bool searching before
using a slightly different and simpler approach
2014-03-13 16:00:23 -07:00
mwells
7812f5c746 more bool fixes. still needs a little more work 2014-03-13 13:54:23 -07:00
mwells
3b2d981dff more fixes for new boolean logic. 2014-03-13 13:09:33 -07:00
Matt Wells
fb0123ad53 nothing 2014-03-13 11:27:28 -07:00
Matt Wells
9acb7ef0f4 fix core &token= core 2014-03-13 07:57:06 -07:00
Matt Wells
018258bcaa Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing 2014-03-12 20:55:21 -07:00
Matt Wells
fbd1bcd349 initial attempt at new boolean query logic.
supports unlimited # of boolean query terms.
already docid phased from phasing logic already there
but could be phased more to save more mem and speed up
a little more.
2014-03-12 20:53:44 -07:00
Matt Wells
3e7243c6ce fix add url core 2014-03-12 08:28:42 -07:00
Matt Wells
7ec1513d41 updates 2014-03-12 08:09:45 -07:00
Matt Wells
312438a32b Merge branch 'diffbot-dan' into diffbot-testing 2014-03-11 17:02:59 -07:00
Matt Wells
84784d8d76 minor fixups 2014-03-11 17:02:24 -07:00
Daniel Steinberg
2331b4673d Defect #2099: throw an error if a crawl request was made with a name that already existed for a bulk request (or the other way around) 2014-03-11 16:21:58 -07:00
Matt Wells
8445e53c61 fix query reindex some more 2014-03-11 14:46:49 -07:00
Matt Wells
c4b38a5c72 fix a few cores from previous code updates 2014-03-11 09:36:33 -07:00