mwells
b6e5424e32
do not download bulkjob urls in crawlbot.
just return a fake http reply.
however, do use crawl-delay throttling
logic. deduping is already turned off for
bulk jobs so it should be ok.
2014-03-21 12:40:38 -07:00
Matt Wells
b33121af7d
make all field names lower case without
spaces when we hash them to make the
prefixhash. since json names often have
mixed case field names and spaces.
2014-03-20 16:08:02 -07:00
Matt Wells
98a10d4936
Merge branch 'testing' into diffbot-testing
2014-03-20 15:50:49 -07:00
Matt Wells
bbc8fc0c79
always show admin link
2014-03-20 15:48:51 -07:00
Matt Wells
67202f3731
Merge branch 'diffbot' into diffbot-testing
2014-03-20 15:39:03 -07:00
Matt Wells
99bd9319fd
temp hack to reduce network comm
between trinity and neo
2014-03-20 15:42:34 -07:00
Matt Wells
5ed19026d9
temp debug comments
2014-03-20 15:33:37 -07:00
Matt Wells
b8d0e95035
Merge branch 'diffbot' into diffbot-testing
2014-03-20 10:26:55 -07:00
mwells
ca0843aa8b
more bool query fixes.
2014-03-20 10:03:25 -07:00
mwells
cfbec626e8
more righteous fixes for bool queries
2014-03-19 13:51:32 -07:00
mwells
ab3368b5a0
more bool fixes. not operator support.
2014-03-19 09:38:45 -07:00
mwells
1bb91149d6
more bool fixes
2014-03-18 14:42:50 -07:00
mwells
652892dc10
more bool fixes
2014-03-18 14:37:59 -07:00
mwells
f392826b1e
nested bool query fixes
2014-03-18 14:08:59 -07:00
mwells
b7d80fd02d
more bool query fixes
2014-03-18 13:41:36 -07:00
mwells
b31eaee9fd
simple bool queries work
2014-03-18 12:07:29 -07:00
Matt Wells
d4302e3301
fix core
2014-03-18 11:12:50 -07:00
Matt Wells
3b97682cc3
more bool query fixes
2014-03-18 10:44:56 -07:00
Matt Wells
6e23d37e47
Merge branch 'diffbot' into diffbot-testing
2014-03-17 17:27:28 -07:00
mwells
54cc8088fb
more bool query fixes. hopefully this will do it,
but still can do some optimizations for speed.
2014-03-17 17:00:08 -07:00
Matt Wells
9d3c35ad17
nothing
2014-03-17 13:53:19 -07:00
Matt Wells
4abf56a75d
cleanups
2014-03-16 18:06:22 -07:00
Matt Wells
d2511d0bef
host table cleanups
2014-03-16 17:14:47 -07:00
Matt Wells
5057fdaf14
aesthetic cleanups
2014-03-16 17:12:04 -07:00
Matt Wells
d320bf9d75
spidering back on in main's coll.conf
2014-03-16 15:06:39 -07:00
Matt Wells
c513ad9418
Merge branch 'diffbot' into testing
2014-03-16 14:51:22 -07:00
Matt Wells
acd05aa740
fix a few minor bugs.
/master/->/admin/ and crawl type mismatch.
2014-03-16 10:34:58 -07:00
Matt Wells
edbd61b0c5
thread fixes. if pthread_create fails then
keep thread queue and just return. will try to
relaunch later. do not count delete keys towards
shard rebalance count.
2014-03-15 20:07:02 -07:00
Matt Wells
5ca411e3e2
tuning the rebalance loop
2014-03-15 14:56:11 -07:00
Matt Wells
86147fe22c
tight merge during rebalance to save
disk space, so neg recs annihilate pos recs.
2014-03-14 23:37:30 -07:00
Matt Wells
6c704f6fdf
Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot
2014-03-14 22:16:40 -07:00
Matt Wells
e37eebd76f
when rebalancing wait for merge to complete before scanning
more
2014-03-14 22:16:25 -07:00
Matt Wells
82ac3fab6c
merge fixes
2014-03-14 22:15:08 -07:00
Matt Wells
df46a6fc1d
Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot-matt
2014-03-14 19:32:10 -07:00
Matt Wells
1f162ce7b2
update localhosts.conf too
2014-03-14 19:20:23 -07:00
Matt Wells
553aefdb55
keep files tightly merged when rebalancing
to avoid running out of disk space
2014-03-14 19:19:41 -07:00
mwells
cb483c42ea
more fixes for bool searching before
using a slightly different and simpler approach
2014-03-13 16:00:23 -07:00
mwells
7812f5c746
more bool fixes. still needs a little more work
2014-03-13 13:54:23 -07:00
mwells
3b2d981dff
more fixes for new boolean logic.
2014-03-13 13:09:33 -07:00
Matt Wells
fb0123ad53
nothing
2014-03-13 11:27:28 -07:00
Matt Wells
9acb7ef0f4
fix &token= core
2014-03-13 07:57:06 -07:00
Matt Wells
018258bcaa
Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing
2014-03-12 20:55:21 -07:00
Matt Wells
fbd1bcd349
initial attempt at new boolean query logic.
supports unlimited # of boolean query terms.
already docid phased by the existing phasing logic
but could be phased more to save more mem and speed up
a little more.
2014-03-12 20:53:44 -07:00
Matt Wells
3e7243c6ce
fix add url core
2014-03-12 08:28:42 -07:00
Matt Wells
7ec1513d41
updates
2014-03-12 08:09:45 -07:00
Matt Wells
312438a32b
Merge branch 'diffbot-dan' into diffbot-testing
2014-03-11 17:02:59 -07:00
Matt Wells
84784d8d76
minor fixups
2014-03-11 17:02:24 -07:00
Daniel Steinberg
2331b4673d
Defect #2099 : throw an error if a crawl request was made with a name that already existed for a bulk request (or the other way around)
2014-03-11 16:21:58 -07:00
Matt Wells
8445e53c61
fix query reindex some more
2014-03-11 14:46:49 -07:00
Matt Wells
c4b38a5c72
fix a few cores from previous code updates
2014-03-11 09:36:33 -07:00