Daniel Steinberg
|
0efac8c156
|
Defect #2080: seed URLs duplicated
|
2014-03-25 17:25:55 -07:00 |
|
Daniel Steinberg
|
e1b1b15a38
|
bigger buffer
|
2014-03-25 16:34:40 -07:00 |
|
Daniel Steinberg
|
9846061dff
|
when restarting a bulk job, copy bulkurls.txt to /tmp, and then transfer it back to the new collection folder
|
2014-03-25 16:20:24 -07:00 |
|
Daniel Steinberg
|
ab90c06d8d
|
add TODO for regex checking
|
2014-03-25 13:05:43 -07:00 |
|
Daniel Steinberg
|
1ff6c1fae0
|
Merge remote-tracking branch 'origin/diffbot' into diffbot-dan
|
2014-03-25 12:53:37 -07:00 |
|
Daniel Steinberg
|
b8836745f0
|
use SpiderRequest instead of isonsamedomain flag to determine whether to output data in CSV (Defect #2122)
|
2014-03-25 12:51:08 -07:00 |
|
mwells
|
b6e5424e32
|
do not download bulkjob urls in crawlbot.
just return a fake http reply.
however, do use crawl-delay throttling
logic. deduping is already turned off for
bulk jobs so it should be ok.
|
2014-03-21 12:40:38 -07:00 |
|
mwells
|
502752aba4
|
doc updates
|
2014-03-21 08:59:13 -07:00 |
|
Matt Wells
|
b33121af7d
|
make all field names lower case without
spaces when we hash them to make the
prefixhash. since json names often have
mixed case field names and spaces.
|
2014-03-20 16:08:02 -07:00 |
|
Matt Wells
|
98a10d4936
|
Merge branch 'testing' into diffbot-testing
|
2014-03-20 15:50:49 -07:00 |
|
Matt Wells
|
bbc8fc0c79
|
always show admin link
|
2014-03-20 15:48:51 -07:00 |
|
Matt Wells
|
99bd9319fd
|
temp hack to reduce network comm
between trinity and neo
|
2014-03-20 15:42:34 -07:00 |
|
Matt Wells
|
67202f3731
|
Merge branch 'diffbot' into diffbot-testing
|
2014-03-20 15:39:03 -07:00 |
|
Matt Wells
|
5ed19026d9
|
temp debug comments
|
2014-03-20 15:33:37 -07:00 |
|
Matt Wells
|
b8d0e95035
|
Merge branch 'diffbot' into diffbot-testing
|
2014-03-20 10:26:55 -07:00 |
|
mwells
|
ca0843aa8b
|
more bool query fixes.
|
2014-03-20 10:03:25 -07:00 |
|
mwells
|
cfbec626e8
|
more righteous fixes for bool queries
|
2014-03-19 13:51:32 -07:00 |
|
mwells
|
ab3368b5a0
|
more bool fixes. not operator support.
|
2014-03-19 09:38:45 -07:00 |
|
mwells
|
1bb91149d6
|
more bool fixes
|
2014-03-18 14:42:50 -07:00 |
|
mwells
|
652892dc10
|
more bool fixes
|
2014-03-18 14:37:59 -07:00 |
|
mwells
|
f392826b1e
|
nested bool query fixes
|
2014-03-18 14:08:59 -07:00 |
|
mwells
|
b7d80fd02d
|
more bool query fixes
|
2014-03-18 13:41:36 -07:00 |
|
mwells
|
b31eaee9fd
|
simple bool queries work
|
2014-03-18 12:07:29 -07:00 |
|
Matt Wells
|
d4302e3301
|
fix core
|
2014-03-18 11:12:50 -07:00 |
|
Matt Wells
|
3b97682cc3
|
more bool query fixes
|
2014-03-18 10:44:56 -07:00 |
|
Matt Wells
|
6e23d37e47
|
Merge branch 'diffbot' into diffbot-testing
|
2014-03-17 17:27:28 -07:00 |
|
mwells
|
54cc8088fb
|
more bool query fixes. hopefully this will do it,
but still can do some optimizations for speed.
|
2014-03-17 17:00:08 -07:00 |
|
Matt Wells
|
9d3c35ad17
|
nothing
|
2014-03-17 13:53:19 -07:00 |
|
Matt Wells
|
4abf56a75d
|
cleanups
|
2014-03-16 18:06:22 -07:00 |
|
Matt Wells
|
d2511d0bef
|
host table cleanups
|
2014-03-16 17:14:47 -07:00 |
|
Matt Wells
|
5057fdaf14
|
aesthetic cleanups
|
2014-03-16 17:12:04 -07:00 |
|
Matt Wells
|
d320bf9d75
|
spidering back on in main's coll.conf
|
2014-03-16 15:06:39 -07:00 |
|
Matt Wells
|
c513ad9418
|
Merge branch 'diffbot' into testing
|
2014-03-16 14:51:22 -07:00 |
|
Matt Wells
|
acd05aa740
|
fix a few minor bugs.
/master/->/admin/ and crawl type mismatch.
|
2014-03-16 10:34:58 -07:00 |
|
Matt Wells
|
edbd61b0c5
|
thread fixes. if pthread_create fails then
keep thread queue and just return. will try to
relaunch later. do not count delete keys towards
shard rebalance count.
|
2014-03-15 20:07:02 -07:00 |
|
Matt Wells
|
5ca411e3e2
|
tuning the rebalance loop
|
2014-03-15 14:56:11 -07:00 |
|
Matt Wells
|
86147fe22c
|
tight merge during rebalance to save
disk space, so neg recs annihilate pos recs.
|
2014-03-14 23:37:30 -07:00 |
|
Matt Wells
|
6c704f6fdf
|
Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot
|
2014-03-14 22:16:40 -07:00 |
|
Matt Wells
|
e37eebd76f
|
when rebalancing wait for merge to complete before scanning more
|
2014-03-14 22:16:25 -07:00 |
|
Matt Wells
|
82ac3fab6c
|
merge fixes
|
2014-03-14 22:15:08 -07:00 |
|
Matt Wells
|
df46a6fc1d
|
Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot-matt
|
2014-03-14 19:32:10 -07:00 |
|
Matt Wells
|
1f162ce7b2
|
update localhosts.conf too
|
2014-03-14 19:20:23 -07:00 |
|
Matt Wells
|
553aefdb55
|
keep files tightly merged when rebalancing
to avoid running out of disk space
|
2014-03-14 19:19:41 -07:00 |
|
mwells
|
cb483c42ea
|
more fixes for bool searching before
using a slightly different and simpler approach
|
2014-03-13 16:00:23 -07:00 |
|
mwells
|
7812f5c746
|
more bool fixes. still needs a little more work
|
2014-03-13 13:54:23 -07:00 |
|
mwells
|
3b2d981dff
|
more fixes for new boolean logic.
|
2014-03-13 13:09:33 -07:00 |
|
Matt Wells
|
fb0123ad53
|
nothing
|
2014-03-13 11:27:28 -07:00 |
|
Matt Wells
|
9acb7ef0f4
|
fix &token= core
|
2014-03-13 07:57:06 -07:00 |
|
Daniel Steinberg
|
7b5816f194
|
updated error message
|
2014-03-12 20:56:27 -07:00 |
|
Matt Wells
|
018258bcaa
|
Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing
|
2014-03-12 20:55:21 -07:00 |
|