mwells
0b9b77ea46
call buildProxyTable when ips updated
2014-06-02 10:20:29 -07:00
mwells
9767ec4c84
fix a few cores in spider proxies
2014-05-30 15:13:19 -07:00
mwells
a811462d5f
spider proxy stuff compiles now
2014-05-30 15:05:00 -07:00
mwells
8fb8669da1
more spider proxy updates.
2014-05-29 21:17:51 -06:00
Matt Wells
d928a16211
Merge branch 'diffbot-testing' into diffbot-matt
2014-05-27 15:22:38 -07:00
Matt Wells
f341dba0c8
got the general framework for load-balanced/reliabled
floaters in place for the distributed spider network.
need to fill in the blanks now.
2014-05-27 15:21:12 -07:00
Daniel Steinberg
7448e8a1ff
don't use "expand" for mode= requests or non-analyze requests
2014-05-26 20:38:44 -07:00
Matt Wells
2d4fb483b2
disambiguate error msg
2014-05-26 10:46:10 -07:00
Matt Wells
8234aaed23
put lastspidertimeutc back in because we need
it for debugging.
2014-05-23 09:43:46 -07:00
Matt Wells
e3b6f6b74e
a second fix for crawls saying they're done and
then resuming. it seems to happen when we turn
spiders off then back on again. so hack that.
2014-05-23 07:29:18 -07:00
Matt Wells
1f4dc2df97
fix bug in spider scan
of spiderdb for unique firstips
2014-05-22 13:08:01 -07:00
Matt Wells
68fcffb2da
speed up scan of spiderdb
to repopulate waiting tree by jumping over
last firstip.
2014-05-22 12:20:03 -07:00
Matt Wells
e9c4c9bb9a
fix possible loss of data when doing reads
on especially doledb.
2014-05-22 11:06:56 -07:00
Matt Wells
1660805f66
more useful logging for debugging
2014-05-22 10:36:44 -07:00
Matt Wells
32735677d2
wait 45 seconds before ending round, not 30
to try to fix some issues...
2014-05-22 08:32:19 -07:00
Matt Wells
935cc72e19
Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing
2014-05-21 13:55:29 -07:00
Matt Wells
b8886c399c
show start/end job times on pagecrawlbot.
2014-05-21 13:55:01 -07:00
Matt Wells
61fc015014
fix potential diffbot injection bug
2014-05-21 12:21:29 -07:00
Matt Wells
b0c87b355c
log update
2014-05-21 10:09:50 -07:00
Matt Wells
45df139ccb
update logging
2014-05-21 10:05:49 -07:00
Matt Wells
7ad9058f77
when doing a query reindex on a json
child url we need to add the spider request
of the original parent url and make sure
it does not get "EDOCUNCHANGED" error.
then the possibly new json child objects
won't get indexed.
2014-05-21 05:43:53 -07:00
Matt Wells
34afc7c7cf
Merge branch 'diffbot-dan' into diffbot-testing
2014-05-21 05:30:56 -07:00
Daniel Steinberg
e39dffadcf
use "expand" option when calling Diffbot
2014-05-20 22:00:46 -07:00
Matt Wells
4b587f168b
fix bug of not including empty responses when &icc=1
2014-05-20 21:07:21 -07:00
Matt Wells
c729b51ae5
fixed exact # search results hit count
when using min/max/sort operators.
2014-05-20 13:45:00 -07:00
Matt Wells
6664faa792
fix printing back-to-back commas when showing
results in json with &icc=1.
2014-05-20 13:23:29 -07:00
Matt Wells
cd3e11b6ee
Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing
2014-05-16 18:48:06 -07:00
Matt Wells
d2cc117d82
fix oops
2014-05-16 18:47:52 -07:00
Matt Wells
526be98ec8
fix core scenario when diffbot reply that was injected
using &diffbotreply= contains the http mime.
2014-05-16 18:46:39 -07:00
Matt Wells
baf1ccb7d5
note updates
2014-05-16 09:52:41 -07:00
Matt Wells
eea5dff0f5
Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing
2014-05-16 09:38:42 -07:00
Matt Wells
a22396c344
quick doc update
2014-05-16 09:38:32 -07:00
Matt Wells
2484147403
fix core
2014-05-16 09:30:46 -07:00
Matt Wells
1af8ca846f
Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing
2014-05-16 08:08:42 -07:00
Matt Wells
a81f2145bd
fix sendmail ip to 127.0.0.1
2014-05-16 08:08:20 -07:00
Matt Wells
4684298965
minor doc update
2014-05-16 08:01:29 -07:00
Matt Wells
2ce6ed266a
fix another core from a 0 docid
2014-05-16 07:59:04 -07:00
Matt Wells
6d9fdc975b
fix core from not setting m_gotClusterRecs in Msg39.cpp
2014-05-16 06:32:51 -07:00
Matt Wells
5c2cc973a8
Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing
2014-05-15 18:27:13 -07:00
Matt Wells
a303bda1f8
fix core
2014-05-15 15:10:57 -07:00
Matt Wells
b38f62c7dc
nothing
2014-05-15 14:15:05 -07:00
Matt Wells
72c6d032d8
fix query reindex on subdocuments (diffbot json blurbs)
so that they just put in a spiderrequest to reindex
the parent url. Added &diffbotreply= to the injection
interface so dan can provide that along with the
pageUrl he passes in with &u=
2014-05-15 14:11:12 -07:00
Daniel Steinberg
fc5cfa2a62
move list of bulk urls to new directory earlier. May fix Defect #2218 if there is something that is causing the bulk job to restart before this function returns
2014-05-15 13:35:32 -07:00
Daniel Steinberg
6afa3f2561
save spots to disk as space separated
2014-05-14 14:40:46 -07:00
Matt Wells
00b652581f
fix boolean query containing quoted phrase
2014-05-14 11:22:07 -07:00
Matt Wells
8ac7fdfa24
Msg39::controlLoop now works
2014-05-14 11:02:09 -07:00
Matt Wells
d95cbb42d6
Merge branch 'diffbot-testing' into diffbot-matt
2014-05-14 10:52:45 -07:00
Matt Wells
db543ddd9f
nothing
2014-05-14 09:37:59 -07:00
Matt Wells
40bca5d120
try to fix msg22 core some more
2014-05-14 08:16:47 -07:00
Matt Wells
48df53e74f
Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing
Conflicts:
Msg22.cpp
2014-05-14 07:48:23 -07:00