mwells
|
72df0d25d2
|
added safebuf base64decode func
|
2014-06-06 16:20:15 -07:00 |
|
mwells
|
965d992f98
|
Merge branch 'diffbot-testing' into diffbot-matt
Conflicts:
Msg13.cpp
|
2014-06-06 15:14:41 -07:00 |
|
mwells
|
3f2dcda4e1
|
got new floater/proxy logic compiling.
|
2014-06-06 15:11:51 -07:00 |
|
Matt Wells
|
6b5b83ac85
|
fixes for gbmin/gbmax being first query term.
|
2014-06-06 10:20:12 -07:00 |
|
Matt Wells
|
5b6100c77d
|
log format change for errcnt
|
2014-06-06 09:29:57 -07:00 |
|
Matt Wells
|
d850f5f006
|
try to prevent job status flip flop from error retries.
|
2014-06-05 23:38:54 -07:00 |
|
Matt Wells
|
74f0a41290
|
bulk jobs give up after downloading a url
3 times. crawls don't give up on tmperrors,
but retry every 30 days.
|
2014-06-05 23:11:14 -07:00 |
|
Matt Wells
|
172d7071a7
|
fix to rename tagdb0000.002.dat
|
2014-06-05 22:21:41 -07:00 |
|
Matt Wells
|
8ac691f324
|
fix merging getting clogged by so many
collections trying to merge tagdb at once
|
2014-06-05 21:27:33 -07:00 |
|
Matt Wells
|
13243a411c
|
more fixes for fake http reply hack
|
2014-06-05 20:31:49 -07:00 |
|
Matt Wells
|
ce7294e9a9
|
more mem leak fixes for fake
bulk job empty http replies
|
2014-06-05 20:09:12 -07:00 |
|
Matt Wells
|
7f10fca234
|
no longer add www to url domain if it is just
a domain. was messing up tmblr.co, where www.tmblr.co
has no IP.
|
2014-06-05 17:00:12 -07:00 |
|
Matt Wells
|
3c6a8bf87e
|
fix issue of not retrying diffbot internal errors.
|
2014-06-05 16:24:52 -07:00 |
|
Matt Wells
|
cfda735194
|
print error stuff in spiderdb dump.
|
2014-06-05 16:14:32 -07:00 |
|
Matt Wells
|
970eb33a83
|
sanity checks to ensure fakefirstip
was able to convert to a real good firstip
|
2014-06-05 16:13:33 -07:00 |
|
Matt Wells
|
7b4b8b27bd
|
more debug msgs
|
2014-06-05 14:58:20 -07:00 |
|
Matt Wells
|
5f41840211
|
Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing
|
2014-06-05 14:48:23 -07:00 |
|
Matt Wells
|
8477ef72f8
|
support gbmin gbmax gbminint gbmaxint range query
terms properly, when generating the docidvotebuf.
fixes boolean queries using them as well.
|
2014-06-05 14:47:45 -07:00 |
|
Matt Wells
|
1fe2c94322
|
add some debug notes
|
2014-06-05 12:26:06 -07:00 |
|
Matt Wells
|
780fd43aae
|
timestamp bug fix
|
2014-06-04 15:50:26 -07:00 |
|
mwells
|
2a36e5bde5
|
Merge branch 'diffbot-testing' into diffbot-matt
|
2014-06-04 14:40:34 -07:00 |
|
Matt Wells
|
546d135007
|
fix boolean queries to do the on-demand
mini merges of the termlists. should fix
gbmin:offerprice:100 AND (text:lord OR text:helicopter)
|
2014-06-04 14:33:54 -07:00 |
|
mwells
|
2c750b2c22
|
Merge branch 'diffbot-testing' into diffbot-matt
|
2014-06-04 13:56:44 -07:00 |
|
Matt Wells
|
d98cf4b2b0
|
try to prevent slamming diffbot backend
with bulk jobs consisting of hundreds of
different domains/ips.
|
2014-06-04 12:37:49 -07:00 |
|
Matt Wells
|
4298e4e752
|
sanity checks for debugging duplicate
titledb file bug.
|
2014-06-04 12:15:12 -07:00 |
|
Matt Wells
|
b7d9002a05
|
fix log bug
|
2014-06-04 10:57:25 -07:00 |
|
Matt Wells
|
8b74bd855b
|
Merge branch 'master' into diffbot-testing
|
2014-06-04 09:37:55 -07:00 |
|
Matt Wells
|
fcc8bc85cc
|
update bulk job restart
|
2014-06-04 09:36:26 -07:00 |
|
Matt Wells
|
e2ca303fe2
|
doc updates
|
2014-06-04 07:38:40 -07:00 |
|
mwells
|
a734240474
|
minor date change in documentation.
|
2014-06-04 07:26:46 -07:00 |
|
mwells
|
beba94013e
|
remove clustermaintenance documentation. seemed pretty
obsolete.
|
2014-06-04 07:26:10 -07:00 |
|
mwells
|
3fd973a53e
|
documentation updates for scaling the cluster
|
2014-06-04 07:17:34 -07:00 |
|
Matt Wells
|
ec1b66aff5
|
Merge branch 'master' into diffbot-testing
|
2014-06-03 20:50:59 -07:00 |
|
Matt Wells
|
585e6a357f
|
parm documentation update for url filters
|
2014-06-03 20:50:22 -07:00 |
|
Matt Wells
|
b534ac5812
|
do not print completed time if spidering is going on
|
2014-06-03 20:30:10 -07:00 |
|
Matt Wells
|
694b19e053
|
Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing
|
2014-06-03 18:07:51 -07:00 |
|
Matt Wells
|
07cf2f1129
|
fix core
|
2014-06-03 18:07:35 -07:00 |
|
Matt Wells
|
f8073b5adc
|
Merge branch 'diffbot-testing' into diffbot-matt
|
2014-06-03 14:59:31 -07:00 |
|
Matt Wells
|
50468293e7
|
fix bool expressions with only one operand.
i.e. double parens bug.
|
2014-06-03 14:46:28 -07:00 |
|
Matt Wells
|
bf70823260
|
take out <moreResultsFollow> for &stream=1
for now. maybe add back in later but would be
at end of the reply.
|
2014-06-03 14:09:24 -07:00 |
|
mwells
|
d23032241d
|
fix mem leak when downloading images is turned on.
|
2014-06-03 13:26:56 -07:00 |
|
Matt Wells
|
da677eb8a4
|
fix for searching for query pipe operator in quotes.
|
2014-06-03 13:08:35 -07:00 |
|
mwells
|
91c7115c73
|
nothing
|
2014-06-03 11:49:21 -07:00 |
|
mwells
|
a1f1daad16
|
Merge branch 'master' into diffbot-matt
Conflicts:
Spider.cpp
|
2014-06-03 11:41:46 -07:00 |
|
mwells
|
6dcbc10e92
|
spider proxy updates.
|
2014-06-03 11:38:44 -07:00 |
|
mwells
|
ba2329808b
|
fix siteListIsEmpty bug causing spider to
spider the whole internet when it shouldn't
|
2014-06-03 11:37:31 -07:00 |
|
Matt Wells
|
c3a823c99d
|
fix relative url bug when relative url starts with ?
|
2014-06-03 10:54:50 -07:00 |
|
Matt Wells
|
536b43e19f
|
Merge branch 'master' into diffbot-testing
|
2014-06-03 10:17:00 -07:00 |
|
mwells
|
a772e21db6
|
only show proxy stuff in logs when debugging is on for it
|
2014-06-02 17:37:43 -07:00 |
|
mwells
|
937e275134
|
zero out the proxy ips
|
2014-06-02 17:32:13 -07:00 |
|