mwells
7f622bd416
fixes for cloud support.
2014-08-31 16:23:11 -07:00
mwells
c4174a0ca6
fix bug causing qa json facet test to fail
2014-07-30 15:36:08 -07:00
mwells
58f5a2dd57
save conf files safely to disk so we don't
lose them because the disk is full.
2014-07-29 10:02:43 -07:00
mwells
85b628cade
qa updates
2014-07-26 21:39:16 -07:00
mwells
837b6cf465
api updates
2014-07-23 08:47:48 -07:00
Matt Wells
6b797f5023
more core stability fixes. prevent core dumps
2014-07-16 12:07:39 -07:00
mwells
5ae476f34e
print facets for each search result
2014-07-08 19:38:54 -07:00
mwells
6434e5cc04
Merge branch 'testing' into diffbot-matt
Conflicts:
Errno.cpp
Errno.h
Parms.h
2014-07-07 09:49:59 -07:00
mwells
dc6c97c59c
basic qa tests running
2014-07-06 18:53:05 -07:00
mwells
29d170631a
more api updates
2014-07-05 12:36:01 -07:00
mwells
ea2650292a
more api updates. will also be useful
for running qa tests.
2014-07-04 20:57:42 -07:00
mwells
9249564191
now floaters are working pretty well
2014-06-30 16:26:10 -06:00
mwells
4a2717a88f
Merge branch 'diffbot-testing' into diffbot-matt
2014-06-09 12:42:54 -07:00
mwells
4a4fccfd93
added 'make testing-deb' support to build debian packages.
2014-06-07 10:21:51 -07:00
mwells
72df0d25d2
added safebuf base64decode func
2014-06-06 16:20:15 -07:00
mwells
806cf79b73
spider proxy updates
2014-06-02 13:18:18 -07:00
mwells
a811462d5f
spider proxy stuff compiles now
2014-05-30 15:05:00 -07:00
mwells
8fb8669da1
more spider proxy updates.
2014-05-29 21:17:51 -06:00
Matt Wells
72c6d032d8
fix query reindex on subdocuments (diffbot json blurbs)
so that they just put in a spiderrequest to reindex
the parent url. Added &diffbotreply= to the injection
interface so dan can provide that along with the
pageUrl he passes in with &u=
2014-05-15 14:11:12 -07:00
Matt Wells
82726879a2
support base64 generated thumbnails in serps.
2014-04-24 14:04:57 -07:00
mwells
8a003e3492
fix url filters profile logic.
2014-04-09 19:51:36 -07:00
Matt Wells
2d4af1aefe
index numbers as integers too, not just floats
so we can sort by spider date without losing
128 seconds of resolution.
2014-02-06 20:57:54 -08:00
Matt Wells
4be68fdaa6
set safebuf::m_buf to null in destructor
2014-02-02 12:16:11 -07:00
Matt Wells
239811b024
take out confusing function no longer used
2014-01-28 11:10:59 -08:00
Matt Wells
034de5039f
ignore tagdb corrupt tags in xmldoc.cpp.
fix ip -1 bug when adding to waiting tree
and it would prevent populateWaitingTreeFromSpiderdb()
from continuing and freeze things up.
2014-01-22 14:36:05 -08:00
Matt Wells
e366c12470
Merge branch 'master' into diffbot
Conflicts:
Collectiondb.cpp
Msg13.cpp
Parms.cpp
Spider.h
2014-01-07 12:09:11 -08:00
Matt Wells
7df2111ceb
fixed 'gb inject titledb-DIR newhosts.conf' command
for populating an index from titledb files in DIR
and transmitting to appropriate host in newhosts.conf.
also prettied up the gb -h output to use a formatting
function.
2014-01-02 01:20:08 -07:00
Matt Wells
c2f8445a70
expand reg ex shortcuts like \d to [0-9]
2013-12-19 18:31:37 -08:00
mwells
82494baa89
move CollectionRec stuff into Collectiondb files
for simplicity.
2013-12-10 15:28:04 -08:00
Matt Wells
5e4b5a112c
Merge branch 'master' into diffbot
Conflicts:
PageResults.cpp
Threads.cpp
XmlDoc.cpp
XmlDoc.h
2013-12-07 11:34:26 -07:00
Matt Wells
5da41cd113
fix a couple different cores.
2013-11-24 19:46:44 -07:00
Matt Wells
e0a15194e1
fix json double decoding issue. no more
partial decodes, json parser stores
fully decoded string into separate buf.
2013-11-22 14:16:14 -08:00
Matt Wells
43e40208b8
Merge branch 'master' into diffbot
Conflicts:
SafeBuf.cpp
SafeBuf.h
SearchInput.cpp
XmlDoc.cpp
2013-11-20 15:51:58 -08:00
mwells
46a683a904
label the bigger safebuf chunks of mem
so we can see a better breakdown of mem
on the stats page, not just a big "SafeBuf"
allocation.
2013-11-19 23:53:40 -07:00
Matt Wells
fe1a7d1a75
rdbbase not fully resetting? it was
trying to dump to coll directories that
had been moved to trash folder.
and printing out "deleted from under us".
at least it was not corrupting data in RdbMem
this time because i added m_dumpErrno logic.
2013-11-15 09:01:58 -08:00
Matt Wells
45cc9bb112
fix a few nasty bugs
2013-11-13 18:31:26 -08:00
Matt Wells
fbcd6b8afd
display json objects that are not in arrays
in csv. show csv header. how to deal
with heterogeneous object lists?
index spiderdate: for gbsortby:spiderdate.
added gbrevsortby: support.
2013-11-12 13:51:52 -08:00
Matt Wells
09f28b2f26
now we index all numbers that have field names
(so can't just be a number in the body) but it
can be in a meta tag or json item. then use
like gbsortby:products.offerPrice to sort the
search results (json objects) by that.
2013-11-08 16:16:13 -08:00
Matt Wells
726fdb4873
fix that json RE-encoding bug
2013-10-24 18:09:35 -07:00
mwells
fa9f81bd7c
trying to fix json decoding bug.
make highlight class use safebuf.
2013-10-24 17:55:01 -07:00
Matt Wells
209e6db25f
do not match "isindexed" for getting
the diffbot api in XmlDoc::getUrlFilterNum().
do not supply SpiderReply to that function
b/c the spider reply is just being
generated.
2013-10-22 16:25:26 -07:00
Matt Wells
8f5bb4a787
a few core dump fixes. get crawl-delay
working a little. about half way done.
2013-10-22 15:44:10 -07:00
Matt Wells
a288217e9f
a few bug fixes
2013-10-17 18:59:00 -07:00
Matt Wells
fc17521697
Merge branch 'master' into diffbot
Conflicts:
Hostdb.cpp
Makefile
PageResults.cpp
PageRoot.cpp
Pages.cpp
Rdb.cpp
SearchInput.cpp
SearchInput.h
Spider.cpp
Spider.h
XmlDoc.cpp
2013-10-16 14:28:42 -07:00
Matt Wells
f5e5b0f5d3
fix crawlbot bugs
2013-10-16 12:12:22 -07:00
mwells
a0808df2ae
got new diffbot api compiled
2013-10-14 18:19:59 -06:00
mwells
a562c65627
another code checkpoint. new json api
for crawlbot. new url filters for crawlbot.
2013-10-14 16:10:48 -06:00
mwells
0de777d80d
parser fixes
2013-10-11 17:35:12 -06:00
mwells
6d5643e185
json parsing
2013-10-11 16:14:26 -06:00
Matt Wells
ed0fbf2b99
fix core from not decoding json properly.
2013-10-08 11:46:18 -07:00