Matt Wells
fbcd6b8afd
display json objects that are not in arrays
in csv. show csv header. how to deal
with heterogeneous object lists?
index spiderdate: for gbsortby:spiderdate.
added gbrevsortby: support.
2013-11-12 13:51:52 -08:00
Matt Wells
09f28b2f26
now we index all numbers that have field names
(so it can't just be a number in the body), but it
can be in a meta tag or json item. then use a query term
like gbsortby:products.offerPrice to sort the
search results (json objects) by that field.
2013-11-08 16:16:13 -08:00
Matt Wells
726fdb4873
fix that json RE-encoding bug
2013-10-24 18:09:35 -07:00
mwells
fa9f81bd7c
trying to fix json decoding bug.
make highlight class use safebuf.
2013-10-24 17:55:01 -07:00
Matt Wells
209e6db25f
do not match "isindexed" for getting
the diffbot api in XmlDoc::getUrlFilterNum().
do not supply SpiderReply to that function
b/c the spider reply is just being
generated.
2013-10-22 16:25:26 -07:00
Matt Wells
8f5bb4a787
a few core dump fixes. get crawl-delay
working a little. about half way done.
2013-10-22 15:44:10 -07:00
Matt Wells
a288217e9f
a few bug fixes
2013-10-17 18:59:00 -07:00
Matt Wells
fc17521697
Merge branch 'master' into diffbot
Conflicts:
Hostdb.cpp
Makefile
PageResults.cpp
PageRoot.cpp
Pages.cpp
Rdb.cpp
SearchInput.cpp
SearchInput.h
Spider.cpp
Spider.h
XmlDoc.cpp
2013-10-16 14:28:42 -07:00
Matt Wells
f5e5b0f5d3
fix crawlbot bugs
2013-10-16 12:12:22 -07:00
mwells
a0808df2ae
got new diffbot api compiled
2013-10-14 18:19:59 -06:00
mwells
a562c65627
another code checkpoint. new json api
for crawlbot. new url filters for crawlbot.
2013-10-14 16:10:48 -06:00
mwells
0de777d80d
parser fixes
2013-10-11 17:35:12 -06:00
mwells
6d5643e185
json parsing
2013-10-11 16:14:26 -06:00
Matt Wells
ed0fbf2b99
fix core dump caused by not decoding json properly.
2013-10-08 11:46:18 -07:00
mwells
6c2c9f7774
trying to bring back dmoz integration.
2013-10-02 22:34:21 -06:00
mwells
7cdb3d6f9c
fix infinite loop from json parsing and
fix some core dumps.
2013-09-27 17:52:36 -06:00
mwells
5fbf323cb5
json api now shows all collections
and their relevant parms and stats
for /crawlbot?token=xxx&format=json
2013-09-25 16:59:31 -06:00
Matt Wells
5dc7bd2ab4
integrate diffbot from svn back into git.
2013-09-13 09:23:18 -07:00
Matt Wells
94e6492916
removed MAX_COLL_RECS so we can have unlimited
collections, limited only by sizeof(collnum_t) now,
which is 16 bits (15 bits once the sign bit is excluded).
we can always widen this type to allow more than 32k collections.
2013-08-30 16:20:38 -07:00
Matt Wells
f6e560c1f4
Initial file population.
2013-08-02 13:12:24 -07:00