Commit Graph

20 Commits

Author SHA1 Message Date
Matt Wells
fbcd6b8afd display json objects that are not in arrays
in csv. show csv header. how to deal
with heterogeneous object lists?
index spiderdate: for gbsortby:spiderdate.
added gbrevsortby: support.
2013-11-12 13:51:52 -08:00
Matt Wells
09f28b2f26 now we index all numbers that have field names
(so can't just be a number in the body) but it
can be in a meta tag or json item. then use
like gbsortby:products.offerPrice to sort the
search results (json objects) by that.
2013-11-08 16:16:13 -08:00
Matt Wells
726fdb4873 fix that json RE-encoding bug 2013-10-24 18:09:35 -07:00
mwells
fa9f81bd7c trying to fix json decoding bug.
make highlight class use safebuf.
2013-10-24 17:55:01 -07:00
Matt Wells
209e6db25f do not match "isindexed" for getting
the diffbot api in XmlDoc::getUrlFilterNum().
do not supply SpiderReply to that function
b/c the spider reply is just being
generated.
2013-10-22 16:25:26 -07:00
Matt Wells
8f5bb4a787 a few core dump fixes. get crawl-delay
working a little. about half way done.
2013-10-22 15:44:10 -07:00
Matt Wells
a288217e9f a few bug fixes 2013-10-17 18:59:00 -07:00
Matt Wells
fc17521697 Merge branch 'master' into diffbot
Conflicts:
	Hostdb.cpp
	Makefile
	PageResults.cpp
	PageRoot.cpp
	Pages.cpp
	Rdb.cpp
	SearchInput.cpp
	SearchInput.h
	Spider.cpp
	Spider.h
	XmlDoc.cpp
2013-10-16 14:28:42 -07:00
Matt Wells
f5e5b0f5d3 fix crawlbot bugs 2013-10-16 12:12:22 -07:00
mwells
a0808df2ae got new diffbot api compiled 2013-10-14 18:19:59 -06:00
mwells
a562c65627 another code checkpoint. new json api
for crawlbot. new url filters for crawlbot.
2013-10-14 16:10:48 -06:00
mwells
0de777d80d parser fixes 2013-10-11 17:35:12 -06:00
mwells
6d5643e185 json parsing 2013-10-11 16:14:26 -06:00
Matt Wells
ed0fbf2b99 fix core from not decoding json properly. 2013-10-08 11:46:18 -07:00
mwells
6c2c9f7774 trying to bring back dmoz integration. 2013-10-02 22:34:21 -06:00
mwells
7cdb3d6f9c fix infinite loop from json parsing and
fix some core dumps.
2013-09-27 17:52:36 -06:00
mwells
5fbf323cb5 json api now shows all collections
and their relevant parms and stats
for /crawlbot?token=xxx&format=json
2013-09-25 16:59:31 -06:00
Matt Wells
5dc7bd2ab4 integrate diffbot from svn back into git. 2013-09-13 09:23:18 -07:00
Matt Wells
94e6492916 removed MAX_COLL_RECS so we can have unlimited
collections, really limited only by sizeof(collnum_t) now,
which is 16 bits, 15 of them usable as an unsigned value; that is the limit.
can always expand this so we can have more than 32k collections.
2013-08-30 16:20:38 -07:00
Matt Wells
f6e560c1f4 Initial file population. 2013-08-02 13:12:24 -07:00