Commit Graph

12 Commits

Author SHA1 Message Date
Matt Wells
e346a14a47 added logic to retry diffbot reply on connection reset,
connection timed out or gateway timed out (http status 504)
msgs.  added logic to detect truncated json (missing final })
and not print it. also, at index time, we set a diffbot missing
curly error to g_errno so the whole url can be retried later.
2015-03-09 20:54:34 -07:00
Matt
4c19453ea9 working with -m32 for basic testing.
compiles for 64-bit.
2014-11-12 11:38:37 -08:00
Matt
96b8197ad3 now it compiles with -m32 2014-11-10 14:45:11 -08:00
Matt Wells
c5ae5ca4b5 v3 support for tokenized diffbot replies
using the "objects" array in the json.
2014-05-12 16:13:24 -07:00
Matt Wells
9c26b85c2f fixed contenthash32 logic for json objects.
fixed hashing of numbers/bools for json objects.
added m_dupCache to reduce spiderrequests added to spiderdb.
do not add urls to waitingtree if ufn is obviously filtered/banned.
do not spider spiderrequest from doledb if maxoutperip would
be violated.
2014-02-05 13:22:03 -08:00
Matt Wells
e0a15194e1 fix json double decoding issue. no more
partial decodes, json parser stores
fully decoded string into separate buf.
2013-11-22 14:16:14 -08:00
Matt Wells
fbcd6b8afd display json objects that are not in arrays
in csv. show csv header. how to deal
with heterogeneous object lists?
index spiderdate: for gbsortby:spiderdate.
added gbrevsortby: support.
2013-11-12 13:51:52 -08:00
Matt Wells
a288217e9f a few bug fixes 2013-10-17 18:59:00 -07:00
Matt Wells
9d6c3626d8 json indexing/hashing updates. 2013-10-16 15:41:12 -07:00
mwells
a562c65627 another code checkpoint. new json api
for crawlbot. new url filters for crawlbot.
2013-10-14 16:10:48 -06:00
mwells
0de777d80d parser fixes 2013-10-11 17:35:12 -06:00
mwells
6d5643e185 json parsing 2013-10-11 16:14:26 -06:00
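
Commit e346a14a47 describes retrying a diffbot reply on connection reset, connection timeout, or gateway timeout (http status 504), and treating json that is missing its final } as truncated. A minimal sketch of that decision logic follows, written in Python for brevity and not taken from the repository. The names RETRYABLE_ERRORS, is_truncated_json, should_retry and fetch_with_retry are hypothetical, and retrying immediately in a loop is an assumption; the commit message only says the url can be retried later.

```python
from typing import Callable, Optional, Tuple

# Hypothetical labels for the transient failures named in the commit message.
RETRYABLE_ERRORS = {"connection reset", "connection timed out"}
GATEWAY_TIMEOUT = 504  # http status 504


def is_truncated_json(body: str) -> bool:
    """Return True if the reply lacks its final closing curly brace."""
    return not body.rstrip().endswith("}")


def should_retry(http_status: int, error: Optional[str], body: str) -> bool:
    """Decide whether a reply should be retried instead of used."""
    if error in RETRYABLE_ERRORS:
        return True
    if http_status == GATEWAY_TIMEOUT:
        return True
    return is_truncated_json(body)


def fetch_with_retry(
    fetch: Callable[[], Tuple[int, Optional[str], str]],
    max_attempts: int = 3,
) -> Optional[str]:
    """Call fetch() up to max_attempts times.

    fetch() is an assumed callable returning (http_status, error, body).
    Returns the first body that does not need a retry, or None if every
    attempt failed, so a truncated reply is never passed on.
    """
    for _ in range(max_attempts):
        status, error, body = fetch()
        if not should_retry(status, error, body):
            return body
    return None
```

The check only looks at the last non-whitespace character, so it catches the truncation case the commit names but does not validate the json otherwise.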