15 Commits

Author SHA1 Message Date
Matt Wells
9642947136 fix so host #0 will delete then re-add collections
that use the same collnum but have a different name.
fixed some unlabelled safebufs.
fix core when deleting collnum from tree/buckets that
is higher than Collectiondb.m_numRecs.
fix File::m_filename safebufs that were not freed on exit.
2015-08-18 14:09:16 -07:00
Zak Betz
7b507a70ef Set value length to 0 for something that does not return a string value
in Json.cpp.
Fix the '-' -> '_' when indexing generic fields.
Add a StackBuf macro which is a Safebuf initialized with a small
stack buffer for use in a local scope.
2015-06-30 14:09:57 -06:00
Matt
3e5218c54c fix gbssDocId:123456789, et al, query. will only work for docs indexed
after applying this fix.
2015-04-13 14:13:16 -06:00
Matt Wells
e346a14a47 added logic to retry diffbot reply on connection reset,
connection timed out or gateway timed out (http status 504)
msgs.  added logic to detect truncated json (missing final })
and not print it. also, at index time, we set a diffbot missing
curly error to g_errno so the whole url can be retried later.
2015-03-09 20:54:34 -07:00
Matt
4c19453ea9 working with -m32 for basic testing.
compiles for 64-bit.
2014-11-12 11:38:37 -08:00
Matt
96b8197ad3 now it compiles with -m32
2014-11-10 14:45:11 -08:00
Matt Wells
c5ae5ca4b5 v3 support for tokenized diffbot replies
using the "objects" array in the json.
2014-05-12 16:13:24 -07:00
Matt Wells
9c26b85c2f fixed contenthash32 logic for json objects.
fixed hashing of numbers/bools for json objects.
added m_dupCache to reduce spiderrequests added to spiderdb.
do not add urls to waitingtree if ufn is obviously filtered/banned.
do not spider spiderrequest from doledb if maxoutperip would
be violated.
2014-02-05 13:22:03 -08:00
Matt Wells
e0a15194e1 fix json double decoding issue. no more
partial decodes, json parser stores
fully decoded string into separate buf.
2013-11-22 14:16:14 -08:00
Matt Wells
fbcd6b8afd display json objects that are not in arrays
in csv. show csv header. how to deal
with heterogenous object lists?
index spiderdate: for gbsortby:spiderdate.
added gbrevsortby: support.
2013-11-12 13:51:52 -08:00
Matt Wells
a288217e9f a few bug fixes
2013-10-17 18:59:00 -07:00
Matt Wells
9d6c3626d8 json indexing/hashing updates.
2013-10-16 15:41:12 -07:00
mwells
a562c65627 another code checkpoint. new json api
for crawlbot. new url filters for crawlbot.
2013-10-14 16:10:48 -06:00
mwells
0de777d80d parser fixes
2013-10-11 17:35:12 -06:00
mwells
6d5643e185 json parsing
2013-10-11 16:14:26 -06:00