Commit Graph

526 Commits

Author SHA1 Message Date
Matt Wells
6b9d0656ff Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot 2013-11-15 09:36:09 -08:00
Matt Wells
3258360679 do not do array break up substitution logic
on diffbot replies if not of type product or image.
it was breaking up the images array WITHIN an
article type.
2013-11-15 09:35:32 -08:00
Matt Wells
bf75ac6a0d fix page process pattern parsing 2013-11-15 09:34:47 -08:00
Matt Wells
3563f0643f fix little bug of using product not image 2013-11-15 09:13:27 -08:00
Matt Wells
fe1a7d1a75 rdbbase not fully resetting? it was
trying to dump to coll directories that
had been moved to trash folder.
and printing out "deleted from under us".
at least it was corrupting data in RdbMem
this time because i added m_dumpErrno logic.
2013-11-15 09:01:58 -08:00
Matt Wells
9ed40a1112 hacky hacks 2013-11-14 16:59:50 -08:00
Matt Wells
bb964ac214 fix core 2013-11-14 16:28:23 -08:00
Matt Wells
b0e40ae68b fix bad json bug 2013-11-14 15:05:15 -08:00
Matt Wells
1518778405 fix for bad json splicing 2013-11-14 14:42:31 -08:00
Matt Wells
7fc8b6a005 fix oopsy 2013-11-14 14:09:05 -08:00
Matt Wells
7c84b6ee0b show restart crawl button 2013-11-14 14:07:45 -08:00
Matt Wells
62432b3530 support for &restart=1 2013-11-14 14:02:56 -08:00
Matt Wells
3033684a8d fix for json parsing.
added restart=1 support
2013-11-14 13:16:08 -08:00
Matt Wells
9059aa8a01 fix link 2013-11-14 12:53:49 -08:00
Matt Wells
be213ca28f now fix embedded products and images in the diffbot
json reply properly!
2013-11-14 12:51:34 -08:00
Matt Wells
28cd1e6490 you can submit action then expression now. 2013-11-14 09:54:36 -08:00
Matt Wells
8534914902 fix core when xmldoc::getmsg20reply is called 2013-11-14 09:32:18 -08:00
Matt Wells
5c0194c439 fix json validation bug 2013-11-13 19:29:33 -08:00
Matt Wells
eb719849a6 do not core on this dump error 2013-11-13 19:04:22 -08:00
Matt Wells
da013d1b18 fix invalid json bug of not ending
json items in images/products array
2013-11-13 18:44:15 -08:00
Matt Wells
45cc9bb112 fix a few nasty bugs 2013-11-13 18:31:26 -08:00
Matt Wells
a5c3b3b8f8 fix so spider does not say it is
done crawling right after you seed it!
2013-11-13 16:03:15 -08:00
Matt Wells
7020f66daa bulk api nominal updates 2013-11-13 14:30:51 -08:00
Matt Wells
9e77f1b2f6 Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot 2013-11-13 13:27:45 -08:00
Matt Wells
a31b13ad61 fix a few bugs. 2013-11-13 13:27:22 -08:00
Matt Wells
6cc4e6d980 added some more links to my gui 2013-11-12 17:05:13 -08:00
Matt Wells
7f038235e1 hack in a type:product or type:image
since product and image json elements
are taken from an array and lack those.
2013-11-12 16:57:14 -08:00
Matt Wells
df28c4e0c2 search results in csv format.
remove serps per page limit if custom crawl.
2013-11-12 16:33:45 -08:00
Matt Wells
38c8bec024 use gbspiderdate not spiderdate.
so gotta use gbsortby:gbspiderdate etc.
2013-11-12 13:55:47 -08:00
Matt Wells
fbcd6b8afd display json objects that are not in arrays
in csv. show csv header. how to deal
with heterogeneous object lists?
index spiderdate: for gbsortby:spiderdate.
added gbrevsortby: support.
2013-11-12 13:51:52 -08:00
Matt Wells
364216ff16 fixed bugs in sort by prices, etc. 2013-11-11 18:58:45 -08:00
Matt Wells
4548098809 a couple more nominal updates 2013-11-11 16:10:47 -08:00
Matt Wells
ad61e9ea5a /v2/bulk api updates. 2013-11-11 15:52:04 -08:00
Matt Wells
7248641bc4 fix mem leaks. turn off electric fence. 2013-11-11 09:58:14 -08:00
Matt Wells
7efb743e65 nothing 2013-11-10 22:25:19 -08:00
Matt Wells
5aa1609350 Merge branch 'master' into diffbot 2013-11-10 22:11:39 -08:00
Matt Wells
af678b7c1b fix a few bugs. 2013-11-10 22:11:13 -08:00
Matt Wells
105a201cde fix mem leak.
check if tree writes are disabled and block
until not when deleting/resetting a collection.
just like we do if tree is being saved.
2013-11-10 16:28:00 -08:00
Matt Wells
810a6918fd Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot 2013-11-10 09:41:19 -08:00
Matt Wells
3afac4812d fix bug of trying to del/reset coll while
disable writing was engaged. we already
had it check to see if tree was saving,
but not if writes were disabled. so it
gets ETRYAGAIN and retries later.
2013-11-10 09:40:32 -08:00
Matt Wells
b1e98aa4b8 fix core. 2013-11-08 21:33:37 -07:00
Matt Wells
e395628d5a use &format=0 1 or 2 for html/xml/json now.
use &icc=1 to get dump of json objects in serps.
2013-11-08 18:00:30 -08:00
Matt Wells
aa9a77674f fixed oopsy when parsing float words 2013-11-08 16:25:23 -08:00
Matt Wells
09f28b2f26 now we index all numbers that have field names
(so can't just be a number in the body) but it
can be in a meta tag or json item. then use
like gbsortby:products.offerPrice to sort the
search results (json objects) by that.
2013-11-08 16:16:13 -08:00
Matt Wells
9895ad093f fix that pesky spider start time bug. 2013-11-07 16:43:02 -08:00
Matt Wells
a76f4c6974 just POST a full request for webhook now
so we can do application/json content type
2013-11-07 14:20:15 -08:00
Matt Wells
ab9a3b1798 put download links to api output for a crawl 2013-11-07 14:07:38 -08:00
Matt Wells
3e4db4f1bc show all crawl details in url webhook
notification in the post body.
2013-11-07 13:59:43 -08:00
Matt Wells
2ae04cff71 return crawl delete reply in json.
take out EDOCEVILREDIRECT errors.
2013-11-07 09:55:47 -08:00
Matt Wells
3b929917d1 do not site cluster or do dup removal
in crawlbot search results
2013-11-07 09:40:31 -08:00