Matt Wells
6b9d0656ff
Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot
2013-11-15 09:36:09 -08:00
Matt Wells
3258360679
do not do array break up substitution logic
on diffbot replies if not of type product or image.
it was breaking up the images array WITHIN an
article type.
2013-11-15 09:35:32 -08:00
Matt Wells
bf75ac6a0d
fix page process pattern parsing
2013-11-15 09:34:47 -08:00
Matt Wells
3563f0643f
fix little bug of using product not image
2013-11-15 09:13:27 -08:00
Matt Wells
fe1a7d1a75
rdbbase not fully resetting? it was
trying to dump to coll directories that
had been moved to trash folder.
and printing out "deleted from under us".
at least it was corrupting data in RdbMem
this time because i added m_dumpErrno logic.
2013-11-15 09:01:58 -08:00
Matt Wells
9ed40a1112
hacky hacks
2013-11-14 16:59:50 -08:00
Matt Wells
bb964ac214
fix core
2013-11-14 16:28:23 -08:00
Matt Wells
b0e40ae68b
fix bad json bug
2013-11-14 15:05:15 -08:00
Matt Wells
1518778405
fix for bad json splicing
2013-11-14 14:42:31 -08:00
Matt Wells
7fc8b6a005
fix oopsy
2013-11-14 14:09:05 -08:00
Matt Wells
7c84b6ee0b
show restart crawl button
2013-11-14 14:07:45 -08:00
Matt Wells
62432b3530
support for &restart=1
2013-11-14 14:02:56 -08:00
Matt Wells
3033684a8d
fix for json parsing.
added restart=1 support
2013-11-14 13:16:08 -08:00
Matt Wells
9059aa8a01
fix link
2013-11-14 12:53:49 -08:00
Matt Wells
be213ca28f
now fix embedded products and images in the diffbot
json reply properly!
2013-11-14 12:51:34 -08:00
Matt Wells
28cd1e6490
you can submit action then expression now.
2013-11-14 09:54:36 -08:00
Matt Wells
8534914902
fix core when xmldoc::getmsg20reply is called
2013-11-14 09:32:18 -08:00
Matt Wells
5c0194c439
fix json validation bug
2013-11-13 19:29:33 -08:00
Matt Wells
eb719849a6
do not core on this dump error
2013-11-13 19:04:22 -08:00
Matt Wells
da013d1b18
fix invalid json bug of not ending
json items in images/products array
2013-11-13 18:44:15 -08:00
Matt Wells
45cc9bb112
fix a few nasty bugs
2013-11-13 18:31:26 -08:00
Matt Wells
a5c3b3b8f8
fix so spider does not say it is
done crawling right after you seed it!
2013-11-13 16:03:15 -08:00
Matt Wells
7020f66daa
bulk api nominal updates
2013-11-13 14:30:51 -08:00
Matt Wells
9e77f1b2f6
Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot
2013-11-13 13:27:45 -08:00
Matt Wells
a31b13ad61
fix a few bugs.
2013-11-13 13:27:22 -08:00
Matt Wells
6cc4e6d980
added some more links to my gui
2013-11-12 17:05:13 -08:00
Matt Wells
7f038235e1
hack in a type:product or type:image
since product and image json elements
are taken from an array and lack those.
2013-11-12 16:57:14 -08:00
Matt Wells
df28c4e0c2
search results in csv format.
remove serps per page limit if custom crawl.
2013-11-12 16:33:45 -08:00
Matt Wells
38c8bec024
use gbspiderdate not spiderdate.
so gotta use gbsortby:gbspiderdate etc.
2013-11-12 13:55:47 -08:00
Matt Wells
fbcd6b8afd
display json objects that are not in arrays
in csv. show csv header. how to deal
with heterogeneous object lists?
index spiderdate: for gbsortby:spiderdate.
added gbrevsortby: support.
2013-11-12 13:51:52 -08:00
Matt Wells
364216ff16
fixed bugs in sort by prices, etc.
2013-11-11 18:58:45 -08:00
Matt Wells
4548098809
a couple more nominal updates
2013-11-11 16:10:47 -08:00
Matt Wells
ad61e9ea5a
/v2/bulk api updates.
2013-11-11 15:52:04 -08:00
Matt Wells
7248641bc4
fix mem leaks. turn off electric fence.
2013-11-11 09:58:14 -08:00
Matt Wells
7efb743e65
nothing
2013-11-10 22:25:19 -08:00
Matt Wells
5aa1609350
Merge branch 'master' into diffbot
2013-11-10 22:11:39 -08:00
Matt Wells
af678b7c1b
fix a few bugs.
2013-11-10 22:11:13 -08:00
Matt Wells
105a201cde
fix mem leak.
check if tree writes are disabled and block
until not when deleting/resetting a collection.
just like we do if tree is being saved.
2013-11-10 16:28:00 -08:00
Matt Wells
810a6918fd
Merge branch 'diffbot' of github.com:gigablast/open-source-search-engine into diffbot
2013-11-10 09:41:19 -08:00
Matt Wells
3afac4812d
fix bug of trying to del/reset coll while
disable writing was engaged. we already
had it check to see if tree was saving,
but not if writes were disabled. so it
gets ETRYAGAIN and retries later.
2013-11-10 09:40:32 -08:00
Matt Wells
b1e98aa4b8
fix core.
2013-11-08 21:33:37 -07:00
Matt Wells
e395628d5a
use &format=0 1 or 2 for html/xml/json now.
use &icc=1 to get dump of json objects in serps.
2013-11-08 18:00:30 -08:00
Matt Wells
aa9a77674f
fixed oopsy when parsing float words
2013-11-08 16:25:23 -08:00
Matt Wells
09f28b2f26
now we index all numbers that have field names
(so can't just be a number in the body) but it
can be in a meta tag or json item. then use
like gbsortby:products.offerPrice to sort the
search results (json objects) by that.
2013-11-08 16:16:13 -08:00
Matt Wells
9895ad093f
fix that pesky spider start time bug.
2013-11-07 16:43:02 -08:00
Matt Wells
a76f4c6974
just POST a full request for webhook now
so we can do application/json content type
2013-11-07 14:20:15 -08:00
Matt Wells
ab9a3b1798
put download links into api output for a crawl
2013-11-07 14:07:38 -08:00
Matt Wells
3e4db4f1bc
show all crawl details in url webhook
notification in the post body.
2013-11-07 13:59:43 -08:00
Matt Wells
2ae04cff71
return crawl delete reply in json.
take out EDOCEVILREDIRECT errors.
2013-11-07 09:55:47 -08:00
Matt Wells
3b929917d1
do not site cluster or do dup removal
in crawlbot search results
2013-11-07 09:40:31 -08:00