Commit Graph

23 Commits

Author SHA1 Message Date
Matt Wells
d03028ea93 bulk api post truncation fix 2013-12-17 10:03:46 -08:00
Matt Wells
5e4b5a112c Merge branch 'master' into diffbot
Conflicts:

	PageResults.cpp
	Threads.cpp
	XmlDoc.cpp
	XmlDoc.h
2013-12-07 11:34:26 -07:00
Matt Wells
5da41cd113 fix a couple different cores. 2013-11-24 19:46:44 -07:00
Matt Wells
7020f66daa bulk api nominal updates 2013-11-13 14:30:51 -08:00
Matt Wells
e395628d5a use &format=0 1 or 2 for html/xml/json now.
use &icc=1 to get dump of json objects in serps.
2013-11-08 18:00:30 -08:00
Matt Wells
3e4db4f1bc show all crawl details in url webhook
notification in the post body.
2013-11-07 13:59:43 -08:00
Matt Wells
2c7035ac2b do not truncate diffbot reply 2013-11-05 11:17:54 -08:00
Matt Wells
22f9e9355d /v2/bulk api fixes 2013-10-22 18:51:09 -07:00
Matt Wells
d16e5d37f1 tested robots crawl-delay directive
by forcing a 10.1 second delay for
diffbot.com in XmlDoc.cpp.
seemed to work after a few fixes.
however, it is ultimately only
an IP-based crawl delay, although
the delay applies to all subdomains
on the same domain, it's just that each IP
has its own timer for that delay.
2013-10-22 17:41:52 -07:00
Matt Wells
8c3a61f070 /v2/crawl api 2013-10-22 12:25:37 -07:00
mwells
ea859ef685 added 'gb emailmandrill' for testing.
got it working. it posts json, not url encoded.
2013-10-09 17:35:51 -06:00
Matt Wells
3702a05d64 add sendEmailThroughMandrill() to send
through mail chimp http api.
2013-10-08 18:01:38 -07:00
mwells
76c9f47498 file download api updates.
to include collection name in filename
being downloaded.
2013-09-30 11:10:43 -06:00
Matt Wells
c0f1330d70 Merge branch 'master' into diffbot
Conflicts:

	HttpServer.cpp
	Makefile
	PageGet.cpp
	Pages.h
	SafeBuf.h
2013-09-28 13:13:12 -07:00
mwells
5884951190 only do certain things if running
on a machine in matt wells datacenter.
like fan switching based on temps,
or printing seo links. made seo functions
weak overridable placeholder stubs so if
seo.o is linked in it will override.
include seo.o object if seo.cpp file exists
for automatic seo module building and linking.
2013-09-28 13:43:56 -06:00
mwells
fd081478de fix crawlbot to work on a distributed network
as far as adding/deleting/resetting colls
and updating parms. ideally we'd have a Colldb
Rdb where each key was a parm. that would make
syncing easier if a host went down, then it would
get the negative/positive colldb parm keys later.
so it could sync up on all your operations as long
as all your operations are expressed in terms of adding and deleting
database key/value pairs.
2013-09-26 22:41:05 -06:00
Matt Wells
02bf6ab3cc new crawlbot api. not backwards compatible any more. 2013-09-17 10:25:54 -07:00
Matt Wells
93ce424d99 start working on the main gui for
crawlbot which is /crawlbot
2013-09-13 16:22:07 -07:00
Matt Wells
5dc7bd2ab4 integrate diffbot from svn back into git. 2013-09-13 09:23:18 -07:00
mwells
82ee2dfed7 fix cores when spider is unzipping
gzipped web pages.
2013-08-28 22:49:22 -06:00
mwells
e9297df240 listen on DNS port 5998 not 6000. 6000 seemed
to cause issues on a particular install for
some reason.
2013-08-19 15:02:27 -06:00
mwells
0b94b31fbc Fix potential core issue in proxy. 2013-08-08 15:14:36 -06:00
Matt Wells
f6e560c1f4 Initial file population. 2013-08-02 13:12:24 -07:00