Commit Graph

20 Commits

Author SHA1 Message Date
Matt Wells
a76f4c6974 just POST a full request for webhook now
so we can do application/json content type
2013-11-07 14:20:15 -08:00
Matt Wells
3e4db4f1bc show all crawl details in url webhook
notification in the post body.
2013-11-07 13:59:43 -08:00
Matt Wells
adf4d258ae better crawl status reporting.
allow for _ in coll names.
2013-10-30 10:00:46 -07:00
Matt Wells
20052e34fe made webhook return the crawl name
and status as X- fields in the mime.
2013-10-28 22:03:10 -07:00
Matt Wells
a5a7ab2434 added spider status msg to json output
to indicate if spider has hit a limit.
no longer disable spiders in xmldoc.cpp
when a crawl/process limit is hit. just
check for limit when spidering urls in
spider.cpp and if it is hit set
CollectionRec::m_spiderStatus[Msg] and
send email from there.
Added maxCrawlRounds parm.
2013-10-23 11:40:30 -07:00
Matt Wells
0e4d96b3f8 added "seeds" to json reply. store seed urls
(and dedup them) in collrec. fixed some respidering
issues. any time we re-enter url filters
then rebuild the waiting tree.
2013-10-21 17:35:14 -07:00
Matt Wells
b589b17e63 fix collection resetting.
2013-10-18 15:21:00 -07:00
Matt Wells
a288217e9f a few bug fixes
2013-10-17 18:59:00 -07:00
mwells
ea859ef685 added 'gb emailmandrill' for testing.
got it working. it posts json, not url encoded.
2013-10-09 17:35:51 -06:00
mwells
c1c5c4e3d0 send notifications if no urls available
for immediate spidering.
2013-10-09 15:24:35 -06:00
Matt Wells
283ec2f6b4 email and webhook alerts when spider runs out of urls
to spider.
2013-10-09 11:42:56 -07:00
Matt Wells
3702a05d64 add sendEmailThroughMandrill() to send
through the Mailchimp HTTP API.
2013-10-08 18:01:38 -07:00
Matt Wells
fe97e08281 move from groups to shards. got rid of annoying
groupid bit mask thing.
2013-10-04 16:18:56 -07:00
mwells
259ec08e09 email hook now works but you have to
supply the IP address of your sendmail
server and it has to allow email
forwarding from host #0's IP. specify
the sendmail server's IP in the Master
Controls.
2013-10-02 09:36:44 -06:00
mwells
45941e4b2f fix notification system.
2013-10-01 17:30:06 -06:00
mwells
3fecb3eb1f got email and url notification code compiling.
when crawl hits a limit we do notifications.
2013-10-01 15:14:39 -06:00
Matt Wells
a412c798bf Merge branch 'master' into diffbot
Conflicts:
	PageResults.cpp
2013-09-13 09:24:28 -07:00
mwells
34b6d3e74a fixed some cores. brought in fixes from
old repo.
2013-09-08 16:16:13 -06:00
Matt Wells
94e6492916 removed MAX_COLL_RECS so we can have unlimited
collections, really limited only by sizeof(collnum_t) now,
which is 16 bits (15 usable bits once the sign bit is excluded).
can always expand this so we can have more than 32k collections.
2013-08-30 16:20:38 -07:00
Matt Wells
f6e560c1f4 Initial file population.
2013-08-02 13:12:24 -07:00