Commit Graph

26 Commits

Author SHA1 Message Date
Kevin Ansfield
e177e7e1b5 Fixed analytics require error
refs a06064b115

- fixed incorrect path in require for base email analytics service
2021-03-02 08:26:42 +00:00
Kevin Ansfield
a06064b115 Fixed email analytics require error
refs https://github.com/TryGhost/Ghost/pull/12541
refs https://github.com/TryGhost/Ghost/pull/12689

- the analytics job had been switched to create it's own instance of EmailAnalyticsService to avoid requiring logging but the analytics extraction branch was created before this change and wasn't picked up when merging
- pulled `queries` option object into a separate file for re-use
- updated `fetchLatest` job to conform to extracted library interface
2021-03-02 08:22:11 +00:00
Kevin Ansfield
11802ebee0
Extracted email analytics library code to external packages (#12541)
closes https://github.com/TryGhost/Team/issues/493

- all functionality except that directly related to Ghost's database and business logic now lives in external packages
  - @tryghost/email-analytics-service
  - @tryghost/email-analytics-provider-mailgun
2021-03-02 07:26:33 +00:00
Kevin Ansfield
08e1268aed
Added migration to remove surrounding <> in email_batches.provider_id (#12673)
refs https://github.com/TryGhost/Team/issues/221#issuecomment-759105424

- Mailgun responds to an email send with a provider id in the format `<x@y.com>` but everywhere else it's used in their API it uses the format `x@y.com`
- updates email batch save to strip the brackets, and migration removes brackets from existing records so we no longer have to add special handling for the stored id any time we use it
2021-02-23 08:48:21 +00:00
Kevin Ansfield
42e452b127
Removed models require from analytics job (#12689)
refs https://github.com/TryGhost/Ghost/issues/12496

By requiring the models layer the shared logging util was being required as a side-effect causing the open file descriptors problem to continue. Removing logging from the models layer isn't feasible due to deep require chains spreading across the codebase, it's much quicker to remove the need for models in the analytics job.

- models layer was only needed because it's used by the session service
- updated analytics job to create it's own instance of `EmailAnalyticsService` rather than the default instance in order to pass in custom dependencies
- pass in custom `logging` object that uses `parentPort.postMessage` as a way of writing log output
- pass in custom `settings` object that returns settings that have been manually fetched and cached during job instantiation
2021-02-22 12:10:19 +00:00
Naz
8a718ca99a Migrated jobs to use parentPort.postMessage
refs https://github.com/TryGhost/Ghost/issues/12496

- Using ghost-ignition logging caused file handle leaks. As there is no straight-forward way to handle write streams with bunyan (ghost-ignition's underlying logging library) this method of logging was chosen as an alternative to keep the amount of open file handles to minimum
- The follow up changes will include custom formatter for jobs service which should make logging match the same format  as has been used inside the jobs
2021-02-22 20:02:00 +13:00
Kevin Ansfield
63f7f9a827 🐛 Disabled auto-unsubscribe of members on permanent email failure events
refs https://github.com/TryGhost/Team/issues/446

Mailgun permanent failure events do not always correspond to unsubscribe-level events as originally thought, meaning some members could be unsubscribed unexpectedly due to delivery hiccups.

- disabled auto-unsubscribe on permanent failure events in the analytics event processor
- list maintenance will be added back in the future via alternative means
2021-01-12 18:40:31 +00:00
Naz
b2e7d2bf06 Bumped job-manager version to 0.7.0
closes https://github.com/TryGhost/Ghost-Utils/issues/122
2021-01-06 17:48:05 +13:00
Naz
c1e3788570 Added central error handler to job manager
refs https://github.com/TryGhost/Ghost-Utils/issues/118

- Duplicating error handling across jobs is not best developer experience. Also, having custom error handling logic did not allow for recommended worker script behavior: allowing for unhandled exceptions to bubble up and be managed by parent process
2020-12-14 18:01:41 +13:00
Kevin Ansfield
9586d1ce56 Fixed sqlite storing member email open rate as floats
no issue

- sqlite will store a float in an integer column due to it's type affinity resulting in long decimal numbers in the UI when we're expecting an integer
- use the `ROUND()` function to ensure we're storing integers in place of floats when performing open rate average calculations
2020-12-10 10:00:25 +00:00
Kevin Ansfield
baf635e27b Updated email stats aggregator for new member count columns
refs https://github.com/TryGhost/Ghost/issues/12461

- added two default aggregations for overall email count and opened email count
- when number of tracked emails is sufficient add the open rate aggregation to the update query
2020-12-09 13:14:39 +00:00
Kevin Ansfield
567eb6325f
Added members.email_open_rate aggregation to email analytics (#12458)
refs https://github.com/TryGhost/Ghost/issues/12421
requires https://github.com/TryGhost/Ghost/pull/12457

- updates stats aggregator to calculate and store an open rate for each member
  - uses two queries because I couldn't find a reasonable approach to perform the update in a single query as per the email aggregation
  - benchmarked locally at <1sec/1000members
  - will not store an open rate unless the number of tracked emails sent to a member is above a certain threshold (defaults to 5) to avoid new members being heavily weighted
- fixes typo in EmailAnalytics that was stopping member stats from being aggregated
2020-12-08 12:43:10 +00:00
Kevin Ansfield
84ae8585c6 Added guard for page.items existing in Mailgun response
no issue

- it's possible to get Mailgun responses where the `items` array doesn't exist so we need to guard against that so we don't error
2020-12-07 11:07:25 +00:00
Kevin Ansfield
a945ae7c02 Added sentry error capturing to email analytics jobs
no issue

- jobs operate in their own process and so do not share any of the error capturing that is configured in the parent process
2020-12-02 14:57:15 +00:00
Kevin Ansfield
c40a8e978d Fixed incorrect knex syntax
refs 166d072cd4

- accidentally committed an earlier attempt 🤦🏻‍♂️ updated with the valid syntax
2020-12-02 14:53:10 +00:00
Kevin Ansfield
166d072cd4 Fixed email analytics job not being registered when creating an email
no issue

- job registration was checking for submitted emails in it's email count but the job registration method is called as soon as an email is created meaning the email has a status of 'pending' which prevented the analytics job from being started until a second email was sent
2020-12-02 14:47:18 +00:00
Kevin Ansfield
f802128cfc
Added emailAnalytics config feature flag (#12443)
no issue

- email analytics may be desirable to fully switch off in certain circumstances, when that happens we want to prevent related background jobs from running and expose the feature flag via the config endpoint in the Admin API so that clients can adjust accordingly
2020-12-02 13:22:12 +00:00
Kevin Ansfield
2951eb9eaf Added missing moment require
refs 92ef83c61a
2020-12-02 12:15:27 +00:00
Kevin Ansfield
92ef83c61a Adjusted email jobs registration to check for emails newer than 30 days
no issue

- if emails are older than 30 days we wouldn't be able to fetch any analytics for them and if a site used emails in the past but is no longer using them it doesn't make sense to keep potentially expensive background worker threads spinning up
2020-12-02 12:14:22 +00:00
Kevin Ansfield
d675278b0b
Prevented scheduling of recurring analytics jobs when not using emails (#12441)
no issue

- recurring jobs spin up worker threads which can be quite CPU intensive even when not performing much processing, this can be problematic in environments where there are many Ghost instances running
- updated the email job scheduling to be skipped on bootup when there are no emails in the database and to be started when the first email is created as long as we're not in testing env
- increase analytics job schedule from every 2 minutes to every 5 minutes to help spread the load further across instances
2020-12-02 08:17:44 +00:00
Kevin Ansfield
249fd4f06a Fixed email count check in email-analytics service
no issue

- raw knex `.count()` does not return a straight number, we need to handle an array of rowDataPacket objects
2020-12-01 15:02:04 +00:00
Kevin Ansfield
db7fff0b2a Fixed misleading emailAnalytics.fetchLatest debug statement 2020-12-01 10:16:41 +00:00
Kevin Ansfield
d604f96a9c Added no-emails guard check when fetching latest email events
no issue

- replicates the same guard used in `fetchLatest()` to prevent contacting Mailgun when there are no emails in the database
2020-12-01 10:15:31 +00:00
Kevin Ansfield
a29ac2691a
Fixed sqlite3 errors when email analytics jobs run (#12431)
no issue

- the 4.2.0 version of `sqlite3` that we're using is not compatible with `worker_threads`
- 5.0.0 should add support this but there are other errors
- 5.0.1 is released but not published (https://github.com/mapbox/node-sqlite3/issues/1386)
2020-11-26 15:12:12 +00:00
Kevin Ansfield
dcafebe379 Fixed linting 2020-11-26 13:11:18 +00:00
Kevin Ansfield
717543835c
Added email analytics service (#12393)
no issue

- added `EmailAnalyticsService`
  - `.fetchAll()` grabs and processes all available events
  - `.fetchLatest()` grabs and processes all events since the last seen event timestamp
  - `EventProcessor` passed event objects and updates `email_recipients` or `members` records depending on the event being analytics or list hygiene
    - always returns a `EventProcessingResult` instance so that progress can be tracked and merged across individual events, batches (pages of events), and total runs
    - adds email_id and member_id to the returned result where appropriate so that the stats aggregator can limit processing to data that has changed
    - sets `email_recipients.{delivered_at, opened_at, failed_at}` for analytics events
    - sets `members.subscribed = false` for permanent failure/unsubscribed/complained list hygiene events
  - `StatsAggregator` takes an `EventProcessingResult`-like object containing arrays of email ids and member ids on which to aggregate statistics.
  - jobs for `fetch-latest` and `fetch-all` ready for use with the JobsService
- added `initialiseRecurringJobs()` function to Ghost bootup procedure that schedules the email analytics "fetch latest" job to run every minute
2020-11-26 13:09:38 +00:00