ref https://linear.app/tryghost/issue/ONC-217/implement-the-deliverytime-option-in-mailgun-api-calls
Ghost experiences its highest peak load immediately after sending out a
newsletter, as it receives an influx of traffic from users clicking on
the links in the email, a burst of email analytics events to process
from Mailgun, and an increase in organic traffic to the site's frontend
as well as the admin analytics pages. The `BatchSendingService`
currently sends all the batches to Mailgun as quickly as possible, which
may contribute to higher peak loads.
This commit adds a `deliverytime` parameter to our API calls to Mailgun,
which allows us to specify a time in the future when we want the email
to be delivered. This will allow us to moderate the rate at which emails
are delivered, and in turn that should moderate the peak traffic volume
that Ghost receives in the first 2-3 minutes after sending an email.
The `deliverytime` is calculated based on a configurable parameter:
`bulkEmail.targetDeliveryWindow`, which specifies the maximum allowable
time (in milliseconds) after the email is first sent for Ghost to
instruct Mailgun to deliver the emails. Ghost will attempt to space out
all the batches as evenly as possible throughout the specified window.
For example, if `targetDeliveryWindow` is set to `300000` (5 minutes)
and there are 100 batches, Ghost will space the `deliverytime` of
consecutive batches ~3 seconds apart.
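The spacing arithmetic can be sketched as follows. This is a minimal illustration, not the code in `BatchSendingService`: the helper name `calculateDeliveryTimes` and its signature are assumptions made for the example, and only the window semantics come from the description above.

```javascript
// Hypothetical helper: spreads `batchCount` batches evenly across a window.
// `targetDeliveryWindow` is in milliseconds, like `bulkEmail.targetDeliveryWindow`.
function calculateDeliveryTimes(startTime, batchCount, targetDeliveryWindow) {
    if (batchCount <= 0) {
        return [];
    }
    // Gap between consecutive batches; 0 means "send everything immediately".
    const interval = targetDeliveryWindow > 0 ? targetDeliveryWindow / batchCount : 0;
    const times = [];
    for (let i = 0; i < batchCount; i++) {
        times.push(new Date(startTime.getTime() + Math.round(i * interval)));
    }
    return times;
}

// 100 batches across a 5 minute window.
const exampleStart = new Date('2024-01-01T00:00:00.000Z');
const exampleTimes = calculateDeliveryTimes(exampleStart, 100, 300000);
console.log(exampleTimes[1] - exampleTimes[0]); // gap in ms between the first two batches
console.log(exampleTimes[99].toISOString()); // delivery time of the last batch
```

Dividing the window by the batch count keeps the last batch inside the window, so no batch is scheduled later than the configured maximum.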
ref https://github.com/TryGhost/Ghost/pull/20835
- reimplemented email analytics changes that prioritized opened events
over other events in order to speed up open analytics
- added db persistence to fetch missing job to ensure we re-fetch every
window of events, especially important if we restart following a large
email batch
We learned a few things from the previous trial run of this change,
namely that, for particularly large databases, event throughput is not
as high as we initially saw in the data. This set of changes is more
conservative, though a touch more complicated, so that we capture edge
cases for really large newsletter sends (100k+ members).
In general, we want to make sure we're fetching new open events at least
every 5 mins, and often much faster than that, unless it's a quiet
period (suggesting we haven't had a newsletter send or much outstanding
event data).
- this change contains the removal of the `promise.allsettled` package,
as this is not needed on Node 12+, which removes 75 further dependencies
in production mode
ref https://linear.app/tryghost/issue/CFR-4/
- added request queueing middleware (express-queue) to handle high
request volume
- added new config option `optimization.requestQueue`
- added new config option `optimization.requestConcurrency`
- added logging of request queue depth - `req.queueDepth`
We've done a fair amount of investigation around improving Ghost's
resiliency to high request volume. While we believe the problem to be
partly due to database connection contention, it also seems Ghost gets
overwhelmed by the requests themselves. Implementing a simple queueing
system gives us a simple lever to change the volume of requests Ghost
is actually ingesting at any given time, and gives us options besides
simply increasing database connection pool size.
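The idea behind the queue can be sketched without any dependencies. This is an illustrative limiter, not the express-queue implementation and not the middleware added in this commit: `createRequestQueue` and `queueMiddleware` are hypothetical names, and the Express-style `(req, res, next)` wiring is an assumption.

```javascript
// Hypothetical, dependency-free limiter: at most `concurrency` tasks run at once.
function createRequestQueue({concurrency}) {
    let active = 0;
    const waiting = [];

    function next() {
        if (active >= concurrency || waiting.length === 0) {
            return;
        }
        active += 1;
        const task = waiting.shift();
        let finished = false;
        // The task calls `done` once its request has been fully handled.
        task(function done() {
            if (finished) {
                return; // ignore repeated calls, e.g. 'finish' followed by 'close'
            }
            finished = true;
            active -= 1;
            next();
        });
    }

    return {
        // Runs the task now if there is capacity, otherwise holds it in the queue.
        push(task) {
            waiting.push(task);
            next();
        },
        get queueDepth() {
            return waiting.length;
        },
        get active() {
            return active;
        }
    };
}

// Express-style middleware shape (assumed): a slot is released when the response ends.
function queueMiddleware(queue) {
    return function (req, res, next) {
        req.queueDepth = queue.queueDepth; // depth seen when the request arrived
        queue.push((done) => {
            res.once('finish', done);
            res.once('close', done);
            next();
        });
    };
}
```

Lowering `concurrency` trades latency for stability: requests wait longer in the queue, but fewer of them compete for database connections at the same time.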
---------
Co-authored-by: Michael Barrett <mike@ghost.org>
refs https://github.com/TryGhost/Product/issues/4053
This adds the feature flag. If enabled, the list-unsubscribe header
should be set. The value currently is only for testing purposes and
probably won't work yet.
refs https://github.com/TryGhost/Ghost/issues/15725
This pull request adds a new configuration option for the Mailgun email
provider, `bulkEmail.batchSize`, which allows the user to set the
maximum number of recipients per email batch.
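As an illustration of what the option controls, the sketch below splits a recipient list into batches of at most `batchSize` entries. The helper name `createBatches` is hypothetical and the snippet is not the batching code in Ghost.

```javascript
// Hypothetical helper: splits recipients into batches of at most `batchSize`.
function createBatches(recipients, batchSize) {
    if (!Number.isInteger(batchSize) || batchSize < 1) {
        throw new RangeError('batchSize must be a positive integer');
    }
    const batches = [];
    for (let i = 0; i < recipients.length; i += batchSize) {
        batches.push(recipients.slice(i, i + batchSize));
    }
    return batches;
}

// Five recipients with a batch size of 2: the last batch is smaller than the rest.
const exampleBatches = createBatches(['a', 'b', 'c', 'd', 'e'], 2);
console.log(exampleBatches.map(batch => batch.length));
```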
refs: https://github.com/TryGhost/Toolbox/issues/595
We're rolling out new rules around the node assert library, the first of which is enforcing the use of assert/strict. This means we don't need to use the strict version of methods, as the standard version will work that way by default.
This caught some gotchas in our existing usage of assert where the lack of strict mode had unexpected results:
- URL matching needs to be done on `url.href`, see aa58b354a4
- Null and undefined are not the same thing, there were a few cases of this being confused
- Particularly questionable changes in [PostExporter tests](c1a468744b) tracked [here](https://github.com/TryGhost/Team/issues/3505).
- A typo, see eaac9c293a
Moving forward, using assert strict should help us to catch unexpected behaviour, particularly around nulls and undefineds during implementation.
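The null/undefined and URL gotchas can be reproduced with a short script. It is an illustration written for this message, not code from the referenced commits, and the `passes` helper is hypothetical.

```javascript
const legacyAssert = require('assert');
const strictAssert = require('assert/strict');

// Returns true when the assertion passes, false when it throws an AssertionError.
function passes(assertion) {
    try {
        assertion();
        return true;
    } catch (err) {
        if (err instanceof legacyAssert.AssertionError) {
            return false;
        }
        throw err;
    }
}

const url = new URL('https://example.com/');

// Legacy mode compares with ==, so both checks pass through coercion.
console.log(passes(() => legacyAssert.equal(null, undefined)));
console.log(passes(() => legacyAssert.equal(url, 'https://example.com/')));

// Strict mode makes `equal` behave like `strictEqual`, so the same checks fail.
console.log(passes(() => strictAssert.equal(null, undefined)));
console.log(passes(() => strictAssert.equal(url, 'https://example.com/')));

// Comparing on `url.href` works in strict mode because both sides are strings.
console.log(passes(() => strictAssert.equal(url.href, 'https://example.com/')));
```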
As discussed with the product team, we want to enforce kebab-case file names for
all files, with the exception of files which export a single class, in which
case they should be PascalCase and reflect the class which they export.
This will help find classes faster, and should push better naming for them too.
Some files and packages have been excluded from this linting, specifically when
a library or framework depends on the naming of a file for its functionality,
e.g. Ember, knex-migrator, adapter-manager.
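The naming rule can be expressed as a small check. This is only a sketch of the convention described above, not the lint rule that enforces it: the function name, the regular expressions and the `exportsSingleClass` flag are all assumptions made for illustration.

```javascript
// Hypothetical checker illustrating the convention; not the actual lint rule.
const KEBAB_CASE = /^[a-z0-9]+(-[a-z0-9]+)*$/;
const PASCAL_CASE = /^[A-Z][A-Za-z0-9]*$/;

function isValidFileName(fileName, {exportsSingleClass = false} = {}) {
    // Only the base name is checked, so `.js` or `.test.js` suffixes are ignored.
    const baseName = fileName.split('.')[0];
    return exportsSingleClass ? PASCAL_CASE.test(baseName) : KEBAB_CASE.test(baseName);
}

console.log(isValidFileName('email-renderer.js'));
console.log(isValidFileName('EmailRenderer.js', {exportsSingleClass: true}));
console.log(isValidFileName('email_renderer.js'));
```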
- we have calls to the metrics library so we can measure the time it
takes the Mailgun API to return a response
- however, there's a bug in the code whereby if the `batchHandler`
takes a long time and then throws an error, this time will be reported
to metrics
- this is misleading because it looks like Mailgun is taking a long time
if the databases are slow
- this pulls the specific SDK call out into a function so it's easier to
wrap with timing code
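The shape of the fix can be sketched like this. All names here (`timeCall`, `sendBatch`, the `provider-send` label and the injected `report`/`now` functions) are hypothetical and stand in for the real metrics and SDK calls.

```javascript
// Hypothetical helper: times only the call passed in, nothing around it.
async function timeCall(label, call, report, now = Date.now) {
    const start = now();
    try {
        return await call();
    } finally {
        // Reported on success and on failure, but only for this one call.
        report(label, now() - start);
    }
}

// Hypothetical send flow: slow preparation work (e.g. database queries) runs
// outside the timer, so only the provider call contributes to the metric.
async function sendBatch({prepareBatch, sendToProvider, report, now}) {
    const message = await prepareBatch();
    return timeCall('provider-send', () => sendToProvider(message), report, now);
}
```

Because the timer starts after the preparation step has finished, a slow or failing preparation step no longer shows up as provider latency.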
fixes https://github.com/TryGhost/Team/issues/2562
New event fetching loops:
- Reworked the analytics fetching algorithm. Instead of restarting from
where the last fetch stopped minus 30 minutes, we now just continue
where we stopped, with ms precision (because it is no longer database
dependent after the first fetch). We stop at NOW - 1 minute to reduce
the chance of missing events.
- Apart from that, a 'missing events' fetching loop is introduced. It
fetches events that are older than 30 minutes and processes them all a
second time to make sure we didn't skip any because of storage delays
in the Mailgun API.
- A new scheduled fetching loop, which allows us to schedule a fetch
between a given start/end date (currently only persisted in memory, so
it stops after a reboot)
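The time windows of the first two loops can be sketched as below. The function names and the `{begin, end}` shape are assumptions for illustration; only the 1 minute and 30 minute offsets come from the description above.

```javascript
const MINUTE = 60 * 1000;

// Main loop: continue from where the last fetch stopped, stop one minute before now.
function mainWindow(lastEnd, now) {
    return {begin: lastEnd, end: new Date(now.getTime() - 1 * MINUTE)};
}

// Missing-events loop: only look at events that are older than 30 minutes.
function missingWindow(lastMissingEnd, now) {
    return {begin: lastMissingEnd, end: new Date(now.getTime() - 30 * MINUTE)};
}

// A window is only worth fetching when it covers a positive span of time.
function hasWork(window) {
    return window.begin.getTime() < window.end.getTime();
}

const exampleNow = new Date('2024-01-01T12:00:00.000Z');
const exampleLastEnd = new Date('2024-01-01T11:50:00.123Z');
console.log(mainWindow(exampleLastEnd, exampleNow).end.toISOString());
console.log(hasWork(missingWindow(exampleLastEnd, exampleNow)));
```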
UI and endpoint changes:
- New UI to show the state of the analytics 'loops'
- New endpoint to request the analytics loop status
- New endpoint to schedule analytics
- New endpoint to cancel scheduled analytics
- Some number formatting improvements, and introduction of 'opened'
count in debug screen
- Live reload of data in the debug screen
Other changes:
- This also improves the support for `maxEvents`. We can now stop a
fetching loop after x events without worrying about lost events. This is
used to reduce the fetched events in the missing and scheduled event
loops (e.g. when the main one is fetching lots of events, we skip the
other loops).
- Prevents fetching the same events over and over again if no new events
come in (because we always started at the same begin timestamp). The
code increases the begin timestamp by 1 second if it is safe to do so,
to prevent the API from returning the same events over and over again.
- Some optimisations in handling the processing results (fewer merges to
reduce CPU usage in cases where we have lots of events).
Testing:
- You can test with lots of events using the new Mailgun mocking server
(Toolbox repo `scripts/mailgun-mock-server`). This can also simulate
events that are only returned after x minutes because of storage delays.
refs https://github.com/TryGhost/Team/issues/2486
Stop the event fetching loop as soon as we receive events that were
created later than when we started the loop. This ensures that we don't
miss events if we receive a giant batch of events that takes a long time
to process.
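A minimal sketch of that stop condition, assuming events arrive oldest first in pages and each carries a `timestamp` Date. The function name and the return shape are hypothetical.

```javascript
// Hypothetical sketch: `pages` is any iterable of event arrays, oldest events first.
function processUntilLoopStart(pages, loopStart, handleEvent) {
    let lastProcessed = null;
    for (const events of pages) {
        for (const event of events) {
            if (event.timestamp > loopStart) {
                // Created after the loop began: leave it for the next run.
                return {lastProcessed, stoppedEarly: true};
            }
            handleEvent(event);
            lastProcessed = event.timestamp;
        }
    }
    return {lastProcessed, stoppedEarly: false};
}
```

The next run can then begin at `lastProcessed`, so events that arrived while a large batch was being processed are picked up later instead of being skipped.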
refs https://github.com/TryGhost/Toolbox/issues/501
- this reverts commit 48dda23554
- also includes a resolution for `@elastic/elasticsearch` so we don't
run a version that is potentially problematic - see referenced issue
for context
- in the event the Mailgun config doesn't exist, we return `null` from
this function
- this updates the jsdoc to correct the return type of `getInstance`
- this was all getting terribly behind so I've done several things:
  - majority of `@tryghost/*` except Lexical packages
  - gscan + knex-migrator to remove old `@tryghost/errors` usage
  - bumped lockfile
refs: https://github.com/TryGhost/Ghost/issues/15725
- our users are having difficulties getting onboarded with Mailgun
- we're adding an explicit and unique tag to all requests, to help Mailgun detect when mail is being sent from Ghost