no ref
Added Prometheus metrics for the job queue throughput and email analytics throughput. We'll likely keep these around as good metrics to keep an eye on, though for the moment their primary function is to establish a baseline for users w/o the job queue enabled so we can observe the full impact once switching it on.
ref https://linear.app/tryghost/issue/ENG-1556/
- added background job queue behind config flags
- when enabled, is only used for the member email analytics updates in
order to speed up the parent job, and take load off of the main process
that is serving requests
The intent here is to decouple certain code paths from the main process where it is unnecessary, or worse, where it's part of the request. Primary use cases are email analytics (particularly the member stats [open rate]) which are not particularly helpful in the period immediately following an email send, while the click traffic and delivered/opened events are.
Related, the email link clicks themselves send off a cascade of events that are quite a burden on the main process currently and are somewhat tied to the request response when they needn't be. We'll be looking to tackle that after some initial testing with the email analytics job.
ref https://linear.app/tryghost/issue/CFR-4/
- added request queueing middleware (express-queue) to handle high
request volume
- added new config option `optimization.requestQueue`
- added new config option `optimization.requestConcurrency`
- added logging of request queue depth - `req.queueDepth`
We've done a fair amount of investigation around improving Ghost's
resiliency to high request volume. While we believe this to be partly
due to database connection contention, it also seems Ghost gets
overwhelmed by the requests themselves. Implementing a simple queueing
system allows us a simple lever to change the volume of requests Ghost
is actually ingesting at any given time and gives us options besides
simply increasing database connection pool size.
---------
Co-authored-by: Michael Barrett <mike@ghost.org>
- this version is written in TS, but was published a few months ago and
needs to be bumped here
- also updates a previous deep include into the library, which was
unnecessary anyway
refs: https://github.com/TryGhost/Toolbox/issues/595
We're rolling out new rules around the node assert library, the first of which is enforcing the use of assert/strict. This means we don't need to use the strict version of methods, as the standard version will work that way by default.
This caught some gotchas in our existing usage of assert where the lack of strict mode had unexpected results:
- Url matching needs to be done on `url.href` see aa58b354a4
- Null and undefined are not the same thing, there were a few cases of this being confused
- Particularly questionable changes in [PostExporter tests](c1a468744b) tracked [here](https://github.com/TryGhost/Team/issues/3505).
- A typo see eaac9c293a
Moving forward, using assert strict should help us to catch unexpected behaviour, particularly around nulls and undefineds during implementation.
refs: https://github.com/TryGhost/Toolbox/issues/188
- some of our older packages used a pattern for linting which missed using test config for linting tests
- we need this to be consistent so that we can add more eslint rules for testing
- two packages also didn't use the lib pattern, which made the lint pattern error - so this was fixed as well
As discussed with the product team we want to enforce kebab-case file names for
all files, with the exception of files which export a single class, in which
case they should be PascalCase and reflect the class which they export.
This will help find classes faster, and should push better naming for them too.
Some files and packages have been excluded from this linting, specifically when
a library or framework depends on the naming of a file for the functionality
e.g. Ember, knex-migrator, adapter-manager
- we previously used `@stdlib/utils` instead of the child package
`@stdlib/copy`, which is a lot smaller and contains our only use of
the parent
- this saves 140+MB of dependencies
- we keep ending up with multiple versions of the depedency in our tree,
and it's causing problems when comparing instances
- the workaround I'm implementing for now is to bump the package
everywhere and set a resolution so we only have 1 shared instance
- hopefully we can come up with a better method down the line
- there's a weird situation when we have mixed versions of the
dependency because different libraries try to compare instances
- this brings the usage up to 1.2.21 so we can fix the build for now
refs https://github.com/TryGhost/Toolbox/issues/501
- this reverts commit 48dda23554
- also includes a resolution for `@elastic/elasticsearch` so we don't
run a version that is potentially problematic - see referenced issue
for context
fixes https://github.com/TryGhost/Team/issues/2366
refs https://ghost.slack.com/archives/C02G9E68C/p1670232405014209
Probem described in issue.
In the old MEGA flow:
- The `email_verification_required` check is now repeated inside the job
In the new email service flow:
- The `email_verification_required` is now checked (didn't happen
before)
- When generating the email batch recipients, we only include members
that were created before the email was created. That way it is
impossible to avoid limit checks by inserting new members between
creating an email and sending an email.
- We don't need to repeat the check inside the job because of the above
changes
Improved handling of large imports:
- When checking `email_verification_required`, we now also check if the
import threshold is reached (a new method is introduced in
vertificationTrigger specifically for this usage). If it is, we start
the verification progress. This is required for long running imports
that only check the verification threshold at the very end.
- This change increases the concurrency of fastq to 3 (refs
https://ghost.slack.com/archives/C02G9E68C/p1670232405014209). So when
running a long import, it is now possible to send emails without having
to wait for the import. Above change makes sure it is not possible to
get around the verification limits.
Refactoring:
- Removed the need to use `updateVerificationTrigger` by making
thresholds getters instead of fixed variables.
- Improved awaiting of members import job in regression test
- this was all getting terribly behind so I've done several things:
- majority of `@tryghost/*` except Lexical packages
- gscan + knex-migrator to remove old `@tryghost/errors` usage
- bumped lockfile
refs https://github.com/TryGhost/Team/issues/2339
- Includes a new pattern in the job manager that allows us to properly
await jobs.
- Added new convenience mocking methods to stub settings
- Tests the main flows for bulk sending:
- Sending in multiple batches
- Sending to multiple segments
- Handling a failed batch and retrying that batch
- Fixes bug in batch generation (ordering not working)
In a different PR I'll add more detailed tests.
fixes https://github.com/TryGhost/Team/issues/2310
This moves the processing of the events from the event-processor to a
new email-event-processor in the email-service package.
- The `EmailEventProcessor` only translates events from
providerId/emailId to their known emailId, memberId and recipientId, and
dispatches the corresponding events.
- Since `EmailEventProcessor` runs in a separate worker thread, we can't
listen for the dispatched events on the main thread. To accomplish this
communication, the events dispatched from the `EmailEventProcessor`
class are 'posted' via the postMessage method and redispatched on the
main thread.
- A new `EmailEventStorage` class reacts to the email events and stores
it in the database. This code mostly corresponds to the (now deleted)
subclass of the old `EmailEventProcessor`
- Updating a members last_seen_at timestamp has moved to the
lastSeenAtUpdater.
- Email events no longer store `ObjectID` because these are not
encodable across threads via postMessage
- Includes new E2E tests that test the storage of all supported Mailgun
events. Note that in these tests we run the processing on the main
thread instead of on a separate thread (couldn't do this because
stubbing is not possible across threads)
There are some missing pieces that will get added in later PRs (this PR
focuses on porting the existing functionality):
- Handling temporary failures/bounces
- Capturing the error messages of bounce events