ref https://linear.app/tryghost/issue/CFR-29
- Removed the mobiledoc and lexical columns from the posts input
serializer, meaning they will no longer be queried for.
Get helpers are essentially a gateway to the Content API. We already
strip out the mobiledoc and lexical fields in the output
serializer/returned response, but this means we're passing the mobiledoc
and lexical fields back from the db. This is pointless and these fields
are substantial in size - by far the largest fields in the whole ghost
db - leading to slowed performance.
I've updated the posts input serializer to strip out the lexical and mobiledoc
columns so we stop doing a `select *` with every query.
ref ENG-824
- the bug is causing resize prefixes being added to images served from
outside of Ghost.
- this now would only append the prefex to images served by Ghost and
other images urls' would get served as is.
- we can determine that by checking whether imageName doesn't exist,
meaning the source is a third party.
- this mostly affect edge case users, eg where a feature image url was
passed in via the API and doesn't get served by Ghost.
refs CFR-21
Reorganised middleware execution so that member data is not redundantly loaded for static assets or the sitemap.
---------
Co-authored-by: Michael Barrett <mike@ghost.org>
refs #20197
- adds a jackspeak resolution to Ghost core so we can try and ensure the compatible version of jackspeak/string-width is used when the lockfile is regenerated
ref https://linear.app/tryghost/issue/MOM-73
We need to add body parsing middleware here, so that NestJS has access to it.
We also attach the rawBody which is used to validate the HTTP Signatures
closes https://linear.app/tryghost/issue/MOM-80
- updated internal linking search results items
- removed visibility text from meta data
- added additional icon for paid/specific tier visibility
- added titles to icons
- bumped `@tryghost/koenig-lexical` to include support for meta icon titles
- bumped other Koenig packages due to sub-dependency updates
ref https://linear.app/tryghost/issue/CFR-27
- updated packages to include performance improvement for NQL filter
strings including multiple neq filters for the same resource
- bumped `bookshelf-plugins`
- bumped NQL versions
We identified a performance fix that allows us to combine not equal
(neq) filters for the same resource in a logically-equivalent way that
also has far more performant resulting SQL.
We're effectively automatically combining strings like
'tag:-tag1+tag:-tag2` into 'tag:-[tag1,tag2]'.
fix https://linear.app/tryghost/issue/SLO-104/cannot-read-properties-of-undefined-reading-0-an-unexpected-error
- if the request body didn't contain the correct keys, it'd just HTTP
500 out of there
- this adds some optional chaining so we end up with undefined if
anything isn't as expected, and the following if-statement does the
rest of the check for us
- this also adds a breaking test (the first E2E test for authentication, yay!)
refs https://docs.sentry.io/platforms/javascript/configuration/filtering/#using--1%20
- this simplifies our logic to determine whether we should send events
by moving the code to `beforeSend`
- `errorHandler` is going away in Sentry v8 so this results in a shorter
diff in the future
- the logic should be the same, always send non-Ghost errors, and only
send HTTP 500 Ghost errors
fix https://linear.app/tryghost/issue/SLO-101/http-500-with-invalid-multipart-data
- previously, busboy would error out if we supplied a body that was
invalid (such as an empty FormData)
- we would then return a HTTP 500 to the user, which causes all manner
of problems
- now we catch errors from busboy and return a nice BadRequestError
fix https://linear.app/tryghost/issue/SLO-85/fix-http-500-on-contentposts
- in the event we give the incorrect format in a filter, MySQL will
throw an error and we'll throw a HTTP 500 error
- we can capture this error and return a more useful error to the user
- ideally we'd do this in a validation step before attempting the query,
but parsing this out of NQL and detecting which columns are DATETIME
could be quite tricky
- this updates a bunch of places where we're just using Object to cheat
the system
- doing this means editor autocomplete and basic type checking is better
because we now have proper types in place
- functionality should not change, these are just comments
fix https://linear.app/tryghost/issue/SLO-87/cannot-read-properties-of-undefined-reading-createimpl-an-unexpected
refs https://github.com/jsdom/jsdom/issues/3709
- in the event we are given some HTML to parse, and that fails, we
currently return a HTTP 500 because it's unhandled
- the instance we saw was due to `<constructor>` crashing jsdom, we've
opened an issue for that
- in terms of handling the error gracefully, we can surround the code
in a try-catch and return a more suitable error. I've gone for a
ValidationError for now - you could debate whether a different one is
more appropriate
- also added Sentry error capturing so we're not blind to these,
ultimately we should make sure the parser can handle all
user-submitted data
- this adds a simple set of types to the @tryghost/api-framework
package that should describe all of the keys available on a
controller, and then rolls it out to all API controllers
- unfortunately, due to https://github.com/microsoft/TypeScript/issues/47107, we have
to split apart `module.exports` into a variable assignment in order for type-checking
to be done
- the main benefit of this is that `frame` is now typed, and editors understand what keys
are available, so intellisense works properly
- `statusCode` should be a number, but we were passing a string
- this doesn't really affect anything, but tsserver was flagging it up
as the wrong type
- we should pass it as `err` and not `error`
- this probably slipped in because the catch parameter is called
`error`, so I've updated that and fixed the references
- this helps tsserver figure out what the type of things is around our
codebase
- nothing crazy, mostly Express types for the middleware, application and router levels
fix https://linear.app/tryghost/issue/SLO-95/unexpected-end-of-multipart-data-for-broken-image-upload-request
- in the event the client sends an invalid body to the image or media
upload endpoints, Dicer will throw an error if the boundary data is
malformed
- previously, we've just been bubbling that up as an InternalServerError
and that results in an HTTP 500
- we can capture errors produced by dicer and return a handled
BadRequestError, as it's the client's fault
- also includes breaking tests
fix https://linear.app/tryghost/issue/SLO-94/unexpected-field-when-given-broken-image-upload-request
- in the event the body of an image or media upload request is malformed
(broken metadata / blob or something), we get a MulterError and this
bubbles up as an InternalServerError and spits out a HTTP 500
- we can capture this and return a BadRequestError, as it's the client's
fault for not providing the correct body
- this implements that and adds breaking tests
ref
https://linear.app/tryghost/issue/ENG-902/add-an-optional-timeout-in-the-redis-cache-adapter-in-case-redis
- Added an optional timeout parameter to AdapterCacheRedis, so that the
`get(key)` method will return `null` after the timeout if it hasn't
received a response from Redis
- When load testing the `LinkRedirectRepository` with the Redis cache
enabled on staging, we noticed that for some reason Redis stopped
responding to commands for ~30 seconds.
- The `LinkRedirectRepository` was waiting for the Redis cache to
respond and resulted in a drastic increase in response times for link
redirects
- This change will allow us to set a timeout on the `get(key)` method,
so that if Redis doesn't respond within the timeout, the method will
return `null` as if it were a cache miss.
- Then the `LinkRedirectRepository` will fall back to the database and
return the link redirect from the database instead of waiting
indefinitely for Redis to respond
fix https://linear.app/tryghost/issue/SLO-93/undefined-path-error-with-bad-image-upload
- in the event we receive a request to upload an image, that doesn't
contain an image, we still try and unlink the files
- this is a dangling promise, so it doesn't cause an explicit HTTP
error, but it does show up as a console error
- fixed it by checking for the path, and early returning if it doesn't
exist
- also added a test that would fail without this
ref https://linear.app/tryghost/issue/SLO-78
- the `POST /members/api/member` endpoint is solely used by the alpha
feature `membersSpamPrevention` and should not be available otherwise
fix https://linear.app/tryghost/issue/SLO-79/incorrectusageerror-the-url-httpsblogkongregatecompercentc0-couldnt-be
- we added this Sentry captureException whilst fixing a bug where
decodeUrl could fail, and throw a 500 exception
- we added handling for that case and returned an empty string, but we
also added Sentry error capturing
- at this point, I don't think we need to be capturing errors in Sentry,
because the issue is already handled, and it only usually happens with
malicious/incorrect URLs
- this is our #2 cause of Sentry alerts, so it's good to clean it up
- we don't need to deep require into the library as it exports what we
need on the surface
- this should unblock https://github.com/TryGhost/Ghost/pull/19002, as
it's randomly failing with this require
refs
[ENG-827](https://linear.app/tryghost/issue/ENG-827/🐛-crash-on-resizing-animated-gif)
Added a timeout to the image resizing middleware to prevent crashes when
an image is taking too long to resize. When the timeout is reached and
the image has not been resized, the middleware will return the original
image
ref
https://linear.app/tryghost/issue/ENG-851/implement-a-minimal-but-complete-version-of-redirect-caching-to
ref https://app.incident.io/ghost/incidents/55
Often immediately after sending an email, sites receive a large volume
of requests to LinkRedirect endpoints from members clicking on the links in
the email.
We currently don't cache any of these requests in our CDN, because we
also record click events, update the member's `last_seen_at` timestamp,
and send webhooks in response to these clicks, so Ghost needs to handle
each of these requests itself. This means that each of these LinkRedirect requests
hits Ghost, and currently all these requests hit the database to lookup
where to redirect the member to.
Each one of these requests can make up to 11 database queries, which can
quickly exhaust Ghost's database connection pool. Even though the
LinkRedirect lookup query is fairly cheap and quick, these queries aren't
prioritized over the "record" queries Ghost needs to handle, so they can
get stuck behind other queries in the queue and eventually timeout.
The result is that members are unable to actually reach the destination
of the link they clicked on, instead receiving a 500 error in Ghost, or
it can take a long time (60s+) for the redirect to happen.
This PR uses our existing `adapterManager` to cache the redirect lookups
either in-memory or in Redis (if configured — by default there is no caching). This only removes 1 out of
11 queries per redirect request, so it won't reduce the load on the DB
drastically, but it at least decouples the serving of the LinkRedirect from
the DB so the member can be redirected even if the DB is under heavy
load.
Local load testing results have shown a decrease in response times from
60 seconds to ~50ms for the redirect requests when handling 500 requests
per second, and reduced the 500 error rate to 0.
ref https://linear.app/tryghost/issue/ENG-826
- Changed staff deletion logic to do a bulk insert when adding a tag to
the users' associated posts
Staff deletion logic has really poor performance at scale because we do
individual updates for every post. If a user has dozens+ posts
(especially in a large db with thousands of posts), this can take >60s
and look like a timeout. Ultimately this should probably be a jobbed off
process, but for the time being we can improve this by doing a bulk
insert.
Note that this update uses the pattern for the bulk tagging of posts
from the right click (bulk) actions in the posts lists in Admin. With
bulk actions, **we do not trigger web hooks or the post.edited events**.
We will document this and follow up on this separately.
ref MOM61
- Adds admin-x react app we’ll use as ActivityPub playground to the
sidebar nav behind the feature flag.
- Wired up routing to Ember
- Setup the project as `admin-x-activitypub`
---------
Co-authored-by: Ronald Langeveld <hi@ronaldlangeveld.com>
ref https://linear.app/tryghost/issue/MOM-29
This is very rough, and all still behind a flag. The idea is that any public
post which is published gets added to the Outbox of the site Actor. We also
dispatch an event, which will be used to deliver the Activity to any relevant
inboxes, but that is outside the scope of this commit.
refs https://ghost-foundation.sentry.io/issues/5135326925/
- the service tends to 503 all the time, and we don't really care enough
for it to ping us in Sentry, as it's not something we control
- we can still keep logging the errors in case we need to go and look at
what went wrong
refs #20050
- Renovate seems to be unable to bump the package past the security
release, but unfortunately this release contains a breaking bug
- this commit manually bumps the package so we can get things flowing
again
- the security release doesn't really affect us, but we should still try
and keep on the latest
ref
https://linear.app/tryghost/issue/ENG-845/error-attempted-to-set-lexical-on-the-deleted-record
ref
[https://linear.app/tryghost/issue/ENG-854/🐛-deleting-imported-posts-makes-ghost-unresponsive](https://linear.app/tryghost/issue/ENG-854/%F0%9F%90%9B-deleting-imported-posts-makes-ghost-unresponsive)
- When deleting a post in the editor's Post Settings Menu, if the post
has unsaved changes (indicated by the hasDirtyAttributes property in the
editor), Admin will crash because it tries to save a post revision
before leaving the editor, but the post has already been deleted so
saving fails.
- This can occur when editing a post and quickly deleting it from the
Post Settings Menu before saving is completed.
- It can also occur when attempting to delete an imported post, as the
editor will parse the lexical from the server and may make some minor,
invisible-to-the-user changes to the lexical string locally (e.g. JSON
formatting, or updating the JSON to use extended version of base lexical
nodes), which triggers the same error.
- This fix bypasses the attempt to save a post revision when leaving the
editor if the post is already deleted, which allows the transition back
to the Posts route to succeed.
refs f39d1d3aa3
- similar to the commit above, the JSON parser changed between Node 18
and Node 20, so the error message changed too
- we actually just want to check the error is forwarded to the user, so
we can do that by getting the error message from JSON.parse and check
against that
ref https://linear.app/tryghost/issue/MOM-48
This required some structural changes to our NestJS setup so that we can mount
it on multiple parts of the Ghost express app.
We've used the RouterModule to allow adding submodules that are mounted on
different paths, and we've had to be explicit about the base path for each
module. We've also had to switch back to using the Module decorator, because
RouterModule doesn't work with DynamicModule definitions.
Now that the NestJS app has knowledge of the full path, we need to "reset" the
url & baseUrl when passing the request into NestJS so that it can correctly
match the path. This is probably needed for the frontend too, for subdirs, but
that causes further issues - as this in prototype stage, we'll look later
Another issue is that NestJS replaces the express app instance with its own,
which isn't an issue for the Admin API (though we've fixed it anyway for
consistency), but did cause problems for the frontend, because the express app
is where view engine and directory information is stored.
The fix for this is to save a reference to the original ghost express
application, and reattach it to the request if it is not handled by Nest
Now that we have the Nest app mounted on the frontend, we're able to have it
handle the /.well-known/webfinger route with a proper controller, which is nice!
ref MOM-31
ref https://linear.app/tryghost/issue/MOM-31
We'll be building a lot of the new code for ActivityPub in Nest, so we'll need
to have it enabled in Ghost to work.
- The RSS cache has lived for a really long time, but I'm not sure it's useful
- Want to be able to determine if it gets used much, and if not, then we can remove it
ref 78311591d0
- updated tests to not click a button on the setup/done screen that is no longer shown
- fixed setup flow showing an alert bar due to not handling the `TransitionAborted` error that is thrown by the setup/done->dashboard redirect
ref https://linear.app/tryghost/issue/KTLO-1/members-spam-signups
- Some customers are seeing many spammy signups ("hundreds a day") — our
hypothesis is that bots and/or email link checkers are able to signup by
simply following the link in the email without even loading the page in
a browser.
- Currently new members signup by clicking a magic link in an email,
which is a simple GET request. When the user (or a bot) clicks that link, Ghost
creates the member and signs them in for the first time.
- This change, behind an alpha flag, requires a new member to click the
link in the email, which takes them to a new frontend route `/confirm_signup/`, then submit a form on the page which sends a POST request to the
server. If JavaScript is enabled, the form will be submitted
automatically so the only change to the user is an extra flash/redirect
before being signed in and redirected to the homepage.
- This change is behind the alpha flag `membersSpamPrevention` so we can
test it out on a few customer's sites and see if it helps reduce the
spam signups. With the flag off, the signup flow remains the same as
before.
ref https://linear.app/tryghost/issue/ENG-790/remove-use-of-sub-queries-in-email-analytics
- the `delivered_at` column is typically entirely/nearly entirely filled with values meaning the `IS NOT NULL` query matches a huge number of rows that MySQL has to fetch from the index to count
- using `IS NULL` switches that behaviour around as it will now match very few rows which has been shown in testing to be considerably quicker
- after switching to `IS NULL` the query returns an "undelivered" count rather than a "delivered" count, in order to keep the rest of the system behaviour the same we can calculate the delivered count by subtracting the query result from the total number of emails sent which we can fetch using a very fast primary key lookup query on the `emails` table
closes https://linear.app/tryghost/issue/ENG-790/remove-use-of-sub-queries-in-email-analytics
Avoiding sub queries means we don't have a process tied up for longer than necessary and we can more easily see if one of the queries is non-performant.
- extracted the count queries into separate queries and used the retrieved values in the final update query
- removed a query by moving the email open rate calculation into JS as we've already fetched the necessary data before that point
ref https://linear.app/tryghost/issue/CFR-13
- enabled saving traces on browser test failure; this makes troubleshooting a lot easier
- updated handling in offers tests to ensure the tier has fully loaded in the UI (not just `networkidle`)
- updated publishing test to examine the publish button reaction to the save action response instead of a 300ms pause
In general, our tests use a lot of watching for 'networkidle' - and sometimes just raw timeouts - which do not scale well into running tests on CI. In particular, 'networkidle' does not work if we're expecting to see React components' state updates propagate and re-render. We should always instead look to the content which encapsulates the response and the UI updates. This is something we should tackle on a larger scale.
ref ENG-774
ref https://linear.app/tryghost/issue/ENG-774
Staff Tokens will have both a `user` and an `apiKey` present on the
`loadedPermissions`.
The check here for `apiKey` was written when we could assume that an
`apiKey` was an Admin Integration - so it completely overwrote the
previous `allowed` list. When we added the concept of Staff Tokens -
this resulted in a privilege escalation.
This is a good lesson in not using proxies or indicators for data, as
changes elsewhere can invalidate them - if we had been specific and
checked the role of the current actor we wouldn't've had this bug!
ref https://linear.app/tryghost/issue/CFR-4/
- added request queueing middleware (express-queue) to handle high
request volume
- added new config option `optimization.requestQueue`
- added new config option `optimization.requestConcurrency`
- added logging of request queue depth - `req.queueDepth`
We've done a fair amount of investigation around improving Ghost's
resiliency to high request volume. While we believe this to be partly
due to database connection contention, it also seems Ghost gets
overwhelmed by the requests themselves. Implementing a simple queueing
system allows us a simple lever to change the volume of requests Ghost
is actually ingesting at any given time and gives us options besides
simply increasing database connection pool size.
---------
Co-authored-by: Michael Barrett <mike@ghost.org>
ref ENG-728
ref https://linear.app/tryghost/issue/ENG-728
This is not used anywhere, and makes the code more complicated, it's a good
step toward simplifying permissions and pulling them out of the database.
ref ENG-728
ref https://linear.app/tryghost/issue/ENG-728
This is NOT a functionality change. The Post#permissible method unit
tests have been updated to pass `true` as `hasUserPermission` and we can
see that the permission functionality remains the same.
The permissible method of the post model is responsible for removing
permission based on the data that is being modified, but the permissions
module is setup to allow the permissible method to grant permission -
this means that we call permissible, even if the current actor doesn't
have permission, this results in code that is hard to understand and
manage.
We are going to be instead returning early if an actor does not have
permission, this will allow permissible method signatures to be greatly
simplified (removing the need for hasUserPermission, hasApiKeyPermission
& hasMemberPermission arguments).
fixes https://linear.app/tryghost/issue/ENG-746/http-500-responses-when-handle-image-sizes-middleware-hits-missing
- in the event a request comes in for a resized image, but the source
image does not exist, we return a rendered 404 page
- we do this because we pass the NotFoundError to `next`, which skips
over the static asset code where we return a plaintext 404
- also included a breaking test that ensure we go to the next middleware
without an error
ref ENG-761
ref https://linear.app/tryghost/issue/ENG-761
Creating these pipelines is expensive, and we don't want to do it
repeatedly for the same controller. Adding caching should reduce the
amount of time spent setting up pipelines for each usage of the `get`
helper.
ref https://linear.app/tryghost/issue/IPC-66/onboarding-checklist-v1
- Adds a basic version of a new onboarding checklist behind the feature
flag, without incomplete/complete state logic
- Links to Design settings, Members screen and new post
- Opens amodal that we’ll use as Share modal
---------
Co-authored-by: Daniël van der Winden <danielvanderwinden@ghost.org>