ref https://linear.app/ghost/issue/ANAL-115/data-retention
- The bad news here is I didn't notice that the tinybird web analytics
starter kit included a TTL on the analytics_events datasource of 60 days
- This means any data older than 60days was automatically dropped from
the table
- I updated this in the UI when I noticed it a few days ago, this makes
sure it can't come back
- The good news is that we don't have to implement anything to make this
work when we do get to the point where we want a TTL!
ref https://linear.app/tryghost/issue/ANAL-96/data-discrepancy-between-charts-when-filtering
- This adds a set of tests to describe what the data should look like when we filter on various values
- We have tests for source and browser which are pulled from different MVs
- The result files are generated using ./scripts/gen_test_results.sh, and then manually verified
- We know they are not yet fully correct
- Added yarn command to update TB CLI, as that needs doing frequently and I can never remember the command
- Improved safety & usability of tinybird test script by ensuring branches are correctly created before running & adding optional delete
- Updated tinybird test to warn only for sanity check as that's not always a valid check (Will prob remove soon)
- Improved output of tinybird test script on failure, so that the diff is readable and closer to what git shows you
- Added tool to convert tinybird ndjson to csv to make it easier to bring the data into google sheets for verifying numbers
- TODO: make these run in CI
- Right now you run them by running `yarn tb` and then `./script/branch_and_test.sh`
- These are snapshot tests that check we get the desired result
Co-authored-by: alejandromav <hi@alejandromav.com>
ref
https://linear.app/tryghost/issue/ANAL-60/click-through-filtering-for-sources
- In our stats page we use the referrer without a protocol or www, that
is the pure domain as our source that we output
- Meanwhile all the data pipelines had the full url as the referrer
passed through
- When we come to add clickthroughs/filtering, we'll need to use this
value to filter the data. If we have a different value locally in the UI
to what is in the DB, we won't be able to make the filters match
- Also, we pay for everything we store, and this removes all the
https:// and www. data
- Whilst we are in development, we can safely make changes to all
aspects of our pipeline without worrying
- This is because currently, it's safe to delete all data and start over
- This script removes everything excepts the analytics_events
datasource, and then recreates everything fresh, repopulating from the
datasource where possible
- This shouldn't be used after tinybird is in production, we need a
better change process
closes https://linear.app/tryghost/issue/ANAL-77/na-data-should-be-zero
ref https://www.tinybird.co/blog-posts/tips-9-filling-gaps-in-time-series-on-clickhouse
- Sometimes we have no matching data for a particular date/date range, which makes our charts look super janky
- Clickhouse has a feature to fill these in called WITH FILL, which makes it really easy to fix this!
- WITH FILL works except for on bounce rate. That seems to be due to the column being marked as nullable and so WITH FILL fills missing data with NULL instead of 0
- To fix that, I've updated the code that generates the bounce rate so that it doesn't generate nulls, and that seems to result in a not-nullable column, which then works with WITH FILL
- When I went off, I quickly recreated all our endpoints with some new functionality
- However, I forgot that I was manually managing tokens, this meant the UI for stats broke with a token error
- Adding the tokens to the endpoint definitions should prevent this happening again, by automating the management of the token scopes
closes https://linear.app/tryghost/issue/ANAL-23/filtering-by-logged-out-logged-in-traffic
- Updated all of our tinybird datasources and pipes to handle member status
- Added member_status as an array query param to the API endpoints
- Added a really dodgy power select multiple to the stats page to demonstrate it works (needs styling)
- Added all of the wiring so each chart updates
- This was done pretty fast, and may not be 100% right yet
ref https://linear.app/tryghost/issue/ANAL-27/setup-tinybird-project-and-cicd
- Tinybird has a system for managing it's configuration as code, with full ci/cd support
- The tinybird CLI tool uses python, so we'll run that using docker, via `yarn tb`
- Some of the files tinybird adds should not be in source control, so we've added those to git ignore
- Everything in /ghost/tinybird is tinybird's init config