Commit Graph

18 Commits

Author SHA1 Message Date
Hannah Wolfe
4b1ce62ca9
Fix origin attribution in Tinybird analytics hits (#21187)
closes 
https://linear.app/tryghost/issue/ANAL-96/data-discrepancy-between-charts-when-filtering

- Attribute the referral of the first hit to the whole session in `analytics_sources_mv`, while keeping the raw hits in `analytics_hits`
- Updated tests accordingly
- This is a rebased / reordered version of https://github.com/TryGhost/Ghost/pull/21166
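
The session-level attribution described above can be sketched in Python. This is only an illustration of the idea, not the Tinybird pipe itself; the field names `session_id`, `timestamp`, and `referrer` are assumptions made for the example.

```python
from typing import Dict, List


def attribute_session_source(hits: List[dict]) -> Dict[str, str]:
    """Map each session to the referrer of its earliest hit.

    The field names (session_id, timestamp, referrer) are hypothetical.
    """
    first_hits: Dict[str, dict] = {}
    for hit in hits:
        current = first_hits.get(hit["session_id"])
        # Keep only the chronologically first hit seen for each session.
        if current is None or hit["timestamp"] < current["timestamp"]:
            first_hits[hit["session_id"]] = hit
    return {sid: hit["referrer"] for sid, hit in first_hits.items()}
```

The raw hits are left untouched; only the per-session lookup carries the first hit's referrer.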

Co-authored-by: alejandromav <hi@alejandromav.com>
Co-authored-by: Alejandro Martin <alejandromav@tinybird.co>
2024-10-02 17:27:43 +01:00
Hannah Wolfe
f082ba68e0 Added tinybird tests for filtering
ref https://linear.app/tryghost/issue/ANAL-96/data-discrepancy-between-charts-when-filtering

- This adds a set of tests to describe what the data should look like when we filter on various values
- We have tests for source and browser which are pulled from different MVs
- The result files are generated using ./scripts/gen_test_results.sh, and then manually verified
- We know they are not yet fully correct
2024-10-02 17:04:24 +01:00
Hannah Wolfe
60443726c9 Fixed tinybird test fixture data
ref https://linear.app/tryghost/issue/ANAL-96/data-discrepancy-between-charts-when-filtering

- This fixes the test data so that the session first hit and subsequent hits are in chronological order
- It also makes sure there isn't more than 30 minutes between hits. Our tracking script is only designed to keep sessions alive for 30 minutes, so the previous data wasn't realistic
- NOTE: This data was generated by a script https://gist.github.com/ErisDS/25bb36f38d4c5a3f01d86f34ea5be707 - which didn't take these things into account
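
The two fixture constraints above (chronological order, and no more than 30 minutes between hits) could be checked with something like the following. It is a rough sketch that assumes the hits of one session are available as a list of `datetime` values; the function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import List

# Session lifetime stated in the commit message.
SESSION_TIMEOUT = timedelta(minutes=30)


def session_is_realistic(timestamps: List[datetime]) -> bool:
    """Return True if hits are in order and no gap exceeds 30 minutes."""
    for earlier, later in zip(timestamps, timestamps[1:]):
        gap = later - earlier
        # A negative gap means the hits are out of chronological order.
        if gap < timedelta(0) or gap > SESSION_TIMEOUT:
            return False
    return True
```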

Co-authored-by: alejandromav <hi@alejandromav.com>
2024-10-02 17:04:24 +01:00
Hannah Wolfe
bce5d9d588
Updated tinybird tooling with usability improvements (#21185)
- Added yarn command to update TB CLI, as that needs doing frequently and I can never remember the command
- Improved safety & usability of tinybird test script by ensuring branches are correctly created before running & adding optional delete
- Updated tinybird test to warn only for sanity check as that's not always a valid check (Will prob remove soon)
- Improved output of tinybird test script on failure, so that the diff is readable and closer to what git shows you
- Added tool to convert tinybird ndjson to csv to make it easier to bring the data into google sheets for verifying numbers
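
For illustration, an NDJSON to CSV conversion like the one mentioned in the last bullet might look as follows. This is a standalone sketch using only the Python standard library, not the tool added in this commit.

```python
import csv
import io
import json


def ndjson_to_csv(ndjson_text: str) -> str:
    """Convert newline-delimited JSON objects into CSV text."""
    rows = [json.loads(line) for line in ndjson_text.splitlines() if line.strip()]
    # Collect column names in first-seen order across all rows.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    # Keys missing from a row are written as empty cells.
    writer.writerows(rows)
    return out.getvalue()
```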
2024-10-02 15:28:39 +01:00
Hannah Wolfe
7ebb208549 Fixed browser/device missing data in tinybird
ref https://linear.app/tryghost/issue/ANAL-96/data-discrepancy-between-charts-when-filtering

- The top browser and device endpoints were pulling from the sources MV, which is filtered to exclude same-source traffic
2024-09-27 14:30:42 +01:00
Hannah Wolfe
606fcbabe7 Added a set of tests for our tinybird setup
- TODO: make these run in CI
- Right now you run them by running `yarn tb` and then `./script/branch_and_test.sh`
- These are snapshot tests that check we get the desired result
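
As a rough sketch of the snapshot-test idea: each endpoint's actual output is compared with a stored expected result, and any difference is reported as a diff. The helper below is hypothetical and uses Python's `difflib`; it is not the logic of the shell script named above.

```python
import difflib


def snapshot_diff(expected: str, actual: str) -> str:
    """Return a unified diff of snapshot vs. actual output; empty if equal."""
    diff = difflib.unified_diff(
        expected.splitlines(keepends=True),
        actual.splitlines(keepends=True),
        fromfile="expected",
        tofile="actual",
    )
    return "".join(diff)
```

An empty return value means the snapshot matches; anything else is a failure that can be printed as-is.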

Co-authored-by: alejandromav <hi@alejandromav.com>
2024-09-27 14:30:42 +01:00
Hannah Wolfe
e3268c8c59 Renamed hits to pageviews in tinybird
closes https://linear.app/tryghost/issue/ANAL-111/rename-hits-to-pageviews-inside-of-tinybird

- We currently have two concepts: visits (unique visits) and pageviews (also called hits)
- We want to standardise on this terminology, so inside tinybird, we're going to call hits "pageviews" to make it super clear what's happening
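
To make the two terms concrete, here is a minimal sketch. It assumes a visit corresponds to one distinct `session_id` and a pageview to one recorded hit; both the field name and that reading of "unique visits" are assumptions for the example.

```python
from typing import Iterable, Tuple


def count_visits_and_pageviews(hits: Iterable[dict]) -> Tuple[int, int]:
    """Return (visits, pageviews) for a collection of hits."""
    hits = list(hits)
    # Every recorded hit counts as one pageview.
    pageviews = len(hits)
    # Assumption: one visit per distinct session_id.
    visits = len({hit["session_id"] for hit in hits})
    return visits, pageviews
```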
2024-09-27 14:30:42 +01:00
Hannah Wolfe
908ff731d6
Updated tinybird unsafe deploy script (#21135)
ref https://github.com/TryGhost/Ghost/pull/21092

- This updates the unsafe deploy script to be safer and more useful
2024-09-26 15:54:08 +01:00
Hannah Wolfe
45211b2f4c
Fixed bounce rate on stats page (#21097)
closes
https://linear.app/tryghost/issue/ANAL-81/investigate-bounce-rate-looks-incorrect

- Think I've figured out what was wrong
- TODO: Figure out TinyBird's test pipeline, so we can verify this
2024-09-24 15:52:57 +01:00
Hannah Wolfe
7e27b1cb36
Clickthrough filtering for stats page (#21095)
closes
https://linear.app/tryghost/issue/ANAL-58/click-through-filtering-for-content
closes
https://linear.app/tryghost/issue/ANAL-60/click-through-filtering-for-sources
closes
https://linear.app/tryghost/issue/ANAL-61/click-through-filtering-for-locations

- This implements filtering and click-throughs for device, browser,
source, location and pathname.
- It requires significant updates to our tinybird setup, to pass through
all the right data and have them as parameters on the API endpoints
- We update the UI to add query parameters when clicking around and then
pass those through to every chart/request.
- We've added an interface to display the filters and remove them

---------

Co-authored-by: Peter Zimon <peter.zimon@gmail.com>
2024-09-24 15:26:08 +01:00
Hannah Wolfe
5ebdbe4e25
Renamed referrer without www to source in tinybird (#21094)
ref
https://linear.app/tryghost/issue/ANAL-60/click-through-filtering-for-sources

- In our stats page we use the referrer without a protocol or www (that
is, the pure domain) as the source that we output
- Meanwhile all the data pipelines had the full url as the referrer
passed through
- When we come to add clickthroughs/filtering, we'll need to use this
value to filter the data. If we have a different value locally in the UI
to what is in the DB, we won't be able to make the filters match
- Also, we pay for everything we store, and this removes all the
https:// and www. data
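
The normalisation described above (dropping the protocol and a leading `www.` to leave the pure domain) can be illustrated in Python. This is a sketch of the idea only, not the transformation used in the pipes, and the function name is hypothetical.

```python
from urllib.parse import urlsplit


def referrer_to_source(referrer: str) -> str:
    """Reduce a referrer URL to its bare domain: no protocol, no leading www."""
    # urlsplit only recognises the host when the value contains "//".
    if "//" not in referrer:
        referrer = "//" + referrer
    host = urlsplit(referrer).hostname or ""
    return host[4:] if host.startswith("www.") else host
```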
2024-09-24 14:16:50 +01:00
Hannah Wolfe
26e09dd6cc
Added script for unsafe redeploys to tinybird (#21092)
- Whilst we are in development, we can safely make changes to all
aspects of our pipeline without worrying
- This is because currently, it's safe to delete all data and start over
- This script removes everything except the analytics_events
datasource, and then recreates everything fresh, repopulating from the
datasource where possible
- This shouldn't be used after tinybird is in production; we need a
better change process
2024-09-24 13:53:19 +01:00
Hannah Wolfe
1c8513d94b Updated KPI pipe to fill missing data
closes https://linear.app/tryghost/issue/ANAL-77/na-data-should-be-zero
ref https://www.tinybird.co/blog-posts/tips-9-filling-gaps-in-time-series-on-clickhouse

- Sometimes we have no matching data for a particular date/date range, which makes our charts look super janky
- Clickhouse has a feature to fill these in called WITH FILL, which makes it really easy to fix this!
- WITH FILL works for everything except bounce rate. That seems to be due to the column being marked as nullable, so WITH FILL fills missing data with NULL instead of 0
- To fix that, I've updated the code that generates the bounce rate so that it doesn't generate nulls, and that seems to result in a not-nullable column, which then works with WITH FILL
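
The effect of gap filling can be emulated in Python for illustration. This sketch is not the ClickHouse query from the pipe; it only shows the intended result, where dates with no matching data appear with a value of 0 rather than being absent. The function name and input shape are assumptions.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


def fill_missing_days(
    counts: Dict[date, int], start: date, end: date
) -> List[Tuple[date, int]]:
    """Return one (day, value) pair per day in [start, end], using 0 for gaps."""
    filled = []
    for offset in range((end - start).days + 1):
        day = start + timedelta(days=offset)
        # Days without data get 0, never None, so charts stay continuous.
        filled.append((day, counts.get(day, 0)))
    return filled
```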
2024-09-19 11:32:49 +01:00
Hannah Wolfe
7c465a5fb5 Updated token handling to be in code
- When I went off, I quickly recreated all our endpoints with some new functionality
- However, I forgot that I was manually managing tokens, which meant the UI for stats broke with a token error
- Adding the tokens to the endpoint definitions should prevent this happening again, by automating the management of the token scopes
2024-09-19 11:32:49 +01:00
Hannah Wolfe
4c5704bfa6 Added audience handling to stats
closes https://linear.app/tryghost/issue/ANAL-23/filtering-by-logged-out-logged-in-traffic

- Updated all of our tinybird datasources and pipes to handle member status
- Added member_status as an array query param to the API endpoints
- Added a really dodgy power select multiple to the stats page to demonstrate it works (needs styling)
- Added all of the wiring so each chart updates
- This was done pretty fast, and may not be 100% right yet
2024-09-02 12:56:37 +01:00
Hannah Wolfe
08bf49eaec
Added full suite of tinybird datasources and pipes (#20882)
ref
https://linear.app/tryghost/issue/ANAL-27/setup-tinybird-project-and-cicd
ref
https://github.com/tinybirdco/web-analytics-starter-kit/blob/main/tinybird/pipes/analytics_sessions.pipe

- These datasources and pipes work together to define the main endpoints
we need for our stats dashboard
- They are based on the web analytics starter kit from tinybird
- We've updated them to handle site_uuid
- There's more to do to pipe the member-related and post-related data
through the system yet
2024-08-29 22:03:31 +01:00
Hannah Wolfe
79f4b523ac
Added analytics_events tinybird datasource
ref https://linear.app/tryghost/issue/ANAL-27/setup-tinybird-project-and-cicd
ref https://www.tinybird.co/docs/concepts/data-sources

- This is our main datasource, where we'll store events that come in as people browse around Ghost
- It's defined using tinybird's format, and then deployed out to tinybird using `tb deploy`
2024-08-29 17:20:16 +01:00
Hannah Wolfe
e41984d0a5
Initial tinybird setup
ref https://linear.app/tryghost/issue/ANAL-27/setup-tinybird-project-and-cicd

- Tinybird has a system for managing its configuration as code, with full CI/CD support
- The tinybird CLI tool uses python, so we'll run that using docker, via `yarn tb`
- Some of the files tinybird adds should not be in source control, so we've added those to git ignore
- Everything in /ghost/tinybird is tinybird's init config
2024-08-29 16:56:51 +01:00