Ghost/ghost/tinybird/pipes/analytics_sources.pipe
Hannah Wolfe 4b1ce62ca9
Fix origin attribution in Tinybird analytics hits (#21187)
closes https://linear.app/tryghost/issue/ANAL-96/data-discrepancy-between-charts-when-filtering

- Attribute the referral of the first hit to the whole session in `analytics_sources_mv`, while keeping the raw hits in `analytics_hits`
- Updated tests accordingly
- This is a rebased / reordered version of https://github.com/TryGhost/Ghost/pull/21166

Co-authored-by: alejandromav <hi@alejandromav.com>
Co-authored-by: Alejandro Martin <alejandromav@tinybird.co>
2024-10-02 17:27:43 +01:00

32 lines
1.0 KiB
Plaintext

NODE analytics_sources_1
DESCRIPTION >
    Aggregate by referral and calculate session and views

SQL >
    WITH (SELECT domainWithoutWWW(href) FROM analytics_hits LIMIT 1) AS current_domain,
    sessions AS (
        SELECT
            session_id, argMin(source, timestamp) AS source,
            maxIf(member_status, member_status IN ('paid', 'free', 'undefined')) AS member_status
        FROM analytics_hits
        GROUP BY session_id
    )
    SELECT
        a.site_uuid,
        toDate(a.timestamp) AS date,
        a.device,
        a.browser,
        a.location,
        b.source AS source,
        a.pathname,
        b.member_status AS member_status,
        uniqState(a.session_id) AS visits,
        countState() AS pageviews
    FROM analytics_hits AS a
    INNER JOIN sessions AS b ON a.session_id = b.session_id
    GROUP BY a.site_uuid, toDate(a.timestamp), a.device, a.browser, a.location, b.member_status, b.source, a.pathname
    HAVING b.source != current_domain

TYPE MATERIALIZED
DATASOURCE analytics_sources_mv
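
The session-level attribution performed by the `sessions` CTE can be sketched in plain Python. This is only an illustration of the logic, not the pipe itself. The sample hits, the `example.com` domain, and the helper names `first_source_per_session` and `aggregate` are all hypothetical. The sketch also groups by source and pathname only, and counts directly instead of storing `uniqState`/`countState` aggregate states.

```python
from collections import defaultdict

# Hypothetical sample hits: (session_id, timestamp, source, pathname)
hits = [
    ("s1", 1, "google.com", "/"),
    ("s1", 2, "example.com", "/about/"),   # internal navigation within session s1
    ("s2", 5, "example.com", "/"),
    ("s2", 6, "example.com", "/pricing/"),
]
current_domain = "example.com"  # assumed stand-in for domainWithoutWWW(href)


def first_source_per_session(hits):
    """Mirror argMin(source, timestamp) grouped by session_id."""
    first = {}
    for session_id, ts, source, _ in hits:
        if session_id not in first or ts < first[session_id][0]:
            first[session_id] = (ts, source)
    return {sid: src for sid, (_, src) in first.items()}


def aggregate(hits, current_domain):
    """Join each hit to its session's first source, then count visits and pageviews."""
    session_source = first_source_per_session(hits)
    sessions = defaultdict(set)
    pageviews = defaultdict(int)
    for session_id, _, _, pathname in hits:
        source = session_source[session_id]
        if source == current_domain:
            # Mirrors HAVING b.source != current_domain
            continue
        key = (source, pathname)
        sessions[key].add(session_id)
        pageviews[key] += 1
    return {key: (len(sessions[key]), pageviews[key]) for key in pageviews}


result = aggregate(hits, current_domain)
```

In this sample, the second hit of `s1` has `example.com` as its own source but is still counted under `google.com`, because the whole session inherits the source of its earliest hit. Session `s2` starts on the site's own domain, so it is dropped entirely.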