Ghost/ghost/tinybird/pipes/analytics_hits.pipe
Hannah Wolfe 5ebdbe4e25
Renamed referrer without www to source in tinybird (#21094)
ref
https://linear.app/tryghost/issue/ANAL-60/click-through-filtering-for-sources

- In our stats page we use the referrer without a protocol or www, that
is the pure domain as our source that we output
- Meanwhile all the data pipelines had the full url as the referrer
passed through
- When we come to add clickthroughs/filtering, we'll need to use this
value to filter the data. If we have a different value locally in the UI
to what is in the DB, we won't be able to make the filters match
- Also, we pay for everything we store, and this removes all the
https:// and www. data
2024-09-24 14:16:50 +01:00

68 lines
2.0 KiB
Plaintext

DESCRIPTION >
Parsed `page_hit` events, implementing `browser` and `device` detection logic.
TOKEN "dashboard" READ
TOKEN "stats page" READ
NODE parsed_hits
DESCRIPTION >
Parse raw page_hit events
SQL >
SELECT
timestamp,
action,
version,
coalesce(session_id, '0') as session_id,
JSONExtractString(payload, 'locale') as locale,
JSONExtractString(payload, 'location') as location,
JSONExtractString(payload, 'referrer') as referrer,
JSONExtractString(payload, 'pathname') as pathname,
JSONExtractString(payload, 'href') as href,
JSONExtractString(payload, 'site_uuid') as site_uuid,
JSONExtractString(payload, 'member_uuid') as member_uuid,
JSONExtractString(payload, 'member_status') as member_status,
JSONExtractString(payload, 'post_uuid') as post_uuid,
lower(JSONExtractString(payload, 'user-agent')) as user_agent
FROM analytics_events
where action = 'page_hit'
NODE endpoint
SQL >
SELECT
site_uuid,
timestamp,
action,
version,
session_id,
member_uuid,
member_status,
post_uuid,
location,
domainWithoutWWW(referrer) as source,
pathname,
href,
case
when match(user_agent, 'wget|ahrefsbot|curl|urllib|bitdiscovery|\+https://|googlebot')
then 'bot'
when match(user_agent, 'android')
then 'mobile-android'
when match(user_agent, 'ipad|iphone|ipod')
then 'mobile-ios'
else 'desktop'
END as device,
case
when match(user_agent, 'firefox')
then 'firefox'
when match(user_agent, 'chrome|crios')
then 'chrome'
when match(user_agent, 'opera')
then 'opera'
when match(user_agent, 'msie|trident')
then 'ie'
when match(user_agent, 'iphone|ipad|safari')
then 'safari'
else 'Unknown'
END as browser
FROM parsed_hits