Mirror of https://github.com/plausible/analytics.git, synced 2024-11-22 10:43:38 +03:00.
d620432227
* Fix interpolation in data_migration.ex
* Speed up calculating acquisition_channel in clickhouse

The previous `has` queries proved to be problematic, causing a lot of CPU overhead. Benchmarked via this query:

```sql
SELECT
    channel,
    count(),
    countIf(acquisition_channel(referrer_source, utm_medium, utm_campaign, utm_source, click_id_param) = channel) AS matches
FROM events_v2
WHERE timestamp > now() - toIntervalHour(48)
GROUP BY channel
ORDER BY count() desc
```

Before this fix:

```
query_duration_ms: 57960
DiskReadElapsedMs: 374.712
RealTimeMs: 2891200.667
UserTimeMs: 2704024.783
SystemTimeMs: 1693.265
OSCPUWaitMs: 90.253
OSCPUVirtualTimeMs: 2705709.58
```

After this fix:

```
query_duration_ms: 4367
DiskReadElapsedMs: 454.356
RealTimeMs: 213892.207
UserTimeMs: 199363.485
SystemTimeMs: 1479.364
OSCPUWaitMs: 13.739
OSCPUVirtualTimeMs: 200837.37
```

Note that the new tables are not tracked in our schema as usual, as they are essentially temporary tables used to create the dictionary without needing to upload files to the ClickHouse servers.

* CREATE OR REPLACE table with SELECT
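The commit message describes populating helper tables with `CREATE OR REPLACE ... AS SELECT` and sourcing a ClickHouse dictionary from them, so no files need to be uploaded to the servers. A minimal sketch of that pattern follows; the table name, dictionary name, columns, and sample row are all illustrative assumptions, not taken from the repository:

```sql
-- Hypothetical lookup table, (re)filled in place with AS SELECT instead of
-- uploading a file to each ClickHouse server. All names here are illustrative.
CREATE OR REPLACE TABLE acquisition_channel_lookup
ENGINE = MergeTree()
ORDER BY referrer_source
AS SELECT
    'Google' AS referrer_source,
    'Organic Search' AS channel;

-- A dictionary sourced from that table; lookups via dictGet avoid scanning
-- the table (or running heavy has()-style checks) per event row.
CREATE DICTIONARY IF NOT EXISTS acquisition_channel_lookup_dict
(
    referrer_source String,
    channel String
)
PRIMARY KEY referrer_source
SOURCE(CLICKHOUSE(TABLE 'acquisition_channel_lookup'))
LIFETIME(0)
LAYOUT(COMPLEX_KEY_HASHED());

-- Example lookup with a default for unknown sources:
SELECT dictGetOrDefault('acquisition_channel_lookup_dict', 'channel',
                        tuple('Google'), 'Direct');
```

`COMPLEX_KEY_HASHED` is used because the key is a `String` rather than a `UInt64`; swapping the `has` checks for in-memory dictionary lookups is consistent with the roughly 13x drop in query time shown in the benchmark above.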
Directory contents:

- data_migrations/
- ingest_repo/
- json-schemas/
- ref_inspector/
- repo/
- static/
- tracker/js/
- ua_inspector/
- verification/
- custom_sources.json
- ga4-source-categories.csv
- legacy_plans.json
- paddle_sandbox.pem
- paddle.pem
- placeholder_favicon.ico
- plans_v1.json
- plans_v2.json
- plans_v3.json
- plans_v4.json
- referer_favicon_domains.json
- sandbox_plans.json