analytics/lib/mix/tasks
Karl-Aksel Puulmann d620432227
Channels: Speed up clickhouse calculations (#4789)
* Fix interpolation in data_migration.ex

* Speed up calculating acquisition_channel in clickhouse

The previous `has` queries proved problematic, causing a lot of
CPU overhead.

Benchmarked via this query:

```sql
SELECT
  channel,
  count(),
  countIf(acquisition_channel(referrer_source, utm_medium, utm_campaign, utm_source, click_id_param) = channel) AS matches
FROM events_v2
WHERE timestamp > now() - toIntervalHour(48)
GROUP BY channel
ORDER BY count() DESC
```

Before this fix:
```
query_duration_ms:                                                57960
DiskReadElapsedMs:                                                374.712
RealTimeMs:                                                       2891200.667
UserTimeMs:                                                       2704024.783
SystemTimeMs:                                                     1693.265
OSCPUWaitMs:                                                      90.253
OSCPUVirtualTimeMs:                                               2705709.58
```

After this fix:
```
query_duration_ms:                                                4367
DiskReadElapsedMs:                                                454.356
RealTimeMs:                                                       213892.207
UserTimeMs:                                                       199363.485
SystemTimeMs:                                                     1479.364
OSCPUWaitMs:                                                      13.739
OSCPUVirtualTimeMs:                                               200837.37
```
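As a rough reading of the two profiles, the per-metric ratios can be computed directly from the figures above. This is only an illustrative sketch: the `before`/`after` dictionaries and the `speedup` helper are hypothetical names introduced here, and the numbers are copied verbatim from the two blocks.

```python
# Figures copied from the "Before this fix" and "After this fix" blocks.
before = {
    "query_duration_ms": 57960,
    "DiskReadElapsedMs": 374.712,
    "RealTimeMs": 2891200.667,
    "UserTimeMs": 2704024.783,
    "SystemTimeMs": 1693.265,
    "OSCPUWaitMs": 90.253,
    "OSCPUVirtualTimeMs": 2705709.58,
}
after = {
    "query_duration_ms": 4367,
    "DiskReadElapsedMs": 454.356,
    "RealTimeMs": 213892.207,
    "UserTimeMs": 199363.485,
    "SystemTimeMs": 1479.364,
    "OSCPUWaitMs": 13.739,
    "OSCPUVirtualTimeMs": 200837.37,
}


def speedup(before, after):
    """Return before/after ratio per metric (hypothetical helper).

    A ratio above 1 means the metric dropped after the fix.
    """
    return {name: before[name] / after[name] for name in before if name in after}


ratios = speedup(before, after)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

On these numbers the query duration and the CPU time counters both drop by roughly 13x, while the disk read time stays in the same range, which is consistent with the overhead being CPU-bound.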

Note that the new tables are not tracked in our schema as usual, since
they are effectively temporary tables used to create the dictionary without
needing to upload files to the ClickHouse servers.

* CREATE OR REPLACE table with SELECT
2024-11-11 10:39:51 +00:00
cancel_subscription.ex Refactor enterprise plan upgrade and change-plan actions (#3397) 2023-10-10 20:35:17 +03:00
clean_clickhouse.ex Channels: Speed up clickhouse calculations (#4789) 2024-11-11 10:39:51 +00:00
create_free_subscription.ex Implement basics of Teams (#4658) 2024-10-21 07:35:23 +00:00
download_country_database.ex Remove Timex.today (#4357) 2024-07-23 09:02:14 +02:00
generate_referrer_favicons.ex Upgrade Erlang/Elixir stack (#3454) 2023-10-24 10:33:48 +02:00
pull_sandbox_subscription.ex Implement basics of Teams (#4658) 2024-10-21 07:35:23 +00:00
send_pageview.ex Map lowercase tagged sources to capitalized form during ingestion (#4417) 2024-08-27 14:03:15 +03:00