The previous migration took forever on prod, likely because ClickHouse `Map` lookups are
linear in the number of keys. `transform/3` achieves the same functionality using a hash
table, and the updated WHERE clause allows skipping most rows that don't need updating.
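A minimal sketch of the idea, assuming an `IngestRepo.query!/1` raw-SQL helper; the table,
columns, and value mapping are illustrative, not the actual migration:

```elixir
alias Plausible.IngestRepo

# transform/3 builds its lookup as a hash table once per query (instead of
# a linear Map scan per row), and the WHERE clause restricts the mutation
# to rows that would actually change.
IngestRepo.query!("""
ALTER TABLE events_v2
UPDATE country_code = transform(country_code, ['UK', 'EL'], ['GB', 'GR'])
WHERE country_code IN ('UK', 'EL')
""")
```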
Co-authored-by: Uku Taht <Uku.taht@gmail.com>
* Set log_comment with request information (see the sketch after this list)
* CRMAuthPlug -> SuperAdminOnlyPlug
* Super basic debug view
* Handle clustered setups
* Changelog entry
* Cleanup
* Fragment trick to allow Ecto querying and filtering
* Move clustered_table? function to IngestRepo module
* Format
* More resilient user_id retrieval in helper
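A hedged sketch of how these pieces could fit together; the metadata shape and variable
names are assumptions, using ecto_ch's per-query `settings` option to set the comment:

```elixir
import Ecto.Query
alias Plausible.ClickhouseRepo

# Tag queries issued for a request with a JSON log_comment
# (user_id/request_url/events_query stand in for real request data).
log_comment = Jason.encode!(%{user_id: user_id, url: request_url})
ClickhouseRepo.all(events_query, settings: [log_comment: log_comment])

# Debug view: a fragment reaches into the JSON comment so regular Ecto
# filtering still applies when reading system.query_log back.
debug_query =
  from(l in "query_log",
    prefix: "system",
    where: fragment("JSONExtractUInt(?, 'user_id')", l.log_comment) == ^user_id,
    select: %{query: l.query, duration_ms: l.query_duration_ms}
  )

ClickhouseRepo.all(debug_query)
```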
* Add data migration for creating and syncing location_data table and dictionary
* Migration to populate location data
* Daily cron to refresh location dataset if changed
* Add support for visit:country_name, visit:region_name and visit:city_name dimensions
Under the hood this relies on a `location_data` table in ClickHouse being regularly synced with
the plausible/location repo, plus dictionary lookups used in ALIAS columns.
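Roughly, in a migration (a sketch; the dictionary layout, names, and columns are
illustrative rather than the exact production schema):

```elixir
defmodule Plausible.IngestRepo.Migrations.AddLocationAliases do
  use Ecto.Migration

  def up do
    # A dictionary over location_data that ClickHouse re-reads periodically.
    execute """
    CREATE DICTIONARY IF NOT EXISTS location_data_dict (
      code String,
      name String
    )
    PRIMARY KEY code
    SOURCE(CLICKHOUSE(TABLE 'location_data'))
    LIFETIME(MIN 0 MAX 3600)
    LAYOUT(COMPLEX_KEY_HASHED())
    """

    # ALIAS columns cost no storage and resolve the name at read time, so a
    # refreshed dictionary is picked up without rewriting any rows.
    execute """
    ALTER TABLE sessions_v2
    ADD COLUMN IF NOT EXISTS country_name String
    ALIAS dictGet('location_data_dict', 'name', tuple(country_code))
    """
  end
end
```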
* Update queue name
* Update documentation
* Explicit structs
* Improve docs further
* Migration comment
* Add queues (see the Oban sketch after this list)
* Add error when already loaded
* Test for filtering by new dimensions
* Update deps
* dimension -> select_dimension
* Update a test
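Presumably the daily refresh runs through Oban; a minimal sketch of such a setup
(queue and worker names are assumptions, not the actual ones):

```elixir
# config/config.exs
import Config

config :plausible, Oban,
  queues: [locations: 1],
  plugins: [
    # Refresh the location dataset once a day; the worker should no-op
    # when the upstream dataset is unchanged.
    {Oban.Plugins.Cron, crontab: [{"0 2 * * *", Plausible.Workers.LocationsUpdate}]}
  ]
```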
* CH Migration: exit/entry hostnames in sessions_v2
* Leave only exit_page_hostname; we already record hostnames
* Use ClickHouse DDL in favour of Ecto so that the cluster is included
* Compress with ZSTD(3)
This migration will no-op in staging/production, as it has already been run there. It also
leaves behind a backup table that initially takes no extra space but will need to be cleaned
up manually.
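A hedged sketch of such a migration step (the `clustered?` check, cluster macro, and
column details are assumptions):

```elixir
# Raw ClickHouse DDL so the ON CLUSTER clause can be included; Ecto's
# migration DSL has no way to express it. ZSTD(3) compresses the highly
# repetitive hostname values well.
on_cluster = if clustered?, do: "ON CLUSTER '{cluster}'", else: ""

execute """
ALTER TABLE sessions_v2 #{on_cluster}
ADD COLUMN IF NOT EXISTS exit_page_hostname String CODEC(ZSTD(3))
"""
```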
* Get rid of PASS_V2_SCHEMA_MIGRATION
* Use in-memory domain lookup + regular table settings
* Remove faulty date arithmetic + prev part calculation
* Set V2_MIGRATION_DONE when Mix.env() == :dev
* Mute credo
This commit moves a migration that was created in `clickhouse_repo/` to
its correct folder, `ingest_repo/`. The migration was created in
1cb07efe6d prior to the read/write
separation.
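For context, each repo reads migrations from its own priv directory, so a migration
sitting under `clickhouse_repo/` never runs for IngestRepo; roughly (exact config assumed):

```elixir
# config/config.exs
config :plausible, Plausible.ClickhouseRepo, priv: "priv/clickhouse_repo"
config :plausible, Plausible.IngestRepo, priv: "priv/ingest_repo"
```

Migrations are then run per repo, e.g. `mix ecto.migrate -r Plausible.IngestRepo`.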
* Configure ingest repo access/pool size
If I'm not mistaken, 3 is a sane default; the only
inserts we're doing are:
- session buffer dump
- events buffer dump
- GA import dump
And all are serializable within their scopes?
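A sketch of the corresponding config; the env var name is an assumption:

```elixir
# Small pool: the only concurrent writers are the three dumps listed
# above, each serialized within its own scope.
config :plausible, Plausible.IngestRepo,
  pool_size: String.to_integer(System.get_env("CLICKHOUSE_INGEST_POOL_SIZE", "3"))
```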
* Add IngestRepo
* Start IngestRepo
* Use IngestRepo for inserts
* Annotate ClickhouseRepo as read_only
So that no insert* functions are generated
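This is Ecto's own `:read_only` repo option; a sketch (the adapter module is assumed):

```elixir
defmodule Plausible.ClickhouseRepo do
  # With read_only: true, `use Ecto.Repo` does not define insert*/update*/
  # delete* functions, so writes can only go through IngestRepo.
  use Ecto.Repo,
    otp_app: :plausible,
    adapter: Ecto.Adapters.ClickHouse,
    read_only: true
end
```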
* Update moduledoc
* Rename alias
* Fix default env var value so it can be cast
* Use IngestRepo for migrations
* Bump default ingest pool size from 3 to 5
in case conns are restarting or else...
* Ensure all Repo prometheus metrics are collected
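A hedged sketch of what that could look like with PromEx (assuming a PromEx-based
setup; the actual wiring may differ):

```elixir
defmodule Plausible.PromEx do
  use PromEx, otp_app: :plausible

  @impl true
  def plugins do
    [
      # Listing every repo ensures IngestRepo's query telemetry is
      # exported alongside the existing repos'.
      {PromEx.Plugins.Ecto,
       otp_app: :plausible,
       repos: [Plausible.Repo, Plausible.ClickhouseRepo, Plausible.IngestRepo]}
    ]
  end
end
```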