ON CLUSTER fails since it tries to create conflicting
dll entries on each node.
Error:
```Cannot execute replicated DDL query, maximum retries exceeded.
(UNFINISHED)```
* Channels: Migration to add column, backfill code
This change adds `acqusition_channel` columns to events_v2 and
sessions_v2 tables. These columns are materialized - we don't ingest
into them directly. Instead they're calculated based on other columns.
The data migration changes now allow to also backfill the column.
Tested the ability to change definitions by changing the function
definitions and re-running the migration with backfill. Confirmed that
the underlying data changed as expected.
* quiet option
* Exclude data migrations from validation
* Migration consistency
* Channels: Fix cluster behavior
CREATE TABLE AS SELECT syntax did not work on cluster.
Instead, let's do a normal insert. For safety and to avoid timing
issues, ensure that INSERT waits for data to be inserted on all active
replicas.
* Proper replicated tables
* Fix interpolation in data_migration.ex
* Speed up calculating acquisition_channel in clickhouse
The previous `has` queries proved to be problematic and causing a lot of
CPU overhead.
Benchmarked via this query:
```sql
SELECT
channel,
count(),
countIf(acquisition_channel(referrer_source, utm_medium, utm_campaign, utm_source, click_id_param) = channel) AS matches
FROM events_v2
WHERE timestamp > now() - toIntervalHour(48)
GROUP BY channel
ORDER BY count() desc
```
Before this fix:
```
query_duration_ms: 57960
DiskReadElapsedMs: 374.712
RealTimeMs: 2891200.667
UserTimeMs: 2704024.783
SystemTimeMs: 1693.265
OSCPUWaitMs: 90.253
OSCPUVirtualTimeMs: 2705709.58
```
After this fix:
```
query_duration_ms: 4367
DiskReadElapsedMs: 454.356
RealTimeMs: 213892.207
UserTimeMs: 199363.485
SystemTimeMs: 1479.364
OSCPUWaitMs: 13.739
OSCPUVirtualTimeMs: 200837.37
```
Note that the new tables are not tracked in our schema as usual as
they're pretty much temporary tables to create the dictionary without
needing to upload files to clickhouse servers.
* CREATE OR REPLACE table with SELECT
* Expose a few data migration functions, add quiet option to do_run
* Create functions and test acquisition channel logic in clickhouse
Tests were lifted from test/plausible_web/controllers/api/external_controller_test.exs
* Clean up test code a bit
* Property test for acquisition channels
* Handle empty strings properly in reference implementation
* Fix spelling, minor issues
* Revert "Property test for acquisition channels"
This reverts commit 3fa0e0e4eb.
* Only test clickhouse functions
* Solve minor code issue
* update channels logic
* Revert "Only test clickhouse functions"
This reverts commit e12784031a.
* Add more tests
* Add small result assertion
* Make query options explicit in data migrations
* Move multi-query running logic to within datamigration lib
* Unbreak numeric ids migration
* Named params directly to Clickhouse
* Update reference test implementation
---------
Co-authored-by: Uku Taht <uku.taht@gmail.com>
* Add migration adding team related tables and fields
* Add `team_site_transfers` table to the teams migration
* Remove team_id FK from api_keys table
* Change new FK constraints on existing tables to `nilify_all` on delete
* Ensure unique indexes on invitation_id and transfer_id fields
* WIP: Start refactoring revenue metrics
* Hacks to make things work
* Remove old revenue code, remove revenue metrics if needed
* Update query_optimizer docs
* Minor fixes
* Add tests around average/total revenue when non-revenue goal filtering going on
* Optimize, calculate filters as expected (OR-ing clauses)
* Revenue: Handle cases where revenue metrics should not be returned or nil
* Expose revenue metrics in internal schema, add tests
* Docstring
* Remove TODO
* Typegen
* Solve warnings
* Remove nesting
* ce_test fix
* Tag tests as ee_only
* Fix: When filtering by revenue goal and no conversions, return 0.0 instead of nil
* More straight-forward preloading logic
* Refactor comparisons to a new options format
Prerequisite for APIv2 comparison work
* Experiment with default include deduplication
* WIP
Oops, breaks `include.total_rows`
* WIP
* Refactor breakdown.ex
* Pagination fix: dont paginate split subqueries
* Timeseries tests pass
* Aggregate tests use QueryExecutor
* Simplify QueryExecutor
* Handle legacy time-on-page metric in query_executor.ex
No behavioral changes
* Remove keep_requested_metrics
* Clean up imports
* Refactor aggregate.ex to be more straight-forward in output format building
* top stats: compute comparison via apiv2
* Minor cleanups
* WIP: Pipelines
* WIP: refactor for code cleanliness
* QueryExecutor to QueryRunner
* Make compilable
* Comparisons for timeseries works
Except for comparisons where comparison window is bigger than source query window
* Add special case for timeseries
* JSON schema tests for comparisons
* Test comparisons with the new API
* comparison date range parsing improvement
* Make comparisons api internal-only
* typegen
* credo
* Different schemata
* get_comparison_query
* Add comment on timeseries result format
* comparisons typegen
* Percent change for revenue metrics fix
* Use defstruct for query_runner over map
* Remove preloading atoms
Offset-based pagination is used to make sure Looker integration
is able to work as efficiently as possible. To know how many
requests users need to do `include.total_rows` option was added.
* query.date_range is now in UTC instead of user timezone
This simplifies things down the line and fixes several bugs where
query.date_range is cast to naivedatetime for ecto purposes
Many places still remain broken:
- comparison queries
- `to_date_range` calls
* Make default_for_date_range not care about time zones
* Make timezone parameter mandatory for to_date_range
* Simplify utc_date_range, update legacy query builder
* Fix more cases where query date range is needed
* query.date_range -> query.utc_time_range
* Query.date_range/1 function
* ensure_include_imported update
* Clean up send_email_report
* Rename matches/does_not_match filters internally
These have never been exposed to the frontend/user directly, only via
APIv1 filtering syntax. As such we are free to rename these without
breaking things
* Rename function arguments for consistency, simplify
* Add support for `match`/`not_match` operators for query apiv2
These match the string against a regular expression, as defined in
https://github.com/google/re2/wiki/Syntax
* not_match -> match_not
* does_not_contain -> contains_not
Note that for backwards compatibility:
- Browser handles does_not_contain in URL
- Backend will handle does_not_contain in queries for a day where we will remove it for better autocompletion
* not_matches_wildcard -> matches_wildcard_not
* prettier
* match -> matches
* Fix and test fix for matches_wildcard against prop when prop is missing
* Custom properties support for matches/matches_not
* Restore contains_not
* Test contains and contains_not behavior for custom properties
Previous migration took forever on prod, likely because Map lookups are linear time in complexity.
`transform/3` helps achieve the same functionality with the help of a hash table and updated
WHERE clause allows skipping most rows which dont need updating
Co-authored-by: Uku Taht <Uku.taht@gmail.com>
* add realtime date_ranges into the private API schema
This commit starts parsing date ranges into a new NaiveDateTimeRange
struct, rather than a simple Date.Range.
* transform realtime labels into negative integers + test
* move schema type argument to last position in helper functions
* allow passing a date param + tests
* Update test/plausible/stats/query_parser_test.exs
Co-authored-by: Karl-Aksel Puulmann <macobo@users.noreply.github.com>
* Update test/plausible/stats/query_parser_test.exs
Co-authored-by: Karl-Aksel Puulmann <macobo@users.noreply.github.com>
* Update test/plausible/stats/query_parser_test.exs
Co-authored-by: Karl-Aksel Puulmann <macobo@users.noreply.github.com>
* Update test/plausible/stats/query_parser_test.exs
Co-authored-by: Karl-Aksel Puulmann <macobo@users.noreply.github.com>
* keep test file structure consistent
* Turn NaiveDateTimeRange into DateTimeRange
* change 'now' field from NaiveDateTime to DateTime in v2 query
* fix minute interval labels + add missing tests
* return query_result.date_range as iso8601 timestamps with timezone
* allow timestamps with tz as date_range arguments in API v2
* delete Plausible.Timezones.to_utc_datetime
* simplify returning comparison periods
* add comment about realtime not supported in comparisons
* pass only now instead of test_opts
* drop redundant else branch
* separate tests
* stick to a single check_date_range function in tests
* fix credo error
---------
Co-authored-by: Karl-Aksel Puulmann <macobo@users.noreply.github.com>
* Restore `date` internal parameter, validate via json schema
* Improved error formatting from json schema, get most tests passing
* Handle internal overrides to JSON schema
* Parsing tests all pass
* Remove some repeated code, enforce length/uniqueness in schema
* Explicit separation between internal and public API validation
* Mark file as external_resource
* map_join
* Update query tests
* Update query tests
* Serve schema under an /api/docs/query/schema.json endpoint
* dotify errors
* Order QueryResult in API response
This improves experience in docs when querying interactively
* More utm in seeds
* More improved seeds
* Proper QueryResult.query structure
* Allow docs to query /api/v2/query and sites
The new endpoints use cookie authentication. The docs site uses
these endpoints to provide an interactive docs editor.
* query_result ordering test
* Refresh router
* Test module name
* Set log_comment with request information
* CRMAuthPlug -> SuperAdminOnlyPlug
* Super basic debug view
* Handle clustered setups
* Changelog entry
* Cleanup
* fragment trick to use ecto querying, filtering
* Move clustered_table? function to IngestRepo module
* Format
* More resilient user_id getting in helper
* Add data migration for creating and syncing location_data table and dictionary
* Migration to populate location data
* Daily cron to refresh location dataset if changed
* Add support for visit:country_name, visit:region_name and visit:city_name dimensions
Under the hood this relies on a `location_data` table in clickhouse being regularly synced with
plausible/location repo and dictionary lookups used in ALIAS columns
* Update queue name
* Update documentation
* Explicit structs
* Improve docs further
* Migration comment
* Add queues
* Add error when already loaded
* Test for filtering by new dimensions
* Update deps
* dimension -> select_dimension
* Update a test
* Update Goal schema
* Equip ComboBox with the ability of JS selection callbacks
* Update factory so display_name is always present
* Extend Goals context interface
* Update seeds
Also farming unsuspecting BEAM programmers for better
sample page paths :)
* Update ComboBox test
* Unify error message color class with helpers seen elsewhere
* Use goal.display_name where applicable
* Implement LiveView extensions for editing goals
* Sprinkle display name in external stats controller tests
* Format
* Fix goal list mobile view
* Update lib/plausible_web/live/goal_settings/list.ex
Co-authored-by: Artur Pata <artur.pata@gmail.com>
* Update lib/plausible_web/live/goal_settings/form.ex
Co-authored-by: Artur Pata <artur.pata@gmail.com>
* Update the APIs: plugins and external
* Update test so the intent is clearer
* Format
* Update CHANGELOG
* Simplify form tabs tests
* Revert "Format"
This reverts commit c1647b5307.
* Fixup format commit that went too far
* ComboBox: select the input contents on first focus
* Update lib/plausible/goal/schema.ex
Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>
* Update lib/plausible/goals/goals.ex
Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>
* Update lib/plausible_web/live/goal_settings/form.ex
Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>
* Pass form goal instead of just ID
* Make tab component dumber
* Extract separate render functions for edit and create forms
* Update test to account for extracted forms
* Inline goal get query
* Extract revenue goal settings to a component and avoid computing assigns in flight
* Make LV modal preload optional
* Disable preload for goal settings form modal
* Get rid of phash component ID hack
* For another render after render_submit when testing goal updates
* Fix LV preload option
* Enable preload back for goals modal for now
* Make formatter happy
* Implement support for preopening of LV modal
* Preopen goals modal to avoid feedback gap on loading edited goal
* Remove `console.log` call from modal JS
* Clean up display name input IDs
* Make revenue settings functional on first edit again
* Display names: 2nd stage migration
* Update migration with data backfill
---------
Co-authored-by: Artur Pata <artur.pata@gmail.com>
Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>