331 Commits

Author SHA1 Message Date
RobertJoonas
6822b29016
Average Scroll Depth Metric: put scroll depth on the dashboard under a feature flag (#4832)
* migration: add scroll_depth to events_v2

* (cherry-pick) ingest scroll depth

* replace convoluted test with more concise ones

* QueryParser: parse internal scroll_depth metric + validation

* turn QueryComparisonsTest into QueryInternalTest

* rename file

* (cherry pick) query scroll depth 15b14d3

...and move the tests into `internal_query_test.exs`

* review feedback

* Get rid of unnecessary separation between aggregate and group scroll depth
* Drop irrelevant other metrics in tests

* add test ensuring scroll depth unavailable in Stats API v1

* Put scroll depth on the dashboard

* Top Stats
* Main Graph
* Top Pages > Details

* feature flag for dashboard scroll depth access

* ignore credo warning

* enable scroll_depth flag in tests

* remove duplication

* write timestamps explicitly in a test

* revert moving tests around

* Add query_comparisons_test back
* Move scroll_depth tests into query_test
* Delete query_internal_test

* rename setup util (got updated on master)

* use pageleave_factory where applicable

* Use the correct generated query-api.d.ts

* npm format
2024-11-20 13:13:04 +00:00
RobertJoonas
e93c97de1e
migration: add scroll_depth to events_v2 (#4827) 2024-11-19 09:59:23 +00:00
Karl-Aksel Puulmann
9af498833e
Channels: backfill utm_medium based on click_param_id (#4833)
* Backfill utm_medium

Follow-up to https://github.com/plausible/analytics/pull/4817

* Update backfill
2024-11-19 08:12:39 +00:00
Uku Taht
0bbdbc9f42
Imported channel migration (#4815) 2024-11-14 17:12:18 +00:00
Uku Taht
daa42cbc9d
Update acquisition channel UDF to prioritize display over paid search (#4818)
* Update acquisition channel UDF to prioritize display over paid search

* Remove migration

Will run this manually together with a backfill, self-hosted will get this for free.

* Add test

---------

Co-authored-by: Karl-Aksel Puulmann <oxymaccy@gmail.com>
2024-11-14 16:01:34 +00:00
Uku Taht
cf4ba664ed
Tiny source data update (#4821)
* Merge teams.microsoft.com -> Microsoft Teams

* Display favicon for Linkedin
2024-11-14 13:28:45 +00:00
hq1
86b3bf4f24
Set guest_invitations.invitation_id not null (#4812)
Once https://github.com/plausible/analytics/pull/4811/files
is migrated.
2024-11-13 12:48:55 +00:00
hq1
7cf61c9590
Add invitation_id column to guest_invitations schema (#4811)
Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>
2024-11-12 14:27:13 +00:00
Adrian Gruntkowski
9004a02f30
Set NOT NULL on teams.allow_next_upgrade_override (#4807) 2024-11-12 10:04:30 +00:00
Adrian Gruntkowski
e31aeff721
Set default for teams.allow_next_upgrade_override schema column (#4799) 2024-11-12 09:05:16 +00:00
Karl-Aksel Puulmann
fc83040ec1
Channels: Run TRUNCATE with alter_sync=2 (#4804)
ON CLUSTER fails since it tries to create conflicting
DDL entries on each node.

Error:
```
Cannot execute replicated DDL query, maximum retries exceeded.
(UNFINISHED)
```
2024-11-12 07:24:23 +00:00
Karl-Aksel Puulmann
4aa7dec301
Channels: Migration to add materialized column, backfill code (#4798)
* Channels: Migration to add column, backfill code

This change adds `acquisition_channel` columns to events_v2 and
sessions_v2 tables. These columns are materialized - we don't ingest
into them directly. Instead they're calculated based on other columns.

The data migration changes now also allow backfilling the column.

Tested the ability to change definitions by changing the function
definitions and re-running the migration with backfill. Confirmed that
the underlying data changed as expected.

* quiet option

* Exclude data migrations from validation

* Migration consistency
2024-11-12 06:41:34 +00:00
Karl-Aksel Puulmann
3759db9b8c
Channels: Fix ON CLUSTER behavior (#4801)
* Channels: Fix cluster behavior

CREATE TABLE AS SELECT syntax did not work on cluster.

Instead, let's do a normal insert. For safety and to avoid timing
issues, ensure that INSERT waits for data to be inserted on all active
replicas.

* Proper replicated tables
2024-11-11 19:59:16 +00:00
Artur Pata
b22b35793c
Saved segments/create table (#4797)
* Add migration for Saved Segments

* Remove premature optimisation

* Format

* Refactor to explicit segment type
2024-11-11 16:31:43 +00:00
Karl-Aksel Puulmann
d620432227
Channels: Speed up clickhouse calculations (#4789)
* Fix interpolation in data_migration.ex

* Speed up calculating acquisition_channel in clickhouse

The previous `has` queries proved problematic, causing a lot of
CPU overhead.

Benchmarked via this query:

```sql
SELECT
  channel,
  count(),
  countIf(acquisition_channel(referrer_source, utm_medium, utm_campaign, utm_source, click_id_param) = channel) AS matches
FROM events_v2
WHERE timestamp > now() - toIntervalHour(48)
GROUP BY channel
ORDER BY count() desc
```

Before this fix:
```
query_duration_ms:                                                57960
DiskReadElapsedMs:                                                374.712
RealTimeMs:                                                       2891200.667
UserTimeMs:                                                       2704024.783
SystemTimeMs:                                                     1693.265
OSCPUWaitMs:                                                      90.253
OSCPUVirtualTimeMs:                                               2705709.58
```

After this fix:
```
query_duration_ms:                                                4367
DiskReadElapsedMs:                                                454.356
RealTimeMs:                                                       213892.207
UserTimeMs:                                                       199363.485
SystemTimeMs:                                                     1479.364
OSCPUWaitMs:                                                      13.739
OSCPUVirtualTimeMs:                                               200837.37
```
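
As a rough reading of the two profiles above, dividing each "before" figure by its "after" counterpart gives the speedup. The sketch below is illustrative arithmetic only: the numbers are copied from the output above, and the dictionary and function names are made up for this example.

```python
# Figures copied from the "before" and "after" profiles above (milliseconds).
before = {"query_duration_ms": 57960, "UserTimeMs": 2704024.783}
after = {"query_duration_ms": 4367, "UserTimeMs": 199363.485}


def speedup(metric: str) -> float:
    """Return how many times smaller the metric is after the fix."""
    return before[metric] / after[metric]


for metric in before:
    print(f"{metric}: {speedup(metric):.1f}x")
```

Both ratios come out at roughly 13x, so wall-clock time and CPU time shrank by about the same factor.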

Note that the new tables are not tracked in our schema as usual, since
they're effectively temporary tables used to create the dictionary without
needing to upload files to ClickHouse servers.

* CREATE OR REPLACE table with SELECT
2024-11-11 10:39:51 +00:00
Karl-Aksel Puulmann
dbf7a099a3
Acquisition channels: Functions to calculate channels in clickhouse (#4701)
* Expose a few data migration functions, add quiet option to do_run

* Create functions and test acquisition channel logic in clickhouse

Tests were lifted from test/plausible_web/controllers/api/external_controller_test.exs

* Clean up test code a bit

* Property test for acquisition channels

* Handle empty strings properly in reference implementation

* Fix spelling, minor issues

* Revert "Property test for acquisition channels"

This reverts commit 3fa0e0e4eb.

* Only test clickhouse functions

* Solve minor code issue

* update channels logic

* Revert "Only test clickhouse functions"

This reverts commit e12784031a.

* Add more tests

* Add small result assertion

* Make query options explicit in data migrations

* Move multi-query running logic to within datamigration lib

* Unbreak numeric ids migration

* Named params directly to Clickhouse

* Update reference test implementation

---------

Co-authored-by: Uku Taht <uku.taht@gmail.com>
2024-11-06 11:27:02 +00:00
Karl-Aksel Puulmann
4e10efe723
Channels: click_id_param column (#4703)
* Add migration for click_id_source

* click_id_param
2024-11-05 07:03:00 +00:00
Uku Taht
9d06d45e45
New remap sources migration that's case insensitive (#4771) 2024-11-04 09:47:01 +00:00
Uku Taht
a1b1b84963
Remap sources migration (#4751) 2024-10-31 08:07:50 +00:00
Uku Taht
c3a06caa97
Channel and source data updates (#4599)
* Channel and source data updates

* Update source mappings for migration

* Fix codespell

Co-authored-by: Karl-Aksel Puulmann <macobo@users.noreply.github.com>

* Update lib/plausible/ingestion/acquisition.ex

Co-authored-by: Karl-Aksel Puulmann <macobo@users.noreply.github.com>

* Standardize access to utm params

* Add wikipedia as "known" source

* Move custom sources to json file

* Add some advertising utm_sources

* Move source mapping logic to refinspector file

* Rename PlausibleWeb.RefInspector -> Plausible.Ingestion.Source

* Move mapping overrides to custom_sources.json

* More robust detection of paid sources

* Add missing utm_sources to migration

* Codespell

* Add moduledoc for Plausible.Ingestion.Source

* Fix dialyzer

* Remove migration

* Add more custom favicons

* Re-generate referrer favicons file

* Add doctest for sources

---------

Co-authored-by: Karl-Aksel Puulmann <macobo@users.noreply.github.com>
2024-10-30 13:41:51 +00:00
Adrian Gruntkowski
1e38bd8771
Add fields and tables for teams (#4696)
* Add migration adding team related tables and fields

* Add `team_site_transfers` table to the teams migration

* Remove team_id FK from api_keys table

* Change new FK constraints on existing tables to `nilify_all` on delete

* Ensure unique indexes on invitation_id and transfer_id fields
2024-10-17 11:28:56 +00:00
hq1
67e35fa1d2
Migration: cascade delete enterprise plans on user removal (#4684) 2024-10-16 08:43:16 +00:00
Karl-Aksel Puulmann
141eea88ff
APIv2: Revenue metrics (#4659)
* WIP: Start refactoring revenue metrics

* Hacks to make things work

* Remove old revenue code, remove revenue metrics if needed

* Update query_optimizer docs

* Minor fixes

* Add tests around average/total revenue when filtering by a non-revenue goal

* Optimize, calculate filters as expected (OR-ing clauses)

* Revenue: Handle cases where revenue metrics should not be returned or nil

* Expose revenue metrics in internal schema, add tests

* Docstring

* Remove TODO

* Typegen

* Solve warnings

* Remove nesting

* ce_test fix

* Tag tests as ee_only

* Fix: When filtering by revenue goal and no conversions, return 0.0 instead of nil

* More straight-forward preloading logic
2024-10-09 10:18:48 +00:00
ruslandoga
5fec52ab36
Release v2.1.4 (#4660) 2024-10-09 07:45:17 +00:00
Karl-Aksel Puulmann
5ad743c8d3
APIv2: Comparisons for breakdowns, timeseries, time_on_page (#4647)
* Refactor comparisons to a new options format

Prerequisite for APIv2 comparison work

* Experiment with default include deduplication

* WIP

Oops, breaks `include.total_rows`

* WIP

* Refactor breakdown.ex

* Pagination fix: don't paginate split subqueries

* Timeseries tests pass

* Aggregate tests use QueryExecutor

* Simplify QueryExecutor

* Handle legacy time-on-page metric in query_executor.ex

No behavioral changes

* Remove keep_requested_metrics

* Clean up imports

* Refactor aggregate.ex to be more straight-forward in output format building

* top stats: compute comparison via apiv2

* Minor cleanups

* WIP: Pipelines

* WIP: refactor for code cleanliness

* QueryExecutor to QueryRunner

* Make compilable

* Comparisons for timeseries works

Except for comparisons where the comparison window is bigger than the source query window

* Add special case for timeseries

* JSON schema tests for comparisons

* Test comparisons with the new API

* comparison date range parsing improvement

* Make comparisons api internal-only

* typegen

* credo

* Different schemata

* get_comparison_query

* Add comment on timeseries result format

* comparisons typegen

* Percent change for revenue metrics fix

* Use defstruct for query_runner over map

* Remove preloading atoms
2024-10-08 10:13:04 +00:00
Adrian Gruntkowski
e11fd159df
Add notes column to users table (#4612) 2024-09-25 14:21:26 +00:00
ruslandoga
dca2eb5b81
Update Ecto dumps (#4481)
* update Ecto dumps

* rm tmp tables from dump
2024-09-23 12:50:08 +00:00
Artur Pata
82a15884ad
Automatically generate Typescript types for v2 API query schema (#4574)
* Generate types from query schema

* Flip the query schema so private is static

* Ensure private schema stays private

* Refactor comment, json schema utils
2024-09-18 11:01:20 +00:00
Uku Taht
7a77ebf9bf
Add feature-flagged channels UI (#4585)
* Add feature-flagged channels UI

* Implement channels modal

* Channel -> Channels tab
2024-09-18 08:34:12 +00:00
Karl-Aksel Puulmann
ef57502854
APIv2: Implement pagination and include.total_rows (#4575)
Offset-based pagination is used to make sure the Looker integration
is able to work as efficiently as possible. To let users know how many
requests they need to make, the `include.total_rows` option was added.
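
A minimal sketch of how a client might use offset-based pagination together with a total row count. It is not the actual API client: `fetch_page`, the response shape (`results`, `total_rows`) and the in-memory data are assumptions made so the example runs on its own.

```python
import math
from typing import Callable


def requests_needed(total_rows: int, limit: int) -> int:
    """Number of pages a client must fetch to read every row."""
    return math.ceil(total_rows / limit)


def fetch_all(fetch_page: Callable[[int, int], dict], limit: int) -> list:
    """Walk offset-based pages until total_rows rows have been covered.

    fetch_page(limit, offset) is a hypothetical stand-in for one API call; it is
    assumed to return {"results": [...], "total_rows": int}.
    """
    rows = []
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        rows.extend(page["results"])
        offset += limit
        if offset >= page["total_rows"]:
            return rows


# In-memory stand-in for the server so the sketch is self-contained.
DATA = [{"page": f"/post/{i}"} for i in range(25)]


def fake_fetch_page(limit: int, offset: int) -> dict:
    return {"results": DATA[offset:offset + limit], "total_rows": len(DATA)}


all_rows = fetch_all(fake_fetch_page, limit=10)
print(len(all_rows), requests_needed(len(DATA), 10))
```

Knowing the total up front lets the client plan its requests instead of probing until an empty page comes back.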
2024-09-12 15:51:18 +03:00
Karl-Aksel Puulmann
bd11b4cf67
APIv2: Standard iso8601 timestamps, operate on UTC (#4563)
* query.date_range is now in UTC instead of user timezone

This simplifies things down the line and fixes several bugs where
query.date_range is cast to naivedatetime for ecto purposes

Many places still remain broken:
- comparison queries
- `to_date_range` calls
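
To illustrate the idea of carrying a UTC time range rather than a date range in the user's timezone, here is a hedged sketch. The function name mirrors `utc_time_range` from this commit message, but the signature and behaviour are assumptions for illustration, not the project's Elixir implementation.

```python
from datetime import date, datetime, time, timedelta, timezone


def utc_time_range(first: date, last: date, tz: timezone) -> tuple:
    """Convert an inclusive local date range into a UTC (start, end) pair."""
    start_local = datetime.combine(first, time.min, tzinfo=tz)
    # End at the last whole second of the final local day.
    end_local = datetime.combine(last, time(23, 59, 59), tzinfo=tz)
    return start_local.astimezone(timezone.utc), end_local.astimezone(timezone.utc)


# A fixed +03:00 offset keeps the sketch dependency-free; real code would use a
# named zone so that daylight saving changes are handled.
site_tz = timezone(timedelta(hours=3))
start, end = utc_time_range(date(2024, 9, 11), date(2024, 9, 11), site_tz)
print(start.isoformat(), end.isoformat())
```

Once the range is expressed in UTC, downstream code can compare it against stored timestamps without knowing the site's timezone.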

* Make default_for_date_range not care about time zones

* Make timezone parameter mandatory for to_date_range

* Simplify utc_date_range, update legacy query builder

* Fix more cases where query date range is needed

* query.date_range -> query.utc_time_range

* Query.date_range/1 function

* ensure_include_imported update

* Clean up send_email_report
2024-09-11 09:21:59 +03:00
Artur Pata
52b94842c0
Assert filters are tuples, simplify schema (#4541) 2024-09-10 18:01:42 +03:00
Karl-Aksel Puulmann
e8d544c841
Remove does_not_contain support (#4564)
It only needed to stay live until users had reloaded. This has been live
for >24h.
2024-09-10 15:38:04 +03:00
Karl-Aksel Puulmann
604dde99fd
APIv2: Regex operations, consistent operators (#4488)
* Rename matches/does_not_match filters internally

These have never been exposed to the frontend/user directly, only via
APIv1 filtering syntax. As such we are free to rename these without
breaking things

* Rename function arguments for consistency, simplify

* Add support for `match`/`not_match` operators for query apiv2

These match the string against a regular expression, as defined in
https://github.com/google/re2/wiki/Syntax

* not_match -> match_not

* does_not_contain -> contains_not

Note that for backwards compatibility:
- Browser handles does_not_contain in URL
- Backend will handle does_not_contain in queries for a day, after which we will remove it for better autocompletion
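
The temporary backwards-compatibility handling described above could look roughly like the following. This is a sketch under assumptions: the `[operator, dimension, clauses]` filter shape and the helper name are illustrative, and only the rename stated in this commit is included.

```python
# Rename taken from this commit message; the lookup table itself is illustrative.
LEGACY_OPERATORS = {"does_not_contain": "contains_not"}


def normalize_filter(filter_: list) -> list:
    """Rewrite a filter so that a legacy operator name becomes the new one."""
    operator, *rest = filter_
    return [LEGACY_OPERATORS.get(operator, operator), *rest]


print(normalize_filter(["does_not_contain", "event:page", ["/blog"]]))
```

Dropping the alias later only means deleting the table entry, which matches the plan to remove the legacy name after a day.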

* not_matches_wildcard -> matches_wildcard_not

* prettier

* match -> matches

* Fix and test fix for matches_wildcard against prop when prop is missing

* Custom properties support for matches/matches_not

* Restore contains_not

* Test contains and contains_not behavior for custom properties
2024-09-09 10:05:24 +03:00
Uku Taht
d56d6998df
Acquisition channel (#4489)
* WIP

* Add acquisition channel

* Add detection for gclid and msclkid

* Add GA4 source categories file as external resource
2024-09-05 12:02:15 +03:00
Uku Taht
90b81b615f
Add migration for acquisition channel (#4531) 2024-09-05 11:51:16 +03:00
Karl-Aksel Puulmann
8fa3a83129
APIv2: and/or/not support (#4480)
* First approximation of AND/OR/NOT support

Broken by this:
- Goal filtering
- Table deciding
- Imports

* TableDecider handle nesting

* Query.remove_top_level_filters

* Plausible.Stats.Imported.SQL.Expression

* Handle AND/OR/NOT with imported data, create Plausible.Stats.Imported.SQL.WhereBuilder

* Add parser validations for event:goal, event:hostname and event:props:x filters top level constraints

* Move module around

* Query.get_filter -> Filters.filtering_on_dimension? in some callsites

* Filters.get_toplevel_filter

* TableDecider.sessions_join_events?, remove old method

* Transforming filters in query_optimizer

* Query API tests for and/or/not

* Reorder parser steps

* Post-merge test fixups

* Solve merge issue

* Simplify filtering_on_dimension?

* Update transformer code

* dimensions_used_in_filters min_depth option, simplify parser validations

* rename_dimensions_used_in_filter

* fix rename_dimensions_used_in_filter

* Rename a test
2024-09-04 15:44:03 +03:00
Karl-Aksel Puulmann
3310006337
Update 20240801091615_capitalize_known_sources.exs migration (#4525)
The previous migration took forever on prod, likely because Map lookups are linear in time complexity.
`transform/3` achieves the same functionality with the help of a hash table, and the updated
WHERE clause allows skipping most rows which don't need updating.
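
The two ideas here, a hash-table lookup and skipping rows that need no change, can be sketched as a Python analogy. This is not the ClickHouse SQL the migration runs; the mapping and row data below are hypothetical samples.

```python
# Hypothetical sample of source renames; the real migration's mapping is not shown here.
RENAMES = {"google": "Google", "bing": "Bing"}


def capitalize_sources(rows: list) -> int:
    """Update rows in place and return how many were touched."""
    updated = 0
    for row in rows:
        # Counterpart of the narrower WHERE clause: skip rows with nothing to rename.
        if row["source"] not in RENAMES:
            continue
        # Hash-table lookup: constant time on average, regardless of mapping size.
        row["source"] = RENAMES[row["source"]]
        updated += 1
    return updated


rows = [{"source": "google"}, {"source": "Direct / None"}, {"source": "bing"}]
touched = capitalize_sources(rows)
print(touched, rows)
```

With a linear scan, the cost per row grows with the number of mappings; with a hash table it stays roughly flat, which matters when the table has many rows.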

Co-authored-by: Uku Taht <Uku.taht@gmail.com>
2024-09-04 13:57:28 +03:00
Adrian Gruntkowski
533bf90329
Create user_sessions table (#4511) 2024-09-03 10:02:43 +02:00
Uku Taht
77248c8800
Add data migration for capitalizing sources (#4418) 2024-09-02 13:59:58 +03:00
RobertJoonas
f04c47f881
Support realtime periods in API v2 (#4469)
* add realtime date_ranges into the private API schema

This commit starts parsing date ranges into a new NaiveDateTimeRange
struct, rather than a simple Date.Range.

* transform realtime labels into negative integers + test

* move schema type argument to last position in helper functions

* allow passing a date param + tests

* Update test/plausible/stats/query_parser_test.exs

Co-authored-by: Karl-Aksel Puulmann <macobo@users.noreply.github.com>

* Update test/plausible/stats/query_parser_test.exs

Co-authored-by: Karl-Aksel Puulmann <macobo@users.noreply.github.com>

* Update test/plausible/stats/query_parser_test.exs

Co-authored-by: Karl-Aksel Puulmann <macobo@users.noreply.github.com>

* Update test/plausible/stats/query_parser_test.exs

Co-authored-by: Karl-Aksel Puulmann <macobo@users.noreply.github.com>

* keep test file structure consistent

* Turn NaiveDateTimeRange into DateTimeRange

* change 'now' field from NaiveDateTime to DateTime in v2 query

* fix minute interval labels + add missing tests

* return query_result.date_range as iso8601 timestamps with timezone

* allow timestamps with tz as date_range arguments in API v2

* delete Plausible.Timezones.to_utc_datetime

* simplify returning comparison periods

* add comment about realtime not supported in comparisons

* pass only now instead of test_opts

* drop redundant else branch

* separate tests

* stick to a single check_date_range function in tests

* fix credo error

---------

Co-authored-by: Karl-Aksel Puulmann <macobo@users.noreply.github.com>
2024-09-02 12:56:58 +03:00
hq1
05136398cf
Migration: add installation meta (#4486) 2024-08-29 11:05:36 +02:00
Karl-Aksel Puulmann
9c71161eab
APIv2: JSON schema validation, separate internal and public API validation (#4464)
* Restore `date` internal parameter, validate via json schema

* Improved error formatting from json schema, get most tests passing

* Handle internal overrides to JSON schema

* Parsing tests all pass

* Remove some repeated code, enforce length/uniqueness in schema

* Explicit separation between internal and public API validation

* Mark file as external_resource

* map_join

* Update query tests

* Update query tests

* Serve schema under an /api/docs/query/schema.json endpoint

* dotify errors
2024-08-26 14:01:27 +03:00
ruslandoga
b64038af48
Fix alias migration warning (#4465) 2024-08-26 11:16:13 +02:00
Karl-Aksel Puulmann
11acadfde9
APIv2: docs-related changes (#4453)
* Order QueryResult in API response

This improves experience in docs when querying interactively

* More utm in seeds

* More improved seeds

* Proper QueryResult.query structure

* Allow docs to query /api/v2/query and sites

The new endpoints use cookie authentication. The docs site uses
these endpoints to provide an interactive docs editor.

* query_result ordering test

* Refresh router

* Test module name
2024-08-22 10:44:41 +03:00
Karl-Aksel Puulmann
4967960278
Populate log_comment with debug information, /debug/clickhouse route (#4435)
* Set log_comment with request information

* CRMAuthPlug -> SuperAdminOnlyPlug

* Super basic debug view

* Handle clustered setups

* Changelog entry

* Cleanup

* fragment trick to use ecto querying, filtering

* Move clustered_table? function to IngestRepo module

* Format

* More resilient user_id getting in helper
2024-08-14 12:33:36 +03:00
Karl-Aksel Puulmann
b88074bf1b
Fix migration typo (#4437) 2024-08-13 12:07:18 +03:00
Karl-Aksel Puulmann
ee3d1e770e
APIv2: visit:country_name, visit:region_name, visit:city_name dimensions (#4328)
* Add data migration for creating and syncing location_data table and dictionary

* Migration to populate location data

* Daily cron to refresh location dataset if changed

* Add support for visit:country_name, visit:region_name and visit:city_name dimensions

Under the hood this relies on a `location_data` table in clickhouse being regularly synced with
plausible/location repo and dictionary lookups used in ALIAS columns

* Update queue name

* Update documentation

* Explicit structs

* Improve docs further

* Migration comment

* Add queues

* Add error when already loaded

* Test for filtering by new dimensions

* Update deps

* dimension -> select_dimension

* Update a test
2024-08-13 09:44:58 +03:00
hq1
7fb2bfbd29
Migration: turn google auth tokens into text column type (#4428) 2024-08-09 12:17:26 +02:00
hq1
cc769dfb3d
Edit goals with display names (#4415)
* Update Goal schema

* Equip ComboBox with the ability of JS selection callbacks

* Update factory so display_name is always present

* Extend Goals context interface

* Update seeds

Also farming unsuspecting BEAM programmers for better
sample page paths :)

* Update ComboBox test

* Unify error message color class with helpers seen elsewhere

* Use goal.display_name where applicable

* Implement LiveView extensions for editing goals

* Sprinkle display name in external stats controller tests

* Format

* Fix goal list mobile view

* Update lib/plausible_web/live/goal_settings/list.ex

Co-authored-by: Artur Pata <artur.pata@gmail.com>

* Update lib/plausible_web/live/goal_settings/form.ex

Co-authored-by: Artur Pata <artur.pata@gmail.com>

* Update the APIs: plugins and external

* Update test so the intent is clearer

* Format

* Update CHANGELOG

* Simplify form tabs tests

* Revert "Format"

This reverts commit c1647b5307.

* Fixup format commit that went too far

* ComboBox: select the input contents on first focus

* Update lib/plausible/goal/schema.ex

Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>

* Update lib/plausible/goals/goals.ex

Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>

* Update lib/plausible_web/live/goal_settings/form.ex

Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>

* Pass form goal instead of just ID

* Make tab component dumber

* Extract separate render functions for edit and create forms

* Update test to account for extracted forms

* Inline goal get query

* Extract revenue goal settings to a component and avoid computing assigns in flight

* Make LV modal preload optional

* Disable preload for goal settings form modal

* Get rid of phash component ID hack

* Force another render after render_submit when testing goal updates

* Fix LV preload option

* Enable preload back for goals modal for now

* Make formatter happy

* Implement support for preopening of LV modal

* Preopen goals modal to avoid feedback gap on loading edited goal

* Remove `console.log` call from modal JS

* Clean up display name input IDs

* Make revenue settings functional on first edit again

* Display names: 2nd stage migration

* Update migration with data backfill

---------

Co-authored-by: Artur Pata <artur.pata@gmail.com>
Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>
2024-08-09 11:12:00 +02:00