Commit Graph

2814 Commits

Author SHA1 Message Date
Marko Saric
219f1f5538
Add link to settings on the "Waiting for first pageview" screen (#4020)
* Update waiting_first_pageview.html.heex

* Improve text formatting

---------

Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>
2024-04-18 12:13:19 +02:00
Adrian Gruntkowski
9bae3ccce3
Improve UI/UX of imports view and GA import flow (#4017)
* Add runtime config option for enabled/disabling csv imports and exports

* Use the new option to toggle rendering exports UI

* Disable import buttons when at maximum imports or when option disabled for CSV

* Improve forms for GA import flow

* Add test for maximum imports reached

* Remove "Changed your mind?" prefixing back button

* Hide UA imports in Integrations when `imports_exports` flag is enabled

* Implement `csv_imports_exports` feature flag

* Revert "Add runtime config option for enabled/disabling csv imports and exports"

This reverts commit e30f202dd3.

* Send import notification email only to the user who ran the import

* Improve rendering of disabled button state

* Put import status heroicon in front of import label
2024-04-18 12:12:48 +02:00
RobertJoonas
3a371fdf4d
Test new imported metrics (GA4) (#4014)
* test fixture imported data with stats requests

* take visits metric from the events table in event:page breakdown

* Remove assert_referrers after all

pageReferrer is an event-scoped property in GA4 which, when queried
along with session-level dimensions, returns unexpected data.

Adding the pageReferrer dimension to the GA4 Data API request causes
the selected metric totals to increase significantly, even though
they shouldn't.

* Adjust sources and utm_mediums assertions

* adjust assert_pages

* Make formatter happy

---------

Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>
2024-04-18 12:12:24 +02:00
Adrian Gruntkowski
9849743407
Always sort occupied date ranges in Imported.clamp_dates/3 (#4018) 2024-04-18 11:15:51 +02:00
Adam Rutkowski
b373c36dcc Fix docs link for hostname shield 2024-04-17 07:01:16 +02:00
Adam Rutkowski
2b1fbf0a0e Fix typo (and kick docker build) 2024-04-16 20:57:29 +02:00
hq1
6fb56dc1cc
Stats api hostname filter (#4008)
* Update Stats API tests

* Revert "Remove hostname filter from the external API (#3991)"

This reverts commit 884daa7943.
2024-04-16 20:36:57 +02:00
hq1
f635f0a6d3
Hostnames shield (#3990)
* Add shield hostname rules migration

* Add hostname rule schema

* Initialize hostname rules cache

* Extend Shields context with hostname related functions

* Instrument ingestion pipeline with hostname rule lookups

* Limit hostname suggestions by shield patterns

* Add LiveView for hostname rules management

* Test hostname cache

* Rename feature flag - should be separate from hostname filter

* Remove :shield_pages feature flag

* Update CHANGELOG

* Format

* Update lib/plausible/shield/hostname_rule.ex

Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>

* Move tests from `lib/` 🤦

* Use plain `assign` where no short-circuit is necessary

* Fine tune the copy a little bit

* Prevent misplaced tests

* Treat a test with common sense

* Fixup another test that hasn't been really run before

* Make the form hint dynamic depending on rules count

---------

Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>
2024-04-16 20:30:20 +02:00
RobertJoonas
fce909d041
Improve merge imported (#4003)
* Pass the actual date args that are returned in fixtures (just for clarity)

* Change select key from operating_system to os

* Select (not set) from imported data as done from native stats

* Support breakdown by imported os_version

* Remove dead code

The `query.include_imported` is set to `false` from the start when
filters are included.

* Refactor Plausible.Stats.Imported

Extract a function that does group_by and dimension select, instead of
doing both separately inside the merge_imported function body

* Further refactor of Plausible.Stats.Imported

Get rid of code repetition

* Support breakdown by imported browser_version

* Support breakdown by imported referrer

* Support breakdown by imported referrer

* remove redundant if in select

* use greatest instead of coalesce

Co-authored-by: ruslandoga <doga.ruslan@gmail.com>

* add back the :member filter handling in Stats.Imported

---------

Co-authored-by: Uku Taht <uku.taht@gmail.com>
Co-authored-by: ruslandoga <doga.ruslan@gmail.com>
2024-04-16 16:58:22 +01:00
Marko Saric
09612e21e7
Update search-terms.js (#4016) 2024-04-16 16:08:09 +02:00
Adrian Gruntkowski
c07f00636d
Stop importing page referrer from GA4 (#4012)
* Stop importing page referrer from GA4

* Update GA4 import fixture

* Update fixture-based test
2024-04-16 15:35:36 +02:00
ruslandoga
350d42fb95
remove no-op coalesce calls (#4015) 2024-04-16 10:11:13 +02:00
ruslandoga
d2fc89e734
use custom email template for csv imports (#4011) 2024-04-16 10:10:59 +02:00
hq1
cf61e47a0a
Add shield hostname rules migration (#3992) 2024-04-11 12:00:01 +02:00
Adrian Gruntkowski
c1c03b729c
Reapply "Local CSV exports/imports and S3/UI updates (#3989)" (#3995) (#3996)
* Reapply "Local CSV exports/imports and S3/UI updates (#3989)" (#3995)

This reverts commit aee69e44c8.

* remove unused functions

* eh, that one was actually used

* ugh, they were both used

---------

Co-authored-by: ruslandoga <67764432+ruslandoga@users.noreply.github.com>
2024-04-11 09:15:01 +02:00
RobertJoonas
5163880968
fix test (#4001)
* fix test

* Update test/plausible_web/controllers/api/stats_controller/suggestions_test.exs

Co-authored-by: hq1 <hq@mtod.org>

* format

---------

Co-authored-by: hq1 <hq@mtod.org>
2024-04-10 12:40:42 +01:00
hq1
378d3bc6f5
Discard sessions switching hostnames for UTM/referrer breakdowns (#4000)
* Discard sessions switching hostnames for UTM/referrer breakdowns

Co-authored-by: Uku Taht <uku.taht@gmail.com>

* Format

---------

Co-authored-by: Uku Taht <uku.taht@gmail.com>
2024-04-10 11:38:15 +02:00
hq1
39fdbb3a67
Move hostname filter appending to Breakdown module (#3998)
We currently update the query filtered by hostname to its
respective visit props in some cases.
This patch moves that logic down from the controllers to the Breakdown module,
so any change in logic will also be reflected in the Stats API.
2024-04-10 09:49:09 +02:00
Adrian Gruntkowski
aee69e44c8
Revert "Local CSV exports/imports and S3/UI updates (#3989)" (#3995)
This reverts commit 1a0cb52f95.
2024-04-09 21:26:23 +02:00
ruslandoga
1a0cb52f95
Local CSV exports/imports and S3/UI updates (#3989)
* local CSV exports/imports and S3 updates

* credo

* dialyzer

* refactor input columns

* fix ci minio/clickhouse tests

* Update lib/plausible_web/live/csv_export.ex

Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>

* fix date range filter in export_pages_q and process only pageviews

* remove toTimeZone(zero_timestamp) note

* use SiteImport.pending(), SiteImport.importing()

* escape [SiteImport.pending(), SiteImport.importing()]

* use random s3 keys for imports to avoid collisions (sometimes makes the upload get stuck)

* clamp import date ranges

* site is already in assigns

* recompute cutoff date each time

* use toDate(timestamp[, timezone]) shortcut

* show alerts on export cancel/delete and extract hint into a component

* switch to Imported.clamp_dates/4

* reprocess tables when imports are added

* recompute cutoff_date on each call

* actually use clamped_date_range on submit

* add warning message

* add expiry rules to buckets in make minio

* add site_id to imports notifications and use it in csv_importer

* try/catch safer

* return :ok

* date range is not available when no uploads

* improve ui and warning messages

* use Generic.notice

* fix flaky exports test

* begin tests

* Improve `Importer` notification payload shape

---------

Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>
2024-04-09 20:59:48 +02:00
Adrian Gruntkowski
bb108450cb
Fix flaky google import API tests due to hardcoded import ID (#3994) 2024-04-09 19:55:26 +02:00
hq1
884daa7943
Remove hostname filter from the external API (#3991) 2024-04-09 18:03:06 +02:00
Karl-Aksel Puulmann
ceaf2e1f79
Fix experimental_reduced_joins (#3993)
The original diff had an important exclusionary branch commented out,
causing the events table to be queried for breakdowns with the flag off.
2024-04-09 18:45:03 +03:00
Karl-Aksel Puulmann
441412a164
Return 400 when using invalid filters for stats api (#3986)
Currently a 500 is returned instead and logged to Sentry.
2024-04-09 15:36:17 +03:00
Adrian Gruntkowski
14b00c6ac3
Fix dry run mode in DataMigration.SiteImports (#3988) 2024-04-09 13:04:17 +02:00
Adrian Gruntkowski
1c1ea95e16
Ensure only complete imports are considered in site imports data migration (#3987)
* Ensure only complete imports are considered in site imports data migration

* Refactor `SiteImports` data migration for clarity (h/t @RobertJoonas)

* Fix tests
2024-04-09 11:49:28 +02:00
Adrian Gruntkowski
d796788715
Keep sites.imported_data in sync with backfilled SiteImport when migrating (#3979)
* Keep `sites.imported_data` in sync with backfilled `SiteImport` when migrating

* Consider only completed site imports in data migration
2024-04-09 09:04:51 +02:00
ruslandoga
94deb89b9d
remove Plausible Team footer from self-hosted emails (#3980)
* remove Plausible Team footer from self-hosted

* don't test unsubscribe placeholder in small build
2024-04-09 09:04:23 +02:00
Adrian Gruntkowski
b951065724
Refactor Imported.check_dates (->clamp_dates) for better flexibility (#3983) 2024-04-09 09:04:11 +02:00
Adrian Gruntkowski
d381c79d4b
Reapply "Include query string when logging the request (#3971)" (#3984) (#3985)
This reverts commit acbd2f8e30.
2024-04-09 08:49:45 +03:00
Adrian Gruntkowski
acbd2f8e30
Revert "Include query string when logging the request (#3971)" (#3984)
This reverts commit acbbaa9116.
2024-04-08 18:44:14 +02:00
Karl-Aksel Puulmann
a6d4786959
Worker to clean site data from ClickHouse (#3959)
* Create a worker to clean clickhouse deleted sites data

The plan is to run this weekly, but we are going to trigger it manually the first few times on cloud.

* Make asserting count more reliable

* credo

* PR feedback

* Fixes
2024-04-08 12:26:38 +03:00
Karl-Aksel Puulmann
acbbaa9116
Include query string when logging the request (#3971)
* Update request logging

Ultimate goal is to be able to compare results with and without a flag against each other.
To do this we need logging which displays the full request url with parameters.

Example logs:
```
14:46:09.042 request_id=F8MRLSsaKB7BeIkAAAHk [info] (200) GET /api/sites took 17ms
14:46:09.175 request_id=F8MRLTKYV3G-GqEAAAZB [info] (200) GET /api/stats/dummy.site/current-visitors took 24ms
14:46:09.396 request_id=F8MRLUDfav28LIkAAAIE [info] (202) POST /api/event took 5ms
14:46:09.501 request_id=F8MRLUDS_YhftUkAAAAD [info] (200) GET /api/stats/dummy.site/sources?period=30d&date=2024-04-04&filters=%7B%7D&with_imported=true&limit=9 took 111ms
14:46:09.508 request_id=F8MRLUDhHbK8WKUAAAah [info] (200) GET /api/stats/dummy.site/main-graph?period=30d&date=2024-04-04&filters=%7B%7D&with_imported=true&metric=visitors took 117ms
14:46:09.511 request_id=F8MRLUDS1CYntK4AAAaB [info] (200) GET /api/stats/dummy.site/entry-pages?period=30d&date=2024-04-04&filters=%7B%7D&with_imported=true&limit=9 took 121ms
14:46:09.541 request_id=F8MRLTk5sIPYSn4AABoC [info] (200) GET /api/stats/dummy.site/top-stats?period=30d&date=2024-04-04&filters=%7B%7D&with_imported=true&comparison=previous_period&compare_from=undefined&compare_to=undefined&match_day_of_week=true took 278ms
```

* re-add plug

* router_dispatch -> endpoint
2024-04-08 09:29:11 +03:00
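The example logs in #3971 above follow a fixed shape: status in parentheses, HTTP method, path with the query string appended when one is present, and the request duration. A minimal sketch of that line format in Python follows. It is only an illustration and not the project's implementation; the function name and parameters are hypothetical, and the timestamp and `request_id` prefix are assumed to come from the logger itself.

```python
def format_request_log(status, method, path, query_string, duration_ms):
    """Build a log message like '(200) GET /api/sites took 17ms'.

    Hypothetical helper: the query string is appended only when it is
    non-empty, so requests without parameters log the bare path.
    """
    url = f"{path}?{query_string}" if query_string else path
    return f"({status}) {method} {url} took {duration_ms}ms"


if __name__ == "__main__":
    print(format_request_log(200, "GET", "/api/sites", "", 17))
    print(format_request_log(
        200, "GET", "/api/stats/dummy.site/sources", "period=30d&limit=9", 111
    ))
```

Keeping the full URL in the message is what makes it possible to replay the same request with and without a flag and compare results, which is the goal stated in the commit.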
Adrian Gruntkowski
a7603c9e49
Improve import procedure to ensure no time range overlaps (#3970)
* Always scope import ID by site as well

* Do not schedule new import job if there are any site imports in progress

* Disable import buttons when any import is in progress

* Simplify `schedule_job/4` (h/t @RobertJoonas)
2024-04-04 18:56:36 +02:00
Adrian Gruntkowski
cffff0340c
Handle Google API timeouts gracefully during imports (#3975) 2024-04-04 18:55:39 +02:00
Adrian Gruntkowski
33eed9d7db
Delete imports which have no stats (#3972) 2024-04-04 18:55:14 +02:00
Marko Saric
e5b7f1afd0
Changing the copy of the locked screen (#3967)
Changing the copy here because I think that in some situations the "we're still counting stats" message is now shown even on dashboards where we've stopped counting stats, so it is best to avoid it.
2024-04-04 18:54:37 +02:00
Uku Taht
f966419a4a
Update ua_inspector (#3957)
Co-authored-by: hq1 <hq@mtod.org>
2024-04-04 17:20:57 +02:00
hq1
f9f0407d68
Remove experimental_hostname_filter and keep it on by default (#3973)
* Remove `experimental_hostname_filter` and keep it on by default

* Catch up with changes done via e5b56dbe6
2024-04-04 17:20:16 +02:00
RobertJoonas
e5b56dbe62
Refactor VisitorGraph (#3936)
* Give a more semantic name to a function

* Make the LineGraph component thinner

* Move LineGraph into a separate file

* Move interval logic into interval-picker.js

This commit also fixes a bug where the interval name displayed inside
the picker component flickers the default interval when the graph is
loading.

The problem was that we were counting on graphData for returning us the
current interval: `let currentInterval = graphData?.interval`

We should always know the default interval before making the main-graph
request. Sending graphData to IntervalPicker component does not make
sense anyway.

* extract data fetching functions out of VisitorGraph component

* Return graph_metric key from Top Stats API

This commit introduces no behavioral changes - only starts returning an
additional field, allowing us to avoid the following logic in React:

1. Finding the metric names, given a stat display name. E.g.
   `Unique visitors (last 30 min) -> visitors`

2. Checking if a metric is graphable or not

* Move metric state into localStorage

This commit gets rid of the internal `metric` state in the VisitorGraph
component and starts using localStorage for that instead.

This commit also chains the main-graph request into the top-stats request
callback - meaning that we'll always fetch new graph data after top stats
are updated. And we do it all in a single function.

Doing so simplifies the loading state significantly, and also helps to
make it clear that, at all times, existing top stats are required before
we can fetch the graph. That's because the metric is determined by which
Top stats are returned (for example, we can't be sure whether revenue
metrics will be returned or not).

* Make sure graph tooltip says "Converted Visitors"

* Extract a StatsExport function component

Again, instead of relying on `graphData?.interval` we can read it from
localStorage, or default to the largest interval available. The export
should not be dependent on the graph.

* Extract SamplingNotice function component

* Extract WithImportedSwitch function component

* Stop "lazy-loading" the graph and top stats

Since the container is always on top on the page, it will be visible on
the first render in any case - no matter the screen size.

* Turn VisitorGraph into a function component

* Display empty container until everything has loaded

* Do not display loading spinner on realtime ticks

* Turn Top Stats into a fn component

* fetch top stats and graph async

* Make sure revenue metrics can remain on the graph

* Add an extra check to canMetricBeGraphed

* fix typo

* remove redundant double negation
2024-04-04 13:39:55 +01:00
Karl-Aksel Puulmann
3115c6e7a8
Reducing JOINs in queries (#3966)
* Move experimental_session_count? logic to within query object

* WIP new querying system for deciding what tables to query

* both -> either

* Include sample_percent in both tables

* Remove a hanging TODO

* Allow filtering by visit props on event queries if flag is on

* Make default sessions join more conditional

* Simplify events_join_sessions?

* Add some TODOs

* Fix assignment

* Handle entry/exit page visit props separately from props stored in events table

* Update test which created sessions/events differently from everyone else

* Make query_events private

* Don't filter by session properties on events table if querying sessions and joining in events

* Handle visits, pageviews, events and visitors metrics from other table

* both -> either

* events, pageviews are strictly event metrics

* Add support for (plain) breakdowns deciding which table to use

* Run tests with experimental_reduced_joins as a separate job

Also refactor which tests are run with postgres:15 to reduce number of jobs

* moduledocs for TableDecider

* Fix matrix

* Custom build name

* Move TEST_EXPERIMENTAL_REDUCED_JOINS check

* Handle percentage separately from other metrics

* Remove debug code

* TableDecider tests

* both => sample_percent

* Improve naming

* Simplify code

* Breakdowns retain old behavior if getting metric visitors

* Unify behavior of entry/exit page hostnames with rest

* Fix test naming
2024-04-04 13:54:23 +03:00
hq1
6af80dd246
Filter by hostnames (#3963)
* CH Migration: exit/entry hostnames in sessions_v2

* Leave only exit_page_hostname, we already record hostnames

* Use ClickHouse DDL in favour of ecto so that cluster is included

* Compress with ZSTD(3)

* Expose Hostname filter in the dashboard dropdown

* Add `exit_page_hostname` to ClickHouse `sessions_v2` schema

* Start tracking hostname changes in sessions

* Implement hostname filter suggestions

* Enable filtering by `event:hostname`

* Add tests for filtering by hostnames

* Ensure filter suggestions work for exit pages too

* Allow overriding hostnames with `send_pageview` mix task

* Remove `:window_time_on_page` flag

It seems that we can remove it after all?

* Initialize `experimental_hostname_filter` query parameter

* Rewrite cache store behaviour with regards to session hostnames

* Work around inconsistent session merging

So that `populate_stats` can get closer to actual ingestion

* Improve top stats test

* Make it possible to filter sessions by entry/exit hostnames

* Update pages tests

* Expose `experimental_hostname_filtering` temporarily in the UI

* Untested yet: also apply experimental filtering to sources

* Introduce `hostname_filter` feature flag

* Format

* Test top sources with hostname filter + experimental flag
2024-04-04 10:48:30 +02:00
Adrian Gruntkowski
e6d83e946f
Populate new columns in imports and exports (#3969)
* Extend `Imported*` schemas with newly added columns

* Populate newly added `Imported*` fields in GA4 imports

* Extend exports with newly added fields

* Extend CSV importer to ingest new fields

* Fix alias shadowing error

* Add more extensive GA4 import fixtures

* Apply rounding and casting to sampled visits
2024-04-04 10:33:19 +02:00
RobertJoonas
bd73fc8266
CH Migration - add more imported metrics and properties (#3949)
* add migration

* add utm_source to imported_sources

* quickfix

* satisfy credo

* Revert "satisfy credo"

This reverts commit bb0b228164.

* Revert "quickfix"

This reverts commit ab6f70c79e.

---------

Co-authored-by: ruslandoga <67764432+ruslandoga@users.noreply.github.com>
2024-04-04 09:12:53 +01:00
ruslandoga
2ba4988c95
specify insert columns in import from s3 (#3968)
Co-authored-by: RobertJoonas <56999674+RobertJoonas@users.noreply.github.com>
2024-04-03 10:10:49 +01:00
hq1
1f778e0c11
CH Migration: exit page hostname on sessions_v2 (#3953)
* CH Migration: exit/entry hostnames in sessions_v2

* Leave only exit_page_hostname, we already record hostnames

* Use ClickHouse DDL in favour of ecto so that cluster is included

* Compress with ZSTD(3)
2024-04-03 09:42:47 +02:00
Adrian Gruntkowski
9f27fa303c
Fix dry run mode in DataMigration.SiteImports (#3965) 2024-04-02 14:05:34 +02:00
Adrian Gruntkowski
23a3699dd7
Improve import stats toggle and with_imported flag computation (#3960)
* Check import presence across all imports and not just the first one

Also, simplify imported data toggle rendering to not explicitly
refer to the earliest import source.

* Change imported stats toggle icon in dashboard

* Test `Imported.get_imports_date_range/1`

* Simplify failed UA/GA import email copy
2024-04-02 12:53:19 +02:00
Adrian Gruntkowski
71fe541359
Implement script for backfilling legacy site import entries and adjusting end dates of site imports (#3954)
* Always select and clear import ID 0 when referring to legacy imports

* Implement script for adding site import entries and adjusting end dates

* Log cases where end date computation is using fallback

* Don't log queries when running the migration to reduce noise
2024-04-02 12:53:02 +02:00
Adrian Gruntkowski
5bf59d1d8a
Implement adjusting imported date range to actual and existing stats (#3943)
* Implement adjusting imported date range to actual and existing stats

* Drop redundant prefix from import list entries

* Make pageview numbers in imports list formatted for readability

* Test and improve date range cropping

* DRY UA and GA4 stats start and end date API calls

* Extend UA/GA import controller tests and improve error handling

* refactor finding longest open range without existing data

* Fix typo in test description

Co-authored-by: RobertJoonas <56999674+RobertJoonas@users.noreply.github.com>

* Rename `open_ranges` to `free_ranges`

---------

Co-authored-by: Robert Joonas <robertjoonas16@gmail.com>
Co-authored-by: RobertJoonas <56999674+RobertJoonas@users.noreply.github.com>
2024-03-28 09:32:41 +01:00
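Commit #3943 above mentions finding the longest open range without existing data (later renamed to `free_ranges`), and #4018 elsewhere in this log notes that occupied date ranges must always be sorted. A minimal Python sketch of that idea follows, assuming inclusive date ranges. The function names and signatures are hypothetical and do not reproduce the project's Elixir code.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def free_ranges(start, end, occupied):
    """Return the parts of [start, end] not covered by any occupied range.

    All ranges are (first_day, last_day) tuples with inclusive bounds.
    Occupied ranges are sorted first so the sweep below is correct
    regardless of input order.
    """
    free = []
    cursor = start
    for occ_start, occ_end in sorted(occupied):
        if occ_end < cursor:
            continue  # entirely before the part still under consideration
        if occ_start > end:
            break  # entirely after the requested range
        if occ_start > cursor:
            free.append((cursor, occ_start - ONE_DAY))
        cursor = max(cursor, occ_end + ONE_DAY)
    if cursor <= end:
        free.append((cursor, end))
    return free


def longest_free_range(start, end, occupied):
    """Pick the longest free range, or None when everything is occupied."""
    ranges = free_ranges(start, end, occupied)
    if not ranges:
        return None
    return max(ranges, key=lambda r: (r[1] - r[0]).days)


if __name__ == "__main__":
    occupied = [
        (date(2024, 1, 20), date(2024, 1, 25)),
        (date(2024, 1, 5), date(2024, 1, 10)),
    ]
    print(longest_free_range(date(2024, 1, 1), date(2024, 1, 31), occupied))
```

Without the `sorted` call, the unsorted input in the usage example would make the sweep skip the gap before the earlier range, which is the kind of problem a fix like #4018 guards against.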