Commit Graph

962 Commits

Author SHA1 Message Date
RobertJoonas
1d3b068233
Implement filtering for imported data (#4118)
* move imported.ex to imported subfolder

* move constructing base imported query into a separate module

* Implement imported table deciding and filtering

+ tests for pages, entry_pages, exit_pages and common filter types

* add top stats test with country filter

* add timeseries test

* Drop bounce_rate and time_on_page from imported & page-filtered Top Stats

* rename field returned by top stats

* turn pages into a fn comp

* Move dashboard API results under a results key

...and also return the skip_imported_reason to the frontend to be used
for displaying warnings.

* extend ListReport component with an optional afterFetchData prop

* turn Devices into a fn comp

* add not_requested as a skip_imported_reason

* display warning icons in the dashboard

* Implement filtering suggestions and translate filter fields for imported

* WIP

* Improve and cover filtering suggestions with tests

* Rename imported suggestions query helpers

* fix screen size breakdown with screen size filter

* support filtering by the same suggestion property

* support location filters when fetching location suggestions

* support filtering by multiple props from the same table

* Implement filtering by goals

* Make views per visit metric work for import entry and exit pages

* Get rid of circular dependencies between Stats.Imported and Stats.Imported.Base

* Clean up Query struct manipulation in Breakdown

* Rename helper function for clarity

* Automatically refresh query struct state after modifications

* Shutup credo

* display imported warning bubble in prop breakdown section

* Render warning bubble for funnels whenever imported data is in the view

* Transform any operator on respective goal filters

* Fix percentage and conversion_rate calculation in presence of custom props

* add tests for for combining page and pageview goal filters

* add skip_refresh option to query tweaking functions

* add imported CR support for timeseries

* still show url breakdown when special goal + url in filter

* rename Query.refresh

* use flat_map instead of map and concat

* fix darkmode color

* Handle invalid imported region codes in suggestions gracefully

* Add an entry to CHANGELOG.md

---------

Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>
2024-06-03 12:29:08 +02:00
ruslandoga
8588776df2
Add newline to csv importer insert sql (#4172)
* add newline to csv importer insert sql

* add test

* changelog

* fix test typo

* fix test typo x2
2024-06-03 09:48:16 +02:00
hq1
bb7b0d9f94
Tidy up verification code a little (#4156) 2024-05-29 08:51:10 +02:00
hq1
9d77e970ef
Redo site verification UI (#4153)
* Redo site verification UI

* format
2024-05-28 13:43:14 +02:00
hq1
d9cf037866
Handle no snippet WordPress case (#4152) 2024-05-28 09:37:01 +02:00
hq1
78c5762761
Verification: prioritize domain mismatch over multiple snippets (#4146) 2024-05-27 10:52:22 +02:00
hq1
0381bc6121
Use codespell (#4145)
* Fix typos

* Add .codespellignore

* Add codespell GH action
2024-05-27 10:52:01 +02:00
hq1
206a6543f6
First pass: extract verification errors (#4138)
* First pass: extract verification errors

We like to make them easier to maintain.
An update fixing documentation URLs will follow.

* Update moduledoc
2024-05-24 16:23:25 +02:00
hq1
a7b69719c1
More fixes to the verification process (#4136)
* More fixes to the verification process:
 - extend the list of known attributes
 - follow redirect chain host to verify data-domain difference
 - display the actual data-domain in error message, not URL

* format

* fix test

* remove redundant pattern matching

* Update lib/plausible/verification/checks/snippet.ex

Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>

---------

Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>
2024-05-24 12:45:31 +02:00
hq1
ee61094023
Verification improvements (#4132)
* Simplify browserless script and catch errors

* Improve fetching body:
    - follow up on max. 4 redirects
    - rely on Req default timeouts (wait much longer to account for slow
      sites)

* Improve installation check:
  - rely on Req default timeouts, wait longer
  - log service errors as warnings
  - use stealth mode to bypass captchas

* Stop cutting off body too large

* Improve diagnostics for known failures

* Another round of diagnostic improvements

* Format

* Add a test for callback status 500
2024-05-24 10:40:59 +02:00
hq1
c81cb16933
Snippet integration verification (#4106)
* Allow running browserless.io locally

* Compile tailwind classes based on extra/ too

* Add browserless runtime configuration

* Ignore verification events on ingestion

* Improve extracting HTML text in tests

* Update dependencies

- Floki will be used on production to parse site contents
- Req will be used to handle redundant stuff like retrying etc.

* Add shuttle SVG to generic components

Later on we'll use it to indicate verification errors

* Connect live socket & allow skipping awaiting the first pageview

* Connect live socket in general settings

* Implement verification checks & diagnostics

* Stub remote services with Req for testing

* Change snippet screen copy

* Update tracker script, so that:

1. headless browsers aren't ignored if `window.__plausible` is defined
2. callback optionally supplies the event response HTTP status

This will be later used to check whether the server acknowledged
the verification event.

* Implement LiveView verification UI

* Embed the verification UIs into settings and onboarding

* Implement browserless puppeteer verification script

It:
 - tries to visit the site
 - defines window.__plausible, so the tracker doesn't ignore test events
 - sends a verification event and instruments the callback
 - awaits the callback to fire and returns the result

* Improve diagnostics for CSP

Only report CSP error if the snippet is already found

* Put verification behind a feature flag/env setting

* Contact Us hint only for Enterprise Edition

* For headless code, use JS context instead of EEx interpolation

* Update diagnostics test with WordPress scenarios

* Shorten exception/throw interception

* Rename test

* Tidy up

* Bust URL always on headless check

* Update moduledoc

* Detect official Plausible WordPress Plugin

and act accordingly on diagnostics interoperation

* Stop using 'rating' in favour of 'interpretation'

* Only report CSP error if no proxy is likely

* Update CHANGELOG

* Allow event-* attributes on snippet elements

* Improve naive GTM detection, not to confuse it with GA4

* Update lib/plausible/verification.ex

Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>

* Update test/plausible/site/verification/checks_test.exs

Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>

* s/perform_wrapped/perform_safe

* Update lib/plausible/verification/checks/installation.ex

Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>

* Remove garbage

---------

Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>
2024-05-23 15:00:50 +02:00
ruslandoga
9687fa786d
remove Plausible Analytics metions from CE (#4121)
* remove Plausible Analytics metions from CE

* update tests

* still mention Plausible Analytics on landing page
2024-05-23 09:43:01 +02:00
hq1
cd4d1d06bb
Rename Plugins API Token(s) to Plugin Token(s) (#4075)
Deployment needs to be coordinated with WordPress plugin
update. Reasoning: there's too much confusion coming
from the use of the "API" terminology, leading to wrong
keys being generated by users.
2024-05-22 07:45:09 +02:00
Adrian Gruntkowski
2d23db2982
Return warning when imported stats are skipped in Stats API due to unsupported query (#4116) 2024-05-21 11:45:18 +02:00
Adrian Gruntkowski
7876802d05
Improve CSV importer tests and visit duration query for imported data (#4114)
* Tidy up breakdown helper functions in CSV importer tests

* Fix a typo

* Extend city breakdown tests in CSV importer test suite

* Make visit duration computation consistent across native and imported data
2024-05-21 09:24:10 +02:00
Adrian Gruntkowski
756d9c28ce
Improve test coverage and fix stats API and dashboard CSV export (#4110)
* Fix broken hostname property and handle missing imported metrics gracefully

* Add test for CSV export of imported data

* Add extra coverage for property and metric combox which were failing

* Compute visit duration and bounce rate for exit pages in imported data

* Drop support for breaking down by `event:hostname` property for now
2024-05-20 16:11:35 +01:00
RobertJoonas
df178ea2d5
Introduce :skip_imported_reason field into Query struct (#4115)
* introduce new query field

* improve tests
2024-05-20 15:46:25 +01:00
Karl-Aksel Puulmann
cfdd769984
Fix StatsAPI access for new accounts on business tier (#4105)
Currently, business tier users created after business tier launch can't
access Stats API due to faulty grandfathering logic. This change should
fix that.
2024-05-15 11:50:41 +03:00
Adrian Gruntkowski
9374a95cf2
Export and import custom events via CSV (#4096)
* Export and import custom events via CSV

* Add prop support of url for cloaked links and path for 404s in imported queries

* Handle custom events with empty URL and path properties gracefully

* Make events with properties logic DRY and fix missed cloaked link

* Add test for path property breakdown

* Update raw CH data fixture and extend CSV importer tests

* Fix broken query condition after rebase

* Update CHANGELOG.md
2024-05-14 13:50:30 +02:00
Karl-Aksel Puulmann
baa99652f6
Refactor internal Query schema and introduce WhereBuilder (#4082)
* New struct format for query after parsing

* WIP refactoring

* WIP: Validations working

* WIP: tuple to list

* continued refactoring

* WIP: parsing defaults

* Breakdown tests pass

* Window functions fix

* Fix default

* Remove dead argument

* Update filters tests

* Update query_test.exs

* Fix table_decider

* sources tests pass

* Filter suggestions fix

* revenue/goal filter applied refactor

* Update top_stats matching

* Get stats_controller tests passing

* Update neighbor_aggregate_time_on_page

* Refactor Query.remove_event_filters into Query.remove_filters, add new callsites

* Move goal where clause building to new WhereBuilder module

* Move event:name filters

* Move more filters to WhereBuilder

* Update fragment to allow non-static meta columns

* Build where clause for events table using WhereBuilder

* Build sessions table where clause using WhereBuilder

* Move time range filtering and site checking to WhereBuilder

* WhereBuilder.build_condition method

* Remove TODO

* _rest pattern for TableDecider, Query pattern matching

Future-proofing in a tiny way

* Hacky fix to get tests passing for Google API tests

* Typespec fix

* Merge conflict

* refactor special goal filter logic in imported.ex

* Docs feedback

* put_filter

---------

Co-authored-by: Robert Joonas <robertjoonas16@gmail.com>
2024-05-14 11:58:10 +03:00
Uku Taht
06e8118dab
Filtering Search Console keywords (#4077)
* Apply filters in search console request

* Remove dead code from search console modal

* Remove unimportant information from keyword modal

* Show invalid filters from search console

* Fix tests

* Add/Fix tests

* Fix typo

* Remove unused variable

* Fix typo

* Changelog entry

* Fix Credo

* Display impressions, CTR and position in keyword modal

* Undo change that should not have been committed

* Fix test

* Fix test

* filters -> search_console_filters
2024-05-14 09:56:55 +03:00
ruslandoga
1f4346f4df
Add DATA_DIR (#4100)
* add DATA_DIR

* add test

* changelog

* fix test in CI where PERSISTENT_CACHE_DIR is always set

* consistent fallback
2024-05-13 10:17:56 +02:00
ruslandoga
7af8273702
Handle s3 timeout on settings page (#4036)
* handle s3 timeout

* no-async approach

* fewer changes

* add tests

* make s3 failure test EE-only
2024-05-13 10:17:27 +02:00
RobertJoonas
a6dcd19ccc
Autoconfigure event goals (#4093)
* add new goal suggestions API

* silence credo

* Order suggestions from subqueries explicitly

* allow autoconfiguring goals

Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>

* Fix form modal tab switching behavior

* add test

* Remove redundant and invalid action link title

---------

Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>
2024-05-10 11:48:27 +02:00
Adrian Gruntkowski
6bbc8d69a4
Avoid crashing on empty GA4 HTTP report response (#4094) 2024-05-10 09:12:22 +02:00
Adrian Gruntkowski
4e7e932a75
Add support for imported custom events (#4033)
* Add Ecto schema for imported custom events

* Start importing custom events from GA4

* query imported goals

* make it possible to query events metric from imported

* make it possible to query pageviews in goal breakdown

* make it possible to query conversion rate

* fix rate limiting test

* add CR tests for dashboard API

* implement imported link_url breakdown

* override special custom event names coming from GA4

* allow specific goal filters in imported_q

* update GA4 import tests to use Stats API

* Improve tests slightly

* Update CHANGELOG.md

---------

Co-authored-by: Robert Joonas <robertjoonas16@gmail.com>
2024-05-09 13:13:19 +01:00
Adrian Gruntkowski
d8435f2e01
Remove imports_exports and csv_imports_exports feature flags (#4089) 2024-05-09 10:09:24 +02:00
ruslandoga
84ed7988a8
Drop local_start_date/1 (#4088) 2024-05-09 08:59:23 +02:00
Karl-Aksel Puulmann
0a883f10e7
Refactor: Use common current_visitors code (#4071)
* Use common module for counting current visitors in external stats controller

* Refactor spike notifier, remove now-dead code
2024-05-07 15:03:37 +03:00
Karl-Aksel Puulmann
64850cd00f
Remove maybe_drop_prop_filter (#4066)
* Fix event props paygate

Previous code wasn't properly omitting event property filters from
queries.

Discovered while refactoring the code. Extracting fix from refactor for easier reviewability.

* a

* Drop function
2024-05-07 15:03:16 +03:00
ruslandoga
02d4709be7
add csv fixture for e2e export/import test (#4037)
* add inline csv fixture

* use new csvs

* cleanup csv reading and site_id replacing

* perform comparisons between native and imported queries

* help help help

* help help

* help

* eh

* fin

* exclude export/import e2e test when experimental_reduced_joins flag is enabled

* adapt to new pageviews

* adapt to experimental_reduced_joins

* credo is formatter

* cleanup

* assert bounce rates equal in city breakdown

* fix rebase against master

* clean-up dataset

* update comment

* fix typo

* apply csv changes to the files

* use sessions timestamp for exports' dates

---------

Co-authored-by: RobertJoonas <56999674+RobertJoonas@users.noreply.github.com>
2024-05-07 08:48:22 +01:00
Adrian Gruntkowski
d7ca8d9600
Revert "Debug queries for super-admins (#4010)" (#4073)
This reverts commit dd493fdad2.
2024-05-06 15:05:17 +02:00
Karl-Aksel Puulmann
17f812443d
Return session in each time bucket its active in for hourly/minute timeseries (#4052)
* Fix typo in test name

* Update test_helper, enable experimental_session_count together with experimental_reduced_joins

* Return session in each time bucket its active in for hourly/minute timeseries

The behavior is behind experimental_session_count flag

This results in more accurate visitor showing compared to previous approach of showing each user only
active the _last_ time they did a pageview.

Were not doing this for monthly/weekly graphs due to query performance cost and it having a small effect there.
See also https://3.basecamp.com/5308029/buckets/35611491/messages/7085680123

* Add tests for new behavior

Note the new behavior mimics the old one precisely, these tests fail if only
experimental_reduced_joins is on, but not experimental_session_count

* Type erasure

* Dead comment remove

* Expected_plot change
2024-05-05 11:44:43 +03:00
ruslandoga
972dd5d150
redirect to s3 url when downloading exports (#4002)
* redirect to s3 url

* use new on_ee macro, reduce wait time for email to five seconds
2024-05-02 19:53:12 +01:00
ruslandoga
8712e91bcb
drop time on page from exports (#4051) 2024-05-02 08:29:24 +01:00
RobertJoonas
bfdadc2eee
Include breakdown property in the Query struct (#4053)
* keep breakdown prop in the query struct

* Explicitly ignore property param in aggregate and timeseries

Since parameter validation depends on the breakdown property, we need to
make sure it doesn't have any unexpected effect in endpoints where it's
not expected.
2024-04-30 18:43:46 +01:00
Adrian Gruntkowski
41fef85d29
Implement resumable GA4 imports to work around rate limiting (#4049)
* Add support for resuming import to GA4 importer

* Handle rate limiting gracefully for all remainig GA4 HTTP requests

* Show notice tooltip for long running imports

* Bump resume job schedule delay to 65 minutes

* Fix tooltip styling
2024-04-30 18:06:18 +02:00
hq1
dd493fdad2
Debug queries for super-admins (#4010)
* Debug queries for super-admins

* Fixup

* Update lib/plausible/clickhouse_repo.ex

Co-authored-by: ruslandoga <doga.ruslan@gmail.com>

* Try again with https://github.com/plausible/analytics/pull/3699

It's still clunky 😅

Co-authored-by: Karl-Aksel Puulmann <macobo@users.noreply.github.com>
Co-authored-by: ruslandoga <doga.ruslan@gmail.com>

* Move headers injection to a separate plug module

* Add tests

* Update repo test

* Format

* Add moduledoc

* Don't assume order in query_log

* Be patient about query_log maybe?

* huh?

* huh2

* Wait longer

* Guard against \x00 in response header - testing on stage

* Fixup

* fixup

* fixup

* s/debug_label/label

* Include `site_id` and `metadata` in `log_comment`

* Tolerate non-serializable log_comment contents

---------

Co-authored-by: ruslandoga <doga.ruslan@gmail.com>
Co-authored-by: Karl-Aksel Puulmann <macobo@users.noreply.github.com>
2024-04-30 09:57:28 +02:00
hq1
ad9141a9d0
Display tooltips on plan change when limits exceeded (#4048)
* Reapply "Display upgrade tooltips for exceeded limits (#4032)"

This reverts commit 76e910d45c.

* Switch to alpinejs controlled tooltips

Co-authored-by: Robert Joonas <robertjoonas16@gmail.com>

* Remove unused selector

* Refactor plan limits warning and extract tooltip component

* Remove redundant check

---------

Co-authored-by: Robert Joonas <robertjoonas16@gmail.com>
Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>
2024-04-29 11:16:15 +02:00
hq1
b2009aa158
Rely on con_cache telemetry (#4019)
* Rely on con_cache telemetry

Now that https://github.com/sasa1977/con_cache/pull/76
is released, we don't have to use low-level operations
to emit hit/miss events.

This PR also wraps cache processes with
a function returning appropriate child specs lists.

Ideally each cache will have its own supervisor/child specs
going forward. This is an intermediate step in that direction.

* Update lib/plausible/application.ex

Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>

* Declare caches without warmers with plain child specs

---------

Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>
2024-04-29 11:00:53 +02:00
RobertJoonas
726fe2d982
fix test (#4038) 2024-04-29 09:30:30 +01:00
hq1
d6824de1ad
Rename internal build symbols (#3942)
* Rename internal build symbols

* Rename remaining + add `on_ce` macro

cc @ruslandoga
2024-04-29 08:05:33 +02:00
Adrian Gruntkowski
ca25b6c764
Bump GA4 import page limit to max 250k (#4045)
* Bump GA4 page limit to 250k

* Handle empty properties list response gracefully
2024-04-26 20:41:42 +02:00
Adam Rutkowski
76e910d45c Revert "Display upgrade tooltips for exceeded limits (#4032)"
This reverts commit 4a372a1528.
2024-04-23 12:53:26 +02:00
hq1
4a372a1528
Display upgrade tooltips for exceeded limits (#4032) 2024-04-23 10:42:48 +02:00
Adrian Gruntkowski
a8ea4ce54b
Add new actions to CRM (#4030)
* Implement `Auth.TOTP.force_disable/1`

* Add "Reset 2FA" action to users CRM

* Add `Purge.reset!/2` variant allowing to set arbitrary cutoff time

* Add ability to set native stats start time from CRM

* Revert "Add `Purge.reset!/2` variant allowing to set arbitrary cutoff time"

This reverts commit 6f294d5d58.

* Add test for CRM site update action
2024-04-23 10:29:49 +02:00
hq1
148413afbb
Fix dogfooding invitation event (#4023)
* Fix dogfooding invitation event

* Fix small build
2024-04-22 16:43:04 +02:00
Adrian Gruntkowski
3023cb12fd
Introduce active_visitors to imported_pages and start populating it with activeUsers from GA4 imports (#4027)
* Import `activeUsers` into `imported_pages.active_visitors` for GA4

* Add test for active visitors

* Simplify assertion in active visitors test

* Improve assertion for active visitors further
2024-04-22 10:18:16 +02:00
Adrian Gruntkowski
fede2f0a8a
Make final touches to Imports & Exports (#4025)
* Make final touches to Imports & Exports

* Change import content copy depending on CSV imports and exports flag state

* Remove unused aliases
2024-04-19 11:40:13 +02:00
Adrian Gruntkowski
c10580777e
Remove references to site.imported_data (#4006)
* Remove references to `site.imported_data`

* Count pre-existing ID 0 imports when showing pageview count summary for legacy imports

* Fix tests after rebase

* Dry `delete_imported_stats!`

* Clean up remaining imported data references and add notes
2024-04-19 11:15:51 +02:00
Adrian Gruntkowski
9bae3ccce3
Improve UI/UX of imports view and GA import flow (#4017)
* Add runtime config option for enabled/disabling csv imports and exports

* Use the new option to toggle rendering exports UI

* Disable import buttons when at maximum imports or when option disabled for CSV

* Improve forms for GA import flow

* Add test for maximum imports reached

* Remove "Changed your mind?" prefixing back button

* Hide UA imports in Integrations when `imports_exports` flag is enabled

* Implement `csv_imports_exports` feature flag

* Revert "Add runtime config option for enabled/disabling csv imports and exports"

This reverts commit e30f202dd3.

* Send import notification email only to the user who ran the import

* Improve rendering of disabled button state

* Put import status heroicon in front of import label
2024-04-18 12:12:48 +02:00
RobertJoonas
3a371fdf4d
Test new imported metrics (GA4) (#4014)
* test fixture imported data with stats requests

* take visits metric from the events table in event:page breakdown

* Remove assert_referrers after all

pageReferrer is an event scoped property in GA4, which when queried
along with session-level dimensions will return unexpected data.

Adding the pageReferrer dimension to the GA4 Data API request, it will
cause the selected metric totals to increase significantly, even though
they shouldn't.

* Adjust sources and utm_mediums assertions

* adjust assert_pages

* Make formatter happy

---------

Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>
2024-04-18 12:12:24 +02:00
Adrian Gruntkowski
9849743407
Always sort occupied date ranges in Imported.clamp_dates/3 (#4018) 2024-04-18 11:15:51 +02:00
Adam Rutkowski
2b1fbf0a0e Fix typo (and kick docker build) 2024-04-16 20:57:29 +02:00
hq1
6fb56dc1cc
Stats api hostname filter (#4008)
* Update Stats API tests

* Revert "Remove hostname filter from the external API (#3991)"

This reverts commit 884daa7943.
2024-04-16 20:36:57 +02:00
hq1
f635f0a6d3
Hostnames shield (#3990)
* Add shield hostname rules migration

* Add hostname rule schema

* Initialize hostname rules cache

* Extend Shields context with hostname related functions

* Instrument ingestion pipeline with hostname rule lookups

* Limit hostname suggestions by shield patterns

* Add LiveView for hostname rules management

* Test hostname cache

* Rename feature flag - should be separate from hostname filter

* Remove :shield_pages feature flag

* Update CHANGELOG

* Format

* Update lib/plausible/shield/hostname_rule.ex

Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>

* Move tests from `lib/` 🤦

* Use plain `assign` where no short-circuit is necessary

* Fine tune the copy a little bit

* Prevent misplaced tests

* Treat a test with common sense

* Fixup another test that hasn't been really run before

* Make the form hint dynamic depending on rules count

---------

Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>
2024-04-16 20:30:20 +02:00
RobertJoonas
fce909d041
Improve merge imported (#4003)
* Pass the actual date args that are returned in fixtures (just for clarity)

* Change select key from operating_system to os

* Select (not set) from imported data as done from native stats

* Support breakdown by imported os_version

* Remove dead code

The `query.include_imported` is set to `false` from the start when
filters are included.

* Refactor Plausible.Stats.Imported

Extract a function that does group_by and dimension select, instead of
doing both separately inside the merge_imported function body

* Further refactor of Plausible.Stats.Imported

Get rid of code repetition

* Support breakdown by imported browser_version

* Support breakdown by imported referrer

* Support breakdown by imported referrer

* remove redundant if in select

* use greatest instead of coalesce

Co-authored-by: ruslandoga <doga.ruslan@gmail.com>

* add back the :member filter handling in Stats.Imported

---------

Co-authored-by: Uku Taht <uku.taht@gmail.com>
Co-authored-by: ruslandoga <doga.ruslan@gmail.com>
2024-04-16 16:58:22 +01:00
Adrian Gruntkowski
c07f00636d
Stop importing page referrer from GA4 (#4012)
* Stop importing page referrer from GA4

* Update GA4 import fixture

* Update fixture-based test
2024-04-16 15:35:36 +02:00
Adrian Gruntkowski
c1c03b729c
Reapply "Local CSV exports/imports and S3/UI updates (#3989)" (#3995) (#3996)
* Reapply "Local CSV exports/imports and S3/UI updates (#3989)" (#3995)

This reverts commit aee69e44c8.

* remove unused functions

* eh, that one was actually used

* ugh, they were both used

---------

Co-authored-by: ruslandoga <67764432+ruslandoga@users.noreply.github.com>
2024-04-11 09:15:01 +02:00
RobertJoonas
5163880968
fix test (#4001)
* fix test

* Update test/plausible_web/controllers/api/stats_controller/suggestions_test.exs

Co-authored-by: hq1 <hq@mtod.org>

* format

---------

Co-authored-by: hq1 <hq@mtod.org>
2024-04-10 12:40:42 +01:00
hq1
378d3bc6f5
Discard sessions switching hostnames for UTM/referrer breakdowns (#4000)
* Discard sessions switching hostnames for UTM/referrer breakdowns

Co-authored-by: Uku Taht <uku.taht@gmail.com>

* Format

---------

Co-authored-by: Uku Taht <uku.taht@gmail.com>
2024-04-10 11:38:15 +02:00
Adrian Gruntkowski
aee69e44c8
Revert "Local CSV exports/imports and S3/UI updates (#3989)" (#3995)
This reverts commit 1a0cb52f95.
2024-04-09 21:26:23 +02:00
ruslandoga
1a0cb52f95
Local CSV exports/imports and S3/UI updates (#3989)
* local CSV exports/imports and S3 updates

* credo

* dialyzer

* refactor input columns

* fix ci minio/clickhouse tests

* Update lib/plausible_web/live/csv_export.ex

Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>

* fix date range filter in export_pages_q and process only pageviews

* remove toTimeZone(zero_timestamp) note

* use SiteImport.pending(), SiteImport.importing()

* escape [SiteImport.pending(), SiteImport.importing()]

* use random s3 keys for imports to avoid collisions (sometimes makes the upload get stuck)

* clamp import date ranges

* site is already in assigns

* recompute cutoff date each time

* use toDate(timestamp[, timezone]) shortcut

* show alreats on export cancel/delete and extract hint into a component

* switch to Imported.clamp_dates/4

* reprocess tables when imports are added

* recompute cutoff_date on each call

* actually use clamped_date_range on submit

* add warning message

* add expiry rules to buckets in make minio

* add site_id to imports notifications and use it in csv_importer

* try/catch safer

* return :ok

* date range is not available when no uploads

* improve ui and warning messages

* use Generic.notice

* fix flaky exports test

* begin tests

* Improve `Importer` notification payload shape

---------

Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>
2024-04-09 20:59:48 +02:00
Adrian Gruntkowski
bb108450cb
Fix flaky google import API tests due to hardcoded import ID (#3994) 2024-04-09 19:55:26 +02:00
hq1
884daa7943
Remove hostname filter from the external API (#3991) 2024-04-09 18:03:06 +02:00
Karl-Aksel Puulmann
441412a164
Return 400 when using invalid filters for stats api (#3986)
Currently a 500 is returned instead and logged to sentry.
2024-04-09 15:36:17 +03:00
Adrian Gruntkowski
14b00c6ac3
Fix dry run mode in DataMigration.SiteImports (#3988) 2024-04-09 13:04:17 +02:00
Adrian Gruntkowski
1c1ea95e16
Ensure only complete imports are considered in site imports data migration (#3987)
* Ensure only complete imports are considered in site imports data migration

* Refactor `SiteImports` data migration for clarity (h/t @RobertJoonas)

* Fix tests
2024-04-09 11:49:28 +02:00
Adrian Gruntkowski
d796788715
Keep sites.imported_data in sync with backfilled SiteImport when migrating (#3979)
* Keep `sites.imported_data` in sync with backfilled `SiteImport` when migrating

* Consider only completed site imports in data migration
2024-04-09 09:04:51 +02:00
ruslandoga
94deb89b9d
remove Plausible Team footer from self-hosted emails (#3980)
* remove Plausible Team footer from self-hosted

* don't test unsubscribe placeholder in small build
2024-04-09 09:04:23 +02:00
Adrian Gruntkowski
b951065724
Refactor Imported.check_dates (->clamp_dates) for better felxibility (#3983) 2024-04-09 09:04:11 +02:00
Karl-Aksel Puulmann
a6d4786959
Worker to clean site data from ClickHouse (#3959)
* Create a worker to clean clickhouse deleted sites data

The plan is to run this weekly, but going to trigger it manually the first few times on cloud

* Make asserting count more reliable

* credo

* PR feedback

* Fixes
2024-04-08 12:26:38 +03:00
Adrian Gruntkowski
a7603c9e49
Improve import procedure to ensure no time range overlaps (#3970)
* Always scope import ID by site as well

* Do not schedule new import job if there are any site imports in progress

* Disable import buttons when any import is in progress

* Simplify `schedule_job/4` (h/t @RobertJoonas)
2024-04-04 18:56:36 +02:00
Adrian Gruntkowski
cffff0340c
Handle Google API timeouts gracefully during imports (#3975) 2024-04-04 18:55:39 +02:00
Adrian Gruntkowski
33eed9d7db
Delete imports which have no stats (#3972) 2024-04-04 18:55:14 +02:00
hq1
f9f0407d68
Remove experimtnal_hostname_filter and keep it on by default (#3973)
* Remove `experimental_hostname_filter` and keep it on by default

* Catch up with changes done via e5b56dbe6
2024-04-04 17:20:16 +02:00
RobertJoonas
e5b56dbe62
Refactor VisitorGraph (#3936)
* Give a more semantic name to a function

* Make the LineGraph component thinner

* Move LineGraph into a separate file

* Move interval logic into interval-picker.js

This commit also fixes a bug where the interval name displayed inside
the picker component flickers the default interval when the graph is
loading.

The problem was that we were counting on graphData for returning us the
current interval: `let currentInterval = graphData?.interval`

We should always know the default interval before making the main-graph
request. Sending graphData to IntervalPicker component does not make
sense anyway.

* extract data fetching functions out of VisitorGraph component

* Return graph_metric key from Top Stats API

This commit introduces no behavioral changes - only starts returning an
additional field, allowing us to avoid the following logic in React:

1. Finding the metric names, given a stat display name. E.g.
   `Unique visitors (last 30 min) -> visitors`

2. Checking if a metric is graphable or not

* Move metric state into localStorage

This commit gets rid of the internal `metric` state in the VisitorGraph
component and starts using localStorage for that instead.

This commit also chains the main-graph request into the top-stats request
callback - meaning that we'll always fetch new graph data after top stats
are updated. And we do it all in a single function.

Doing so simplifies the loading state significantly, and also helps to
make it clear, that at all times, existing top stats are required before
we can fetch the graph. That's because the metric is determined by which
Top stats are returned (for example, we can't be sure whether revenue
metrics will be returned or not).

* Make sure graph tooltip says "Converted Visitors"

* Extract a StatsExport function component

Again, instead of relying on `graphData?.interval` we can read it from
localStorage, or default to the largest interval available. The export
should not be dependant on the graph.

* Extract SamplingNotice function component

* Extract WithImportedSwitch function component

* Stop "lazy-loading" the graph and top stats

Since the container is always on top on the page, it will be visible on
the first render in any case - no matter the screen size.

* Turn VisitorGraph into a function component

* Display empty container until everything has loaded

* Do not display loading spinner on realtime ticks

* Turn Top Stats into a fn component

* fetch top stats and graph async

* Make sure revenue metrics can remain on the graph

* Add an extra check to canMetricBeGraphed

* fix typo

* remove redundant double negation
2024-04-04 13:39:55 +01:00
Karl-Aksel Puulmann
3115c6e7a8
Reducing JOINs in queries (#3966)
* Move experimental_session_count? logic to within query object

* WIP new querying system for deciding what tables to query

* both -> either

* Include sample_percent in both tables

* Remove a hanging TODO

* Allow filtering by visit props on event queries if flag is on

* Make default sessions join more conditional

* Simplify events_join_sessions?

* Add some TODOs

* Fix assignment

* Handle entry/exit page visit props separately from props stored in events table

* Update test which created sessions/events differently from everyone else

* Make query_events private

* Dont filter by session properties on events table if querying sessions and joining in events

* Handle visits, pageviews, events and visitors metrics from other table

* both -> either

* events, pageviews are strictly event metrics

* Add support for (plain) breakdowns deciding which table to use

* Run tests with experimental_reduced_joins as a separate job

Also refactor which tests are run with postgres:15 to reduce number of jobs

* moduledocs for TableDecider

* Fix matrix

* Custom build name

* Move TEST_EXPERIMENTAL_REDUCED_JOINS check

* Handle percentage separately from other metrics

* Remove debug code

* TableDecider tests

* both => sample_percent

* Improve naming

* Simplify code

* Breakdowns retain old behavior if getting metric visitors

* Unify behavior of entry/exit page hostnames with rest

* Fix test naming
2024-04-04 13:54:23 +03:00
hq1
6af80dd246
Filter by hostnames (#3963)
* CH Migration: exit/entry hostnames in sessions_v2

* Leave only exit_page_hostname, we already record hostnames

* Use ClickHouse DDL in favour of ecto so that cluster is included

* Compress with ZSTD(3)

* Expose Hostname filter in the dashboard dropdown

* Add `exit_page_hostname` to ClickHouse `sessions_v2` schema

* Start tracking hostname changes in sessions

* Implement hostname filter suggestions

* Enable filtering by `event:hostname`

* Add tests for filtering by hostnames

* Ensure filter suggestions work for exit pages too

* Allow overriding hostnames with `send_pageview` mix task

* Remove `:window_time_on_page` flag

It seems that we can remove it after all?

* Initialize `experimental_hostname_filter` query parameter

* Rewrite cache store behaviour with regards to session hostnames

* Work around inconsistent session merging

So that `populate_stats` can get closer to actual ingestion

* Improve top stats test

* Make it possible to filter sessions by entry/exit hostnames

* Update pages tests

* Expose `experimental_hostname_filtering` temporarily in the UI

* Untested yet: also apply experimental filtering to sources

* Introduce `hostname_filter` feature flag

* Format

* Test top sources with hostname filter + experimental flag
2024-04-04 10:48:30 +02:00
Adrian Gruntkowski
e6d83e946f
Populate new columns in imports and exports (#3969)
* Extend `Imported*` schemas with newly added columns

* Populate newly added `Imported*` fields in GA4 imports

* Extend exports with newly added fields

* Extend CSV importer to ingest new fields

* Fix alias shadowing error

* Add more extensive GA4 import fixtures

* Apply rounding and casting to sampled visits
2024-04-04 10:33:19 +02:00
Adrian Gruntkowski
9f27fa303c
Fix dry run mode in DataMigration.SiteImports (#3965) 2024-04-02 14:05:34 +02:00
Adrian Gruntkowski
23a3699dd7
Improve import stats toggle and with_imported flag computation (#3960)
* Check import presence across all imports and not just the first one

Also, simplify imported data toggle rendering to not explicitly
refer to the earliest import source.

* Change imported stats toggle icon in dashboard

* Test `Imported.get_imports_date_range/1`

* Simplify failed UA/GA import email copy
2024-04-02 12:53:19 +02:00
Adrian Gruntkowski
71fe541359
Implement script for backfilling legacy site import entries and adjusting end dates of site imports (#3954)
* Always select and clear import ID 0 when referring to legacy imports

* Implement script for adding site import entries and adjusting end dates

* Log cases where end date computation is using fallback

* Don't log queries when running the migration to reduce noise
2024-04-02 12:53:02 +02:00
Adrian Gruntkowski
5bf59d1d8a
Implement adjusting imported date range to actual and existing stats (#3943)
* Implement adjusting imported date range to actual and existing stats

* Drop redundant prefix from import list entries

* Make pageview numbers in imports list formatted for readability

* Test and improve date range cropping

* DRY UA and GA4 stats start and end date API calls

* Extend UA/GA import controller tests and improve error handling

* refactor finding longest open range without existing data

* Fix typo in test description

Co-authored-by: RobertJoonas <56999674+RobertJoonas@users.noreply.github.com>

* Rename `open_ranges` to `free_ranges`

---------

Co-authored-by: Robert Joonas <robertjoonas16@gmail.com>
Co-authored-by: RobertJoonas <56999674+RobertJoonas@users.noreply.github.com>
2024-03-28 09:32:41 +01:00
hq1
b31433a7bf
Ensure all the react container attributes are strings (#3948) 2024-03-26 11:01:59 +01:00
hq1
edf70d14b6
Use sessionStorage for "dashboard first launch" banner tracking (#3892)
* Use sessionStorage for offer e-mail report banner tracking

Keeping it within the cookie is problematic, as the banners don't
expire and overflow the cookie with data when enough new sites
are added.

Ref https://github.com/plausible/analytics/issues/3762

* Update changelog

* Extract a component

* Make is_dbip evaluate to quoted boolean
2024-03-26 09:49:15 +01:00
Karl-Aksel Puulmann
4af7019011
Ignore sessions without entry/exit pages when breaking down entry/exit pages (#3933)
* Ignore sessions without entry/exit pages when breaking down entry/exit pages

* Update stats controller tests to have more realistic test data (pageview followed by event)
2024-03-26 09:01:07 +02:00
hq1
2fae0146a4
Reapply 3918 (#3940)
* Reapply "Pages shield (#3918)"

This reverts commit 33b5c10654.

* Make the FF check work against the site actor
2024-03-25 10:36:22 +01:00
hq1
9989ce6927
Migration for 3918 (#3939)
* Revert "Pages shield (#3918)"

This reverts commit 53f94a9f82.

* Migration: Shield page rules
2024-03-25 10:19:50 +01:00
hq1
53f94a9f82
Pages shield (#3918)
* Migration: Shield page rules

* Add Ecto schema for Page Rules

* Add Page Rule cache

* Fix typo

* BTW: Use already imported function

* Extend Shields context interface + split existing tests

* Ingestion: filter matching patches + refactor shield actions

* Add LV section for adding Page Rules

* Validate max page path length

* Put Pages Shield behind a feature flag

* Update CHANGELOG

* Update docs link anchor

As per https://github.com/plausible/docs/pull/477

* Update lib/plausible/shields.ex

Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>

* Update lib/plausible_web/live/shields/page_rules.ex

Co-authored-by: ruslandoga <doga.ruslan@gmail.com>

* Update lib/plausible_web/live/shields/page_rules.ex

Co-authored-by: ruslandoga <doga.ruslan@gmail.com>

---------

Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>
Co-authored-by: ruslandoga <doga.ruslan@gmail.com>
2024-03-25 09:48:56 +01:00
Adrian Gruntkowski
ba5b80a8c0
Add label to site imports and populate it (#3914) 2024-03-22 11:17:02 +01:00
RobertJoonas
d6e1e8bebd
Put total conversions on the graph + goal-filtered CSV export improvements (#3929)
* Add validation for the events metric in main_graph

* Test the already existing events metric support in main-graph

* Put total conversions on the graph

* extract main_graph_csv function (refactor only)

* add total_conversions and conversion_rate to goal-filtered visitors.csv

* update changelog
2024-03-22 09:35:23 +00:00
Uku Taht
fd879eeb16
Store referrers from android apps (#3715)
* Store referrers from android apps

* Add test for unknown referrer protocol

* Store android referrer protocol
2024-03-21 17:45:34 +02:00
RobertJoonas
c32779a3e5
Timeseries for conversion rate (#3919)
* add conversion rate to Stats API timeseries

* make sure CR can be queried as the only metric

* add a test asserting zeros are returned

* add tests for filtering by other properties at the same time

* Remove unnecessary validation of params

1. It doesn't make to validate `interval` (and its granularity) in all
   endpoints. It's only relevant for the main graph.

2. The plug (renamed to `date_validation_plug`) already makes sure that
   the dates are validated. No need to call the same function again in
   Top Stats and Funnel endpoints.

* add metric validation to main graph

* Add tests for main graph API

* put conversion rate on the graph

* update changelog

* Add revenue metrics into metrics.ex

* make fn private

* avoid setting graph metric to visitors in goal-filtered view
2024-03-21 13:58:00 +00:00
Adrian Gruntkowski
d6e81670e4
Unify UA and GA4 import flow into one (#3888)
* Unify GA4 and UA import flow into one

* Clean up property and view data retrieval via Google HTTP APIs

* Turn `Map.get` into `Map.fetch!` in API response processing code

* Bump list account summaries page size limit to max of 200

* Show only views in legacy flow and fix legacy redirect after import start

* Move google analytics import actions tests to a separate module

* Extend Google Analytics controller tests

* DRY up `property?` predicate (h/t @RobertJoonas)
2024-03-21 11:37:10 +01:00
ruslandoga
5f9465614b
Include domain and dates in zip archive filename (#3921)
* include domain and dates in zip archive filename

* adapt to comments
2024-03-21 11:35:42 +01:00
Karl-Aksel Puulmann
32ab138301
Fix issue with name clash (#3925)
Unexpectedly, table.name caused a name clash after CR refactor, so using a unique name
for the output column

Sentry issue: https://sentry.plausible.io/organizations/sentry/issues/5612
2024-03-21 10:11:29 +00:00
Karl-Aksel Puulmann
c219652dae
Re-apply Move conversion_rate logic from elixir to clickhouse (#3924)
* Revert "Revert "Move conversion_rate logic from elixir to clickhouse (#3887)"…"

This reverts commit 253fb5d67d.

* Fix issue with missing columns

The issue came from refactoring event:goal UNION ALL logic and trying to move
name select from first to last. If any other tables were joined, the incorrect
item would be used as an array index, causing this issue.

Added a relevant test.
2024-03-21 10:48:41 +02:00
Karl-Aksel Puulmann
253fb5d67d
Revert "Move conversion_rate logic from elixir to clickhouse (#3887)" (#3923)
This reverts commit 1909743b90.
2024-03-21 09:53:31 +02:00
Karl-Aksel Puulmann
1909743b90
Move conversion_rate logic from elixir to clickhouse (#3887)
* Separate out query building from pagination/execution logic.

* Refactor pageview_goals breakdown query, removing index column from results

* Remove zip_columns logic

* Use common pagination util

* Do everything in a single query for breakdowns for goals

* Order in DB

* Make sure column order is identical

* Calculate CR within the goal breakdown query

* Calculate CR for property breakdowns

* WIP: Calculate group CR

* CR with order_by

* Compatibility fix

* Import Ecto.Query and cleanup

* handle total_visitors the same way as add_percentage

* Handle conversion_rate in aggregate.ex

* Solve rebase fail

* Simplify maybe_add_group_conversion_rate

* Add conversion_rate defaults to 0 test

* Add test for conversion_rate should not be calculated with imported data (failing here and on master)

* Dont include imported data when breakdown by prop or goal

* Remove revenue_nils
2024-03-21 09:38:44 +02:00