* More fixes to the verification process:
- extend the list of known attributes
- follow redirect chain host to verify data-domain difference
- display the actual data-domain in error message, not URL
* format
* fix test
* remove redundant pattern matching
* Update lib/plausible/verification/checks/snippet.ex
Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>
---------
Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>
* Simplify browserless script and catch errors
* Improve fetching body:
- follow up on max. 4 redirects
- rely on Req default timeouts (wait much longer to account for slow
sites)
* Improve installation check:
- rely on Req default timeouts, wait longer
- log service errors as warnings
- use stealth mode to bypass captchas
* Stop cutting off body too large
* Improve diagnostics for known failures
* Another round of diagnostic improvements
* Format
* Add a test for callback status 500
* Allow running browserless.io locally
* Compile tailwind classes based on extra/ too
* Add browserless runtime configuration
* Ignore verification events on ingestion
* Improve extracting HTML text in tests
* Update dependencies
- Floki will be used on production to parse site contents
- Req will be used to handle redundant stuff like retrying etc.
* Add shuttle SVG to generic components
Later on we'll use it to indicate verification errors
* Connect live socket & allow skipping awaiting the first pageview
* Connect live socket in general settings
* Implement verification checks & diagnostics
* Stub remote services with Req for testing
* Change snippet screen copy
* Update tracker script, so that:
1. headless browsers aren't ignored if `window.__plausible` is defined
2. callback optionally supplies the event response HTTP status
This will be later used to check whether the server acknowledged
the verification event.
* Implement LiveView verification UI
* Embed the verification UIs into settings and onboarding
* Implement browserless puppeteer verification script
It:
- tries to visit the site
- defines window.__plausible, so the tracker doesn't ignore test events
- sends a verification event and instruments the callback
- awaits the callback to fire and returns the result
* Improve diagnostics for CSP
Only report CSP error if the snippet is already found
* Put verification behind a feature flag/env setting
* Contact Us hint only for Enterprise Edition
* For headless code, use JS context instead of EEx interpolation
* Update diagnostics test with WordPress scenarios
* Shorten exception/throw interception
* Rename test
* Tidy up
* Bust URL always on headless check
* Update moduledoc
* Detect official Plausible WordPress Plugin
and act accordingly on diagnostics interoperation
* Stop using 'rating' in favour of 'interpretation'
* Only report CSP error if no proxy is likely
* Update CHANGELOG
* Allow event-* attributes on snippet elements
* Improve naive GTM detection, not to confuse it with GA4
* Update lib/plausible/verification.ex
Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>
* Update test/plausible/site/verification/checks_test.exs
Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>
* s/perform_wrapped/perform_safe
* Update lib/plausible/verification/checks/installation.ex
Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>
* Remove garbage
---------
Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>
Deployment needs to be coordinated with WordPress plugin
update. Reasoning: there's too much confusion coming
from the use of the "API" terminology, leading to wrong
keys being generated by users.
* Tidy up breakdown helper functions in CSV importer tests
* Fix a typo
* Extend city breakdown tests in CSV importer test suite
* Make visit duration computation consistent across native and imported data
* Fix broken hostname property and handle missing imported metrics gracefully
* Add test for CSV export of imported data
* Add extra coverage for property and metric combox which were failing
* Compute visit duration and bounce rate for exit pages in imported data
* Drop support for breaking down by `event:hostname` property for now
Currently, business tier users created after business tier launch can't
access Stats API due to faulty grandfathering logic. This change should
fix that.
* Export and import custom events via CSV
* Add prop support of url for cloaked links and path for 404s in imported queries
* Handle custom events with empty URL and path properties gracefully
* Make events with properties logic DRY and fix missed cloaked link
* Add test for path property breakdown
* Update raw CH data fixture and extend CSV importer tests
* Fix broken query condition after rebase
* Update CHANGELOG.md
* New struct format for query after parsing
* WIP refactoring
* WIP: Validations working
* WIP: tuple to list
* continued refactoring
* WIP: parsing defaults
* Breakdown tests pass
* Window functions fix
* Fix default
* Remove dead argument
* Update filters tests
* Update query_test.exs
* Fix table_decider
* sources tests pass
* Filter suggestions fix
* revenue/goal filter applied refactor
* Update top_stats matching
* Get stats_controller tests passing
* Update neighbor_aggregate_time_on_page
* Refactor Query.remove_event_filters into Query.remove_filters, add new callsites
* Move goal where clause building to new WhereBuilder module
* Move event:name filters
* Move more filters to WhereBuilder
* Update fragment to allow non-static meta columns
* Build where clause for events table using WhereBuilder
* Build sessions table where clause using WhereBuilder
* Move time range filtering and site checking to WhereBuilder
* WhereBuilder.build_condition method
* Remove TODO
* _rest pattern for TableDecider, Query pattern matching
Future-proofing in a tiny way
* Hacky fix to get tests passing for Google API tests
* Typespec fix
* Merge conflict
* refactor special goal filter logic in imported.ex
* Docs feedback
* put_filter
---------
Co-authored-by: Robert Joonas <robertjoonas16@gmail.com>
* Apply filters in search console request
* Remove dead code from search console modal
* Remove unimportant information from keyword modal
* Show invalid filters from search console
* Fix tests
* Add/Fix tests
* Fix typo
* Remove unused variable
* Fix typo
* Changelog entry
* Fix Credo
* Display impressions, CTR and position in keyword modal
* Undo change that should not have been committed
* Fix test
* Fix test
* filters -> search_console_filters
* Delete dead code
* Speed up calculating usage for users with many many sites
Currently, the settings page time outs for a user with 14k sites.
This PR speeds things up by:
1. Doing the work in parallel (max 10 queries at once)
2. Increasing chunking size (300 -> 1000)
Note that the query is relatively lightweight on clickhouse - running
these queries manually takes ~70ms. If this becomes slow we can also
introduce a PROJECTION to speed up the calculation, but this wasn't a
bottleneck currently.
On chunking size:
ClickHouse can handle even 10k site_ids in a single query fast if run
via clickhouse-client , but running the same query via ecto_ch it becomes
really slow (60ms vs 1s).
Not sure if this is a driver, serialization or networking issue.
* add new goal suggestions API
* silence credo
* Order suggestions from subqueries explicitly
* allow autoconfiguring goals
Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>
* Fix form modal tab switching behavior
* add test
* Remove redundant and invalid action link title
---------
Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>
* refactor filter suggestions with a more DRY approach
* Avoid DRYing props string->atom translation
---------
Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>
* Add Ecto schema for imported custom events
* Start importing custom events from GA4
* query imported goals
* make it possible to query events metric from imported
* make it possible to query pageviews in goal breakdown
* make it possible to query conversion rate
* fix rate limiting test
* add CR tests for dashboard API
* implement imported link_url breakdown
* override special custom event names coming from GA4
* allow specific goal filters in imported_q
* update GA4 import tests to use Stats API
* Improve tests slightly
* Update CHANGELOG.md
---------
Co-authored-by: Robert Joonas <robertjoonas16@gmail.com>
* Fix event props paygate
Previous code wasn't properly omitting event property filters from
queries.
Discovered while refactoring the code. Extracting fix from refactor for easier reviewability.
* a
* Drop function
* Fix typo in test name
* Update test_helper, enable experimental_session_count together with experimental_reduced_joins
* Return session in each time bucket its active in for hourly/minute timeseries
The behavior is behind experimental_session_count flag
This results in more accurate visitor showing compared to previous approach of showing each user only
active the _last_ time they did a pageview.
Were not doing this for monthly/weekly graphs due to query performance cost and it having a small effect there.
See also https://3.basecamp.com/5308029/buckets/35611491/messages/7085680123
* Add tests for new behavior
Note the new behavior mimics the old one precisely, these tests fail if only
experimental_reduced_joins is on, but not experimental_session_count
* Type erasure
* Dead comment remove
* Expected_plot change
* keep breakdown prop in the query struct
* Explicitly ignore property param in aggregate and timeseries
Since parameter validation depends on the breakdown property, we need to
make sure it doesn't have any unexpected effect in endpoints where it's
not expected.
* Add support for resuming import to GA4 importer
* Handle rate limiting gracefully for all remainig GA4 HTTP requests
* Show notice tooltip for long running imports
* Bump resume job schedule delay to 65 minutes
* Fix tooltip styling
* Rely on con_cache telemetry
Now that https://github.com/sasa1977/con_cache/pull/76
is released, we don't have to use low-level operations
to emit hit/miss events.
This PR also wraps cache processes with
a function returning appropriate child specs lists.
Ideally each cache will have its own supervisor/child specs
going forward. This is an intermediate step in that direction.
* Update lib/plausible/application.ex
Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>
* Declare caches without warmers with plain child specs
---------
Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>
* Implement `Auth.TOTP.force_disable/1`
* Add "Reset 2FA" action to users CRM
* Add `Purge.reset!/2` variant allowing to set arbitrary cutoff time
* Add ability to set native stats start time from CRM
* Revert "Add `Purge.reset!/2` variant allowing to set arbitrary cutoff time"
This reverts commit 6f294d5d58.
* Add test for CRM site update action
* Improve UI for failed import entry case
* Make failed request debugging for UA imports on par with the one for GA4
* Refine failed import state further
* Import `activeUsers` into `imported_pages.active_visitors` for GA4
* Add test for active visitors
* Simplify assertion in active visitors test
* Improve assertion for active visitors further
* Remove references to `site.imported_data`
* Count pre-existing ID 0 imports when showing pageview count summary for legacy imports
* Fix tests after rebase
* Dry `delete_imported_stats!`
* Clean up remaining imported data references and add notes
* Add runtime config option for enabled/disabling csv imports and exports
* Use the new option to toggle rendering exports UI
* Disable import buttons when at maximum imports or when option disabled for CSV
* Improve forms for GA import flow
* Add test for maximum imports reached
* Remove "Changed your mind?" prefixing back button
* Hide UA imports in Integrations when `imports_exports` flag is enabled
* Implement `csv_imports_exports` feature flag
* Revert "Add runtime config option for enabled/disabling csv imports and exports"
This reverts commit e30f202dd3.
* Send import notification email only to the user who ran the import
* Improve rendering of disabled button state
* Put import status heroicon in front of import label
* test fixture imported data with stats requests
* take visits metric from the events table in event:page breakdown
* Remove assert_referrers after all
pageReferrer is an event scoped property in GA4, which when queried
along with session-level dimensions will return unexpected data.
Adding the pageReferrer dimension to the GA4 Data API request, it will
cause the selected metric totals to increase significantly, even though
they shouldn't.
* Adjust sources and utm_mediums assertions
* adjust assert_pages
* Make formatter happy
---------
Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>
* Add shield hostname rules migration
* Add hostname rule schema
* Initialize hostname rules cache
* Extend Shields context with hostname related functions
* Instrument ingestion pipeline with hostname rule lookups
* Limit hostname suggestions by shield patterns
* Add LiveView for hostname rules management
* Test hostname cache
* Rename feature flag - should be separate from hostname filter
* Remove :shield_pages feature flag
* Update CHANGELOG
* Format
* Update lib/plausible/shield/hostname_rule.ex
Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>
* Move tests from `lib/` 🤦
* Use plain `assign` where no short-circuit is necessary
* Fine tune the copy a little bit
* Prevent misplaced tests
* Treat a test with common sense
* Fixup another test that hasn't been really run before
* Make the form hint dynamic depending on rules count
---------
Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>
* Pass the actual date args that are returned in fixtures (just for clarity)
* Change select key from operating_system to os
* Select (not set) from imported data as done from native stats
* Support breakdown by imported os_version
* Remove dead code
The `query.include_imported` is set to `false` from the start when
filters are included.
* Refactor Plausible.Stats.Imported
Extract a function that does group_by and dimension select, instead of
doing both separately inside the merge_imported function body
* Further refactor of Plausible.Stats.Imported
Get rid of code repetition
* Support breakdown by imported browser_version
* Support breakdown by imported referrer
* Support breakdown by imported referrer
* remove redundant if in select
* use greatest instead of coalesce
Co-authored-by: ruslandoga <doga.ruslan@gmail.com>
* add back the :member filter handling in Stats.Imported
---------
Co-authored-by: Uku Taht <uku.taht@gmail.com>
Co-authored-by: ruslandoga <doga.ruslan@gmail.com>
* Reapply "Local CSV exports/imports and S3/UI updates (#3989)" (#3995)
This reverts commit aee69e44c8.
* remove unused functions
* eh, that one was actually used
* ugh, they were both used
---------
Co-authored-by: ruslandoga <67764432+ruslandoga@users.noreply.github.com>
We currently update the query filtered by hostname to its
respective visit props in some cases.
This patch moves it down, from controllers, to the Breakdown module,
so any changes in logic will be also reflected in the Stats API.
* local CSV exports/imports and S3 updates
* credo
* dialyzer
* refactor input columns
* fix ci minio/clickhouse tests
* Update lib/plausible_web/live/csv_export.ex
Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>
* fix date range filter in export_pages_q and process only pageviews
* remove toTimeZone(zero_timestamp) note
* use SiteImport.pending(), SiteImport.importing()
* escape [SiteImport.pending(), SiteImport.importing()]
* use random s3 keys for imports to avoid collisions (sometimes makes the upload get stuck)
* clamp import date ranges
* site is already in assigns
* recompute cutoff date each time
* use toDate(timestamp[, timezone]) shortcut
* show alreats on export cancel/delete and extract hint into a component
* switch to Imported.clamp_dates/4
* reprocess tables when imports are added
* recompute cutoff_date on each call
* actually use clamped_date_range on submit
* add warning message
* add expiry rules to buckets in make minio
* add site_id to imports notifications and use it in csv_importer
* try/catch safer
* return :ok
* date range is not available when no uploads
* improve ui and warning messages
* use Generic.notice
* fix flaky exports test
* begin tests
* Improve `Importer` notification payload shape
---------
Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>
* Ensure only complete imports are considered in site imports data migration
* Refactor `SiteImports` data migration for clarity (h/t @RobertJoonas)
* Fix tests
* Create a worker to clean clickhouse deleted sites data
The plan is to run this weekly, but going to trigger it manually the first few times on cloud
* Make asserting count more reliable
* credo
* PR feedback
* Fixes
* Update request logging
Ultimate goal is to be able to compare results with and without a flag against each other.
To do this we need logging which displays the full request url with parameters.
Example logs:
```
14:46:09.042 request_id=F8MRLSsaKB7BeIkAAAHk [info] (200) GET /api/sites took 17ms
14:46:09.175 request_id=F8MRLTKYV3G-GqEAAAZB [info] (200) GET /api/stats/dummy.site/current-visitors took 24ms
14:46:09.396 request_id=F8MRLUDfav28LIkAAAIE [info] (202) POST /api/event took 5ms
14:46:09.501 request_id=F8MRLUDS_YhftUkAAAAD [info] (200) GET /api/stats/dummy.site/sources?period=30d&date=2024-04-04&filters=%7B%7D&with_imported=true&limit=9 took 111ms
14:46:09.508 request_id=F8MRLUDhHbK8WKUAAAah [info] (200) GET /api/stats/dummy.site/main-graph?period=30d&date=2024-04-04&filters=%7B%7D&with_imported=true&metric=visitors took 117ms
14:46:09.511 request_id=F8MRLUDS1CYntK4AAAaB [info] (200) GET /api/stats/dummy.site/entry-pages?period=30d&date=2024-04-04&filters=%7B%7D&with_imported=true&limit=9 took 121ms
14:46:09.541 request_id=F8MRLTk5sIPYSn4AABoC [info] (200) GET /api/stats/dummy.site/top-stats?period=30d&date=2024-04-04&filters=%7B%7D&with_imported=true&comparison=previous_period&compare_from=undefined&compare_to=undefined&match_day_of_week=true took 278ms
```
* re-add plug
* router_dispatch -> endpoint
* Always scope import ID by site as well
* Do not schedule new import job if there are any site imports in progress
* Disable import buttons when any import is in progress
* Simplify `schedule_job/4` (h/t @RobertJoonas)
changing the copy here as I think that in some situations the "we're still counting stats" message is now shown even to those dashboards where we've stopped counting stats so best to avoid that
* Give a more semantic name to a function
* Make the LineGraph component thinner
* Move LineGraph into a separate file
* Move interval logic into interval-picker.js
This commit also fixes a bug where the interval name displayed inside
the picker component flickers the default interval when the graph is
loading.
The problem was that we were counting on graphData for returning us the
current interval: `let currentInterval = graphData?.interval`
We should always know the default interval before making the main-graph
request. Sending graphData to IntervalPicker component does not make
sense anyway.
* extract data fetching functions out of VisitorGraph component
* Return graph_metric key from Top Stats API
This commit introduces no behavioral changes - only starts returning an
additional field, allowing us to avoid the following logic in React:
1. Finding the metric names, given a stat display name. E.g.
`Unique visitors (last 30 min) -> visitors`
2. Checking if a metric is graphable or not
* Move metric state into localStorage
This commit gets rid of the internal `metric` state in the VisitorGraph
component and starts using localStorage for that instead.
This commit also chains the main-graph request into the top-stats request
callback - meaning that we'll always fetch new graph data after top stats
are updated. And we do it all in a single function.
Doing so simplifies the loading state significantly, and also helps to
make it clear, that at all times, existing top stats are required before
we can fetch the graph. That's because the metric is determined by which
Top stats are returned (for example, we can't be sure whether revenue
metrics will be returned or not).
* Make sure graph tooltip says "Converted Visitors"
* Extract a StatsExport function component
Again, instead of relying on `graphData?.interval` we can read it from
localStorage, or default to the largest interval available. The export
should not be dependant on the graph.
* Extract SamplingNotice function component
* Extract WithImportedSwitch function component
* Stop "lazy-loading" the graph and top stats
Since the container is always on top on the page, it will be visible on
the first render in any case - no matter the screen size.
* Turn VisitorGraph into a function component
* Display empty container until everything has loaded
* Do not display loading spinner on realtime ticks
* Turn Top Stats into a fn component
* fetch top stats and graph async
* Make sure revenue metrics can remain on the graph
* Add an extra check to canMetricBeGraphed
* fix typo
* remove redundant double negation
* Move experimental_session_count? logic to within query object
* WIP new querying system for deciding what tables to query
* both -> either
* Include sample_percent in both tables
* Remove a hanging TODO
* Allow filtering by visit props on event queries if flag is on
* Make default sessions join more conditional
* Simplify events_join_sessions?
* Add some TODOs
* Fix assignment
* Handle entry/exit page visit props separately from props stored in events table
* Update test which created sessions/events differently from everyone else
* Make query_events private
* Dont filter by session properties on events table if querying sessions and joining in events
* Handle visits, pageviews, events and visitors metrics from other table
* both -> either
* events, pageviews are strictly event metrics
* Add support for (plain) breakdowns deciding which table to use
* Run tests with experimental_reduced_joins as a separate job
Also refactor which tests are run with postgres:15 to reduce number of jobs
* moduledocs for TableDecider
* Fix matrix
* Custom build name
* Move TEST_EXPERIMENTAL_REDUCED_JOINS check
* Handle percentage separately from other metrics
* Remove debug code
* TableDecider tests
* both => sample_percent
* Improve naming
* Simplify code
* Breakdowns retain old behavior if getting metric visitors
* Unify behavior of entry/exit page hostnames with rest
* Fix test naming
* CH Migration: exit/entry hostnames in sessions_v2
* Leave only exit_page_hostname, we already record hostnames
* Use ClickHouse DDL in favour of ecto so that cluster is included
* Compress with ZSTD(3)
* Expose Hostname filter in the dashboard dropdown
* Add `exit_page_hostname` to ClickHouse `sessions_v2` schema
* Start tracking hostname changes in sessions
* Implement hostname filter suggestions
* Enable filtering by `event:hostname`
* Add tests for filtering by hostnames
* Ensure filter suggestions work for exit pages too
* Allow overriding hostnames with `send_pageview` mix task
* Remove `:window_time_on_page` flag
It seems that we can remove it after all?
* Initialize `experimental_hostname_filter` query parameter
* Rewrite cache store behaviour with regards to session hostnames
* Work around inconsistent session merging
So that `populate_stats` can get closer to actual ingestion
* Improve top stats test
* Make it possible to filter sessions by entry/exit hostnames
* Update pages tests
* Expose `experimental_hostname_filtering` temporarily in the UI
* Untested yet: also apply experimental filtering to sources
* Introduce `hostname_filter` feature flag
* Format
* Test top sources with hostname filter + experimental flag
* Check import presence across all imports and not just the first one
Also, simplify imported data toggle rendering to not explicitly
refer to the earliest import source.
* Change imported stats toggle icon in dashboard
* Test `Imported.get_imports_date_range/1`
* Simplify failed UA/GA import email copy
* Always select and clear import ID 0 when referring to legacy imports
* Implement script for adding site import entries and adjusting end dates
* Log cases where end date computation is using fallback
* Don't log queries when running the migration to reduce noise