analytics

mirror of https://github.com/plausible/analytics.git synced 2024-11-26 23:27:54 +03:00

Author	SHA1	Message	Date
Adrian Gruntkowski	aee69e44c8	Revert "Local CSV exports/imports and S3/UI updates (#3989 )" (#3995 ) This reverts commit `1a0cb52f95`.	2024-04-09 21:26:23 +02:00
ruslandoga	1a0cb52f95	Local CSV exports/imports and S3/UI updates (#3989 ) * local CSV exports/imports and S3 updates * credo * dialyzer * refactor input columns * fix ci minio/clickhouse tests * Update lib/plausible_web/live/csv_export.ex Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com> * fix date range filter in export_pages_q and process only pageviews * remove toTimeZone(zero_timestamp) note * use SiteImport.pending(), SiteImport.importing() * escape [SiteImport.pending(), SiteImport.importing()] * use random s3 keys for imports to avoid collisions (sometimes makes the upload get stuck) * clamp import date ranges * site is already in assigns * recompute cutoff date each time * use toDate(timestamp[, timezone]) shortcut * show alerts on export cancel/delete and extract hint into a component * switch to Imported.clamp_dates/4 * reprocess tables when imports are added * recompute cutoff_date on each call * actually use clamped_date_range on submit * add warning message * add expiry rules to buckets in make minio * add site_id to imports notifications and use it in csv_importer * try/catch safer * return :ok * date range is not available when no uploads * improve ui and warning messages * use Generic.notice * fix flaky exports test * begin tests * Improve `Importer` notification payload shape --------- Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com>	2024-04-09 20:59:48 +02:00
Adrian Gruntkowski	bb108450cb	Fix flaky google import API tests due to hardcoded import ID (#3994 )	2024-04-09 19:55:26 +02:00
hq1	884daa7943	Remove hostname filter from the external API (#3991 )	2024-04-09 18:03:06 +02:00
Karl-Aksel Puulmann	ceaf2e1f79	Fix experimental_reduced_joins (#3993 ) The original diff had an important exclusionary branch commented out, causing events table to be queried for breakdowns with the flag off	2024-04-09 18:45:03 +03:00
Karl-Aksel Puulmann	441412a164	Return 400 when using invalid filters for stats api (#3986 ) Currently a 500 is returned instead and logged to sentry.	2024-04-09 15:36:17 +03:00
Adrian Gruntkowski	14b00c6ac3	Fix dry run mode in `DataMigration.SiteImports` (#3988 )	2024-04-09 13:04:17 +02:00
Adrian Gruntkowski	1c1ea95e16	Ensure only complete imports are considered in site imports data migration (#3987 ) * Ensure only complete imports are considered in site imports data migration * Refactor `SiteImports` data migration for clarity (h/t @RobertJoonas) * Fix tests	2024-04-09 11:49:28 +02:00
Adrian Gruntkowski	d796788715	Keep `sites.imported_data` in sync with backfilled `SiteImport` when migrating (#3979 ) * Keep `sites.imported_data` in sync with backfilled `SiteImport` when migrating * Consider only completed site imports in data migration	2024-04-09 09:04:51 +02:00
ruslandoga	94deb89b9d	remove Plausible Team footer from self-hosted emails (#3980 ) * remove Plausible Team footer from self-hosted * don't test unsubscribe placeholder in small build	2024-04-09 09:04:23 +02:00
Adrian Gruntkowski	b951065724	Refactor `Imported.check_dates` (->`clamp_dates`) for better flexibility (#3983 )	2024-04-09 09:04:11 +02:00
Adrian Gruntkowski	d381c79d4b	Reapply "Include query string when logging the request (#3971 )" (#3984 ) (#3985 ) This reverts commit `acbd2f8e30`.	2024-04-09 08:49:45 +03:00
Adrian Gruntkowski	acbd2f8e30	Revert "Include query string when logging the request (#3971 )" (#3984 ) This reverts commit `acbbaa9116`.	2024-04-08 18:44:14 +02:00
Karl-Aksel Puulmann	a6d4786959	Worker to clean site data from ClickHouse (#3959 ) * Create a worker to clean clickhouse deleted sites data The plan is to run this weekly, but going to trigger it manually the first few times on cloud * Make asserting count more reliable * credo * PR feedback * Fixes	2024-04-08 12:26:38 +03:00
Karl-Aksel Puulmann	acbbaa9116	Include query string when logging the request (#3971 ) * Update request logging Ultimate goal is to be able to compare results with and without a flag against each other. To do this we need logging which displays the full request url with parameters. Example logs: ``` 14:46:09.042 request_id=F8MRLSsaKB7BeIkAAAHk [info] (200) GET /api/sites took 17ms 14:46:09.175 request_id=F8MRLTKYV3G-GqEAAAZB [info] (200) GET /api/stats/dummy.site/current-visitors took 24ms 14:46:09.396 request_id=F8MRLUDfav28LIkAAAIE [info] (202) POST /api/event took 5ms 14:46:09.501 request_id=F8MRLUDS_YhftUkAAAAD [info] (200) GET /api/stats/dummy.site/sources?period=30d&date=2024-04-04&filters=%7B%7D&with_imported=true&limit=9 took 111ms 14:46:09.508 request_id=F8MRLUDhHbK8WKUAAAah [info] (200) GET /api/stats/dummy.site/main-graph?period=30d&date=2024-04-04&filters=%7B%7D&with_imported=true&metric=visitors took 117ms 14:46:09.511 request_id=F8MRLUDS1CYntK4AAAaB [info] (200) GET /api/stats/dummy.site/entry-pages?period=30d&date=2024-04-04&filters=%7B%7D&with_imported=true&limit=9 took 121ms 14:46:09.541 request_id=F8MRLTk5sIPYSn4AABoC [info] (200) GET /api/stats/dummy.site/top-stats?period=30d&date=2024-04-04&filters=%7B%7D&with_imported=true&comparison=previous_period&compare_from=undefined&compare_to=undefined&match_day_of_week=true took 278ms ``` * re-add plug * router_dispatch -> endpoint	2024-04-08 09:29:11 +03:00
Adrian Gruntkowski	a7603c9e49	Improve import procedure to ensure no time range overlaps (#3970 ) * Always scope import ID by site as well * Do not schedule new import job if there are any site imports in progress * Disable import buttons when any import is in progress * Simplify `schedule_job/4` (h/t @RobertJoonas)	2024-04-04 18:56:36 +02:00
Adrian Gruntkowski	cffff0340c	Handle Google API timeouts gracefully during imports (#3975 )	2024-04-04 18:55:39 +02:00
Adrian Gruntkowski	33eed9d7db	Delete imports which have no stats (#3972 )	2024-04-04 18:55:14 +02:00
Marko Saric	e5b7f1afd0	Changing the copy of the locked screen (#3967 ) changing the copy here as I think that in some situations the "we're still counting stats" message is now shown even to those dashboards where we've stopped counting stats so best to avoid that	2024-04-04 18:54:37 +02:00
Uku Taht	f966419a4a	Update ua_inspector (#3957 ) Co-authored-by: hq1 <hq@mtod.org>	2024-04-04 17:20:57 +02:00
hq1	f9f0407d68	Remove `experimental_hostname_filter` and keep it on by default (#3973 ) * Remove `experimental_hostname_filter` and keep it on by default * Catch up with changes done via `e5b56dbe6`	2024-04-04 17:20:16 +02:00
RobertJoonas	e5b56dbe62	Refactor VisitorGraph (#3936 ) * Give a more semantic name to a function * Make the LineGraph component thinner * Move LineGraph into a separate file * Move interval logic into interval-picker.js This commit also fixes a bug where the interval name displayed inside the picker component flickers the default interval when the graph is loading. The problem was that we were counting on graphData for returning us the current interval: `let currentInterval = graphData?.interval` We should always know the default interval before making the main-graph request. Sending graphData to IntervalPicker component does not make sense anyway. * extract data fetching functions out of VisitorGraph component * Return graph_metric key from Top Stats API This commit introduces no behavioral changes - only starts returning an additional field, allowing us to avoid the following logic in React: 1. Finding the metric names, given a stat display name. E.g. `Unique visitors (last 30 min) -> visitors` 2. Checking if a metric is graphable or not * Move metric state into localStorage This commit gets rid of the internal `metric` state in the VisitorGraph component and starts using localStorage for that instead. This commit also chains the main-graph request into the top-stats request callback - meaning that we'll always fetch new graph data after top stats are updated. And we do it all in a single function. Doing so simplifies the loading state significantly, and also helps make it clear that at all times, existing top stats are required before we can fetch the graph. That's because the metric is determined by which Top stats are returned (for example, we can't be sure whether revenue metrics will be returned or not). * Make sure graph tooltip says "Converted Visitors" * Extract a StatsExport function component Again, instead of relying on `graphData?.interval` we can read it from localStorage, or default to the largest interval available. The export should not be dependent on the graph. * Extract SamplingNotice function component * Extract WithImportedSwitch function component * Stop "lazy-loading" the graph and top stats Since the container is always at the top of the page, it will be visible on the first render in any case - no matter the screen size. * Turn VisitorGraph into a function component * Display empty container until everything has loaded * Do not display loading spinner on realtime ticks * Turn Top Stats into a fn component * fetch top stats and graph async * Make sure revenue metrics can remain on the graph * Add an extra check to canMetricBeGraphed * fix typo * remove redundant double negation	2024-04-04 13:39:55 +01:00
Karl-Aksel Puulmann	3115c6e7a8	Reducing JOINs in queries (#3966 ) * Move experimental_session_count? logic to within query object * WIP new querying system for deciding what tables to query * both -> either * Include sample_percent in both tables * Remove a hanging TODO * Allow filtering by visit props on event queries if flag is on * Make default sessions join more conditional * Simplify events_join_sessions? * Add some TODOs * Fix assignment * Handle entry/exit page visit props separately from props stored in events table * Update test which created sessions/events differently from everyone else * Make query_events private * Dont filter by session properties on events table if querying sessions and joining in events * Handle visits, pageviews, events and visitors metrics from other table * both -> either * events, pageviews are strictly event metrics * Add support for (plain) breakdowns deciding which table to use * Run tests with experimental_reduced_joins as a separate job Also refactor which tests are run with postgres:15 to reduce number of jobs * moduledocs for TableDecider * Fix matrix * Custom build name * Move TEST_EXPERIMENTAL_REDUCED_JOINS check * Handle percentage separately from other metrics * Remove debug code * TableDecider tests * both => sample_percent * Improve naming * Simplify code * Breakdowns retain old behavior if getting metric visitors * Unify behavior of entry/exit page hostnames with rest * Fix test naming	2024-04-04 13:54:23 +03:00
hq1	6af80dd246	Filter by hostnames (#3963 ) * CH Migration: exit/entry hostnames in sessions_v2 * Leave only exit_page_hostname, we already record hostnames * Use ClickHouse DDL in favour of ecto so that cluster is included * Compress with ZSTD(3) * Expose Hostname filter in the dashboard dropdown * Add `exit_page_hostname` to ClickHouse `sessions_v2` schema * Start tracking hostname changes in sessions * Implement hostname filter suggestions * Enable filtering by `event:hostname` * Add tests for filtering by hostnames * Ensure filter suggestions work for exit pages too * Allow overriding hostnames with `send_pageview` mix task * Remove `:window_time_on_page` flag It seems that we can remove it after all? * Initialize `experimental_hostname_filter` query parameter * Rewrite cache store behaviour with regards to session hostnames * Work around inconsistent session merging So that `populate_stats` can get closer to actual ingestion * Improve top stats test * Make it possible to filter sessions by entry/exit hostnames * Update pages tests * Expose `experimental_hostname_filtering` temporarily in the UI * Untested yet: also apply experimental filtering to sources * Introduce `hostname_filter` feature flag * Format * Test top sources with hostname filter + experimental flag	2024-04-04 10:48:30 +02:00
Adrian Gruntkowski	e6d83e946f	Populate new columns in imports and exports (#3969 ) * Extend `Imported` schemas with newly added columns Populate newly added `Imported` fields in GA4 imports Extend exports with newly added fields * Extend CSV importer to ingest new fields * Fix alias shadowing error * Add more extensive GA4 import fixtures * Apply rounding and casting to sampled visits	2024-04-04 10:33:19 +02:00
RobertJoonas	bd73fc8266	CH Migration - add more imported metrics and properties (#3949 ) * add migration * add utm_source to imported_sources * quickfix * satisfy credo * Revert "satisfy credo" This reverts commit `bb0b228164`. * Revert "quickfix" This reverts commit `ab6f70c79e`. --------- Co-authored-by: ruslandoga <67764432+ruslandoga@users.noreply.github.com>	2024-04-04 09:12:53 +01:00
ruslandoga	2ba4988c95	specify insert columns in import from s3 (#3968 ) Co-authored-by: RobertJoonas <56999674+RobertJoonas@users.noreply.github.com>	2024-04-03 10:10:49 +01:00
hq1	1f778e0c11	CH Migration: exit page hostname on sessions_v2 (#3953 ) * CH Migration: exit/entry hostnames in sessions_v2 * Leave only exit_page_hostname, we already record hostnames * Use ClickHouse DDL in favour of ecto so that cluster is included * Compress with ZSTD(3)	2024-04-03 09:42:47 +02:00
Adrian Gruntkowski	9f27fa303c	Fix dry run mode in `DataMigration.SiteImports` (#3965 )	2024-04-02 14:05:34 +02:00
Adrian Gruntkowski	23a3699dd7	Improve import stats toggle and `with_imported` flag computation (#3960 ) * Check import presence across all imports and not just the first one Also, simplify imported data toggle rendering to not explicitly refer to the earliest import source. * Change imported stats toggle icon in dashboard * Test `Imported.get_imports_date_range/1` * Simplify failed UA/GA import email copy	2024-04-02 12:53:19 +02:00
Adrian Gruntkowski	71fe541359	Implement script for backfilling legacy site import entries and adjusting end dates of site imports (#3954 ) * Always select and clear import ID 0 when referring to legacy imports * Implement script for adding site import entries and adjusting end dates * Log cases where end date computation is using fallback * Don't log queries when running the migration to reduce noise	2024-04-02 12:53:02 +02:00
Adrian Gruntkowski	5bf59d1d8a	Implement adjusting imported date range to actual and existing stats (#3943 ) * Implement adjusting imported date range to actual and existing stats * Drop redundant prefix from import list entries * Make pageview numbers in imports list formatted for readability * Test and improve date range cropping * DRY UA and GA4 stats start and end date API calls * Extend UA/GA import controller tests and improve error handling * refactor finding longest open range without existing data * Fix typo in test description Co-authored-by: RobertJoonas <56999674+RobertJoonas@users.noreply.github.com> * Rename `open_ranges` to `free_ranges` --------- Co-authored-by: Robert Joonas <robertjoonas16@gmail.com> Co-authored-by: RobertJoonas <56999674+RobertJoonas@users.noreply.github.com>	2024-03-28 09:32:41 +01:00
ruslandoga	c263df5805	CSV imports (UI) (#3845 ) * add basic ui * remove TODO * credo * allow folder upload * redirect external * mention folder, use folder icon for file picker * back to multiple file upload * mention zip * escape dots in archive filename	2024-03-26 12:55:14 +01:00
hq1	b31433a7bf	Ensure all the react container attributes are strings (#3948 )	2024-03-26 11:01:59 +01:00
hq1	edf70d14b6	Use sessionStorage for "dashboard first launch" banner tracking (#3892 ) * Use sessionStorage for offer e-mail report banner tracking Keeping it within the cookie is problematic, as the banners don't expire and overflow the cookie with data when enough new sites are added. Ref https://github.com/plausible/analytics/issues/3762 * Update changelog * Extract a component * Make is_dbip evaluate to quoted boolean	2024-03-26 09:49:15 +01:00
hq1	7523abe93e	Add metrics to ingestion pipeline (#3927 ) * Add metrics to ingestion pipeline * Format * Format * Update buckets * Credo	2024-03-26 09:42:48 +01:00
Karl-Aksel Puulmann	604bf88451	Add github action to validate whether migrations and app change at the same time (#3945 )	2024-03-26 10:29:55 +02:00
Karl-Aksel Puulmann	4af7019011	Ignore sessions without entry/exit pages when breaking down entry/exit pages (#3933 ) * Ignore sessions without entry/exit pages when breaking down entry/exit pages * Update stats controller tests to have more realistic test data (pageview followed by event)	2024-03-26 09:01:07 +02:00
hq1	2fae0146a4	Reapply 3918 (#3940 ) * Reapply "Pages shield (#3918)" This reverts commit `33b5c10654`. * Make the FF check work against the site actor	2024-03-25 10:36:22 +01:00
hq1	9989ce6927	Migration for 3918 (#3939 ) * Revert "Pages shield (#3918)" This reverts commit `53f94a9f82`. * Migration: Shield page rules	2024-03-25 10:19:50 +01:00
hq1	53f94a9f82	Pages shield (#3918 ) * Migration: Shield page rules * Add Ecto schema for Page Rules * Add Page Rule cache * Fix typo * BTW: Use already imported function * Extend Shields context interface + split existing tests * Ingestion: filter matching patches + refactor shield actions * Add LV section for adding Page Rules * Validate max page path length * Put Pages Shield behind a feature flag * Update CHANGELOG * Update docs link anchor As per https://github.com/plausible/docs/pull/477 * Update lib/plausible/shields.ex Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com> * Update lib/plausible_web/live/shields/page_rules.ex Co-authored-by: ruslandoga <doga.ruslan@gmail.com> * Update lib/plausible_web/live/shields/page_rules.ex Co-authored-by: ruslandoga <doga.ruslan@gmail.com> --------- Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com> Co-authored-by: ruslandoga <doga.ruslan@gmail.com>	2024-03-25 09:48:56 +01:00
Adrian Gruntkowski	ba5b80a8c0	Add label to site imports and populate it (#3914 )	2024-03-22 11:17:02 +01:00
Uku Taht	1d017e86a1	Fix escaping of source filters (#3930 ) * Fix escaping of source filters * CHANGELOG * Fix typo Co-authored-by: Karl-Aksel Puulmann <macobo@users.noreply.github.com> --------- Co-authored-by: Adrian Gruntkowski <adrian.gruntkowski@gmail.com> Co-authored-by: Karl-Aksel Puulmann <macobo@users.noreply.github.com>	2024-03-22 11:16:44 +01:00
Adrian Gruntkowski	52c226c428	Add `label` column to `site_imports` schema (#3926 )	2024-03-22 10:51:39 +01:00
RobertJoonas	fb61f0b425	Capitalize Total Conversions in graph tooltip (#3934 )	2024-03-22 09:51:17 +00:00
RobertJoonas	d6e1e8bebd	Put total conversions on the graph + goal-filtered CSV export improvements (#3929 ) * Add validation for the events metric in main_graph * Test the already existing events metric support in main-graph * Put total conversions on the graph * extract main_graph_csv function (refactor only) * add total_conversions and conversion_rate to goal-filtered visitors.csv * update changelog	2024-03-22 09:35:23 +00:00
Uku Taht	561dcd821e	Mask pathname in filter menu event (#3932 )	2024-03-22 10:25:10 +02:00
Uku Taht	fd879eeb16	Store referrers from android apps (#3715 ) * Store referrers from android apps * Add test for unknown referrer protocol * Store android referrer protocol	2024-03-21 17:45:34 +02:00
Uku Taht	8992c8ee07	Add tracking to filter button (#3928 )	2024-03-21 17:44:51 +02:00
RobertJoonas	c32779a3e5	Timeseries for conversion rate (#3919 ) * add conversion rate to Stats API timeseries * make sure CR can be queried as the only metric * add a test asserting zeros are returned * add tests for filtering by other properties at the same time * Remove unnecessary validation of params 1. It doesn't make sense to validate `interval` (and its granularity) in all endpoints. It's only relevant for the main graph. 2. The plug (renamed to `date_validation_plug`) already makes sure that the dates are validated. No need to call the same function again in Top Stats and Funnel endpoints. * add metric validation to main graph * Add tests for main graph API * put conversion rate on the graph * update changelog * Add revenue metrics into metrics.ex * make fn private * avoid setting graph metric to visitors in goal-filtered view	2024-03-21 13:58:00 +00:00
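
Commit 5bf59d1d8a (#3943) describes adjusting an imported date range so that it does not overlap existing stats, including "finding longest open range without existing data" (later renamed `free_ranges`). The log does not include that code. The following is a minimal Python sketch of the idea only, assuming inclusive day ranges given as `(first_day, last_day)` tuples. The names `find_free_ranges` and `longest_free_range` are hypothetical and are not Plausible's Elixir API.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

# Hypothetical representation: an inclusive (first_day, last_day) pair.
DateRange = Tuple[date, date]


def find_free_ranges(requested: DateRange, occupied: List[DateRange]) -> List[DateRange]:
    """Return the sub-ranges of `requested` not covered by any `occupied` range."""
    start, end = requested
    result: List[DateRange] = []
    cursor = start  # first day of `requested` not yet accounted for
    for occ_start, occ_end in sorted(occupied):
        if occ_end < cursor:
            continue  # lies entirely before the remaining window
        if occ_start > end:
            break  # ranges are sorted, so no later range can overlap
        if occ_start > cursor:
            # Gap between the cursor and this occupied range is free.
            result.append((cursor, occ_start - timedelta(days=1)))
        cursor = max(cursor, occ_end + timedelta(days=1))
        if cursor > end:
            return result  # the rest of the window is occupied
    if cursor <= end:
        result.append((cursor, end))
    return result


def longest_free_range(requested: DateRange, occupied: List[DateRange]) -> Optional[DateRange]:
    """Pick the longest free sub-range, or None if everything is occupied."""
    ranges = find_free_ranges(requested, occupied)
    if not ranges:
        return None
    # max() returns the first maximal element, so ties go to the earliest gap.
    return max(ranges, key=lambda r: (r[1] - r[0]).days)
```

For example, importing January 2024 when stats already exist for January 10-15 leaves two free gaps, and the later one (January 16-31) is the longer. Whether the real implementation breaks ties toward the earliest gap is not stated in the log.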

2646 Commits