analytics

mirror of https://github.com/plausible/analytics.git synced 2024-11-23 20:13:31 +03:00

Author	SHA1	Message	Date
Cenk Kücük	f6ee17a400	Use hostname for server_name (#2642 )	2023-02-03 08:51:32 -03:00
Adam Rutkowski	8f9f032968	Delay stats deletions (#2632 ) * Implement Site removal transaction * Implement Stats removal Oban worker * Configure site removal queue * Call Site.Removal.run() instead of Purge.delete_site! * Test site/stats removal * Remove FIXME - filed a ticket * Over-communicate lengthy deletion process to the users	2023-01-31 16:11:04 -03:00
Adam Rutkowski	ad12e1ef31	Show user feedback form on server errors (#2617 ) * Move Endpoint errors setup to common config * Implement naive Sentry link resolver * Implement error report e-mail * Delete static sentry script * Implement user feedback form on server errors * Re-arrange pipe * Use Sentry.Config.dsn() where applicable * Fix typo * Use Map.replace/3	2023-01-25 15:15:41 +01:00
ruslandoga	166748dcf2	Replace Geolix with Locus (#2362 ) This PR replaces geolix with locus to simplify self-hosted setup. locus can auto-update maxmind dbs which are recommended for self-hosters if they want city-level geolocation. locus is also a bit faster. This PR also uses a test mmdb file from https://github.com/maxmind/MaxMind-DB for e2e geolocation tests without stubs.	2023-01-17 12:05:09 -03:00
Uku Taht	1785653b1e	Ignore unknown countries (#2556 ) * Ignore XX and T1 countries * Add fallback if country_code=nil * Lookup city overrides directly in CityOverrides module * Changelog * Add empty moduledoc * Remove redundant comment	2023-01-03 10:35:23 -03:00
Adam Rutkowski	5de43b758d	Run tests in async mode where applicable (#2542 ) * Set pg pool size for MIX_ENV=test * Include slow tests in CI run * Exclude slow tests by default * Mark tests slow/async where applicable * Restructure captcha mocks * Revert async where env is relied upon * Add --max-failures=1 to CI run * Set warnings as errors * Disable async where various mocks are used * Revert "Disable async where various mocks are used" This reverts commit `2446b72a29`. * Disable async for test using vcr	2022-12-26 10:20:29 -03:00
ruslandoga	138e7c06d6	add BUILD_METADATA fallback when parsing (#2503 ) ### Changes This PR adds a fallback to empty build metadata when BUILD_METADATA contains invalid JSON. Example `warning` log for `BUILD_METADATA={...}`: ``` 20:57:57.872 [warning] failed to parse $BUILD_METADATA, reason: ** (Jason.DecodeError) unexpected byte at position 1: 0x2E (".") ``` Fixes https://github.com/plausible/analytics/issues/2491 ### Tests - [x] This PR does not require tests ### Changelog - [ ] Entry has been added to changelog ### Documentation - [x] This change does not need a documentation update ### Dark mode - [x] This PR does not change the UI	2022-12-05 17:59:16 +02:00
Adam Rutkowski	356575ef78	Gatekeep ingestion pipeline (#2472 ) * Update Sites.Cache So it's now capable of refreshing most recent sites. Refreshing a single site is no longer wanted. * Introduce Warmer.RecentlyUpdated This is Sites Cache warmer that runs only for most recently updated sites every 30s. * Validate Request creation early * Rename RateLimiter to GateKeeper and introduce detailed policies * Update events API tests - a provisioned site is now required * Update events ingestion tests * Make limits visible in CRM Sites index * Hard-deprecate DOMAIN_BLACKLIST * Remove unnecessary clause * Fix typo * Explicitly delegate Warmer.All * GateKeeper.allwoance => GateKeeper.check * Instrument Sites.Cache measurements * Update send_pageview task to output response headers * Instrument ingestion pipeline * Credo * Make event telemetry test a sync case * Simplify Request.uri/hostname handling * Use embedded schema, apply action and rely on get_field	2022-11-28 15:50:55 +01:00
Adam Rutkowski	457a558471	Kick off sites by domain cache implementation (#2434 ) * Implement sites by domain caching interface + warmer * Add test * Implement hit rate interface * Add moduledocs * Fix up typespec * s/warmer/warmer_fn * Extract measure_duration/2 * Fix up typespec * Log errors and return nil on cache internal errors * Fix up non-existing cache test * Retrieve specific db columns when pre-filling the cache * Reduce the subset of fields retrieved from the DB See `63f3c6233d (r89871536)`	2022-11-16 10:06:23 +01:00
ruslandoga	0b7870dc4d	improve first launch experience for self-hosters (#2357 ) * first launch * dynamic children, wait for repo * remove wait_for_repo and app env manipulations * don't mention free trial in self-hosted pages * add changelog * assigns[:is_selfhost] -> @is_selfhost * better changelog wording * rm admin_user, admin_email, admin_pwd from app env * rm DISABLE_AUTH * redirect / to /login when not authenticated * remove TODO * Update lib/plausible_web/controllers/page_controller.ex Co-authored-by: Uku Taht <Uku.taht@gmail.com> * format Co-authored-by: Uku Taht <Uku.taht@gmail.com>	2022-11-10 12:42:22 +01:00
Adam Rutkowski	101e5a68b5	Allow Site DB lookups during ingestion phase (#2408 ) * Implement FF-driven DB lookup for sites during ingestion We like to see the impact of doing a simple postgres lookup on each ingestion event. The percentage-based feature flag `:ingestion_pg_lookup` must be set in order for lookups to be executed. * Fix resolving Cachex stats metrics * Enable PromEx on dev env	2022-11-01 17:11:50 +02:00
Vinicius Brasil	b898642373	Double maximum header length (#2353 ) This commit makes the permitted header length more permissive, 8,192 bytes, doubling the Phoenix default. Related to https://github.com/4lejandrito/next-plausible/issues/67	2022-10-19 09:41:05 -03:00
Vinicius Brasil	9220d0034d	OpenTelemetry (OTEL) Implementation (#2317 ) This pull request improves the current OpenTelemetry implementation. Currently only 1% of the spans are sent, due to the high volume of ingestion requests to /api/event. I enabled the 1% sampling to /api/event only, recording 100% of the other traces.	2022-10-18 12:11:30 -03:00
Adam Rutkowski	e3ca3b32db	Include tests for Captcha success/failure scenarios (#2344 ) * Include tests for Captcha success/failure scenarios * DRY	2022-10-17 08:16:59 -03:00
RobertJoonas	c0da024b23	Remove static tracker files (#2116 ) * remove tracker files from git index * generate tracker files on npm test * generate tracker files for elixir tests/dev/CI * update tracker/package-lock.json * exclude npm run deploy from mix test + some docs	2022-10-11 12:19:28 +02:00
Uku Taht	e849e03058	Fix favicons (#2257 )	2022-09-23 07:22:43 -03:00
Adam Rutkowski	3f7c1ce549	Aggregate DBConnection.ConnectionError in Sentry (#2260 )	2022-09-22 12:24:54 -03:00
Uku Taht	e373799b01	Move fun_with_flags config from runtime.exs to config.exs Getting this error when running the release: ERROR! the application :fun_with_flags has a different value set for key :persistence during runtime compared to compile time. Since this application environment entry was marked as compile time, this difference can lead to different behaviour than expected: * Compile time value was not set * Runtime value was set to: [adapter: FunWithFlags.Store.Persistent.Ecto, repo: Plausible.Repo]	2022-09-21 13:35:05 +03:00
Uku Taht	3d54b88f0a	Make Finch pools lighter for self-hosting (#2250 ) * Make Finch pools lighter * Use standard http1 Finch pools	2022-09-21 12:51:07 +03:00
Vinicius Brasil	d31db86b49	List all Google Analytics views during import (#2184 ) * List all Google Analytics views during import This commit fixes a bug where different Google Analytics views with the same name and URI were not shown. This was caused because GA views were stored as a map, that naturally doesn't support duplicate keys. This change updates the GA views list to display view IDs, making it clearer to know what is being imported. The dropdown is now grouped by website URL. * Put Google Analytics API URLs in app env * Add controller test to GA view list	2022-09-08 21:02:17 +03:00
Vinicius Brasil	4d20c7ce70	Catch Google Search Console grant error (#2101 ) * Remove invalid Jason.decode argument Co-authored-by: Robert Joonas <robertjoonas16@gmail.com> * Add custom message to Google invalid grant error Co-authored-by: Robert Joonas <robertjoonas16@gmail.com> * Test invalid_grant while refreshing Google token Co-authored-by: Robert Joonas <robertjoonas16@gmail.com> Co-authored-by: Robert Joonas <robertjoonas16@gmail.com>	2022-08-16 10:55:46 +03:00
Uku Taht	5c83ea77de	Remove cache reporting to logs	2022-08-12 11:05:47 +03:00
Vinicius Brasil	4b9032d822	Google Analytics Import Refactor (#2046 ) * Create separate module for GA HTTP requests * Fetch GA data entirely instead of monthly * Add buffering to GA imports * Change positional args to maps when serializing from GA * Create Google Analytics VCR tests	2022-08-03 12:25:50 +03:00
Vinicius Brasil	b415ebe776	Fix geolocation subdivision pattern matching (#2063 ) * Fix geolocation subdivision pattern matching This commit fixes a bug where regions were not being saved. This was caused because Geolix response was returning an additional `:geolocation` map key. It also adds a test case for this. Closes #2033 * Add geolocation database to .gitignore	2022-07-28 15:59:39 +03:00
Weslei Juan Novaes Pereira	0324d03da9	fix: Oban pruner max_age config (#2032 )	2022-07-22 12:00:00 +03:00
Uku Taht	6fbb0a24a8	Do not log Sentry.CrashError to Sentry Stops recursive error logging to sentry	2022-07-14 03:03:59 +03:00
Adam Rutkowski	3b82ba0e25	Upgrade to Geolix 2.0 (#1997 ) * Upgrade geolix * Remove geolix pool config * Save unnecessary Task.async_stream roundtrip Normally the Geolix API accepts `:where` keyword option that designates the database to look up. In case no parameter is supplied, it'll spawn a parallel map over all databases available. In this case we have only one DB anyway, so there is no need for the extra instrumentation. * Follow up on direct :geolocation lookups	2022-07-12 11:39:04 +03:00
Manu S Ajith	81f18ff0a5	Setup promex (#1999 ) * Setup promex Signed-off-by: Manu S Ajith <neo@codingarena.in> * Cleanup promex config file Signed-off-by: Manu S Ajith <neo@codingarena.in>	2022-07-11 15:00:04 +03:00
Uku Taht	2b8e3ea62a	Use finch in sentry client (#1996 ) * Introduce Finch for Sentry integration * Make sure the DummyAgent can be started * No need to sanitize the dsn, finch takes care of that * Simplify the dummy child spec * Annotate redirects clause * Make use of new `get_int_from_path_or_env` * Actually use finch in Sentry config * Configure `excluded_domains` correctly for Sentry The way sentry is configured currently, when we get an HTTP error it will be logged twice - once from Sentry.PlugCapture and once from Sentry.LoggerBackend. The logger backend module does the right thing by default but for some reason we've been overriding the config parameter that by default stops double-counting errors. This commit returns to the default configuration which is better. * Default to 15s timeout * Attempt to send twice at most * Warn in sentry client * Use warn level in sentry client Co-authored-by: Adam Rutkowski <hq@mtod.org>	2022-07-08 11:14:52 +03:00
Uku Taht	ac89d60808	Add sample rate to sentry config	2022-07-07 11:50:47 +03:00
Uku Taht	0553fa041b	Parse geolix pool config as integers	2022-07-07 11:38:18 +03:00
Manu S Ajith	606c162138	Add option to configure sentry pool size, and geolix worker size (#1992 ) Signed-off-by: Manu S Ajith <neo@codingarena.in>	2022-07-07 10:15:13 +03:00
Adam Rutkowski	45cc1d27a1	Fix dev environment startup errors (#1990 ) * Include geolocation DB download in the development workflow * Make sure `tls_certificate_check` is started ASAP This prevents `:application_either_not_started_or_not_ready` errors on application startup. * Mark Makefile targets as PHONY By default Make assumes the targets are files, in this case none of them are.	2022-07-06 17:47:31 +03:00
Uku Taht	910efd849c	Revert config changes	2022-05-27 15:52:31 +03:00
Uku Taht	b667d65d52	Move ARG to running container instead of build container	2022-05-27 15:24:11 +03:00
Uku Taht	d23f7d5358	Disable sentry if not configured	2022-05-27 11:00:39 +03:00
Uku Taht	da93f2aa6e	Remove dead code	2022-05-27 10:52:58 +03:00
Uku Taht	18e2711556	Package new db-ip library in the git repo	2022-05-04 11:07:52 +03:00
Vignesh Joglekar	b7b69c6f62	Adds "invite_only" disable_registration config option (#1841 ) * Adds tri-state disable_registration config * Formatting * Changes variable back to atom * Changelog * Uses atoms correctly :/ * Swaps to a more fitting value * Formatting	2022-05-03 10:44:17 +03:00
Uku Taht	e23cbfcb46	Only nodes that run cron should be elected as leader	2022-04-28 16:57:56 +03:00
Uku Taht	f18a211dcc	Ingest throughput improvement test setup (#1867 ) * Add OTEL and test Cachex for sessions * Move load test * Start apps in the appropriate order	2022-04-28 12:24:29 +03:00
Uku Taht	7c1d64458e	Add fun with flags library	2022-04-21 10:54:08 +03:00
Uku Taht	680bd98bd1	Fix logic	2022-04-13 10:40:51 +03:00
Uku Taht	a282478838	Update cron config	2022-04-11 20:20:05 +03:00
Uku Taht	83c407c016	Upgrade Oban & configure Stager plugin (#1822 )	2022-04-08 11:05:21 +03:00
Uku Taht	06b165eb6d	Run GA import in monthly batches	2022-04-08 08:43:07 +03:00
Uku Taht	ae78444830	Add notice about feature preview	2022-03-25 11:22:02 +02:00
Uku Taht	4cc4e0d61b	Add config flag for import testers	2022-03-25 10:46:43 +02:00
Uku Taht	a9879de1f4	Remove more OTEL stuff	2022-03-21 13:05:34 +02:00
Uku Taht	e27734ed79	[Continued] Google Analytics import (#1753 ) * Add has_imported_stats boolean to Site * Add Google Analytics import panel to general settings * Get GA profiles to display in import settings panel * Add import_from_google method as entrypoint to import data * Add imported_visitors table * Remove conflicting code from migration * Import visitors data into clickhouse database * Pass another dataset to main graph for rendering in red This adds another entry to the JSON data returned via the main graph API called `imported_plot`, which is similar to `plot` in form but will be completed with previously imported data. Currently it simply returns the values from `plot` / 2. The data is rendered in the main graph in red without fill, and without an indicator for the present. Rationale: imported data will not continue to grow so there is no projection forward, only backwards. * Hook imported GA data to dashboard timeseries plot * Add settings option to forget imported data * Import sources from google analytics * Merge imported sources when queried * Merge imported source data native data when querying sources * Start converting metrics to atoms so they can be subqueried This changes "visitors" and in some places "sources" to atoms. This does not change the behaviour of the functions - the tests all pass unchanged following this commit. This is necessary as joining subqueries requires that the keys in `select` statements be atoms and not strings. 
* Convert GA (direct) source to empty string * Import utm campaign and utm medium from GA * format * Import all data types from GA into new tables * Handle large amounts of more data more safely * Fix some mistakes in tables * Make GA requests in chunks of 5 queries * Only display imported timeseries when there is no filter * Correctly show last 30 minutes timeseries when 'realtime' * Add with_imported key to Query struct * Account for injected :is_not filter on sources from dashboard * Also add tentative imported_utm_sources table This needs a bit more work on the google import side, as GA do not report sources and utm sources as distinct things. * Return imported data to dashboard for rest of Sources panel This extends the merge_imported function definition for sources to utm_sources, utm_mediums and utm_campaigns too. This appears to be working on the DB side but something is incomplete on the client side. * Clear imported stats from all tables when requested * Merge entry pages and exit pages from imported data into unfiltered dashboard view This requires converting the `"visits"` and `"visit_duration"` metrics to atoms so that they can be used in ecto subqueries. * Display imported devices, browsers and OSs on dashboard * Display imported country data on dashboard * Add more metrics to entries/exits for modals * make sure data is returned via API with correct keys * Import regions and cities from GA * Capitalize device upon import to match native data * Leave query limits/offsets until after possibly joining with imported data * Also import timeOnPage and pageviews for pages from GA * imported_countries -> imported_locations * Get timeOnPage and pageviews for pages from GA These are needed for the pages modal, and for calculating exit rates for exit pages. 
* Add indicator to dashboard when imported data is being used * Don't show imported data as a separate line on main graph * "bounce_rate" -> :bounce_rate, so it works in subqueries * Drop imported browser and OS versions These are not needed. * Toggle displaying imported data by clicking indicator * Parse referrers with RefInspector - Use 'ga:fullReferrer' instead of 'ga:source'. This provides the actual referrer host + path, whereas 'ga:source' includes utm_mediums and other values when relevant. - 'ga:fullReferrer' does however include search engine names directly, so they are manually checked for as RefInspector won't pick up on these. * Keep imported data indicator on dashboard and strikethrough when hidden * Add unlink google button to import panel * Rename some GA browsers and OSes to plausible versions * Get main top pages and exit pages panels working correctly with imported data * mix format * Fetch time_on_pages for imported data when needed * entry pages need to fetch bounces from GA * "sample_percent" -> :sample_percent as only atoms can be used in subqueries * Calculate bounce_rate for joined native and imported data for top pages modal * Flip some query bindings around to be less misleading * Fixup entry page modal visit durations * mix format * Fetch bounces and visit_duration for sources from GA * add more source metrics used for data in modals * Make sources modals display correct values * imported_visitors: bounce_rate -> bounces, avg_visit_duration -> visit_duration * Merge imported data into aggregate stats * Reformat top graph side icons * Ensure sample_percent is yielded from aggregate data * filter event_props should be strings * Hide imported data from frontend when using filter * Fix existing tests * fix tests * Fix imported indicator appearing when filtering * comma needed, lost when rebasing * Import utm_terms and utm_content from GA * Merge imported utm_term and utm_content * Rename imported Countries data as Locations * Set imported 
city schema field to int * Remove utm_terms and utm_content when clearing imported * Clean locations import from Google Analytics - Country and region should be set to "" when GA provides "(not set)" - City should be set to 0 for "unknown", as we cannot reliably import city data from GA. * Display imported region and city in dashboard * os -> operating_system in some parts of code The inconsistency of using os in some places and operating_system in others causes trouble with subqueries and joins for the native and imported data, which would require additional logic to account for. The simplest solution is to just use a consistent word for all uses. This doesn't make any user-facing or database changes. * to_atom -> to_existing_atom * format * "events" metric -> :events * ignore imported data when "events" in metrics * update "bounce_rate" * atomise some more metrics from new city and region api * atomise some more metrics for email handlers * "conversion_rate" -> :conversion_rate during csv export * Move imported data stats code to own module * Move imported timeseries function to Stats.Imported * Use Timex.parse to import dates from GA * has_imported_stats -> imported_source * "time_on_page" -> :time_on_page * Convert imported GA data to UTC * Clean up GA request code a bit There was some weird logic here with two separate lists that really ought to be together, so this merges those. * Fail sooner if GA timezone can't be identified * Link imported tables to site by id * imported_utm_content -> imported_utm_contents * Imported GA from all of time * Reorganise GA data fetch logic - Fetch data from the start of time (2005) - Check whether no data was fetched, and if so, inform user and don't consider data to be imported. * Clarify removal of "visits" data when it isn't in metrics * Apply location filters from API This makes it consistent with the sources etc which filter out 'Direct / None' on the API side. 
These filters are used by both the native and imported data handling code, which would otherwise both duplicate the filters in their `where` clauses. * Do not use changeset for setting site.imported_source * Add all metrics to all dimensions * Run GA import in the background * Send email when GA import completes * Add handler to insert imported data into tests and imported_browsers_factory * Add remaining import data test factories * Add imported location data to test * Test main graph with imported data * Add imported data to operating systems tests * Add imported data to pages tests * Add imported data to entry pages tests * Add imported data to exit pages tests * Add imported data to devices tests * Add imported data to sources tests * Add imported data to UTM tests * Add new test module for the data import step * Test import of sources GA data * Test import of utm_mediums GA data * Test import of utm_campaigns GA data * Add tests for UTM terms * Add tests for UTM contents * Add test for importing pages and entry pages data from GA * Add test for importing exit page data * Fix module file name typo * Add test for importing location data from GA * Add test for importing devices data from GA * Add test for importing browsers data from GA * Add test for importing OS data from GA * Paginate GA requests to download all data * Bump clickhouse_ecto version * Move RefInspector wrapper function into module * Drop timezone transform on import * Order imported by site_id then date * More strings -> atoms Also changes a conditional to be a bit nicer * Remove parallelisation of data import * Split sources and UTM sources from fetched GA data GA has only a "source" dimension and no "UTM source" dimension. Instead it returns these combined. The logic herein to tease these apart is: 1. "(direct)" -> it's a direct source 2. if the source is a domain -> it's a source 3. "google" -> it's from adwords; let's make this a UTM source "adwords" 4. 
else -> just a UTM source * Keep prop names in queries as strings * fix typo * Fix import * Insert data to clickhouse in batches * Fix link when removing imported data * Merge source tables * Import hostname as well as pathname * Record start and end time of imported data * Track import progress * Fix month interval with imported data * Do not JOIN when imported date range has no overlap * Fix time on page using exits Co-authored-by: mcol <mcol@posteo.net>	2022-03-10 15:04:59 -06:00
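
Note on commit e27734ed79: the four-step rule it lists under "Split sources and UTM sources from fetched GA data" can be sketched roughly as below. This is an illustrative Python sketch only, not the project's Elixir code. The function name `classify_ga_source`, the return labels, and the loose domain regex are all assumptions made for the example; the commit message does not say how "is a domain" is decided.

```python
import re
from typing import Optional, Tuple

# Assumption: a deliberately loose domain check (dot-separated labels ending
# in an alphabetic TLD). The real import code may decide this differently.
_DOMAIN_RE = re.compile(r"^(?:[a-z0-9-]+\.)+[a-z]{2,}$", re.IGNORECASE)


def classify_ga_source(value: str) -> Tuple[str, Optional[str]]:
    """Split GA's combined "source" value into a (kind, value) pair.

    Follows the four steps quoted in the commit message, in order.
    The kind labels ("direct", "source", "utm_source") are hypothetical.
    """
    # 1. "(direct)" -> it's a direct source
    if value == "(direct)":
        return ("direct", None)
    # 2. if the source is a domain -> it's a source
    if _DOMAIN_RE.match(value):
        return ("source", value)
    # 3. "google" -> it's from adwords; record it as UTM source "adwords"
    if value == "google":
        return ("utm_source", "adwords")
    # 4. else -> just a UTM source
    return ("utm_source", value)
```

The order of the checks matters: the domain test runs before the "google" special case, so a value such as "google.com" would be treated as a plain source under this sketch rather than as adwords.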


179 Commits