Commit Graph

193 Commits

Author SHA1 Message Date
hq1
71ef0bd043
Clean up after V2 migration (#2868)
* Clean up after V2 migration

This PR removes all the leftovers and alternative code
branching after v2 migration.

The self-hosted release is being drafted at:

https://github.com/plausible/hosting/issues/68

Refs:
  - https://github.com/plausible/analytics/pull/2865
  - https://github.com/plausible/analytics/pull/2825
  - https://github.com/plausible/analytics/pull/2780

* !fixup
2023-04-24 12:17:57 +02:00
ruslandoga
adcce15632
Make self-hosted data migration easier (#2865)
* default to v2

* allow N defaults in data migration prompt and custom messages

* join domains lookup

* remove duplicate test runs from ci (both are v2)
2023-04-21 09:33:57 +02:00
hq1
825a754976
Make ingest threshold configurable (#2845)
* Make ingest threshold configurable

* Credo
2023-04-13 13:52:54 +02:00
hq1
b9c2110472
V2 migration tweaks for self hosted release (#2825)
* Get rid of PASS_V2_SCHEMA_MIGRATION

* Use in-memory domain lookup + regular table settings

* Remove faulty date arithmetic + prev part calculation

* Set V2_MIGRATION_DONE in Mix.env == :dev

* Mute credo
2023-04-13 12:09:39 +02:00
hq1
154ce3a44c
Split clickhouse repos - making the main one read only (#2826)
* Split clickhouse pools into readonly/import deletions

* Remove CRM site transfers

* Initialize ImportDeletionRepo

* Put ImportDeletionRepo to use
2023-04-06 12:45:36 +02:00
ruslandoga
b646652071
add transport opts to clickhouse repos (#2783) 2023-04-05 11:58:55 +02:00
hq1
1d01328287
Allow domain change (#2803)
* Migration (PR: https://github.com/plausible/analytics/pull/2802)

* Implement Site.Domain interface allowing change and expiry

* Fixup seeds so they work with V2_MIGRATION_DONE=1

* Update Sites.Cache so it's capable of multi-keyed lookups

* Implement worker handling domain change expiration

* Implement domain change UI

* Implement transition period for public APIs

* Exclude v2 tests in primary test run

* Update lib/plausible_web/controllers/site_controller.ex

Co-authored-by: Vini Brasil <vini@hey.com>

* Update lib/plausible_web/controllers/site_controller.ex

Co-authored-by: Vini Brasil <vini@hey.com>

* Update moduledoc

* Update changelog

* Remove remnant from previous implementation attempt

* !fixup

* !fixup

* Implement domain change via Sites API

cc @ukutaht

* Update CHANGELOG

* Credo

* !fixup commit missing tests

* Allow continuous domain change within the same site

---------

Co-authored-by: Vini Brasil <vini@hey.com>
2023-04-04 10:55:12 +02:00
hq1
d2f2c69387
Conditionally support switching between v1 and v2 clickhouse schemas (#2780)
* Remove ClickhouseSetup module

This has been an implicit point of contact to many
tests. From now on the goal is for each test to maintain
its own, isolated setup so that no accidental clashes
and implicit assumptions are relied upon.

* Implement v2 schema check

An environment variable V2_MIGRATION_DONE acts like
a feature flag, switching plausible from using old events/sessions
schemas to v2 schemas introduced by NumericIDs migration.
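
The flag check described above can be sketched roughly as follows. This is a minimal Python illustration, not the project's Elixir code: the names `v2_enabled` and `events_table`, the table names, and the set of accepted truthy values are all assumptions (the message only shows `V2_MIGRATION_DONE=1`).

```python
import os

# Hypothetical set of values treated as "migration done"; the commit
# message itself only shows V2_MIGRATION_DONE=1.
TRUTHY = {"1", "true"}


def v2_enabled(env=None):
    """Return True when V2_MIGRATION_DONE is set to a truthy value."""
    env = os.environ if env is None else env
    return env.get("V2_MIGRATION_DONE", "").strip().lower() in TRUTHY


def events_table(env=None):
    # Hypothetical table names, used only to show the v1/v2 branching.
    return "events_v2" if v2_enabled(env) else "events"
```

Passing the environment as an argument keeps the check easy to exercise in tests without mutating the real process environment.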

* Run both test suites sequentially

While the code for both v1 and v2 schemas must still be kept,
we will from now on run tests against both code paths.
The secondary test run will set the V2_MIGRATION_DONE=1 variable,
thus making all `Plausible.v2?()` checks return `true`.

* Remove unused function

This is a remnant from the short period when
we would check for existing events before allowing
creating a new site.

* Update test setups/factories with v2 migration check

* Make GateKeeper return site id along with :allow

* Make Billing module check for v2 schema

* Make ingestion aware of v2 schema

* Disable site transfers for when v2 is live

In a separate changeset we will implement simplified
site transfer for when v2 migration is complete.
The new transfer will only rename the site domain in postgres
and keep track of the original site prior to the transfer
so we keep an ingestion grace period until the customers
redeploy their scripting.

* Make Stats base queries aware of v2 schema switch

* Update breakdown with v2 conditionals

* Update pageview local start with v2 check

* Update current visitors with v2 check

* Update stats controller with v2 checks

* Update external controller with v2 checks

* Update remaining tests with proper fixtures

* Rewrite redundant assignment

* Remove unused alias

* Mute credo, this is not the right time

* Add test_helper prompt

* Fetch priv dir so it works with a release

* Fetch distinct partitions only

* Don't limit inspect output for partitions

* Ensure SQL is printed to IO

* Remove redundant domain fixture
2023-03-27 13:52:42 +02:00
Adam
6d79ca5093
Switch to new clickhouse adapter (ch/chto) (#2733)
* another clickhouse adapter

* don't restore stats_removal.ex

* fix events main-graph error (#2746)

* update ch, chto

* update chto again (#2759)

* Stop treating page filter as an entry_page filter (#2752)

* remove dead code

* stop treating page filter as entry page filter in breakdown queries

* stop treating page filter as entry page filter in aggregate queries

* stop treating page filter as entry page filter in timeseries queries

* mix format

* update changelog

* break code down to smaller functions to keep credo happy

* remove unused functions

* make CSV export return only conversions with goal filter (#2760)

* make CSV export return only conversions with goal filter

* update changelog

* update elixir version in mix.exs (#2742)

* revert admin.ex changes (#2776)

---------

Co-authored-by: ruslandoga <67764432+ruslandoga@users.noreply.github.com>
Co-authored-by: ruslandoga <rusl@n-do.ga>
Co-authored-by: RobertJoonas <56999674+RobertJoonas@users.noreply.github.com>
2023-03-21 09:55:59 +01:00
Adam
4b21b4e6d0
Remove Firewall plug; redundant at infra level (#2730)
* Remove Firewall plug; redundant at infra level

* Update changelog
2023-03-08 09:07:15 +01:00
Adam
8f86036e57
Keep track of native stats start timestamp when retrieving data (#2715)
* Stats boundary/PoC?

* Delete stats removal

* Drop events check on site creation

* Update seeds script

* Use native_stats_start_at

* Don't rely on native stats pointer in imported stats queries

* Reset site

* Export reset/1

* Remove unnecessary inserted_at settings

* Update seeds

* Remove unnecessary inserted_at setting
2023-03-01 13:11:31 +01:00
Adam Rutkowski
867dad6da7
Implement ingest counters (#2693)
* Clickhouse migration: add ingest_counters table

* Configure ingest counters per MIX_ENV

* Emit telemetry for ingest events with rich metadata

* Allow building Request.t() with fake now() - for testing purposes

* Use clickhousex branch where session_id is assigned to each connection

* Add helper function for getting site id via cache

* Add Ecto schema for `ingest_counters` table

* Implement metrics buffer

* Implement buffering handler for `Plausible.Ingestion.Event` telemetry

* Implement periodic metrics aggregation

* Update counters docs

* Add toStartOfMinute() to ordering key

* Reset the sync connection state in `after` clause

* Flush counters on app termination

* Use separate Repo with async settings enabled at config level

* Switch to clickhouse_settings repo root config key

* Add AsyncInsertRepo module
2023-02-23 14:34:24 +01:00
Adam Rutkowski
8f85b110aa
Split Clickhouse pools into Read-Only and Read/Write (dedicated to writes) (#2661)
* Configure ingest repo access/pool size

If I'm not mistaken 3 is a sane default, the only
inserts we're doing are:

  - session buffer dump
  - events buffer dump
  - GA import dump

And all are serializable within their scopes?

* Add IngestRepo

* Start IngestRepo

* Use IngestRepo for inserts

* Annotate ClickhouseRepo as read_only

So no insert* functions are expanded

* Update moduledoc

* rename alias

* Fix default env var value so it can be cast

* Use IngestRepo for migrations

* Set default ingest pool size from 3 to 5

in case conns are restarting or else...

* Ensure all Repo prometheus metrics are collected
2023-02-12 17:50:57 +01:00
ruslandoga
7b2f4c99ee
Support alternative mailing services (Mailgun, Mandrill, Sendgrid) (#2649)
* more bamboo adapters

* add changelog

* add tests
2023-02-07 12:56:47 +01:00
Cenk Kücük
f6ee17a400
Use hostname for server_name (#2642) 2023-02-03 08:51:32 -03:00
Adam Rutkowski
8f9f032968
Delay stats deletions (#2632)
* Implement Site removal transaction

* Implement Stats removal Oban worker

* Configure site removal queue

* Call Site.Removal.run() instead of Purge.delete_site!

* Test site/stats removal

* Remove FIXME - filed a ticket

* Over-communicate lengthy deletion process to the users
2023-01-31 16:11:04 -03:00
Adam Rutkowski
ad12e1ef31
Show user feedback form on server errors (#2617)
* Move Endpoint errors setup to common config

* Implement naive Sentry link resolver

* Implement error report e-mail

* Delete static sentry script

* Implement user feedback form on server errors

* Re-arrange pipe

* Use Sentry.Config.dsn() where applicable

* Fix typo

* Use Map.replace/3
2023-01-25 15:15:41 +01:00
ruslandoga
166748dcf2
Replace Geolix with Locus (#2362)
This PR replaces geolix with locus to simplify self-hosted setup. locus can auto-update maxmind dbs which are recommended for self-hosters if they want city-level geolocation. locus is also a bit faster.

This PR also uses a test mmdb file from https://github.com/maxmind/MaxMind-DB for e2e geolocation tests without stubs.
2023-01-17 12:05:09 -03:00
Uku Taht
1785653b1e
Ignore unknown countries (#2556)
* Ignore XX and T1 countries

* Add fallback if country_code=nil

* Lookup city overrides directly in CityOverrides module

* Changelog

* Add empty moduledoc

* Remove redundant comment
2023-01-03 10:35:23 -03:00
Adam Rutkowski
5de43b758d
Run tests in async mode where applicable (#2542)
* Set pg pool size for MIX_ENV=test

* Include slow tests in CI run

* Exclude slow tests by default

* Mark tests slow/async where applicable

* Restructure captcha mocks

* Revert async where env is relied upon

* Add --max-failures=1 to CI run

* Set warnings as errors

* Disable async where various mocks are used

* Revert "Disable async where various mocks are used"

This reverts commit 2446b72a29.

* Disable async for test using vcr
2022-12-26 10:20:29 -03:00
ruslandoga
138e7c06d6
add BUILD_METADATA fallback when parsing (#2503)
### Changes

This PR adds a fallback to empty build metadata when BUILD_METADATA
contains invalid JSON.

Example `warning` log for `BUILD_METADATA={...}`:

```
20:57:57.872 [warning] failed to parse $BUILD_METADATA, reason: ** (Jason.DecodeError) unexpected byte at position 1: 0x2E (".")
```
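
The fallback behaviour can be sketched as below. This is a minimal Python illustration under stated assumptions, not the project's Elixir code: the function name `parse_build_metadata` is hypothetical, and treating valid non-object JSON as empty metadata is an assumption.

```python
import json
import logging

logger = logging.getLogger(__name__)


def parse_build_metadata(raw):
    """Parse BUILD_METADATA as JSON, falling back to {} on invalid input."""
    if not raw:
        return {}
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Log a warning and continue with empty metadata instead of crashing.
        logger.warning("failed to parse $BUILD_METADATA, reason: %s", exc)
        return {}
    # Assumption: valid JSON that is not an object counts as empty metadata.
    return parsed if isinstance(parsed, dict) else {}
```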

Fixes https://github.com/plausible/analytics/issues/2491

### Tests
- [x] This PR does not require tests

### Changelog
- [ ] Entry has been added to changelog

### Documentation
- [x] This change does not need a documentation update

### Dark mode
- [x] This PR does not change the UI
2022-12-05 17:59:16 +02:00
Adam Rutkowski
356575ef78
Gatekeep ingestion pipeline (#2472)
* Update Sites.Cache

So it's now capable of refreshing most recent sites.
Refreshing a single site is no longer wanted.

* Introduce Warmer.RecentlyUpdated

This is Sites Cache warmer that runs only for
most recently updated sites every 30s.

* Validate Request creation early

* Rename RateLimiter to GateKeeper and introduce detailed policies

* Update events API tests - a provisioned site is now required

* Update events ingestion tests

* Make limits visible in CRM Sites index

* Hard-deprecate DOMAIN_BLACKLIST

* Remove unnecessary clause

* Fix typo

* Explicitly delegate Warmer.All

* GateKeeper.allwoance => GateKeeper.check

* Instrument Sites.Cache measurements

* Update send_pageview task to output response headers

* Instrument ingestion pipeline

* Credo

* Make event telemetry test a sync case

* Simplify Request.uri/hostname handling

* Use embedded schema, apply action and rely on get_field
2022-11-28 15:50:55 +01:00
Adam Rutkowski
457a558471
Kick off sites by domain cache implementation (#2434)
* Implement sites by domain caching interface + warmer

* Add test

* Implement hit rate interface

* Add moduledocs

* Fix up typespec

* s/warmer/warmer_fn

* Extract measure_duration/2

* Fix up typespec

* Log errors and return nil on cache internal errors

* Fix up non-existing cache test

* Retrieve specific db columns when pre-filling the cache

* Reduce the subset of fields retrieved from the DB

See 63f3c6233d (r89871536)
2022-11-16 10:06:23 +01:00
ruslandoga
0b7870dc4d
improve first launch experience for self-hosters (#2357)
* first launch

* dynamic children, wait for repo

* remove wait_for_repo and app env manipulations

* don't mention free trial in self-hosted pages

* add changelog

* assigns[:is_selfhost] -> @is_selfhost

* better changelog wording

* rm admin_user, admin_email, admin_pwd from app env

* rm DISABLE_AUTH

* redirect / to /login when not authenticated

* remove TODO

* Update lib/plausible_web/controllers/page_controller.ex

Co-authored-by: Uku Taht <Uku.taht@gmail.com>

* format

Co-authored-by: Uku Taht <Uku.taht@gmail.com>
2022-11-10 12:42:22 +01:00
Adam Rutkowski
101e5a68b5
Allow Site DB lookups during ingestion phase (#2408)
* Implement FF-driven DB lookup for sites during ingestion

We like to see the impact of doing a simple postgres lookup on each
ingestion event. The percentage-based feature flag `:ingestion_pg_lookup`
must be set in order for lookups to be executed.

* Fix resolving Cachex stats metrics

* Enable PromEx on dev env
2022-11-01 17:11:50 +02:00
Vinicius Brasil
b898642373
Double maximum header length (#2353)
This commit makes the permitted header length more permissive, 8,192
bytes, doubling the Phoenix default.

Related to https://github.com/4lejandrito/next-plausible/issues/67
2022-10-19 09:41:05 -03:00
Vinicius Brasil
9220d0034d
OpenTelemetry (OTEL) Implementation (#2317)
This pull request improves the current OpenTelemetry implementation. Currently only 1% of the spans are sent, due to the high volume of ingestion requests to /api/event. I enabled the 1% sampling to /api/event only, recording 100% of the other traces.
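
The per-path sampling described above can be sketched as follows. This is a simplified Python illustration, not the project's OpenTelemetry configuration: the names `sample_ratio` and `should_record` are hypothetical, and real samplers usually decide from the trace ID rather than a fresh random draw.

```python
import random

# Ratios taken from the description above: 1% for /api/event,
# 100% for every other path.
EVENT_PATH = "/api/event"


def sample_ratio(path):
    """Return the fraction of traces to record for a request path."""
    return 0.01 if path == EVENT_PATH else 1.0


def should_record(path, rng=random.random):
    """Decide whether to record a trace; rng returns a float in [0, 1)."""
    return rng() < sample_ratio(path)
```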
2022-10-18 12:11:30 -03:00
Adam Rutkowski
e3ca3b32db
Include tests for Captcha success/failure scenarios (#2344)
* Include tests for Captcha success/failure scenarios

* DRY
2022-10-17 08:16:59 -03:00
RobertJoonas
c0da024b23
Remove static tracker files (#2116)
* remove tracker files from git index

* generate tracker files on npm test

* generate tracker files for elixir tests/dev/CI

* update tracker/package-lock.json

* exclude npm run deploy from mix test + some docs
2022-10-11 12:19:28 +02:00
Uku Taht
e849e03058
Fix favicons (#2257) 2022-09-23 07:22:43 -03:00
Adam Rutkowski
3f7c1ce549
Aggregate DBConnection.ConnectionError in Sentry (#2260) 2022-09-22 12:24:54 -03:00
Uku Taht
e373799b01 Move fun_with_flags config from runtime.exs to config.exs
Getting this error when running the release:

ERROR! the application :fun_with_flags has a different value set for key :persistence during runtime compared to compile time. Since this application environment entry was marked as compile time, this difference can lead to different behaviour than expected:

  * Compile time value was not set
  * Runtime value was set to: [adapter: FunWithFlags.Store.Persistent.Ecto, repo: Plausible.Repo]
2022-09-21 13:35:05 +03:00
Uku Taht
3d54b88f0a
Make Finch pools lighter for self-hosting (#2250)
* Make Finch pools lighter

* Use standard http1 Finch pools
2022-09-21 12:51:07 +03:00
Vinicius Brasil
d31db86b49
List all Google Analytics views during import (#2184)
* List all Google Analytics views during import

This commit fixes a bug where different Google Analytics views with the
same name and URI were not shown. The cause was that GA views were
stored as a map, which does not support duplicate keys.

This change updates the GA views list to display view IDs, making it
clearer to know what is being imported. The dropdown is now grouped by
website URL.
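
The duplicate-key problem and the grouped listing can be sketched like this. It is a minimal Python illustration with made-up sample data, not the project's Elixir code: the view IDs, names, label format, and the `group_views` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data: two views share a name and URL but differ by ID.
views = [
    {"id": "1001", "name": "All Web Site Data", "url": "https://example.com"},
    {"id": "1002", "name": "All Web Site Data", "url": "https://example.com"},
]

# Keying a map by name silently drops one of the two views.
by_name = {v["name"]: v["id"] for v in views}


def group_views(views):
    """Group views by website URL, keeping every (label, id) pair."""
    grouped = defaultdict(list)
    for v in views:
        label = f'{v["id"]} - {v["name"]}'
        grouped[v["url"]].append((label, v["id"]))
    return dict(grouped)
```

Here `by_name` ends up with a single entry, while `group_views(views)` keeps both views under their shared URL.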

* Put Google Analytics API URLs in app env

* Add controller test to GA view list
2022-09-08 21:02:17 +03:00
Vinicius Brasil
4d20c7ce70
Catch Google Search Console grant error (#2101)
* Remove invalid Jason.decode argument

Co-authored-by: Robert Joonas <robertjoonas16@gmail.com>

* Add custom message to Google invalid grant error

Co-authored-by: Robert Joonas <robertjoonas16@gmail.com>

* Test invalid_grant while refreshing Google token

Co-authored-by: Robert Joonas <robertjoonas16@gmail.com>

Co-authored-by: Robert Joonas <robertjoonas16@gmail.com>
2022-08-16 10:55:46 +03:00
Uku Taht
5c83ea77de Remove cache reporting to logs 2022-08-12 11:05:47 +03:00
Vinicius Brasil
4b9032d822
Google Analytics Import Refactor (#2046)
* Create separate module for GA HTTP requests

* Fetch GA data entirely instead of monthly

* Add buffering to GA imports

* Change positional args to maps when serializing from GA

* Create Google Analytics VCR tests
2022-08-03 12:25:50 +03:00
Vinicius Brasil
b415ebe776
Fix geolocation subdivision pattern matching (#2063)
* Fix geolocation subdivision pattern matching

This commit fixes a bug where regions were not being saved. This was
caused because Geolix response was returning an additional
`:geolocation` map key. It also adds a test case for this.

Closes #2033

* Add geolocation database to .gitignore
2022-07-28 15:59:39 +03:00
Weslei Juan Novaes Pereira
0324d03da9
fix: Oban pruner max_age config (#2032) 2022-07-22 12:00:00 +03:00
Uku Taht
6fbb0a24a8 Do not log Sentry.CrashError to Sentry
Stops recursive error logging to sentry
2022-07-14 03:03:59 +03:00
Adam Rutkowski
3b82ba0e25
Upgrade to Geolix 2.0 (#1997)
* Upgrade geolix

* Remove geolix pool config

* Save unnecessary Task.async_stream roundtrip

Normally the Geolix API accepts `:where` keyword option that designates
the database to look up. In case no parameter is supplied, it'll spawn
a parallel map over all databases available. In this case we have only
one DB anyway, so there is no need for the extra instrumentation.

* Follow up on direct :geolocation lookups
2022-07-12 11:39:04 +03:00
Manu S Ajith
81f18ff0a5
Setup promex (#1999)
* Setup promex

Signed-off-by: Manu S Ajith <neo@codingarena.in>

* Cleanup promex config file

Signed-off-by: Manu S Ajith <neo@codingarena.in>
2022-07-11 15:00:04 +03:00
Uku Taht
2b8e3ea62a
Use finch in sentry client (#1996)
* Introduce Finch for Sentry integration

* Make sure the DummyAgent can be started

* No need to sanitize the dsn, finch takes care of that

* Simplify the dummy child spec

* Annotate redirects clause

* Make use of new `get_int_from_path_or_env`

* Actually use finch in Sentry config

* Configure `excluded_domains` correctly for Sentry

The way sentry is configured currently, when we get an HTTP error it
will be logged twice - once from Sentry.PlugCapture and once from
Sentry.LoggerBackend. The logger backend module does the right thing
by default but for some reason we've been overriding the config
parameter that by default stops double-counting errors. This commit
returns to the default configuration which is better.

* Default to 15s timeout

* Attempt to send twice at most

* Warn in sentry client

* Use warn level in sentry client

Co-authored-by: Adam Rutkowski <hq@mtod.org>
2022-07-08 11:14:52 +03:00
Uku Taht
ac89d60808 Add sample rate to sentry config 2022-07-07 11:50:47 +03:00
Uku Taht
0553fa041b Parse geolix pool config as integers 2022-07-07 11:38:18 +03:00
Manu S Ajith
606c162138
Add option to configure sentry pool size, and geolix worker size (#1992)
Signed-off-by: Manu S Ajith <neo@codingarena.in>
2022-07-07 10:15:13 +03:00
Adam Rutkowski
45cc1d27a1
Fix dev environment startup errors (#1990)
* Include geolocation DB download in the development workflow

* Make sure `tls_certificate_check` is started ASAP

This prevents `:application_either_not_started_or_not_ready` errors
on application startup.

* Mark Makefile targets as PHONY

By default Make assumes the targets are files,
in this case none of them are.
2022-07-06 17:47:31 +03:00
Uku Taht
910efd849c Revert config changes 2022-05-27 15:52:31 +03:00
Uku Taht
b667d65d52 Move ARG to running container instead of build container 2022-05-27 15:24:11 +03:00
Uku Taht
d23f7d5358 Disable sentry if not configured 2022-05-27 11:00:39 +03:00