* Add zxcvbn dependency
* Change password length range requirement from 6-64 to 12-128
* Reimplement register form in LV
* Implement server-side check for password strength
* Add rudimentary strength meter
* Make password input with strength a separate component and improve it
* Fix existing tests to provide strong enough password
* Apply formatting
* Replace existing registration form with new one
* Hide built-in label in `.input` component when none provided
* Crop password to first 32 chars for analysis by zxcvbn
* Add tests for new form components
* Integrate hCaptcha into LV
* Fix existing AuthController tests
* Add tests for Live.RegisterForm
* Hide strength meter when password input is empty
* Randomize client IP in headers during tests to avoid hitting rate limit
* Apply auxiliary formatting fixes to AuthController
* Integrate registration from invitation into LV registration logic
* Fix existing password set and reset forms
* Make `password_length_hint` component more customizable
* Optimize `Auth.User.set_password/2`
* Remove unnecessary attribute from registration form
* Move password set and reset forms to LV
* Add tests for SetPasswordForm LV component
* Add tests for password checks in `Auth.User`
* Document code a bit
* Implement simpler approach to hCaptcha integration
* Update CHANGELOG.md
* Improve consistency of color scheme
* Introduce debounce across all text inputs in registration and password forms
* Fix email input background in register form
* Ensure only single error is rendered for empty password confirmation case
* Remove `/password` form entirely in favor of preferred password reset
* Remove unnecessary `router` option from `live_render` calls
* Make expensive assigns in LV with `assign_new` (h/t @aerosol)
* Accept passwords longer than 32 bytes uniformly as very strong
* Avoid displaying blank error side by side with weak password error
* Make register actions handle errors gracefully
* Render only a single piece of feedback to reduce noise
* Make register and password reset forms pw manager friendly (h/t @cnkk)
* Move registration forms to live routes
* Delete no longer used deadviews
* Adjust registration form in accordance with changes in #3290
* Reintroduce dogfood page path for invitation form from #3290
* Use alternative approach to submitting plausible metrics from LV form
* Rename metrics events and extend tests to account for them
* Add Heroicons dependency
* Add name_of/1 html helper
Currently with Floki there's no way to query for
`[name=foo[some]]` selector
* Update changelog
* Make goal deletion possible with only goal id
* Remove stale goal controllers
* Improve ComboBox component
- make sure the list options are always of the parent input width
- allow passing a suggestion function instead of a module
* Stale fixup
* Update routes
* Use the new goals route in funnel settings
* Use a function in the funnel combo
* Use function in the props combo
* Remove old goals form
* Implement new goal settings
* Update moduledoc
* Fix revenue switch in dark mode
* Connect live socket on goal settings page
* Fixup
* Use Heroicons.trash icon
* Tweak goals search input
* Remove unused alias
* Fix search/button alignment
* Fix backspace icon alignment
* Delegate :superadmin check to get_for_user/3
I'll do props settings separately, it's work in progress
in a branch on top of this one already. cc @ukutaht
* Rename socket assigns
* Fixup to 5c9f58e
* Fixup
* Render ComboBox suggestions asynchronously
This commit:
- prevents redundant work by checking the socket connection
- allows passing no options to the ComboBox component,
so that when combined with the `async` option, the options
are asynchronously initialized post-render
- allows updating the suggestions asynchronously with the
`async` option set to `true` - helpful in case of DB
queries used for suggestions
* Update tests
* Throttle comboboxes
* Update tests
* Dim the search input
* Use debounce=200 in ComboBox component
* Move creatable option to the top
* Ensure there's always a leading slash for goals
* Test pageview goals with leading / missing
* Make the modal scrollable on small viewports
* Add revenue fields to ClickHouse events
This commit adds 4 fields to the ClickHouse events_v2 table:
* `revenue_source_amount` and `revenue_source_currency` store revenue in
the original currency sent during ingestion
* `revenue_reporting_amount` and `revenue_reporting_currency` store
revenue in a common currency to perform calculations, and this
currency is defined by the user when setting up the goal
The type of amount fields is `Nullable(Decimal64(3))`. That covers all
fiat currencies and allows us to store huge amounts. Even though
ClickHouse does not suggest using `Nullable`, this is a good use case,
because otherwise additional work would have to be done to
differentiate missing values from real zeroes.
I ran a benchmark with the data pattern we expect in production, where
we have more missing values than real decimals. I created 100 million
records where 90% of decimals are missing. The difference between the
tables in storage is just 0.4 MB.
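
To illustrate why a nullable amount matters, here is a minimal Python sketch (not the project's code). The helper name `average_revenue` is hypothetical; `None` stands in for a missing `Nullable(Decimal64(3))` value, and missing values are skipped while real zeroes still count.

```python
from decimal import Decimal

def average_revenue(amounts):
    """Average the amounts that are present; None marks a missing value."""
    present = [a for a in amounts if a is not None]
    if not present:
        # No revenue was recorded at all, which differs from an average of zero.
        return None
    return sum(present, Decimal("0")) / len(present)
```

With a non-nullable column, the missing value would have to be stored as zero and would drag the average down.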
* Add revenue parameter to Events API
This commit adds support for sending revenue data in ingestion using the
`revenue` parameter - aliased to `$`.
* Add revenue parameter to mix send_pageview
* Add average and total revenue to breakdown queries
* Add revenue goal option to goal creation
This commit adds a currency field to the goals form. Goals that have a
currency set are now revenue goals, and are cached with sites to later
be used during ingestion.
Co-authored-by: Robert Joonas <robertjoonas16@gmail.com>
* Enable feature flag in tests
---------
Co-authored-by: Robert Joonas <robertjoonas16@gmail.com>
* upgrade phoenix
Co-authored-by: Vini Brasil <vini@hey.com>
* fix a test (flash message)
The flash message in focus.html.eex was not covered by any test. This
commit also fixes that.
* change function name
* remove unnecessary formatter and format
* update CI cache
* fix dialyzer error
---------
Co-authored-by: Vini Brasil <vini@hey.com>
* Remove ClickhouseSetup module
This has been an implicit point of contact for many
tests. From now on the goal is for each test to maintain
its own isolated setup, so that accidental clashes are avoided
and no implicit assumptions are relied upon.
* Implement v2 schema check
An environment variable V2_MIGRATION_DONE acts like
a feature flag, switching plausible from using old events/sessions
schemas to v2 schemas introduced by NumericIDs migration.
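
A minimal sketch of such an environment-driven flag, written in Python for illustration only. The helper names `v2_enabled` and `events_table`, the accepted truthy values, and the old table name `events` are assumptions; the real check lives in `Plausible.v2?()`.

```python
import os

def v2_enabled(env=None):
    """Return True when V2_MIGRATION_DONE is set to a truthy value."""
    env = os.environ if env is None else env
    # Assumption: "1" and "true" enable the flag; anything else disables it.
    return env.get("V2_MIGRATION_DONE", "").strip().lower() in {"1", "true"}

def events_table(env=None):
    # Hypothetical table names, shown only to demonstrate branching on the flag.
    return "events_v2" if v2_enabled(env) else "events"
```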
* Run both test suites sequentially
While the code for both v1 and v2 schemas must still be kept,
we will from now on run tests against both code paths.
The secondary test run will set the V2_MIGRATION_DONE=1 variable,
thus making all `Plausible.v2?()` checks return `true`.
* Remove unused function
This is a remnant from the short period when
we would check for existing events before allowing
creating a new site.
* Update test setups/factories with v2 migration check
* Make GateKeeper return site id along with :allow
* Make Billing module check for v2 schema
* Make ingestion aware of v2 schema
* Disable site transfers for when v2 is live
In a separate changeset we will implement simplified
site transfer for when v2 migration is complete.
The new transfer will only rename the site domain in postgres
and keep track of the original site prior to the transfer
so we keep an ingestion grace period until the customers
redeploy their scripting.
* Make Stats base queries aware of v2 schema switch
* Update breakdown with v2 conditionals
* Update pageview local start with v2 check
* Update current visitors with v2 check
* Update stats controller with v2 checks
* Update external controller with v2 checks
* Update remaining tests with proper fixtures
* Rewrite redundant assignment
* Remove unused alias
* Mute credo, this is not the right time
* Add test_helper prompt
* Fetch priv dir so it works with a release
* Fetch distinct partitions only
* Don't limit inspect output for partitions
* Ensure SQL is printed to IO
* Remove redundant domain fixture
* Clickhouse migration: add ingest_counters table
* Configure ingest counters per MIX_ENV
* Emit telemetry for ingest events with rich metadata
* Allow building Request.t() with fake now() - for testing purposes
* Use clickhousex branch where session_id is assigned to each connection
* Add helper function for getting site id via cache
* Add Ecto schema for `ingest_counters` table
* Implement metrics buffer
* Implement buffering handler for `Plausible.Ingestion.Event` telemetry
* Implement periodic metrics aggregation
* Update counters docs
* Add toStartOfMinute() to ordering key
* Reset the sync connection state in `after` clause
* Flush counters on app termination
* Use separate Repo with async settings enabled at config level
* Switch to clickhouse_settings repo root config key
* Add AsyncInsertRepo module
This commit adds city data to imported records from Google Analytics. The
current implementation sets city to 0 because GA does not use the GeoNames
database.
Google Analytics Reporting API uses [Geographical IDs](https://developers.google.com/analytics/devguides/collection/protocol/v1/geoid)
to identify cities and countries. Plausible uses
[GeoNames](https://geonames.org/) and I couldn't find databases correlating the
two.
Fortunately, GA also returns the city name and this commit uses the city name
and the country ISO code to find the Geoname ID. To avoid making expensive ETS
searches, I created another ETS table in the Location library that uses
{country, city} as a key.
Related PR: https://github.com/plausible/location/pull/3
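
The keyed lookup can be sketched as follows, in Python and under stated assumptions. The index contents are placeholder IDs rather than real GeoNames data, the name `geoname_id` and the key normalization are hypothetical, and a plain dictionary stands in for the ETS table.

```python
# Placeholder IDs for illustration; these are not real GeoNames identifiers.
CITY_INDEX = {
    ("EE", "tallinn"): 1001,
    ("BR", "sao paulo"): 1002,
}

def geoname_id(country_code, city_name, index):
    """Look up an ID by (country ISO code, city name); 0 means unknown."""
    # Assumption: keys are normalized to upper-case country, lower-case city.
    key = (country_code.upper(), city_name.strip().lower())
    return index.get(key, 0)
```

A single keyed read replaces a scan over every city record, which is the point of the extra table.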
This PR replaces geolix with locus to simplify self-hosted setup. locus can auto-update maxmind dbs which are recommended for self-hosters if they want city-level geolocation. locus is also a bit faster.
This PR also uses a test mmdb file from https://github.com/maxmind/MaxMind-DB for e2e geolocation tests without stubs.
This commit updates mix.exs to resolve bamboo_postmark to our fork. The
fork encodes names with quotes when building e-mails, adding support for
special names with commas and quotes. Related to
plausible/bamboo_postmark#1.
Closes #1885
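
For comparison, Python's standard library applies the same kind of quoting when it builds an address header. This is only an analogue of the behaviour described for the fork, not its implementation, and the wrapper name `format_recipient` is hypothetical.

```python
from email.utils import formataddr

def format_recipient(name, address):
    """Build a 'Name <address>' header value, quoting the name when needed."""
    # formataddr wraps the display name in quotes if it contains special
    # characters such as commas, so the header is not split into two recipients.
    return formataddr((name, address))
```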
* Implement sites by domain caching interface + warmer
* Add test
* Implement hit rate interface
* Add moduledocs
* Fix up typespec
* s/warmer/warmer_fn
* Extract measure_duration/2
* Fix up typespec
* Log errors and return nil on cache internal errors
* Fix up non-existing cache test
* Retrieve specific db columns when pre-filling the cache
* Reduce the subset of fields retrieved from the DB
See 63f3c6233d (r89871536)
This pull request improves the current OpenTelemetry implementation. Currently only 1% of all spans are sent, because of the high volume of ingestion requests to /api/event. I restricted the 1% sampling to /api/event only, so 100% of the other traces are now recorded.
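
A minimal sketch of a per-route sampling decision, in Python for illustration. It is a simplification: the function name `should_sample` and the injectable `rng` parameter are assumptions, and real OpenTelemetry samplers usually decide from the trace ID rather than a fresh random number.

```python
import random

def should_sample(path, rate=0.01, rng=random.random):
    """Keep `rate` of /api/event requests and every other request."""
    if path == "/api/event":
        # High-volume ingestion endpoint: keep roughly 1% of traces.
        return rng() < rate
    # All other routes are recorded in full.
    return True
```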
* Update Timex version from 3.7.7 to 3.7.8
* Generate timezone list from Tzdata
This commit fixes a bug where timezone changes weren't updating the
timezone list displayed when editing or creating a site.
Timezones were being pulled from a static list. This commit changes it
to generate the list from Tzdata, that uses a timezone database with
updated information on time changes. Additionally it adds more timezones
with aliases and links to the list.
Closes #1340
* Use timezone name from browser to recommend timezone
This commit matches the timezone name instead of offset to recommend a
timezone when creating a new site. The JavaScript Intl.DateTimeFormat
API is widely supported according to the link. In any case, if the
timezone fails to match by name, it falls back to the offset strategy.
https://caniuse.com/mdn-javascript_builtins_intl_datetimeformat_resolvedoptions_computed_timezone
Closes #904
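
The name-first, offset-second strategy can be sketched like this, in Python for illustration. The function name `recommend_timezone` and the `(name, offset_minutes)` list shape are assumptions, not the actual form code.

```python
def recommend_timezone(browser_tz_name, browser_offset_minutes, timezones):
    """Pick a timezone by name first, then fall back to the UTC offset.

    `timezones` is a list of (name, offset_minutes) pairs.
    """
    for name, _offset in timezones:
        if name == browser_tz_name:
            return name
    # Fallback: first timezone whose offset matches the browser's offset.
    for name, offset in timezones:
        if offset == browser_offset_minutes:
            return name
    return None
```

Matching by name avoids picking an arbitrary zone among the many that share one offset.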
* Create separate module for GA HTTP requests
* Fetch GA data entirely instead of monthly
* Add buffering to GA imports
* Change positional args to maps when serializing from GA
* Create Google Analytics VCR tests
* Upgrade geolix
* Remove geolix pool config
* Save unnecessary Task.async_stream roundtrip
Normally the Geolix API accepts `:where` keyword option that designates
the database to look up. In case no parameter is supplied, it'll spawn
a parallel map over all databases available. In this case we have only
one DB anyway, so there is no need for the extra instrumentation.
* Follow up on direct :geolocation lookups
* Introduce Finch for Sentry integration
* Make sure the DummyAgent can be started
* No need to sanitize the dsn, finch takes care of that
* Simplify the dummy child spec
* Annotate redirects clause
* Make use of new `get_int_from_path_or_env`
* Actually use finch in Sentry config
* Configure `excluded_domains` correctly for Sentry
The way sentry is configured currently, when we get an HTTP error it
will be logged twice - once from Sentry.PlugCapture and once from
Sentry.LoggerBackend. The logger backend module does the right thing
by default but for some reason we've been overriding the config
parameter that by default stops double-counting errors. This commit
returns to the default configuration which is better.
* Default to 15s timeout
* Attempt to send twice at most
* Warn in sentry client
* Use warn level in sentry client
Co-authored-by: Adam Rutkowski <hq@mtod.org>
* Add has_imported_stats boolean to Site
* Add Google Analytics import panel to general settings
* Get GA profiles to display in import settings panel
* Add import_from_google method as entrypoint to import data
* Add imported_visitors table
* Remove conflicting code from migration
* Import visitors data into clickhouse database
* Pass another dataset to main graph for rendering in red
This adds another entry to the JSON data returned via the main graph API
called `imported_plot`, which is similar to `plot` in form but will be
completed with previously imported data. Currently it simply returns
the values from `plot` / 2. The data is rendered in the main graph in
red without fill, and without an indicator for the present. Rationale:
imported data will not continue to grow so there is no projection
forward, only backwards.
* Hook imported GA data to dashboard timeseries plot
* Add settings option to forget imported data
* Import sources from google analytics
* Merge imported sources when queried
* Merge imported source data with native data when querying sources
* Start converting metrics to atoms so they can be subqueried
This changes "visitors" and in some places "sources" to atoms. This does
not change the behaviour of the functions - the tests all pass unchanged
following this commit. This is necessary as joining subqueries requires
that the keys in `select` statements be atoms and not strings.
* Convert GA (direct) source to empty string
* Import utm campaign and utm medium from GA
* format
* Import all data types from GA into new tables
* Handle large amounts of data more safely
* Fix some mistakes in tables
* Make GA requests in chunks of 5 queries
* Only display imported timeseries when there is no filter
* Correctly show last 30 minutes timeseries when 'realtime'
* Add with_imported key to Query struct
* Account for injected :is_not filter on sources from dashboard
* Also add tentative imported_utm_sources table
This needs a bit more work on the google import side, as GA does not
report sources and utm sources as distinct things.
* Return imported data to dashboard for rest of Sources panel
This extends the merge_imported function definition for sources to
utm_sources, utm_mediums and utm_campaigns too. This appears to be
working on the DB side but something is incomplete on the client side.
* Clear imported stats from all tables when requested
* Merge entry pages and exit pages from imported data into unfiltered dashboard view
This requires converting the `"visits"` and `"visit_duration"` metrics
to atoms so that they can be used in ecto subqueries.
* Display imported devices, browsers and OSs on dashboard
* Display imported country data on dashboard
* Add more metrics to entries/exits for modals
* make sure data is returned via API with correct keys
* Import regions and cities from GA
* Capitalize device upon import to match native data
* Leave query limits/offsets until after possibly joining with imported data
* Also import timeOnPage and pageviews for pages from GA
* imported_countries -> imported_locations
* Get timeOnPage and pageviews for pages from GA
These are needed for the pages modal, and for calculating exit rates for
exit pages.
* Add indicator to dashboard when imported data is being used
* Don't show imported data as a separate line on main graph
* "bounce_rate" -> :bounce_rate, so it works in subqueries
* Drop imported browser and OS versions
These are not needed.
* Toggle displaying imported data by clicking indicator
* Parse referrers with RefInspector
- Use 'ga:fullReferrer' instead of 'ga:source'. This provides the actual
referrer host + path, whereas 'ga:source' includes utm_mediums and
other values when relevant.
- 'ga:fullReferrer' does however include search engine names directly,
so they are manually checked for as RefInspector won't pick up on
these.
* Keep imported data indicator on dashboard and strikethrough when hidden
* Add unlink google button to import panel
* Rename some GA browsers and OSes to plausible versions
* Get main top pages and exit pages panels working correctly with imported data
* mix format
* Fetch time_on_pages for imported data when needed
* entry pages need to fetch bounces from GA
* "sample_percent" -> :sample_percent as only atoms can be used in subqueries
* Calculate bounce_rate for joined native and imported data for top pages modal
* Flip some query bindings around to be less misleading
* Fixup entry page modal visit durations
* mix format
* Fetch bounces and visit_duration for sources from GA
* add more source metrics used for data in modals
* Make sources modals display correct values
* imported_visitors: bounce_rate -> bounces, avg_visit_duration -> visit_duration
* Merge imported data into aggregate stats
* Reformat top graph side icons
* Ensure sample_percent is yielded from aggregate data
* filter event_props should be strings
* Hide imported data from frontend when using filter
* Fix existing tests
* fix tests
* Fix imported indicator appearing when filtering
* comma needed, lost when rebasing
* Import utm_terms and utm_content from GA
* Merge imported utm_term and utm_content
* Rename imported Countries data as Locations
* Set imported city schema field to int
* Remove utm_terms and utm_content when clearing imported
* Clean locations import from Google Analytics
- Country and region should be set to "" when GA provides "(not set)"
- City should be set to 0 for "unknown", as we cannot reliably import
city data from GA.
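
A minimal sketch of these cleaning rules, in Python for illustration. The function name `clean_location` and the dictionary shape are assumptions; the city is always mapped to 0, following the note that city data cannot be imported reliably at this point.

```python
NOT_SET = "(not set)"

def clean_location(country, region, city):
    """Normalize one GA location row before it is stored."""
    return {
        "country": "" if country == NOT_SET else country,
        "region": "" if region == NOT_SET else region,
        # 0 denotes an unknown city; the GA value is deliberately ignored.
        "city": 0,
    }
```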
* Display imported region and city in dashboard
* os -> operating_system in some parts of code
The inconsistency of using os in some places and operating_system in
others causes trouble with subqueries and joins for the native and
imported data, which would require additional logic to account for. The
simplest solution is to just use a consistent word for all uses. This
doesn't make any user-facing or database changes.
* to_atom -> to_existing_atom
* format
* "events" metric -> :events
* ignore imported data when "events" in metrics
* update "bounce_rate"
* atomise some more metrics from new city and region api
* atomise some more metrics for email handlers
* "conversion_rate" -> :conversion_rate during csv export
* Move imported data stats code to own module
* Move imported timeseries function to Stats.Imported
* Use Timex.parse to import dates from GA
* has_imported_stats -> imported_source
* "time_on_page" -> :time_on_page
* Convert imported GA data to UTC
* Clean up GA request code a bit
There was some weird logic here with two separate lists that really
ought to be together, so this merges those.
* Fail sooner if GA timezone can't be identified
* Link imported tables to site by id
* imported_utm_content -> imported_utm_contents
* Import GA data from all of time
* Reorganise GA data fetch logic
- Fetch data from the start of time (2005)
- Check whether no data was fetched, and if so, inform user and don't
consider data to be imported.
* Clarify removal of "visits" data when it isn't in metrics
* Apply location filters from API
This makes it consistent with the sources etc which filter out 'Direct /
None' on the API side. These filters are used by both the native and
imported data handling code, which would otherwise both duplicate the
filters in their `where` clauses.
* Do not use changeset for setting site.imported_source
* Add all metrics to all dimensions
* Run GA import in the background
* Send email when GA import completes
* Add handler to insert imported data into tests and imported_browsers_factory
* Add remaining import data test factories
* Add imported location data to test
* Test main graph with imported data
* Add imported data to operating systems tests
* Add imported data to pages tests
* Add imported data to entry pages tests
* Add imported data to exit pages tests
* Add imported data to devices tests
* Add imported data to sources tests
* Add imported data to UTM tests
* Add new test module for the data import step
* Test import of sources GA data
* Test import of utm_mediums GA data
* Test import of utm_campaigns GA data
* Add tests for UTM terms
* Add tests for UTM contents
* Add test for importing pages and entry pages data from GA
* Add test for importing exit page data
* Fix module file name typo
* Add test for importing location data from GA
* Add test for importing devices data from GA
* Add test for importing browsers data from GA
* Add test for importing OS data from GA
* Paginate GA requests to download all data
* Bump clickhouse_ecto version
* Move RefInspector wrapper function into module
* Drop timezone transform on import
* Order imported by site_id then date
* More strings -> atoms
Also changes a conditional to be a bit nicer
* Remove parallelisation of data import
* Split sources and UTM sources from fetched GA data
GA has only a "source" dimension and no "UTM source" dimension. Instead
it returns these combined. The logic herein to tease these apart is:
1. "(direct)" -> it's a direct source
2. if the source is a domain -> it's a source
3. "google" -> it's from adwords; let's make this a UTM source "adwords"
4. else -> just a UTM source
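
The four rules above can be sketched as a small classifier, in Python for illustration. The function name `classify_source`, the return shape, and the regular expression used to decide what counts as a domain are assumptions, not the importer's actual code.

```python
import re

# Assumption: a "domain" is dot-separated labels of letters, digits and hyphens.
DOMAIN_RE = re.compile(r"^[a-z0-9-]+(\.[a-z0-9-]+)+$", re.IGNORECASE)

def classify_source(ga_source):
    """Return (kind, value) for a combined GA "source" dimension value."""
    if ga_source == "(direct)":
        return ("direct", None)          # rule 1: direct traffic
    if DOMAIN_RE.match(ga_source):
        return ("source", ga_source)     # rule 2: a referring domain
    if ga_source == "google":
        return ("utm_source", "adwords") # rule 3: treated as AdWords
    return ("utm_source", ga_source)     # rule 4: any other value
```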
* Keep prop names in queries as strings
* fix typo
* Fix import
* Insert data to clickhouse in batches
* Fix link when removing imported data
* Merge source tables
* Import hostname as well as pathname
* Record start and end time of imported data
* Track import progress
* Fix month interval with imported data
* Do not JOIN when imported date range has no overlap
* Fix time on page using exits
Co-authored-by: mcol <mcol@posteo.net>