Commit Graph

242 Commits

Author SHA1 Message Date
RobertJoonas
0194d57b38
add upper bound to limit parameter (#2226)
* add upper bound to limit parameter

* add more strict validation for limit parameter + 2 tests

* update changelog
2022-09-16 10:21:44 +03:00
Vinicius Brasil
a10d44a0d7
Refactor event struct creation function (#2098)
* Replace Ingestion.Request headers with user_agent

* Replace generic Ingestion.Request params with specific fields

* Refactor event building function into small functions

* Move Plausible.Ingestion to Plausible.Ingestion.Event

* Add option to override event fields while building

* Rename Ingestion.Request meta to props

* Replace UTM-specific fields with generic query_params

* Remove Map.from_struct/1 call from ingestion pipeline

* Remove stash options from ingestion
2022-08-16 14:43:10 +03:00
Vinicius Brasil
d41fd68e99
Create struct for event requests (#2084)
* Create struct for saving ingestion request

* Create separate function to buffer events
2022-08-10 10:36:40 +03:00
RobertJoonas
a058cf6240
added a test and changed hashmode condition (#2082) 2022-08-09 14:31:37 +03:00
Vinicius Brasil
b415ebe776
Fix geolocation subdivision pattern matching (#2063)
* Fix geolocation subdivision pattern matching

This commit fixes a bug where regions were not being saved. This was
caused because Geolix response was returning an additional
`:geolocation` map key. It also adds a test case for this.

Closes #2033

* Add geolocation database to .gitignore
2022-07-28 15:59:39 +03:00
Vinicius Brasil
b5ea6ae3dc
Keep user filter when listing cities, countries, and regions stats (#2030)
This commit fixes a bug where location filters were filtering stats but
not the locations list. This was caused by a `Map.put/3` call that
overrides the user filter. This commit rollbacks 5b57143273
changes and removes the `Map.put/3` call.

Closes #1982
2022-07-25 12:19:38 +03:00
Sasha Fonseca
db3a852f26
Fix missing 'pageviews' column from 'pages.csv' (#1988)
* Fix missing 'pageviews' column from 'pages.csv'

* Add CHANGELOG entry for Issue #1878 fix
2022-07-12 23:46:47 +03:00
Adam Rutkowski
3b82ba0e25
Upgrade to Geolix 2.0 (#1997)
* Upgrade geolix

* Remove geolix pool config

* Save unnecessary Task.async_stream roundtrip

Normally the Geolix API accepts `:where` keyword option that designates
the database to look up. In case no parameter is supplied, it'll spawn
a parallel map over all databases available. In this case we have only
one DB anyway, so there is no need for the extra instrumentation.

* Follow up on direct :geolocation lookups
2022-07-12 11:39:04 +03:00
Uku Taht
8c94e52f0f Adds tracing attribute when event ingest is blocked 2022-07-06 12:54:51 +03:00
Uku Taht
292a419473
Add feature flag to block event ingest (#1991)
Event ingest can be blocked using the flag `block_event_ingest`
2022-07-06 12:25:40 +03:00
Uku Taht
3e5695408a
Use new Session.CacheStore in favour of Session.Store (#1934)
* Remove Session.Store in favour of Session.CacheStore

* Add CHANGELOG entry

* Use appropriate enum function
2022-06-06 10:44:33 +03:00
Kurt McAlpine
b92889aaf0
Add API to retrieve site by domain (#1942) 2022-06-03 10:37:14 +03:00
Uku Taht
102ff1885e Trim trailing whitespace from pathname 2022-06-02 11:34:10 +03:00
Uku Taht
892133d036 Parse build information 2022-05-27 15:51:24 +03:00
Uku Taht
b667d65d52 Move ARG to running container instead of build container 2022-05-27 15:24:11 +03:00
Uku Taht
5b5c70b82d Add build metadata to build arg 2022-05-27 14:30:21 +03:00
Uku Taht
47fc09bacf Add version file directly to priv 2022-05-27 11:19:37 +03:00
Uku Taht
32116fe2c9 Show database library in info endpoint 2022-05-27 10:49:04 +03:00
Marko Saric
2a31f7911a
Fix Innenstadt I (#1877) 2022-05-02 11:36:54 +03:00
Uku Taht
02abbda06d Ignore nil custom prop as well 2022-04-29 11:50:31 +03:00
Uku Taht
c92c548ca8 Do not error on empty string custom prop value 2022-04-29 11:13:36 +03:00
Uku Taht
f18a211dcc
Ingest throughput improvement test setup (#1867)
* Add OTEL and test Cachex for sessions

* Move load test

* Start apps in the appropriate order
2022-04-28 12:24:29 +03:00
RobertJoonas
40275b64d4
Pageview custom dimensions (#1816)
* added custom dimension filtering tests for pages

* first filter UI in place

* pages, entry pages and exit pages can be filtered by pageview props

* added tests for expected filtering behaviour

* fix dimension filter for sources + tests

* added is_not filtering functionality

* fixed formatting

* fixed admin_test

* added (none) as filter value + is_not filter type in UI

* added prefilling applied filter values and some UI tweaks

* added fetch options

* Make prop suggestions work with `props` filter

* Fix test

* Track login state internally

* Add CHANGELOG entry

Co-authored-by: Uku Taht <uku.taht@gmail.com>
2022-04-21 11:47:15 +03:00
Andrea Mazzarella
1128ff4bfa
Update the internal /sites api to paginate results (#1824)
* Update the internal /sites api to paginate results and adapts site-switcher to it

* Update the Changelog

* Format internal controller

* Remove the `+ Add Site` link from the site-switcher in the dashboard

* Change camel to snake case and replace imports with fully qualified calls

* Remove trailing comma from site-switcher
2022-04-18 12:32:01 +03:00
Vignesh Joglekar
3b97ecdc62
Adds Main Graph Metric Selection (#1364)
* First pass bringing in previous graph improvements, and comparsion context

* Swaps issue template to new issue form syntax

* Indentation update

* Indentation update?

* More indentation

* Intendation is hard

* Finalized indentation?

* Github indentation

* Missing fields

* Formatting changes

* Checkbox changes

* Uses new timeseries API, various UI improvements, descopes conversions, ToP from graphing

* Fixes Mobile UI Issues

* Improves point detection and display on hover

* Fixes & adds tests for updated main-graph API route

* Changelog

* Changes to better metric option declaration & minor UI/default fixes

* Fixes top stat tooltips showing unformatted numbers for special (non-rounded) top stats

* Formatting

* Fixes regression with dashed portion not stopping at present_index

* Removes comparison + lint

* Improves top stat active style

* Removes comparison tests

* Splits out tooltip and top stats

Still needs:
- Tests
- Potentially more cleanup

* Adds/moves tests for top stats

* Formatting

* Updates metric LS key, removes console log

* Various fixes + cleanup

* Makes tooltip position & style more consistent

* Fixes test (returns import status on both main graph & top stats)

* Fixes interaction with month dateFormatter

* Fixes edge case tooltip behavior

It was simpler than I thought :/

* Make the entire top stat clickable

* Minor UI improvements

* Fixes another tooltip visibility edge case + cleans up boolean algebra

Co-authored-by: Uku Taht <Uku.taht@gmail.com>
2022-04-13 10:38:47 +03:00
Uku Taht
83df555f55 Remove OTEL 2022-03-21 12:59:14 +02:00
Uku Taht
e27734ed79
[Continued] Google Analytics import (#1753)
* Add has_imported_stats boolean to Site

* Add Google Analytics import panel to general settings

* Get GA profiles to display in import settings panel

* Add import_from_google method as entrypoint to import data

* Add imported_visitors table

* Remove conflicting code from migration

* Import visitors data into clickhouse database

* Pass another dataset to main graph for rendering in red

This adds another entry to the JSON data returned via the main graph API
called `imported_plot`, which is similar to `plot` in form but will be
completed with previously imported data.  Currently it simply returns
the values from `plot` / 2. The data is rendered in the main graph in
red without fill, and without an indicator for the present. Rationale:
imported data will not continue to grow so there is no projection
forward, only backwards.

* Hook imported GA data to dashboard timeseries plot

* Add settings option to forget imported data

* Import sources from google analytics

* Merge imported sources when queried

* Merge imported source data native data when querying sources

* Start converting metrics to atoms so they can be subqueried

This changes "visitors" and in some places "sources" to atoms. This does
not change the behaviour of the functions - the tests all pass unchanged
following this commit. This is necessary as joining subqueries requires
that the keys in `select` statements be atoms and not strings.

* Convery GA (direct) source to empty string

* Import utm campaign and utm medium from GA

* format

* Import all data types from GA into new tables

* Handle large amounts of more data more safely

* Fix some mistakes in tables

* Make GA requests in chunks of 5 queries

* Only display imported timeseries when there is no filter

* Correctly show last 30 minutes timeseries when 'realtime'

* Add with_imported key to Query struct

* Account for injected :is_not filter on sources from dashboard

* Also add tentative imported_utm_sources table

This needs a bit more work on the google import side, as GA do not
report sources and utm sources as distinct things.

* Return imported data to dashboard for rest of Sources panel

This extends the merge_imported function definition for sources to
utm_sources, utm_mediums and utm_campaigns too. This appears to be
working on the DB side but something is incomplete on the client side.

* Clear imported stats from all tables when requested

* Merge entry pages and exit pages from imported data into unfiltered dashboard view

This requires converting the `"visits"` and `"visit_duration"` metrics
to atoms so that they can be used in ecto subqueries.

* Display imported devices, browsers and OSs on dashboard

* Display imported country data on dashboard

* Add more metrics to entries/exits for modals

* make sure data is returned via API with correct keys

* Import regions and cities from GA

* Capitalize device upon import to match native data

* Leave query limits/offsets until after possibly joining with imported data

* Also import timeOnPage and pageviews for pages from GA

* imported_countries -> imported_locations

* Get timeOnPage and pageviews for pages from GA

These are needed for the pages modal, and for calculating exit rates for
exit pages.

* Add indicator to dashboard when imported data is being used

* Don't show imported data as separately line on main graph

* "bounce_rate" -> :bounce_rate, so it works in subqueries

* Drop imported browser and OS versions

These are not needed.

* Toggle displaying imported data by clicking indicator

* Parse referrers with RefInspector

- Use 'ga:fullReferrer' instead of 'ga:source'. This provides the actual
  referrer host + path, whereas 'ga:source' includes utm_mediums and
  other values when relevant.
- 'ga:fullReferror' does however include search engine names directly,
  so they are manually checked for as RefInspector won't pick up on
  these.

* Keep imported data indicator on dashboard and strikethrough when hidden

* Add unlink google button to import panel

* Rename some GA browsers and OSes to plausible versions

* Get main top pages and exit pages panels working correctly with imported data

* mix format

* Fetch time_on_pages for imported data when needed

* entry pages need to fetch bounces from GA

* "sample_percent" -> :sample_percent as only atoms can be used in subqueries

* Calculate bounce_rate for joined native and imported data for top pages modal

* Flip some query bindings around to be less misleading

* Fixup entry page modal visit durations

* mix format

* Fetch bounces and visit_duration for sources from GA

* add more source metrics used for data in modals

* Make sources modals display correct values

* imported_visitors: bounce_rate -> bounces, avg_visit_duration -> visit_duration

* Merge imported data into aggregate stats

* Reformat top graph side icons

* Ensure sample_percent is yielded from aggregate data

* filter event_props should be strings

* Hide imported data from frontend when using filter

* Fix existing tests

* fix tests

* Fix imported indicator appearing when filtering

* comma needed, lost when rebasing

* Import utm_terms and utm_content from GA

* Merge imported utm_term and utm_content

* Rename imported Countries data as Locations

* Set imported city schema field to int

* Remove utm_terms and utm_content when clearing imported

* Clean locations import from Google Analytics

- Country and region should be set to "" when GA provides "(not set)"
- City should be set to 0 for "unknown", as we cannot reliably import
  city data from GA.

* Display imported region and city in dashboard

* os -> operating_system in some parts of code

The inconsistency of using os in some places and operating_system in
others causes trouble with subqueries and joins for the native and
imported data, which would require additional logic to account for. The
simplest solution is the just use a consistent word for all uses. This
doesn't make any user-facing or database changes.

* to_atom -> to_existing_atom

* format

* "events" metric -> :events

* ignore imported data when "events" in metrics

* update "bounce_rate"

* atomise some more metrics from new city and region api

* atomise some more metrics for email handlers

* "conversion_rate" -> :conversion_rate during csv export

* Move imported data stats code to own module

* Move imported timeseries function to Stats.Imported

* Use Timex.parse to import dates from GA

* has_imported_stats -> imported_source

* "time_on_page" -> :time_on_page

* Convert imported GA data to UTC

* Clean up GA request code a bit

There was some weird logic here with two separate lists that really
ought to be together, so this merges those.

* Fail sooner if GA timezone can't be identified

* Link imported tables to site by id

* imported_utm_content -> imported_utm_contents

* Imported GA from all of time

* Reorganise GA data fetch logic

- Fetch data from the start of time (2005)
- Check whether no data was fetched, and if so, inform user and don't
  consider data to be imported.

* Clarify removal of "visits" data when it isn't in metrics

* Apply location filters from API

This makes it consistent with the sources etc which filter out 'Direct /
None' on the API side. These filters are used by both the native and
imported data handling code, which would otherwise both duplicate the
filters in their `where` clauses.

* Do not use changeset for setting site.imported_source

* Add all metrics to all dimensions

* Run GA import in the background

* Send email when GA import completes

* Add handler to insert imported data into tests and imported_browsers_factory

* Add remaining import data test factories

* Add imported location data to test

* Test main graph with imported data

* Add imported data to operating systems tests

* Add imported data to pages tests

* Add imported data to entry pages tests

* Add imported data to exit pages tests

* Add imported data to devices tests

* Add imported data to sources tests

* Add imported data to UTM tests

* Add new test module for the data import step

* Test import of sources GA data

* Test import of utm_mediums GA data

* Test import of utm_campaigns GA data

* Add tests for UTM terms

* Add tests for UTM contents

* Add test for importing pages and entry pages data from GA

* Add test for importing exit page data

* Fix module file name typo

* Add test for importing location data from GA

* Add test for importing devices data from GA

* Add test for importing browsers data from GA

* Add test for importing OS data from GA

* Paginate GA requests to download all data

* Bump clickhouse_ecto version

* Move RefInspector wrapper function into module

* Drop timezone transform on import

* Order imported by side_id then date

* More strings -> atoms

Also changes a conditional to be a bit nicer

* Remove parallelisation of data import

* Split sources and UTM sources from fetched GA data

GA has only a "source" dimension and no "UTM source" dimension. Instead
it returns these combined. The logic herein to tease these apart is:

1. "(direct)" -> it's a direct source
2. if the source is a domain -> it's a source
3. "google" -> it's from adwords; let's make this a UTM source "adwords"
4. else -> just a UTM source

* Keep prop names in queries as strings

* fix typo

* Fix import

* Insert data to clickhouse in batches

* Fix link when removing imported data

* Merge source tables

* Import hostname as well as pathname

* Record start and end time of imported data

* Track import progress

* Fix month interval with imported data

* Do not JOIN when imported date range has no overlap

* Fix time on page using exits

Co-authored-by: mcol <mcol@posteo.net>
2022-03-10 15:04:59 -06:00
RobertJoonas
b4992cedc1
Referrer spam blocklist (#1750)
* integrating blocklist library

* loads blocklist dependency from Github
2022-03-10 13:58:30 -06:00
Ralf Zimmermann
408d95fe09
Add custom goal props to breakdown endpoint (#1578) 2022-02-01 10:09:45 -06:00
Marko Saric
5bdd74a524
Some changes for the Netherlands (#1607) 2022-01-20 09:55:03 -06:00
Marko Saric
3e1a7a10bb
Some fixes for Austria (#1608) 2022-01-20 09:54:27 -06:00
Uku Taht
5344c477f3 Mix format 2022-01-20 09:53:07 -06:00
Marko Saric
1f29d9d358
Change for Norway (#1618) 2022-01-19 09:44:57 -06:00
Marko Saric
cc88526777
fixes for Zürich (#1612) 2022-01-19 09:38:09 -06:00
Uku Taht
6e21f28980 Access region safely 2022-01-18 11:13:35 -06:00
Uku Taht
febfaa44e0 Dont error with unknown country 2022-01-18 10:41:15 -06:00
Uku Taht
cd40579740 Ignore unknown country code 2022-01-18 10:23:26 -06:00
Uku Taht
81eb2b73df Notify Sentry when location code is not found 2022-01-18 10:04:13 -06:00
Uku Taht
e7f13f7d70 Remove debug statement 2022-01-15 09:05:50 -06:00
Uku Taht
3fb184a893 Log IP for testing 2022-01-14 18:10:58 -06:00
Uku Taht
3f5151585a Mix format 2022-01-12 11:40:53 -06:00
Marko Saric
8673fc29e0
Couple of changes to the cities (#1599)
* Couple of changes to the cities

- list of city overrides now alphabetical
- added new overrides for switzerland and canada

* Changes for Bucharest
2022-01-12 17:34:29 +00:00
Uku Taht
ecc233443e Remove duplicate entry from cities map 2022-01-06 17:28:43 +02:00
Uku Taht
6f4d5b7b8c mix format 2022-01-06 17:24:59 +02:00
Marko Saric
73579054a8
Merge the two Bratislavas (#1587)
* Update settings_email_reports.html.eex

* fixed two typos

upgrage = upgrade
tranfer = transfer

* change sites to stats

changing sites to stats as couple of people mentioned it sounds like we will lock their websites so nobody can visit them

* Change from 14 to 45 times smaller

* Merge the two Bratislavas

* Some fixes to Denmark
2022-01-06 15:56:33 +02:00
Matt Colligan
8148d41e42
Add cities and regions to exported CSV data (#1485) 2022-01-04 11:08:06 +02:00
Uku Taht
f5fee73363 Add city overrides for mexico city 2021-12-30 10:46:05 +02:00
Uku Taht
1bef6fda52 Add stockholm override 2021-12-29 16:00:17 +02:00
Uku Taht
3fffd0c05d Add city overrides 2021-12-29 15:29:13 +02:00
Uku Taht
4ea0e29620 Add UTM content and term to CSV export 2021-12-16 15:50:37 +02:00