defmodule PlausibleWeb.Api.StatsController.PagesTest do
  use PlausibleWeb.ConnCase

  @user_id 123

  describe "GET /api/stats/:domain/pages" do
    setup [:create_user, :log_in, :create_new_site, :add_imported_data]

    test "returns top pages by visitors", %{conn: conn, site: site} do
      populate_stats(site, [
        build(:pageview, pathname: "/"),
        build(:pageview, pathname: "/"),
        build(:pageview, pathname: "/"),
        build(:pageview, pathname: "/register"),
        build(:pageview, pathname: "/register"),
        build(:pageview, pathname: "/contact")
      ])

      conn = get(conn, "/api/stats/#{site.domain}/pages?period=day")

      assert json_response(conn, 200) == [
               %{"visitors" => 3, "name" => "/"},
               %{"visitors" => 2, "name" => "/register"},
               %{"visitors" => 1, "name" => "/contact"}
             ]
    end

    test "returns top pages with :is filter on custom pageview props", %{conn: conn, site: site} do
      populate_stats(site, [
        build(:pageview,
          pathname: "/blog/john-1",
          "meta.key": ["author"],
          "meta.value": ["John Doe"]
        ),
        build(:pageview,
          pathname: "/blog/other-post",
          "meta.key": ["author"],
          "meta.value": ["other"]
        ),
        build(:pageview, user_id: 123, pathname: "/")
      ])

      filters = Jason.encode!(%{props: %{"author" => "John Doe"}})
      conn = get(conn, "/api/stats/#{site.domain}/pages?period=day&filters=#{filters}")

      assert json_response(conn, 200) == [
               %{"visitors" => 1, "name" => "/blog/john-1"}
             ]
    end

    test "returns top pages with :is_not filter on custom pageview props", %{
      conn: conn,
      site: site
    } do
      populate_stats(site, [
        build(:pageview,
          pathname: "/blog/john-1",
          "meta.key": ["author"],
          "meta.value": ["John Doe"]
        ),
        build(:pageview,
          pathname: "/blog/other-post",
          "meta.key": ["author"],
          "meta.value": ["other"]
        ),
        build(:pageview, pathname: "/")
      ])

      filters = Jason.encode!(%{props: %{"author" => "!John Doe"}})
      conn = get(conn, "/api/stats/#{site.domain}/pages?period=day&filters=#{filters}")

      assert json_response(conn, 200) == [
               %{"visitors" => 1, "name" => "/"},
               %{"visitors" => 1, "name" => "/blog/other-post"}
             ]
    end

    test "calculates bounce_rate and time_on_page with :is filter on custom pageview props",
         %{conn: conn, site: site} do
      populate_stats(site, [
        build(:pageview,
          pathname: "/blog/john-1",
          "meta.key": ["author"],
          "meta.value": ["John Doe"],
          user_id: @user_id,
          timestamp: ~N[2021-01-01 00:00:00]
        ),
        build(:pageview,
          pathname: "/blog",
          user_id: @user_id,
          timestamp: ~N[2021-01-01 00:01:00]
        ),
        build(:pageview,
          pathname: "/blog/john-2",
          "meta.key": ["author"],
          "meta.value": ["John Doe"],
          user_id: @user_id,
          timestamp: ~N[2021-01-01 00:02:00]
        ),
        build(:pageview,
          pathname: "/blog/john-2",
          "meta.key": ["author"],
          "meta.value": ["John Doe"],
          user_id: 456,
          timestamp: ~N[2021-01-01 00:00:00]
        ),
        build(:pageview,
          pathname: "/blog",
          user_id: 456,
          timestamp: ~N[2021-01-01 00:10:00]
        )
      ])

      filters = Jason.encode!(%{props: %{"author" => "John Doe"}})

      conn =
        get(
          conn,
          "/api/stats/#{site.domain}/pages?period=day&date=2021-01-01&filters=#{filters}&detailed=true"
        )

      # Only the two "John Doe" pages match. Visitor 456 stays on /blog/john-2 for
      # 600s (00:00 -> 00:10); for @user_id it is the last pageview, so it adds no time.
      assert json_response(conn, 200) == [
               %{
                 "name" => "/blog/john-2",
                 "visitors" => 2,
                 "pageviews" => 2,
                 "bounce_rate" => 0,
                 "time_on_page" => 600
               },
               %{
                 "name" => "/blog/john-1",
                 "visitors" => 1,
                 "pageviews" => 1,
                 "bounce_rate" => 0,
                 "time_on_page" => 60
               }
             ]
    end

    test "calculates bounce_rate and time_on_page with :is_not filter on custom pageview props",
         %{conn: conn, site: site} do
      populate_stats(site, [
        build(:pageview,
          pathname: "/blog",
          user_id: @user_id,
          timestamp: ~N[2021-01-01 00:00:00]
        ),
        build(:pageview,
          pathname: "/blog/john-1",
          user_id: @user_id,
          "meta.key": ["author"],
          "meta.value": ["John Doe"],
          timestamp: ~N[2021-01-01 00:01:00]
        ),
        build(:pageview,
          pathname: "/blog/other-post",
          user_id: @user_id,
          "meta.key": ["author"],
          "meta.value": ["other"],
          timestamp: ~N[2021-01-01 00:02:00]
        ),
        build(:pageview,
          pathname: "/blog",
          user_id: 456,
          timestamp: ~N[2021-01-01 00:00:00]
        ),
        build(:pageview,
          pathname: "/blog/john-1",
          "meta.key": ["author"],
          "meta.value": ["John Doe"],
          user_id: 456,
          timestamp: ~N[2021-01-01 00:03:00]
        )
      ])

      filters = Jason.encode!(%{props: %{"author" => "!John Doe"}})

      conn =
        get(
          conn,
          "/api/stats/#{site.domain}/pages?period=day&date=2021-01-01&filters=#{filters}&detailed=true"
        )

      # Both visitors move on from /blog (after 60s and 180s), so its time on page
      # averages 120s. /blog/other-post is neither an entry page nor followed by
      # another pageview, so both of its metrics are nil.
      assert json_response(conn, 200) == [
               %{
                 "name" => "/blog",
                 "visitors" => 2,
                 "pageviews" => 2,
                 "bounce_rate" => 0,
                 "time_on_page" => 120.0
               },
               %{
                 "name" => "/blog/other-post",
                 "visitors" => 1,
                 "pageviews" => 1,
                 "bounce_rate" => nil,
                 "time_on_page" => nil
               }
             ]
    end

    test "calculates bounce_rate and time_on_page with :is (none) filter on custom pageview props",
         %{conn: conn, site: site} do
      populate_stats(site, [
        build(:pageview,
          pathname: "/blog",
          user_id: @user_id,
          timestamp: ~N[2021-01-01 00:00:00]
        ),
        build(:pageview,
          pathname: "/blog/john-1",
          user_id: @user_id,
          "meta.key": ["author"],
          "meta.value": ["John Doe"],
          timestamp: ~N[2021-01-01 00:01:00]
        ),
        build(:pageview,
          pathname: "/blog/other-post",
          user_id: @user_id,
          timestamp: ~N[2021-01-01 00:02:00]
        ),
        build(:pageview,
          pathname: "/blog",
          timestamp: ~N[2021-01-01 00:00:00]
        )
      ])

      filters = Jason.encode!(%{props: %{"author" => "(none)"}})

      conn =
        get(
          conn,
          "/api/stats/#{site.domain}/pages?period=day&date=2021-01-01&filters=#{filters}&detailed=true"
        )

      # The pageview on /blog without a user_id is a single-page visit (a bounce),
      # so /blog has a 50% bounce rate; only @user_id contributes time on page (60s).
      assert json_response(conn, 200) == [
               %{
                 "name" => "/blog",
                 "visitors" => 2,
                 "pageviews" => 2,
                 "bounce_rate" => 50,
                 "time_on_page" => 60
               },
               %{
                 "name" => "/blog/other-post",
                 "visitors" => 1,
                 "pageviews" => 1,
                 "bounce_rate" => nil,
                 "time_on_page" => nil
               }
             ]
    end

    test "calculates bounce_rate and time_on_page with :is_not (none) filter on custom pageview props",
         %{conn: conn, site: site} do
      populate_stats(site, [
        build(:pageview,
          pathname: "/blog/john-1",
          user_id: @user_id,
          "meta.key": ["author"],
          "meta.value": ["John Doe"],
          timestamp: ~N[2021-01-01 00:00:00]
        ),
        build(:pageview,
          pathname: "/blog",
          user_id: @user_id,
          timestamp: ~N[2021-01-01 00:01:00]
        ),
        build(:pageview,
          pathname: "/blog/other-post",
          "meta.key": ["author"],
          "meta.value": ["other"],
          user_id: @user_id,
          timestamp: ~N[2021-01-01 00:02:00]
        ),
        build(:pageview,
          pathname: "/blog/other-post",
          "meta.key": ["author"],
          "meta.value": [""],
          timestamp: ~N[2021-01-01 00:00:00]
        )
      ])

      filters = Jason.encode!(%{props: %{"author" => "!(none)"}})

      conn =
        get(
          conn,
          "/api/stats/#{site.domain}/pages?period=day&date=2021-01-01&filters=#{filters}&detailed=true"
        )

      # The pageview with an empty author value still has the prop set, so it matches
      # "!(none)". It is the only visit entering on /blog/other-post, and it bounces,
      # so that page's bounce rate is 100%.
      assert json_response(conn, 200) == [
               %{
                 "name" => "/blog/other-post",
                 "visitors" => 2,
                 "pageviews" => 2,
                 "bounce_rate" => 100,
                 "time_on_page" => nil
               },
               %{
                 "name" => "/blog/john-1",
                 "visitors" => 1,
                 "pageviews" => 1,
                 "bounce_rate" => 0,
                 "time_on_page" => 60
               }
             ]
    end

    test "calculates bounce_rate and time_on_page for pages filtered by page path",
         %{conn: conn, site: site} do
      populate_stats(site, [
        build(:pageview,
          pathname: "/",
          user_id: @user_id,
          timestamp: ~N[2021-01-01 00:00:00]
        ),
        build(:pageview,
          pathname: "/about",
          user_id: @user_id,
          timestamp: ~N[2021-01-01 00:01:00]
        ),
        build(:pageview,
          pathname: "/",
          user_id: @user_id,
          timestamp: ~N[2021-01-01 00:02:00]
        ),
        build(:pageview,
          pathname: "/",
          timestamp: ~N[2021-01-01 00:00:00]
        ),
        build(:pageview,
          pathname: "/about",
          timestamp: ~N[2021-01-01 00:10:00]
        )
      ])

      filters = Jason.encode!(%{page: "/"})

      conn =
        get(
          conn,
          "/api/stats/#{site.domain}/pages?period=day&date=2021-01-01&filters=#{filters}&detailed=true"
        )

      # / has 3 pageviews from 2 visitors. Half of the visits entering on / bounce (50%),
      # and only @user_id's 00:00 -> 00:01 view of / contributes time on page (60s).
      assert json_response(conn, 200) == [
               %{
                 "name" => "/",
                 "visitors" => 2,
                 "pageviews" => 3,
                 "bounce_rate" => 50,
                 "time_on_page" => 60
               }
             ]
    end

    test "returns top pages by visitors with imported data", %{conn: conn, site: site} do
      populate_stats(site, [
        build(:pageview, pathname: "/"),
        build(:pageview, pathname: "/"),
        build(:pageview, pathname: "/"),
        build(:imported_pages, page: "/"),
        build(:pageview, pathname: "/register"),
        build(:pageview, pathname: "/register"),
        build(:imported_pages, page: "/register"),
        build(:pageview, pathname: "/contact")
      ])

      conn = get(conn, "/api/stats/#{site.domain}/pages?period=day")

      assert json_response(conn, 200) == [
               %{"visitors" => 3, "name" => "/"},
               %{"visitors" => 2, "name" => "/register"},
               %{"visitors" => 1, "name" => "/contact"}
             ]

      # Imported rows are merged into the results only when with_imported=true is passed.
      conn = get(conn, "/api/stats/#{site.domain}/pages?period=day&with_imported=true")

      assert json_response(conn, 200) == [
               %{"visitors" => 4, "name" => "/"},
               %{"visitors" => 3, "name" => "/register"},
               %{"visitors" => 1, "name" => "/contact"}
             ]
    end

2021-05-18 15:14:33 +03:00
test "calculates bounce rate and time on page for pages", %{conn: conn, site: site} do
2021-07-23 13:44:05 +03:00
populate_stats(site, [
build(:pageview,
pathname: "/",
user_id: @user_id,
timestamp: ~N[2021-01-01 00:00:00]
),
build(:pageview,
pathname: "/some-other-page",
user_id: @user_id,
timestamp: ~N[2021-01-01 00:15:00]
),
build(:pageview,
pathname: "/",
timestamp: ~N[2021-01-01 00:15:00]
)
])

2020-06-08 10:35:13 +03:00
conn =
get(
conn,
2021-07-23 13:44:05 +03:00
"/api/stats/#{site.domain}/pages?period=day&date=2021-01-01&detailed=true"
2020-06-08 10:35:13 +03:00
)
2020-01-06 16:51:43 +03:00

assert json_response(conn, 200) == [
2020-11-03 12:20:11 +03:00
%{
2021-07-23 13:44:05 +03:00
"bounce_rate" => 50.0,
2022-02-10 23:03:56 +03:00
"time_on_page" => 900.0,
2021-11-04 15:20:39 +03:00
"visitors" => 2,
2020-11-03 12:20:11 +03:00
"pageviews" => 2,
2021-07-23 13:44:05 +03:00
"name" => "/"
2020-11-03 12:20:11 +03:00
},
%{
"bounce_rate" => nil,
2022-02-10 23:03:56 +03:00
"time_on_page" => nil,
2021-11-04 15:20:39 +03:00
"visitors" => 1,
2020-11-03 12:20:11 +03:00
"pageviews" => 1,
2021-07-23 13:44:05 +03:00
"name" => "/some-other-page"
2020-11-03 12:20:11 +03:00
}
]
2020-01-06 16:51:43 +03:00
end
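In this test two sessions enter at "/". The `@user_id` session moves on to "/some-other-page" 15 minutes later, and the other session leaves after a single pageview. One bounce out of two entrances gives 50.0. The only measurable time on "/" is the 900-second gap before the next pageview. "/some-other-page" is never an entry page and is always a session's last page, so both of its metrics are nil. The sketch below reproduces those numbers under two assumptions. Both are inferred from the expected values and are not taken from Plausible's code:

- A bounce is a session with exactly one pageview.
- Time on page is the gap to the next pageview in the same session, and a session's last pageview contributes nothing.

`PageMetricsSketch` is hypothetical.

```elixir
defmodule PageMetricsSketch do
  # Hypothetical input: each session is a list of {pathname, seconds}, sorted by time.

  # Bounce rate for an entry page: single-pageview sessions / sessions entering there.
  def bounce_rate(sessions, page) do
    entered = Enum.filter(sessions, fn [{first, _} | _] -> first == page end)

    case entered do
      [] ->
        nil

      _ ->
        bounces = Enum.count(entered, &(length(&1) == 1))
        Float.round(bounces / length(entered) * 100, 1)
    end
  end

  # Average time on page: gap to the next pageview in the same session.
  # The last pageview of a session has no successor, so it adds no gap.
  def time_on_page(sessions, page) do
    gaps =
      for session <- sessions,
          [{p, t1}, {_, t2}] <- Enum.chunk_every(session, 2, 1, :discard),
          p == page,
          do: t2 - t1

    case gaps do
      [] -> nil
      _ -> Enum.sum(gaps) / length(gaps)
    end
  end
end

sessions = [
  [{"/", 0}, {"/some-other-page", 900}],
  [{"/", 900}]
]

PageMetricsSketch.bounce_rate(sessions, "/")
```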
2020-07-14 16:52:26 +03:00

[Continued] Google Analytics import (#1753)
* Add has_imported_stats boolean to Site
* Add Google Analytics import panel to general settings
* Get GA profiles to display in import settings panel
* Add import_from_google method as entrypoint to import data
* Add imported_visitors table
* Remove conflicting code from migration
* Import visitors data into clickhouse database
* Pass another dataset to main graph for rendering in red
This adds another entry to the JSON data returned via the main graph API
called `imported_plot`, which is similar to `plot` in form but will be
completed with previously imported data. Currently it simply returns
the values from `plot` / 2. The data is rendered in the main graph in
red without fill, and without an indicator for the present. Rationale:
imported data will not continue to grow so there is no projection
forward, only backwards.
* Hook imported GA data to dashboard timeseries plot
* Add settings option to forget imported data
* Import sources from google analytics
* Merge imported sources when queried
* Merge imported source data with native data when querying sources
* Start converting metrics to atoms so they can be subqueried
This changes "visitors" and in some places "sources" to atoms. This does
not change the behaviour of the functions - the tests all pass unchanged
following this commit. This is necessary as joining subqueries requires
that the keys in `select` statements be atoms and not strings.
* Convert GA (direct) source to empty string
* Import utm campaign and utm medium from GA
* format
* Import all data types from GA into new tables
* Handle large amounts of data more safely
* Fix some mistakes in tables
* Make GA requests in chunks of 5 queries
* Only display imported timeseries when there is no filter
* Correctly show last 30 minutes timeseries when 'realtime'
* Add with_imported key to Query struct
* Account for injected :is_not filter on sources from dashboard
* Also add tentative imported_utm_sources table
This needs a bit more work on the google import side, as GA do not
report sources and utm sources as distinct things.
* Return imported data to dashboard for rest of Sources panel
This extends the merge_imported function definition for sources to
utm_sources, utm_mediums and utm_campaigns too. This appears to be
working on the DB side but something is incomplete on the client side.
* Clear imported stats from all tables when requested
* Merge entry pages and exit pages from imported data into unfiltered dashboard view
This requires converting the `"visits"` and `"visit_duration"` metrics
to atoms so that they can be used in ecto subqueries.
* Display imported devices, browsers and OSs on dashboard
* Display imported country data on dashboard
* Add more metrics to entries/exits for modals
* make sure data is returned via API with correct keys
* Import regions and cities from GA
* Capitalize device upon import to match native data
* Leave query limits/offsets until after possibly joining with imported data
* Also import timeOnPage and pageviews for pages from GA
* imported_countries -> imported_locations
* Get timeOnPage and pageviews for pages from GA
These are needed for the pages modal, and for calculating exit rates for
exit pages.
* Add indicator to dashboard when imported data is being used
* Don't show imported data as a separate line on main graph
* "bounce_rate" -> :bounce_rate, so it works in subqueries
* Drop imported browser and OS versions
These are not needed.
* Toggle displaying imported data by clicking indicator
* Parse referrers with RefInspector
- Use 'ga:fullReferrer' instead of 'ga:source'. This provides the actual
referrer host + path, whereas 'ga:source' includes utm_mediums and
other values when relevant.
- 'ga:fullReferrer' does, however, include search engine names directly,
so they are manually checked for as RefInspector won't pick up on
these.
* Keep imported data indicator on dashboard and strikethrough when hidden
* Add unlink google button to import panel
* Rename some GA browsers and OSes to plausible versions
* Get main top pages and exit pages panels working correctly with imported data
* mix format
* Fetch time_on_pages for imported data when needed
* entry pages need to fetch bounces from GA
* "sample_percent" -> :sample_percent as only atoms can be used in subqueries
* Calculate bounce_rate for joined native and imported data for top pages modal
* Flip some query bindings around to be less misleading
* Fixup entry page modal visit durations
* mix format
* Fetch bounces and visit_duration for sources from GA
* add more source metrics used for data in modals
* Make sources modals display correct values
* imported_visitors: bounce_rate -> bounces, avg_visit_duration -> visit_duration
* Merge imported data into aggregate stats
* Reformat top graph side icons
* Ensure sample_percent is yielded from aggregate data
* filter event_props should be strings
* Hide imported data from frontend when using filter
* Fix existing tests
* fix tests
* Fix imported indicator appearing when filtering
* comma needed, lost when rebasing
* Import utm_terms and utm_content from GA
* Merge imported utm_term and utm_content
* Rename imported Countries data as Locations
* Set imported city schema field to int
* Remove utm_terms and utm_content when clearing imported
* Clean locations import from Google Analytics
- Country and region should be set to "" when GA provides "(not set)"
- City should be set to 0 for "unknown", as we cannot reliably import
city data from GA.
* Display imported region and city in dashboard
* os -> operating_system in some parts of code
The inconsistency of using os in some places and operating_system in
others causes trouble with subqueries and joins for the native and
imported data, which would require additional logic to account for. The
simplest solution is to just use a consistent word for all uses. This
doesn't make any user-facing or database changes.
* to_atom -> to_existing_atom
* format
* "events" metric -> :events
* ignore imported data when "events" in metrics
* update "bounce_rate"
* atomise some more metrics from new city and region api
* atomise some more metrics for email handlers
* "conversion_rate" -> :conversion_rate during csv export
* Move imported data stats code to own module
* Move imported timeseries function to Stats.Imported
* Use Timex.parse to import dates from GA
* has_imported_stats -> imported_source
* "time_on_page" -> :time_on_page
* Convert imported GA data to UTC
* Clean up GA request code a bit
There was some weird logic here with two separate lists that really
ought to be together, so this merges those.
* Fail sooner if GA timezone can't be identified
* Link imported tables to site by id
* imported_utm_content -> imported_utm_contents
* Import GA data from all of time
* Reorganise GA data fetch logic
- Fetch data from the start of time (2005)
- Check whether no data was fetched, and if so, inform user and don't
consider data to be imported.
* Clarify removal of "visits" data when it isn't in metrics
* Apply location filters from API
This makes it consistent with the sources etc which filter out 'Direct /
None' on the API side. These filters are used by both the native and
imported data handling code, which would otherwise both duplicate the
filters in their `where` clauses.
* Do not use changeset for setting site.imported_source
* Add all metrics to all dimensions
* Run GA import in the background
* Send email when GA import completes
* Add handler to insert imported data into tests and imported_browsers_factory
* Add remaining import data test factories
* Add imported location data to test
* Test main graph with imported data
* Add imported data to operating systems tests
* Add imported data to pages tests
* Add imported data to entry pages tests
* Add imported data to exit pages tests
* Add imported data to devices tests
* Add imported data to sources tests
* Add imported data to UTM tests
* Add new test module for the data import step
* Test import of sources GA data
* Test import of utm_mediums GA data
* Test import of utm_campaigns GA data
* Add tests for UTM terms
* Add tests for UTM contents
* Add test for importing pages and entry pages data from GA
* Add test for importing exit page data
* Fix module file name typo
* Add test for importing location data from GA
* Add test for importing devices data from GA
* Add test for importing browsers data from GA
* Add test for importing OS data from GA
* Paginate GA requests to download all data
* Bump clickhouse_ecto version
* Move RefInspector wrapper function into module
* Drop timezone transform on import
* Order imported by site_id then date
* More strings -> atoms
Also changes a conditional to be a bit nicer
* Remove parallelisation of data import
* Split sources and UTM sources from fetched GA data
GA has only a "source" dimension and no "UTM source" dimension. Instead
it returns these combined. The logic herein to tease these apart is:
1. "(direct)" -> it's a direct source
2. if the source is a domain -> it's a source
3. "google" -> it's from adwords; let's make this a UTM source "adwords"
4. else -> just a UTM source
* Keep prop names in queries as strings
* fix typo
* Fix import
* Insert data to clickhouse in batches
* Fix link when removing imported data
* Merge source tables
* Import hostname as well as pathname
* Record start and end time of imported data
* Track import progress
* Fix month interval with imported data
* Do not JOIN when imported date range has no overlap
* Fix time on page using exits
Co-authored-by: mcol <mcol@posteo.net>
2022-03-11 00:04:59 +03:00
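The four rules in the "Split sources and UTM sources from fetched GA data" entry above can be sketched as a single classification function. This is an illustration, not the importer's code. The module name `GASourceSplitSketch` is hypothetical. "(direct)" is mapped to the empty string, following the earlier "(direct) source to empty string" entry. Treating any value that contains a dot as a domain is an assumption, because the commit message does not say how domains are detected.

```elixir
defmodule GASourceSplitSketch do
  # Hypothetical sketch of the four rules in the commit message, applied in order.

  # 1. "(direct)" -> a direct source, represented here as "".
  def classify("(direct)"), do: {:source, ""}

  def classify(source) do
    cond do
      # 2. A domain -> a regular source.
      domain?(source) -> {:source, source}
      # 3. "google" -> AdWords traffic, recorded as UTM source "adwords".
      source == "google" -> {:utm_source, "adwords"}
      # 4. Anything else -> a UTM source.
      true -> {:utm_source, source}
    end
  end

  # Assumption: anything containing a dot is treated as a domain name.
  defp domain?(source), do: String.contains?(source, ".")
end

GASourceSplitSketch.classify("google")
```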
test "calculates bounce rate and time on page for pages with imported data", %{
conn: conn,
site: site
} do
populate_stats(site, [
build(:pageview,
pathname: "/",
user_id: @user_id,
timestamp: ~N[2021-01-01 00:00:00]
),
build(:pageview,
pathname: "/some-other-page",
user_id: @user_id,
timestamp: ~N[2021-01-01 00:15:00]
),
build(:pageview,
pathname: "/",
timestamp: ~N[2021-01-01 00:15:00]
),
build(:imported_pages,
page: "/",
date: ~D[2021-01-01],
time_on_page: 700
),
build(:imported_entry_pages,
entry_page: "/",
date: ~D[2021-01-01],
entrances: 3,
bounces: 1
),
build(:imported_pages,
page: "/some-other-page",
date: ~D[2021-01-01],
time_on_page: 60
)
])

conn =
get(
conn,
"/api/stats/#{site.domain}/pages?period=day&date=2021-01-01&detailed=true&with_imported=true"
)

assert json_response(conn, 200) == [
%{
"bounce_rate" => 40.0,
"time_on_page" => 800.0,
"visitors" => 3,
"pageviews" => 3,
"name" => "/"
},
%{
"bounce_rate" => nil,
"time_on_page" => 60,
"visitors" => 2,
"pageviews" => 2,
"name" => "/some-other-page"
}
]
end
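With imported data, the bounce rate for "/" is 40.0. That value matches pooling raw counts rather than averaging percentages:

- The native data has 2 entrances and 1 bounce.
- The `imported_entry_pages` row adds 3 entrances and 1 bounce.
- (1 + 1) / (2 + 3) = 40%.

Averaging the two rates instead (50% and 33.3%) would give about 41.7. Below is a minimal sketch of the pooled calculation. `MergedBounceRateSketch` is hypothetical, and the rounding to one decimal place is assumed.

```elixir
defmodule MergedBounceRateSketch do
  # Hypothetical: each data source reports raw {bounces, entrances} counts
  # for one entry page; counts are pooled before computing the rate.
  def bounce_rate(counts) do
    {bounces, entrances} =
      Enum.reduce(counts, {0, 0}, fn {b, e}, {bs, es} -> {bs + b, es + e} end)

    if entrances == 0 do
      nil
    else
      Float.round(bounces / entrances * 100, 1)
    end
  end
end

# Native "/" (1 bounce, 2 entrances) plus the imported row (1 bounce, 3 entrances).
MergedBounceRateSketch.bounce_rate([{1, 2}, {1, 3}])
```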

2020-07-14 16:52:26 +03:00
test "returns top pages in realtime report", %{conn: conn, site: site} do
2021-07-23 13:44:05 +03:00
populate_stats(site, [
build(:pageview, pathname: "/page1"),
build(:pageview, pathname: "/page2"),
build(:pageview, pathname: "/page1")
])

2020-07-14 16:52:26 +03:00
conn = get(conn, "/api/stats/#{site.domain}/pages?period=realtime")

assert json_response(conn, 200) == [
2021-11-04 15:20:39 +03:00
%{"visitors" => 2, "name" => "/page1"},
%{"visitors" => 1, "name" => "/page2"}
2020-07-30 11:18:28 +03:00
]
end
2021-09-20 17:17:11 +03:00

test "calculates conversion_rate when filtering for goal", %{conn: conn, site: site} do
populate_stats(site, [
build(:pageview, user_id: 1, pathname: "/"),
build(:pageview, user_id: 2, pathname: "/"),
build(:pageview, user_id: 3, pathname: "/"),
build(:event, user_id: 3, name: "Signup")
])

filters = Jason.encode!(%{"goal" => "Signup"})

conn = get(conn, "/api/stats/#{site.domain}/pages?period=day&filters=#{filters}")

assert json_response(conn, 200) == [
2021-11-04 15:20:39 +03:00
%{"total_visitors" => 3, "visitors" => 1, "name" => "/", "conversion_rate" => 33.3}
2021-09-20 17:17:11 +03:00
]
end
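The expected `conversion_rate` of 33.3 comes from one converting visitor (user 3) out of the three visitors of "/". The ratio is expressed as a percentage and rounded to one decimal place. Below is a small sketch of that arithmetic. `ConversionRateSketch` is hypothetical. Its guard simply rejects a zero denominator and does not model what the API does in that case.

```elixir
defmodule ConversionRateSketch do
  # Hypothetical: share of the page's visitors who also completed the goal,
  # as a percentage rounded to one decimal place.
  def conversion_rate(converted, total_visitors) when total_visitors > 0 do
    Float.round(converted / total_visitors * 100, 1)
  end
end

# One of the three visitors to "/" triggered "Signup".
ConversionRateSketch.conversion_rate(1, 3)
```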
2020-07-30 11:18:28 +03:00
end

describe "GET /api/stats/:domain/entry-pages" do
[Continued] Google Analytics import (#1753)
2022-03-11 00:04:59 +03:00
setup [:create_user, :log_in, :create_new_site, :add_imported_data]
2020-07-30 11:18:28 +03:00

test "returns top entry pages by visitors", %{conn: conn, site: site} do
2021-07-23 13:44:05 +03:00
populate_stats(site, [
build(:pageview,
pathname: "/page1",
timestamp: ~N[2021-01-01 00:00:00]
),
build(:pageview,
pathname: "/page1",
timestamp: ~N[2021-01-01 00:00:00]
),
build(:pageview,
pathname: "/page2",
user_id: @user_id,
timestamp: ~N[2021-01-01 00:00:00]
),
build(:pageview,
pathname: "/page2",
user_id: @user_id,
timestamp: ~N[2021-01-01 00:15:00]
)
])
2020-07-30 11:18:28 +03:00

2021-07-23 13:44:05 +03:00
populate_stats(site, [
build(:pageview,
pathname: "/page2",
user_id: @user_id,
timestamp: ~N[2021-01-01 23:15:00]
2020-07-30 11:18:28 +03:00
)
2021-07-23 13:44:05 +03:00
])

conn = get(conn, "/api/stats/#{site.domain}/entry-pages?period=day&date=2021-01-01")
2020-07-30 11:18:28 +03:00

assert json_response(conn, 200) == [
2020-11-03 12:20:11 +03:00
%{
2021-11-04 15:20:39 +03:00
"unique_entrances" => 2,
"total_entrances" => 2,
2021-07-23 13:44:05 +03:00
"name" => "/page1",
"visit_duration" => 0
},
%{
2021-11-04 15:20:39 +03:00
"unique_entrances" => 1,
"total_entrances" => 2,
2021-07-23 13:44:05 +03:00
"name" => "/page2",
"visit_duration" => 450
2020-11-03 12:20:11 +03:00
}
]
2020-07-30 11:18:28 +03:00
end
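"/page2" shows two entrances but only one unique visitor. The two `@user_id` pageviews at 00:00 and 00:15 form one session. The pageview at 23:15, added by the second `populate_stats` call, starts a new session. The average visit duration is therefore (900 + 0) / 2 = 450 seconds.

The sketch below splits one visitor's timestamps into sessions and averages their durations. The 30-minute inactivity timeout is an assumption about Plausible's session handling; the test only needs a gap of many hours to start a new session. `SessionSketch` is hypothetical. It also rounds to an integer, which is assumed so that the result matches the 450 in the response.

```elixir
defmodule SessionSketch do
  # Assumption: a new session starts after 30 minutes without a pageview.
  @timeout 30 * 60

  # Splits one visitor's sorted pageview timestamps (in seconds) into sessions.
  def sessions(timestamps) do
    Enum.chunk_while(
      timestamps,
      [],
      fn t, acc ->
        case acc do
          # acc holds the current session newest-first; a long gap closes it.
          [prev | _] when t - prev > @timeout -> {:cont, Enum.reverse(acc), [t]}
          _ -> {:cont, [t | acc]}
        end
      end,
      fn
        [] -> {:cont, []}
        acc -> {:cont, Enum.reverse(acc), []}
      end
    )
  end

  # Average of (last pageview - first pageview) over sessions, rounded to an integer.
  def avg_visit_duration(sessions) do
    durations = Enum.map(sessions, fn s -> List.last(s) - hd(s) end)
    round(Enum.sum(durations) / length(durations))
  end
end

# "/page2" visitor: 00:00, 00:15 and 23:15 on 2021-01-01, as seconds since midnight.
SessionSketch.sessions([0, 900, 83_700]) |> SessionSketch.avg_visit_duration()
```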
2021-09-20 17:17:11 +03:00

2022-04-21 11:47:15 +03:00
test "returns top entry pages filtered by custom pageview props", %{conn: conn, site: site} do
populate_stats(site, [
build(:pageview,
pathname: "/blog",
user_id: 123,
timestamp: ~N[2021-01-01 00:00:00]
),
build(:pageview,
pathname: "/blog/john-1",
"meta.key": ["author"],
"meta.value": ["John Doe"],
user_id: 123,
timestamp: ~N[2021-01-01 00:01:00]
),
build(:pageview,
pathname: "/blog/john-2",
"meta.key": ["author"],
"meta.value": ["John Doe"],
timestamp: ~N[2021-01-01 00:00:00]
),
build(:pageview,
pathname: "/blog/other-post",
"meta.key": ["author"],
"meta.value": ["other"],
timestamp: ~N[2021-01-01 00:00:00]
),
build(:pageview,
pathname: "/blog",
timestamp: ~N[2021-01-01 00:00:00]
)
])

filters = Jason.encode!(%{props: %{"author" => "John Doe"}})

conn =
get(
conn,
"/api/stats/#{site.domain}/entry-pages?period=day&date=2021-01-01&filters=#{filters}"
)

assert json_response(conn, 200) == [
%{
"unique_entrances" => 1,
"total_entrances" => 1,
"name" => "/blog",
"visit_duration" => 60
},
%{
"unique_entrances" => 1,
"total_entrances" => 1,
"name" => "/blog/john-2",
"visit_duration" => 0
}
]
end
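The expected rows are consistent with the props filter selecting whole sessions that contain at least one matching pageview, then reporting each session's entry page:

- User 123's session enters at "/blog" and reaches "/blog/john-1" a minute later, giving a visit duration of 60.
- "/blog/john-2" is a single-pageview session, giving a duration of 0.
- The "/blog/other-post" session and the second "/blog" visit have no matching prop, so they drop out.

Below is a sketch of that reading, with session ids assigned by hand. `PropFilterSketch` and its input shape are hypothetical. Sorting by name only makes the output order deterministic.

```elixir
defmodule PropFilterSketch do
  # Hypothetical input: pageviews as maps with :session, :pathname,
  # :ts (seconds) and :props (a map of custom props).
  def entry_pages(pageviews, key, value) do
    pageviews
    |> Enum.group_by(& &1.session)
    |> Map.values()
    # Keep whole sessions in which any pageview carries the prop.
    |> Enum.filter(fn pvs -> Enum.any?(pvs, &(Map.get(&1.props, key) == value)) end)
    |> Enum.map(fn pvs ->
      [first | _] = sorted = Enum.sort_by(pvs, & &1.ts)
      %{"name" => first.pathname, "visit_duration" => List.last(sorted).ts - first.ts}
    end)
    |> Enum.sort_by(& &1["name"])
  end
end

pageviews = [
  %{session: 1, pathname: "/blog", ts: 0, props: %{}},
  %{session: 1, pathname: "/blog/john-1", ts: 60, props: %{"author" => "John Doe"}},
  %{session: 2, pathname: "/blog/john-2", ts: 0, props: %{"author" => "John Doe"}},
  %{session: 3, pathname: "/blog/other-post", ts: 0, props: %{"author" => "other"}},
  %{session: 4, pathname: "/blog", ts: 0, props: %{}}
]

PropFilterSketch.entry_pages(pageviews, "author", "John Doe")
```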
[Continued] Google Analytics import (#1753)
* Add imported data to devices tests
* Add imported data to sources tests
* Add imported data to UTM tests
* Add new test module for the data import step
* Test import of sources GA data
* Test import of utm_mediums GA data
* Test import of utm_campaigns GA data
* Add tests for UTM terms
* Add tests for UTM contents
* Add test for importing pages and entry pages data from GA
* Add test for importing exit page data
* Fix module file name typo
* Add test for importing location data from GA
* Add test for importing devices data from GA
* Add test for importing browsers data from GA
* Add test for importing OS data from GA
* Paginate GA requests to download all data
* Bump clickhouse_ecto version
* Move RefInspector wrapper function into module
* Drop timezone transform on import
* Order imported by side_id then date
* More strings -> atoms
Also changes a conditional to be a bit nicer
* Remove parallelisation of data import
* Split sources and UTM sources from fetched GA data
GA has only a "source" dimension and no "UTM source" dimension. Instead
it returns these combined. The logic herein to tease these apart is:
1. "(direct)" -> it's a direct source
2. if the source is a domain -> it's a source
3. "google" -> it's from adwords; let's make this a UTM source "adwords"
4. else -> just a UTM source
* Keep prop names in queries as strings
* fix typo
* Fix import
* Insert data to clickhouse in batches
* Fix link when removing imported data
* Merge source tables
* Import hostname as well as pathname
* Record start and end time of imported data
* Track import progress
* Fix month interval with imported data
* Do not JOIN when imported date range has no overlap
* Fix time on page using exits
Co-authored-by: mcol <mcol@posteo.net>
2022-03-11 00:04:59 +03:00
    test "returns top entry pages by visitors with imported data", %{conn: conn, site: site} do
      populate_stats(site, [
        build(:pageview,
          pathname: "/page1",
          timestamp: ~N[2021-01-01 00:00:00]
        ),
        build(:pageview,
          pathname: "/page1",
          timestamp: ~N[2021-01-01 00:00:00]
        ),
        build(:pageview,
          pathname: "/page2",
          user_id: @user_id,
          timestamp: ~N[2021-01-01 00:00:00]
        ),
        build(:pageview,
          pathname: "/page2",
          user_id: @user_id,
          timestamp: ~N[2021-01-01 00:15:00]
2022-06-06 10:44:33 +03:00
        ),
2022-03-11 00:04:59 +03:00
        build(:pageview,
          pathname: "/page2",
          user_id: @user_id,
          timestamp: ~N[2021-01-01 23:15:00]
        )
      ])

      populate_stats(site, [
        build(:imported_entry_pages,
          entry_page: "/page2",
          date: ~D[2021-01-01],
          entrances: 3,
          visitors: 2,
          visit_duration: 300
        )
      ])

      conn = get(conn, "/api/stats/#{site.domain}/entry-pages?period=day&date=2021-01-01")

      # Native data only: the two /page1 pageviews without an explicit user_id are
      # separate visitors, while @user_id's /page2 pageviews form two sessions
      # (00:00-00:15 lasting 900s, and 23:15 lasting 0s): 2 entrances by 1 visitor,
      # 450s on average.
      assert json_response(conn, 200) == [
               %{
                 "unique_entrances" => 2,
                 "total_entrances" => 2,
                 "name" => "/page1",
                 "visit_duration" => 0
               },
               %{
                 "unique_entrances" => 1,
                 "total_entrances" => 2,
                 "name" => "/page2",
                 "visit_duration" => 450
               }
             ]

      conn =
        get(
          conn,
          "/api/stats/#{site.domain}/entry-pages?period=day&date=2021-01-01&with_imported=true"
        )

      # The imported row is added to the native /page2 figures: visitors 1 + 2 = 3,
      # entrances 2 + 3 = 5, and, treating the imported visit_duration as a total,
      # (900 + 0 + 300) / 5 = 240.0. /page2 now outranks /page1 by visitors.
      assert json_response(conn, 200) == [
               %{
                 "unique_entrances" => 3,
                 "total_entrances" => 5,
                 "name" => "/page2",
                 "visit_duration" => 240.0
               },
               %{
                 "unique_entrances" => 2,
                 "total_entrances" => 2,
                 "name" => "/page1",
                 "visit_duration" => 0
               }
             ]
    end

2021-09-20 17:17:11 +03:00
    test "calculates conversion_rate when filtering for goal", %{conn: conn, site: site} do
      populate_stats(site, [
        build(:pageview,
          pathname: "/page1",
          user_id: 1,
          timestamp: ~N[2021-01-01 00:00:00]
        ),
        build(:pageview,
          pathname: "/page1",
          user_id: 2,
          timestamp: ~N[2021-01-01 00:00:00]
        ),
        build(:event,
          name: "Signup",
          user_id: 1,
          timestamp: ~N[2021-01-01 00:00:00]
        ),
        build(:pageview,
          pathname: "/page2",
          user_id: 3,
          timestamp: ~N[2021-01-01 00:00:00]
        ),
        build(:pageview,
          pathname: "/page2",
          user_id: 3,
          timestamp: ~N[2021-01-01 00:15:00]
        ),
        build(:event,
          name: "Signup",
          user_id: 3,
          timestamp: ~N[2021-01-01 00:15:00]
        )
      ])

      # With a goal filter, "total_visitors" counts every visitor who entered on the
      # page and "unique_entrances" only those who converted, so /page1 converts
      # 1 of 2 visitors (50.0) and /page2 converts 1 of 1 (100.0).
      filters = Jason.encode!(%{"goal" => "Signup"})

      conn =
        get(
          conn,
          "/api/stats/#{site.domain}/entry-pages?period=day&date=2021-01-01&filters=#{filters}"
        )

      assert json_response(conn, 200) == [
               %{
2021-09-29 14:28:29 +03:00
                 "total_visitors" => 2,
2021-11-04 15:20:39 +03:00
                 "unique_entrances" => 1,
                 "total_entrances" => 1,
2021-09-20 17:17:11 +03:00
                 "name" => "/page1",
                 "visit_duration" => 0,
                 "conversion_rate" => 50.0
2023-01-05 04:14:40 +03:00
               },
               %{
                 "total_visitors" => 1,
                 "unique_entrances" => 1,
                 "total_entrances" => 1,
                 "name" => "/page2",
                 "visit_duration" => 900,
                 "conversion_rate" => 100.0
2021-09-20 17:17:11 +03:00
               }
             ]
    end
2019-11-19 07:30:42 +03:00
  end
Adds entry and exit pages to Top Pages module (#712)
* Initial Pass
* Adds support for page visits counting by referrer
* Includes goal selection in entry and exit computation
* Adds goal-based entry and exit page stats, formatting, code cleanup
* Changelog
* Format
* Exit rate, visit duration, updated tests
* I keep forgetting to format :/
* Tests, last time
* Fixes double counting, exit rate >100%, relevant tests
* Fixes exit pages on filter and goal states
* Adds entry and exit filters, fixes various bugs
* Fixes discussed issues
* Format
* Fixes impossible case in tests
Originally, there were only 2 pageviews for `test-site.com`,`/` on `2019-01-01`, but that doesn't make sense when there were 3 sessions that exited on the same site/date.
* Format
* Removes boolean function parameter in favor of separate function
* Adds support for queries that use `page` filter as `entry-page`
* Format
* Makes loader/title interaction in sources report consistent
2021-02-26 12:02:37 +03:00

  describe "GET /api/stats/:domain/exit-pages" do
2022-03-11 00:04:59 +03:00
    setup [:create_user, :log_in, :create_new_site, :add_imported_data]
2021-02-26 12:02:37 +03:00

    test "returns top exit pages by visitors", %{conn: conn, site: site} do
2021-07-23 13:44:05 +03:00
      populate_stats(site, [
        build(:pageview,
          pathname: "/page1",
          timestamp: ~N[2021-01-01 00:00:00]
        ),
        build(:pageview,
          pathname: "/page1",
          timestamp: ~N[2021-01-01 00:00:00]
        ),
        build(:pageview,
          pathname: "/page1",
          user_id: @user_id,
          timestamp: ~N[2021-01-01 00:00:00]
        ),
        build(:pageview,
          pathname: "/page2",
          user_id: @user_id,
          timestamp: ~N[2021-01-01 00:15:00]
        )
      ])

      # exit_rate is exits divided by the page's pageviews: /page1 has 3 pageviews
      # and 2 sessions ending on it (2/3, reported as 66); /page2 has 1 of 1 (100).
      conn = get(conn, "/api/stats/#{site.domain}/exit-pages?period=day&date=2021-01-01")
2021-02-26 12:02:37 +03:00

      assert json_response(conn, 200) == [
2021-11-04 15:20:39 +03:00
               %{"name" => "/page1", "unique_exits" => 2, "total_exits" => 2, "exit_rate" => 66},
               %{"name" => "/page2", "unique_exits" => 1, "total_exits" => 1, "exit_rate" => 100}
2021-02-26 12:02:37 +03:00
             ]
    end
2021-08-19 10:32:03 +03:00
|
|
|
|
2022-04-21 11:47:15 +03:00
|
|
|
test "returns top exit pages filtered by custom pageview props", %{conn: conn, site: site} do
|
|
|
|
populate_stats(site, [
|
|
|
|
build(:pageview,
|
|
|
|
pathname: "/blog/john-1",
|
|
|
|
"meta.key": ["author"],
|
|
|
|
"meta.value": ["John Doe"],
|
|
|
|
user_id: 123,
|
|
|
|
timestamp: ~N[2021-01-01 00:00:00]
|
|
|
|
),
|
|
|
|
build(:pageview,
|
|
|
|
pathname: "/",
|
|
|
|
user_id: 123,
|
|
|
|
timestamp: ~N[2021-01-01 00:01:00]
|
|
|
|
),
|
|
|
|
build(:pageview,
|
|
|
|
pathname: "/blog/other-post",
|
|
|
|
"meta.key": ["author"],
|
|
|
|
"meta.value": ["other"],
|
|
|
|
timestamp: ~N[2021-01-01 00:00:00]
|
|
|
|
),
|
|
|
|
build(:pageview,
|
|
|
|
pathname: "/",
|
|
|
|
timestamp: ~N[2021-01-01 00:00:00]
|
|
|
|
)
|
|
|
|
])
|
|
|
|
|
|
|
|
filters = Jason.encode!(%{props: %{"author" => "John Doe"}})
|
|
|
|
|
|
|
|
conn =
|
|
|
|
get(
|
|
|
|
conn,
|
|
|
|
"/api/stats/#{site.domain}/exit-pages?period=day&date=2021-01-01&filters=#{filters}"
|
|
|
|
)
|
|
|
|
|
|
|
|
assert json_response(conn, 200) == [
|
|
|
|
%{"name" => "/", "unique_exits" => 1, "total_exits" => 1}
|
|
|
|
]
|
|
|
|
end

    test "returns top exit pages by visitors with imported data", %{conn: conn, site: site} do
      populate_stats(site, [
        build(:pageview,
          pathname: "/page1",
          timestamp: ~N[2021-01-01 00:00:00]
        ),
        build(:pageview,
          pathname: "/page1",
          timestamp: ~N[2021-01-01 00:00:00]
        ),
        build(:pageview,
          pathname: "/page1",
          user_id: @user_id,
          timestamp: ~N[2021-01-01 00:00:00]
        ),
        build(:pageview,
          pathname: "/page2",
          user_id: @user_id,
          timestamp: ~N[2021-01-01 00:15:00]
        )
      ])

      populate_stats(site, [
        build(:imported_pages,
          page: "/page2",
          date: ~D[2021-01-01],
          pageviews: 4,
          visitors: 2
        ),
        build(:imported_exit_pages,
          exit_page: "/page2",
          date: ~D[2021-01-01],
          exits: 3,
          visitors: 2
        )
      ])

      # Native data only: two single-page sessions exit on /page1 (2 exits over
      # 3 pageviews, reported as 66). The @user_id session moves on and exits on /page2.
      conn = get(conn, "/api/stats/#{site.domain}/exit-pages?period=day&date=2021-01-01")

      assert json_response(conn, 200) == [
               %{"name" => "/page1", "unique_exits" => 2, "total_exits" => 2, "exit_rate" => 66},
               %{"name" => "/page2", "unique_exits" => 1, "total_exits" => 1, "exit_rate" => 100}
             ]

      # With imported data, /page2 has 1 + 2 = 3 exiting visitors and 1 + 3 = 4 exits
      # over 1 + 4 = 5 pageviews, so its exit rate is 4 / 5 = 80.0.
      conn =
        get(
          conn,
          "/api/stats/#{site.domain}/exit-pages?period=day&date=2021-01-01&with_imported=true"
        )

      assert json_response(conn, 200) == [
               %{
                 "name" => "/page2",
                 "unique_exits" => 3,
                 "total_exits" => 4,
                 "exit_rate" => 80.0
               },
               %{"name" => "/page1", "unique_exits" => 2, "total_exits" => 2, "exit_rate" => 66}
             ]
    end

    test "calculates correct exit rate and conversion_rate when filtering for goal", %{
      conn: conn,
      site: site
    } do
      populate_stats(site, [
        build(:event,
          name: "Signup",
          user_id: 1,
          timestamp: ~N[2021-01-01 00:00:00]
        ),
        build(:pageview,
          user_id: 1,
          pathname: "/exit1",
          timestamp: ~N[2021-01-01 00:00:00]
        ),
        build(:event,
          name: "Signup",
          user_id: 2,
          timestamp: ~N[2021-01-01 00:00:00]
        ),
        build(:pageview,
          user_id: 2,
          pathname: "/exit1",
          timestamp: ~N[2021-01-01 00:00:00]
        ),
        build(:pageview,
          user_id: 2,
          pathname: "/exit2",
          timestamp: ~N[2021-01-01 00:00:00]
        )
      ])

      filters = Jason.encode!(%{"goal" => "Signup"})

      conn =
        get(
          conn,
          "/api/stats/#{site.domain}/exit-pages?period=day&date=2021-01-01&filters=#{filters}"
        )

      assert json_response(conn, 200) == [
               %{
                 "name" => "/exit1",
                 "unique_exits" => 1,
                 "total_visitors" => 1,
                 "total_exits" => 1,
                 "conversion_rate" => 100.0
               },
               %{
                 "name" => "/exit2",
                 "unique_exits" => 1,
                 "total_visitors" => 1,
                 "total_exits" => 1,
                 "conversion_rate" => 100.0
               }
             ]
    end

    test "calculates correct exit rate when filtering for page", %{conn: conn, site: site} do
      populate_stats(site, [
        build(:pageview,
          user_id: 1,
          pathname: "/exit1",
          timestamp: ~N[2021-01-01 00:00:00]
        ),
        build(:pageview,
          user_id: 2,
          pathname: "/exit1",
          timestamp: ~N[2021-01-01 00:00:00]
        ),
        build(:pageview,
          user_id: 2,
          pathname: "/exit2",
          timestamp: ~N[2021-01-01 00:00:00]
        ),
        build(:pageview,
          user_id: 3,
          pathname: "/exit2",
          timestamp: ~N[2021-01-01 00:00:00]
        ),
        build(:pageview,
          user_id: 3,
          pathname: "/should-not-appear",
          timestamp: ~N[2021-01-01 00:00:00]
        )
      ])

      filters = Jason.encode!(%{"page" => "/exit1"})

      conn =
        get(
          conn,
          "/api/stats/#{site.domain}/exit-pages?period=day&date=2021-01-01&filters=#{filters}"
        )

      # Only sessions that viewed /exit1 count: user 1 exits on /exit1 and user 2
      # on /exit2. User 3 never viewed /exit1, so that session is excluded.
      assert json_response(conn, 200) == [
               %{"name" => "/exit1", "unique_exits" => 1, "total_exits" => 1},
               %{"name" => "/exit2", "unique_exits" => 1, "total_exits" => 1}
             ]
    end
  end
end