mirror of
https://github.com/plausible/analytics.git
synced 2024-12-22 00:51:36 +03:00
e27734ed79
* Add has_imported_stats boolean to Site
* Add Google Analytics import panel to general settings
* Get GA profiles to display in import settings panel
* Add import_from_google method as entrypoint to import data
* Add imported_visitors table
* Remove conflicting code from migration
* Import visitors data into clickhouse database
* Pass another dataset to main graph for rendering in red
  This adds another entry to the JSON data returned via the main graph API called `imported_plot`, which is similar to `plot` in form but will be completed with previously imported data. Currently it simply returns the values from `plot` / 2. The data is rendered in the main graph in red without fill, and without an indicator for the present. Rationale: imported data will not continue to grow so there is no projection forward, only backwards.
* Hook imported GA data to dashboard timeseries plot
* Add settings option to forget imported data
* Import sources from google analytics
* Merge imported sources when queried
* Merge imported source data with native data when querying sources
* Start converting metrics to atoms so they can be subqueried
  This changes "visitors" and in some places "sources" to atoms. This does not change the behaviour of the functions - the tests all pass unchanged following this commit. This is necessary as joining subqueries requires that the keys in `select` statements be atoms and not strings.
* Convert GA (direct) source to empty string
* Import utm campaign and utm medium from GA
* format
* Import all data types from GA into new tables
* Handle large amounts of data more safely
* Fix some mistakes in tables
* Make GA requests in chunks of 5 queries
* Only display imported timeseries when there is no filter
* Correctly show last 30 minutes timeseries when 'realtime'
* Add with_imported key to Query struct
* Account for injected :is_not filter on sources from dashboard
* Also add tentative imported_utm_sources table
  This needs a bit more work on the google import side, as GA do not report sources and utm sources as distinct things.
* Return imported data to dashboard for rest of Sources panel
  This extends the merge_imported function definition for sources to utm_sources, utm_mediums and utm_campaigns too. This appears to be working on the DB side but something is incomplete on the client side.
* Clear imported stats from all tables when requested
* Merge entry pages and exit pages from imported data into unfiltered dashboard view
  This requires converting the `"visits"` and `"visit_duration"` metrics to atoms so that they can be used in ecto subqueries.
* Display imported devices, browsers and OSs on dashboard
* Display imported country data on dashboard
* Add more metrics to entries/exits for modals
* make sure data is returned via API with correct keys
* Import regions and cities from GA
* Capitalize device upon import to match native data
* Leave query limits/offsets until after possibly joining with imported data
* Also import timeOnPage and pageviews for pages from GA
* imported_countries -> imported_locations
* Get timeOnPage and pageviews for pages from GA
  These are needed for the pages modal, and for calculating exit rates for exit pages.
* Add indicator to dashboard when imported data is being used
* Don't show imported data as separate line on main graph
* "bounce_rate" -> :bounce_rate, so it works in subqueries
* Drop imported browser and OS versions
  These are not needed.
* Toggle displaying imported data by clicking indicator
* Parse referrers with RefInspector
  - Use 'ga:fullReferrer' instead of 'ga:source'. This provides the actual referrer host + path, whereas 'ga:source' includes utm_mediums and other values when relevant.
  - 'ga:fullReferrer' does however include search engine names directly, so they are manually checked for as RefInspector won't pick up on these.
* Keep imported data indicator on dashboard and strikethrough when hidden
* Add unlink google button to import panel
* Rename some GA browsers and OSes to plausible versions
* Get main top pages and exit pages panels working correctly with imported data
* mix format
* Fetch time_on_pages for imported data when needed
* entry pages need to fetch bounces from GA
* "sample_percent" -> :sample_percent as only atoms can be used in subqueries
* Calculate bounce_rate for joined native and imported data for top pages modal
* Flip some query bindings around to be less misleading
* Fixup entry page modal visit durations
* mix format
* Fetch bounces and visit_duration for sources from GA
* add more source metrics used for data in modals
* Make sources modals display correct values
* imported_visitors: bounce_rate -> bounces, avg_visit_duration -> visit_duration
* Merge imported data into aggregate stats
* Reformat top graph side icons
* Ensure sample_percent is yielded from aggregate data
* filter event_props should be strings
* Hide imported data from frontend when using filter
* Fix existing tests
* fix tests
* Fix imported indicator appearing when filtering
* comma needed, lost when rebasing
* Import utm_terms and utm_content from GA
* Merge imported utm_term and utm_content
* Rename imported Countries data as Locations
* Set imported city schema field to int
* Remove utm_terms and utm_content when clearing imported
* Clean locations import from Google Analytics
  - Country and region should be set to "" when GA provides "(not set)"
  - City should be set to 0 for "unknown", as we cannot reliably import city data from GA.
* Display imported region and city in dashboard
* os -> operating_system in some parts of code
  The inconsistency of using os in some places and operating_system in others causes trouble with subqueries and joins for the native and imported data, which would require additional logic to account for. The simplest solution is to just use a consistent word for all uses. This doesn't make any user-facing or database changes.
* to_atom -> to_existing_atom
* format
* "events" metric -> :events
* ignore imported data when "events" in metrics
* update "bounce_rate"
* atomise some more metrics from new city and region api
* atomise some more metrics for email handlers
* "conversion_rate" -> :conversion_rate during csv export
* Move imported data stats code to own module
* Move imported timeseries function to Stats.Imported
* Use Timex.parse to import dates from GA
* has_imported_stats -> imported_source
* "time_on_page" -> :time_on_page
* Convert imported GA data to UTC
* Clean up GA request code a bit
  There was some weird logic here with two separate lists that really ought to be together, so this merges those.
* Fail sooner if GA timezone can't be identified
* Link imported tables to site by id
* imported_utm_content -> imported_utm_contents
* Import GA from all of time
* Reorganise GA data fetch logic
  - Fetch data from the start of time (2005)
  - Check whether no data was fetched, and if so, inform user and don't consider data to be imported.
* Clarify removal of "visits" data when it isn't in metrics
* Apply location filters from API
  This makes it consistent with the sources etc which filter out 'Direct / None' on the API side. These filters are used by both the native and imported data handling code, which would otherwise both duplicate the filters in their `where` clauses.
* Do not use changeset for setting site.imported_source
* Add all metrics to all dimensions
* Run GA import in the background
* Send email when GA import completes
* Add handler to insert imported data into tests and imported_browsers_factory
* Add remaining import data test factories
* Add imported location data to test
* Test main graph with imported data
* Add imported data to operating systems tests
* Add imported data to pages tests
* Add imported data to entry pages tests
* Add imported data to exit pages tests
* Add imported data to devices tests
* Add imported data to sources tests
* Add imported data to UTM tests
* Add new test module for the data import step
* Test import of sources GA data
* Test import of utm_mediums GA data
* Test import of utm_campaigns GA data
* Add tests for UTM terms
* Add tests for UTM contents
* Add test for importing pages and entry pages data from GA
* Add test for importing exit page data
* Fix module file name typo
* Add test for importing location data from GA
* Add test for importing devices data from GA
* Add test for importing browsers data from GA
* Add test for importing OS data from GA
* Paginate GA requests to download all data
* Bump clickhouse_ecto version
* Move RefInspector wrapper function into module
* Drop timezone transform on import
* Order imported by site_id then date
* More strings -> atoms
  Also changes a conditional to be a bit nicer
* Remove parallelisation of data import
* Split sources and UTM sources from fetched GA data
  GA has only a "source" dimension and no "UTM source" dimension. Instead it returns these combined. The logic herein to tease these apart is:
  1. "(direct)" -> it's a direct source
  2. if the source is a domain -> it's a source
  3. "google" -> it's from adwords; let's make this a UTM source "adwords"
  4. else -> just a UTM source
* Keep prop names in queries as strings
* fix typo
* Fix import
* Insert data to clickhouse in batches
* Fix link when removing imported data
* Merge source tables
* Import hostname as well as pathname
* Record start and end time of imported data
* Track import progress
* Fix month interval with imported data
* Do not JOIN when imported date range has no overlap
* Fix time on page using exits

Co-authored-by: mcol <mcol@posteo.net>
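The four source-splitting rules from the commit message can be sketched roughly as follows. This is an illustration only, not the code from the commit: the module name `SourceSplitSketch`, the function `split/1`, the `{source, utm_source}` return shape and the "contains a dot and no whitespace" domain check are all assumptions made for the example.

```elixir
defmodule SourceSplitSketch do
  # Hypothetical helper, not part of the Plausible codebase.
  # Returns {source, utm_source}; nil means the field does not apply.

  # Rule 1: "(direct)" is a direct source, stored as an empty string.
  def split("(direct)"), do: {"", nil}

  # Rule 3: a bare "google" is treated as AdWords traffic.
  def split("google"), do: {nil, "adwords"}

  # Rules 2 and 4: domains are sources, everything else is a UTM source.
  def split(value) when is_binary(value) do
    if domain?(value), do: {value, nil}, else: {nil, value}
  end

  # Assumption: anything with a dot and no whitespace counts as a domain.
  defp domain?(value) do
    String.contains?(value, ".") and not String.match?(value, ~r/\s/)
  end
end
```

For instance, `SourceSplitSketch.split("news.ycombinator.com")` yields a plain source, while `SourceSplitSketch.split("newsletter")` falls through to the UTM source case.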
209 lines
7.7 KiB
Elixir
defmodule PlausibleWeb.StatsController do
  use PlausibleWeb, :controller
  use Plausible.Repo
  alias PlausibleWeb.Api
  alias Plausible.Stats.{Query, Filters}

  plug PlausibleWeb.AuthorizeSiteAccess when action in [:stats, :csv_export]

  def stats(%{assigns: %{site: site}} = conn, _params) do
    has_stats = Plausible.Sites.has_stats?(site)
    can_see_stats = !site.locked || conn.assigns[:current_user_role] == :super_admin

    cond do
      has_stats && can_see_stats ->
        demo = site.domain == PlausibleWeb.Endpoint.host()
        offer_email_report = get_session(conn, site.domain <> "_offer_email_report")

        conn
        |> assign(:skip_plausible_tracking, !demo)
        |> remove_email_report_banner(site)
        |> put_resp_header("x-robots-tag", "noindex")
        |> render("stats.html",
          site: site,
          has_goals: Plausible.Sites.has_goals?(site),
          title: "Plausible · " <> site.domain,
          offer_email_report: offer_email_report,
          demo: demo
        )

      !has_stats && can_see_stats ->
        conn
        |> assign(:skip_plausible_tracking, true)
        |> render("waiting_first_pageview.html", site: site)

      site.locked ->
        owner = Plausible.Sites.owner_for(site)

        conn
        |> assign(:skip_plausible_tracking, true)
        |> render("site_locked.html", owner: owner, site: site)
    end
  end

  @doc """
  The export is limited to 100 entries for pages and exit pages and 300 entries for all other
  reports, because bigger result sets start causing failures. Metrics such as time on page or
  bounce_rate for pages are requested in a separate query using the IN filter, so the request
  payload balloons as the number of pages grows.
  """
  def csv_export(conn, params) do
    site = conn.assigns[:site]
    query = Query.from(site, params) |> Filters.add_prefix()

    metrics = [:visitors, :pageviews, :bounce_rate, :visit_duration]
    graph = Plausible.Stats.timeseries(site, query, metrics)
    headers = [:date | metrics]

    visitors =
      Enum.map(graph, fn row -> Enum.map(headers, &row[&1]) end)
      |> (fn data -> [headers | data] end).()
      |> CSV.encode()
      |> Enum.join()

    filename =
      "Plausible export #{params["domain"]} #{Timex.format!(query.date_range.first, "{ISOdate} ")} to #{Timex.format!(query.date_range.last, "{ISOdate} ")}.zip"

    params = Map.merge(params, %{"limit" => "300", "csv" => "True", "detailed" => "True"})
    limited_params = Map.merge(params, %{"limit" => "100"})

    csvs = [
      {'sources.csv', fn -> Api.StatsController.sources(conn, params) end},
      {'utm_mediums.csv', fn -> Api.StatsController.utm_mediums(conn, params) end},
      {'utm_sources.csv', fn -> Api.StatsController.utm_sources(conn, params) end},
      {'utm_campaigns.csv', fn -> Api.StatsController.utm_campaigns(conn, params) end},
      {'utm_contents.csv', fn -> Api.StatsController.utm_contents(conn, params) end},
      {'utm_terms.csv', fn -> Api.StatsController.utm_terms(conn, params) end},
      {'pages.csv', fn -> Api.StatsController.pages(conn, limited_params) end},
      {'entry_pages.csv', fn -> Api.StatsController.entry_pages(conn, params) end},
      {'exit_pages.csv', fn -> Api.StatsController.exit_pages(conn, limited_params) end},
      {'countries.csv', fn -> Api.StatsController.countries(conn, params) end},
      {'regions.csv', fn -> Api.StatsController.regions(conn, params) end},
      {'cities.csv', fn -> Api.StatsController.cities(conn, params) end},
      {'browsers.csv', fn -> Api.StatsController.browsers(conn, params) end},
      {'operating_systems.csv', fn -> Api.StatsController.operating_systems(conn, params) end},
      {'devices.csv', fn -> Api.StatsController.screen_sizes(conn, params) end},
      {'conversions.csv', fn -> Api.StatsController.conversions(conn, params) end},
      {'prop_breakdown.csv', fn -> Api.StatsController.all_props_breakdown(conn, params) end}
    ]

    csvs =
      csvs
      |> Enum.map(fn {file, task} -> {file, Task.async(task)} end)
      |> Enum.map(fn {file, task} -> {file, Task.await(task)} end)

    csvs = [{'visitors.csv', visitors} | csvs]

    {:ok, {_, zip_content}} = :zip.create(filename, csvs, [:memory])

    conn
    |> put_resp_content_type("application/zip")
    |> put_resp_header("content-disposition", "attachment; filename=\"#{filename}\"")
    |> delete_resp_cookie("exporting")
    |> send_resp(200, zip_content)
  end

  def shared_link(conn, %{"domain" => domain, "auth" => auth}) do
    shared_link =
      Repo.get_by(Plausible.Site.SharedLink, slug: auth)
      |> Repo.preload(:site)

    if shared_link && shared_link.site.domain == domain do
      if shared_link.password_hash do
        with conn <- Plug.Conn.fetch_cookies(conn),
             {:ok, token} <- Map.fetch(conn.req_cookies, shared_link_cookie_name(auth)),
             {:ok, %{slug: token_slug}} <- Plausible.Auth.Token.verify_shared_link(token),
             true <- token_slug == shared_link.slug do
          render_shared_link(conn, shared_link)
        else
          _e ->
            conn
            |> assign(:skip_plausible_tracking, true)
            |> render("shared_link_password.html",
              link: shared_link,
              layout: {PlausibleWeb.LayoutView, "focus.html"}
            )
        end
      else
        render_shared_link(conn, shared_link)
      end
    else
      # Without this branch the action returns nil for an unknown link or a
      # mismatched domain; respond with a 404 like the other clauses do.
      render_error(conn, 404)
    end
  end

  def shared_link(conn, %{"slug" => slug}) do
    shared_link =
      Repo.get_by(Plausible.Site.SharedLink, slug: slug)
      |> Repo.preload(:site)

    if shared_link do
      redirect(conn, to: "/share/#{URI.encode_www_form(shared_link.site.domain)}?auth=#{slug}")
    else
      render_error(conn, 404)
    end
  end

  def authenticate_shared_link(conn, %{"slug" => slug, "password" => password}) do
    shared_link =
      Repo.get_by(Plausible.Site.SharedLink, slug: slug)
      |> Repo.preload(:site)

    if shared_link do
      if Plausible.Auth.Password.match?(password, shared_link.password_hash) do
        token = Plausible.Auth.Token.sign_shared_link(slug)

        conn
        |> put_resp_cookie(shared_link_cookie_name(slug), token)
        |> redirect(to: "/share/#{URI.encode_www_form(shared_link.site.domain)}?auth=#{slug}")
      else
        conn
        |> assign(:skip_plausible_tracking, true)
        |> render("shared_link_password.html",
          link: shared_link,
          error: "Incorrect password. Please try again.",
          layout: {PlausibleWeb.LayoutView, "focus.html"}
        )
      end
    else
      render_error(conn, 404)
    end
  end

  defp render_shared_link(conn, shared_link) do
    cond do
      !shared_link.site.locked ->
        conn
        |> assign(:skip_plausible_tracking, true)
        |> put_resp_header("x-robots-tag", "noindex")
        |> delete_resp_header("x-frame-options")
        |> render("stats.html",
          site: shared_link.site,
          has_goals: Plausible.Sites.has_goals?(shared_link.site),
          title: "Plausible · " <> shared_link.site.domain,
          offer_email_report: false,
          demo: false,
          skip_plausible_tracking: true,
          shared_link_auth: shared_link.slug,
          embedded: conn.params["embed"] == "true",
          background: conn.params["background"],
          theme: conn.params["theme"]
        )

      shared_link.site.locked ->
        owner = Plausible.Sites.owner_for(shared_link.site)

        conn
        |> assign(:skip_plausible_tracking, true)
        |> render("site_locked.html", owner: owner, site: shared_link.site)
    end
  end

  defp remove_email_report_banner(conn, site) do
    if conn.assigns[:current_user] do
      delete_session(conn, site.domain <> "_offer_email_report")
    else
      conn
    end
  end

  defp shared_link_cookie_name(slug), do: "shared-link-" <> slug
end
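The export flow in `csv_export/2` has three steps: flatten the timeseries rows under a header row, run the per-report functions concurrently, and zip the results in memory. A stripped-down sketch of those steps, using only the standard library, might look like the following. The module `ExportSketch` and its function names are hypothetical and exist only for this illustration; the real controller additionally relies on the `CSV` package and the `Api.StatsController` actions.

```elixir
defmodule ExportSketch do
  # Hypothetical module illustrating the csv_export/2 steps in isolation.

  # Pick the header keys out of each row map and prepend the header row.
  def to_rows(rows, headers) do
    data = Enum.map(rows, fn row -> Enum.map(headers, &row[&1]) end)
    [headers | data]
  end

  # Start every report function as a task, then await them in order.
  # All tasks run concurrently; results keep the order of the input list.
  def run_reports(reports, timeout \\ 5_000) do
    reports
    |> Enum.map(fn {name, fun} -> {name, Task.async(fun)} end)
    |> Enum.map(fn {name, task} -> {name, Task.await(task, timeout)} end)
  end

  # Build a zip archive in memory from {file_name, binary_content} pairs.
  # Erlang's :zip module expects file names as charlists.
  def zip_in_memory(archive_name, files) do
    entries =
      Enum.map(files, fn {file, content} -> {String.to_charlist(file), content} end)

    {:ok, {_name, zip_binary}} =
      :zip.create(String.to_charlist(archive_name), entries, [:memory])

    zip_binary
  end
end
```

Note that `Task.await/2` defaults to a five second timeout, so a slow report makes the calling process exit unless a longer timeout is passed.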