analytics/test/support/test_utils.ex

286 lines
6.8 KiB
Elixir
Raw Normal View History

2019-09-02 14:29:19 +03:00
defmodule Plausible.TestUtils do
use Plausible.Repo
alias Plausible.Factory
defmacro __using__(_) do
quote do
require Plausible.TestUtils
import Plausible.TestUtils
end
end
defmacro patch_env(env_key, value) do
quote do
original_env = Application.get_env(:plausible, unquote(env_key))
Application.put_env(:plausible, unquote(env_key), unquote(value))
on_exit(fn ->
Application.put_env(:plausible, unquote(env_key), original_env)
end)
{:ok, %{patched_env: true}}
end
end
defmacro setup_patch_env(env_key, value) do
quote do
setup do
patch_env(unquote(env_key), unquote(value))
end
end
end
2019-09-02 14:29:19 +03:00
def create_user(_) do
{:ok, user: Factory.insert(:user)}
end
def create_site(%{user: user}) do
site =
Factory.insert(:site,
members: [user]
)
2019-09-02 14:29:19 +03:00
{:ok, site: site}
end
[Continued] Google Analytics import (#1753) * Add has_imported_stats boolean to Site * Add Google Analytics import panel to general settings * Get GA profiles to display in import settings panel * Add import_from_google method as entrypoint to import data * Add imported_visitors table * Remove conflicting code from migration * Import visitors data into clickhouse database * Pass another dataset to main graph for rendering in red This adds another entry to the JSON data returned via the main graph API called `imported_plot`, which is similar to `plot` in form but will be completed with previously imported data. Currently it simply returns the values from `plot` / 2. The data is rendered in the main graph in red without fill, and without an indicator for the present. Rationale: imported data will not continue to grow so there is no projection forward, only backwards. * Hook imported GA data to dashboard timeseries plot * Add settings option to forget imported data * Import sources from google analytics * Merge imported sources when queried * Merge imported source data native data when querying sources * Start converting metrics to atoms so they can be subqueried This changes "visitors" and in some places "sources" to atoms. This does not change the behaviour of the functions - the tests all pass unchanged following this commit. This is necessary as joining subqueries requires that the keys in `select` statements be atoms and not strings. * Convery GA (direct) source to empty string * Import utm campaign and utm medium from GA * format * Import all data types from GA into new tables * Handle large amounts of more data more safely * Fix some mistakes in tables * Make GA requests in chunks of 5 queries * Only display imported timeseries when there is no filter * Correctly show last 30 minutes timeseries when 'realtime' * Add with_imported key to Query struct * Account for injected :is_not filter on sources from dashboard * Also add tentative imported_utm_sources table This needs a bit more work on the google import side, as GA do not report sources and utm sources as distinct things. * Return imported data to dashboard for rest of Sources panel This extends the merge_imported function definition for sources to utm_sources, utm_mediums and utm_campaigns too. This appears to be working on the DB side but something is incomplete on the client side. * Clear imported stats from all tables when requested * Merge entry pages and exit pages from imported data into unfiltered dashboard view This requires converting the `"visits"` and `"visit_duration"` metrics to atoms so that they can be used in ecto subqueries. * Display imported devices, browsers and OSs on dashboard * Display imported country data on dashboard * Add more metrics to entries/exits for modals * make sure data is returned via API with correct keys * Import regions and cities from GA * Capitalize device upon import to match native data * Leave query limits/offsets until after possibly joining with imported data * Also import timeOnPage and pageviews for pages from GA * imported_countries -> imported_locations * Get timeOnPage and pageviews for pages from GA These are needed for the pages modal, and for calculating exit rates for exit pages. * Add indicator to dashboard when imported data is being used * Don't show imported data as separately line on main graph * "bounce_rate" -> :bounce_rate, so it works in subqueries * Drop imported browser and OS versions These are not needed. * Toggle displaying imported data by clicking indicator * Parse referrers with RefInspector - Use 'ga:fullReferrer' instead of 'ga:source'. This provides the actual referrer host + path, whereas 'ga:source' includes utm_mediums and other values when relevant. - 'ga:fullReferror' does however include search engine names directly, so they are manually checked for as RefInspector won't pick up on these. * Keep imported data indicator on dashboard and strikethrough when hidden * Add unlink google button to import panel * Rename some GA browsers and OSes to plausible versions * Get main top pages and exit pages panels working correctly with imported data * mix format * Fetch time_on_pages for imported data when needed * entry pages need to fetch bounces from GA * "sample_percent" -> :sample_percent as only atoms can be used in subqueries * Calculate bounce_rate for joined native and imported data for top pages modal * Flip some query bindings around to be less misleading * Fixup entry page modal visit durations * mix format * Fetch bounces and visit_duration for sources from GA * add more source metrics used for data in modals * Make sources modals display correct values * imported_visitors: bounce_rate -> bounces, avg_visit_duration -> visit_duration * Merge imported data into aggregate stats * Reformat top graph side icons * Ensure sample_percent is yielded from aggregate data * filter event_props should be strings * Hide imported data from frontend when using filter * Fix existing tests * fix tests * Fix imported indicator appearing when filtering * comma needed, lost when rebasing * Import utm_terms and utm_content from GA * Merge imported utm_term and utm_content * Rename imported Countries data as Locations * Set imported city schema field to int * Remove utm_terms and utm_content when clearing imported * Clean locations import from Google Analytics - Country and region should be set to "" when GA provides "(not set)" - City should be set to 0 for "unknown", as we cannot reliably import city data from GA. * Display imported region and city in dashboard * os -> operating_system in some parts of code The inconsistency of using os in some places and operating_system in others causes trouble with subqueries and joins for the native and imported data, which would require additional logic to account for. The simplest solution is the just use a consistent word for all uses. This doesn't make any user-facing or database changes. * to_atom -> to_existing_atom * format * "events" metric -> :events * ignore imported data when "events" in metrics * update "bounce_rate" * atomise some more metrics from new city and region api * atomise some more metrics for email handlers * "conversion_rate" -> :conversion_rate during csv export * Move imported data stats code to own module * Move imported timeseries function to Stats.Imported * Use Timex.parse to import dates from GA * has_imported_stats -> imported_source * "time_on_page" -> :time_on_page * Convert imported GA data to UTC * Clean up GA request code a bit There was some weird logic here with two separate lists that really ought to be together, so this merges those. * Fail sooner if GA timezone can't be identified * Link imported tables to site by id * imported_utm_content -> imported_utm_contents * Imported GA from all of time * Reorganise GA data fetch logic - Fetch data from the start of time (2005) - Check whether no data was fetched, and if so, inform user and don't consider data to be imported. * Clarify removal of "visits" data when it isn't in metrics * Apply location filters from API This makes it consistent with the sources etc which filter out 'Direct / None' on the API side. These filters are used by both the native and imported data handling code, which would otherwise both duplicate the filters in their `where` clauses. * Do not use changeset for setting site.imported_source * Add all metrics to all dimensions * Run GA import in the background * Send email when GA import completes * Add handler to insert imported data into tests and imported_browsers_factory * Add remaining import data test factories * Add imported location data to test * Test main graph with imported data * Add imported data to operating systems tests * Add imported data to pages tests * Add imported data to entry pages tests * Add imported data to exit pages tests * Add imported data to devices tests * Add imported data to sources tests * Add imported data to UTM tests * Add new test module for the data import step * Test import of sources GA data * Test import of utm_mediums GA data * Test import of utm_campaigns GA data * Add tests for UTM terms * Add tests for UTM contents * Add test for importing pages and entry pages data from GA * Add test for importing exit page data * Fix module file name typo * Add test for importing location data from GA * Add test for importing devices data from GA * Add test for importing browsers data from GA * Add test for importing OS data from GA * Paginate GA requests to download all data * Bump clickhouse_ecto version * Move RefInspector wrapper function into module * Drop timezone transform on import * Order imported by side_id then date * More strings -> atoms Also changes a conditional to be a bit nicer * Remove parallelisation of data import * Split sources and UTM sources from fetched GA data GA has only a "source" dimension and no "UTM source" dimension. Instead it returns these combined. The logic herein to tease these apart is: 1. "(direct)" -> it's a direct source 2. if the source is a domain -> it's a source 3. "google" -> it's from adwords; let's make this a UTM source "adwords" 4. else -> just a UTM source * Keep prop names in queries as strings * fix typo * Fix import * Insert data to clickhouse in batches * Fix link when removing imported data * Merge source tables * Import hostname as well as pathname * Record start and end time of imported data * Track import progress * Fix month interval with imported data * Do not JOIN when imported date range has no overlap * Fix time on page using exits Co-authored-by: mcol <mcol@posteo.net>
2022-03-11 00:04:59 +03:00
def add_imported_data(%{site: site}) do
site =
site
|> Plausible.Site.start_import(~D[2005-01-01], Timex.today(), "Google Analytics", "ok")
[Continued] Google Analytics import (#1753) * Add has_imported_stats boolean to Site * Add Google Analytics import panel to general settings * Get GA profiles to display in import settings panel * Add import_from_google method as entrypoint to import data * Add imported_visitors table * Remove conflicting code from migration * Import visitors data into clickhouse database * Pass another dataset to main graph for rendering in red This adds another entry to the JSON data returned via the main graph API called `imported_plot`, which is similar to `plot` in form but will be completed with previously imported data. Currently it simply returns the values from `plot` / 2. The data is rendered in the main graph in red without fill, and without an indicator for the present. Rationale: imported data will not continue to grow so there is no projection forward, only backwards. * Hook imported GA data to dashboard timeseries plot * Add settings option to forget imported data * Import sources from google analytics * Merge imported sources when queried * Merge imported source data native data when querying sources * Start converting metrics to atoms so they can be subqueried This changes "visitors" and in some places "sources" to atoms. This does not change the behaviour of the functions - the tests all pass unchanged following this commit. This is necessary as joining subqueries requires that the keys in `select` statements be atoms and not strings. * Convery GA (direct) source to empty string * Import utm campaign and utm medium from GA * format * Import all data types from GA into new tables * Handle large amounts of more data more safely * Fix some mistakes in tables * Make GA requests in chunks of 5 queries * Only display imported timeseries when there is no filter * Correctly show last 30 minutes timeseries when 'realtime' * Add with_imported key to Query struct * Account for injected :is_not filter on sources from dashboard * Also add tentative imported_utm_sources table This needs a bit more work on the google import side, as GA do not report sources and utm sources as distinct things. * Return imported data to dashboard for rest of Sources panel This extends the merge_imported function definition for sources to utm_sources, utm_mediums and utm_campaigns too. This appears to be working on the DB side but something is incomplete on the client side. * Clear imported stats from all tables when requested * Merge entry pages and exit pages from imported data into unfiltered dashboard view This requires converting the `"visits"` and `"visit_duration"` metrics to atoms so that they can be used in ecto subqueries. * Display imported devices, browsers and OSs on dashboard * Display imported country data on dashboard * Add more metrics to entries/exits for modals * make sure data is returned via API with correct keys * Import regions and cities from GA * Capitalize device upon import to match native data * Leave query limits/offsets until after possibly joining with imported data * Also import timeOnPage and pageviews for pages from GA * imported_countries -> imported_locations * Get timeOnPage and pageviews for pages from GA These are needed for the pages modal, and for calculating exit rates for exit pages. * Add indicator to dashboard when imported data is being used * Don't show imported data as separately line on main graph * "bounce_rate" -> :bounce_rate, so it works in subqueries * Drop imported browser and OS versions These are not needed. * Toggle displaying imported data by clicking indicator * Parse referrers with RefInspector - Use 'ga:fullReferrer' instead of 'ga:source'. This provides the actual referrer host + path, whereas 'ga:source' includes utm_mediums and other values when relevant. - 'ga:fullReferror' does however include search engine names directly, so they are manually checked for as RefInspector won't pick up on these. * Keep imported data indicator on dashboard and strikethrough when hidden * Add unlink google button to import panel * Rename some GA browsers and OSes to plausible versions * Get main top pages and exit pages panels working correctly with imported data * mix format * Fetch time_on_pages for imported data when needed * entry pages need to fetch bounces from GA * "sample_percent" -> :sample_percent as only atoms can be used in subqueries * Calculate bounce_rate for joined native and imported data for top pages modal * Flip some query bindings around to be less misleading * Fixup entry page modal visit durations * mix format * Fetch bounces and visit_duration for sources from GA * add more source metrics used for data in modals * Make sources modals display correct values * imported_visitors: bounce_rate -> bounces, avg_visit_duration -> visit_duration * Merge imported data into aggregate stats * Reformat top graph side icons * Ensure sample_percent is yielded from aggregate data * filter event_props should be strings * Hide imported data from frontend when using filter * Fix existing tests * fix tests * Fix imported indicator appearing when filtering * comma needed, lost when rebasing * Import utm_terms and utm_content from GA * Merge imported utm_term and utm_content * Rename imported Countries data as Locations * Set imported city schema field to int * Remove utm_terms and utm_content when clearing imported * Clean locations import from Google Analytics - Country and region should be set to "" when GA provides "(not set)" - City should be set to 0 for "unknown", as we cannot reliably import city data from GA. * Display imported region and city in dashboard * os -> operating_system in some parts of code The inconsistency of using os in some places and operating_system in others causes trouble with subqueries and joins for the native and imported data, which would require additional logic to account for. The simplest solution is the just use a consistent word for all uses. This doesn't make any user-facing or database changes. * to_atom -> to_existing_atom * format * "events" metric -> :events * ignore imported data when "events" in metrics * update "bounce_rate" * atomise some more metrics from new city and region api * atomise some more metrics for email handlers * "conversion_rate" -> :conversion_rate during csv export * Move imported data stats code to own module * Move imported timeseries function to Stats.Imported * Use Timex.parse to import dates from GA * has_imported_stats -> imported_source * "time_on_page" -> :time_on_page * Convert imported GA data to UTC * Clean up GA request code a bit There was some weird logic here with two separate lists that really ought to be together, so this merges those. * Fail sooner if GA timezone can't be identified * Link imported tables to site by id * imported_utm_content -> imported_utm_contents * Imported GA from all of time * Reorganise GA data fetch logic - Fetch data from the start of time (2005) - Check whether no data was fetched, and if so, inform user and don't consider data to be imported. * Clarify removal of "visits" data when it isn't in metrics * Apply location filters from API This makes it consistent with the sources etc which filter out 'Direct / None' on the API side. These filters are used by both the native and imported data handling code, which would otherwise both duplicate the filters in their `where` clauses. * Do not use changeset for setting site.imported_source * Add all metrics to all dimensions * Run GA import in the background * Send email when GA import completes * Add handler to insert imported data into tests and imported_browsers_factory * Add remaining import data test factories * Add imported location data to test * Test main graph with imported data * Add imported data to operating systems tests * Add imported data to pages tests * Add imported data to entry pages tests * Add imported data to exit pages tests * Add imported data to devices tests * Add imported data to sources tests * Add imported data to UTM tests * Add new test module for the data import step * Test import of sources GA data * Test import of utm_mediums GA data * Test import of utm_campaigns GA data * Add tests for UTM terms * Add tests for UTM contents * Add test for importing pages and entry pages data from GA * Add test for importing exit page data * Fix module file name typo * Add test for importing location data from GA * Add test for importing devices data from GA * Add test for importing browsers data from GA * Add test for importing OS data from GA * Paginate GA requests to download all data * Bump clickhouse_ecto version * Move RefInspector wrapper function into module * Drop timezone transform on import * Order imported by side_id then date * More strings -> atoms Also changes a conditional to be a bit nicer * Remove parallelisation of data import * Split sources and UTM sources from fetched GA data GA has only a "source" dimension and no "UTM source" dimension. Instead it returns these combined. The logic herein to tease these apart is: 1. "(direct)" -> it's a direct source 2. if the source is a domain -> it's a source 3. "google" -> it's from adwords; let's make this a UTM source "adwords" 4. else -> just a UTM source * Keep prop names in queries as strings * fix typo * Fix import * Insert data to clickhouse in batches * Fix link when removing imported data * Merge source tables * Import hostname as well as pathname * Record start and end time of imported data * Track import progress * Fix month interval with imported data * Do not JOIN when imported date range has no overlap * Fix time on page using exits Co-authored-by: mcol <mcol@posteo.net>
2022-03-11 00:04:59 +03:00
|> Repo.update!()
{:ok, site: site}
end
def create_new_site(%{user: user}) do
site = Factory.insert(:site, members: [user])
{:ok, site: site}
end
def create_api_key(%{user: user}) do
api_key = Factory.insert(:api_key, user: user)
{:ok, api_key: api_key.key}
end
def use_api_key(%{conn: conn, api_key: api_key}) do
conn = Plug.Conn.put_req_header(conn, "authorization", "Bearer #{api_key}")
{:ok, conn: conn}
end
def create_pageviews(pageviews) do
2020-11-03 12:20:11 +03:00
pageviews =
Enum.map(pageviews, fn pageview ->
Conditionally support switching between v1 and v2 clickhouse schemas (#2780) * Remove ClickhouseSetup module This has been an implicit point of contact to many tests. From now on the goal is for each test to maintain its own, isolated setup so that no accidental clashes and implicit assumptions are relied upon. * Implement v2 schema check An environment variable V2_MIGRATION_DONE acts like a feature flag, switching plausible from using old events/sessions schemas to v2 schemas introduced by NumericIDs migration. * Run both test suites sequentially While the code for v1 and v2 schemas must be kept still, we will from now on run tests against both code paths. Secondary test run will set V2_MIGRATION_DONE=1 variable, thus making all `Plausible.v2?()` checks return `true'. * Remove unused function This is a remnant from the short period when we would check for existing events before allowing creating a new site. * Update test setups/factories with v2 migration check * Make GateKeeper return site id along with :allow * Make Billing module check for v2 schema * Make ingestion aware of v2 schema * Disable site transfers for when v2 is live In a separate changeset we will implement simplified site transfer for when v2 migration is complete. The new transfer will only rename the site domain in postgres and keep track of the original site prior to the transfer so we keep an ingestion grace period until the customers redeploy their scripting. * Make Stats base queries aware of v2 schema switch * Update breakdown with v2 conditionals * Update pageview local start with v2 check * Update current visitoris with v2 check * Update stats controller with v2 checks * Update external controller with v2 checks * Update remaining tests with proper fixtures * Rewrite redundant assignment * Remove unused alias * Mute credo, this is not the right time * Add test_helper prompt * Fetch priv dir so it works with a release * Fetch distinct partitions only * Don't limit inspect output for partitions * Ensure SQL is printed to IO * Remove redundant domain fixture
2023-03-27 14:52:42 +03:00
pageview =
if Plausible.v2?() do
pageview
|> Map.delete(:site)
|> Map.put(:site_id, pageview.site.id)
else
pageview
|> Map.delete(:site)
|> Map.put(:domain, pageview.site.domain)
end
Factory.build(:pageview, pageview)
|> Map.from_struct()
|> Map.delete(:__meta__)
|> update_in([:timestamp], &to_naive_truncate/1)
2020-11-03 12:20:11 +03:00
end)
Conditionally support switching between v1 and v2 clickhouse schemas (#2780) * Remove ClickhouseSetup module This has been an implicit point of contact to many tests. From now on the goal is for each test to maintain its own, isolated setup so that no accidental clashes and implicit assumptions are relied upon. * Implement v2 schema check An environment variable V2_MIGRATION_DONE acts like a feature flag, switching plausible from using old events/sessions schemas to v2 schemas introduced by NumericIDs migration. * Run both test suites sequentially While the code for v1 and v2 schemas must be kept still, we will from now on run tests against both code paths. Secondary test run will set V2_MIGRATION_DONE=1 variable, thus making all `Plausible.v2?()` checks return `true'. * Remove unused function This is a remnant from the short period when we would check for existing events before allowing creating a new site. * Update test setups/factories with v2 migration check * Make GateKeeper return site id along with :allow * Make Billing module check for v2 schema * Make ingestion aware of v2 schema * Disable site transfers for when v2 is live In a separate changeset we will implement simplified site transfer for when v2 migration is complete. The new transfer will only rename the site domain in postgres and keep track of the original site prior to the transfer so we keep an ingestion grace period until the customers redeploy their scripting. * Make Stats base queries aware of v2 schema switch * Update breakdown with v2 conditionals * Update pageview local start with v2 check * Update current visitoris with v2 check * Update stats controller with v2 checks * Update external controller with v2 checks * Update remaining tests with proper fixtures * Rewrite redundant assignment * Remove unused alias * Mute credo, this is not the right time * Add test_helper prompt * Fetch priv dir so it works with a release * Fetch distinct partitions only * Don't limit inspect output for partitions * Ensure SQL is printed to IO * Remove redundant domain fixture
2023-03-27 14:52:42 +03:00
if Plausible.v2?() do
Plausible.IngestRepo.insert_all(Plausible.ClickhouseEventV2, pageviews)
else
Plausible.IngestRepo.insert_all(Plausible.ClickhouseEvent, pageviews)
end
end
def create_events(events) do
2020-11-03 12:20:11 +03:00
events =
Enum.map(events, fn event ->
Conditionally support switching between v1 and v2 clickhouse schemas (#2780) * Remove ClickhouseSetup module This has been an implicit point of contact to many tests. From now on the goal is for each test to maintain its own, isolated setup so that no accidental clashes and implicit assumptions are relied upon. * Implement v2 schema check An environment variable V2_MIGRATION_DONE acts like a feature flag, switching plausible from using old events/sessions schemas to v2 schemas introduced by NumericIDs migration. * Run both test suites sequentially While the code for v1 and v2 schemas must be kept still, we will from now on run tests against both code paths. Secondary test run will set V2_MIGRATION_DONE=1 variable, thus making all `Plausible.v2?()` checks return `true'. * Remove unused function This is a remnant from the short period when we would check for existing events before allowing creating a new site. * Update test setups/factories with v2 migration check * Make GateKeeper return site id along with :allow * Make Billing module check for v2 schema * Make ingestion aware of v2 schema * Disable site transfers for when v2 is live In a separate changeset we will implement simplified site transfer for when v2 migration is complete. The new transfer will only rename the site domain in postgres and keep track of the original site prior to the transfer so we keep an ingestion grace period until the customers redeploy their scripting. * Make Stats base queries aware of v2 schema switch * Update breakdown with v2 conditionals * Update pageview local start with v2 check * Update current visitoris with v2 check * Update stats controller with v2 checks * Update external controller with v2 checks * Update remaining tests with proper fixtures * Rewrite redundant assignment * Remove unused alias * Mute credo, this is not the right time * Add test_helper prompt * Fetch priv dir so it works with a release * Fetch distinct partitions only * Don't limit inspect output for partitions * Ensure SQL is printed to IO * Remove redundant domain fixture
2023-03-27 14:52:42 +03:00
event =
if Plausible.v2?() do
Map.delete(event, :domain)
else
Map.delete(event, :site_id)
end
Factory.build(:event, event)
|> Map.from_struct()
|> Map.delete(:__meta__)
|> update_in([:timestamp], &to_naive_truncate/1)
2020-11-03 12:20:11 +03:00
end)
Conditionally support switching between v1 and v2 clickhouse schemas (#2780) * Remove ClickhouseSetup module This has been an implicit point of contact to many tests. From now on the goal is for each test to maintain its own, isolated setup so that no accidental clashes and implicit assumptions are relied upon. * Implement v2 schema check An environment variable V2_MIGRATION_DONE acts like a feature flag, switching plausible from using old events/sessions schemas to v2 schemas introduced by NumericIDs migration. * Run both test suites sequentially While the code for v1 and v2 schemas must be kept still, we will from now on run tests against both code paths. Secondary test run will set V2_MIGRATION_DONE=1 variable, thus making all `Plausible.v2?()` checks return `true'. * Remove unused function This is a remnant from the short period when we would check for existing events before allowing creating a new site. * Update test setups/factories with v2 migration check * Make GateKeeper return site id along with :allow * Make Billing module check for v2 schema * Make ingestion aware of v2 schema * Disable site transfers for when v2 is live In a separate changeset we will implement simplified site transfer for when v2 migration is complete. The new transfer will only rename the site domain in postgres and keep track of the original site prior to the transfer so we keep an ingestion grace period until the customers redeploy their scripting. * Make Stats base queries aware of v2 schema switch * Update breakdown with v2 conditionals * Update pageview local start with v2 check * Update current visitoris with v2 check * Update stats controller with v2 checks * Update external controller with v2 checks * Update remaining tests with proper fixtures * Rewrite redundant assignment * Remove unused alias * Mute credo, this is not the right time * Add test_helper prompt * Fetch priv dir so it works with a release * Fetch distinct partitions only * Don't limit inspect output for partitions * Ensure SQL is printed to IO * Remove redundant domain fixture
2023-03-27 14:52:42 +03:00
if Plausible.v2?() do
Plausible.IngestRepo.insert_all(Plausible.ClickhouseEventV2, events)
else
Plausible.IngestRepo.insert_all(Plausible.ClickhouseEvent, events)
end
end
def create_sessions(sessions) do
2020-11-03 12:20:11 +03:00
sessions =
Enum.map(sessions, fn session ->
Conditionally support switching between v1 and v2 clickhouse schemas (#2780) * Remove ClickhouseSetup module This has been an implicit point of contact to many tests. From now on the goal is for each test to maintain its own, isolated setup so that no accidental clashes and implicit assumptions are relied upon. * Implement v2 schema check An environment variable V2_MIGRATION_DONE acts like a feature flag, switching plausible from using old events/sessions schemas to v2 schemas introduced by NumericIDs migration. * Run both test suites sequentially While the code for v1 and v2 schemas must be kept still, we will from now on run tests against both code paths. Secondary test run will set V2_MIGRATION_DONE=1 variable, thus making all `Plausible.v2?()` checks return `true'. * Remove unused function This is a remnant from the short period when we would check for existing events before allowing creating a new site. * Update test setups/factories with v2 migration check * Make GateKeeper return site id along with :allow * Make Billing module check for v2 schema * Make ingestion aware of v2 schema * Disable site transfers for when v2 is live In a separate changeset we will implement simplified site transfer for when v2 migration is complete. The new transfer will only rename the site domain in postgres and keep track of the original site prior to the transfer so we keep an ingestion grace period until the customers redeploy their scripting. * Make Stats base queries aware of v2 schema switch * Update breakdown with v2 conditionals * Update pageview local start with v2 check * Update current visitoris with v2 check * Update stats controller with v2 checks * Update external controller with v2 checks * Update remaining tests with proper fixtures * Rewrite redundant assignment * Remove unused alias * Mute credo, this is not the right time * Add test_helper prompt * Fetch priv dir so it works with a release * Fetch distinct partitions only * Don't limit inspect output for partitions * Ensure SQL is printed to IO * Remove redundant domain fixture
2023-03-27 14:52:42 +03:00
session =
if Plausible.v2?() do
Map.delete(session, :domain)
else
Map.delete(session, :site_id)
end
Factory.build(:ch_session, session)
|> Map.from_struct()
|> Map.delete(:__meta__)
|> update_in([:timestamp], &to_naive_truncate/1)
|> update_in([:start], &to_naive_truncate/1)
2020-11-03 12:20:11 +03:00
end)
Conditionally support switching between v1 and v2 clickhouse schemas (#2780) * Remove ClickhouseSetup module This has been an implicit point of contact to many tests. From now on the goal is for each test to maintain its own, isolated setup so that no accidental clashes and implicit assumptions are relied upon. * Implement v2 schema check An environment variable V2_MIGRATION_DONE acts like a feature flag, switching plausible from using old events/sessions schemas to v2 schemas introduced by NumericIDs migration. * Run both test suites sequentially While the code for v1 and v2 schemas must be kept still, we will from now on run tests against both code paths. Secondary test run will set V2_MIGRATION_DONE=1 variable, thus making all `Plausible.v2?()` checks return `true'. * Remove unused function This is a remnant from the short period when we would check for existing events before allowing creating a new site. * Update test setups/factories with v2 migration check * Make GateKeeper return site id along with :allow * Make Billing module check for v2 schema * Make ingestion aware of v2 schema * Disable site transfers for when v2 is live In a separate changeset we will implement simplified site transfer for when v2 migration is complete. The new transfer will only rename the site domain in postgres and keep track of the original site prior to the transfer so we keep an ingestion grace period until the customers redeploy their scripting. * Make Stats base queries aware of v2 schema switch * Update breakdown with v2 conditionals * Update pageview local start with v2 check * Update current visitoris with v2 check * Update stats controller with v2 checks * Update external controller with v2 checks * Update remaining tests with proper fixtures * Rewrite redundant assignment * Remove unused alias * Mute credo, this is not the right time * Add test_helper prompt * Fetch priv dir so it works with a release * Fetch distinct partitions only * Don't limit inspect output for partitions * Ensure SQL is printed to IO * Remove redundant domain fixture
2023-03-27 14:52:42 +03:00
if Plausible.v2?() do
Plausible.IngestRepo.insert_all(Plausible.ClickhouseSessionV2, sessions)
else
Plausible.IngestRepo.insert_all(Plausible.ClickhouseSession, sessions)
end
end
2019-09-02 14:29:19 +03:00
def log_in(%{user: user, conn: conn}) do
Formatting only changes - No code change (#75) * first commit with test and compile job Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * adding 'prepare' stage Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * updated ci script to include "test" compile phase Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * adding environment variables for connecting to postgresql Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * updated ci config for postgres Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * using non-alpine version of elixir Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * re-using the 'compile' artifacts and added explict env variables for testing Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * removing redundant deps fetching from common code Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * formatting using mix.format -- beware no-code changes! Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * added release config Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * adding consistent env variable for Database Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * more cleaning up of environment variables Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * Adding releases config for enabling releases Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * cleaning up env configs Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * Cleaned up config and prepared config for releases Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * updated CI script with new config for test Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * Added Dockerfile for creating production docker image Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * Adding "docker" build job yay! Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * using non-slim version of debian and installing webpack Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * Adding overlays for migrations on releases Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * restricting the docker built to master branch only Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * typo fix Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * adding "Hosting.md" to explain hosting instructions Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * removed the default comments Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * Added documentation related to env variables * updated documentation and fixed typo Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * updated documentation * Bumping up elixir version as `overlays` are only supported in latest version read release notes: https://github.com/elixir-lang/elixir/releases/tag/v1.10.0 Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * Adding tarball assembly during release Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * updated HOSTING.md Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * Added support for db migration Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * minor corrections Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * initializing admin user Admin user has been added in the "migration" phase. A default user is automatically created in the process. One can provide the related env variables, else a new one will be automatically created for you. Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * Initial base domain update - phase#1 These changes are only meant for correct operating it under self-hosting. There are many other cosmetic changes, that require updates to email, site and other places where the original website and author is used. Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * Using dedicated config variable `base_domain` instead Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * adding base_domain to releases config Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * removing the dedicated config "base_domain", relying on endpoint host Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * Removed the usage of "Mix" in code! It is bad practice to use "mix" module inside the code as in actual release this module is unavailable. Replacing this with a config environment variable Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * Added support for SMTP via Bamboo Smtp Adapter Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * Capturing SMTP errors via Sentry Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * Minor updates Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * Adding junit formatter -- useful for generating test reports Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * adding documentation for default user * Resolve "Gitlab Adoption: Add supported services in "Security & Compliance"" * bumping up the debian version to fix issues fixing some vulnerabilities identified by the scanning tools * More updates for self-hosting Changes in most of the places to suit self-hosting. Although, there are some which have been left-off. Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * quick-dirty-fix! * bumping up the db connect timeout Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * bumping up the db connect timeout Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * bumping up the db connect timeout Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * bumping up timeout - skipping MRs :-/ * removing restrictions on watching for changes this stuff isn't working * Update HOSTING.md * renamed the module name * reverting formatting-whitespace changes Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * reverting the name to release Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * adding docker-compose.yml and related instructions Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * using `plausible_url` instead of assuming `https` this is because, it is much to test in local dev machines and in most cases there's already a layer above which is capable for `https` termination and http -> https upgrade Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * WIP: merging changes from upstream Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * wip: more changes * Pushing in changes from upstream Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * changes to ci for testing Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * cleaning up and finishing clickhouse integration Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * updating readme with hosting details * removing deleted files from upstream * minor config adjustments Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * formatting changes Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me>
2020-06-08 10:35:13 +03:00
conn =
init_session(conn)
|> Plug.Conn.put_session(:current_user_id, user.id)
{:ok, conn: conn}
end
def init_session(conn) do
2019-09-02 14:29:19 +03:00
opts =
Plug.Session.init(
store: :cookie,
key: "foobar",
encryption_salt: "encrypted cookie salt",
signing_salt: "signing salt",
log: false,
encrypt: false
)
conn
Formatting only changes - No code change (#75) * first commit with test and compile job Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * adding 'prepare' stage Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * updated ci script to include "test" compile phase Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * adding environment variables for connecting to postgresql Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * updated ci config for postgres Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * using non-alpine version of elixir Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * re-using the 'compile' artifacts and added explict env variables for testing Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * removing redundant deps fetching from common code Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * formatting using mix.format -- beware no-code changes! Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * added release config Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * adding consistent env variable for Database Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * more cleaning up of environment variables Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * Adding releases config for enabling releases Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * cleaning up env configs Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * Cleaned up config and prepared config for releases Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * updated CI script with new config for test Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * Added Dockerfile for creating production docker image Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * Adding "docker" build job yay! Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * using non-slim version of debian and installing webpack Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * Adding overlays for migrations on releases Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * restricting the docker built to master branch only Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * typo fix Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * adding "Hosting.md" to explain hosting instructions Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * removed the default comments Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * Added documentation related to env variables * updated documentation and fixed typo Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * updated documentation * Bumping up elixir version as `overlays` are only supported in latest version read release notes: https://github.com/elixir-lang/elixir/releases/tag/v1.10.0 Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * Adding tarball assembly during release Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * updated HOSTING.md Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * Added support for db migration Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * minor corrections Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * initializing admin user Admin user has been added in the "migration" phase. A default user is automatically created in the process. One can provide the related env variables, else a new one will be automatically created for you. Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * Initial base domain update - phase#1 These changes are only meant for correct operating it under self-hosting. There are many other cosmetic changes, that require updates to email, site and other places where the original website and author is used. Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * Using dedicated config variable `base_domain` instead Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * adding base_domain to releases config Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * removing the dedicated config "base_domain", relying on endpoint host Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * Removed the usage of "Mix" in code! It is bad practice to use "mix" module inside the code as in actual release this module is unavailable. Replacing this with a config environment variable Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * Added support for SMTP via Bamboo Smtp Adapter Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * Capturing SMTP errors via Sentry Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * Minor updates Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * Adding junit formatter -- useful for generating test reports Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * adding documentation for default user * Resolve "Gitlab Adoption: Add supported services in "Security & Compliance"" * bumping up the debian version to fix issues fixing some vulnerabilities identified by the scanning tools * More updates for self-hosting Changes in most of the places to suit self-hosting. Although, there are some which have been left-off. Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * quick-dirty-fix! * bumping up the db connect timeout Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * bumping up the db connect timeout Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * bumping up the db connect timeout Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * bumping up timeout - skipping MRs :-/ * removing restrictions on watching for changes this stuff isn't working * Update HOSTING.md * renamed the module name * reverting formatting-whitespace changes Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * reverting the name to release Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * adding docker-compose.yml and related instructions Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * using `plausible_url` instead of assuming `https` this is because, it is much to test in local dev machines and in most cases there's already a layer above which is capable for `https` termination and http -> https upgrade Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * WIP: merging changes from upstream Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * wip: more changes * Pushing in changes from upstream Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * changes to ci for testing Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * cleaning up and finishing clickhouse integration Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * updating readme with hosting details * removing deleted files from upstream * minor config adjustments Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me> * formatting changes Signed-off-by: Chandra Tungathurthi <tckb@tgrthi.me>
2020-06-08 10:35:13 +03:00
|> Plug.Session.call(opts)
|> Plug.Conn.fetch_session()
2019-09-02 14:29:19 +03:00
end
def populate_stats(site, events) do
Enum.map(events, fn event ->
[Continued] Google Analytics import (#1753) * Add has_imported_stats boolean to Site * Add Google Analytics import panel to general settings * Get GA profiles to display in import settings panel * Add import_from_google method as entrypoint to import data * Add imported_visitors table * Remove conflicting code from migration * Import visitors data into clickhouse database * Pass another dataset to main graph for rendering in red This adds another entry to the JSON data returned via the main graph API called `imported_plot`, which is similar to `plot` in form but will be completed with previously imported data. Currently it simply returns the values from `plot` / 2. The data is rendered in the main graph in red without fill, and without an indicator for the present. Rationale: imported data will not continue to grow so there is no projection forward, only backwards. * Hook imported GA data to dashboard timeseries plot * Add settings option to forget imported data * Import sources from google analytics * Merge imported sources when queried * Merge imported source data native data when querying sources * Start converting metrics to atoms so they can be subqueried This changes "visitors" and in some places "sources" to atoms. This does not change the behaviour of the functions - the tests all pass unchanged following this commit. This is necessary as joining subqueries requires that the keys in `select` statements be atoms and not strings. * Convery GA (direct) source to empty string * Import utm campaign and utm medium from GA * format * Import all data types from GA into new tables * Handle large amounts of more data more safely * Fix some mistakes in tables * Make GA requests in chunks of 5 queries * Only display imported timeseries when there is no filter * Correctly show last 30 minutes timeseries when 'realtime' * Add with_imported key to Query struct * Account for injected :is_not filter on sources from dashboard * Also add tentative imported_utm_sources table This needs a bit more work on the google import side, as GA do not report sources and utm sources as distinct things. * Return imported data to dashboard for rest of Sources panel This extends the merge_imported function definition for sources to utm_sources, utm_mediums and utm_campaigns too. This appears to be working on the DB side but something is incomplete on the client side. * Clear imported stats from all tables when requested * Merge entry pages and exit pages from imported data into unfiltered dashboard view This requires converting the `"visits"` and `"visit_duration"` metrics to atoms so that they can be used in ecto subqueries. * Display imported devices, browsers and OSs on dashboard * Display imported country data on dashboard * Add more metrics to entries/exits for modals * make sure data is returned via API with correct keys * Import regions and cities from GA * Capitalize device upon import to match native data * Leave query limits/offsets until after possibly joining with imported data * Also import timeOnPage and pageviews for pages from GA * imported_countries -> imported_locations * Get timeOnPage and pageviews for pages from GA These are needed for the pages modal, and for calculating exit rates for exit pages. * Add indicator to dashboard when imported data is being used * Don't show imported data as separately line on main graph * "bounce_rate" -> :bounce_rate, so it works in subqueries * Drop imported browser and OS versions These are not needed. * Toggle displaying imported data by clicking indicator * Parse referrers with RefInspector - Use 'ga:fullReferrer' instead of 'ga:source'. This provides the actual referrer host + path, whereas 'ga:source' includes utm_mediums and other values when relevant. - 'ga:fullReferror' does however include search engine names directly, so they are manually checked for as RefInspector won't pick up on these. * Keep imported data indicator on dashboard and strikethrough when hidden * Add unlink google button to import panel * Rename some GA browsers and OSes to plausible versions * Get main top pages and exit pages panels working correctly with imported data * mix format * Fetch time_on_pages for imported data when needed * entry pages need to fetch bounces from GA * "sample_percent" -> :sample_percent as only atoms can be used in subqueries * Calculate bounce_rate for joined native and imported data for top pages modal * Flip some query bindings around to be less misleading * Fixup entry page modal visit durations * mix format * Fetch bounces and visit_duration for sources from GA * add more source metrics used for data in modals * Make sources modals display correct values * imported_visitors: bounce_rate -> bounces, avg_visit_duration -> visit_duration * Merge imported data into aggregate stats * Reformat top graph side icons * Ensure sample_percent is yielded from aggregate data * filter event_props should be strings * Hide imported data from frontend when using filter * Fix existing tests * fix tests * Fix imported indicator appearing when filtering * comma needed, lost when rebasing * Import utm_terms and utm_content from GA * Merge imported utm_term and utm_content * Rename imported Countries data as Locations * Set imported city schema field to int * Remove utm_terms and utm_content when clearing imported * Clean locations import from Google Analytics - Country and region should be set to "" when GA provides "(not set)" - City should be set to 0 for "unknown", as we cannot reliably import city data from GA. * Display imported region and city in dashboard * os -> operating_system in some parts of code The inconsistency of using os in some places and operating_system in others causes trouble with subqueries and joins for the native and imported data, which would require additional logic to account for. The simplest solution is the just use a consistent word for all uses. This doesn't make any user-facing or database changes. * to_atom -> to_existing_atom * format * "events" metric -> :events * ignore imported data when "events" in metrics * update "bounce_rate" * atomise some more metrics from new city and region api * atomise some more metrics for email handlers * "conversion_rate" -> :conversion_rate during csv export * Move imported data stats code to own module * Move imported timeseries function to Stats.Imported * Use Timex.parse to import dates from GA * has_imported_stats -> imported_source * "time_on_page" -> :time_on_page * Convert imported GA data to UTC * Clean up GA request code a bit There was some weird logic here with two separate lists that really ought to be together, so this merges those. * Fail sooner if GA timezone can't be identified * Link imported tables to site by id * imported_utm_content -> imported_utm_contents * Imported GA from all of time * Reorganise GA data fetch logic - Fetch data from the start of time (2005) - Check whether no data was fetched, and if so, inform user and don't consider data to be imported. * Clarify removal of "visits" data when it isn't in metrics * Apply location filters from API This makes it consistent with the sources etc which filter out 'Direct / None' on the API side. These filters are used by both the native and imported data handling code, which would otherwise both duplicate the filters in their `where` clauses. * Do not use changeset for setting site.imported_source * Add all metrics to all dimensions * Run GA import in the background * Send email when GA import completes * Add handler to insert imported data into tests and imported_browsers_factory * Add remaining import data test factories * Add imported location data to test * Test main graph with imported data * Add imported data to operating systems tests * Add imported data to pages tests * Add imported data to entry pages tests * Add imported data to exit pages tests * Add imported data to devices tests * Add imported data to sources tests * Add imported data to UTM tests * Add new test module for the data import step * Test import of sources GA data * Test import of utm_mediums GA data * Test import of utm_campaigns GA data * Add tests for UTM terms * Add tests for UTM contents * Add test for importing pages and entry pages data from GA * Add test for importing exit page data * Fix module file name typo * Add test for importing location data from GA * Add test for importing devices data from GA * Add test for importing browsers data from GA * Add test for importing OS data from GA * Paginate GA requests to download all data * Bump clickhouse_ecto version * Move RefInspector wrapper function into module * Drop timezone transform on import * Order imported by side_id then date * More strings -> atoms Also changes a conditional to be a bit nicer * Remove parallelisation of data import * Split sources and UTM sources from fetched GA data GA has only a "source" dimension and no "UTM source" dimension. Instead it returns these combined. The logic herein to tease these apart is: 1. "(direct)" -> it's a direct source 2. if the source is a domain -> it's a source 3. "google" -> it's from adwords; let's make this a UTM source "adwords" 4. else -> just a UTM source * Keep prop names in queries as strings * fix typo * Fix import * Insert data to clickhouse in batches * Fix link when removing imported data * Merge source tables * Import hostname as well as pathname * Record start and end time of imported data * Track import progress * Fix month interval with imported data * Do not JOIN when imported date range has no overlap * Fix time on page using exits Co-authored-by: mcol <mcol@posteo.net>
2022-03-11 00:04:59 +03:00
case event do
Conditionally support switching between v1 and v2 clickhouse schemas (#2780) * Remove ClickhouseSetup module This has been an implicit point of contact to many tests. From now on the goal is for each test to maintain its own, isolated setup so that no accidental clashes and implicit assumptions are relied upon. * Implement v2 schema check An environment variable V2_MIGRATION_DONE acts like a feature flag, switching plausible from using old events/sessions schemas to v2 schemas introduced by NumericIDs migration. * Run both test suites sequentially While the code for v1 and v2 schemas must be kept still, we will from now on run tests against both code paths. Secondary test run will set V2_MIGRATION_DONE=1 variable, thus making all `Plausible.v2?()` checks return `true'. * Remove unused function This is a remnant from the short period when we would check for existing events before allowing creating a new site. * Update test setups/factories with v2 migration check * Make GateKeeper return site id along with :allow * Make Billing module check for v2 schema * Make ingestion aware of v2 schema * Disable site transfers for when v2 is live In a separate changeset we will implement simplified site transfer for when v2 migration is complete. The new transfer will only rename the site domain in postgres and keep track of the original site prior to the transfer so we keep an ingestion grace period until the customers redeploy their scripting. * Make Stats base queries aware of v2 schema switch * Update breakdown with v2 conditionals * Update pageview local start with v2 check * Update current visitoris with v2 check * Update stats controller with v2 checks * Update external controller with v2 checks * Update remaining tests with proper fixtures * Rewrite redundant assignment * Remove unused alias * Mute credo, this is not the right time * Add test_helper prompt * Fetch priv dir so it works with a release * Fetch distinct partitions only * Don't limit inspect output for partitions * Ensure SQL is printed to IO * Remove redundant domain fixture
2023-03-27 14:52:42 +03:00
%Plausible.ClickhouseEventV2{} ->
Map.put(event, :site_id, site.id)
[Continued] Google Analytics import (#1753) * Add has_imported_stats boolean to Site * Add Google Analytics import panel to general settings * Get GA profiles to display in import settings panel * Add import_from_google method as entrypoint to import data * Add imported_visitors table * Remove conflicting code from migration * Import visitors data into clickhouse database * Pass another dataset to main graph for rendering in red This adds another entry to the JSON data returned via the main graph API called `imported_plot`, which is similar to `plot` in form but will be completed with previously imported data. Currently it simply returns the values from `plot` / 2. The data is rendered in the main graph in red without fill, and without an indicator for the present. Rationale: imported data will not continue to grow so there is no projection forward, only backwards. * Hook imported GA data to dashboard timeseries plot * Add settings option to forget imported data * Import sources from google analytics * Merge imported sources when queried * Merge imported source data native data when querying sources * Start converting metrics to atoms so they can be subqueried This changes "visitors" and in some places "sources" to atoms. This does not change the behaviour of the functions - the tests all pass unchanged following this commit. This is necessary as joining subqueries requires that the keys in `select` statements be atoms and not strings. * Convery GA (direct) source to empty string * Import utm campaign and utm medium from GA * format * Import all data types from GA into new tables * Handle large amounts of more data more safely * Fix some mistakes in tables * Make GA requests in chunks of 5 queries * Only display imported timeseries when there is no filter * Correctly show last 30 minutes timeseries when 'realtime' * Add with_imported key to Query struct * Account for injected :is_not filter on sources from dashboard * Also add tentative imported_utm_sources table This needs a bit more work on the google import side, as GA do not report sources and utm sources as distinct things. * Return imported data to dashboard for rest of Sources panel This extends the merge_imported function definition for sources to utm_sources, utm_mediums and utm_campaigns too. This appears to be working on the DB side but something is incomplete on the client side. * Clear imported stats from all tables when requested * Merge entry pages and exit pages from imported data into unfiltered dashboard view This requires converting the `"visits"` and `"visit_duration"` metrics to atoms so that they can be used in ecto subqueries. * Display imported devices, browsers and OSs on dashboard * Display imported country data on dashboard * Add more metrics to entries/exits for modals * make sure data is returned via API with correct keys * Import regions and cities from GA * Capitalize device upon import to match native data * Leave query limits/offsets until after possibly joining with imported data * Also import timeOnPage and pageviews for pages from GA * imported_countries -> imported_locations * Get timeOnPage and pageviews for pages from GA These are needed for the pages modal, and for calculating exit rates for exit pages. * Add indicator to dashboard when imported data is being used * Don't show imported data as separately line on main graph * "bounce_rate" -> :bounce_rate, so it works in subqueries * Drop imported browser and OS versions These are not needed. * Toggle displaying imported data by clicking indicator * Parse referrers with RefInspector - Use 'ga:fullReferrer' instead of 'ga:source'. This provides the actual referrer host + path, whereas 'ga:source' includes utm_mediums and other values when relevant. - 'ga:fullReferror' does however include search engine names directly, so they are manually checked for as RefInspector won't pick up on these. * Keep imported data indicator on dashboard and strikethrough when hidden * Add unlink google button to import panel * Rename some GA browsers and OSes to plausible versions * Get main top pages and exit pages panels working correctly with imported data * mix format * Fetch time_on_pages for imported data when needed * entry pages need to fetch bounces from GA * "sample_percent" -> :sample_percent as only atoms can be used in subqueries * Calculate bounce_rate for joined native and imported data for top pages modal * Flip some query bindings around to be less misleading * Fixup entry page modal visit durations * mix format * Fetch bounces and visit_duration for sources from GA * add more source metrics used for data in modals * Make sources modals display correct values * imported_visitors: bounce_rate -> bounces, avg_visit_duration -> visit_duration * Merge imported data into aggregate stats * Reformat top graph side icons * Ensure sample_percent is yielded from aggregate data * filter event_props should be strings * Hide imported data from frontend when using filter * Fix existing tests * fix tests * Fix imported indicator appearing when filtering * comma needed, lost when rebasing * Import utm_terms and utm_content from GA * Merge imported utm_term and utm_content * Rename imported Countries data as Locations * Set imported city schema field to int * Remove utm_terms and utm_content when clearing imported * Clean locations import from Google Analytics - Country and region should be set to "" when GA provides "(not set)" - City should be set to 0 for "unknown", as we cannot reliably import city data from GA. * Display imported region and city in dashboard * os -> operating_system in some parts of code The inconsistency of using os in some places and operating_system in others causes trouble with subqueries and joins for the native and imported data, which would require additional logic to account for. The simplest solution is the just use a consistent word for all uses. This doesn't make any user-facing or database changes. * to_atom -> to_existing_atom * format * "events" metric -> :events * ignore imported data when "events" in metrics * update "bounce_rate" * atomise some more metrics from new city and region api * atomise some more metrics for email handlers * "conversion_rate" -> :conversion_rate during csv export * Move imported data stats code to own module * Move imported timeseries function to Stats.Imported * Use Timex.parse to import dates from GA * has_imported_stats -> imported_source * "time_on_page" -> :time_on_page * Convert imported GA data to UTC * Clean up GA request code a bit There was some weird logic here with two separate lists that really ought to be together, so this merges those. * Fail sooner if GA timezone can't be identified * Link imported tables to site by id * imported_utm_content -> imported_utm_contents * Imported GA from all of time * Reorganise GA data fetch logic - Fetch data from the start of time (2005) - Check whether no data was fetched, and if so, inform user and don't consider data to be imported. * Clarify removal of "visits" data when it isn't in metrics * Apply location filters from API This makes it consistent with the sources etc which filter out 'Direct / None' on the API side. These filters are used by both the native and imported data handling code, which would otherwise both duplicate the filters in their `where` clauses. * Do not use changeset for setting site.imported_source * Add all metrics to all dimensions * Run GA import in the background * Send email when GA import completes * Add handler to insert imported data into tests and imported_browsers_factory * Add remaining import data test factories * Add imported location data to test * Test main graph with imported data * Add imported data to operating systems tests * Add imported data to pages tests * Add imported data to entry pages tests * Add imported data to exit pages tests * Add imported data to devices tests * Add imported data to sources tests * Add imported data to UTM tests * Add new test module for the data import step * Test import of sources GA data * Test import of utm_mediums GA data * Test import of utm_campaigns GA data * Add tests for UTM terms * Add tests for UTM contents * Add test for importing pages and entry pages data from GA * Add test for importing exit page data * Fix module file name typo * Add test for importing location data from GA * Add test for importing devices data from GA * Add test for importing browsers data from GA * Add test for importing OS data from GA * Paginate GA requests to download all data * Bump clickhouse_ecto version * Move RefInspector wrapper function into module * Drop timezone transform on import * Order imported by side_id then date * More strings -> atoms Also changes a conditional to be a bit nicer * Remove parallelisation of data import * Split sources and UTM sources from fetched GA data GA has only a "source" dimension and no "UTM source" dimension. Instead it returns these combined. The logic herein to tease these apart is: 1. "(direct)" -> it's a direct source 2. if the source is a domain -> it's a source 3. "google" -> it's from adwords; let's make this a UTM source "adwords" 4. else -> just a UTM source * Keep prop names in queries as strings * fix typo * Fix import * Insert data to clickhouse in batches * Fix link when removing imported data * Merge source tables * Import hostname as well as pathname * Record start and end time of imported data * Track import progress * Fix month interval with imported data * Do not JOIN when imported date range has no overlap * Fix time on page using exits Co-authored-by: mcol <mcol@posteo.net>
2022-03-11 00:04:59 +03:00
%Plausible.ClickhouseEvent{} ->
Map.put(event, :domain, site.domain)
_ ->
Map.put(event, :site_id, site.id)
end
end)
|> populate_stats
end
def populate_stats(events) do
[Continued] Google Analytics import (#1753) * Add has_imported_stats boolean to Site * Add Google Analytics import panel to general settings * Get GA profiles to display in import settings panel * Add import_from_google method as entrypoint to import data * Add imported_visitors table * Remove conflicting code from migration * Import visitors data into clickhouse database * Pass another dataset to main graph for rendering in red This adds another entry to the JSON data returned via the main graph API called `imported_plot`, which is similar to `plot` in form but will be completed with previously imported data. Currently it simply returns the values from `plot` / 2. The data is rendered in the main graph in red without fill, and without an indicator for the present. Rationale: imported data will not continue to grow so there is no projection forward, only backwards. * Hook imported GA data to dashboard timeseries plot * Add settings option to forget imported data * Import sources from google analytics * Merge imported sources when queried * Merge imported source data native data when querying sources * Start converting metrics to atoms so they can be subqueried This changes "visitors" and in some places "sources" to atoms. This does not change the behaviour of the functions - the tests all pass unchanged following this commit. This is necessary as joining subqueries requires that the keys in `select` statements be atoms and not strings. * Convery GA (direct) source to empty string * Import utm campaign and utm medium from GA * format * Import all data types from GA into new tables * Handle large amounts of more data more safely * Fix some mistakes in tables * Make GA requests in chunks of 5 queries * Only display imported timeseries when there is no filter * Correctly show last 30 minutes timeseries when 'realtime' * Add with_imported key to Query struct * Account for injected :is_not filter on sources from dashboard * Also add tentative imported_utm_sources table This needs a bit more work on the google import side, as GA do not report sources and utm sources as distinct things. * Return imported data to dashboard for rest of Sources panel This extends the merge_imported function definition for sources to utm_sources, utm_mediums and utm_campaigns too. This appears to be working on the DB side but something is incomplete on the client side. * Clear imported stats from all tables when requested * Merge entry pages and exit pages from imported data into unfiltered dashboard view This requires converting the `"visits"` and `"visit_duration"` metrics to atoms so that they can be used in ecto subqueries. * Display imported devices, browsers and OSs on dashboard * Display imported country data on dashboard * Add more metrics to entries/exits for modals * make sure data is returned via API with correct keys * Import regions and cities from GA * Capitalize device upon import to match native data * Leave query limits/offsets until after possibly joining with imported data * Also import timeOnPage and pageviews for pages from GA * imported_countries -> imported_locations * Get timeOnPage and pageviews for pages from GA These are needed for the pages modal, and for calculating exit rates for exit pages. * Add indicator to dashboard when imported data is being used * Don't show imported data as separately line on main graph * "bounce_rate" -> :bounce_rate, so it works in subqueries * Drop imported browser and OS versions These are not needed. * Toggle displaying imported data by clicking indicator * Parse referrers with RefInspector - Use 'ga:fullReferrer' instead of 'ga:source'. This provides the actual referrer host + path, whereas 'ga:source' includes utm_mediums and other values when relevant. - 'ga:fullReferror' does however include search engine names directly, so they are manually checked for as RefInspector won't pick up on these. * Keep imported data indicator on dashboard and strikethrough when hidden * Add unlink google button to import panel * Rename some GA browsers and OSes to plausible versions * Get main top pages and exit pages panels working correctly with imported data * mix format * Fetch time_on_pages for imported data when needed * entry pages need to fetch bounces from GA * "sample_percent" -> :sample_percent as only atoms can be used in subqueries * Calculate bounce_rate for joined native and imported data for top pages modal * Flip some query bindings around to be less misleading * Fixup entry page modal visit durations * mix format * Fetch bounces and visit_duration for sources from GA * add more source metrics used for data in modals * Make sources modals display correct values * imported_visitors: bounce_rate -> bounces, avg_visit_duration -> visit_duration * Merge imported data into aggregate stats * Reformat top graph side icons * Ensure sample_percent is yielded from aggregate data * filter event_props should be strings * Hide imported data from frontend when using filter * Fix existing tests * fix tests * Fix imported indicator appearing when filtering * comma needed, lost when rebasing * Import utm_terms and utm_content from GA * Merge imported utm_term and utm_content * Rename imported Countries data as Locations * Set imported city schema field to int * Remove utm_terms and utm_content when clearing imported * Clean locations import from Google Analytics - Country and region should be set to "" when GA provides "(not set)" - City should be set to 0 for "unknown", as we cannot reliably import city data from GA. * Display imported region and city in dashboard * os -> operating_system in some parts of code The inconsistency of using os in some places and operating_system in others causes trouble with subqueries and joins for the native and imported data, which would require additional logic to account for. The simplest solution is the just use a consistent word for all uses. This doesn't make any user-facing or database changes. * to_atom -> to_existing_atom * format * "events" metric -> :events * ignore imported data when "events" in metrics * update "bounce_rate" * atomise some more metrics from new city and region api * atomise some more metrics for email handlers * "conversion_rate" -> :conversion_rate during csv export * Move imported data stats code to own module * Move imported timeseries function to Stats.Imported * Use Timex.parse to import dates from GA * has_imported_stats -> imported_source * "time_on_page" -> :time_on_page * Convert imported GA data to UTC * Clean up GA request code a bit There was some weird logic here with two separate lists that really ought to be together, so this merges those. * Fail sooner if GA timezone can't be identified * Link imported tables to site by id * imported_utm_content -> imported_utm_contents * Imported GA from all of time * Reorganise GA data fetch logic - Fetch data from the start of time (2005) - Check whether no data was fetched, and if so, inform user and don't consider data to be imported. * Clarify removal of "visits" data when it isn't in metrics * Apply location filters from API This makes it consistent with the sources etc which filter out 'Direct / None' on the API side. These filters are used by both the native and imported data handling code, which would otherwise both duplicate the filters in their `where` clauses. * Do not use changeset for setting site.imported_source * Add all metrics to all dimensions * Run GA import in the background * Send email when GA import completes * Add handler to insert imported data into tests and imported_browsers_factory * Add remaining import data test factories * Add imported location data to test * Test main graph with imported data * Add imported data to operating systems tests * Add imported data to pages tests * Add imported data to entry pages tests * Add imported data to exit pages tests * Add imported data to devices tests * Add imported data to sources tests * Add imported data to UTM tests * Add new test module for the data import step * Test import of sources GA data * Test import of utm_mediums GA data * Test import of utm_campaigns GA data * Add tests for UTM terms * Add tests for UTM contents * Add test for importing pages and entry pages data from GA * Add test for importing exit page data * Fix module file name typo * Add test for importing location data from GA * Add test for importing devices data from GA * Add test for importing browsers data from GA * Add test for importing OS data from GA * Paginate GA requests to download all data * Bump clickhouse_ecto version * Move RefInspector wrapper function into module * Drop timezone transform on import * Order imported by side_id then date * More strings -> atoms Also changes a conditional to be a bit nicer * Remove parallelisation of data import * Split sources and UTM sources from fetched GA data GA has only a "source" dimension and no "UTM source" dimension. Instead it returns these combined. The logic herein to tease these apart is: 1. "(direct)" -> it's a direct source 2. if the source is a domain -> it's a source 3. "google" -> it's from adwords; let's make this a UTM source "adwords" 4. else -> just a UTM source * Keep prop names in queries as strings * fix typo * Fix import * Insert data to clickhouse in batches * Fix link when removing imported data * Merge source tables * Import hostname as well as pathname * Record start and end time of imported data * Track import progress * Fix month interval with imported data * Do not JOIN when imported date range has no overlap * Fix time on page using exits Co-authored-by: mcol <mcol@posteo.net>
2022-03-11 00:04:59 +03:00
{native, imported} =
events
|> Enum.map(fn event ->
case event do
%{timestamp: timestamp} ->
%{event | timestamp: to_naive_truncate(timestamp)}
_other ->
event
end
end)
|> Enum.split_with(fn event ->
[Continued] Google Analytics import (#1753) * Add has_imported_stats boolean to Site * Add Google Analytics import panel to general settings * Get GA profiles to display in import settings panel * Add import_from_google method as entrypoint to import data * Add imported_visitors table * Remove conflicting code from migration * Import visitors data into clickhouse database * Pass another dataset to main graph for rendering in red This adds another entry to the JSON data returned via the main graph API called `imported_plot`, which is similar to `plot` in form but will be completed with previously imported data. Currently it simply returns the values from `plot` / 2. The data is rendered in the main graph in red without fill, and without an indicator for the present. Rationale: imported data will not continue to grow so there is no projection forward, only backwards. * Hook imported GA data to dashboard timeseries plot * Add settings option to forget imported data * Import sources from google analytics * Merge imported sources when queried * Merge imported source data native data when querying sources * Start converting metrics to atoms so they can be subqueried This changes "visitors" and in some places "sources" to atoms. This does not change the behaviour of the functions - the tests all pass unchanged following this commit. This is necessary as joining subqueries requires that the keys in `select` statements be atoms and not strings. * Convery GA (direct) source to empty string * Import utm campaign and utm medium from GA * format * Import all data types from GA into new tables * Handle large amounts of more data more safely * Fix some mistakes in tables * Make GA requests in chunks of 5 queries * Only display imported timeseries when there is no filter * Correctly show last 30 minutes timeseries when 'realtime' * Add with_imported key to Query struct * Account for injected :is_not filter on sources from dashboard * Also add tentative imported_utm_sources table This needs a bit more work on the google import side, as GA do not report sources and utm sources as distinct things. * Return imported data to dashboard for rest of Sources panel This extends the merge_imported function definition for sources to utm_sources, utm_mediums and utm_campaigns too. This appears to be working on the DB side but something is incomplete on the client side. * Clear imported stats from all tables when requested * Merge entry pages and exit pages from imported data into unfiltered dashboard view This requires converting the `"visits"` and `"visit_duration"` metrics to atoms so that they can be used in ecto subqueries. * Display imported devices, browsers and OSs on dashboard * Display imported country data on dashboard * Add more metrics to entries/exits for modals * make sure data is returned via API with correct keys * Import regions and cities from GA * Capitalize device upon import to match native data * Leave query limits/offsets until after possibly joining with imported data * Also import timeOnPage and pageviews for pages from GA * imported_countries -> imported_locations * Get timeOnPage and pageviews for pages from GA These are needed for the pages modal, and for calculating exit rates for exit pages. * Add indicator to dashboard when imported data is being used * Don't show imported data as separately line on main graph * "bounce_rate" -> :bounce_rate, so it works in subqueries * Drop imported browser and OS versions These are not needed. * Toggle displaying imported data by clicking indicator * Parse referrers with RefInspector - Use 'ga:fullReferrer' instead of 'ga:source'. This provides the actual referrer host + path, whereas 'ga:source' includes utm_mediums and other values when relevant. - 'ga:fullReferror' does however include search engine names directly, so they are manually checked for as RefInspector won't pick up on these. * Keep imported data indicator on dashboard and strikethrough when hidden * Add unlink google button to import panel * Rename some GA browsers and OSes to plausible versions * Get main top pages and exit pages panels working correctly with imported data * mix format * Fetch time_on_pages for imported data when needed * entry pages need to fetch bounces from GA * "sample_percent" -> :sample_percent as only atoms can be used in subqueries * Calculate bounce_rate for joined native and imported data for top pages modal * Flip some query bindings around to be less misleading * Fixup entry page modal visit durations * mix format * Fetch bounces and visit_duration for sources from GA * add more source metrics used for data in modals * Make sources modals display correct values * imported_visitors: bounce_rate -> bounces, avg_visit_duration -> visit_duration * Merge imported data into aggregate stats * Reformat top graph side icons * Ensure sample_percent is yielded from aggregate data * filter event_props should be strings * Hide imported data from frontend when using filter * Fix existing tests * fix tests * Fix imported indicator appearing when filtering * comma needed, lost when rebasing * Import utm_terms and utm_content from GA * Merge imported utm_term and utm_content * Rename imported Countries data as Locations * Set imported city schema field to int * Remove utm_terms and utm_content when clearing imported * Clean locations import from Google Analytics - Country and region should be set to "" when GA provides "(not set)" - City should be set to 0 for "unknown", as we cannot reliably import city data from GA. * Display imported region and city in dashboard * os -> operating_system in some parts of code The inconsistency of using os in some places and operating_system in others causes trouble with subqueries and joins for the native and imported data, which would require additional logic to account for. The simplest solution is the just use a consistent word for all uses. This doesn't make any user-facing or database changes. * to_atom -> to_existing_atom * format * "events" metric -> :events * ignore imported data when "events" in metrics * update "bounce_rate" * atomise some more metrics from new city and region api * atomise some more metrics for email handlers * "conversion_rate" -> :conversion_rate during csv export * Move imported data stats code to own module * Move imported timeseries function to Stats.Imported * Use Timex.parse to import dates from GA * has_imported_stats -> imported_source * "time_on_page" -> :time_on_page * Convert imported GA data to UTC * Clean up GA request code a bit There was some weird logic here with two separate lists that really ought to be together, so this merges those. * Fail sooner if GA timezone can't be identified * Link imported tables to site by id * imported_utm_content -> imported_utm_contents * Imported GA from all of time * Reorganise GA data fetch logic - Fetch data from the start of time (2005) - Check whether no data was fetched, and if so, inform user and don't consider data to be imported. * Clarify removal of "visits" data when it isn't in metrics * Apply location filters from API This makes it consistent with the sources etc which filter out 'Direct / None' on the API side. These filters are used by both the native and imported data handling code, which would otherwise both duplicate the filters in their `where` clauses. * Do not use changeset for setting site.imported_source * Add all metrics to all dimensions * Run GA import in the background * Send email when GA import completes * Add handler to insert imported data into tests and imported_browsers_factory * Add remaining import data test factories * Add imported location data to test * Test main graph with imported data * Add imported data to operating systems tests * Add imported data to pages tests * Add imported data to entry pages tests * Add imported data to exit pages tests * Add imported data to devices tests * Add imported data to sources tests * Add imported data to UTM tests * Add new test module for the data import step * Test import of sources GA data * Test import of utm_mediums GA data * Test import of utm_campaigns GA data * Add tests for UTM terms * Add tests for UTM contents * Add test for importing pages and entry pages data from GA * Add test for importing exit page data * Fix module file name typo * Add test for importing location data from GA * Add test for importing devices data from GA * Add test for importing browsers data from GA * Add test for importing OS data from GA * Paginate GA requests to download all data * Bump clickhouse_ecto version * Move RefInspector wrapper function into module * Drop timezone transform on import * Order imported by side_id then date * More strings -> atoms Also changes a conditional to be a bit nicer * Remove parallelisation of data import * Split sources and UTM sources from fetched GA data GA has only a "source" dimension and no "UTM source" dimension. Instead it returns these combined. The logic herein to tease these apart is: 1. "(direct)" -> it's a direct source 2. if the source is a domain -> it's a source 3. "google" -> it's from adwords; let's make this a UTM source "adwords" 4. else -> just a UTM source * Keep prop names in queries as strings * fix typo * Fix import * Insert data to clickhouse in batches * Fix link when removing imported data * Merge source tables * Import hostname as well as pathname * Record start and end time of imported data * Track import progress * Fix month interval with imported data * Do not JOIN when imported date range has no overlap * Fix time on page using exits Co-authored-by: mcol <mcol@posteo.net>
2022-03-11 00:04:59 +03:00
case event do
Conditionally support switching between v1 and v2 clickhouse schemas (#2780) * Remove ClickhouseSetup module This has been an implicit point of contact to many tests. From now on the goal is for each test to maintain its own, isolated setup so that no accidental clashes and implicit assumptions are relied upon. * Implement v2 schema check An environment variable V2_MIGRATION_DONE acts like a feature flag, switching plausible from using old events/sessions schemas to v2 schemas introduced by NumericIDs migration. * Run both test suites sequentially While the code for v1 and v2 schemas must be kept still, we will from now on run tests against both code paths. Secondary test run will set V2_MIGRATION_DONE=1 variable, thus making all `Plausible.v2?()` checks return `true'. * Remove unused function This is a remnant from the short period when we would check for existing events before allowing creating a new site. * Update test setups/factories with v2 migration check * Make GateKeeper return site id along with :allow * Make Billing module check for v2 schema * Make ingestion aware of v2 schema * Disable site transfers for when v2 is live In a separate changeset we will implement simplified site transfer for when v2 migration is complete. The new transfer will only rename the site domain in postgres and keep track of the original site prior to the transfer so we keep an ingestion grace period until the customers redeploy their scripting. * Make Stats base queries aware of v2 schema switch * Update breakdown with v2 conditionals * Update pageview local start with v2 check * Update current visitoris with v2 check * Update stats controller with v2 checks * Update external controller with v2 checks * Update remaining tests with proper fixtures * Rewrite redundant assignment * Remove unused alias * Mute credo, this is not the right time * Add test_helper prompt * Fetch priv dir so it works with a release * Fetch distinct partitions only * Don't limit inspect output for partitions * Ensure SQL is printed to IO * Remove redundant domain fixture
2023-03-27 14:52:42 +03:00
%Plausible.ClickhouseEventV2{} ->
true
[Continued] Google Analytics import (#1753) * Add has_imported_stats boolean to Site * Add Google Analytics import panel to general settings * Get GA profiles to display in import settings panel * Add import_from_google method as entrypoint to import data * Add imported_visitors table * Remove conflicting code from migration * Import visitors data into clickhouse database * Pass another dataset to main graph for rendering in red This adds another entry to the JSON data returned via the main graph API called `imported_plot`, which is similar to `plot` in form but will be completed with previously imported data. Currently it simply returns the values from `plot` / 2. The data is rendered in the main graph in red without fill, and without an indicator for the present. Rationale: imported data will not continue to grow so there is no projection forward, only backwards. * Hook imported GA data to dashboard timeseries plot * Add settings option to forget imported data * Import sources from google analytics * Merge imported sources when queried * Merge imported source data native data when querying sources * Start converting metrics to atoms so they can be subqueried This changes "visitors" and in some places "sources" to atoms. This does not change the behaviour of the functions - the tests all pass unchanged following this commit. This is necessary as joining subqueries requires that the keys in `select` statements be atoms and not strings. * Convery GA (direct) source to empty string * Import utm campaign and utm medium from GA * format * Import all data types from GA into new tables * Handle large amounts of more data more safely * Fix some mistakes in tables * Make GA requests in chunks of 5 queries * Only display imported timeseries when there is no filter * Correctly show last 30 minutes timeseries when 'realtime' * Add with_imported key to Query struct * Account for injected :is_not filter on sources from dashboard * Also add tentative imported_utm_sources table This needs a bit more work on the google import side, as GA do not report sources and utm sources as distinct things. * Return imported data to dashboard for rest of Sources panel This extends the merge_imported function definition for sources to utm_sources, utm_mediums and utm_campaigns too. This appears to be working on the DB side but something is incomplete on the client side. * Clear imported stats from all tables when requested * Merge entry pages and exit pages from imported data into unfiltered dashboard view This requires converting the `"visits"` and `"visit_duration"` metrics to atoms so that they can be used in ecto subqueries. * Display imported devices, browsers and OSs on dashboard * Display imported country data on dashboard * Add more metrics to entries/exits for modals * make sure data is returned via API with correct keys * Import regions and cities from GA * Capitalize device upon import to match native data * Leave query limits/offsets until after possibly joining with imported data * Also import timeOnPage and pageviews for pages from GA * imported_countries -> imported_locations * Get timeOnPage and pageviews for pages from GA These are needed for the pages modal, and for calculating exit rates for exit pages. * Add indicator to dashboard when imported data is being used * Don't show imported data as separately line on main graph * "bounce_rate" -> :bounce_rate, so it works in subqueries * Drop imported browser and OS versions These are not needed. * Toggle displaying imported data by clicking indicator * Parse referrers with RefInspector - Use 'ga:fullReferrer' instead of 'ga:source'. This provides the actual referrer host + path, whereas 'ga:source' includes utm_mediums and other values when relevant. - 'ga:fullReferror' does however include search engine names directly, so they are manually checked for as RefInspector won't pick up on these. * Keep imported data indicator on dashboard and strikethrough when hidden * Add unlink google button to import panel * Rename some GA browsers and OSes to plausible versions * Get main top pages and exit pages panels working correctly with imported data * mix format * Fetch time_on_pages for imported data when needed * entry pages need to fetch bounces from GA * "sample_percent" -> :sample_percent as only atoms can be used in subqueries * Calculate bounce_rate for joined native and imported data for top pages modal * Flip some query bindings around to be less misleading * Fixup entry page modal visit durations * mix format * Fetch bounces and visit_duration for sources from GA * add more source metrics used for data in modals * Make sources modals display correct values * imported_visitors: bounce_rate -> bounces, avg_visit_duration -> visit_duration * Merge imported data into aggregate stats * Reformat top graph side icons * Ensure sample_percent is yielded from aggregate data * filter event_props should be strings * Hide imported data from frontend when using filter * Fix existing tests * fix tests * Fix imported indicator appearing when filtering * comma needed, lost when rebasing * Import utm_terms and utm_content from GA * Merge imported utm_term and utm_content * Rename imported Countries data as Locations * Set imported city schema field to int * Remove utm_terms and utm_content when clearing imported * Clean locations import from Google Analytics - Country and region should be set to "" when GA provides "(not set)" - City should be set to 0 for "unknown", as we cannot reliably import city data from GA. * Display imported region and city in dashboard * os -> operating_system in some parts of code The inconsistency of using os in some places and operating_system in others causes trouble with subqueries and joins for the native and imported data, which would require additional logic to account for. The simplest solution is the just use a consistent word for all uses. This doesn't make any user-facing or database changes. * to_atom -> to_existing_atom * format * "events" metric -> :events * ignore imported data when "events" in metrics * update "bounce_rate" * atomise some more metrics from new city and region api * atomise some more metrics for email handlers * "conversion_rate" -> :conversion_rate during csv export * Move imported data stats code to own module * Move imported timeseries function to Stats.Imported * Use Timex.parse to import dates from GA * has_imported_stats -> imported_source * "time_on_page" -> :time_on_page * Convert imported GA data to UTC * Clean up GA request code a bit There was some weird logic here with two separate lists that really ought to be together, so this merges those. * Fail sooner if GA timezone can't be identified * Link imported tables to site by id * imported_utm_content -> imported_utm_contents * Imported GA from all of time * Reorganise GA data fetch logic - Fetch data from the start of time (2005) - Check whether no data was fetched, and if so, inform user and don't consider data to be imported. * Clarify removal of "visits" data when it isn't in metrics * Apply location filters from API This makes it consistent with the sources etc which filter out 'Direct / None' on the API side. These filters are used by both the native and imported data handling code, which would otherwise both duplicate the filters in their `where` clauses. * Do not use changeset for setting site.imported_source * Add all metrics to all dimensions * Run GA import in the background * Send email when GA import completes * Add handler to insert imported data into tests and imported_browsers_factory * Add remaining import data test factories * Add imported location data to test * Test main graph with imported data * Add imported data to operating systems tests * Add imported data to pages tests * Add imported data to entry pages tests * Add imported data to exit pages tests * Add imported data to devices tests * Add imported data to sources tests * Add imported data to UTM tests * Add new test module for the data import step * Test import of sources GA data * Test import of utm_mediums GA data * Test import of utm_campaigns GA data * Add tests for UTM terms * Add tests for UTM contents * Add test for importing pages and entry pages data from GA * Add test for importing exit page data * Fix module file name typo * Add test for importing location data from GA * Add test for importing devices data from GA * Add test for importing browsers data from GA * Add test for importing OS data from GA * Paginate GA requests to download all data * Bump clickhouse_ecto version * Move RefInspector wrapper function into module * Drop timezone transform on import * Order imported by side_id then date * More strings -> atoms Also changes a conditional to be a bit nicer * Remove parallelisation of data import * Split sources and UTM sources from fetched GA data GA has only a "source" dimension and no "UTM source" dimension. Instead it returns these combined. The logic herein to tease these apart is: 1. "(direct)" -> it's a direct source 2. if the source is a domain -> it's a source 3. "google" -> it's from adwords; let's make this a UTM source "adwords" 4. else -> just a UTM source * Keep prop names in queries as strings * fix typo * Fix import * Insert data to clickhouse in batches * Fix link when removing imported data * Merge source tables * Import hostname as well as pathname * Record start and end time of imported data * Track import progress * Fix month interval with imported data * Do not JOIN when imported date range has no overlap * Fix time on page using exits Co-authored-by: mcol <mcol@posteo.net>
2022-03-11 00:04:59 +03:00
%Plausible.ClickhouseEvent{} ->
true
_ ->
false
end
end)
populate_native_stats(native)
populate_imported_stats(imported)
[Continued] Google Analytics import (#1753) * Add has_imported_stats boolean to Site * Add Google Analytics import panel to general settings * Get GA profiles to display in import settings panel * Add import_from_google method as entrypoint to import data * Add imported_visitors table * Remove conflicting code from migration * Import visitors data into clickhouse database * Pass another dataset to main graph for rendering in red This adds another entry to the JSON data returned via the main graph API called `imported_plot`, which is similar to `plot` in form but will be completed with previously imported data. Currently it simply returns the values from `plot` / 2. The data is rendered in the main graph in red without fill, and without an indicator for the present. Rationale: imported data will not continue to grow so there is no projection forward, only backwards. * Hook imported GA data to dashboard timeseries plot * Add settings option to forget imported data * Import sources from google analytics * Merge imported sources when queried * Merge imported source data native data when querying sources * Start converting metrics to atoms so they can be subqueried This changes "visitors" and in some places "sources" to atoms. This does not change the behaviour of the functions - the tests all pass unchanged following this commit. This is necessary as joining subqueries requires that the keys in `select` statements be atoms and not strings. * Convery GA (direct) source to empty string * Import utm campaign and utm medium from GA * format * Import all data types from GA into new tables * Handle large amounts of more data more safely * Fix some mistakes in tables * Make GA requests in chunks of 5 queries * Only display imported timeseries when there is no filter * Correctly show last 30 minutes timeseries when 'realtime' * Add with_imported key to Query struct * Account for injected :is_not filter on sources from dashboard * Also add tentative imported_utm_sources table This needs a bit more work on the google import side, as GA do not report sources and utm sources as distinct things. * Return imported data to dashboard for rest of Sources panel This extends the merge_imported function definition for sources to utm_sources, utm_mediums and utm_campaigns too. This appears to be working on the DB side but something is incomplete on the client side. * Clear imported stats from all tables when requested * Merge entry pages and exit pages from imported data into unfiltered dashboard view This requires converting the `"visits"` and `"visit_duration"` metrics to atoms so that they can be used in ecto subqueries. * Display imported devices, browsers and OSs on dashboard * Display imported country data on dashboard * Add more metrics to entries/exits for modals * make sure data is returned via API with correct keys * Import regions and cities from GA * Capitalize device upon import to match native data * Leave query limits/offsets until after possibly joining with imported data * Also import timeOnPage and pageviews for pages from GA * imported_countries -> imported_locations * Get timeOnPage and pageviews for pages from GA These are needed for the pages modal, and for calculating exit rates for exit pages. * Add indicator to dashboard when imported data is being used * Don't show imported data as separately line on main graph * "bounce_rate" -> :bounce_rate, so it works in subqueries * Drop imported browser and OS versions These are not needed. * Toggle displaying imported data by clicking indicator * Parse referrers with RefInspector - Use 'ga:fullReferrer' instead of 'ga:source'. This provides the actual referrer host + path, whereas 'ga:source' includes utm_mediums and other values when relevant. - 'ga:fullReferror' does however include search engine names directly, so they are manually checked for as RefInspector won't pick up on these. * Keep imported data indicator on dashboard and strikethrough when hidden * Add unlink google button to import panel * Rename some GA browsers and OSes to plausible versions * Get main top pages and exit pages panels working correctly with imported data * mix format * Fetch time_on_pages for imported data when needed * entry pages need to fetch bounces from GA * "sample_percent" -> :sample_percent as only atoms can be used in subqueries * Calculate bounce_rate for joined native and imported data for top pages modal * Flip some query bindings around to be less misleading * Fixup entry page modal visit durations * mix format * Fetch bounces and visit_duration for sources from GA * add more source metrics used for data in modals * Make sources modals display correct values * imported_visitors: bounce_rate -> bounces, avg_visit_duration -> visit_duration * Merge imported data into aggregate stats * Reformat top graph side icons * Ensure sample_percent is yielded from aggregate data * filter event_props should be strings * Hide imported data from frontend when using filter * Fix existing tests * fix tests * Fix imported indicator appearing when filtering * comma needed, lost when rebasing * Import utm_terms and utm_content from GA * Merge imported utm_term and utm_content * Rename imported Countries data as Locations * Set imported city schema field to int * Remove utm_terms and utm_content when clearing imported * Clean locations import from Google Analytics - Country and region should be set to "" when GA provides "(not set)" - City should be set to 0 for "unknown", as we cannot reliably import city data from GA. * Display imported region and city in dashboard * os -> operating_system in some parts of code The inconsistency of using os in some places and operating_system in others causes trouble with subqueries and joins for the native and imported data, which would require additional logic to account for. The simplest solution is the just use a consistent word for all uses. This doesn't make any user-facing or database changes. * to_atom -> to_existing_atom * format * "events" metric -> :events * ignore imported data when "events" in metrics * update "bounce_rate" * atomise some more metrics from new city and region api * atomise some more metrics for email handlers * "conversion_rate" -> :conversion_rate during csv export * Move imported data stats code to own module * Move imported timeseries function to Stats.Imported * Use Timex.parse to import dates from GA * has_imported_stats -> imported_source * "time_on_page" -> :time_on_page * Convert imported GA data to UTC * Clean up GA request code a bit There was some weird logic here with two separate lists that really ought to be together, so this merges those. * Fail sooner if GA timezone can't be identified * Link imported tables to site by id * imported_utm_content -> imported_utm_contents * Imported GA from all of time * Reorganise GA data fetch logic - Fetch data from the start of time (2005) - Check whether no data was fetched, and if so, inform user and don't consider data to be imported. * Clarify removal of "visits" data when it isn't in metrics * Apply location filters from API This makes it consistent with the sources etc which filter out 'Direct / None' on the API side. These filters are used by both the native and imported data handling code, which would otherwise both duplicate the filters in their `where` clauses. * Do not use changeset for setting site.imported_source * Add all metrics to all dimensions * Run GA import in the background * Send email when GA import completes * Add handler to insert imported data into tests and imported_browsers_factory * Add remaining import data test factories * Add imported location data to test * Test main graph with imported data * Add imported data to operating systems tests * Add imported data to pages tests * Add imported data to entry pages tests * Add imported data to exit pages tests * Add imported data to devices tests * Add imported data to sources tests * Add imported data to UTM tests * Add new test module for the data import step * Test import of sources GA data * Test import of utm_mediums GA data * Test import of utm_campaigns GA data * Add tests for UTM terms * Add tests for UTM contents * Add test for importing pages and entry pages data from GA * Add test for importing exit page data * Fix module file name typo * Add test for importing location data from GA * Add test for importing devices data from GA * Add test for importing browsers data from GA * Add test for importing OS data from GA * Paginate GA requests to download all data * Bump clickhouse_ecto version * Move RefInspector wrapper function into module * Drop timezone transform on import * Order imported by side_id then date * More strings -> atoms Also changes a conditional to be a bit nicer * Remove parallelisation of data import * Split sources and UTM sources from fetched GA data GA has only a "source" dimension and no "UTM source" dimension. Instead it returns these combined. The logic herein to tease these apart is: 1. "(direct)" -> it's a direct source 2. if the source is a domain -> it's a source 3. "google" -> it's from adwords; let's make this a UTM source "adwords" 4. else -> just a UTM source * Keep prop names in queries as strings * fix typo * Fix import * Insert data to clickhouse in batches * Fix link when removing imported data * Merge source tables * Import hostname as well as pathname * Record start and end time of imported data * Track import progress * Fix month interval with imported data * Do not JOIN when imported date range has no overlap * Fix time on page using exits Co-authored-by: mcol <mcol@posteo.net>
2022-03-11 00:04:59 +03:00
end
defp populate_native_stats(events) do
sessions =
Enum.reduce(events, %{}, fn event, sessions ->
session_id = Plausible.Session.CacheStore.on_event(event, nil)
Conditionally support switching between v1 and v2 clickhouse schemas (#2780) * Remove ClickhouseSetup module This has been an implicit point of contact to many tests. From now on the goal is for each test to maintain its own, isolated setup so that no accidental clashes and implicit assumptions are relied upon. * Implement v2 schema check An environment variable V2_MIGRATION_DONE acts like a feature flag, switching plausible from using old events/sessions schemas to v2 schemas introduced by NumericIDs migration. * Run both test suites sequentially While the code for v1 and v2 schemas must be kept still, we will from now on run tests against both code paths. Secondary test run will set V2_MIGRATION_DONE=1 variable, thus making all `Plausible.v2?()` checks return `true'. * Remove unused function This is a remnant from the short period when we would check for existing events before allowing creating a new site. * Update test setups/factories with v2 migration check * Make GateKeeper return site id along with :allow * Make Billing module check for v2 schema * Make ingestion aware of v2 schema * Disable site transfers for when v2 is live In a separate changeset we will implement simplified site transfer for when v2 migration is complete. The new transfer will only rename the site domain in postgres and keep track of the original site prior to the transfer so we keep an ingestion grace period until the customers redeploy their scripting. * Make Stats base queries aware of v2 schema switch * Update breakdown with v2 conditionals * Update pageview local start with v2 check * Update current visitoris with v2 check * Update stats controller with v2 checks * Update external controller with v2 checks * Update remaining tests with proper fixtures * Rewrite redundant assignment * Remove unused alias * Mute credo, this is not the right time * Add test_helper prompt * Fetch priv dir so it works with a release * Fetch distinct partitions only * Don't limit inspect output for partitions * Ensure SQL is printed to IO * Remove redundant domain fixture
2023-03-27 14:52:42 +03:00
if Plausible.v2?() do
Map.put(sessions, {event.site_id, event.user_id}, session_id)
else
Map.put(sessions, {event.domain, event.user_id}, session_id)
end
end)
Enum.each(events, fn event ->
Conditionally support switching between v1 and v2 clickhouse schemas (#2780) * Remove ClickhouseSetup module This has been an implicit point of contact to many tests. From now on the goal is for each test to maintain its own, isolated setup so that no accidental clashes and implicit assumptions are relied upon. * Implement v2 schema check An environment variable V2_MIGRATION_DONE acts like a feature flag, switching plausible from using old events/sessions schemas to v2 schemas introduced by NumericIDs migration. * Run both test suites sequentially While the code for v1 and v2 schemas must be kept still, we will from now on run tests against both code paths. Secondary test run will set V2_MIGRATION_DONE=1 variable, thus making all `Plausible.v2?()` checks return `true'. * Remove unused function This is a remnant from the short period when we would check for existing events before allowing creating a new site. * Update test setups/factories with v2 migration check * Make GateKeeper return site id along with :allow * Make Billing module check for v2 schema * Make ingestion aware of v2 schema * Disable site transfers for when v2 is live In a separate changeset we will implement simplified site transfer for when v2 migration is complete. The new transfer will only rename the site domain in postgres and keep track of the original site prior to the transfer so we keep an ingestion grace period until the customers redeploy their scripting. * Make Stats base queries aware of v2 schema switch * Update breakdown with v2 conditionals * Update pageview local start with v2 check * Update current visitoris with v2 check * Update stats controller with v2 checks * Update external controller with v2 checks * Update remaining tests with proper fixtures * Rewrite redundant assignment * Remove unused alias * Mute credo, this is not the right time * Add test_helper prompt * Fetch priv dir so it works with a release * Fetch distinct partitions only * Don't limit inspect output for partitions * Ensure SQL is printed to IO * Remove redundant domain fixture
2023-03-27 14:52:42 +03:00
event =
if Plausible.v2?() do
Map.put(event, :session_id, sessions[{event.site_id, event.user_id}])
else
Map.put(event, :session_id, sessions[{event.domain, event.user_id}])
end
Plausible.Event.WriteBuffer.insert(event)
end)
Plausible.Session.WriteBuffer.flush()
Plausible.Event.WriteBuffer.flush()
end
[Continued] Google Analytics import (#1753) * Add has_imported_stats boolean to Site * Add Google Analytics import panel to general settings * Get GA profiles to display in import settings panel * Add import_from_google method as entrypoint to import data * Add imported_visitors table * Remove conflicting code from migration * Import visitors data into clickhouse database * Pass another dataset to main graph for rendering in red This adds another entry to the JSON data returned via the main graph API called `imported_plot`, which is similar to `plot` in form but will be completed with previously imported data. Currently it simply returns the values from `plot` / 2. The data is rendered in the main graph in red without fill, and without an indicator for the present. Rationale: imported data will not continue to grow so there is no projection forward, only backwards. * Hook imported GA data to dashboard timeseries plot * Add settings option to forget imported data * Import sources from google analytics * Merge imported sources when queried * Merge imported source data native data when querying sources * Start converting metrics to atoms so they can be subqueried This changes "visitors" and in some places "sources" to atoms. This does not change the behaviour of the functions - the tests all pass unchanged following this commit. This is necessary as joining subqueries requires that the keys in `select` statements be atoms and not strings. * Convery GA (direct) source to empty string * Import utm campaign and utm medium from GA * format * Import all data types from GA into new tables * Handle large amounts of more data more safely * Fix some mistakes in tables * Make GA requests in chunks of 5 queries * Only display imported timeseries when there is no filter * Correctly show last 30 minutes timeseries when 'realtime' * Add with_imported key to Query struct * Account for injected :is_not filter on sources from dashboard * Also add tentative imported_utm_sources table This needs a bit more work on the google import side, as GA do not report sources and utm sources as distinct things. * Return imported data to dashboard for rest of Sources panel This extends the merge_imported function definition for sources to utm_sources, utm_mediums and utm_campaigns too. This appears to be working on the DB side but something is incomplete on the client side. * Clear imported stats from all tables when requested * Merge entry pages and exit pages from imported data into unfiltered dashboard view This requires converting the `"visits"` and `"visit_duration"` metrics to atoms so that they can be used in ecto subqueries. * Display imported devices, browsers and OSs on dashboard * Display imported country data on dashboard * Add more metrics to entries/exits for modals * make sure data is returned via API with correct keys * Import regions and cities from GA * Capitalize device upon import to match native data * Leave query limits/offsets until after possibly joining with imported data * Also import timeOnPage and pageviews for pages from GA * imported_countries -> imported_locations * Get timeOnPage and pageviews for pages from GA These are needed for the pages modal, and for calculating exit rates for exit pages. * Add indicator to dashboard when imported data is being used * Don't show imported data as separately line on main graph * "bounce_rate" -> :bounce_rate, so it works in subqueries * Drop imported browser and OS versions These are not needed. * Toggle displaying imported data by clicking indicator * Parse referrers with RefInspector - Use 'ga:fullReferrer' instead of 'ga:source'. This provides the actual referrer host + path, whereas 'ga:source' includes utm_mediums and other values when relevant. - 'ga:fullReferror' does however include search engine names directly, so they are manually checked for as RefInspector won't pick up on these. * Keep imported data indicator on dashboard and strikethrough when hidden * Add unlink google button to import panel * Rename some GA browsers and OSes to plausible versions * Get main top pages and exit pages panels working correctly with imported data * mix format * Fetch time_on_pages for imported data when needed * entry pages need to fetch bounces from GA * "sample_percent" -> :sample_percent as only atoms can be used in subqueries * Calculate bounce_rate for joined native and imported data for top pages modal * Flip some query bindings around to be less misleading * Fixup entry page modal visit durations * mix format * Fetch bounces and visit_duration for sources from GA * add more source metrics used for data in modals * Make sources modals display correct values * imported_visitors: bounce_rate -> bounces, avg_visit_duration -> visit_duration * Merge imported data into aggregate stats * Reformat top graph side icons * Ensure sample_percent is yielded from aggregate data * filter event_props should be strings * Hide imported data from frontend when using filter * Fix existing tests * fix tests * Fix imported indicator appearing when filtering * comma needed, lost when rebasing * Import utm_terms and utm_content from GA * Merge imported utm_term and utm_content * Rename imported Countries data as Locations * Set imported city schema field to int * Remove utm_terms and utm_content when clearing imported * Clean locations import from Google Analytics - Country and region should be set to "" when GA provides "(not set)" - City should be set to 0 for "unknown", as we cannot reliably import city data from GA. * Display imported region and city in dashboard * os -> operating_system in some parts of code The inconsistency of using os in some places and operating_system in others causes trouble with subqueries and joins for the native and imported data, which would require additional logic to account for. The simplest solution is the just use a consistent word for all uses. This doesn't make any user-facing or database changes. * to_atom -> to_existing_atom * format * "events" metric -> :events * ignore imported data when "events" in metrics * update "bounce_rate" * atomise some more metrics from new city and region api * atomise some more metrics for email handlers * "conversion_rate" -> :conversion_rate during csv export * Move imported data stats code to own module * Move imported timeseries function to Stats.Imported * Use Timex.parse to import dates from GA * has_imported_stats -> imported_source * "time_on_page" -> :time_on_page * Convert imported GA data to UTC * Clean up GA request code a bit There was some weird logic here with two separate lists that really ought to be together, so this merges those. * Fail sooner if GA timezone can't be identified * Link imported tables to site by id * imported_utm_content -> imported_utm_contents * Imported GA from all of time * Reorganise GA data fetch logic - Fetch data from the start of time (2005) - Check whether no data was fetched, and if so, inform user and don't consider data to be imported. * Clarify removal of "visits" data when it isn't in metrics * Apply location filters from API This makes it consistent with the sources etc which filter out 'Direct / None' on the API side. These filters are used by both the native and imported data handling code, which would otherwise both duplicate the filters in their `where` clauses. * Do not use changeset for setting site.imported_source * Add all metrics to all dimensions * Run GA import in the background * Send email when GA import completes * Add handler to insert imported data into tests and imported_browsers_factory * Add remaining import data test factories * Add imported location data to test * Test main graph with imported data * Add imported data to operating systems tests * Add imported data to pages tests * Add imported data to entry pages tests * Add imported data to exit pages tests * Add imported data to devices tests * Add imported data to sources tests * Add imported data to UTM tests * Add new test module for the data import step * Test import of sources GA data * Test import of utm_mediums GA data * Test import of utm_campaigns GA data * Add tests for UTM terms * Add tests for UTM contents * Add test for importing pages and entry pages data from GA * Add test for importing exit page data * Fix module file name typo * Add test for importing location data from GA * Add test for importing devices data from GA * Add test for importing browsers data from GA * Add test for importing OS data from GA * Paginate GA requests to download all data * Bump clickhouse_ecto version * Move RefInspector wrapper function into module * Drop timezone transform on import * Order imported by side_id then date * More strings -> atoms Also changes a conditional to be a bit nicer * Remove parallelisation of data import * Split sources and UTM sources from fetched GA data GA has only a "source" dimension and no "UTM source" dimension. Instead it returns these combined. The logic herein to tease these apart is: 1. "(direct)" -> it's a direct source 2. if the source is a domain -> it's a source 3. "google" -> it's from adwords; let's make this a UTM source "adwords" 4. else -> just a UTM source * Keep prop names in queries as strings * fix typo * Fix import * Insert data to clickhouse in batches * Fix link when removing imported data * Merge source tables * Import hostname as well as pathname * Record start and end time of imported data * Track import progress * Fix month interval with imported data * Do not JOIN when imported date range has no overlap * Fix time on page using exits Co-authored-by: mcol <mcol@posteo.net>
2022-03-11 00:04:59 +03:00
defp populate_imported_stats(events) do
Enum.group_by(events, &Map.fetch!(&1, :table), &Map.delete(&1, :table))
|> Enum.map(fn {table, events} -> Plausible.Google.Buffer.insert_all(table, events) end)
[Continued] Google Analytics import (#1753) * Add has_imported_stats boolean to Site * Add Google Analytics import panel to general settings * Get GA profiles to display in import settings panel * Add import_from_google method as entrypoint to import data * Add imported_visitors table * Remove conflicting code from migration * Import visitors data into clickhouse database * Pass another dataset to main graph for rendering in red This adds another entry to the JSON data returned via the main graph API called `imported_plot`, which is similar to `plot` in form but will be completed with previously imported data. Currently it simply returns the values from `plot` / 2. The data is rendered in the main graph in red without fill, and without an indicator for the present. Rationale: imported data will not continue to grow so there is no projection forward, only backwards. * Hook imported GA data to dashboard timeseries plot * Add settings option to forget imported data * Import sources from google analytics * Merge imported sources when queried * Merge imported source data native data when querying sources * Start converting metrics to atoms so they can be subqueried This changes "visitors" and in some places "sources" to atoms. This does not change the behaviour of the functions - the tests all pass unchanged following this commit. This is necessary as joining subqueries requires that the keys in `select` statements be atoms and not strings. * Convery GA (direct) source to empty string * Import utm campaign and utm medium from GA * format * Import all data types from GA into new tables * Handle large amounts of more data more safely * Fix some mistakes in tables * Make GA requests in chunks of 5 queries * Only display imported timeseries when there is no filter * Correctly show last 30 minutes timeseries when 'realtime' * Add with_imported key to Query struct * Account for injected :is_not filter on sources from dashboard * Also add tentative imported_utm_sources table This needs a bit more work on the google import side, as GA do not report sources and utm sources as distinct things. * Return imported data to dashboard for rest of Sources panel This extends the merge_imported function definition for sources to utm_sources, utm_mediums and utm_campaigns too. This appears to be working on the DB side but something is incomplete on the client side. * Clear imported stats from all tables when requested * Merge entry pages and exit pages from imported data into unfiltered dashboard view This requires converting the `"visits"` and `"visit_duration"` metrics to atoms so that they can be used in ecto subqueries. * Display imported devices, browsers and OSs on dashboard * Display imported country data on dashboard * Add more metrics to entries/exits for modals * make sure data is returned via API with correct keys * Import regions and cities from GA * Capitalize device upon import to match native data * Leave query limits/offsets until after possibly joining with imported data * Also import timeOnPage and pageviews for pages from GA * imported_countries -> imported_locations * Get timeOnPage and pageviews for pages from GA These are needed for the pages modal, and for calculating exit rates for exit pages. * Add indicator to dashboard when imported data is being used * Don't show imported data as separately line on main graph * "bounce_rate" -> :bounce_rate, so it works in subqueries * Drop imported browser and OS versions These are not needed. * Toggle displaying imported data by clicking indicator * Parse referrers with RefInspector - Use 'ga:fullReferrer' instead of 'ga:source'. This provides the actual referrer host + path, whereas 'ga:source' includes utm_mediums and other values when relevant. - 'ga:fullReferror' does however include search engine names directly, so they are manually checked for as RefInspector won't pick up on these. * Keep imported data indicator on dashboard and strikethrough when hidden * Add unlink google button to import panel * Rename some GA browsers and OSes to plausible versions * Get main top pages and exit pages panels working correctly with imported data * mix format * Fetch time_on_pages for imported data when needed * entry pages need to fetch bounces from GA * "sample_percent" -> :sample_percent as only atoms can be used in subqueries * Calculate bounce_rate for joined native and imported data for top pages modal * Flip some query bindings around to be less misleading * Fixup entry page modal visit durations * mix format * Fetch bounces and visit_duration for sources from GA * add more source metrics used for data in modals * Make sources modals display correct values * imported_visitors: bounce_rate -> bounces, avg_visit_duration -> visit_duration * Merge imported data into aggregate stats * Reformat top graph side icons * Ensure sample_percent is yielded from aggregate data * filter event_props should be strings * Hide imported data from frontend when using filter * Fix existing tests * fix tests * Fix imported indicator appearing when filtering * comma needed, lost when rebasing * Import utm_terms and utm_content from GA * Merge imported utm_term and utm_content * Rename imported Countries data as Locations * Set imported city schema field to int * Remove utm_terms and utm_content when clearing imported * Clean locations import from Google Analytics - Country and region should be set to "" when GA provides "(not set)" - City should be set to 0 for "unknown", as we cannot reliably import city data from GA. * Display imported region and city in dashboard * os -> operating_system in some parts of code The inconsistency of using os in some places and operating_system in others causes trouble with subqueries and joins for the native and imported data, which would require additional logic to account for. The simplest solution is the just use a consistent word for all uses. This doesn't make any user-facing or database changes. * to_atom -> to_existing_atom * format * "events" metric -> :events * ignore imported data when "events" in metrics * update "bounce_rate" * atomise some more metrics from new city and region api * atomise some more metrics for email handlers * "conversion_rate" -> :conversion_rate during csv export * Move imported data stats code to own module * Move imported timeseries function to Stats.Imported * Use Timex.parse to import dates from GA * has_imported_stats -> imported_source * "time_on_page" -> :time_on_page * Convert imported GA data to UTC * Clean up GA request code a bit There was some weird logic here with two separate lists that really ought to be together, so this merges those. * Fail sooner if GA timezone can't be identified * Link imported tables to site by id * imported_utm_content -> imported_utm_contents * Imported GA from all of time * Reorganise GA data fetch logic - Fetch data from the start of time (2005) - Check whether no data was fetched, and if so, inform user and don't consider data to be imported. * Clarify removal of "visits" data when it isn't in metrics * Apply location filters from API This makes it consistent with the sources etc which filter out 'Direct / None' on the API side. These filters are used by both the native and imported data handling code, which would otherwise both duplicate the filters in their `where` clauses. * Do not use changeset for setting site.imported_source * Add all metrics to all dimensions * Run GA import in the background * Send email when GA import completes * Add handler to insert imported data into tests and imported_browsers_factory * Add remaining import data test factories * Add imported location data to test * Test main graph with imported data * Add imported data to operating systems tests * Add imported data to pages tests * Add imported data to entry pages tests * Add imported data to exit pages tests * Add imported data to devices tests * Add imported data to sources tests * Add imported data to UTM tests * Add new test module for the data import step * Test import of sources GA data * Test import of utm_mediums GA data * Test import of utm_campaigns GA data * Add tests for UTM terms * Add tests for UTM contents * Add test for importing pages and entry pages data from GA * Add test for importing exit page data * Fix module file name typo * Add test for importing location data from GA * Add test for importing devices data from GA * Add test for importing browsers data from GA * Add test for importing OS data from GA * Paginate GA requests to download all data * Bump clickhouse_ecto version * Move RefInspector wrapper function into module * Drop timezone transform on import * Order imported by side_id then date * More strings -> atoms Also changes a conditional to be a bit nicer * Remove parallelisation of data import * Split sources and UTM sources from fetched GA data GA has only a "source" dimension and no "UTM source" dimension. Instead it returns these combined. The logic herein to tease these apart is: 1. "(direct)" -> it's a direct source 2. if the source is a domain -> it's a source 3. "google" -> it's from adwords; let's make this a UTM source "adwords" 4. else -> just a UTM source * Keep prop names in queries as strings * fix typo * Fix import * Insert data to clickhouse in batches * Fix link when removing imported data * Merge source tables * Import hostname as well as pathname * Record start and end time of imported data * Track import progress * Fix month interval with imported data * Do not JOIN when imported date range has no overlap * Fix time on page using exits Co-authored-by: mcol <mcol@posteo.net>
2022-03-11 00:04:59 +03:00
end
def relative_time(shifts) do
NaiveDateTime.utc_now()
|> Timex.shift(shifts)
|> NaiveDateTime.truncate(:second)
end
def to_naive_truncate(%DateTime{} = dt) do
to_naive_truncate(DateTime.to_naive(dt))
end
def to_naive_truncate(%NaiveDateTime{} = naive) do
NaiveDateTime.truncate(naive, :second)
end
def eventually(expectation, wait_time_ms \\ 50, retries \\ 10) do
2022-10-13 15:56:25 +03:00
Enum.reduce_while(1..retries, nil, fn attempt, _acc ->
case expectation.() do
{true, result} ->
{:halt, result}
{false, _} ->
2022-10-13 15:56:25 +03:00
Process.sleep(wait_time_ms * attempt)
{:cont, nil}
end
end)
end
def await_clickhouse_count(query, expected) do
eventually(
fn ->
count = Plausible.ClickhouseRepo.aggregate(query, :count)
{count == expected, count}
end,
200,
10
)
end
2019-09-02 14:29:19 +03:00
end