analytics/priv/repo/migrations
Uku Taht e27734ed79
[Continued] Google Analytics import (#1753)
* Add has_imported_stats boolean to Site

* Add Google Analytics import panel to general settings

* Get GA profiles to display in import settings panel

* Add import_from_google method as entrypoint to import data

* Add imported_visitors table

* Remove conflicting code from migration

* Import visitors data into clickhouse database

* Pass another dataset to main graph for rendering in red

This adds another entry to the JSON data returned via the main graph API
called `imported_plot`, which is similar to `plot` in form but will be
completed with previously imported data.  Currently it simply returns
the values from `plot` / 2. The data is rendered in the main graph in
red without fill, and without an indicator for the present. Rationale:
imported data will not continue to grow so there is no projection
forward, only backwards.

* Hook imported GA data to dashboard timeseries plot

* Add settings option to forget imported data

* Import sources from google analytics

* Merge imported sources when queried

* Merge imported source data with native data when querying sources

* Start converting metrics to atoms so they can be subqueried

This changes "visitors" and in some places "sources" to atoms. This does
not change the behaviour of the functions - the tests all pass unchanged
following this commit. This is necessary as joining subqueries requires
that the keys in `select` statements be atoms and not strings.

* Convert GA (direct) source to empty string

* Import utm campaign and utm medium from GA

* format

* Import all data types from GA into new tables

* Handle large amounts of data more safely

* Fix some mistakes in tables

* Make GA requests in chunks of 5 queries
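
As an illustration only (Python rather than the Elixir used in this repo),
chunking a list of report requests into groups of 5 could look like the
sketch below. The `chunk` helper and the report names are hypothetical and
do not come from the commit.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunk(items: Sequence[T], size: int = 5) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Hypothetical report names, used only to show the grouping.
reports = [f"report_{i}" for i in range(12)]
batches = list(chunk(reports))
```

Each element of `batches` would then be sent as one request, so 12 reports
become three requests of 5, 5 and 2 queries.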

* Only display imported timeseries when there is no filter

* Correctly show last 30 minutes timeseries when 'realtime'

* Add with_imported key to Query struct

* Account for injected :is_not filter on sources from dashboard

* Also add tentative imported_utm_sources table

This needs a bit more work on the google import side, as GA do not
report sources and utm sources as distinct things.

* Return imported data to dashboard for rest of Sources panel

This extends the merge_imported function definition for sources to
utm_sources, utm_mediums and utm_campaigns too. This appears to be
working on the DB side but something is incomplete on the client side.

* Clear imported stats from all tables when requested

* Merge entry pages and exit pages from imported data into unfiltered dashboard view

This requires converting the `"visits"` and `"visit_duration"` metrics
to atoms so that they can be used in ecto subqueries.

* Display imported devices, browsers and OSs on dashboard

* Display imported country data on dashboard

* Add more metrics to entries/exits for modals

* make sure data is returned via API with correct keys

* Import regions and cities from GA

* Capitalize device upon import to match native data

* Leave query limits/offsets until after possibly joining with imported data

* Also import timeOnPage and pageviews for pages from GA

* imported_countries -> imported_locations

* Get timeOnPage and pageviews for pages from GA

These are needed for the pages modal, and for calculating exit rates for
exit pages.

* Add indicator to dashboard when imported data is being used

* Don't show imported data as a separate line on main graph

* "bounce_rate" -> :bounce_rate, so it works in subqueries

* Drop imported browser and OS versions

These are not needed.

* Toggle displaying imported data by clicking indicator

* Parse referrers with RefInspector

- Use 'ga:fullReferrer' instead of 'ga:source'. This provides the actual
  referrer host + path, whereas 'ga:source' includes utm_mediums and
  other values when relevant.
- 'ga:fullReferrer' does however include search engine names directly,
  so they are manually checked for as RefInspector won't pick up on
  these.
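
A rough sketch of that two-step check, in Python for illustration (the
commit itself uses RefInspector in Elixir). The `SEARCH_ENGINES` table and
the `parse_full_referrer` name are assumptions, as is treating "(direct)"
as an empty referrer.

```python
from urllib.parse import urlsplit

# Assumed, incomplete table of bare names GA may report instead of a URL.
SEARCH_ENGINES = {"google": "Google", "bing": "Bing", "duckduckgo": "DuckDuckGo"}


def parse_full_referrer(full_referrer: str) -> dict:
    """Map a GA full referrer value to a source name and a referrer."""
    value = full_referrer.strip()
    if value == "(direct)":
        # Assumption: direct traffic is stored with empty strings.
        return {"source": "", "referrer": ""}
    name = SEARCH_ENGINES.get(value.lower())
    if name is not None:
        # Bare search engine name: a host-based parser would not recognise it.
        return {"source": name, "referrer": ""}
    # The value carries no scheme; prefix "//" so urlsplit sees a host.
    parts = urlsplit("//" + value)
    host = parts.hostname or ""
    if host.startswith("www."):
        host = host[4:]
    return {"source": host, "referrer": host + parts.path.rstrip("/")}
```

Here the host stands in for the source name; a referrer database lookup
would normally turn it into a friendlier label.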

* Keep imported data indicator on dashboard and strikethrough when hidden

* Add unlink google button to import panel

* Rename some GA browsers and OSes to plausible versions

* Get main top pages and exit pages panels working correctly with imported data

* mix format

* Fetch time_on_pages for imported data when needed

* entry pages need to fetch bounces from GA

* "sample_percent" -> :sample_percent as only atoms can be used in subqueries

* Calculate bounce_rate for joined native and imported data for top pages modal

* Flip some query bindings around to be less misleading

* Fixup entry page modal visit durations

* mix format

* Fetch bounces and visit_duration for sources from GA

* add more source metrics used for data in modals

* Make sources modals display correct values

* imported_visitors: bounce_rate -> bounces, avg_visit_duration -> visit_duration

* Merge imported data into aggregate stats

* Reformat top graph side icons

* Ensure sample_percent is yielded from aggregate data

* filter event_props should be strings

* Hide imported data from frontend when using filter

* Fix existing tests

* fix tests

* Fix imported indicator appearing when filtering

* comma needed, lost when rebasing

* Import utm_terms and utm_content from GA

* Merge imported utm_term and utm_content

* Rename imported Countries data as Locations

* Set imported city schema field to int

* Remove utm_terms and utm_content when clearing imported

* Clean locations import from Google Analytics

- Country and region should be set to "" when GA provides "(not set)"
- City should be set to 0 for "unknown", as we cannot reliably import
  city data from GA.
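
The two rules above, sketched in Python for illustration. The row keys and
the `clean_location` name are hypothetical; the city is always stored as 0
here, which is one reading of the second rule.

```python
NOT_SET = "(not set)"


def clean_location(row: dict) -> dict:
    """Normalise one GA location row before it is stored."""
    return {
        # GA's "(not set)" placeholder becomes an empty string.
        "country": "" if row["country"] == NOT_SET else row["country"],
        "region": "" if row["region"] == NOT_SET else row["region"],
        # City data from GA is not trusted, so 0 marks it as unknown.
        "city": 0,
    }
```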

* Display imported region and city in dashboard

* os -> operating_system in some parts of code

The inconsistency of using os in some places and operating_system in
others causes trouble with subqueries and joins for the native and
imported data, which would require additional logic to account for. The
simplest solution is to just use a consistent word for all uses. This
doesn't make any user-facing or database changes.

* to_atom -> to_existing_atom

* format

* "events" metric -> :events

* ignore imported data when "events" in metrics

* update "bounce_rate"

* atomise some more metrics from new city and region api

* atomise some more metrics for email handlers

* "conversion_rate" -> :conversion_rate during csv export

* Move imported data stats code to own module

* Move imported timeseries function to Stats.Imported

* Use Timex.parse to import dates from GA

* has_imported_stats -> imported_source

* "time_on_page" -> :time_on_page

* Convert imported GA data to UTC

* Clean up GA request code a bit

There was some weird logic here with two separate lists that really
ought to be together, so this merges those.

* Fail sooner if GA timezone can't be identified

* Link imported tables to site by id

* imported_utm_content -> imported_utm_contents

* Import GA data from all of time

* Reorganise GA data fetch logic

- Fetch data from the start of time (2005)
- Check whether no data was fetched, and if so, inform user and don't
  consider data to be imported.

* Clarify removal of "visits" data when it isn't in metrics

* Apply location filters from API

This makes it consistent with the sources etc which filter out 'Direct /
None' on the API side. These filters are used by both the native and
imported data handling code, which would otherwise both duplicate the
filters in their `where` clauses.

* Do not use changeset for setting site.imported_source

* Add all metrics to all dimensions

* Run GA import in the background

* Send email when GA import completes

* Add handler to insert imported data into tests and imported_browsers_factory

* Add remaining import data test factories

* Add imported location data to test

* Test main graph with imported data

* Add imported data to operating systems tests

* Add imported data to pages tests

* Add imported data to entry pages tests

* Add imported data to exit pages tests

* Add imported data to devices tests

* Add imported data to sources tests

* Add imported data to UTM tests

* Add new test module for the data import step

* Test import of sources GA data

* Test import of utm_mediums GA data

* Test import of utm_campaigns GA data

* Add tests for UTM terms

* Add tests for UTM contents

* Add test for importing pages and entry pages data from GA

* Add test for importing exit page data

* Fix module file name typo

* Add test for importing location data from GA

* Add test for importing devices data from GA

* Add test for importing browsers data from GA

* Add test for importing OS data from GA

* Paginate GA requests to download all data

* Bump clickhouse_ecto version

* Move RefInspector wrapper function into module

* Drop timezone transform on import

* Order imported by site_id then date

* More strings -> atoms

Also changes a conditional to be a bit nicer

* Remove parallelisation of data import

* Split sources and UTM sources from fetched GA data

GA has only a "source" dimension and no "UTM source" dimension. Instead
it returns these combined. The logic herein to tease these apart is:

1. "(direct)" -> it's a direct source
2. if the source is a domain -> it's a source
3. "google" -> it's from adwords; let's make this a UTM source "adwords"
4. else -> just a UTM source
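
The four rules can be sketched as follows, in Python for illustration
only. The domain test (a dotted hostname pattern) and the `classify_source`
name are assumptions; the commit does not say how "is a domain" is decided.

```python
import re
from typing import Tuple

# Assumed heuristic: labels of letters, digits or hyphens joined by dots.
DOMAIN_RE = re.compile(r"^[a-z0-9-]+(\.[a-z0-9-]+)+$", re.IGNORECASE)


def classify_source(ga_source: str) -> Tuple[str, str]:
    """Return (kind, value) for a combined GA source value."""
    if ga_source == "(direct)":
        return ("direct", "")               # rule 1
    if DOMAIN_RE.match(ga_source):
        return ("source", ga_source)        # rule 2
    if ga_source == "google":
        return ("utm_source", "adwords")    # rule 3
    return ("utm_source", ga_source)        # rule 4
```

The order matters: "google" has no dot, so it falls through the domain
check and reaches rule 3.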

* Keep prop names in queries as strings

* fix typo

* Fix import

* Insert data to clickhouse in batches

* Fix link when removing imported data

* Merge source tables

* Import hostname as well as pathname

* Record start and end time of imported data

* Track import progress

* Fix month interval with imported data

* Do not JOIN when imported date range has no overlap
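
A minimal sketch of such an overlap check, in Python for illustration. It
assumes both ranges are inclusive and that the import's start and end
dates are the ones recorded for the site; `ranges_overlap` is a
hypothetical name.

```python
from datetime import date


def ranges_overlap(query_start: date, query_end: date,
                   imported_start: date, imported_end: date) -> bool:
    """True when two inclusive date ranges share at least one day."""
    return query_start <= imported_end and imported_start <= query_end


# Imported data covers 2020; the second query lies entirely after it.
imported = (date(2020, 1, 1), date(2020, 12, 31))
needs_join = ranges_overlap(date(2020, 12, 1), date(2021, 1, 31), *imported)
skip_join = not ranges_overlap(date(2021, 2, 1), date(2021, 2, 28), *imported)
```

When the check is false the query can run against native data alone.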

* Fix time on page using exits

Co-authored-by: mcol <mcol@posteo.net>
2022-03-10 15:04:59 -06:00
.formatter.exs Initial commit 2019-09-02 12:29:19 +01:00
20181201181549_add_pageviews.exs Initial commit 2019-09-02 12:29:19 +01:00
20181214201821_add_new_visitor_to_pageviews.exs Remove historical migrations that stop test database from being created 2019-10-25 14:34:54 +08:00
20181215140923_add_session_id_to_pageviews.exs Initial commit 2019-09-02 12:29:19 +01:00
20190109173917_create_sites.exs Initial commit 2019-09-02 12:29:19 +01:00
20190117135714_add_uid_to_pageviews.exs Remove historical migrations that stop test database from being created 2019-10-25 14:34:54 +08:00
20190118154210_add_derived_data_to_pageviews.exs Initial commit 2019-09-02 12:29:19 +01:00
20190126135857_add_name_to_users.exs Initial commit 2019-09-02 12:29:19 +01:00
20190127213938_add_tz_to_sites.exs Formatting only changes - No code change (#75) 2020-06-08 10:35:13 +03:00
20190205165931_add_last_seen_to_users.exs Initial commit 2019-09-02 12:29:19 +01:00
20190213224404_add_intro_emails.exs Initial commit 2019-09-02 12:29:19 +01:00
20190219130809_delete_intro_emails_when_user_is_deleted.exs Formatting only changes - No code change (#75) 2020-06-08 10:35:13 +03:00
20190301122344_add_country_code_to_pageviews.exs Initial commit 2019-09-02 12:29:19 +01:00
20190324155606_add_password_hash_to_users.exs Initial commit 2019-09-02 12:29:19 +01:00
20190402145007_remove_device_type_from_pageviews.exs Initial commit 2019-09-02 12:29:19 +01:00
20190402145357_remove_screen_height_from_pageviews.exs Initial commit 2019-09-02 12:29:19 +01:00
20190402172423_add_index_to_pageviews.exs Initial commit 2019-09-02 12:29:19 +01:00
20190410095248_add_feedback_emails.exs Initial commit 2019-09-02 12:29:19 +01:00
20190424162903_delete_feedback_emails_when_user_is_deleted.exs Formatting only changes - No code change (#75) 2020-06-08 10:35:13 +03:00
20190430140411_use_citext_for_email.exs Initial commit 2019-09-02 12:29:19 +01:00
20190430152923_create_subscriptions.exs Initial commit 2019-09-02 12:29:19 +01:00
20190516113517_remove_session_id_from_pageviews.exs Initial commit 2019-09-02 12:29:19 +01:00
20190520144229_change_user_id_to_uuid.exs Remove historical migrations that stop test database from being created 2019-10-25 14:34:54 +08:00
20190523160838_add_raw_referrer.exs Initial commit 2019-09-02 12:29:19 +01:00
20190523171519_add_indices_to_referrers.exs Initial commit 2019-09-02 12:29:19 +01:00
20190618165016_add_public_sites.exs Support for docker based self-hosting (#64) 2020-05-26 16:09:34 +03:00
20190718160353_create_google_search_console_integration.exs Initial commit 2019-09-02 12:29:19 +01:00
20190723141824_associate_google_auth_with_site.exs Initial commit 2019-09-02 12:29:19 +01:00
20190730014913_add_monthly_stats.exs Initial commit 2019-09-02 12:29:19 +01:00
20190730142200_add_weekly_stats.exs Initial commit 2019-09-02 12:29:19 +01:00
20190730144413_add_daily_stats.exs Initial commit 2019-09-02 12:29:19 +01:00
20190809174105_calc_screen_size.exs Initial commit 2019-09-02 12:29:19 +01:00
20190810145419_remove_unused_indices.exs Initial commit 2019-09-02 12:29:19 +01:00
20190820140747_remove_rollup_tables.exs Initial commit 2019-09-02 12:29:19 +01:00
20190906111810_add_email_reporting.exs Add weekly email report template 2019-09-02 13:11:37 +01:00
20190907134114_add_unique_index_to_email_settings.exs Allow user to change email settings from settings page 2019-09-07 15:01:37 +01:00
20190910120900_add_email_address_to_settings.exs Remove unused fields from pageviews 2019-10-15 15:34:41 +08:00
20190911102027_add_monthly_reports.exs Formatting only changes - No code change (#75) 2020-06-08 10:35:13 +03:00
20191010031425_add_property_to_google_auth.exs [Continued] Google Analytics import (#1753) 2022-03-10 15:04:59 -06:00
20191015072730_remove_unused_fields.exs Remove unused fields from pageviews 2019-10-15 15:34:41 +08:00
20191015073507_proper_timestamp_for_pageviews.exs Proper timestamp 2019-10-15 15:37:55 +08:00
20191024062200_rename_pageviews_to_events.exs Rename pageviews to events 2019-10-24 14:58:17 +08:00
20191025055334_add_name_to_events.exs Remove stats from postgres (#74) 2020-06-05 16:14:17 +03:00
20191031051340_add_goals.exs Add goals and conversions 2019-10-31 13:39:51 +08:00
20191031063001_remove_goal_name.exs Remove goal name 2019-10-31 14:36:16 +08:00
20191118075359_allow_free_subscriptions.exs Create free subscriptions 2019-11-18 16:13:54 +08:00
20191216064647_add_unique_index_to_email_reports.exs Add unique index to monthly reports site id 2019-12-16 14:53:28 +08:00
20191218082207_add_sessions.exs Correct migration order 2019-12-18 16:07:50 +08:00
20191220042658_add_session_start.exs Capture session start 2019-12-20 12:30:29 +08:00
20200106090739_cascade_google_auth_deletion.exs Cascade google auth deletion 2020-01-06 11:08:36 +02:00
20200107095234_add_entry_page_to_sessions.exs Show bounce rate for referrers and pages 2020-01-07 14:53:04 +02:00
20200113143927_add_exit_page_to_session.exs Add exit page to sessions (#25) 2020-01-15 11:00:42 +02:00
20200114131538_add_tweets.exs Fetch and display tweets (#27) 2020-01-16 13:39:47 +02:00
20200120091134_change_session_referrer_to_text.exs Change session referrer to text 2020-01-20 11:12:13 +02:00
20200121091251_add_recipients.exs Add and remove recipients for email reports (#28) 2020-01-22 11:16:53 +02:00
20200122150130_add_shared_links.exs Shared links (#29) 2020-01-29 11:29:11 +02:00
20200130123049_add_site_id_to_events.exs Configurable site id (#30) 2020-02-04 15:44:13 +02:00
20200204093801_rename_site_id_to_domain.exs Configurable site id (#30) 2020-02-04 15:44:13 +02:00
20200204133522_drop_events_hostname_index.exs Configurable site id (#30) 2020-02-04 15:44:13 +02:00
20200210134612_add_fingerprint_to_events.exs Track fingerprint 2020-02-10 16:00:19 +02:00
20200211080841_add_raw_fingerprint.exs Store raw fingerprint for testing 2020-02-11 10:10:53 +02:00
20200211090126_remove_raw_fingerprint.exs Remove raw fingerprint 2020-02-11 11:02:44 +02:00
20200211133829_add_initial_source_and_referrer_to_events.exs Introduce initial referrer and initial referrer source (#32) 2020-02-12 11:11:02 +02:00
20200219124314_create_custom_domains.exs Custom domains (#34) 2020-02-26 10:54:21 +02:00
20200227092821_add_fingerprint_sesssions.exs Add fingerprint sessions (#36) 2020-02-27 11:46:48 +02:00
20200302105632_flexible_fingerprint_referrer.exs Make fingerprint referrer flexible 2020-03-02 12:57:51 +02:00
20200317093028_add_trial_expiry_to_users.exs Add field to track trial expiry date (#45) 2020-03-18 16:27:46 +02:00
20200317142459_backfill_fingerprints.exs Analytics without using cookies (#44) 2020-03-24 10:50:16 +02:00
20200320100803_add_setup_emails.exs Email flows (#46) 2020-03-23 11:34:25 +02:00
20200323083536_add_create_site_emails.exs Email flows (#46) 2020-03-23 11:34:25 +02:00
20200323084954_add_check_stats_emails.exs Email flows (#46) 2020-03-23 11:34:25 +02:00
20200324132431_make_cookie_fields_non_required.exs Formatting only changes - No code change (#75) 2020-06-08 10:35:13 +03:00
20200406115153_cascade_custom_domain_deletion.exs Cascade custom domain deletion 2020-04-06 14:53:31 +03:00
20200408122329_cascade_setup_emails_deletion.exs Cascade setup success emails deletion 2020-04-08 15:25:35 +03:00
20200529071028_add_oban_jobs_table.exs Schedule regular jobs with Oban (#69) 2020-06-02 13:37:38 +03:00
20200605134616_remove_events_and_sessions.exs Remove events and sessions table from postgres 2020-06-05 16:47:23 +03:00
20200605142737_remove_fingerprint_sessions_table.exs Remove fingerprint sessions table 2020-06-05 17:27:59 +03:00
20200619071221_create_salts_table.exs Rotate salts on a daily basis (#224) 2020-07-15 11:47:24 +03:00
20201130083829_add_email_verification_codes.exs Onboarding UX improvements (#441) 2020-12-15 11:30:45 +02:00
20201208173543_add_spike_notifications.exs Add basic spike notifications 2020-12-11 17:03:25 +02:00
20201210085345_add_email_verified_to_users.exs Add elixir action (#526) 2020-12-29 15:17:27 +02:00
20201214072008_add_theme_pref_to_users.exs Add elixir action (#526) 2020-12-29 15:17:27 +02:00
20201230085939_delete_email_records_when_user_is_deleted.exs Fix user deletion 2020-12-30 11:00:37 +02:00
20210115092331_cascade_site_deletion_to_spike_notification.exs Cascade deletion of site to spike notifications 2021-01-15 11:24:28 +02:00
20210119093337_add_unique_index_to_spike_notification.exs Do not allow duplicate spike notification to be created 2021-01-19 11:41:15 +02:00
20210128083453_cascade_site_deletion.exs Cascade site_membership deletion 2021-01-28 10:37:44 +02:00
20210128084657_create_api_keys.exs Stats API (#679) 2021-02-05 11:23:30 +02:00
20210209095257_add_last_payment_details.exs Track billing cycles (#697) 2021-02-12 10:17:53 +02:00
20210406073254_add_name_to_shared_links.exs Add name to shared links (#910) 2021-04-06 14:32:38 +03:00
20210409074413_add_unique_index_to_shared_link_name.exs Add unique index to shared link name 2021-04-14 11:45:45 +03:00
20210409082603_add_api_key_scopes.exs Add API key scopes 2021-04-14 11:45:45 +03:00
20210420075623_add_sent_renewal_notifications.exs Send renewal notification for annual subscriptions (#949) 2021-04-21 15:57:38 +03:00
20210426075157_upgrade_oban_jobs_to_v9.exs Upgrades Oban to v2.6.1 (#967) 2021-04-26 11:32:18 +03:00
20210513091653_add_currency_to_subscription.exs Localize billing screens 2021-05-13 12:42:01 +03:00
20210525085655_add_rate_limit_to_api_keys.exs Add rate limit to API requests 2021-05-25 11:58:49 +03:00
20210531080158_add_role_to_site_memberships.exs Invitations (#1122) 2021-06-16 15:00:07 +03:00
20210601090924_add_invitations.exs Invitations (#1122) 2021-06-16 15:00:07 +03:00
20210604085943_add_locked_to_sites.exs Invitations (#1122) 2021-06-16 15:00:07 +03:00
20210629124428_cascade_site_deletion_to_invitations.exs Cascade site deletion to invitations 2021-06-29 15:45:24 +03:00
20210726090211_make_invitation_email_case_insensitive.exs Make invitations email case insensitive 2021-07-26 12:08:35 +03:00
20210906102736_memoize_setup_complete.exs Memoize has_stats? (#1302) 2021-09-06 13:54:51 +03:00
20210908081119_allow_trial_expiry_to_be_null.exs Remove trial banner for admins & viewers (#1308) 2021-09-08 15:15:37 +03:00
20211020093238_add_enterprise_plans.exs Add enterprise plans 2021-10-20 16:49:11 +02:00
20211022084427_add_site_limit_to_enterprise_plans.exs Check site limit for enterprise customers 2021-10-22 11:26:07 +02:00
20211028122202_grace_period_end.exs Remove grace period if user upgrades 2021-11-16 10:14:24 +02:00
20211110174617_add_site_imported_source.exs [Continued] Google Analytics import (#1753) 2022-03-10 15:04:59 -06:00
20211202094732_remove_tweets.exs Remove Twitter stuff 2021-12-02 11:53:29 +02:00