mirror of
https://github.com/plausible/analytics.git
synced 2025-01-06 01:36:10 +03:00
Remove SAMPLE BY from an events query that joins sessions (#4341)
Some users are reporting wrong results for stats querying from events table but with session filters. This seems to be caused by sampling: Namely both tables end up sampled and the incorrect sample rate seems to get used. This is sadly impossible to test for in our test suite, but reveals itself in production with the following query: ```sql SELECT toUInt64 (round(uniq (se0.user_id) * any(_sample_factor))) AS visitors FROM events_v2 AS se0 SAMPLE 20000000 INNER JOIN ( SELECT sss0.session_id AS session_id, any(_sample_factor) AS _sample_factor FROM sessions_v2 AS sss0 SAMPLE 20000000 WHERE (sss0.site_id = _CAST (8195000, 'Int64')) AND (sss0.timestamp >= _CAST (1720497600, 'DateTime')) AND (sss0.start < _CAST (1720584000, 'DateTime')) AND (sss0.entry_page IN ('/')) AND (sss0.sign = 1) GROUP BY sss0.session_id ) AS ss1 ON se0.session_id = ss1.session_id WHERE (se0.site_id = _CAST (8195000, 'Int64')) AND (se0.timestamp >= _CAST (1720497600, 'DateTime')) AND (se0.timestamp < _CAST (1720584000, 'DateTime')) ``` By removing the inner query SAMPLE clause the result changes >50% Related old PR: https://github.com/plausible/analytics/pull/2962 This will likely require more work in the future so looking forward to that - after my vacation.
This commit is contained in:
parent
76da6b49a1
commit
3ab47e6401
@ -8,7 +8,7 @@ defmodule Plausible.Stats.SQL.QueryBuilder do
|
||||
import Plausible.Stats.Imported
|
||||
import Plausible.Stats.Util
|
||||
|
||||
alias Plausible.Stats.{Base, Filters, Query, QueryOptimizer, TableDecider, SQL}
|
||||
alias Plausible.Stats.{Filters, Query, QueryOptimizer, TableDecider, SQL}
|
||||
alias Plausible.Stats.SQL.Expression
|
||||
|
||||
require Plausible.Stats.SQL.Expression
|
||||
@ -50,9 +50,10 @@ defmodule Plausible.Stats.SQL.QueryBuilder do
|
||||
if TableDecider.events_join_sessions?(query) do
|
||||
sessions_q =
|
||||
from(
|
||||
s in Base.query_sessions(site, query),
|
||||
select: %{session_id: s.session_id},
|
||||
s in "sessions_v2",
|
||||
where: ^SQL.WhereBuilder.build(:sessions, site, query),
|
||||
where: s.sign == 1,
|
||||
select: %{session_id: s.session_id},
|
||||
group_by: s.session_id
|
||||
)
|
||||
|
||||
|
Loading…
Reference in New Issue
Block a user