Remove SAMPLE BY from an events query that joins sessions (#4341)

Some users are reporting wrong results for stats querying from events table
but with session filters.

This seems to be caused by sampling: Namely both tables end up sampled and the
incorrect sample rate seems to get used.

This is sadly impossible to test for in our test suite, but reveals itself in
production with the following query:

```sql
SELECT
    toUInt64 (round(uniq (se0.user_id) * any(_sample_factor))) AS visitors
FROM
    events_v2 AS se0 SAMPLE 20000000
    INNER JOIN (
        SELECT
            sss0.session_id AS session_id,
            any(_sample_factor) AS _sample_factor
        FROM
            sessions_v2 AS sss0 SAMPLE 20000000
        WHERE
            (sss0.site_id = _CAST (8195000, 'Int64'))
            AND (sss0.timestamp >= _CAST (1720497600, 'DateTime'))
            AND (sss0.start < _CAST (1720584000, 'DateTime'))
            AND (sss0.entry_page IN ('/'))
            AND (sss0.sign = 1)
        GROUP BY
            sss0.session_id
    ) AS ss1 ON se0.session_id = ss1.session_id
WHERE
    (se0.site_id = _CAST (8195000, 'Int64'))
    AND (se0.timestamp >= _CAST (1720497600, 'DateTime'))
    AND (se0.timestamp < _CAST (1720584000, 'DateTime'))
```

By removing the inner query SAMPLE clause the result changes >50%

Related old PR: https://github.com/plausible/analytics/pull/2962

This will likely require more work in the future so looking forward to that -
after my vacation.
This commit is contained in:
Karl-Aksel Puulmann 2024-07-11 15:08:20 +03:00 committed by GitHub
parent 76da6b49a1
commit 3ab47e6401
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194

View File

@ -8,7 +8,7 @@ defmodule Plausible.Stats.SQL.QueryBuilder do
import Plausible.Stats.Imported
import Plausible.Stats.Util
alias Plausible.Stats.{Base, Filters, Query, QueryOptimizer, TableDecider, SQL}
alias Plausible.Stats.{Filters, Query, QueryOptimizer, TableDecider, SQL}
alias Plausible.Stats.SQL.Expression
require Plausible.Stats.SQL.Expression
@ -50,9 +50,10 @@ defmodule Plausible.Stats.SQL.QueryBuilder do
if TableDecider.events_join_sessions?(query) do
sessions_q =
from(
s in Base.query_sessions(site, query),
select: %{session_id: s.session_id},
s in "sessions_v2",
where: ^SQL.WhereBuilder.build(:sessions, site, query),
where: s.sign == 1,
select: %{session_id: s.session_id},
group_by: s.session_id
)