Commit Graph

2 Commits

Author SHA1 Message Date
Abby Sassel
9b7234b861 server/bigquery: add explicit order_by argument to fix non-deterministic test failures
https://github.com/hasura/graphql-engine-mono/pull/2355

GitOrigin-RevId: d2e088e1f1edd3c4bb2912b285fde8feeb3889ed
2021-09-15 15:44:45 +00:00
Chris Done
614c0dab80 Bigquery/fix limit offset for array aggregates
Blocked on https://github.com/hasura/graphql-engine-mono/pull/1640.

While fiddling with BigQuery I noticed a severe issue with offset/limit for array-aggregates. I've fixed it now.

The basic problem was that I was using a query like this:

```graphql
query MyQuery {
  hasura_Artist(order_by: {artist_self_id: asc}) {
    artist_self_id
    albums_aggregate(order_by: {album_self_id: asc}, limit: 2) {
      nodes {
        album_self_id
      }
      aggregate {
        count
      }
    }
  }
}
```

Producing this SQL:

```sql
SELECT `t_Artist1`.`artist_self_id` AS `artist_self_id`,
       STRUCT(IFNULL(`aa_albums1`.`nodes`, NULL) AS `nodes`, IFNULL(`aa_albums1`.`aggregate`, STRUCT(0 AS `count`)) AS `aggregate`) AS `albums_aggregate`
FROM `hasura`.`Artist` AS `t_Artist1`
LEFT OUTER JOIN (SELECT ARRAY_AGG(STRUCT(`t_Album1`.`album_self_id` AS `album_self_id`) ORDER BY (`t_Album1`.`album_self_id`) ASC) AS `nodes`,
                        STRUCT(COUNT(*) AS `count`) AS `aggregate`,
                        `t_Album1`.`artist_other_id` AS `artist_other_id`
                 FROM (SELECT *
                       FROM `hasura`.`Album` AS `t_Album1`
                       ORDER BY (`t_Album1`.`album_self_id`) ASC NULLS FIRST
                       -- PROBLEM HERE
                       LIMIT @param0) AS `t_Album1`
                 GROUP BY `t_Album1`.`artist_other_id`)
AS `aa_albums1`
ON (`aa_albums1`.`artist_other_id` = `t_Artist1`.`artist_self_id`)
ORDER BY (`t_Artist1`.`artist_self_id`) ASC NULLS FIRST
```

Note the `LIMIT @param0` -- that is incorrect because we want to limit
per artist. Instead, we want:

```sql
SELECT `t_Artist1`.`artist_self_id` AS `artist_self_id`,
       STRUCT(IFNULL(`aa_albums1`.`nodes`, NULL) AS `nodes`, IFNULL(`aa_albums1`.`aggregate`, STRUCT(0 AS `count`)) AS `aggregate`) AS `albums_aggregate`
FROM `hasura`.`Artist` AS `t_Artist1`
LEFT OUTER JOIN (SELECT ARRAY_AGG(STRUCT(`t_Album1`.`album_self_id` AS `album_self_id`) ORDER BY (`t_Album1`.`album_self_id`) ASC) AS `nodes`,
                        STRUCT(COUNT(*) AS `count`) AS `aggregate`,
                        `t_Album1`.`artist_other_id` AS `artist_other_id`
                 FROM (SELECT *,
                            -- ADDED
                            ROW_NUMBER() OVER(PARTITION BY artist_other_id) artist_album_index
                       FROM `hasura`.`Album` AS `t_Album1`
                       ORDER BY (`t_Album1`.`album_self_id`) ASC NULLS FIRST
                       ) AS `t_Album1`
                 -- CHANGED
                 WHERE artist_album_index <= @param
                 GROUP BY `t_Album1`.`artist_other_id`)
AS `aa_albums1`
ON (`aa_albums1`.`artist_other_id` = `t_Artist1`.`artist_self_id`)
ORDER BY (`t_Artist1`.`artist_self_id`) ASC NULLS FIRST
```

That serves both the LIMIT/OFFSET function in the where clause. Then,
both the ARRAY_AGG and the COUNT are correct per artist.

I've updated my Haskell test suite to add regression tests for this. I'll push a commit for Python tests shortly. The tests still pass there.

This just fixes a case that we hadn't noticed.

https://github.com/hasura/graphql-engine-mono/pull/1641

GitOrigin-RevId: 49933fa5e09a9306c89565743ecccf2cb54eaa80
2021-07-06 08:29:39 +00:00