graphql-engine/server/src-lib/Hasura/Backends/BigQuery/Instances/Execute.hs

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

254 lines
9.2 KiB
Haskell
Raw Normal View History

{-# OPTIONS_GHC -fno-warn-orphans #-}
module Hasura.Backends.BigQuery.Instances.Execute () where
import Data.Aeson qualified as Aeson
import Data.Aeson.Text qualified as Aeson
import Data.Environment qualified as Env
import Data.HashMap.Strict qualified as Map
import Data.HashMap.Strict.InsOrd qualified as OMap
import Data.Text qualified as T
import Data.Text.Lazy qualified as LT
import Data.Text.Lazy.Builder qualified as LT
import Data.Vector qualified as V
import Hasura.Backends.BigQuery.Execute qualified as DataLoader
import Hasura.Backends.BigQuery.FromIr qualified as BigQuery
import Hasura.Backends.BigQuery.Plan
import Hasura.Backends.BigQuery.ToQuery qualified as ToQuery
import Hasura.Backends.BigQuery.Types qualified as BigQuery
import Hasura.Base.Error
import Hasura.Base.Error qualified as E
import Hasura.EncJSON
import Hasura.GraphQL.Execute.Backend
import Hasura.GraphQL.Namespace (RootFieldAlias)
import Hasura.GraphQL.Schema.Options qualified as Options
import Hasura.Prelude
import Hasura.QueryTags
( emptyQueryTagsComment,
)
server: support remote relationships on SQL Server and BigQuery (#1497) Remote relationships are now supported on SQL Server and BigQuery. The major change though is the re-architecture of remote join execution logic. Prior to this PR, each backend is responsible for processing the remote relationships that are part of their AST. This is not ideal as there is nothing specific about a remote join's execution that ties it to a backend. The only backend specific part is whether or not the specification of the remote relationship is valid (i.e, we'll need to validate whether the scalars are compatible). The approach now changes to this: 1. Before delegating the AST to the backend, we traverse the AST, collect all the remote joins while modifying the AST to add necessary join fields where needed. 1. Once the remote joins are collected from the AST, the database call is made to fetch the response. The necessary data for the remote join(s) is collected from the database's response and one or more remote schema calls are constructed as necessary. 1. The remote schema calls are then executed and the data from the database and from the remote schemas is joined to produce the final response. ### Known issues 1. Ideally the traversal of the IR to collect remote joins should return an AST which does not include remote join fields. This operation can be type safe but isn't taken up as part of the PR. 1. There is a lot of code duplication between `Transport/HTTP.hs` and `Transport/Websocket.hs` which needs to be fixed ASAP. This too hasn't been taken up by this PR. 1. The type which represents the execution plan is only modified to handle our current remote joins and as such it will have to be changed to accommodate general remote joins. 1. Use of lenses would have reduced the boilerplate code to collect remote joins from the base AST. 1. The current remote join logic assumes that the join columns of a remote relationship appear with their names in the database response. This however is incorrect as they could be aliased. This can be taken up by anyone, I've left a comment in the code. ### Notes to the reviewers I think it is best reviewed commit by commit. 1. The first one is very straight forward. 1. The second one refactors the remote join execution logic but other than moving things around, it doesn't change the user facing functionality. This moves Postgres specific parts to `Backends/Postgres` module from `Execute`. Some IR related code to `Hasura.RQL.IR` module. Simplifies various type class function signatures as a backend doesn't have to handle remote joins anymore 1. The third one fixes partial case matches that for some weird reason weren't shown as warnings before this refactor 1. The fourth one generalizes the validation logic of remote relationships and implements `scalarTypeGraphQLName` function on SQL Server and BigQuery which is used by the validation logic. This enables remote relationships on BigQuery and SQL Server. https://github.com/hasura/graphql-engine-mono/pull/1497 GitOrigin-RevId: 77dd8eed326602b16e9a8496f52f46d22b795598
2021-06-11 06:26:50 +03:00
import Hasura.RQL.IR
import Hasura.RQL.IR.Select qualified as IR
import Hasura.RQL.Types.Backend
import Hasura.RQL.Types.Column
import Hasura.RQL.Types.Common
import Hasura.RQL.Types.Function
import Hasura.SQL.AnyBackend qualified as AB
import Hasura.SQL.Backend
import Hasura.Session
import Hasura.Tracing qualified as Tracing
import Language.GraphQL.Draft.Syntax qualified as G
import Network.HTTP.Types qualified as HTTP
instance BackendExecute 'BigQuery where
type PreparedQuery 'BigQuery = Text
type MultiplexedQuery 'BigQuery = Void
type ExecutionMonad 'BigQuery = Tracing.TraceT (ExceptT QErr IO)
mkDBQueryPlan = bqDBQueryPlan
mkDBMutationPlan = bqDBMutationPlan
mkLiveQuerySubscriptionPlan _ _ _ _ _ _ _ =
throw500 "Cannot currently perform subscriptions on BigQuery sources."
mkDBStreamingSubscriptionPlan _ _ _ _ _ _ =
throw500 "Cannot currently perform subscriptions on BigQuery sources."
mkDBQueryExplain = bqDBQueryExplain
mkSubscriptionExplain _ =
throw500 "Cannot currently retrieve query execution plans on BigQuery sources."
-- NOTE: Currently unimplemented!.
--
-- This function is just a stub for future implementation; for now it just
-- throws a 500 error.
mkDBRemoteRelationshipPlan =
bqDBRemoteRelationshipPlan
-- query
bqDBQueryPlan ::
forall m.
( MonadError E.QErr m
) =>
server: support remote relationships on SQL Server and BigQuery (#1497) Remote relationships are now supported on SQL Server and BigQuery. The major change though is the re-architecture of remote join execution logic. Prior to this PR, each backend is responsible for processing the remote relationships that are part of their AST. This is not ideal as there is nothing specific about a remote join's execution that ties it to a backend. The only backend specific part is whether or not the specification of the remote relationship is valid (i.e, we'll need to validate whether the scalars are compatible). The approach now changes to this: 1. Before delegating the AST to the backend, we traverse the AST, collect all the remote joins while modifying the AST to add necessary join fields where needed. 1. Once the remote joins are collected from the AST, the database call is made to fetch the response. The necessary data for the remote join(s) is collected from the database's response and one or more remote schema calls are constructed as necessary. 1. The remote schema calls are then executed and the data from the database and from the remote schemas is joined to produce the final response. ### Known issues 1. Ideally the traversal of the IR to collect remote joins should return an AST which does not include remote join fields. This operation can be type safe but isn't taken up as part of the PR. 1. There is a lot of code duplication between `Transport/HTTP.hs` and `Transport/Websocket.hs` which needs to be fixed ASAP. This too hasn't been taken up by this PR. 1. The type which represents the execution plan is only modified to handle our current remote joins and as such it will have to be changed to accommodate general remote joins. 1. Use of lenses would have reduced the boilerplate code to collect remote joins from the base AST. 1. The current remote join logic assumes that the join columns of a remote relationship appear with their names in the database response. This however is incorrect as they could be aliased. This can be taken up by anyone, I've left a comment in the code. ### Notes to the reviewers I think it is best reviewed commit by commit. 1. The first one is very straight forward. 1. The second one refactors the remote join execution logic but other than moving things around, it doesn't change the user facing functionality. This moves Postgres specific parts to `Backends/Postgres` module from `Execute`. Some IR related code to `Hasura.RQL.IR` module. Simplifies various type class function signatures as a backend doesn't have to handle remote joins anymore 1. The third one fixes partial case matches that for some weird reason weren't shown as warnings before this refactor 1. The fourth one generalizes the validation logic of remote relationships and implements `scalarTypeGraphQLName` function on SQL Server and BigQuery which is used by the validation logic. This enables remote relationships on BigQuery and SQL Server. https://github.com/hasura/graphql-engine-mono/pull/1497 GitOrigin-RevId: 77dd8eed326602b16e9a8496f52f46d22b795598
2021-06-11 06:26:50 +03:00
UserInfo ->
Env.Environment ->
SourceName ->
SourceConfig 'BigQuery ->
QueryDB 'BigQuery Void (UnpreparedValue 'BigQuery) ->
[HTTP.Header] ->
Maybe G.Name ->
server: support remote relationships on SQL Server and BigQuery (#1497) Remote relationships are now supported on SQL Server and BigQuery. The major change though is the re-architecture of remote join execution logic. Prior to this PR, each backend is responsible for processing the remote relationships that are part of their AST. This is not ideal as there is nothing specific about a remote join's execution that ties it to a backend. The only backend specific part is whether or not the specification of the remote relationship is valid (i.e, we'll need to validate whether the scalars are compatible). The approach now changes to this: 1. Before delegating the AST to the backend, we traverse the AST, collect all the remote joins while modifying the AST to add necessary join fields where needed. 1. Once the remote joins are collected from the AST, the database call is made to fetch the response. The necessary data for the remote join(s) is collected from the database's response and one or more remote schema calls are constructed as necessary. 1. The remote schema calls are then executed and the data from the database and from the remote schemas is joined to produce the final response. ### Known issues 1. Ideally the traversal of the IR to collect remote joins should return an AST which does not include remote join fields. This operation can be type safe but isn't taken up as part of the PR. 1. There is a lot of code duplication between `Transport/HTTP.hs` and `Transport/Websocket.hs` which needs to be fixed ASAP. This too hasn't been taken up by this PR. 1. The type which represents the execution plan is only modified to handle our current remote joins and as such it will have to be changed to accommodate general remote joins. 1. Use of lenses would have reduced the boilerplate code to collect remote joins from the base AST. 1. The current remote join logic assumes that the join columns of a remote relationship appear with their names in the database response. This however is incorrect as they could be aliased. This can be taken up by anyone, I've left a comment in the code. ### Notes to the reviewers I think it is best reviewed commit by commit. 1. The first one is very straight forward. 1. The second one refactors the remote join execution logic but other than moving things around, it doesn't change the user facing functionality. This moves Postgres specific parts to `Backends/Postgres` module from `Execute`. Some IR related code to `Hasura.RQL.IR` module. Simplifies various type class function signatures as a backend doesn't have to handle remote joins anymore 1. The third one fixes partial case matches that for some weird reason weren't shown as warnings before this refactor 1. The fourth one generalizes the validation logic of remote relationships and implements `scalarTypeGraphQLName` function on SQL Server and BigQuery which is used by the validation logic. This enables remote relationships on BigQuery and SQL Server. https://github.com/hasura/graphql-engine-mono/pull/1497 GitOrigin-RevId: 77dd8eed326602b16e9a8496f52f46d22b795598
2021-06-11 06:26:50 +03:00
m (DBStepInfo 'BigQuery)
bqDBQueryPlan userInfo _env sourceName sourceConfig qrf _ _ = do
-- TODO (naveen): Append query tags to the query
select <- planNoPlan (BigQuery.bigQuerySourceConfigToFromIrConfig sourceConfig) userInfo qrf
let action = do
result <-
DataLoader.runExecute
sourceConfig
(DataLoader.executeSelect select)
case result of
Left err -> throw500WithDetail (DataLoader.executeProblemMessage DataLoader.HideDetails err) $ Aeson.toJSON err
Right recordSet -> pure $! recordSetToEncJSON (BigQuery.selectCardinality select) recordSet
pure $ DBStepInfo @'BigQuery sourceName sourceConfig (Just (selectSQLTextForExplain select)) action ()
-- | Convert the dataloader's 'RecordSet' type to JSON.
recordSetToEncJSON :: BigQuery.Cardinality -> DataLoader.RecordSet -> EncJSON
recordSetToEncJSON cardinality DataLoader.RecordSet {rows} =
case cardinality of
BigQuery.One
| Just row <- rows V.!? 0 -> encJFromRecord row
| otherwise -> encJFromList (toList (fmap encJFromRecord rows))
BigQuery.Many -> encJFromList (toList (fmap encJFromRecord rows))
where
encJFromRecord =
encJFromInsOrdHashMap . fmap encJFromOutputValue . OMap.mapKeys coerce
encJFromOutputValue outputValue =
case outputValue of
DataLoader.NullOutputValue -> encJFromJValue Aeson.Null
DataLoader.DecimalOutputValue i -> encJFromJValue i
DataLoader.BigDecimalOutputValue i -> encJFromJValue i
DataLoader.FloatOutputValue i -> encJFromJValue i
DataLoader.TextOutputValue i -> encJFromJValue i
DataLoader.BytesOutputValue i -> encJFromJValue i
DataLoader.DateOutputValue i -> encJFromJValue i
DataLoader.TimestampOutputValue i -> encJFromJValue i
DataLoader.TimeOutputValue i -> encJFromJValue i
DataLoader.DatetimeOutputValue i -> encJFromJValue i
DataLoader.GeographyOutputValue i -> encJFromJValue i
DataLoader.BoolOutputValue i -> encJFromJValue i
DataLoader.IntegerOutputValue i -> encJFromJValue i
DataLoader.ArrayOutputValue vector ->
encJFromList (toList (fmap encJFromOutputValue vector))
-- Really, the case below shouldn't be happening. But if it
-- does, it's not a problem either. The output will just have
-- a record in it.
DataLoader.RecordOutputValue record -> encJFromRecord record
-- mutation
bqDBMutationPlan ::
forall m.
( MonadError E.QErr m
) =>
server: support remote relationships on SQL Server and BigQuery (#1497) Remote relationships are now supported on SQL Server and BigQuery. The major change though is the re-architecture of remote join execution logic. Prior to this PR, each backend is responsible for processing the remote relationships that are part of their AST. This is not ideal as there is nothing specific about a remote join's execution that ties it to a backend. The only backend specific part is whether or not the specification of the remote relationship is valid (i.e, we'll need to validate whether the scalars are compatible). The approach now changes to this: 1. Before delegating the AST to the backend, we traverse the AST, collect all the remote joins while modifying the AST to add necessary join fields where needed. 1. Once the remote joins are collected from the AST, the database call is made to fetch the response. The necessary data for the remote join(s) is collected from the database's response and one or more remote schema calls are constructed as necessary. 1. The remote schema calls are then executed and the data from the database and from the remote schemas is joined to produce the final response. ### Known issues 1. Ideally the traversal of the IR to collect remote joins should return an AST which does not include remote join fields. This operation can be type safe but isn't taken up as part of the PR. 1. There is a lot of code duplication between `Transport/HTTP.hs` and `Transport/Websocket.hs` which needs to be fixed ASAP. This too hasn't been taken up by this PR. 1. The type which represents the execution plan is only modified to handle our current remote joins and as such it will have to be changed to accommodate general remote joins. 1. Use of lenses would have reduced the boilerplate code to collect remote joins from the base AST. 1. The current remote join logic assumes that the join columns of a remote relationship appear with their names in the database response. This however is incorrect as they could be aliased. This can be taken up by anyone, I've left a comment in the code. ### Notes to the reviewers I think it is best reviewed commit by commit. 1. The first one is very straight forward. 1. The second one refactors the remote join execution logic but other than moving things around, it doesn't change the user facing functionality. This moves Postgres specific parts to `Backends/Postgres` module from `Execute`. Some IR related code to `Hasura.RQL.IR` module. Simplifies various type class function signatures as a backend doesn't have to handle remote joins anymore 1. The third one fixes partial case matches that for some weird reason weren't shown as warnings before this refactor 1. The fourth one generalizes the validation logic of remote relationships and implements `scalarTypeGraphQLName` function on SQL Server and BigQuery which is used by the validation logic. This enables remote relationships on BigQuery and SQL Server. https://github.com/hasura/graphql-engine-mono/pull/1497 GitOrigin-RevId: 77dd8eed326602b16e9a8496f52f46d22b795598
2021-06-11 06:26:50 +03:00
UserInfo ->
Env.Environment ->
Options.StringifyNumbers ->
SourceName ->
SourceConfig 'BigQuery ->
MutationDB 'BigQuery Void (UnpreparedValue 'BigQuery) ->
[HTTP.Header] ->
Maybe G.Name ->
server: support remote relationships on SQL Server and BigQuery (#1497) Remote relationships are now supported on SQL Server and BigQuery. The major change though is the re-architecture of remote join execution logic. Prior to this PR, each backend is responsible for processing the remote relationships that are part of their AST. This is not ideal as there is nothing specific about a remote join's execution that ties it to a backend. The only backend specific part is whether or not the specification of the remote relationship is valid (i.e, we'll need to validate whether the scalars are compatible). The approach now changes to this: 1. Before delegating the AST to the backend, we traverse the AST, collect all the remote joins while modifying the AST to add necessary join fields where needed. 1. Once the remote joins are collected from the AST, the database call is made to fetch the response. The necessary data for the remote join(s) is collected from the database's response and one or more remote schema calls are constructed as necessary. 1. The remote schema calls are then executed and the data from the database and from the remote schemas is joined to produce the final response. ### Known issues 1. Ideally the traversal of the IR to collect remote joins should return an AST which does not include remote join fields. This operation can be type safe but isn't taken up as part of the PR. 1. There is a lot of code duplication between `Transport/HTTP.hs` and `Transport/Websocket.hs` which needs to be fixed ASAP. This too hasn't been taken up by this PR. 1. The type which represents the execution plan is only modified to handle our current remote joins and as such it will have to be changed to accommodate general remote joins. 1. Use of lenses would have reduced the boilerplate code to collect remote joins from the base AST. 1. The current remote join logic assumes that the join columns of a remote relationship appear with their names in the database response. This however is incorrect as they could be aliased. This can be taken up by anyone, I've left a comment in the code. ### Notes to the reviewers I think it is best reviewed commit by commit. 1. The first one is very straight forward. 1. The second one refactors the remote join execution logic but other than moving things around, it doesn't change the user facing functionality. This moves Postgres specific parts to `Backends/Postgres` module from `Execute`. Some IR related code to `Hasura.RQL.IR` module. Simplifies various type class function signatures as a backend doesn't have to handle remote joins anymore 1. The third one fixes partial case matches that for some weird reason weren't shown as warnings before this refactor 1. The fourth one generalizes the validation logic of remote relationships and implements `scalarTypeGraphQLName` function on SQL Server and BigQuery which is used by the validation logic. This enables remote relationships on BigQuery and SQL Server. https://github.com/hasura/graphql-engine-mono/pull/1497 GitOrigin-RevId: 77dd8eed326602b16e9a8496f52f46d22b795598
2021-06-11 06:26:50 +03:00
m (DBStepInfo 'BigQuery)
bqDBMutationPlan _userInfo _environment _stringifyNum _sourceName _sourceConfig _mrf _headers _gName =
throw500 "mutations are not supported in BigQuery; this should be unreachable"
-- explain
bqDBQueryExplain ::
MonadError E.QErr m =>
RootFieldAlias ->
UserInfo ->
SourceName ->
SourceConfig 'BigQuery ->
QueryDB 'BigQuery Void (UnpreparedValue 'BigQuery) ->
[HTTP.Header] ->
Maybe G.Name ->
m (AB.AnyBackend DBStepInfo)
bqDBQueryExplain fieldName userInfo sourceName sourceConfig qrf _ _ = do
select <- planNoPlan (BigQuery.bigQuerySourceConfigToFromIrConfig sourceConfig) userInfo qrf
let textSQL = selectSQLTextForExplain select
pure $
AB.mkAnyBackend $
DBStepInfo @'BigQuery
sourceName
sourceConfig
Nothing
( pure $
encJFromJValue $
ExplainPlan
fieldName
(Just $ textSQL)
(Just $ T.lines $ textSQL)
)
()
-- | Get the SQL text for a select, with parameters left as $1, $2, .. holes.
selectSQLTextForExplain :: BigQuery.Select -> Text
selectSQLTextForExplain =
LT.toStrict
. LT.toLazyText
. fst
. ToQuery.renderBuilderPretty
. ToQuery.fromSelect
--------------------------------------------------------------------------------
-- Remote Relationships (e.g. DB-to-DB Joins, remote schema joins, etc.)
--------------------------------------------------------------------------------
-- | Construct an action (i.e. 'DBStepInfo') which can marshal some remote
-- relationship information into a form that BigQuery can query against.
--
-- XXX: Currently unimplemented; the Postgres implementation uses
-- @jsonb_to_recordset@ to query the remote relationship, however this
-- functionality doesn't exist in BigQuery.
--
-- NOTE: The following typeclass constraints will be necessary when implementing
-- this function for real:
--
-- @
-- MonadQueryTags m
-- Backend 'BigQuery
-- @
bqDBRemoteRelationshipPlan ::
forall m.
( MonadError QErr m
) =>
UserInfo ->
SourceName ->
SourceConfig 'BigQuery ->
-- | List of json objects, each of which becomes a row of the table.
NonEmpty Aeson.Object ->
-- | The above objects have this schema
--
-- XXX: What is this for/what does this mean?
HashMap FieldName (Column 'BigQuery, ScalarType 'BigQuery) ->
-- | This is a field name from the lhs that *has* to be selected in the
-- response along with the relationship.
FieldName ->
(FieldName, SourceRelationshipSelection 'BigQuery Void UnpreparedValue) ->
[HTTP.Header] ->
Maybe G.Name ->
Options.StringifyNumbers ->
m (DBStepInfo 'BigQuery)
bqDBRemoteRelationshipPlan userInfo sourceName sourceConfig lhs lhsSchema argumentId relationship reqHeaders operationName stringifyNumbers = do
flip runReaderT emptyQueryTagsComment $ bqDBQueryPlan userInfo Env.emptyEnvironment sourceName sourceConfig rootSelection reqHeaders operationName
where
coerceToColumn = BigQuery.ColumnName . getFieldNameTxt
joinColumnMapping = mapKeys coerceToColumn lhsSchema
rowsArgument :: UnpreparedValue 'BigQuery
rowsArgument =
UVParameter Nothing $
ColumnValue (ColumnScalar BigQuery.StringScalarType) $
BigQuery.StringValue . LT.toStrict $
Aeson.encodeToLazyText lhs
recordSetDefinitionList =
(coerceToColumn argumentId, BigQuery.IntegerScalarType) : Map.toList (fmap snd joinColumnMapping)
jsonToRecordSet :: IR.SelectFromG ('BigQuery) (UnpreparedValue 'BigQuery)
jsonToRecordSet =
IR.FromFunction
(BigQuery.FunctionName "unnest" Nothing)
( FunctionArgsExp
[BigQuery.AEInput rowsArgument]
mempty
)
(Just recordSetDefinitionList)
rootSelection =
convertRemoteSourceRelationship
(fst <$> joinColumnMapping)
jsonToRecordSet
(BigQuery.ColumnName $ getFieldNameTxt argumentId)
(ColumnScalar BigQuery.IntegerScalarType)
relationship
stringifyNumbers