graphql-engine/server/src-lib/Hasura/GraphQL/Execute/RemoteJoin/Collect.hs

317 lines
13 KiB
Haskell
Raw Normal View History

server: support remote relationships on SQL Server and BigQuery (#1497) Remote relationships are now supported on SQL Server and BigQuery. The major change though is the re-architecture of remote join execution logic. Prior to this PR, each backend is responsible for processing the remote relationships that are part of their AST. This is not ideal as there is nothing specific about a remote join's execution that ties it to a backend. The only backend specific part is whether or not the specification of the remote relationship is valid (i.e, we'll need to validate whether the scalars are compatible). The approach now changes to this: 1. Before delegating the AST to the backend, we traverse the AST, collect all the remote joins while modifying the AST to add necessary join fields where needed. 1. Once the remote joins are collected from the AST, the database call is made to fetch the response. The necessary data for the remote join(s) is collected from the database's response and one or more remote schema calls are constructed as necessary. 1. The remote schema calls are then executed and the data from the database and from the remote schemas is joined to produce the final response. ### Known issues 1. Ideally the traversal of the IR to collect remote joins should return an AST which does not include remote join fields. This operation can be type safe but isn't taken up as part of the PR. 1. There is a lot of code duplication between `Transport/HTTP.hs` and `Transport/Websocket.hs` which needs to be fixed ASAP. This too hasn't been taken up by this PR. 1. The type which represents the execution plan is only modified to handle our current remote joins and as such it will have to be changed to accommodate general remote joins. 1. Use of lenses would have reduced the boilerplate code to collect remote joins from the base AST. 1. The current remote join logic assumes that the join columns of a remote relationship appear with their names in the database response. This however is incorrect as they could be aliased. This can be taken up by anyone, I've left a comment in the code. ### Notes to the reviewers I think it is best reviewed commit by commit. 1. The first one is very straight forward. 1. The second one refactors the remote join execution logic but other than moving things around, it doesn't change the user facing functionality. This moves Postgres specific parts to `Backends/Postgres` module from `Execute`. Some IR related code to `Hasura.RQL.IR` module. Simplifies various type class function signatures as a backend doesn't have to handle remote joins anymore 1. The third one fixes partial case matches that for some weird reason weren't shown as warnings before this refactor 1. The fourth one generalizes the validation logic of remote relationships and implements `scalarTypeGraphQLName` function on SQL Server and BigQuery which is used by the validation logic. This enables remote relationships on BigQuery and SQL Server. https://github.com/hasura/graphql-engine-mono/pull/1497 GitOrigin-RevId: 77dd8eed326602b16e9a8496f52f46d22b795598
2021-06-11 06:26:50 +03:00
module Hasura.GraphQL.Execute.RemoteJoin.Collect
( RemoteJoins
, RemoteJoinMap
, FieldPath(..)
, appendPath
, getRemoteJoins
, getRemoteJoinsSelect
, getRemoteJoinsMutationDB
, getRemoteJoinsActionQuery
, getRemoteJoinsActionMutation
) where
import Hasura.Prelude
import qualified Data.HashMap.Strict as Map
import qualified Data.HashSet as HS
import qualified Data.List.NonEmpty as NE
import Control.Lens
import Hasura.GraphQL.Execute.RemoteJoin.Types
import Hasura.RQL.IR
import Hasura.RQL.IR.Returning
import Hasura.RQL.Types
{- Note: [Remote Joins Architecture]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Unparsed Incoming GraphQL +------------------------------+
--------------------------> | Parsing of the GraphQL query |-----+
+------------------------------+ |
| DB Query and remote joins (if any)
|
V
+----------------------------------+ SQL query response +----------------------------+
| Traverse the DB response to | <------------------- | Execution of the DB query |
| get the values of the arguments | +----------------------------+
| of the remote field |
+----------------------------------+
|
| Remote field arguments
V
+--------------------------+ Remote schema response +----------------------------------------+
| Query the remote schema | ------------------------> | Replace the remote join fields in |
| with the remote field | | the SQL query response (JSON) with |
| arguments to the remote | | the response obtained from the remote |
| field configured in the | | schema at appropriate places. |
| remote join. | +----------------------------------------+
+--------------------------+
-}
-- | Collects remote joins from the AST and also adds the necessary join fields
getRemoteJoins
:: Backend b
=> QueryDB b u
-> (QueryDB b u, Maybe RemoteJoins)
getRemoteJoins = \case
QDBMultipleRows s -> first QDBMultipleRows $ getRemoteJoinsSelect s
QDBSingleRow s -> first QDBSingleRow $ getRemoteJoinsSelect s
QDBAggregation s -> first QDBAggregation $ getRemoteJoinsAggregateSelect s
QDBConnection s -> first QDBConnection $ getRemoteJoinsConnectionSelect s
-- | Traverse through 'AnnSimpleSel' and collect remote join fields (if any).
getRemoteJoinsSelect
:: Backend b
=> AnnSimpleSelG b u
-> (AnnSimpleSelG b u, Maybe RemoteJoins)
getRemoteJoinsSelect =
second mapToNonEmpty . flip runState mempty . transformSelect mempty
-- | Traverse through @'AnnAggregateSelect' and collect remote join fields (if any).
getRemoteJoinsAggregateSelect
:: Backend b
=> AnnAggregateSelectG b u
-> (AnnAggregateSelectG b u, Maybe RemoteJoins)
getRemoteJoinsAggregateSelect =
second mapToNonEmpty . flip runState mempty . transformAggregateSelect mempty
-- | Traverse through @'ConnectionSelect' and collect remote join fields (if any).
getRemoteJoinsConnectionSelect
:: Backend b
=> ConnectionSelect b u
-> (ConnectionSelect b u, Maybe RemoteJoins)
getRemoteJoinsConnectionSelect =
second mapToNonEmpty . flip runState mempty . transformConnectionSelect mempty
-- | Traverse through 'MutationOutput' and collect remote join fields (if any)
getRemoteJoinsMutationOutput
:: Backend b
=> MutationOutputG b u
-> (MutationOutputG b u, Maybe RemoteJoins)
getRemoteJoinsMutationOutput =
second mapToNonEmpty . flip runState mempty . transformMutationOutput mempty
where
transformMutationOutput path = \case
MOutMultirowFields mutationFields ->
MOutMultirowFields <$> transfromMutationFields mutationFields
MOutSinglerowObject annFields ->
MOutSinglerowObject <$> transformAnnFields path annFields
where
transfromMutationFields fields =
forM fields $ \(fieldName, field') -> do
let fieldPath = appendPath fieldName path
(fieldName,) <$> case field' of
MCount -> pure MCount
MExp t -> pure $ MExp t
MRet annFields -> MRet <$> transformAnnFields fieldPath annFields
-- local helpers
transformSelect
:: Backend b
=> FieldPath
-> AnnSimpleSelG b u
-> State RemoteJoinMap (AnnSimpleSelG b u)
transformSelect path sel = do
let fields = _asnFields sel
-- Transform selects in array, object and computed fields
transformedFields <- transformAnnFields path fields
pure sel{_asnFields = transformedFields}
transformAggregateSelect
:: Backend b
=> FieldPath
-> AnnAggregateSelectG b u
-> State RemoteJoinMap (AnnAggregateSelectG b u)
transformAggregateSelect path sel = do
let aggFields = _asnFields sel
transformedFields <- forM aggFields $ \(fieldName, aggField) ->
(fieldName,) <$> case aggField of
TAFAgg agg -> pure $ TAFAgg agg
TAFNodes x annFields -> TAFNodes x <$> transformAnnFields (appendPath fieldName path) annFields
TAFExp t -> pure $ TAFExp t
pure sel{_asnFields = transformedFields}
transformConnectionSelect
:: Backend b
=> FieldPath
-> ConnectionSelect b u
-> State RemoteJoinMap (ConnectionSelect b u)
transformConnectionSelect path ConnectionSelect{..} = do
let connectionFields = _asnFields _csSelect
transformedFields <- forM connectionFields $ \(fieldName, field) ->
(fieldName,) <$> case field of
ConnectionTypename t -> pure $ ConnectionTypename t
ConnectionPageInfo p -> pure $ ConnectionPageInfo p
ConnectionEdges edges -> ConnectionEdges <$> transformEdges (appendPath fieldName path) edges
let select = _csSelect{_asnFields = transformedFields}
pure $ ConnectionSelect _csXRelay _csPrimaryKeyColumns _csSplit _csSlice select
where
transformEdges edgePath edgeFields =
forM edgeFields $ \(fieldName, edgeField) ->
(fieldName,) <$> case edgeField of
EdgeTypename t -> pure $ EdgeTypename t
EdgeCursor -> pure EdgeCursor
EdgeNode annFields ->
EdgeNode <$> transformAnnFields (appendPath fieldName edgePath) annFields
transformObjectSelect
:: Backend b
=> FieldPath
-> AnnObjectSelectG b u
-> State RemoteJoinMap (AnnObjectSelectG b u)
transformObjectSelect path sel = do
let fields = _aosFields sel
transformedFields <- transformAnnFields path fields
pure sel{_aosFields = transformedFields}
getRemoteJoinsAnnFields
:: Backend b
=> AnnFieldsG b u
-> (AnnFieldsG b u, Maybe RemoteJoins)
getRemoteJoinsAnnFields =
second mapToNonEmpty . flip runState mempty . transformAnnFields mempty
transformAnnFields
:: forall b u
. Backend b
=> FieldPath
-> AnnFieldsG b u
-> State RemoteJoinMap (AnnFieldsG b u)
transformAnnFields path fields = do
-- TODO: Check for correctness. I think this entire function seems to be
-- assuming that the column names will appear as is in the response from the
-- server, which is incorrect as they can be aliased. Similarly, the phantom
-- columns are being added without checking for overlap with aliases
let pgColumnFields = HS.fromList $ map (pgiColumn . _acfInfo . snd) $
getFields _AFColumn fields
remoteSelects = getFields (_AFRemote) fields
remoteJoins = flip map remoteSelects $ \(fieldName, remoteSelect) ->
let RemoteSelect argsMap selSet hasuraColumns remoteFields rsi = remoteSelect
hasuraColumnFields = HS.map (fromCol @b . pgiColumn) hasuraColumns
phantomColumns = HS.filter ((`notElem` pgColumnFields) . pgiColumn) hasuraColumns
in (phantomColumns, RemoteJoin fieldName argsMap selSet hasuraColumnFields remoteFields rsi $
map (fromCol @b . pgiColumn) $ toList phantomColumns)
transformedFields <- forM fields $ \(fieldName, field') -> do
let fieldPath = appendPath fieldName path
(fieldName,) <$> case field' of
AFNodeId x qt pkeys -> pure $ AFNodeId x qt pkeys
AFColumn c -> pure $ AFColumn c
AFObjectRelation annRel ->
AFObjectRelation <$> transformAnnRelation annRel (transformObjectSelect fieldPath)
AFArrayRelation (ASSimple annRel) ->
AFArrayRelation . ASSimple <$> transformAnnRelation annRel (transformSelect fieldPath)
AFArrayRelation (ASAggregate aggRel) ->
AFArrayRelation . ASAggregate <$> transformAnnAggregateRelation fieldPath aggRel
AFArrayRelation (ASConnection annRel) ->
AFArrayRelation . ASConnection <$> transformArrayConnection fieldPath annRel
AFComputedField x computedField ->
AFComputedField x <$> case computedField of
CFSScalar _ _ -> pure computedField
CFSTable jas annSel -> CFSTable jas <$> transformSelect fieldPath annSel
AFRemote rs -> pure $ AFRemote rs
AFExpression t -> pure $ AFExpression t
case NE.nonEmpty remoteJoins of
Nothing -> pure transformedFields
Just nonEmptyRemoteJoins -> do
let phantomColumns = map (\ci -> (fromCol @b $ pgiColumn ci, AFColumn $ AnnColumnField ci False Nothing Nothing)) $ toList $ HS.unions $ map fst $ remoteJoins
modify (Map.insert path $ fmap snd nonEmptyRemoteJoins)
pure $ transformedFields <> phantomColumns
where
getFields f = mapMaybe (sequence . second (^? f))
transformAnnRelation annRel f = do
let annSel = aarAnnSelect annRel
transformedSel <- f annSel
pure annRel{aarAnnSelect = transformedSel}
transformAnnAggregateRelation fieldPath annRel = do
let annSel = aarAnnSelect annRel
transformedSel <- transformAggregateSelect fieldPath annSel
pure annRel{aarAnnSelect = transformedSel}
transformArrayConnection fieldPath annRel = do
let connectionSelect = aarAnnSelect annRel
transformedConnectionSelect <- transformConnectionSelect fieldPath connectionSelect
pure annRel{aarAnnSelect = transformedConnectionSelect}
mapToNonEmpty :: RemoteJoinMap -> Maybe RemoteJoins
mapToNonEmpty = NE.nonEmpty . Map.toList
getRemoteJoinsMutationDB
:: Backend b
=> MutationDB b u
-> (MutationDB b u, Maybe RemoteJoins)
getRemoteJoinsMutationDB = \case
MDBInsert insert ->
first MDBInsert $ getRemoteJoinsInsert insert
MDBUpdate update ->
first MDBUpdate $ getRemoteJoinsUpdate update
MDBDelete delete ->
first MDBDelete $ getRemoteJoinsDelete delete
MDBFunction aggSelect select ->
first (MDBFunction aggSelect) $ getRemoteJoinsSelect select
where
getRemoteJoinsInsert insert =
let (output', remoteJoins) = getRemoteJoinsMutationOutput $ _aiOutput insert
in (insert{ _aiOutput = output'}, remoteJoins)
getRemoteJoinsUpdate update =
let (output', remoteJoins) = getRemoteJoinsMutationOutput $ uqp1Output update
in (update{ uqp1Output = output'}, remoteJoins)
getRemoteJoinsDelete delete =
let (output', remoteJoins) = getRemoteJoinsMutationOutput $ dqp1Output delete
in (delete{ dqp1Output = output'}, remoteJoins)
getRemoteJoinsSyncAction
:: (Backend b)
=> AnnActionExecution b v
-> (AnnActionExecution b v, Maybe RemoteJoins)
getRemoteJoinsSyncAction actionExecution =
let (fields', remoteJoins) = getRemoteJoinsAnnFields $ _aaeFields actionExecution
in (actionExecution { _aaeFields = fields' }, remoteJoins)
getRemoteJoinsActionQuery
:: (Backend b)
=> ActionQuery b v
-> (ActionQuery b v, Maybe RemoteJoins)
getRemoteJoinsActionQuery = \case
AQQuery sync ->
first AQQuery $ getRemoteJoinsSyncAction sync
AQAsync async ->
first AQAsync $ getRemoteJoinsAsyncQuery async
where
getRemoteJoinsAsyncQuery async =
let (fields', remoteJoins) =
second mapToNonEmpty . flip runState mempty . transformAsyncFields mempty $
_aaaqFields async
in (async { _aaaqFields = fields' }, remoteJoins)
transformAsyncFields path fields =
forM fields $ \(fieldName, field) -> do
let fieldPath = appendPath fieldName path
(fieldName,) <$> case field of
AsyncTypename t -> pure $ AsyncTypename t
AsyncOutput outputFields ->
AsyncOutput <$> transformAnnFields fieldPath outputFields
AsyncId -> pure AsyncId
AsyncCreatedAt -> pure AsyncCreatedAt
AsyncErrors -> pure AsyncErrors
getRemoteJoinsActionMutation
:: (Backend b)
=> ActionMutation b v
-> (ActionMutation b v, Maybe RemoteJoins)
getRemoteJoinsActionMutation = \case
AMSync sync ->
first AMSync $ getRemoteJoinsSyncAction sync
AMAsync async -> (AMAsync async, Nothing)