graphql-engine/server/src-lib/Hasura/RQL/IR/RemoteSchema.hs

348 lines
12 KiB
Haskell
Raw Normal View History

{-# LANGUAGE TemplateHaskell #-}
-- | Representation for queries going to remote schemas. Due to the existence of
-- remote relationships from remote schemas, we can't simply reuse the GraphQL
-- document AST we define in graphql-parser-hs, and instead redefine a custom
-- structure to represent such queries.
module Hasura.RQL.IR.RemoteSchema
( -- AST
SelectionSet (..),
DeduplicatedSelectionSet (..),
dssCommonFields,
dssMemberSelectionSets,
ObjectSelectionSet,
mkInterfaceSelectionSet,
mkUnionSelectionSet,
Field (..),
_FieldGraphQL,
_FieldRemote,
GraphQLField (..),
fAlias,
fName,
fArguments,
fDirectives,
fSelectionSet,
mkGraphQLField,
-- entry points
RemoteSchemaRootField (..),
SchemaRemoteRelationshipSelect (..),
RemoteFieldArgument (..),
RemoteSchemaSelect (..),
-- AST conversion
convertSelectionSet,
convertGraphQLField,
)
where
import Control.Lens.TH (makeLenses, makePrisms)
import Data.HashMap.Strict qualified as HashMap
import Data.HashMap.Strict.InsOrd.Extended qualified as InsOrdHashMap
import Data.HashSet qualified as Set
import Data.List.Extended (longestCommonPrefix)
import Hasura.GraphQL.Parser.Name as GName
server: Metadata origin for definitions (type parameter version v2) The code that builds the GraphQL schema, and `buildGQLContext` in particular, is partial: not every value of `(ServerConfigCtx, GraphQLQueryType, SourceCache, HashMap RemoteSchemaName (RemoteSchemaCtx, MetadataObject), ActionCache, AnnotatedCustomTypes)` results in a valid GraphQL schema. When it fails, we want to be able to return better error messages than we currently do. The key thing that is missing is a way to trace back GraphQL type information to their origin from the Hasura metadata. Currently, we have a number of correctness checks of our GraphQL schema. But these correctness checks only have access to pure GraphQL type information, and hence can only report errors in terms of that. Possibly the worst is the "conflicting definitions" error, which, in practice, can only be debugged by Hasura engineers. This is terrible DX for customers. This PR allows us to print better error messages, by adding a field to the `Definition` type that traces the GraphQL type to its origin in the metadata. So the idea is simple: just add `MetadataObjId`, or `Maybe` that, or some other sum type of that, to `Definition`. However, we want to avoid having to import a `Hasura.RQL` module from `Hasura.GraphQL.Parser`. So we instead define this additional field of `Definition` through a new type parameter, which is threaded through in `Hasura.GraphQL.Parser`. We then define type synonyms in `Hasura.GraphQL.Schema.Parser` that fill in this type parameter, so that it is not visible for the majority of the codebase. The idea of associating metadata information to `Definition`s really comes to fruition when combined with hasura/graphql-engine-mono#4517. Their combination would allow us to use the API of fatal errors (just like the current `MonadError QErr`) to report _inconsistencies_ in the metadata. Such inconsistencies are then _automatically_ ignored. So no ad-hoc decisions need to be made on how to cut out inconsistent metadata from the GraphQL schema. This will allow us to report much better errors, as well as improve the likelihood of a successful HGE startup. PR-URL: https://github.com/hasura/graphql-engine-mono/pull/4770 Co-authored-by: Samir Talwar <47582+SamirTalwar@users.noreply.github.com> GitOrigin-RevId: 728402b0cae83ae8e83463a826ceeb609001acae
2022-06-28 18:52:26 +03:00
import Hasura.GraphQL.Parser.Variable (InputValue)
import Hasura.Prelude
import Hasura.RQL.Types.Common (FieldName)
import Hasura.RQL.Types.ResultCustomization
import Hasura.RQL.Types.ResultCustomization qualified as RQL
scaffolding for remote-schemas module The main aim of the PR is: 1. To set up a module structure for 'remote-schemas' package. 2. Move parts by the remote schema codebase into the new module structure to validate it. ## Notes to the reviewer Why a PR with large-ish diff? 1. We've been making progress on the MM project but we don't yet know long it is going to take us to get to the first milestone. To understand this better, we need to figure out the unknowns as soon as possible. Hence I've taken a stab at the first two items in the [end-state](https://gist.github.com/0x777/ca2bdc4284d21c3eec153b51dea255c9) document to figure out the unknowns. Unsurprisingly, there are a bunch of issues that we haven't discussed earlier. These are documented in the 'open questions' section. 1. The diff is large but that is only code moved around and I've added a section that documents how things are moved. In addition, there are fair number of PR comments to help with the review process. ## Changes in the PR ### Module structure Sets up the module structure as follows: ``` Hasura/ RemoteSchema/ Metadata/ Types.hs SchemaCache/ Types.hs Permission.hs RemoteRelationship.hs Build.hs MetadataAPI/ Types.hs Execute.hs ``` ### 1. Types representing metadata are moved Types that capture metadata information (currently scattered across several RQL modules) are moved into `Hasura.RemoteSchema.Metadata.Types`. - This new module only depends on very 'core' modules such as `Hasura.Session` for the notion of roles and `Hasura.Incremental` for `Cacheable` typeclass. - The requirement on database modules is avoided by generalizing the remote schemas metadata to accept an arbitrary 'r' for a remote relationship definition. ### 2. SchemaCache related types and build logic have been moved Types that represent remote schemas information in SchemaCache are moved into `Hasura.RemoteSchema.SchemaCache.Types`. Similar to `H.RS.Metadata.Types`, this module depends on 'core' modules except for `Hasura.GraphQL.Parser.Variable`. It has something to do with remote relationships but I haven't spent time looking into it. The validation of 'remote relationships to remote schema' is also something that needs to be looked at. Rips out the logic that builds remote schema's SchemaCache information from the monolithic `buildSchemaCacheRule` and moves it into `Hasura.RemoteSchema.SchemaCache.Build`. Further, the `.SchemaCache.Permission` and `.SchemaCache.RemoteRelationship` have been created from existing modules that capture schema cache building logic for those two components. This was a fair amount of work. On main, currently remote schema's SchemaCache information is built in two phases - in the first phase, 'permissions' and 'remote relationships' are ignored and in the second phase they are filled in. While remote relationships can only be resolved after partially resolving sources and other remote schemas, the same isn't true for permissions. Further, most of the work that is done to resolve remote relationships can be moved to the first phase so that the second phase can be a very simple traversal. This is the approach that was taken - resolve permissions and as much as remote relationships information in the first phase. ### 3. Metadata APIs related types and build logic have been moved The types that represent remote schema related metadata APIs and the execution logic have been moved to `Hasura.RemoteSchema.MetadataAPI.Types` and `.Execute` modules respectively. ## Open questions: 1. `Hasura.RemoteSchema.Metadata.Types` is so called because I was hoping that all of the metadata related APIs of remote schema can be brought in at `Hasura.RemoteSchema.Metadata.API`. However, as metadata APIs depended on functions from `SchemaCache` module (see [1](https://github.com/hasura/graphql-engine-mono/blob/ceba6d62264603ee5d279814677b29bcc43ecaea/server/src-lib/Hasura/RQL/DDL/RemoteSchema.hs#L55) and [2](https://github.com/hasura/graphql-engine-mono/blob/ceba6d62264603ee5d279814677b29bcc43ecaea/server/src-lib/Hasura/RQL/DDL/RemoteSchema.hs#L91), it made more sense to create a separate top-level module for `MetadataAPI`s. Maybe we can just have `Hasura.RemoteSchema.Metadata` and get rid of the extra nesting or have `Hasura.RemoteSchema.Metadata.{Core,Permission,RemoteRelationship}` if we want to break them down further. 1. `buildRemoteSchemas` in `H.RS.SchemaCache.Build` has the following type: ```haskell buildRemoteSchemas :: ( ArrowChoice arr, Inc.ArrowDistribute arr, ArrowWriter (Seq CollectedInfo) arr, Inc.ArrowCache m arr, MonadIO m, HasHttpManagerM m, Inc.Cacheable remoteRelationshipDefinition, ToJSON remoteRelationshipDefinition, MonadError QErr m ) => Env.Environment -> ( (Inc.Dependency (HashMap RemoteSchemaName Inc.InvalidationKey), OrderedRoles), [RemoteSchemaMetadataG remoteRelationshipDefinition] ) `arr` HashMap RemoteSchemaName (PartiallyResolvedRemoteSchemaCtxG remoteRelationshipDefinition, MetadataObject) ``` Note the dependence on `CollectedInfo` which is defined as ```haskell data CollectedInfo = CIInconsistency InconsistentMetadata | CIDependency MetadataObject -- ^ for error reporting on missing dependencies SchemaObjId SchemaDependency deriving (Eq) ``` this pretty much means that remote schemas is dependent on types from databases, actions, .... How do we fix this? Maybe introduce a typeclass such as `ArrowCollectRemoteSchemaDependencies` which is defined in `Hasura.RemoteSchema` and then implemented in graphql-engine? 1. The dependency on `buildSchemaCacheFor` in `.MetadataAPI.Execute` which has the following signature: ```haskell buildSchemaCacheFor :: (QErrM m, CacheRWM m, MetadataM m) => MetadataObjId -> MetadataModifier -> ``` This can be easily resolved if we restrict what the metadata APIs are allowed to do. Currently, they operate in an unfettered access to modify SchemaCache (the `CacheRWM` constraint): ```haskell runAddRemoteSchema :: ( QErrM m, CacheRWM m, MonadIO m, HasHttpManagerM m, MetadataM m, Tracing.MonadTrace m ) => Env.Environment -> AddRemoteSchemaQuery -> m EncJSON ``` This should instead be changed to restrict remote schema APIs to only modify remote schema metadata (but has access to the remote schemas part of the schema cache), this dependency is completely removed. ```haskell runAddRemoteSchema :: ( QErrM m, MonadIO m, HasHttpManagerM m, MonadReader RemoteSchemasSchemaCache m, MonadState RemoteSchemaMetadata m, Tracing.MonadTrace m ) => Env.Environment -> AddRemoteSchemaQuery -> m RemoteSchemeMetadataObjId ``` The idea is that the core graphql-engine would call these functions and then call `buildSchemaCacheFor`. PR-URL: https://github.com/hasura/graphql-engine-mono/pull/6291 GitOrigin-RevId: 51357148c6404afe70219afa71bd1d59bdf4ffc6
2022-10-21 06:13:07 +03:00
import Hasura.RemoteSchema.SchemaCache.Types
import Language.GraphQL.Draft.Syntax qualified as G
-------------------------------------------------------------------------------
-- Custom AST
-- | Custom representation of a selection set.
--
-- Similarly to other parts of the IR, the @r@ argument is used for remote
-- relationships.
data SelectionSet r var
= SelectionSetObject (ObjectSelectionSet r var)
| SelectionSetUnion (DeduplicatedSelectionSet r var)
| SelectionSetInterface (DeduplicatedSelectionSet r var)
| SelectionSetNone
deriving (Show, Eq, Functor, Foldable, Traversable)
-- | Representation of the normalized selection set of an interface/union type.
--
-- This representation is used to attempt to minimize the size of the GraphQL
-- query that eventually gets sent to the GraphQL server by defining as many
-- fields as possible on the abstract type.
data DeduplicatedSelectionSet r var = DeduplicatedSelectionSet
{ -- | Fields that aren't explicitly defined for member types
_dssCommonFields :: Set.HashSet G.Name,
-- | SelectionSets of individual member types
_dssMemberSelectionSets :: HashMap.HashMap G.Name (ObjectSelectionSet r var)
}
deriving (Show, Eq, Functor, Foldable, Traversable, Generic)
type ObjectSelectionSet r var = InsOrdHashMap.InsOrdHashMap G.Name (Field r var)
-- | Constructs an 'InterfaceSelectionSet' from a set of interface fields and an
-- association list of the fields. This function ensures that @__typename@ is
-- present in the set of interface fields.
mkInterfaceSelectionSet ::
-- | Member fields of the interface
Set.HashSet G.Name ->
-- | Selection sets for all the member types
[(G.Name, ObjectSelectionSet r var)] ->
DeduplicatedSelectionSet r var
mkInterfaceSelectionSet interfaceFields selectionSets =
DeduplicatedSelectionSet
(Set.insert GName.___typename interfaceFields)
(HashMap.fromList selectionSets)
-- | Constructs an 'UnionSelectionSet' from a list of the fields, using a
-- singleton set of @__typename@ for the set of common fields.
mkUnionSelectionSet ::
-- | Selection sets for all the member types
[(G.Name, ObjectSelectionSet r var)] ->
DeduplicatedSelectionSet r var
mkUnionSelectionSet selectionSets =
DeduplicatedSelectionSet
(Set.singleton GName.___typename)
(HashMap.fromList selectionSets)
-- | Representation of one individual field.
--
-- This particular type is the reason why we need a different representation
-- from the one in 'graphql-parser-hs': we differentiate between selection
-- fields that target the actual remote schema, and fields that, instead, are
-- remote from it and need to be treated differently.
data Field r var
= FieldGraphQL (GraphQLField r var)
| FieldRemote (SchemaRemoteRelationshipSelect r)
deriving (Show, Eq, Functor, Foldable, Traversable)
-- | Normalized representation of a GraphQL field.
--
-- This type is almost identical to 'G.Field', except for the fact that the
-- selection set is our annotated 'SelectionSet', instead of the original
-- 'G.SelectionSet'. We use this type to represent the fields of a selection
-- that do target the remote schema.
data GraphQLField r var = GraphQLField
{ _fAlias :: G.Name,
_fName :: G.Name,
_fArguments :: HashMap G.Name (G.Value var),
_fDirectives :: [G.Directive var],
_fSelectionSet :: SelectionSet r var
}
deriving (Show, Eq, Functor, Foldable, Traversable)
mkGraphQLField ::
Maybe G.Name ->
G.Name ->
HashMap G.Name (G.Value var) ->
[G.Directive var] ->
SelectionSet r var ->
GraphQLField r var
mkGraphQLField alias name =
GraphQLField (fromMaybe name alias) name
-------------------------------------------------------------------------------
-- Remote schema entry points
-- | Root entry point for a remote schema.
data RemoteSchemaRootField r var = RemoteSchemaRootField
scaffolding for remote-schemas module The main aim of the PR is: 1. To set up a module structure for 'remote-schemas' package. 2. Move parts by the remote schema codebase into the new module structure to validate it. ## Notes to the reviewer Why a PR with large-ish diff? 1. We've been making progress on the MM project but we don't yet know long it is going to take us to get to the first milestone. To understand this better, we need to figure out the unknowns as soon as possible. Hence I've taken a stab at the first two items in the [end-state](https://gist.github.com/0x777/ca2bdc4284d21c3eec153b51dea255c9) document to figure out the unknowns. Unsurprisingly, there are a bunch of issues that we haven't discussed earlier. These are documented in the 'open questions' section. 1. The diff is large but that is only code moved around and I've added a section that documents how things are moved. In addition, there are fair number of PR comments to help with the review process. ## Changes in the PR ### Module structure Sets up the module structure as follows: ``` Hasura/ RemoteSchema/ Metadata/ Types.hs SchemaCache/ Types.hs Permission.hs RemoteRelationship.hs Build.hs MetadataAPI/ Types.hs Execute.hs ``` ### 1. Types representing metadata are moved Types that capture metadata information (currently scattered across several RQL modules) are moved into `Hasura.RemoteSchema.Metadata.Types`. - This new module only depends on very 'core' modules such as `Hasura.Session` for the notion of roles and `Hasura.Incremental` for `Cacheable` typeclass. - The requirement on database modules is avoided by generalizing the remote schemas metadata to accept an arbitrary 'r' for a remote relationship definition. ### 2. SchemaCache related types and build logic have been moved Types that represent remote schemas information in SchemaCache are moved into `Hasura.RemoteSchema.SchemaCache.Types`. Similar to `H.RS.Metadata.Types`, this module depends on 'core' modules except for `Hasura.GraphQL.Parser.Variable`. It has something to do with remote relationships but I haven't spent time looking into it. The validation of 'remote relationships to remote schema' is also something that needs to be looked at. Rips out the logic that builds remote schema's SchemaCache information from the monolithic `buildSchemaCacheRule` and moves it into `Hasura.RemoteSchema.SchemaCache.Build`. Further, the `.SchemaCache.Permission` and `.SchemaCache.RemoteRelationship` have been created from existing modules that capture schema cache building logic for those two components. This was a fair amount of work. On main, currently remote schema's SchemaCache information is built in two phases - in the first phase, 'permissions' and 'remote relationships' are ignored and in the second phase they are filled in. While remote relationships can only be resolved after partially resolving sources and other remote schemas, the same isn't true for permissions. Further, most of the work that is done to resolve remote relationships can be moved to the first phase so that the second phase can be a very simple traversal. This is the approach that was taken - resolve permissions and as much as remote relationships information in the first phase. ### 3. Metadata APIs related types and build logic have been moved The types that represent remote schema related metadata APIs and the execution logic have been moved to `Hasura.RemoteSchema.MetadataAPI.Types` and `.Execute` modules respectively. ## Open questions: 1. `Hasura.RemoteSchema.Metadata.Types` is so called because I was hoping that all of the metadata related APIs of remote schema can be brought in at `Hasura.RemoteSchema.Metadata.API`. However, as metadata APIs depended on functions from `SchemaCache` module (see [1](https://github.com/hasura/graphql-engine-mono/blob/ceba6d62264603ee5d279814677b29bcc43ecaea/server/src-lib/Hasura/RQL/DDL/RemoteSchema.hs#L55) and [2](https://github.com/hasura/graphql-engine-mono/blob/ceba6d62264603ee5d279814677b29bcc43ecaea/server/src-lib/Hasura/RQL/DDL/RemoteSchema.hs#L91), it made more sense to create a separate top-level module for `MetadataAPI`s. Maybe we can just have `Hasura.RemoteSchema.Metadata` and get rid of the extra nesting or have `Hasura.RemoteSchema.Metadata.{Core,Permission,RemoteRelationship}` if we want to break them down further. 1. `buildRemoteSchemas` in `H.RS.SchemaCache.Build` has the following type: ```haskell buildRemoteSchemas :: ( ArrowChoice arr, Inc.ArrowDistribute arr, ArrowWriter (Seq CollectedInfo) arr, Inc.ArrowCache m arr, MonadIO m, HasHttpManagerM m, Inc.Cacheable remoteRelationshipDefinition, ToJSON remoteRelationshipDefinition, MonadError QErr m ) => Env.Environment -> ( (Inc.Dependency (HashMap RemoteSchemaName Inc.InvalidationKey), OrderedRoles), [RemoteSchemaMetadataG remoteRelationshipDefinition] ) `arr` HashMap RemoteSchemaName (PartiallyResolvedRemoteSchemaCtxG remoteRelationshipDefinition, MetadataObject) ``` Note the dependence on `CollectedInfo` which is defined as ```haskell data CollectedInfo = CIInconsistency InconsistentMetadata | CIDependency MetadataObject -- ^ for error reporting on missing dependencies SchemaObjId SchemaDependency deriving (Eq) ``` this pretty much means that remote schemas is dependent on types from databases, actions, .... How do we fix this? Maybe introduce a typeclass such as `ArrowCollectRemoteSchemaDependencies` which is defined in `Hasura.RemoteSchema` and then implemented in graphql-engine? 1. The dependency on `buildSchemaCacheFor` in `.MetadataAPI.Execute` which has the following signature: ```haskell buildSchemaCacheFor :: (QErrM m, CacheRWM m, MetadataM m) => MetadataObjId -> MetadataModifier -> ``` This can be easily resolved if we restrict what the metadata APIs are allowed to do. Currently, they operate in an unfettered access to modify SchemaCache (the `CacheRWM` constraint): ```haskell runAddRemoteSchema :: ( QErrM m, CacheRWM m, MonadIO m, HasHttpManagerM m, MetadataM m, Tracing.MonadTrace m ) => Env.Environment -> AddRemoteSchemaQuery -> m EncJSON ``` This should instead be changed to restrict remote schema APIs to only modify remote schema metadata (but has access to the remote schemas part of the schema cache), this dependency is completely removed. ```haskell runAddRemoteSchema :: ( QErrM m, MonadIO m, HasHttpManagerM m, MonadReader RemoteSchemasSchemaCache m, MonadState RemoteSchemaMetadata m, Tracing.MonadTrace m ) => Env.Environment -> AddRemoteSchemaQuery -> m RemoteSchemeMetadataObjId ``` The idea is that the core graphql-engine would call these functions and then call `buildSchemaCacheFor`. PR-URL: https://github.com/hasura/graphql-engine-mono/pull/6291 GitOrigin-RevId: 51357148c6404afe70219afa71bd1d59bdf4ffc6
2022-10-21 06:13:07 +03:00
{ _rfRemoteSchemaInfo :: RemoteSchemaInfo,
_rfResultCustomizer :: RQL.ResultCustomizer,
_rfField :: GraphQLField r var
}
deriving (Functor, Foldable, Traversable)
-- | A remote relationship's selection and fields required for its join condition.
data SchemaRemoteRelationshipSelect r = SchemaRemoteRelationshipSelect
{ -- | The fields on the table that are required for the join condition
-- of the remote relationship
_srrsLHSJoinFields :: HashMap FieldName G.Name,
-- | The field that captures the relationship
-- r ~ (RemoteRelationshipField UnpreparedValue) when the AST is emitted by the parser.
-- r ~ Void when an execution tree is constructed so that a backend is
-- absolved of dealing with remote relationships.
_srrsRelationship :: r
}
deriving (Eq, Show, Functor, Foldable, Traversable)
data RemoteFieldArgument = RemoteFieldArgument
{ _rfaArgument :: G.Name,
_rfaValue :: InputValue RemoteSchemaVariable
}
deriving (Eq, Show)
Enable remote joins from remote schemas in the execution engine. ### Description This PR adds the ability to perform remote joins from remote schemas in the engine. To do so, we alter the definition of an `ExecutionStep` targeting a remote schema: the `ExecStepRemote` constructor now expects a `Maybe RemoteJoins`. This new argument is used when processing the execution step, in the transport layer (either `Transport.HTTP` or `Transport.WebSocket`). For this `Maybe RemoteJoins` to be extracted from a parsed query, this PR also extends the `Execute.RemoteJoin.Collect` module, to implement "collection" from a selection set. Not only do those new functions extract the remote joins, but they also apply all necessary transformations to the selection sets (such as inserting the necessary "phantom" fields used as join keys). Finally in `Execute.RemoteJoin.Join`, we make two changes. First, we now always look for nested remote joins, regardless of whether the join we just performed went to a source or a remote schema; and second we adapt our join tree logic according to the special cases that were added to deal with remote server edge cases. Additionally, this PR refactors / cleans / documents `Execute.RemoteJoin.RemoteServer`. This is not required as part of this change and could be moved to a separate PR if needed (a similar cleanup of `Join` is done independently in #3894). It also introduces a draft of a new documentation page for this project, that will be refined in the release PR that ships the feature (either #3069 or a copy of it). While this PR extends the engine, it doesn't plug such relationships in the schema, meaning that, as of this PR, the new code paths in `Join` are technically unreachable. Adding the corresponding schema code and, ultimately, enabling the metadata API will be done in subsequent PRs. ### Keeping track of concrete type names The main change this PR makes to the existing `Join` code is to handle a new reserved field we sometimes use when targeting remote servers: the `__hasura_internal_typename` field. In short, a GraphQL selection set can sometimes "branch" based on the concrete "runtime type" of the object on which the selection happens: ```graphql query { author(id: 53478) { ... on Writer { name articles { title } } ... on Artist { name articles { title } } } } ``` If both of those `articles` are remote joins, we need to be able, when we get the answer, to differentiate between the two different cases. We do this by asking for `__typename`, to be able to decide if we're in the `Writer` or the `Artist` branch of the query. To avoid further processing / customization of results, we only insert this `__hasura_internal_typename: __typename` field in the query in the case of unions of interfaces AND if we have the guarantee that we will processing the request as part of the remote joins "folding": that is, if there's any remote join in this branch in the tree. Otherwise, we don't insert the field, and we leave that part of the response untouched. PR-URL: https://github.com/hasura/graphql-engine-mono/pull/3810 GitOrigin-RevId: 89aaf16274d68e26ad3730b80c2d2fdc2896b96c
2022-03-09 06:17:28 +03:00
data RemoteSchemaSelect r = RemoteSchemaSelect
{ _rselArgs :: [RemoteFieldArgument],
_rselResultCustomizer :: ResultCustomizer,
Enable remote joins from remote schemas in the execution engine. ### Description This PR adds the ability to perform remote joins from remote schemas in the engine. To do so, we alter the definition of an `ExecutionStep` targeting a remote schema: the `ExecStepRemote` constructor now expects a `Maybe RemoteJoins`. This new argument is used when processing the execution step, in the transport layer (either `Transport.HTTP` or `Transport.WebSocket`). For this `Maybe RemoteJoins` to be extracted from a parsed query, this PR also extends the `Execute.RemoteJoin.Collect` module, to implement "collection" from a selection set. Not only do those new functions extract the remote joins, but they also apply all necessary transformations to the selection sets (such as inserting the necessary "phantom" fields used as join keys). Finally in `Execute.RemoteJoin.Join`, we make two changes. First, we now always look for nested remote joins, regardless of whether the join we just performed went to a source or a remote schema; and second we adapt our join tree logic according to the special cases that were added to deal with remote server edge cases. Additionally, this PR refactors / cleans / documents `Execute.RemoteJoin.RemoteServer`. This is not required as part of this change and could be moved to a separate PR if needed (a similar cleanup of `Join` is done independently in #3894). It also introduces a draft of a new documentation page for this project, that will be refined in the release PR that ships the feature (either #3069 or a copy of it). While this PR extends the engine, it doesn't plug such relationships in the schema, meaning that, as of this PR, the new code paths in `Join` are technically unreachable. Adding the corresponding schema code and, ultimately, enabling the metadata API will be done in subsequent PRs. ### Keeping track of concrete type names The main change this PR makes to the existing `Join` code is to handle a new reserved field we sometimes use when targeting remote servers: the `__hasura_internal_typename` field. In short, a GraphQL selection set can sometimes "branch" based on the concrete "runtime type" of the object on which the selection happens: ```graphql query { author(id: 53478) { ... on Writer { name articles { title } } ... on Artist { name articles { title } } } } ``` If both of those `articles` are remote joins, we need to be able, when we get the answer, to differentiate between the two different cases. We do this by asking for `__typename`, to be able to decide if we're in the `Writer` or the `Artist` branch of the query. To avoid further processing / customization of results, we only insert this `__hasura_internal_typename: __typename` field in the query in the case of unions of interfaces AND if we have the guarantee that we will processing the request as part of the remote joins "folding": that is, if there's any remote join in this branch in the tree. Otherwise, we don't insert the field, and we leave that part of the response untouched. PR-URL: https://github.com/hasura/graphql-engine-mono/pull/3810 GitOrigin-RevId: 89aaf16274d68e26ad3730b80c2d2fdc2896b96c
2022-03-09 06:17:28 +03:00
_rselSelection :: SelectionSet r RemoteSchemaVariable,
_rselFieldCall :: NonEmpty FieldCall,
_rselRemoteSchema :: RemoteSchemaInfo
}
deriving (Show)
-------------------------------------------------------------------------------
-- Conversion back to a GraphQL document
-- | Converts a normalized selection set back into a selection set as defined in
-- GraphQL spec, in order to send it to a remote server.
--
-- This function expects a 'SelectionSet' for which @r@ is 'Void', which
-- guarantees that there is no longer any remote join field in the selection
-- set.
convertSelectionSet ::
forall var.
Eq var =>
SelectionSet Void var ->
G.SelectionSet G.NoFragments var
convertSelectionSet = \case
SelectionSetObject s -> convertObjectSelectionSet s
SelectionSetUnion s -> convertAbstractTypeSelectionSet s
SelectionSetInterface s -> convertAbstractTypeSelectionSet s
SelectionSetNone -> mempty
where
convertField :: Field Void var -> G.Field G.NoFragments var
convertField = \case
FieldGraphQL f -> convertGraphQLField f
convertObjectSelectionSet =
map (G.SelectionField . convertField . snd) . InsOrdHashMap.toList
convertAbstractTypeSelectionSet abstractSelectionSet =
let (base, members) = reduceAbstractTypeSelectionSet abstractSelectionSet
commonFields = convertObjectSelectionSet base
concreteTypeSelectionSets =
HashMap.toList members <&> \(concreteType, selectionSet) ->
G.InlineFragment
{ G._ifTypeCondition = Just concreteType,
G._ifDirectives = mempty,
G._ifSelectionSet = convertObjectSelectionSet selectionSet
}
in -- The base selection set first and then the more specific member
-- selection sets. Note that the rendering strategy here should be
-- inline with the strategy used in `mkAbstractTypeSelectionSet`
commonFields <> map G.SelectionInlineFragment concreteTypeSelectionSets
convertGraphQLField :: Eq var => GraphQLField Void var -> G.Field G.NoFragments var
convertGraphQLField GraphQLField {..} =
G.Field
{ -- add the alias only if it is different from the field name. This
-- keeps the outbound request more readable
G._fAlias = if _fAlias /= _fName then Just _fAlias else Nothing,
G._fName = _fName,
G._fArguments = _fArguments,
G._fDirectives = mempty,
G._fSelectionSet = convertSelectionSet _fSelectionSet
}
-- | Builds the selection set for an abstract type.
--
-- Let's consider this query on starwars API:
-- The type `Node` an interface is implemented by `Film`, `Species`, `Planet`,
-- `Person`, `Starship`, `Vehicle`
--
-- query f {
-- node(id: "ZmlsbXM6MQ==") {
-- __typename
-- id
-- ... on Film {
-- title
-- }
-- ... on Species {
-- name
-- }
-- }
-- }
--
-- When we parse this, it gets normalized into this query:
--
-- query f {
-- node(id: "ZmlsbXM6MQ==") {
-- ... on Film {
-- __typename: __typename
-- id
-- title
-- }
-- ... on Species {
-- __typename: __typename
-- id
-- name
-- }
-- ... on Planet {
-- __typename: __typename
-- id
-- }
-- ... on Person {
-- __typename: __typename
-- id
-- }
-- ... on Starship {
-- __typename: __typename
-- id
-- }
-- ... on Vehicle {
-- __typename: __typename
-- id
-- }
-- }
-- }
--
-- `__typename` and `id` get pushed to each of the member types. From the above
-- normalized selection set, we want to costruct a query as close to the
-- original as possible. We do this as follows:
--
-- 1. find the longest common set of fields that each selection set starts with
-- (in the above case, they are `__typename` and `id`)
-- 2. from the above list of fields, find the first field that cannot be
-- defined on the abstract type. The fields that can be defined on the
-- abstract type are all the fields that occur before the first non abstract
-- type field (in the above case, both` __typename` and `id` can be defined
-- on the `Node` type)
-- 3. Strip the base selection set fields from all the member selection sets and
-- filter out the member type selection sets that are subsumed by the base
-- selection set
--
-- The above query now translates to this:
--
-- query f {
-- node(id: "ZmlsbXM6MQ==") {
-- __typename: __typename
-- id
-- ... on Film {
-- title
-- }
-- ... on Species {
-- name
-- }
-- }
-- }
--
-- Note that it is not always possible to get the same shape as the original
-- query and there is more than one approach to this. For example, we could
-- have picked the selection set (that can be defined on the abstract type)
-- that is common across all the member selection sets and used that as the
-- base selection.
reduceAbstractTypeSelectionSet ::
(Eq var) =>
DeduplicatedSelectionSet Void var ->
(ObjectSelectionSet Void var, HashMap.HashMap G.Name (ObjectSelectionSet Void var))
reduceAbstractTypeSelectionSet (DeduplicatedSelectionSet baseMemberFields selectionSets) =
(baseSelectionSet, HashMap.fromList memberSelectionSets)
where
sharedSelectionSetPrefix = longestCommonPrefix $ map (InsOrdHashMap.toList . snd) $ HashMap.toList selectionSets
baseSelectionSet = InsOrdHashMap.fromList $ takeWhile (shouldAddToBase . snd) sharedSelectionSetPrefix
shouldAddToBase = \case
FieldGraphQL f -> Set.member (_fName f) baseMemberFields
memberSelectionSets =
-- remove member selection sets that are subsumed by base selection set
filter (not . null . snd) $
-- remove the common prefix from member selection sets
map (second (InsOrdHashMap.fromList . drop (InsOrdHashMap.size baseSelectionSet) . InsOrdHashMap.toList)) $
HashMap.toList selectionSets
-------------------------------------------------------------------------------
-- TH lens generation
$(makePrisms ''Field)
$(makeLenses ''GraphQLField)
$(makeLenses ''DeduplicatedSelectionSet)