Haxl/example/sql/readme.md
2014-06-11 15:02:15 +02:00


Solving the "N+1 Selects Problem" with Haxl

The so-called “N+1 selects problem” is characterized by a set of queries in a loop. To ape the example from Ollie Charles:

getAllUsernames = do
  userIds <- getAllUserIds
  for userIds $ \userId -> do
    getUsernameById userId

The IO version of this code would perform one data fetch for getAllUserIds, then another for each call to getUsernameById; assuming each one is implemented with something like a SQL select statement, that means “N+1 selects”.

But Haxl does not suffer from this problem. Using this very code, the Haxl implementation will perform exactly two data fetches: one for getAllUserIds and one with all the getUsernameById calls batched together.
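To make the difference in query counts concrete, here is a minimal sketch in plain IO that does not use Haxl at all. The in-memory table and the select* helpers are hypothetical stand-ins for a real database; each helper bumps a counter once to model one select.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import Data.Maybe (catMaybes, mapMaybe)

type Id = Int
type Name = String

-- Hypothetical in-memory table standing in for the SQL database.
table :: [(Id, Name)]
table = [(1, "alice"), (2, "bob"), (3, "carol")]

-- Each helper increments the counter once, modelling one select.
selectAllIds :: IORef Int -> IO [Id]
selectAllIds n = modifyIORef' n (+ 1) >> pure (map fst table)

selectNameById :: IORef Int -> Id -> IO (Maybe Name)
selectNameById n i = modifyIORef' n (+ 1) >> pure (lookup i table)

selectNamesByIds :: IORef Int -> [Id] -> IO [Name]
selectNamesByIds n is =
  modifyIORef' n (+ 1) >> pure (mapMaybe (`lookup` table) is)

-- The loop from the example: one select for the ids, then one per id.
naive :: IO ([Name], Int)
naive = do
  n <- newIORef 0
  ids <- selectAllIds n
  names <- mapM (selectNameById n) ids
  count <- readIORef n
  pure (catMaybes names, count)

-- The hand-batched version: one select for the ids, one for all names.
batched :: IO ([Name], Int)
batched = do
  n <- newIORef 0
  ids <- selectAllIds n
  names <- selectNamesByIds n ids
  count <- readIORef n
  pure (names, count)
```

With three rows in the table, naive reports 4 selects and batched reports 2. The rest of this document is about getting the second behaviour while writing code shaped like the first.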

First, a dash of boilerplate:

{-# LANGUAGE DeriveDataTypeable,
             GADTs,
             MultiParamTypeClasses,
             StandaloneDeriving,
             TypeFamilies #-}

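import Control.Concurrent.MVar (putMVar)
import Control.Monad (unless)
import Data.Hashable
import Data.List (intercalate)
import Data.Maybe (mapMaybe)
import Data.Traversable (for)
import Data.Typeable
import Haxl.Core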

The Request Type

First we make a data type with a constructor for each type of request.

data UserReq a where
  GetAllIds   :: UserReq [Id]
  GetNameById :: Id -> UserReq Name
  deriving (Typeable)

type Id = Int
type Name = String

deriving instance Eq (UserReq a)
instance Hashable (UserReq a) where
   hashWithSalt s GetAllIds       = hashWithSalt s (0::Int)
   hashWithSalt s (GetNameById a) = hashWithSalt s (1::Int, a)

deriving instance Show (UserReq a)
instance Show1 UserReq where show1 = show

This type is parameterized so that each request can indicate which type of result it returns. It is Typeable so that Haxl can safely store requests to multiple data sources at once, Eq and Hashable so that results can be cached, and Show so that requests can appear in debug output.
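As a small illustration of what the type parameter buys us, the sketch below answers requests from an in-memory list. The users list and the answer function are hypothetical and are not part of Haxl; the point is only that matching on a constructor tells the type checker what the result type is.

```haskell
{-# LANGUAGE GADTs #-}

type Id = Int
type Name = String

data UserReq a where
  GetAllIds   :: UserReq [Id]
  GetNameById :: Id -> UserReq Name

-- Hypothetical data standing in for a real backend.
users :: [(Id, Name)]
users = [(1, "alice"), (2, "bob")]

-- Each branch returns a different type, chosen by the constructor:
-- [Id] for GetAllIds, Name for GetNameById.
answer :: UserReq a -> a
answer GetAllIds       = map fst users
answer (GetNameById i) = maybe "<unknown>" id (lookup i users)
```

Here answer GetAllIds has type [Id] and answer (GetNameById 2) has type Name, with no casts or wrapper type needed.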

Making a Data Source from a Request Type

Now we make this an instance of Haxl's StateKey and DataSource classes. StateKey lets us associate a data source with its global state, to be initialized once. (We won't take advantage of this here.)

instance StateKey UserReq where
  data State UserReq = UserState {}

Every data source needs to tell Haxl its name, by giving an instance for the DataSourceName class:

instance DataSourceName UserReq where
  dataSourceName _ = "UserDataSource"

Next, DataSource lets us specify how a set of blocked requests is to be fetched. It is parameterized by the type of a user environment of cross-data-source global data, as well as a request type that is an instance of StateKey. It is defined as follows:

class (DataSourceName req, StateKey req, Show1 req) => DataSource u req where
  fetch
    :: State req           -- Data source state.
    -> Flags               -- Flags, containing tracing verbosity level, etc.
    -> u                   -- User environment for cross-data-source globals.
    -> [BlockedFetch req]  -- Set of blocked fetches to perform.
    -> PerformFetch        -- An action to perform the fetching.

We are mainly concerned with implementing the fetch method, and in this case we can ignore many of its parameters, which exist to support data sources more complex than ours. The key point is that Haxl gives us a list of all the requests that are currently waiting to be fetched, which means we can batch them together however we please.

Taking a look at the definition of BlockedFetch informs us how to implement the fetch method:

data BlockedFetch r = forall a. BlockedFetch (r a) (ResultVar a)
type ResultVar a = MVar (Either SomeException a)

A BlockedFetch pairs a request (of type r a) with an MVar that will hold either a SomeException (if fetching failed) or the result of the request. The role of fetch is to fill these MVars.
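The following sketch shows what filling these MVars can look like in isolation. It redeclares BlockedFetch and ResultVar locally, copied from the definitions quoted above, so that it runs without Haxl. The users list, fill, and demo are hypothetical names introduced only for illustration.

```haskell
{-# LANGUAGE ExistentialQuantification, GADTs #-}

import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, readMVar)
import Control.Exception (SomeException, toException)

type Id = Int
type Name = String

data UserReq a where
  GetAllIds   :: UserReq [Id]
  GetNameById :: Id -> UserReq Name

-- Local copies of the definitions quoted above.
data BlockedFetch r = forall a. BlockedFetch (r a) (ResultVar a)
type ResultVar a = MVar (Either SomeException a)

-- Hypothetical data standing in for a real backend.
users :: [(Id, Name)]
users = [(1, "alice"), (2, "bob")]

-- Fill one blocked fetch: Right on success, Left on failure.
fill :: BlockedFetch UserReq -> IO ()
fill (BlockedFetch GetAllIds var) = putMVar var (Right (map fst users))
fill (BlockedFetch (GetNameById i) var) =
  putMVar var $ case lookup i users of
    Just name -> Right name
    Nothing   -> Left (toException (userError ("no user with id " ++ show i)))

-- Build three blocked fetches, fill them, and read the results back.
demo :: IO ([Id], Name, Bool)
demo = do
  idsVar  <- newEmptyMVar
  nameVar <- newEmptyMVar
  missing <- newEmptyMVar
  mapM_ fill
    [ BlockedFetch GetAllIds idsVar
    , BlockedFetch (GetNameById 2) nameVar
    , BlockedFetch (GetNameById 99) missing
    ]
  ids    <- either (const []) id <$> readMVar idsVar
  name   <- either (const "") id <$> readMVar nameVar
  failed <- either (const True) (const False) <$> readMVar missing
  pure (ids, name, failed)
```

Note that every variable is filled, including the one for the unknown id. A variable left empty would leave its request waiting with no result.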

Implementing the fetch Method

Now, a data source can fetch data in one of two ways:

  • Synchronously: the fetching operation is an IO () that fetches all the data and then returns.

  • Asynchronously: we can do something else while the data is being fetched. The fetching operation takes an IO () as an argument, which is the operation to perform while the data is being fetched.

These are represented by the constructors of the PerformFetch type that fetch returns:

data PerformFetch
  = SyncFetch  (IO ())
  | AsyncFetch (IO () -> IO ())
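To see how the two constructors differ in practice, here is a sketch of one way a scheduler could combine them: every asynchronous fetch wraps the combined synchronous work, so the synchronous fetches run while the asynchronous ones are in flight. The schedule function and the logging demo are assumptions made for illustration and should not be read as Haxl's actual scheduler.

```haskell
import Data.IORef (modifyIORef', newIORef, readIORef)

-- Local copy of the type quoted above.
data PerformFetch
  = SyncFetch  (IO ())
  | AsyncFetch (IO () -> IO ())

-- Hypothetical scheduler: nest all async fetches around the sync ones.
schedule :: [PerformFetch] -> IO ()
schedule fetches = asyncs syncs
  where
    asyncs = foldr (.) id [f | AsyncFetch f <- fetches]
    syncs  = sequence_ [io | SyncFetch io <- fetches]

-- Record the order in which the pieces run.
demo :: IO [String]
demo = do
  logRef <- newIORef []
  let say s = modifyIORef' logRef (s :)
      fetches =
        [ SyncFetch (say "sync: fetched")
        , AsyncFetch (\inner -> do
            say "async: request sent"
            inner                     -- other work happens here
            say "async: response collected")
        ]
  schedule fetches
  reverse <$> readIORef logRef
```

In this sketch the log shows the asynchronous request being sent first, then the synchronous fetch, then the asynchronous response being collected.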

We will use SyncFetch here for simplicity. (Haxl also includes syncFetch and asyncFetch helper functions for implementing common fetch patterns.) Now, in the implementation of fetch, assuming we have some function sql for running SQL queries, we can do something like this:

-- We have no user environment, so we use ().
type Haxl = GenHaxl ()

instance DataSource () UserReq where
  fetch _state _flags _userEnv blockedFetches = SyncFetch $ do

    unless (null allIdVars) $ do

      -- Fetch all the IDs.
      allIds <- sql $ "select id from ids"

      -- Store the results.
      mapM_ (\m -> putMVar m (Right allIds)) allIdVars

    unless (null ids) $ do

      -- Fetch all the names.
      names <- sql $ unwords
        [ "select name from names where"
        , intercalate " or " $ map ("id = " ++) idStrings
        , "order by find_in_set(id, '" ++ intercalate "," idStrings ++ "')"
        ]

      -- Store the results.
      mapM_ (\ (m, res) -> putMVar m (Right res)) (zip vars names)

    where
    allIdVars :: [ResultVar [Id]]
    allIdVars = mapMaybe
      (\bf -> case bf of
        BlockedFetch GetAllIds m -> Just m
        _ -> Nothing)
      blockedFetches

    idStrings :: [String]
    idStrings = map show ids

    ids :: [Id]
    vars :: [ResultVar Name]
    (ids, vars) = unzip $ foldr go [] blockedFetches

    go
      :: BlockedFetch UserReq
      -> [(Id, ResultVar Name)]
      -> [(Id, ResultVar Name)]
    go (BlockedFetch (GetNameById userId) m) acc = (userId, m) : acc
    go _ acc = acc
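One caveat about the code above: pairing results with zip relies on the query returning exactly one row per requested id, in the order requested. If a row were missing, later names would be matched to the wrong variables and at least one variable would never be filled. A more defensive variant, sketched below, assumes a query that also returns the id (for example "select id, name from names where ...") and matches rows to variables by key. The fillByKey function and its demo are hypothetical, and ResultVar is redeclared locally from the definition quoted earlier so that the sketch runs without Haxl.

```haskell
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, readMVar)
import Control.Exception (SomeException, toException)

type Id = Int
type Name = String

-- Local copy of the definition quoted earlier.
type ResultVar a = MVar (Either SomeException a)

-- Match rows to waiting variables by id rather than by position.
-- Ids with no matching row get a Left, so every variable is filled.
fillByKey :: [(Id, Name)] -> [(Id, ResultVar Name)] -> IO ()
fillByKey rows = mapM_ fill
  where
    fill (userId, var) =
      putMVar var $ case lookup userId rows of
        Just name -> Right name
        Nothing   ->
          Left (toException (userError ("no name for id " ++ show userId)))

-- Rows arrive out of order and the row for id 2 is missing.
demo :: IO [Maybe Name]
demo = do
  v1 <- newEmptyMVar
  v2 <- newEmptyMVar
  v3 <- newEmptyMVar
  fillByKey [(3, "carol"), (1, "alice")] [(1, v1), (2, v2), (3, v3)]
  mapM (fmap (either (const Nothing) Just) . readMVar) [v1, v2, v3]
```

The list lookup is linear in the number of rows; for large batches a Map keyed by id would be the more usual choice.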

Tying it All Together

All that remains to make the original example work is to define getAllUserIds and getUsernameById using Haxl's dataFetch function.

getAllUserIds :: Haxl [Id]
getAllUserIds = dataFetch GetAllIds

getUsernameById :: Id -> Haxl Name
getUsernameById userId = dataFetch (GetNameById userId)

dataFetch simply takes a request to a data source and returns a GenHaxl action to fetch it concurrently with others.

dataFetch :: (DataSource u r, Request r a) => r a -> GenHaxl u a
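Where does the concurrency come from? Haxl batches requests that are combined applicatively, and for is built on Applicative. The toy model below is a sketch of that idea only; Fetch, getName, and runCounting are hypothetical and much simpler than Haxl's real GenHaxl. A computation is either done or blocked on a list of ids, and combining two blocked computations with <*> merges their ids into one round.

```haskell
import Data.Traversable (for)

type Id = Int
type Name = String

-- Toy stand-in for GenHaxl: finished, or blocked on some ids with a
-- continuation that resumes once the rows for those ids are available.
data Fetch a
  = Done a
  | Blocked [Id] ([(Id, Name)] -> Fetch a)

instance Functor Fetch where
  fmap f (Done a)        = Done (f a)
  fmap f (Blocked ids k) = Blocked ids (fmap f . k)

instance Applicative Fetch where
  pure = Done
  Done f          <*> x               = fmap f x
  Blocked ids k   <*> Done a          = Blocked ids (\rows -> k rows <*> Done a)
  -- The key case: both sides are blocked, so their ids are merged.
  Blocked ids1 k1 <*> Blocked ids2 k2 =
    Blocked (ids1 ++ ids2) (\rows -> k1 rows <*> k2 rows)

instance Monad Fetch where
  -- A bind cannot look past the first blocked computation.
  Done a        >>= f = f a
  Blocked ids k >>= f = Blocked ids (\rows -> k rows >>= f)

getName :: Id -> Fetch Name
getName i = Blocked [i] (\rows -> Done (maybe "<unknown>" id (lookup i rows)))

-- Hypothetical table standing in for the database.
table :: [(Id, Name)]
table = [(1, "alice"), (2, "bob"), (3, "carol")]

-- Run to completion, answering each Blocked with one "query",
-- and return the result together with the number of rounds.
runCounting :: Fetch a -> (a, Int)
runCounting = go 0
  where
    go n (Done a)        = (a, n)
    go n (Blocked ids k) =
      go (n + 1) (k [row | row@(i, _) <- table, i `elem` ids])

batched :: ([Name], Int)
batched = runCounting (for [1, 2, 3] getName)

sequential :: ([Name], Int)
sequential = runCounting (getName 1 >>= \a -> getName 2 >>= \b -> pure [a, b])
```

In this model, batched needs a single round for all three names, while sequential needs one round per bind, because the second request only exists after the first has returned.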

Like magic, the naïve code that looks like it will do N+1 fetches will now do just two.

getAllUsernames :: Haxl [Name]
getAllUsernames = do
  userIds <- getAllUserIds       -- Round 1
  for userIds $ \userId -> do    -- Round 2
    getUsernameById userId

The only change is that its type signature is now Haxl instead of IO, and at the top level we have to place a call to runHaxl:

main :: IO ()
main = do

  -- Initialize Haxl state.
  let stateStore = stateSet UserState{} stateEmpty

  -- Initialize Haxl environment.
  env0 <- initEnv stateStore ()

  -- Run action.
  names <- runHaxl env0 getAllUsernames

  print names