Commit 9d5db2ae63

Summary: This diff does two things:

1. Claws back the performance lost to lightweight profiling, and then some. The Haxl monad with lightweight profiling is now faster than it was before lightweight profiling was added: par1 and tree are ~20% faster, seqr is ~10% faster, and par2 and seql are unchanged.

2. Eliminates redundant constraints on some exported functions. Wherever the types of exported functions changed, they became less constrained, with no loss of functionality. Notably, the *WithShow functions no longer require pointless Show constraints.

Now the gory details. MonadBench on master (before lightweight profiling):

```
par1    10000 reqs: 0.01s   100000 reqs: 0.11s   1000000 reqs: 1.10s
par2    10000 reqs: 0.02s   100000 reqs: 0.41s    500000 reqs: 2.02s
seql    10000 reqs: 0.04s   100000 reqs: 0.50s    500000 reqs: 2.65s
seqr   200000 reqs: 0.02s  2000000 reqs: 0.19s  20000000 reqs: 1.92s
tree       17 reqs: 0.48s       18 reqs: 0.99s        19 reqs: 2.04s
```

After D3316018, par1 and tree got faster (surprise win), but par2 got worse, and seql got much worse:

```
par1    10000 reqs: 0.01s   100000 reqs: 0.08s   1000000 reqs: 0.91s
par2    10000 reqs: 0.03s   100000 reqs: 0.42s    500000 reqs: 2.29s
seql    10000 reqs: 0.04s   100000 reqs: 0.61s    500000 reqs: 3.89s
seqr   200000 reqs: 0.02s  2000000 reqs: 0.19s  20000000 reqs: 1.83s
tree       17 reqs: 0.39s       18 reqs: 0.77s        19 reqs: 1.58s
```

I looked at the Core (-ddump-prep) for the Monad module. The main observation is that GHC is really bad at optimizing the 'Request r a' constraint, because it is a tuple. To see why, consider:

```haskell
f :: Request r a => ...
f = ... g ... h ...

g :: Show (r a) => ...
h :: Request r a => ...
```

GHC will end up with something like:

```
f $dRequest =
  let $dShow = case $dRequest of ... in
  let $dEq   = case $dRequest of ... in
  ... etc. for Typeable, Hashable, and the other Show ...
  let g' = g $dShow ... in
  let req_tup = ($dShow, $dEq, ... etc ...) in
  h req_tup ...
```

That is, it unboxes each of the underlying dictionaries lazily, even though it only needs the single Show dictionary. It then reboxes them all in order to call 'h', so none of the unboxed ones are dead code. I couldn't figure out how to get it to do the sane thing (unbox the one it needs and pass the original dictionary onwards). We should investigate improving the optimizer.

To avoid the problem, I tightened up the constraints in several places to be only what is necessary (instead of all of Request). Notably:

- Removed the Request constraint from ShowReq, as it was completely unnecessary.
- The *WithShow variants no longer take Show constraints at all; taking them seemed to violate their purpose.
- The crucial *WithInsert functions take only the bare constraints they need, avoiding the reboxing. Since *WithInsert is used by *WithShow, I had to explicitly pass a show function in places. See Note [showFn] for an explanation.

This gave us back quite a bit on seql, and a bit on seqr:

```
par1    10000 reqs: 0.01s   100000 reqs: 0.08s   1000000 reqs: 0.90s
par2    10000 reqs: 0.02s   100000 reqs: 0.36s    500000 reqs: 2.18s
seql    10000 reqs: 0.04s   100000 reqs: 0.55s    500000 reqs: 3.00s
seqr   200000 reqs: 0.02s  2000000 reqs: 0.18s  20000000 reqs: 1.73s
tree       17 reqs: 0.39s       18 reqs: 0.79s        19 reqs: 1.54s
```

Finally, addProfileFetch was getting inlined into dataFetchWithInsert, which caused some let-bound values to float out and get allocated before the flag test. Adding a NOINLINE pragma prevented this, giving about a 10% speedup on par2 and seql. The constraint work above is what enabled this change; otherwise the call to addProfileFetch would have created the reboxing issue where it didn't exist before.

```
par1    10000 reqs: 0.01s   100000 reqs: 0.08s   1000000 reqs: 0.89s
par2    10000 reqs: 0.02s   100000 reqs: 0.35s    500000 reqs: 1.98s
seql    10000 reqs: 0.04s   100000 reqs: 0.53s    500000 reqs: 2.72s
seqr   200000 reqs: 0.02s  2000000 reqs: 0.17s  20000000 reqs: 1.67s
tree       17 reqs: 0.39s       18 reqs: 0.82s        19 reqs: 1.65s
```

Reviewed By: simonmar

Differential Revision: D3378141

fbshipit-source-id: 4b9dbe0c347f924805a7ed4c526c4e7c9aeef077
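The narrowing pattern can be seen in isolation. Below is a minimal, self-contained sketch (not Haxl's real code): a tuple constraint built with ConstraintKinds stands in for `Request r a`, using base-library classes only, and the function names are hypothetical. Narrowing a signature to the one superclass actually used, or dropping the constraint and passing the function explicitly (the Note [showFn] pattern), both avoid demanding the whole tuple dictionary.

```haskell
{-# LANGUAGE ConstraintKinds #-}

-- Stand-in for Haxl's 'Request r a' tuple constraint, built from base
-- classes only (the real one also carries Typeable, Hashable, ...).
type Request a = (Eq a, Ord a, Show a)

-- Over-constrained: demands the whole tuple, so GHC unboxes every member
-- dictionary and must rebox them for any Request-constrained callee.
showReqLoose :: Request a => a -> String
showReqLoose r = "request: " ++ show r

-- Tightened: only the constraint actually used.
showReqTight :: Show a => a -> String
showReqTight r = "request: " ++ show r

-- The Note [showFn] pattern: no constraint at all; the caller passes
-- the rendering function explicitly, as the *WithShow variants do.
showReqWith :: (a -> String) -> a -> String
showReqWith showFn r = "request: " ++ showFn r

main :: IO ()
main = do
  putStrLn (showReqTight (42 :: Int))      -- prints: request: 42
  putStrLn (showReqWith show (42 :: Int))  -- prints: request: 42
```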
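The NOINLINE trick is also easy to demonstrate standalone. The sketch below uses illustrative names, not Haxl's: the pragma keeps the profiling work out of line, so its allocations cannot be floated out above the cheap flag test in the caller.

```haskell
-- Illustrative stand-in for addProfileFetch. If this were inlined into
-- its caller, its let-bound structures could be floated out and
-- allocated even when profiling is off; NOINLINE keeps the work (and
-- its allocation) behind the flag test.
{-# NOINLINE recordProfile #-}
recordProfile :: Int -> [(String, Int)] -> [(String, Int)]
recordProfile n stats = ("fetch", n) : stats

-- The caller tests the flag first; only the enabled branch pays.
profileFetch :: Bool -> Int -> [(String, Int)] -> [(String, Int)]
profileFetch enabled n stats
  | enabled   = recordProfile n stats
  | otherwise = stats

main :: IO ()
main = print (profileFetch True 3 [])  -- prints: [("fetch",3)]
```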
Haxl
Haxl is a Haskell library that simplifies access to remote data, such as databases or web-based services. Haxl can automatically
- batch multiple requests to the same data source,
- request data from multiple data sources concurrently,
- cache previous requests.
Having all of this handled for you behind the scenes means that your data-fetching code can be much cleaner and clearer than it would be if it had to worry about optimizing data-fetching itself. We'll give some examples of how this works in the pages linked below.
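To make the batching idea concrete, here is a drastically simplified, self-contained model of the mechanism — not Haxl's real API, and all names here are illustrative. Independent requests composed with `<*>` accumulate into one batch, which the "data source" answers in a single round.

```haskell
-- A toy round-based applicative: 'Blocked' carries the pending request
-- keys plus a continuation expecting the batch of answers.
data Fetch a
  = Done a
  | Blocked [Int] ([(Int, String)] -> Fetch a)

instance Functor Fetch where
  fmap f (Done a)       = Done (f a)
  fmap f (Blocked rs k) = Blocked rs (fmap f . k)

instance Applicative Fetch where
  pure = Done
  Done f         <*> x              = fmap f x
  Blocked rs k   <*> Done a         = Blocked rs (fmap ($ a) . k)
  Blocked rs1 k1 <*> Blocked rs2 k2 =
    Blocked (rs1 ++ rs2) (\ans -> k1 ans <*> k2 ans)  -- merge batches

-- Request the value for one key; served from the round's answers.
dataFetch :: Int -> Fetch String
dataFetch i = Blocked [i] (\ans -> Done (maybe "?" id (lookup i ans)))

-- Run to completion, counting rounds; 'src' plays the data source.
runFetch :: (Int -> String) -> Fetch a -> (Int, a)
runFetch src = go 0
  where
    go n (Done a)       = (n, a)
    go n (Blocked rs k) = go (n + 1) (k [ (r, src r) | r <- rs ])

main :: IO ()
main = print (runFetch (\i -> "user" ++ show i)
                       ((,) <$> dataFetch 1 <*> dataFetch 2))
-- prints: (1,("user1","user2"))
```

Note that the two fetches complete in a single round: the round count is 1, not 2. This is the essence of what real Haxl does behind the scenes, with caching and concurrency layered on top.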
There are two Haskell packages here:
- haxl: The core Haxl framework
- haxl-facebook (in example/facebook): An (incomplete) example data source for accessing the Facebook Graph API
To use Haxl in your own application, you will likely need to build one or more
data sources: the thin layer between Haxl and the data that you want
to fetch, be it a database, a web API, a cloud service, or whatever.
The haxl-facebook package shows how we might build a Haxl data source based on the existing fb package for talking to the Facebook Graph API.
Where to go next?
- The Story of Haxl explains how Haxl came about at Facebook, and discusses our particular use case.
- An example Facebook data source walks through building an example data source that queries the Facebook Graph API concurrently.
- The N+1 Selects Problem explains how Haxl can address a common performance problem with SQL queries by automatically batching multiple queries into a single query, completely invisibly to the programmer.
- Haxl Documentation on Hackage.
- There is no Fork: An Abstraction for Efficient, Concurrent, and Concise Data Access, our paper on Haxl, accepted for publication at ICFP'14.