114 Commits

Author SHA1 Message Date
P. C. Shyamshankar
3cc0b3e054 Add 1-ary and 2-ary function memoization machinery
Summary:
This revision generalizes the existing memoization framework to 1-ary and 2-ary
functions (namely functions of type (a -> GenHaxl u b) and (a -> b -> GenHaxl u c)).

For every supported arity (currently 0, 1, and 2), a family of functions {
newMemoWithX, prepareMemoX and runMemoX } is provided. newMemo itself is
generic across all arities.
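
As a rough illustration of memoizing per arity, here is a minimal sketch that uses plain IO in place of GenHaxl u. The names Memo0, Memo1, newMemo0, runMemo0, newMemo1, runMemo1 and runMemo2 are hypothetical and are not the functions added by this revision; the association-list table is likewise an assumption chosen to keep the example dependency-free.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef, modifyIORef')

-- Hypothetical stand-ins for memo references; IO replaces GenHaxl u.
newtype Memo0 a = Memo0 (IORef (Maybe a))
newtype Memo1 k a = Memo1 (IORef [(k, a)])

-- 0-ary: run the action once, then reuse the stored result.
newMemo0 :: IO (Memo0 a)
newMemo0 = Memo0 <$> newIORef Nothing

runMemo0 :: Memo0 a -> IO a -> IO a
runMemo0 (Memo0 ref) act = do
  cached <- readIORef ref
  case cached of
    Just v  -> pure v
    Nothing -> do
      v <- act
      writeIORef ref (Just v)
      pure v

-- 1-ary: keep one cached result per argument value.
newMemo1 :: IO (Memo1 k a)
newMemo1 = Memo1 <$> newIORef []

runMemo1 :: Eq k => Memo1 k a -> (k -> IO a) -> k -> IO a
runMemo1 (Memo1 ref) f k = do
  table <- readIORef ref
  case lookup k table of
    Just v  -> pure v
    Nothing -> do
      v <- f k
      modifyIORef' ref ((k, v) :)
      pure v

-- 2-ary: reuse the 1-ary table by keying on the argument pair.
runMemo2 :: (Eq a, Eq b) => Memo1 (a, b) c -> (a -> b -> IO c) -> a -> b -> IO c
runMemo2 memo f a b = runMemo1 memo (uncurry f) (a, b)
```

Calling runMemo1 twice with the same argument runs the underlying function only once; the second call is served from the table.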

Reviewed By: simonmar

Differential Revision: D3555791

fbshipit-source-id: 010a9889d42327607c8b03a5f7a609ee0c70de49
2016-07-25 06:16:28 -07:00
Kubo Kovac
d7483ccf6b fix haddock
Summary: fix code examples in haddock

Reviewed By: sykora

Differential Revision: D3605259

fbshipit-source-id: ff7bfa1cb6a1d592967c9b69ac5e37de95d1b502
2016-07-22 07:16:27 -07:00
P. C. Shyamshankar
bff7b643f5 Refactor cachedComputation to use newMemo/runMemo.
Summary:
This revision refactors cachedComputation to only contain logic relevant to
where the request-scope memo lives; memo creation and running logic is delegated
to newMemo(with) and runMemo.

Comments in cachedComputation have been moved over to newMemo/runMemo, and a
benchmark for cachedComputation has been added to monadbench. Surprisingly,
performance might have improved, albeit very slightly.

Reviewed By: simonmar

Differential Revision: D3514791

fbshipit-source-id: b2f0627824adc79b766e4f4e28c4af957ff00a00
2016-07-06 03:31:25 -07:00
Simon Marlow
549c14fb26 Track stack traces of dataFetch calls when profiling
Summary: This diff collects the stack traces of `dataFetch` calls, when `reportLevel` >= 5 and profiling is on.  Zero overhead for non-profiled code.

Reviewed By: niteria

Differential Revision: D2535947

fbshipit-source-id: fd43c20edd5455bd5e41113059fc69206b998e44
2016-07-05 02:31:24 -07:00
P. C. Shyamshankar
7cd98c4076 Add createMemo/updateMemo helpers, with monadbench test-case.
Summary:
This diff adds the createMemo and updateMemo helper functions, which abstract
the memoization reference management logic of cachedComputation. This separates
the work of *how* a memoized computation is created/updated, from *where* the
memo reference lives, allowing the same code to be used to manage request-scope
and feature-scope memos simultaneously.

A refactor of cachedComputation to use this abstraction is forthcoming.

Reviewed By: simonmar

Differential Revision: D3492803

fbshipit-source-id: 9dadd3860d5bec3bf776eef7c1bd610c25283729
2016-07-01 08:31:28 -07:00
Jonathan Coens
3d14694b22 Add allocation statistics per data source per round
Summary: Track and report allocation usage for data sources and rounds

Reviewed By: simonmar

Differential Revision: D3488169

fbshipit-source-id: 39c853ec89881e9f7d8d32b1a8d0a878c847a33e
2016-06-28 03:01:27 -07:00
Katie Ots
f1256b6ae7 Improve clarity in Haxl documentation
Summary: Improve clarity in Haxl documentation

Reviewed By: niteria

Differential Revision: D3462720

fbshipit-source-id: e73f149b05b87ea10ef43f051f88a269ca91ca66
2016-06-21 05:31:32 -07:00
Jon Coens
3a88f2a088 Merge pull request #50 from adarqui/typo
Just a typo fix
2016-06-21 09:13:19 +01:00
Andrew Darqui
a90363a03c typo fix 2016-06-21 02:48:30 -04:00
P. C. Shyamshankar
a7b3552d5c Add memoization benchmarks to monadbench
Summary:
Memoized operations were not represented in monadbench; this diff fixes that. Three tests are included:

1. Unmemoized computation, repeated N times.
2. Memoized computation, repeated N times.
3. Memoized computation, repeated N times **under different memo keys**

Reviewed By: simonmar

Differential Revision: D3444238

fbshipit-source-id: b2df534232acd5c02f9f6aea030c55d5cc846eb0
2016-06-20 02:46:24 -07:00
P. C. Shyamshankar
43f95ee605 Add memoHits to sigma profiling framework.
Summary: Integrate counts of memoized computation access into the profiling framework. Every call to `cachedComputation` logs one hit, including the first one.

Reviewed By: simonmar

Differential Revision: D3430491

fbshipit-source-id: a799c0e603c7bc94813da9801d7f4931a011131d
2016-06-15 03:31:34 -07:00
Andrew Farmer
922717e736 Add :profile command
Summary:
Add a :profile command to haxlsh to view lightweight profiling data.
Has optional flags to sort output on a given column, or filter rows by name
with a regex.

Useful for iterating while trying to squash allocation issues.

Differential Revision: D3429989

fbshipit-source-id: 3631afbac6f7a8580b1c46fae8039bacaa996ab3
2016-06-14 09:17:25 -07:00
Gergely Szilvasy
251ad5c3e5 Version bounds on binary package
Summary:
Putting version bounds on binary package in haxl.cabal
Closes https://github.com/facebook/Haxl/pull/48

Reviewed By: watashi

Differential Revision: D3371245

Pulled By: algoriddle

fbshipit-source-id: 574cbbe3ee081bd4e8af7c91fa89cf6ab6a03029
2016-06-07 15:16:29 -07:00
Gergely Szilvasy
4c5a07b98f Fix compilation error on 7.8.4 due to missing import
Summary: Adding missing import

Reviewed By: xich

Differential Revision: D3390552

fbshipit-source-id: 0a083bb9612f2923207880b813f10bf471af1497
2016-06-05 03:34:29 -07:00
Gergely Szilvasy
1893551564 Unbreak cabal test
Summary: Some tests were failing, but we ignored the test failures by not checking the return value from the test runner. This patch fixes both the test runner and the tests.

Reviewed By: watashi

Differential Revision: D3379609

fbshipit-source-id: 0a1278879faa5beb0f9779ddfaa622cdbf05a73f
2016-06-04 15:31:43 -07:00
Andrew Farmer
9d5db2ae63 Optimize Haxl Monad
Summary:
This diff does two things:

1. Claws back performance lost to lightweight profiling, and then some.
Haxl monad with lightweight profiling is now faster than it was before
lightweight profiling was added.

par1 and tree are ~20% faster.
seqr is ~10% faster.
par2 and seql are unchanged.

2. Eliminate redundant constraints on some exported functions.
Wherever types on exported functions changed, they became less
constrained with no loss of functionality. Notably, the *WithShow
functions no longer require pointless Show constraints.

Now the gory details:

Monadbench on master (before lightweight profiling):

  par1
  10000 reqs: 0.01s
  100000 reqs: 0.11s
  1000000 reqs: 1.10s
  par2
  10000 reqs: 0.02s
  100000 reqs: 0.41s
  500000 reqs: 2.02s
  seql
  10000 reqs: 0.04s
  100000 reqs: 0.50s
  500000 reqs: 2.65s
  seqr
  200000 reqs: 0.02s
  2000000 reqs: 0.19s
  20000000 reqs: 1.92s
  tree
  17 reqs: 0.48s
  18 reqs: 0.99s
  19 reqs: 2.04s

After D3316018, par1 and tree got faster (surprise win), but par2 got worse, and seql got much worse:

  par1
  10000 reqs: 0.01s
  100000 reqs: 0.08s
  1000000 reqs: 0.91s
  par2
  10000 reqs: 0.03s
  100000 reqs: 0.42s
  500000 reqs: 2.29s
  seql
  10000 reqs: 0.04s
  100000 reqs: 0.61s
  500000 reqs: 3.89s
  seqr
  200000 reqs: 0.02s
  2000000 reqs: 0.19s
  20000000 reqs: 1.83s
  tree
  17 reqs: 0.39s
  18 reqs: 0.77s
  19 reqs: 1.58s

Looked at the core (-ddump-prep) for Monad module.
Main observation is that GHC is really bad at optimizing the 'Request r a' constraint because it is a tuple.

To see why:

  f :: Request r a => ...
  f = ... g ... h ...

  g :: Show (r a) => ...
  h :: Request r a => ...

GHC will end up with something like:

  f $dRequest =
    let $dShow = case $dRequest of ... in
    let $dEq = case $dRequest of ... in
    ... etc for Typeable, Hashable, and the other Show ...
    let g' = g $dShow ... in
    let req_tup = ($dShow, $dEq, ... etc ...) in
    h req_tup ...

That is, it unboxes each of the underlying dictionaries lazily, even though it only needs the single Show dictionary.
It then reboxes them all in order to call 'h', meaning none of the unboxed ones are dead code.
I couldn't figure out how to get it to do the sane thing (unbox the one it needs and pass the original dictionary onwards).
We should investigate improving the optimizer.

To avoid the problem, I tightened up the constraints in several places to be only what is necessary (instead of all of Request).

Notably:

Removed Request constraint from ShowReq, as it was completely unnecessary.
All the *WithShow variants do not take Show constraints at all. Doing so seemed to violate their purpose.
The crucial *WithInsert functions only take the bare constraints they need, avoiding the reboxing.
Since *WithInsert are used by *WithShow, I had to explicitly pass a show function in places.
See Note [showFn] for an explanation.
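
The shape of the change described above can be sketched as follows. Req is a hypothetical stand-in for a tuple-style constraint such as 'Request r a', and the function names are illustrative only. Whether GHC actually reboxes the dictionaries depends on the compiler version and optimization settings, so this shows only the signatures, not a measured effect.

```haskell
{-# LANGUAGE ConstraintKinds #-}

-- Hypothetical tuple-style constraint, standing in for 'Request r a'.
type Req a = (Show a, Eq a)

-- Wide signature: demands the whole tuple although only Show is used.
describeWide :: Req a => a -> String
describeWide x = "request: " ++ show x

-- Narrow signature: asks only for the dictionary it needs.
describeNarrow :: Show a => a -> String
describeNarrow x = "request: " ++ show x

-- Explicit show function: no class constraint at all, in the spirit of
-- passing a show function to the *WithInsert functions.
describeWith :: (a -> String) -> a -> String
describeWith showFn x = "request: " ++ showFn x

-- A caller that holds the full constraint can still call all three.
logRequest :: Req a => a -> [String]
logRequest x = [describeWide x, describeNarrow x, describeWith show x]
```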

This gave us back quite a bit on seql, and a bit on seqr:

  par1
  10000 reqs: 0.01s
  100000 reqs: 0.08s
  1000000 reqs: 0.90s
  par2
  10000 reqs: 0.02s
  100000 reqs: 0.36s
  500000 reqs: 2.18s
  seql
  10000 reqs: 0.04s
  100000 reqs: 0.55s
  500000 reqs: 3.00s
  seqr
  200000 reqs: 0.02s
  2000000 reqs: 0.18s
  20000000 reqs: 1.73s
  tree
  17 reqs: 0.39s
  18 reqs: 0.79s
  19 reqs: 1.54s

Finally, addProfileFetch was getting inlined into dataFetchWithInsert.
This caused some let-bound stuff to float out and get allocated before the flag test.
Adding a NOINLINE prevented this, getting about 10% speedup on par2 and seql.
Doing the constraint work above enabled this, because otherwise the call to
addProfileFetch was creating the reboxing issue where it didn't exist before.

  par1
  10000 reqs: 0.01s
  100000 reqs: 0.08s
  1000000 reqs: 0.89s
  par2
  10000 reqs: 0.02s
  100000 reqs: 0.35s
  500000 reqs: 1.98s
  seql
  10000 reqs: 0.04s
  100000 reqs: 0.53s
  500000 reqs: 2.72s
  seqr
  200000 reqs: 0.02s
  2000000 reqs: 0.17s
  20000000 reqs: 1.67s
  tree
  17 reqs: 0.39s
  18 reqs: 0.82s
  19 reqs: 1.65s

Reviewed By: simonmar

Differential Revision: D3378141

fbshipit-source-id: 4b9dbe0c347f924805a7ed4c526c4e7c9aeef077
2016-06-04 15:20:42 -07:00
Andrew Farmer
b5a305b5c1 Count rounds/fetches for profiling labels.
Summary:
This collects the highest round in which a label adds a fetch, as well as
number of fetches per label per datasource. It reports these, along with
aggregated values with scuba sample of profiling data.

Aggregation for number of rounds is the maximum round of label or any of
label's children. Aggregation for number of fetches is sum.
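
The two aggregation rules can be sketched over a small label tree. The LabelStats type and its field names are hypothetical, and fetches are collapsed to a single count per label here rather than being kept per datasource.

```haskell
-- Hypothetical profile node: a label, its own numbers, and child labels.
data LabelStats = LabelStats
  { labelName   :: String
  , ownMaxRound :: Int          -- highest round in which this label added a fetch
  , ownFetches  :: Int          -- fetches issued directly under this label
  , children    :: [LabelStats]
  }

-- Rounds aggregate as the maximum over the label and all of its descendants.
aggRounds :: LabelStats -> Int
aggRounds l = maximum (ownMaxRound l : map aggRounds (children l))

-- Fetches aggregate as the sum over the label and all of its descendants.
aggFetches :: LabelStats -> Int
aggFetches l = ownFetches l + sum (map aggFetches (children l))
```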

Reviewed By: simonmar

Differential Revision: D3316018

fbshipit-source-id: 152690c7b8811d22f566437675c943f755029528
2016-06-04 15:20:42 -07:00
Gergely Szilvasy
19b024634b resolve test runner conflict
Summary: We use a FB-specific test runner in fbcode. As a result currently tests/Main.hs is different on github to allow 'cabal test' to pass. This diff resolves the difference by creating a common list of tests and two separate entry points for running the tests: tests/Main.hs for internal use, and tests/TestMain.hs for github. tests/Main.hs will (eventually) be excluded from the public sources.

Reviewed By: simonmar

Differential Revision: D3371609

fbshipit-source-id: 46a7382df814687230db43136acd496d0c5ebca9
2016-06-02 06:44:43 -07:00
Gergely Szilvasy
d7acb3421c Dummy get/setAllocationCounter for ghc < 7.10
Summary: Fix compilation errors since get/setAllocationCounter is not supported on ghc 7.8.4.

Reviewed By: xich

Differential Revision: D3371487

fbshipit-source-id: 33c41c12503b54eb7e4d82f1d987e089792b6a0f
2016-06-02 01:30:53 -07:00
P. C. Shyamshankar
2215e0bde0 Fix Haxl core/engine to work with -DEVENTLOG.
Summary:
Implementation of runHaxl gated behind -DEVENTLOG was slightly
bitrotted; this diff should fix that, and associated warnings and
errors.

Reviewed By: simonmar

Differential Revision: D3365756

fbshipit-source-id: 6632eff8fee29d431c1bc3cff55b74dc3ab0ae61
2016-06-02 01:30:53 -07:00
Gergely Szilvasy
c1638504d1 Add missing dependency in haxl.cabal
Summary: We use Data.Binary so the binary package is required for Haxl to build.

Reviewed By: niteria, kuk0

Differential Revision: D3365843

fbshipit-source-id: 4e1198a1d27dac91e6525fc163aaeb10f7a65972
2016-05-31 08:50:31 -07:00
Gergely Szilvasy
56612b9eba typo fix to eliminate fb and github diff 2016-05-31 08:08:55 -07:00
Simon Marlow
32eeb6879a Update haxl.cabal
Summary: We added a new module

Reviewed By: JonCoens

Differential Revision: D2254852

fbshipit-source-id: 223af03aeea4c1d24e098be70108d31cf0b3bb67
2016-05-31 07:25:22 -07:00
Zejun Wu
cf56b25aa5 Populate the number of failed data source requests to scuba
Summary:
This will be very helpful when investigating some bad requests or
infrastructure issues, especially when the exception is caught
by withDefault later so we cannot see them in sigma_feature_errors.

The cost is negligible, so

whynot

Reviewed By: simonmar

Differential Revision: D2548503

fbshipit-source-id: 8167ef536d201923f80793aca298cc6c5dff92d1
2016-05-31 06:47:52 -07:00
Noam Zilberstein
bab7a57f5c remove extra comment
Summary: someone added this useless comment. I'm going to delete it

Reviewed By: niteria, simonmar

Differential Revision: D3327964

fbshipit-source-id: 15c52b9497d70b360aeddf56fa4a402496949e43
2016-05-31 06:25:54 -07:00
Noam Zilberstein
2ec5ba1e21 kick sandcastle build
Summary: ignore this diff. just trying to kick a sandcastle build

Reviewed By: niteria

Differential Revision: D3327658

fbshipit-source-id: a774f53e54db5e7a29a3a6152dcf7fb2d62db4b0
2016-05-31 06:25:54 -07:00
Simon Marlow
f08a9d8803 Add support for disabling the cache
Summary:
Some people want to use Haxl for large batch jobs or long-running
computations.  For these use-cases, caching everything is not
practical, but we still want the batching behaviour that Haxl provides.

The new Flag field, caching, controls whether caching is enabled or
not.  If not, we discard the cache after each round.
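
To illustrate the effect of the flag, here is a toy model, not Haxl's implementation: a round is a list of request keys, the cache is a list of keys already fetched, and the functions runRound and totalFetches are hypothetical names. Duplicate keys within a round are fetched once in both modes, which stands in for batching.

```haskell
import Data.List (nub)

-- One round: fetch every distinct key that is not already cached.
-- Returns the updated cache and the number of fetches performed.
runRound :: [String] -> [String] -> ([String], Int)
runRound cache keys =
  let missing = [k | k <- nub keys, k `notElem` cache]
  in (cache ++ missing, length missing)

-- Total fetches over several rounds. With caching off, the cache is
-- discarded after each round, so later rounds refetch earlier keys.
totalFetches :: Bool -> [[String]] -> Int
totalFetches caching = go []
  where
    go _ [] = 0
    go cache (r : rs) =
      let (cache', n) = runRound cache r
          next = if caching then cache' else []
      in n + go next rs
```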

Reviewed By: niteria

Differential Revision: D3310266

fbshipit-source-id: 3fb707f77075dd42410bd50fc7465b18b216804e
2016-05-31 06:25:54 -07:00
Simon Marlow
9744e9e122 Fix profile test
Summary:
The value was too sensitive: it was different in the dbg way, and even
when running individual tests vs. all tests.

Reviewed By: codemiller

Differential Revision: D3310273

fbshipit-source-id: b15f81350f389888189409e7195486dc218f2573
2016-05-31 06:25:54 -07:00
Andrew Farmer
83f9fff80f Name MemoFingerprintKeys
Summary: Associate names with MemoFingerprintKeys. This adds two unboxed string fields to MemoFingerprintKey. (One for module name, one for feature name.) Each is a pointer to a statically allocated CString, which we can then turn into a Text for withLabel.

Reviewed By: simonmar

Differential Revision: D3257819

fbshipit-source-id: 6279103c0879
2016-05-31 06:11:23 -07:00
Andrew Farmer
9737719b7f Subtract nested allocations from lightweight profiling
Summary:
Does three things:

1. Fix order of arguments to `updEntry` to update the existing entry, instead of replacing it with 2*new.
2. Subtract nested allocations from parent.
3. Call setAllocationCounter after recording profiling data, so profiling overhead doesn't count towards the parent or the allocation limit.

Reviewed By: simonmar

Differential Revision: D3235413

fbshipit-source-id: a9f287399516fc90600b15a1524592f9c3b0674b
2016-05-31 04:05:05 -07:00
Simon Marlow
9e083231ce Unpack flags
Summary:
One more microoptimisation.  This one didn't give any measurable
benefit, but since we'll be checking the profile flag more often in
prod it seems like a good idea.

Reviewed By: JonCoens, zilberstein

Differential Revision: D3269656

fbshipit-source-id: bc3e0d6717497c463c18a92564a9d8c3a4c9646b
2016-05-31 04:05:05 -07:00
Simon Marlow
5fc8b4540a unpack IORefs in the Env
Summary: Should have done this ages ago.

Reviewed By: zilberstein

Differential Revision: D3269606

fbshipit-source-id: 3710558987e194ce91c50ac713ecc99dcef5e412
2016-05-31 04:05:05 -07:00
Simon Marlow
ab012df36e data -> newtype
Summary: I'm not sure why these were data.

Reviewed By: kuk0

Differential Revision: D3252631

fbshipit-source-id: 4a0158041ceed8cf8d3c15e80c21d95d1dcc5625
2016-05-31 04:05:05 -07:00
Bartosz Nitka
7f45a6b128 Implement fail for GenHaxl u
Summary:
`fail` is used for partial pattern match failures and currently
it throws a pure exception. Haxl has its own concept of catchable
exceptions. Lifting the exception into Haxl makes them catchable
which sometimes might be desirable. It also gives us a better way to
distinguish them from InternalErrors.
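
The idea can be sketched with a toy monad in place of GenHaxl u. Toy, catchToy and firstElem are hypothetical names, and the sketch assumes a GHC recent enough (8.8 or later) that MonadFail is exported from the Prelude; at the time of this commit `fail` was still a method of Monad.

```haskell
-- Toy monad with its own notion of a catchable failure.
newtype Toy a = Toy { runToy :: Either String a }

instance Functor Toy where
  fmap f (Toy e) = Toy (fmap f e)

instance Applicative Toy where
  pure = Toy . Right
  Toy f <*> Toy x = Toy (f <*> x)

instance Monad Toy where
  Toy e >>= k = Toy (e >>= runToy . k)

-- Lift pattern-match failure into the monad instead of throwing a pure
-- exception, so that it can be caught with catchToy.
instance MonadFail Toy where
  fail msg = Toy (Left msg)

catchToy :: Toy a -> (String -> Toy a) -> Toy a
catchToy (Toy (Left err)) handler = handler err
catchToy ok _ = ok

-- The partial pattern (x:_) desugars to a call to fail on the empty list.
firstElem :: [Int] -> Toy Int
firstElem xs = do
  (x : _) <- pure xs
  pure x
```

With this instance, `catchToy (firstElem []) (const (pure 0))` recovers with 0 rather than crashing.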

Reviewed By: zilberstein

Differential Revision: D3230341

fbshipit-source-id: 4f0d048e13348335b86dd0427877d3d3b63e2d87
2016-05-31 04:05:05 -07:00
Simon Marlow
f44e8798d7 POC: Add native Haxl monad profiling
Summary:
Given that Haskell profiling is really heavyweight and can't be used
in prod, I wanted to experiment with a lightweight profiling scheme in
the Haxl monad itself.

This should let us get per-policy profile data on a sampling basis
from production, but it won't be as detailed as full profiling - no
stack traces - but it does take into account memoization.

The idea is fairly simple:
* There's a notion of a current "label" (a Text).  Policies and
  memoized features are labels.
* Allocations are tracked and assigned to the current label.
* We also track dependencies: when one label depends on another, that
  is recorded.  This is so that we could account for memoization in
  post-processing of the data.
* We could also assign data fetches to the current label (not done
  yet).

When turned off, this only costs a simple test at each label site
(policy entry and memo point).

If we want to use this, there's a lot to do:
* Allocs include memoized dependencies right now (so they aren't
  correct); this is fixable, but it's a bit tricky to do it without
  breaking allocation limits, so I stopped to make a POC diff.
* Measure the overhead, both turned on and off.
* Include data-fetching
* Implement sampling in prod
* Figure out where to put the data, and post-process it, etc.

Reviewed By: xich

Differential Revision: D3201803

fbshipit-source-id: 623c2efd602fc071ac6d80501474c09c3cda5c18
2016-05-31 04:05:05 -07:00
Katie Ots
78de4f77d6 Enabling custom show functions for datasource requests and responses
Summary:
Make it possible to supply the show function used for a datasource request
and/or response in the cache. Previously, this was hardcoded to be `show`.

Reviewed By: simonmar

Differential Revision: D3226122

fbshipit-source-id: 90888db59a3f2f4b4c44e107cbc2391c958e470c
2016-05-31 04:05:05 -07:00
Simon Marlow
57deba8faf Move pure exceptions to NonHaxlException
Summary:
Previously all pure exceptions were mapped to CriticalError, but
that's not ideal because we want CriticalError to mean "something
internal went wrong in Haxl".

Putting pure errors in NonHaxlException will make them more
distinguishable in sigma_feature_errors.

Reviewed By: niteria

Differential Revision: D3201569

fbshipit-source-id: dcdcdf7ff00a39d7e4a3310474fbe04f5cb38f6a
2016-05-31 04:05:05 -07:00
Simon Marlow
56a738f65c Add new exception category: LogicBug
Summary:
There are some exceptions that are really bugs, such as passing
invalid parameters to functions, or violating invariants.  We want to
fix these bugs, but they are currently represented by exceptions in
LogicError, and are often caught.

Let's make a separate exception category, LogicBug, and start
migrating exceptions over.  This will expose bugs that need fixing, so
it will be a difficult migration.

This is step #1, introduce the new category.  Any new exceptions we
create that indicate client-code bugs should be a child of LogicBug.

Reviewed By: xich

Differential Revision: D3201590

fbshipit-source-id: 4fd9ed688e8d327b8fe9543dd6998d7efbc46c27
2016-05-31 04:05:05 -07:00
Katie Ots
b261f567c4 Parameterise dumpCacheAsHaskell with function name/type
Summary:
Parameterise dumpCacheAsHaskell with function name/type, and offer a more
appropriate type for si_sigma users so the output of `:cache dump` in haxlsh
will have a more fitting function signature.

Reviewed By: niteria

Differential Revision: D2982820

fbshipit-source-id: 0f5dc0049de4
2016-05-31 04:05:05 -07:00
Zejun Wu
d9b084b86c Do not require Show for memoFingerprintKey
Summary:
We don't need the memoized value to be showable. By removing this constraint,
we no longer require deriving `Show` for many user-defined data types
and we support values like `Haxl a` which don't have a proper `Show`
instance.

Differential Revision: D2793617

fbshipit-source-id: f4958ca83eda97d4a27f4e1544a1078039ce6875
2016-05-31 01:20:39 -07:00
Simon Marlow
4360502913 Attach a call stack to HaxlException
Summary:
When profiling is on (GHC's -prof flag) we can grab a call stack with
GHC.Stack.currentCallStack.  This diff attaches the call stack to
every HaxlException when profiling is on.

(this may have some runtime overhead, but it's only when profiling is
on, and if it turns out to be a problem for accurate profiling then we
can fix it by deferring rendering of the stack until we need to see
it).
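
A minimal sketch of capturing a stack at throw time, using GHC.Stack.currentCallStack as mentioned above. StackedError, throwWithStack and renderFailure are hypothetical names, not the HaxlException types. When the program is not built with profiling, currentCallStack is expected to return an empty list, so the stack field is simply empty.

```haskell
import Control.Exception (Exception, throwIO, try)
import GHC.Stack (currentCallStack)

-- Hypothetical exception type carrying the stack captured at throw time.
data StackedError = StackedError
  { errMessage :: String
  , errStack   :: [String]
  } deriving Show

instance Exception StackedError

-- Grab the current cost-centre stack and attach it to the exception.
throwWithStack :: String -> IO a
throwWithStack msg = do
  stack <- currentCallStack
  throwIO (StackedError msg stack)

-- Run an action and render any StackedError, one stack entry per line.
renderFailure :: IO a -> IO (Maybe String)
renderFailure act = do
  r <- try act
  pure $ case r of
    Left (StackedError msg stack) -> Just (unlines (msg : stack))
    Right _                       -> Nothing
```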

Reviewed By: zilberstein

Differential Revision: D2754599

fbshipit-source-id: d71f71c3f897aec7db92a3446cfc653158a1a6f4
2016-05-31 01:20:39 -07:00
Jake Lengyel
c4ca10b6ea Add (de)serialization functions to most datasources
Summary: Add serialization and deserialization functions to non-risk datasources that don't require special treatment (like CacheClient, etc), for use with D2628780

Reviewed By: simonmar

Differential Revision: D2645436

fbshipit-source-id: 2777dbfbb11528bc079e2e82e88ebbdc880a8914
2016-05-31 01:20:38 -07:00
Jake Lengyel
9d2d4bc746 Serialize Haxl datasource request cache
Summary: Implement the core mechanisms for (de)serializing the Haxl datasource cache, including a few datasources.

Reviewed By: simonmar

Differential Revision: D2628780

fbshipit-source-id: d0ab08e1b8f01bb2fe298eaef9ce22c2321d6965
2016-05-31 01:20:38 -07:00
Gergely Szilvasy
3e78756034 Remove TARGETS files 2016-05-27 08:22:40 -07:00
Simon Marlow
f09fc218c3 Merge branch 'master' of https://github.com/facebook/Haxl
* 'master' of https://github.com/facebook/Haxl:
  Try to compile with GHC 8.0.1
  Use hvr travis setup
  Update aeson and time
2016-05-23 10:36:40 +01:00
Simon Marlow
3c299cc1fd bump to 0.3.1.0 2016-05-23 10:36:23 +01:00
Zejun Wu
f577a3b645 Merge pull request #42 from phadej/aeson-0.11
Update aeson and improve travis setup
2016-03-17 14:10:11 -07:00
Oleg Grenrus
e5583aa23d Try to compile with GHC 8.0.1 2016-02-09 19:52:25 +02:00
Oleg Grenrus
4e7e04140d Use hvr travis setup 2016-02-09 16:19:27 +02:00
Oleg Grenrus
63a57a2c43 Update aeson and time 2016-02-09 15:51:34 +02:00