Summary:
Memoized operations were not represented in monadbench; this diff fixes that. Three tests are included:
1. Unmemoized computation, repeated N times.
2. Memoized computation, repeated N times.
3. Memoized computation, repeated N times **under different memo keys**.
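What the three shapes measure can be sketched in plain Haskell, using an IORef-backed cache as a stand-in for Haxl's real memoization (names here are illustrative, not the monadbench code):

```haskell
import Data.IORef
import qualified Data.Map.Strict as Map

-- Stand-in for memoization: cache results by key, computing only on a miss.
memoize :: IORef (Map.Map String Int) -> String -> IO Int -> IO Int
memoize cacheRef key compute = do
  cache <- readIORef cacheRef
  case Map.lookup key cache of
    Just v  -> return v
    Nothing -> do
      v <- compute
      modifyIORef' cacheRef (Map.insert key v)
      return v

main :: IO ()
main = do
  countRef <- newIORef (0 :: Int)
  let expensive = modifyIORef' countRef (+ 1) >> return (42 :: Int)
      n = 1000 :: Int
  -- 1. Unmemoized: the computation runs N times.
  mapM_ (const expensive) [1 .. n]
  -- 2. Memoized under one key: the computation runs once.
  cache1 <- newIORef Map.empty
  mapM_ (const (memoize cache1 "k" expensive)) [1 .. n]
  -- 3. Memoized under N distinct keys: the computation runs once per key.
  cache2 <- newIORef Map.empty
  mapM_ (\i -> memoize cache2 (show i) expensive) [1 .. n]
  readIORef countRef >>= print   -- 1000 + 1 + 1000 evaluations
```

Benchmark 2 captures the memoization win; benchmark 3 shows that distinct keys pay the full cost plus cache overhead.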
Reviewed By: simonmar
Differential Revision: D3444238
fbshipit-source-id: b2df534232acd5c02f9f6aea030c55d5cc846eb0
Summary: Integrate counts of memoized computation access into the profiling framework. Every call to `cachedComputation` logs one hit, including the first one.
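The bookkeeping can be sketched as follows (illustrative names, not the real profiling framework): every lookup bumps the label's hit count, whether or not it was the first (caching) call.

```haskell
import Data.IORef
import qualified Data.Map.Strict as Map

-- Bump the hit count for a memo label; the first call counts like any other.
recordHit :: IORef (Map.Map String Int) -> String -> IO ()
recordHit ref label = modifyIORef' ref (Map.insertWith (+) label 1)

main :: IO ()
main = do
  hits <- newIORef Map.empty
  mapM_ (const (recordHit hits "memo.example")) [1 .. 3 :: Int]
  readIORef hits >>= print
```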
Reviewed By: simonmar
Differential Revision: D3430491
fbshipit-source-id: a799c0e603c7bc94813da9801d7f4931a011131d
Summary:
Add a :profile command to haxlsh to view lightweight profiling data.
It has optional flags to sort output by a given column, or to filter rows by
name with a regex.
Useful for iterating while trying to squash allocation issues.
Differential Revision: D3429989
fbshipit-source-id: 3631afbac6f7a8580b1c46fae8039bacaa996ab3
Summary: Some tests were failing, but we ignored the test failures by not checking the return value from the test runner. This patch fixes both the test runner and the tests.
Reviewed By: watashi
Differential Revision: D3379609
fbshipit-source-id: 0a1278879faa5beb0f9779ddfaa622cdbf05a73f
Summary:
This diff does two things:
1. Claws back performance lost to lightweight profiling, and then some.
Haxl monad with lightweight profiling is now faster than it was before
lightweight profiling was added.
par1 and tree are ~20% faster.
seqr is ~10% faster.
par2 and seql are unchanged.
2. Eliminate redundant constraints on some exported functions.
Wherever types on exported functions changed, they became less
constrained with no loss of functionality. Notably, the *WithShow
functions no longer require pointless Show constraints.
Now the gory details:
Monadbench on master (before lightweight profiling):
par1
10000 reqs: 0.01s
100000 reqs: 0.11s
1000000 reqs: 1.10s
par2
10000 reqs: 0.02s
100000 reqs: 0.41s
500000 reqs: 2.02s
seql
10000 reqs: 0.04s
100000 reqs: 0.50s
500000 reqs: 2.65s
seqr
200000 reqs: 0.02s
2000000 reqs: 0.19s
20000000 reqs: 1.92s
tree
17 reqs: 0.48s
18 reqs: 0.99s
19 reqs: 2.04s
After D3316018, par1 and tree got faster (surprise win), but par2 got worse, and seql got much worse:
par1
10000 reqs: 0.01s
100000 reqs: 0.08s
1000000 reqs: 0.91s
par2
10000 reqs: 0.03s
100000 reqs: 0.42s
500000 reqs: 2.29s
seql
10000 reqs: 0.04s
100000 reqs: 0.61s
500000 reqs: 3.89s
seqr
200000 reqs: 0.02s
2000000 reqs: 0.19s
20000000 reqs: 1.83s
tree
17 reqs: 0.39s
18 reqs: 0.77s
19 reqs: 1.58s
Looked at the core (-ddump-prep) for Monad module.
Main observation is that GHC is really bad at optimizing the 'Request r a' constraint because it is a tuple.
To see why:
f :: Request r a => ...
f = ... g ... h ...
g :: Show (r a) => ...
h :: Request r a => ...
GHC will end up with something like:
f $dRequest =
let $dShow = case $dRequest of ... in
let $dEq = case $dRequest of ... in
... etc for Typeable, Hashable, and the other Show ...
let g' = g $dShow ... in
let req_tup = ($dShow, $dEq, ... etc ...) in
h req_tup ...
That is, it unboxes each of the underlying dictionaries lazily, even though it only needs the single Show dictionary.
It then reboxes them all in order to call 'h', meaning none of the unboxed ones are dead code.
I couldn't figure out how to get it to do the sane thing (unbox the one it needs and pass the original dictionary onwards).
We should investigate improving the optimizer.
To avoid the problem, I tightened up the constraints in several places to be only what is necessary (instead of all of Request).
Notably:
Removed Request constraint from ShowReq, as it was completely unnecessary.
All the *WithShow variants do not take Show constraints at all. Doing so seemed to violate their purpose.
The crucial *WithInsert functions only take the bare constraints they need, avoiding the reboxing.
Since *WithInsert are used by *WithShow, I had to explicitly pass a show function in places.
See Note [showFn] for an explanation.
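The shape of the fix can be sketched with a ConstraintKinds synonym standing in for Request (illustrative names only; the real synonym bundles Eq, Hashable, Typeable, and Show):

```haskell
{-# LANGUAGE ConstraintKinds #-}

-- Stand-in for Haxl's Request constraint tuple.
type Request r = (Eq r, Ord r, Show r)

-- Over-constrained: demands the whole tuple even though only Show is used,
-- so GHC unboxes every component dictionary and reboxes them for callees.
showReqBroad :: Request r => r -> String
showReqBroad = show

-- Tightened: asks only for what it uses, so callers pass the single Show
-- dictionary and nothing needs reboxing.
showReqTight :: Show r => r -> String
showReqTight = show

main :: IO ()
main = do
  putStrLn (showReqBroad (3 :: Int))
  putStrLn (showReqTight (4 :: Int))
```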
This gave us back quite a bit on seql, and a bit on seqr:
par1
10000 reqs: 0.01s
100000 reqs: 0.08s
1000000 reqs: 0.90s
par2
10000 reqs: 0.02s
100000 reqs: 0.36s
500000 reqs: 2.18s
seql
10000 reqs: 0.04s
100000 reqs: 0.55s
500000 reqs: 3.00s
seqr
200000 reqs: 0.02s
2000000 reqs: 0.18s
20000000 reqs: 1.73s
tree
17 reqs: 0.39s
18 reqs: 0.79s
19 reqs: 1.54s
Finally, addProfileFetch was getting inlined into dataFetchWithInsert.
This caused some let-bound stuff to float out and get allocated before the flag test.
Adding a NOINLINE prevented this, getting about 10% speedup on par2 and seql.
Doing the constraint work above enabled this, because otherwise the call to
addProfileFetches was creating the reboxing issue where it didn't exist before.
par1
10000 reqs: 0.01s
100000 reqs: 0.08s
1000000 reqs: 0.89s
par2
10000 reqs: 0.02s
100000 reqs: 0.35s
500000 reqs: 1.98s
seql
10000 reqs: 0.04s
100000 reqs: 0.53s
500000 reqs: 2.72s
seqr
200000 reqs: 0.02s
2000000 reqs: 0.17s
20000000 reqs: 1.67s
tree
17 reqs: 0.39s
18 reqs: 0.82s
19 reqs: 1.65s
Reviewed By: simonmar
Differential Revision: D3378141
fbshipit-source-id: 4b9dbe0c347f924805a7ed4c526c4e7c9aeef077
Summary:
This collects the highest round in which a label adds a fetch, as well as the
number of fetches per label per datasource. It reports these, along with
aggregated values, in the Scuba sample of profiling data.
The aggregated number of rounds is the maximum round of the label or any of
the label's children; the aggregated number of fetches is the sum.
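The two aggregation rules can be sketched over a hypothetical label tree:

```haskell
-- A label with its own max round, own fetch count, and child labels.
-- (Illustrative structure, not Haxl's actual profiling types.)
data LabelTree = LabelTree
  { rounds   :: Int
  , fetches  :: Int
  , children :: [LabelTree]
  }

-- Aggregated round is the max over the label and its children;
-- aggregated fetches is the sum.
aggregate :: LabelTree -> (Int, Int)
aggregate (LabelTree r f cs) =
  let sub = map aggregate cs
  in (maximum (r : map fst sub), f + sum (map snd sub))

main :: IO ()
main = do
  let t = LabelTree 1 2 [LabelTree 3 4 [], LabelTree 2 5 [LabelTree 5 1 []]]
  print (aggregate t)
```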
Reviewed By: simonmar
Differential Revision: D3316018
fbshipit-source-id: 152690c7b8811d22f566437675c943f755029528
Summary: We use a FB-specific test runner in fbcode. As a result currently tests/Main.hs is different on github to allow 'cabal test' to pass. This diff resolves the difference by creating a common list of tests and two separate entry points for running the tests: tests/Main.hs for internal use, and tests/TestMain.hs for github. tests/Main.hs will (eventually) be excluded from the public sources.
Reviewed By: simonmar
Differential Revision: D3371609
fbshipit-source-id: 46a7382df814687230db43136acd496d0c5ebca9
Summary: Fix compilation errors since get/setAllocationCounter is not supported on ghc 7.8.4.
Reviewed By: xich
Differential Revision: D3371487
fbshipit-source-id: 33c41c12503b54eb7e4d82f1d987e089792b6a0f
Summary:
Implementation of runHaxl gated behind -DEVENTLOG was slightly
bitrotted; this diff should fix that, and associated warnings and
errors.
Reviewed By: simonmar
Differential Revision: D3365756
fbshipit-source-id: 6632eff8fee29d431c1bc3cff55b74dc3ab0ae61
Summary: We use Data.Binary so the binary package is required for Haxl to build.
Reviewed By: niteria, kuk0
Differential Revision: D3365843
fbshipit-source-id: 4e1198a1d27dac91e6525fc163aaeb10f7a65972
Summary:
This will be very helpful when investigating some bad requests or
infrastructure issues, especially when the exception is caught
by withDefault later so we cannot see them in sigma_feature_errors.
The cost is negligible, so why not.
Reviewed By: simonmar
Differential Revision: D2548503
fbshipit-source-id: 8167ef536d201923f80793aca298cc6c5dff92d1
Summary: ignore this diff. just trying to kick a sandcastle build
Reviewed By: niteria
Differential Revision: D3327658
fbshipit-source-id: a774f53e54db5e7a29a3a6152dcf7fb2d62db4b0
Summary:
Some people want to use Haxl for large batch jobs or long-running
computations. For these use-cases, caching everything is not
practical, but we still want the batching behaviour that Haxl provides.
The new Flag field, caching, controls whether caching is enabled or
not. If not, we discard the cache after each round.
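A sketch of the flag's effect between rounds (illustrative names, not Haxl's internals): with the hypothetical `caching` flag off, the request cache is discarded at the end of each round.

```haskell
import Data.IORef
import qualified Data.Map.Strict as Map

-- Keep the request cache across rounds only when caching is enabled.
endOfRound :: Bool -> IORef (Map.Map String Int) -> IO ()
endOfRound caching cacheRef
  | caching   = return ()
  | otherwise = writeIORef cacheRef Map.empty

main :: IO ()
main = do
  cacheOff <- newIORef (Map.fromList [("req", 1 :: Int)])
  endOfRound False cacheOff          -- caching disabled: cache is discarded
  readIORef cacheOff >>= print . Map.size
  cacheOn <- newIORef (Map.fromList [("req", 1 :: Int)])
  endOfRound True cacheOn            -- caching enabled: cache survives
  readIORef cacheOn >>= print . Map.size
```

Batching within a round is unaffected; only cross-round reuse is given up.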
Reviewed By: niteria
Differential Revision: D3310266
fbshipit-source-id: 3fb707f77075dd42410bd50fc7465b18b216804e
Summary:
The value was too sensitive: it differed in the dbg way, and even between
running individual tests and running all of them.
Reviewed By: codemiller
Differential Revision: D3310273
fbshipit-source-id: b15f81350f389888189409e7195486dc218f2573
Summary: Associate names with MemoFingerprintKeys. This adds two unboxed string fields to MemoFingerprintKey. (One for module name, one for feature name.) Each is a pointer to a statically allocated CString, which we can then turn into a Text for withLabel.
Reviewed By: simonmar
Differential Revision: D3257819
fbshipit-source-id: 6279103c0879
Summary:
Does three things:
1. Fix order of arguments to `updEntry` to update the existing entry, instead of replacing it with 2*new.
2. Subtract nested allocations from parent.
3. Call setAllocationCounter after recording profiling data, so profiling overhead doesn't count towards the parent or the allocation limit.
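Fix 2 can be sketched with a simulated counter so the arithmetic is deterministic (the real code uses get/setAllocationCounter; `profile` and `spend` are illustrative names): each frame reports its total cost upward, and a parent subtracts what its children already accounted for.

```haskell
import Data.IORef

-- Returns (result, self-cost): the frame's total cost minus the cost its
-- nested profiled children already claimed.
profile :: IORef Int -> IORef Int -> IO a -> IO (a, Int)
profile counter nestedRef act = do
  before      <- readIORef counter
  outerNested <- readIORef nestedRef
  writeIORef nestedRef 0
  r     <- act
  after <- readIORef counter
  myNested <- readIORef nestedRef
  let total = after - before
      self  = total - myNested
  -- Report our total upward so the parent can subtract it from its own.
  writeIORef nestedRef (outerNested + total)
  return (r, self)

main :: IO ()
main = do
  counter <- newIORef 0
  nested  <- newIORef 0
  let spend n = modifyIORef' counter (+ n)   -- simulated allocation
  (_, selfOuter) <- profile counter nested $ do
    spend 10
    (_, selfInner) <- profile counter nested (spend 7)
    spend 3
    print selfInner          -- child's own cost
  print selfOuter            -- parent's cost minus the child's: 10 + 3
```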
Reviewed By: simonmar
Differential Revision: D3235413
fbshipit-source-id: a9f287399516fc90600b15a1524592f9c3b0674b
Summary:
One more microoptimisation. This one didn't give any measurable
benefit, but since we'll be checking the profile flag more often in
prod it seems like a good idea.
Reviewed By: JonCoens, zilberstein
Differential Revision: D3269656
fbshipit-source-id: bc3e0d6717497c463c18a92564a9d8c3a4c9646b
Summary: Should have done this ages ago.
Reviewed By: zilberstein
Differential Revision: D3269606
fbshipit-source-id: 3710558987e194ce91c50ac713ecc99dcef5e412
Summary:
`fail` is used for partial pattern match failures and currently
it throws a pure exception. Haxl has its own concept of catchable
exceptions. Lifting the exception into Haxl makes them catchable
which sometimes might be desirable. It also gives us a better way to
distinguish them from InternalErrors.
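The idea can be sketched with a minimal Either-based monad (not Haxl's GenHaxl): routing `fail` through the monad turns a pattern-match failure into a catchable value instead of an imprecise pure exception.

```haskell
-- A toy monad where 'fail' produces a Left we can inspect and catch.
newtype M a = M { runM :: Either String a }

instance Functor M where
  fmap f (M e) = M (fmap f e)

instance Applicative M where
  pure = M . Right
  M f <*> M x = M (f <*> x)

instance Monad M where
  M (Left e)  >>= _ = M (Left e)
  M (Right x) >>= k = k x

-- Do-notation routes failed pattern matches here (GHC >= 8.6 desugaring).
instance MonadFail M where
  fail msg = M (Left msg)

main :: IO ()
main =
  case runM (do Just x <- pure (Nothing :: Maybe Int)
                pure x) of
    Left _  -> putStrLn "caught pattern-match failure"
    Right _ -> putStrLn "unexpected success"
```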
Reviewed By: zilberstein
Differential Revision: D3230341
fbshipit-source-id: 4f0d048e13348335b86dd0427877d3d3b63e2d87
Summary:
Given that Haskell profiling is really heavyweight and can't be used
in prod, I wanted to experiment with a lightweight profiling scheme in
the Haxl monad itself.
This should let us get per-policy profile data on a sampling basis
from production. It won't be as detailed as full profiling (no stack
traces), but it does take memoization into account.
The idea is fairly simple:
* There's a notion of a current "label" (a Text). Policies and
memoized features are labels.
* Allocations are tracked and assigned to the current label.
* We also track dependencies: when one label depends on another, that
  is recorded, so that we can account for memoization in
  post-processing of the data.
* We could also assign data fetches to the current label (not done
yet).
When turned off, this only costs a simple test at each label site
(policy entry and memo point).
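A sketch of the flag-guarded label site (illustrative names; the real implementation lives in Haxl's profiling code). GHC's allocation counter counts down, so a label's allocations are (before - after):

```haskell
import Data.IORef
import Data.Int (Int64)
import qualified Data.Map.Strict as Map
import GHC.Conc (getAllocationCounter)

-- When profiling is off, the only cost is the Bool test at each site.
withLabel :: Bool -> IORef (Map.Map String Int64) -> String -> IO a -> IO a
withLabel profiling statsRef label act
  | not profiling = act
  | otherwise = do
      before <- getAllocationCounter
      r <- act
      after <- getAllocationCounter
      modifyIORef' statsRef (Map.insertWith (+) label (before - after))
      return r

main :: IO ()
main = do
  stats <- newIORef Map.empty
  _ <- withLabel True stats "policy.example" (return $! sum [1 .. 1000 :: Int])
  m <- readIORef stats
  putStrLn ("tracked labels: " ++ show (Map.keys m))
```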
If we want to use this, there's a lot to do:
* Allocs currently include memoized dependencies, so they aren't
  correct. This is fixable, but it's tricky to do without breaking
  allocation limits, so I stopped here to make a POC diff.
* Measure the overhead, both turned on and off.
* Include data-fetching
* Implement sampling in prod
* Figure out where to put the data, how to post-process it, etc.
Reviewed By: xich
Differential Revision: D3201803
fbshipit-source-id: 623c2efd602fc071ac6d80501474c09c3cda5c18
Summary:
Make it possible to supply the show function used for a datasource request
and/or response in the cache. Previously, this was hardcoded to be `show`.
Reviewed By: simonmar
Differential Revision: D3226122
fbshipit-source-id: 90888db59a3f2f4b4c44e107cbc2391c958e470c
Summary:
Previously all pure exceptions were mapped to CriticalError, but
that's not ideal because we want CriticalError to mean "something
internal went wrong in Haxl".
Putting pure errors in NonHaxlException will make them more
distinguishable in sigma_feature_errors.
Reviewed By: niteria
Differential Revision: D3201569
fbshipit-source-id: dcdcdf7ff00a39d7e4a3310474fbe04f5cb38f6a
Summary:
There are some exceptions that are really bugs, such as passing
invalid parameters to functions, or violating invariants. We want to
fix these bugs, but they are currently represented by exceptions in
LogicError, and are often caught.
Let's make a separate exception category, LogicBug, and start
migrating exceptions over. This will expose bugs that need fixing, so
it will be a difficult migration.
This is step #1, introduce the new category. Any new exceptions we
create that indicate client-code bugs should be a child of LogicBug.
Reviewed By: xich
Differential Revision: D3201590
fbshipit-source-id: 4fd9ed688e8d327b8fe9543dd6998d7efbc46c27
Summary:
Parameterise dumpCacheAsHaskell with function name/type, and offer a more
appropriate type for si_sigma users so the output of `:cache dump` in haxlsh
will have a more fitting function signature.
Reviewed By: niteria
Differential Revision: D2982820
fbshipit-source-id: 0f5dc0049de4
Summary:
We don't need memoized values to be showable. By removing this constraint,
we no longer require deriving `Show` for many user-defined data types,
and we can support values like `Haxl a` that don't have a proper `Show`
instance.
Differential Revision: D2793617
fbshipit-source-id: f4958ca83eda97d4a27f4e1544a1078039ce6875
Summary:
When profiling is on (GHC's -prof flag) we can grab a call stack with
GHC.Stack.currentCallStack. This diff attaches the call stack to
every HaxlException when profiling is on.
(this may have some runtime overhead, but it's only when profiling is
on, and if it turns out to be a problem for accurate profiling then we
can fix it by deferring rendering of the stack until we need to see
it).
Reviewed By: zilberstein
Differential Revision: D2754599
fbshipit-source-id: d71f71c3f897aec7db92a3446cfc653158a1a6f4
Summary: Add serialization and deserialization functions to non-risk datasources that don't require special treatment (like CacheClient, etc), for use with D2628780
Reviewed By: simonmar
Differential Revision: D2645436
fbshipit-source-id: 2777dbfbb11528bc079e2e82e88ebbdc880a8914
Summary: Implement the core mechanisms for (de)serializing the Haxl datasource cache, including a few datasources.
Reviewed By: simonmar
Differential Revision: D2628780
fbshipit-source-id: d0ab08e1b8f01bb2fe298eaef9ce22c2321d6965
Summary:
It isn't used by anything else in core, it's really a local thing that
we're using in our own data sources, so moving it out to avoid
confusing Haxl open-source users.
Test Plan: contbuild
Reviewed By: watashi@fb.com
Subscribers: anfarmer, kjm, jlengyel, watashi, smarlow, akr, bnitka, jcoens
FB internal diff: D2516897
Signature: t1:2516897:1444236927:0e6e204a0f4393e18931dea2e7ddba83e6123ba9
Summary:
Use the smart view technique from [1] to avoid quadratic behavior of
left-nested binds in the continuation of Blocked results. This makes the seql
benchmark linear (and quite fast).
(See before-and-after benchmark results in P20020178)
[1] http://dl.acm.org/citation.cfm?id=2784743
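The underlying issue is analogous to left-nested list append, where each step re-traverses the accumulated prefix. The smart-view technique of [1] keeps continuations in a structure with O(1) append that is viewed from the left on demand; a difference-list sketch of that append idea (an analogy, not the actual type-aligned sequence used in the diff):

```haskell
-- Left-nested (++) is O(n^2); composing functions is O(1) per append.
type DList a = [a] -> [a]

append :: DList a -> DList a -> DList a
append f g = f . g          -- constant time, no retraversal

toList :: DList a -> [a]
toList f = f []

main :: IO ()
main =
  -- Left-nested appends of 100000 singletons stay linear overall.
  print (length (toList (foldl append id (map (:) [1 .. 100000 :: Int]))))
```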
Test Plan:
Ran the Haxl core tests with `fbmake runtests`. Also compared the
microbenchmarks in MonadBench. (see P20020178)
End-to-end test results for top 25 contexts:
https://www.facebook.com/pxlcld/mPXZ
https://www.facebook.com/pxlcld/mPZ2
https://www.facebook.com/pxlcld/mPZ0
https://www.facebook.com/pxlcld/mPZ1
Based on those, not sure what to do. It does seem like a very slight improvement, but not strictly improving. Thoughts?
Reviewed By: smarlow@fb.com
Subscribers: ldbrandy, kjm, jlengyel, memo, watashi, smarlow, akr, bnitka, jcoens
FB internal diff: D2438616
Tasks: 8432911
Signature: t1:2438616:1442568800:bf2d1a18819c0ceeb6ea097379ec4bbc6364c0c2