A few notes:
- The removal of the `preEmps` stuff from `findSchedulePrefix`:
This has been in dejafu from near the beginning, on the assumption
that it was an optimisation which would lead to a more efficient
exploration of the schedules. I can't really remember the original
justification. It is not an optimisation at all: it's actually a
huge pessimisation.
- The removal of the `pruneCommits` stuff from `sctBounded`:
Another algorithmic "optimisation" that is actually a large
pessimisation! Past-barrucadu should be ashamed. It's conceivable
that `pruneCommits` is a win in programs which have a large
amount of relaxed memory *and* synchronised stuff going on, but I
can't really think of such a use-case.
- The un-tupling of `dependency` and `updateDepState`:
This makes me sad. I find the tuples to be good extra
documentation in the types, as the things which were paired
logically go together and shouldn't be separated. However, the
tuples were constructed at the call-site, and deconstructed in the
body, so the allocation was just a waste.
- Catenating lists:
If the order doesn't matter, then it's better to put the shorter
list first, because `(++)` copies the spine of its first argument
and shares the second. Even better, rewrite the code to not
concat: can it operate on the list of lists directly?
- Data structures as control structures:
It's often said that in Haskell we use laziness to let us use data
structures as control structures. This is true, but don't use it
as an excuse to write really naive code and just trust the data
structures to magically make everything better for us!
- Not forcing full evaluation in the DPOR scheduler:
This was definitely a win at one time, avoiding a huge
multi-gigabyte space leak. Laziness is a fickle beast, however,
and this is no longer a win.
As the NFData instances are no longer used anywhere, they are
removed and the deepseq dependency is dropped.
- Inline inner loops:
If a small function is used as the inner "loop" of a recursive
definition, try sticking an `{-# INLINE ... #-}` pragma on
it. Allocation seems to go up a little but time tends to go down
more. (This may be an effect of compiling with profiling messing
up the inliner.)
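A rough sketch of three of the notes above (un-tupling, catenating
lists, inlining inner loops). Every name here is hypothetical and
made up for illustration; none of this is dejafu's actual code, and
whether each change pays off in a given program needs profiling.

```haskell
{-# LANGUAGE BangPatterns #-}

import Data.List (foldl')

-- Un-tupling. Hypothetical stand-in, NOT dejafu's real `dependency`.
-- Tupled form: the caller builds each pair only for the body to
-- take it apart again straight away.
dependsTupled :: (Int, Char) -> (Int, Char) -> Bool
dependsTupled (t1, a1) (t2, a2) = t1 /= t2 && a1 == a2

-- Un-tupled form: same logic, no intermediate pairs to allocate.
dependsFlat :: Int -> Char -> Int -> Char -> Bool
dependsFlat t1 a1 t2 a2 = t1 /= t2 && a1 == a2

-- Catenating lists. (++) copies the spine of its first argument and
-- shares the second, so when order does not matter the list expected
-- to be short goes on the left. (Hypothetical helper.)
addPending :: [a] -> [a] -> [a]
addPending new backlog = new ++ backlog

-- Summing via concat may build an intermediate flat list first...
sumConcat :: [[Int]] -> Int
sumConcat = foldl' (+) 0 . concat

-- ...whereas this walks the list of lists directly.
sumNested :: [[Int]] -> Int
sumNested = foldl' (\acc xs -> foldl' (+) acc xs) 0

-- Inlining an inner loop. `step` is the small function used inside
-- the recursive worker `go`, so it gets an INLINE pragma.
step :: Int -> Int -> Int
step acc x = acc + x * x
{-# INLINE step #-}

sumSquares :: [Int] -> Int
sumSquares = go 0
  where
    go !acc []     = acc
    go !acc (x:xs) = go (step acc x) xs
```

The tupled and flat versions compute the same thing; the only
difference is whether the call-site has to build pairs first.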
Net effect of this commit on the execution of dejafu-tests, compiled
with profiling, on my desktop (roughly 2.1x faster, with about 58%
less allocation):
Before:
> total time = 298.20 secs (298199 ticks @ 1000 us, 1 processor)
> total alloc = 111,135,182,008 bytes (excludes profiling overheads)
After:
> total time = 140.70 secs (140702 ticks @ 1000 us, 1 processor)
> total alloc = 46,653,034,384 bytes (excludes profiling overheads)
This makes `stepThread` messier, and doesn't actually prevent nesting
currently - although it does prevent usage when there are multiple
threads, which may be enough.