Before, we were slicing the original raw text and storing individual
variables with values corresponding to each field. This is avoidable
overhead.
With this patch, we store the offsets of the fields at construction
time and perform the slice when a property is accessed.
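The offset-based approach can be sketched roughly as follows. This is a minimal illustration, not the actual changelog code: the class name `LazyRevision`, its attributes, and the sample text are hypothetical, it uses str rather than bytes, and it assumes the first three lines hold the manifest, user, and date fields.

```python
class LazyRevision:
    """Hypothetical sketch: record field offsets up front, slice on access."""

    def __init__(self, text):
        self._text = text
        # Locate the first three newlines once; no substrings are created here.
        nl1 = text.index("\n")
        nl2 = text.index("\n", nl1 + 1)
        nl3 = text.index("\n", nl2 + 1)
        self._offsets = (nl1, nl2, nl3)

    @property
    def manifest(self):
        return self._text[: self._offsets[0]]

    @property
    def user(self):
        nl1, nl2, _ = self._offsets
        return self._text[nl1 + 1 : nl2]

    @property
    def dateline(self):
        _, nl2, nl3 = self._offsets
        return self._text[nl2 + 1 : nl3]


# Made-up sample entry for illustration only.
raw = "abc123\nmpm <mpm@example.com>\n1443000000 0\nfile.py\n\nfix a bug"
rev = LazyRevision(raw)
print(rev.user)
```

A consumer that only reads `user` never pays for slicing the other fields.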
This appears to show a very marginal performance win on its own and
the gains are so small as to not be worth reporting. However, this
patch marks the end of our parsing refactor, so it is worth reporting
the gains from the entire series:
author(mpm)
0.896565
0.795987 89%
desc(bug)
0.887169
0.803438 90%
date(2015)
0.878797
0.773961 88%
extra(rebase_source)
0.865446
0.761603 88%
author(mpm) or author(greg)
1.801832
1.576025 87%
author(mpm) or desc(bug)
1.812438
1.593335 88%
date(2015) or branch(default)
0.968276
0.875270 90%
author(mpm) or desc(bug) or date(2015) or extra(rebase_source)
3.656193
3.183104 87%
Pretty consistent speed-up across the board for any revset accessing
changelog revision data. Not bad!
It's also worth noting that PyPy appears to experience a similar to
marginally greater speed-up as well!
According to statprof, revsets accessing changelog revision data
are now clearly dominated by zlib decompression (16-17% of execution
time). Surprisingly, it appears the most expensive part of revision
parsing is the various text.index() calls to search for newlines!
These appear to cumulatively add up to 5+% of execution time. I
reckon implementing the parsing in C would make things marginally
faster.
If accessing larger strings (such as the commit message),
encoding.tolocal() is the most expensive procedure outside of
decompression.
Before, we first searched for the double newline as the first step in
the parse then moved to the front of the string and worked our way
to the back again. This made sense when we were splitting the raw
text on the double newline. But in our new newline scanning based
approach, this feels awkward.
This patch updates the parsing logic to parse the text linearly and
deal with the description field last.
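A rough sketch of a front-to-back scan that locates the description last. The function name `parse_linear`, the returned keys, and the sample texts are hypothetical; the field order (manifest, user, date line, files, blank line, description) is assumed, and str is used in place of bytes.

```python
def parse_linear(text):
    """Hypothetical sketch: scan front to back, description located last."""
    nl1 = text.index("\n")
    nl2 = text.index("\n", nl1 + 1)
    nl3 = text.index("\n", nl2 + 1)
    # If a newline directly follows the third one, the file list is empty
    # and the double newline has already been found.
    if text[nl3 + 1 : nl3 + 2] == "\n":
        doublenl = nl3
    else:
        doublenl = text.index("\n\n", nl3 + 1)
    files = text[nl3 + 1 : doublenl].split("\n") if doublenl > nl3 else []
    return {
        "manifest": text[:nl1],
        "user": text[nl1 + 1 : nl2],
        "dateline": text[nl2 + 1 : nl3],
        "files": files,
        "description": text[doublenl + 2 :],
    }


with_files = parse_linear("abc123\nmpm\n0 0\na.py\nb.py\n\nfix a bug")
no_files = parse_linear("abc123\nmpm\n0 0\n\nempty commit")
print(with_files["files"], no_files["files"])
```

Each search starts where the previous one stopped, so no part of the text is scanned twice.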
Because we're avoiding an extra string scan, revsets appear to
demonstrate a very slight performance win. But the percentage
change is marginal, so the numbers aren't worth reporting.
This is probably the most complicated patch in the parsing
refactor.
Because the date and extras are encoded in the same field, we
stuff the entire field into a dedicated variable and add a
property for accessing the sub-components of each. There is
some duplicated code here. But the code is relatively simple,
so it shouldn't be a big deal.
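To illustrate the shared field, here is a simplified sketch with one property per sub-component. The class name `DateExtraField` and the sample value are hypothetical, and the extras encoding is reduced to NUL-separated `key:value` pairs with no escaping, which is an assumption of this sketch rather than the real encoding.

```python
class DateExtraField:
    """Hypothetical sketch: one raw field, parsed separately per property."""

    def __init__(self, raw):
        # e.g. "1443000000 25200 branch:stable"
        self._raw = raw

    @property
    def date(self):
        # Only the first two space-separated items are interpreted here.
        fields = self._raw.split(" ", 2)
        return float(fields[0]), int(fields[1])

    @property
    def extra(self):
        # The split is duplicated on purpose: each property parses
        # only the part it needs.
        fields = self._raw.split(" ", 2)
        if len(fields) < 3:
            return {}
        return dict(item.split(":", 1) for item in fields[2].split("\0") if item)


field = DateExtraField("1443000000 25200 branch:stable\0rebase_source:abc")
print(field.date)
print(field.extra)
```

Accessing `extra` never converts the timestamp, and accessing `date` never builds the dictionary.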
We see revset performance wins across the board:
author(mpm)
0.896565
0.876713
0.822961
desc(bug)
0.887169
0.895514
0.847054
date(2015)
0.878797
0.820987
0.811613
extra(rebase_source)
0.865446
0.823811
0.797756
author(mpm) or author(greg)
1.801832
1.784160
1.668172
author(mpm) or desc(bug)
1.812438
1.822756
1.677608
date(2015) or branch(default)
0.968276
0.910981
0.896032
author(mpm) or desc(bug) or date(2015) or extra(rebase_source)
3.656193
3.516788
3.265024
We see a speed-up on revsets accessing date and extras because the new
parsing code only parses what you access. Even though they are stored
in the same text field, we avoid parsing dates when accessing extras and
vice-versa.
But strangely revsets accessing both date and extras appeared to speed
up as well! I'm not sure if this is due to refactoring the parsing
code or due to an optimization in revsets. You can't argue with the
results!
Same strategy as before.
Revsets not accessing the user demonstrate a slight performance win:
desc(bug)
0.887169
0.910400
0.895514
date(2015)
0.878797
0.870697
0.820987
extra(rebase_source)
0.865446
0.841644
0.823811
date(2015) or branch(default)
0.968276
0.945792
0.910981
Like the description, we store the raw bytes and convert from
hex on access.
This patch also marks the beginning of our new parsing method,
which is based on newline offsets and doesn't rely on
str.split().
Many revsets showed a performance improvement:
author(mpm)
0.896565
0.869085
0.868598
desc(bug)
0.887169
0.928164
0.910400
extra(rebase_source)
0.865446
0.871500
0.841644
author(mpm) or author(greg)
1.801832
1.791589
1.731503
author(mpm) or desc(bug)
1.812438
1.851003
1.798764
date(2015) or branch(default)
0.968276
0.974027
0.945792
Before, the description field was converted to a localstr at parse
time. With this patch, we store the raw description and convert to
a localstr when it is first accessed.
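A minimal sketch of deferring the conversion until access. The class name `Entry` and the `conversions` counter are hypothetical; `bytes.decode("utf-8")` merely stands in for the real conversion to a localstr, and caching the converted value is an assumption of this sketch.

```python
class Entry:
    """Hypothetical sketch: keep raw bytes, convert on first access."""

    def __init__(self, rawdesc):
        self._rawdesc = rawdesc  # stored as-is; no conversion at parse time
        self._desc = None
        self.conversions = 0     # only here to make the laziness observable

    @property
    def description(self):
        if self._desc is None:
            # Stand-in for the real local-encoding conversion.
            self._desc = self._rawdesc.decode("utf-8")
            self.conversions += 1
        return self._desc


entry = Entry(b"fix a bug")
before = entry.conversions
first = entry.description
second = entry.description
print(before, entry.conversions)
```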
We see a revset speedup for revsets that don't access the description:
author(mpm)
0.896565
0.914234
0.869085
date(2015)
0.878797
0.891980
0.862525
extra(rebase_source)
0.865446
0.912514
0.871500
author(mpm) or author(greg)
1.801832
1.860402
1.791589
date(2015) or branch(default)
0.968276
0.994673
0.974027
author(mpm) or desc(bug) or date(2015) or extra(rebase_source)
3.656193
3.721032
3.643593
As you can see, most of these revsets are already faster than before
this refactoring: we have already offset the performance
loss from the introduction of the new class representing parsed
changelog entries!
Currently, changelog entries are parsed into their respective
components at read time. Many operations are only interested
in a subset of fields of a changelog entry. The parsing and
storing of all the fields adds avoidable overhead.
This patch introduces the "changelogrevision" class. It takes
changelog raw text and exposes the parsed results as attributes.
The code for parsing changelog entries has been moved into its
construction function. changelog.read() has been modified to use
the new class internally while maintaining its existing API.
Future patches will make revision parsing lazy.
We implement the construction function of the new class with
__new__ instead of __init__ so we can use a named tuple to
represent the empty revision. This saves overhead and complexity
of coercing later versions of this class to represent an empty
instance.
While we are here, we add a method on changelog to obtain an
instance of the new type.
The overhead of constructing the new class regresses performance
of revsets accessing this data:
author(mpm)
0.896565
0.929984
desc(bug)
0.887169
0.935642 105%
date(2015)
0.878797
0.908094
extra(rebase_source)
0.865446
0.922624 106%
author(mpm) or author(greg)
1.801832
1.902112 105%
author(mpm) or desc(bug)
1.812438
1.860977
date(2015) or branch(default)
0.968276
1.005824
author(mpm) or desc(bug) or date(2015) or extra(rebase_source)
3.656193
3.743381
Once lazy parsing is implemented, these revsets will all be faster
than before. There is no performance change on revsets that do not
access this data. There /could/ be a performance regression on
operations that perform several changelog reads. However, I can't
think of anything outside of revsets and `hg log` (basically the
same as a revset) that would be impacted.
Currently, changelog reading decodes read values. This is wasteful
because a lot of times consumers aren't interested in some of these
values.
This patch changes description decoding to occur in changectx as
needed.
revsets reading changelog entries appear to speed up slightly:
revset #7: author(lmoscovicz)
plain
0) 0.906329
1) 0.872653
revset #8: author(mpm)
plain
0) 0.903478
1) 0.878037
revset #9: author(lmoscovicz) or author(mpm)
plain
0) 1.817855
1) 1.778680
revset #10: author(mpm) or author(lmoscovicz)
plain
0) 1.837052
1) 1.764568
This patch adds a new method "readfiles" to get the files modified by a
changeset. It extracts some logic from "read" to only return the files modified
by a changeset as efficiently as possible. This is used in the next patch to
speed up hg log <file|folder>
smartset sorting is lazy (so faster in some cases) and better (it records that
the set is sorted, allowing some optimisations). So we rely on it directly.
Some test outputs are updated because we now have more information (ordering).
The main goal of this patch series is to reduce the use of PyXxx() functions,
which are likely to require ugly error handling and inc/decref. Plus, this is
faster than using PySet_Contains().
revset #0: 0::tip
0) 0.004168
1) 0.003678 88%
This patch ignores out-of-range roots as they are in the pure implementation.
Because reachable sets are calculated from heads, and out-of-range heads raise
IndexError, we can just take out-of-range roots as unreachable. Otherwise,
the test of "hg log -Gr '. + wdir()'" would fail.
The "heads" argument is changed to a list. Should we rename the C function
since its signature has changed?
There are no remaining codepaths in reachableroots where it will
return None, so just trust it completely and simplify this method.
Result by revset
================
Revision:
0) Revision 462169b7f799: style: adjust whitespaces in webutil.py
1) Revision d1d91b8090c6: changelog: trust C implementation of reachableroots more
revset #0: 0::tip
plain
0) 0.067684
1) 0.006622 9%
revset #1: 0::@
plain
0) 0.068249
1) 0.009394 13%
IOW this is a 10x speedup in my repo for hg itself for 0::tip and
similar revsets now that the C code is correctly wired up.
This patch is part of a series of patches to speed up the computation of
revset.reachableroots by introducing a C implementation. The main motivation is
to speed up smartlog on big repositories. At the end of the series, on our big
repositories the computation of reachableroots is 10-50x faster and smartlog
is 2x-5x faster.
This patch allows us to call the new C implementation of reachableroots from
python by creating an entry point in the changelog class.
Since transactions are used for more than just changesets, it is possible
to have a transaction without new changesets at all. In this case no
'00changelog.i.a' is written. In all cases the 'changelog.readpending'
method is called if the repository has any pending data. The 'revlog' logic
provides empty content if the file is missing, so the whole operation
resulted in an empty changelog.
We now skip reading the pending file if it is missing.
Currently __contains__ is called only by the "rev()" revset, but "x in cl" is
an expression that is likely to be used in hot loops. revlog.__contains__ is
simple enough to duplicate in changelog, so just inline it.
Because revlog implements __iter__, "rev in revlog" works but unexpectedly does
a silly O(n) lookup. So it seems good to add a fast version of __contains__.
This allows "rev in repo.changelog" in the next patch.
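The O(n) fallback is ordinary Python behaviour: when a class defines __iter__ but not __contains__, the "in" operator iterates until it finds a match. A small sketch, with hypothetical class names and a range check that is assumed to be what "fast" means here:

```python
class SlowLog:
    """Hypothetical revlog-like object with __iter__ but no __contains__."""

    def __init__(self, count):
        self._count = count
        self.steps = 0  # counts how many revisions were visited

    def __len__(self):
        return self._count

    def __iter__(self):
        for rev in range(self._count):
            self.steps += 1
            yield rev


class FastLog(SlowLog):
    def __contains__(self, rev):
        # Constant-time range check instead of walking every revision.
        return 0 <= rev < len(self)


slow = SlowLog(1000)
fast = FastLog(1000)
found_slow = 999 in slow
found_fast = 999 in fast
print(slow.steps, fast.steps)
```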
The callback will likely need to perform some operation related to the
transaction (e.g. registering a file update). So we had better pass the current
transaction as the callback argument. Otherwise a callback that needs it has to
rely on a horrible weak reference trick.
This already allows us to slay a wild weak reference usage.
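A sketch of the calling convention. The `Transaction` class here is a hypothetical stand-in, and `add_callback`, `registerfile`, and `finalize_changelog` are made-up names, not the real transaction API. The point is only that the callback receives the transaction instead of holding its own reference to it.

```python
class Transaction:
    """Hypothetical stand-in for a transaction with close-time callbacks."""

    def __init__(self):
        self._callbacks = []
        self.registered = []

    def add_callback(self, callback):
        self._callbacks.append(callback)

    def registerfile(self, name):
        self.registered.append(name)

    def close(self):
        # Each callback gets the transaction as its argument.
        for callback in self._callbacks:
            callback(self)


def finalize_changelog(tr):
    # No weak reference needed: the transaction is handed in.
    tr.registerfile("00changelog.i")


tr = Transaction()
tr.add_callback(finalize_changelog)
tr.close()
print(tr.registered)
```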
The callback will likely need to perform some operation related to the
transaction (e.g. backing files up). So we had better pass the current
transaction as the callback argument. Otherwise a callback that needs it has to
rely on a horrible weak reference trick.
The first foreseen user of this is changelog._writepending. We would like it to
register the temporary file it creates for cleanup purposes.
Instead of calling 'cl.finalize()' by hand (possibly at a bogus time) we
register it in the transaction during 'delayupdate' and rely on 'tr.close()' to
call it at the right time.
The 'delayupdate' method now takes a transaction object and registers its
'_writepending' method for execution in 'transaction.writepending()'. The hook can then
use 'transaction.writepending()' directly.
At some point this will allow the addition of other file creation
during writepending.
The current way we use the 'delayupdate' mechanism is wrong. We call
'delayupdate' right after the transaction retrieval, then we call 'finalize'
right before calling 'tr.close()'. The 'finalize' call will -always- result in a
flush to disk, making the data available to all readers. But the 'tr.close()' may
be a no-op if the transaction is nested. This would result in data being:
1) exposed to readers too early,
2) rolled back by other parts of the transaction after such exposure.
So we need to end up in a situation where we call 'finalize' a single time when
the transaction actually closes. For this purpose we need to be able to call
'delayupdate' and '_writepending' multiple times and 'finalize' once. This was
not possible with the previous state of the code.
This changeset refactors the code to make this possible. We buffer data in memory
as much as possible and fall back to writing to a ".a" file after the first call
to '_writepending'.
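The intended call pattern (several 'delayupdate' and '_writepending' calls, a single 'finalize') can be modelled with a toy in-memory class. Everything here is a hypothetical simplification: `DelayedLog` uses lists in place of the real index and ".a" files, and the method bodies are assumptions, not the actual changelog code.

```python
class DelayedLog:
    """Hypothetical model of delayed writes; lists stand in for files."""

    def __init__(self):
        self.visible = []    # what readers of the real index would see
        self.pending = None  # stands in for the ".a" file; None = not written
        self._buffer = []    # in-memory data not yet written anywhere
        self._delayed = False

    def delayupdate(self):
        self._delayed = True  # safe to call repeatedly

    def add(self, entry):
        if not self._delayed:
            self.visible.append(entry)
        elif self.pending is not None:
            self.pending.append(entry)  # after the first writepending
        else:
            self._buffer.append(entry)  # buffer in memory while possible

    def writepending(self):
        if self._buffer:
            self.pending = self.visible + self._buffer
            self._buffer = []
        return self.pending is not None

    def finalize(self):
        if self.pending is not None:
            self.visible = self.pending  # akin to renaming ".a" into place
            self.pending = None
        else:
            self.visible.extend(self._buffer)
        self._buffer = []
        self._delayed = False


log = DelayedLog()
log.add("r0")
log.delayupdate()
log.add("r1")
log.writepending()
log.delayupdate()  # e.g. a nested caller
log.add("r2")
log.writepending()
before = list(log.visible)
log.finalize()
after = list(log.visible)
print(before, after)
```

Readers only ever see the new entries after the single 'finalize' call.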
5d1adb6683fa introduced use of the new filtering headrevs C implementation. It
caught TypeError to detect when to fall back to the implementation that was
compatible with old extensions. That method was however not reliable.
Instead, use the new headrevsfiltered function when passing a filter. It will
reliably fail with AttributeError when an old extension that predates
headrevsfiltered is used.
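A sketch of detecting the old extension through the missing attribute. `OldIndex`, `NewIndex`, `slow_heads`, and the linear five-revision history are hypothetical; only the method names headrevs and headrevsfiltered come from the text above.

```python
fallbacks = []  # records when the slow path was taken


class OldIndex:
    """Hypothetical old extension: linear history 0..4, no filtering support."""

    revs = range(5)

    def headrevs(self):
        return [max(self.revs)]


class NewIndex(OldIndex):
    def headrevsfiltered(self, filteredrevs):
        return [max(r for r in self.revs if r not in filteredrevs)]


def slow_heads(index, filteredrevs):
    fallbacks.append(type(index).__name__)
    return [max(r for r in index.revs if r not in filteredrevs)]


def headrevs(index, filteredrevs):
    if not filteredrevs:
        return index.headrevs()
    try:
        # Only the attribute lookup is guarded, so an AttributeError
        # raised inside the fast path itself is not swallowed.
        fast = index.headrevsfiltered
    except AttributeError:
        return slow_heads(index, filteredrevs)
    return fast(filteredrevs)


new_result = headrevs(NewIndex(), {4})
old_result = headrevs(OldIndex(), {4})
print(new_result, old_result, fallbacks)
```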
Previously, if there were filtered revs the repository could not use the C fast
path for computing the head revs in the changelog. This slowed down many
operations in large repositories.
This adds the ability to filter revs to the C fast path. This speeds up histedit
on repositories with filtered revs by 30% (13s to 9s). This could be improved
further by sorting the filtered revs and walking the sorted list while we walk
the changelog, but even this initial version that just calls __contains__ is
still massively faster.
The new C api is compatible for both new and old python clients, and the new
python client can call both new and old C apis.
The ``localrepo.writepending`` method uses the ``changelog._delaybuff``
attribute to know if it has anything to do. However, ``changelog._delaybuff``
is never initialised at ``__init__`` time. This can lead to a crash when using
bundle2 for parts that never touch the changelog.
We simply initialize it to its base value. This is scheduled for stable as it
is both trivial and blocking for experimenting with bundle2.
Just invoking "os.fstat()" with "file.fileno()" doesn't require a non-ANSI
file API, because the filename is not used for the invocation of
"os.fstat()".
But for portability, "util.fstat()" should invoke "os.stat()" with "fp.name"
if the file object doesn't have a "fileno()" method, and "fp.name" may cause
invocation of a non-ANSI file API.
So, this patch makes the constructor of the appender class invoke
"util.fstat()" via vfs, to encapsulate filename handling.
pprint ain't pretty in Python 2.4:
Changed in version 2.5: Dictionaries are sorted by key before the display is
computed; before 2.5, a dictionary was sorted only if its display required more
than one line, although that wasn’t documented.
Fixes issue introduced in 06396987f8e8.
The only way to access the branch of a changeset is currently to
create a changectx object and access its `branch()` method. Creating
a new Python object is costly and has a huge impact on code doing
heavy access to `branch()` (like branchmap).
This change introduces a new method on changelog that allows direct
access to the branch of a revision. See the next changeset for impact.
When committing to a repo with lots of history (>400000 changesets)
the filteredrevs check (added with 373606589de5) in changelog.py
takes a bit of time even if the filteredrevs set is empty. Skipping
the check in that case shaves 0.36 seconds off a 2.14 second commit.
A 17% gain.
This changeset allows a changelog object to be "filtered". You can assign a set of
revision numbers to the `changelog.filteredrevs` attribute. The changelog will
then pretend these revisions do not exist in this repo.
A few methods need to be altered to achieve this behavior:
- tip
- __iter__
- irevs
- hasnode
- headrevs
For consistency and to help debugging, the following methods are altered too.
Tests tend to show it's not necessary to alter them, but having them raise a proper
exception helps to detect bad access to filtered revisions.
- rev
- node
- linkrev
- parentrevs
- flags
The following methods would also need alteration for consistency purposes but
this is non-trivial and not done yet.
- nodemap
- strip
The C version of headrevs is not run if there is any revision to filter. It'll
need a proper rewrite later to restore performance.
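The filtering idea can be sketched with a toy class. `FilteredLog`, `tiprev`, the fake node strings, and the choice of IndexError are all hypothetical and only mimic the shape of the change: iteration and tip skip filtered revisions, while direct lookups of a filtered revision raise.

```python
class FilteredLog:
    """Hypothetical sketch of a changelog-like object honouring filteredrevs."""

    def __init__(self, count):
        self._count = count
        self.filteredrevs = frozenset()

    def __iter__(self):
        for rev in range(self._count):
            if rev not in self.filteredrevs:
                yield rev

    def tiprev(self):
        # Highest revision that is not filtered, or -1 if none is left.
        for rev in range(self._count - 1, -1, -1):
            if rev not in self.filteredrevs:
                return rev
        return -1

    def node(self, rev):
        if rev in self.filteredrevs:
            # Raising here helps detect bad access to filtered revisions.
            raise IndexError(rev)
        return "node%d" % rev


log = FilteredLog(5)
log.filteredrevs = frozenset({3, 4})
print(list(log), log.tiprev())
```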