Following on from Jun Wu's patch last October[1], we have found that using mmap
for the revlog index in repos with large revlogs gives a noticable performance
improvment (~110ms on each hg invocation), particularly for commands that don't
touch the index very much.
This changeset adds this as an option, activated by a new experimental config
option so that it can be enabled on a per-repo basis. The configuration option
specifies an index size threshold at which Mercurial will switch to using mmap
to access the index.
If the configuration option is not specified, the default remains to load the
full file, which seems to be the best option for smaller repos.
Some initial performance numbers for average of 5 invocations of `hg log -l 5`
for different cache states:
| Repo: | HG | FB |
|---|---|---|
| Index size: | 2.3MB | much bigger |
| read (warm): | 237ms | 432ms |
| mmap (warm): | 227ms | 321ms |
| | (-3%) | (-26%) |
| read (cold): | 397ms | 696ms |
| mmap (cold): | 410ms | 888ms |
| | (+3%) | (+28%) |
[1] https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-October/088737.html
Test Plan:
`hg log --config experimental.mmapindex=true`
Differential Revision: https://phab.mercurial-scm.org/D477
The recent e85296920485 patch removed the linkmapper argument from addgroup, as
part of trying to make addgroup more agnostic from the changegroup format. It
turns out that the changegroup can't resolve linkrevs while iterating over the
deltas, because applying the deltas might affect the linkrev resolution. For
example, when applying a series of changelog entries, the linkmapper just
returns len(cl). If we're iterating over the deltas without applying them to the
changelog, this results in incorrect linkrevs. This was caught by the hgsql
extension, which reads the revisions before applying them.
The fix is to return linknodes as part of the delta iterator, and let the
consumer choose what to do.
Differential Revision: https://phab.mercurial-scm.org/D730
It can be useful for extensions to be able to produce the shortest
unambiguous hash (including the in-tree "show" extension). That logic
is currently inside the shortest() template function. Let's move it
out of the templater. I've put it on revlog since it's closely related
to revlog._partialmatch. We may also want a convenience method on
context, but I'll leave that for a later patch.
Differential Revision: https://phab.mercurial-scm.org/D724
Previously revlog.addgroup would accept a changegroup and a linkmapper and use
it to iterate of the deltas. As part of untangling the revlog-changegroup
interdependency, let's move the changegroup delta iteration logic to it's own
function and pass the simple iterator to the revlog instead.
This will make it easier to introduce non-revlogs stores in the future, without
reinventing any changegroup specific logic.
Differential Revision: https://phab.mercurial-scm.org/D688
Previously the addgroup loop would set chain to be the result of
self._addrevision(node,...). Since _addrevision now always returns the passed in
node, we can drop that behavior and just always set chain = node in the loop.
This will be useful in a future patch where we refactor the cg.deltachunk logic
to another function and therefore chain disappears entirely from this function.
Differential Revision: https://phab.mercurial-scm.org/D699
My repo got corrupted yesterday by something that ended up writing the
null revision to the revlog (nullid hash, not nullrev index, of
course). We use many extensions internally (narrowhg, remotefilelog,
evolve, internal extensions) and treemanifests are on. The null
revision was written to the changelog, the root manifest log, and one
subdirectory manifest log. I have no idea exactly why the null
revision was written, but it seems cheap enough to check that we
should fail instead of corrupting the repo.
Differential Revision: https://phab.mercurial-scm.org/D522
The general delta heuristic to select a delta do not scale with the number of
branch. The delta base is frequently too far away to be able to reuse a chain
according to the "distance" criteria. This leads to insertion of larger delta (or
even full text) that themselves push the bases for the next delta further away
leading to more large deltas and full texts. This full text and frequent
recomputation throw Mercurial performance in disarray.
For example of a slightly large repository
280 000 files (2 150 000 versions)
430 000 changesets (10 000 topological heads)
Number below compares repository with and without the distance criteria:
manifest size:
with: 21.4 GB
without: 0.3 GB
store size:
with: 28.7 GB
without 7.4 GB
bundle last 15 00 revisions:
with: 800 seconds
971 MB
without: 50 seconds
73 MB
unbundle time (of the last 15K revisions):
with: 1150 seconds (~19 minutes)
without: 35 seconds
Similar issues has been observed in other repositories.
Adding a new option or "feature" on stable is uncommon. However, given that this
issues is making Mercurial practically unusable, I'm exceptionally targeting
this patch for stable.
What is actually needed is a full rework of the delta building and reading
logic. However, that will be a longer process and churn not suitable for stable.
In the meantime, we introduces a quick and dirty mitigation of this in the
'experimental' config space. The new option introduces a way to set the maximum
amount of memory usable to store a diff in memory. This extend the ability for
Mercurial to create chains without removing all safe guard regarding memory
access. The option should be phased out when core has a more proper solution
available.
Setting the limit to '0' remove all limits, setting it to '-1' use the default
limit (textsize x 4).
I've seen revlog._deltachain() appear in a number of performance
profiles. I suspect there are 2 reasons for this:
1. Delta chain resolution performs many index lookups, thus triggering
population of index tuples. Creating possibly tens of thousands of
PyObject will have overhead.
2. Delta chain resolution is a tight loop.
By moving delta chain resolution to C, we can defer instantiation
of full index entry tuples and make the loop faster courtesy of
not running in Python.
We can measure the impact to delta chain resolution via
`hg perflogrevision` using the mozilla-central repo with a recent
manifest having delta chain length of 33726:
$ hg perfrevlogrevision -m 364895
! full
! wall 0.367585 comb 0.370000 user 0.340000 sys 0.030000 (best of 27)
! wall 0.357581 comb 0.360000 user 0.350000 sys 0.010000 (best of 28)
! deltachain
! wall 0.010644 comb 0.010000 user 0.010000 sys 0.000000 (best of 270)
! wall 0.000292 comb 0.000000 user 0.000000 sys 0.000000 (best of 8729)
$ hg perfrevlogrevision --cache -m 364895
! deltachain
! wall 0.003904 comb 0.000000 user 0.000000 sys 0.000000 (best of 712)
! wall 0.000284 comb 0.000000 user 0.000000 sys 0.000000 (best of 9926)
The first test measures savings from both not instantiating index
entries and moving to C. The second test (which doesn't clear the
index caches) essentially isolates the benefits of moving from Python
to C. It still shows a 13.7x speedup (versus 36.4x). And there are
multiple milliseconds of savings within the critical path for resolving
revision data. I think that justifies the existence of C code.
A more striking example of the benefits of this change can be
demonstrated by timing `hg debugdeltachain -m` for the mozilla-central
repo:
$ time hg debugdeltachain -m > /dev/null
before: 1057.4s
after: 503.3s
PyPy2.7 5.8.0: 220.0s
It's worth noting that the C code isn't as optimal as it could be.
We're still instantiating a new PyObject for every revision. A future
optimization would be to reuse the PyObject on the cached index tuple.
We could potentially also get wins by using a memory array of raw
integers. There is also room for a delta chain cache on revlog
instances. Of course, the best optimization is to implement revlog
reading outside of Python so Python doesn't need to be concerned
about the relatively expensive index entries and operations on them.
It seems like the reason for "content" is that the variable contains
the nodes that the changegroup "contains", see c2901cddb53f (revlog:
make addgroup returns a list of node contained in the added source,
2012-01-13), but "nodes" seems much clearer.
The comment seems to refer to code that was deleted in a46638c640d8
(revlog.addgroup(): always use _addrevision() to add new revlog
entries, 2010-10-08).
There are a number of improvements we want to make to revlogs
that will require a new version - version 2. It is unclear what the
full set of improvements will be or when we'll be done with them.
What I do know is that the process will likely take longer than a
single release, will require input from various stakeholders to
evaluate changes, and will have many contentious debates and
bikeshedding.
It is unrealistic to develop revlog version 2 up front: there
are just too many uncertainties that we won't know until things
are implemented and experiments are run. Some changes will also
be invasive and prone to bit rot, so sitting on dozens of patches
is not practical.
This commit introduces skeleton support for version 2 revlogs in
a way that is flexible and not bound by backwards compatibility
concerns.
An experimental repo requirement for denoting revlog v2 has been
added. The requirement string has a sub-version component to it.
This will allow us to declare multiple requirements in the course
of developing revlog v2. Whenever we change the in-development
revlog v2 format, we can tweak the string, creating a new
requirement and locking out old clients. This will allow us to
make as many backwards incompatible changes and experiments to
revlog v2 as we want. In other words, we can land code and make
meaningful progress towards revlog v2 while still maintaining
extreme format flexibility up until the point we freeze the
format and remove the experimental labels.
To enable the new repo requirement, you must supply an experimental
and undocumented config option. But not just any boolean flag
will do: you need to explicitly use a value that no sane person
should ever type. This is an additional guard against enabling
revlog v2 on an installation it shouldn't be enabled on. The
specific scenario I'm trying to prevent is say a user with a
4.4 client with a frozen format enabling the option but then
downgrading to 4.3 and accidentally creating repos with an
outdated and unsupported repo format. Requiring a "challenge"
string should prevent this.
Because the format is not yet finalized and I don't want to take
any chances, revlog v2's version is currently 0xDEAD. I figure
squatting on a value we're likely never to use as an actual revlog
version to mean "internal testing only" is acceptable. And
"dead" is easily recognized as something meaningful.
There is a bunch of cleanup that is needed before work on revlog
v2 begins in earnest. I plan on doing that work once this patch
is accepted and we're comfortable with the idea of starting down
this path.
The idea is simple. If the given node id prefix is 'ff...f', add +1 to the
number of matches (e.g. ambiguous if partial + maybewdir > 1).
This patch also fixes id() revset and shortest() template since _partialmatch()
can raise WdirUnsupported exception.
This will allow us to map repo["ff..."] to workingctx. _partialmatch() will
be updated later. I tried "return wdirrev" in place of raising the exception,
but earlier exception seemed better.
When we try to run, 'hg debugrevspec 'branch(wdir())'', it throws an index error
and blows up. Lets raise the WdirUnsupported if wdir() is passed so that we can
catch that later.
revlog.parentrevs() is called while evaluating ^ operator in revsets. When wdir
is passed, it raises IndexError. This patch raises WdirUnsupported if wdir is
passed in the function. The error will be caugth in future patches.
RevlogNG is not such a good name when it is no longer the
newest revlog version. Since we'll soon have revlog version 2,
let's remove some references to it.
First, the logic around the if..elif..elif was subtly wrong
and sub-optimal because all branches would be tested as long as
the revlog was valid. This patch changes things so it behaves like
a switch statement over the revlog version.
While I was here, I also tweaked error strings to make them
consistent and to read better.
Feature flag constants don't need "NG" in the name because they will
presumably apply to non-"NG" version revlogs.
All feature flag constants should also share a similar naming
convention to identify them as such.
And, "RevlogNG" isn't a great internal name since it isn't obvious it
maps to version 1 revlogs. Plus, "NG" (next generation) is only a good
name as long as it is the latest version. Since we're talking about
version 2, now is as good a time as any to move on from that naming.
This completes our rename of internal revlog methods to
distinguish between low-level raw revlog data "segments" and
higher-level, per-revision "chunks."
perf.py has been updated to consult both names so it will work
against older Mercurial versions.
Currently, "chunk" is overloaded in revlog terminology to mean
multiple things. One of them refers to a segment of raw data from
the revlog. This commit renames various methods only used within
revlog.py to have "segment" in their name instead of "chunk."
While I was here, I also made the names more descriptive. e.g.
"_loadchunk()" becomes "_readsegment()" because it actually does
I/O.
Previously, revlog.size equals to revlog.rawsize. However, the flag
processor framework could make a difference - "size" could mean the length
of len(revision(raw=False)), while "rawsize" means len(revision(raw=True)).
This patch makes it so.
This corrects "hg status" output when flag processor is involved. The call
stack looks like:
basectx.status -> workingctx._buildstatus -> workingctx._dirstatestatus
-> workingctx._checklookup -> filectx.cmp -> filelog.cmp -> filelog.size
-> revlog.size
Previously, revlog.revision(raw=False) may try to apply the delta chain
on _cache hit. That happens if flags are non-empty. This patch makes rawtext
reused so delta chain application is avoided.
"_cache" and "rev" are moved a bit to avoid unnecessary assignments.
When writing the revlog-ng index, the third field is len(rawtext). See
revlog._addrevision:
textlen = len(rawtext)
....
e = (offset_type(offset, flags), l, textlen,
base, link, p1r, p2r, node)
self.index.insert(-1, e)
Therefore, revlog.index[rev][2] returned by revlog.rawsize should be
len(rawtext), where "rawtext" is revlog.revision(raw=True).
Unfortunately it's hard to add a test for this code path because "if l >= 0"
catches most cases.
If cache hit and flags are empty, no flag processor runs and "text" equals
to "rawtext". So we check flags, and return rawtext.
This resolves performance issue introduced by a previous patch.
All 3 users of _addrevision use raw:
- addrevision: passing rawtext to _addrevision
- addgroup: passing rawtext and raw=True to _addrevision
- clone: passing rawtext to _addrevision
There is no real user using _addrevision(raw=False). On the other hand,
_addrevision is low-level code dealing with raw revlog deltas and rawtexts.
It should not transform rawtext to non-raw text.
This patch removes the "raw" parameter from "_addrevision", and does some
rename and doc change to make it clear that "_addrevision" expects rawtext.
Archeology shows 886a08012bbe added "raw" flag to "_addrevision", follow-ups
fe1e206cb389 and 1cfa6239c923 seem to make the flag unnecessary.
test-revlog-raw.py no longer complains.
See the added comment. revdiff is meant to output the raw delta that will be
written to revlog. It should use raw.
test-revlog-raw.py now shows "addgroupcopy test passed", but there is more
to fix.
Using external content provided by flagprocessor when building revlog delta
is wrong, because deltas are applied to raw contents in revlog.
This patch fixes the above issue by adding "raw=True".
test-revlog-raw.py now shows "local test passed", but there is more to fix.
As documented at revlog.__init__, revlog._cache stores raw text.
The current read and write usage of "_cache" in revlog.revision lacks of
raw=True check.
This patch fixes that by adding check about raw, and storing rawtext
explicitly in _cache.
Note: it may slow down cache hit code path when raw=False and flags=0. That
performance issue will be fixed in a later patch.
test-revlog-raw now points us to a new problem.