(This is not yet enabled; it will be turned on in a follow-up patch.)
The path encoding performed by fncache is complex and (perhaps
surprisingly) slow enough to negatively affect the overall performance
of Mercurial.
For a short path (< 120 bytes), the Python code can be reduced to a fairly
tractable state machine that either determines, in a single pass, that
nothing needs to be done, or performs the encoding in a second pass.
For longer paths, we avoid the more complicated hashed encoding scheme
for now, and fall back to Python.
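As an illustration of that fast path, here is a minimal Python sketch of the
two-pass idea; the helper names and the escape table are simplified stand-ins
for what mercurial/store.py actually does:

  _winreserved = set('\\:*?"<>|') | set(chr(c) for c in range(32))

  def _needsencoding(path):
      # First pass: a single scan that only decides whether any byte
      # must be rewritten at all.
      return any(c.isupper() or c == '_' or c in _winreserved
                 for c in path)

  def encodepath(path):
      # Common case: the name is already clean, return it untouched
      # without building a new string.
      if not _needsencoding(path):
          return path
      # Second pass: perform the actual escaping.
      out = []
      for c in path:
          if c.isupper():
              out.append('_' + c.lower())
          elif c == '_':
              out.append('__')
          elif c in _winreserved:
              out.append('~%02x' % ord(c))
          else:
              out.append(c)
      return ''.join(out)

  # e.g. encodepath('data/Foo.TXT.i') == 'data/_foo._t_x_t.i'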
Raw performance: I measured in a repo containing 150,000 files in its tip
manifest, with a median path name length of 57 bytes, and 95th percentile
of 96 bytes.
In this repo, the Python code takes 3.1 seconds to encode all path
names, while the hybrid C-and-Python code (called from Python) takes
0.21 seconds, for a speedup of about 14x.
Across several other large repositories, I've measured the speedup from
the C code at between 26x and 40x.
For path names above 120 bytes where we must fall back to Python for
hashed encoding, the speedup is about 1.7x. Thus absolute performance
will depend strongly on the characteristics of a particular repository.
For a netbeans clone on Windows 7 x64:
Before:
$ hg perffncacheencode
! wall 3.516000 comb 3.525623 user 3.525623 sys 0.000000 (best of 3)
After:
$ hg perffncacheencode
! wall 3.443000 comb 3.447622 user 3.447622 sys 0.000000 (best of 3)
Not yet used (will be enabled in a later patch).
This patch is a stripped-down version of patches originally created by
Bryan O'Sullivan <bryano@fb.com>.
For a netbeans clone on Windows 7 x64:
Before:
$ hg perffncacheload
! wall 0.124000 comb 0.124801 user 0.124801 sys 0.000000 (best of 76)
After:
$ hg perffncacheload
! wall 0.096000 comb 0.093601 user 0.078001 sys 0.015600 (best of 97)
For a netbeans clone on Windows 7 x64:
Before:
$ hg perffncachewrite
! wall 0.210000 comb 0.218401 user 0.202801 sys 0.015600 (best of 47)
After:
$ hg perffncachewrite
! wall 0.104000 comb 0.109201 user 0.078000 sys 0.031200 (best of 95)
I don't think we will ever have anything in the store residing inside a
directory that ends in .i or .d under store/ that we wouldn't want to have
direncoded. The files not under data/ surely don't need direncoding, but it
does no harm to let these few run through it. It would hurt more to check
whether the thousands of other files start with 'data/'; they do anyway.
See also 67e6074ba430 (fixed with 0c522fe42894), which moved the direncoding
from filelog into store.
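For reference, the direncoding itself is just a few substring replacements,
roughly what store.encodedir does:

  def encodedir(path):
      # Directory names ending like revlog files (*.i, *.d) or like
      # '.hg' would clash on disk, so they grow a '.hg' suffix.
      return (path
              .replace(".hg/", ".hg.hg/")
              .replace(".i/", ".i.hg/")
              .replace(".d/", ".d.hg/"))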
hgweb has an incorrect padding calculation, causing the text to move further
away from the graph the more branches there are (issue3626). This patch fixes
all existing templates (gitweb, monoblue, paper and spartan).
Tests updated by Patrick Mezard <patrick@mezard.eu>.
by refactoring
  for i, n in enumerate(res):
      if n:
          <main code block>

to

  for i, n in enumerate(res):
      if not n:
          continue
      <main code block>
(no functional change)
Merely creating and using a generator has a measurable impact,
particularly since the common case for stream_out is generators that
yield just once. Avoiding generators improves stream_out performance
by about 7%.
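A toy micro-benchmark, unrelated to the actual stream_out code, illustrates
the kind of overhead being avoided:

  import timeit

  def with_generator():
      def gen():
          yield 'chunk'      # a fresh generator object per call
      for c in gen():
          pass

  def without_generator():
      for c in ('chunk',):   # a pre-built one-element sequence
          pass

  # Creating and draining a one-shot generator costs noticeably more
  # than iterating a plain tuple:
  print(timeit.timeit(with_generator, number=100000))
  print(timeit.timeit(without_generator, number=100000))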
Auditing at this stage is both pointless (paths are already trusted by
the local repo) and expensive. Skipping the audits improves stream_out
performance by about 15%.
New commits from the amend process were created without any phase constraint.
If the amended changeset had a different phase from its parent, the phase
information was lost.
This changeset ensures the new commit is created in the same phase as the
original changeset.
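A minimal sketch of the idea, assuming a setconfig-style override of the
default phase for new commits (the helper name is hypothetical):

  def commitsamephase(repo, old, commitfunc, *args, **kwargs):
      # Pin the phase of the commit created by amend to the phase of
      # the changeset being amended, instead of the default
      # 'new-commit' phase. (Illustrative helper, not the real code.)
      saved = repo.ui.config('phases', 'new-commit')
      repo.ui.setconfig('phases', 'new-commit', old.phase())
      try:
          return commitfunc(*args, **kwargs)
      finally:
          repo.ui.setconfig('phases', 'new-commit', saved)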
Subversion 1.7 changes its XML output to include an explicit encoding tag:
<?xml version="1.0" encoding="UTF-8"?>
This triggers xml.dom.minidom to always return unicode strings, causing
other parts of the code to explode.
We unconditionally encode path names before handing them back, which
works with both str (actually a no-op) and unicode values.
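A minimal illustration under Python 2 semantics (the surrounding function is
hypothetical):

  # Under Subversion 1.7, minidom hands back unicode; before 1.7 it
  # was a plain (ASCII) str. Encoding unifies both cases to str:
  # a no-op for the str case, as noted above.
  def getpath(node):
      path = node.getAttribute('path')  # unicode or str
      return path.encode('utf-8')       # always a str afterwards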
JavaScript's .replace always magically processed $$, $&, $', and $` in
replacement strings and thus displayed subject lines incorrectly in the graph
view. Instead of regexps and .replace, we now just create the strings the
right way in the first place.
When we rewrite a bookmarked changeset, we want to update the
bookmark on its successors. But the successors are not, by definition,
descendants of their precursor. This changeset alters the bookmarks
logic to update the bookmark location if the newer location is a
successor of the old one. [1]
[1]: valid destinations are in fact any kind of successors of any kind
of descendants (recursively).
This changeset requires enabling the obsolete feature in some bookmark
tests.
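A hedged sketch of the new rule; successors() is a hypothetical helper, and
the real logic lives in the bookmarks module:

  def validdest(repo, old, new):
      # Simplified: the full rule also chases successors of any
      # descendant of the old location, recursively.
      seen, queue = {old}, [old]
      while queue:
          ctx = queue.pop()
          # Accept the candidate itself or anything descending from it.
          if ctx == new or ctx.descendant(new):
              return True
          for succ in successors(repo, ctx):  # hypothetical helper
              if succ not in seen:
                  seen.add(succ)
                  queue.append(succ)
      return False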
We usually update bookmarks only if the new location is a descendant of the
old bookmark location. We extract this logic into a function. This is the
first step toward allowing more complex, obsolescence-aware logic in the
validation of bookmark movement.
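The extracted check is essentially a one-liner at this stage (the function
name is illustrative):

  def validdest(repo, old, new):
      # Classic rule: a bookmark only moves forward, i.e. the new
      # location must be the old one or a descendant of it.
      return old == new or old.descendant(new)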
A relevant obsolescence marker may have been added -after- we previously
exchanged the changeset. We have to search for remote heads that would
disappear solely because we push obsolescence markers.
This case will also happen when the remote got the new version from a
repository that does not propagate obsolescence markers.
Checkheads was more permissive than expected. When the remote heads
are public, we do not need to search for successors: nothing will make
a public head disappear.
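A sketch of the tightened condition, using the phase constants (the helper
name is hypothetical):

  from mercurial import phases

  def maydisappear(repo, node):
      # Obsolescence can only hide mutable changesets; public heads
      # are immune, so the successor search can skip them entirely.
      return repo[node].phase() > phases.public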