This completes our rename of internal revlog methods to
distinguish between low-level raw revlog data "segments" and
higher-level, per-revision "chunks."
perf.py has been updated to consult both names so it will work
against older Mercurial versions.
To prepare for renaming revlog._chunkraw, we stuff a reference to this
method in a local variable. This does two things. First, it moves the
attribute lookup outside of a loop, which more accurately measures
the time of the code being invoked. Second, it allows us to alias
to different methods depending on their presence (perf.py needs to
support running against old Mercurial versions).
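A minimal sketch of the aliasing pattern, not the actual perf.py code. The
OldRevlog and NewRevlog classes and the new method name _getsegmentforrevs are
stand-ins assumed for illustration; only _chunkraw is named in this message.

```python
class OldRevlog:
    """Stand-in for a revlog that only has the old method name."""

    def _chunkraw(self, start, end):
        return ('old', start, end)


class NewRevlog:
    """Stand-in for a revlog after the rename (the new name is assumed)."""

    def _getsegmentforrevs(self, start, end):
        return ('new', start, end)


def resolve_segment_reader(rl):
    # Prefer the new name and fall back to the old one when it is absent.
    reader = getattr(rl, '_getsegmentforrevs', None)
    if reader is None:
        reader = rl._chunkraw
    return reader


def read_all(rl, revs):
    # The attribute lookup happens once, outside the loop being timed.
    segmentforrevs = resolve_segment_reader(rl)
    return [segmentforrevs(rev, rev) for rev in revs]
```

Either revlog flavor can be passed to read_all(); the loop body only pays
for the call itself.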
Removing an attribute lookup from a tight loop appears to shift the
numbers slightly with mozilla-central:
$ hg perfrevlogchunks -c
! read
! wall 0.354789 comb 0.340000 user 0.330000 sys 0.010000 (best of 28)
! wall 0.335932 comb 0.330000 user 0.290000 sys 0.040000 (best of 30)
! read w/ reused fd
! wall 0.342326 comb 0.340000 user 0.320000 sys 0.020000 (best of 29)
! wall 0.332857 comb 0.340000 user 0.290000 sys 0.050000 (best of 30)
! read batch
! wall 0.023623 comb 0.020000 user 0.000000 sys 0.020000 (best of 124)
! wall 0.023666 comb 0.020000 user 0.000000 sys 0.020000 (best of 125)
! read batch w/ reused fd
! wall 0.023828 comb 0.020000 user 0.000000 sys 0.020000 (best of 124)
! wall 0.023556 comb 0.020000 user 0.000000 sys 0.020000 (best of 126)
Currently, "chunk" is overloaded in revlog terminology to mean
multiple things. One of them refers to a segment of raw data from
the revlog. This commit renames various methods only used within
revlog.py to have "segment" in their name instead of "chunk."
While I was here, I also made the names more descriptive. e.g.
"_loadchunk()" becomes "_readsegment()" because it actually does
I/O.
In the future, chg may prefill the repo's dirstate filecache, so it is valuable
and should be kept. Previously, we dropped both the filecache and the property
cache for dirstate during fsmonitor reposetup. This patch changes it to drop
only the property cache and keep the filecache.
Previously, the "startrev" argument would be ignored due to
"startrev = 0" in the benchmark function. This meant that
`hg perfrevlog` always started at revision 0.
Rename the local variable to "beginrev" so the variable does the
right thing.
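A simplified sketch of the shadowing problem, assuming a benchmark closure
nested inside the command function. The function names here are hypothetical
and the body is reduced to a range() call.

```python
def make_benchmark_buggy(startrev, endrev):
    def bench():
        # Assigning to "startrev" creates a new local that shadows the
        # argument, so the caller's value is silently ignored.
        startrev = 0
        return list(range(startrev, endrev))
    return bench


def make_benchmark_fixed(startrev, endrev):
    def bench():
        # A distinct local name leaves the argument visible in the closure.
        beginrev = startrev
        return list(range(beginrev, endrev))
    return bench
```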
The current function ('writebundle') focuses on getting an existing
changegroup to disk. There is no easy way to include more parts in the generated
bundle2. So we introduce a slightly higher level function that is fed the
'outgoing' object (that defines the bundled spec) and the bundlespec parameters
(to control the changegroup generation and inclusion of other parts).
This creates a third piece of logic dedicated to building a consistent bundle2
(the other two are the push code and the getbundle code). We should probably
reconcile them at some point, but they all take different types of input. So we
need to introduce an intermediate "object" that each different input could be
converted to. Such a unified "bundle2 specification" could be fed to some
unified code.
We start by keeping the `hg bundle` related code on its own to help define its
specific needs first. Once the common and specific parts of each logic are
known, we can start unification.
In theory, it should be enough to pay attention only to the modification time
when detecting if a snapshotted working directory file changed. In practice,
BeyondCompare preserves all file attributes when syncing files at the directory
level. (If you open the file and sync individual hunks, then mtime does change,
and everything was being copied back as desired.) I'm not sure how many other
synchronization tools would trigger this issue, but it's annoyingly inconsistent
(if a single file is diffed, it isn't snapshotted, so the same BeyondCompare
file sync operation _is_ visible, because wdir() is updated in place).
I filed a bug with them, and they stated it is on their wish list, but won't be
fixed in the near term. This isn't a complete fix (there is still the case of
the size not changing), but this seems like a trivial enough change to fix most
of the problem. I suppose we could fool around with making files in the other
snapshot readonly, and copy back if we see the readonly bit copied. That seems
pretty hacky though, and only works if the external tool copies all attributes.
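A rough sketch of the idea, assuming the check compares both mtime and size
recorded at snapshot time. The helper names are hypothetical and this is not
the extdiff code itself.

```python
import os
import tempfile


def snapshot_stat(path):
    # Record the attributes used to decide whether a file was modified.
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size)


def changed_since(path, recorded):
    # Modified if either the mtime or the size differs from the snapshot.
    return snapshot_stat(path) != recorded


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'f.txt')
        with open(path, 'w') as fh:
            fh.write('one\n')
        recorded = snapshot_stat(path)
        mtime_ns = recorded[0]
        # Simulate a sync tool that rewrites the file but preserves mtime.
        with open(path, 'w') as fh:
            fh.write('one\ntwo\n')
        os.utime(path, ns=(mtime_ns, mtime_ns))
        mtime_only = os.stat(path).st_mtime_ns != mtime_ns
        return mtime_only, changed_since(path, recorded)
```

In demo(), the mtime-only comparison misses the edit while the combined
check catches it. A same-size edit would still go unnoticed, as noted above.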
Now that output can be conditionalized, the few `chmod +x` specific outputs can
be conditionalized, and the rest of the tests run as normal. Disable one test
that is failing on Windows for now.
I'm going to restructure cext/pure modules and get rid of our hgimporter
hack. C extension modules will be moved to the cext/ directory so old and new
compiled modules can coexist in the development tree. This is necessary to
run 'hg bisect' without recompiling.
New extension modules will be loaded by an importer function:
base85 = policy.importmod('base85') # select pure.base85 or cext.base85
This will also allow us to split cffi from pure modules, which is currently
difficult because pure modules can't be imported by name.
See the previous commit for why. Marked as API change since osutil.listdir()
seems widely used in third-party extensions.
The win32mbcs extension is updated to wrap both util. and windows. aliases.
I'm going to replace hgimporter with a simpler import function, so we can
access pure/cext modules by name:
# util.py
base85 = policy.importmod('base85') # select pure.base85 or cext.base85
# cffi/base85.py
from ..pure.base85 import * # may re-export pure.base85 functions
This means we'll have to use policy.importmod() function in place of the
standard import statement, but we wouldn't want to write it every place where
C extension modules are used. So this patch makes util host base85 functions.
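A hedged sketch of what such an import function could look like. This is not
the real policy.importmod(); the signature and the explicit package list are
assumptions made for illustration.

```python
import importlib


def importmod(modname, packages):
    """Return the first importable ``<package>.<modname>``.

    ``packages`` is tried in order, e.g. a compiled package first and a
    pure-Python package as the fallback.
    """
    for pkg in packages:
        try:
            return importlib.import_module('%s.%s' % (pkg, modname))
        except ImportError:
            # This implementation is unavailable; try the next package.
            continue
    raise ImportError('no implementation of %r in %r' % (modname, packages))
```

For a runnable demonstration without Mercurial installed, stdlib packages can
stand in for cext/pure: importmod('decoder', ['no_such_pkg_xyz', 'json'])
skips the missing package and returns json.decoder.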
The old tree stripping logic iterated over every tree revlog in the repo looking
for commits that had revs to be stripped. That's very inefficient in large
repos. Instead, let's look at what files are touched by the strip and only
inspect those revlogs.
I don't have actual perf numbers, since internally we don't use a true
treemanifest, but simply iterating over hundreds of thousands of revlogs takes
many, many seconds, so this should help tremendously when stripping only a few
commits.
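A small sketch of the selection step, assuming tree revlogs are keyed by
directory path with a trailing slash and '' for the root. The function names
and the data layout are hypothetical.

```python
def dirs_touched(files):
    """Directories (root included) that contain any of the given files."""
    dirs = {''}
    for f in files:
        parts = f.split('/')[:-1]
        for i in range(1, len(parts) + 1):
            dirs.add('/'.join(parts[:i]) + '/')
    return dirs


def revlogs_to_inspect(all_tree_revlogs, files):
    # Walk only the directories touched by the stripped commits and look
    # each one up, instead of iterating over every tree revlog in the repo.
    return [d for d in sorted(dirs_touched(files)) if d in all_tree_revlogs]
```

With revlogs for '', 'a/', 'a/b/' and 'z/', stripping commits that touched
'a/b/c.txt' and 'd.txt' leaves 'z/' uninspected.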
The keys are passed here and there as unicode strings, and our transformer
turns the literals into bytes. Because of that, 'mq' was not popped, which
results in an error on Py3. Here we abuse r'' to make that a str on Python 3.
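A minimal Python 3 illustration of the mismatch. The command() function is
hypothetical, and the b'' prefix is written out by hand to mimic what the
transformer does to an unprefixed literal.

```python
def command(**opts):
    # Keyword argument names are always native str on Python 3, so a
    # bytes key never matches and pop() falls through to the default.
    missed = opts.pop(b'mq', None)
    # An r'' literal is left alone by the transformer and stays a str.
    found = opts.pop(r'mq', None)
    return missed, found
```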
When diffing binary contents, with certain configs, we can show
"Binary file <name> has changed" without actual content.
That allows a fast path where we could avoid providing actual binary
contents. Note: in that case we still need to test if two contents are the
same, that's done by using "filectx.cmp", which could have its own fast
path.
This seems to be more correct given the table drawn in the previous patch.
Namely, "losedatafn" and "opts.git" are removed, "not opts.text" is added.
- losedatafn: diff output (binary) should not be affected by "losedatafn"
- opts.git: binary testing is helpful for detecting a fast path in the
next patch. The fast path can also be used if opts.git is False
- opts.text: if it's set, we should treat the content as non-binary
The end goal is to avoid calling fctx.data() when unnecessary. For example,
if diff.nobinary=1 and files are binary, the expected behavior is to print
"Binary file has changed". That could avoid reading fctx.data() sometimes.
This is mainly to enable an external LFS extension to skip expensive binary
file loading sometimes (read: most of the time with diff.nobinary=1 and
diff.text=0), without any behavior changes to mercurial (i.e. whether a file
is LFS or not does not change any behavior, LFS could be 100% transparent to
users).
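A sketch of the intended fast path under stated assumptions. FakeFctx is a
stand-in that mimics the isbinary(), cmp() and data() methods of a file
context; diff_summary() and its parameters are hypothetical and do not mirror
the real patch.py logic.

```python
class FakeFctx:
    """Stand-in file context that counts how often data() is called."""

    def __init__(self, content, binary):
        self._content = content
        self._binary = binary
        self.datacalls = 0

    def isbinary(self):
        # An LFS-like extension could answer this without loading content.
        return self._binary

    def cmp(self, other):
        # Like filectx.cmp: True when the two contents differ.
        return self._content != other._content

    def data(self):
        self.datacalls += 1
        return self._content


def diff_summary(name, old, new, nobinary, text):
    binary = not text and any(f.isbinary() for f in (old, new))
    if binary and nobinary:
        # Fast path: only the "changed or not" answer is needed.
        if old.cmp(new):
            return 'Binary file %s has changed' % name
        return ''
    # Slow path: a real diff needs the full contents.
    return 'diff of %d -> %d bytes' % (len(old.data()), len(new.data()))
```

With nobinary set and text unset, two binary FakeFctx objects produce the
"has changed" line without data() ever being called.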
We already have the logic for restricting directory walks in
match.visitdir() that we use for treemanifests. We should take
advantage of it when walking the working copy as well.
This speeds up "hg st -I rootfilesin:." on the Firefox repo from
0.587s to 0.305s on warm disk (and much more on cold disk). More time
is spent reading the dirstate than walking the working copy after.
I tried to find scenarios where calling match.visitdir() would be a
noticeable overhead, but I couldn't find any. I encourage the reader
to try for themselves, since this is performance-critical code.
_rootsanddirs() returns a list of directories to visit
recursively and a list of directories to visit non-recursively. For
patterns such as 'rootfilesin:foo/bar', we clearly need to visit the
directory foo/bar, but we also need to visit its parents. The method
therefore uses util.dirs() to find the parent directories of
'foo/bar'. That method does not include the root directory, but since
we obviously need to visit the root directory, we always added '.' to
the set of directories to visit non-recursively.
The visitdir() method had special handling to consider set(['.']) to
mean that no includes had been specified and would thus visit all
directories. However, when the pattern is 'rootfilesin:.', set(['.'])
is actually the real set of directories to visit and the special
handling of that set meant that all directories got visited instead of
just the root directory.
The fix is simple: add '.' to the set of parent directories in
_rootsanddirs() and stop treating set(['.']) specially. This makes
hg files -r . -I rootfilesin:.
in a treemanifest version of the Firefox repo go from 1.5s to 0.26s on
warm disk (and a *much* bigger improvement on cold disk).
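A simplified model of the fixed behavior, not the match.py code. The helper
names are hypothetical; '.' stands for the root directory as in the text.

```python
def parentdirs(paths):
    """Parent directories of each path, always including the root '.'."""
    dirs = {'.'}
    for p in paths:
        if p == '.':
            continue
        parts = p.split('/')
        for i in range(1, len(parts)):
            dirs.add('/'.join(parts[:i]))
    return dirs


def rootfilesin_dirs(path):
    # Directories to visit non-recursively for 'rootfilesin:<path>'.
    return {path} | parentdirs([path])


def visitdir(d, roots, dirs):
    # roots are visited recursively, dirs non-recursively. The set {'.'}
    # gets no special treatment: it means "visit only the root".
    if d in dirs:
        return True
    return any(r == '.' or d == r or d.startswith(r + '/') for r in roots)
```

For 'rootfilesin:.', only '.' is visited. For 'rootfilesin:foo/bar', the
directories '.', 'foo' and 'foo/bar' are visited but 'foo/bar/baz' is not.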
Note that the -I is necessary for no good reason. We just haven't
optimized visitdir() for regular (non-include, non-exclude) patterns
yet.
The update statement does not depend on anything in the loop, so just
move it before the loop and do it once. There are no cases where
update would happen 0 times before (and 1 now); the function returns
early in all such cases.
The useful pudb debugger can be used with Mercurial, but its import of urwid
fails when demandimport is enabled. Add urwid.command_map to the ignore list so
pudb can be used with hg without disabling all of demandimport.
outgoing has been using an unfiltered repo since 07f64d64baf7 (discovery:
outgoing pass unfiltered repo to findcommonincoming (issue3776),
2013-01-28). If I'm reading code and history correctly, it should be
safe to run _outgoing() on a filtered repo since daf83ddd4afd
(discovery: run discovery on filtered repository, 2015-01-07). By
running _outgoing() on a filtered repo, we can also remove the
workaround there for ignoring filtered revisions.