The manifest.manifest class has a _treeinmem member than one can
manually set to True to test that the treemanifest class works as a
drop-in replacement for manifestdict (which is mostly a requirement
for treemanifest repos to work). However, it doesn't quite work at the
moment. These tests fail:
test-largefiles-misc.t
test-rebase-newancestor.t
test-subrepo.t
test-subrepo-deep-nested-change.t
test-subrepo-recursion.t
All but test-rebase-newancestor.t fail because they trigger calls to
subdirmatcher.visitdir(), which tries to access a _excluderoots field
that does not exist on the subdirmatcher. Let's fix that by overriding
visitdir() in a similar way to how matchfn is overridden, i.e. by
prepending the directory before calling the superclass method.
Before a4236180df5e (match: remove unnecessary optimization where
visitdir() returns 'all', 2015-05-06), match.visitdir() used to return
the special value 'all' to indicate that it was known that all
subdirectories would also be included in the match. The purpose for
that value was to avoid calling the matcher on all the paths. It
turned out that calling the matcher was not a problem, so the special
return value was removed and the code was simplified. However, if we
use the same special value for not just avoiding calling the matcher
on each file, but to avoid iterating over each file, it's a much
bigger win. On commands like
hg st --rev .^ --rev . dom/
we run the matcher (dom/) on the two manifests, then diff the narrowed
manifest. If the size of the match is much larger than the size of the
diff, this is wasteful. In the above case, we would end up iterating
over the 15k-or-so files in dom/ for each of the manifests, only to
later discover that they are mostly the same. This means that runningt
the command above is usually slower than getting the status for the
entire repo, because that code avoids calling treemanifest.match() and
only calls treemanifest.diff(), which loads only what's needed for the
diff.
Let's fix this by reintroducing the 'all' value in match.visitdir()
and making treemanifest.match() return a lazy copy of the manifest
from dom/ and down (in the above case). This speeds up the above
command on the Firefox repo from 0.357s to 0.137s (best of 5). The
wider the match, the bigger the speedup.
This has a small, but measurable, effect on performance if a pattern
file is very large. In an artificial test with 200,000 lines of
pattern data, using re2 reduced read time by 200 milliseconds.
The home of 'Abort' is 'error' not 'util' however, a lot of code seems to be
confused about that and gives all the credit to 'util' instead of the
hardworking 'error'. In a spirit of equity, we break the cycle of injustice and
give back to 'error' the respect it deserves. And screw that 'util' poser.
For great justice.
The problem was that the former name and the new name are both normalized to the
case in dirstate, so matcher._files would be ['ABC.txt', 'ABC.txt'].
localrepo.commit() calls localrepo.status(), passing along the matcher. Inside
dirstate.status(), _walkexplicit() simply grabs matcher.files() and processes
those items. Since the old name isn't present, it is silently dropped. There's
a fundamental tension here, because the status command should also accept files
that don't match the filesystem, so we can't drop the normalization in status.
The problem originated in d70aa474bd84.
Unfortunately with this change, the case of the old file must still be specified
exactly, or the old file is again silently excluded. I went back to
d70aa474bd84^, and that had the same behavior, so we are no worse off. I'm open
to ideas from a matcher or dirstate expert on how to fix that half.
Since 862fc5f60fef, .hgignore is ignored on Windows because a pat may have
a drive letter, but pathutil.join is posixpath.join.
"z:\foo\bar/z:\foo\bar\.hgignore"
Instead, this patch uses os.path.join() and util.localpath() to process both
parts as file-system paths.
Maybe we can remove os.path.join() at dirstate._ignore because 'include:' is
resolved relative to repo root? It was introduced by 29ce78e5f35c.
When reading pattern files, we just call open(path), which is relative to the
current directory. Let's fix this by resolving the paths before attempting to
read the file.
Python 2.6 introduced the "except type as instance" syntax, replacing
the "except type, instance" syntax that came before. Python 3 dropped
support for the latter syntax. Since we no longer support Python 2.4 or
2.5, we have no need to continue supporting the "except type, instance".
This patch mass rewrites the exception syntax to be Python 2.6+ and
Python 3 compatible.
This patch was produced by running `2to3 -f except -w -n .`.
The files command supports naming directories to limit the output to children of
that directory, and it also supports -S to force recursion into a subrepo. But
previously, using -S and naming a subrepo caused nothing to be output. The
reason was narrowmatcher() strips the current subrepo path off of each file,
which would leave an empty list if only the subrepo was named.
When matching on workingctx, dirstate.matches() would see the submatcher is not
always(), so it returned the list of files in dmap for each file in the matcher-
namely, an empty list. If a directory in a subrepo was named, the output was as
expected, so this was inconsistent.
The 'not anypats()' check is enforced by an existing test around line 140:
$ hg remove -I 're:.*.txt' sub1
Without the check, this removed all of the files in the subrepo.
This class is only needed on case insensitive filesystems, and only
for wdir context matches. It allows the user to not match the case of
the items in the filesystem- especially for naming directories, which
dirstate doesn't handle[1]. Making dirstate handle mismatched
directory cases is too expensive[2].
Since dirstate doesn't apply to committed csets, this is only created by
overriding basectx.match() in workingctx, and only on icasefs. The default
arguments have been dropped, because the ctx must be passed to the matcher in
order to function.
For operations that can apply to both wdir and some other context, this ends up
normalizing the filename to the case as it exists in the filesystem, and using
that case for the lookup in the other context. See the diff example in the
test.
Previously, given a directory with an inexact case:
- add worked as expected
- diff, forget and status would silently ignore the request
- files would exit with 1
- commit, revert and remove would fail (even when the commands leading up to
them worked):
$ hg ci -m "AbCDef" capsdir1/capsdir
abort: CapsDir1/CapsDir: no match under directory!
$ hg revert -r '.^' capsdir1/capsdir
capsdir1\capsdir: no such file in rev 64dae27060b7
$ hg remove capsdir1/capsdir
not removing capsdir1\capsdir: no tracked files
[1]
Globs are normalized, so that the -I and -X don't need to be specified with a
case match. Without that, the second last remove (with -X) removes the files,
leaving nothing for the last remove. However, specifying the files as
'glob:**.Txt' does not work. Perhaps this requires 're.IGNORECASE'?
There are only a handful of places that create matchers directly, instead of
being routed through the context.match() method. Some may benefit from changing
over to using ctx.match() as a factory function:
revset.checkstatus()
revset.contains()
revset.filelog()
revset._matchfiles()
localrepository._loadfilter()
ignore.ignore()
fileset.subrepo()
filemerge._picktool()
overrides.addlargefiles()
lfcommands.lfconvert()
kwtemplate.__init__()
eolfile.__init__()
eolfile.checkrev()
acl.buildmatch()
Currently, a toplevel subrepo can be named with an inexact case. However, the
path auditor gets in the way of naming _anything_ in the subrepo if the top
level case doesn't match. That is trickier to handle, because there's the user
provided case, the case in the filesystem, and the case stored in .hgsub. This
can be fixed next cycle.
--- a/tests/test-subrepo-deep-nested-change.t
+++ b/tests/test-subrepo-deep-nested-change.t
@@ -170,8 +170,15 @@
R sub1/sub2/test.txt
$ hg update -Cq
$ touch sub1/sub2/folder/bar
+#if icasefs
+ $ hg addremove Sub1/sub2
+ abort: path 'Sub1\sub2' is inside nested repo 'Sub1'
+ [255]
+ $ hg -q addremove sub1/sub2
+#else
$ hg addremove sub1/sub2
adding sub1/sub2/folder/bar (glob)
+#endif
$ hg status -S
A sub1/sub2/folder/bar
? foo/bar/abc
The narrowmatcher class may need to be tweaked when that is fixed.
[1] http://www.selenic.com/pipermail/mercurial-devel/2015-April/068183.html
[2] http://www.selenic.com/pipermail/mercurial-devel/2015-April/068191.html
The matches function was previously traversing all submanifests to look for
matching files, even though it was possible to know if a submanifest won't
contain any matches.
This change adds a visitdir function on the match object to decide quickly if
a directory should be visited when traversing. The function also decides if
_all_ subdirectories should be traversed.
Adding this logic as methods on the match object also makes the logic
modifiable by extensions, such as largefiles.
An example of a command this speeds up is running
hg status --rev .^ python/
on the Mozilla repo with the treemanifest experiment enabled.
It goes from 2.03s to 1.85s.
More improvements to speed from this change will happen when treemanifests are
lazily loaded. Because a flat manifest is still loaded and then converted
into treemanifests, speed improvements are limited.
This change has no negative effect on speed. For a worst-case example, this
command is not negatively impacted:
hg status --rev .^ 'relglob:*.js'
on the Mozilla repo. It goes from 2.83s to 2.82s.
Several commands take the fast path when match.always() is
true. However, when the user passes "." on the command line, that
results in a matcher for which match.always() == False. Let's make it
so such matchers return True, and have an empty list of .files(). This
makes e.g. "hg log ." as fast as "hg log" and "hg revert ." as fast as
"hg revert --all" (when run from repo root).
This makes _includeroots more like _fileroots and gives visitdir() a
nice symmetry in the two.
I'm hoping to later combine the two (_fileroots and _includeroots),
and having them treated similarly should make that step easier to
follow.
It seems unlikely that the optimization to avoid calling util.finddirs
twice will be noticeable, so let's drop it. This makes the two
conditions for includes and regular patterns more similar.
The repo root is nothing special when it comes to what directories to
visit: patterns like '-X relglob:*.py' should not exclude the top
directory, while '-X path:.' should (pointless as such a pattern may
be). The explicit removal of '.' from the set of excluded roots was
probably there to avoid removing the the root directory when any
patterns had been given, but since a619025a07b5 (treemanifest: visit
directory 'foo' when given e.g. '-X foo/ba?', 2015-05-27), we only
exclude directories that should be completely excluded, so we no
longer need to special-case the root directory.
The full path is propagated to the original match object since this is often
used directly for printing a file name to the user. This is cleaner than
requiring each caller to join the prefix with the file name prior to calling it,
and will lead to not having to pass the prefix around separately. It is also
consistent with the bad() and abs() methods in terms of the required input. The
uipath() method now inherits this path building property.
There is no visible change in path style for rel() because it ultimately calls
util.pathto(), which returns an os.sep based path. (The previous os.path.join()
was violating the documented usage of util.pathto(), that its third parameter be
'/' separated.) The doctest needed to be normalized to '/' separators to avoid
test differences on Windows, now that a full path is returned for a short
filename.
The test changes are to drop globs that are no longer necessary when printing an
absolute file in a subrepo, as returned from match.uipath(). Previously when
os.path.join() was used to add the prefix, the absolute path to a file in a
subrepo was printed with a mix of '/' and '\'. The absolute path for a file not
in a subrepo, as returned from match.uipath(), is still purely '/' based.
This is a utility to make it easier for subrepos to convert a file name to the
full path rooted at the top repository. It can replace the various path joining
lambdas, and doesn't require the prefix to be passed into the method that wishes
to build such a path.
The name is derived from the following pattern in annotate() and other places:
name = ((pats and rel) or abs)
The pathname separator is not os.sep in part to avoid confusion with variables
named 'abs' or similar that _are_ '/' separated, and in part because some
methods like cmdutils.forget() and maybe cmdutils.add() need to build a '/'
separated path to the root repo. This can replace the manual path building
there.
Various bit of code replace the bad method on matchers, and then delegate to the
original bad method after doing some custom processing. At least some of these
forget to restore the original method when the need has passed, and then when
the matcher is passed to the next subrepo (even a sibling), another layer is
added such that the chain looks like:
bad2 -> bad1 -> original
At best, this is a waste of processing, but sometimes spurious messages can be
emitted (e.g. 9f1524f400d9).
The trick with this copy of the matcher is to make sure it is *not* passed to
any subrepo- the original must be passed instead.
For globs like 'foo/ba?', match._roots() will return 'foo'. Since
visitdir(), excludes directories in the excluded roots, it would skip
the entire foo directory. This is incorrect, since 'foo/ba?' doesn't
mean that everything in foo/ should be exluded. Note that visitdir()
is called only from the treemanifest class, so this only affects tree
manifests. Fix by adding roots to the set of excluded roots only if
there are no excluded patterns.
Since 'glob' is the default pattern type for globs, we also need to
update some -X patterns in the tests to be of 'path' type to take
advantage of the visitdir tricks. For consistency, also update the -I
patterns.
It seems a little unfortunate that 'foo' in 'hg files -X foo' is
considered a pattern because of the implied 'glob' type, but improving
that is left for another day.
Windows was previously getting this test failure:
--- e:/Projects/hg/tests/test-hgignore.t
+++ e:/Projects/hg/tests/test-hgignore.t.err
@@ -230,6 +230,7 @@
$ hg status
? dir1/file2
+ ? dir1/subdir/subfile3
? dir1/subdir/subfile4
? dir2/file1
@@ -241,4 +242,4 @@
$ echo "glob:file*2" > dir1/.hgignoretwo
$ hg status | grep file2
- [1]
+ ? dir1/file2
The problem was 'source' would be in the form "F:\test-hgignore.t\.hgignore", so
when pathutil.dirname() split on '/', 'sourceroot' was empty. Therefore, 'path'
ended up being relative instead of absolute.
This adds a new rule syntax that allows the user to include a pattern file, but
only have those patterns match against files underneath the subdirectory of the
pattern file.
This is useful when you have nested projects in a repository and the inner
projects wants to set up ignore rules that won't affect other projects in the
repository. It is also useful in high commit rate repositories for removing the
root .hgignore as a point of contention.
There was a bug in my recent change to visitdir (cb39606d374e) due to
the stored generator being iterated over twice. Making the generator into a
list at the start fixes this.
A future patch will be allowing nested matchers. To support that, let's refactor
_buildmatch to build a list of matchers then return True if any match.
We were already doing that for filesets + regex patterns, but this way will be
more generic.
A future patch will make _buildmatch able to expand relative include patterns.
Doing so will require knowing the root of the repo, so let's go ahead and pass
it in.
tl;dr: This is another step towards a (previously unstated) goal of
eliminating match.files() in conditions.
There are four types of matchers:
* always: Matches everything, checked with always(), files() is empty
* exact: Matches exact set of files, checked with isexact(), files()
contains the files to match
* patterns: Matches more complex patterns, checked with anypats(),
files() contains roots of the matched patterns
* prefix: Matches simple 'path:' patterns as prefixes ('foo' matches
both 'foo' and 'foo/bar'), no single method to check, files()
contains the prefixes to match
For completeness, it would be nice to have a method for checking for
the "prefix" type of matcher as well, so let's add that, making it
return True simply when none of the others do.
The larger goal here is to eliminate uses of match.files() in
conditions (i.e. bool(match.files())). The reason for this is that
there are scenarios when you would like to create a "prefix" matcher
that happens to match no files. One example is for 'hg files -I foo
bar'. The narrowmatcher also restricts the set of files given and it
would not surprise me if have bugs caused by that already. Note that
'if m.files() and not m.anypats()' and similar is sometimes used to
catch the "exact" and "prefix" cases above.
match.visitdir() used to only look at the match's primary pattern roots to
decide if a treemanifest traverser should descend into a particular directory.
This change logically makes visitdir also consider the match's include and
exclude pattern roots (if applicable) to make this decision.
This is especially important for situations like using narrowhg with multiple
treemanifest revlogs.
Now that the matcher supports 'include:' rules, let's change the dirstate.ignore
creation to just create a matcher with a bunch of includes. This allows us to
completely delete ignore.py.
I moved some of the syntax documentation over to readpatternfile in match.py so
we don't lose it.
This allows the matcher to understand 'include:path/to/file' style rules. The
files support the standard hgignore syntax and any rules read from the file are
included in the matcher without regard to the files location in the repository
(i.e. if the included file is in somedir/otherdir, all of it's rules will still
apply to the entire repository).
Occasionally the matcher will want to print warning messages instead of throwing
exceptions (like if it encounters a bad syntax parameter when parsing files).
Let's add an optional warn argument that can provide this. The next patch will
actually use this argument.
Future patches will be adding the ability to recursively include pattern files
in a match rule expression. Part of that behavior will require tracking which
file each pattern came from so we can report errors correctly.
Let's add a 'source' arg to the kindpats list to track this. Initially it will
only be populated by listfile rules.
Match's visitdir() was prematurely optimized to return 'all' in some cases, so
that the caller would not have to call it for directories within the current
directory. This change makes the visitdir system less flexible for future
changes, such as making visitdir consider the match's include and exclude
patterns.
As a demonstration of this optimization not actually improving performance,
I ran 'hg files -r . media' on the Mozilla repository, stored as treemanifest
revlogs.
With best of ten tries, the command took 1.07s both with and without the
optimization, even though the optimization reduced the calls from visitdir()
from 987 to 51.
This will work for any command that creates its matcher via scmutil.match(), but
only the files command is tested here (both workingctx and basectx based tests).
The previous behavior was to completely ignore the files in the subrepo, even
though -S was given.
My first attempt was to teach context.walk() to optionally recurse, but once
that was in place and the complete file list was built up, the predicate test
would fail with 'path in nested repo' when a file in a subrepo was accessed
through the parent context.
There are two slightly surprising behaviors with this functionality. First, any
path provided inside the fileset isn't narrowed when it is passed to the
subrepo. I dont see any clean way to do that in the matcher. Fortunately, the
'subrepo()' fileset is the only one to take a path.
The second surprise is that status predicates are resolved against the subrepo,
not the parent like 'hg status -S' is. I don't see any way to fix that either,
given the path auditor error mentioned above.
match.ispartial() will return the opposite of match.always() in core, but this
function will be extensible by extensions to produce another result even
if match.always() will be untouched.
This will be useful for narrowhg, where ispartial() will return False even if
the match won't always match. This would happen in the case where the only
time the match function is False is when the path is outside of the narrow
spec.