This was added by cb39606d374e, but core matchers support visitdir() of
arbitrary locations since a4236180df5e, and verifier._verifymanifest()
doesn't seem to strictly obey the restriction.
I have no idea how important this API contract is for third-party extensions.
That's why this patch is RFC.
When we changed basematcher.visitdir() in 0ca205268beb (match: make
base matcher return True for visitdir, 2017-07-14), we forgot to add
an override in nevermatcher. This led to tests failing in narrowhg.
As Durham pointed out, it's high time to add unit tests for the
matcher, so this patch also adds a first unit test.
Differential Revision: https://phab.mercurial-scm.org/D151
This was only used by the sparse extension's dirstate._ignore
override, which no longer exists.
Differential Revision: https://phab.mercurial-scm.org/D60
If a matcher doesn't implement visitdir, we should be returning True so that
tree traversals are not prematurely pruned. The old value of False would prevent
tree traversals when using any matcher that didn't implement visitdir.
Differential Revision: https://phab.mercurial-scm.org/D83
unionmatcher is currently used where only a limited subset of its
functions will be called. Specifically, visitdir() is never
called. The next patch will pass it to dirstate.walk() where it will
matter that visitdir() is correctly implemented, so let's fix
that. Also add the explicitdir etc that will also be assumed by
dirstate.walk() to exist on a matcher.
Differential Revision: https://phab.mercurial-scm.org/D58
The forceincludematcher is simply a unionmatcher of a includematcher
(matching paths recursively) with the given matcher. Since the
forceincludematcher is only used by sparse, move it there.
I don't have a good sparse repo setup to test performance impact on.
Differential Revision: https://phab.mercurial-scm.org/D57
The matchers that were recently moved into core from the sparse
extension override __call__, while the previously existing matchers
override matchfn. Let's switch to the latter for consistency.
When I added prefix() in 559ee9ecae07 (match: introduce boolean
prefix() method, 2014-10-28), we already had always(), isexact(), and
anypats(), so it made sense to write it in terms of them (a prefix
matcher is one that isn't any of the other types). It's only now that
I realize that it's much more natural to define prefix() explicitly
(it's one that uses path: patterns, roughly speaking) and let
anypats() be defined in terms of the others. Remember that these
methods are all used for determining which fast paths are
possible. anypats() simply means that no fast paths are possible (it
could be called complex() instead). Further evidence is that
rootfilesin:some/dir does not have any patterns, but it's still
considered to be an anypats() matcher. That's because anypats() really
just means that it's not a prefix() matcher (and not always() and not
isexact()).
This patch thus changes prefix() to return False by default and
anypats() to return True only if the other three are False. Having
anypats() be True by default also seems like a good thing, because it
means forgetting to override it will lead only to performance bugs,
not correctness bugs.
Since the base class's implementation changes, we're also forced to
update the subclasses. That change exposed and fixed a bug in the
differencematcher: for example when both its two input matchers were
prefix matchers, we would say that the result was also a prefix
matcher, which is incorrect, because e.g "path:dir - path:dir/foo" no
longer matches everything under "dir" (which is what prefix() means).
The m.isexact() and m.prefix() methods are used by callers to
determine whether m.files() can be used for fast paths. It seems safe
to let callers to any fast paths it can that rely on the empty
m.files().
The regexes for path: and relpath: patterns are the same (since the
paths have already been normalized at the point we create the
regexes).
I don't think the "if pat == '.'" will have any effect relpath:
because relpath: patterns will have the root directory already
normalized to '' by pathutil.canonpath() (unlike path:, for which the
root gets normalized to '.' by util.normpath()).
The regexes are passed to re.match(), which matches against the
beginning of the input, so the '^' doesn't do anything.
Note that unrooted patterns, such as globs and regexes from .hgignore
are instead achieved by adding '.*' to the expression given by the
user. (That's unless the user's expression started with '^', in which
case the '.*' is not added, perhaps to keep the regex cleaner?)
The sparse extension contains some matcher types that are
generic and can exist in core.
As part of the move, the classes now inherit from basematcher.
always(), files(), and isexact() have been dropped because
they match the default implementations in basematcher.
The "patterns"/"include" in "patternspat"/"includepat" is redundant,
so drop it. Also a "_" prefix since it's "private".
Inline the "pm"/"im" variables.
match.match already interprets "!bool(patterns)" as matching
everything (but includes and excludes still apply). We might as well
allow None, which lets us simplify some callers a bit.
I originally wrote this patch while trying to change
match.match(patterns=[]) to mean to match no patterns. This patch is
one step towards that goal. I'm not sure it'll be worth the effort to
go all the way there, but I think this patch still makes sense on its
own.
Most of it does the same as its superclass, so it can simply be
removed. It also seems to make more sense for it to use relative
paths, as we do for everything except alwaysmatcher, although
nevermatcher.uipath() will probably never get called anyway, so it
won't matter.
c01965ab5195 introduced a deterministic `__repr__` for ignores. However, it
didn't account for when ignore was `util.never`. This broke fsmonitor's ignore
change detection -- with an empty hgignore, it would kick in all the time.
Introduce `nevermatcher` and switch to it. This neatly parallels
`alwaysmatcher`.
This moves the optimization for patterns that match everything to the
caller, so we can remove it from patternmatcher.
Note that we need to teach alwaysmatcher to use relative paths now in
cases like "hg files .." from inside mercurial/, because while it
still matches everything, paths should be printed relative to the
working directory.
By passing in the result of the normalize() call, we prepare for
moving the special handling of patterns that always match out of the
patternmatcher.
It also lets us remove many of the arguments from the matcher, because
they were passed only the the normalize function (we could have
removed the arguments by binding them to the function instead of
moving the normalize() call out).
Since the caller now deals with empty pattern lists, we can drop that
support in the patternmatcher. It now gets the more logical behavior
of matching nothing when no patterns are given (although there is no
in-core caller that will pass no patterns).
In patternmatcher, we used to say that all directories should be
visited if no explicit files were listed, because the case of empty
_files usually implied that no patterns were given (which in turns
meant that everything should match). However, this made e.g. "hg files
-r . rootfilesin:." slower than necessary, because that also ended
up with an empty list in _files. Now that patternmatcher does not
handle includes, the only remaining case where its _files/_fileset
fields will be empty is when it's matching everything. We can
therefore treat the always-case specially and stop treating the empty
_files case specially. This makes the case mentioned above faster on
treemanifest repos.
Having a special matcher that always matches seems to make more sense
than making one of the other matchers handle the case. For now, we
just use this new matcher when no patterns were provided.
Should at least be useful for debugging. Would matter for correctness
too if fsmonitor or Facebook's sparse extension worked with subrepos
(which I don't know if they do).
The includematcher will always get at least one include pattern and
will never get any non-include patterns, so we can remove most of the
code in it. This patch does mostly straight-forward deletions of
code. We will clean up further later.
At this point the includematcher is an exact copy of the main matcher
class. We will specialize and simplify both classes in the following
patches. This initial unmodified copy is just to make the differences
clearer. We also rename the main matcher to "patternmatcher" for
consistency.
I may eventually merge this new includematcher back into the main
matcher, but I think doing it this way makes the intermediate steps
clearer regardless.
Exact matching is now handled by the exactmatcher class.
We can safely remove _files from the __repr__() implementation,
because even though the field is set, the patternspat field is enough
for the representation to be unambiguous (which was not the case when
the matcher could handle exact matches).
Even though most matchers will always want to use the relative path in
uipath(), when we add support for intersecting matcher, we will want
to control which form to use for any kind of matcher without knowing
the type (see next patch), so we need the implementation on the base
class.
Also rename the attribute from "pathrestricted" to "relativeuipath"
since there actually are cases where we match everything but still use
relative paths (like when the user runs "hg files .." from inside
mercurial/).
As I've said on earlier patches, I'm hoping to use more composition of
simpler matchers instead of the single complex matcher we currently
have. This extracts a first new matcher that composes two other
matchers. It matches if the first matcher matches but the second does
not. As such, we can use it for excludes, which this patch also
does. We'll remove the now-unncessary code for excludes in the next
patch.
I'm hoping to rewrite the matcher so excludes are handled by
composition of one matcher with another matcher where the second
matcher has only includes. For that to work, we need to make
visitdir() to return 'all' for directory 'foo' for a '-I foo' matcher.
This makes the subdirmatcher not depend on the main matcher, giving us
more freedom to modify that (specifically, it will lose it _always
field in a while).
We will soon start splitting up the current matcher class into more
specialized classes, so we'll want a base class for all the things
that don't vary much between different matchers.
Exact matchers are only created internally (as opposed to from user
input) based on a set of files that the caller collected before, so
they should always match the list exactly (i.e. case-sensitively).
fsmonitor and debugignore currently access matcher fields that I would
consider implementation details, namely patternspat, includepat, and
excludepat. Let' instead implement __repr__() and have the few users
use that instead.
Marked (API) because the fields can now be None.
match() will soon gain more logic and we don't want to duplicate that
in icasefsmatch(), so merge the two functions instead and use a flag
to get case-insensitive behavior.