Commit Graph

203 Commits

Author SHA1 Message Date
Jun Wu
b43606ee98 ignore: add a way to disable hgignore
Summary:
This allows us to turn on and off hgignore support directly without changing
files in the working copy (which could be hard to revert cleanly).

Reviewed By: mjpieters

Differential Revision: D7544529

fbshipit-source-id: 14cc41e2ae361070f91bf3b8aa28dd5808e7fe99
2018-04-13 21:51:49 -07:00
Jun Wu
26b5601cf3 dirstate: respect gitignore
Summary:
Use the new gitignore matcher powered by Rust.

The hgignore matcher has some laziness, but is not tree-aware - with N
"hgignore" files loaded, it needs O(N) time to match.  The gitignore matcher
is tree-aware and backed by native code with decent time complexity.
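The toy sketch below illustrates the tree-aware idea. It is not the Rust matcher. If each ignore file applies only to its own directory, a lookup needs only the ignore files on the path's ancestor chain, so the cost grows with path depth rather than with the total number of ignore files. IGNORES, ancestors() and isignored() are hypothetical. The patterns are simplified to per-component fnmatch globs rather than full gitignore semantics.

```python
import fnmatch

# Hypothetical data: directory ("" is the repo root) -> glob patterns
# taken from the .gitignore file in that directory.
IGNORES = {
    "": ["*.pyc"],
    "build": ["*.o"],
    "docs": ["_build"],
}

def ancestors(path):
    """Yield the directories that contain path, from the root downward."""
    parts = path.split("/")[:-1]
    yield ""
    for i in range(1, len(parts) + 1):
        yield "/".join(parts[:i])

def isignored(path):
    """Consult only the ignore files on path's ancestor chain."""
    for d in ancestors(path):
        rel = path[len(d) + 1:] if d else path
        for pat in IGNORES.get(d, ()):
            # Simplification: a glob matches if any component below d matches.
            if any(fnmatch.fnmatchcase(c, pat) for c in rel.split("/")):
                return True
    return False
```

Here isignored("build/foo.o") is true but isignored("src/foo.o") is not, because the "*.o" rule lives in build/ and only applies below it.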

We have been maintaining a translation script that collects all gitignores
and generates hgignore files with very long regexps for them. That script has
recently had issues with sparse. This diff allows us to remove those generated
hgignore files from the repo.

Note: fsmonitor state does not contain ignored files, and ignore
invalidation is generally broken in fsmonitor (it only checks the top-level
.hgignore). That means that once a file is ignored, it cannot be "unignored"
just by removing the matching pattern from ".gitignore". The file has to be
"touched" or otherwise modified.

Reviewed By: markbt

Differential Revision: D7319608

fbshipit-source-id: 1763544aedb44676413efb6d14ffd3917ed3b1cd
2018-04-13 21:51:40 -07:00
Yuya Nishihara
259c2d3a80 match: remove doc about undefined behavior of visitdir()
This was added by cb39606d374e, but core matchers have supported visitdir() of
arbitrary locations since a4236180df5e, and verifier._verifymanifest()
doesn't seem to strictly obey the restriction.

I have no idea how important this API contract is for third-party extensions.
That's why this patch is RFC.
2017-11-30 22:32:13 +09:00
Augie Fackler
8001b8c677 match: remove superfluous pass statements 2017-09-30 07:44:45 -04:00
Yuya Nishihara
dcc07e5503 doctest: use print_function and convert bytes to unicode where needed 2017-09-03 14:56:31 +09:00
Yuya Nishihara
a71f259bd2 doctest: bulk-replace string literals with b'' for Python 3
Our code transformer can't rewrite string literals in docstrings, and I
don't want to make the transformer more complex.
2017-09-03 14:32:11 +09:00
Kostia Balytskyi
bc861d5ae7 match: expose some data and functionality to other modules
This patch makes sure that other modules can check whether patterns
are CWD-relative.
2017-08-02 15:48:57 -07:00
Martin von Zweigbergk
cf9a57caf9 match: override visitdir() in nevermatcher to return False
When we changed basematcher.visitdir() in 0ca205268beb (match: make
base matcher return True for visitdir, 2017-07-14), we forgot to add
an override in nevermatcher. This led to tests failing in narrowhg.

As Durham pointed out, it's high time to add unit tests for the
matcher, so this patch also adds a first unit test.
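The minimal sketch below shows the bug and a unit test for it. The toy classes only mirror the names in this message and are not Mercurial's match code. Once the base class answers True from visitdir(), a matcher that matches nothing has to override it. Otherwise, tree walks descend into every directory for no reason.

```python
import unittest

class basematcher(object):
    """Toy base class: matches nothing, never prunes directories."""
    def __call__(self, fn):
        return self.matchfn(fn)
    def matchfn(self, fn):
        return False
    def visitdir(self, dir):
        return True

class nevermatcher(basematcher):
    """Matches nothing, so no directory is worth visiting."""
    def visitdir(self, dir):
        return False  # the kind of override this commit adds

class NeverMatcherTests(unittest.TestCase):
    def testvisitdir(self):
        m = nevermatcher()
        self.assertFalse(m.visitdir("."))
        self.assertFalse(m.visitdir("dir"))

    def testmatch(self):
        self.assertFalse(nevermatcher()("dir/file"))
```

Saved to a file, the tests can be run with `python -m unittest`.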

Differential Revision: https://phab.mercurial-scm.org/D151
2017-07-19 14:50:50 -07:00
Martin von Zweigbergk
bf770dc63d match: remove unused negatematcher
This was only used by the sparse extension's dirstate._ignore
override, which no longer exists.

Differential Revision: https://phab.mercurial-scm.org/D60
2017-07-11 10:46:55 -07:00
Durham Goode
d4313959ab match: make base matcher return True for visitdir
If a matcher doesn't implement visitdir, we should be returning True so that
tree traversals are not prematurely pruned. The old value of False would prevent
tree traversals when using any matcher that didn't implement visitdir.
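A toy walker makes the effect on traversal concrete. It is hypothetical code, not dirstate.walk() or the manifest walker. A directory is entered only when visitdir() is truthy, so a subclass that overrides only matchfn() still finds its files when the base default is True. With a default of False, only files at the top level would be found.

```python
class basematcher(object):
    """Toy base class: visitdir() defaults to True (never prune)."""
    def __call__(self, fn):
        return self.matchfn(fn)
    def matchfn(self, fn):
        return False
    def visitdir(self, dir):
        return True

class suffixmatcher(basematcher):
    """Matches files by suffix and does not implement visitdir()."""
    def __init__(self, suffix):
        self._suffix = suffix
    def matchfn(self, fn):
        return fn.endswith(self._suffix)

def walk(tree, m, prefix=""):
    """Yield matching paths from a nested dict where dicts are directories."""
    for name, child in sorted(tree.items()):
        path = prefix + name
        if isinstance(child, dict):
            if m.visitdir(path):  # a False answer prunes this whole subtree
                for f in walk(child, m, path + "/"):
                    yield f
        elif m(path):
            yield path
```

For example, walking {"a": {"x.py": None}, "z.py": None} with suffixmatcher(".py") yields both "a/x.py" and "z.py".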

Differential Revision: https://phab.mercurial-scm.org/D83
2017-07-14 10:57:36 -07:00
Martin von Zweigbergk
2ff08ff937 match: make unionmatcher a proper matcher
unionmatcher is currently used where only a limited subset of its
functions will be called. Specifically, visitdir() is never
called. The next patch will pass it to dirstate.walk(), where it will
matter that visitdir() is correctly implemented, so let's fix
that. Also add explicitdir and the other attributes that
dirstate.walk() assumes exist on a matcher.
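The sketch below shows one way to give a union of matchers a correct visitdir(). The class names mirror the message, but the code is only illustrative. A directory is visited if any member wants to visit it, and the union answers 'all' as soon as one member matches everything below it. prefixmatcher is a hypothetical stand-in for a path: matcher.

```python
class prefixmatcher(object):
    """Hypothetical path: matcher: matches everything under one directory."""
    def __init__(self, root):
        self._root = root
    def __call__(self, fn):
        return fn == self._root or fn.startswith(self._root + "/")
    def visitdir(self, dir):
        if self(dir):
            return "all"  # everything below dir matches
        if self._root.startswith(dir + "/"):
            return True   # dir is an ancestor of the root: keep descending
        return False      # unrelated subtree: prune it

class unionmatcher(object):
    """Matches a path if any of the given matchers matches it."""
    def __init__(self, matchers):
        self._matchers = matchers
    def __call__(self, fn):
        return any(m(fn) for m in self._matchers)
    def visitdir(self, dir):
        r = False
        for m in self._matchers:
            v = m.visitdir(dir)
            if v == "all":
                return v  # one member covers the whole subtree
            r = r or v
        return r
```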

Differential Revision: https://phab.mercurial-scm.org/D58
2017-07-11 10:46:10 -07:00
Martin von Zweigbergk
3c2e121c63 match: write forceincludematcher using unionmatcher
The forceincludematcher is simply a unionmatcher of an includematcher
(matching paths recursively) with the given matcher. Since the
forceincludematcher is only used by sparse, move it there.

I don't have a good sparse repo setup to test performance impact on.

Differential Revision: https://phab.mercurial-scm.org/D57
2017-07-07 14:39:59 -07:00
Martin von Zweigbergk
209d8095b3 match: invert _anypats(), making it _prefix() 2017-07-11 09:42:32 -07:00
Martin von Zweigbergk
0b6160ac83 match: override matchfn instead of __call__ for consistency
The matchers that were recently moved into core from the sparse
extension override __call__, while the previously existing matchers
override matchfn. Let's switch to the latter for consistency.
2017-07-07 08:55:12 -07:00
Martin von Zweigbergk
57baaf4707 match: express anypats(), not prefix(), in terms of the others
When I added prefix() in 559ee9ecae07 (match: introduce boolean
prefix() method, 2014-10-28), we already had always(), isexact(), and
anypats(), so it made sense to write it in terms of them (a prefix
matcher is one that isn't any of the other types). It's only now that
I realize that it's much more natural to define prefix() explicitly
(it's one that uses path: patterns, roughly speaking) and let
anypats() be defined in terms of the others. Remember that these
methods are all used for determining which fast paths are
possible. anypats() simply means that no fast paths are possible (it
could be called complex() instead). Further evidence is that
rootfilesin:some/dir does not have any patterns, but it's still
considered to be an anypats() matcher. That's because anypats() really
just means that it's not a prefix() matcher (and not always() and not
isexact()).

This patch thus changes prefix() to return False by default and
anypats() to return True only if the other three are False. Having
anypats() be True by default also seems like a good thing, because it
means forgetting to override it will lead only to performance bugs,
not correctness bugs.

Since the base class's implementation changes, we're also forced to
update the subclasses. That change exposed and fixed a bug in the
differencematcher: for example, when both of its input matchers were
prefix matchers, we would say that the result was also a prefix
matcher. This is incorrect, because e.g. "path:dir - path:dir/foo" no
longer matches everything under "dir" (which is what prefix() means).
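Toy classes can sketch the new defaults and show why a difference of two prefix matchers must not claim to be a prefix matcher. This is illustrative only: pathmatcher is a hypothetical stand-in for a path: matcher, and none of this is the actual match code. Because differencematcher does not opt in to prefix(), it falls back to anypats(), which is the safe, slow path.

```python
class basematcher(object):
    """Toy base class using the new defaults described above."""
    def always(self):
        return False
    def isexact(self):
        return False
    def prefix(self):
        return False  # subclasses must opt in explicitly
    def anypats(self):
        # "No fast path possible": True unless another kind applies.
        return not (self.always() or self.isexact() or self.prefix())

class pathmatcher(basematcher):
    """Hypothetical path: matcher: everything under one directory."""
    def __init__(self, root):
        self._root = root
    def __call__(self, fn):
        return fn == self._root or fn.startswith(self._root + "/")
    def prefix(self):
        return True

class differencematcher(basematcher):
    """Matches what m1 matches and m2 does not; keeps the defaults."""
    def __init__(self, m1, m2):
        self._m1, self._m2 = m1, m2
    def __call__(self, fn):
        return self._m1(fn) and not self._m2(fn)
    def isexact(self):
        return self._m1.isexact()
```

With d = differencematcher(pathmatcher("dir"), pathmatcher("dir/foo")), d.prefix() is False and d.anypats() is True. That is correct, because d("dir/foo/x") is False even though "dir/foo/x" lies under "dir".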
2017-07-09 17:02:09 -07:00
Martin von Zweigbergk
82311df2ab match: make nevermatcher an exact matcher and a prefix matcher
The m.isexact() and m.prefix() methods are used by callers to
determine whether m.files() can be used for fast paths. It seems safe
to let callers take any fast paths they can that rely on the empty
m.files().
2017-07-09 15:19:27 -07:00
Martin von Zweigbergk
3a5a9ff76e match: combine regex code for path: and relpath:
The regexes for path: and relpath: patterns are the same (since the
paths have already been normalized at the point we create the
regexes).

I don't think the "if pat == '.'" will have any effect on relpath:
patterns, because relpath: patterns will have the root directory already
normalized to '' by pathutil.canonpath() (unlike path:, for which the
root gets normalized to '.' by util.normpath()).
2017-07-09 23:01:11 -07:00
Martin von Zweigbergk
8c3e639b61 match: remove unnecessary '^' from regexes
The regexes are passed to re.match(), which matches against the
beginning of the input, so the '^' doesn't do anything.

Note that unrooted patterns, such as globs and regexes from .hgignore,
are instead achieved by adding '.*' to the expression given by the
user. (That's unless the user's expression started with '^', in which
case the '.*' is not added, perhaps to keep the regex cleaner?)
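A short demonstration covers both points. re.match() is already anchored at the start of the input, so a leading '^' is redundant. An unrooted pattern is built by prepending '.*' unless the user wrote '^'. relregex() is a hypothetical helper that mimics the rule described above.

```python
import re

def relregex(pat):
    """Sketch: make a user regex unrooted unless it starts with '^'."""
    if pat.startswith("^"):
        return pat  # the user asked for a rooted match
    return ".*" + pat

# re.match() anchors at the beginning, so '^' adds nothing here.
assert bool(re.match(r"abc", "abcdef")) == bool(re.match(r"^abc", "abcdef"))
```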
2017-07-09 22:53:02 -07:00
Gregory Szorc
16c192411d match: move matchers from sparse into core
The sparse extension contains some matcher types that are
generic and can exist in core.

As part of the move, the classes now inherit from basematcher.
always(), files(), and isexact() have been dropped because
they match the default implementations in basematcher.
2017-07-06 17:39:24 -07:00
Martin von Zweigbergk
2c8e174b97 match: minor cleanups to patternmatcher and includematcher
The "patterns"/"include" in "patternspat"/"includepat" is redundant,
so drop it. Also add a "_" prefix since it's "private".

Inline the "pm"/"im" variables.
2017-06-08 22:49:21 -07:00
Martin von Zweigbergk
a9c40085c0 match: allow pats to be None
match.match already interprets "!bool(patterns)" as matching
everything (but includes and excludes still apply). We might as well
allow None, which lets us simplify some callers a bit.

I originally wrote this patch while trying to change
match.match(patterns=[]) to match nothing. This patch is
one step towards that goal. I'm not sure it'll be worth the effort to
go all the way there, but I think this patch still makes sense on its
own.
2017-06-08 22:18:17 -07:00
Martin von Zweigbergk
2c881818ba match: simplify nevermatcher
Most of it does the same as its superclass, so it can simply be
removed. It also seems to make more sense for it to use relative
paths, as we do for everything except alwaysmatcher, although
nevermatcher.uipath() will probably never get called anyway, so it
won't matter.
2017-06-01 08:31:21 -07:00
Siddharth Agarwal
9202c22f7c match: introduce nevermatcher for when no ignore files are present
c01965ab5195 introduced a deterministic `__repr__` for ignores. However, it
didn't account for when ignore was `util.never`. This broke fsmonitor's ignore
change detection -- with an empty hgignore, it would kick in all the time.

Introduce `nevermatcher` and switch to it. This neatly parallels
`alwaysmatcher`.
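A few lines show why a plain function was a problem for a repr-based check. In this sketch, never() and nevermatcher are hypothetical stand-ins for util.never and the new class. A function's default repr includes its memory address, which can differ between processes, while a matcher can return a fixed string.

```python
def never(fn):
    """Stand-in for a plain function that matches nothing."""
    return False

class nevermatcher(object):
    """Stand-in matcher with a deterministic repr."""
    def __call__(self, fn):
        return False
    def __repr__(self):
        return "<nevermatcher>"
```

A check that compares a stored repr with the current one could report a change on every run with the first, while the second stays stable.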
2017-06-01 00:40:52 -07:00
Martin von Zweigbergk
a00e1e8b0a match: remove special-casing of always-matching patterns in patternmatcher
This moves the optimization for patterns that match everything to the
caller, so we can remove it from patternmatcher.

Note that we need to teach alwaysmatcher to use relative paths now in
cases like "hg files .." from inside mercurial/, because while it
still matches everything, paths should be printed relative to the
working directory.
2017-05-19 13:16:15 -07:00
Martin von Zweigbergk
40160c59ed match: move normalize() call out of matcher constructors
By passing in the result of the normalize() call, we prepare for
moving the special handling of patterns that always match out of the
patternmatcher.

It also lets us remove many of the arguments from the matcher, because
they were passed only to the normalize function (we could have
removed the arguments by binding them to the function instead of
moving the normalize() call out).
2017-05-19 12:47:45 -07:00
Martin von Zweigbergk
cc35e8cc47 match: drop support for empty pattern list in patternmatcher
Since the caller now deals with empty pattern lists, we can drop that
support in the patternmatcher. It now gets the more logical behavior
of matching nothing when no patterns are given (although there is no
in-core caller that will pass no patterns).
2017-05-19 11:58:16 -07:00
Martin von Zweigbergk
b0d04b4dc4 match: optimize visitdir() for when no explicit files are listed
In patternmatcher, we used to say that all directories should be
visited if no explicit files were listed, because the case of empty
_files usually implied that no patterns were given (which in turn
meant that everything should match). However, this made e.g. "hg files
-r . rootfilesin:." slower than necessary, because that also ended
up with an empty list in _files. Now that patternmatcher does not
handle includes, the only remaining case where its _files/_fileset
fields will be empty is when it's matching everything. We can
therefore treat the always-case specially and stop treating the empty
_files case specially. This makes the case mentioned above faster on
treemanifest repos.
2017-05-20 23:49:14 -07:00
Martin von Zweigbergk
57f17ff9f2 match: handle everything-matching using new alwaysmatcher
Having a special matcher that always matches seems to make more sense
than making one of the other matchers handle the case. For now, we
just use this new matcher when no patterns were provided.
2017-05-19 11:50:01 -07:00
Martin von Zweigbergk
e63330c2d2 match: add __repr__ for subdirmatcher
Should at least be useful for debugging. It would also matter for
correctness if fsmonitor or Facebook's sparse extension worked with
subrepos (I don't know whether they do).
2017-05-26 13:08:30 -07:00
Yuya Nishihara
477ffb0437 match: define exactmatcher.matchfn statically
This should eliminate the reference cycle, self.matchfn -> self.exact -> self.
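The difference can be observed in CPython with a weak reference. In this sketch, cyclicexact and staticexact are hypothetical classes. The check relies on CPython's reference counting, with the cycle collector paused. Storing a bound method on the instance keeps the object alive until the cycle collector runs, while a method defined on the class does not.

```python
import gc
import weakref

class cyclicexact(object):
    """Stores a bound method on itself: self -> matchfn -> self."""
    def __init__(self, files):
        self._files = set(files)
        self.matchfn = self.exact
    def exact(self, f):
        return f in self._files

class staticexact(object):
    """Defines matchfn on the class, so no self-reference is created."""
    def __init__(self, files):
        self._files = set(files)
    def exact(self, f):
        return f in self._files
    matchfn = exact

def freedimmediately(cls):
    """True if dropping the only reference frees the object (CPython)."""
    wasenabled = gc.isenabled()
    gc.disable()  # keep the cycle collector from running mid-check
    try:
        obj = cls(["a"])
        ref = weakref.ref(obj)
        del obj
        return ref() is None
    finally:
        if wasenabled:
            gc.enable()
```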
2017-05-28 23:54:31 +09:00
Yuya Nishihara
b7251d7b93 match: remove override of prefix() from differencematcher
It's exactly the same as basematcher.prefix().
2017-05-28 23:51:30 +09:00
Martin von Zweigbergk
882acf90e8 match: remove support for includes from patternmatcher
Includes (and excludes) are now delegated to the includematcher.
2017-05-19 11:44:05 -07:00
Martin von Zweigbergk
19566696b9 match: simplify includematcher a bit
The "include" we have in symbols is redundant and the double negative
in visitdir() can be removed.
2017-05-22 23:31:15 -07:00
Martin von Zweigbergk
66dd6b9e1c match: remove support for non-include patterns from includematcher
The includematcher will always get at least one include pattern and
will never get any non-include patterns, so we can remove most of the
code in it. This patch does mostly straightforward deletions of
code. We will clean up further later.
2017-05-19 13:36:34 -07:00
Martin von Zweigbergk
1ba59afc49 match: split up main matcher into patternmatcher and includematcher
At this point the includematcher is an exact copy of the main matcher
class. We will specialize and simplify both classes in the following
patches. This initial unmodified copy is just to make the differences
clearer. We also rename the main matcher to "patternmatcher" for
consistency.

I may eventually merge this new includematcher back into the main
matcher, but I think doing it this way makes the intermediate steps
clearer regardless.
2017-05-19 22:36:14 -07:00
Martin von Zweigbergk
ce94f073cb match: remove support for exact matching from main matcher class
Exact matching is now handled by the exactmatcher class.

We can safely remove _files from the __repr__() implementation,
because even though the field is set, the patternspat field is enough
for the representation to be unambiguous (which was not the case when
the matcher could handle exact matches).
2017-05-18 23:39:39 -07:00
Martin von Zweigbergk
6cc2daf5d6 match: handle exact matching using new exactmatcher 2017-05-17 09:26:15 -07:00
Martin von Zweigbergk
7767620115 match: handle includes using new intersectionmatcher 2017-05-12 23:12:05 -07:00
Martin von Zweigbergk
8a54c0d671 match: move entire uipath() implementation to basematcher
Even though most matchers will always want to use the relative path in
uipath(), when we add support for intersecting matchers, we will want
to control which form to use for any kind of matcher without knowing
the type (see next patch), so we need the implementation on the base
class.

Also rename the attribute from "pathrestricted" to "relativeuipath"
since there actually are cases where we match everything but still use
relative paths (like when the user runs "hg files .." from inside
mercurial/).
2017-05-25 14:32:56 -07:00
Martin von Zweigbergk
6f7738b741 match: remove support for excludes from matcher class
The support is now provided by differencematcher() and still available
via the match() function.
2017-05-16 22:15:42 -07:00
Martin von Zweigbergk
8d0d310985 match: handle excludes using new differencematcher
As I've said in earlier patches, I'm hoping to use more composition of
simpler matchers instead of the single complex matcher we currently
have. This extracts a first new matcher that composes two other
matchers. It matches if the first matcher matches but the second does
not. As such, we can use it for excludes, which this patch also
does. We'll remove the now-unnecessary code for excludes in the next
patch.
2017-05-16 16:36:48 -07:00
Martin von Zweigbergk
2724c60601 match: override matchfn() the usual way in subdirmatcher 2017-05-25 09:52:56 -07:00
Martin von Zweigbergk
cb783946fc match: make matchfn a method on the class
This makes it easier to override in subclasses, so they don't have to
assign the attribute with a lambda.
2017-05-25 09:52:49 -07:00
Martin von Zweigbergk
de3c23309e match: fix visitdir for roots of includes
I'm hoping to rewrite the matcher so excludes are handled by
composition of one matcher with another matcher where the second
matcher has only includes. For that to work, we need visitdir()
to return 'all' for directory 'foo' for a '-I foo' matcher.
2017-05-16 14:31:21 -07:00
Martin von Zweigbergk
605d9dfcea match: make subdirmatcher extend basematcher
This makes the subdirmatcher not depend on the main matcher, giving us
more freedom to modify that (specifically, it will lose its _always
field in a while).
2017-05-17 23:02:42 -07:00
Martin von Zweigbergk
5e75aba9b0 match: make basematcher._files a @propertycache
This will make it easier to override in subclasses (otherwise the
subclass's @propertycache object would be replaced by the value
assigned in the super-constructor call).
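The shadowing problem comes from how non-data descriptors work. An attribute assigned in __init__ lands in the instance __dict__, which takes precedence over a class-level descriptor that defines only __get__. The minimal propertycache-like descriptor below shows the difference. It is a sketch, not Mercurial's util.propertycache, and the eager/lazy class names are hypothetical.

```python
class propertycache(object):
    """Minimal non-data descriptor that caches its result on the instance."""
    def __init__(self, func):
        self.func = func
        self.name = func.__name__
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        value = self.func(obj)
        obj.__dict__[self.name] = value  # later lookups bypass __get__
        return value

class eagerbase(object):
    def __init__(self):
        self._files = []  # lands in __dict__ and shadows subclass descriptors

class eagersub(eagerbase):
    @propertycache
    def _files(self):
        return ["a", "b"]  # never reached: the instance attribute wins

class lazybase(object):
    @propertycache
    def _files(self):
        return []  # default, computed only on first access

class lazysub(lazybase):
    @propertycache
    def _files(self):
        return ["a", "b"]  # overrides the base default as intended
```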
2017-05-19 10:17:08 -07:00
Martin von Zweigbergk
c9664eeaa0 match: extract base class for matchers
We will soon start splitting up the current matcher class into more
specialized classes, so we'll want a base class for all the things
that don't vary much between different matchers.
2017-05-17 23:45:13 -07:00
Martin von Zweigbergk
2410b10b5f match: use ProgrammingError where appropriate 2017-05-23 08:49:01 -07:00
Martin von Zweigbergk
243fda7165 match: catch attempts to create case-insensitive exact matchers
Exact matchers are only created internally (as opposed to from user
input) based on a set of files that the caller collected before, so
they should always match the list exactly (i.e. case-sensitively).
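The sketch below shows the kind of guard described here. It uses a hypothetical factory and a local ProgrammingError class that stands in for Mercurial's error.ProgrammingError. It is not the actual match.match() code, whose signature differs. Raising an error flags the bad call as a programming mistake rather than quietly matching case-insensitively.

```python
class ProgrammingError(RuntimeError):
    """Stand-in for an error that signals a bug in the calling code."""

def exactmatcher(files, icasefs=False):
    """Hypothetical factory: refuse to build a case-insensitive exact matcher."""
    if icasefs:
        raise ProgrammingError(
            "a case-insensitive exact matcher doesn't make sense")
    fileset = set(files)
    # Exact matching compares paths case-sensitively against the given list.
    return lambda f: f in fileset
```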
2017-05-22 08:49:34 -07:00
Martin von Zweigbergk
ee3be3c6ea match: implement __repr__() and update users (API)
fsmonitor and debugignore currently access matcher fields that I would
consider implementation details, namely patternspat, includepat, and
excludepat. Let's instead implement __repr__() and have the few
users use that.

Marked (API) because the fields can now be None.
2017-05-22 11:08:18 -07:00