Commit Graph

146 Commits

Author SHA1 Message Date
Matt Harbison
70f9cecb27 match: normpath the ignore source when expanding the 'subinclude' kind
Windows was previously getting this test failure:

  --- e:/Projects/hg/tests/test-hgignore.t
  +++ e:/Projects/hg/tests/test-hgignore.t.err
  @@ -230,6 +230,7 @@

     $ hg status
     ? dir1/file2
  +  ? dir1/subdir/subfile3
     ? dir1/subdir/subfile4
     ? dir2/file1

  @@ -241,4 +242,4 @@
     $ echo "glob:file*2" > dir1/.hgignoretwo

     $ hg status | grep file2
  -  [1]
  +  ? dir1/file2

The problem was 'source' would be in the form "F:\test-hgignore.t\.hgignore", so
when pathutil.dirname() split on '/', 'sourceroot' was empty.  Therefore, 'path'
ended up being relative instead of absolute.
2015-05-27 13:28:16 -04:00
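A minimal sketch of the failure described in 70f9cecb27, assuming a dirname helper that splits only on '/'. The names `slash_dirname` and `to_slashes` are hypothetical stand-ins for illustration, not Mercurial's actual pathutil or util functions.

```python
def slash_dirname(path):
    """Return everything before the last '/', or '' if there is none."""
    head, sep, _tail = path.rpartition('/')
    return head if sep else ''


def to_slashes(path):
    """Hypothetical normalizer: convert Windows separators to '/'."""
    return path.replace('\\', '/')


source = r"F:\test-hgignore.t\.hgignore"

# Without normalization there is no '/' to split on, so the root is empty.
broken_root = slash_dirname(source)

# After normalization the directory part is recovered.
fixed_root = slash_dirname(to_slashes(source))
```

With an empty root, any path joined onto it stays relative, which matches the symptom in the test output above.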
Durham Goode
9ae1033b0d match: enable 'subinclude:' syntax
This adds a new rule syntax that allows the user to include a pattern file, but
only have those patterns match against files underneath the subdirectory of the
pattern file.

This is useful when you have nested projects in a repository and the inner
projects want to set up ignore rules that won't affect other projects in the
repository. It is also useful in high commit rate repositories for removing the
root .hgignore as a point of contention.
2015-05-16 16:25:05 -07:00
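The scoping behaviour described in 9ae1033b0d could be sketched roughly as follows. This is an illustration only: `subinclude_matcher` is a hypothetical helper, and it uses the standard library's fnmatch rather than Mercurial's own glob translation, so pattern semantics may differ.

```python
import fnmatch


def subinclude_matcher(subdir, globs):
    """Build a predicate whose globs only apply beneath subdir."""
    prefix = subdir.rstrip('/') + '/'

    def matchfn(path):
        if not path.startswith(prefix):
            return False  # outside the pattern file's directory
        rel = path[len(prefix):]
        return any(fnmatch.fnmatchcase(rel, g) for g in globs)

    return matchfn


# Patterns read from a (hypothetical) dir1/.hgignore-style file.
ignored = subinclude_matcher('dir1', ['*.o'])
```

Here `ignored('dir1/a.o')` is true while `ignored('dir2/a.o')` is false, so the inner project's rules cannot affect its siblings.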
Drew Gottlieb
94d1131d10 match: fix bug in match.visitdir()
There was a bug in my recent change to visitdir (cb39606d374e) due to
the stored generator being iterated over twice. Making the generator into a
list at the start fixes this.
2015-05-22 14:39:34 -07:00
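The class of bug fixed in 94d1131d10 can be reproduced in isolation. The sketch below uses a hypothetical `roots()` generator; it is not the visitdir() code itself.

```python
def roots():
    """Hypothetical generator standing in for the stored pattern roots."""
    yield 'dir1'
    yield 'dir2'


gen = roots()
first_pass = [r for r in gen]
second_pass = [r for r in gen]  # the generator is already exhausted here

# Materializing the generator once makes repeated iteration safe.
as_list = list(roots())
first_again = [r for r in as_list]
second_again = [r for r in as_list]
```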
Durham Goode
8328676177 match: allow unioning arbitrary match functions
A future patch will be allowing nested matchers. To support that, let's refactor
_buildmatch to build a list of matchers then return True if any match.

We were already doing that for filesets + regex patterns, but this way will be
more generic.
2015-05-16 16:16:18 -07:00
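The "build a list of matchers, then return True if any match" idea from 8328676177 might look like the sketch below. `unionmatch` is a hypothetical name, and the real _buildmatch has a different signature.

```python
def unionmatch(matchfuncs):
    """Combine predicates so that a path matches if any predicate does."""
    if not matchfuncs:
        return lambda f: False
    if len(matchfuncs) == 1:
        return matchfuncs[0]  # avoid an extra call layer for one matcher
    return lambda f: any(mf(f) for mf in matchfuncs)


is_python = lambda f: f.endswith('.py')
in_docs = lambda f: f.startswith('docs/')
either = unionmatch([is_python, in_docs])
```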
Durham Goode
dbbbdec2b6 match: add root to _buildmatch
A future patch will make _buildmatch able to expand relative include patterns.
Doing so will require knowing the root of the repo, so let's go ahead and pass
it in.
2015-05-16 16:12:00 -07:00
Martin von Zweigbergk
37105d1059 match: introduce boolean prefix() method
tl;dr: This is another step towards a (previously unstated) goal of
eliminating match.files() in conditions.

There are four types of matchers:

 * always: Matches everything, checked with always(), files() is empty

 * exact: Matches exact set of files, checked with isexact(), files()
   contains the files to match

 * patterns: Matches more complex patterns, checked with anypats(),
   files() contains roots of the matched patterns

 * prefix: Matches simple 'path:' patterns as prefixes ('foo' matches
   both 'foo' and 'foo/bar'), no single method to check, files()
   contains the prefixes to match

For completeness, it would be nice to have a method for checking for
the "prefix" type of matcher as well, so let's add that, making it
return True simply when none of the others do.

The larger goal here is to eliminate uses of match.files() in
conditions (i.e. bool(match.files())). The reason for this is that
there are scenarios when you would like to create a "prefix" matcher
that happens to match no files. One example is for 'hg files -I foo
bar'. The narrowmatcher also restricts the set of files given and it
would not surprise me if we have bugs caused by that already. Note that
'if m.files() and not m.anypats()' and similar is sometimes used to
catch the "exact" and "prefix" cases above.
2014-10-28 22:47:22 -07:00
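The relationship between the four matcher types in 37105d1059 can be illustrated with a toy class. `SketchMatcher` and its constructor flags are assumptions made for this example; only the method names always(), isexact(), anypats(), prefix() and files() come from the commit message.

```python
class SketchMatcher:
    """Toy matcher exposing the four type checks from the commit message."""

    def __init__(self, files, always=False, exact=False, anypats=False):
        self._files = list(files)
        self._always = always
        self._exact = exact
        self._anypats = anypats

    def files(self):
        return self._files

    def always(self):
        return self._always

    def isexact(self):
        return self._exact

    def anypats(self):
        return self._anypats

    def prefix(self):
        # A "prefix" matcher is simply one that is none of the other types.
        return not (self.always() or self.isexact() or self.anypats())


# A prefix matcher that happens to match no files: bool(m.files()) is
# False, so a files()-based condition would misclassify it.
m = SketchMatcher([])
```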
Drew Gottlieb
eb5e31d8eb match: have visitdir() consider includes and excludes
match.visitdir() used to only look at the match's primary pattern roots to
decide if a treemanifest traverser should descend into a particular directory.
This change logically makes visitdir also consider the match's include and
exclude pattern roots (if applicable) to make this decision.

This is especially important for situations like using narrowhg with multiple
treemanifest revlogs.
2015-05-18 14:29:20 -07:00
Durham Goode
8e42385986 ignore: use 'include:' rules instead of custom syntax
Now that the matcher supports 'include:' rules, let's change the dirstate.ignore
creation to just create a matcher with a bunch of includes. This allows us to
completely delete ignore.py.

I moved some of the syntax documentation over to readpatternfile in match.py so
we don't lose it.
2015-05-16 16:06:22 -07:00
Durham Goode
e99405d80e match: add 'include:' syntax
This allows the matcher to understand 'include:path/to/file' style rules.  The
files support the standard hgignore syntax and any rules read from the file are
included in the matcher without regard to the file's location in the repository
(i.e. if the included file is in somedir/otherdir, all of its rules will still
apply to the entire repository).
2015-05-16 15:56:52 -07:00
Durham Goode
d6f0921e70 match: add optional warn argument
Occasionally the matcher will want to print warning messages instead of throwing
exceptions (like if it encounters a bad syntax parameter when parsing files).
Let's add an optional warn argument that can provide this. The next patch will
actually use this argument.
2015-05-18 16:27:56 -07:00
Durham Goode
7ea80e36a3 match: add source to kindpats list
Future patches will be adding the ability to recursively include pattern files
in a match rule expression. Part of that behavior will require tracking which
file each pattern came from so we can report errors correctly.

Let's add a 'source' arg to the kindpats list to track this. Initially it will
only be populated by listfile rules.
2015-05-16 15:51:03 -07:00
Matt Mackall
7e1cf5444c merge with stable 2015-05-19 07:17:57 -05:00
Drew Gottlieb
04e229c0e2 match: rename _fmap to _fileroots for clarity
fmap isn't a very descriptive name for the set of the match's files.
2015-05-08 12:30:51 -07:00
Drew Gottlieb
ca0e804650 match: remove unnecessary optimization where visitdir() returns 'all'
Match's visitdir() was prematurely optimized to return 'all' in some cases, so
that the caller would not have to call it for directories within the current
directory. This change makes the visitdir system less flexible for future
changes, such as making visitdir consider the match's include and exclude
patterns.

As a demonstration of this optimization not actually improving performance,
I ran 'hg files -r . media' on the Mozilla repository, stored as treemanifest
revlogs.

With best of ten tries, the command took 1.07s both with and without the
optimization, even though the optimization reduced the calls to visitdir()
from 987 to 51.
2015-05-06 15:59:35 -07:00
Durham Goode
b06100dbed ignore: move readpatternfile to match.py
In preparation for adding 'include:' rule support to match.py, let's move the
pattern file reader function to match.py
2015-05-16 15:46:54 -07:00
Matt Harbison
9311abf92a match: resolve filesets in subrepos for commands given the '-S' argument
This will work for any command that creates its matcher via scmutil.match(), but
only the files command is tested here (both workingctx and basectx based tests).
The previous behavior was to completely ignore the files in the subrepo, even
though -S was given.

My first attempt was to teach context.walk() to optionally recurse, but once
that was in place and the complete file list was built up, the predicate test
would fail with 'path in nested repo' when a file in a subrepo was accessed
through the parent context.

There are two slightly surprising behaviors with this functionality.  First, any
path provided inside the fileset isn't narrowed when it is passed to the
subrepo.  I don't see any clean way to do that in the matcher.  Fortunately, the
'subrepo()' fileset is the only one to take a path.

The second surprise is that status predicates are resolved against the subrepo,
not the parent like 'hg status -S' is.  I don't see any way to fix that either,
given the path auditor error mentioned above.
2015-05-16 00:36:35 -04:00
Drew Gottlieb
cb6b880af7 match: add match.ispartial()
match.ispartial() will return the opposite of match.always() in core, but this
function will be extensible by extensions to produce another result even
if match.always() will be untouched.

This will be useful for narrowhg, where ispartial() will return False even if
the match won't always match. This would happen in the case where the only
time the match function is False is when the path is outside of the narrow
spec.
2015-05-15 15:43:26 -07:00
Matt Harbison
25fa81d27a match: introduce uipath() to properly style a file path
Several methods print files relative to the repo root, unless files are named on
the command line, in which case they are printed relative to cwd.  Since the
check relies on the 'pats' parameter, which needs to be replaced by a matcher
when adding subrepo support, this logic gets folded into the matcher to tidy up
the callers.

Prior to 7d5fcea60c78, this style decision was based off of whether or not the
'pats' list was empty.  That change altered the check to test match.anypats()
instead, in order to make paths printed consistent when -I/-X is specified.
That however, changed the style when a file is given to the command.  So now we
test the pattern list to get the old behavior for files, as well as test -I/-X
to get the consistency for patterns.
2014-12-04 23:04:55 -05:00
Martin von Zweigbergk
60c55cc10d match: remove unnecessary setting of self._always
The 'always' class calls its parent constructor with an empty list of
patterns, which will result in a matcher that always matches. The
parent constructor will set self._always to True in such cases, so
there is no need to set it again.
2014-09-30 15:58:08 -07:00
Martin von Zweigbergk
998d55c67c match: simplify brittle predicate construction
In match.__init__(), we create the matchfn predicate by and-ing
together the individual predicates for includes, excludes (negated)
and patterns. Instead of the current set of nested if/else blocks, we
can simplify by adding the predicates to a list and defining the
overall predicate in a generic way based on the components. We can
still optimize it for the 0-length and 1-length cases. This way, there
is no combinatorial explosion to deal with if new component predicates
are added, and there is less risk of getting the overall predicate
wrong.
2014-09-19 13:49:58 -07:00
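The list-based construction described in 998d55c67c could be sketched like this. `build_matchfn` and its keyword arguments are hypothetical; the real logic lives inside match.__init__().

```python
def build_matchfn(includefn=None, excludefn=None, patternfn=None):
    """And together whichever component predicates are present."""
    preds = []
    if includefn is not None:
        preds.append(includefn)
    if excludefn is not None:
        preds.append(lambda f: not excludefn(f))  # excludes are negated
    if patternfn is not None:
        preds.append(patternfn)

    if not preds:
        return lambda f: True  # 0-length case: match everything
    if len(preds) == 1:
        return preds[0]  # 1-length case: no wrapper needed
    return lambda f: all(p(f) for p in preds)


matchfn = build_matchfn(
    includefn=lambda f: f.startswith('src/'),
    excludefn=lambda f: f.endswith('.orig'),
)
```

Adding a fourth component would mean one more `append`, not another level of if/else nesting.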
Siddharth Agarwal
712df54e55 match: use util.re.escape instead of re.escape
For a pathological .hgignore with over 2500 glob lines and over 200000 calls to
re.escape, and with re2 available, this speeds up parsing the .hgignore from
0.75 seconds to 0.20 seconds. This causes e.g. 'hg status' with hgwatchman
enabled to go from 1.02 seconds to 0.47 seconds.
2014-07-15 15:34:50 -07:00
Siddharth Agarwal
5e38818cf4 match: use util.re.compile instead of util.compilere 2014-07-15 14:49:45 -07:00
Siddharth Agarwal
1aa951751f match: make glob '**/' match the empty string
Previously, a glob pattern of the form 'foo/**/bar' would match 'foo/a/bar' but
not 'foo/bar'. That was because the '**' in 'foo/**/bar' would be translated to
'.*', making the final regex pattern 'foo/.*/bar'. That pattern doesn't match
the string 'foo/bar'.

This is a bug because the '**/' glob matches the empty string in standard Unix
shells like bash and zsh.

Fix that by making the ending '/' optional if an empty string can be matched.
2014-06-25 14:50:48 -07:00
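The difference described in 1aa951751f can be checked directly with Python's re module. The old pattern is taken from the message above; the new pattern is an assumed form of "making the ending '/' optional", and the exact regex Mercurial emits may differ.

```python
import re

# Translation described as the old behaviour: '**' becomes '.*'.
old = re.compile(r'foo/.*/bar$')

# Assumed shape of the fix: the '**/' group as a whole is optional.
new = re.compile(r'foo/(?:.*/)?bar$')

old_matches_nested = bool(old.match('foo/a/bar'))
old_matches_direct = bool(old.match('foo/bar'))
new_matches_nested = bool(new.match('foo/a/bar'))
new_matches_direct = bool(new.match('foo/bar'))
```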
Yuya Nishihara
2e030eb020 match: fix NameError 'pat' on overflow of regex pattern length
'pat' was renamed to 'regex' in 25907f42ff54.
2014-04-29 11:02:40 +09:00
Mads Kiilerich
3bb35d5e35 match: remove last traces of unused .missing callback 2013-10-03 18:01:21 +02:00
Mads Kiilerich
3811c637be match: _globre doctests 2014-04-13 22:00:08 +02:00
Mads Kiilerich
df98b5168a match: improve documentation - docstrings and more descriptive variable naming
No real changes.

pattern: 'kind:pat' as specified on the command line
patterns, pats: list of patterns
kind: 'path', 'glob' or 're' or ...
pat: string in the corresponding 'kind' format
kindpats: list of (kind, pat) tuples
2013-10-03 18:01:21 +02:00
Mads Kiilerich
f171464010 match: make it more clear what _roots do and that it ends up in match()._files 2013-10-03 18:01:21 +02:00
Augie Fackler
edc98c0164 match: use ctx.getfileset() instead of fileset.getfileset()
Resolves an import cycle involving match and merge.
2014-02-04 14:54:42 -05:00
Augie Fackler
213fff305a pathutil: tease out a new library to break an import cycle from canonpath use 2013-11-06 18:19:04 -05:00
Mads Kiilerich
1e900bb145 check-code: check for spaces around = for named parameters 2013-10-03 14:50:47 +02:00
Siddharth Agarwal
be89af772f match: add comments to explain explicitdir and traversedir 2013-05-03 15:36:18 -07:00
Siddharth Agarwal
8e7066e5f2 match: make explicitdir and traversedir None by default
With this, extensions can easily tell when traversedir and/or explicitdir don't
need to be called.
2013-05-03 14:41:58 -07:00
Siddharth Agarwal
ded9770198 match: drop dir callback
dir has been subsumed by explicitdir and traversedir.
2013-04-28 21:29:32 -07:00
Siddharth Agarwal
3c72b73e1f match: introduce explicitdir and traversedir
match.dir is currently called in two different places:
(1) noting when a directory specified explicitly is visited.
(2) noting when a directory is visited during a recursive walk.

purge cares about both, but commit only cares about the first.

Upcoming patches will split the two cases into two different callbacks. Why
bother? Consider a hypothetical extension that can provide more efficient walk
results, via e.g. watching the filesystem. That extension will need to
fall back to a full recursive walk if a callback is set for (2), but not if a
callback is only set for (1).
2013-04-28 21:24:09 -07:00
Mads Kiilerich
2fc7d0133f match: fix root calculation for combining regexps with simple paths
The fall-back root for walking is the repo root, not no root.

The "roots" do however also end up in m.files() which is used in various ways,
for instance to indicate whether matches are exact. The change could thus have
other impacts.
2013-04-30 01:04:35 +02:00
Bryan O'Sullivan
905d64bf45 match: more accurately report when we're always going to match
This improves the performance of log --patch and --stat by about
20% for moderately large manifests (e.g. mozilla-central) for the
common case of no -I/-X patterns.
2013-02-21 12:55:39 -08:00
Mads Kiilerich
2372d51b68 fix wording and not-completely-trivial spelling errors and bad docstrings 2012-08-15 22:39:18 +02:00
Bryan O'Sullivan
3f45806d34 matcher: use re2 bindings if available
There are two sets of Python re2 bindings available on the internet;
this code works with both.

Using re2 can greatly improve "hg status" performance when a .hgignore
file becomes even modestly complex.

Example: "hg status" on a clean tree with 134K files, where "hg
debugignore" reports a regexp 4256 bytes in size.

  no .hgignore: 1.76 sec
  Python re:    2.79
  re2:          1.82

The overhead of regexp matching drops from 1.03 seconds with stock
re to 0.06 with re2.

(For comparison, a git repo with the same contents and .gitignore
file runs "git status -s" in 1.71 seconds, i.e. only slightly faster
than hg with re2.)
2012-06-01 15:26:20 -07:00
Matt Mackall
bfe92722a0 merge with stable 2012-05-30 14:21:58 -05:00
FUJIWARA Katsunori
abfb6c35d4 match: make 'match.files()' return list object always
'exact' match objects are sometimes created with a non-list 'pattern'
argument:

  - using 'set' in queue.refresh():hgext/mq.py
        match = scmutil.matchfiles(repo, set(c[0] + c[1] + c[2] + inclsubs))

  - using 'dict' in revert():mercurial/cmdutil.py (names = {})
        m = scmutil.matchfiles(repo, names)

'exact' match objects return specified 'pattern' to callers of
'match.files()' as it is, so it is a non-list object.

but almost all implementations expect 'match.files()' to return a list
object, so this may cause problems: e.g. exception for "+" with
another list object.

this patch ensures that '_files' of 'exact' match objects is a list
object.

for non 'exact' match objects, parsing specified 'pattern' already
ensures that it is a list.
2012-05-23 00:25:29 +09:00
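The problem described in abfb6c35d4 is easy to reproduce outside Mercurial. In this sketch `normalize_files` is a hypothetical helper that shows the kind of coercion the patch describes, not the patched code itself.

```python
def normalize_files(pattern):
    """Return the given files as a list, whatever container was passed."""
    return list(pattern)


names = {'a.txt': None}  # a dict, as in the revert() example above

try:
    names + ['b.txt']  # non-list containers do not support '+' with a list
    concat_failed = False
except TypeError:
    concat_failed = True

combined = normalize_files(names) + ['b.txt']
```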
Brodie Rao
92158e04de cleanup: "raise SomeException()" -> "raise SomeException" 2012-05-12 16:00:58 +02:00
Jesse Glick
17675847fa localrepo: optimize internode status calls using match.always
Introduce match.always() to check if a match object always says yes, i.e.
None was passed in. If so, mfmatches should not bother iterating every file in
the repository.
2012-05-04 15:54:55 -04:00
Patrick Mezard
8d52be3d10 match: consider filesets as "anypats"
Matt suggested this on IRC. I do not think the choice is obvious, but this one
makes things simpler: although filesets are turned into a list of files in the
match objects, it would otherwise be more difficult to tell invalid files
passed in pats from those expanded from filesets.
2012-02-26 17:10:55 +01:00
Martin Geisler
543f0688d9 match: remove unused assignment
The field is assigned again below with the constructor argument.
2011-08-09 11:05:13 +02:00
Peter Arrenbrecht
aa36fb062f match: fix bug caused by refactoring in fb457d08da0b 2011-06-23 14:40:57 +02:00
Matt Mackall
1dc7f878ce match: introduce basic fileset support 2011-06-18 16:53:44 -05:00
Matt Mackall
b662345d91 match: allow passing a context object to match core 2011-06-18 16:52:51 -05:00
Patrick Mezard
37e2e503d6 match: make 'listfile:' split on LF and CRLF
We want util.readfile() to operate in binary mode, so EOLs have to be handled
correctly depending on the platform. It seems both easier and more convenient
to treat LF and CRLF the same way on all platforms.
2011-05-07 21:12:30 +02:00
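One way to treat LF and CRLF identically when the file is read in binary mode, as 37e2e503d6 describes, is sketched below. `split_listfile` is a hypothetical name, and dropping empty lines is an assumption made for this example.

```python
def split_listfile(data):
    """Split binary file content on LF or CRLF, skipping empty lines."""
    # Collapse CRLF to LF first so both endings are handled the same way
    # on every platform; a lone CR is left untouched.
    lines = data.replace(b'\r\n', b'\n').split(b'\n')
    return [line for line in lines if line]


mixed = b'a.txt\r\nb.txt\nc.txt\n'
entries = split_listfile(mixed)
```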
Dan Villiom Podlaski Christiansen
511c941422 prevent transient leaks of file handle by using new helper functions
These leaks may occur in environments that don't employ a reference
counting GC, i.e. PyPy.

This implies:
 - changing opener(...).read() calls to opener.read(...)
 - changing opener(...).write() calls to opener.write(...)
 - changing open(...).read(...) to util.readfile(...)
 - changing open(...).write(...) to util.writefile(...)
2011-05-02 10:11:18 +02:00
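Helpers of the kind named in 511c941422 can close the file deterministically instead of relying on reference counting. The bodies below are a sketch under that assumption and are not copied from Mercurial's util module.

```python
import os
import tempfile


def readfile(path):
    """Read a whole file and close the handle before returning."""
    with open(path, 'rb') as fp:
        return fp.read()


def writefile(path, data):
    """Write data to a file and close the handle before returning."""
    with open(path, 'wb') as fp:
        fp.write(data)


# Round trip through a temporary file.
fd, tmppath = tempfile.mkstemp()
os.close(fd)
writefile(tmppath, b'glob:*.o\n')
roundtrip = readfile(tmppath)
os.remove(tmppath)
```

By contrast, `open(path).read()` leaves the handle open until the garbage collector finalizes it, which may be much later on an interpreter without reference counting.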
Adrian Buehlmann
f3e8eae526 move canonpath from util to scmutil 2011-04-20 21:41:41 +02:00
Steve Borho
909958e650 match: fix subtle error in _buildmatch
The trailing comma was causing a ValueError.  See
https://bitbucket.org/tortoisehg/thg/issue/132
2011-02-18 10:28:20 -06:00
jfh
76026ecba1 add debugignore which yields the combined ignore pattern of the .hgignore files
For GUI clients it's sometimes important to know which files will be ignored and
which files will be important. This allows the GUI client to skip redoing a
'hg status' when the files are ignored but have changed. (For instance, a
typical case is that the "build" directory inside some project is ignored but
files in it frequently change.)
2011-01-15 16:02:03 +01:00
Steve Borho
5d527f9378 match: support reading pattern lists from files 2010-12-23 15:12:24 -06:00
Martin Geisler
ca9cbd61a1 narrowmatcher: propagate bad method
The full path is propagated to the original match object since this is
often used directly for printing an error to the user.
2010-09-13 13:09:09 +02:00
Martin Geisler
3dafdd42e0 narrowmatcher: fix broken rel method 2010-09-13 13:08:18 +02:00
Martin Geisler
e0e3bf4835 match: add narrowmatcher class
This class can be used to adapt an existing match object to a new
match object that only cares about paths within a certain
subdirectory.
2010-09-03 12:58:51 +02:00
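The adapter idea behind e0e3bf4835 can be shown with plain functions. This sketch assumes the wrapped matcher is a simple predicate on repo-relative paths; `narrow` is a hypothetical helper, and the real narrowmatcher class wraps a full match object.

```python
def narrow(matchfn, subdir):
    """Adapt matchfn so it accepts paths relative to subdir."""
    prefix = subdir.rstrip('/') + '/'
    return lambda relpath: matchfn(prefix + relpath)


# Original predicate works on paths relative to the repository root.
rootmatch = lambda path: path in ('sub/a.txt', 'other/b.txt')

# Narrowed predicate works on paths relative to 'sub'.
submatch = narrow(rootmatch, 'sub')
```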
Martin Geisler
34c8204207 match: accept auditor argument
This is used when normalizing filenames and patterns.
2010-09-03 12:58:51 +02:00
Martin Geisler
b9c719086c match: mark error messages for translation 2010-08-30 17:11:51 +02:00
Matt Mackall
8d99be19f0 many, many trivial check-code fixups 2010-01-25 00:05:27 -06:00
Matt Mackall
595d66f424 Update license to GPLv2+ 2010-01-19 22:20:08 -06:00
Alejandro Santos
ebe339890f split local and stdlib module imports (eases migration issues) 2009-07-05 11:06:09 +02:00
timeless
fb33de67af Generally replace "file name" with "filename" in help and comments. 2009-06-09 09:25:17 -04:00
Matt Mackall
e802acf0e9 match: remove match.never
Only one user, can be translated to match.exact()
2009-05-31 17:54:18 -05:00
Matt Mackall
37eaadf540 match: ignore return of match.bad
All users returned false, return can now be dropped
2009-05-31 17:54:18 -05:00
Matt Mackall
2940d96bce match: document bad callback semantics 2009-05-31 17:54:18 -05:00
Matt Mackall
bd7c2104ff match: fix _patsplit breakage with drive letters 2009-05-24 16:37:34 -05:00
Matt Mackall
f1416e098e match: fold match into _match base class 2009-05-24 02:56:22 -05:00
Matt Mackall
50be99f7cb match: add exact flag to match() to unify all match forms 2009-05-24 02:56:20 -05:00
Matt Mackall
e8c6616c42 match: redefine always and never in terms of match and exact 2009-05-24 02:56:14 -05:00
Matt Mackall
dad8161beb match: fold _globprefix into _roots 2009-05-24 02:56:14 -05:00
Matt Mackall
870bafcadb match: optimize escaping in _globre
- localize re.escape
- fastpath escaping of non-special characters
2009-05-24 02:56:14 -05:00
Matt Mackall
fb6d5f4ec6 match: remove head and tail args from _globre 2009-05-24 02:56:14 -05:00
Matt Mackall
6cec04c6af match: fold _matcher into match.__init__ 2009-05-24 02:56:14 -05:00
Matt Mackall
0b844e4b36 match: rename _matchfn to _buildmatch 2009-05-24 02:56:14 -05:00
Matt Mackall
64e6241687 match: optimize _patsplit 2009-05-24 02:56:14 -05:00
Matt Mackall
8ba6e48c01 match: tweak some names 2009-05-24 02:56:14 -05:00
Matt Mackall
cb5cd35394 match: simplify _matcher
- get rid of special case
- simplify anypats logic
- fold inckinds and exckinds
2009-05-24 02:56:14 -05:00
Matt Mackall
d9e0a2ed6d match: split up _normalizepats 2009-05-24 02:56:14 -05:00
Matt Mackall
3d1ba9cdd7 match: optimize _globprefix 2009-05-24 02:56:14 -05:00
Matt Mackall
e01c72ad8b match: unnest functions in _matcher 2009-05-24 02:56:14 -05:00
Matt Mackall
556e496bdb match: kill unused defaults on _globre 2009-05-24 02:56:14 -05:00
Matt Mackall
85df20c19d match: kill test in matchfn 2009-05-24 02:56:14 -05:00
Matt Mackall
d65d73641a match: refactor matchfn generation 2009-05-24 02:56:14 -05:00
Matt Mackall
f1f37a33cf match: move util match functions over 2009-05-24 02:56:14 -05:00
Matt Mackall
b287cf72fe match: refactor patkind
add patkind(pat) to match
change external users
change util.patkind to _patsplit
2009-05-24 02:56:14 -05:00
Matt Mackall
89c18ad8ce match: add some default args 2009-05-24 02:56:14 -05:00
Matt Mackall
532c58d931 match: change all users of util.matcher to match.match 2009-05-24 02:56:14 -05:00
Simon Heimberg
1b0d5c18bb match: use self.exact instead of lambda
self.exact uses a set and does not need an extra copy of the files
2009-05-15 09:43:30 +02:00
Martin Geisler
cd12c66fa6 match: add copyright and license header 2009-04-26 01:57:00 +02:00
Martin Geisler
e2222d3c43 replace set-like dictionaries with real sets
Many of the dictionaries created by dict.fromkeys were emulating sets.
These can now be replaced with real sets.
2009-04-22 00:57:28 +02:00
Matt Mackall
ec9023fc53 dirstate.walk: speed up calling match function 2008-07-22 13:03:31 -05:00
Matt Mackall
793085a29c match: cleanup match classes a bit 2008-05-12 11:37:08 -05:00
Matt Mackall
52779fb5fe match: add always, never, and exact methods 2008-05-12 11:37:08 -05:00
Matt Mackall
1897262320 walk: begin refactoring badmatch handling 2008-05-12 11:37:07 -05:00
Matt Mackall
20f7afebf8 walk: introduce match objects 2008-05-12 11:37:07 -05:00