Commit Graph

138 Commits

Author SHA1 Message Date
Pierre-Yves David
be968edfac match: explicitly tests for None
Changeset ba7f2a1cc2d2 removed the mutable default value, but did not explicitly
tested for None. Such implicit testing can introduce semantic and performance
issue. We move to an explicit testing for None as recommended by PEP8:

https://www.python.org/dev/peps/pep-0008/#programming-recommendations
2017-03-15 15:08:45 -07:00
Pulkit Goyal
5a7c5d918e match: slice over bytes to get the byteschr instead of ascii value 2017-03-16 08:03:51 +05:30
Pulkit Goyal
0bad6ee6aa match: make regular expression bytes to prevent TypeError 2017-03-16 07:52:47 +05:30
Rishabh Madan
6a6d5ec05c py3: open file in rb mode 2017-03-15 14:51:18 +05:30
Gregory Szorc
98c99b99fa match: don't use mutable default argument value
There shouldn't be a big perf hit creating a new object because
this function is complicated and does things that dwarf the cost
of creating a new PyObject.
2017-03-12 21:53:03 -07:00
Rodrigo Damazio Bovendorp
0bd84ceb64 match: making visitdir() deal with non-recursive entries
Primarily as an optimization to avoid recursing into directories that will
never have a match inside, this classifies each matcher pattern's root as
recursive or non-recursive (erring on the side of keeping it recursive,
which may lead to wasteful directory or manifest walks that yield no matches).

I measured the performance of "rootfilesin" in two repos:
- The Firefox repo with tree manifests, with
  "hg files -r . -I rootfilesin:browser".
  The browser directory contains about 3K files across 249 subdirectories.
- A specific Google-internal directory which contains 75K files across 19K
  subdirectories, with "hg files -r . -I rootfilesin:REDACTED".

I tested with both cold and warm disk caches. Cold cache was produced by
running "sync; echo 3 > /proc/sys/vm/drop_caches". Warm cache was produced
by re-running the same command a few times.

These were the results:

               Cold cache           Warm cache
             Before   After      Before   After
firefox      0m5.1s   0m2.18s   0m0.22s  0m0.14s
google3 dir  2m3.9s   0m1.57s   0m8.12s  0m0.16s

Certain extensions, notably narrowhg, can depend on this for correctness
(not trying to recurse into directories for which it has no information).
2017-02-13 17:03:14 -08:00
Rodrigo Damazio Bovendorp
8c5ff96dc1 match: adding support for matching files inside a directory
This adds a new "rootfilesin" matcher type which matches files inside a
directory, but not any subdirectories (so it matches non-recursively).
This has the "root" prefix per foozy's plan for other matchers (rootglob,
rootpath, cwdre, etc.).
2017-02-13 15:39:29 -08:00
Jun Wu
d9d6d37c5a match: migrate to util.iterfile 2016-11-14 23:16:05 +00:00
liscju
c7ec9d159e i18n: translate abort messages
I found a few places where message given to abort is
not translated, I don't find any reason to not translate
them.
2016-06-14 11:53:55 +02:00
Martin von Zweigbergk
cc3e946787 match: override 'visitdir' in subdirmatcher
The manifest.manifest class has a _treeinmem member than one can
manually set to True to test that the treemanifest class works as a
drop-in replacement for manifestdict (which is mostly a requirement
for treemanifest repos to work). However, it doesn't quite work at the
moment. These tests fail:

test-largefiles-misc.t
test-rebase-newancestor.t
test-subrepo.t
test-subrepo-deep-nested-change.t
test-subrepo-recursion.t

All but test-rebase-newancestor.t fail because they trigger calls to
subdirmatcher.visitdir(), which tries to access a _excluderoots field
that does not exist on the subdirmatcher. Let's fix that by overriding
visitdir() in a similar way to how matchfn is overridden, i.e. by
prepending the directory before calling the superclass method.
2016-02-05 21:25:44 -08:00
Martin von Zweigbergk
c84bb33a89 match: rename "narrowmatcher" to "subdirmatcher" (API)
I keep mistaking "narrowmatcher" for narrowhg's
narrowmatcher. "subdirmatcher" seems more to the point anyway.
2016-02-05 21:09:32 -08:00
Laurent Charignon
b43822e77f match: add option to return line and lineno from readpattern
This will be used to display the line and linenumber of ignorefile that matched
an ignored file (issue4856).
2015-12-26 19:40:38 -08:00
Martin von Zweigbergk
50de24bc06 treemanifest: don't iterate entire matching submanifests on match()
Before a4236180df5e (match: remove unnecessary optimization where
visitdir() returns 'all', 2015-05-06), match.visitdir() used to return
the special value 'all' to indicate that it was known that all
subdirectories would also be included in the match. The purpose for
that value was to avoid calling the matcher on all the paths. It
turned out that calling the matcher was not a problem, so the special
return value was removed and the code was simplified. However, if we
use the same special value for not just avoiding calling the matcher
on each file, but to avoid iterating over each file, it's a much
bigger win. On commands like

  hg st --rev .^ --rev . dom/

we run the matcher (dom/) on the two manifests, then diff the narrowed
manifest. If the size of the match is much larger than the size of the
diff, this is wasteful. In the above case, we would end up iterating
over the 15k-or-so files in dom/ for each of the manifests, only to
later discover that they are mostly the same. This means that runningt
the command above is usually slower than getting the status for the
entire repo, because that code avoids calling treemanifest.match() and
only calls treemanifest.diff(), which loads only what's needed for the
diff.

Let's fix this by reintroducing the 'all' value in match.visitdir()
and making treemanifest.match() return a lazy copy of the manifest
from dom/ and down (in the above case). This speeds up the above
command on the Firefox repo from 0.357s to 0.137s (best of 5). The
wider the match, the bigger the speedup.
2015-12-12 09:57:05 -08:00
Bryan O'Sullivan
319ff5c06a match: use re2 in readpatternfile if possible
This has a small, but measurable, effect on performance if a pattern
file is very large.  In an artificial test with 200,000 lines of
pattern data, using re2 reduced read time by 200 milliseconds.
2015-12-10 21:33:55 -08:00
Mads Kiilerich
09567db49a spelling: trivial spell checking 2015-10-17 00:58:46 +02:00
Pierre-Yves David
30913031d4 error: get Abort from 'error' instead of 'util'
The home of 'Abort' is 'error' not 'util' however, a lot of code seems to be
confused about that and gives all the credit to 'util' instead of the
hardworking 'error'. In a spirit of equity, we break the cycle of injustice and
give back to 'error' the respect it deserves. And screw that 'util' poser.

For great justice.
2015-10-08 12:55:45 -07:00
Matt Mackall
dc3c45835d merge with stable 2015-08-12 17:01:50 -05:00
Matt Harbison
3c940eb21b match: fix a caseonly rename + explicit path commit on icasefs (issue4768)
The problem was that the former name and the new name are both normalized to the
case in dirstate, so matcher._files would be ['ABC.txt', 'ABC.txt'].
localrepo.commit() calls localrepo.status(), passing along the matcher.  Inside
dirstate.status(), _walkexplicit() simply grabs matcher.files() and processes
those items.  Since the old name isn't present, it is silently dropped.  There's
a fundamental tension here, because the status command should also accept files
that don't match the filesystem, so we can't drop the normalization in status.
The problem originated in d70aa474bd84.

Unfortunately with this change, the case of the old file must still be specified
exactly, or the old file is again silently excluded.  I went back to
d70aa474bd84^, and that had the same behavior, so we are no worse off.  I'm open
to ideas from a matcher or dirstate expert on how to fix that half.
2015-08-06 21:00:16 -04:00
Yuya Nishihara
445c9fa91e ignore: fix path concatenation of .hgignore on Windows
Since 862fc5f60fef, .hgignore is ignored on Windows because a pat may have
a drive letter, but pathutil.join is posixpath.join.

  "z:\foo\bar/z:\foo\bar\.hgignore"

Instead, this patch uses os.path.join() and util.localpath() to process both
parts as file-system paths.

Maybe we can remove os.path.join() at dirstate._ignore because 'include:' is
resolved relative to repo root? It was introduced by 29ce78e5f35c.
2015-07-27 22:14:40 +09:00
Durham Goode
677efb706e ignore: fix include: rules depending on current directory (issue4759)
When reading pattern files, we just call open(path), which is relative to the
current directory.  Let's fix this by resolving the paths before attempting to
read the file.
2015-07-24 16:44:52 -07:00
Gregory Szorc
02e789ba1d match: use absolute_import 2015-08-08 19:39:45 -07:00
Matt Mackall
0508027d17 merge with stable 2015-06-24 13:41:27 -05:00
Gregory Szorc
5380dea2a7 global: mass rewrite to use modern exception syntax
Python 2.6 introduced the "except type as instance" syntax, replacing
the "except type, instance" syntax that came before. Python 3 dropped
support for the latter syntax. Since we no longer support Python 2.4 or
2.5, we have no need to continue supporting the "except type, instance".

This patch mass rewrites the exception syntax to be Python 2.6+ and
Python 3 compatible.

This patch was produced by running `2to3 -f except -w -n .`.
2015-06-23 22:20:08 -07:00
Matt Harbison
f536f82114 match: let 'path:.' and 'path:' match everything (issue4687)
Previously, both queries exited with code 1, printing nothing.  The pattern in
the latter query is normalized to '.', so it is really the same case.
2015-06-20 19:59:26 -04:00
Matt Harbison
3adb63e5c0 match: explicitly naming a subrepo implies always() for the submatcher
The files command supports naming directories to limit the output to children of
that directory, and it also supports -S to force recursion into a subrepo.  But
previously, using -S and naming a subrepo caused nothing to be output.  The
reason was narrowmatcher() strips the current subrepo path off of each file,
which would leave an empty list if only the subrepo was named.

When matching on workingctx, dirstate.matches() would see the submatcher is not
always(), so it returned the list of files in dmap for each file in the matcher-
namely, an empty list.  If a directory in a subrepo was named, the output was as
expected, so this was inconsistent.

The 'not anypats()' check is enforced by an existing test around line 140:

    $ hg remove -I 're:.*.txt' sub1

Without the check, this removed all of the files in the subrepo.
2015-05-17 22:09:37 -04:00
Matt Harbison
041a91f971 match: add a subclass for dirstate normalizing of the matched patterns
This class is only needed on case insensitive filesystems, and only
for wdir context matches. It allows the user to not match the case of
the items in the filesystem- especially for naming directories, which
dirstate doesn't handle[1]. Making dirstate handle mismatched
directory cases is too expensive[2].

Since dirstate doesn't apply to committed csets, this is only created by
overriding basectx.match() in workingctx, and only on icasefs.  The default
arguments have been dropped, because the ctx must be passed to the matcher in
order to function.

For operations that can apply to both wdir and some other context, this ends up
normalizing the filename to the case as it exists in the filesystem, and using
that case for the lookup in the other context.  See the diff example in the
test.

Previously, given a directory with an inexact case:

  - add worked as expected

  - diff, forget and status would silently ignore the request

  - files would exit with 1

  - commit, revert and remove would fail (even when the commands leading up to
    them worked):

        $ hg ci -m "AbCDef" capsdir1/capsdir
        abort: CapsDir1/CapsDir: no match under directory!

        $ hg revert -r '.^' capsdir1/capsdir
        capsdir1\capsdir: no such file in rev 64dae27060b7

        $ hg remove capsdir1/capsdir
        not removing capsdir1\capsdir: no tracked files
        [1]

Globs are normalized, so that the -I and -X don't need to be specified with a
case match.  Without that, the second last remove (with -X) removes the files,
leaving nothing for the last remove.  However, specifying the files as
'glob:**.Txt' does not work.  Perhaps this requires 're.IGNORECASE'?

There are only a handful of places that create matchers directly, instead of
being routed through the context.match() method.  Some may benefit from changing
over to using ctx.match() as a factory function:

  revset.checkstatus()
  revset.contains()
  revset.filelog()
  revset._matchfiles()
  localrepository._loadfilter()
  ignore.ignore()
  fileset.subrepo()
  filemerge._picktool()
  overrides.addlargefiles()
  lfcommands.lfconvert()
  kwtemplate.__init__()
  eolfile.__init__()
  eolfile.checkrev()
  acl.buildmatch()

Currently, a toplevel subrepo can be named with an inexact case.  However, the
path auditor gets in the way of naming _anything_ in the subrepo if the top
level case doesn't match.  That is trickier to handle, because there's the user
provided case, the case in the filesystem, and the case stored in .hgsub.  This
can be fixed next cycle.

  --- a/tests/test-subrepo-deep-nested-change.t
  +++ b/tests/test-subrepo-deep-nested-change.t
  @@ -170,8 +170,15 @@
     R sub1/sub2/test.txt
     $ hg update -Cq
     $ touch sub1/sub2/folder/bar
  +#if icasefs
  +  $ hg addremove Sub1/sub2
  +  abort: path 'Sub1\sub2' is inside nested repo 'Sub1'
  +  [255]
  +  $ hg -q addremove sub1/sub2
  +#else
     $ hg addremove sub1/sub2
     adding sub1/sub2/folder/bar (glob)
  +#endif
     $ hg status -S
     A sub1/sub2/folder/bar
     ? foo/bar/abc

The narrowmatcher class may need to be tweaked when that is fixed.


[1] http://www.selenic.com/pipermail/mercurial-devel/2015-April/068183.html
[2] http://www.selenic.com/pipermail/mercurial-devel/2015-April/068191.html
2015-04-12 01:39:21 -04:00
Matt Harbison
8a1b99d29c match: move _normalize() into the match class
This will be overridden in an upcoming patch to also deal with dirstate
normalization on case insensitive filesystems.
2015-04-12 00:29:17 -04:00
Drew Gottlieb
d01f641c75 treemanifest: further optimize treemanifest.matches()
The matches function was previously traversing all submanifests to look for
matching files, even though it was possible to know if a submanifest won't
contain any matches.

This change adds a visitdir function on the match object to decide quickly if
a directory should be visited when traversing. The function also decides if
_all_ subdirectories should be traversed.

Adding this logic as methods on the match object also makes the logic
modifiable by extensions, such as largefiles.

An example of a command this speeds up is running
  hg status --rev .^ python/
on the Mozilla repo with the treemanifest experiment enabled.
It goes from 2.03s to 1.85s.

More improvements to speed from this change will happen when treemanifests are
lazily loaded. Because a flat manifest is still loaded and then converted
into treemanifests, speed improvements are limited.

This change has no negative effect on speed. For a worst-case example, this
command is not negatively impacted:
  hg status --rev .^ 'relglob:*.js'
on the Mozilla repo. It goes from 2.83s to 2.82s.
2015-04-06 10:51:53 -07:00
Yuya Nishihara
1a9a64a4eb match: remove unused assignment of ctx
ctx is consumed in __init__ to build match patterns and never used after that.
2015-01-17 12:39:44 +09:00
Martin von Zweigbergk
8874cd66a5 match: add isexact() method to hide internals
Comparing a function reference seems bad.
2014-10-29 08:43:39 -07:00
Martin von Zweigbergk
ca1a27676b matcher: make e.g. 'relpath:.' lead to fast paths
Several commands take the fast path when match.always() is
true. However, when the user passes "." on the command line, that
results in a matcher for which match.always() == False. Let's make it
so such matchers return True, and have an empty list of .files(). This
makes e.g. "hg log ." as fast as "hg log" and "hg revert ." as fast as
"hg revert --all" (when run from repo root).
2014-11-19 15:56:58 -08:00
Martin von Zweigbergk
2905a64046 match: don't remove '.' from _includeroots
This makes _includeroots more like _fileroots and gives visitdir() a
nice symmetry in the two.

I'm hoping to later combine the two (_fileroots and _includeroots),
and having them treated similarly should make that step easier to
follow.
2015-05-27 13:22:48 -07:00
Martin von Zweigbergk
18cff35498 match: join two nested if-blocks
Instead of

  if a:
      if b:
          return False

let's write it

  if a and b:
      return False
2015-05-31 13:17:41 -07:00
Martin von Zweigbergk
52fb4fd46c match: drop optimization (?) of 'parentdirs' calculation
It seems unlikely that the optimization to avoid calling util.finddirs
twice will be noticeable, so let's drop it. This makes the two
conditions for includes and regular patterns more similar.
2015-05-27 11:47:55 -07:00
Martin von Zweigbergk
9f6865690f match: break boolean expressions into one operand per line
This makes it much easier to spot both the operators ('and'/'or') and
the operands.
2015-05-27 09:34:00 -07:00
Martin von Zweigbergk
4d5376fa06 match: drop unnecessary removal of '.' from excluded roots
The repo root is nothing special when it comes to what directories to
visit: patterns like '-X relglob:*.py' should not exclude the top
directory, while '-X path:.' should (pointless as such a pattern may
be). The explicit removal of '.' from the set of excluded roots was
probably there to avoid removing the the root directory when any
patterns had been given, but since a619025a07b5 (treemanifest: visit
directory 'foo' when given e.g. '-X foo/ba?', 2015-05-27), we only
exclude directories that should be completely excluded, so we no
longer need to special-case the root directory.
2015-05-27 13:23:35 -07:00
Matt Harbison
21ca967140 narrowmatcher: propagate the rel() method
The full path is propagated to the original match object since this is often
used directly for printing a file name to the user.  This is cleaner than
requiring each caller to join the prefix with the file name prior to calling it,
and will lead to not having to pass the prefix around separately.  It is also
consistent with the bad() and abs() methods in terms of the required input.  The
uipath() method now inherits this path building property.

There is no visible change in path style for rel() because it ultimately calls
util.pathto(), which returns an os.sep based path.  (The previous os.path.join()
was violating the documented usage of util.pathto(), that its third parameter be
'/' separated.)  The doctest needed to be normalized to '/' separators to avoid
test differences on Windows, now that a full path is returned for a short
filename.

The test changes are to drop globs that are no longer necessary when printing an
absolute file in a subrepo, as returned from match.uipath().  Previously when
os.path.join() was used to add the prefix, the absolute path to a file in a
subrepo was printed with a mix of '/' and '\'.  The absolute path for a file not
in a subrepo, as returned from match.uipath(), is still purely '/' based.
2014-11-27 10:16:56 -05:00
Matt Harbison
e9dd83402c match: add the abs() method
This is a utility to make it easier for subrepos to convert a file name to the
full path rooted at the top repository.  It can replace the various path joining
lambdas, and doesn't require the prefix to be passed into the method that wishes
to build such a path.

The name is derived from the following pattern in annotate() and other places:

        name = ((pats and rel) or abs)

The pathname separator is not os.sep in part to avoid confusion with variables
named 'abs' or similar that _are_ '/' separated, and in part because some
methods like cmdutils.forget() and maybe cmdutils.add() need to build a '/'
separated path to the root repo.  This can replace the manual path building
there.
2014-11-28 20:15:46 -05:00
Martin von Zweigbergk
dccbc7f7f3 match: make 'always' and 'exact' functions, not classes
There is no reason for classes 'always' and 'exact' not to be just
functions that return instances the plain 'match' class.
2014-11-01 22:56:49 -07:00
Matt Harbison
d2ece01a89 match: add an optional constructor parameter for a bad() override
This will be used to eliminate monkey patching of new matcher instances that
weren't removed in ce72b8f6bade::8c67445f215f.
2015-06-05 18:56:33 -04:00
Matt Harbison
ea06eb1cd9 match: introduce badmatch() to eliminate long callback chains with subrepos
Various bit of code replace the bad method on matchers, and then delegate to the
original bad method after doing some custom processing.  At least some of these
forget to restore the original method when the need has passed, and then when
the matcher is passed to the next subrepo (even a sibling), another layer is
added such that the chain looks like:

    bad2 -> bad1 -> original

At best, this is a waste of processing, but sometimes spurious messages can be
emitted (e.g. 9f1524f400d9).

The trick with this copy of the matcher is to make sure it is *not* passed to
any subrepo- the original must be passed instead.
2015-06-04 21:19:22 -04:00
Martin von Zweigbergk
0bfa24333c treemanifest: visit directory 'foo' when given e.g. '-X foo/ba?'
For globs like 'foo/ba?', match._roots() will return 'foo'. Since
visitdir(), excludes directories in the excluded roots, it would skip
the entire foo directory. This is incorrect, since 'foo/ba?' doesn't
mean that everything in foo/ should be exluded. Note that visitdir()
is called only from the treemanifest class, so this only affects tree
manifests. Fix by adding roots to the set of excluded roots only if
there are no excluded patterns.

Since 'glob' is the default pattern type for globs, we also need to
update some -X patterns in the tests to be of 'path' type to take
advantage of the visitdir tricks. For consistency, also update the -I
patterns.

It seems a little unfortunate that 'foo' in 'hg files -X foo' is
considered a pattern because of the implied 'glob' type, but improving
that is left for another day.
2015-05-27 10:44:04 -07:00
Matt Harbison
70f9cecb27 match: normpath the ignore source when expanding the 'subinclude' kind
Windows was previously getting this test failure:

  --- e:/Projects/hg/tests/test-hgignore.t
  +++ e:/Projects/hg/tests/test-hgignore.t.err
  @@ -230,6 +230,7 @@

     $ hg status
     ? dir1/file2
  +  ? dir1/subdir/subfile3
     ? dir1/subdir/subfile4
     ? dir2/file1

  @@ -241,4 +242,4 @@
     $ echo "glob:file*2" > dir1/.hgignoretwo

     $ hg status | grep file2
  -  [1]
  +  ? dir1/file2

The problem was 'source' would be in the form "F:\test-hgignore.t\.hgignore", so
when pathutil.dirname() split on '/', 'sourceroot' was empty.  Therefore, 'path'
ended up being relative instead of absolute.
2015-05-27 13:28:16 -04:00
Durham Goode
9ae1033b0d match: enable 'subinclude:' syntax
This adds a new rule syntax that allows the user to include a pattern file, but
only have those patterns match against files underneath the subdirectory of the
pattern file.

This is useful when you have nested projects in a repository and the inner
projects wants to set up ignore rules that won't affect other projects in the
repository. It is also useful in high commit rate repositories for removing the
root .hgignore as a point of contention.
2015-05-16 16:25:05 -07:00
Drew Gottlieb
94d1131d10 match: fix bug in match.visitdir()
There was a bug in my recent change to visitdir (cb39606d374e) due to
the stored generator being iterated over twice. Making the generator into a
list at the start fixes this.
2015-05-22 14:39:34 -07:00
Durham Goode
8328676177 match: allow unioning arbitrary match functions
A future patch will be allowing nested matchers. To support that, let's refactor
_buildmatch to build a list of matchers then return True if any match.

We were already doing that for filesets + regex patterns, but this way will be
more generic.
2015-05-16 16:16:18 -07:00
Durham Goode
dbbbdec2b6 match: add root to _buildmatch
A future patch will make _buildmatch able to expand relative include patterns.
Doing so will require knowing the root of the repo, so let's go ahead and pass
it in.
2015-05-16 16:12:00 -07:00
Martin von Zweigbergk
37105d1059 match: introduce boolean prefix() method
tl;dr: This is another step towards a (previously unstated) goal of
eliminating match.files() in conditions.

There are four types of matchers:

 * always: Matches everything, checked with always(), files() is empty

 * exact: Matches exact set of files, checked with isexact(), files()
   contains the files to match

 * patterns: Matches more complex patterns, checked with anypats(),
   files() contains roots of the matched patterns

 * prefix: Matches simple 'path:' patterns as prefixes ('foo' matches
   both 'foo' and 'foo/bar'), no single method to check, files()
   contains the prefixes to match

For completeness, it would be nice to have a method for checking for
the "prefix" type of matcher as well, so let's add that, making it
return True simply when none of the others do.

The larger goal here is to eliminate uses of match.files() in
conditions (i.e. bool(match.files())). The reason for this is that
there are scenarios when you would like to create a "prefix" matcher
that happens to match no files. One example is for 'hg files -I foo
bar'. The narrowmatcher also restricts the set of files given and it
would not surprise me if have bugs caused by that already. Note that
'if m.files() and not m.anypats()' and similar is sometimes used to
catch the "exact" and "prefix" cases above.
2014-10-28 22:47:22 -07:00
Drew Gottlieb
eb5e31d8eb match: have visitdir() consider includes and excludes
match.visitdir() used to only look at the match's primary pattern roots to
decide if a treemanifest traverser should descend into a particular directory.
This change logically makes visitdir also consider the match's include and
exclude pattern roots (if applicable) to make this decision.

This is especially important for situations like using narrowhg with multiple
treemanifest revlogs.
2015-05-18 14:29:20 -07:00
Durham Goode
8e42385986 ignore: use 'include:' rules instead of custom syntax
Now that the matcher supports 'include:' rules, let's change the dirstate.ignore
creation to just create a matcher with a bunch of includes. This allows us to
completely delete ignore.py.

I moved some of the syntax documentation over to readpatternfile in match.py so
we don't lose it.
2015-05-16 16:06:22 -07:00