Commit Graph

146 Commits

Author SHA1 Message Date
Martin von Zweigbergk
64acd1e09e match: use match.prefix() in subdirmatcher
It seems like the subdirmatcher should be checking if the matcher it's
based on is matching prefixes. It was effectively doing that already
because "prefix() == not always() and not anypats() and not
isexact()", subdirmatcher was checking the first two parts of that
condition and I don't think it will ever be given an "exact" matcher
with it's directory name (because exact matchers are for matching
files, not directories). Still, let's switch to using prefix() for
clarity (and because I'm trying to remove code that reaches for
matchers internals).
2017-05-17 22:33:15 -07:00
Martin von Zweigbergk
096fa56147 match: avoid accessing match._pathrestricted from subdirmatcher
Accessing only the public API wherever possible helps us refactor
matchers later.
2017-05-12 16:31:21 -07:00
Martin von Zweigbergk
ddbf56e07e match: override visitdir() the usual way in subdirmatcher
Just override the function instead of replacing it on each instance.
2017-05-18 10:17:57 -07:00
Martin von Zweigbergk
767cd2bb63 match: make _fileroots a @propertycache and rename it to _fileset
The files in the set are not necesserily roots of anything. Making it
a @propertycache will help towards extracting a base class for
matchers.
2017-05-18 09:04:37 -07:00
Martin von Zweigbergk
3bc2187d25 match: remove ispartial()
The function was added in c2498bb6d298 (match: add match.ispartial(),
2015-05-15) for use by narrowhg, but narrowhg never ended up needing
it.
2017-05-17 09:43:50 -07:00
Martin von Zweigbergk
c3406ac3db cleanup: use set literals
We no longer support Python 2.6, so we can now use set literals.
2017-02-10 16:56:29 -08:00
Martin von Zweigbergk
41765f4b1b match: optimize visitdir() for patterns matching only root directory
Because _rootsanddirs() returns a list of directories to visit
recursively and a list of directories to visit non-recursively. For
patterns such as 'rootfilesin:foo/bar', we clearly need to visit the
directory foo/bar, but we also need to visit its parents. The method
therefore uses util.dirs() to find the parent directories of
'foo/bar'. That method does not include the root directory, but since
we obviously need to visit the root directory, we always added '.' to
the set of directories to visit non-recursively.

The visitdir() method had special handling to consider set(['.']) to
mean that no includes had been specified and would thus visit all
directories. However, when the pattern is 'rootfilesin:.', set(['.'])
is actually the real set of directories to visit and the special
handling of that set meant that all directories got visited instead of
just the root directory.

The fix is simple: add '.' to the set of parent directories in
_rootsanddirs() and stop treating set(['.']) specially. This makes

  hg files -r .  -I rootfilesin:.

in a treemanifest version of the Firefox repo go from 1.5s to 0.26s on
warm disk (and a *much* bigger improvement on cold disk).

Note that the -I is necessary for no good reason. We just haven't
optimized visitdir() for regular (non-include, non-exclude) patterns
yet.
2017-05-05 08:49:07 -07:00
Durham Goode
5566fd666f match: make subinclude construction lazy
The matcher subinclude functionality allows us to have .hgignore files that
include subdirectory hgignore files. Today it parses the entire repo at once,
even if we only need to test a file in one subdirectory. This patch makes the
subinclude tree creation lazy, which speeds up matcher creation significantly in
large repos with very large trees of ignore patterns.
2017-05-03 10:30:57 -07:00
Pierre-Yves David
be968edfac match: explicitly tests for None
Changeset ba7f2a1cc2d2 removed the mutable default value, but did not explicitly
tested for None. Such implicit testing can introduce semantic and performance
issue. We move to an explicit testing for None as recommended by PEP8:

https://www.python.org/dev/peps/pep-0008/#programming-recommendations
2017-03-15 15:08:45 -07:00
Pulkit Goyal
5a7c5d918e match: slice over bytes to get the byteschr instead of ascii value 2017-03-16 08:03:51 +05:30
Pulkit Goyal
0bad6ee6aa match: make regular expression bytes to prevent TypeError 2017-03-16 07:52:47 +05:30
Rishabh Madan
6a6d5ec05c py3: open file in rb mode 2017-03-15 14:51:18 +05:30
Gregory Szorc
98c99b99fa match: don't use mutable default argument value
There shouldn't be a big perf hit creating a new object because
this function is complicated and does things that dwarf the cost
of creating a new PyObject.
2017-03-12 21:53:03 -07:00
Rodrigo Damazio Bovendorp
0bd84ceb64 match: making visitdir() deal with non-recursive entries
Primarily as an optimization to avoid recursing into directories that will
never have a match inside, this classifies each matcher pattern's root as
recursive or non-recursive (erring on the side of keeping it recursive,
which may lead to wasteful directory or manifest walks that yield no matches).

I measured the performance of "rootfilesin" in two repos:
- The Firefox repo with tree manifests, with
  "hg files -r . -I rootfilesin:browser".
  The browser directory contains about 3K files across 249 subdirectories.
- A specific Google-internal directory which contains 75K files across 19K
  subdirectories, with "hg files -r . -I rootfilesin:REDACTED".

I tested with both cold and warm disk caches. Cold cache was produced by
running "sync; echo 3 > /proc/sys/vm/drop_caches". Warm cache was produced
by re-running the same command a few times.

These were the results:

               Cold cache           Warm cache
             Before   After      Before   After
firefox      0m5.1s   0m2.18s   0m0.22s  0m0.14s
google3 dir  2m3.9s   0m1.57s   0m8.12s  0m0.16s

Certain extensions, notably narrowhg, can depend on this for correctness
(not trying to recurse into directories for which it has no information).
2017-02-13 17:03:14 -08:00
Rodrigo Damazio Bovendorp
8c5ff96dc1 match: adding support for matching files inside a directory
This adds a new "rootfilesin" matcher type which matches files inside a
directory, but not any subdirectories (so it matches non-recursively).
This has the "root" prefix per foozy's plan for other matchers (rootglob,
rootpath, cwdre, etc.).
2017-02-13 15:39:29 -08:00
Jun Wu
d9d6d37c5a match: migrate to util.iterfile 2016-11-14 23:16:05 +00:00
liscju
c7ec9d159e i18n: translate abort messages
I found a few places where message given to abort is
not translated, I don't find any reason to not translate
them.
2016-06-14 11:53:55 +02:00
Martin von Zweigbergk
cc3e946787 match: override 'visitdir' in subdirmatcher
The manifest.manifest class has a _treeinmem member than one can
manually set to True to test that the treemanifest class works as a
drop-in replacement for manifestdict (which is mostly a requirement
for treemanifest repos to work). However, it doesn't quite work at the
moment. These tests fail:

test-largefiles-misc.t
test-rebase-newancestor.t
test-subrepo.t
test-subrepo-deep-nested-change.t
test-subrepo-recursion.t

All but test-rebase-newancestor.t fail because they trigger calls to
subdirmatcher.visitdir(), which tries to access a _excluderoots field
that does not exist on the subdirmatcher. Let's fix that by overriding
visitdir() in a similar way to how matchfn is overridden, i.e. by
prepending the directory before calling the superclass method.
2016-02-05 21:25:44 -08:00
Martin von Zweigbergk
c84bb33a89 match: rename "narrowmatcher" to "subdirmatcher" (API)
I keep mistaking "narrowmatcher" for narrowhg's
narrowmatcher. "subdirmatcher" seems more to the point anyway.
2016-02-05 21:09:32 -08:00
Laurent Charignon
b43822e77f match: add option to return line and lineno from readpattern
This will be used to display the line and linenumber of ignorefile that matched
an ignored file (issue4856).
2015-12-26 19:40:38 -08:00
Martin von Zweigbergk
50de24bc06 treemanifest: don't iterate entire matching submanifests on match()
Before a4236180df5e (match: remove unnecessary optimization where
visitdir() returns 'all', 2015-05-06), match.visitdir() used to return
the special value 'all' to indicate that it was known that all
subdirectories would also be included in the match. The purpose for
that value was to avoid calling the matcher on all the paths. It
turned out that calling the matcher was not a problem, so the special
return value was removed and the code was simplified. However, if we
use the same special value for not just avoiding calling the matcher
on each file, but to avoid iterating over each file, it's a much
bigger win. On commands like

  hg st --rev .^ --rev . dom/

we run the matcher (dom/) on the two manifests, then diff the narrowed
manifest. If the size of the match is much larger than the size of the
diff, this is wasteful. In the above case, we would end up iterating
over the 15k-or-so files in dom/ for each of the manifests, only to
later discover that they are mostly the same. This means that runningt
the command above is usually slower than getting the status for the
entire repo, because that code avoids calling treemanifest.match() and
only calls treemanifest.diff(), which loads only what's needed for the
diff.

Let's fix this by reintroducing the 'all' value in match.visitdir()
and making treemanifest.match() return a lazy copy of the manifest
from dom/ and down (in the above case). This speeds up the above
command on the Firefox repo from 0.357s to 0.137s (best of 5). The
wider the match, the bigger the speedup.
2015-12-12 09:57:05 -08:00
Bryan O'Sullivan
319ff5c06a match: use re2 in readpatternfile if possible
This has a small, but measurable, effect on performance if a pattern
file is very large.  In an artificial test with 200,000 lines of
pattern data, using re2 reduced read time by 200 milliseconds.
2015-12-10 21:33:55 -08:00
Mads Kiilerich
09567db49a spelling: trivial spell checking 2015-10-17 00:58:46 +02:00
Pierre-Yves David
30913031d4 error: get Abort from 'error' instead of 'util'
The home of 'Abort' is 'error' not 'util' however, a lot of code seems to be
confused about that and gives all the credit to 'util' instead of the
hardworking 'error'. In a spirit of equity, we break the cycle of injustice and
give back to 'error' the respect it deserves. And screw that 'util' poser.

For great justice.
2015-10-08 12:55:45 -07:00
Matt Mackall
dc3c45835d merge with stable 2015-08-12 17:01:50 -05:00
Matt Harbison
3c940eb21b match: fix a caseonly rename + explicit path commit on icasefs (issue4768)
The problem was that the former name and the new name are both normalized to the
case in dirstate, so matcher._files would be ['ABC.txt', 'ABC.txt'].
localrepo.commit() calls localrepo.status(), passing along the matcher.  Inside
dirstate.status(), _walkexplicit() simply grabs matcher.files() and processes
those items.  Since the old name isn't present, it is silently dropped.  There's
a fundamental tension here, because the status command should also accept files
that don't match the filesystem, so we can't drop the normalization in status.
The problem originated in d70aa474bd84.

Unfortunately with this change, the case of the old file must still be specified
exactly, or the old file is again silently excluded.  I went back to
d70aa474bd84^, and that had the same behavior, so we are no worse off.  I'm open
to ideas from a matcher or dirstate expert on how to fix that half.
2015-08-06 21:00:16 -04:00
Yuya Nishihara
445c9fa91e ignore: fix path concatenation of .hgignore on Windows
Since 862fc5f60fef, .hgignore is ignored on Windows because a pat may have
a drive letter, but pathutil.join is posixpath.join.

  "z:\foo\bar/z:\foo\bar\.hgignore"

Instead, this patch uses os.path.join() and util.localpath() to process both
parts as file-system paths.

Maybe we can remove os.path.join() at dirstate._ignore because 'include:' is
resolved relative to repo root? It was introduced by 29ce78e5f35c.
2015-07-27 22:14:40 +09:00
Durham Goode
677efb706e ignore: fix include: rules depending on current directory (issue4759)
When reading pattern files, we just call open(path), which is relative to the
current directory.  Let's fix this by resolving the paths before attempting to
read the file.
2015-07-24 16:44:52 -07:00
Gregory Szorc
02e789ba1d match: use absolute_import 2015-08-08 19:39:45 -07:00
Matt Mackall
0508027d17 merge with stable 2015-06-24 13:41:27 -05:00
Gregory Szorc
5380dea2a7 global: mass rewrite to use modern exception syntax
Python 2.6 introduced the "except type as instance" syntax, replacing
the "except type, instance" syntax that came before. Python 3 dropped
support for the latter syntax. Since we no longer support Python 2.4 or
2.5, we have no need to continue supporting the "except type, instance".

This patch mass rewrites the exception syntax to be Python 2.6+ and
Python 3 compatible.

This patch was produced by running `2to3 -f except -w -n .`.
2015-06-23 22:20:08 -07:00
Matt Harbison
f536f82114 match: let 'path:.' and 'path:' match everything (issue4687)
Previously, both queries exited with code 1, printing nothing.  The pattern in
the latter query is normalized to '.', so it is really the same case.
2015-06-20 19:59:26 -04:00
Matt Harbison
3adb63e5c0 match: explicitly naming a subrepo implies always() for the submatcher
The files command supports naming directories to limit the output to children of
that directory, and it also supports -S to force recursion into a subrepo.  But
previously, using -S and naming a subrepo caused nothing to be output.  The
reason was narrowmatcher() strips the current subrepo path off of each file,
which would leave an empty list if only the subrepo was named.

When matching on workingctx, dirstate.matches() would see the submatcher is not
always(), so it returned the list of files in dmap for each file in the matcher-
namely, an empty list.  If a directory in a subrepo was named, the output was as
expected, so this was inconsistent.

The 'not anypats()' check is enforced by an existing test around line 140:

    $ hg remove -I 're:.*.txt' sub1

Without the check, this removed all of the files in the subrepo.
2015-05-17 22:09:37 -04:00
Matt Harbison
041a91f971 match: add a subclass for dirstate normalizing of the matched patterns
This class is only needed on case insensitive filesystems, and only
for wdir context matches. It allows the user to not match the case of
the items in the filesystem- especially for naming directories, which
dirstate doesn't handle[1]. Making dirstate handle mismatched
directory cases is too expensive[2].

Since dirstate doesn't apply to committed csets, this is only created by
overriding basectx.match() in workingctx, and only on icasefs.  The default
arguments have been dropped, because the ctx must be passed to the matcher in
order to function.

For operations that can apply to both wdir and some other context, this ends up
normalizing the filename to the case as it exists in the filesystem, and using
that case for the lookup in the other context.  See the diff example in the
test.

Previously, given a directory with an inexact case:

  - add worked as expected

  - diff, forget and status would silently ignore the request

  - files would exit with 1

  - commit, revert and remove would fail (even when the commands leading up to
    them worked):

        $ hg ci -m "AbCDef" capsdir1/capsdir
        abort: CapsDir1/CapsDir: no match under directory!

        $ hg revert -r '.^' capsdir1/capsdir
        capsdir1\capsdir: no such file in rev 64dae27060b7

        $ hg remove capsdir1/capsdir
        not removing capsdir1\capsdir: no tracked files
        [1]

Globs are normalized, so that the -I and -X don't need to be specified with a
case match.  Without that, the second last remove (with -X) removes the files,
leaving nothing for the last remove.  However, specifying the files as
'glob:**.Txt' does not work.  Perhaps this requires 're.IGNORECASE'?

There are only a handful of places that create matchers directly, instead of
being routed through the context.match() method.  Some may benefit from changing
over to using ctx.match() as a factory function:

  revset.checkstatus()
  revset.contains()
  revset.filelog()
  revset._matchfiles()
  localrepository._loadfilter()
  ignore.ignore()
  fileset.subrepo()
  filemerge._picktool()
  overrides.addlargefiles()
  lfcommands.lfconvert()
  kwtemplate.__init__()
  eolfile.__init__()
  eolfile.checkrev()
  acl.buildmatch()

Currently, a toplevel subrepo can be named with an inexact case.  However, the
path auditor gets in the way of naming _anything_ in the subrepo if the top
level case doesn't match.  That is trickier to handle, because there's the user
provided case, the case in the filesystem, and the case stored in .hgsub.  This
can be fixed next cycle.

  --- a/tests/test-subrepo-deep-nested-change.t
  +++ b/tests/test-subrepo-deep-nested-change.t
  @@ -170,8 +170,15 @@
     R sub1/sub2/test.txt
     $ hg update -Cq
     $ touch sub1/sub2/folder/bar
  +#if icasefs
  +  $ hg addremove Sub1/sub2
  +  abort: path 'Sub1\sub2' is inside nested repo 'Sub1'
  +  [255]
  +  $ hg -q addremove sub1/sub2
  +#else
     $ hg addremove sub1/sub2
     adding sub1/sub2/folder/bar (glob)
  +#endif
     $ hg status -S
     A sub1/sub2/folder/bar
     ? foo/bar/abc

The narrowmatcher class may need to be tweaked when that is fixed.


[1] http://www.selenic.com/pipermail/mercurial-devel/2015-April/068183.html
[2] http://www.selenic.com/pipermail/mercurial-devel/2015-April/068191.html
2015-04-12 01:39:21 -04:00
Matt Harbison
8a1b99d29c match: move _normalize() into the match class
This will be overridden in an upcoming patch to also deal with dirstate
normalization on case insensitive filesystems.
2015-04-12 00:29:17 -04:00
Drew Gottlieb
d01f641c75 treemanifest: further optimize treemanifest.matches()
The matches function was previously traversing all submanifests to look for
matching files, even though it was possible to know if a submanifest won't
contain any matches.

This change adds a visitdir function on the match object to decide quickly if
a directory should be visited when traversing. The function also decides if
_all_ subdirectories should be traversed.

Adding this logic as methods on the match object also makes the logic
modifiable by extensions, such as largefiles.

An example of a command this speeds up is running
  hg status --rev .^ python/
on the Mozilla repo with the treemanifest experiment enabled.
It goes from 2.03s to 1.85s.

More improvements to speed from this change will happen when treemanifests are
lazily loaded. Because a flat manifest is still loaded and then converted
into treemanifests, speed improvements are limited.

This change has no negative effect on speed. For a worst-case example, this
command is not negatively impacted:
  hg status --rev .^ 'relglob:*.js'
on the Mozilla repo. It goes from 2.83s to 2.82s.
2015-04-06 10:51:53 -07:00
Yuya Nishihara
1a9a64a4eb match: remove unused assignment of ctx
ctx is consumed in __init__ to build match patterns and never used after that.
2015-01-17 12:39:44 +09:00
Martin von Zweigbergk
8874cd66a5 match: add isexact() method to hide internals
Comparing a function reference seems bad.
2014-10-29 08:43:39 -07:00
Martin von Zweigbergk
ca1a27676b matcher: make e.g. 'relpath:.' lead to fast paths
Several commands take the fast path when match.always() is
true. However, when the user passes "." on the command line, that
results in a matcher for which match.always() == False. Let's make it
so such matchers return True, and have an empty list of .files(). This
makes e.g. "hg log ." as fast as "hg log" and "hg revert ." as fast as
"hg revert --all" (when run from repo root).
2014-11-19 15:56:58 -08:00
Martin von Zweigbergk
2905a64046 match: don't remove '.' from _includeroots
This makes _includeroots more like _fileroots and gives visitdir() a
nice symmetry in the two.

I'm hoping to later combine the two (_fileroots and _includeroots),
and having them treated similarly should make that step easier to
follow.
2015-05-27 13:22:48 -07:00
Martin von Zweigbergk
18cff35498 match: join two nested if-blocks
Instead of

  if a:
      if b:
          return False

let's write it

  if a and b:
      return False
2015-05-31 13:17:41 -07:00
Martin von Zweigbergk
52fb4fd46c match: drop optimization (?) of 'parentdirs' calculation
It seems unlikely that the optimization to avoid calling util.finddirs
twice will be noticeable, so let's drop it. This makes the two
conditions for includes and regular patterns more similar.
2015-05-27 11:47:55 -07:00
Martin von Zweigbergk
9f6865690f match: break boolean expressions into one operand per line
This makes it much easier to spot both the operators ('and'/'or') and
the operands.
2015-05-27 09:34:00 -07:00
Martin von Zweigbergk
4d5376fa06 match: drop unnecessary removal of '.' from excluded roots
The repo root is nothing special when it comes to what directories to
visit: patterns like '-X relglob:*.py' should not exclude the top
directory, while '-X path:.' should (pointless as such a pattern may
be). The explicit removal of '.' from the set of excluded roots was
probably there to avoid removing the the root directory when any
patterns had been given, but since a619025a07b5 (treemanifest: visit
directory 'foo' when given e.g. '-X foo/ba?', 2015-05-27), we only
exclude directories that should be completely excluded, so we no
longer need to special-case the root directory.
2015-05-27 13:23:35 -07:00
Matt Harbison
21ca967140 narrowmatcher: propagate the rel() method
The full path is propagated to the original match object since this is often
used directly for printing a file name to the user.  This is cleaner than
requiring each caller to join the prefix with the file name prior to calling it,
and will lead to not having to pass the prefix around separately.  It is also
consistent with the bad() and abs() methods in terms of the required input.  The
uipath() method now inherits this path building property.

There is no visible change in path style for rel() because it ultimately calls
util.pathto(), which returns an os.sep based path.  (The previous os.path.join()
was violating the documented usage of util.pathto(), that its third parameter be
'/' separated.)  The doctest needed to be normalized to '/' separators to avoid
test differences on Windows, now that a full path is returned for a short
filename.

The test changes are to drop globs that are no longer necessary when printing an
absolute file in a subrepo, as returned from match.uipath().  Previously when
os.path.join() was used to add the prefix, the absolute path to a file in a
subrepo was printed with a mix of '/' and '\'.  The absolute path for a file not
in a subrepo, as returned from match.uipath(), is still purely '/' based.
2014-11-27 10:16:56 -05:00
Matt Harbison
e9dd83402c match: add the abs() method
This is a utility to make it easier for subrepos to convert a file name to the
full path rooted at the top repository.  It can replace the various path joining
lambdas, and doesn't require the prefix to be passed into the method that wishes
to build such a path.

The name is derived from the following pattern in annotate() and other places:

        name = ((pats and rel) or abs)

The pathname separator is not os.sep in part to avoid confusion with variables
named 'abs' or similar that _are_ '/' separated, and in part because some
methods like cmdutils.forget() and maybe cmdutils.add() need to build a '/'
separated path to the root repo.  This can replace the manual path building
there.
2014-11-28 20:15:46 -05:00
Martin von Zweigbergk
dccbc7f7f3 match: make 'always' and 'exact' functions, not classes
There is no reason for classes 'always' and 'exact' not to be just
functions that return instances the plain 'match' class.
2014-11-01 22:56:49 -07:00
Matt Harbison
d2ece01a89 match: add an optional constructor parameter for a bad() override
This will be used to eliminate monkey patching of new matcher instances that
weren't removed in ce72b8f6bade::8c67445f215f.
2015-06-05 18:56:33 -04:00
Matt Harbison
ea06eb1cd9 match: introduce badmatch() to eliminate long callback chains with subrepos
Various bit of code replace the bad method on matchers, and then delegate to the
original bad method after doing some custom processing.  At least some of these
forget to restore the original method when the need has passed, and then when
the matcher is passed to the next subrepo (even a sibling), another layer is
added such that the chain looks like:

    bad2 -> bad1 -> original

At best, this is a waste of processing, but sometimes spurious messages can be
emitted (e.g. 9f1524f400d9).

The trick with this copy of the matcher is to make sure it is *not* passed to
any subrepo- the original must be passed instead.
2015-06-04 21:19:22 -04:00
Martin von Zweigbergk
0bfa24333c treemanifest: visit directory 'foo' when given e.g. '-X foo/ba?'
For globs like 'foo/ba?', match._roots() will return 'foo'. Since
visitdir(), excludes directories in the excluded roots, it would skip
the entire foo directory. This is incorrect, since 'foo/ba?' doesn't
mean that everything in foo/ should be exluded. Note that visitdir()
is called only from the treemanifest class, so this only affects tree
manifests. Fix by adding roots to the set of excluded roots only if
there are no excluded patterns.

Since 'glob' is the default pattern type for globs, we also need to
update some -X patterns in the tests to be of 'path' type to take
advantage of the visitdir tricks. For consistency, also update the -I
patterns.

It seems a little unfortunate that 'foo' in 'hg files -X foo' is
considered a pattern because of the implied 'glob' type, but improving
that is left for another day.
2015-05-27 10:44:04 -07:00