Several methods print files relative to the repo root, unless files are named on
the command line, in which case they are printed relative to cwd. Since the
check relies on the 'pats' parameter, which needs to be replaced by a matcher
when adding subrepo support, this logic gets folded into the matcher to tidy up
the callers.
Prior to 7d5fcea60c78, this style decision was based off of whether or not the
'pats' list was empty. That change altered the check to test match.anypats()
instead, in order to make paths printed consistent when -I/-X is specified.
That however, changed the style when a file is given to the command. So now we
test the pattern list to get the old behavior for files, as well as test -I/-X
to get the consistency for patterns.
The 'always' class calls its parent constructor with an empty list of
patterns, which will result in a matcher that always matches. The
parent constructor will set self._always to True in such cases, so
there is no need to set it again.
In match.__init__(), we create the matchfn predicate by and-ing
together the individual predicates for includes, excludes (negated)
and patterns. Instead of the current set of nested if/else blocks, we
can simplify by adding the predicates to a list and defining the
overall predicate in a generic way based on the components. We can
still optimize it for the 0-length and 1-length cases. This way, there
is no combinatorial explosion to deal with if new component predicates
are added, and there is less risk of getting the overall predicate
wrong.
For a pathological .hgignore with over 2500 glob lines and over 200000 calls to
re.escape, and with re2 available, this speeds up parsing the .hgignore from
0.75 seconds to 0.20 seconds. This causes e.g. 'hg status' with hgwatchman
enabled to go from 1.02 seconds to 0.47 seconds.
Previously, a glob pattern of the form 'foo/**/bar' would match 'foo/a/bar' but
not 'foo/bar'. That was because the '**' in 'foo/**/bar' would be translated to
'.*', making the final regex pattern 'foo/.*/bar'. That pattern doesn't match
the string 'foo/bar'.
This is a bug because the '**/' glob matches the empty string in standard Unix
shells like bash and zsh.
Fix that by making the ending '/' optional if an empty string can be matched.
No real changes.
pattern: 'kind:pat' as specified on the command line
patterns, pats: list of patterns
kind: 'path', 'glob' or 're' or ...
pat: string in the corresponding 'kind' format
kindpats: list of (kind, pat) tuples
match.dir is currently called in two different places:
(1) noting when a directory specified explicitly is visited.
(2) noting when a directory is visited during a recursive walk.
purge cares about both, but commit only cares about the first.
Upcoming patches will split the two cases into two different callbacks. Why
bother? Consider a hypothetical extension that can provide more efficient walk
results, via e.g. watching the filesystem. That extension will need to
fall back to a full recursive walk if a callback is set for (2), but not if a
callback is only set for (1).
The fall-back root for walking is the repo root, not no root.
The "roots" do however also end up in m.files() which is used in various ways,
for instance to indicate whether matches are exact. The change could thus have
other impacts.
This improves the performance of log --patch and --stat by about
20% for moderately large manifests (e.g. mozilla-central) for the
common case of no -I/-X patterns.
There are two sets of Python re2 bindings available on the internet;
this code works with both.
Using re2 can greatly improve "hg status" performance when a .hgignore
file becomes even modestly complex.
Example: "hg status" on a clean tree with 134K files, where "hg
debugignore" reports a regexp 4256 bytes in size.
no .hgignore: 1.76 sec
Python re: 2.79
re2: 1.82
The overhead of regexp matching drops from 1.03 seconds with stock
re to 0.06 with re2.
(For comparison, a git repo with the same contents and .gitignore
file runs "git status -s" in 1.71 seconds, i.e. only slightly faster
than hg with re2.)
'exact' match objects are sometimes created with a non-list 'pattern'
argument:
- using 'set' in queue.refresh():hgext/mq.py
match = scmutil.matchfiles(repo, set(c[0] + c[1] + c[2] + inclsubs))
- using 'dict' in revert():mercurial/cmdutil.py (names = {})
m = scmutil.matchfiles(repo, names)
'exact' match objects return specified 'pattern' to callers of
'match.files()' as it is, so it is a non-list object.
but almost all implementations expect 'match.files()' to return a list
object, so this may causes problems: e.g. exception for "+" with
another list object.
this patch ensures that '_files' of 'exact' match objects is a list
object.
for non 'exact' match objects, parsing specified 'pattern' already
ensures that it it a list one.
Introduce match.always() to check if a match object always says yes, i.e.
None was passed in. If so, mfmatches should not bother iterating every file in
the repository.
Matt suggested this on IRC, I do not think the choice is obvious, but this one
makes things simpler because while filesets are turned into a list of files
into the match objects, it would more be difficult to tell invalid files passed
in pats from those expanded from filesets.
We want util.readfile() to operate in binary mode, so EOLs have to be handled
correctly depending on the platform. It seems both easier and more convenient
to treat LF and CRLF the same way on all platforms.
These leaks may occur in environments that don't employ a reference
counting GC, i.e. PyPy.
This implies:
- changing opener(...).read() calls to opener.read(...)
- changing opener(...).write() calls to opener.write(...)
- changing open(...).read(...) to util.readfile(...)
- changing open(...).write(...) to util.writefile(...)
For GUI clients its sometimes important to know which files will be ignored and
which files will be important. This allows the GUI client to skipping redoing a
'hg status' when the files are ignored but have changed. (For instance, a
typical case is that the "build" directory inside some project is ignored but
files in it frequently change.)