This improves the performance of log --patch and --stat by about
20% for moderately large manifests (e.g. mozilla-central) for the
common case of no -I/-X patterns.
There are two sets of Python re2 bindings available on the internet;
this code works with both.
Using re2 can greatly improve "hg status" performance when a .hgignore
file becomes even modestly complex.
Example: "hg status" on a clean tree with 134K files, where "hg
debugignore" reports a regexp 4256 bytes in size.
no .hgignore: 1.76 sec
Python re: 2.79
re2: 1.82
The overhead of regexp matching drops from 1.03 seconds with stock
re to 0.06 with re2.
(For comparison, a git repo with the same contents and .gitignore
file runs "git status -s" in 1.71 seconds, i.e. only slightly faster
than hg with re2.)
'exact' match objects are sometimes created with a non-list 'pattern'
argument:
- using 'set' in queue.refresh():hgext/mq.py
match = scmutil.matchfiles(repo, set(c[0] + c[1] + c[2] + inclsubs))
- using 'dict' in revert():mercurial/cmdutil.py (names = {})
m = scmutil.matchfiles(repo, names)
'exact' match objects return specified 'pattern' to callers of
'match.files()' as it is, so it is a non-list object.
but almost all implementations expect 'match.files()' to return a list
object, so this may causes problems: e.g. exception for "+" with
another list object.
this patch ensures that '_files' of 'exact' match objects is a list
object.
for non 'exact' match objects, parsing specified 'pattern' already
ensures that it it a list one.
Introduce match.always() to check if a match object always says yes, i.e.
None was passed in. If so, mfmatches should not bother iterating every file in
the repository.
Matt suggested this on IRC, I do not think the choice is obvious, but this one
makes things simpler because while filesets are turned into a list of files
into the match objects, it would more be difficult to tell invalid files passed
in pats from those expanded from filesets.
We want util.readfile() to operate in binary mode, so EOLs have to be handled
correctly depending on the platform. It seems both easier and more convenient
to treat LF and CRLF the same way on all platforms.
These leaks may occur in environments that don't employ a reference
counting GC, i.e. PyPy.
This implies:
- changing opener(...).read() calls to opener.read(...)
- changing opener(...).write() calls to opener.write(...)
- changing open(...).read(...) to util.readfile(...)
- changing open(...).write(...) to util.writefile(...)
For GUI clients its sometimes important to know which files will be ignored and
which files will be important. This allows the GUI client to skipping redoing a
'hg status' when the files are ignored but have changed. (For instance, a
typical case is that the "build" directory inside some project is ignored but
files in it frequently change.)