The wiki page is intended to describe several solution to the requirement issue.
Some of those solutions does not involve upgrading mercurial. That is very
useful for people that can't easily upgrade they Mercurial in some place.
Upcoming patches will allow the file cache to watch over multiple files, and
call the decorated function again if any of the files change.
The particular use case for this is the bookmark store, which needs to be
invalidated if either .hg/bookmarks or .hg/bookmarks.current changes. (This
doesn't currently happen, which is a bug. This bug will also be fixed in
upcoming patches.)
Before this patch, almost all of check code in
"casecollisionauditor.__call__()" is executed, even if specified
filename is already checked, because "f in self._newfiles" is examined
lastly.
In addition to it, adding "fl" to "self._loweredfiles" and "f" to
"self._newfiles" are also redundant in such case.
This patch checks "f in self._newfiles" first, and returns immediately
to avoid execution of check code for efficiency.
This patch also changes paths added to "rejected" list from full path
(referred by "p") to relative one (referred by "f"), when type of
target path is neither file nor symlink.
This change should be reasonable, because the path added to "rejected"
list is relative one, when "OSError" is raised at "lstat()".
Just invoking "os.fstat()" with "file.fileno()" doesn't require non
ANSI file API, because filename is not used for invocation of
"os.fstat()".
But "util.fstat()" should invoke "os.stat()" with "fp.name", if file
object doesn't have "fileno()" method for portability, and "fp.name"
may cause invocation of non ANSI file API.
So, this patch makes the constructor of appender class invoke
"util.fstat()" via vfs, to encapsulate filename handling.
Full walks are only necessary when the caller needs the list of clean files.
addremove by definition doesn't need them.
With this patch and an extension that produces a subset of results for
dirstate.walk when full is False, on a repository with over 200,000 files, hg
addremove drops from 2.8 seconds to 0.5.
Several places use scmutil.addremove as a means to declare that certain files
have been operated on. This is ugly because:
- addremove takes patterns relative to the cwd, not paths relative to the root,
which means extra contortions for callers.
- addremove doesn't make clear what happens to files whose status hasn't
changed.
This new method accepts filenames relative to the repo root, and has a much
clearer contract. It also allows future modifications that do more with files
whose status hasn't changed.
An upcoming patch will refactor some code out into a method called
_findrenames. Having a line saying "copies = _findrenames..." is confusing.
Besides, 'renames' is a more precise name for this local anyway.
Before this patch, vfs constructor applies both "util.expandpath()"
and "os.path.realpath()" on "base" path, if "expand" is True.
This patch splits it into "realpath" and "expandpath", to apply each
functions separately: this splitting can allow to use vfs also where
one of each is not needed.
This is over twice as fast as the Python dirs code. Upcoming changes
will nearly double its speed again.
perfdirs results for a working dir with 170,000 files:
Python 638 msec
C 244
This encapsulates the "multiset of directories" structures that are
currently open-coded (and duplicated) in both the dirstate and
context modules.
This will be used, and optionally replaced by a C implementation,
in upcoming changes.
Now that we no longer sort all the walk results, using iteritems becomes
possible.
This is a relatively minor speedup: on a large repository with 170,000 files,
perfaddremove goes from 2.13 seconds to 2.10.
The only place where the order matters is in printing out added or removed
files. We already sort that set.
On a large repository with 170,000 files, this speeds up perfaddremove from
2.34 seconds to 2.13.
This will let us stop sorting all the results in an upcoming patch.
This also permits future refactorings where the same code consumes
dirstate.walk results but doesn't print anything out.
An argument could be made that printing out results as we go along is more
responsive UI-wise. However, at this point iterating through walk results is
actually faster than sorting them, so once we stop sorting all the results the
argument ceases to be valid.
dirstate.walk only does lstats and never returns stat objects for directories.
On a large repository with 170,000 files, this speeds perfaddremove up from
2.40 seconds to 2.34.
This parallels what's done for the util module, which imports either
mercurial.posix or mercurial.windows as 'platform' and then slurps the
appropriate functions into its own namespace.
With the new parallel update code, it is possible for multiple
workers to try to create a hierarchy of directories at the same
time. This is hard to trigger in general, but most likely during
initial checkout.
To deal with these races, we introduce a new ensuredirs function
whose contract is to ensure that a directory hierarchy exists - it
will ignore a failure that implies that the desired directory already
exists.
Now that dirstate.walk returns None for paths under symlink directories,
addremove doesn't need to validate each path it sees to look for files
under symlinks.
On a large repository this brings addremove from 6.3 seconds down to 3.65 (42%)
since addremove no longer has to stat every directory of every file to determine
if the file is inside a symlink directory. I put it through our benchmark and
see no perf hit to any other commands.
The pathauditor currently throws exceptions when it encounters an invalid
path. This change adds a method to allow people to treat it as a boolean.
This is currently used by scmutil.addremove and in a subsequent patch it
will be used by dirstate.walk
If there is no outgoiing changesets but we have filtered revision in outgoing.excluded
We run into a filtering related crash. The excluded revision should not be there
in the first place but discovery need cleanup in default, not stable.
Bookmarks/branches/tags shouldn't be allowed to be integers because that
overlaps with revision numbers. Right now if a user created one they can't
use it anyway because the revision numbers take precedence.
The check only happens when creating a new bookmark/etc from a command so it
shouldn't affect existing bookmarks/branches/tags or importing branches from
git.
This fix was prompted by us having a user create a bookmark named "404" then
accidentally checkout a very old version of our repository.
The dirstate walk results contain the stat information for each path, so we
don't need to query it again. On a large repo this makes addremove go from
8.35 seconds to 7.1 (15%).
Previously the addremove code queried the dirstate 4 times per path. Now it
only does so once. On a large repo this brings addremove from 9.5 seconds
to 8.35 seconds (12%).
For the special case, ":null" we remove the implied revision 0 since that
wouldn't make any sense here. A test case is added to make sure only nullrev is
shown.
Preserve the invariant that if P is a filecached property on X then
P in X.__dict__ => P in X._filecache.
Previously, it was possible for a filecached property to become out of sync
with the filesystem if it was set before getting it first, since the initial
filecacheentry was created in __get__.
Old behaviour:
repo.prop = x
repo.invalidate() # prop has no entry in _filecache, it's not removed
# from __dict__
repo.prop # returns x like before without checking with the
# filesystem
New:
repo.prop = x # an empty entry is created in _filecache
repo.invalidate() # prop is removed from __dict__
repo.prop # recreates prop
For changelog level filtering to take effect it need to be used for any
iteration.
This changeset removes usage of `range` and `xrange` that survived the first
pass.
This factors out the checks from tags and bookmarks, and newly applies
the same prohibitions to branches. checknewlabel takes a new parameter,
kind, indicating the kind of label being checked.
Test coverage is added for all three types of labels.
For now the new function only checks to make sure the new label name
isn't a reserved name ('tip', '.', or 'null'). Eventually more of the
checks will be unified between the different types of labels.
The `tag` command is trivially updated to use it. Updating branches and
bookmarks to use it is slightly more invasive and thus reserved for
later patches.
This patch invokes "osutil.listdir()" via vfs object.
The function added newly to "abstractvfs" is named not as "listdir()"
but as "readdir()", because:
- "os.listdir()" seems to be more familiar as "listdir()" than
"osutil.listdir()"
- "osutil.listdir()" returns also type of each files like
"readdir()" POSIX API: even though "d_type" field of "dirent"
structure is defined mainly only on BSD/Linux
This patch invokes "osutil.listdir()" via "rawvfs" object to avoid
filename encoding, because the path passed to "osutil.listdir()"
shouldn't be encoded.
This patch also omits importing "osutil" module, because it is no
longer used.
This just replaces "os.stat()" invocation: refactoring around
"self.createmode" and "vfs.createmode" initialization is omitted.
This patch also newly adds "stat()" function to "abstractvfs".
This patch defines "join()" in each classes derived from "abstractvfs"
except "vfs", which already defines it.
This allows all vfs instances to be used for indirect file API
invocation.
Definition functions for vfs support in dictionary order increases
readability/maintainability, because there are functions which invoke
file API:
- with same name: "os.listdir" and "osutil.listdir", for example
- with ambiguous names: "os.mkdir" and "util.makedirs", for example
For backwards compatibility, aliases for the old names are added,
except for "abstractopener", "statichttpopener" and "_fncacheopener",
because these are not used in Mercurial core implementation after this
patch.
"_fncacheopener" was only referred in "fncachestore" constructor, so
this patch also renames from "_fncacheopener" to "_fncachevfs" there.
Before this change, push would incorrectly fast-path the bundle
generation when extinct changesets are involved, because they are not
added to outgoing.excluded. The reason to do so are related to
outgoing.excluded being assumed to contain only secret changesets by
scmutil.nochangesfound(), when displaying warnings like:
changes found (ignored 9 secret changesets)
Still, outgoing.excluded seems like a good API to report the extinct
changesets instead of dedicated code and nothing in the docstring
indicates it to be bound to secret changesets. This patch adds extinct
changesets to outgoing.excluded and fixes scmutil.nochangesfound() to
filter the excluded node list.
Original version and test by Pierre-Yves.David@ens-lyon.org
As a part of migration to vfs, this patch invokes some file API
indirectly via vfs, while ensuring repository directory in the
constructor of "localrepository" class.
New file API are added to "scmutil.abstractopener" class, because they
are also used via other derived classes than "scmutil.opener".
But "join()" is not yet defined other than "scmutil.opener" class,
because it should not be used via other opener classes yet.
As a part of migration to vfs, this patch moves path expansion API
invocations in the constructor of "localrepository" to the constructor
of "opener", because the root path to the repository is very important
to handle paths using non-ASCII characters correctly.
This patch also rearrange initialization order of "wvfs" field,
because it is required to initialize "self.root".
The existing help only walked through an example.
Now we first explain the basic rules and then show an example.
The 'collections' example and description only cause confusion and is removed.
Bikeshedded by Patrick Mezard <patrick@mezard.eu>
Fixes (on Windows in cmd.exe):
$ hg -R v:\x\a status V:\x\a\bar
abort: V:\x\a\bar not under root
where v:\x\a is a valid repository with a checked-out file "bar"
(Note the difference in casing: "v:\" versus "V:\")
Changeset ebc9c501f13c introduced a 'root' path component to look for
hgrc files, which is used both as an absolute path and a path relative
to the <install-root>.
The latter one was broken since 'root' was set to an absolute location
and the subsequent os.path.join discarded the <install-root> path prefix.
This improves the performance of "hg log -l1" from 0.21 seconds to
0.07 on a Linux kernel tree.
Ideally we could use xrange instead of range on the most common
path, and thus avoid a ton of allocation, but xrange doesn't support
slice-based indexing.
This patch contains support for Plan 9 from Bell Labs. A README is
provided in contrib/plan9 which describes the port in greater detail.
A new extension is also provided named factotum which permits the
factotum(4) authentication agent to provide credentials for HTTP
repositories. This extension is also applicable to other POSIX
platforms which make use of Plan 9 from User Space (aka plan9ports).
New users of filecache use different names for the function used to compute
the runtime path of the cached file.
Users should subclass filecache and provide their own version of this
function to call the appropriate join function on 'obj' (an instance
of the class that its member function was decorated).
On platforms not supporting shell expansion, scmutil.match() performs glob
expansion on 'pats' arguments. But _matchfiles() revset calls match.match()
directly, bypassing this behaviour. To avoid duplicating scmutil.match(), a
secondary scmutil.matchandpats() is introduced returning both the match object
and the expanded inputs. Note the expanded pats are also needed in the fast
path, and will be used by --follow code path.
When assigning a new object to filecached properties, the cached object that
was kept in the _filecache map was still holding the old object.
By implementing __set__, we track these changes too and update the cached
copy as well.
pathauditor is invoked not only for localpath form using "os.sep" as
separator, but also for normalized form using "/": for example, hg
internal path like "store/data" under ".hg", or ones normalized by
match object
this causes insufficient repository nesting check, because current
pathauditor implementation divides specified path into components by
"os.sep", and this causes to treat multiple path components joined by
"/" as single one on Windows environment.
this patch applies "util.localpath()" on specified path to force it to
be divided into components correctly.
in fact, root for pathauditor also uses multiple path separator on
Windows. but this does not affect audit itself, so "util.localpath()"
is not applied on it.
in current pathauditor implementation, un-normcase()-ed path is
stored into and compared with audit result cache.
this is not efficiency on case insensitive filesystem.
Some character sets, cp932 (known as Shift-JIS for Japanese) for
example, use 0x41('A') - 0x5A('Z') and 0x61('a') - 0x7A('z') as second
or later character.
In such character set, case collision checking recognizes different
files as CASEFOLDED same file, if filenames are treated as byte
sequence.
win32mbcs extension is not appropriate to handle this problem, because
this problem can occur on other than Windows platform only if
problematic character set is used.
Callers of util.checkcase() use known ASCII filenames as last
component of path, and string.lower() is not applied to directory part
of path. So, util.checkcase() is kept intact, even though it applies
string.lower() to filenames.
The most appropriate context is not always clearly defined. The obvious cases:
For working directory commands, we use None
For commands (eg annotate) with single revs, we use that revision
The less obvious cases:
For commands (eg status, diff) with a pair of revs, we use the second revision
For commands that take a range (like log), we use None
The idea is being able to associate a file with a property, and watch
that file stat info for modifications when we decide it's important for it to
be up-to-date. Once it changes, we recreate the object.
On filesystems that can't uniquely identify a file, we always recreate.
As a consequence, localrepo.invalidate() will become much less expensive in the
case where nothing changed on-disk.
The Mercurial 1.9 release is moving a lot of stuff around anyway and we are
already moving path_auditor from util.py to scmutil.py for that release.
So this seems like a good opportunity to do such a rename. It also strengthens
the current project policy to avoid underbars in names.
The two new methods are useful for quickly opening a file for reading
or writing. Unlike 'opener(...).read()', they ensure they the file is
immediately closed without relying on CPython reference counting.
On a case-sensitive file system, files can be added with names that differ
only in case (a "case collision"). This would cause an error on case-insensitive
filesystems. A warning or error is now given for such collisions, depending on
the value of ui.portablefilenames ('warn', 'abort', or 'ignore'):
$ touch file File
$ hg add --config ui.portablefilenames=abort File
abort: possible case-folding collision for File
$ hg add File
warning: possible case-folding collision for File
On POSIX platforms, the 'add', 'addremove', 'copy' and 'rename' commands now
warn if a file has a name that can't be checked out on Windows.
Example:
$ hg add con.xml
warning: filename contains 'con', which is reserved on Windows: 'con.xml'
$ hg status
A con.xml
The file is added despite the warning.
The warning is ON by default. It can be suppressed by setting the config option
'portablefilenames' in section 'ui' to 'ignore' or 'false':
$ hg --config ui.portablefilenames=ignore add con.xml
$ hg sta
A con.xml
If ui.portablefilenames is set to 'abort', then the command is aborted:
$ hg --config ui.portablefilenames=abort add con.xml
abort: filename contains 'con', which is reserved on Windows: 'con.xml'
On Windows, the ui.portablefilenames config setting is irrelevant and the
command is always aborted if a problematic filename is found.