If the matching command lives in an in-tree extension (which is all we
scan for), and the user has disabled that extension with
"extensions.<name>=!", we were not finding it, because the path in
_disabledextensions was the empty string. If the user had set
"extensions.<name>=!<valid path>" it would work, so it seems like just
a mistake that it didn't work.
This includes one test showing how disabling a command with e.g.
"extensions.rebase=!" results in the command not being
suggested. We'll fix that next.
This is a pretty straightforward move of the code.
I converted the "force" argument to a keyword argument.
Like other recent changes, this code is tightly coupled with
working directory update code in merge.py. I suspect the code
will become more tightly coupled over time, possibly even moved
to merge.py. For now, let's get the code in core.
merge.calculateupdates() now filters the update actions through sparse
by default.
The filtering no-ops if sparse isn't enabled or no sparse config
is defined.
The function has been refactored to behave more like a filter
instead of a wrapper of merge.calculateupdates().
We should arguably take sparse into account earlier in
merge.calculateupdates(). This patch preserves the old behavior
of applying sparse at the end of update calculation, which is the
simplest and safest approach.
This was our last method on the custom repo type, meaning we could
remove that custom type and inline the 2 lines of code into
reposetup().
As part of the move, instead of wrapping merge.update() from
the sparse extension, we inline the function call. The ported
function now no-ops if sparse isn't enabled, making it safe to
always call.
The call site in update() may not be the most appropriate. But
it matches the previous behavior, which is the safest thing
to do. It can be improved later.
As part of the move, the function arguments changed so revs are
passed as a list instead of *args. This allows us to use keyword
arguments properly.
Since the plan is to integrate sparse into core and have it
enabled by default, we need to prepare for a sparse matcher
to always be obtained and operated on. As part of the move,
we inserted code that returns an always matcher if sparse
isn't enabled. Some callers in the sparse extension take this
into account and conditionally perform matching depending on
whether the special always matcher is seen. I /think/ this
may have sped up some operations where the extension is
installed but no sparse config is activated.
One thing I'm ensure of in this code is whether os.path.dirname()
is semantically correct. os.posixpath.dirname() (which is
exported as pathutil.dirname) might be a better choise because
all patterns should be using posix directory separators (/)
instead of Windows (\). There's an inline comment that implies
Windows was tested. So hopefully it won't be a problem. We
can improve this in a follow-up. I've added a TODO to track it.
The sparse extension contains some matcher types that are
generic and can exist in core.
As part of the move, the classes now inherit from basematcher.
always(), files(), and isexact() have been dropped because
they match the default implementations in basematcher.
Before, 0 was being used as the default signature value and we cast
the int to a string. We also handled I/O exceptions manually.
The new code uses cfs.tryread() so we always feed data into the
hasher. The empty string does hash and and should be suitable
for input into a cache key.
The changes made the code simple enough that the separate checksum
function could be inlined.
sparse.py in FB's hg-experimental repo switched to using __repr__ for
non-sparse matchers soon after hg core started overriding __repr__ in
the matchers in match.py (because the core matchers also stopped
having "includepat" and other attributes that sparse used to depend
on). Let's finish that migration by implementing __repr__ in the
sparse matchers as well. That also lets us remove the special handling
of them in _hashmatcher().
The delaypush() function had a reference to "repo" that was clearly
supposed to be "pushop.repo". Instead of just fixing that, let's
extract "pushop.repo.ui" to a variable, since that's the only
piece of the repo that's needed in the function.
I have not looked into why I saw a different result in the test to
start with, but that's for another patch anyway.
This simplifies the method slightly. It does create a full list of
paths while doing so, but it's not a lot of data anyway (besides, I
would think references to strings are no larger than (references to?)
True).
mergestate.unresolved() is a generator, so it seems better for it to
rely on iteritems() than items(), although it also seems unlikely for
it to make a noticeable difference.
I don't know why applying an empty changegroup should be an error. It
seems harmless. I suspect the check was there to find code that
creates empty changegroups just because that would be wasteful. Let's
use develwarn() for that instead, so we catch any such cases that run
with our test runner, but we still allow others to generate empty
changegroups if they want to.
We have run into this check at Google once or twice and had to work
around it, but I'm changing this not so much because of that, but
because it seems like it shouldn't be an error.
I also changed the message slightly to be more modern ("changelog
group" -> "changegroup") and more generic ("received" -> "applied").
Applying an empty changegroup has been an error since the
beginning. The only exception was strip, which would allow to apply an
empty changegroup from the temporary bundle. However, the emptyok=True
option was only set for bundle1 bundles. In other words, temporary
bundle2 bundles would fail if they were empty.
Bundle2 has now been used enough that it seems safe to say that we
simply don't create bundle2 bundles with empty changegroups. That also
suggests that we never create bundle1 bundles with empty changegroups
(i.e. empty bundle1 bundles, since bundle1 is just a changegroup),
because, AFAICT, the code leading up to the application of the bundle
is the same for bundle1 and bundle2.
Therefore, let's stop passing emptyok=True, so we more clearly get the
same behavior for bundle1 and bundle2.
The "patterns"/"include" in "patternspat"/"includepat" is redundant,
so drop it. Also a "_" prefix since it's "private".
Inline the "pm"/"im" variables.
The code was refactored during the move to be more procedural
instead of using string formatting. This has the benefit of not
writing empty sections, which changed tests.
The sparse extension maintains caches for the sparse files
to a signature and a signature to a matcher. This allows the
sparse matchers to be resolved quickly, which is apparently
something that can occur in loops.
This patch ports the sparse caches to the localrepo class
pretty much as-is. There is potentially room to improve the
caching mechanism. But that can be done as a follow-up.
The default invalidatecaches() now clears the relevant sparse
cache. invalidatesignaturecache() has been moved to sparse.py.
This method is reasonably well-contained and simple to move.
As part of the move, some light formatting was performed.
A "working copy" reference in an error message was changed to
"working directory."
The biggest change was to _refreshoncommit() in sparse.py. It
was previously checking for the existence of an attribute on
the repo instance. Since the moved function now returns empty
data if sparse isn't enabled, we unconditionally call the
new function. However, we do have to protect another method
call in that function. This will all be unhacked eventually.
Currently, the sparse extension sniffs repo instances for
attributes defined by the sparse extension to determine if
sparse is enabled. As we move code away from repo instances,
these checks will be a bit more brittle.
We introduce a module-level variable to track whether sparse is
enabled as a temporary workaround.
One more step towards weaning off methods on repo instances and
moving code to core. While this function is only used once and
is simple, it needs to exist on its own so Facebook can monkeypatch
it to enable simplecache integration.
This patch marks the beginning of moving code from the sparse
extension into core. The goal is to move as much of the
functionality as possible into core, where it will be an
experimental feature. The extension will likely continue to
exist to enable the feature and provide UI elements.
As part of the move, the repo method was converted to a module
function. It doesn't need to exist on repos.
An error message was also updated to reflect that an error isn't
necessarily from the .hg/sparse file. The API should be updated
later to pass in a filename so the error can be more descriptive.
Copyright of the added file was copied from the sparse extension.
vfs.exists() followed by a file read is an anti-pattern because it
incurs an extra stat() to test for file presence. vfs.tryread()
returns empty string on missing file and avoids the stat().
This was relying on garbage collection to close the opened
file, which is a bug. Both callers simply called into self.vfs
to resolve the path. So refactor to use the vfs layer.
While we're here, rename the method to reflect it is internal
and to break anyone relying on the old behavior.
Sparse checkout is still highly experimental and not protected
by BC guarantees yet. We also haven't had a discussion on the UX.
To discourage use, we rename the sparse command to debugsparse.
This is a 3rd party extension authored by Facebook. References in
core are not appropriate.
It will be possible to restore this code/optimization via
monkeypatching. So Facebook won't lose any functionality.
The removed code is important for performance. So add a comment
tracking it.
Facebook has developed an extension to enable "sparse" checkouts -
a working directory with a subset of files. This feature is a critical
component in enabling repositories to scale to infinite number of
files while retaining reasonable performance. It's worth noting
that sparse checkout is only one possible solution to this problem:
another is virtual filesystems that realize files on first access.
But given that virtual filesystems may not be accessible to all
users, sparse checkout is necessary as a fallback.
Per mailing list discussion at
https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-March/095868.html
we want to add sparse checkout to the Mercurial distribution via
roughly the following mechanism:
1. Vendor extension as-is with minimal modifications (this patch)
2. Refactor extension so it is more clearly experimental and inline
with Mercurial practices
3. Move code from extension into core where possible
4. Drop experimental labeling and/or move feature into core
after sign-off from narrow clone feature owners
This commit essentially copies the sparse extension and tests
from revision 71e0a2aeca92a4078fe1b8c76e32c88ff1929737 of the
https://bitbucket.org/facebook/hg-experimental repository.
A list of modifications made as part of vendoring is as follows:
* "EXPERIMENTAL" added to module docstring
* Imports were changed to match Mercurial style conventions
* "testedwith" value was updated to core Mercurial special value and
comment boilerplate was inserted
* A "clone_sparse" function was renamed to "clonesparse" to appease
the style checker
* Paths to the sparse extension in tests reflect built-in location
* test-sparse-extensions.t was renamed to test-sparse-fsmonitor.t
and references to "simplecache" were removed. The test always skips
because it isn't trivial to run it given the way we currently run
fsmonitor tests
* A double empty line was removed from test-sparse-profiles.t
There are aspects of the added code that are obviously not ideal.
The goal is to make a minimal number of modifications as part of
the vendoring to make it easier to track changes from the original
implementation. Refactoring will occur in subsequent patches.
I'm still cleaning this up, but it's easier to do in bite-size chunks
like this than all at once. The negative lookahead avoids one false
positive category from some output related to finding Subversion
bindings.
Differential Revision: https://phab.mercurial-scm.org/D13
In blockdescendants(), we had an assertion when line range of a merge
changeset was not consistent depending on which parent was considered for
computation. For instance, this might occur when file content (in lookup
range) is significantly different between parent branches of the merge as
demonstrated in added tests (where we almost completely rewrite the "baz" file
while also introducing similarities with its content in the other branch we
later merge to).
Now, in such case, we combine line ranges from all parents by storing the
envelope of both line ranges. This is conservative (the line range is
extended, possibly unnecessarily) but at least this should avoid missing
descendants with changes in a range that would fall in that of one parent but
not in another one (the case of "baz: narrow change (2->2+)" changeset in
tests).
Switch the lone call in merge.py to use it.
As with past refactors, the goal is to make wctx hot-swappable with an
in-memory context in the future. This change should be a no-op today.
Now, files (to be truncated) are opened with checkambig=True, only if
localrepository caches it.
Therefore, straightforward "copy if EPERM" is always reasonable to
avoid file stat ambiguity at closing.
This patch makes checkambigatclosing close wrapper copy the target
file, and advance mtime on it after renaming, if EPERM. This can avoid
file stat ambiguity, even if the target file is owned by another (see
issue5418 and issue5584 for detail).
This patch factors main logic out instead of changing
checkambigatclosing._checkambig() directly, in order to reuse it.
Now, transaction can determine whether avoidance of file stat
ambiguity is needed for each files, by blacklist "checkambigfiles".
For similarity to truncation in _playback(), this patch apply
checkambig=True only on limited files in code paths below.
- restoring files by util.copyfile(), in _playback()
(checkambigfiles itself is examined at first, because it as a
keyword argument might be None)
- writing files at finalization of transaction, in _generatefiles()
This patch reduces cost of checking stat at writing out and restoring
files, which aren't filecache-ed.
Advancing mtime by os.utime() fails for EPERM, if the target file is
owned by another. 0d920bcb0fd1 and related changes made some code
paths give advancing mtime up in such case, to fix issue5418.
This causes file stat ambiguity (again), if it is owned by another.
https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
To avoid file stat ambiguity in such case, especially for
.hg/dirstate, c75c7b3e3284 made vfs.rename() copy the target file, and
advance mtime of renamed one again, if EPERM (see issue5584 for detail).
But straightforward "copy if EPERM" isn't reasonable for truncation of
append-only files at rollbacking, because rollbacking might cost much
for truncation of many filelogs, even though filelogs aren't
filecache-ed.
Therefore, this patch introduces blacklist "checkambigfiles", and
avoids file stat ambiguity only for files specified in this blacklist.
This patch consists of two parts below, which should be applied at
once in order to avoid regression.
- specify 'checkambig=True' at vfs.open(mode='a') in _playback()
according to checkambigfiles
- invoke _playback() with checkambigfiles
- add transaction.__init__() checkambigfiles argument, for _abort()
- make localrepo instantiate transaction with _cachedfiles
- add rollback() checkambigfiles argument, for "hg rollback/recover"
- make localrepo invoke rollback() with _cachedfiles
After this patch, straightforward "copy if EPERM" will be reasonable
at closing the file opened with checkambig=True, because this policy
is applied only on files, which are listed in blacklist
"checkambigfiles".