Summary:
Some consumers still rely on enabling 'sparse=' so let's add a module
that just redirects to the real fbsparse.py.
Also updates configerator to use the newer name.
Reviewed By: markbt, quark-zju
Differential Revision: D6755971
fbshipit-source-id: 3a67f029045dacf927742a616a714fe632b97fea
Summary:
We have our fbsparse, which we are using. We can rename it later when configs
are in the same repo as the extension files.
Test Plan: - run tests, see only the failures related to the other commits in the stack
Reviewers: #sourcecontrol
Differential Revision: https://phabricator.intern.facebook.com/D6683245
This commit makes it so sparse treats passed paths as CWD-relative,
not repo-root-realive. This is a more intuitive behavior in my (and some
other FB people's) opinion.
This is breaking change however. My hope here is that since sparse is
experimental, it's ok to introduce BCs.
The reason (glob)s are needed in the test is this: in these two cases we
do not supply path together with slashes, but `os.path.join` adds them, which
means that under Windows they can be backslashes. To demonstrate this behavior,
one could remove the (glob)s and run `./run-tests.py test-sparse.t` from
MinGW's terminal on Windows.
Previously, [include] was implicit and pattern lines before a
[section] were added to includes.
Because the format may change in the future and explicit behavior,
well, more explicit, this commit changes the config parser to
reject pattern lines that don't occur in a [section].
Differential Revision: https://phab.mercurial-scm.org/D96
Instead of treating files that are outside the sparse config as
ignored, this makes it so we list only those that are within the
sparse config by passing the sparse matcher to dirstate.walk().
Once we add support for narrow (sparseness applied to history, not
just working copy), we will need to do a similar restriction of the
walk over manifests, so this will be more consistent then. It also
simplifies the code a bit.
Note that a side-effect of this change is that files outside the
sparse config used to be listed as ignored, but they will now not be
listed at all. This can be seen in the test case where "hg purge" no
longer has any effect because it doesn't see that the files outside
the space config exist. To fix that, I think we should add an option
to dirstate.walk() to walk outside the sparse config. We might expose
that to the user as --no-sparse flag to e.g. "hg status" and "hg
purge", but that's work for another day.
Differential Revision: https://phab.mercurial-scm.org/D59
The sparse extension performs a lot of monkeypatching of dirstate
to make it sparse aware. Essentially, various operations need to
take the active sparse config into account. They do this by obtaining
a matcher representing the sparse config and filtering paths through
it.
The monkeypatching is done by stuffing a reference to a repo on
dirstate and calling sparse.matcher() (which takes a repo instance)
during each function call. The reason this function takes a repo
instance is because resolving the sparse config may require resolving
file contents from filelogs, and that requires a repo. (If the
current sparse config references "profile" files, the contents of
those files from the dirstate's parent revisions is resolved.)
I seem to recall people having strong opinions that the dirstate
object not have a reference to a repo. So copying what the sparse
extension does probably won't fly in core. Plus, the dirstate
modifications shouldn't require a full repo: they only need a matcher.
So there's no good reason to stuff a reference to the repo in
dirstate.
This commit exposes a sparse matcher to dirstate via a property that
when looked up will call a function that eventually calls
sparse.matcher(). The repo instance is bound in a closure, so it
isn't exposed to dirstate.
This approach is functionally similar to what the sparse extension does
today, except it hides the repo instance from dirstate. The approach
is not optimal because we have to call a proxy function and
sparse.matcher() on every property lookup. There is room to cache
the matcher instance in dirstate. After all, the matcher only changes
if the dirstate's parents change or if the sparse config changes. It
feels like we should be able to detect both events and update the
matcher when this occurs. But for now we preserve the existing
semantics so we can move the dirstate sparseness bits into core. Once
in core, refactoring becomes a bit easier since it will be clearer how
all these components interact.
The sparse extension has been updated to use the new property.
Because all references to the repo on dirstate have been removed,
the code for setting it has been removed.
"self" here is the dirstate instance. I'm pretty confident that self
and repo.dirstate will be the exact same object. So remove a dependency
on repo by just looking at self.
This is a pretty straightforward port. Some code cleanup was
performed. But no major changes to the logic were made.
I'm not a huge fan of this function because it does multiple
things. I'd like to get things into core first to facilitate
refactoring later.
Please also note the added inline comment about the oddities
of writeconfig() and the try..except to undo it. This is because
of the hackiness in which the sparse matcher is obtained by
various consumers, notably dirstate. We'll need a massive
refactor to address this. That refactor is effectively blocked
on having the sparse dirstate hacks live in core.
activeprofiles() is a special case of a more generic function.
Furthermore, that generic function is essentially already
implemented inline in the sparse extension.
So, refactor activeprofiles() to a generic activeconfig(). Change
the only consumer of activeprofiles() to use it. And have the
inline implementation in the sparse extension use it.
Instead of wrapping committablectx.markcommitted(), we inline
the call into workingctx.markcommitted().
Per smf's review, workingctx is the proper location for this
code, as committablectx is the shared base class for it and
memctx. Since this code touches the working directory, it belongs
in workingctx.
This is a pretty straightforward move of the code.
I converted the "force" argument to a keyword argument.
Like other recent changes, this code is tightly coupled with
working directory update code in merge.py. I suspect the code
will become more tightly coupled over time, possibly even moved
to merge.py. For now, let's get the code in core.
merge.calculateupdates() now filters the update actions through sparse
by default.
The filtering no-ops if sparse isn't enabled or no sparse config
is defined.
The function has been refactored to behave more like a filter
instead of a wrapper of merge.calculateupdates().
We should arguably take sparse into account earlier in
merge.calculateupdates(). This patch preserves the old behavior
of applying sparse at the end of update calculation, which is the
simplest and safest approach.
This was our last method on the custom repo type, meaning we could
remove that custom type and inline the 2 lines of code into
reposetup().
As part of the move, instead of wrapping merge.update() from
the sparse extension, we inline the function call. The ported
function now no-ops if sparse isn't enabled, making it safe to
always call.
The call site in update() may not be the most appropriate. But
it matches the previous behavior, which is the safest thing
to do. It can be improved later.
As part of the move, the function arguments changed so revs are
passed as a list instead of *args. This allows us to use keyword
arguments properly.
Since the plan is to integrate sparse into core and have it
enabled by default, we need to prepare for a sparse matcher
to always be obtained and operated on. As part of the move,
we inserted code that returns an always matcher if sparse
isn't enabled. Some callers in the sparse extension take this
into account and conditionally perform matching depending on
whether the special always matcher is seen. I /think/ this
may have sped up some operations where the extension is
installed but no sparse config is activated.
One thing I'm ensure of in this code is whether os.path.dirname()
is semantically correct. os.posixpath.dirname() (which is
exported as pathutil.dirname) might be a better choise because
all patterns should be using posix directory separators (/)
instead of Windows (\). There's an inline comment that implies
Windows was tested. So hopefully it won't be a problem. We
can improve this in a follow-up. I've added a TODO to track it.
The sparse extension contains some matcher types that are
generic and can exist in core.
As part of the move, the classes now inherit from basematcher.
always(), files(), and isexact() have been dropped because
they match the default implementations in basematcher.
sparse.py in FB's hg-experimental repo switched to using __repr__ for
non-sparse matchers soon after hg core started overriding __repr__ in
the matchers in match.py (because the core matchers also stopped
having "includepat" and other attributes that sparse used to depend
on). Let's finish that migration by implementing __repr__ in the
sparse matchers as well. That also lets us remove the special handling
of them in _hashmatcher().
The code was refactored during the move to be more procedural
instead of using string formatting. This has the benefit of not
writing empty sections, which changed tests.
The sparse extension maintains caches for the sparse files
to a signature and a signature to a matcher. This allows the
sparse matchers to be resolved quickly, which is apparently
something that can occur in loops.
This patch ports the sparse caches to the localrepo class
pretty much as-is. There is potentially room to improve the
caching mechanism. But that can be done as a follow-up.
The default invalidatecaches() now clears the relevant sparse
cache. invalidatesignaturecache() has been moved to sparse.py.
This method is reasonably well-contained and simple to move.
As part of the move, some light formatting was performed.
A "working copy" reference in an error message was changed to
"working directory."
The biggest change was to _refreshoncommit() in sparse.py. It
was previously checking for the existence of an attribute on
the repo instance. Since the moved function now returns empty
data if sparse isn't enabled, we unconditionally call the
new function. However, we do have to protect another method
call in that function. This will all be unhacked eventually.
Currently, the sparse extension sniffs repo instances for
attributes defined by the sparse extension to determine if
sparse is enabled. As we move code away from repo instances,
these checks will be a bit more brittle.
We introduce a module-level variable to track whether sparse is
enabled as a temporary workaround.
One more step towards weaning off methods on repo instances and
moving code to core. While this function is only used once and
is simple, it needs to exist on its own so Facebook can monkeypatch
it to enable simplecache integration.
This patch marks the beginning of moving code from the sparse
extension into core. The goal is to move as much of the
functionality as possible into core, where it will be an
experimental feature. The extension will likely continue to
exist to enable the feature and provide UI elements.
As part of the move, the repo method was converted to a module
function. It doesn't need to exist on repos.
An error message was also updated to reflect that an error isn't
necessarily from the .hg/sparse file. The API should be updated
later to pass in a filename so the error can be more descriptive.
Copyright of the added file was copied from the sparse extension.
vfs.exists() followed by a file read is an anti-pattern because it
incurs an extra stat() to test for file presence. vfs.tryread()
returns empty string on missing file and avoids the stat().
This was relying on garbage collection to close the opened
file, which is a bug. Both callers simply called into self.vfs
to resolve the path. So refactor to use the vfs layer.
While we're here, rename the method to reflect it is internal
and to break anyone relying on the old behavior.
Sparse checkout is still highly experimental and not protected
by BC guarantees yet. We also haven't had a discussion on the UX.
To discourage use, we rename the sparse command to debugsparse.
This is a 3rd party extension authored by Facebook. References in
core are not appropriate.
It will be possible to restore this code/optimization via
monkeypatching. So Facebook won't lose any functionality.
The removed code is important for performance. So add a comment
tracking it.
Facebook has developed an extension to enable "sparse" checkouts -
a working directory with a subset of files. This feature is a critical
component in enabling repositories to scale to infinite number of
files while retaining reasonable performance. It's worth noting
that sparse checkout is only one possible solution to this problem:
another is virtual filesystems that realize files on first access.
But given that virtual filesystems may not be accessible to all
users, sparse checkout is necessary as a fallback.
Per mailing list discussion at
https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-March/095868.html
we want to add sparse checkout to the Mercurial distribution via
roughly the following mechanism:
1. Vendor extension as-is with minimal modifications (this patch)
2. Refactor extension so it is more clearly experimental and inline
with Mercurial practices
3. Move code from extension into core where possible
4. Drop experimental labeling and/or move feature into core
after sign-off from narrow clone feature owners
This commit essentially copies the sparse extension and tests
from revision 71e0a2aeca92a4078fe1b8c76e32c88ff1929737 of the
https://bitbucket.org/facebook/hg-experimental repository.
A list of modifications made as part of vendoring is as follows:
* "EXPERIMENTAL" added to module docstring
* Imports were changed to match Mercurial style conventions
* "testedwith" value was updated to core Mercurial special value and
comment boilerplate was inserted
* A "clone_sparse" function was renamed to "clonesparse" to appease
the style checker
* Paths to the sparse extension in tests reflect built-in location
* test-sparse-extensions.t was renamed to test-sparse-fsmonitor.t
and references to "simplecache" were removed. The test always skips
because it isn't trivial to run it given the way we currently run
fsmonitor tests
* A double empty line was removed from test-sparse-profiles.t
There are aspects of the added code that are obviously not ideal.
The goal is to make a minimal number of modifications as part of
the vendoring to make it easier to track changes from the original
implementation. Refactoring will occur in subsequent patches.