Commit Graph

36 Commits

Author SHA1 Message Date
Kostia Balytskyi
ebe8e04f0d sparse: treat paths as cwd-relative
This commit makes it so sparse treats passed paths as CWD-relative,
not repo-root-realive. This is a more intuitive behavior in my (and some
other FB people's) opinion.

This is breaking change however. My hope here is that since sparse is
experimental, it's ok to introduce BCs.

The reason (glob)s are needed in the test is this: in these two cases we
do not supply path together with slashes, but `os.path.join` adds them, which
means that under Windows they can be backslashes. To demonstrate this behavior,
one could remove the (glob)s and run `./run-tests.py test-sparse.t` from
MinGW's terminal on Windows.
2017-08-04 05:38:22 -07:00
Gregory Szorc
fde2177334 sparse: require [section] in sparse config files (BC)
Previously, [include] was implicit and pattern lines before a
[section] were added to includes.

Because the format may change in the future and explicit behavior,
well, more explicit, this commit changes the config parser to
reject pattern lines that don't occur in a [section].

Differential Revision: https://phab.mercurial-scm.org/D96
2017-07-15 13:21:23 -07:00
Martin von Zweigbergk
f3c48a5fe0 sparse: override dirstate.walk() instead of dirstate._ignore
Instead of treating files that are outside the sparse config as
ignored, this makes it so we list only those that are within the
sparse config by passing the sparse matcher to dirstate.walk().

Once we add support for narrow (sparseness applied to history, not
just working copy), we will need to do a similar restriction of the
walk over manifests, so this will be more consistent then. It also
simplifies the code a bit.

Note that a side-effect of this change is that files outside the
sparse config used to be listed as ignored, but they will now not be
listed at all. This can be seen in the test case where "hg purge" no
longer has any effect because it doesn't see that the files outside
the space config exist. To fix that, I think we should add an option
to dirstate.walk() to walk outside the sparse config. We might expose
that to the user as --no-sparse flag to e.g. "hg status" and "hg
purge", but that's work for another day.

Differential Revision: https://phab.mercurial-scm.org/D59
2017-07-11 10:46:35 -07:00
Gregory Szorc
0ee6ecfbec sparse: move config updating function into core
As part of the move, the ui argument was dropped.

Additional fixups will be made in a follow-up commit.
2017-07-10 21:39:49 -07:00
Gregory Szorc
a7c49e2ec2 dirstate: expose a sparse matcher on dirstate (API)
The sparse extension performs a lot of monkeypatching of dirstate
to make it sparse aware. Essentially, various operations need to
take the active sparse config into account. They do this by obtaining
a matcher representing the sparse config and filtering paths through
it.

The monkeypatching is done by stuffing a reference to a repo on
dirstate and calling sparse.matcher() (which takes a repo instance)
during each function call. The reason this function takes a repo
instance is because resolving the sparse config may require resolving
file contents from filelogs, and that requires a repo. (If the
current sparse config references "profile" files, the contents of
those files from the dirstate's parent revisions is resolved.)

I seem to recall people having strong opinions that the dirstate
object not have a reference to a repo. So copying what the sparse
extension does probably won't fly in core. Plus, the dirstate
modifications shouldn't require a full repo: they only need a matcher.
So there's no good reason to stuff a reference to the repo in
dirstate.

This commit exposes a sparse matcher to dirstate via a property that
when looked up will call a function that eventually calls
sparse.matcher(). The repo instance is bound in a closure, so it
isn't exposed to dirstate.

This approach is functionally similar to what the sparse extension does
today, except it hides the repo instance from dirstate. The approach
is not optimal because we have to call a proxy function and
sparse.matcher() on every property lookup. There is room to cache
the matcher instance in dirstate. After all, the matcher only changes
if the dirstate's parents change or if the sparse config changes. It
feels like we should be able to detect both events and update the
matcher when this occurs. But for now we preserve the existing
semantics so we can move the dirstate sparseness bits into core. Once
in core, refactoring becomes a bit easier since it will be clearer how
all these components interact.

The sparse extension has been updated to use the new property.
Because all references to the repo on dirstate have been removed,
the code for setting it has been removed.
2017-07-08 16:18:04 -07:00
Gregory Szorc
48bb68b295 sparse: use self instead of repo.dirstate
"self" here is the dirstate instance. I'm pretty confident that self
and repo.dirstate will be the exact same object. So remove a dependency
on repo by just looking at self.
2017-07-08 15:42:11 -07:00
Gregory Szorc
6155516e71 sparse: move code for importing rules from files into core
This is a pretty straightforward port. Some code cleanup was
performed. But no major changes to the logic were made.

I'm not a huge fan of this function because it does multiple
things. I'd like to get things into core first to facilitate
refactoring later.

Please also note the added inline comment about the oddities
of writeconfig() and the try..except to undo it. This is because
of the hackiness in which the sparse matcher is obtained by
various consumers, notably dirstate. We'll need a massive
refactor to address this. That refactor is effectively blocked
on having the sparse dirstate hacks live in core.
2017-07-08 14:15:07 -07:00
Gregory Szorc
21d9237e1c sparse: refactor activeprofiles into a generic function (API)
activeprofiles() is a special case of a more generic function.
Furthermore, that generic function is essentially already
implemented inline in the sparse extension.

So, refactor activeprofiles() to a generic activeconfig(). Change
the only consumer of activeprofiles() to use it. And have the
inline implementation in the sparse extension use it.
2017-07-08 14:01:32 -07:00
Gregory Szorc
d86e3657d2 sparse: move printing of sparse config changes function into core
As part of the port, all arguments now have default values of 0.
Strings are now also given the i18n treatment.
2017-07-08 13:34:19 -07:00
Gregory Szorc
519ece1048 sparse: move code for clearing rules to core
This is a pretty straightforward port.
2017-07-08 13:19:38 -07:00
Gregory Szorc
2689134340 sparse: move post commit actions into core
Instead of wrapping committablectx.markcommitted(), we inline
the call into workingctx.markcommitted().

Per smf's review, workingctx is the proper location for this
code, as committablectx is the shared base class for it and
memctx. Since this code touches the working directory, it belongs
in workingctx.
2017-07-07 11:51:10 -07:00
Gregory Szorc
793c8fb431 sparse: move working directory refreshing into core
This is a pretty straightforward move of the code.

I converted the "force" argument to a keyword argument.

Like other recent changes, this code is tightly coupled with
working directory update code in merge.py. I suspect the code
will become more tightly coupled over time, possibly even moved
to merge.py. For now, let's get the code in core.
2017-07-06 14:53:08 -07:00
Gregory Szorc
7fec603f86 sparse: refactor update actions filtering and call from core
merge.calculateupdates() now filters the update actions through sparse
by default.

The filtering no-ops if sparse isn't enabled or no sparse config
is defined.

The function has been refactored to behave more like a filter
instead of a wrapper of merge.calculateupdates().

We should arguably take sparse into account earlier in
merge.calculateupdates(). This patch preserves the old behavior
of applying sparse at the end of update calculation, which is the
simplest and safest approach.
2017-07-06 16:29:31 -07:00
Gregory Szorc
7fff0417c9 sparse: move update action filtering into core
This is a relatively straight port of the function. It is pretty large.
So refactoring will be postponed to a subsequent commit.
2017-07-06 16:17:35 -07:00
Gregory Szorc
6dce563cd3 sparse: move pruning of temporary includes into core
This was our last method on the custom repo type, meaning we could
remove that custom type and inline the 2 lines of code into
reposetup().

As part of the move, instead of wrapping merge.update() from
the sparse extension, we inline the function call. The ported
function now no-ops if sparse isn't enabled, making it safe to
always call.

The call site in update() may not be the most appropriate. But
it matches the previous behavior, which is the safest thing
to do. It can be improved later.
2017-07-06 14:33:18 -07:00
Gregory Szorc
26fd8a7af7 sparse: move function for resolving sparse matcher into core
As part of the move, the function arguments changed so revs are
passed as a list instead of *args. This allows us to use keyword
arguments properly.

Since the plan is to integrate sparse into core and have it
enabled by default, we need to prepare for a sparse matcher
to always be obtained and operated on. As part of the move,
we inserted code that returns an always matcher if sparse
isn't enabled. Some callers in the sparse extension take this
into account and conditionally perform matching depending on
whether the special always matcher is seen. I /think/ this
may have sped up some operations where the extension is
installed but no sparse config is activated.

One thing I'm ensure of in this code is whether os.path.dirname()
is semantically correct. os.posixpath.dirname() (which is
exported as pathutil.dirname) might be a better choise because
all patterns should be using posix directory separators (/)
instead of Windows (\). There's an inline comment that implies
Windows was tested. So hopefully it won't be a problem. We
can improve this in a follow-up. I've added a TODO to track it.
2017-07-06 17:41:45 -07:00
Gregory Szorc
16c192411d match: move matchers from sparse into core
The sparse extension contains some matcher types that are
generic and can exist in core.

As part of the move, the classes now inherit from basematcher.
always(), files(), and isexact() have been dropped because
they match the default implementations in basematcher.
2017-07-06 17:39:24 -07:00
Gregory Szorc
0338e1f32a sparse: move config signature logic into core
This is a pretty straightforward port. It will be cleaned up in
a subsequent commit.
2017-07-06 16:11:56 -07:00
Gregory Szorc
35151669b9 sparse: remove custom hash matcher
With the recent change to always use repr(), this function was
functionally identical to the version in fsmonitor it was
replacing. So remove it.
2017-07-06 17:31:33 -07:00
Martin von Zweigbergk
585fc127ad sparse: override __repr__ in matchers
sparse.py in FB's hg-experimental repo switched to using __repr__ for
non-sparse matchers soon after hg core started overriding __repr__ in
the matchers in match.py (because the core matchers also stopped
having "includepat" and other attributes that sparse used to depend
on). Let's finish that migration by implementing __repr__ in the
sparse matchers as well. That also lets us remove the special handling
of them in _hashmatcher().
2017-07-06 16:37:36 -07:00
Gregory Szorc
fa7c02cef4 sparse: move some temporary includes functions into core
Functions for reading and writing the tempsparse file have been
moved. prunetemporaryincludes() will be moved separately
because it is non-trivial.
2017-07-06 14:48:16 -07:00
Gregory Szorc
23bd6434bf sparse: move config file writing into core
The code was refactored during the move to be more procedural
instead of using string formatting. This has the benefit of not
writing empty sections, which changed tests.
2017-07-06 12:24:55 -07:00
Gregory Szorc
0cd417305b localrepo: add sparse caches
The sparse extension maintains caches for the sparse files
to a signature and a signature to a matcher. This allows the
sparse matchers to be resolved quickly, which is apparently
something that can occur in loops.

This patch ports the sparse caches to the localrepo class
pretty much as-is. There is potentially room to improve the
caching mechanism. But that can be done as a follow-up.

The default invalidatecaches() now clears the relevant sparse
cache. invalidatesignaturecache() has been moved to sparse.py.
2017-07-06 12:20:53 -07:00
Gregory Szorc
82797a75d4 sparse: move active profiles function into core
Also includes some light formatting changes.
2017-07-06 12:26:04 -07:00
Gregory Szorc
b77eafa212 sparse: move resolving of sparse patterns for rev into core
This method is reasonably well-contained and simple to move.

As part of the move, some light formatting was performed.

A "working copy" reference in an error message was changed to
"working directory."

The biggest change was to _refreshoncommit() in sparse.py. It
was previously checking for the existence of an attribute on
the repo instance. Since the moved function now returns empty
data if sparse isn't enabled, we unconditionally call the
new function. However, we do have to protect another method
call in that function. This will all be unhacked eventually.
2017-07-06 12:15:14 -07:00
Gregory Szorc
19d9143b89 sparse: variable to track if sparse is enabled
Currently, the sparse extension sniffs repo instances for
attributes defined by the sparse extension to determine if
sparse is enabled. As we move code away from repo instances,
these checks will be a bit more brittle.

We introduce a module-level variable to track whether sparse is
enabled as a temporary workaround.
2017-07-06 12:06:37 -07:00
Gregory Szorc
c16ee0ee8c sparse: move profile reading into core
One more step towards weaning off methods on repo instances and
moving code to core. While this function is only used once and
is simple, it needs to exist on its own so Facebook can monkeypatch
it to enable simplecache integration.
2017-07-06 12:14:12 -07:00
Gregory Szorc
2316ea9a38 sparse: move config parsing into core
This patch marks the beginning of moving code from the sparse
extension into core. The goal is to move as much of the
functionality as possible into core, where it will be an
experimental feature. The extension will likely continue to
exist to enable the feature and provide UI elements.

As part of the move, the repo method was converted to a module
function. It doesn't need to exist on repos.

An error message was also updated to reflect that an error isn't
necessarily from the .hg/sparse file. The API should be updated
later to pass in a filename so the error can be more descriptive.

Copyright of the added file was copied from the sparse extension.
2017-07-06 12:14:03 -07:00
Gregory Szorc
1db9bed779 sparse: use vfs.tryread()
vfs.exists() followed by a file read is an anti-pattern because it
incurs an extra stat() to test for file presence. vfs.tryread()
returns empty string on missing file and avoids the stat().
2017-07-06 10:58:45 -07:00
Gregory Szorc
49b16cba64 sparse: refactor sparsechecksum()
This was relying on garbage collection to close the opened
file, which is a bug. Both callers simply called into self.vfs
to resolve the path. So refactor to use the vfs layer.

While we're here, rename the method to reflect it is internal
and to break anyone relying on the old behavior.
2017-07-01 11:56:39 -07:00
Gregory Szorc
8febc1c48e sparse: document config file format
This was previously undocumented. Seems useful to have.
2017-07-06 10:57:26 -07:00
Gregory Szorc
259226f5d1 sparse: rename command to debugsparse
Sparse checkout is still highly experimental and not protected
by BC guarantees yet. We also haven't had a discussion on the UX.

To discourage use, we rename the sparse command to debugsparse.
2017-07-01 10:29:27 -07:00
Gregory Szorc
6a8520b1b6 sparse: remove reference to simplecache
This is a 3rd party extension authored by Facebook. References in
core are not appropriate.

It will be possible to restore this code/optimization via
monkeypatching. So Facebook won't lose any functionality.

The removed code is important for performance. So add a comment
tracking it.
2017-07-06 10:54:23 -07:00
Gregory Szorc
4ff8c49366 sparse: remove reference to hgwatchman
This is a legacy extension. Now that the extension is in core,
we only need to support what's in core, which is fsmonitor.
2017-07-01 10:24:31 -07:00
Gregory Szorc
6301d186f7 sparse: expand module docstring
Clarify lack of BC guarantees. And say a bit more about the extension.
2017-07-01 10:36:03 -07:00
Gregory Szorc
82b7f72254 sparse: vendor Facebook-developed extension
Facebook has developed an extension to enable "sparse" checkouts -
a working directory with a subset of files. This feature is a critical
component in enabling repositories to scale to infinite number of
files while retaining reasonable performance. It's worth noting
that sparse checkout is only one possible solution to this problem:
another is virtual filesystems that realize files on first access.
But given that virtual filesystems may not be accessible to all
users, sparse checkout is necessary as a fallback.

Per mailing list discussion at
https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-March/095868.html
we want to add sparse checkout to the Mercurial distribution via
roughly the following mechanism:

1. Vendor extension as-is with minimal modifications (this patch)
2. Refactor extension so it is more clearly experimental and inline
   with Mercurial practices
3. Move code from extension into core where possible
4. Drop experimental labeling and/or move feature into core
   after sign-off from narrow clone feature owners

This commit essentially copies the sparse extension and tests
from revision 71e0a2aeca92a4078fe1b8c76e32c88ff1929737 of the
https://bitbucket.org/facebook/hg-experimental repository.

A list of modifications made as part of vendoring is as follows:

* "EXPERIMENTAL" added to module docstring
* Imports were changed to match Mercurial style conventions
* "testedwith" value was updated to core Mercurial special value and
  comment boilerplate was inserted
* A "clone_sparse" function was renamed to "clonesparse" to appease
  the style checker
* Paths to the sparse extension in tests reflect built-in location
* test-sparse-extensions.t was renamed to test-sparse-fsmonitor.t
  and references to "simplecache" were removed. The test always skips
  because it isn't trivial to run it given the way we currently run
  fsmonitor tests
* A double empty line was removed from test-sparse-profiles.t

There are aspects of the added code that are obviously not ideal.
The goal is to make a minimal number of modifications as part of
the vendoring to make it easier to track changes from the original
implementation. Refactoring will occur in subsequent patches.
2017-07-01 10:43:29 -07:00