We don't care about filtering out symlinks that have already been
committed with full content, only those that have been accidentally
resolved in the working directory.
In basectx._buildstatus(), we read the manifests for the two revisions
being compared. For "caching reasons" unknown to me, it is better to
read the earlier manifest first, which basectx._prestatus() takes care
of. However, if the 'self' context is a committablectx and the 'other'
context is the parent of the working directory (as in the very common
case of plain "hg status"), there is no need to read any manifests at
all -- all that's needed is the dirstate status. To avoid reading the
manifests, _prestatus() is overridden in committablectx and avoids
calling its super method, and _buildstatus() calls its super method
only if the 'other' context is not the parent of the working
directory.
It seems easier to follow what's happening if we move the pre-fetching
to _buildstatus() just before the place where the manifests are
fetched. We just need to add an extra check that the revision is not
None to handle the case that was previously handled by subclass
overriding. That also makes it safe for committablectx._prestatus() to
call its parent, although the latter now becomes empty, so we won't
bother.
The workingctx method simply calls the super method. The only effect
it has is that it uses a different default argument for the 'other'
argument. The only in-tree caller is patch.diff, which always passes
an argument to the method, so it should be safe to remove the
overriding. Having the default argument depend on the type seems
rather dangerous anyway.
In order not to avoid listing files as both added and deleted, for
example, we check for every file in the manifest if it is in the
_list_ of deleted files. This can get quite slow when there are many
deleted files. Change it to a set to make the containment check
faster. On a somewhat contrived example of the Mozilla repo with the
entire testing/ directory deleted (~14k files), this makes
'hg status --rev .^' go from 26s to 2s.
The comment in workingctx.status() says that "calling 'super' subtly
reveresed the contexts", but that is simply not true, so we should not
be swapping added and removed fields.
Hidden changesets are by far the most common error case and is the only one[1]
that can reach the user. We move to a friendlier message with a hint about how
to access the data anyway. We should probably point to a help topic instead but
we do not have such a topic yet.
Example of the new output
abort: hidden revision '4'!
(use --hidden to access hidden revisions)
[1] Actually, filtering from "served" can also reach the user during certain
exchange operations.
This will help user to debug. A more precise message will be issued
for the most common case ("visible" filter) in the next changesets.
example output:
- abort: filtered revision '4'!
+ abort: filtered revision '4' (not in 'visible' subset)!
We capture FilteredxxxError and issue a FilteredRepoLookupError instead with a
sightly different messsge. The message will likely get more improvement in the
future.
error: filtered revision '4'
We are going to introduce more precise exception classes for filtered nodes. So
we will have to upgrade them to the `RepoLookupError` level here. We wrap the
whole thing into a try/except to ease this future catching. Some of the current
exception catching will be moved in this one. But the current changeset focuses
on code movement only.
Two possible behaviors are defined for handling censored data: abort, and
ignore. When we ignore censored data we return an empty file to callers
requesting the file data.
The status tuple returned from dirstate.status() has an additional
field compared to the other status tuples: lookup/unsure. This field
is just an optimization and not something most callers care about
(they want the resolved value of 'modified' or 'clean'). To prepare
for a single future status type, let's separate out the 'lookup' field
from the rest by having dirstate.status() return a pair: (lookup,
status).
This wraps all the locations of dirstate.setparent with the appropriate
begin/endparentchange calls. This will prevent exceptions during those calls
from causing incoherent dirstates (issue4353).
Rev 2eef89bfd70d switched the contract for filectxfn from "raise IOError if
file is missing" to "return None if file is missing". Out of tree extensions
need to be updated for that, but for extensions interested in compatibility
with both Mercurial <= 3.1 and default, it is next to impossible to introspect
core Mercurial to figure out what to do.
This patch adds a field to memctx for extensions to use.
The internal API used IOError to indicate that a file should be marked as
removed.
There is some correlation between IOError (especially with ENOENT) and files
that should be removed, but using IOErrors to represent file removal internally
required some hacks.
Instead, use the value None to indicate that the file not is present.
Before, spurious IO errors could cause commits that silently removed files.
They will now be reported like all other IO errors so the root cause can be
fixed.
In all the remaining cases the comprehension variable is used for the same
thing as a previous loop variable.
This will mute some pyflakes "list comprehension redefines" warnings.
The value '*' currently designates that bid merge should be used. The best
way to test bid merge is to set preferancestor=* in the configuration file ...
but then it would abort with unknown revision '*' when other code paths ended
up in changectx.ancestor .
Instead, just skip and ignore the value '*' when looking for a preferred
ancestor.
dirstate.normal is the method that marks files as unchanged/normal.
Rev 03dc7365e275 started caching dirstate.normal in order to improve
performance. However, there was an error in the patch: taking the wlock, under
some conditions depending on platform, can cause a new dirstate object to be
created. Caching dirstate.normal before calling wlock would then cause the
fixup calls below to be on the old dirstate object, effectively disappearing
into the ether.
On Unix and Unix-like OSes, the condition under which we create a new dirstate
object is 'the dirstate file has been modified since the last time we opened
it'. This happens pretty rarely, so the object is usually the same -- there's
little impact.
On Windows, the condition is 'always'. This means files in the lookup state are
never marked normal, so the bug has a serious performance impact since all the
files in the lookup state are re-read every time hg status is run.
When reversing a status, trading "added" and "removed" make sense.
Reversing "deleted" and "unknown" does not. We stop doing it.
The reversing is documented in place for the poor soul not even able to remember
the index of all status elements by heart.
By the magic of code movement, we ended up dropping unknown and ignored
information when comparing the working directory with a non-parent revision.
Let's stop doing it and add a test.
Changeset 83ad0e76acc0 introduced a test to validate that file were not reported
twice when both unknown and removed. This behavior change was introduced by
64d05ea3a10f alongside a bug that dropped ignored and unknown completely
(issue4321). As we are going to fix the bug, we need a proper implementation of
the behavior tested in 83ad0e76acc0.
Setting substate to None was an oversight in 5b3c9729fe09 and this patch
corrects it by setting substate to an empty dictionary which matches what
subrepo code expects.
Similar to the previous patch for workingfilectx, this patch will allow
abstracting localrepo.remove / write method to refactor working directory code
but instead operate on files in memory.
This fixes a discrepency for basectx and classes that inherit from it. Now
callers can pass these arguments to any context without an exception being
raised.
For non-working contexts, walk and matches do the same thing. For working
contexts, walk stats all the files and looks for unknown files, while matches
just filters the dirstate by match.
On a repository with over 250,000 files and 700,000 commits, this improves
cases like
hg status --rev <rev> -- <file> # rev is not .
from 2.1 seconds to 1.4 seconds.
There is further scope for improvement here: for a single file or a small set
of files, it is probably more efficient to use filelog linkrevs when possible.
However there will always be cases where that will fail (multiple commits
pointing to the same file revision, removed files...), so this is independently
useful.
When the matcher is exact, there's no reason to iterate over the entire
manifest. It's much more efficient to iterate over the list of files instead.
For a repository with approximately 300,000 files, this speeds up
hg log -l10 --patch --follow for a frequently modified file from 16.5 seconds
to 10.5 seconds.
This was mistakenly moved from workingctx to committablectx in
edbbc56a5e4f. Since the method is querying the dirstate, the only logical place
is for it to reside is in workingctx.
In 4c8873aad79a, memctx was changed to inherit from committablectx, this in
turn added the 'substate' property to memctx. It turns out that the
newcommitphase method tested for this property:
def newcommitphase(ui, ctx):
commitphase = phases.newcommitphase(ui)
substate = getattr(ctx, "substate", None)
if not substate:
return commitphase
Currently, memctx isn't ready to handle substates, nor removed files, so we
explicitly must set substate=None to get the old behavior back. In the future,
we can decide how memctx should play with substate. For now, this fixes
third-party extensions and some internal code dealing with subrepos.