We are going to introduce a linkrev-correction phase when computing parents.
It will be more convenient to have the nullid parent filtered out earlier. I
had to make a minimal adjustment to the rename handling logic to keep it
functional. That logic has been documented in the process since it took me
some time to check all the cases out.
workingctx.ancestors() was not returning the dirstate parents as part of the
result set. The only place this function is used is for copy detection when
committing a file, and that code already checks the parents manually, so this
change has no effect at the moment.
I found it while playing around with changing how copy detection works.
Before this patch, "memctx._manifest" updates all entries in the
(parent) manifest. But this is inefficient, because almost all files
may be clean in that context.
On the other hand, just updating entries for the changed "files" specified
at construction causes an unexpected abort when there is at least one
newly removed file (see issue4470 for details).
To calculate the manifest more efficiently, this patch replaces
"pman.iteritems()" in the loop with "self._status.modified" to avoid
updating entries for clean or removed files.
Examination of removal is also omitted, because removed files aren't
handled in this loop (i.e. "self[f]" never returns None).
Before this patch, "memctx._manifest" tries to get (and use as usual) a
filectx even for newly removed files, although "memctx.filectx()"
returns None for such files.
To calculate the manifest correctly even with newly removed files, this
patch does the following:
- replace "man.iteritems()" in the loop with "self._status.modified"
to avoid looking up "self[f]" for newly removed files;
this also reduces the loop cost for large manifests.
- remove files in "self._status.removed" from the manifest
In this patch, amend is tested twice to examine both (1) newly
removed files and (2) files already removed in the amended revision.
Before this patch, "memctx._manifest" calculates the manifest
based on the first parent only. This causes newly added files to
disappear from the manifest.
For example, if newly added files aren't listed in the manifest of
memctx, they aren't listed in the "added" field of the status returned by
"ctx.status()", and "{diff()}" (= "patch.diff") in "committemplate"
shows nothing for them.
To calculate the manifest including newly added files correctly, this
patch puts newly added files (= ones in "self._status.added") into the
manifest.
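The manifest computation described in the last few patches can be sketched roughly as follows. This is a toy model, not memctx code: a plain dict stands in for the manifest, the SHA-1 of the file content stands in for the real file node (Mercurial's file nodes also hash the parents), and `build_manifest`, `status` and `filectxfn` are hypothetical names.

```python
import hashlib


def build_manifest(parent_manifest, status, filectxfn):
    """Return a new {path: node} dict for a hypothetical in-memory commit.

    parent_manifest: dict mapping path -> node in the first parent.
    status: object with .modified, .added and .removed lists.
    filectxfn: callable(path) -> bytes, or None for a removed file.
    """
    man = dict(parent_manifest)  # clean files keep their parent entry
    # Visit only changed files instead of every entry of the parent manifest.
    for f in list(status.modified) + list(status.added):
        data = filectxfn(f)
        if data is None:
            # Defensive: None means "removed", so there is nothing to hash.
            man.pop(f, None)
            continue
        man[f] = hashlib.sha1(data).hexdigest()
    # Newly removed files must disappear from the manifest.
    for f in status.removed:
        man.pop(f, None)
    return man
```

Starting from a copy of the parent keeps clean files untouched, while added files are inserted explicitly, so they no longer vanish when only the first parent is consulted.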
Some details of changes for "test-commit-amend.t" in this patch:
- "touch foo" is replaced by "echo foo > foo", because a newly added
empty file can't be shown in "diff()" output without the "diff.git"
configuration
- amend is tested twice to examine both (1) newly added files
and (2) files already added in the amended revision
Before this patch, "memctx._status" is initialized by "(files, [], [],
[], [], [], [])" and this causes "memctx.modified" to include not
only modified files but also added and removed ones incorrectly.
This patch adds a "_status" method to calculate the exact status being
committed according to the "files" specified at construction time.
An exact "_status" makes it possible to share/reuse logic from committablectx.
This patch is also preparation for fixing issues in subsequent patches.
Some details of changes for tests in this patch:
- some filename lines are omitted in "test-convert-svn-encoding.t",
because those files are now correctly listed as "removed" files;
those lines are written out by "ui.note" in "localrepository.commitctx"
only for "modified" and "added" files.
- "| fixbundle" filtering in "test-histedit-fold.t" is omitted to
check lines including "added" correctly;
"fixbundle" discards all lines including "added".
filectxfn returns None for removed files, so we have to check for None
before computing the new file content hash for the manifest.
Includes a test that proves this works, by demonstrating that we can
show the diff of an amended commit in the committemplate.
Instead, use a magic value, so that we can identify modified or added
nodes correctly when using manifest.diff().
Thanks to Martin von Zweigbergk for catching that we have to update
_buildstatus as well. That part eluded my debugging for some time.
The result of 'hg rm' + 'hg rename' disagreed with the one from
'hg rename --force'. We align them on 'hg move --force' because it agrees with
what 'hg status' says after the commit.
No longer reporting a modified file as added ends the hg revert confusion in this
situation (issue4458).
However, reporting the file as modified also prevents revert from restoring the copy
source. We fix this in a later changeset.
Git diff also stops reporting the add in the middle of the chain as an add. Not
sure how important (or even wrong) this is.
Because the same dictionary was used to (1) get node from parent and (2) store
annotated version, we could end up with buggy values. For example with a chain
of renames:
$ hg mv b c
$ hg mv a b
The value for 'b' would be updated as "<old-a>a", then the value for c would be
updated as "<old-b>a". With the current dictionary sharing, this ends up with:
'<new-c>' == '<old-a>aa'
This value is doubly wrong, as we should use '<old-b>' and a single 'a'.
We now use a read-only value for lookup. The 'test-rename.t' test is impacted
because such a chained added file is now correctly detected as added.
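The dictionary-sharing bug above can be reproduced with a toy model. The function name and the string values are hypothetical stand-ins for annotate data: each rename appends a marker to the value of its source, and the only difference between the buggy and the fixed behavior is whether the lookups see earlier writes.

```python
def annotate_chain(parent_values, renames, suffix, share_dict=False):
    """Toy model of updating annotate values along a chain of renames.

    parent_values: dict path -> value taken from the parent revision.
    renames: (dest, source) pairs, processed in order.
    share_dict: True reproduces the bug by writing into the lookup dict.
    """
    lookup = parent_values  # values should come from the parent only
    result = parent_values if share_dict else dict(parent_values)
    for dest, source in renames:
        # With share_dict=True, 'lookup' also sees earlier writes to 'result'.
        result[dest] = lookup[source] + suffix
    return result
```

For `hg mv b c` followed by `hg mv a b`, the renames are `[('b', 'a'), ('c', 'b')]`. With a shared dict, c picks up the already-updated value of b and ends up as `'<old-a>aa'`; with a read-only lookup it gets `'<old-b>a'`.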
Note that the exception-catching from the previous branchtip check is moved up
to catch exceptions from the try block surrounding the namespace lookup.
It turns out that maintaining a reference of any sort (even weak!) to the repo
when constructed doesn't work because we may at some point pass in a repoview
filtered by something other than what the initial repo was.
Extensions might want to create new filternames and change what revisions
are considered hidden or shown. This is the case for inhibit, which enables
direct access to hidden hashes with the visible-directaccess-nowarn filtername.
By using startswith instead of a direct comparison with 'visible', we
allow extensions to do that without having to work directly on the 'visible'
filtername used by core.
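The difference between the two checks can be sketched as below. `is_visible_like` is a hypothetical helper, not core code. It only shows that a prefix test accepts extension-defined names such as 'visible-directaccess-nowarn', which an equality test would reject.

```python
def is_visible_like(filtername):
    """Return True for 'visible' and extension variants built on it.

    Hypothetical helper: 'filtername' may be None for an unfiltered repo.
    """
    # A prefix test lets 'visible-directaccess-nowarn' share the code path
    # that a strict filtername == 'visible' comparison would reserve for core.
    return filtername is not None and filtername.startswith('visible')
```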
Move the code in context._manifestmatches() into a new
manifest.matches(). It's a natural place for the code to live and it
allows other callers to easily use it. It should also make it easier
to optimize the new method in alternative implementations of the
manifest (same reasoning as with manifest.diff()).
We can just modify the status tuple we got from dirstate.status()
instead of deconstructing it and constructing a new instance, thereby
simplifying the code a little.
Letting _dirstatestatus() return an scmutil.status instance also means
that _buildstatus() will always return such an instance, so we can
remove the conversion from the call sites.
It makes no sense to request reverse status (i.e. changes from the
working copy to its parent) and then look at the deleted, unknown or
ignored fields. If you do, you would get the result from the forward
status (changes from parent to the working copy). Instead of giving a
nonsensical answer to a nonsensical question, it seems a little saner
to return empty lists. It might be best if we could prevent the caller from
accessing these lists, but it's doubtful it's worth the trouble.
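A reversed status under the rule above might look like this. A namedtuple stands in for scmutil.status, and `reverse_status` is a hypothetical helper. Added and removed swap meaning when the direction flips. Deleted, unknown and ignored only make sense for the forward direction, so they come back empty instead of carrying a forward-status answer.

```python
from collections import namedtuple

# Stand-in for a status tuple with the seven usual fields.
Status = namedtuple('Status',
                    'modified added removed deleted unknown ignored clean')


def reverse_status(st):
    """Turn a parent->working-copy status into working-copy->parent."""
    return Status(modified=st.modified,
                  added=st.removed,   # what was removed now looks added
                  removed=st.added,   # and vice versa
                  deleted=[], unknown=[], ignored=[],  # meaningless here
                  clean=st.clean)
```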
We don't care about filtering out symlinks that have already been
committed with full content, only those that have been accidentally
resolved in the working directory.
In basectx._buildstatus(), we read the manifests for the two revisions
being compared. For "caching reasons" unknown to me, it is better to
read the earlier manifest first, which basectx._prestatus() takes care
of. However, if the 'self' context is a committablectx and the 'other'
context is the parent of the working directory (as in the very common
case of plain "hg status"), there is no need to read any manifests at
all -- all that's needed is the dirstate status. To avoid reading the
manifests, _prestatus() is overridden in committablectx and avoids
calling its super method, and _buildstatus() calls its super method
only if the 'other' context is not the parent of the working
directory.
It seems easier to follow what's happening if we move the pre-fetching
to _buildstatus() just before the place where the manifests are
fetched. We just need to add an extra check that the revision is not
None to handle the case that was previously handled by subclass
overriding. That also makes it safe for committablectx._prestatus() to
call its parent, although the latter now becomes empty, so we won't
bother.
The workingctx method simply calls the super method. The only effect
it has is that it uses a different default argument for the 'other'
argument. The only in-tree caller is patch.diff, which always passes
an argument to the method, so it should be safe to remove the
overriding. Having the default argument depend on the type seems
rather dangerous anyway.
In order to avoid listing files as both added and deleted, for
example, we check for every file in the manifest if it is in the
_list_ of deleted files. This can get quite slow when there are many
deleted files. Change it to a set to make the containment check
faster. On a somewhat contrived example of the Mozilla repo with the
entire testing/ directory deleted (~14k files), this makes
'hg status --rev .^' go from 26s to 2s.
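The change amounts to building a set once before the loop. The helper below is hypothetical and only shows the pattern. Membership in a list scans the whole list on every test, while a set lookup is O(1) on average. With ~14k deleted files checked against every manifest entry, that difference dominates.

```python
def not_deleted(manifest_files, deleted):
    """Return manifest files that are not in the deleted list."""
    deleted_set = set(deleted)  # build once; each 'in' is now a hash lookup
    return [f for f in manifest_files if f not in deleted_set]
```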
The comment in workingctx.status() says that "calling 'super' subtly
reveresed the contexts", but that is simply not true, so we should not
be swapping added and removed fields.
Hidden changesets are by far the most common error case and the only one[1]
that can reach the user. We switch to a friendlier message with a hint about how
to access the data anyway. We should probably point to a help topic instead, but
we do not have such a topic yet.
Example of the new output:
abort: hidden revision '4'!
(use --hidden to access hidden revisions)
[1] Actually, filtering from "served" can also reach the user during certain
exchange operations.
This will help users debug. A more precise message will be issued
for the most common case ("visible" filter) in the next changesets.
example output:
- abort: filtered revision '4'!
+ abort: filtered revision '4' (not in 'visible' subset)!
We capture FilteredxxxError and issue a FilteredRepoLookupError instead, with a
slightly different message. The message will likely be improved further in the
future.
error: filtered revision '4'
We are going to introduce more precise exception classes for filtered nodes, so
we will have to upgrade them to the `RepoLookupError` level here. We wrap the
whole thing in a try/except to ease this future catching. Some of the current
exception catching will be moved into this one, but the current changeset focuses
on code movement only.
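The try/except pattern from the last two patches can be sketched like this. All classes here are local stand-ins, not imports from Mercurial's error module, and `lookup` is a hypothetical function over a plain dict. The point is that the low-level filtered error is caught around the whole lookup and re-raised as a subclass of RepoLookupError. Existing handlers for RepoLookupError therefore keep working.

```python
class RepoLookupError(Exception):
    """Stand-in for the user-facing lookup error."""


class FilteredLookupError(LookupError):
    """Stand-in for a low-level error raised for a filtered node."""


class FilteredRepoLookupError(RepoLookupError):
    """Still a RepoLookupError, so callers catching that keep working."""


def lookup(changeid, index, filtered):
    """Resolve changeid via index, refusing ids in the 'filtered' set."""
    try:
        if changeid in filtered:
            raise FilteredLookupError(changeid)
        return index[changeid]
    except FilteredLookupError:
        # Upgrade the low-level error to the RepoLookupError level.
        raise FilteredRepoLookupError("filtered revision '%s'" % changeid)
```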
Two possible behaviors are defined for handling censored data: abort and
ignore. When we ignore censored data, we return an empty file to callers
requesting the file data.
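The two policies can be modeled as follows. `CensoredNodeError` is a local stand-in class and `read_file_data` is a hypothetical wrapper. How the policy value is configured is not covered here. Under 'abort' the error propagates. Under 'ignore' the caller receives empty data, as the paragraph above describes.

```python
class CensoredNodeError(Exception):
    """Stand-in for the error raised when file data has been censored."""


def read_file_data(read, policy='abort'):
    """Read file data, applying a censor policy of 'abort' or 'ignore'.

    read: zero-argument callable returning bytes or raising
    CensoredNodeError.
    """
    if policy not in ('abort', 'ignore'):
        raise ValueError('unknown censor policy: %r' % policy)
    try:
        return read()
    except CensoredNodeError:
        if policy == 'ignore':
            return b''  # ignored censored data reads as an empty file
        raise  # 'abort': let the error reach the caller
```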
The status tuple returned from dirstate.status() has an additional
field compared to the other status tuples: lookup/unsure. This field
is just an optimization and not something most callers care about
(they want the resolved value of 'modified' or 'clean'). To prepare
for a single future status type, let's separate out the 'lookup' field
from the rest by having dirstate.status() return a pair: (lookup,
status).
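The (lookup, status) split can be illustrated with a toy model. `dirstate_status`, `resolve` and the pre-classified input dict are hypothetical. The unsure files travel separately, and a later step folds them into 'modified' or 'clean' once the content has been compared. Ordinary callers then see a single status type.

```python
from collections import namedtuple

Status = namedtuple('Status',
                    'modified added removed deleted unknown ignored clean')


def dirstate_status(classified):
    """Toy dirstate.status(): return (lookup, status).

    classified: dict path -> a Status field name, or 'lookup' for files
    whose state cannot be decided without reading their content.
    """
    lists = {name: [] for name in Status._fields}
    lookup = []
    for path, kind in sorted(classified.items()):
        (lookup if kind == 'lookup' else lists[kind]).append(path)
    return lookup, Status(**lists)


def resolve(lookup, st, content_changed):
    """Fold the unsure files into 'modified' or 'clean'."""
    modified = st.modified + [f for f in lookup if content_changed(f)]
    clean = st.clean + [f for f in lookup if not content_changed(f)]
    return st._replace(modified=modified, clean=clean)
```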
This wraps all the locations of dirstate.setparent with the appropriate
begin/endparentchange calls. This will prevent exceptions during those calls
from causing incoherent dirstates (issue4353).
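The following toy model shows how the begin/endparentchange brackets protect the dirstate. `ToyDirstate` and `update_parents` are hypothetical. One simplification is an assumption: the lock-release step writes the dirstate only when no parent change is pending and otherwise discards the in-memory state. An exception between the two calls leaves the change pending, so the half-finished update is never written.

```python
class ToyDirstate:
    """Minimal model of a dirstate guarded by begin/endparentchange."""

    def __init__(self, parents):
        self.written = parents  # state "on disk"
        self.parents = parents  # in-memory state
        self._parentwriters = 0

    def beginparentchange(self):
        self._parentwriters += 1

    def endparentchange(self):
        self._parentwriters -= 1

    def pendingparentchange(self):
        return self._parentwriters > 0

    def setparents(self, p1, p2=None):
        self.parents = (p1, p2)

    def release(self):
        # Assumed lock-release behavior: drop an interrupted parent change
        # instead of writing an incoherent dirstate.
        if self.pendingparentchange():
            self.parents = self.written
            self._parentwriters = 0
        else:
            self.written = self.parents


def update_parents(ds, p1, fail=False):
    ds.beginparentchange()
    ds.setparents(p1)
    if fail:
        raise RuntimeError('interrupted before the change completed')
    ds.endparentchange()
```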
Rev 2eef89bfd70d switched the contract for filectxfn from "raise IOError if
file is missing" to "return None if file is missing". Out of tree extensions
need to be updated for that, but for extensions interested in compatibility
with both Mercurial <= 3.1 and default, it is next to impossible to introspect
core Mercurial to figure out what to do.
This patch adds a field to memctx for extensions to use.
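An out-of-tree extension could use such a field roughly as shown below. The attribute name `_returnnoneformissingfiles` is an assumption for illustration; check the real field name in the memctx source before relying on it. `make_filectxfn` and `getdata` are hypothetical. The sketch assumes the `filectxfn(repo, memctx, path)` calling convention. It returns None under the new contract and raises IOError with ENOENT under the old one.

```python
import errno


def make_filectxfn(memctx_class, getdata):
    """Adapt a data-fetching function to either filectxfn contract.

    getdata(path) returns a memfilectx-like object, or None when the file
    should be marked as removed.  The attribute checked below is an
    assumed name, not a confirmed API.
    """
    returns_none = getattr(memctx_class, '_returnnoneformissingfiles', False)

    def filectxfn(repo, memctx, path):
        fctx = getdata(path)
        if fctx is None and not returns_none:
            # Old contract (Mercurial <= 3.1): signal removal with IOError.
            raise IOError(errno.ENOENT, '%s is missing' % path)
        return fctx

    return filectxfn
```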
The internal API used IOError to indicate that a file should be marked as
removed.
There is some correlation between IOError (especially with ENOENT) and files
that should be removed, but using IOErrors to represent file removal internally
required some hacks.
Instead, use the value None to indicate that the file is not present.
Before, spurious IO errors could cause commits that silently removed files.
They will now be reported like all other IO errors so the root cause can be
fixed.
In all the remaining cases the comprehension variable is used for the same
thing as a previous loop variable.
This will mute some pyflakes "list comprehension redefines" warnings.
The value '*' currently designates that bid merge should be used. The best
way to test bid merge is to set preferancestor=* in the configuration file,
but then it would abort with unknown revision '*' when other code paths ended
up in changectx.ancestor.
Instead, just skip and ignore the value '*' when looking for a preferred
ancestor.
dirstate.normal is the method that marks files as unchanged/normal.
Rev 03dc7365e275 started caching dirstate.normal in order to improve
performance. However, there was an error in the patch: taking the wlock, under
some conditions depending on platform, can cause a new dirstate object to be
created. Caching dirstate.normal before calling wlock would then cause the
fixup calls below to be on the old dirstate object, effectively disappearing
into the ether.
On Unix and Unix-like OSes, the condition under which we create a new dirstate
object is 'the dirstate file has been modified since the last time we opened
it'. This happens pretty rarely, so the object is usually the same -- there's
little impact.
On Windows, the condition is 'always'. This means files in the lookup state are
never marked normal, so the bug has a serious performance impact since all the
files in the lookup state are re-read every time hg status is run.
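The caching mistake can be reproduced with a toy model. The `Dirstate` and `Repo` classes are hypothetical; only the names `normal` and `wlock` come from the text. A bound method keeps a reference to the object it was taken from. If taking the lock replaces `repo.dirstate`, calls through the cached method update the discarded object. Caching after the lock avoids this, which is presumably the shape of the fix.

```python
class Dirstate:
    def __init__(self):
        self.normal_files = set()

    def normal(self, f):
        self.normal_files.add(f)


class Repo:
    def __init__(self):
        self.dirstate = Dirstate()

    def wlock(self):
        # Toy model: taking the lock may reload the dirstate, which
        # replaces the object (on some platforms, every time).
        self.dirstate = Dirstate()


def fixup_buggy(repo, files):
    normal = repo.dirstate.normal  # cached before the lock: goes stale
    repo.wlock()
    for f in files:
        normal(f)  # lands on the old, discarded dirstate object


def fixup_fixed(repo, files):
    repo.wlock()
    normal = repo.dirstate.normal  # cache only after taking the lock
    for f in files:
        normal(f)
```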
When reversing a status, swapping "added" and "removed" makes sense.
Reversing "deleted" and "unknown" does not. We stop doing it.
The reversing is documented in place for the poor soul not even able to remember
the index of all status elements by heart.
By the magic of code movement, we ended up dropping unknown and ignored
information when comparing the working directory with a non-parent revision.
Let's stop doing it and add a test.
Changeset 83ad0e76acc0 introduced a test to validate that files were not reported
twice when both unknown and removed. This behavior change was introduced by
64d05ea3a10f alongside a bug that dropped ignored and unknown completely
(issue4321). As we are going to fix the bug, we need a proper implementation of
the behavior tested in 83ad0e76acc0.
Setting substate to None was an oversight in 5b3c9729fe09 and this patch
corrects it by setting substate to an empty dictionary which matches what
subrepo code expects.
Similar to the previous patch for workingfilectx, this patch will allow
abstracting the localrepo.remove / write methods, so that working directory code
can be refactored to operate on files in memory instead.
This fixes a discrepancy for basectx and classes that inherit from it. Now
callers can pass these arguments to any context without an exception being
raised.
For non-working contexts, walk and matches do the same thing. For working
contexts, walk stats all the files and looks for unknown files, while matches
just filters the dirstate by match.
On a repository with over 250,000 files and 700,000 commits, this improves
cases like
hg status --rev <rev> -- <file> # rev is not .
from 2.1 seconds to 1.4 seconds.
There is further scope for improvement here: for a single file or a small set
of files, it is probably more efficient to use filelog linkrevs when possible.
However there will always be cases where that will fail (multiple commits
pointing to the same file revision, removed files...), so this is independently
useful.
When the matcher is exact, there's no reason to iterate over the entire
manifest. It's much more efficient to iterate over the list of files instead.
For a repository with approximately 300,000 files, this speeds up
hg log -l10 --patch --follow for a frequently modified file from 16.5 seconds
to 10.5 seconds.
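The exact-matcher fast path can be sketched with a toy function. A dict stands in for the manifest, and `matching_files` with its parameters is hypothetical rather than the real matcher API. When the matcher is exact, only the listed files can match, so the cost is proportional to the number of files requested instead of to the size of the manifest.

```python
def matching_files(manifest, files, exact, matchfn):
    """Return the manifest entries selected by a matcher (toy model).

    manifest: dict path -> node.
    files: explicitly listed paths (used when exact is True).
    matchfn: predicate applied to every path otherwise.
    """
    if exact:
        # Fast path: look up each listed file directly.
        return [f for f in files if f in manifest]
    # General case: every manifest entry has to be tested.
    return [f for f in sorted(manifest) if matchfn(f)]
```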
This was mistakenly moved from workingctx to committablectx in
edbbc56a5e4f. Since the method is querying the dirstate, the only logical place
for it to reside is workingctx.
In 4c8873aad79a, memctx was changed to inherit from committablectx, which in
turn added the 'substate' property to memctx. It turns out that the
newcommitphase method tested for this property:
    def newcommitphase(ui, ctx):
        commitphase = phases.newcommitphase(ui)
        substate = getattr(ctx, "substate", None)
        if not substate:
            return commitphase
Currently, memctx isn't ready to handle substates, nor removed files, so we
must explicitly set substate=None to get the old behavior back. In the future,
we can decide how memctx should play with substate. For now, this fixes
third-party extensions and some internal code dealing with subrepos.
basectx.status may reorder the list after workingctx._poststatus is called,
so workingctx must copy it. Otherwise, wctx.deleted() would return "unknown"
files, for example.
This method is needed to have memfilectx behave like the other file
contexts. We can't just inherit this method because each file context has
different behavior: filectx reads from the filelog, and workingfilectx reads
from the disk. Therefore, we define memfilectx to return the size of the data
in memory.
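The in-memory definition is simple. The toy class below is hypothetical and is not Mercurial's memfilectx. It only shows the idea: with no filelog or working-directory file to consult, the size is the length of the data held in memory.

```python
class MemFileCtx:
    """Toy in-memory file context (not Mercurial's memfilectx)."""

    def __init__(self, path, data):
        self._path = path
        self._data = data

    def data(self):
        return self._data

    def size(self):
        # Nothing on disk or in a filelog: size is the in-memory length.
        return len(self._data)
```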
This patch changes the calling signature of memfilectx's __init__ to fall in
line with the other file contexts.
Calling code and tests have been updated accordingly.
committablectx has a much more robust implementation of flags() so we will use
that instead of just blindly calling the flags function for the given path.
This is a slight change in definition from memctx returning only modified() but
its parent's definition is more consistent with other contexts' behavior so we
can call this change a slight bugfix and step in the right direction.
This patch marks the start of having memctx inherit from committablectx,
thereby making it a full-fledged context that will eventually grow the ability
to perform diffing and also merging.
In the refactoring that removed localrepo.status, e283486177bd, we accidentally
changed the return type from a tuple to a list. Philosophically, this is
incorrect, so we explicitly return a tuple again.
The old code is unneeded now that basectx has the ability to calculate the
status between two context objects.
Now, we use the object-oriented pattern called 'super'.
A future patch will remove the old workingctx.status which caches the status of
the working directory, therefore we now cache this status in the poststatus
hook of committablectx.
Previously, workingctx had custom variables for the unknown, ignored, and clean
lists of files in the status. These then got moved to committablectx and, after the
refactoring of localrepo.status, are no longer needed. We, therefore, simplify
the whole mess.
As a bonus, we are able to remove the need for having 'assert'.
This is just a copy from localrepo.status and is a small step toward removing that
method entirely. The prestatus hook is only called for changectx objects, thereby
guaranteeing the same behavior as before.
This patch encapsulates the logic for changing the match.bad function when
comparing against the working directory's parent. Future patches will remove
more of the 'if ... else' blocks in localrepo.status that test for this working
directory parent case.
This patch paves the way to allow a workingctx to override the match object
with a custom 'bad' method for cases where status is sent a directory pattern.
This patch maintains the fast path for workingctx which is to not build a
manifest if the working directory is being compared to its parent since, in
this case, we can just copy the parent manifest.
With this patch, we are one step closer to removing 'if ... else' logic in
localrepo.status for testing if the context is the working directory or
not. Future patches will replace those blocks of code with a call to the
context's _poststatus hook so that each context object will do the right thing
depending on the situation.
With this patch, we are one step closer to removing 'if ... else' logic in
localrepo.status for testing if the context is the working directory or
not. Future patches will replace those blocks of code with a call to the
context's _prestatus hook so that each context object will do the right thing
depending on the situation.
This method is a duplicate of localrepo.mfmatches and sets the stage for
factoring localrepo.status into a context method that will be customizable
based on inheritance and object type.
This patch is a step forward in getting rid of needing to check 'parentworking'
throughout the status method. Eventually, we will use the power of inheritance
to do the correct thing when comparing the working directory with its parent.
This method is mostly a copy from localrepo.status. The custom status method of
workingctx will eventually be absorbed by the refactoring of localrepo.status
to context.status but unfortunately we can't do it in one step.
This patch introduces "editor" argument to "memctx.__init__()", and
moves editor invocation from "makememctx()" to "memctx.__init__()", to
centralize editor invocation into "memctx" object creation.
This relocation is needed, because "makememctx()" requires the "store"
object providing "getfile()" to create "memfilectx" object, and this
prevents some code paths from using "makememctx()" instead of
"memctx.__init__()".
This patch also invokes "localrepository.savecommitmessage()", when
"editor" is specified explicitly, to centralize saving commit message
into "memctx" object creation: passing "cmdutil.commiteditor" as
"editor" can achieve both suppressing editor invocation and saving
into ".hg/last-message.txt" for non-empty commit messages.
The current situation is a bit of a layering violation as
merge-specific knowledge is pushed down to lower layers and leaks
merge assumptions into other code paths.
Here, we simply silence the warning with a hack. Both the warning and
the hack will probably go away in the near future when bid merge is
made the default.
Multiple revisions can be specified in merge.preferancestor, separated by
whitespace. First match wins.
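The selection rule can be sketched as a small function. `pick_ancestor` and its parameters are hypothetical: `resolve` maps a revision string to a node or None, and `default` stands for the lowest-hash choice. The configured value is split on whitespace and the first entry that names one of the candidate ancestors wins. The sketch also skips '*', in line with the separate patch that stops treating it as a revision.

```python
def pick_ancestor(candidates, preferred_config, resolve, default):
    """Choose a merge ancestor honoring a whitespace-separated preference.

    candidates: set of candidate ancestor nodes.
    preferred_config: the configured preference string, possibly ''.
    resolve: callable mapping a revision string to a node or None.
    default: node used when no preference matches.
    """
    for r in preferred_config.split():
        if r == '*':
            continue  # '*' requests bid merge; it is not a revision
        node = resolve(r)
        if node in candidates:
            return node  # first match wins
    return default
```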
This makes it possible to overrule the default of picking the common ancestor
with the lowest hash value among the "best" ones (introduced in f19507e1bcf2).
This can for instance help with some merges where the 'wrong' ancestor is used.
There will thus be some overlap between this and the problems that can be
solved with a future 'consensus merge'.
Mercurial will show a note like
note: using 40663881a6dd as ancestor of 3b08d01b0ab5 and adfe50279922
alternatively, use --config merge.preferancestor=0f6b37dbe527
when the option is available, listing all the alternative ancestors.
Show a message like
note: using 0f6b37dbe527 as ancestor of adfe50279922 and cf89f02107e5
So far this is just a warning - there is nothing the user can do to select
another ancestor.
When running 'hg cat -r . <file>' it was doing an expensive ctx.walk(m) which
applied the regex to every file in the manifest.
This changes changectx.walk to iterate over just the files in the regex, if no
other patterns are specified. This cuts hg cat time by 50% in our repo and
probably benefits a few other commands as well.
The 'copies' method has no test coverage and calls copies.pathcopies with an
incorrect number of parameters and is thus (fortunately) not used. Kill it.
This patch gets the stat object of the target path via "vfs.lstat()" and
examines it to determine the path's type. This follows the approach used in
"workingctx.add()".
This should be cheaper than the original implementation, which invoked
"lexists()", "isfile()" and "islink()".
This patch also changes the paths added to the "rejected" list from full paths
(referred to by "p") to relative ones (referred to by "f") when the type of
the target path is neither file nor symlink.
This change should be reasonable, because the path added to the "rejected"
list is already relative when "OSError" is raised by "lstat()".
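The single-lstat approach can be illustrated with the standard library. This sketch uses `os.lstat` in place of the repository's vfs. `classify` is a hypothetical helper. One `lstat()` call yields a mode that answers the existence, regular-file and symlink questions together, instead of three separate filesystem calls.

```python
import os
import stat


def classify(path):
    """Classify path with one lstat() call.

    Returns 'missing', 'file', 'symlink' or 'other'.
    """
    try:
        st = os.lstat(path)  # does not follow symlinks
    except OSError:
        return 'missing'
    if stat.S_ISREG(st.st_mode):
        return 'file'
    if stat.S_ISLNK(st.st_mode):
        return 'symlink'
    return 'other'  # e.g. a directory or FIFO; such paths get rejected
```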
Currently, we have basectx, which serves as a common ancestor of all contexts. We
will now add a new class committablectx that will inherit from basectx and will
serve as a common place for code that will be shared between mutable contexts,
e.g. workingctx and memctx.