We are going to introduce some wider caching mechanisms during linkrev
adjustment. As there is no specific reason to not be a method and some
reasons to be a method, let's make it a method.
The logic of walking a manifest to yield files matching a match object is
currently being done by context, not the manifest itself. This moves the walk()
function to both manifestdict and treemanifest. This separate implementation
will also permit differing, optimized implementations for each manifest.
When the manifest revision is stored as a delta against a non-parent revision,
'_adjustlinkrev' could miss some file update because it was using the delta
only. We now use the 'fastread' method that uses the delta only when it makes
sense.
A test showcasing on the of possible issue have been added.
Now that the caching into _status is done in
workingctx._dirstatestatus(), which workingcommitctx._dirstatestatus()
does not call, there is no caching to prevent in _buildstatus(), so
stop overriding it.
Since it's only the dirstate status we cache, it makes more sense to
cache it in the _dirstatestatus() method. Note that this change means
the dirstate status will also be cached when status is requested
between the working copy and some other revision, while we currently
only cache the result if exactly the status between the working copy
and its parent is requested.
b04f57726c73 changed basefilectx.annotate() to directly instantiate new
filectx's instead of going through self.filectx(), this breaks extensions that
replace the filectx class, and would also break future uses that would need
memfilectx's.
This further simplifies the status code.
This simplification comes at a slight performance cost for `hg
export`. Before, on mozilla-central:
perfmanifest tip
! wall 0.265977 comb 0.260000 user 0.240000 sys 0.020000 (best of 38)
perftags
! result: 162
! wall 0.007172 comb 0.010000 user 0.000000 sys 0.010000 (best of 403)
perfstatus
! wall 0.422302 comb 0.420000 user 0.260000 sys 0.160000 (best of 24)
hgperf export tip
! wall 0.148706 comb 0.150000 user 0.150000 sys 0.000000 (best of 65)
after, same repo:
perfmanifest tip
! wall 0.267143 comb 0.270000 user 0.250000 sys 0.020000 (best of 37)
perftags
! result: 162
! wall 0.006943 comb 0.010000 user 0.000000 sys 0.010000 (best of 397)
perfstatus
! wall 0.411198 comb 0.410000 user 0.260000 sys 0.150000 (best of 24)
hgperf export tip
! wall 0.173229 comb 0.170000 user 0.170000 sys 0.000000 (best of 55)
The next set of patches introduces a new manifest type implemented
almost entirely in C, and more than makes up for the performance hit
incurred in this change.
We can do a little tiny bit better by enhancing manifest.diff to
optionally include files that are in both sides. This will be done in
a followup patch.
This is necessary to annotate workingctx revision. basefilectx.linkrev() can't
be used because committablefilectx has no filelog.
committablefilectx looks for parents() from self._changectx. That means fctx
is linked to self._changectx, so linkrev() can simply be aliased to rev().
When both '.' (the working copy root) and an explicit file (or files)
are in match.files(), we only walk the explicitly listed files. This
is because we remove the '.' from the set too early. Move later and
add a test for it. Before this change, the last test would print only
"3".
This is similar to 327902ff25df in motivation. All contexts now have this
method, so the rest of the 'ctx._repo' uses can be converted without worrying
about what type of context it is.
A couple places in context currently use "x in self._dirs" to check for the
existence of the directory, but this requires that all directories be loaded
into a dict. Calling hasdir() instead puts the work on the the manifest to
check for the existence of a directory in the most efficient manner.
This is a convenience method that calls to its manifest's hasdir(). There are
parts of context that check to see if a directory exists, and this method will
let implementations of manifest provide an optimal way to find a particular
directory.
Although Python supports `X = Y if COND else Z`, this was only
introduced in Python 2.5. Since we have to support Python 2.4, it was
a very common thing to write instead `X = COND and Y or Z`, which is a
bit obscure at a glance. It requires some intricate knowledge of
Python to understand how to parse these one-liners.
We change instead all of these one-liners to 4-liners. This was
executed with the following perlism:
find -name "*.py" -exec perl -pi -e 's,(\s*)([\.\w]+) = \(?(\S+)\s+and\s+(\S*)\)?\s+or\s+(\S*)$,$1if $3:\n$1 $2 = $4\n$1else:\n$1 $2 = $5,' {} \;
I tweaked the following cases from the automatic Perl output:
prev = (parents and parents[0]) or nullid
port = (use_ssl and 443 or 80)
cwd = (pats and repo.getcwd()) or ''
rename = fctx and webutil.renamelink(fctx) or []
ctx = fctx and fctx or ctx
self.base = (mapfile and os.path.dirname(mapfile)) or ''
I also added some newlines wherever they seemd appropriate for readability
There are probably a few ersatz ternary operators still in the code
somewhere, lurking away from the power of a simple regex.
The workingctx class was using dirstate.dirs() as it's implementation. The
sparse extension maintains a pruned down version of the dirstate, so this
resulted in the workingctx reporting an incorrect listing of directories
during merge calculations (it was detecting directory renames when it
shouldn't have).
The fix is to use the default implementation, which uses workingctx._manifest,
which unions the manifest with the dirstate to produce the correct overall
picture. This also produces more accurate output since it will no longer
return directories that have been entirely deleted in the dirstate.
Tests will be added to the sparse extension to detect regressions for this.
There is no reason to read obsolescence markers when doing a plain 'hg
status' without --rev. Use the unfiltered repo when initializing
context._rev to speed things up. This speeds up 'hg status' from
1.342s to 0.080s on my repo with ~110k markers.
Tracked files that are deleted should always be reported as such, no
matter what their state was in earlier revisions. This is encoded in
in two conditions in the loop in basectx._buildstatus() for modified
and added files, but the check is missing for clean files. We should
check for clean files too, but instead of adding the check in a third
place, move it earlier and skip most of the loop body for deleted
files.
When calculating status involving the working copy and a revision
other than the parent of the working copy, the files that are not in
the working context manifest ('mf2' in the basectx._buildstatus())
will be reported as removed (note that deleted files _are_ in the
working context manifest). However, if the file is reported as deleted
in the dirstate, it will get that status too (as shown by failing
tests).
Fix by removing deleted files from the 'removed' list after the main
loop in _buildstatus().
Before this patch, the result of "status()" on "workingcommitctx" may
incorrectly contain files other than ones to be committed, because
"workingctx._dirstatestatus()" returns the result of
"dirstate.status()" directly.
For correct matching, this patch overrides "_dirstatestatus" in
"workingcommitctx" and makes it return matched files only in
"self._status".
This patch uses empty list for "deleted", "unknown" and "ignored" of
status, because status between "changectx"s also makes them empty.
Before this patch, "status()" on "workingcommitctx" with "always
match" object causes breaking "self._status" in
"workingctx._buildstatus()", because "workingctx._buildstatus()"
caches the result of "dirstate.status()" into "self._status" for
efficiency, even though it should be fixed at construction for
committing.
For example, template function "diff()" without any patterns in
"committemplate" implies "status()" on "workingcommitctx" with "always
match" object, via "basectx.diff()" and "patch.diff()".
Then, broken "self._status" causes committing unexpected files.
To avoid breaking already fixed "self._status" at "ctx.status()", this
patch overrides "_buildstatus" in "workingcommitctx".
This patch doesn't write out the result of template function "diff()"
in "committemplate" in "test-commit.t", because matching against files
to be committed still has an issue fixed in subsequent patch.
Before this patch, "workingctx" is also used for the context to be
committed. But "workingctx" works incorrectly in some cases.
For example, even when only some of changed files in the working
directory are committed, "status()" on "workingctx" object for
committing recognizes files not to be committed as changed, too.
As the preparation for fixing these issues, this patch chooses adding
new class "workingcommitctx" for exact context to be committed,
because switching by the flag (like "self._fixedstatus" or so) in some
code paths of "workingctx" is less readable and maintenancable.
Before this patch, "workingctx.status" caches the result of
"dirstate.status" directly into "self._status".
But "dirstate.status" is invoked with False "list*" arguments in
normal "self._status" accessing route, and this makes
"unknown"/"ignored"/"clean" of status empty.
This may cause unexpected result of code paths internally accessing to
them (accessors for external usage are already removed by previous patch).
This patch makes "unknown"/"ignored"/"clean" of cached status empty
for equivalence. Making them empty is executed only when at least one
of "unknown", "ignored" or "clean" has files, for efficiency.
The annotate logic now use the new 'introrev' method to bootstrap its traversal.
This catches issues from linkrev-shadowing of the changeset introducing the
version of a file in source changeset.
More tests have been added to display pathological cases.
The previous changeset properly fixed the ancestors computation, but we need to
ensure that the initial filectx is also using the right changeset.
When asking for log or annotation from a certain point, the first step is to
define the changeset that introduced the current file version. We cannot just
pick the "starting point" changesets as it may just "use" the file revision,
unchanged.
Currently, we were using 'linkrev' for this purpose, but this exposes us to
unexpected branch-jumping when the revision introducing the starting point
version is itself linkrev-shadowed. So we need to take the topology into
account again. Therefore, we introduce an 'introrev' function, returning the
changeset which introduced the file change in the current changeset.
This function will be used to fix linkrev-related issues when bootstrapping 'hg
log --follow' and 'hg annotate'.
It reuses the '_adjustlinkrev' function, extending it to allow introspection of
the initial changeset too. In the previous usage of the '_adjustlinkrev' the
starting rev was always using a children file revisions, so it could be safely
ignored in the search. In this case, the starting point is using the revision
of the file we are looking, and may be the changeset we are looking for.
Because of the way filenodes are computed, you can have multiple changesets
"introducing" the same file revision. For example, in the changeset graph
below, changeset 2 and 3 both change a file -to- and -from- the same content.
o 3: content = new
|
| o 2: content = new
|/
o 1: content = old
In such cases, the file revision is create once, when 2 is added, and just reused
for 3. So the file change in '3' (from "old" to "new)" has no linkrev pointing
to it). We'll call this situation "linkrev-shadowing". As the linkrev is used for
optimization purposes when walking a file history, the linkrev-shadowing
results in an unexpected jump to another branch during such a walk.. This leads to
multiple bugs with log, annotate and rename detection.
One element to fix such bugs is to ensure that walking the file history sticks on
the same topology as the changeset's history. For this purpose, we extend the
logic in 'basefilectx.parents' so that it always defines the proper changeset
to associate the parent file revision with. This "proper" changeset has to be an
ancestor of the changeset associated with the child file revision.
This logic is performed in the '_adjustlinkrev' function. This function is
given the starting changeset and all the information regarding the parent file
revision. If the linkrev for the file revision is an ancestor of the starting
changeset, the linkrev is valid and will be used. If it is not, we detected a
topological jump caused by linkrev shadowing, we are going to walk the
ancestors of the starting changeset until we find one setting the file to the
revision we are trying to create.
The performance impact appears acceptable:
- We are walking the changelog once for each filelog traversal (as there should
be no overlap between searches),
- changelog traversal itself is fairly cheap, compared to what is likely going
to be perform on the result on the filelog traversal,
- We only touch the manifest for ancestors touching the file, And such
changesets are likely to be the one introducing the file. (except in
pathological cases involving merge),
- We use manifest diff instead of full manifest unpacking to check manifest
content, so it does not involve applying multiple diffs in most case.
- linkrev shadowing is not the common case.
Tests for fixed issues in log, annotate and rename detection have been
added.
But this changeset does not solve all problems. It fixes -ancestry-
computation, but if the linkrev-shadowed changesets is the starting one, we'll
still get things wrong. We'll have to fix the bootstrapping of such operations
in a later changeset. Also, the usage of `hg log FILE` without --follow still
has issues with linkrev pointing to hidden changesets, because it relies on the
`filelog` revset which implement its own traversal logic that is still to be
fixed.
Thanks goes to:
- Matt Mackall: for nudging me in the right direction
- Julien Cristau and Rémi Cardona: for keep telling me linkrev bug were an
evolution show stopper for 3 years.
- Durham Goode: for finding a new linkrev issue every few weeks
- Mads Kiilerich: for that last rename bug who raise this topic over my
anoyance limit.
There are two caching routes for (propertycache-ed) "_status" below in
committablectx:
- invoking "status()":
"dirstate.status()" is invoked, and the result of it is cached
into "_status". In this case, any of "listignored", "listclean"
and "listunknown" may be True.
- accessing "_status" directly before "status()":
Own "status()" is invoked, but all of "listignored", "listclean"
and "listunknown" arguments are False, in this case.
"ignored"/"clean"/"unknown" accessor methods of "committablectx" use
corresponded fields of "_status", but these fields aren't reliable,
because these fields are empty when:
- "_status" method is executed before accessors, or
- "status()" is executed with "list*=False" before accessors
In addition to it, these accessors aren't used in the recent Mercurial
implementation. At least, removing them doesn't cause any test
failures.
Before this patch, "workingctx.status" always replaces "self._status"
by the recent result, even though:
- status isn't calculated against the parent of the working directory, or
- specified "match" isn't "always" one
(status is only visible partially)
If "workingctx" object is shared between some procedures indirectly
referring "ctx._status", this incorrect caching may cause unexpected
result: for example, "ctx._status" is used via "manifest()", "files()"
and so on.
To cache "self._status" correctly at "workingctx.status", this patch
overwrites "self._status" in "workingctx._buildstatus" only when:
- status is calculated against the parent of the working directory, and
- specified "match" is "always" one
This patch can be applied (and effective) only on default branch,
because procedure around "basectx.status" is much different between
stable and default: for example, overwriting "self._status" itself is
executed not in "workingctx._buildstatus" but in
"workingctx._poststatus", on stable branch.
we are going to need this filelog for the linkrev adjustment, so we better
normalise the list and have the filelog in all case.
This is done in a previous changeset to help readability.
We are going to introduce a linkrev-correction phases when computing parents.
It will be more convenient to have the nullid parent filtered out earlier. I
had to make a minimal adjustment to the rename handling logic to keep it
functional. That logic have been documented in the process since it took me
some time to check all the cases out.
workingctx.ancestors() was not returning the dirstate parents as part of the
result set. The only place this function is used is for copy detection when
committing a file, and that code already checks the parents manually, so this
change has no affect at the moment.
I found it while playing around with changing how copy detection works.
Before this patch, "memctx._manifest" updates all entries in the
(parent) manifest. But this is inefficiency, because almost all files
may be clean in that context.
On the other hand, just updating entries for changed "files" specified
at construction causes unexpected abortion, when there is at least one
newly removed file (see issue4470 for detail).
To calculate manifest more efficiently, this patch replaces
"pman.iteritems()" for the loop by "self._status.modified" to avoid
updating entries for clean or removed files
Examination of removal is also omitted, because removed files aren't
treated in this loop (= "self[f]" returns not None always).
Before this patch, "memctx._manifest" tries to get (and use normally)
filectx also for newly-removed files, even though "memctx.filectx()"
returns None for such files.
To calculate manifest correctly even with newly-removed files, this
patch does:
- replace "man.iteritems()" for the loop by "self._status.modified"
to avoid accessing itself to newly removed files
this also reduces loop cost for large manifest.
- remove files in "self._status.removed" from the manifest
In this patch, amending is confirmed twice to examine both (1) newly
removed files and (2) ones already removed in amended revision.
Before this patch, "memctx._manifest" calculates the manifest
according to the 1st parent. This causes the disappearance
of newly added files from the manifest.
For example, if newly added files aren't listed up in manifest of
memctx, they aren't listed up in "added" field of "status" returned by
"ctx.status()", and "{diff()}" (= "patch.diff") in "committemplate"
shows nothing for them.
To calculate manifest including newly added files correctly, this
patch puts newly added files (= ones in "self._status.added") into the
manifest.
Some details of changes for "test-commit-amend.t" in this patch:
- "touch foo" is replaced by "echo foo > foo", because newly added
empty file can't be shown in "diff()" output without "diff.git"
configuration
- amending is confirmed twice to examine both (1) newly added files
and (2) ones already added in amended revision
Before this patch, "memctx._status" is initialized by "(files, [], [],
[], [], [], [])" and this causes "memctx.modified" to include not
only modified files but also added and removed ones incorrectly.
This patch adds "_status" method to calculate exact status being
committed according to "files" specified at construction time.
Exact "_status" is useful to share/reuse logic of committablectx.
This patch is also preparation for issues fixed by subsequent patches.
Some details of changes for tests in this patch:
- some filename lines are omitted in "test-convert-svn-encoding.t",
because they are correctly listed up as "removed" files
those lines are written out in "localrepository.commitctx" for
"modified" and "added" files by "ui.note".
- "| fixbundle" filterring in "test-histedit-fold.t" is omitted to
check lines including "added" correctly
"fixbundle" discards all lines including "added".
filectxfn returns None for removed files, so we have to check for None
before computing the new file content hash for the manifest.
Includes a test that proves this works, by demonstrating that we can
show the diff of an amended commit in the committemplate.
Instead use a magic value, so that we can identify modified or added
nodes correctly when using manifest.diff().
Thanks to Martin von Zweigbergk for catching that we have to update
_buildstatus as well. That part eluded my debugging for some time.
The result of 'hg rm' + 'hg rename' disagreed with the one from
'hg rename --force'. We align them on 'hg move --force' because it agrees with
what 'hg status' says after the commit.
Stopping reporting a modified file as added puts an end to the hg revert confusion in this
situation (issue4458).
However, reporting the file as modified also prevents revert from restoring the copy
source. We fix this in a later changeset.
Git diff also stop reporting the add in the middle of the chain as add. Not
sure how important (and even wrong) it is.
Because the same dictionary was used to (1) get node from parent and (2) store
annotated version, we could end up with buggy values. For example with a chain
of renames:
$ hg mv b c
$ hg mv a b
The value from 'b' would be updated as "<old-a>a", then the value of c would be
updated as "<old-b>a'. With the current dictionary sharing this ends up with:
'<new-c>' == '<old-a>aa'
This value is double-wrong as we should use '<old-b>' and a single 'a'.
We now use a read-only value for lookup. The 'test-rename.t' test is impacted
because such a chained added file is suddenly detected as such.
Note that the exception-catching from the previous branchtip check is moved up
to catch exceptions from the try block surrounding the namespace lookup.
It turns out that maintaining a reference of any sort (even weak!) to the repo
when constructed doesn't work because we may at some point pass in a repoview
filtered by something other than what the initial repo was.
Extensions might want to create new filternames and change what revisions
are considered hidden or shown. This is the case for inhibit that enables
direct access to hidden hashes with the visible-directaccess-nowarn filtername.
By using startswith instead of a direct comparison with 'visible' we
allow extensions to do that and not work directly on the 'visible' filtername
used by core.
Move the code in context._manifestmatches() into a new
manifest.matches(). It's a natural place for the code to live and it
allows other callers to easily use it. It should also make it easier
to optimize the new method in alternative implementations of the
manifest (same reasoning as with manifest.diff()).
We can just modify the status tuple we got from dirstate.status()
instead of deconstructing it and constructing a new instance, thereby
simplifying the code a little.
Letting _dirstatestatus() return an scmutil.status instance also means
that _buildstatus() will always return such an instance, so we can
remove the conversion from the call sites.
It makes no sense to request reverse status (i.e. changes from the
working copy to its parent) and then look at the deleted, unknown or
ignored fields. If you do, you would get the result from the forward
status (changes from parent to the working copy). Instead of giving a
nonsensical answer to a nonsensical question, it seems a little saner
to return empty lists. It might be best if we could prevent the caller
accessing these lists, but it's doubtful it's worth the trouble.
We don't care about filtering out symlinks that have already been
committed with full content, only those that have been accidentally
resolved in the working directory.
In basectx._buildstatus(), we read the manifests for the two revisions
being compared. For "caching reasons" unknown to me, it is better to
read the earlier manifest first, which basectx._prestatus() takes care
of. However, if the 'self' context is a committablectx and the 'other'
context is the parent of the working directory (as in the very common
case of plain "hg status"), there is no need to read any manifests at
all -- all that's needed is the dirstate status. To avoid reading the
manifests, _prestatus() is overridden in committablectx and avoids
calling its super method, and _buildstatus() calls its super method
only if the 'other' context is not the parent of the working
directory.
It seems easier to follow what's happening if we move the pre-fetching
to _buildstatus() just before the place where the manifests are
fetched. We just need to add an extra check that the revision is not
None to handle the case that was previously handled by subclass
overriding. That also makes it safe for committablectx._prestatus() to
call its parent, although the latter now becomes empty, so we won't
bother.
The workingctx method simply calls the super method. The only effect
it has is that it uses a different default argument for the 'other'
argument. The only in-tree caller is patch.diff, which always passes
an argument to the method, so it should be safe to remove the
overriding. Having the default argument depend on the type seems
rather dangerous anyway.
In order not to avoid listing files as both added and deleted, for
example, we check for every file in the manifest if it is in the
_list_ of deleted files. This can get quite slow when there are many
deleted files. Change it to a set to make the containment check
faster. On a somewhat contrived example of the Mozilla repo with the
entire testing/ directory deleted (~14k files), this makes
'hg status --rev .^' go from 26s to 2s.
The comment in workingctx.status() says that "calling 'super' subtly
reveresed the contexts", but that is simply not true, so we should not
be swapping added and removed fields.
Hidden changesets are by far the most common error case and is the only one[1]
that can reach the user. We move to a friendlier message with a hint about how
to access the data anyway. We should probably point to a help topic instead but
we do not have such a topic yet.
Example of the new output
abort: hidden revision '4'!
(use --hidden to access hidden revisions)
[1] Actually, filtering from "served" can also reach the user during certain
exchange operations.
This will help user to debug. A more precise message will be issued
for the most common case ("visible" filter) in the next changesets.
example output:
- abort: filtered revision '4'!
+ abort: filtered revision '4' (not in 'visible' subset)!
We capture FilteredxxxError and issue a FilteredRepoLookupError instead with a
sightly different messsge. The message will likely get more improvement in the
future.
error: filtered revision '4'
We are going to introduce more precise exception classes for filtered nodes. So
we will have to upgrade them to the `RepoLookupError` level here. We wrap the
whole thing into a try/except to ease this future catching. Some of the current
exception catching will be moved in this one. But the current changeset focuses
on code movement only.
Two possible behaviors are defined for handling censored data: abort, and
ignore. When we ignore censored data we return an empty file to callers
requesting the file data.
The status tuple returned from dirstate.status() has an additional
field compared to the other status tuples: lookup/unsure. This field
is just an optimization and not something most callers care about
(they want the resolved value of 'modified' or 'clean'). To prepare
for a single future status type, let's separate out the 'lookup' field
from the rest by having dirstate.status() return a pair: (lookup,
status).
This wraps all the locations of dirstate.setparent with the appropriate
begin/endparentchange calls. This will prevent exceptions during those calls
from causing incoherent dirstates (issue4353).
Rev 2eef89bfd70d switched the contract for filectxfn from "raise IOError if
file is missing" to "return None if file is missing". Out of tree extensions
need to be updated for that, but for extensions interested in compatibility
with both Mercurial <= 3.1 and default, it is next to impossible to introspect
core Mercurial to figure out what to do.
This patch adds a field to memctx for extensions to use.
The internal API used IOError to indicate that a file should be marked as
removed.
There is some correlation between IOError (especially with ENOENT) and files
that should be removed, but using IOErrors to represent file removal internally
required some hacks.
Instead, use the value None to indicate that the file not is present.
Before, spurious IO errors could cause commits that silently removed files.
They will now be reported like all other IO errors so the root cause can be
fixed.
In all the remaining cases the comprehension variable is used for the same
thing as a previous loop variable.
This will mute some pyflakes "list comprehension redefines" warnings.
The value '*' currently designates that bid merge should be used. The best
way to test bid merge is to set preferancestor=* in the configuration file ...
but then it would abort with unknown revision '*' when other code paths ended
up in changectx.ancestor .
Instead, just skip and ignore the value '*' when looking for a preferred
ancestor.
dirstate.normal is the method that marks files as unchanged/normal.
Rev 03dc7365e275 started caching dirstate.normal in order to improve
performance. However, there was an error in the patch: taking the wlock, under
some conditions depending on platform, can cause a new dirstate object to be
created. Caching dirstate.normal before calling wlock would then cause the
fixup calls below to be on the old dirstate object, effectively disappearing
into the ether.
On Unix and Unix-like OSes, the condition under which we create a new dirstate
object is 'the dirstate file has been modified since the last time we opened
it'. This happens pretty rarely, so the object is usually the same -- there's
little impact.
On Windows, the condition is 'always'. This means files in the lookup state are
never marked normal, so the bug has a serious performance impact since all the
files in the lookup state are re-read every time hg status is run.
When reversing a status, trading "added" and "removed" make sense.
Reversing "deleted" and "unknown" does not. We stop doing it.
The reversing is documented in place for the poor soul not even able to remember
the index of all status elements by heart.
By the magic of code movement, we ended up dropping unknown and ignored
information when comparing the working directory with a non-parent revision.
Let's stop doing it and add a test.
Changeset 83ad0e76acc0 introduced a test to validate that file were not reported
twice when both unknown and removed. This behavior change was introduced by
64d05ea3a10f alongside a bug that dropped ignored and unknown completely
(issue4321). As we are going to fix the bug, we need a proper implementation of
the behavior tested in 83ad0e76acc0.
Setting substate to None was an oversight in 5b3c9729fe09 and this patch
corrects it by setting substate to an empty dictionary which matches what
subrepo code expects.
Similar to the previous patch for workingfilectx, this patch will allow
abstracting localrepo.remove / write method to refactor working directory code
but instead operate on files in memory.