Rev 2eef89bfd70d switched the contract for filectxfn from "raise IOError if
file is missing" to "return None if file is missing". Out of tree extensions
need to be updated for that, but for extensions interested in compatibility
with both Mercurial <= 3.1 and default, it is next to impossible to introspect
core Mercurial to figure out what to do.
This patch adds a field to memctx for extensions to use.
The internal API used IOError to indicate that a file should be marked as
removed.
There is some correlation between IOError (especially with ENOENT) and files
that should be removed, but using IOErrors to represent file removal internally
required some hacks.
Instead, use the value None to indicate that the file not is present.
Before, spurious IO errors could cause commits that silently removed files.
They will now be reported like all other IO errors so the root cause can be
fixed.
In all the remaining cases the comprehension variable is used for the same
thing as a previous loop variable.
This will mute some pyflakes "list comprehension redefines" warnings.
The value '*' currently designates that bid merge should be used. The best
way to test bid merge is to set preferancestor=* in the configuration file ...
but then it would abort with unknown revision '*' when other code paths ended
up in changectx.ancestor .
Instead, just skip and ignore the value '*' when looking for a preferred
ancestor.
dirstate.normal is the method that marks files as unchanged/normal.
Rev 03dc7365e275 started caching dirstate.normal in order to improve
performance. However, there was an error in the patch: taking the wlock, under
some conditions depending on platform, can cause a new dirstate object to be
created. Caching dirstate.normal before calling wlock would then cause the
fixup calls below to be on the old dirstate object, effectively disappearing
into the ether.
On Unix and Unix-like OSes, the condition under which we create a new dirstate
object is 'the dirstate file has been modified since the last time we opened
it'. This happens pretty rarely, so the object is usually the same -- there's
little impact.
On Windows, the condition is 'always'. This means files in the lookup state are
never marked normal, so the bug has a serious performance impact since all the
files in the lookup state are re-read every time hg status is run.
When reversing a status, trading "added" and "removed" make sense.
Reversing "deleted" and "unknown" does not. We stop doing it.
The reversing is documented in place for the poor soul not even able to remember
the index of all status elements by heart.
By the magic of code movement, we ended up dropping unknown and ignored
information when comparing the working directory with a non-parent revision.
Let's stop doing it and add a test.
Changeset 83ad0e76acc0 introduced a test to validate that file were not reported
twice when both unknown and removed. This behavior change was introduced by
64d05ea3a10f alongside a bug that dropped ignored and unknown completely
(issue4321). As we are going to fix the bug, we need a proper implementation of
the behavior tested in 83ad0e76acc0.
Setting substate to None was an oversight in 5b3c9729fe09 and this patch
corrects it by setting substate to an empty dictionary which matches what
subrepo code expects.
Similar to the previous patch for workingfilectx, this patch will allow
abstracting localrepo.remove / write method to refactor working directory code
but instead operate on files in memory.
This fixes a discrepency for basectx and classes that inherit from it. Now
callers can pass these arguments to any context without an exception being
raised.
For non-working contexts, walk and matches do the same thing. For working
contexts, walk stats all the files and looks for unknown files, while matches
just filters the dirstate by match.
On a repository with over 250,000 files and 700,000 commits, this improves
cases like
hg status --rev <rev> -- <file> # rev is not .
from 2.1 seconds to 1.4 seconds.
There is further scope for improvement here: for a single file or a small set
of files, it is probably more efficient to use filelog linkrevs when possible.
However there will always be cases where that will fail (multiple commits
pointing to the same file revision, removed files...), so this is independently
useful.
When the matcher is exact, there's no reason to iterate over the entire
manifest. It's much more efficient to iterate over the list of files instead.
For a repository with approximately 300,000 files, this speeds up
hg log -l10 --patch --follow for a frequently modified file from 16.5 seconds
to 10.5 seconds.
This was mistakenly moved from workingctx to committablectx in
edbbc56a5e4f. Since the method is querying the dirstate, the only logical place
is for it to reside is in workingctx.
In 4c8873aad79a, memctx was changed to inherit from committablectx, this in
turn added the 'substate' property to memctx. It turns out that the
newcommitphase method tested for this property:
def newcommitphase(ui, ctx):
commitphase = phases.newcommitphase(ui)
substate = getattr(ctx, "substate", None)
if not substate:
return commitphase
Currently, memctx isn't ready to handle substates, nor removed files, so we
explicitly must set substate=None to get the old behavior back. In the future,
we can decide how memctx should play with substate. For now, this fixes
third-party extensions and some internal code dealing with subrepos.
basectx.status may reorder the list after workingctx._poststatus is called,
so workingctx must copy it. Otherwise, wctx.deleted() would return "unknown"
files, for example.
This method is needed to have memfilectx behave like the other file
contexts. We can't just inherit this method because each file context has
different behavior: filectx reads from the filelog, and workingfilectx reads
from the disk. Therefore, we define memfilectx to return the size of the data
in memory.
This patch changes the calling signature of memfilectx's __init__ to fall in
line with the other file contexts.
Calling code and tests have been updated accordingly.
commitablectx has a much more robust implementation of flags() so we will use
that instead of just blindly calling the flags function for the given path.
This is a slight change in definition from memctx returning only modified() but
its parent's definition is more consistent with other contexts' behavior so we
can call this change a slight bugfix and step in the right direction.
This patch marks the start of having memctx inherit from committablectx,
thereby making it a full-fledged context that will eventually grow the ability
to perform diffing and also merging.
In the refactoring of removing localrepo.status, e283486177bd, we accidentally
changed the return type from a tuple to a list. Philosophically, this is
incorrect so we explicitly return a tuple again.
The old code is unneeded now that basectx has the ability to calculate the
status between two context objects.
Now, we use the objected oriented pattern called 'super'.
A future patch will remove the old workingctx.status which caches the status of
the working directory, therefore we now cache this status in the poststatus
hook of committablectx.
Previously, workingctx had custom variables for the unknown, ignored, and clean
list of files of status. These then got moved to committablectx and, after the
refactoring of localrepo.status, are no longer needed. We, therefore, simplify
the whole mess.
As a bonus, we are able to remove the need for having 'assert'.
This is just a copy from localrepo.status and is a small step to removing that
method entirely. The prestatus hook is only called for changectx's, thereby
ensuring that the same behavior is guaranteed.
This patch encapsulate the logic for changing the match.bad function when
comparing against the working directory's parent. Future patches will remove
more of the 'if ... else' blocks in localrepo.status that test for this working
directory parent case.
This patch paves the way to allow a workingctx to override the match object
with a custom 'bad' method for cases where status is sent a directory pattern.
This patch maintains the fast path for workingctx which is to not build a
manifest if the working directory is being compared to its parent since, in
this case, we can just copy the parent manifest.
With this patch, we are one step closer to removing 'if ... else' logic in
localrepo.status for testing if the context is the working directory or
not. Future patches will replace those blocks of code with a call to the
context's _poststatus hook so that each context object will do the right thing
depending on the situation.
With this patch, we are one step closer to removing 'if ... else' logic in
localrepo.status for testing if the context is the working directory or
not. Future patches will replace those blocks of code with a call to the
context's _prestatus hook so that each context object will do the right thing
depending on the situation.
This method is a duplicate of localrepo.mfmatches and sets the stage for
factoring localrepo.status into a context method that will be customizable
based on inheritance and object type.
This patch is a step forward in getting rid of needing to check 'parentworking'
throughout the status method. Eventually, we will use the power of inheritance
to do the correct thing when comparing the working directory with its parent.
This method is mostly a copy from localrepo.status. The custom status method of
workingctx will eventually be absorbed by the refactoring of localrepo.status
to context.status but unfortunately we can't do it in one step.
This patch introduces "editor" argument to "memctx.__init__()", and
moves editor invocation from "makememctx()" to "memctx.__init__()", to
centralize editor invocation into "memctx" object creation.
This relocation is needed, because "makememctx()" requires the "store"
object providing "getfile()" to create "memfilectx" object, and this
prevents some code paths from using "makememctx()" instead of
"memctx.__init__()".
This patch also invokes "localrepository.savecommitmessage()", when
"editor" is specified explicitly, to centralize saving commit message
into "memctx" object creation: passing "cmdutil.commiteditor" as
"editor" can achieve both suppressing editor invocation and saving
into ".hg/last-message.txt" for non empty commit messages.
The current situation is a bit of a layering violation as
merge-specific knowledge is pushed down to lower layers and leaks
merge assumptions into other code paths.
Here, we simply silence the warning with a hack. Both the warning and
the hack will probably go away in the near future when bid merge is
made the default.
Multiple revisions can be specified in merge.preferancestor, separated by
whitespace. First match wins.
This makes it possible to overrule the default of picking the common ancestor
with the lowest hash value among the "best" (introduced in f19507e1bcf2).
This can for instance help with some merges where the 'wrong' ancestor is used.
There will thus be some overlap between this and the problems that can be
solved with a future 'consensus merge'.
Mercurial will show a note like
note: using 40663881a6dd as ancestor of 3b08d01b0ab5 and adfe50279922
alternatively, use --config merge.preferancestor=0f6b37dbe527
when the option is available, listing all the alternative ancestors.
Show a message like
note: using 0f6b37dbe527 as ancestor of adfe50279922 and cf89f02107e5
So far this is just a warning - there is nothing the user can do to select
another ancestor.
When running 'hg cat -r . <file>' it was doing an expensive ctx.walk(m) which
applied the regex to every file in the manifest.
This changes changectx.walk to iterate over just the files in the regex, if no
other patterns are specified. This cuts hg cat time by 50% in our repo and
probably benefits a few other commands as well.
The 'copies' method has no test coverage and calls copies.pathcopies with an
incorrect number of parameters and is thus (fortunately) not used. Kill it.
This patch gets stat object of target path by "vfs.lstat()", and
examines stat object to know the type of it. This follows the way in
"workingctx.add()".
This should be cheaper than original implementation invoking
"lexists()", "isfile()" and "islink()".
This patch also changes paths added to "rejected" list from full path
(referred by "p") to relative one (referred by "f"), when type of
target path is neither file nor symlink.
This change should be reasonable, because the path added to "rejected"
list is relative one, when "OSError" is raised at "lstat()".
Currently, we have basectx that serves as a common ancestor of all contexts. We
will now add a new class commitablectx that will inherit from basectx and will
serve as a common place for code that will be shared between mutable contexts,
e.g. workingctx and memctx.
We change the hardcoded 'filectx' to instead use type(self).__name__ so that objects that
inherit from basefilectx in the future will be able to use the same representation.
Similar to the refactoring of context, we split common logic from filectx to a
parent class called basefilectx that will be inherited by filectx,
workingfilectx, and memfilectx. This will allow a clear disinction of all the
file contexts:
- filectx: read-only access to a filerevision that is already present in the repo,
- workingfilectx: a filecontext that represents files from the working directory,
- memfilectx: a filecontext that represents files in-memory
The refactoring of all the context objects allows us to simply pass a basectx
to the __new__ constructor and have it return the same object without
allocating new memory.
We change the hardcoded 'changectx' to instead use type(self).__name__ so that
objects that inherit from basectx in the future will be able to use the same
representation.
At the moment, there is no simple way to check if an object is a context
because there is no common parent class. If there were, we could use
'isinstance' everywhere. Simply having memctx inherit from workingctx or
changectx would allow the use of 'isinstance' but that could lead to some
confusing situations of reading the code since we have three distinct concepts
of a context:
- changectx represents a changeset *already* in the repo, and is therefore immutable
- workingctx represents changes on disk in the working directory
- memctx represents changes solely in memory which may or may not be on disk
Therefore, I propose refactoring context.py to have all three contexts inherit
from a parent class 'basectx'.
The annotate algorithm used a custom parents() function to try to reuse
filectx and filelogs. I simplified it a bit to rely more heavily on the
self.parents() which makes it work well with alternative filectx
implementations. I tested performance on a file with 5000+ revisions
but no renames, and on a file with 500 revisions repeating a series of
4 edits+renames and saw zero performance hit. In fact, it was reliably a
couple milliseconds faster now.
Added the perfannotate command to contrib/perf.py for future use.
Previously we used 'if filelog:' to check if the filelog existed. If the
instance did exist, this pattern then calls len() on the filelog to see
if it is empty. I'm developing a filelog replacement that doesn't have
len() implemented, so it's better to do an explicit 'is not None' check
here instead.
Also change _changeid() to return the _changeid attribute if it has it.
Previously it would try to obtain it from the _changectx(), and if that
did not exist it would construct the _changectx() using the linkrev. In
the extension I'm working on, filectx's don't have easy access to linkrevs
so avoiding this when possible is better.
Before this patch, refcount (managed in "needed") of the annotation
result is kept as 1, even if corresponding annotation result is
discarded from "hist", because it isn't decreased and discarded.
In the history tree including merging revision, the most recent common
ancestor of merged revisions is scanned twice. Refcount of such
ancestor never becomes 0, because refcount is started from 1 at the
second scanning.
This prevents annotation results of merging revision in "hist" from
being discarded, and decreases memory efficiency.
This patch discards refcount of the annotation result, when the
corresponding annotation is discarded from "hist".
Before this patch, refcount (managed in "needed") of parents of each
revisions in "visit" is increased, only when parent is not annotated
yet (examined by "p not in hist").
But this causes less refcount of the revision like "A" in the tree
below ("A" is assumed as the second parent of "C"):
A --- B --- C
\ /
\-----/
Steps of annotation for "C" in this case are shown below:
1. for "C"
1.1 increase refcount of "B"
1.2 increase refcount of "A" (=> 1)
1.3 defer annotation for "C"
2. for "A"
2.1 annotate for "A" (=> put result into "hist[A]")
2.2 clear "pcache[A]" ("pcache[A] = []")
3. for "B"
3.1 not increase refcount of "A", because "A not in hist" is False
3.2 annotate for "B"
3.3 decrease refcount of "A" (=> 0)
3.4 delete "hist[A]", even though "A" is still needed by "C"
3.5 clear "pcache[B]"
4. for "C", again
4.1 not increase refcount of "B", because "B not in hist" is False
4.2 increase refcount of "A" (=> 1)
4.3 defer annotation for "C"
5. for "A", again
5.1 annotate for "A" (=> put result into "hist[A]", again)
5.2 clear "pcache[A]"
6. for "C", once again
6.1 not increase refcount of "B", because "B not in hist" is False
6.2 not increase refcount of "A", because "A not in hist" is False
6.3 annotate for "C"
6.4 decrease refcount of "A", and delete "hist[A]"
6.5 decrease refcount of "B", and delete "hist[B]"
6.6 clear "pcache[C]"
At step (5.1), annotation for "A" mis-recognizes that all lines are
created at "A", because "pcache[A]" already cleared at step (2.2)
prevents from scanning ancestors of "A".
So, annotation for "C" or its descendants loses information about "A"
or its ancestors.
The root cause of this problem is that refcount of "A" is decreased at
step (3.3), even though it isn't increased at step (3.1).
To increase refcount correctly, this patch increases refcount of each
parents of each revisions:
- regardless of "p not in hist" or not, and
- only once for each revisions in "visit" (by "not pcached")
In fact, this problem should occur only on legacy repositories in
which a filelog includes the merging between the revision and its
ancestor (as the second parent), because:
- tree is scanned in depth-first
without such merging, revisions in "visit" refer different
revisions as parent each other
- recent Mercurial doesn't allow such merging
changelog and manifest can include such merging someway, but
filelogs can't, because "localrepository._filecommit()" converts
such merging request to linear history.
This patch tests merging cases below: these cases are from filelog of
"mercurial/commands.py" in the repository of Mercurial itself.
- both parents are same
10 --- 11 --- 12
\_/
filelogrev: changesetid:
10 526aca6bcb38
11 05098100ff44
12 2d4f4cfa81d6
- the second parent is also ancestor of the first one
37 --- 38 --- 39 --- 40
\________/
filelogrev: changesetid:
37 033dc4170fe6
38 5ff1a23ce38c
39 661a47367859
40 a2ba99fd026f
Before this patch, annotation is re-calculated even if it is already
calculated. This may cause unexpected annotation, because already
cleared "pcache" ("pcache[f] = []") prevents from scanning ancestors.
This patch reuses already calculated annotation if it is available.
In fact, "reusable" situation should be seen only on legacy
repositories in which a filelog include the merging between the
revision and its ancestor, because:
- tree is scanned in depth-first
without such merging, annotation result should be released soon
- recent Mercurial doesn't allow such merging
changelog and manifest can include such merging someway, but
filelogs can't, because "localrepository._filecommit()" converts
such merging request to linear history.
Instead of walking all the way to the root of the DAG, we generate
a set of candidate GCA revs, then figure out which ones will win
the race to the root (usually without needing to traverse all the
way to the root).
In the common case of nodes that are close to each other in both
revision number and topology, this is usually a big win: it makes
"hg --time debugancestors" up to 9 times faster than the more general
ancestor function when measured on heads of the linux-2.6 hg repo.
Victory is not assured, however. The older function can still win
by a large margin if one node is much closer to the root than the
other, or by a much smaller amount if one is an ancestor of the
other.
For now, we've also got a small paranoid harness function that calls
both ancestor functions on every input and ensures that they give
equivalent answers.
Even without the checker function, the old ancestor function needs
to stay alive for the time being, as its generality is used by
context.filectx.merge.
If exception is error.LookupError and running in i18n environment,
below condition is always true.
Because msg is translated and dosen't contain 'manifest'.
if util.safehasattr(err, 'name') and 'manifest' not in msg:
This patch creates a new exception class and uses it instead of
string match.
We can not use `len(repo,changelog)`, it may be a filtered revision. We now use
`repo,changelog.tip()` to fetch this information.
The `tip` command is also fixed and tested
Thanks goes to Idan Kamara for the initial report.
This pulls some of the logic for the cleanup that needs to happen
after a commit has been made otu of localrepo.commit and into
workingctx. This is part of a larger refactoring effort that will
eventually allow us to perform some types of merges in-memory.
Now that changelog filtering is in place, it's become evident that
naming the filters according to the set of revs _not_ included in the
filtered changelog is confusing. This is especially evident in the
collaborative branch cache scheme.
This changes the names of the filters to reflect the revs that _are_
included:
hidden -> visible
unserved -> served
mutable -> immutable
impactable -> base
repoview.filteredrevs is renamed to filterrevs, so that callers read a
bit more sensibly, e.g.:
filterrevs('visible') # filter revs according to what's visible
Add sorted() in places found by testing with PYTHONHASHSEED=random and code
inspection.
An alternative to sprinkling sorted() all over would be to change substate to a
custom dict with sorted iterators...
On `filectx`, linkrev may point to any revision in the repository. When the
repository is filtered this may lead to `filectx` trying to build `changectx`
for filtered revision. In such case we fallback to creating `changectx` on the
unfiltered version of the reposition. This fallback should not be an issue
because `changectx` from `filectx` are not used in complex operation that
care about filtering. It is complicated to work around the issue in a
clearer way as code raising such `filectx` rarely have access to the
repository directly.
Linkrevs create a lot of issue with filtering. It is stored in revlog entry at
creation time and never changed. Nothing prevent the changeset revision pointed
to become filtered. Several bogus behavior emerge from such situation. Those
bugs are complex to solve and not part of the current effort to install
filtering. This changeset is simple hack that prevent plain crash in favor on
minor misbehavior without visible effect.
This "hack" is longly documented in to code itself to help people that would
look at it in the future.
Currently the code path of `changectx(filteredrepo, rev)` call
`filteredrepo.changelog.node(rev)`. When `rev` is filtered this raise an
unhandled `IndexError`. This case now raise a `RepoLookupError` as other
error case do.
The same we have `unstable` and `bumped`. Convenient method to access troubles
information in general may land later.
This get actual use and testing in the next changesets.
During changectx __init__ the dirstate's parents MAY be checked. If
the repo is filtered, this check will complain "working directory has
unknown parents" even if the parents are perfectly known.
This may happen when the repo is used for serving and the dirstate has
parents that are secret, as those secret changesets will be filtered.
The old name was not very good for two reasons:
- caller does not care about "cache",
- set of revision returned may not be obsolete at all.
The new name was suggested by Kevin Bullock.