We change the hardcoded 'changectx' to instead use type(self).__name__ so that
objects that inherit from basectx in the future will be able to use the same
representation.
At the moment, there is no simple way to check if an object is a context
because there is no common parent class. If there were, we could use
'isinstance' everywhere. Simply having memctx inherit from workingctx or
changectx would allow the use of 'isinstance' but that could lead to some
confusing situations of reading the code since we have three distinct concepts
of a context:
- changectx represents a changeset *already* in the repo, and is therefore immutable
- workingctx represents changes on disk in the working directory
- memctx represents changes solely in memory which may or may not be on disk
Therefore, I propose refactoring context.py to have all three contexts inherit
from a parent class 'basectx'.
The annotate algorithm used a custom parents() function to try to reuse
filectx and filelogs. I simplified it a bit to rely more heavily on the
self.parents() which makes it work well with alternative filectx
implementations. I tested performance on a file with 5000+ revisions
but no renames, and on a file with 500 revisions repeating a series of
4 edits+renames and saw zero performance hit. In fact, it was reliably a
couple milliseconds faster now.
Added the perfannotate command to contrib/perf.py for future use.
Previously we used 'if filelog:' to check if the filelog existed. If the
instance did exist, this pattern then calls len() on the filelog to see
if it is empty. I'm developing a filelog replacement that doesn't have
len() implemented, so it's better to do an explicit 'is not None' check
here instead.
Also change _changeid() to return the _changeid attribute if it has it.
Previously it would try to obtain it from the _changectx(), and if that
did not exist it would construct the _changectx() using the linkrev. In
the extension I'm working on, filectx's don't have easy access to linkrevs
so avoiding this when possible is better.
Before this patch, refcount (managed in "needed") of the annotation
result is kept as 1, even if corresponding annotation result is
discarded from "hist", because it isn't decreased and discarded.
In the history tree including merging revision, the most recent common
ancestor of merged revisions is scanned twice. Refcount of such
ancestor never becomes 0, because refcount is started from 1 at the
second scanning.
This prevents annotation results of merging revision in "hist" from
being discarded, and decreases memory efficiency.
This patch discards refcount of the annotation result, when the
corresponding annotation is discarded from "hist".
Before this patch, refcount (managed in "needed") of parents of each
revisions in "visit" is increased, only when parent is not annotated
yet (examined by "p not in hist").
But this causes less refcount of the revision like "A" in the tree
below ("A" is assumed as the second parent of "C"):
A --- B --- C
\ /
\-----/
Steps of annotation for "C" in this case are shown below:
1. for "C"
1.1 increase refcount of "B"
1.2 increase refcount of "A" (=> 1)
1.3 defer annotation for "C"
2. for "A"
2.1 annotate for "A" (=> put result into "hist[A]")
2.2 clear "pcache[A]" ("pcache[A] = []")
3. for "B"
3.1 not increase refcount of "A", because "A not in hist" is False
3.2 annotate for "B"
3.3 decrease refcount of "A" (=> 0)
3.4 delete "hist[A]", even though "A" is still needed by "C"
3.5 clear "pcache[B]"
4. for "C", again
4.1 not increase refcount of "B", because "B not in hist" is False
4.2 increase refcount of "A" (=> 1)
4.3 defer annotation for "C"
5. for "A", again
5.1 annotate for "A" (=> put result into "hist[A]", again)
5.2 clear "pcache[A]"
6. for "C", once again
6.1 not increase refcount of "B", because "B not in hist" is False
6.2 not increase refcount of "A", because "A not in hist" is False
6.3 annotate for "C"
6.4 decrease refcount of "A", and delete "hist[A]"
6.5 decrease refcount of "B", and delete "hist[B]"
6.6 clear "pcache[C]"
At step (5.1), annotation for "A" mis-recognizes that all lines are
created at "A", because "pcache[A]" already cleared at step (2.2)
prevents from scanning ancestors of "A".
So, annotation for "C" or its descendants loses information about "A"
or its ancestors.
The root cause of this problem is that refcount of "A" is decreased at
step (3.3), even though it isn't increased at step (3.1).
To increase refcount correctly, this patch increases refcount of each
parents of each revisions:
- regardless of "p not in hist" or not, and
- only once for each revisions in "visit" (by "not pcached")
In fact, this problem should occur only on legacy repositories in
which a filelog includes the merging between the revision and its
ancestor (as the second parent), because:
- tree is scanned in depth-first
without such merging, revisions in "visit" refer different
revisions as parent each other
- recent Mercurial doesn't allow such merging
changelog and manifest can include such merging someway, but
filelogs can't, because "localrepository._filecommit()" converts
such merging request to linear history.
This patch tests merging cases below: these cases are from filelog of
"mercurial/commands.py" in the repository of Mercurial itself.
- both parents are same
10 --- 11 --- 12
\_/
filelogrev: changesetid:
10 526aca6bcb38
11 05098100ff44
12 2d4f4cfa81d6
- the second parent is also ancestor of the first one
37 --- 38 --- 39 --- 40
\________/
filelogrev: changesetid:
37 033dc4170fe6
38 5ff1a23ce38c
39 661a47367859
40 a2ba99fd026f
Before this patch, annotation is re-calculated even if it is already
calculated. This may cause unexpected annotation, because already
cleared "pcache" ("pcache[f] = []") prevents from scanning ancestors.
This patch reuses already calculated annotation if it is available.
In fact, "reusable" situation should be seen only on legacy
repositories in which a filelog include the merging between the
revision and its ancestor, because:
- tree is scanned in depth-first
without such merging, annotation result should be released soon
- recent Mercurial doesn't allow such merging
changelog and manifest can include such merging someway, but
filelogs can't, because "localrepository._filecommit()" converts
such merging request to linear history.
Instead of walking all the way to the root of the DAG, we generate
a set of candidate GCA revs, then figure out which ones will win
the race to the root (usually without needing to traverse all the
way to the root).
In the common case of nodes that are close to each other in both
revision number and topology, this is usually a big win: it makes
"hg --time debugancestors" up to 9 times faster than the more general
ancestor function when measured on heads of the linux-2.6 hg repo.
Victory is not assured, however. The older function can still win
by a large margin if one node is much closer to the root than the
other, or by a much smaller amount if one is an ancestor of the
other.
For now, we've also got a small paranoid harness function that calls
both ancestor functions on every input and ensures that they give
equivalent answers.
Even without the checker function, the old ancestor function needs
to stay alive for the time being, as its generality is used by
context.filectx.merge.
If exception is error.LookupError and running in i18n environment,
below condition is always true.
Because msg is translated and dosen't contain 'manifest'.
if util.safehasattr(err, 'name') and 'manifest' not in msg:
This patch creates a new exception class and uses it instead of
string match.
We can not use `len(repo,changelog)`, it may be a filtered revision. We now use
`repo,changelog.tip()` to fetch this information.
The `tip` command is also fixed and tested
Thanks goes to Idan Kamara for the initial report.
This pulls some of the logic for the cleanup that needs to happen
after a commit has been made otu of localrepo.commit and into
workingctx. This is part of a larger refactoring effort that will
eventually allow us to perform some types of merges in-memory.
Now that changelog filtering is in place, it's become evident that
naming the filters according to the set of revs _not_ included in the
filtered changelog is confusing. This is especially evident in the
collaborative branch cache scheme.
This changes the names of the filters to reflect the revs that _are_
included:
hidden -> visible
unserved -> served
mutable -> immutable
impactable -> base
repoview.filteredrevs is renamed to filterrevs, so that callers read a
bit more sensibly, e.g.:
filterrevs('visible') # filter revs according to what's visible
Add sorted() in places found by testing with PYTHONHASHSEED=random and code
inspection.
An alternative to sprinkling sorted() all over would be to change substate to a
custom dict with sorted iterators...
On `filectx`, linkrev may point to any revision in the repository. When the
repository is filtered this may lead to `filectx` trying to build `changectx`
for filtered revision. In such case we fallback to creating `changectx` on the
unfiltered version of the reposition. This fallback should not be an issue
because `changectx` from `filectx` are not used in complex operation that
care about filtering. It is complicated to work around the issue in a
clearer way as code raising such `filectx` rarely have access to the
repository directly.
Linkrevs create a lot of issue with filtering. It is stored in revlog entry at
creation time and never changed. Nothing prevent the changeset revision pointed
to become filtered. Several bogus behavior emerge from such situation. Those
bugs are complex to solve and not part of the current effort to install
filtering. This changeset is simple hack that prevent plain crash in favor on
minor misbehavior without visible effect.
This "hack" is longly documented in to code itself to help people that would
look at it in the future.
Currently the code path of `changectx(filteredrepo, rev)` call
`filteredrepo.changelog.node(rev)`. When `rev` is filtered this raise an
unhandled `IndexError`. This case now raise a `RepoLookupError` as other
error case do.
The same we have `unstable` and `bumped`. Convenient method to access troubles
information in general may land later.
This get actual use and testing in the next changesets.