This patch marks the start of having memctx inherit from committablectx,
thereby making it a full-fledged context that will eventually grow the ability
to perform diffing and also merging.
In the refactoring of removing localrepo.status, e283486177bd, we accidentally
changed the return type from a tuple to a list. Philosophically, this is
incorrect so we explicitly return a tuple again.
The old code is unneeded now that basectx has the ability to calculate the
status between two context objects.
Now, we use the objected oriented pattern called 'super'.
A future patch will remove the old workingctx.status which caches the status of
the working directory, therefore we now cache this status in the poststatus
hook of committablectx.
Previously, workingctx had custom variables for the unknown, ignored, and clean
list of files of status. These then got moved to committablectx and, after the
refactoring of localrepo.status, are no longer needed. We, therefore, simplify
the whole mess.
As a bonus, we are able to remove the need for having 'assert'.
This is just a copy from localrepo.status and is a small step to removing that
method entirely. The prestatus hook is only called for changectx's, thereby
ensuring that the same behavior is guaranteed.
This patch encapsulate the logic for changing the match.bad function when
comparing against the working directory's parent. Future patches will remove
more of the 'if ... else' blocks in localrepo.status that test for this working
directory parent case.
This patch paves the way to allow a workingctx to override the match object
with a custom 'bad' method for cases where status is sent a directory pattern.
This patch maintains the fast path for workingctx which is to not build a
manifest if the working directory is being compared to its parent since, in
this case, we can just copy the parent manifest.
With this patch, we are one step closer to removing 'if ... else' logic in
localrepo.status for testing if the context is the working directory or
not. Future patches will replace those blocks of code with a call to the
context's _poststatus hook so that each context object will do the right thing
depending on the situation.
With this patch, we are one step closer to removing 'if ... else' logic in
localrepo.status for testing if the context is the working directory or
not. Future patches will replace those blocks of code with a call to the
context's _prestatus hook so that each context object will do the right thing
depending on the situation.
This method is a duplicate of localrepo.mfmatches and sets the stage for
factoring localrepo.status into a context method that will be customizable
based on inheritance and object type.
This patch is a step forward in getting rid of needing to check 'parentworking'
throughout the status method. Eventually, we will use the power of inheritance
to do the correct thing when comparing the working directory with its parent.
This method is mostly a copy from localrepo.status. The custom status method of
workingctx will eventually be absorbed by the refactoring of localrepo.status
to context.status but unfortunately we can't do it in one step.
This patch introduces "editor" argument to "memctx.__init__()", and
moves editor invocation from "makememctx()" to "memctx.__init__()", to
centralize editor invocation into "memctx" object creation.
This relocation is needed, because "makememctx()" requires the "store"
object providing "getfile()" to create "memfilectx" object, and this
prevents some code paths from using "makememctx()" instead of
"memctx.__init__()".
This patch also invokes "localrepository.savecommitmessage()", when
"editor" is specified explicitly, to centralize saving commit message
into "memctx" object creation: passing "cmdutil.commiteditor" as
"editor" can achieve both suppressing editor invocation and saving
into ".hg/last-message.txt" for non empty commit messages.
The current situation is a bit of a layering violation as
merge-specific knowledge is pushed down to lower layers and leaks
merge assumptions into other code paths.
Here, we simply silence the warning with a hack. Both the warning and
the hack will probably go away in the near future when bid merge is
made the default.
Multiple revisions can be specified in merge.preferancestor, separated by
whitespace. First match wins.
This makes it possible to overrule the default of picking the common ancestor
with the lowest hash value among the "best" (introduced in f19507e1bcf2).
This can for instance help with some merges where the 'wrong' ancestor is used.
There will thus be some overlap between this and the problems that can be
solved with a future 'consensus merge'.
Mercurial will show a note like
note: using 40663881a6dd as ancestor of 3b08d01b0ab5 and adfe50279922
alternatively, use --config merge.preferancestor=0f6b37dbe527
when the option is available, listing all the alternative ancestors.
Show a message like
note: using 0f6b37dbe527 as ancestor of adfe50279922 and cf89f02107e5
So far this is just a warning - there is nothing the user can do to select
another ancestor.
When running 'hg cat -r . <file>' it was doing an expensive ctx.walk(m) which
applied the regex to every file in the manifest.
This changes changectx.walk to iterate over just the files in the regex, if no
other patterns are specified. This cuts hg cat time by 50% in our repo and
probably benefits a few other commands as well.
The 'copies' method has no test coverage and calls copies.pathcopies with an
incorrect number of parameters and is thus (fortunately) not used. Kill it.
This patch gets stat object of target path by "vfs.lstat()", and
examines stat object to know the type of it. This follows the way in
"workingctx.add()".
This should be cheaper than original implementation invoking
"lexists()", "isfile()" and "islink()".
This patch also changes paths added to "rejected" list from full path
(referred by "p") to relative one (referred by "f"), when type of
target path is neither file nor symlink.
This change should be reasonable, because the path added to "rejected"
list is relative one, when "OSError" is raised at "lstat()".
Currently, we have basectx that serves as a common ancestor of all contexts. We
will now add a new class commitablectx that will inherit from basectx and will
serve as a common place for code that will be shared between mutable contexts,
e.g. workingctx and memctx.
We change the hardcoded 'filectx' to instead use type(self).__name__ so that objects that
inherit from basefilectx in the future will be able to use the same representation.
Similar to the refactoring of context, we split common logic from filectx to a
parent class called basefilectx that will be inherited by filectx,
workingfilectx, and memfilectx. This will allow a clear disinction of all the
file contexts:
- filectx: read-only access to a filerevision that is already present in the repo,
- workingfilectx: a filecontext that represents files from the working directory,
- memfilectx: a filecontext that represents files in-memory
The refactoring of all the context objects allows us to simply pass a basectx
to the __new__ constructor and have it return the same object without
allocating new memory.
We change the hardcoded 'changectx' to instead use type(self).__name__ so that
objects that inherit from basectx in the future will be able to use the same
representation.
At the moment, there is no simple way to check if an object is a context
because there is no common parent class. If there were, we could use
'isinstance' everywhere. Simply having memctx inherit from workingctx or
changectx would allow the use of 'isinstance' but that could lead to some
confusing situations of reading the code since we have three distinct concepts
of a context:
- changectx represents a changeset *already* in the repo, and is therefore immutable
- workingctx represents changes on disk in the working directory
- memctx represents changes solely in memory which may or may not be on disk
Therefore, I propose refactoring context.py to have all three contexts inherit
from a parent class 'basectx'.
The annotate algorithm used a custom parents() function to try to reuse
filectx and filelogs. I simplified it a bit to rely more heavily on the
self.parents() which makes it work well with alternative filectx
implementations. I tested performance on a file with 5000+ revisions
but no renames, and on a file with 500 revisions repeating a series of
4 edits+renames and saw zero performance hit. In fact, it was reliably a
couple milliseconds faster now.
Added the perfannotate command to contrib/perf.py for future use.
Previously we used 'if filelog:' to check if the filelog existed. If the
instance did exist, this pattern then calls len() on the filelog to see
if it is empty. I'm developing a filelog replacement that doesn't have
len() implemented, so it's better to do an explicit 'is not None' check
here instead.
Also change _changeid() to return the _changeid attribute if it has it.
Previously it would try to obtain it from the _changectx(), and if that
did not exist it would construct the _changectx() using the linkrev. In
the extension I'm working on, filectx's don't have easy access to linkrevs
so avoiding this when possible is better.
Before this patch, refcount (managed in "needed") of the annotation
result is kept as 1, even if corresponding annotation result is
discarded from "hist", because it isn't decreased and discarded.
In the history tree including merging revision, the most recent common
ancestor of merged revisions is scanned twice. Refcount of such
ancestor never becomes 0, because refcount is started from 1 at the
second scanning.
This prevents annotation results of merging revision in "hist" from
being discarded, and decreases memory efficiency.
This patch discards refcount of the annotation result, when the
corresponding annotation is discarded from "hist".
Before this patch, refcount (managed in "needed") of parents of each
revisions in "visit" is increased, only when parent is not annotated
yet (examined by "p not in hist").
But this causes less refcount of the revision like "A" in the tree
below ("A" is assumed as the second parent of "C"):
A --- B --- C
\ /
\-----/
Steps of annotation for "C" in this case are shown below:
1. for "C"
1.1 increase refcount of "B"
1.2 increase refcount of "A" (=> 1)
1.3 defer annotation for "C"
2. for "A"
2.1 annotate for "A" (=> put result into "hist[A]")
2.2 clear "pcache[A]" ("pcache[A] = []")
3. for "B"
3.1 not increase refcount of "A", because "A not in hist" is False
3.2 annotate for "B"
3.3 decrease refcount of "A" (=> 0)
3.4 delete "hist[A]", even though "A" is still needed by "C"
3.5 clear "pcache[B]"
4. for "C", again
4.1 not increase refcount of "B", because "B not in hist" is False
4.2 increase refcount of "A" (=> 1)
4.3 defer annotation for "C"
5. for "A", again
5.1 annotate for "A" (=> put result into "hist[A]", again)
5.2 clear "pcache[A]"
6. for "C", once again
6.1 not increase refcount of "B", because "B not in hist" is False
6.2 not increase refcount of "A", because "A not in hist" is False
6.3 annotate for "C"
6.4 decrease refcount of "A", and delete "hist[A]"
6.5 decrease refcount of "B", and delete "hist[B]"
6.6 clear "pcache[C]"
At step (5.1), annotation for "A" mis-recognizes that all lines are
created at "A", because "pcache[A]" already cleared at step (2.2)
prevents from scanning ancestors of "A".
So, annotation for "C" or its descendants loses information about "A"
or its ancestors.
The root cause of this problem is that refcount of "A" is decreased at
step (3.3), even though it isn't increased at step (3.1).
To increase refcount correctly, this patch increases refcount of each
parents of each revisions:
- regardless of "p not in hist" or not, and
- only once for each revisions in "visit" (by "not pcached")
In fact, this problem should occur only on legacy repositories in
which a filelog includes the merging between the revision and its
ancestor (as the second parent), because:
- tree is scanned in depth-first
without such merging, revisions in "visit" refer different
revisions as parent each other
- recent Mercurial doesn't allow such merging
changelog and manifest can include such merging someway, but
filelogs can't, because "localrepository._filecommit()" converts
such merging request to linear history.
This patch tests merging cases below: these cases are from filelog of
"mercurial/commands.py" in the repository of Mercurial itself.
- both parents are same
10 --- 11 --- 12
\_/
filelogrev: changesetid:
10 526aca6bcb38
11 05098100ff44
12 2d4f4cfa81d6
- the second parent is also ancestor of the first one
37 --- 38 --- 39 --- 40
\________/
filelogrev: changesetid:
37 033dc4170fe6
38 5ff1a23ce38c
39 661a47367859
40 a2ba99fd026f
Before this patch, annotation is re-calculated even if it is already
calculated. This may cause unexpected annotation, because already
cleared "pcache" ("pcache[f] = []") prevents from scanning ancestors.
This patch reuses already calculated annotation if it is available.
In fact, "reusable" situation should be seen only on legacy
repositories in which a filelog include the merging between the
revision and its ancestor, because:
- tree is scanned in depth-first
without such merging, annotation result should be released soon
- recent Mercurial doesn't allow such merging
changelog and manifest can include such merging someway, but
filelogs can't, because "localrepository._filecommit()" converts
such merging request to linear history.
Instead of walking all the way to the root of the DAG, we generate
a set of candidate GCA revs, then figure out which ones will win
the race to the root (usually without needing to traverse all the
way to the root).
In the common case of nodes that are close to each other in both
revision number and topology, this is usually a big win: it makes
"hg --time debugancestors" up to 9 times faster than the more general
ancestor function when measured on heads of the linux-2.6 hg repo.
Victory is not assured, however. The older function can still win
by a large margin if one node is much closer to the root than the
other, or by a much smaller amount if one is an ancestor of the
other.
For now, we've also got a small paranoid harness function that calls
both ancestor functions on every input and ensures that they give
equivalent answers.
Even without the checker function, the old ancestor function needs
to stay alive for the time being, as its generality is used by
context.filectx.merge.
If exception is error.LookupError and running in i18n environment,
below condition is always true.
Because msg is translated and dosen't contain 'manifest'.
if util.safehasattr(err, 'name') and 'manifest' not in msg:
This patch creates a new exception class and uses it instead of
string match.
We can not use `len(repo,changelog)`, it may be a filtered revision. We now use
`repo,changelog.tip()` to fetch this information.
The `tip` command is also fixed and tested
Thanks goes to Idan Kamara for the initial report.
This pulls some of the logic for the cleanup that needs to happen
after a commit has been made otu of localrepo.commit and into
workingctx. This is part of a larger refactoring effort that will
eventually allow us to perform some types of merges in-memory.
Now that changelog filtering is in place, it's become evident that
naming the filters according to the set of revs _not_ included in the
filtered changelog is confusing. This is especially evident in the
collaborative branch cache scheme.
This changes the names of the filters to reflect the revs that _are_
included:
hidden -> visible
unserved -> served
mutable -> immutable
impactable -> base
repoview.filteredrevs is renamed to filterrevs, so that callers read a
bit more sensibly, e.g.:
filterrevs('visible') # filter revs according to what's visible
Add sorted() in places found by testing with PYTHONHASHSEED=random and code
inspection.
An alternative to sprinkling sorted() all over would be to change substate to a
custom dict with sorted iterators...
On `filectx`, linkrev may point to any revision in the repository. When the
repository is filtered this may lead to `filectx` trying to build `changectx`
for filtered revision. In such case we fallback to creating `changectx` on the
unfiltered version of the reposition. This fallback should not be an issue
because `changectx` from `filectx` are not used in complex operation that
care about filtering. It is complicated to work around the issue in a
clearer way as code raising such `filectx` rarely have access to the
repository directly.
Linkrevs create a lot of issue with filtering. It is stored in revlog entry at
creation time and never changed. Nothing prevent the changeset revision pointed
to become filtered. Several bogus behavior emerge from such situation. Those
bugs are complex to solve and not part of the current effort to install
filtering. This changeset is simple hack that prevent plain crash in favor on
minor misbehavior without visible effect.
This "hack" is longly documented in to code itself to help people that would
look at it in the future.
Currently the code path of `changectx(filteredrepo, rev)` call
`filteredrepo.changelog.node(rev)`. When `rev` is filtered this raise an
unhandled `IndexError`. This case now raise a `RepoLookupError` as other
error case do.
The same we have `unstable` and `bumped`. Convenient method to access troubles
information in general may land later.
This get actual use and testing in the next changesets.
During changectx __init__ the dirstate's parents MAY be checked. If
the repo is filtered, this check will complain "working directory has
unknown parents" even if the parents are perfectly known.
This may happen when the repo is used for serving and the dirstate has
parents that are secret, as those secret changesets will be filtered.
The old name was not very good for two reasons:
- caller does not care about "cache",
- set of revision returned may not be obsolete at all.
The new name was suggested by Kevin Bullock.
This patch adds "descendant()", which uses "revlog.descendant()" for
descendant examination, to changectx.
This implementation is more efficient than "new in old.descendants()"
expression, because:
- "changectx.descendants()" creates temporary "changectx" objects,
but "revlog.descendant()" doesn't
"revlog.descendant()" checks only revision numbers of descendants.
- "revlog.descendant()" stops scanning, when scanning of all
revisions less than one of examination target is finished
this can avoid useless scanning in "not descendant" case.
This changeset introduces caches on the `obsstore` that keeps track of sets of
revisions meaningful for obsolescence related logics. For now they are:
- obsolete: changesets used as precursors (and not public),
- extinct: obsolete changesets with osbolete descendants only,
- unstable: non obsolete changesets with obsolete ancestors.
The cache is accessed using the `getobscache(repo, '<set-name>')` function which
builds the cache on demand. The `clearobscaches(repo)` function takes care of
clearing the caches if any.
Caches are cleared when one of these events happens:
- a new marker is added,
- a new changeset is added,
- some changesets are made public,
- some public changesets are demoted to draft or secret.
Declaration of more sets is made easy because we will have to handle at least
two other "troubles" (latecomer and conflicting).
Caches are now used by revset and changectx. It is usually not much more
expensive to compute the whole set than to check the property of a few elements.
The performance boost is welcome in case we apply obsolescence logic on a lot of
revisions. This makes the feature usable!
This set is always accessed through the repo for now. Having this set
carried by the changelog make it complicated to:
- initialize it, computing hidden set may involve revset call
- lazy compute it, (1) only the changelog can detect someone access it,
(2) only the repo have enought knowledge to compute it.
In later version I expect he changelog to apply filtering itself and the set to
be carried by changelog again.
`extinct` changesets are obsolete changesets with obsolete descendants only. They
are of no interest anymore and can be:
- exclude from exchange
- hidden to the user in most situation
- safely garbage collected
This changeset just allows mercurial to detect them.
The implementation is a bit naive, as for unstable changesets. We better use a
simple revset query and a cache, but simple version comes first.
An unstable changeset is a changeset *not* obsolete but with some obsolete
ancestors.
The current logic to decide if a changeset is unstable is naive and very
inefficient. A better solution is to compute the set of unstable changeset with
a simple revset and to cache the result. But this require cache invalidation
logic. Simpler version goes first.
An `obsolete` boolean property is added to changeset context. Function to get
obsolete marker object from a changeset context are added to the obsolete
module.
Accepting a variable number of arguments as the old API did is
deeply ugly, particularly as it means the API can't be extended
with new arguments. Partly as a result, we have at least three
different implementations of the same ancestors algorithm (!?).
Most callers were forced to call ancestors(*somelist), adding to
both inefficiency and ugliness.
The original motivation was changectx.phase() had special logic to
correctly lookup in repo._phaserev, including invalidating it when
necessary. And at other places, repo._phaserev was accessed directly.
This led to the discovery that phases state including _phaseroots,
_phaserev and _dirtyphase was manipulated in localrepository.py,
phases.py, repair.py, etc. phasecache helps encapsulating that.
This patch replaces all phase state in localrepo with phasecache and
adjust related code except for advance/retractboundary() in phases.
These still access to phasecache internals directly. This will be
addressed in a followup.
This fixes "hg qimport -r null". Previous versions used to:
- Traceback because null revision mutability was not defined
- Add an empty -1.diff patch to the series
The error message:
abort: revision -1 is not mutable
is symptomatic of a deeper problem in phase command revision handling. It could
be fixed easily in the command itself but I feel a better fix must be done in
phase API which raises the issue of phase updates atomicity: aborting in
phases.advanceboundary/retractboundary requires a better rollback behaviour to
avoid partial changes.
When _followfirst() revset was introduced it seemed to be the sole user of such
an argument, so filectx.ancestors() was duplicated and modified instead. It now
appears this argument could be used when computing the set of files to be
considered when --patch or --stat are passed along with --follow FILE.
this patch adds 'dirs()' to changectx/workingctx, which returns map of
all directories deduced from manifest, to examine whether specified
pattern is related to the context as directory or not quickly.
'workingctx.dirs()' uses 'dirstate.dirs()' rather than building
another copy of it.
This removes use of unknown files for building the synthetic working
directory manifest used by manifestmerge. Instead, we adopt the
strategy used by _checkunknown.
Side-effect: unknown files are no longer moved by remote directory
renames, and now are left alone like ignored files.
When support for handling explicit paths in subrepos was added to the forget
command (155b89136ae7), subrepo recursion wasn't taken into account. This
change fixes that by pulling the majority of the logic of commands.forget into
cmdutil.forget, which can then be called from both there and subrepo.forget.
If file data starts with '\1\n', it will be escaped in the revlog to
create an empty metadata block, thus adding four bytes to the size in
the revlog size index. There's no way to detect that this has happened
in filelog.size() faster than decompressing each revision [1].
For filectx.cmp(), we have the size of the file in the working directory
available. If it differs by exactly four bytes, it may be this case, so
do a full comparison.
[1]: http://markmail.org/message/5akdbmmqx7vq2fsg
Before this patch, Windows always did the wrong thing with exec bits
when committing a merge: consult the flags in first parent.
Now we manually recompute the result of merging flags at commit time,
which almost always does the right thing (except when there are
conflicts between symlink and exec flags).
To do this, we:
- pull flag synthesis out into its own function
- delay building this function unless it's needed
- add a merge case that compares flags in local and other against the ancestor
This has been tested in multiple ways on Linux:
- running the whole test suite with both old and new code in place,
checking for differences in each flags() result
- running the whole test suite while comparing real on-disk flags
against synthetic ones for merges
- test-issue1802 (from Martin Geisler) which disables exec bit
checking on Unix
splitblock() was added to handle blocks returned by bdiff.blocks() which differ
only by blank lines but are not made only of blank lines. I do not know exactly
how it could happen but mdiff.blocks() threshold behaviour makes me think it
can if those blocks are made of very popular lines mixed with popular blank
lines. If it is proven to be wrong, the function can be dropped.
The first implementation made annotate share diff configuration entries. But it
looks like users will user -w/b for annotate but not for diff, on both the
command line and hgweb. Since the latter cannot use command line entries, we
introduce a new [annotate] section duplicating the diff whitespace options.