This simplifies largefiles' overridecalculateupdates(), which no
longer has to do the conversion it started doing in 478d610ca1b0
(largefiles: rewrite merge code using dictionary with entry per file,
2014-12-09).
To keep this patch small, we'll leave the name 'actionbyfile' in
overrides.py. It will be renamed in the next patch.
By moving the conversion from the file->action dict after
_forgetremoved(), we make that method shorter by removing the need for
the confusing 'xactions' variable.
By moving the conversion from the file->action dict after the bid
merge code, bid merge can be simplified a little.
A few tests are affected by this change. Where we used to iterate over
the actions first in order of the action type ('g', 'r', etc.) [1], we
now iterate in order of filename. This difference affects the order of
debug log statements.
[1] And then in the non-deterministic order of files in the manifest
dictionary (the order returned from manifest.diff()).
In the same vein as 478d610ca1b0 (largefiles: rewrite merge code using
dictionary with entry per file, 2014-12-09), rewrite manifestmerge()
itself as dictionary with the filename as key. This will let us
simplify some of the other code in merge.py and eventually drop the
conversion in the largefiles code.
No difference in speed could be detected (well within the noise level
when run in Mozilla repo).
Monkey patching repoview does not really work and making it really work will
be really hard. So we better have it broken without complexity than broken with
extra complexity.
This patch makes bundrepo retract the phase boundary for new commits to 'draft'
status, which is consistent with the behavior of 'hg unbundle'. The old
behavior was for commits to appear with the same phase as their nearest
ancestor in the base repository.
This affects several classes of operation:
* Inspecting a bundle with the -B flag
* Treating a bundle file as a peer (old: everything public, new: everything draft)
* Incoming command (neither old or new behavior is sensible -- fixed in next patch)
This patch makes bundlerepo use a subclass of phasecache that will allow phase
boundaries to be moved around, but will never write them to the underlying
repository.
Previously these would be considered to be relative to the current working
directory. That behavior is both undocumented and doesn't really make sense.
There are two reasonable options for how to resolve relative paths:
- relative to the repo root
- relative to the config file
Resolving these files relative to the repo root matches existing behavior with
hooks. An earlier discussion about this is available at
http://mercurial.markmail.org/thread/tvu7yhzsiywgkjzl.
Thanks to Isaac Jurado <diptongo@gmail.com> for the initial patchset that
spurred the discussion.
workingctx.ancestors() was not returning the dirstate parents as part of the
result set. The only place this function is used is for copy detection when
committing a file, and that code already checks the parents manually, so this
change has no affect at the moment.
I found it while playing around with changing how copy detection works.
Mercurial backout command makes a commmit by default only when the backed out
revision is the parent of working directory and doesn't commit in any other
case.
The --commit option changes behaviour of backout to make a commit whenever
possible (i.e. there is no unresolved conflicts). This behaviour seems more
intuitive to many use (especially git users migrating to hg).
This patch adds the -B/--bookmarks option to the share command added by the
share extension. All it does for now is create a marker, 'bookmarks.shared',
that will be used by future code to implement the sharing functionality.
Now that we have the machinery of namespaces in-place, we use that instead of
hand-rolling our own template function.
Note, this can only be used for tags because both branches and bookmarks have
special case logic for 'default' and the current bookmark (which is something
outside the namespace api for now).
For any namespace, we generate a template keyword. For example, given a
namespace 'babar', we automatically have the ability to use it in a template:
hg log -r . -T '{babars % "King: {babar}\n"}'
Furthermore, we only generate this keyword for a namespace if one doesn't
already exist. This is necessary for 'branches' and 'bookmarks' since both of
those have concepts of 'current' (something outside the namespace api) and also
allows extensions to override default behavior if desired.
This marks our second feature of the namespace api: automatic template keyword.
This patch adds a method that takes in a namespace and uses the node-to-name
map to output the list of names.
This patch adds a node-to-name map property to the namespace. This is necessary
because we cannot simply invert the name-to-node map because we do not assume
the name-to-node map to be unique (for example, consider named branches: many
nodes have one branch name).
The node-to-name is helpful in log commands where we are already iterating over
a set of nodes and want to display some kind of naming information to the user.
This patch adds the public api for getting the template name of a namespace so
that the next patch can use it to generate a template keyword automatically.
The template name property will be used in upcoming patches to automatically
generate a template keyword. For example, given a namespace called 'babars', we
will automatically generate a template keyword 'babar' such that we can use it
in the following way,
$ hg log -r . -T '{babars % "King: {babar}\n"}'
Before this patch, "memctx._manifest" updates all entries in the
(parent) manifest. But this is inefficiency, because almost all files
may be clean in that context.
On the other hand, just updating entries for changed "files" specified
at construction causes unexpected abortion, when there is at least one
newly removed file (see issue4470 for detail).
To calculate manifest more efficiently, this patch replaces
"pman.iteritems()" for the loop by "self._status.modified" to avoid
updating entries for clean or removed files
Examination of removal is also omitted, because removed files aren't
treated in this loop (= "self[f]" returns not None always).
Before this patch, "memctx._manifest" tries to get (and use normally)
filectx also for newly-removed files, even though "memctx.filectx()"
returns None for such files.
To calculate manifest correctly even with newly-removed files, this
patch does:
- replace "man.iteritems()" for the loop by "self._status.modified"
to avoid accessing itself to newly removed files
this also reduces loop cost for large manifest.
- remove files in "self._status.removed" from the manifest
In this patch, amending is confirmed twice to examine both (1) newly
removed files and (2) ones already removed in amended revision.
Before this patch, "memctx._manifest" calculates the manifest
according to the 1st parent. This causes the disappearance
of newly added files from the manifest.
For example, if newly added files aren't listed up in manifest of
memctx, they aren't listed up in "added" field of "status" returned by
"ctx.status()", and "{diff()}" (= "patch.diff") in "committemplate"
shows nothing for them.
To calculate manifest including newly added files correctly, this
patch puts newly added files (= ones in "self._status.added") into the
manifest.
Some details of changes for "test-commit-amend.t" in this patch:
- "touch foo" is replaced by "echo foo > foo", because newly added
empty file can't be shown in "diff()" output without "diff.git"
configuration
- amending is confirmed twice to examine both (1) newly added files
and (2) ones already added in amended revision
Before this patch, "memctx._status" is initialized by "(files, [], [],
[], [], [], [])" and this causes "memctx.modified" to include not
only modified files but also added and removed ones incorrectly.
This patch adds "_status" method to calculate exact status being
committed according to "files" specified at construction time.
Exact "_status" is useful to share/reuse logic of committablectx.
This patch is also preparation for issues fixed by subsequent patches.
Some details of changes for tests in this patch:
- some filename lines are omitted in "test-convert-svn-encoding.t",
because they are correctly listed up as "removed" files
those lines are written out in "localrepository.commitctx" for
"modified" and "added" files by "ui.note".
- "| fixbundle" filterring in "test-histedit-fold.t" is omitted to
check lines including "added" correctly
"fixbundle" discards all lines including "added".
Encoding whether or not a part is mandatory in the capitalization of the
parttype is unintuitive and error-prone. This sequence of patches separates
these concerns in the API to reduce programmer error and pave the way for
a potential change in how this information is transmitted over the wire.
Since the parttype and mandatory bit are separated in bundle2.unbundlepart
(see previous patch), there is no longer a need to remove the mandatory bit
before working with the parttype.
Encoding whether or not a part is mandatory in the capitalization of the
parttype is unintuitive and error-prone. This sequence of patches separates
these concerns in the API to reduce programmer error and pave the way for
a potential change in how this information is transmitted over the wire.
This patch separates the two pieces of information when reading the part header
so that it's unnecessary to know how they were combined during transmission.
filectxfn returns None for removed files, so we have to check for None
before computing the new file content hash for the manifest.
Includes a test that proves this works, by demonstrating that we can
show the diff of an amended commit in the committemplate.
This method has the same behavior as the 'os.path.split' function, but having
it in vfs will allow handling of tricky encoding situations in the future.
In the same patch, we replace the use of 'os.path.split' in the transaction code.
The vfs.join method only works for absolute paths. We need something
that works for relative paths too when transforming filenames. Since
os.path.join may misbehave in tricky encoding situations, encapsulate
the new join method in our vfs abstraction. The default implementation
remains os.path.join, but this opens the door to other VFSes doing
something more intelligent based on their needs.
In the same go, we replace the usage of 'os.path.join' in transaction code.
This no longer needs to be explicitly passed because the subrepo object tracks
the 'ui' reference since d4e8aa61370d. See the change to 'archive' for details
about the differences between the output level in the root repo and subrepo 'ui'
object.
The only use for 'ui' in revert is to emit status and warning messages, and to
check the verbose flag prior to printing the action to be performed on a file.
The local repo's ui was already being used to print a warning message in
wctx.forget() and for 'ui.slash' when walking dirstate in the repo.status()
call. Unlike other methods where the matcher is passed along and narrowed, a
new matcher is created in each repo, and therefore the bad() method already used
the local repo's ui.
This no longer needs to be explicitly passed because the subrepo object tracks
the 'ui' reference since d4e8aa61370d. See the change to 'archive' for details
about the differences between the output level in the root repo and subrepo 'ui'
object.
The only use for 'ui' in remove is to emit status and warning messages, and to
check the verbose flag prior to printing files to be removed. The bad() method
on the matcher still uses the root repo's ui, because narrowing the matcher
doesn't change the ui object.
The local repo's ui was already being used to print a warning message in
wctx.forget() and for 'ui.slash' when walking dirstate in the repo.status()
call.
This no longer needs to be explicitly passed because the subrepo object tracks
the 'ui' reference since d4e8aa61370d. See the change to 'archive' for details
about the differences between the output level in the root repo and subrepo 'ui'
object.
The only use for 'ui' in forget is to emit status and warning messages, and to
check the verbose flag prior to printing files to be forgotten. The bad()
method on the matcher still uses the root repo's ui, because narrowing the
matcher doesn't change the ui object.
The local repo's ui was already being used to print a warning message in
wctx.forget() and for 'ui.slash' when walking dirstate in the repo.status()
call.
This no longer needs to be explicitly passed because the subrepo object tracks
a 'ui' reference since d4e8aa61370d. See the change to 'archive' for details
about the differences between the output level in the root repo and subrepo 'ui'
object.
The only use for 'ui' in cat is to emit a status message when a subrepo is
missing. The bad() method on the matcher still uses the root repo's ui, because
narrowing the matcher doesn't change the ui object.
The current state of subrepo methods is to pass a 'ui' object to some methods,
which has the effect of overriding the subrepo configuration since it is the
root repo's 'ui' that is passed along as deep as there are subrepos. Other
subrepo method are *not* passed the root 'ui', and instead delegate to their
repo object's 'ui'. Even in the former case where the root 'ui' is available,
some methods are inconsistent in their use of both the root 'ui' and the local
repo's 'ui'. (Consider hg._incoming() uses the root 'ui' for path expansion
and some status messages, but also calls bundlerepo.getremotechanges(), which
eventually calls discovery.findcommonincoming(), which calls
setdiscovery.findcommonheads(), which calls status() on the local repo 'ui'.)
This inconsistency with respect to the configured output level is probably
always hidden, because --verbose, --debug and --quiet, along with their 'ui.xxx'
equivalents in the global and user level hgrc files are propagated from the
parent repo to the subrepo via 'baseui'. The 'ui.xxx' settings in the parent
repo hgrc file are not propagated, but that seems like an unusual thing to set
on a per repo config file. Any 'ui.xxx' options changed by --config are also
not propagated, because they are set on repo.ui by dispatch.py, not repo.baseui.
The goal here is to cleanup the subrepo methods by dropping the 'ui' parameter,
which in turn prevents mixing subtly different 'ui' instances on a given subrepo
level. Some methods use more than just the output level settings in 'ui' (add
for example ends up calling scmutil.checkportabilityalert() with both the root
and local repo's 'ui' at different points). This series just goes for the low
hanging fruit and switches methods that only use the output level.
If we really care about not letting a subrepo config override the root repo's
output level, we can propagate the verbose, debug and quiet settings to the
subrepo in the same way 'ui.commitsubrepos' is in hgsubrepo.__init__.
Archive only uses the 'ui' object to call its progress() method, and gitsubrepo
calls status().
Creation of the subrepo's '_repo' object creates a new 'ui' by combining the
parent repo's 'baseui' and reading in the subrepo's hgrc file. This simply
avoids 'self.ui' and 'self._repo.ui' pointing to different objects, which seems
like a potential source of bugs.
Git and Svn subrepos are unchanged, because they don't have their own ui, and
have always used their parent's for their configuration.
The localrepository class has a 'ui' member, so keeping the names the same will
allow for duck typing with subrepo instances when accessing 'ui'. Changing this
is easier than finding all of the localrepository instance uses and renaming
that to '_ui'.
Instead of just bootstrapping the algorithm with the first revision we
see, allow callers to pass revs that should be displayed first. All
branches are retained until we can display such revision.
Expected usage is to display the current working copy parent first.
The algorithm now works when some revisions are skipped. We now use "first
included ancestors" instead of just "parent" to link changesets with each other.
We are going to add an additional layer of indentation to support non-contiguous
revset. We do it in a pure code movement changeset to help the readability of
the next changeset.
We add an experimental config option to use the topological sorting. I first
tried to hook the 'groupbranchiter' function in the 'sort' revset but this was useless
because graphlog enforces revision number sorting :(
As the goal is to advance on the topological iteration logic, I see this
experimental option as a good way to move forward.
We have to use turn the iterator into a list because the graphlog is apparently
not ready for pure iterator input yet.
This changeset introduces a function to perform topological (one branch after
the other) iteration over a set of changesets. This first version has a lot of
limitations, but the approach should be flexible enough to allow many
improvements in the future. This changeset aims to set the first stone more
than providing a complete solution.
The algorithm does not need to know the whole set of nodes involved
before emitting revision. This makes it a good candidate for usage in place
like `hg log` or graphical tools that need a fast first result time.
Note that the exception-catching from the previous branchtip check is moved up
to catch exceptions from the try block surrounding the namespace lookup.
It turns out that maintaining a reference of any sort (even weak!) to the repo
when constructed doesn't work because we may at some point pass in a repoview
filtered by something other than what the initial repo was.
This marks the first use of abstracting our different types of named objects
(bookmarks, tags, branches, etc.) and upcoming patches will use this to
simplify logic.
This patch begins the work to provide a way to register a namespace to handle
'names'. Benefits of this would be,
- improved templating: This would provide {name} which could output any branch,
bookmark, tag, or any extension registered namespace all without having the
extension doing any extra work
- improved tab completion: Since this provides a single source of all 'names',
tab completion would not need to know of each namespace
- changeset lookup: Similar to before, a unified place to get all 'names' will
allow finding changesets without any extension code having to reimplement
this
Also, 6c946e059d3b has shown us that for internal code which expects a certain
type of method or behavior, we should provide an easy way for extensions to
check this behavior.
New code paths could fail because the old statichttprepo profile couldn't
handle the usual parameters.
Instead, reuse a more generic profile also used in readonlyvfs.
Previously, git subrepos did not support reverting.
This change adds basic support for reverting
when '--no-backup' is specified.
A warning is given (and the current state is kept)
when a revert is done without the '--no-backup' flag.
Without this patch, if the server sets preferuncompressed, there's no way for
clients to override that and force a non-streaming clone. With this patch, we
extend the meaning of --pull to also override preferuncompressed and force a
non-streaming clone.
When there are multiple common ancestors, we should check for case
collisions only on the resulting actions after bid merge has run. To
do this, move the code until after bid merge.
Move it past _resolvetrivial() too, since that might update
actions. If the remote changed a file and then reverted the change,
while the local side deleted the file and created a new file with a
name that case-folds like the old file, we should fail before this
patch but not after.
Although the changes to the actions caused by _forgetremoved() should
have no effect on case collisions, move it after that, too, so the
next person reading the code won't have to think about it.
Moving it past these blocks of code takes it to the end of
calculateupdates(), so let's even move it outside of the method, so we
also check collisions in actions produced by extensions overriding the
method.
By moving the cd/dc prompts out of calculateupdates(), we let
largefiles' overridecalculateupdates() so the unresolved values
(i.e. 'cd' or 'dc' rather than 'g', 'r', 'a' and missing). This allows
overridecalculateupdates() to ask the user whether to keep the normal
file or the largefile before the user gets the cd/dc prompt. Whichever
answer the user gives, we make overridecalculateupdates() replace 'cd'
or 'dc' action, saving the user one annoying (and less clear)
question.
Since addremove on the top of a directory tree will recursively handle sub
directories, it should be the same with deep subrepos, once the user has
explicitly asked to process a subrepo. This really only has an effect when a
path that is a subrepo (or is in a subrepo) is given, since -S causes all
subrepos to be processed already. An addremove without a path that crosses into
a subrepo, will still not enter any subrepos, per backward compatibility rules.
Git and svn subrepos are currently not supported. It doesn't look like git or
svn have these commands natively, so that's an area for a git or svn expert.
The recursive addremove operation occurs completely before the first subrepo is
committed. Only hg subrepos support the addremove operation at the moment- svn
and git subrepos will warn and abort the commit.
This will be used in the next patch to print a warning from the base class. It
seems better than having to explicitly pass it to a new method, since a lot of
existing methods also require it.
It looks like a bad path is the only mode of failure for addremove. This
warning is probably useful for the standalone command, but more important for
'commit -A'. That command doesn't currently abort if the addremove fails, but
it will be made to do so prior to adding subrepo support, since not all subrepos
will support addremove. We could just abort here, but it looks like addremove
has always silently ignored bad paths, except for the exit code.
We would eventually like to move the resolution of modify/delete and
delete/modify conflicts to the resolve phase. However, we don't want
to move the checks for identical content that were added in
99b29d2bd5ed (merge: before cd/dc prompt, check that changed side
really changed, 2014-12-01). Let's instead move these out to a new
_resolvetrivial() function that processes the actions from
manifestmerge() and replaces any false cd/dc conflicts. The function
will also provide a natural place for us to later add code for
resolving false 'm' conflicts.
As preparation for making 'dr' and 'rd' actions no longer actions,
move the reporting from applyupdates() to its caller update(). This
way we won't have to pass additonal arguments to applyupdates() when
they are no longer actions. Also, the warnings are equally unrelated
to applyupdates() as they are to recordupdates(), as they don't result
in any changes to either the working copy or the dirstate.
See earlier patch for additional motivation.
It is easier to reason about certain algorithms in terms of a
file->action mapping than the current action->list-of-files. Bid merge
is already written this way (but with a list of actions per file), and
largefiles' overridecalculateupdates() will also benefit. However,
that requires us to have at most one action per file. That requirement
is currently violated by 'dr' (divergent rename) and 'rd' (rename and
delete) actions, which can exist for the same file as some other
action.
These actions are only used for displaying warnings to the user; they
don't change anything in the working copy or the dirstate. In this
way, they are similar to the 'k' (keep) action. However, they are even
less action-like than 'k' is: 'k' at least describes what to do with
the file ("do nothing"), while 'dr' and 'rd' or only annotations for
files for which there may exist other, "real" actions.
As a first step towards separating these acitons out, stop including
them in the progress output, just like we already exclude the 'k'
action.
So far, git subrepositories were silently ignored for diffs.
This patch adds support for git subrepositories,
with the remark that --include and --exclude are not supported.
If --include or --exclude are used, the subrepo is ignored.
Using 'addfinalize' to generate 'fncache' means that no pending version of the
file will be generated for the hooks. We would have to use the
'addfilegenerator' method to get such result. However the 'fncachevfs' (who
decide that a write is necessary) have no access to the transaction to register
such file generation at add time. Having the transaction accessible to the 'vfs'
is too much trouble for no benefit. This outdated 'fncache' file at hook time is
not expected to be an issue.
The previous move from 'onclose' to 'addfinalize' had no impact on this timing.
I'm documenting it now because I looked at it.
It was just showing a status message with the internal revision number.
Instead, show a warning like
note: graft of 27:3aaa8b6725f0 "28" created no changes to commit
(message tweaked in-flight by mpm)
Show status messages with first line of commit description and names, like
grafting 12:2647734878ef "fork" (tip)
This gives more context for the user when resolving conflicts.
It doesn't seem to be a common idiom for repo instances, but the status() method
is replaced in largefiles' purge() override. Since __setattr__ is implemented
in repoview to setattr() on the unfiltered repo, the replacement method wouldn't
get called unless it was invoked with the unfiltered repo, because the filtered
repo remains unchanged.
Since this doesn't seem to be commonly used, I didn't bother to filter out
methods that perhaps shouldn't be replaced, such as changelog().
We have two different types of node type (sha1 and sha256, only sha1 is used
now) and therefor different sizes for them. We now compute the value once
instead of redoing the computation every loop. This has no visible performance
impact.
Python garbage collection is triggered by container creation. So code that
creates a lot of tuples tends to trigger GC a lot. We disable the gc during
obsolescence marker parsing and associated initialization. This provides an
interesting speedup (25%).
Load marker function on my 58758 markers repo:
before: 0.468247 seconds
after: 0.344362 seconds
The benefit is a bit less visible overall. With python2.6 on my system I see:
after: 0.60
before: 0.53
The difference is probably explained by the delaying of a costly GC. (but there
is still a win). Marking involved tuples, lists and dicts as ignorable by the
garbage collector should give us more benefit. But this is another adventure.
Thanks goes to Siddharth Agarwal for the lead.
Garbage collection behave pathologically when creating a lot of containers. As
we do that more than once it become sensible to have a decorator for it. See
inline documentation for details.
Most merge action messages don't describe the action itself, they
describe the reason the action was taken. The only exeption is the 'k'
action, for which the message is just "keep" and instead there is a
code comment folling it that says "remote unchanged". Let's move that
comment into the merge action message.
This fixes the previously mentioned issue with 7d5fcea60c78, and undoes its
corresponding test change.
The test change demonstrates the correctness when a file is specified (i.e. the
glob is required on Windows because relative paths use '\' and absolute paths
use '/'). It is admittedly very subtle, but there will be a more robust test in
the addremove -S v3 series.
Several methods print files relative to the repo root, unless files are named on
the command line, in which case they are printed relative to cwd. Since the
check relies on the 'pats' parameter, which needs to be replaced by a matcher
when adding subrepo support, this logic gets folded into the matcher to tidy up
the callers.
Prior to 7d5fcea60c78, this style decision was based off of whether or not the
'pats' list was empty. That change altered the check to test match.anypats()
instead, in order to make paths printed consistent when -I/-X is specified.
That however, changed the style when a file is given to the command. So now we
test the pattern list to get the old behavior for files, as well as test -I/-X
to get the consistency for patterns.
When the local side has renamed a directory from a/ to b/ and added a
file b/c in it, and the remote side has added a file a/c, we end up
overwriting the local file b/c with the contents of remote file
a/c. Add a check for this case and use the merge ('m') action in this
case instead of the directory rename get ('dg') action.
When the remote side has renamed a directory from a/ to b/ and added a
file b/c in it, and the local side has added a file a/c, we end up
moving a/c to b/c without considering the remote version of b/c. Add a
check for this case and use the merge ('m') action in this case
instead of the directory rename ('dm') action.
There are three high-level cases that are of interest in
manifestmerge(): 1) The file exists on both sides, 2) The file exists
only on the local side, and 3) The file exists only on the remote
side. Let's make this clearer in the code.
The 'if f in copied' case will be broken up into the two applicable
branches in the next patch.
Currently, the revlog index C implementation assumes its node tree will be
initialized before a new element is inserted by revnum. For example, revlog.py
executes 'self.index.insert(-1, e)' in _addrevision(). This is only safe
because the node tree has been initialized by a "node in self.nodemap"
check made in addrevision().
(For context, this was discovered while developing an experimental revlog
mixin which stores "elided nodes" via a separate code path from
_addrevision(); that new code path segfaults without this patch.)
The order is determined by manifest.diff(), which currently is not
sorted. There are currently no tests for this, but we will soon add
some that would be flaky without this patch.
This patch series is intended to allow bundle2 push reply part handlers to
make changes to the local repository; it has been developed in parallel with
an extension that allows the server to rebase incoming changesets while applying
them.
This diff adds an experimental config option "bundle2.pushback" which provides
a transaction to the reply unbundler during a push operation. This behavior is
opt-in because of potential security issues: the response can contain any part
type that has a handler defined, allowing the server to make arbitrary changes
to the local repository.
This patch series is intended to allow bundle2 push reply part handlers to
make changes to the local repository; it has been developed in parallel with
an extension that allows the server to rebase incoming changesets while applying
them.
The default transaction getter for processbundle is a private function that
raises an exception; this diff lets calling code pass None as the transaction
getter to explicitly request this default behavior.
The next diff will check a config option to determine whether to provide a
transaction to the reply bundle processor. If one shouldn't be provided, the
code needs a way to specify that the default behavior should be used.
This patch series is intended to allow bundle2 push reply part handlers to
make changes to the local repository; it has been developed in parallel with
an extension that allows the server to rebase incoming changesets while applying
them.
Most pushes already open a transaction in order to sync phase information.
This diff replaces that transaction with one that spans the entire push
operation.
This transaction will be used in a later patch to guard repository changes
made during the reply handler.
This patch series is intended to allow bundle2 push reply part handlers to
make changes to the local repository; it has been developed in parallel with
an extension that allows the server to rebase incoming changesets while applying
them.
Aside from the transaction logic, the pulloperation class is used primarily as
a logic-free data structure for storing state information. This diff extracts
the transaction logic into its own class that can be shared with push
operations.
These aren't exactly format-breaking features -- just ones for which patches
applied to a repo will produce incorrect commits, In any case, some commands
like record and annotate only care about this feature.
Not all callers are interested in all diffopts -- for example, commands like
record (which use diff internally) break when diffopts like noprefix are
enabled. This function will allow us to add flags that callers can use to
enable only the features they're interested in.
For "hg addremove 'glob:*.py'", we print any paths added or removed as
relative to the current directory, but when "hg addremove -I
'glob:*.py'" is used, we use the absolute path (relative from the repo
root). It seems like they should be the same, so change it so we use
relative paths in both cases. Continue to use absolute paths when no
patterns are given.
Instead of using a file that we know is not in the common ancestor's
maniffest, let's use None. This is safe as the only place that cares
about the value (applyupdates) already checks if the item exists in
the ancestor.
We can further limit the scope of the 2-way merge case by breaking out
the case where the file was not created from scratch on both sides but
rather renamed in the same way (and is therefore a 3-way merge). This
involves copying some code, but it makes it clearer which case the
"Note:" in the code refers to.
When 'f' is not in 'ma', 'a' will be 'nullid' and all the if/elif
conditions that check whether some one nodeid is equal to 'a' will
fail, and the else-clause will instead apply. We can make that more
explicit by creating a separate 'm' action for the case where 'a' is
'nullid'. While it does mean copying some code, perhaps it makes it a
little clearer which codepaths are possible, and which cases the
"Note:" in the code refers to. It also lets us make the debug action
messages a little more specific.
The change in 02ecc94fb657 created a problem on Windows and OS X:
--- /usr/local/mercurial/tests/test-issue660.t
+++ /usr/local/mercurial/tests/test-issue660.t.err
@@ -47,6 +47,8 @@
Should succeed - shadow removed:
$ hg add b
+ adding b/b
+ b/b does not exist!
Prior to the failing 'hg add', the file 'b/b' was added and committed, then 'b'
was recursively deleted from the filesystem, file 'b' was created and the delete
was recorded with 'hg rm --after'. This add is attempting to record the
existence of file 'b'.
A filesystem that is not case sensitive prevents dirstate.walk() from skipping
its step 3, and step 3 has the effect of inserting removed files into the walk
list. The Linux code doesn't run through step 3, and didn't exhibit the
problem. It's not clear why a non case sensitive filesystem triggers step 3,
given that the path normalization occurs in step 2.
Prior to 02ecc94fb657, part of the check here was 'f not in repo.dirstate'
instead of 'f not in wctx'. Files in the 'r' state are filtered out of
context.__contains__() but not dirstate.__contains__(). Therefore the removed
file name wasn't added to the list of files to add when checking against
dirstate. That change was to allow removed files to be readded, but adding a
file that doesn't exist is nonsensical. If the user specifies a missing file,
it will be an exact match and will still fail.
Since 4a56fba99974 (merge: don't use unknown(), 2012-02-09), untracked
files are no longer included in the manifest diff, so there is no need
to check exclude them when renaming files for directory moves with the
'dm' action.
calculateupdates() happens before applyupdates(), so move it before in
the code. That also moves it close to manifestmerge(), which is a good
location as calculateupdates() is the only caller of manifestmerge().
In a mozilla repo with tip at bb3ff09f52fe,
hg update tip~1000 && time hg revert -nq -r tip .
displays ~4:20 minutes. With tip~100, it runs in ~11 s. With revision
100000, it did not finish in 12 minutes.
Revert calls dirstate.status() with a matcher that matches each file
in the target revision. The main problem [1] lies in
dirstate._walkexplicit(), which looks for matching deleted directories
by checking whether each path is prefix of any path in the
dirstate. With m files in the dirstate and n files in the target
revision that are not in the dirstate, this is clearly O(m*n). Let's
improve by keeping a lazily initialized set of all the directories in
the dirstate, so the time becomes O(m+n).
After this patch, the 4:20 minutes become 5.5 s, while for a single
missing path, it slows down from 1.092 s to 1.150 s (best of 4). The
>12 min case becomes 5.8 s.
[1] A narrower optimization would be to make revert take the fast
path for '.' and '--all'.
This patch also replaces "self._getstorehashcachepath" (building
absolute path up) by "self._getstorehashcachename" (building relative
path up), because "vfs.writelines" requires relative path.
This patch uses "False" as default value of "notindexed" argument,
even though "vfs.makedir()" uses "True" for it, because "os.mkdir()"
doesn't set "_FILE_ATTRIBUTE_NOT_CONTENT_INDEXED" attribute to newly
created directories.
This patch also replaces "self._getstorehashcachepath" (building
absolute path up) by "self._getstorehashcachename" (building relative
path up), because "vfs.tryreadlines" requires relative path.
This patch makes "_readstorehashcache()" return "[]" (returned by
"vfs.tryreadlines()"), when cache file doesn't exist, even though
"_readstorehashcache()" returned '' (empty string) in such case before
this patch.
"_readstorehashcache()" is invoked only by the code path below in
"_storeclean()":
for filehash in self._readstorehashcache(path):
if filehash != itercache.next():
clean = False
break
In this case, "[]" and '' don't differ from each other, because both
of them cause avoiding iteration of "for loop".
This patch allows "readlines" and "tryreadlines" to take "mode"
argument, because "subrepo" requires to read files not in "rb"
(binary, default for vfs) but in "r" (text) mode in subsequent patch.
This "vfs" object will be used by subsequent patches to handle cache
store hash files without direct file APIs.
This patch decorates "_cachestorehashvfs" with "@propertycache" to
delay vfs creation, because it is used only for cooperation with other
repositories.
In this patch, "/" is used as the path separator, even though
"self._repo.join" uses platform specific path separator (e.g. "\\" on
Windows). But it is reasonable enough, because "store" and other
management file handling already include such implementation, and they
work well.
"_calcfilehash" can be completely replaced by simple "vfs.tryread"
invocation.
def _calcfilehash(filename):
data = ''
if os.path.exists(filename):
fd = open(filename, 'rb')
data = fd.read()
fd.close()
return util.sha1(data).hexdigest()
Building absolute path "absname" up by "self._repo.join" for files in
"filelist" is avoided, because "vfs.tryread" does so internally.
Existance of specified "path" should be examined by "exists" via wvfs
of the parent repository, because the working directory of the parent
repository may be in UTF-8 mode. Wide API should be used via wvfs in
such case.
In this patch, "/" is used as the path separator, even though "path"
uses platform specific path separator (e.g. "\\" on Windows). But it
is reasonable enough, because "store" and other management file
handling already include such implementation, and they work well.
"util.makedirs" for the (sub-)repository root of "hgsubrepo" is also
executed in the constructor of "localrepository", if "create" is True
and ".hg" of it doesn't exist.
This patch avoids redundant "util.makedirs" invocation in the
constructor of "hgsubrepo".
manifestmerge() has a piece of code that's roughly:
if not force and different:
abort
else:
# if different: old untracked f may be overwritten and lost
...
The comment only talks about what happens when 'different' is true,
and in combination with the if-block above, that must mean that it is
only about what happens when 'force and different'. It seems quite
fine that files are overwritten when 'force' is true, so let's remove
the comment. As it stands, it can easily be interpreted as a TODO
(which is how I interpreted it at first).
Such file are generated with a .pending prefix. It is up to the reader to
implement the necessary logic for reading pending files.
We add a test to ensure pending files are properly cleaned-up in both success and
error cases.
This will allow us to generate temporary pending files. Files
generated with a suffix are assumed temporary and will be cleaned up
at the end of the transaction.
As far as I and the test suite can tell, the checks in manifestmerge()
already report the errors (whether or not --check is given), so we
don't need to call merge.checkunknown(). Since this is the last call
to the method, also remove the method.
We were definitely being suboptimal here: we were constructing two full sets,
one with the full set of common nodes (i.e. a graph traversal) and one with all
nodes. Then we subtract one set from the other. This whole process is
O(commits) and causes discovery to be significantly slower than it should be.
Instead, keep track of common incrementally and keep undecided as small as
possible.
This makes discovery massively faster on large repos: on one such repo, 'hg
debugdiscovery' over SSH with one commit missing on the client and five on the
server went from 4.5 seconds to 1.5. (An 'hg debugdiscovery' with no commits
missing on the client, i.e. connection startup time, was 1.2 seconds.)
This allows multiple efficient missing ancestor queries against the same set of
bases. In upcoming patches we'll also define ways to grow the set of bases.
The fact that the test output hasn't changed establishes this patch's
correctness.
Any revs that are filtered out are also in basesvisit, which means they
wouldn't be returned in the missing list anyway. There's no need to explore
such revs or their ancestors.
The 'if not revsvisit' check moves down because we can't call max() on an empty
set.
We only actually care about whether revsvisit is empty, so we can let
basesvisit grow to arbitrary size.
It turns out that this actually helps performance. For a large repo with
hundreds of thousands of commits, hg perfrevset 'only(0, tip)' (basically the
worst case, involving a full DAG traversal) goes from 1.63 seconds to 1.50. hg
perfrevset 'only(tip, 0)' remains unchanged at 1.98 seconds.
Previously, any files relative to the root of the repo that match the -I
patterns would be deleted, but the command exited with 1 after printing a
warning:
$ hg remove -S -I 're:.*.txt' .
removing sub1/sub2/folder/test.txt
removing sub1/sub2/test.txt
not removing .: no tracked files
Like 'forget', git and svn subrepos are currently not supported. Unfortunately
the name 'remove' is already used in the subrepo classes, so we break the
convention of naming the subrepo function after the command.
Because pipe-mode server uses stdio as IPC channel, other modules should not
touch stdio directly and use ui instead. However, this strategy is brittle
because several Python functions read and write stdio implicitly.
print 'hello' # should use ui.write()
# => ch = 'h', size = 1701604463 'ello', data = '\n'
This patch adds protection for such mistakes. Both stdio files and low-level
file descriptors are redirected to /dev/null while command server uses them.
The method has been called from commands.py since 8d9ca2ac2fe8
(update: just merge unknown file collisions, 2012-02-09), so drop the
underscore prefix that suggests that it's private.
We are still doing double backups, but now that we have proper
location handling this is less of an issue. Dropping this simplifies
the code before we add some pending-related logic.
This also ensures we actually test the new 'location' mechanism.
The current naming scheme ('journal.backups.<file>') resulted is bad directory
name when 'file' was in a subdirectory. We now extract the directory name and
create the backupfile within it.
We plan to use file in a subdirectory for cachefile.
We do not want to abort if anything wrong happen while handling a cache file.
Cache file have way to be invalidated and if old/bad version stay no
misbehavior will happen. Proper value will eventually be computed and the wrong
will be righten.
This changeset use the transaction reporter (usually writing on stderr) to write
details about failed cache handling. This will only apply to write operation
using a transaction. The usual update during read only operation will stay a
debug message.
I was on the way to bring these message back to debug level when I realised it
could be a feature. People with write access to the repository are likely to
have the power to fix error related to cache (and it is valuable to fix them).
So let the things as is for now.
The goal is to allow access to file outside ofthe store directory from the
transaction. The obvious target are the `bookmarks` file. But we can envision
usage for cache too.
We keep passing a main opener explicitly because a lot of code rely on this
default opener. The main opener (operating on store) is using an empty key ''.
We need to store new data to improve the current transaction logic:
- location: We want to generate and backup file outside of the 'store' (eg:
bookmarks, or various cache files). This requires knowing and preserving where
each file is located. The value of this new field is a string. It will be used
as a key for a vfs mapping.
- cache: We would like to handle cache file in the transaction code. This
Will help to have cache consistent with the repository state and avoid
performance issue on big repository like Mozilla. However, failure to handle
cache file should not result in a transaction failure. We add a new field that
carry this information. The value is boolean, A True value mean any error
while handling this file can be ignored.
Those two mechanisms are not implemented yet, but they are now persisted in the
on disk file. Support for new mechanisms is coming in later changeset.
We update the file format now and will introduce the new features in later
changeset. The format version is set to 0 until we actually support the new feature.
This will prevent misunderstanding between incomplete and final client.
Support for reading both version 1 and (future) version 2 could be achieved
(using default value when reading version 1) but has not been seen as necessary
for now.
This dumb cache works surprisingly well: on a repository with typical delta
chains ~50k in length, unbundling a linear series of 5000 revisions (changelogs
and manifests only) went from 60 seconds to 3.
Move the code in context._manifestmatches() into a new
manifest.matches(). It's a natural place for the code to live and it
allows other callers to easily use it. It should also make it easier
to optimize the new method in alternative implementations of the
manifest (same reasoning as with manifest.diff()).
We can just modify the status tuple we got from dirstate.status()
instead of deconstructing it and constructing a new instance, thereby
simplifying the code a little.
Letting _dirstatestatus() return an scmutil.status instance also means
that _buildstatus() will always return such an instance, so we can
remove the conversion from the call sites.
It makes no sense to request reverse status (i.e. changes from the
working copy to its parent) and then look at the deleted, unknown or
ignored fields. If you do, you would get the result from the forward
status (changes from parent to the working copy). Instead of giving a
nonsensical answer to a nonsensical question, it seems a little saner
to return empty lists. It might be best if we could prevent the caller
accessing these lists, but it's doubtful it's worth the trouble.
See previous patch descriptions for the motivation.
The tests reflect the current state of the world -- as we add support we'll see
changes in the test output.
In an upcoming patch we'll enable support as an option to 'hg diff' as well.
The tests reflect the current state of the world -- as we add support we'll see
changes in the test output.
Upcoming patches will add an option that will almost certainly break diff
output parsers when enabled. Add support for forcing an option to something in
plain mode, as a fallback. Options passed in via the CLI are not affected,
though -- it is assumed that any script passing the option in explicitly knows
what it is doing.
By popular demand, we introduce an option to disable the 'a/' and 'b/'
prefixes in diff output. This makes copying and pasting filenames from diff
output easier.
This option will be implemented and documented in upcoming patches. To ensure
that existing scripts that parse output don't break, we will ensure that this
prefix is disabled in plain mode. A straight 'hg export | hg import' without
HGPLAIN=1 will still be broken though, but there's little that can be done
about that.
During the transaction, files may be created to store or expose data
involved in the transaction (eg: changelog index data are written in
a 'changelog.i.a' for hooks). But we do not have an official way to
record such file creation and make sure they are cleaned up. The lack
of clean-up is currently okay because there is a single file involved
and a single producer/consumer.
However, as we want to expose more data (bookmarks, phases, obsmarker)
we need something more solid. The 'backupentries' mechanism could
handle that. Temporary files can be encoded as a backup of nothing
'('', <temporarypath>)'. We "need" to attach it to the same mechanism
as we use to be able to use temporary transaction files outside of
.'store/' and 'backupentries' is expected to gain such feature.
This changeset makes it clear that we should rename 'backupentries' to
something more generic.
This doesn't affect normal clones since they'd be bound by the CPU bound below
anyway -- it does, however, improve generaldelta clones significantly.
This also results in better deltaing for generaldelta clones -- in generaldelta
clones, we calculate deltas with respect to the closest base if it has a higher
revision number than either parent. If the base is on a significantly different
branch, this can result in pointlessly massive deltas. This reduces the number
of bases and hence the number of bad deltas.
Empirically, for a highly branchy repository, this resulted in an improvement
of around 15% to manifest size.
This is a very silly case and not particularly likely to happen in the wild,
but it turns out we can hit it in a couple of places. As we tune the storage
parameters we're likely to hit more such cases.
The affected test cases all have smaller revlogs now.
We are about to use the 'backupentry' mechanism to allow cleaning up
transaction-related temporary files (such as 'changelog.i.a'). We start
by extracting the entry registration into its own method for easy reuse.
At that point, I would like to rename the backup-file related variable to
something generic but I'm a bit short of ideas.
This mirrors the API for 'pending' and 'finalize' callbacks. I do not have
immediate usage planned for it, but I'm sure some callback will be happy to
access transaction related data.
The callback will likely need to perform some operation related to the
transaction (eg: registering file update). So we better pass the current
transaction as the callback argument. Otherwise callback that needs it has to
rely on horrible weak reference trick.
This allow already allow us to slay a wild weak reference usage.
The callback will likely need to perform some operation related to the
transaction (eg: backing files up). So we better pass the current transaction as
the callback argument. Otherwise callback that needs it has to rely on horrible
weak reference trick.
The first foreseen user of this is changelog._writepending. We would like it to
register the temporary file it create for cleanup purpose.
The case where a backup of a missing file was requested was previously
handled by the 'entries' list. As the 'backupentries' is about to gain
ability to backup files outside of '.hg/store', we want it to be able
to handle the missing file too.
Reminder: using 'addbackup' on a missing file means that such file needs to be
deleted if we rollback the transaction.
This change is intended to avoid future problem of data corruption under
command server. out=ui.fout is mandatory as long as command server uses
stdout as IPC channel.
Future patches will allow extensions to choose which order a namespace should
output in the log, so we add a way for sortdict to insert to a specific
location.
Future patches will start using sortdict for log operations where order is
important. Adding iteritems removes the headache of having to remember to use
items() if the object is a sortdict.
After running "hg forget README && hg addremove", README will still be
reported as removed, while "hg forget README && hg add README" adds it
back so it gets reported as clean. It seems like they should behave
the same. Furthermore, it seems like no files should remain untracked
after 'hg addremove && hg commit' (or 'hg commit -A'). For these
reasons, change the behavior of addremove so it does add forgotten
files back.
The problem is with scmutil._interestingfiles(), which reports the
file as removed, so scmutil.addremove() does not add it. Fix by
teaching _interestingfiles() to report forgotten files separately from
removed files and make addremove() add forgotten files back. However,
do not treat forgotten files as sources for rename detection. Note
that since removed and forgotten files are treated the same before
this change, forgotten files were considered sources for rename
detection.
Also update the other caller, marktouched(), in the same way as
addremove().
I accidentally did 'hg forget .' and tried to undo the operation with
'hg add .'. I expected the files to be reported as either modified or
clean, but they were still reported as removed. It turns out that
forgotten files are only added back if they are listed explicitly, as
shown by the following two invocations. This makes it hard to recover
from the mistake of forgetting a lot of files.
$ hg forget README && hg add README && hg status -A README
C README
$ hg forget README && hg add . && hg status -A README
R README
The problem lies in cmdutil.add(). That method checks that the file
isn't already tracked before adding it, but it does so by checking the
dirstate, which does have an entry for forgotten files (state 'r'). We
should instead be checking whether the file exists in the
workingctx. The workingctx is also what we later call add() on, and
that method takes care of transforming the add() into a normallookup()
on the dirstate.
Since we're changing repo.dirstate into wctx, let's also change
repo.walk into wctx.walk for consistency (repo.walk calls wctx.walk,
so we're simply inlining the call).
The current heuristic for deciding between storing delta and full texts
is based on ratio of (sizeofdeltas)/(sizeoffulltext).
In some cases (for example a manifest for ahuge repo) this approach
can result in extremely long delta chains (~30,000) which are very slow to
read. (In the case of a manifest ~500ms are added to every hg command because of that).
This commit introduces "revlog.maxchainlength" configuration variable that will
limit delta chain length.
The chain length was computed correctly only when generaldelta
feature was enabled. Now it's fixed.
When generaldelta is disabled the base revision in revlog index is not
the revision we have delta against - it's always previous revision.
Instead of incorrect chainbaseandlen in command.py we are now using two
single-responsibility functions in revlog.py:
- chainbase(rev)
- chainlen(rev)
Only chainlen(rev) was missing so it was written to mimic the way the
chain of deltas is actually found during file reconstruction.
The `startgroup` and `endgroup` methods are used in a very specific
context to wrap a very specific operation (revlog truncation). It does
not make sense to perform any other operations during such a "group"
(eg:file backup). There is currently no user of backupfile during a
"group" so we drop the group-specific code and restrict authorized
operations during "group".