Encoding whether or not a part is mandatory in the capitalization of the
parttype is unintuitive and error-prone. This sequence of patches separates
these concerns in the API to reduce programmer error and pave the way for
a potential change in how this information is transmitted over the wire.
This patch separates the two pieces of information when reading the part header
so that it's unnecessary to know how they were combined during transmission.
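As a rough illustration of the separation described above, a reader could split the transmitted value into two fields as soon as the header is parsed. This is only a sketch, not the actual bundle2 code: the helper name 'splitparttype' is hypothetical, and it assumes that any uppercase character in the transmitted type marks the part as mandatory.

```python
def splitparttype(rawtype):
    """Split a transmitted part type into (parttype, mandatory).

    Assumption for illustration: a part is mandatory when its
    transmitted type contains at least one uppercase character.
    """
    parttype = rawtype.lower()
    mandatory = rawtype != parttype
    return parttype, mandatory


# Callers only ever see the normalized type plus an explicit flag.
print(splitparttype('CHANGEGROUP'))
print(splitparttype('changegroup'))
```

With this shape, code that handles a part never needs to inspect capitalization again, so the wire encoding can change without touching it.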
filectxfn returns None for removed files, so we have to check for None
before computing the new file content hash for the manifest.
Includes a test that proves this works, by demonstrating that we can
show the diff of an amended commit in the committemplate.
This method has the same behavior as the 'os.path.split' function, but having
it in vfs will allow handling of tricky encoding situations in the future.
In the same patch, we replace the use of 'os.path.split' in the transaction code.
The vfs.join method only works for absolute paths. We need something
that works for relative paths too when transforming filenames. Since
os.path.join may misbehave in tricky encoding situations, encapsulate
the new join method in our vfs abstraction. The default implementation
remains os.path.join, but this opens the door to other VFSes doing
something more intelligent based on their needs.
In the same go, we replace the usage of 'os.path.join' in transaction code.
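The two vfs helpers described above (a split and a relative join) can be pictured as thin wrappers over os.path. The class and method names below ('simplevfs', 'reljoin', 'split') are chosen for illustration only and are not taken from the patches themselves.

```python
import os


class simplevfs(object):
    """Toy path helper; a stand-in, not the real vfs class."""

    def reljoin(self, *paths):
        # Default behavior: defer to os.path.join. A subclass could
        # override this to cope with unusual filename encodings.
        return os.path.join(*paths)

    def split(self, path):
        # Same contract as os.path.split: returns (head, tail).
        return os.path.split(path)


v = simplevfs()
joined = v.reljoin('journal', 'backup')
print(v.split(joined))
```

Routing callers such as the transaction code through these methods means a different vfs can change the behavior in one place.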
This no longer needs to be explicitly passed because the subrepo object tracks
the 'ui' reference since d4e8aa61370d. See the change to 'archive' for details
about the differences between the output level in the root repo and subrepo 'ui'
object.
The only use for 'ui' in revert is to emit status and warning messages, and to
check the verbose flag prior to printing the action to be performed on a file.
The local repo's ui was already being used to print a warning message in
wctx.forget() and for 'ui.slash' when walking dirstate in the repo.status()
call. Unlike other methods where the matcher is passed along and narrowed, a
new matcher is created in each repo, and therefore the bad() method already used
the local repo's ui.
This no longer needs to be explicitly passed because the subrepo object tracks
the 'ui' reference since d4e8aa61370d. See the change to 'archive' for details
about the differences between the output level in the root repo and subrepo 'ui'
object.
The only use for 'ui' in remove is to emit status and warning messages, and to
check the verbose flag prior to printing files to be removed. The bad() method
on the matcher still uses the root repo's ui, because narrowing the matcher
doesn't change the ui object.
The local repo's ui was already being used to print a warning message in
wctx.forget() and for 'ui.slash' when walking dirstate in the repo.status()
call.
This no longer needs to be explicitly passed because the subrepo object tracks
the 'ui' reference since d4e8aa61370d. See the change to 'archive' for details
about the differences between the output level in the root repo and subrepo 'ui'
object.
The only use for 'ui' in forget is to emit status and warning messages, and to
check the verbose flag prior to printing files to be forgotten. The bad()
method on the matcher still uses the root repo's ui, because narrowing the
matcher doesn't change the ui object.
The local repo's ui was already being used to print a warning message in
wctx.forget() and for 'ui.slash' when walking dirstate in the repo.status()
call.
This no longer needs to be explicitly passed because the subrepo object tracks
a 'ui' reference since d4e8aa61370d. See the change to 'archive' for details
about the differences between the output level in the root repo and subrepo 'ui'
object.
The only use for 'ui' in cat is to emit a status message when a subrepo is
missing. The bad() method on the matcher still uses the root repo's ui, because
narrowing the matcher doesn't change the ui object.
The current state of subrepo methods is to pass a 'ui' object to some methods,
which has the effect of overriding the subrepo configuration since it is the
root repo's 'ui' that is passed along as deep as there are subrepos. Other
subrepo methods are *not* passed the root 'ui', and instead delegate to their
repo object's 'ui'. Even in the former case where the root 'ui' is available,
some methods are inconsistent in their use of both the root 'ui' and the local
repo's 'ui'. (Consider that hg._incoming() uses the root 'ui' for path expansion
and some status messages, but also calls bundlerepo.getremotechanges(), which
eventually calls discovery.findcommonincoming(), which calls
setdiscovery.findcommonheads(), which calls status() on the local repo 'ui'.)
This inconsistency with respect to the configured output level is probably
always hidden, because --verbose, --debug and --quiet, along with their 'ui.xxx'
equivalents in the global and user level hgrc files are propagated from the
parent repo to the subrepo via 'baseui'. The 'ui.xxx' settings in the parent
repo hgrc file are not propagated, but that seems like an unusual thing to set
on a per repo config file. Any 'ui.xxx' options changed by --config are also
not propagated, because they are set on repo.ui by dispatch.py, not repo.baseui.
The goal here is to clean up the subrepo methods by dropping the 'ui' parameter,
which in turn prevents mixing subtly different 'ui' instances on a given subrepo
level. Some methods use more than just the output level settings in 'ui' (add
for example ends up calling scmutil.checkportabilityalert() with both the root
and local repo's 'ui' at different points). This series just goes for the low
hanging fruit and switches methods that only use the output level.
If we really care about not letting a subrepo config override the root repo's
output level, we can propagate the verbose, debug and quiet settings to the
subrepo in the same way 'ui.commitsubrepos' is in hgsubrepo.__init__.
Archive only uses the 'ui' object to call its progress() method, and gitsubrepo
calls status().
Creation of the subrepo's '_repo' object creates a new 'ui' by combining the
parent repo's 'baseui' and reading in the subrepo's hgrc file. This simply
avoids 'self.ui' and 'self._repo.ui' pointing to different objects, which seems
like a potential source of bugs.
Git and Svn subrepos are unchanged, because they don't have their own ui, and
have always used their parent's for their configuration.
The localrepository class has a 'ui' member, so keeping the names the same will
allow for duck typing with subrepo instances when accessing 'ui'. Changing this
is easier than finding all of the localrepository instance uses and renaming
that to '_ui'.
Instead of just bootstrapping the algorithm with the first revision we
see, allow callers to pass revs that should be displayed first. All
branches are retained until we can display such a revision.
Expected usage is to display the current working copy parent first.
The algorithm now works when some revisions are skipped. We now use "first
included ancestors" instead of just "parent" to link changesets with each other.
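To make "first included ancestors" concrete, here is a small sketch under stated assumptions: 'parents' is a plain dict mapping a revision to its parent revisions, 'included' is the set of revisions being displayed, and the function name is hypothetical. Excluded revisions are walked through until an included one is reached on each path.

```python
def firstincludedancestors(rev, parents, included):
    """Return the nearest ancestors of 'rev' that are in 'included'.

    Excluded revisions are traversed; the walk stops on each path as
    soon as it reaches an included revision.
    """
    found = set()
    seen = set()
    stack = list(parents.get(rev, ()))
    while stack:
        p = stack.pop()
        if p in seen:
            continue
        seen.add(p)
        if p in included:
            found.add(p)  # do not look past an included revision
        else:
            stack.extend(parents.get(p, ()))
    return found


# 4 merges 2 and 3; both descend from 1, whose parent is 0.
parents = {0: [], 1: [0], 2: [1], 3: [1], 4: [2, 3]}
included = {0, 2, 4}
print(sorted(firstincludedancestors(4, parents, included)))
```

Here revision 4 links to 2 directly and to 0 through the skipped revisions 3 and 1.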
We are going to add an additional layer of indentation to support non-contiguous
revsets. We do it in a pure code movement changeset to help the readability of
the next changeset.
We add an experimental config option to use the topological sorting. I first
tried to hook the 'groupbranchiter' function in the 'sort' revset but this was useless
because graphlog enforces revision number sorting :(
As the goal is to advance on the topological iteration logic, I see this
experimental option as a good way to move forward.
We have to turn the iterator into a list because the graphlog is apparently
not ready for pure iterator input yet.
This changeset introduces a function to perform topological (one branch after
the other) iteration over a set of changesets. This first version has a lot of
limitations, but the approach should be flexible enough to allow many
improvements in the future. This changeset aims to set the first stone more
than providing a complete solution.
The algorithm does not need to know the whole set of nodes involved
before emitting revisions. This makes it a good candidate for usage in places
like `hg log` or graphical tools that need a fast first result time.
Note that the exception-catching from the previous branchtip check is moved up
to catch exceptions from the try block surrounding the namespace lookup.
It turns out that maintaining a reference of any sort (even weak!) to the repo
when constructed doesn't work because we may at some point pass in a repoview
filtered by something other than what the initial repo was.
This marks the first use of abstracting our different types of named objects
(bookmarks, tags, branches, etc.) and upcoming patches will use this to
simplify logic.
This patch begins the work to provide a way to register a namespace to handle
'names'. The benefits would be:
- improved templating: This would provide {name}, which could output any
  branch, bookmark, tag, or any extension-registered namespace, all without
  the extension doing any extra work
- improved tab completion: Since this provides a single source of all 'names',
  tab completion would not need to know of each namespace
- changeset lookup: Similarly, a unified place to get all 'names' will allow
  finding changesets without any extension code having to reimplement this
Also, 6c946e059d3b has shown us that for internal code which expects a certain
type of method or behavior, we should provide an easy way for extensions to
check this behavior.
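A registry of this kind might look roughly like the following. It is a sketch only: the class name 'nameregistry', its methods, and the example node strings are all hypothetical, and each namespace is assumed to be a function mapping a name to a list of nodes.

```python
class nameregistry(object):
    """Toy registry of namespaces (bookmarks, tags, branches, ...)."""

    def __init__(self):
        self._lookups = {}  # namespace -> function(name) -> list of nodes
        self._order = []    # registration order decides lookup priority

    def addnamespace(self, namespace, lookup):
        self._lookups[namespace] = lookup
        self._order.append(namespace)

    def singlenode(self, name):
        """Return (namespace, nodes) for the first namespace that knows 'name'."""
        for namespace in self._order:
            nodes = self._lookups[namespace](name)
            if nodes:
                return namespace, nodes
        raise KeyError(name)


bookmarks = {'feature': ['abc123']}
tags = {'v1.0': ['def456']}

registry = nameregistry()
registry.addnamespace('bookmarks', lambda n: bookmarks.get(n, []))
registry.addnamespace('tags', lambda n: tags.get(n, []))
print(registry.singlenode('v1.0'))
```

An extension would only need to call the registration method; templating, completion and lookup could then iterate over the registry without knowing about the extension.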
New code paths could fail because the old statichttprepo profile couldn't
handle the usual parameters.
Instead, reuse a more generic profile also used in readonlyvfs.
Previously, git subrepos did not support reverting.
This change adds basic support for reverting
when '--no-backup' is specified.
A warning is given (and the current state is kept)
when a revert is done without the '--no-backup' flag.
Without this patch, if the server sets preferuncompressed, there's no way for
clients to override that and force a non-streaming clone. With this patch, we
extend the meaning of --pull to also override preferuncompressed and force a
non-streaming clone.
When there are multiple common ancestors, we should check for case
collisions only on the resulting actions after bid merge has run. To
do this, move the code until after bid merge.
Move it past _resolvetrivial() too, since that might update
actions. If the remote changed a file and then reverted the change,
while the local side deleted the file and created a new file with a
name that case-folds like the old file, we would fail before this
patch but not after.
Although the changes to the actions caused by _forgetremoved() should
have no effect on case collisions, move it after that, too, so the
next person reading the code won't have to think about it.
Moving it past these blocks of code takes it to the end of
calculateupdates(), so let's even move it outside of the method, so we
also check collisions in actions produced by extensions overriding the
method.
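The check itself is conceptually simple once the final set of files is known. The sketch below is not the merge code: the function name is hypothetical, and str.lower() stands in for whatever case folding the real check uses.

```python
def findcasecollisions(files):
    """Return pairs of filenames that differ only by case.

    'files' is the set of paths expected to exist once all actions
    (after bid merge, trivial resolution, etc.) have been applied.
    """
    seen = {}
    collisions = []
    for f in sorted(files):
        folded = f.lower()  # simplification of real case folding
        if folded in seen:
            collisions.append((seen[folded], f))
        else:
            seen[folded] = f
    return collisions


print(findcasecollisions(['README', 'readme', 'src/a.py']))
```

Running it on the final actions, rather than on intermediate ones, avoids reporting collisions for files that a later step would have dropped anyway.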
By moving the cd/dc prompts out of calculateupdates(), we let
largefiles' overridecalculateupdates() see the unresolved values
(i.e. 'cd' or 'dc' rather than 'g', 'r', 'a' and missing). This allows
overridecalculateupdates() to ask the user whether to keep the normal
file or the largefile before the user gets the cd/dc prompt. Whichever
answer the user gives, we make overridecalculateupdates() replace the
'cd' or 'dc' action, saving the user one annoying (and less clear)
question.
Since addremove on the top of a directory tree will recursively handle sub
directories, it should be the same with deep subrepos, once the user has
explicitly asked to process a subrepo. This really only has an effect when a
path that is a subrepo (or is in a subrepo) is given, since -S causes all
subrepos to be processed already. An addremove without a path that crosses into
a subrepo, will still not enter any subrepos, per backward compatibility rules.
Git and svn subrepos are currently not supported. It doesn't look like git or
svn have these commands natively, so that's an area for a git or svn expert.
The recursive addremove operation occurs completely before the first subrepo is
committed. Only hg subrepos support the addremove operation at the moment; svn
and git subrepos will warn and abort the commit.
This will be used in the next patch to print a warning from the base class. It
seems better than having to explicitly pass it to a new method, since a lot of
existing methods also require it.
It looks like a bad path is the only mode of failure for addremove. This
warning is probably useful for the standalone command, but more important for
'commit -A'. That command doesn't currently abort if the addremove fails, but
it will be made to do so prior to adding subrepo support, since not all subrepos
will support addremove. We could just abort here, but it looks like addremove
has always silently ignored bad paths, except for the exit code.
We would eventually like to move the resolution of modify/delete and
delete/modify conflicts to the resolve phase. However, we don't want
to move the checks for identical content that were added in
99b29d2bd5ed (merge: before cd/dc prompt, check that changed side
really changed, 2014-12-01). Let's instead move these out to a new
_resolvetrivial() function that processes the actions from
manifestmerge() and replaces any false cd/dc conflicts. The function
will also provide a natural place for us to later add code for
resolving false 'm' conflicts.
As preparation for making 'dr' and 'rd' actions no longer actions,
move the reporting from applyupdates() to its caller update(). This
way we won't have to pass additional arguments to applyupdates() when
they are no longer actions. Also, the warnings are equally unrelated
to applyupdates() as they are to recordupdates(), as they don't result
in any changes to either the working copy or the dirstate.
See earlier patch for additional motivation.
It is easier to reason about certain algorithms in terms of a
file->action mapping than the current action->list-of-files. Bid merge
is already written this way (but with a list of actions per file), and
largefiles' overridecalculateupdates() will also benefit. However,
that requires us to have at most one action per file. That requirement
is currently violated by 'dr' (divergent rename) and 'rd' (rename and
delete) actions, which can exist for the same file as some other
action.
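The two shapes, and why the one-action-per-file requirement matters, can be shown with a small sketch. It is simplified on purpose: real actions carry arguments and messages, whereas here each action maps to bare filenames, and the function name is hypothetical.

```python
def tofilemap(actions):
    """Invert {action: [files]} into {file: action}.

    Raises ValueError when a file has more than one action, which is
    exactly the situation 'dr' and 'rd' can create.
    """
    filemap = {}
    for action, files in sorted(actions.items()):
        for f in files:
            if f in filemap:
                raise ValueError('%s has both %r and %r'
                                 % (f, filemap[f], action))
            filemap[f] = action
    return filemap


print(tofilemap({'g': ['a.txt'], 'r': ['b.txt']}))

# A divergent-rename annotation on a file that is also fetched
# cannot be represented in the file->action form.
try:
    tofilemap({'g': ['a.txt'], 'dr': ['a.txt']})
except ValueError as exc:
    print('conflict:', exc)
```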
These actions are only used for displaying warnings to the user; they
don't change anything in the working copy or the dirstate. In this
way, they are similar to the 'k' (keep) action. However, they are even
less action-like than 'k' is: 'k' at least describes what to do with
the file ("do nothing"), while 'dr' and 'rd' are only annotations for
files for which there may exist other, "real" actions.
As a first step towards separating these actions out, stop including
them in the progress output, just like we already exclude the 'k'
action.
So far, git subrepositories were silently ignored for diffs.
This patch adds support for git subrepositories,
with the remark that --include and --exclude are not supported.
If --include or --exclude are used, the subrepo is ignored.
Using 'addfinalize' to generate 'fncache' means that no pending version of the
file will be generated for the hooks. We would have to use the
'addfilegenerator' method to get such a result. However, the 'fncachevfs' (which
decides that a write is necessary) has no access to the transaction to register
such file generation at add time. Having the transaction accessible to the 'vfs'
is too much trouble for no benefit. This outdated 'fncache' file at hook time is
not expected to be an issue.
The previous move from 'onclose' to 'addfinalize' had no impact on this timing.
I'm documenting it now because I looked at it.
It was just showing a status message with the internal revision number.
Instead, show a warning like
  note: graft of 27:3aaa8b6725f0 "28" created no changes to commit
(message tweaked in-flight by mpm)
Show status messages with first line of commit description and names, like
  grafting 12:2647734878ef "fork" (tip)
This gives more context for the user when resolving conflicts.
It doesn't seem to be a common idiom for repo instances, but the status() method
is replaced in largefiles' purge() override. Since __setattr__ is implemented
in repoview to setattr() on the unfiltered repo, the replacement method wouldn't
get called unless it was invoked with the unfiltered repo, because the filtered
repo remains unchanged.
Since this doesn't seem to be commonly used, I didn't bother to filter out
methods that perhaps shouldn't be replaced, such as changelog().
We have two different types of node type (sha1 and sha256, only sha1 is used
now) and therefore different sizes for them. We now compute the value once
instead of redoing the computation every loop. This has no visible performance
impact.
Python garbage collection is triggered by container creation. So code that
creates a lot of tuples tends to trigger GC a lot. We disable the gc during
obsolescence marker parsing and associated initialization. This provides an
interesting speedup (25%).
Load marker function on my 58758 markers repo:
before: 0.468247 seconds
after: 0.344362 seconds
The benefit is a bit less visible overall. With python2.6 on my system I see:
before: 0.60
after: 0.53
The difference is probably explained by the delaying of a costly GC (but there
is still a win). Marking involved tuples, lists and dicts as ignorable by the
garbage collector should give us more benefit. But this is another adventure.
Thanks go to Siddharth Agarwal for the lead.
Garbage collection behaves pathologically when creating a lot of containers. As
we do that more than once, it becomes sensible to have a decorator for it. See
inline documentation for details.
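A decorator of this kind can be written with the standard 'gc' module alone. The version below is a sketch of the idea rather than the code added by the patch; the name 'nogc' is used here for illustration.

```python
import functools
import gc


def nogc(func):
    """Run 'func' with the garbage collector disabled.

    The collector is re-enabled afterwards only if it was enabled on
    entry, so nested use and callers that disabled GC are respected.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wasenabled = gc.isenabled()
        gc.disable()
        try:
            return func(*args, **kwargs)
        finally:
            if wasenabled:
                gc.enable()
    return wrapper


@nogc
def buildmarkers(count):
    # Creating many tuples is the kind of workload that would
    # otherwise trigger repeated collections.
    return [(i, i + 1, ()) for i in range(count)]


print(len(buildmarkers(1000)))
```

The try/finally matters: without it, an exception during parsing would leave the collector off for the rest of the process.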
Most merge action messages don't describe the action itself, they
describe the reason the action was taken. The only exception is the 'k'
action, for which the message is just "keep" and instead there is a
code comment following it that says "remote unchanged". Let's move that
comment into the merge action message.
This fixes the previously mentioned issue with 7d5fcea60c78, and undoes its
corresponding test change.
The test change demonstrates the correctness when a file is specified (i.e. the
glob is required on Windows because relative paths use '\' and absolute paths
use '/'). It is admittedly very subtle, but there will be a more robust test in
the addremove -S v3 series.
Several methods print files relative to the repo root, unless files are named on
the command line, in which case they are printed relative to cwd. Since the
check relies on the 'pats' parameter, which needs to be replaced by a matcher
when adding subrepo support, this logic gets folded into the matcher to tidy up
the callers.
Prior to 7d5fcea60c78, this style decision was based off of whether or not the
'pats' list was empty. That change altered the check to test match.anypats()
instead, in order to make paths printed consistent when -I/-X is specified.
That, however, changed the style when a file is given to the command. So now we
test the pattern list to get the old behavior for files, as well as test -I/-X
to get the consistency for patterns.
When the local side has renamed a directory from a/ to b/ and added a
file b/c in it, and the remote side has added a file a/c, we end up
overwriting the local file b/c with the contents of remote file
a/c. Add a check for this case and use the merge ('m') action in this
case instead of the directory rename get ('dg') action.
When the remote side has renamed a directory from a/ to b/ and added a
file b/c in it, and the local side has added a file a/c, we end up
moving a/c to b/c without considering the remote version of b/c. Add a
check for this case and use the merge ('m') action in this case
instead of the directory rename ('dm') action.
There are three high-level cases that are of interest in
manifestmerge(): 1) The file exists on both sides, 2) The file exists
only on the local side, and 3) The file exists only on the remote
side. Let's make this clearer in the code.
The 'if f in copied' case will be broken up into the two applicable
branches in the next patch.
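The three high-level cases amount to a partition of the union of both manifests. The sketch below only shows that partition, with manifests reduced to plain dicts of filename to node; the function name is hypothetical and none of the per-case merge logic is included.

```python
def classifyfiles(local, remote):
    """Split filenames into (both, localonly, remoteonly)."""
    both, localonly, remoteonly = [], [], []
    for f in sorted(set(local) | set(remote)):
        if f in local and f in remote:
            both.append(f)        # case 1: exists on both sides
        elif f in local:
            localonly.append(f)   # case 2: exists only locally
        else:
            remoteonly.append(f)  # case 3: exists only remotely
    return both, localonly, remoteonly


local = {'a': 'n1', 'b': 'n2'}
remote = {'b': 'n3', 'c': 'n4'}
print(classifyfiles(local, remote))
```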
Currently, the revlog index C implementation assumes its node tree will be
initialized before a new element is inserted by revnum. For example, revlog.py
executes 'self.index.insert(-1, e)' in _addrevision(). This is only safe
because the node tree has been initialized by a "node in self.nodemap"
check made in addrevision().
(For context, this was discovered while developing an experimental revlog
mixin which stores "elided nodes" via a separate code path from
_addrevision(); that new code path segfaults without this patch.)
The order is determined by manifest.diff(), which currently is not
sorted. There are currently no tests for this, but we will soon add
some that would be flaky without this patch.
This patch series is intended to allow bundle2 push reply part handlers to
make changes to the local repository; it has been developed in parallel with
an extension that allows the server to rebase incoming changesets while applying
them.
This diff adds an experimental config option "bundle2.pushback" which provides
a transaction to the reply unbundler during a push operation. This behavior is
opt-in because of potential security issues: the response can contain any part
type that has a handler defined, allowing the server to make arbitrary changes
to the local repository.
This patch series is intended to allow bundle2 push reply part handlers to
make changes to the local repository; it has been developed in parallel with
an extension that allows the server to rebase incoming changesets while applying
them.
The default transaction getter for processbundle is a private function that
raises an exception; this diff lets calling code pass None as the transaction
getter to explicitly request this default behavior.
The next diff will check a config option to determine whether to provide a
transaction to the reply bundle processor. If one shouldn't be provided, the
code needs a way to specify that the default behavior should be used.
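The "None means default" convention can be sketched as follows. Everything here is simplified for illustration: the exception class, the shape of 'parts', and the reduced signature are assumptions, not the real processbundle() interface.

```python
class TransactionUnavailable(RuntimeError):
    """Raised when a part needs a transaction but none may be opened."""


def _notransaction():
    # Default getter: refuse to hand out a transaction.
    raise TransactionUnavailable('no transaction available')


def processbundle(parts, transactiongetter=None):
    """Process (needstransaction, name) pairs; hypothetical shape."""
    if transactiongetter is None:
        # Callers can pass None to explicitly request the default.
        transactiongetter = _notransaction
    results = []
    for needstransaction, name in parts:
        tr = transactiongetter() if needstransaction else None
        results.append((name, tr))
    return results


print(processbundle([(False, 'output')]))
print(processbundle([(True, 'changegroup')], lambda: 'tr'))
```

A caller that decides at runtime whether to allow repository changes can then write a single call and pass either a real getter or None.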
This patch series is intended to allow bundle2 push reply part handlers to
make changes to the local repository; it has been developed in parallel with
an extension that allows the server to rebase incoming changesets while applying
them.
Most pushes already open a transaction in order to sync phase information.
This diff replaces that transaction with one that spans the entire push
operation.
This transaction will be used in a later patch to guard repository changes
made during the reply handler.
This patch series is intended to allow bundle2 push reply part handlers to
make changes to the local repository; it has been developed in parallel with
an extension that allows the server to rebase incoming changesets while applying
them.
Aside from the transaction logic, the pulloperation class is used primarily as
a logic-free data structure for storing state information. This diff extracts
the transaction logic into its own class that can be shared with push
operations.
These aren't exactly format-breaking features -- just ones for which patches
applied to a repo will produce incorrect commits. In any case, some commands
like record and annotate only care about this feature.
Not all callers are interested in all diffopts -- for example, commands like
record (which use diff internally) break when diffopts like noprefix are
enabled. This function will allow us to add flags that callers can use to
enable only the features they're interested in.
For "hg addremove 'glob:*.py'", we print any paths added or removed as
relative to the current directory, but when "hg addremove -I
'glob:*.py'" is used, we use the absolute path (relative from the repo
root). It seems like they should be the same, so change it so we use
relative paths in both cases. Continue to use absolute paths when no
patterns are given.
Instead of using a file that we know is not in the common ancestor's
manifest, let's use None. This is safe as the only place that cares
about the value (applyupdates) already checks if the item exists in
the ancestor.
We can further limit the scope of the 2-way merge case by breaking out
the case where the file was not created from scratch on both sides but
rather renamed in the same way (and is therefore a 3-way merge). This
involves copying some code, but it makes it clearer which case the
"Note:" in the code refers to.
When 'f' is not in 'ma', 'a' will be 'nullid' and all the if/elif
conditions that check whether some nodeid is equal to 'a' will
fail, and the else-clause will instead apply. We can make that more
explicit by creating a separate 'm' action for the case where 'a' is
'nullid'. While it does mean copying some code, perhaps it makes it a
little clearer which codepaths are possible, and which cases the
"Note:" in the code refers to. It also lets us make the debug action
messages a little more specific.
The change in 02ecc94fb657 created a problem on Windows and OS X:
  --- /usr/local/mercurial/tests/test-issue660.t
  +++ /usr/local/mercurial/tests/test-issue660.t.err
  @@ -47,6 +47,8 @@
   Should succeed - shadow removed:
     $ hg add b
  +  adding b/b
  +  b/b does not exist!
Prior to the failing 'hg add', the file 'b/b' was added and committed, then 'b'
was recursively deleted from the filesystem, file 'b' was created and the delete
was recorded with 'hg rm --after'. This add is attempting to record the
existence of file 'b'.
A filesystem that is not case sensitive prevents dirstate.walk() from skipping
its step 3, and step 3 has the effect of inserting removed files into the walk
list. The Linux code doesn't run through step 3, and didn't exhibit the
problem. It's not clear why a non case sensitive filesystem triggers step 3,
given that the path normalization occurs in step 2.
Prior to 02ecc94fb657, part of the check here was 'f not in repo.dirstate'
instead of 'f not in wctx'. Files in the 'r' state are filtered out of
context.__contains__() but not dirstate.__contains__(). Therefore the removed
file name wasn't added to the list of files to add when checking against
dirstate. That change was to allow removed files to be readded, but adding a
file that doesn't exist is nonsensical. If the user specifies a missing file,
it will be an exact match and will still fail.
Since 4a56fba99974 (merge: don't use unknown(), 2012-02-09), untracked
files are no longer included in the manifest diff, so there is no need
to exclude them when renaming files for directory moves with the
'dm' action.
calculateupdates() happens before applyupdates(), so move it before in
the code. That also moves it close to manifestmerge(), which is a good
location as calculateupdates() is the only caller of manifestmerge().
In a mozilla repo with tip at bb3ff09f52fe,
hg update tip~1000 && time hg revert -nq -r tip .
displays ~4:20 minutes. With tip~100, it runs in ~11 s. With revision
100000, it did not finish in 12 minutes.
Revert calls dirstate.status() with a matcher that matches each file
in the target revision. The main problem [1] lies in
dirstate._walkexplicit(), which looks for matching deleted directories
by checking whether each path is a prefix of any path in the
dirstate. With m files in the dirstate and n files in the target
revision that are not in the dirstate, this is clearly O(m*n). Let's
improve by keeping a lazily initialized set of all the directories in
the dirstate, so the time becomes O(m+n).
After this patch, the 4:20 minutes become 5.5 s, while for a single
missing path, it slows down from 1.092 s to 1.150 s (best of 4). The
>12 min case becomes 5.8 s.
[1] A narrower optimization would be to make revert take the fast
path for '.' and '--all'.
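The directory-set idea can be sketched in a few lines. This is an illustration rather than the dirstate code: the function name is hypothetical, paths are assumed to use '/' as separator, and the set is built eagerly here instead of lazily.

```python
def dirsof(files):
    """Return the set of all directories containing any of 'files'."""
    dirs = set()
    for f in files:
        pos = f.rfind('/')
        while pos != -1:
            d = f[:pos]
            if d in dirs:
                break  # every shorter prefix was added along with d
            dirs.add(d)
            pos = f.rfind('/', 0, pos)
    return dirs


tracked = ['a/b/c.txt', 'a/d.txt', 'e.txt']
dirs = dirsof(tracked)

# Each candidate is now a single set lookup instead of a scan over
# every tracked file.
candidates = ['a', 'a/b', 'x']
print([p for p in candidates if p in dirs])
```

Building the set costs one pass over the tracked files, and each of the n lookups is a hash probe, which is where the O(m+n) behavior comes from.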
This patch also replaces "self._getstorehashcachepath" (building
absolute path up) by "self._getstorehashcachename" (building relative
path up), because "vfs.writelines" requires relative path.
This patch uses "False" as default value of "notindexed" argument,
even though "vfs.makedir()" uses "True" for it, because "os.mkdir()"
doesn't set "_FILE_ATTRIBUTE_NOT_CONTENT_INDEXED" attribute to newly
created directories.
This patch also replaces "self._getstorehashcachepath" (building
absolute path up) by "self._getstorehashcachename" (building relative
path up), because "vfs.tryreadlines" requires relative path.
This patch makes "_readstorehashcache()" return "[]" (returned by
"vfs.tryreadlines()"), when cache file doesn't exist, even though
"_readstorehashcache()" returned '' (empty string) in such case before
this patch.
"_readstorehashcache()" is invoked only by the code path below in
"_storeclean()":
  for filehash in self._readstorehashcache(path):
      if filehash != itercache.next():
          clean = False
          break
In this case, "[]" and '' don't differ from each other, because both
of them cause the body of the "for" loop to be skipped entirely.
This patch allows "readlines" and "tryreadlines" to take a "mode"
argument, because "subrepo" needs to read files not in "rb"
(binary, default for vfs) but in "r" (text) mode in a subsequent patch.
This "vfs" object will be used by subsequent patches to handle cache
store hash files without direct file APIs.
This patch decorates "_cachestorehashvfs" with "@propertycache" to
delay vfs creation, because it is used only for cooperation with other
repositories.
In this patch, "/" is used as the path separator, even though
"self._repo.join" uses platform specific path separator (e.g. "\\" on
Windows). But it is reasonable enough, because "store" and other
management file handling already include such implementation, and they
work well.
"_calcfilehash" can be completely replaced by simple "vfs.tryread"
invocation.
  def _calcfilehash(filename):
      data = ''
      if os.path.exists(filename):
          fd = open(filename, 'rb')
          data = fd.read()
          fd.close()
      return util.sha1(data).hexdigest()
Building absolute path "absname" up by "self._repo.join" for files in
"filelist" is avoided, because "vfs.tryread" does so internally.
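For comparison, the replacement pattern can be sketched with a standalone "tryread" helper. This is not the vfs implementation: the functions below are hypothetical stand-ins written against hashlib and plain file APIs, assuming that "tryread" returns empty content when the file does not exist.

```python
import errno
import hashlib
import os
import tempfile


def tryread(path):
    """Return the file's bytes, or b'' if the file does not exist."""
    try:
        with open(path, 'rb') as fp:
            return fp.read()
    except IOError as inst:
        if inst.errno != errno.ENOENT:
            raise
    return b''


def filehash(path):
    # No separate existence check is needed: a missing file simply
    # hashes as empty content.
    return hashlib.sha1(tryread(path)).hexdigest()


tmpdir = tempfile.mkdtemp()
missing = os.path.join(tmpdir, 'missing')
print(filehash(missing) == hashlib.sha1(b'').hexdigest())
```

Besides being shorter, this avoids the window between the existence check and the open in the original helper.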
Existence of specified "path" should be examined by "exists" via wvfs
of the parent repository, because the working directory of the parent
repository may be in UTF-8 mode. Wide API should be used via wvfs in
such case.
In this patch, "/" is used as the path separator, even though "path"
uses platform specific path separator (e.g. "\\" on Windows). But it
is reasonable enough, because "store" and other management file
handling already include such implementation, and they work well.
"util.makedirs" for the (sub-)repository root of "hgsubrepo" is also
executed in the constructor of "localrepository", if "create" is True
and ".hg" of it doesn't exist.
This patch avoids redundant "util.makedirs" invocation in the
constructor of "hgsubrepo".
manifestmerge() has a piece of code that's roughly:
  if not force and different:
      abort
  else:
      # if different: old untracked f may be overwritten and lost
      ...
The comment only talks about what happens when 'different' is true,
and in combination with the if-block above, that must mean that it is
only about what happens when 'force and different'. It seems quite
fine that files are overwritten when 'force' is true, so let's remove
the comment. As it stands, it can easily be interpreted as a TODO
(which is how I interpreted it at first).
Such files are generated with a .pending suffix. It is up to the reader to
implement the necessary logic for reading pending files.
We add a test to ensure pending files are properly cleaned-up in both success and
error cases.
This will allow us to generate temporary pending files. Files
generated with a suffix are assumed temporary and will be cleaned up
at the end of the transaction.
As far as I and the test suite can tell, the checks in manifestmerge()
already report the errors (whether or not --check is given), so we
don't need to call merge.checkunknown(). Since this is the last call
to the method, also remove the method.
We were definitely being suboptimal here: we were constructing two full sets,
one with the full set of common nodes (i.e. a graph traversal) and one with all
nodes. Then we subtract one set from the other. This whole process is
O(commits) and causes discovery to be significantly slower than it should be.
Instead, keep track of common incrementally and keep undecided as small as
possible.
This makes discovery massively faster on large repos: on one such repo, 'hg
debugdiscovery' over SSH with one commit missing on the client and five on the
server went from 4.5 seconds to 1.5. (An 'hg debugdiscovery' with no commits
missing on the client, i.e. connection startup time, was 1.2 seconds.)
This allows multiple efficient missing ancestor queries against the same set of
bases. In upcoming patches we'll also define ways to grow the set of bases.
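The shape of such an object (fix the bases once, then answer many queries) can be sketched naively as below. This is not the algorithm used by the patch: the class name, the 'addbases' method, and the dict-based 'parents' mapping are assumptions, and the traversal is a plain walk with none of the real optimizations.

```python
class basesquery(object):
    """Answer 'which ancestors of revs are not ancestors of bases?'."""

    def __init__(self, parents, bases):
        self._parents = parents  # rev -> list of parent revs
        self._covered = set()    # bases and all of their ancestors
        self.addbases(bases)

    def addbases(self, bases):
        # Growing the bases only walks revisions not yet covered.
        stack = list(bases)
        while stack:
            rev = stack.pop()
            if rev in self._covered:
                continue
            self._covered.add(rev)
            stack.extend(self._parents[rev])

    def missing(self, revs):
        missing = set()
        stack = list(revs)
        while stack:
            rev = stack.pop()
            # A covered rev has only covered ancestors, so stop there.
            if rev in self._covered or rev in missing:
                continue
            missing.add(rev)
            stack.extend(self._parents[rev])
        return sorted(missing)


dag = {0: [], 1: [0], 2: [1], 3: [1], 4: [2, 3]}
query = basesquery(dag, [2])
print(query.missing([4]))
print(query.missing([3]))
```

The work done for the bases is shared by both queries, which is the benefit described above.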
The fact that the test output hasn't changed establishes this patch's
correctness.
Any revs that are filtered out are also in basesvisit, which means they
wouldn't be returned in the missing list anyway. There's no need to explore
such revs or their ancestors.
The 'if not revsvisit' check moves down because we can't call max() on an empty
set.
We only actually care about whether revsvisit is empty, so we can let
basesvisit grow to arbitrary size.
It turns out that this actually helps performance. For a large repo with
hundreds of thousands of commits, hg perfrevset 'only(0, tip)' (basically the
worst case, involving a full DAG traversal) goes from 1.63 seconds to 1.50. hg
perfrevset 'only(tip, 0)' remains unchanged at 1.98 seconds.