Conversion of a merge starts with p1 and re-adds the files that were changed in
the merge or came unmodified from p2. Files that are unmodified from p1 will
thus not be touched and take no time. Files that are unmodified from p2 would be
retrieved and rehashed. They would end up getting the same hash as in p2 and end
up reusing the filelog entry and look like the p1 case ... but it was slow.
Instead, make getchanges also return 'files that are unmodified from p2' so the
sink can reuse the existing p2 entry instead of calling getfile.
Reuse of filelog entries can make a big difference when files are big and with
long revlong chains so they take time to retrieve and hash, or when using an
expensive custom getfile function (think
http://mercurial.selenic.com/wiki/ConvertExtension#Customization with a code
reformatter).
This in combination with changes to reuse filectx entries in
localrepo._filecommit make 'unchanged from p2' almost as fast as 'unchanged
from p1'.
This is so far only implemented for the combination of hg source and hg sink.
This is a refactoring/optimization. It is covered by existing tests and show no
changes - which is a good thing.
This function doesn't need access to any of the args or kwargs, so make the
monkeypatching more robust. (In upcoming patches we'll introduce another
argument to patch.diff, and this function would break if it weren't for this
patch.)
"working directory" is the standard term, we should use it consistently.
But I didn't touch the hint, "run 'hg update' to get a working copy", because
"get a working directory" sounds a bit odd.
The censor command is a core extension which can replace the contents of a
historical file revision with a censor "tombstone" which can be exchanged
with older clients in place of the real revision data. The command rewrites
the filelog by copying revision-by-revision.
Care must be taken to expand the fulltext of the children of the censored
revision before copying them to the new filelog; they might be stored as
deltas against the uncensored revision, and those deltas will be invalidated.
For more background on the censorship feature design, see:
http://mercurial.selenic.com/wiki/CensorPlan
For merges, we walk the files N-1 times, where N is the number of
parents. This means that for an octopus merge with 3 parents and 2
changed files, we actually fetch 6 files. This corrects the progress
output of the convert command when such commits are encountered.
Although Python supports `X = Y if COND else Z`, this was only
introduced in Python 2.5. Since we have to support Python 2.4, it was
a very common thing to write instead `X = COND and Y or Z`, which is a
bit obscure at a glance. It requires some intricate knowledge of
Python to understand how to parse these one-liners.
We change instead all of these one-liners to 4-liners. This was
executed with the following perlism:
find -name "*.py" -exec perl -pi -e 's,(\s*)([\.\w]+) = \(?(\S+)\s+and\s+(\S*)\)?\s+or\s+(\S*)$,$1if $3:\n$1 $2 = $4\n$1else:\n$1 $2 = $5,' {} \;
I tweaked the following cases from the automatic Perl output:
prev = (parents and parents[0]) or nullid
port = (use_ssl and 443 or 80)
cwd = (pats and repo.getcwd()) or ''
rename = fctx and webutil.renamelink(fctx) or []
ctx = fctx and fctx or ctx
self.base = (mapfile and os.path.dirname(mapfile)) or ''
I also added some newlines wherever they seemd appropriate for readability
There are probably a few ersatz ternary operators still in the code
somewhere, lurking away from the power of a simple regex.
After adding some sneaky debug printing[0], I determined that this
test flaked when a CVS commit containing two files starts too close to
the end of a second, thus putting file "a" in one second and "b/c" in
the following second. The secondary sort key meant that these changes
sorted in a different order when the timestamps were different than
they did when they matched. As far as I can tell, CVS walks through
the files in a stable order, so by sorting on the filenames in cvsps
we'll get stable output. It's fine for us to switch from sorting on
the branchpoint as a secondary key because this was already the point
when we didn't care, and we're just trying to break ties in a stable
way. It's unclear to be if having the branchpoint present matters
anymore, but it doesn't really hurt to leave it.
With this change in place, I was able to run test-convert-cvs over 650
times in a row without a failure. test-convert-cvcs-synthetic.t
appears to still be flaky, but I don't think it's *worse* than it was
before - just not better (I observed one flaky failure in 200 runs on
that test).
0: The helpful debug hack ended up being this, in case it's useful to
future flaky test assassins:
--- a/hgext/convert/cvsps.py
+++ b/hgext/convert/cvsps.py
@@ -854,6 +854,8 @@ def debugcvsps(ui, *args, **opts):
ui.write(('Branch: %s\n' % (cs.branch or 'HEAD')))
ui.write(('Tag%s: %s \n' % (['', 's'][len(cs.tags) > 1],
','.join(cs.tags) or '(none)')))
+ if cs.comment == 'ci1' and (cs.id == 6) == bool(cs.branchpoints):
+ ui.write('raw timestamp %r\n' % (cs.date,))
if cs.branchpoints:
ui.write(('Branchpoints: %s \n') %
', '.join(sorted(cs.branchpoints)))
Matt Harbison pointed out that my recent cdf4a99e1657 might cause
__contains__ to continously get replaced by another version that calls
itself, since the manifest instance returned by the super method is
always the same instance due to @propertycache. He also suggested
replacing the class instead, as is done with the context class in the
surrounding code. Do so.
We're soon going to add an alternative manifest class
(treemanifest). Rather than extending both those classes by
largesfiles versions, let's replace the method on the manifest
instances.
Rev cb4d72125aae adding a parameter here. This breaks third-party extensions
like crecord and also makes the issue fairly hard to fix on the extension's
side if it wants to retain compatibility across Mercurial versions -- in old
versions, the positional argument will be passed into the next unknown
argument, which is 'files'.
The patch also undoes a change to the record extension that is no longer
necessary.
In a previous commit we removed the extra histedit object instance being
constructed in --continue and --abort. The new --edit-todo missed this fix
though (which means the state object it produces doesn't have the locks on it).
It's not breaking anything now, but let's go ahead and clean that up before we
forget.
This fixes a mild case of cut-and-paste code regarding failing to set
terminal modes. This is evident in the win32 comment that is misplaced
for the terminfo mode since cset 1456f081d00e.
Instead, we refactor this C&P into a small local function.
It's pretty annoying to be getting this warning when already the
colour extension has no hope of working. If there isn't a human on the
other end to to see the colours, there probably isn't a human either
who cares about this warning. More likely, some script somewhere is
gonna get confused with the warning output.
Of course, if we still want to see the warning for some reason, we can
always set --config ui.interactive=True.
While using the record extension to select changes, the user couldn't see the
content of newly added files and had to select/reject them based on filename.
The test is changed accordingly in two places.
Before this change hg help histedit would use the default variable label:
--commands VALUE
...
-r --rev VALUE [+]
With this change the text will be in the usual help text style and a bit more
explanatory:
--commands FILE
...
-r --rev REV [+]
The rest of help messages for command arguments are simple phrases without any
ending punctuation, so having this text be a complete sentence didn't really
fit.
Previously, the source was silently skipped because the largefile was in the
list of changed files, but the standin was in the copies dictionary. The source
is only displayed if the changed file is a key in the copies dictionary.
It's probably possible to refactor so that the 'if m._cwd' check isn't
necessary, but the False case is the typical case (i.e. run from the root of the
repo), and simpler to read.
An exact path to a largefile from outside the repo was previously ignored.
match.rel('.hglf') will handle figuring out both the correct '../' length to the
standin directory if inside the repo, or path/to/repo from outside, at the cost
of a pconvert() to keep the patterns using '/' on Windows.
When logging '.hglf/foo', the pattern list was being transformed from
['.hglf/foo'] into ['.hglf/foo', '.hglf/.hglf/foo']. Aside from the
pathological case of somebody getting a directory named '.hglf' created inside
the standing directory, the old code shouldn't have had any bad effects.
(amended by mpm to sort patterns for test stability and not upset check-code)
Adding the standin to the patterns list was (possibly) harmless before, but was
wrong, because the pattern list was already updated above that code. Now that
patterns are handled, it was actually harmful. For example, in this test:
$ hg log -G glob:**another*
the adjusted pattern list would have been:
['glob:**another*', '.hglf/.', 'glob:.hglf/**another*']
which causes every largefile in the root to be matched.
I'm not sure why 'glob:a*' picks up the rename of a -> b commit in test-log.t,
but a simple 'a' doesn't. But it doesn't appear to be caused by the largefiles
extension.
Since many users are using terminals wider than 80 chars there should be an
option to have longer lines in histedit editor.
Even if the summary line is shorter than 80 chars after adding action line
prefixes (like "pick 7c2fd3b9020c") it doesn't fit there anymore. Setting
it to for example 110 would be a nice option to have.
Monotone used '1' for close while core Mercurial use 1. Now, for consistency,
use the same value everywhere. It will be stored as a string anyway and the
change will not make any real difference.
(The actual value of 'close' doesn't matter as long as extras has such a key.)
All current callers supply some sort of prefix, so the issue was hidden. But if
no parameter was specified, a crash occurred in the write() closure when
concatenating 'prefix' and 'name'.
When there isn't lfdirstate file in cases below, "openlfdirstate()"
call "scmutil.match()" indirectly to build lfdirstate up.
- subrepos disabling largefiles locally
- lfdirstate file is missed accidentally
This causes infinite recursive call of "openlfdirstate()" in
"overriderevert()" (introduced by 25febe9568dd), because
"openlfdirstate()" is invoked from the function overriding
"scmutil.match()" itself.
To avoid infinite recursive call of "openlfdirstate()" in
"overriderevert()" in such cases, this patch passes "create=False"
argument to "openlfdirstate()".
"create=False" forcibly makes "openlfdirstate()" avoid code path to
build lfdirstate up.
Even if largefiles extension is enabled in a repository, "repo"
object, which isn't "largefiles.reposetup()"-ed, is passed to
overridden functions in the cases below unexpectedly, because
extensions are enabled for each repositories strictly.
(1) clone without -U:
(2) pull with -U:
(3) pull with --rebase:
combination of "enabled@src", "disabled@dst" and
"not-required@src" cause this situation.
largefiles requirement
@src @dst @src result
-------- -------- --------------- --------------------
enabled disabled not-required aborted unexpectedly
required requirement error (intentional)
-------- -------- --------------- --------------------
enabled enabled * success
-------- -------- --------------- --------------------
disabled enabled * success (only for "pull")
-------- -------- --------------- --------------------
disabled disabled not-required success
required requirement error (intentional)
-------- -------- --------------- --------------------
(4) update/revert with a subrepo disabling largefiles
In these cases, overridden functions cause accessing to largefiles
specific fields of not "largefiles.reposetup()"-ed "repo" object, and
execution is aborted.
- (1), (2), (4) cause accessing to "_lfstatuswriters" in
"getstatuswriter()" invoked via "updatelfiles()"
- (3) causes accessing to "_lfcommithooks" in "overriderebase()"
For safe accessing to these fields, this patch examines whether passed
"repo" object is "largefiles.reposetup()"-ed or not before accessing
to them.
This patch chooses examining existence of newly introduced
"_largefilesenabled" instead of "_lfcommithooks" and
"_lfstatuswriters" directly, because the former is better name for the
generic "largefiles is enabled in this repo" mark than the latter.
In the future, all other overridden functions should avoid largefiles
specific processing for efficiency, and "_largefilesenabled" is better
also for such purpose.
BTW, "lfstatus" can't be used for such purpose, because some code
paths set it forcibly regardless of existence of it in specified
"repo" object.
The previous code was adding standin files to the matcher's file list when
neither the standin file nor the original existed in the context. Somehow, this
was confusing the logging code into behaving differently from when the extension
wasn't loaded.
It seems that this was an attempt to support naming a directory that only
contains largefiles, as a test fails if the else clause is dropped entirely.
Therefore, only append the "standin" if it is a directory. This was found by
running the test suite with --config extensions.largefiles=.
The first added test used to log an additional cset that wasn't logged normally.
The only relation it had to file 'a' is that 'a' was the source of a move, but
it isn't clear why having '.hglf/a' in the list causes this change:
@@ -47,6 +47,11 @@
Make sure largefiles doesn't interfere with logging a regular file
$ hg log a --config extensions.largefiles=
+ changeset: 3:2ca5ba701980
+ user: test
+ date: Thu Jan 01 00:00:04 1970 +0000
+ summary: d
+
changeset: 0:9161b9aeaf16
user: test
date: Thu Jan 01 00:00:01 1970 +0000
The second added test used to complain about a file not being in the parent
revision:
@@ -1638,10 +1643,8 @@
Ensure that largefiles doesn't intefere with following a normal file
$ hg --config extensions.largefiles= log -f d -T '{desc}' -G
- @ c
- |
- o a
-
+ abort: cannot follow file not in parent revision: ".hglf/d"
+ [255]
$ hg log -f d/a -T '{desc}' -G
@ c
|
Note that there is still something fishy with the largefiles code, because when
using a glob pattern like this:
$ hg log 'glob:sub/*'
the pattern list would contain '.hglf/glob:sub/*'. None of the tests show this
(this test lives in test-largefiles.t at 1349), it was just something that I
noticed when the code was loaded up with print statements.
Convert will try to find references to revisions in commit messages and replace
them with references to the converted revision. It will take any string that
looks like a hash (and thus also decimal numbers) and look it up in the source
repo. If it finds anything, it will use that in the commit message instead.
It would do that for all hex digit sequences of 6 to 40 characters. That was
usually no problem for small repos where it was unlikely that there would be a
matching 6 'digit' hash prefix. It was also no problem on repos with less than
100000 changesets where numbers with 6 or more digits not would match any
revision number. With more than 100000 revisions random numbers in commit
messages would be replaced with a "random" hash. For example, 'handle 100000
requests' would be changed to to 'handle 9117c6 requests'. Convert could thus
not really be used on real repositories with more than 100000 changesets.
The default hash length shown by Mercurial is 12 'digits'. It is unexpected and
unwanted that convert by default tries to replace revision references that use
less than that amount of 'digits'.
To fix this, don't match strings that are less than the default hash size of 12
characters.