Empty changelist descriptions are valid in Perforce. If we encounter one of
them we are currently running into an IndexError. In case of empty commit
messages set the commit message to **empty changelist description**, which
follows Perforce terminology.
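A minimal sketch of the fallback, assuming the IndexError comes from indexing the first line of an empty description. `summary_line` is a hypothetical helper for illustration, not the converter's actual code:

```python
EMPTY_DESC = "**empty changelist description**"


def summary_line(desc):
    """Return the first line of a changelist description.

    "".splitlines() is an empty list, so indexing [0] on it would raise
    IndexError; substitute the placeholder text instead.
    """
    lines = desc.splitlines()
    if not lines:
        return EMPTY_DESC
    return lines[0]
```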
When the config is set to true, status output becomes relative to the
working directory. This has bugged me since I started using hg and it
turns out it is sillily simple to support it (unless I missed something,
of course).
We could also add a --relative flag, but I would personally always
want that on, and I haven't heard any use for having it sometimes on,
so this patch only lets you enable it via config.
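As a rough illustration of what "relative to the working directory" means for one status entry, here is a sketch. The `display_path` helper and its `relative` parameter are hypothetical stand-ins for the config switch, not the patch's code:

```python
import os


def display_path(repo_root, cwd, repo_relative_path, relative):
    """Return the path to print for one status entry."""
    if not relative:
        # Default behavior: paths are shown relative to the repo root.
        return repo_relative_path
    absolute = os.path.join(repo_root, repo_relative_path)
    return os.path.relpath(absolute, cwd)
```

With the switch on, a file in the current directory prints as a bare name, and a file elsewhere in the repository prints with leading ".." components.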
We only have commands.{update,rebase}.requiredest so far. We should
clearly ignore those two if HGPLAIN is in effect, and it seems like we
should ignore any future config that will be added in [commands] since
that is about changing the behavior of commands.
Thanks to Yuya for suggesting to centralize the code in ui.py.
While at it, remove the unnecessary False values passed to
ui.configbool() for the aforementioned config options.
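The centralized check could look roughly like the sketch below. It uses a plain dict in place of the real config storage, and the function is a simplified stand-in rather than the actual ui.py code:

```python
import os


def configbool(config, section, name, default=False, environ=os.environ):
    """Look up a boolean, ignoring the [commands] section under HGPLAIN.

    `config` is a dict keyed by (section, name); this is an assumption
    made for the sketch only.
    """
    if section == "commands" and "HGPLAIN" in environ:
        # Scripts running with HGPLAIN get the stock command behavior.
        return default
    return config.get((section, name), default)
```

Putting the check in one lookup function means future [commands] options are covered without touching each caller.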
Unlike p1 = null, p2 = null denotes the revision has only one parent, which
shouldn't be considered a child of the null revision. This was spotted while
fixing the issue4682 and rediscovered as issue5439.
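The asymmetry between p1 and p2 can be sketched as follows. The `children` helper and the dict-based parent table are hypothetical, chosen only to show the rule:

```python
NULLREV = -1


def children(parents, rev):
    """Return the revisions that have `rev` as a parent.

    `parents` maps each revision to a (p1, p2) pair. A null p2 only means
    "no second parent", so it never makes a revision a child of NULLREV.
    """
    result = []
    for r, (p1, p2) in sorted(parents.items()):
        if p1 == rev:
            result.append(r)
        elif p2 == rev and p2 != NULLREV:
            result.append(r)
    return result
```

In a history where only revision 0 is a root, asking for the children of the null revision returns just revision 0, even though most revisions have a null p2.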
This is the behavior of the default __import__() function, which doesn't
validate the existence of the fromlist items. Later on, the missing attribute
is detected while processing the import statement.
https://hg.python.org/cpython/file/v2.7.13/Python/import.c#l2575
The comtypes library relies on this (maybe) undocumented behavior, and we
got a bug report to TortoiseHg, sigh.
https://bitbucket.org/tortoisehg/thg/issues/4647/
The test added at 0be19b069edf verifies the behavior of the import statement,
so this patch only adds the test of __import__() function and works around
CPython/PyPy difference.
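The difference between the function and the statement can be observed directly. The sketch below uses the standard `json` package and a deliberately missing name; it runs on CPython 3, which keeps the behavior described for 2.7:

```python
# "json" is used only because it is a stdlib package; "no_such_name" is a
# deliberately missing attribute.
module = __import__("json", fromlist=["no_such_name"])

# __import__() returned the module without complaining about the missing
# fromlist item.
has_attr = hasattr(module, "no_such_name")

# The import statement is what reports the missing name.
try:
    from json import no_such_name  # noqa: F401
    statement_failed = False
except ImportError:
    statement_failed = True
```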
Currently the warning is ambiguous about whether the new tag (possibly specified
via --rev) is being added on a branch head or whether the working directory is
based on a branch head. Clarify the error message to eliminate this ambiguity.
test-largefiles-update.t creates temporary file exec-bit.patch inside
the working directory for no-execbit platform specific test, but
subsequent tests aren't aware of it.
On execbit platform, subsequent tests can run successfully, because
exec-bit.patch isn't created.
But on no-execbit platform, this temporary file makes subsequent tests
show "? exec-bit.patch" at each "hg status".
journal extension uses util.shellquote() to record command line, but
result of it depends on runtime platform: double quotation is used on
Windows and OpenVMS, but single quotation is used otherwise.
test-journal-share.t sometimes specifies commit messages including
white space on command line. It makes journal output depend on runtime
platform, but commit message itself isn't important in this test case.
On Windows, strftime() doesn't support format code "%s", and it causes
"invalid format string" error.
https://msdn.microsoft.com/en-us/library/fe06s4ak.aspx
test-command-template.t examines arithmetic calculation, not the seconds
value in UTC itself. Therefore, using format code "%Y" instead of
"%s" should be reasonable.
FYI:
- Python standard library reference doesn't list "%s" among the format
codes required by "C standard (1989 version)", even though it
also mentions that additional format codes are required by "C
standard (1999 version)"
https://docs.python.org/2.7/library/datetime.html#strftime-and-strptime-behavior
- The Open Group Base Specifications Issue 7 (IEEE Std 1003.1-2008,
2016 Edition) doesn't require strftime to support format code "%s"
http://pubs.opengroup.org/onlinepubs/9699919799/functions/strftime.html
- "man strftime" of (Open/Oracle) Solaris and Mac OS X (= UNIX
certified OSs) describes format code "%s"
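A small sketch of the portable variant: formatting with "%Y" and then doing arithmetic on the result. The variable names are illustrative only and the snippet is not taken from the test itself:

```python
import time

epoch = time.gmtime(0)  # 1970-01-01 00:00:00 UTC

# "%Y" is part of the C89 strftime() set, so it is available everywhere.
year = int(time.strftime("%Y", epoch))

# Arithmetic on the formatted value, which is all the test needs.
next_year = year + 1
```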
If an environment variable looks like PATH (e.g. any of the components
joined by ":" contains "/"), MinGW replaces ":" in it with ";" when
spawning a Windows native process, to follow the path concatenation style
of Windows.
Therefore, "bundle:../full.hg" is converted into "bundle;..\full.hg"
on MinGW.
Difference between "/" and "\" is automatically ignored by "(glob)",
but difference between ":" and ";" should be globbed explicitly.
On Windows platform, invoking printenv.py directly via hook is
problematic, because:
- unless the *.py suffix is bound to the python runtime, an application
selector dialog is displayed, and the running test is blocked at each
printenv.py invocation
- it isn't safe to assume binding between *.py suffix and python
runtime, because application binding is easily broken
For example, installing IDE (VisualStudio with Python Tools, or
so) often requires binding between source files and IDE itself.
This patch invokes printenv.py via sh -c for test portability. This is
a kind of follow up for 9e4331825bea, which eliminated explicit
"python" for printenv.py. There are already other 'sh -c "printenv.py"'
in *.t files, and this fix should be reasonable.
These changes were confirmed in the cases below:
- without any application binding for *.py suffix
- with binding between *.py suffix and VisualStudio
This patch also replaces "echo + redirection" style with "heredoc"
style, because:
- hook command line is parsed by cmd.exe as shell at first, and
- single quotation can't quote arguments on cmd.exe, therefore,
- "printenv.py foobar" should be quoted by double quotation, but
- nested quoting (or tricky escaping) isn't readable
cl._partialmatch() can be pretty slow if hidden revisions are involved. This
patch cancels the slowdown introduced by the previous patch by using an
unfiltered changelog, which means shortest(node) isn't always the shortest.
The result isn't perfect, but seems okay as long as shortest(node) is short
enough to type and can be used as an identifier.
(with hidden revisions)
% hg log -R hg-committed -r0:20000 -T '{node|shortest}\n' --time > /dev/null
(.^^) time: real 1.530 secs (user 1.480+0.000 sys 0.040+0.000)
(.^) time: real 43.080 secs (user 43.060+0.000 sys 0.030+0.000)
(.) time: real 1.680 secs (user 1.650+0.000 sys 0.020+0.000)
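Why the choice of changelog affects the answer can be seen in a toy model. `shortest` below is a hypothetical brute-force helper over plain hex strings, not the revlog lookup:

```python
def shortest(node, allnodes, minlength=1):
    """Return the shortest prefix of `node` that matches no other node."""
    for length in range(minlength, len(node) + 1):
        prefix = node[:length]
        matches = [n for n in allnodes if n.startswith(prefix)]
        if matches == [node]:
            return prefix
    return node


visible = ["a1b2", "c3d4"]
hidden = ["a1ff"]

filtered_result = shortest("a1b2", visible)             # ignores hidden
unfiltered_result = shortest("a1b2", visible + hidden)  # sees hidden
```

Against the unfiltered set the prefix has to be long enough to exclude the hidden node too, so it is still unambiguous but may be longer than strictly necessary.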
cl.index.partialmatch() isn't a drop-in replacement for cl._partialmatch().
It has no knowledge about hidden revisions, and it raises ValueError if a node
shorter than 4 chars is given. Instead, use index.partialmatch() through
cl._partialmatch(), which has no such problems and gives the identical result
with/without --pure.
The test output was sampled with --pure without this patch, which shows the
most correct result. However, we'll need to switch to using an unfiltered
changelog because _partialmatch() of a filtered changelog can be an order of
magnitude slower.
(with hidden revisions)
% hg log -R hg-committed -r0:20000 -T '{node|shortest}\n' --time > /dev/null
(.^) time: real 1.530 secs (user 1.480+0.000 sys 0.040+0.000)
(.) time: real 43.080 secs (user 43.060+0.000 sys 0.030+0.000)
On some platforms, cwd can't be removed, in which case util.unlinkpath()
continues with no error since the failure of directory removal isn't critical.
So it doesn't make sense to run the test added by 6395630fdfdc on those
platforms. OTOH, we need to run the test in test-rebase-scenario-global.t
since the repository is referenced after that.
This is a fix for a regression introduced by the patches for issue4028.
The test changes are due to us doing fewer _checkcopies searches now, which
makes some test outputs revert to the pre-issue4028 behavior. That issue itself
remains fixed, we only skip copy tracing for files where it isn't relevant.
As a nice side effect, this makes copy detection much faster when tracing
backwards through lots of renames.
Over the past week I've had to instruct multiple people to run
Python code to query the ssl module to see what TLS protocol support
is present. I think it would be useful for `hg debuginstall` to print
this info to make it easier to access and debug why Mercurial is
complaining about using an insecure TLS 1.0 protocol.
Ideally we'd also print the path to the CA cert bundle. But the APIs
for querying that in sslutil can emit warnings, making it slightly
more difficult to integrate into `hg debuginstall`. That work will
have to wait for another day.
A code snippet that has been around since largefiles was introduced was wrong:
Standins no longer found in lfdirstate have *not* been removed -
they have probably just been deleted ... or not created.
This wrong reporting meant that 'up -C' didn't undo the change and didn't sync
the two dirstates.
Instead of reporting such files as removed, propagate the deletion to the
standin file and report the file as deleted.
Largefiles are fragile with the design where dirstate and lfdirstate must be
kept in sync.
To be less fragile, mark all clean largefiles as unsure ("normallookup") before
updating standins. After standins have been updated and we know exactly which
largefile standins were actually changed, mark the unchanged largefiles back to
clean ("normal").
This will make the failure mode safer. If interrupted, the next command
will continue to perform extra hashing of all largefiles. As a result, all
largefiles that are out of sync with their standin will be marked dirty, and
they will show up in status and can be cleaned with update --clean.
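The ordering of the three steps is what provides the safety. A sketch with a plain dict standing in for lfdirstate (the function and its arguments are hypothetical):

```python
def update_standins(states, do_update):
    """Update standins with a crash-safe ordering of state changes.

    `states` maps largefile name -> "normal" or "normallookup".
    `do_update` performs the update and returns the set of names whose
    standins actually changed.
    """
    clean = [name for name, state in states.items() if state == "normal"]
    # Step 1: distrust every clean largefile before touching anything.
    for name in clean:
        states[name] = "normallookup"
    # Step 2: the risky part; an interruption here leaves files unsure.
    changed = do_update()
    # Step 3: only files known to be unchanged go back to clean.
    for name in clean:
        if name not in changed:
            states[name] = "normal"
    return states
```

If `do_update` raises, every previously clean file stays in "normallookup", so the next command rehashes it instead of trusting stale state.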
Test using existing changesets in a clean working directory, revealing problems
with files that don't show up as modified or do show up as removed when they
have just not been written yet.
Revlog can now be configured to store full snapshot only. This is used on the
changelog. However, the changegroup packing was still recomputing deltas to be
sent over the wire.
We now just reuse the full snapshot directly in this case, skipping delta
computation. This provides us with a large speedup (-30%):
# perfchangegroupchangelog on mercurial
! wall 2.010326 comb 2.020000 user 2.000000 sys 0.020000 (best of 5)
! wall 1.382039 comb 1.380000 user 1.370000 sys 0.010000 (best of 8)
# perfchangegroupchangelog on pypy
! wall 5.792589 comb 5.780000 user 5.780000 sys 0.000000 (best of 3)
! wall 3.911158 comb 3.920000 user 3.900000 sys 0.020000 (best of 3)
# perfchangegroupchangelog on mozilla central
! wall 20.683727 comb 20.680000 user 20.630000 sys 0.050000 (best of 3)
! wall 14.190204 comb 14.190000 user 14.150000 sys 0.040000 (best of 3)
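The quoted figure can be checked from the wall-clock numbers above, taking the first line of each pair as "before" and the second as "after" (that reading of the output is an assumption):

```python
# (before, after) wall-clock seconds copied from the timings above.
timings = {
    "mercurial": (2.010326, 1.382039),
    "pypy": (5.792589, 3.911158),
    "mozilla central": (20.683727, 14.190204),
}


def percent_change(before, after):
    """Relative change in percent; negative means faster."""
    return (after - before) / before * 100.0


changes = {repo: percent_change(b, a) for repo, (b, a) in timings.items()}
for repo, change in sorted(changes.items()):
    print("%-16s %+.1f%%" % (repo, change))
```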
Many tests have to be updated because of the change in bundle content. All
these updates have been verified. Because diffing changelog was not very
valuable, the resulting bundles have similar size (often a bit smaller):
# full bundle of mozilla central
with delta: 1142740533B
without delta: 1142173300B
So this is a win across the board.
When working in a rotated DAG (for a graftlike merge), there can be files
that are renamed both between the base and the topological CA, and between
the TCA and the endpoint farther from the base. Such renames span the TCA
(and thus need both passes of _checkcopies to be fully detected), but may
not necessarily be divergent.
Make _checkcopies return "incomplete copies" and "incomplete divergences"
in this case, and let mergecopies recombine them once data from both passes
of _checkcopies is available.
With this patch, all known cases involving renames and grafts pass.
(Developed together with Pierre-Yves David)
Add a "trouble" line in changeset header along with a couple of labels on
"log.changeset" line to indicate whether a changeset is troubled or not and
which kind of trouble occurs.
During a graftlike merge, _checkcopies runs from ctx to tca, possibly
passing over the merge base. If there is a rename both before and after
the base, then we're actually dealing with divergent renames.
If there is no rename on the other side of tca, then the divergence is
contained entirely in the range of one _checkcopies invocation, and
should be detected "in the loop" without having to rely on the other
_checkcopies pass.
The "needed" dict is used as a reference counter to free items in the giant
"hist" dict. However, currently it is not very accurate and can lead to
dropping "hist" items unnecessarily, for example, with the following DAG,
      -3-
     /   \
  0--1--2--4--
The current algorithm will visit and calculate rev 1 twice, which is undesired.
And it tries to be smart by clearing rev 1's parents: "pcache[1] = []" at the
time hist[1] is accessed (note: hist[1] needs to be used twice, by rev 2
and rev 3). It can result in incorrect results if p1 of rev 4 deletes chunks
belonging to rev 0.
However, simply removing "needed" is not okay, because it will consume 10x
memory:
# without any change
% HGRCPATH= lrun ./hg annotate mercurial/commands.py -r d130a38 3>&2 [1]
MEMORY 49074176
CPUTIME 9.213
REALTIME 9.270
# with "needed" removed
MEMORY 637673472
CPUTIME 8.164
REALTIME 8.249
This patch moves "needed" (and "pcache") calculation to a separate DFS to
address the issue. It improves perf and fixes issue5360 by correctly reusing
hist, while maintaining low memory usage. Some additional attempts have been
made to further reduce memory usage, like changing "pcache[f] = []" to "del
pcache[f]". Therefore the result can be both faster and use less memory:
# with this patch applied
MEMORY 47575040
CPUTIME 7.870
REALTIME 7.926
[1]: lrun is a lightweight sandbox built on Linux cgroup and namespace. It's
used to measure CPU and memory usage here. Source code is available at
github.com/quark-zju/lrun.
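The idea of counting references in a separate DFS before doing any real work can be sketched on the example DAG. This is a simplified illustration with hypothetical names, not the annotate code. It assumes revision numbers are topologically ordered (parents smaller than children):

```python
def countneeded(parents, start):
    """First pass: count how many times each rev's result is consumed."""
    needed = {start: 1}
    seen = {start}
    visit = [start]
    while visit:
        rev = visit.pop()
        for p in parents[rev]:
            needed[p] = needed.get(p, 0) + 1
            if p not in seen:
                seen.add(p)
                visit.append(p)
    return needed


def compute(parents, start, combine):
    """Second pass: compute each rev once, freeing unreferenced results."""
    needed = countneeded(parents, start)
    hist = {}
    computed = []
    for rev in sorted(needed):  # parents come before children
        hist[rev] = combine(rev, [hist[p] for p in parents[rev]])
        computed.append(rev)
        for p in parents[rev]:
            needed[p] -= 1
            if needed[p] == 0:
                del hist[p]  # last consumer is done, release memory
    return hist[start], computed, hist


# The DAG from the description: rev 1 feeds both rev 2 and rev 3.
dag = {0: [], 1: [0], 2: [1], 3: [1], 4: [2, 3]}
result, order, remaining = compute(
    dag, 4, lambda rev, inputs: {rev}.union(*inputs))
```

Because rev 1 has a count of two, its result survives until both rev 2 and rev 3 have used it, and it is computed only once.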
As a followup to the issue4028 series, this fixes a variant of the issue
that can occur when updating with uncommitted local changes.
The duplicated .hgsub warning is coming from wc.dirty(). We would previously
skip this call because it's only relevant when we're going to perform copy
tracing, which we didn't do before.
The change to the update summary line is because we now treat the rename as a
proper rename (which counts as a change), rather than an add+delete pair
(which counts as a change and a delete).
The algorithm of _checkcopies can only walk backwards in the DAG, never
forward. Because of this, the two _checkcopies passes need to run from
their respective endpoints to the TCA to cover the entire subgraph where
the merge is being performed. However, detection of files new in both
endpoints, as well as directory rename detection, need to run with respect
to the merge base, so we need lists of new files both from the TCA's and
the merge base's viewpoint to correctly detect renames in a graft-like
merge scenario.
(Series reworked by Pierre-Yves David)
These cover all currently known cases of renames being grafted,
or changes being grafted through renames.
Right now, most of these cases are broken. Later patches in this series
will make them behave correctly.
The testcases heavily rely on each other, which would make it very difficult
to separate them and add them one-by-one for each case fixed by a patch.
Separating them should perhaps be a 4.1 task, if it doesn't slow down
the tests too much.
(Developed together with Pierre-Yves David)
When grafting a copy backwards through a rename, a copy is wrongly detected,
which causes the graft to be applied inappropriately, in a destructive way.
Make sure that the old file name really exists in the common ancestor,
and bail out if it doesn't.
This fixes the aggravated case of bug 5343, although the basic issue
(failure to duplicate the copy information) still occurs.
Currently, exchange.getbundle() returns either a cg1unpacker or a
util.chunkbuffer (in the case of bundle2). This is kinda OK, as
both expose a .read() to consumers. However, localpeer.getbundle()
has code inferring what the response type is based on arguments and
converts the util.chunkbuffer returned in the bundle2 case to a
bundle2.unbundle20 instance. This is a sign that the API for
exchange.getbundle() is not ideal because it doesn't consistently
return an "unbundler" instance.
In addition, unbundlers mask the fact that there is an underlying
generator of changegroup data. In both cg1 and bundle2, this generator
is being fed into a util.chunkbuffer so it can be re-exposed as a
file object.
util.chunkbuffer is a nice abstraction. However, it should only be
used "at the edges." This is because keeping data as a generator is
more efficient than converting it to a chunkbuffer, especially if we
convert that chunkbuffer back to a generator (as is the case in some
code paths currently).
This patch refactors exchange.getbundle() into
exchange.getbundlechunks(). The new API returns an iterator of chunks
instead of a file-like object.
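The "generator inside, file object only at the edges" shape can be sketched as below. `iterchunks` and `ChunkReader` are hypothetical stand-ins for illustration. They are not the exchange or util.chunkbuffer implementations:

```python
def iterchunks(payload, chunksize=4):
    """Yield the payload as a sequence of byte chunks."""
    for start in range(0, len(payload), chunksize):
        yield payload[start:start + chunksize]


class ChunkReader(object):
    """File-like read() on top of a chunk iterator, for edge consumers."""

    def __init__(self, chunks):
        self._iter = iter(chunks)
        self._buf = b""

    def read(self, size):
        # Pull chunks only until enough bytes are buffered.
        while len(self._buf) < size:
            chunk = next(self._iter, None)
            if chunk is None:
                break
            self._buf += chunk
        data, self._buf = self._buf[:size], self._buf[size:]
        return data
```

Callers that only forward data can iterate the generator directly; only a consumer that truly needs `.read()` pays for the buffering wrapper.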
Callers of exchange.getbundle() have been updated to use the new API.
There is a minor change of behavior in test-getbundle.t. This is
because `hg debuggetbundle` isn't defining bundlecaps. As a result,
a cg1 data stream and unpacker is being produced. This is getting fed
into a new bundle20 instance via bundle2.writebundle(), which uses
a backchannel mechanism between changegroup generation to add the
"nbchanges" part parameter. I never liked this backchannel mechanism
and I plan to remove it someday. `hg bundle` still produces the
"nbchanges" part parameter, so there should be no user-visible
change of behavior. I consider this "regression" a bug in
`hg debuggetbundle`. And that bug is captured by an existing
"TODO" in the code to use bundle2 capabilities.
util.filechunkiter has been using a chunk size of 64k for more than 10 years,
also in years where Moore's law still was a law. It is probably ok to bump it
now and perhaps get a slight win in some cases.
Also, largefiles have been using 128k for a long time. Specifying that size
multiple times (or forgetting to do it) seems a bit stupid. Decreasing it to
64k also seems unfortunate.
Thus, we will set the default chunksize to 128k and use the default everywhere.
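A simplified sketch of a chunk iterator with a single shared default. `chunkiter` is a stand-in written for this note, not the real util.filechunkiter:

```python
import io

DEFAULT_CHUNK_SIZE = 131072  # 128k


def chunkiter(f, size=DEFAULT_CHUNK_SIZE):
    """Yield successive chunks of up to `size` bytes read from `f`."""
    while True:
        chunk = f.read(size)
        if not chunk:
            break
        yield chunk


data = b"x" * (DEFAULT_CHUNK_SIZE + 10)
sizes = [len(c) for c in chunkiter(io.BytesIO(data))]
```

Callers that omit `size` all get the same value, which removes the need to repeat the number at each call site.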
If the entry in the terminfo database for your terminal is missing some
attributes, it should be possible to create them on the fly without
resorting to just making them a color. This change allows you to have
[color]
terminfo.<effect> = <code>
where <effect> might be something like "dim" or "bold", and <code> is the
escape sequence that would otherwise have come from a call to tigetstr().
If an escape character is needed, use "\E". Any such settings will
override attributes that are present in the terminfo database.
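How a configured value turns into an actual escape sequence can be sketched as follows. The helper name is hypothetical, and the "dim" value shown is the common ANSI sequence, used here only as an example:

```python
def parse_terminfo_override(code):
    """Turn the "\\E" notation from a config value into a real ESC."""
    return code.replace("\\E", "\x1b")


# Example value as it might be written for terminfo.dim in [color].
dim = parse_terminfo_override("\\E[2m")
```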
During update directories are deleted as soon as they have no entries.
But if the current working directory is deleted, it causes problems
in complex commands like 'hg split'. This commit adds a warning
that will help users figure the problem faster.
This patch disables delta chains on changelogs. After this patch, new
entries on changelogs - including existing changelogs - will be stored
as the fulltext of that data (likely compressed). No delta computation
will be performed.
An overview of delta chains and data justifying this change follows.
Revlogs try to store entries as a delta against a previous entry (either
a parent revision in the case of generaldelta or the previous physical
revision when not using generaldelta). Most of the time this is the
correct thing to do: it frequently results in less CPU usage and smaller
storage.
Delta chains are most effective when the base revision being deltad
against is similar to the current data. This tends to occur naturally
for manifests and file data, since only small parts of each tend to
change with each revision. Changelogs, however, are a different story.
Changelog entries represent changesets/commits. And unless commits in a
repository are homogeneous (same author, changing same files, similar
commit messages, etc), a delta from one entry to the next tends to be
relatively large compared to the size of the entry. This means that
delta chains tend to be short. How short? Here is the full vs delta
revision breakdown on some real world repos:
Repo              % Full   % Delta   Max Length
hg                  45.8      54.2            6
mozilla-central     42.4      57.6            8
mozilla-unified     42.5      57.5           17
pypy                46.1      53.9            6
python-zstandard    46.1      53.9            3
(I threw in python-zstandard as an example of a repo that is homogeneous.
It contains a small Python project with changes all from the same
author.)
Contrast this with the manifest revlog for these repos, where 99+% of
revisions are deltas and delta chains run into the thousands.
So delta chains aren't as useful on changelogs. But even a short delta
chain may provide benefits. Let's measure that.
Delta chains may require less CPU to read revisions if the CPU time
spent reading smaller deltas is less than the CPU time used to
decompress larger individual entries. We can measure this via
`hg perfrevlog -c -d 1` to iterate a revlog to resolve each revision's
fulltext. Here are the results of that command on a repo using delta
chains in its changelog and on a repo without delta chains:
hg (forward)
! wall 0.407008 comb 0.410000 user 0.410000 sys 0.000000 (best of 25)
! wall 0.390061 comb 0.390000 user 0.390000 sys 0.000000 (best of 26)
hg (reverse)
! wall 0.515221 comb 0.520000 user 0.520000 sys 0.000000 (best of 19)
! wall 0.400018 comb 0.400000 user 0.390000 sys 0.010000 (best of 25)
mozilla-central (forward)
! wall 4.508296 comb 4.490000 user 4.490000 sys 0.000000 (best of 3)
! wall 4.370222 comb 4.370000 user 4.350000 sys 0.020000 (best of 3)
mozilla-central (reverse)
! wall 5.758995 comb 5.760000 user 5.720000 sys 0.040000 (best of 3)
! wall 4.346503 comb 4.340000 user 4.320000 sys 0.020000 (best of 3)
mozilla-unified (forward)
! wall 4.957088 comb 4.950000 user 4.940000 sys 0.010000 (best of 3)
! wall 4.660528 comb 4.650000 user 4.630000 sys 0.020000 (best of 3)
mozilla-unified (reverse)
! wall 6.119827 comb 6.110000 user 6.090000 sys 0.020000 (best of 3)
! wall 4.675136 comb 4.670000 user 4.670000 sys 0.000000 (best of 3)
pypy (forward)
! wall 1.231122 comb 1.240000 user 1.230000 sys 0.010000 (best of 8)
! wall 1.164896 comb 1.160000 user 1.160000 sys 0.000000 (best of 9)
pypy (reverse)
! wall 1.467049 comb 1.460000 user 1.460000 sys 0.000000 (best of 7)
! wall 1.160200 comb 1.170000 user 1.160000 sys 0.010000 (best of 9)
The data clearly shows that it takes less wall and CPU time to resolve
revisions when there are no delta chains in the changelogs, regardless
of the direction of traversal. Furthermore, not using a delta chain
means that fulltext resolution in reverse is as fast as iterating
forward. So not using delta chains on the changelog is a clear CPU win
for reading operations.
An example of a user-visible operation showing this speed-up is revset
evaluation. Here are results for
`hg perfrevset 'author(gps) or author(mpm)'`:
hg
! wall 1.655506 comb 1.660000 user 1.650000 sys 0.010000 (best of 6)
! wall 1.612723 comb 1.610000 user 1.600000 sys 0.010000 (best of 7)
mozilla-central
! wall 17.629826 comb 17.640000 user 17.600000 sys 0.040000 (best of 3)
! wall 17.311033 comb 17.300000 user 17.260000 sys 0.040000 (best of 3)
What about 00changelog.i size?
Repo              Delta Chains   No Delta Chains
hg                   7,033,250         6,976,771
mozilla-central     82,978,748        81,574,623
mozilla-unified     88,112,349        86,702,162
pypy                20,740,699        20,659,741
The data shows that removing delta chains from the changelog makes the
changelog smaller.
Delta chains are also used during changegroup generation. This
operation essentially converts a series of revisions to one large
delta chain. And changegroup generation is smart: if the delta in
the revlog matches what the changegroup is emitting, it will reuse
the delta instead of recalculating it. We can measure the impact
removing changelog delta chains has on changegroup generation via
`hg perfchangegroupchangelog`:
hg
! wall 1.589245 comb 1.590000 user 1.590000 sys 0.000000 (best of 7)
! wall 1.788060 comb 1.790000 user 1.790000 sys 0.000000 (best of 6)
mozilla-central
! wall 17.382585 comb 17.380000 user 17.340000 sys 0.040000 (best of 3)
! wall 20.161357 comb 20.160000 user 20.120000 sys 0.040000 (best of 3)
mozilla-unified
! wall 18.722839 comb 18.720000 user 18.680000 sys 0.040000 (best of 3)
! wall 21.168075 comb 21.170000 user 21.130000 sys 0.040000 (best of 3)
pypy
! wall 4.828317 comb 4.830000 user 4.820000 sys 0.010000 (best of 3)
! wall 5.415455 comb 5.420000 user 5.410000 sys 0.010000 (best of 3)
The data shows eliminating delta chains makes the changelog part of
changegroup generation slower. This is expected since we now have to
compute deltas for revisions where we could recycle the delta before.
It is worth putting this regression into context of overall changegroup
times. Here is the rough total CPU time spent in changegroup generation
for various repos while using delta chains on the changelog:
Repo              CPU Time (s)   CPU Time w/ compression
hg                        4.50                      7.05
mozilla-central          111.1                     222.0
pypy                     28.68                      75.5
Before compression, removing delta chains from the changegroup adds
~4.4% overhead to hg changegroup generation, 1.3% to mozilla-central,
and 2.0% to pypy. When you factor in zlib compression, these percentages
are roughly divided by 2.
While the increased CPU usage for changegroup generation is unfortunate,
I think it is acceptable because the percentage is small, server
operators (those likely impacted most by this) have other mechanisms
to mitigate CPU consumption (namely reducing zlib compression level and
pre-generated clone bundles), and because there is room to optimize this
in the future. For example, we could use the nullid as the base revision,
effectively encoding the full revision for each entry in the changegroup.
When doing this, `hg perfchangegroupchangelog` nearly halves:
mozilla-unified
! wall 21.168075 comb 21.170000 user 21.130000 sys 0.040000 (best of 3)
! wall 11.196461 comb 11.200000 user 11.190000 sys 0.010000 (best of 3)
This looks very promising as a future optimization opportunity.
It's worth noting the changes in test-acl.t to the changegroup part size.
This is because revision 6 in the changegroup had a delta chain of
length 2 before and after this patch the base revision is nullrev.
When the base revision is nullrev, cg2packer.deltaparent() hardcodes
the *previous* revision from the changegroup as the delta parent.
This caused the delta in the changegroup to switch base revisions,
the delta to change, and the size to change accordingly. While the
size increased in this case, I think sizes will remain the same
on average, as the delta base for changelog revisions doesn't matter
too much (as this patch shows). So, I don't consider this a regression.
The ability to negate any boolean flag is great in itself, but I think we are
not ready to expose the help side of it yet.
First, while there exist a handful of such flags whose default value can be
changed (eg: git diff, patchwork confirmation), there are only a few of them.
The users who benefit the most from this change are alias users and large
installations that can deploy extensions to change behavior (eg: facebook
tweakdefault). So the majority of users will be affected by a large change
to command help that is not yet relevant to them. (I expect this to become
relevant when ui.progressive starts to exist).
Below is an example of the impact of the new help on 'hg help diff':
 -r --rev REV [+]               revision
 -c --change REV                change made by revision
 -a --[no-]text                 treat all files as text
 -g --[no-]git                  use git extended diff format
    --[no-]nodates              omit dates from diff headers
    --[no-]noprefix             omit a/ and b/ prefixes from filenames
 -p --[no-]show-function        show which function each change is in
    --[no-]reverse              produce a diff that undoes the changes
 -w --[no-]ignore-all-space     ignore white space when comparing lines
 -b --[no-]ignore-space-change  ignore changes in the amount of white space
 -B --[no-]ignore-blank-lines   ignore changes whose lines are all blank
 -U --unified NUM               number of lines of context to show
    --[no-]stat                 output diffstat-style summary of changes
    --root DIR                  produce diffs relative to subdirectory
 -I --include PATTERN [+]       include names matching the given patterns
 -X --exclude PATTERN [+]       exclude names matching the given patterns
 -S --[no-]subrepos             recurse into subrepositories
Another issue with the current state of help is that the default value for the
flag is not conveyed to the user. For example in the 'backout' help, there is
no real distinction between "--[no-]backup" (defaults to True) and
"--[no-]keep" (defaults to False):
--[no-]backup no backups
--[no-]keep do not modify working directory during strip
In addition, I've discussed this with Augie Fackler, and the last batch of work
on this has burned him out quite a bit. Therefore he does not intend to perform
any more work on this topic. Quoting him, he would rather see the help part
backed out than spend more time on it.
I do not think we are ready to expose this to users in 4.0 (freeze in a week),
especially because we cannot expect quick improvement on these aspects as this
topic no longer has an owner. We should be able to reintroduce that change in
the future when someone gets back to it and the main issues are solved:
* Introduction of ui.progressive makes it relevant for a majority of users,
* Current default values are efficiently conveyed to the user.
(In addition, the excerpt from diff help shows that we still have some issues
with some negative options like '--nodates', so further improvements are
probably welcome there.)
Users that want to add a copy record to an existing commit with 'hg
commit --amend' should be guided towards this workflow, rather than
reaching for some sort of uncommit-recommit flow. As part of this,
distinguish in the top-line error message whether the file merely
already exists (untracked) on disk or the file already exists in
history.
The full list of copy and rename cases and how they interact with
flags are listed below:
target exists  --after  --force | action
      n           n        *    | copy
      n           y        *    | (1)
  untracked       n        n    | (4) NEWHINT
  untracked       n        y    | (3)
  untracked       y        *    | (2)
      y           n        n    | (4) NEWHINT
      y           n        y    | (3)
      y           y        n    | (2)
      y           y        y    | (3)
   deleted        n        n    | copy
   deleted        n        y    | (3)
   deleted        y        n    | (1)
   deleted        y        y    | (1)
* = don't care
(1) <src>: not recording move - <target> does not exist
(2) preserve target contents
(3) replace target contents
(4) <target>: not overwriting - file {exists,already committed}
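The table can also be read as a lookup from (target state, --after, --force) to an action. The sketch below transcribes it row by row; the constant names and the `action` function are hypothetical and exist only to make the table executable:

```python
COPY = "copy"
NO_TARGET = "(1) not recording move - target does not exist"
PRESERVE = "(2) preserve target contents"
REPLACE = "(3) replace target contents"
REFUSE = "(4) not overwriting - file exists or already committed"

# (target state, --after, --force, action); None means "don't care".
RULES = [
    ("n", False, None, COPY),
    ("n", True, None, NO_TARGET),
    ("untracked", False, False, REFUSE),
    ("untracked", False, True, REPLACE),
    ("untracked", True, None, PRESERVE),
    ("y", False, False, REFUSE),
    ("y", False, True, REPLACE),
    ("y", True, False, PRESERVE),
    ("y", True, True, REPLACE),
    ("deleted", False, False, COPY),
    ("deleted", False, True, REPLACE),
    ("deleted", True, False, NO_TARGET),
    ("deleted", True, True, NO_TARGET),
]


def action(target, after, force):
    """Return the action for one combination of target state and flags."""
    for t, a, f, result in RULES:
        if t == target and a == after and f in (None, force):
            return result
    raise ValueError("unknown target state: %r" % (target,))
```

Only the two REFUSE rows are the ones marked NEWHINT, which is where the new hint applies.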
Credit to Kevin for wholly rewriting my table to cover more cases we
discovered at the sprint.
I think this change gets the hints correct in all cases, but I'd
appreciate close inspection of the test cases to make sure I haven't
gotten turned around in here.
Before this patch, using ui.configint() prevents perf.py from
measuring performance with Mercurial earlier than 1.9 (or
12e7e9fbf243), because ui.configint() isn't available in such
Mercurial, even though there are some code paths for Mercurial earlier
than 1.9 in perf.py.
For example, setting "_prereadsize" attribute in perfindex() and
perfnodelookup() is effective only with hg earlier than 1.8 (or
1299f0c14572).
This patch replaces ui.configint() invocations by newly introduced
getint().
This patch also adds an extra check entry to check-perf-code.py to detect
direct usage of ui.configint() in perf.py.
BTW, this patch doesn't add a configint() method at runtime by
replacing ui.__class__ like below, even though this is the recommended
way for modern Mercurial extensions.
    def uisetup(ui):
        if not util.safehasattr(ui, 'configint'):
            class uiwrap(ui.__class__):
                def configint(self, section, name, ....):
                    ....
            ui.__class__ = uiwrap
Because changes to ui.__class__ by uisetup() of loaded extension have
been propagated since 1.6.1 (or 07a6e7bd0cc1), the recommended way
above doesn't work as expected with Mercurial earlier than it.
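A helper in the spirit of getint() might look like the sketch below. The body and the two stand-in ui classes are assumptions made for illustration. The actual helper in perf.py may differ:

```python
def getint(ui, section, name, default):
    """Read an integer config value without requiring ui.configint()."""
    configint = getattr(ui, "configint", None)
    if configint is not None:
        return configint(section, name, default)
    # Fallback for a ui object that only offers config().
    value = ui.config(section, name, None)
    if value is None:
        return default
    try:
        return int(value)
    except ValueError:
        raise ValueError("%s.%s is not an integer ('%s')"
                         % (section, name, value))


class OldUi(object):
    """Stand-in for a ui without configint()."""

    def __init__(self, values):
        self._values = values

    def config(self, section, name, default=None):
        return self._values.get((section, name), default)


class NewUi(OldUi):
    """Stand-in for a ui that does provide configint()."""

    def configint(self, section, name, default=None):
        value = self.config(section, name, None)
        return default if value is None else int(value)
```

A plain function works with any ui object, so it does not depend on class replacement being propagated.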
Before this patch, using svfs prevents perf.py from measuring
performance of Mercurial earlier than 2.3 (or 12df7401e8cd), because
svfs isn't available in such Mercurial, even though there are some
code paths for Mercurial earlier than 2.3 in perf.py.
For example, setting "_prereadsize" attribute in perfindex() and
perfnodelookup() is effective only with hg earlier than 1.8 (or
1299f0c14572).
To get appropriate vfs-like object to access files under .hg/store,
this patch adds getsvfs() (and also getvfs(), for future use).
To avoid examining existence of attribute at each repetition while
measuring performance, getsvfs() is invoked outside the function to be
called repeatedly.
This patch also adds an extra check entry to check-perf-code.py to
detect direct usage of repo.(vfs|svfs|opener|sopener) in perf.py.
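A minimal sketch of the lookup described above, assuming older
repositories expose the store accessor as "sopener" instead of "svfs".
The Fake*Repo classes and makebench() are hypothetical. They only
illustrate resolving the object once, outside the function that is
timed repeatedly.

```python
def getsvfs(repo):
    """Return a vfs-like object for files under .hg/store."""
    svfs = getattr(repo, 'svfs', None)
    if svfs is not None:
        return svfs
    # Assumed name of the equivalent attribute on older repositories.
    return repo.sopener


class FakeNewRepo(object):
    svfs = 'svfs-object'


class FakeOldRepo(object):
    sopener = 'sopener-object'


def makebench(repo):
    # Resolve the attribute once, so the timed function does no lookup.
    svfs = getsvfs(repo)

    def run():
        return svfs
    return run
```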
Before this patch, using branchmap.subsettable prevents perfbranchmap
from measuring performance of Mercurial earlier than 2.9 (or
aad678a92970), because aad678a92970 moved subsettable from repoview.py
to branchmap.py, even though there are some code paths for Mercurial
earlier than 2.9 in perf.py.
For example, setting "_prereadsize" attribute in perfindex() and
perfnodelookup() is effective only with hg earlier than 1.8 (or
1299f0c14572).
To get subsettable from appropriate module, this patch examines
existence of subsettable in branchmap and repoview.
This patch also adds an extra check entry to check-perf-code.py to
detect direct usage of the subsettable attribute in perf.py.
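The module probing could be sketched roughly as follows. The modules
are passed in as arguments so the example runs without Mercurial, and
the table content is a placeholder, not the real mapping.

```python
from types import SimpleNamespace


def getsubsettable(branchmap, repoview):
    """Return subsettable from whichever module defines it."""
    for mod in (branchmap, repoview):
        table = getattr(mod, 'subsettable', None)
        if table is not None:
            return table
    raise AttributeError('subsettable not found in either module')


# Hypothetical stand-ins for the two module layouts.
placeholder = {'a': 'b'}
newstyle = (SimpleNamespace(subsettable=placeholder), SimpleNamespace())
oldstyle = (SimpleNamespace(), SimpleNamespace(subsettable=placeholder))
```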
This test helps us keep track of the commands which run on Python 3.
The full traceback is hidden. Thanks to Augie and Martijn for wrapping
it up in four lines.
We can't predict where those will show up and they're not
super-important for the contents of this particular test, so just drop
them. Further reduces the flakiness of the test to zero.
Healthy output (one log file mentioning "existing pooled" and one
mentioning "new pooled") will now print in a stable order, but
unhealthy output will print some sort of error.
This reduces the flakiness of the test from 55% to 38%. My next patch
makes it completely stable.
The termwidth template keyword is of limited use without some way to ensure
that margins are respected.
Provide a full set of arithmetic operators (four basic operations plus the
mod function, defined to match Python's // for division), so that you can
create termwidth-based layouts that match the user's terminal size.
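As a plain Python illustration of the arithmetic involved (this is not
template syntax, and the function names are made up for the example),
floor division and a matching mod are enough to split the usable width
into columns:

```python
def contentwidth(termwidth, margin):
    # Width left after reserving a margin on both sides.
    return termwidth - 2 * margin


def columns(termwidth, margin, colwidth):
    # Number of full columns that fit, and the leftover characters.
    usable = contentwidth(termwidth, margin)
    return usable // colwidth, usable % colwidth
```

With Python's floor semantics, the identity
a == (a // b) * b + (a % b) also holds for negative operands, which is
why mod has to be defined consistently with //.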
Otherwise no code transformation would be applied to the modules which are
imported only by imp.load_module().
This change means modules are imported from PYTHONPATH, not from the paths
given by command arguments. This isn't always correct, but seems acceptable.
File paths in template are repository-absolute paths. This function can be
used to convert them to filesystem paths relative to cwd. This also converts
'/' to '\\' on Windows.
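The conversion can be sketched in plain Python like this. The function
name and its explicit root/cwd arguments are assumptions made for the
example, not the template function's actual interface.

```python
import os


def repotocwdrelative(reporoot, cwd, repopath):
    """Convert a '/'-separated repository path to a cwd-relative one."""
    # Use the native separator ('\\' on Windows, '/' elsewhere).
    local = repopath.replace('/', os.sep)
    return os.path.relpath(os.path.join(reporoot, local), cwd)
```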
Previously, when a patch contained a move or copy from a source that did not
exist, `hg import` would crash. This patch changes import to raise a PatchError
with an explanation of what is wrong with the patch to avoid the stack trace
and bad user experience.
Add white-space: nowrap to td.annotate to avoid wrapping div.annotate-info
into next line if there is revision number in the same cell, as it is hard to
mouse over div.annotate-info if it's wrapped into next line.
For now, these sets will be unicode characters in Python 3, which is
probably wrong, but it un-blocks importing the module so we can get
further along. In the future we'll have to come up with a reasonable
encoding strategy for revsets in Python 3.
This patch was originally pair-programmed with Martijn.
Because pull might move bookmarks and bookmarks are protected by wlock, we
have to grab wlock for pull :-(
This required a small upgrade of the 'lockdelay' extension used by
'test-clone.t' because the delay must apply to a single lock only.
sys.version is a string, and shouldn't be compared against a tuple for
version comparisons. This was always true, so we were never disabling
gc on 2.6.
>>> (2, 7) >= '2.7'
True
>>> (2, 6) >= '2.7'
True
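A sketch of a comparison that behaves as intended, using
sys.version_info (a tuple) rather than sys.version (a string). The
helper name is made up for the example.

```python
import sys


def atleast(versioninfo, minimum):
    """Compare the (major, minor) part of a version tuple."""
    return tuple(versioninfo[:2]) >= tuple(minimum)


# The running interpreter can be checked the same way.
runningonmodern = atleast(sys.version_info, (2, 7))
```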
In at least one case (and there may be more), these functions take
multiple arguments. Our transformer used to handle only the first argument,
so this patch adds a loop to handle additional arguments when present.
encoding.encoding is bytes, but we need to pass it to encode(), which accepts
unicode in py3, so use pycompat.sysstr(). This can't be done by the
transformer, as that only transforms string literals, not variables.
Before, if performing a clone+share from a repo that was itself
using shared storage, the share code would copy paths.default from
the underlying repo being shared, not from the source given by
the user.
This patch teaches hg.clonewithshare to resolve paths.default
and pass it to share so it can be written to the hgrc accordingly.
Before, "#foo" paths made hg crash. We've moved the #fragment parsing at
246862840084, but we shouldn't set path to None too early. This patch just
removes the "if not path:" block since that's checked a few lines later.
Updating dirstate by simply adding and dropping files from self._map doesn't
keep the other maps updated (think: _dirs, _copymap, _foldmap, _nonormalset)
thus introducing cache inconsistency.
This is also affecting the debugstate tests since now we don't even try to set
correct mode and mtime for the files because they are marked dirty anyway and
will be checked during next status call.
That is, help gets tweaked thus:
global options ([+] can be repeated):
-v --[no-]verbose enable additional output
Other proposals have included:
global options ([+] can be repeated, options marked [?] are boolean flags):
-v --verbose[?] enable additional output
and
global options ([+] can be repeated, options marked [^] are boolean flags):
-v --verbose[^] enable additional output
which avoid the unfortunate visual noise in this patch. In this
version's favor, it's consistent with what I'm used to seeing in man
pages and similar documentation venues.
This command can be used for testing the performance of producing the
changelog portion of a changegroup.
We could use additional perf* commands for testing other parts of
changegroup. Those can be written another time, when they are needed.
(And those may want to refactor the changegroup generation API so code
can be reused.) Speaking of code reuse, yes, this command does reinvent
a small wheel. I didn't want to scope bloat to change the changegroup
API because that will invite bikeshedding.
We can't use fctx.linkrev() because follow() revset tries hard to simulate
the traversal of changelog DAG, not filelog DAG. This patch fixes
_makefollowlogfilematcher() to walk file ancestors in the same way as
revset._follow().
I'll factor out a common function in future patches.
We already support multiple primitives for listing files that were
affected by the current changeset.
This patch adds files() which returns files of the current changeset
matching a given pattern or fileset query via the "set:" prefix.
There are two reasons that rebase should be done this way:
1. This would make rebasing faster because it would minimize the total
number of files to be checked out in the process, as it doesn't need
to switch back and forth between branches.
2. It makes resolving conflicts easier as the user has better context.
This commit changes the behavior in "Test multiple root handling" of
test-rebase-obsolete.t. It is an expected change which reflects the new
behavior that commits in a branch are grouped together when rebased.
In Mercurial source tree, opening a file in "a"/"a+" mode like below
doesn't specify atomictemp=True for vfs, and this avoids file stat
ambiguity check by atomictempfile.
- writing changes out in revlog layer uses "a+" mode
- truncation in repair.strip() uses "a" mode
- truncation in transaction._playback() uses "a" mode
If the steps below occur at "the same time in sec", all of mtime, ctime
and size are the same between (1) and (3).
1. append data to revlog-style file (and close transaction)
2. discard appended data by truncation (strip or rollback)
3. append same size but different data to revlog-style file again
Therefore, cache validation doesn't work after (3) as expected.
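The ambiguity can be reproduced with a small standalone script. This is
only a sketch: it pins mtime with os.utime() to simulate both writes
landing in the same second, and it leaves ctime out of the key because
ctime can't be set portably from Python.

```python
import os
import tempfile


def statkey(path):
    # Size and mtime truncated to whole seconds, as a cache key would be.
    st = os.stat(path)
    return (st.st_size, int(st.st_mtime))


def readall(path):
    with open(path, 'rb') as fp:
        return fp.read()


def demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, 'ab') as fp:
            fp.write(b'base')
        with open(path, 'ab') as fp:      # (1) append data
            fp.write(b'AAAA')
        os.utime(path, (1000, 1000))      # pretend "same second"
        before = (statkey(path), readall(path))
        with open(path, 'r+b') as fp:     # (2) truncate the appended data
            fp.truncate(4)
        with open(path, 'ab') as fp:      # (3) append same-size data
            fp.write(b'BBBB')
        os.utime(path, (1000, 1000))
        after = (statkey(path), readall(path))
        return before, after
    finally:
        os.remove(path)
```

demo() returns two (key, content) pairs with identical keys but
different contents, which is exactly the case a stat-based cache can't
tell apart.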
This patch adds file object wrapper class checkambigatclosing to check
(and get rid of) ambiguity at closing. It is used by vfs in subsequent
patch.
This is a part of ExactCacheValidationPlan.
https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
BTW, checkambigatclosing is tested in test-filecache.py, even though
it doesn't use filecache itself, because filecache assumes that file
stat ambiguity never occurs (and there is no other test-*.py related
to filecache).
Before, a keyvalue node was processed by the last catch-all condition of
_optimize(). Therefore, topo.firstbranch=expr would bypass tree rewriting
and would crash if an expr wasn't trivial.
This parameter is slightly confusingly named in wireproto, so it got
mis-specified from the start as 'push' instead of the URL to which we
are pushing. Sigh. I've got a patch for that which I'll mail
separately since it's not really appropriate for stable.
Fixes a regression in bundle2 from bundle1.
The "normal" ISO date/time includes a T between date and time. It also
allows dropping the colons and seconds from the timespec. Add new
patterns for these forms as well as tests.
We want to be able to accept ISO 8601 style timezones that don't
include a space separator, so we change the timezone parsing function
to accept a full date string and return both the offset and the
non-timezone portion.
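A rough sketch of a parser with that shape, assuming only "Z" and
numeric "+HHMM" / "+HH:MM" suffixes need handling. The regular
expression and the sign convention (seconds east of UTC) are choices
made for this example. Mercurial's own function may differ on both.

```python
import re

# Optional whitespace, then either "Z" or a signed HHMM / HH:MM offset,
# anchored at the end of the string.
_tzre = re.compile(r'\s*(?:(Z)|([+-])(\d{2}):?(\d{2}))$')


def parsetimezone(text):
    """Return (offset in seconds or None, text without the timezone)."""
    m = _tzre.search(text)
    if m is None:
        return None, text
    remainder = text[:m.start()]
    if m.group(1):
        return 0, remainder
    sign = 1 if m.group(2) == '+' else -1
    seconds = int(m.group(3)) * 3600 + int(m.group(4)) * 60
    return sign * seconds, remainder
```

Returning the remainder lets the caller hand the date and time portion
to the usual format matching without knowing whether a separator was
present.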
68ae3063a47d causes a fatal AttributeError if kwdemo is run outside a repo
because in the temporary repo creation repo is None and therefore cannot have a
baseui attribute.
In this case fall back to using ui.
Add test case.
Previously a subrepository "sub" would cause no warnings to
be issued for a file "subnot/a", if it's not present in the
corresponding changeset when calling:
hg cat subnot/a
The existing code (a) assumed path would be specified in
encoding.encoding and (b) assumed unicode() objects wouldn't cause
other parts of Mercurial to blow up. Both are dangerous assumptions.
Since we don't know the encoding of path and can't pass non-ASCII
through docstrings, just escape the path and drop the early _(). Will
have to suffice until we can teach docstrings to handle UTF-8b
escaping.
This has the side-effect that the line containing the path is now
variable by the time it reaches _() and thus can't be translated.
In cbefa73a359814e6784a63f90b78c7afd39bc7d5, I introduced a new bug:
when a symlink points to a folder in commit A and to another folder
in commit B, while updating from A to B, Mercurial will try to use
removedir on this symlink, which will fail. This is a very bad bug,
since it basically renders symlinks to folders unusable in repos.
Added test case fails without a fix and passes with it.
Since the default joinfmt() can't process a dict of multiple keywords, we
need a dedicated joinfmt for showparents().
Unlike revset(), parents are formatted as '{rev}:{node|formatnode}' by default.
We copy the default formatting just like showextras() and showfilecopies() do.
It's been broken since eef3c19484ca, which made makemap() return a dict of
multiple keywords. Because the default joinfmt() randomly picks one item
from a dict, we have to make revset() select d[name] explicitly.
There are various causes for the inability to negotiate common SSL/TLS
protocol between client and server. Previously, we had a single, not
very actionable warning message for all of them.
As people encountered TLS 1.0 servers in real life, it was quickly
obvious that the existing messaging was inadequate to help users
rectify the situation.
This patch makes the warning messages much more verbose in hopes of
making them more actionable while simultaneously encouraging users
and servers to adopt better security practices.
This messaging flirts with the anti-pattern of "never blame the
user" by singling out poorly-configured servers. But if we're going to
disallow TLS 1.0 by default, I think we need to say *something* or
people are just going to blame Mercurial for not being able to connect.
The messaging tries to exonerate Mercurial from being the at-fault
party by pointing out the server is the entity that doesn't support
proper security (when appropriate, of course).
--insecure is our pseudo-supported footgun for disabling connection
security.
The flag already disables CA verification. I think allowing the use of
TLS 1.0 when specified is appropriate.
TIL that ui instances for remote/peer repos don't automagically inherit
config options from .hg/hgrc files.
This patch makes remote ui instances inherit options from the
[hostsecurity] section. We were already inheriting options
from [hostfingerprints] and [auth]. So adding [hostsecurity] to the
list seems appropriate.
The code used self._rbcnamescount as if it was the length of self._names ...
but actually it is just the number of good entries on disk. This caused the
cache to be populated inefficiently. In some cases very inefficiently.
Instead of checking the length before lookup, just try a lookup in self._names
- that is also in most cases faster.
Comments and debug messages are tweaked to help understanding the issue
and the fix.
It was in some cases possible to end up writing to the cache file without
growing it first. The range assignment in _setcachedata would append instead of
writing at the requested position and thus write the new record in the wrong
place.
To fix this, we avoid looking up in too small caches, and when growing the
cache, do it right before writing the new record to it so we know it has been
done correctly.
The Python ssl module conditionally sets the TLS 1.1 and TLS 1.2
constants depending on whether HAVE_TLSv1_2 is defined. Yes, these
are both tied to the same constant (I would think there would be
separate constants for each version). Perhaps support for TLS 1.1
and 1.2 were added at the same time and the assumption is that
OpenSSL either has neither or both. I don't know.
As part of developing this patch, it was discovered that Apple's
/usr/bin/python2.7 does not support TLS 1.1 and 1.2 (only TLS 1.0)!
On OS X 10.11, Apple Python has the modern ssl module including
SSLContext, but it doesn't appear to negotiate TLS 1.1+ nor does
it expose the constants related to TLS 1.1+. Since this code is
doing more robust feature detection (and not assuming modern ssl
implies TLS 1.1+ support), we now get TLS 1.0 warnings when running
on Apple Python. Hence the test changes.
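Feature detection of this kind can be sketched by probing the ssl
module for the protocol constants instead of inferring support from
the presence of SSLContext. The function name and the 'tls1.x' labels
are illustrative.

```python
import ssl


def supportedprotocols():
    """Return the TLS versions the ssl module exposes constants for."""
    supported = set(['tls1.0'])
    # Probe each constant separately rather than assuming that a modern
    # ssl module implies TLS 1.1+ support.
    if hasattr(ssl, 'PROTOCOL_TLSv1_1'):
        supported.add('tls1.1')
    if hasattr(ssl, 'PROTOCOL_TLSv1_2'):
        supported.add('tls1.2')
    return supported
```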
I'm not super thrilled about shipping a Mercurial that always
whines about TLS 1.0 on OS X. We may want a follow-up patch to
suppress this warning.
The bundle2 changegroup part has an advisory param saying how many
changesets are in the part. Before this patch, we were setting
this part when generating bundle2 parts via the wire protocol but
not when generating local bundle2 files.
A side effect of not setting the changeset count part is that progress
bars don't work when applying changesets. As the tests show, this
impacted clone bundles, shelve, backup bundles, `hg unbundle`, and
anything touching bundle2 files.
This patch adds a backdoor to allow us to pass state from
changegroup generation into the unbundler. We store the number
of changesets in the changegroup in this state and use it to
populate the aforementioned advisory part parameter when generating
the bundle2 bundle.
I concede that I'm not thrilled by how state is being passed in
changegroup.py (it feels a bit hacky). I would love to overhaul the
rather confusing set of functions in changegroup.py with something that
passes rich objects around instead of e.g. low-level generators.
However, given the code freeze for 3.9 is imminent, I'd rather not
undertake this endeavor right now. This feels like the easiest way
to get the parameter added to the changegroup part.
`hg debugbundle` is calling repr() on bundle2 part params, which are
now util.sortdict instances. Unfortunately, repr() doesn't appear
to be deterministic for util.sortdict. So, we implement one.
We include the type name because that's the common convention for
__repr__ implementations. Having the type name in `hg debugbundle`
is a bit ugly. But it's a debug command and I don't care enough to
fix it.
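A minimal sketch of a deterministic __repr__ on an ordered mapping. It
uses a collections.OrderedDict subclass named orderedparams as a
stand-in. util.sortdict itself is not reproduced here and its exact
output format may differ.

```python
import collections


class orderedparams(collections.OrderedDict):
    """Hypothetical ordered mapping with a stable repr."""

    def __repr__(self):
        # Emit items in insertion order, prefixed with the type name.
        items = ', '.join('%r: %r' % (k, v) for k, v in self.items())
        return '%s({%s})' % (self.__class__.__name__, items)
```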
pycompat.py includes a hack to import modules whose names changed in
Python 3. We used try-except to load the module matching the running
version of Python. But this method forces us to actually import the modules
so that ImportError can be raised, which makes it demandimport unfriendly.
This patch changes the try-except blocks to a single if-else block. To keep
test-check-pyflakes.t from complaining about unused imports, pycompat.py is
excluded from the test.
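The if-else shape looks roughly like this. The two modules shown are
only examples of renamed modules. The actual list in pycompat.py is not
reproduced here.

```python
import sys

ispy3 = sys.version_info[0] >= 3

# Choose the module name from the interpreter version instead of
# triggering an ImportError, so lazy importers are not forced to load
# anything just to find out whether it exists.
if ispy3:
    import queue
    import urllib.parse as urlparse
else:
    import Queue as queue
    import urlparse
```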
Python 2.7 supports specifying a custom cipher list to TLS sockets.
Advanced users may wish to specify a custom cipher list to increase
security. Or in some cases they may wish to prefer weaker ciphers
in order to increase performance (e.g. when doing stream clones
of very large repositories).
This patch introduces a [hostsecurity] config option for defining
the cipher list. The help documentation states that it is for
advanced users only.
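On the Python side, applying a cipher list comes down to
SSLContext.set_ciphers(). The sketch below assumes a modern ssl module.
The makecontext() helper and the example cipher string are
illustrative, and the config option's name is deliberately left out.

```python
import ssl


def makecontext(ciphers=None):
    """Create a client context, optionally with a custom cipher list."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if ciphers:
        try:
            # The string uses the OpenSSL cipher list format.
            ctx.set_ciphers(ciphers)
        except ssl.SSLError:
            raise ValueError('could not set ciphers: %r' % ciphers)
    return ctx
```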
Honestly, I'm a bit on the fence about providing this because
it is a footgun and can be used to decrease security. However,
there are legitimate use cases for it, so I think support should
be provided.
To check module imports in perf.py for historical portability, this
patch lists files with "hg files" for both "1.2" and tip, and builds
a "module whitelist" check from those files.
This patch uses "1.2" as earlier side version of "module whitelist",
because "mercurial.error" module is a blocker for loading perf.py with
Mercurial earlier than 1.2, and just importing "mercurial.error"
separately isn't enough.
This patch introduces tests/check-perf-code.py as a preparation for
adding extra checks on contrib/perf.py in subsequent patches (mainly,
for historical portability).
At this change, check-perf-code.py doesn't add any extra check, and is
equal to check-code.py. This lets subsequent patches focus only on
adding extra checks on perf.py to check-perf-code.py.
check-perf-code.py adds extra checks on perf.py by wrapping
contrib/check-code.py, because "filtering" by check-code.py (e.g.
normalize characters in string literal or comment line) is useful to
simplify regexp for check, and avoid false positive matching.
BaseHTTPServer, SimpleHTTPServer and CGIHTTPServer have been merged into
http.server in Python 3. All of them are now exposed as util.httpserver for
use in both Python 2 and 3. This patch adds a regex to check-code to warn
against the use of BaseHTTPServer. This patch also includes updates to the
lower part of test-check-py3-compat.t, which used to remain unchanged.
Some systems (like FreeBSD jails) use something other than 127.0.0.1
for localhost, and it's not safe to assume it'll always be the same
width. Using sed with a replacement like this sidesteps the problem.
Mercurial now requires TLS 1.1+ when TLS 1.1+ is supported by the
client. Since we made the decision to require TLS 1.1+ when running
with modern Python versions, it makes sense to do something for
legacy Python versions that only support TLS 1.0.
Feature parity would be to prevent TLS 1.0 connections out of the
box and require a config option to enable them. However, this is
extremely user hostile since Mercurial wouldn't talk to https://
by default in these installations! I can easily see how someone
would do something foolish like use "--insecure" instead - and
that would be worse than allowing TLS 1.0!
This patch takes the compromise position of printing a warning when
performing TLS 1.0 connections when running on old Python
versions. While this warning is no more annoying than the
CA certificate / fingerprint warnings in Mercurial 3.8, we provide
a config option to disable the warning because, for many people,
upgrading Python to make the warning go away is not an available
recourse (unlike pinning fingerprints, which is for the CA warning).
The warning appears as optional output in a lot of tests.
Currently, Mercurial will use TLS 1.0 or newer when connecting to
remote servers, selecting the highest TLS version supported by both
peers. On older Pythons, only TLS 1.0 is available. On newer Pythons,
TLS 1.1 and 1.2 should be available.
Security professionals recommend avoiding TLS 1.0 if possible.
PCI DSS 3.1 "strongly encourages" the use of TLS 1.2.
Known attacks like BEAST and POODLE exist against TLS 1.0 (although
mitigations are available and properly configured servers aren't
vulnerable).
I asked Eric Rescorla - Mozilla's resident crypto expert - whether
Mercurial should drop support for TLS 1.0. His response was
"if you can get away with it." Essentially, a number of servers on
the Internet don't support TLS 1.1+. This is why web browsers
continue to support TLS 1.0 despite desires from security experts.
This patch changes Mercurial's default behavior on modern Python
versions to require TLS 1.1+, thus avoiding known security issues
with TLS 1.0 and making Mercurial more secure by default. Rather
than drop TLS 1.0 support wholesale, we still allow TLS 1.0 to be
used if configured. This is a compromise solution - ideally we'd
disallow TLS 1.0. However, since we're not sure how many Mercurial
servers don't support TLS 1.1+ and we're not sure how much user
inconvenience this change will bring, I think it is prudent to ship
an escape hatch that still allows usage of TLS 1.0. In the default
case our users get better security. In the worst case, they are no
worse off than before this patch.
This patch has no effect when running on Python versions that don't
support TLS 1.1+.
As the added test shows, connecting to a server that doesn't
support TLS 1.1+ will display a warning message with a link to
our wiki, where we can guide people to configure their client to
allow less secure connections.
Currently, Mercurial will use TLS 1.0 or newer when connecting to
remote servers, selecting the highest TLS version supported by both
peers. On older Pythons, only TLS 1.0 is available. On newer Pythons,
TLS 1.1 and 1.2 should be available.
Security-minded people may want to not take any risks running
TLS 1.0 (or even TLS 1.1). This patch gives those people a config
option to explicitly control which TLS versions Mercurial should use.
By providing this option, one can require newer TLS versions
before they are formally deprecated by Mercurial/Python/OpenSSL/etc
and lower their security exposure. This option also provides an
easy mechanism to change protocol policies in Mercurial. If there
is a 0-day and TLS 1.0 is completely broken, we can act quickly
without changing much code.
Because setting the minimum TLS protocol is something you'll likely
want to do globally, this patch introduces a global config option under
[hostsecurity] for that purpose.
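One way to turn a minimum protocol setting into ssl options is
sketched below. The 'tls1.x' value names and the helper are
assumptions for illustration. Only the ssl.OP_NO_* constants come from
the Python standard library.

```python
import ssl

_order = ['tls1.0', 'tls1.1', 'tls1.2']
_disable = {
    'tls1.0': ssl.OP_NO_TLSv1,
    'tls1.1': ssl.OP_NO_TLSv1_1,
}


def optionsforminimum(minimum):
    """Return ssl option flags disabling everything below minimum."""
    if minimum not in _order:
        raise ValueError('unknown protocol: %r' % minimum)
    options = 0
    for name in _order[:_order.index(minimum)]:
        options |= _disable[name]
    return options
```

The returned flags would be OR-ed into an SSLContext's options, so a
new policy only needs a different minimum value rather than new code.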
wrapserversocket() has been taught a hidden config option to define
the explicit protocol to use. This is queried in this function and
not passed as an argument because I don't want to expose this dangerous
option as part of the Python API. There is a risk someone could footgun
themselves. But the config option is a devel option, has a warning
comment, and I doubt most people are using `hg serve` to run a
production HTTPS server (I would have something not Mercurial/Python
handle TLS). If this is problematic, we can go back to using a
custom extension in tests to coerce the server into bad behavior.
Like the built-in HTTPS server, this code was using the ssl module
directly and only using TLS 1.0. Like the built-in HTTPS server,
we switch it to use sslutil.wrapserversocket() so it can follow better
practices.
This patch transitions the built-in HTTPS server to use sslutil for
creating the server socket.
As part of this transition, we implement developer-only config options
to control CA loading and whether to require client certificates. This
eliminates the need for the custom extension in test-https.t to define
these.
There is a slight change in behavior with regards to protocol
selection. Before, we would always use the TLS 1.0 constant to define
the protocol version. This would *only* use TLS 1.0. sslutil defaults
to TLS 1.0+. So this patch improves the security of `hg serve` out of
the box by allowing it to use TLS 1.1 and 1.2 (if available).
The most painful part of ensuring Python code runs on both Python 2
and 3 is string encoding. Making this difficult is that string
literals in Python 2 are bytes and string literals in Python 3 are
unicode. So, to ensure consistent types are used, you have to
use "from __future__ import unicode_literals" and/or prefix literals
with their type (e.g. b'foo' or u'foo').
Nearly every string in Mercurial is bytes. So, to use the same source
code on both Python 2 and 3 would require prefixing nearly every
string literal with "b" to make it a byte literal. This is ugly and
not something mpm is willing to do at this point in time.
This patch implements a custom module loader on Python 3 that performs
source transformation to convert string literals (unicode in Python 3)
to byte literals. In effect, it changes Python 3's string literals to
behave like Python 2's.
In addition, the module loader recognizes well-known built-in
functions (getattr, setattr, hasattr) and methods (encode and decode)
that barf when bytes are used and prevents these from being rewritten.
This prevents excessive source changes to accommodate this change
(we would have to rewrite every occurrence of these functions passing
string literals otherwise).
The module loader is only used on Python packages belonging to
Mercurial.
The loader works by tokenizing the loaded source and replacing
"string" tokens if necessary. The modified token stream is
untokenized back to source and loaded like normal. This does add some
overhead. However, this all occurs before caching: .pyc files will
cache the transformed version. This means the transformation penalty
is only paid on first load.
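The tokenize / rewrite / untokenize round trip can be sketched in a few
lines of Python 3. This is a simplified illustration, not the loader's
actual code: it rewrites every unprefixed string literal and
deliberately ignores the getattr/encode special cases described above.

```python
import io
import tokenize


def bytesify(source):
    """Prefix unprefixed string literals in source with b."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Only touch literals that start directly with a quote, i.e.
        # those without an explicit b/u/r prefix.
        if tok.type == tokenize.STRING and tok.string[0] in ('"', "'"):
            tok = tok._replace(string='b' + tok.string)
        tokens.append(tok)
    return tokenize.untokenize(tokens)


def run(source):
    # Execute the transformed source and return its namespace.
    namespace = {}
    exec(compile(bytesify(source), '<transformed>', 'exec'), namespace)
    return namespace
```

Running a snippet such as "x = 'abc'" through run() yields a bytes
value for x, while literals with an explicit u'' or b'' prefix keep
their type.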
As the extensive inline comments explain, the presence of a custom
source transformer invalidates assumptions made by Python's built-in
bytecode caching mechanism. So, we have to wrap bytecode loading and
writing and add an additional header to bytecode files to facilitate
additional cache validation when the source transformations
change in the future.
There are still a few things this code doesn't handle well, namely
support for zip files as module sources and for extensions. Since
Mercurial doesn't officially support Python 3 yet, I'm inclined to
leave these as to-do items: getting a basic module loading mechanism
in place to unblock further Python 3 porting effort is more important
than comprehensive module importing support.
check-py3-compat.py has been updated to ignore frames. This is
necessary because CPython has built-in code to strip frames from the
built-in importer. When our custom code is present, this doesn't work
and the frames get all messed up. The new code is not perfect. It
works for now. But once you start chasing import failures you find
some edge cases where the files aren't being printed properly. This
only burdens people doing future Python 3 porting work so I'm inclined
to punt on the issue: the most important thing is for the source
transforming module loader to land.
There was a bit of churn in test-check-py3-compat.t because we now
trip up on str/unicode/bytes failures as a result of source
transformation. This is unfortunate but what are you going to do.
It's worth noting that other approaches were investigated.
We considered using a custom file encoding whose decode() would
apply source transformations. This was rejected because it would
require each source file to declare its custom Mercurial encoding.
Furthermore, when changing the source transformation we'd need to
version bump the encoding name otherwise the module caching layer
wouldn't know the .pyc file was invalidated. This would mean mass
updating every file when the source transformation changes. Yuck.
We also considered transforming at the AST layer. However, Python's
ast module is quite gnarly and doing AST transforms is quite
complicated, even for trivial rewrites. There are whole Python packages
that exist to make AST transformations usable. AST transforms would
still require import machinery, so the choice was basically to
perform source-level, token-level, or ast-level transforms.
Token-level rewriting delivers the metadata we need to rewrite
intelligently while being relatively easy to understand. So it won.
General consensus seems to be that this approach is the best available
to avoid bulk rewriting of '' to b''. However, we aren't confident
that this approach will never be a future maintenance burden. This
approach does unblock serious Python 3 porting efforts. So we can
re-evaluate once more work is done to support Python 3.
The old x509 test certificates were using cryptographic settings
that are ancient by today's standards, namely 512 bit RSA keys.
To put things in perspective, browsers have been dropping support
for 1024 bit RSA keys.
I think it is important that tests match the realities of the times.
And 2048 bit RSA keys with SHA-2 hashing are what the world is
moving to.
This patch replaces all the x509 certificates with new versions using
modern best practices. In addition, the docs for generating the
keys have been updated, as the existing docs left out a few steps,
namely how to generate certs that were not active yet or expired.
The link is embedded into a div with class="annotate-info" that only shows up
upon hover of the annotate column. To avoid duplicate hover-overs (this new
one and the one coming from link's title), drop "title" attribute from a
element and put it in the annotate-info element.
Some systems don't have a 127/8 address for localhost (I noticed this
on a FreeBSD jail). In order to work around this, use 127.0.0.1 as a
glob pattern. A future commit will update needed output lines and add
a requirement to check-code.py.
I'm about to fix a bug in check-code that a # anywhere on a line
treated the rest of the line as a comment, even if it was
meaningful. This test is the one place we explicitly *do* want
hardcoded paths referenced, but we can work around that by specifying
bin as a regular expression.
We already had the match relaxed on Windows, but on Google Compute
Engine VMs I'm seeing "Network is unreachable" instead of "Connection
refused". At this point, just give up and make sure we get an error back.
Rather than put everything into one journal file, split entries up in *shared*
and *local* entries. Working copy changes are local to a specific working copy,
so should remain local only. Other entries are shared with the source if so
configured when the share was created.
When unsharing, any shared journal entries are copied across.
Note that now the default action for `hg journal` is to list the working copy
history, not all bookmarks. In its place is the `--all` switch which lists all
name changes recorded, including the name for which the change was recorded on
each line.
Locking is switched to using a dedicated lock to avoid issues with the dirstate
being written during wlock unlocking (you can't re-lock during that process).
Many Linux distros and other Nixen have CA certificates in well-defined
locations. Rather than potentially fail to load any CA certificates at
all (which will always result in a certificate verification failure),
we scan for paths to known CA certificate files and load one if seen.
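The scan itself is simple. The sketch below assumes two commonly used
bundle locations as candidates, and the actual list in the patch may
differ. The isfile parameter exists only to make the example testable.

```python
import os

# Assumed candidate locations; not necessarily the list in the patch.
_candidates = [
    '/etc/ssl/certs/ca-certificates.crt',
    '/etc/pki/tls/certs/ca-bundle.crt',
]


def findcacerts(candidates=_candidates, isfile=os.path.isfile):
    """Return the first existing CA bundle path, or None."""
    for path in candidates:
        if isfile(path):
            return path
    return None
```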
Because a proper Mercurial install will have the path to the CA
certificate file defined at install time, we print a warning that
the install isn't proper and provide a URL with instructions to
correct things.
We only perform path-based fallback on Pythons that don't know
how to call into OpenSSL to load the default verify locations. This
is because we trust that Python/OpenSSL is properly configured
and knows better than Mercurial. So this new code effectively only
runs on Python <2.7.9 (technically Pythons without the modern ssl
module).
Previously, failure to load system certificates on OS X would lead
to a certificate verify failure and that's it. We now print a warning
message with a URL that will contain information on how to configure
certificates on OS X.
As the inline comment states, there is room to improve here. I think
we could try harder to detect Homebrew and MacPorts installed
certificate files, for example. It's worth noting that Homebrew's
openssl package uses `security find-certificate -a -p` during package
installation to export the system keychain root CAs to
etc/openssl/cert.pem. This is something we could consider adding
to setup.py. We could also encourage packagers to do this. For now,
I'd just like to get this warning (which matches Windows behavior)
landed. We should have time to improve things before release.
When reverting interactively, we always backup files before prompting the user
to find out if they actually want to revert them. This can create spurious
*.orig files if a user enters an interactive revert session and then doesn't
revert any files. Instead, we should only backup files that are actually being
touched.
See the inline comment for what's going on here.
There is magic built into the "ssl" module that ships with modern
CPython that knows how to load the system CA certificates on
Windows. Since we're not shipping a CA bundle with Mercurial,
if we're running on legacy CPython there's nothing we can do
to load CAs on Windows, so it makes sense to print a warning.
I don't anticipate many people will see this warning because
the official (presumed popular) Mercurial distributions on
Windows bundle Python and should be distributing a modern Python
capable of loading system CA certs.
This patch includes addition of absolute_import and print_function to the
files where they are missing. The modern importing conventions are also followed.
Tests were failing on systems like RHEL 7 where loading the system
certificates results in CA certs being reported to Python. We add
a feature that detects when we're able to load *and detect* the
loading of system certificates. We update the tests to cover the
3 scenarios:
1) system CAs are loadable and detected
2) system CAs are loadable but not detected
3) system CAs aren't loadable
This is a fix to an old problem when Mercurial got confused by an
untracked folder with the same name as one of the files in a commit
hg was trying to update to. It is pretty safe to remove this folder if
it is empty. Backing up an empty folder seems to go against Mercurial's
"don't track dirs" philosophy.
hgweb currently offers limited functionality for "classifying"
repositories. This patch aims to change that.
The web.labels config option list is introduced. Its values
are exposed to the "index" and "summary" templates. Custom
templates can use template features like ifcontains() to e.g.
look for the presence of a specific label and engage specific
behavior. For example, a site operator may wish to assign a
"defunct" label to a repository so the repository is prominently
marked as dead in repository indexes.
Inspired by how 'git rebase -i' works, we move the autoverb to the
commit line summary that it matches. We do this by iterating over all
rules and inserting each non-autoverb line into a key in an ordered
dictionary. If we find an autoverb line later, we then search for the
matching key and append it to the list (which is the value of each key
in the dictionary). If we can't find a previous line to move to, then we
leave the rule in the same spot.
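A rough sketch of that reordering, under simplifying assumptions: rules are reduced to their summary lines, an autoverb line looks like "verb! target summary", matching is by exact summary, and the verb list is illustrative only. The names below are hypothetical and not histedit's own.

```python
from collections import OrderedDict

# Illustrative verb list (an assumption for this sketch).
AUTOVERBS = ("fold", "roll", "drop")


def split_autoverb(summary):
    """Return (verb, target) for 'verb! target', else (None, summary)."""
    verb, sep, rest = summary.partition("! ")
    if sep and verb in AUTOVERBS:
        return verb, rest
    return None, summary


def reorder(summaries):
    # Each key is (position, summary) so duplicate summaries stay distinct;
    # each value is the list of rules that end up grouped at that spot.
    groups = OrderedDict()
    for index, summary in enumerate(summaries):
        verb, target = split_autoverb(summary)
        if verb is None:
            groups[(index, summary)] = [("pick", summary)]
            continue
        for key in groups:
            if key[1] == target:
                # Move the autoverb line next to the line it matches.
                groups[key].append((verb, summary))
                break
        else:
            # No earlier line to move to: leave the rule where it is.
            groups[(index, summary)] = [(verb, summary)]
    return [rule for rules in groups.values() for rule in rules]
```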
Tests have been updated but the diff looks a little messy because we
need to change one of the summary lines so that it will actually move to
a new spot. On top of that, we added -q flags to quiet some of the
output and needed to change the file it modified so that it wouldn't
cause a conflict.
Stripping has only partly worked since f41815302d49 (repair: use cg3
for treemanifests, 2016-01-19): the bundle seems to have been created
correctly, but revlog entries in subdirectory revlogs were not
stripped. This meant that e.g. "hg verify" would fail after stripping
in a tree manifest repo.
To find the revisions to strip, we simply iterate over all directories
in the repo (included in store.datafiles()). This is inefficient for
stripping few commits, but efficient for stripping many commits. To
optimize for stripping few commits, we could instead walk the tree
from the root and find modified subdirectories, just like we do in the
changegroup code. I'm leaving that for another day.
The httplib library is renamed to http.client in Python 3. So the
import is conditionalized and a test is added in check-code to warn
that util.httplib should be used instead.
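A conditional import of this kind usually looks like the sketch below. It is a generic illustration of the pattern, not the exact code in util.py.

```python
try:
    import httplib  # Python 2 name
except ImportError:
    import http.client as httplib  # Python 3 name

# Callers can then use one name regardless of the interpreter version.
connection_class = httplib.HTTPConnection
```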
If no CA certificates are loaded, that is almost certainly a/the
reason certificate verification fails when connecting to a server.
The modern ssl module in Python 2.7.9+ provides an API to access
the list of loaded CA certificates. This patch emits a warning
on modern Python when certificate verification fails and there are
no loaded CA certificates.
There is no way to detect the number of loaded CA certificates
unless the modern ssl module is present. Hence the differences
in test output depending on whether modern ssl is available.
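For illustration, the check can be sketched with the standard ssl module as below. The helper name is hypothetical. Note that get_ca_certs() does not report certificates from a capath directory until a connection has caused them to be loaded.

```python
import ssl


def no_ca_certs_loaded(context):
    """Return True when the context holds no CA certificates at all."""
    return len(context.get_ca_certs()) == 0


# A freshly created context has not loaded any certificates yet.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
```

A caller could emit the warning when verification fails and no_ca_certs_loaded(ctx) is true.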
It's worth noting that a test which specifies a CA file still
renders this warning. That is because the certificate it is loading
is a x509 client certificate and not a CA certificate. This
test could be updated if anyone is so inclined.
I'm not a fan of TLS tests not testing both branches of a possible
configuration. While we have test coverage of the inability to validate
a cert later in this file, I insist that we add this branch so
our testing of security code is extra comprehensive.
Before, sslcontext.load_verify_locations() would raise a
ssl.SSLError which would be caught further up the stack and converted
to a urlerror. By that time, we lost track of what actually errored.
Trapping the error here gives users a slightly more actionable error
message.
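A minimal sketch of trapping the error at load time, assuming a hypothetical CertificateLoadError type and message wording (neither is Mercurial's own):

```python
import ssl


class CertificateLoadError(Exception):
    """Hypothetical error type used only in this sketch."""


def load_ca_file(context, cafile):
    """Load cafile into context, naming the file if loading fails."""
    try:
        context.load_verify_locations(cafile=cafile)
    except (ssl.SSLError, OSError) as exc:
        # Report which file failed while we still know it.
        raise CertificateLoadError(
            "error loading CA file %s: %s" % (cafile, exc)
        )
```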
The behavior between Python <2.7.9 and Python 2.7.9+ differs. This
is because our fake SSLContext class installed on <2.7.9 doesn't
actually do anything during load_verify_locations: it defers actions
until wrap_socket() time. Unfortunately, a number of errors can occur
at wrap_socket() time and we're unable to ascertain what the root
cause is. But that shouldn't stop us from providing better error
messages to people running a modern and secure Python version.
smf reported that an environment with no loaded CA certs resulted
in a weird error. I'd like to detect this a bit better so we can
display an actionable error message.
The actual error being globbed over in this patch is "unknown error"
with a ssl.c line number. That isn't useful at all.
sslutil contains its own hostname matching logic. CPython has code
for the same intent. However, it is only available to Python 2.7.9+
(or distributions that have backported 2.7.9's ssl module
improvements).
This patch effectively imports CPython's hostname matching code
from its ssl.py into sslutil.py. The hostname matching code itself
is pretty similar. However, the DNS name matching code is much more
robust and spec conformant.
As the test changes show, this changes some behavior around
wildcard handling and IDNA matching. The new behavior allows
wildcards in the middle of words (e.g. 'f*.com' matches 'foo.com')
This is spec compliant according to RFC 6125 Section 6.5.3 item 3.
There is one test where the matcher is more strict. Before,
'*.a.com' matched '.a.com'. Now it doesn't match. Strictly speaking,
the old behavior was a security vulnerability.
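The two behaviors mentioned above can be reproduced with a simplified matcher modeled on CPython's DNS name matching. This sketch omits the IDNA special cases and is not the code imported into sslutil.py. The function name is an assumption.

```python
import re


def dnsname_match(pattern, hostname):
    """Match hostname against a certificate DNS name pattern."""
    if not pattern:
        return False
    leftmost, *remainder = pattern.split(".")
    wildcards = leftmost.count("*")
    if wildcards > 1:
        raise ValueError("too many wildcards in DNS name: %r" % pattern)
    if not wildcards:
        # No wildcard in the leftmost label: plain comparison.
        return pattern.lower() == hostname.lower()
    if leftmost == "*":
        # A lone wildcard must match at least one character of one label.
        first = "[^.]+"
    else:
        # A partial wildcard such as 'f*' stays within one label.
        first = re.escape(leftmost).replace(r"\*", "[^.]*")
    parts = [first] + [re.escape(frag) for frag in remainder]
    regex = re.compile(r"\A" + r"\.".join(parts) + r"\Z", re.IGNORECASE)
    return regex.match(hostname) is not None
```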
CPython has a more comprehensive test suite for its built-in hostname
matching functionality. This patch adds its tests so we can improve
our hostname matching functionality.
Many of the tests have different results from CPython. These will be
addressed in a subsequent commit.
Prior to revision 149be6a0072e, largefiles were saved in the local repository,
even if it was using the share extension. After that change, all largefiles are
now stored in the shared repository. However, the backward compatibility for
existing largefiles already placed in the local repository was never tested,
and has been broken since.
Records bookmark locations and shows you where bookmarks were located in the
past.
This is the first in a planned series of locations to be recorded; a future
patch will add working copy (dirstate) tracking, and remote bookmarks will be
supported as well, so the journal storage format should be fairly generic to
support those use-cases.
Before, the error was caught at func() as an unknown identifier, and the
optimizer failed to detect the syntax error. This patch introduces a getsymbol()
helper to ensure that a string is not allowed as a function name.
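A sketch of what such a helper might look like, assuming parse tree nodes are tuples like ('symbol', name) or ('string', value). The node shape and the ParseError class here are assumptions for illustration.

```python
class ParseError(Exception):
    """Stand-in error type for this sketch."""


def getsymbol(node):
    """Return the name of a symbol node; reject any other node type."""
    if node and node[0] == "symbol":
        return node[1]
    raise ParseError("not a symbol")
```

With this, a quoted string used as a function name fails early instead of surfacing later as an unknown identifier.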
It was mixing tabs and spaces, and not in a good way.
Indent style of other atom entries seems to be 1 space per level, so let's
apply it here as well.
It was mixing tabs and spaces, and not in a good way.
Indent style of other rss entries seems to be 4 spaces per level, so let's
apply it here as well.
Rather than sometimes using a complicated shell construct to dump pwned.txt
(if it wasn't expected to exist, but might, if something were broken) or
just cat (if it was expected to exist), just use the "f" utility, which
will be consistent in its behavior across different platforms.
Also make sure that *something* gets put into pwned.txt, even if we ended
up typoing the message variable.
Having a single "pwned" message which may or may not be emitted during the
tests for CVE-2016-3068 leads to extra confusion. Allow each test to emit
a more detailed message based on what the expectations are.
In both cases, we expect a version of git which has had the vulnerability
plugged, as well as a version of mercurial which also knows about
GIT_ALLOW_PROTOCOL. For the first test, we make sure GIT_ALLOW_PROTOCOL is
unset, meaning that the ext-protocol subrepo should be ignored; if it
isn't, there's either a problem with mercurial or the installed copy of
git.
For the second test, we explicitly allow ext-protocol subrepos, which means
that the subrepo will be accessed and a message emitted confirming that
this was, in fact, our intention.
The "pwned" message from this test gets sent to stderr, and so may get
emitted in different places from run to run in the rest of mercurial's
output. This patch forces the message to go to a specific file instead,
whose existence and contents we can examine at a stable point in the test's
execution.
When diffing against an empty file, Solaris diff uses 1 to designate the
first line of the empty file (either -1,0 on the left or +1,0 on the right)
while GNU diff uses 0 (-0,0 and +0,0). We use a glob here to make sure the
test passes with either toolchain.
I've not added tests to check-code because there are scads of places in the
tests where the GNU format is used due to that being the format that "hg
diff" and "hg export" use, and changing those to use globs seems wrong.
Before, 'hg push -B .' that created a new remote head aborted with:
abort: push creates new remote head ...
This was because _nowarnheads did not expand the active bookmark
name ("."), so the real name of the active bookmark was never added to
the list of heads to not warn about.
When we remove a changeset from the changelog, the phase cache must be
invalidated, otherwise it could refer to changesets that are no longer in the
repo.
To reproduce the failure, I created an extension querying the phase cache after
the strip transaction is over.
To do that, I stripped two commits with a bookmark on one of them to force
another transaction (we open a transaction for moving bookmarks)
after the strip transaction.
Without the fix in this patch, the test leads to a stacktrace showing the issue:
    repair.strip(ui, repo, revs, backup)
  File "/Users/lcharignon/facebook-hg-rpms/hg-crew/mercurial/repair.py", line 205, in strip
    tr.close()
  File "/Users/lcharignon/facebook-hg-rpms/hg-crew/mercurial/transaction.py", line 44, in _active
    return func(self, *args, **kwds)
  File "/Users/lcharignon/facebook-hg-rpms/hg-crew/mercurial/transaction.py", line 490, in close
    self._postclosecallback[cat](self)
  File "$TESTTMP/crashstrip2.py", line 4, in test
    [repo.changelog.node(r) for r in repo.revs("not public()")]
  File "/Users/lcharignon/facebook-hg-rpms/hg-crew/mercurial/changelog.py", line 337, in node
    return super(changelog, self).node(rev)
  File "/Users/lcharignon/facebook-hg-rpms/hg-crew/mercurial/revlog.py", line 377, in node
    return self.index[rev][7]
IndexError: revlog index out of range
The situation was encountered in inhibit (evolve's repo) where we would crash
following the volatile set invalidation submitted by Augie in
cbc52a99d057d11790cf5011e877c6f698bf57bf. Before his patch the issue was masked
as we were not accessing the phasecache after stripping a revision.
This bug uncovered another bug in histedit (see explanation in issue5235).
I changed the histedit test accordingly to avoid fixing two things at once.
If a commit contains only an executable-bit change and you amend it twice,
the change vanishes:
* After the first amend the commit will have the proper executable bit set
in manifest but it won't have the file on the list of files in
changelog.
* The second amend will read the wrong list of files from changelog and it
will copy the manifest entry from parent for this file.
* Voila! The change is lost.
This change repairs the bug in localrepo causing this and adds a test for it.
GNU grep (2.21-2 or later) assumes that input is encoded according to
LC_CTYPE, and treats input as binary if it contains a byte sequence that
is not valid for that encoding.
For example, if the locale is configured as C, a byte with the most
significant bit (MSB) set makes such a GNU grep unintentionally show a
"Binary file <FILENAME> matches" message instead of the matched lines.
This behavior is recognized as a bug, and is fixed in GNU grep 2.25-1 or
later. But some distributions ship such a buggy version
(e.g. Ubuntu xenial, which is used by the launchpad buildbot).
http://debbugs.gnu.org/cgi/bugreport.cgi?bug=19230
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=800670
http://packages.ubuntu.com/xenial/grep
This causes failure of test-commit-interactive.t, which applies grep
on CP932 byte sequence since 4681296e309b.
But explicitly setting LC_CTYPE for CP932 might cause another problem,
because it can't be assumed that every environment running Mercurial
tests allows arbitrary locale settings.
To resolve this issue, this patch escapes bytes with the MSB set in the
input of grep.
For this purpose:
- str.encode('string-escape') isn't useful, because it escapes also
control code (less than 0x20), and makes EOL handling complicated
- "f --hexdump" isn't useful, because it isn't line-oriented
- "sed -n" seems reasonable, but "sed" itself sometimes causes
portability issue, too (e.g. 215a8789129e or 6d02ef568139)
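The kind of escaping wanted here (only bytes with the MSB set, leaving control codes and EOL alone) can be sketched as below. The function name is hypothetical and this is not the helper used in the test itself.

```python
def escape_msb(data):
    """Replace every byte >= 0x80 with a literal \\xNN escape."""
    out = bytearray()
    for byte in bytearray(data):
        if byte >= 0x80:
            out.extend(b"\\x%02x" % byte)
        else:
            # ASCII bytes, including tabs and newlines, pass through as-is.
            out.append(byte)
    return bytes(out)
```

Because newlines are untouched, the output stays line-oriented and can be piped into grep directly.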
This patch is posted with "stable" flag, because 4681296e309b is on
stable branch.
Before this patch, "hg help topic.section" might show an unexpected
section of a help topic in some encodings.
It applied str.lower() instead of encoding.lower(str) to the translated
message to search for the section case-insensitively, but some encodings
use 0x41(A) - 0x5a(Z) as the second or later byte of a multi-byte character
(for example, ja_JP.cp932), so str.lower() produces an unexpected result.
To search section of help topic by translated section name correctly,
this patch replaces str.lower() by encoding.lower(str) for both query
string (in commands.help()) and translated help text (in
minirst.getsections()).
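The failure mode is easy to reproduce with a single cp932 character. The sketch below uses plain bytes and a hypothetical encoding_lower() helper. It illustrates the idea and is not Mercurial's encoding.lower().

```python
def naive_lower(data):
    """Lowercase ASCII bytes blindly, ignoring multi-byte characters."""
    return data.lower()


def encoding_lower(data, encoding):
    """Lowercase after decoding, so trail bytes are never touched."""
    return data.decode(encoding).lower().encode(encoding)


# KATAKANA LETTER A in cp932: the second byte, 0x41, is also ASCII 'A'.
katakana_a = b"\x83\x41"
```

Here naive_lower(katakana_a) yields b"\x83\x61", a different character, while encoding_lower(katakana_a, "cp932") returns the input unchanged.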
Before this patch, patch.filterpatch() showed a meaningless translation
of the help message for chunk selection in some encodings.
It applied str.lower() instead of encoding.lower(str) to the translated
message, but some encodings use 0x41(A) - 0x5a(Z) as the second or
later byte of a multi-byte character (for example, ja_JP.cp932), so
str.lower() produces an unexpected result.
To show lower-ed translated message correctly, this patch replaces
str.lower() by encoding.lower(str).
The partial bundle is not a subset of the full bundle, and the full
bundle is not full in any way that I see. The most obvious
interpretation of "full" I can think of is that it has all commits
back to the null revision, but that is not what the "full" bundle
is. The "full" bundle is simply a backup of what the user asked us to
strip (unless --no-backup). The "partial" bundle contains the
revisions we temporarily stripped because they had higher revision
numbers than some commit that the user asked us to strip.
The "full" bundle is already called "backup" in the code, so let's use
that in user-facing messages too. Let's call the "partial" bundle
"temporary" in the code.
If strip fails when applying the temporary bundle, the commits in the
temporary bundle have not yet been applied, so the user will almost
definitely want to apply the bundle. We should be more clear to the
user about that than our current "partial bundle stored in...".
Note that we will probably not be able to recover it automatically,
since whatever made it fail (e.g. a hook) will most likely make it
fail again. We need to give control back to the user to fix the
problem before trying again.
If strip fails while recovering the temporary bundle (e.g. because a
hook fails), we tell the user only about the backup bundle, not about
the temporary bundle. Since the user did not ask to strip the commits
in the temporary bundle, that's the more important bundle to mention,
so let's do that (and also mention the backup bundle as usual).
V2:
- Limit escaping to plain formatting only
- Use the formatter consistently (no more ui.debug)
- Always include 'name' and 'value'
V3:
- Always convert 'value' to string (this also makes sure we handle functions)
- Keep real debug message as ui.debug for now
- Add additional tests.
Note: I'm not quite sure about the best approach to handling
the 'print the full config' case.
For me, it printed the 'ui.promptecho' key at the end.
I went with globs there as that at least tests the json display reliably.
Example output:
[
  {
    "name": "ui.username",
    "source": "/home/mathias/.hgrc:2",
    "value": "Mathias De Maré <mathias.demare@gmail.com>"
  }
]
This means that if you have git-diffs enabled by default (pretty
common) and you hit the rare (but real) case where a git-diff breaks
patch(1) or some other tool, you can easily disable it by just
specifying --no-git on the command line.
I feel a little bad about the isinstance() check, but some values in
diffopts are not booleans and so we need to preserve false iff the
flag is a boolean flag: failing to do this means we end up with empty
string defaults for flags clobbering meaningful values from the [diff]
section in hgrc.
This makes it much easier to enable some anti-foot-shooting features
(like update --check) by default, because now all boolean flags can be
explicitly disabled on the command line without having to use HGPLAIN
or similar. Flags which don't deserve this treatment can be removed
from consideration by adding them to the nevernegate set in fancyopts.
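As a rough illustration of the idea (not the fancyopts implementation), negation can be limited to flags whose default is a boolean. The parse_flags() function, its argument format, and its error messages are all assumptions for this sketch.

```python
def parse_flags(args, defaults, nevernegate=frozenset()):
    """Parse '--name', '--no-name' and '--name=value' style arguments."""
    opts = dict(defaults)
    for arg in args:
        name, _sep, value = arg[2:].partition("=")
        if name.startswith("no-") and name[3:] in defaults:
            name = name[3:]
            # Only boolean flags may be negated; others keep their value.
            if not isinstance(defaults[name], bool) or name in nevernegate:
                raise ValueError("option --no-%s not recognized" % name)
            opts[name] = False
        elif name not in defaults:
            raise ValueError("option --%s not recognized" % name)
        elif isinstance(defaults[name], bool):
            opts[name] = True
        else:
            opts[name] = value
    return opts
```

Non-boolean flags that are not mentioned keep their defaults, so an empty-string default is never turned into an explicit False.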
This doesn't make it any easier to identify when a flag is set: opts
still always gets filled in, either with the user-specified flag value
or with the default from the flags list in the command
table. Improving that would probably clean things up a bit, but for
now if you want a boolean flag and care if it was explicitly false or
default false (or true, but nobody uses that functionality because
before now it was nonsense) you need to use None as your default
rather than True or False.
This doesn't (yet) update help output, because I'm not quite sure how
to do that cleanly.
Because smartset.reverse() may modify the underlying subset, it should be
called only if the set can define the ordering.
In the following example, 'a' and 'c' are the same object, so 'b.reverse()'
would reverse 'a' unexpectedly.
  # '0:2 & reverse(all())'
  <filteredset
    <spanset- 0:2>,  # a
    <filteredset  # b
      <spanset- 0:2>,  # c
      <spanset+ 0:9>>>
present() is special in that it returns the argument set with no
modification, so the ordering requirement should be forwarded.
We could make present() fix the order like orset(), but that would be silly
because we know the extra filtering cost is unnecessary.
This fixes the order of 'x & (y + z)' where 'y' and 'z' are trivial, and the
other uses of _list()-family functions. The original functions are renamed to
'_ordered(|int|hex)list' to say clearly that they do not follow the subset
ordering.
This fixes the order of 'x & (y + z)' where 'y' and 'z' are not trivial.
The follow-order 'or' operation is slower than the ordered operation if
an input set is large:
   #0       #1       #2       #3
0) 0.002968 0.002980 0.002982 0.073042
1) 0.004513 0.004485 0.012029 0.075261
#0: 0:4000 & (0:1099 + 1000:2099 + 2000:3099)
#1: 4000:0 & (0:1099 + 1000:2099 + 2000:3099)
#2: 10000:0 & (0:1099 + 1000:2099 + 2000:3099)
#3: file("path:hg") & (0:1099 + 1000:2099 + 2000:3099)
I tried another implementation, but it turned out to be slower than
this version:
  ss = [getset(repo, fullreposet(repo), x) for x in xs]
  return subset.filter(lambda r: any(r in s for s in ss), cache=False)
This corrects a warning from lintian that we're shipping an executable without
a man page. Since there is a doc string in the text, let's use that for the man
page.
The progress bar was being cleared on every write(), regardless of
whether it was currently displayed. This could foul up the display of
any writes that didn't include a linebreak.
In particular, the win32 mode of the color extension was turning
single prompt string writes into two writes, and the resulting
clear/write/clear/write pattern was making the prompt invisible.
We fix this by insisting that we have shown a progress bar and haven't
just cleared it (setting lastprint to 0).
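A toy model of the guard, assuming a hypothetical Progress class that records what it writes. Only the lastprint name comes from the description above.

```python
class Progress:
    """Minimal stand-in for a progress bar that tracks its own display."""

    def __init__(self, out):
        self.out = out
        self.lastprint = 0  # 0 means nothing is currently displayed

    def show(self, text, now):
        self.out.append("\r" + text)
        self.lastprint = now

    def clear(self):
        # Skip the clear when no bar is on screen, so a pending partial
        # line (such as a prompt) is not wiped out.
        if not self.lastprint:
            return
        self.out.append("\r    \r")
        self.lastprint = 0
```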
Conveniently, the test suite already had instances of duplicate
clears, which are now cleared up.
This hasn't been testing anything since partway through the 3.7 cycle
due to unrelated refactoring. Sadly, the behavior it was trying to
prevent reemerged in the codebase at that time. A fix is in the next
patch, because proving that the fix was actually correct ended up
being trickier than I expected.
getbundle was requesting the "phase" namespace instead of the "phases"
namespace, which led to the client still requesting the phases
separately after getbundle finished.
Fixes CVE-2016-3105 (1/1).
Previously, it was possible for the repository path passed to git-ls-remote
to be misinterpreted as a URL.
Always passing an absolute path to git is a simple way to avoid this.
Before this patch, `hg pull --rebase` would be a strict sequence of `hg pull`
followed by `hg rebase` if anything was pulled.
Now that rebase picks its default destination the same way as merge, the
`hg rebase` step would abort if the repo already had multiple anonymous
heads (because of the ambiguity). (changed in 8822059a608a)
The intent of the user with `hg pull --rebase` is clearly to rebase on pulled
content. This used to be (mostly) enforced by the former default destination for
rebase, "tipmost changeset of the branch", as the tipmost would likely be a
changeset that just got pulled. But this intent was no longer enforced with
the new default destination (unified with merge).
This changeset makes use of the '_destspace' mechanism introduced in the previous
changeset to enforce this.
This partially fixes issue5214 as no change at all has been made to the new
handling of the case with bookmark (unified with merge).
We've historically had a problem maintaining the expected invariants
on our caches, especially when introducing new caches. This test
documents the invariants and exercises them across most of our
existing cache files.
Instead of using bdist_mpkg, we use the modern Apple-provided tools to
build an OS X Installer package directly. This has several advantages:
* Avoids bdist_mpkg which seems to be barely maintained and is hard to
use.
* Creates a single unified .pkg instead of a .mpkg.
* The package we produce is in the modern, single-file format instead of
a directory bundle that we have to zip up for download.
In addition, this way of building the package now correctly:
* Installs the manpages, bringing the `make osx`-generated package in
line with the official Mac packages we publish on the website.
* Installs files with the correct permissions instead of encoding the
UID of the user who happened to build the package.
Thanks to Augie for updating the test expectations.
Initializing a subrepo when one doesn't exist is the right thing to do when the
parent is being updated, but in few other cases. Unfortunately, there isn't
enough context in the subrepo module to distinguish this case. This same issue
can be caused with other subrepo aware commands, so there is a general issue
here beyond the scope of this fix.
A simpler attempt I tried was to add an '_updating' boolean to localrepo, and
set/clear it around the call to mergemod.update() in hg.updaterepo(). That
mostly worked, but doesn't handle the case where archive will clone the subrepo
if it is missing. (I vaguely recall that there may be other commands that will
clone if needed like this, but certainly not all do. It seems both handy, and a
bit surprising for what should be a read only operation. It might be nice if
all commands did this consistently, but we probably need Angel's subrepo caching
first, to not make a mess of the working directory.)
I originally handled 'Exception' in order to pick up the Aborts raised in
subrepo.state(), but this turns out to be unnecessary because that is called
once and cached by ctx.sub() when iterating the subrepos.
It was suggested in the bug discussion to skip looking at the subrepo links
unless -S is specified. I don't really like that idea because missing a subrepo
or (less likely, but worse) a corrupt .hgsubstate is a problem of the parent
repo when checking out a revision. The -S option seems like a better fit for
functionality that would recurse into each subrepo and do a full verification.
Ultimately, the default value for 'allowcreate' should probably be flipped, but
since the default behavior was to allow creation, this is less risky for now.
For highly structured files like JSON or XML dumps with large numbers
of duplicate lines (eg braces) and isolated matching lines, bdiff
could find large numbers of equally good spans. Because it prefers
earlier matches, this would result in pathologically unbalanced
recursion that resulted in quadratic performance.
This patch makes it prefer matches closer to the middle that tend to
balance recursion. This change improves the speed of a pathological
test case from 1100s to 9s.
Included is a smaller test that has a roughly 50x safety margin on the
performance it accepts. It's likely to fail on pure builds because
difflib also has a recursion-balancing problem.
The longest_match code compares all the possible positions in two
files to find the best match. Given a pair of sequences, it
effectively searches a grid like this:
      a b b b c . d e . f
      0 1 2 3 4 5 6 7 8 9
    a 1 - - - - - - - - -
    b - 2 1 1 - - - - - -
    b - 1 3 2 - - - - - -
    b - 1 2 4 - - - - - -
    . - - - - - 1 - - 1 -
Here, the 4 in the middle says "the first four lines of the
file match", which it can compute by comparing the fourth lines and
then adding one to the result found when comparing the third lines in
the entry to the upper left.
We generally avoid the quadratic worst case by only looking at lines
that match, which is precomputed. We also avoid quadratic storage by
only keeping a single column vector and then keeping track of the best
match.
Unfortunately, this can get us into trouble with the sequences above.
Because we want to reuse the '3' value when calculating the '4', we
need to be careful not to overwrite it with the '2' we calculate
immediately before. If we scan left to right, top to bottom, we're
going to have a problem: we'll overwrite our 3 before we use it and
calculate a suboptimal best match.
To address this, we can either keep two column vectors and swap
between them (which significantly complicates bookkeeping), or change
our scanning order. If we instead scan from left to right, bottom to
top, we'll avoid ever overwriting values we'll need in the future.
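The scanning-order idea can be shown with a small Python model. This is a sketch only: the real code is C, uses precomputed hash chains rather than comparing every pair, and has its own tie-breaking rules. The names and the explicit tie-break below are assumptions.

```python
def longest_match(a, b):
    """Return (a_start, b_start, length) of the longest common run."""
    # run[j + 1] is the length of the common run ending at b[j] for the
    # element of a currently being processed (one vector, reused per row).
    run = [0] * (len(b) + 1)
    best = (0, 0, 0)  # (length, a_start, b_start)
    for i, line in enumerate(a):
        # Walk b backwards so run[j] still holds the value computed for
        # a[i - 1] when we read it; a forward walk would overwrite it.
        for j in range(len(b) - 1, -1, -1):
            if line == b[j]:
                run[j + 1] = run[j] + 1
                length = run[j + 1]
                cand = (length, i - length + 1, j - length + 1)
                # Longer runs win; ties go to the earlier match.
                if cand[0] > best[0] or (
                    cand[0] == best[0] and cand[1:] < best[1:]
                ):
                    best = cand
            else:
                run[j + 1] = 0
    return best[1], best[2], best[0]
```

On the sequences from the grid above, longest_match(list("abbbc.de.f"), list("abbb.")) finds the run of length 4 starting at the beginning of both.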
This unfortunately needs several changes to be made simultaneously:
- change the order we build the initial hash chains for the b sequence
- change the sentinel values from INT_MAX to -1
- change the visit order in the longest_match inner loop
- add a tie-breaker preference for earlier matches
This last is needed because we previously had an implicit tie-breaker
from our visitation order that our test suite relies on. Later matches
can also trigger a bug in the normalization code in diff().
The problem was that the files to check were gathered in the repository
where verify was launched, but verification was done against the remote
store. It was observed when a user committed in a cloned repository
and ran verify before pushing: the committed files were reported
as nonexistent.
This commit fixes this by checking in the remote store only the files
that do not exist in the store of the repository where verify was launched.
The solution is similar to 909b9d8f9ae7.
Now that we have a mechanism for declaring path sub-options, we can
start to pile on features!
Many power users have expressed frustration that bare `hg push`
attempts to push all local revisions to the remote. This patch
introduces the "pushrev" path sub-option to control which revisions
are pushed when no "-r" argument is specified.
The value of this sub-option is a revset, naturally.
A future feature addition could potentially introduce a "pushnames"
sub-option that declares the list of names (branches, bookmarks,
topics, etc) to push by default. The entire "what to push by default"
feature should probably be considered before this patch lands.
As part of developing a subsequent patch I discovered that sub-option
values like "." were getting converted to paths. This is because the
[paths] section is treated specially during config loading.
This patch prevents post-processing sub-options from the [paths]
section.
Previously, when we connected to a server and were unable to verify
its certificate against a trusted certificate authority we would
issue a warning and continue to connect. This is obviously not
great behavior because the x509 certificate model is based upon
trust of specific CAs. Failure to enforce that trust erodes security.
This behavior was defined several years ago when Python did not
support loading the system trusted CA store (Python 2.7.9's
backports of Python 3's improvements to the "ssl" module enabled
this).
This commit changes behavior when connecting to abort if the peer
certificate can't be validated. With an empty/default Mercurial
configuration, the peer certificate can be validated if Python is
able to load the system trusted CA store. Environments able to load
the system trusted CA store include:
* Python 2.7.9+ on most platforms and installations
* Python 2.7 distributions with a modern ssl module (e.g. RHEL7's
patched 2.7.5 package)
* Python shipped on OS X
Environments unable to load the system trusted CA store include:
* Python 2.6
* Python 2.7 on many existing Linux installs (because they don't
ship 2.7.9+ or haven't backported modern ssl module)
* Python 2.7.9+ on some installs where Python is unable to locate
the system CA store (this is hopefully rare)
Users of these Pythons will need to configure Mercurial to load the
system CA store using web.cacerts. This should ideally be performed
by packagers (by setting web.cacerts in the global/system hgrc file).
Where Mercurial packagers aren't setting this, the linked URL in the
new abort message can contain instructions for users.
In the future, we may want to add more code for finding the system
CA store. For example, many Linux distributions have the CA store
at well-known locations (such as /etc/ssl/certs/ca-certificates.crt
in the case of Ubuntu). This will enable CA loading to "just work"
on more Python configurations and will be best for our users since
they won't have to change anything after upgrading to a Mercurial
with this patch.
We may also want to consider distributing a trusted CA store with
Mercurial. Although we should think long and hard about that because
most systems have a global CA store and Mercurial should almost
certainly use the same store used by everything else on the system.
The ordering of 'x & head()' was broken in 329d82866742 (revset:
improve head revset performance, 2014-03-13). Presumably due to other
optimizations since then, undoing that change to fix the order does
not slow down the simple case of "hg log -r 'head()'" mentioned in
that commit. I see a small slowdown from ~0.16s to ~0.19s with
'not 0 & head()', but I'd say it's worth it for the correct output.
Rebuilding the translation table (256 entries) at each repquote() invocation
is redundant.
For example, this patch decreases the user time of the command invocation
below from 18.297s to 13.445s (about -27%) on a Linux box. This
command is the main part of test-check-code.t.
hg locate | xargs python contrib/check-code.py --warnings --per-file=0
This patch adds "_repquote" prefix to functions and variables factored
out from repquote() to avoid name conflicts in the future.
Before this patch, the "missing _() in ui message" rule overlooked
translatable messages that start with a non-alphabetic character.
To detect "missing _() in ui message" more precisely, this patch
improves the regexp based on the assumptions below.
- a sequence consisting of the items below might precede the
  "translatable message" in the same string token
  - formatting string, which starts with '%'
  - escaped character, which starts with 'b' (as replacement of '\\'), or
  - characters other than '%', 'b' and 'x' (as replacement of alphabet)
- any string tokens might precede a string token which contains the
  "translatable message"
This patch builds an input file, which is used to examine "missing _()
in ui message" detection, before '"$check_code" stringjoin.py' in
test-contrib-check-code.t, because this reduces churn in the
subsequent patch.
This patch also applies "()" instead of "_()" on messages below to
hide false-positives:
- messages for ui.debug() or debug commands/tools
  - contrib/debugshell.py
  - hgext/win32mbcs.py (ui.write() is used, though)
  - mercurial/commands.py
    - _debugchangegroup
    - debugindex
    - debuglocks
    - debugrevlog
    - debugrevspec
    - debugtemplate
- untranslatable messages
  - doc/gendoc.py (ReST specific text)
  - hgext/hgk.py (permission string)
  - hgext/keyword.py (text written into configuration file)
  - mercurial/cmdutil.py (formatting strings for JSON)
This adds mostly broken tests that will be fixed by subsequent patches. We
generally don't do that, but this patch series would be hard to review
without a set of broken tests.
Note that some tests pass thanks to the reordering problem in optimize().
For instance, '2:0 & _intlist(0 1 2)' doesn't fail because it is rewritten
as '_intlist(0 1 2) & 2:0'.
I.e. when a revision blames a block of source lines, only display the
revision link on the first line of the block (this is identified by the
"blockhead" key in annotate context).
This addresses item "Visual grouping of changesets" of the blame improvements
plan (https://www.mercurial-scm.org/wiki/BlamePlan) which states: "Typically
there are block of lines all attributed to the same revision. Instead of
rendering the revision/changeset for every line, we could only render it once
per block."
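A minimal sketch of the grouping, assuming each annotated line is represented only by its revision. The function name is hypothetical and the real annotate context carries more fields; the flag plays the role of the "blockhead" key.

```python
def mark_blockheads(line_revs):
    """Yield (rev, blockhead) for each annotated line.

    blockhead is True only on the first line of a run of consecutive
    lines attributed to the same revision.
    """
    previous = object()  # sentinel that compares unequal to any revision
    for rev in line_revs:
        yield rev, rev != previous
        previous = rev
```

A template can then render the revision link only for entries whose flag is set.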
* Distinguish the /annotate/<revision>/<file>#<linenumber> link when it would
lead to the current page (i.e. <revision> is the current revision) (style it
gray and undecorated). This indicates more clearly that this is a "dead-end"
in blame navigation.
* Display lines changed in current revision in green.
Change summary webcommand to yield each element of the shortlog instead of the
entire list.
This makes generated json more readable since each entry can be formatted
separately, instead of returning all the shortlog content in a single string.
Modify changelistentry structure to also deliver phase and branch data and use
either 'parents' or 'allparents' depending on what is defined in the view, in
order to reuse it in filelog structure.
Before this commit, url.opener overwrote the stored password
for a connection with a given url/user even when
no new password was supplied for that connection. This
commit makes the opener overwrite saved authentication only
when the new entry contains a password.
So far, the password manager kept the authentication information itself, so opening
a new connection and creating a new password manager lost all saved authentication
information.
This commit separates the password manager from the password database to make it
possible to reuse saved authentication information.
This commit violates the code checker because it adds an add_password method (a name
with an underscore) to the passwordmgr object to provide the method required by urllib2.
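A minimal sketch of the separation, with hypothetical class names; the method names mirror the urllib2 password manager interface. The database outlives any single manager, and an empty password never replaces a saved one.

```python
class PasswordDatabase(object):
    """Hypothetical long-lived store of saved credentials."""

    def __init__(self):
        self._entries = {}

    def add_password(self, realm, uri, user, passwd):
        # Overwrite saved authentication only when a password is given.
        if passwd:
            self._entries[(realm, uri)] = (user, passwd)

    def find_user_password(self, realm, uri):
        return self._entries.get((realm, uri), (None, None))


class PasswordManager(object):
    """Hypothetical per-connection manager delegating to a shared database."""

    def __init__(self, passwddb):
        self.passwddb = passwddb

    def add_password(self, realm, uri, user, passwd):
        self.passwddb.add_password(realm, uri, user, passwd)

    def find_user_password(self, realm, uri):
        return self.passwddb.find_user_password(realm, uri)
```

Creating a second manager over the same database keeps the credentials saved through the first one.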
Before this patch, "from a import b" doesn't delay loading module "b"
if absolute_import is enabled, even though "from . import b" does.
For example:
- assume that extension X has "from P import M" for module M
  under package P, with the absolute_import feature enabled
- if importing module M was already delayed before loading extension
  X, loading module M in extension X is delayed until it is actually
  referenced
  util, cmdutil, scmutil and similar modules of Mercurial itself should
  already be imported in "from . import M" style before loading extension X
- otherwise, module M is loaded immediately when extension X is loaded,
  even if extension X itself isn't used in that "hg" command invocation
  Some minor modules (e.g. filemerge) of Mercurial itself
  aren't imported in "from . import M" style before loading
  extension X. And of course, external libraries aren't either.
This might cause a startup performance problem for the hg command, because
many bundled extensions already enable the absolute_import feature.
To delay loading the module for "from a import b" with the absolute_import
feature, this patch does the following in the "from a (or .a) import b" with
absolute_import case:
1. import the root module of "name" by the system built-in __import__
   (referred to as _origimport)
2. recurse down the module chain for hierarchical "name"
   This logic can be shared with the non absolute_import
   case. Therefore, this patch also centralizes it into chainmodules().
3. and fall through to process elements in "fromlist" for the leaf
   module of "name"
   Processing elements in "fromlist" is executed in the code path
   after the "if _pypy: .... else: ..." clause. Therefore, this patch
   replaces "if _pypy:" with "elif _pypy:" to share it.
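Steps 1 and 2 can be sketched as below. The signature of chainmodules() is an assumption, and unlike demandimport this sketch imports missing submodules eagerly rather than installing lazy placeholders.

```python
import importlib


def chainmodules(rootmod, modname):
    """Walk from rootmod down the dotted modname and return the leaf module.

    Eager stand-in for the lazy chain described above: a submodule that
    is not yet an attribute of its parent is imported immediately.
    """
    mod = rootmod
    for comp in modname.split('.')[1:]:
        if getattr(mod, comp, None) is None:
            importlib.import_module(mod.__name__ + '.' + comp)
        mod = getattr(mod, comp)
    return mod


# Step 1: import only the root module with the built-in __import__.
root = __import__('email')
# Step 2: recurse down the chain to reach the leaf module.
leaf = chainmodules(root, 'email.mime.text')
```

Step 3 would then look up each "fromlist" element on the leaf module.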
At faecf59a4184, which introduced the original "work around" for the "from a import
b" case, elements in "fromlist" were imported with "level=level". But
"level" might be greater than 1 (e.g. level=2 in the "from .. import b"
case) at demandimport() invocation, and importing a direct sub-module in
"fromlist" with level greater than 1 causes an unexpected result.
IMHO, this seems to be the main reason for the "errors for unknown reason" described
in faecf59a4184, and we don't have to worry about it, because this
issue was already fixed by 2711f50242cf.
This is the reason why this patch removes the "errors for unknown reasons"
comment.
When grafting/rebasing, it is common for multiple changesets to make
the same change to a subdirectory. When writing the revlog for the
directory, the revlog code already takes care of not writing the entry
again. In 3eb9fa4180d3 (changegroup: prune subdirectory dirlogs too,
2016-02-12), I added the corresponding code in changegroup (not
sending entries the client already has), but I forgot to avoid sending
the entire changegroup if no nodes remained in the pruned
set. Although that's harmless besides the wasted network traffic, the
receiving side was checking for it (copied from the changegroup code
for handling files). This resulted in the client crashing with:
abort: received dir revlog group is empty
Fix by simply not emitting a changegroup for the directory if there
were no changes in it. This matches how files are handled.
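A minimal sketch of the fix, using hypothetical names and plain Python containers in place of the real changegroup machinery: prune the nodes the client already has, and skip the directory entirely when nothing remains.

```python
def emit_dir_groups(dirnodes, known):
    """Yield (directory, nodes) groups, skipping directories left empty.

    dirnodes maps a directory name to its candidate nodes; known is the
    set of nodes the client already has. Both are illustrative stand-ins.
    """
    for directory in sorted(dirnodes):
        pruned = [n for n in dirnodes[directory] if n not in known]
        if not pruned:
            # Nothing left to send: emit no group at all, so the receiver
            # never sees an empty dir revlog group.
            continue
        yield directory, pruned
```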
Make it a no-op, as before ddf6bfe09ab2. We could change it to an error, but
allowing an empty key makes some sense for scripting that builds a key string
programmatically.
The recently introduced (fd348a6ace5a157c) test around malformed pem files
hard-codes an error message which doesn't appear to be platform-agnostic. On
our machines (centos6 if it matters) the test output differs:
- abort: error: unknown error* (glob)
+ abort: error: _ssl.c:330: error:00000000:lib(0):func(0):reason(0)
This patch increases the glob to cover the entire error message.
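The difference can be illustrated with Python's fnmatch as a stand-in for the test runner's own glob matching. The widened pattern below is a hypothetical example; the exact pattern used by the patch is not shown here.

```python
from fnmatch import fnmatchcase

# The two error messages quoted above.
generic = 'abort: error: unknown error'
centos = 'abort: error: _ssl.c:330: error:00000000:lib(0):func(0):reason(0)'

old_pattern = 'abort: error: unknown error*'
new_pattern = 'abort: error: *'  # hypothetical widened glob

# The old glob only matches the generic message.
old_matches = [fnmatchcase(msg, old_pattern) for msg in (generic, centos)]
# A glob covering the entire error message matches both.
new_matches = [fnmatchcase(msg, new_pattern) for msg in (generic, centos)]
```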
Sort revisions in reverse revision order but grouped by topographical branches.
Visualised as a graph, instead of:
o 4
|
| o 3
| |
| o 2
| |
o | 1
|/
o 0
revisions on a 'main' branch are emitted before 'side' branches:
o 4
|
o 1
|
| o 3
| |
| o 2
|/
o 0
where what constitutes a 'main' branch is configurable, so the sort could also
result in:
o 3
|
o 2
|
| o 4
| |
| o 1
|/
o 0
This sort was already available as an experimental option in the graphmod
module, from which it is now removed.
This sort is best used with hg log -G:
$ hg log -G "sort(all(), topo)"
Before this patch, chg used the old pager behavior (pre 55f6f7fb60d2), which
executes the pager in the main process. The user would see the exit code of the
pager instead of that of the hg command.
Like 55f6f7fb60d2, this patch fixes the behavior by executing the pager in
the child process and waiting for it at the end of the main process.
Recent work has introduced the [hostsecurity] config section for
defining per-host security settings. This patch builds on top
of this foundation and implements the ability to define a per-host
path to a file containing certificates used for verifying the server
certificate. It is logically a per-host web.cacerts setting.
This patch also introduces a warning when both per-host
certificates and fingerprints are defined. These are mutually
exclusive for host verification and I think the user should be
alerted when security settings are ambiguous because, well,
security is important.
Tests validating the new behavior have been added.
I decided against putting "ca" in the option name because a
non-CA certificate can be specified and used to validate the server
certificate (commonly this will be the exact public certificate
used by the server). It's worth noting that the underlying
Python API used is load_verify_locations(cafile=X) and it calls
into OpenSSL's SSL_CTX_load_verify_locations(). Even OpenSSL's
documentation seems to omit that the file can contain a non-CA
certificate if it matches the server's certificate exactly. I
thought a CA certificate was a special kind of x509 certificate.
Perhaps I'm wrong and any x509 certificate can be used as a
CA certificate [as far as OpenSSL is concerned]. In any case,
I thought it best to drop "ca" from the name because this reflects
reality.
SSL handling in mail.py wasn't covered by our test suite, therefore it was
sometimes broken. This patch introduces pretty minimal tests that only cover
the default path. We can extend it later.
Tested with python 2.6.9 and 2.7.11 on Debian sid.
Currently it only supports SMTP over SSL since SMTPS should be simpler than
handling StartTLS.
Since we don't need an asynchronous server for our tests, it does the TLS handshake
in a blocking way. But asyncore is required by the Python smtpd module.
cPickle is renamed to _pickle in Python 3, and that C extension is used automatically
by the pickle module there, which was not the case in earlier versions. So imports are conditionalized
to import cPickle in py2 and pickle in py3. Moreover, the use of pickle in py2 is
switched to cPickle as the C extension is faster. The hack is added in util.py and
the modules import util.pickle.
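One common way to express such a conditional import is sketched below. The exact form used in util.py may differ.

```python
try:
    # Python 2: prefer the C implementation.
    import cPickle as pickle
except ImportError:
    # Python 3: the pickle module already uses its C accelerator
    # when it is available.
    import pickle


def roundtrip(obj):
    """Serialize and deserialize obj with whichever module was imported."""
    return pickle.loads(pickle.dumps(obj))
```

Other modules would then use the name re-exported from util instead of importing either module directly.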
9981d464ac53 fixed matching() to preserve the order of the input set, but
the test was incorrect. Given "A and B", "A" should be the input set to "B".
But thanks to our optimizer, the test expression was rewritten as
"(2 or 3 or 1) and matching(1 or 2 or 3)", therefore it was working well.
Since I'm going to fix the overall ordering issue, the test needs to be
adjusted to do the right thing.