Many Linux distros and other Nixen have CA certificates in well-defined
locations. Rather than potentially fail to load any CA certificates at
all (which will always result in a certificate verification failure),
we scan for paths to known CA certificate files and load one if seen.
Because a proper Mercurial install will have the path to the CA
certificate file defined at install time, we print a warning that
the install isn't proper and provide a URL with instructions to
correct things.
We only perform path-based fallback on Pythons that don't know
how to call into OpenSSL to load the default verify locations. This
is because we trust that Python/OpenSSL is properly configured
and knows better than Mercurial. So this new code effectively only
runs on Python <2.7.9 (technically Pythons without the modern ssl
module).
Previously, failure to load system certificates on OS X would lead
to a certificate verify failure and that's it. We now print a warning
message with a URL that will contain information on how to configure
certificates on OS X.
As the inline comment states, there is room to improve here. I think
we could try harder to detect Homebrew and MacPorts installed
certificate files, for example. It's worth noting that Homebrew's
openssl package uses `security find-certificate -a -p` during package
installation to export the system keychain root CAs to
etc/openssl/cert.pem. This is something we could consider adding
to setup.py. We could also encourage packagers to do this. For now,
I'd just like to get this warning (which matches Windows behavior)
landed. We should have time to improve things before release.
I should have ran the entire test suite on Python 2.6. Since the
hostname matching tests are implemented in Python (not .t tests),
it didn't uncover this warning. I'm not sure why - warnings should
be printed regardless. This is possibly a bug in the test runner.
But that's for another day...
When reverting interactively, we always backup files before prompting the user
to find out if they actually want to revert them. This can create spurious
*.orig files if a user enters an interactive revert session and then doesn't
revert any files. Instead, we should only backup files that are actually being
touched.
Previously, ETag headers from hgweb weren't correctly formed, because rfc2616
(section 14, header definitions) requires double quotes around the content of
the header. str(web.mtime) didn't do that.
Additionally, strong ETags signify that the resource representations are
byte-for-byte identical. That is, they can be reconstructed from byte ranges if
client so wishes. Considering ETags for all hgweb pages is just mtime of
00changelog.i and doesn't consider of e.g. .hg/hgrc with description, contact
and other fields, it's clearly shouldn't be strong. The W/ prefix marks it as
weak, which still allows caching the whole served file/page, but doesn't allow
byte-range requests.
sslutil contains its own hostname matching logic. CPython has code
for the same intent. However, it is only available to Python 2.7.9+
(or distributions that have backported 2.7.9's ssl module
improvements).
This patch effectively imports CPython's hostname matching code
from its ssl.py into sslutil.py. The hostname matching code itself
is pretty similar. However, the DNS name matching code is much more
robust and spec conformant.
As the test changes show, this changes some behavior around
wildcard handling and IDNA matching. The new behavior allows
wildcards in the middle of words (e.g. 'f*.com' matches 'foo.com')
This is spec compliant according to RFC 6125 Section 6.5.3 item 3.
There is one test where the matcher is more strict. Before,
'*.a.com' matched '.a.com'. Now it doesn't match. Strictly speaking
this is a security vulnerability.
See the inline comment for what's going on here.
There is magic built into the "ssl" module that ships with modern
CPython that knows how to load the system CA certificates on
Windows. Since we're not shipping a CA bundle with Mercurial,
if we're running on legacy CPython there's nothing we can do
to load CAs on Windows, so it makes sense to print a warning.
I don't anticipate many people will see this warning because
the official (presumed popular) Mercurial distributions on
Windows bundle Python and should be distributing a modern Python
capable of loading system CA certs.
The "certifi" Python package provides a distribution of the
Mozilla trusted CA certificates as a Python package. If it is
present, we assume the user intends it to be used and we use
it to provide the default CA certificates when certificates
are otherwise not configured.
It's worth noting that this behavior roughly matches the popular
"requests" package, which also attempts to use "certifi" if
present.
Before, devel.disableloaddefaultcerts only impacted the loading of
default certs via SSLContext. After this patch, the config option also
prevents sslutil._defaultcacerts() from being called.
This config option is meant to be used by tests to force no CA certs
to be loaded. Future patches will enable _defaultcacerts() to have
success more often. Without this change we can't reliably test the
failure to load CA certs. (This patch also likely fixes test failures
on some OS X configurations.)
Future patches will change _defaultcacerts() to do something
on platforms that aren't OS X. Change the comment and logged
message to reflect the future.
This is a fix to an old problem when Mercurial got confused by an
untracked folder with the same name as one of the files in a commit
hg was trying to update to. It is pretty safe to remove this folder if
it is empty. Backing up an empty folder seems to go against Mercurial's
"don't track dirs" philosophy.
hgweb currently offers limited functionality for "classifying"
repositories. This patch aims to change that.
The web.labels config option list is introduced. Its values
are exposed to the "index" and "summary" templates. Custom
templates can use template features like ifcontains() to e.g.
look for the presence of a specific label and engage specific
behavior. For example, a site operator may wish to assign a
"defunct" label to a repository so the repository is prominently
marked as dead in repository indexes.
Now that we extend matches backwards in the inner loop, the final
adjustment has no effect.
(A similar extension for the forward direction is trickier and has
less benefit.)
For very large diffs that have large numbers of identical lines (JSON
dumps) that also have large blocks of identical text, bdiff could become
confused about which block matches which because it can only match
very limited regions. The result is very large diffs for small sets of edits.
The earlier recursion rebalancing fix made this behavior more frequent because
it's now more prone to match block 1 to block 2. One frequent user of
large JSON files reported being unable to pass the resulting diffs
through their code review system.
Prior to this change, bdiff would calculate the length of a match at
(i, j) as 1 + length found at (i-1, j-1). With large number of popular
(ignored) lines, this often meant matches couldn't be extended
backwards at all and thus all matching regions were very small.
Disabling the popularity threshold is not an option because it brings
back quadratic behavior.
Instead, we extend a match backwards until we either found a previously
discovered match or we find a mismatching line. This thus successfully
bridges over any popular lines inside and before a matching region.
The larger regions then significant reduce the probability of confusion.
Usually, the heads will have the same ordering in handlecheckheads. Insisting
on the same ordering is however an unnecessary constraint that in some custom
cases can cause pushes to fail even though the actual heads didn't change. This
caused production issues for us in combination with the current version of
https://bitbucket.org/Unity-Technologies/hgwebcachingproxy/ .
Stripping has only partly worked since f41815302d49 (repair: use cg3
for treemanifests, 2016-01-19): the bundle seems to have been created
correctly, but revlog entries in subdirectory revlogs were not
stripped. This meant that e.g. "hg verify" would fail after stripping
in a tree manifest repo.
To find the revisions to strip, we simply iterate over all directories
in the repo (included in store.datafiles()). This is inefficient for
stripping few commits, but efficient for stripping many commits. To
optimize for stripping few commits, we could instead walk the tree
from the root and find modified subdirectories, just like we do in the
changegroup code. I'm leaving that for another day.
This is to make them wrap-able. chgserver wants to know if an extension
accesses config or environment variables during uisetup and extsetup and
include them in confighash accordingly.
The httplib library is renamed to http.client in python 3. So the
import is conditionalized and a test is added in check-code to warn
to use util.httplib
If no CA certificates are loaded, that is almost certainly a/the
reason certificate verification fails when connecting to a server.
The modern ssl module in Python 2.7.9+ provides an API to access
the list of loaded CA certificates. This patch emits a warning
on modern Python when certificate verification fails and there are
no loaded CA certificates.
There is no way to detect the number of loaded CA certificates
unless the modern ssl module is present. Hence the differences
in test output depending on whether modern ssl is available.
It's worth noting that a test which specifies a CA file still
renders this warning. That is because the certificate it is loading
is a x509 client certificate and not a CA certificate. This
test could be updated if anyone is so inclined.
Before, we would call SSLContext.load_default_certs() when
certificate verification wasn't being used. Since
SSLContext.verify_mode == ssl.CERT_NONE, this would ideally
no-op. However, there is a slim chance the loading of system
certs could cause a failure. Furthermore, this behavior
interfered with a future patch that aims to provide a more
helpful error message when we're unable to load CAs.
The lack of test fallout is hopefully a sign that our
security code and tests are in a relatively good state.
Before, sslcontext.load_verify_locations() would raise a
ssl.SSLError which would be caught further up the stack and converted
to a urlerror. By that time, we lost track of what actually errored.
Trapping the error here gives users a slightly more actionable error
message.
The behavior between Python <2.7.9 and Python 2.7.9+ differs. This
is because our fake SSLContext class installed on <2.7.9 doesn't
actually do anything during load_verify_locations: it defers actions
until wrap_socket() time. Unfortunately, a number of errors can occur
at wrap_socket() time and we're unable to ascertain what the root
cause is. But that shouldn't stop us from providing better error
messages to people running a modern and secure Python version.
Before the error was caught at func() as an unknown identifier, and the
optimizer failed to detect the syntax error. This patch introduces getsymbol()
helper to ensure that a string is not allowed as a function name.
It was mixing tabs and spaces, and not in a good way.
Indent style of other atom entries seems to be 1 space per level, so let's
apply it here as well.
Since content is of type "text" (and is already escaped), using a CDATA section
is not required.
Looks like this was just an artifact of copying things from rss style in
529b23a26574, because other entries in atom style don't use CDATA in such
places.
It was mixing tabs and spaces, and not in a good way.
Indent style of other rss entries seems to be 4 spaces per level, so let's
apply it here as well.
The directory argument (for tree manifests) should belong to to the
--dir argument. I had mistakenly made --dir a flag. One effect of this
was that I had meant for "-m" to be optional, but instead it changed
the behavior of --dir, so with "hg debugdata -m --dir dir1 0", the -m
took over and the "dir1" got treated as a revision in the root
manifest log.
Before 'hg push -B .' on new remote head complained with:
abort: push creates new remote head ...
It was because _nowarnheads was not expanding active bookmark
name, so it didn't add active bookmark "proper" name to no
warn heads list.
When we remove a changeset from the changelog, the phase cache must be
invalidated, otherwise it could refer to changesets that are no longer in the
repo.
To reproduce the failure, I created an extension querying the phase cache after
the strip transaction is over.
To do that, I stripped two commits with a bookmark on one of them to force
another transaction (we open a transaction for moving bookmarks)
after the strip transaction.
Without the fix in this patch, the test leads to a stacktrace showing the issue:
repair.strip(ui, repo, revs, backup)
File "/Users/lcharignon/facebook-hg-rpms/hg-crew/mercurial/repair.py", line 205, in strip
tr.close()
File "/Users/lcharignon/facebook-hg-rpms/hg-crew/mercurial/transaction.py", line 44, in _active
return func(self, *args, **kwds)
File "/Users/lcharignon/facebook-hg-rpms/hg-crew/mercurial/transaction.py", line 490, in close
self._postclosecallback[cat](self)
File "$TESTTMP/crashstrip2.py", line 4, in test
[repo.changelog.node(r) for r in repo.revs("not public()")]
File "/Users/lcharignon/facebook-hg-rpms/hg-crew/mercurial/changelog.py", line 337, in node
return super(changelog, self).node(rev)
File "/Users/lcharignon/facebook-hg-rpms/hg-crew/mercurial/revlog.py", line 377, in node
return self.index[rev][7]
IndexError: revlog index out of range
The situation was encountered in inhibit (evolve's repo) where we would crash
following the volatile set invalidation submitted by Augie in
cbc52a99d057d11790cf5011e877c6f698bf57bf. Before his patch the issue was masked
as we were not accessing the phasecache after stripping a revision.
This bug uncovered another but in histedit (see explanation in issue5235).
I changed the histedit test accordingly to avoid fixing two things at once.
If you have just executable-bit change and amend it twice it will vanish:
* After the first amend the commit will have the proper executable bit set
in manifest but it won't have the the file on the list of files in
changelog.
* The second amend will read the wrong list of files from changelog and it
will copy the manifest entry from parent for this file.
* Voila! The change is lost.
This change repairs the bug in localrepo causing this and adds a test for it.
Before this patch, "hg help topic.section" might show unexpected
section of help topic in some encoding.
It applies str.lower() instead of encoding.lower(str) on translated
message to search section case-insensitively, but some encoding uses
0x41(A) - 0x5a(Z) as the second or later byte of multi-byte character
(for example, ja_JP.cp932), and str.lower() causes unexpected result.
To search section of help topic by translated section name correctly,
this patch replaces str.lower() by encoding.lower(str) for both query
string (in commands.help()) and translated help text (in
minirst.getsections()).
Before this patch, patch.filterpatch() shows meaningless translation
of help message for chunk selection in some encoding.
It applies str.lower() instead of encoding.lower(str) on translated
message, but some encoding uses 0x41(A) - 0x5a(Z) as the second or
later byte of multi-byte character (for example, ja_JP.cp932), and
str.lower() causes unexpected result.
To show lower-ed translated message correctly, this patch replaces
str.lower() by encoding.lower(str).
This corrects a warning from lintian that we're shipping an executable without
a man page. Since there is a doc string in the text, let's use that for the man
page.
The progress bar was being cleared on every write(), regardless of
whether it was currently displayed. This could foul up the display of
any writes that didn't include a linebreak.
In particular, the win32 mode of the color extension was turning
single prompt string writes into two writes, and the resulting
clear/write/clear/write pattern was making the prompt invisible.
We fix this by insisting that we have shown a progress bar and haven't
just cleared it (setting lastprint to 0).
Conveniently, the test suite already had instances of duplicate
clears.. that are now cleared up.
getbundle was requesting the "phase" namespace instead of the "phases"
namespace, which led to the client still requesting the phases
separately after getbundle finished.
In the 'hg pull --rebase', we don't want to pick a rebase destination unrelated
to the pull, we lay down basic infrastructure to allow such restriction on
stable (before 3.8 release) in this case. See issue 5214 for details.
Actual usage and test will be in the next patch.
This effectively backs out changeset 60b56b3206cc.
The http library behind ui.http2=true isn't specifying the hostname.
It is the day before the expected 3.8 release and we don't want to ship
a regression.
I'll try to restore this requirement in the 3.9 release cycle as part
of planned improvements to Mercurial's SSL/TLS interactions.
The atomictemp.close() file attempts to do a rename, which can fail.
Moving the close inside the exception handler fixes it.
This doesn't fit well with the with: pattern, as it's the finalizer
that's failing.
tryread() doesn't handle "is a directory" errors and presumably
others. We might not want to globally swallow such tryread errors, so
we replace with our own try/except handling.
An upcoming test will use directories as a portable stand-in for
various bizarre circumstances that cache read/write code should be
robust to.
Properly shell quote arguments, to avoid printing commands that won't work when
run literally. For example, a date string with timestamp needs to be quoted:
--date '1456953053 28800'
Initializing a subrepo when one doesn't exist is the right thing to do when the
parent is being updated, but in few other cases. Unfortunately, there isn't
enough context in the subrepo module to distinguish this case. This same issue
can be caused with other subrepo aware commands, so there is a general issue
here beyond the scope of this fix.
A simpler attempt I tried was to add an '_updating' boolean to localrepo, and
set/clear it around the call to mergemod.update() in hg.updaterepo(). That
mostly worked, but doesn't handle the case where archive will clone the subrepo
if it is missing. (I vaguely recall that there may be other commands that will
clone if needed like this, but certainly not all do. It seems both handy, and a
bit surprising for what should be a read only operation. It might be nice if
all commands did this consistently, but we probably need Angel's subrepo caching
first, to not make a mess of the working directory.)
I originally handled 'Exception' in order to pick up the Aborts raised in
subrepo.state(), but this turns out to be unnecessary because that is called
once and cached by ctx.sub() when iterating the subrepos.
It was suggested in the bug discussion to skip looking at the subrepo links
unless -S is specified. I don't really like that idea because missing a subrepo
or (less likely, but worse) a corrupt .hgsubstate is a problem of the parent
repo when checking out a revision. The -S option seems like a better fit for
functionality that would recurse into each subrepo and do a full verification.
Ultimately, the default value for 'allowcreate' should probably be flipped, but
since the default behavior was to allow creation, this is less risky for now.
LoadLibrary() changes behavior depending on whether the argument
passed to it contains a period. From the MSDN docs:
If no file name extension is specified in the lpFileName parameter,
the default library extension .dll is appended. However, the file name
string can include a trailing point character (.) to indicate that the
module name has no extension. When no path is specified, the function
searches for loaded modules whose base name matches the base name of
the module to be loaded. If the name matches, the load succeeds.
Otherwise, the function searches for the file.
As the subsequent patch will show, some environments on Windows
define their Python library as e.g. "libpython2.7.dll." The existing
code would pass "libpython2.7" into LoadLibrary(). It would assume
"7" was the file extension and look for a "libpython2.dll" to load.
By passing ".dll" into LoadLibrary(), we force it to search for the
exact basename we want, even if it contains a period.
The old "update across branches if no uncommitted changes" made
it sound like updating across branches (with no uncommitted changes)
was allowed only with this option, which was not true. Also, the option
did not care whether it was linear or across branches. Instead, it
checked that there were no uncommitted changes. Let's explain what it
does instead of trying to suggest what happens without it.
Update makedirs() to ignore EEXIST in case someone else has already created the
directory in question. Previously the ensuredirs() function existed, and was
nearly identical to makedirs() except that it fixed this race. Unfortunately
ensuredirs() was only used in 3 places, and most code uses the racy makedirs()
function. This fixes makedirs() to be non-racy, and replaces calls to
ensuredirs() with makedirs().
In particular, mercurial.scmutil.origpath() used the racy makedirs() code,
which could cause failures during "hg update" as it tried to create backup
directories.
This does slightly change the behavior of call sites using ensuredirs():
previously ensuredirs() would throw EEXIST if the path existed but was a
regular file instead of a directory. It did this by explicitly checking
os.path.isdir() after getting EEXIST. The makedirs() code did not do this and
swallowed all EEXIST errors. I kept the makedirs() behavior, since it seemed
preferable to avoid the extra stat call in the common case where this directory
already exists. If the path does happen to be a file, the caller will almost
certainly fail with an ENOTDIR error shortly afterwards anyway. I checked
the 3 existing call sites of ensuredirs(), and this seems to be the case for
them.
This causes the longest_match search to limit itself to a window of
30000 lines during search (roughly 1MB), thus avoiding a full O(N*M)
search that might occur in repetitive structured inputs. For a
particular class of many MB pathological test cases, this generated
the following timings:
size before after
10x 1.25s 1.24s
100x 57s 33s
1000x >8400s 400s
The times on the right quickly become much faster and appear more linear.
While windowing means deltas are no longer "optimal", the resulting
deltas were within a couple percent of expected size. While we've yet
to have a report of a file with the level of repetition necessary to
hit this case, some JSON/XML database dump scenario is fairly likely
to hit it.
This may also slightly improve the average-case performance for deltas
of large binaries.
For highly structured files like JSON or XML dumps with large numbers
of duplicate lines (eg braces) and isolated matching lines, bdiff
could find large numbers of equally good spans. Because it prefers
earlier matches, this would result in pathologically unbalance
recursion that resulted in quadratic performance.
This patch makes it prefer matches closer to the middle that tend to
balance recursion. This change improves the speed of a pathological
test case from 1100s to 9s.
Included is a smaller test that has a roughly 50x safety margin on the
performance it accepts. It's likely to fail on pure builds because
difflib also has a recursion-balancing problem.
The longest_match code compares all the possible positions in two
files to find the best match. Given a pair of sequences, it
effectively searches a grid like this:
a b b b c . d e . f
0 1 2 3 4 5 6 7 8 9
a 1 - - - - - - - - -
b - 2 1 1 - - - - - -
b - 1 3 2 - - - - - -
b - 1 2 4 - - - - - -
. - - - - - 1 - - 1 -
Here, the 4 in the middle says "the first four lines of the
file match", which it can compute be comparing the fourth lines and
then adding one to the result found when comparing the third lines in
the entry to the upper left.
We generally avoid the quadratic worst case by only looking at lines
that match, which is precomputed. We also avoid quadratic storage by
only keeping a single column vector and then keeping track of the best
match.
Unfortunately, this can get us into trouble with the sequences above.
Because we want to reuse the '3' value when calculating the '4', we
need to be careful not to overwrite it with the '2' we calculate
immediately before. If we scan left to right, top to bottom, we're
going to have a problem: we'll overwrite our 3 before we use it and
calculate a suboptimal best match.
To address this, we can either keep two column vectors and swap
between them (which significantly complicates bookkeeping), or change
our scanning order. If we instead scan from left to right, bottom to
top, we'll avoid ever overwriting values we'll need in the future.
This unfortunately needs several changes to be made simultaneously:
- change the order we build the initial hash chains for the b sequence
- change the sentinel values from INT_MAX to -1
- change the visit order in the longest_match inner loop
- add a tie-breaker preference for earlier matches
This last is needed because we previously had an implicit tie-breaker
from our visitation order that our test suite relies on. Later matches
can also trigger a bug in the normalization code in diff().
This bug is hidden by the current bias towards matches at the
beginning of the file. When this bias is tweaked later to address
recursion balancing, the normalization code could cause the next block
to shrink to a negative length, thus creating invalid delta chunks. We
add checks here to disallow that.
This bug requires test cases that are an awkwardly large size for the test
suite, but is very rapidly picked up by the included torture tester.
This just makes the code harder to read without any performance
advantage. We're going to make the check here more complex, let's make
it simpler first.
When displaying patches from graphical tools where you can browse through
individual files, with diff being called separately on each, recomputing the
limits of file copy history can become rather expensive on large repositories.
Instead, we can compute it once and pass it in for subsequent calls.
match() is the special case of a single element list being passed
to matchany() with the additional error checking that the revset
spec is defined. Change the implementation to remove the redundant
code and have match() call matchany().
I can never remember the differences between the various revset
APIs. I can never remember that scmutil.revrange() is the one I
want to use from user-facing commands.
Add some documentation to clarify this.
While we're here, the argument name for revrange() is changed to
"specs" because that's what it actually is.
Other parts of this expression are already using unicode literals.
We need this to make Python 3 happy and to avoid an implicit
conversion in Python 2.
Now that we have a mechanism for declaring path sub-options, we can
start to pile on features!
Many power users have expressed frustration that bare `hg push`
attempts to push all local revisions to the remote. This patch
introduces the "pushrev" path sub-option to control which revisions
are pushed when no "-r" argument is specified.
The value of this sub-option is a revset, naturally.
A future feature addition could potentially introduce a "pushnames"
sub-options that declares the list of names (branches, bookmarks,
topics, etc) to push by default. The entire "what to push by default"
feature should probably be considered before this patch lands.
As part of developing a subsequent patch I discovered that sub-option
values like "." were getting converted to paths. This is because the
[paths] section is treated specially during config loading.
This patch prevents post-processing sub-options from the [paths]
section.
Previously, when we connected to a server and were unable to verify
its certificate against a trusted certificate authority we would
issue a warning and continue to connect. This is obviously not
great behavior because the x509 certificate model is based upon
trust of specific CAs. Failure to enforce that trust erodes security.
This behavior was defined several years ago when Python did not
support loading the system trusted CA store (Python 2.7.9's
backports of Python 3's improvements to the "ssl" module enabled
this).
This commit changes behavior when connecting to abort if the peer
certificate can't be validated. With an empty/default Mercurial
configuration, the peer certificate can be validated if Python is
able to load the system trusted CA store. Environments able to load
the system trusted CA store include:
* Python 2.7.9+ on most platforms and installations
* Python 2.7 distributions with a modern ssl module (e.g. RHEL7's
patched 2.7.5 package)
* Python shipped on OS X
Environments unable to load the system trusted CA store include:
* Python 2.6
* Python 2.7 on many existing Linux installs (because they don't
ship 2.7.9+ or haven't backported modern ssl module)
* Python 2.7.9+ on some installs where Python is unable to locate
the system CA store (this is hopefully rare)
Users of these Pythongs will need to configure Mercurial to load the
system CA store using web.cacerts. This should ideally be performed
by packagers (by setting web.cacerts in the global/system hgrc file).
Where Mercurial packagers aren't setting this, the linked URL in the
new abort message can contain instructions for users.
In the future, we may want to add more code for finding the system
CA store. For example, many Linux distributions have the CA store
at well-known locations (such as /etc/ssl/certs/ca-certificates.crt
in the case of Ubuntu). This will enable CA loading to "just work"
on more Python configurations and will be best for our users since
they won't have to change anything after upgrading to a Mercurial
with this patch.
We may also want to consider distributing a trusted CA store with
Mercurial. Although we should think long and hard about that because
most systems have a global CA store and Mercurial should almost
certainly use the same store used by everything else on the system.
The ordering of 'x & head()' was broken in 329d82866742 (revset:
improve head revset performance, 2014-03-13). Presumably due to other
optimizations since then, undoing that change to fix the order does
not slow down the simple case of "hg log -r 'head()'" mentioned in
that commit. I see a small slowdown from ~0.16s to about ~0.19s with
'not 0 & head()', but I'd say it's worth it for the correct output.
Before this patch, "missing _() in ui message" rule overlooks
translatable message, which starts with other than alphabet.
To detect "missing _() in ui message" more exactly, this patch
improves the regexp with assumptions below.
- sequence consisting of below might precede "translatable message"
in same string token
- formatting string, which starts with '%'
- escaped character, which starts with 'b' (as replacement of '\\'), or
- characters other than '%', 'b' and 'x' (as replacement of alphabet)
- any string tokens might precede a string token, which contains
"translatable message"
This patch builds an input file, which is used to examine "missing _()
in ui message" detection, before '"$check_code" stringjoin.py' in
test-contrib-check-code.t, because this reduces amount of change churn
in subsequent patch.
This patch also applies "()" instead of "_()" on messages below to
hide false-positives:
- messages for ui.debug() or debug commands/tools
- contrib/debugshell.py
- hgext/win32mbcs.py (ui.write() is used, though)
- mercurial/commands.py
- _debugchangegroup
- debugindex
- debuglocks
- debugrevlog
- debugrevspec
- debugtemplate
- untranslatable messages
- doc/gendoc.py (ReST specific text)
- hgext/hgk.py (permission string)
- hgext/keyword.py (text written into configuration file)
- mercurial/cmdutil.py (formatting strings for JSON)
Before fd1bb7c, if the C index.partialmatch raises RevlogError, the Python
code raises "ambiguous identifier" error immediately, which is efficient.
fd1bb7c took hidden revisions into consideration and forced the slow path
enumerating the changelog to double-check hidden revisions. But it's not
necessary if we know the revlog has no hidden revisions.
This patch adds back the fast path for unfiltered revlogs.
I.e. when a revision blames a block of source lines, only display the
revision link on the first line of the block (this is identified by the
"blockhead" key in annotate context).
This addresses item "Visual grouping of changesets" of the blame improvements
plan (https://www.mercurial-scm.org/wiki/BlamePlan) which states: "Typically
there are block of lines all attributed to the same revision. Instead of
rendering the revision/changeset for every line, we could only render it once
per block."
* Distinguish the /annotate/<revision>/<file>#<linenumber> link when it would
lead to the current page (i.e. <revision> is the current revision) (style it
gray and undecorated). This indicates more clearly that this is a "dead-end"
in blame navigation.
* Display lines changed in current revision in green.
Change summary webcommand to yield each element of the shortlog instead of the
entire list.
This makes generated json more readable since each entry can be formatted
separately, instead of returning all the shortlog content in a single string.
Modify changelistentry structure to also deliver phase and branch data and use
either 'parents' or 'allparents' depending on what is defined in the view, in
order to reuse it in filelog structure.
Before this commit url.opener overwritten stored password
for connection with given url/user even when
new password for given connection was not filled. This
commit makes opener overwrites saved authentication only
when it contains password.
This makes http password database stored in ui object.
It allows reusing authentication information when we
use this database for creating password manager for
the new connection.
So far password manager was keeping authentication information so opening
new connection and creating new password manager made all saved authentication
information lost.
This commit separates password manager and password database to make it
possible to reuse saved authentication information.
This commit violates code checker because it adds add_password method (name
with underscore) to passwordmgr object to provide method required by urllib2.
Before this patch, "from a import b" doesn't delay loading module "b",
if absolute_import is enabled, even though "from . import b" does.
For example:
- it is assumed that extension X has "from P import M" for module M
under package P with absolute_import feature
- if importing module M is already delayed before loading extension
X, loading module M in extension X is delayed until actually
referring
util, cmdutil, scmutil or so of Mercurial itself should be
imported by "from . import M" style before loading extension X
- otherwise, module M is loaded immediately at loading extension X,
even if extension X itself isn't used at that "hg" command invocation
Some minor modules (e.g. filemerge or so) of Mercurial itself
aren't imported by "from . import M" style before loading
extension X. And of course, external libraries aren't, too.
This might cause startup performance problem of hg command, because
many bundled extensions already enable absolute_import feature.
To delay loading module for "from a import b" with absolute_import
feature, this patch does below in "from a (or .a) import b" with
absolute_import case:
1. import root module of "name" by system built-in __import__
(referred as _origimport)
2. recurse down the module chain for hierarchical "name"
This logic can be shared with non absolute_import
case. Therefore, this patch also centralizes it into chainmodules().
3. and fall through to process elements in "fromlist" for the leaf
module of "name"
Processing elements in "fromlist" is executed in the code path
after "if _pypy: .... else: ..." clause. Therefore, this patch
replaces "if _pypy:" with "elif _pypy:" to share it.
At faecf59a4184 introducing original "work around" for "from a import
b" case, elements in "fromlist" were imported with "level=level". But
"level" might be grater than 1 (e.g. level=2 in "from .. import b"
case) at demandimport() invocation, and importing direct sub-module in
"fromlist" with level grater than 1 causes unexpected result.
IMHO, this seems main reason of "errors for unknown reason" described
in faecf59a4184, and we don't have to worry about it, because this
issue was already fixed by 2711f50242cf.
This is reason why this patch removes "errors for unknown reasons"
comment.
To make it easier to patch the wrapped function, make it possible to access the
filecache descriptor directly on the class (rather than have to use
ClassObject.__dict__['attributename']). Returning `self` when the first
argument to `__get__` is `None` makes the descriptor behave the same way
`property` objects do.
When grafting/rebasing, it is common for multiple changesets to make
the same change to a subdirectory. When writing the revlog for the
directory, the revlog code already takes care of not writing the entry
again. In 3eb9fa4180d3 (changegroup: prune subdirectory dirlogs too,
2016-02-12), I added the corresponding code in changegroup (not
sending entries the client already has), but I forgot to avoid sending
the entire changegroup if no nodes remained in the pruned
set. Although that's harmless besides the wasted network traffic, the
receiving side was checking for it (copied from the changegroup code
for handling files). This resulted in the client crashing with:
abort: received dir revlog group is empty
Fix by simply not emitting a changegroup for the directory if there
were no changes is it. This matches how files are handled.
This will allow us to clear in-memory password storage per runcommand().
I've updated commandserver to call resetstate() of both ui and repo.ui because
they may have different states in theory.
Make it noop as before ddf6bfe09ab2. We could change it to an error, but
allowing empty key makes some sense for scripting that builds a key string
programmatically.
In some cases below, copying from backup is used to restore original
contents of a file, which is backuped via addfilegenerator(). If
copying keeps ctime, mtime and size of a file, restoring is
overlooked, and old contents cached before restoring isn't invalidated
as expected.
- failure of transaction (from '.hg/journal.backup.*')
- rollback of previous transaction (from '.hg/undo.backup.*')
To avoid ambiguity of file stat at restoring, this patch invokes
util.copyfile() with checkambig=True.
This patch is a part of "Exact Cache Validation Plan":
https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
Rollback of previous transaction restores contents of files below by
renaming from 'undo.*' file. If renaming keeps ctime, mtime and size
of a file, restoring is overlooked, and old contents cached before
restoring isn't invalidated as expected.
- .hg/bookmarks
- .hg/phaseroots
To avoid ambiguity of file stat at restoring, this patch invokes
vfs.rename() with checkambig=True.
BTW, .hg/dirstate is also restored at rollback. But it is restored by
dirstate.restorebackup(), and previous patch already made it invoke
vfs.rename() with checkambig=True.
This patch is a part of "Exact Cache Validation Plan":
https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
File .hg/dirstate is restored by renaming from backup in failure
inside scopes below. If renaming keeps ctime, mtime and size of a
file, restoring is overlooked, and old contents cached before
restoring isn't invalidated as expected.
- dirstateguard scope (from '.hg/dirstate.SUFFIX')
- transaction scope (from '.hg/journal.dirstate')
To avoid ambiguity of file stat at restoring, this patch invokes
vfs.rename() with checkambig=True.
This patch is a part of "Exact Cache Validation Plan":
https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
Sort revisions in reverse revision order but grouped by topographical branches.
Visualised as a graph, instead of:
o 4
|
| o 3
| |
| o 2
| |
o | 1
|/
o 0
revisions on a 'main' branch are emitted before 'side' branches:
o 4
|
o 1
|
| o 3
| |
| o 2
|/
o 0
where what constitutes a 'main' branch is configurable, so the sort could also
result in:
o 3
|
o 2
|
| o 4
| |
| o 1
|/
o 0
This sort was already available as an experimental option in the graphmod
module, from which it is now removed.
This sort is best used with hg log -G:
$ hg log -G "sort(all(), topo)"
This used to be needed to paper over hashlib not being in all Pythons
we support, but that's not a problem anymore, so we can simplify
things a little bit.
All versions of Python we support or hope to support make the hash
functions available in the same way under the same name, so we may as
well drop the util forwards.
Recent work has introduced the [hostsecurity] config section for
defining per-host security settings. This patch builds on top
of this foundation and implements the ability to define a per-host
path to a file containing certificates used for verifying the server
certificate. It is logically a per-host web.cacerts setting.
This patch also introduces a warning when both per-host
certificates and fingerprints are defined. These are mutually
exclusive for host verification and I think the user should be
alerted when security settings are ambiguous because, well,
security is important.
Tests validating the new behavior have been added.
I decided against putting "ca" in the option name because a
non-CA certificate can be specified and used to validate the server
certificate (commonly this will be the exact public certificate
used by the server). It's worth noting that the underlying
Python API used is load_verify_locations(cafile=X) and it calls
into OpenSSL's SSL_CTX_load_verify_locations(). Even OpenSSL's
documentation seems to omit that the file can contain a non-CA
certificate if it matches the server's certificate exactly. I
thought a CA certificate was a special kind of x509 certificate.
Perhaps I'm wrong and any x509 certificate can be used as a
CA certificate [as far as OpenSSL is concerned]. In any case,
I thought it best to drop "ca" from the name because this reflects
reality.
Followup 814eb5a11da4 to provide complete context for proper localization.
Also update cmdutil.recordfilter docstring to remove recommendation that
"operation" argument should be translated. Indeed, for record/revert, we
either go to patch.filterpatch or crecord.filterpatch (in curses mode) ; the
former now build the full ui message from the operation parameter and the
latter does not use this parameter (removing in a followup patch). For shelve,
operation is not specified and this thus falls back to "record".
The cPickle is renamed to _pickle in python3 and this C extension is available
in pickle which was not included in earlier versions. So imports are conditionalized
to import cPickle in py2 and pickle in py3. Moreover the use of pickle in py2 is
switched to cPickle as the C extension is faster. The hack is added in util.py and
the modules import util.pickle
This fix allows __nonzero__ to respect the direction of iteration of the
whole filteredset. Here's the case when it matters. Imagine that we have a
very large repository and we want to execute a command like:
$ hg log --rev '(tip:0) and user(ikostia)' --limit 1
(we want to get the latest commit by me).
Mercurial will evaluate a filteredset lazy data structure, an
instance of the filteredset class, which will know that it has to iterate
in a descending order (isdescending() will return True if called). This
means that when some code iterates over the instance of this filteredset,
the 'and user(ikostia)' condition will be first checked on the latest
revision, then on the second latest and so on, allowing Mercurial to
print matches as it founds them. However, cmdutil.getgraphlogrevs
contains the following code:
revs = _logrevs(repo, opts)
if not revs:
return revset.baseset(), None, None
The "not revs" expression is evaluated by calling filteredset.__nonzero__,
which in its current implementation will try to iterate the filteredset
in ascending order until it finds a revision that matches the 'and user(..'
condition. If the condition is only true on late revisions, a lot of
useless iterations will be done. These iterations could be avoided if
__nonzero__ followed the order of the filteredset, which in my opinion
is a sensible thing to do here.
The problem gets even worse when instead of 'user(ikostia)' some more
expensive check is performed, like grepping the commit diff.
I tested this fix on a very large repo where tip is my commit and my very
first commit comes fairly late in the revision history. Results of timing
of the above command on that very large repo.
-with my fix:
real 0m1.795s
user 0m1.657s
sys 0m0.135s
-without my fix:
real 1m29.245s
user 1m28.223s
sys 0m0.929s
I understand that this is a very specific kind of problem that presents
itself very rarely, only on very big repositories and with expensive
checks and so on. But I don't see any disadvantages to this kind of fix
either.
Cached attribute repo._phasecache uses stat of '.hg/phaseroots' file
to examine validity of cached contents. If writing '.hg/phaseroots'
file out keeps ctime, mtime and size of it, change is overlooked, and
old contents cached before change isn't invalidated as expected.
To avoid ambiguity of file stat, this patch writes '.hg/phaseroots'
file out with checkambig=True.
This patch is a part of "Exact Cache Validation Plan":
https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
Cached attribute dirstate._branch uses stat of '.hg/branch' file to
examine validity of cached contents. If writing '.hg/branch' file out
keeps ctime, mtime and size of it, change is overlooked, and old
contents cached before change isn't invalidated as expected.
To avoid ambiguity of file stat, this patch writes '.hg/branch' file
out with checkambig=True.
This patch is a part of "Exact Cache Validation Plan":
https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
Cached attribute repo.dirstate uses stat of '.hg/dirstate' file to
examine validity of cached contents. If writing '.hg/dirstate' file
out keeps ctime, mtime and size of it, change is overlooked, and old
contents cached before change isn't invalidated as expected.
To avoid ambiguity of file stat, this patch writes '.hg/dirstate' file
out with checkambig=True.
The former diff hunk changes the code path for "dirstate.write()", and
the latter changes the code path for "dirstate.savebackup()".
This patch is a part of "Exact Cache Validation Plan":
https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
Cached attribute repo._bookmarks uses stat of '.hg/bookmarks' and
'.hg/bookmarks.current' files to examine validity of cached
contents. If writing these files out keeps ctime, mtime and size of
them, change is overlooked, and old contents cached before change
isn't invalidated as expected.
To avoid ambiguity of file stat, this patch writes '.hg/bookmarks' and
'.hg/bookmarks.current' files out with checkambig=True.
This patch is a part of "Exact Cache Validation Plan":
https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
Files below, which might be changed at closing transaction, are used
to examine validity of cached properties. If changing keeps ctime,
mtime and size of a file, change is overlooked, and old contents
cached before change isn't invalidated as expected.
- .hg/bookmarks
- .hg/dirstate
- .hg/phaseroots
To avoid ambiguity of file stat, this patch writes files out with
checkambig=True at closing transaction.
checkambig becomes True only at closing (= 'not suffix'), because stat
information of '.pending' file isn't used to examine validity of
cached properties.
This patch is a part of "Exact Cache Validation Plan":
https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
Before, we would always print the unprefixed SHA-1 fingerprint when
fingerprint comparison failed. Now, we print the fingerprint of the
last hash used, including the prefix if necessary. This helps ensure
that the printed hash type matches what is in the user configuration.
There are still some cases where this can print a mismatched hash type.
e.g. if there are both SHA-1 and SHA-256 fingerprints in the config,
we could print a SHA-1 hash if it comes after the SHA-256 hash. But
I'm inclined to ignore this edge case.
While I was here, the "section" variable assignment has been moved to
just above where it is used because it is now only needed for this
error message and it makes the code easier to read.
The previous warning and abort messages were difficult to understand.
This patch makes them slightly better.
I think there is still room to tweak the messaging. And as we adopt
new security defaults, these messages will certainly change again.
But at least this takes us a step in the right direction.
References to "section" have been removed because if no fingerprint
is defined, "section" can never be "hostfingerprints." So just print
"hostsecurity" every time.
We didn't need to use a temporary variable to indicate success because
we just return anyway.
This refactor makes the code simpler. While we're here, we also call
into formatfingerprint() to ensure the fingerprint from the proper
hashing algorithm is logged.
The world is starting to move on from SHA-1. A few commits ago, we
gained the ability to define certificate fingerprints using SHA-256
and SHA-512.
Let's start printing the SHA-256 fingerprint instead of the SHA-1
fingerprint to encourage people to pin with a more secure hashing
algorithm.
There is still a bit of work to be done around the fingerprint
messaging. This will be addressed in subsequent commits.
A short time ago, validatesocket() didn't know the reasons why
cert verification was disabled. Multiple code paths could lead
to cert verification being disabled. e.g. --insecure and lack
of loaded CAs.
With the recent refactorings to sslutil.py, we now know the reasons
behind security settings. This means we can recognize when the user
requested security be disabled (as opposed to being unable to provide
certificate verification due to lack of CAs).
This patch moves the check for certificate verification being disabled
and changes the wording to distinguish it from other states. The
warning message is purposefully more dangerous sounding in order
to help discourage people from disabling security outright.
We may want to add a URL or hint to this message. I'm going to wait
until additional changes to security defaults before committing to
something.
There are various tests for behavior when CA certs aren't loaded.
Previously, we would pass --insecure to disable loading of CA
certs. This has worked up to this point because the error message
for --insecure and no CAs loaded is the same. Upcoming commits will
change the error message for --insecure and will change behavior
when CAs aren't loaded.
This commit introduces the ability to disable loading of CA certs
by setting devel.disableloaddefaultcerts. This allows a testing
backdoor to disable loading of CA certs even if system/default
CA certs are available. The flag is purposefully not exposed to
end-users because there should not be a need for this in the wild:
certificate pinning and --insecure provide workarounds to disable
cert loading/validation.
Tests have been updated to use the new method. The variable used
to disable CA certs has been renamed because the method is not
OS X specific.
This patch effectively moves the ui.insecureconnections check to
_hostsettings(). After this patch, validatesocket() no longer uses the
ui instance for anything except writing messages.
This patch also enables us to introduce a per-host config option
for disabling certificate verification.
smtp.verifycert was accidentally broken by 799db3fe9866. And,
I believe the "loose" value has been broken for longer than that.
The current code refuses to talk to a remote server unless the
CA is trusted or the fingerprint is validated. In other words,
we lost the ability for smtp.verifycert to lower/disable security.
There are special considerations for smtp.verifycert in
sslutil.validatesocket() (the "strict" argument). This violates
the direction sslutil is evolving towards, which has all security
options determined at wrapsocket() time and a unified code path and
configs for determining security options.
Since smtp.verifycert is broken and since we'll soon have new
security defaults and new mechanisms for controlling host security,
this patch formally deprecates smtp.verifycert. With this patch,
the socket security code in mail.py now effectively mirrors code
in url.py and other places we're doing socket security.
For the record, removing smtp.verifycert because it was accidentally
broken is a poor excuse to remove it. However, I would have done this
anyway because smtp.verifycert is a one-off likely used by few people
(users of the patchbomb extension) and I don't think the existence
of this seldom-used one-off in security code can be justified,
especially when you consider that better mechanisms are right around
the corner.
Before this commit bare update --clean on newly created branch
updates to the parent commit, even if there are later commits
on the parent commit's branch. Update to the latest head on the
parent commit's branch instead.
This seems reasonable as clean should discard uncommited changes,
branch is one of them.
Instead of "record this change to 'FILE'?" now prompt with:
* "discard this change to 'FILE'?" when reverting to the parent of working
directory, and,
* "revert this change to 'FILE'?" otherwise.
Those assertions will prevent the backup functions from overwriting
the dirstate file in case both: suffix and prefix are empty.
(foozy suggested making that change and I agree with him)
The issue with using actualfilename is that dirstate saved during transaction
with "pending" in filename will be impossible to recover from outside of the
transaction because the recover method will be looking for the name without
"pending".
Error messages reference the config section defining the host
fingerprint. Now that we have multiple sections where this config
setting could live, we need to point the user at the appropriate
one.
We default to the new "hostsecurity" section. But we will still
refer them to the "hostfingerprint" section if a value is defined
there.
There are some corner cases where the messaging might be off. e.g.
they could define a SHA-1 fingerprint in both sections. IMO the
messaging needs a massive overhaul. I plan to do this as part
of future refactoring to security settings.
We introduce the [hostsecurity] config section. It holds per-host
security settings.
Currently, the section only contains a "fingerprints" option,
which behaves like [hostfingerprints] but supports specifying the
hashing algorithm.
There is still some follow-up work, such as changing some error
messages.
Currently, we only support defining host fingerprints with SHA-1.
A future patch will introduce support for defining fingerprints
using other hashing algorithms. In preparation for that, we
rewrite the fingerprint verification code to support multiple
fingerprints, namely SHA-256 and SHA-512 fingerprints.
We still only display the SHA-1 fingerprint. We'll have to revisit
this code once we support defining fingerprints with other hash
functions.
As part of this, I snuck in a change to use range() instead of
xrange() because xrange() isn't necessary for such small values.
Upcoming patches will teach host fingerprint checking to verify
non-SHA1 fingerprints.
Many x509 certificates these days are SHA-256. And modern browsers
often display the SHA-256 fingerprint for certificates. Since
SHA-256 fingerprints are highly visible and easy to obtain, we
want to support them for fingerprint pinning. So add SHA-256
support to util.
I did not add SHA-256 to DIGESTS and DIGESTS_BY_STRENGTH because
this will advertise the algorithm on the wire protocol. I wasn't
sure if that would be appropriate. I'm playing it safe by leaving
it out for now.
The CA file processing code has been moved from _determinecertoptions
into _hostsettings(). As part of the move, the logic has been changed
slightly and the "cacerts" variable has been renamed to "cafile" to
match the argument used by SSLContext.load_verify_locations().
Since _determinecertoptions() no longer contains any meaningful
code, it has been removed.
_determinecertoptions() and _hostsettings() are redundant with each
other. _hostsettings() is used the flexible API we want.
We start the process of removing _determinecertoptions() by moving
some of the logic for the verify_mode value into _hostsettings().
As part of this, _determinecertoptions() now takes a settings dict
as its argument. This is technically API incompatible. But since
_determinecertoptions() came into existence a few days ago as part
of this release, I'm not flagging it as such.
This patch marks the beginning of a series that introduces a new,
more configurable, per-host security settings mechanism. Currently,
we have global settings (like web.cacerts and the --insecure argument).
We also have per-host settings via [hostfingerprints].
Global security settings are good for defaults, but they don't
provide the amount of control often wanted. For example, an
organization may want to require a particular CA is used for a
particular hostname.
[hostfingerprints] is nice. But it currently assumes SHA-1.
Furthermore, there is no obvious place to put additional per-host
settings.
Subsequent patches will be introducing new mechanisms for defining
security settings, some on a per-host basis. This commits starts
the transition to that world by introducing the _hostsettings
function. It takes a ui and hostname and returns a dict of security
settings. Currently, it limits itself to returning host fingerprint
info.
We foreshadow the future support of non-SHA1 hashing algorithms
for verifying the host fingerprint by making the "certfingerprints"
key a list of tuples instead of a list of hashes.
We add this dict to the hgstate property on the socket and use it
during socket validation for checking fingerprints. There should be
no change in behavior.
As the previous commit documented, sslkwargs() doesn't add any
value since its return is treated as a black box and proxied
to wrapsocket().
We formalize its uselessness by moving its logic into a
new, internal function and make sslkwargs() return an empty
dict.
The certificate arguments that sslkwargs specified have been
removed from wrapsocket() because they should no longer be
set.
Arguments to sslutil.wrapsocket() are partially determined by
calling sslutil.sslkwargs(). This function receives a ui and
a hostname and determines what settings, if any, need to be
applied when the socket is wrapped.
Both the ui and hostname are passed into wrapsocket(). The
other arguments to wrapsocket() provided by sslkwargs() (ca_certs
and cert_reqs) are not looked at or modified anywhere outside
of sslutil.py. So, sslkwargs() doesn't need to exist as a
separate public API called before wrapsocket().
This commit starts the process of removing external consumers of
sslkwargs() by removing the "ui" key/argument from its return.
All callers now pass the ui argument explicitly.
As the copymap is short-lived object regenerated from dirstate on each
read this didn't affect us in any serious way. But since I've started working
on permanent storage of copymap in my experiments with sqldirstate[1] I've seen
this bug leaving the copy information in copymap after reverting the file
moves and copies.
[1] https://www.mercurial-scm.org/wiki/SQLDirstatePlan
These messages have been overlooked by check-code, because they start
with non-alphabet character ('%' or '(').
Making these messages translatable seems reasonable, because messages
for ui.note(), ui.status(), ui.progress() and descriptive messages for
ui.write() in "debug" commands are already translatable in many cases.
This is also a part of preparation for making "missing _() in ui
message" detection of check-code more exact.
This message has been overlooked by check-code, because it starts with
non-alphabet character (' ').
This is also a part of preparation for making "missing _() in ui
message" detection of check-code more exact.
This message has been overlooked by check-code, because it starts with
non-alphabet character ('%').
This is also a part of preparation for making "missing _() in ui
message" detection of check-code more exact.
These messages have been overlooked by check-code, because they start
with non-alphabet character (' ').
Making these messages translatable seems reasonable, because all other
'ui.note()'-ed messages in calculateupdates() are already
translatable.
This is also a part of preparation for making "missing _() in ui
message" detection of check-code more exact.
This message has been overlooked by check-code, because it starts with
non-alphabet character ('(').
Making this message translatable seems reasonable, because exception
message below in same function is already translatable
- 'cannot create new http repository'
This is also a part of preparation for making "missing _() in ui
message" detection of check-code more exact.
This makes it possible to use keyword arguments to specify per-sort options.
For example, a hypothetical 'first' option for the user sort could sort certain
users first with:
sort(all(), user, user.first=mpm@selenic.com)
The current implementation of narrowhg needs to influence the order in
which nodes are sent to the client. adgar@ and I think this is
fixable, but it's going to require pretty substantial time investment,
so in the interim we'd like to extract this method.
I think it makes the group() code a little more obvious, as it took us
a couple of tries to isolate the exact behavior we were observing.
We are doing this check in both wrapsocket() and validatesocket().
The check was added to the validator in 8f98f4f9ff93 and the commit
message justifies the redundancy with a "might." The check in
wrapsocket() was added in 102733a3c3e1, which appears to be part of
the same series. I'm going to argue the redundancy isn't needed.
I choose to keep the check in wrapsocket() because it is working
around a bug in Python's wrap_socket() and I feel the check for
the bug should live next to the function call exhibiting the bug.
Now that the socket validator doesn't have any instance state,
we can make it a generic function.
The "validator" class has been converted into the "validatesocket"
function and all consumers have been updated.
Currently, we pass a hostname and ui to sslutil.wrap_socket()
then create a separate sslutil.validator instance also from
a hostname and ui. There is a 1:1 mapping between a wrapped
socket and a validator instance. This commit lays the groundwork
for making the validation function generic by storing the
hostname and ui instance in the state dict attached to the
socket instance and then using these variables in the
validator function.
Since the arguments to sslutil.validator.__init__ are no longer
used, we make them optional and make __init__ a no-op.
All callers now specify it. So we can require it.
Requiring the argument means SNI will always work if supported
by Python.
The main reason for this change is to store state on the socket
instance to make the validation function generic. This will be
evident in subsequent commits.
We used len(text.splitlines()) to count lines. This allocates, copies, and
deallocates an object for every line in a file. Instead, we use
count("\n") to count newlines and adjust based on whether there's a
trailing newline.
This improves the speed of annotating localrepo.py from 4.2 to 4.0
seconds.
In some cases below, copying from backup is used to restore original
contents of a file. If copying keeps ctime, mtime and size of a file,
restoring is overlooked, and old contents cached before restoring
isn't invalidated as expected.
- failure of transaction before closing (from '.hg/journal.backup.*')
- rollback of previous transaction (from '.hg/undo.backup.*')
To avoid such problem, this patch makes copyfile() avoid ambiguity of
file stat, if needed.
Ambiguity check is executed, only if:
- checkambig=True is specified (not all copying needs ambiguity check), and
- destination file exists before copying
This patch also adds 'not (copystat and checkambig)' assertion,
because combination of copystat and checkambig is meaningless.
This patch is a part of preparation for "Exact Cache Validation Plan":
https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
In some cases below, renaming from backup is used to restore original
contents of a file. If renaming keeps ctime, mtime and size of a file,
restoring is overlooked, and old contents cached before restoring
isn't invalidated as expected.
- failure of transaction before closing (only from '.hg/journal.dirstate')
- rollback of previous transaction (from '.hg/undo.*')
- failure in dirstateguard scope (from '.hg/dirstate.SUFFIX')
To avoid such problem, this patch makes vfs.rename() avoid ambiguity
of file stat, if needed.
Ambiguity check is executed, only if:
- checkambig=True is specified (not all renaming needs ambiguity check), and
- destination file exists before renaming
This patch is a part of preparation for "Exact Cache Validation Plan":
https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
Ambiguity check is executed at close(), only if:
- atomictempfile is created with checkambig=True, and
- target file exists before renaming
This restriction avoids performance decrement by needless examination
of file stat (for example, filelog doesn't need exact cache
validation, even though it uses atomictempfile to write changes out).
See description of filestat class for detail about why the logic in
this patch works as expected.
This patch is a part of preparation for "Exact Cache Validation Plan":
https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
Current posix.cachestat implementation might overlook change of a
file, if changing keeps ctime, mtime and size of file. Comparison of
inode number also overlooks changing in such situation, because inode
number is rapidly reused.
Contents of a file cached before changing isn't invalidated as
expected, if change of a file is overlooked for this "ambiguity" of
file stat.
This patch adds filestat class to detect ambiguity of file stat.
This patch is a part of preparation for "Exact Cache Validation Plan":
https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
This is one step towards having dirstate manage its own storage. It will
be useful for the implementation of sql dirstate [1].
This introduced a small test change: now we always write the dirstate before
saving backup so in some cases where dirstate file didn't exist yet
savebackup can create it.
[1] https://www.mercurial-scm.org/wiki/SQLDirstatePlan
This is one step towards having dirstate manage its own storage. It will
be useful for the implementation of sqldirstate [1].
I'm deleting two of the dirstate.invalidate() calls in localrepo because
restorebackup method does that for us.
[1] https://www.mercurial-scm.org/wiki/SQLDirstatePlan
This would allow the code explicitly copying dirstate to use this method instead.
Use of this method will increase encapsulation (the dirstate class will be sole
owner of its on-disk storage).