The initial attempt was to discard cache when appropriate, but it appears
to be error prone. We had to carefully inspect all places where audit() is
called e.g. without actually updating filesystem, before removing files and
directories, etc.
So, this patch disables the cache of audited paths by default, and enables
it only for the following cases:
- short-lived auditor objects
- repo.vfs, repo.svfs, and repo.cachevfs, which are managed directories
and considered sort of append-only (a file/directory would never be
replaced with a symlink)
There would be more cacheable vfs objects (e.g. mq.queue.opener), but I
decided not to inspect all of them in this patch. We can make them cached
later.
Benchmark result:
- using old clone of http://selenic.com/repo/linux-2.6/ (38319 files)
- on tmpfs
- run HGRCPATH=/dev/null hg up -q --time tip && hg up -q null
- try 4 times and take the last three results
original:
real 7.480 secs (user 1.140+22.760 sys 0.150+1.690)
real 8.010 secs (user 1.070+22.280 sys 0.170+2.120)
real 7.470 secs (user 1.120+22.390 sys 0.120+1.910)
clearcache (the other series):
real 7.680 secs (user 1.120+23.420 sys 0.140+1.970)
real 7.670 secs (user 1.110+23.620 sys 0.130+1.810)
real 7.740 secs (user 1.090+23.510 sys 0.160+1.940)
enable cache only for vfs and svfs (this series):
real 8.730 secs (user 1.500+25.190 sys 0.260+2.260)
real 8.750 secs (user 1.490+25.170 sys 0.250+2.340)
real 9.010 secs (user 1.680+25.340 sys 0.280+2.540)
remove cache function at all (for reference):
real 9.620 secs (user 1.440+27.120 sys 0.250+2.980)
real 9.420 secs (user 1.400+26.940 sys 0.320+3.130)
real 9.760 secs (user 1.530+27.270 sys 0.250+2.970)
Now that we records "all" changes happening in a transaction (in tr.changes)
we will be able to provide better report on various changes (phases turned
public, changeset obsoleted, branch merged or created, etc..)
This is far too late in the cycle to play with this, but having this existing
method called more widely will help extensions to play around with various
options during the 4.4 cycle.
Instead of calling registersummarycallback only for transactions we want, we
always call it and use the transaction name to decide when to report (eg: we
do not want `hg amend` to report new obsoleted changesets). Filtering on
transaction name does not seems great, but seems good enough for the moment.
We can change the API during the next cycle.
The previous manual call during unbundling of the bundle2 "obsmarkers" part is
no longer necessary and has been dropped.
This is done by a script [2] using RedBaron [1], a tool designed for doing
code refactoring. All "default" values are decided by the script and are
strongly consistent with the existing code.
There are 2 changes done manually to fix tests:
[warn] mercurial/exchange.py: experimental.bundle2-output-capture: default needs manual removal
[warn] mercurial/localrepo.py: experimental.hook-track-tags: default needs manual removal
Since RedBaron is not confident about how to indent things [2].
[1]: https://github.com/PyCQA/redbaron
[2]: https://github.com/PyCQA/redbaron/issues/100
[3]:
#!/usr/bin/env python
# codemod_configitems.py - codemod tool to fill configitems
#
# Copyright 2017 Facebook, Inc.
#
# This software may be used and distributed according to the terms of the
# GNU General Public License version 2 or any later version.
from __future__ import absolute_import, print_function
import os
import sys
import redbaron
def readpath(path):
with open(path) as f:
return f.read()
def writepath(path, content):
with open(path, 'w') as f:
f.write(content)
_configmethods = {'config', 'configbool', 'configint', 'configbytes',
'configlist', 'configdate'}
def extractstring(rnode):
"""get the string from a RedBaron string or call_argument node"""
while rnode.type != 'string':
rnode = rnode.value
return rnode.value[1:-1] # unquote, "'str'" -> "str"
def uiconfigitems(red):
"""match *.ui.config* pattern, yield (node, method, args, section, name)"""
for node in red.find_all('atomtrailers'):
entry = None
try:
obj = node[-3].value
method = node[-2].value
args = node[-1]
section = args[0].value
name = args[1].value
if (obj in ('ui', 'self') and method in _configmethods
and section.type == 'string' and name.type == 'string'):
entry = (node, method, args, extractstring(section),
extractstring(name))
except Exception:
pass
else:
if entry:
yield entry
def coreconfigitems(red):
"""match coreconfigitem(...) pattern, yield (node, args, section, name)"""
for node in red.find_all('atomtrailers'):
entry = None
try:
args = node[1]
section = args[0].value
name = args[1].value
if (node[0].value == 'coreconfigitem' and section.type == 'string'
and name.type == 'string'):
entry = (node, args, extractstring(section),
extractstring(name))
except Exception:
pass
else:
if entry:
yield entry
def registercoreconfig(cfgred, section, name, defaultrepr):
"""insert coreconfigitem to cfgred AST
section and name are plain string, defaultrepr is a string
"""
# find a place to insert the "coreconfigitem" item
entries = list(coreconfigitems(cfgred))
for node, args, nodesection, nodename in reversed(entries):
if (nodesection, nodename) < (section, name):
# insert after this entry
node.insert_after(
'coreconfigitem(%r, %r,\n'
' default=%s,\n'
')' % (section, name, defaultrepr))
return
def main(argv):
if not argv:
print('Usage: codemod_configitems.py FILES\n'
'For example, FILES could be "{hgext,mercurial}/*/**.py"')
dirname = os.path.dirname
reporoot = dirname(dirname(dirname(os.path.abspath(__file__))))
# register configitems to this destination
cfgpath = os.path.join(reporoot, 'mercurial', 'configitems.py')
cfgred = redbaron.RedBaron(readpath(cfgpath))
# state about what to do
registered = set((s, n) for n, a, s, n in coreconfigitems(cfgred))
toregister = {} # {(section, name): defaultrepr}
coreconfigs = set() # {(section, name)}, whether it's used in core
# first loop: scan all files before taking any action
for i, path in enumerate(argv):
print('(%d/%d) scanning %s' % (i + 1, len(argv), path))
iscore = ('mercurial' in path) and ('hgext' not in path)
red = redbaron.RedBaron(readpath(path))
# find all repo.ui.config* and ui.config* calls, and collect their
# section, name and default value information.
for node, method, args, section, name in uiconfigitems(red):
if section == 'web':
# [web] section has some weirdness, ignore them for now
continue
defaultrepr = None
key = (section, name)
if len(args) == 2:
if key in registered:
continue
if method == 'configlist':
defaultrepr = 'list'
elif method == 'configbool':
defaultrepr = 'False'
else:
defaultrepr = 'None'
elif len(args) >= 3 and (args[2].target is None or
args[2].target.value == 'default'):
# try to understand the "default" value
dnode = args[2].value
if dnode.type == 'name':
if dnode.value in {'None', 'True', 'False'}:
defaultrepr = dnode.value
elif dnode.type == 'string':
defaultrepr = repr(dnode.value[1:-1])
elif dnode.type in ('int', 'float'):
defaultrepr = dnode.value
# inconsistent default
if key in toregister and toregister[key] != defaultrepr:
defaultrepr = None
# interesting to rewrite
if key not in registered:
if defaultrepr is None:
print('[note] %s: %s.%s: unsupported default'
% (path, section, name))
registered.add(key) # skip checking it again
else:
toregister[key] = defaultrepr
if iscore:
coreconfigs.add(key)
# second loop: rewrite files given "toregister" result
for path in argv:
# reconstruct redbaron - trade CPU for memory
red = redbaron.RedBaron(readpath(path))
changed = False
for node, method, args, section, name in uiconfigitems(red):
key = (section, name)
defaultrepr = toregister.get(key)
if defaultrepr is None or key not in coreconfigs:
continue
if len(args) >= 3 and (args[2].target is None or
args[2].target.value == 'default'):
try:
del args[2]
changed = True
except Exception:
# redbaron fails to do the rewrite due to indentation
# see https://github.com/PyCQA/redbaron/issues/100
print('[warn] %s: %s.%s: default needs manual removal'
% (path, section, name))
if key not in registered:
print('registering %s.%s' % (section, name))
registercoreconfig(cfgred, section, name, defaultrepr)
registered.add(key)
if changed:
print('updating %s' % path)
writepath(path, red.dumps())
if toregister:
print('updating configitems.py')
writepath(cfgpath, cfgred.dumps())
if __name__ == "__main__":
sys.exit(main(sys.argv[1:]))
The 'successors' part of the mappings used of be a tuple. This avoid issue from
code consuming the generator "by mistake". For example, an extension inspecting the
mapping content used to be able to iterate over the successors mapping without
consequence.
Since the mapping are small we do not expect any performance impact we use tuple
again for this.
cleanupnodes takes care of bookmark movement, and bookmark movement could
cause bookmark divergent resolution as a side effect. This patch adds such
bookmark divergent resolution logic so future rebase migration will be
easier.
The revset is carefully written to be equivalent to what rebase does today.
Although I think it might make sense to remove divergent bookmarks more
aggressively, for example:
F book@1
|
E book@2
|
| D book
| |
| C
|/
B book@3
|
A
When rebase -s C -d E, "book@1" will be removed, "book@3" will be kept,
and the end result is:
D book
|
C
|
F
|
E book@2 (?)
|
B book@3
|
A
The question is should we keep book@2? The current logic keeps it. If we
choose not to (makes some sense to me), the "deleterevs" revset could be
simplified to "newnode % oldnode".
For now, I just make it compatible with the existing behavior. If we want to
make the "deleterevs" revset simpler, we can always do it in the future.
In some valid usecases, the "mapping" received by scmutil.cleanupnodes have
filtered nodes. Use unfiltered repo to access them correctly.
The added test case will fail with the old cleanupnodes code.
This is important to migrate histedit to use the cleanupnodes API.
This is a first basic visible usage of the changes tracking in the transaction.
We adds a new function computing the pre-existing changesets obsoleted by a
transaction and a transaction call back displaying this information.
Example output:
added 1 changesets with 1 changes to 1 files (+1 heads)
3 new obsolescence markers
obsoleted 1 changesets
The goal is to evolve the transaction summary into something bigger, gathering
existing output there and adding new useful one. This patch is a good first step
on this road. The new output is basic but give a user to the content of
tr.changes['obsmarkers'] and give an idea of the new options we haves. I expect
to revisit the message soon.
The caller recording the transaction summary should also be moved into a more
generic location but further refactoring is needed before it can happen.
It's now common that an old node gets replaced by zero or more new nodes,
that could happen with amend, rebase, histedit, etc. And it's a common
requirement to do bookmark movements, strip or obsolete nodes and even
moving working copy parent.
Previously, amend, rebase, history have their own logic doing the above.
This patch is an attempt to unify them and future code.
This enables new developers to be able to do "replace X with Y" thing
correctly, without any knowledge about bookmarks, strip or obsstore.
The next step will be migrating rebase to the new API, so it works inside a
transaction, and its code could be simplified.
This will allow us to map repo["ff..."] to workingctx. _partialmatch() will
be updated later. I tried "return wdirrev" in place of raising the exception,
but earlier exception seemed better.
It seemed silly to convert ctx.hex() back to binary to use node.hex/short(),
or to use [:12] instead of node.short() because ctx.node() could be None.
Eventually I want to change wctx.rev() and wctx.node() to return wdirrev and
wdirid respectively, but that's quite big API breakage and can't be achieved
without some compatibility wrappers.
This makes it slightly easier to sort basectx objects by key=scmutil.intrev.
We're most likely to have ctx objects where changectx/workingctx abstraction
is necessary, so this won't increase the abstraction overhead.
To ease migration from files with version numbers in their first lines,
we want simplekeyvaluefile to support a non-key-value first line. In this
way, old versions of Mercurial will read such files, discover a newer version
than the one they know how to handle and fail gracefully, rather than with
exception. Shelve's shelvestate file is an example.
outgoing has been using an unfiltered repo since 07f64d64baf7 (discovery:
outgoing pass unfiltered repo to findcommonincoming (issue3776),
2013-01-28). If I'm reading code and history correctly, it should be
safe to run _outgoing() on a filtered repo since daf83ddd4afd
(discovery: run discovery on filtered repository, 2015-01-07). By
running _outgoing() on a filtered repo, we can also remove the
workaround there for ignoring filtered revisions.
dispatch handles KeyboardInterrupt already. This makes the code more
consistent, and makes worker not print "killed!" if it receives SIGTERM in
most cases (in rare cases there is still "killed!" printed, which will be
fixed by the next patch).
Acquiring lock by vfs.makelock() and getting lock holder (aka
"locker") information by vfs.readlock() aren't atomic action.
Therefore, failure of the former doesn't ensure success of the latter.
Before this patch, lock is unintentionally acquired, if ENOENT
causes failure of vfs.readlock() while 5 times retrying, because
lock._trylock() returns to caller silently after retrying, and
lock.lock() assumes that lock._trylock() returns successfully only if
lock is acquired.
In this case, lock symlink (or file) isn't created, even though lock
is treated as acquired in memory.
To avoid this issue, this patch makes lock._trylock() raise
LockHeld(EAGAIN) at the end of it, if lock isn't acquired while
retrying.
An empty "locker" meaning "busy for frequent lock/unlock by many
processes" might appear in an abortion message, if lock acquisition
fails. Therefore, this patch also does:
- use '%r' to increase visibility of "locker", even if it is empty
- show hint message to explain what empty "locker" means
As discussed at [1], the logic around "actual config"s seem to be
non-trivial enough that it's worth a new module.
This patch creates the module and move "scmutil.*rcpath" functions there as
the first step. More methods will be moved to the module in the future.
The module is different from config.py because the latter only cares about
data structure and parsing, and does not care about special case, or system
config paths, or environment variables.
[1]: https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-March/095503.html
The purpose of the added class is to serve purposes like save files of shelve
or state files of shelve, rebase and histedit. Keys of these files can be
alphanumeric and start with letters, while values must not contain newlines.
In light of Mercurial's reluctancy to use Python's json module, this tries
to provide a reasonable alternative for a non-nested named data.
Comparing to current approach of storing state in plain text files, where
semantic meaning of lines of text is only determined by their oreder,
simple key-value file allows for reordering lines and thus helps handle
optional values.
Initial use-case I see for this is obs-shelve's shelve files. Later we
can possibly migrate state files to this approach.
The test is in a new file beause I did not figure out where to put it
within existing test suite. If you give me a better idea, I will gladly
follow it.
The 'scmutil' is growing large (1500+ lines) and 2/5 of it is related to vfs.
We extract the 'vfs' related code in its own module get both module back to a
better scale and clearer contents.
We keep all the references available in 'scmutil' for now as many reference
needs to be updated.
This was one of the hardest import cycles as scmutil is widely used and
revset functions are likely to depend on a variety of modules.
New repo.anyrevs() does not expand user aliases by default to copy the
behavior of the existing repo.revs(). I don't want to add new function to
localrepository, but this function is quite similar to repo.revs() so it
won't increase the complexity of the localrepository class so much.
New revsetlang module hosts parser, tokenizer, and miscellaneous functions
working on parsed tree. It does not include functions for evaluation such as
getset() and match().
2288 mercurial/revset.py
684 mercurial/revsetlang.py
2972 total
get*() functions are aliased since they are common in revset.py.
os.name returns unicodes on py3 and we have pycompat.osname which returns
bytes. This series of 2 patches will change every ocurrence of os.name with
pycompat.osname.
Per discussion at 7d927e65eaf2 [1], we need "callcatch" in worker.py. Move
it to scmutil.py to avoid cycles.
Note that dispatch's callcatch handles some additional high-level exceptions
related to config parsing, and commands. Moving them to scmutil will make
scmutil depend on "commands" or require "_formatparse" and "_getsimilar"
(and "difflib") to be moved as well. In the worker use-case, it is forked
when config and commands are fully loaded. So it should not care about those
exceptions.
[1]: https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-August/087116.html
According to POSIX specification, just having group write access to a
file causes EPERM at invocation of os.utime() with an explicit time
information (e.g. working on the repository shared by group access
permission).
To ignore EPERM at closing file object in such case, this patch makes
checkambigatclosing._checkambig() use filestat.avoidambig() introduced
by previous patch.
Some functions below imply this code path at truncation of an existing
(= might be owned by another user) file.
- strip() in repair.py, introduced by 4d0a08431b6f
- _playback() in transaction.py, introduced by 48fe04792102
This is a variant of issue5418.
According to POSIX specification, just having group write access to a
file causes EPERM at invocation of os.utime() with an explicit time
information (e.g. working on the repository shared by group access
permission).
To ignore EPERM at renaming in such case, this patch makes
vfs.rename() use filestat.avoidambig() introduced by previous patch.
In Mercurial source tree, opening a file in "a"/"a+" mode like below
doesn't specify atomictemp=True for vfs, and this avoids file stat
ambiguity check by atomictempfile.
- writing changes out in revlog layer uses "a+" mode
- truncation in repair.strip() uses "a" mode
- truncation in transaction._playback() uses "a" mode
If steps below occurs at "the same time in sec", all of mtime, ctime
and size are same between (1) and (3).
1. append data to revlog-style file (and close transaction)
2. discard appended data by truncation (strip or rollback)
3. append same size but different data to revlog-style file again
Therefore, cache validation doesn't work after (3) as expected.
This patch uses checkambigatclosing in checkambig=True but
atomictemp=False case, to check (and get rid of) file stat ambiguity
at closing.
This is a part of ExactCacheValidationPlan.
https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
In Mercurial source tree, opening a file in "a"/"a+" mode like below
doesn't specify atomictemp=True for vfs, and this avoids file stat
ambiguity check by atomictempfile.
- writing changes out in revlog layer uses "a+" mode
- truncation in repair.strip() uses "a" mode
- truncation in transaction._playback() uses "a" mode
If steps below occurs at "the same time in sec", all of mtime, ctime
and size are same between (1) and (3).
1. append data to revlog-style file (and close transaction)
2. discard appended data by truncation (strip or rollback)
3. append same size but different data to revlog-style file again
Therefore, cache validation doesn't work after (3) as expected.
This patch adds file object wrapper class checkambigatclosing to check
(and get rid of) ambiguity at closing. It is used by vfs in subsequent
patch.
This is a part of ExactCacheValidationPlan.
https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
BTW, checkambigatclosing is tested in test-filecache.py, even though
it doesn't use filecache itself, because filecache assumes that file
stat ambiguity never occurs (and there is no another test-*.py related
to filecache).
It appears crecord.py has its own termsize() function. I want to get rid of it.
The fallback height is chosen from the default of cmd.exe on Windows, and
VT100 on Unix.
I'm going to get rid of sys.stderr|out|in references from posix.termwidth().
In order to do that, termwidth() needs to take a ui, but functions in util.py
shouldn't depend on a ui object. So moves termwidth() to scmutil.py.
This patch make sure scmutil.rcpath() returns bytes independent of
which platform is used on Python 3. If we want to change type for windows we
can just conditionalize the return variable.
Previously in the worst case we iterated the files in matcher twice and
had a method only for this, which reimplemented logic in subdirmatchers
constructor. So we replaced the method with a subdirmatcher.files() call.
At 9069882b46bf, we had to optimize a parsed tree to resolve x^:y ambiguity.
Since we've moved the resolution of x^:y to parse(), we no longer have to call
optimize(). Therefore, (x:y) can be taken as a single expression, not an odd
range expression x:y.