As discussed at [1], the logic around "actual config"s seem to be
non-trivial enough that it's worth a new module.
This patch creates the module and move "scmutil.*rcpath" functions there as
the first step. More methods will be moved to the module in the future.
The module is different from config.py because the latter only cares about
data structure and parsing, and does not care about special case, or system
config paths, or environment variables.
[1]: https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-March/095503.html
The purpose of the added class is to serve purposes like save files of shelve
or state files of shelve, rebase and histedit. Keys of these files can be
alphanumeric and start with letters, while values must not contain newlines.
In light of Mercurial's reluctancy to use Python's json module, this tries
to provide a reasonable alternative for a non-nested named data.
Comparing to current approach of storing state in plain text files, where
semantic meaning of lines of text is only determined by their oreder,
simple key-value file allows for reordering lines and thus helps handle
optional values.
Initial use-case I see for this is obs-shelve's shelve files. Later we
can possibly migrate state files to this approach.
The test is in a new file beause I did not figure out where to put it
within existing test suite. If you give me a better idea, I will gladly
follow it.
The 'scmutil' is growing large (1500+ lines) and 2/5 of it is related to vfs.
We extract the 'vfs' related code in its own module get both module back to a
better scale and clearer contents.
We keep all the references available in 'scmutil' for now as many reference
needs to be updated.
This was one of the hardest import cycles as scmutil is widely used and
revset functions are likely to depend on a variety of modules.
New repo.anyrevs() does not expand user aliases by default to copy the
behavior of the existing repo.revs(). I don't want to add new function to
localrepository, but this function is quite similar to repo.revs() so it
won't increase the complexity of the localrepository class so much.
New revsetlang module hosts parser, tokenizer, and miscellaneous functions
working on parsed tree. It does not include functions for evaluation such as
getset() and match().
2288 mercurial/revset.py
684 mercurial/revsetlang.py
2972 total
get*() functions are aliased since they are common in revset.py.
os.name returns unicodes on py3 and we have pycompat.osname which returns
bytes. This series of 2 patches will change every ocurrence of os.name with
pycompat.osname.
Per discussion at 7d927e65eaf2 [1], we need "callcatch" in worker.py. Move
it to scmutil.py to avoid cycles.
Note that dispatch's callcatch handles some additional high-level exceptions
related to config parsing, and commands. Moving them to scmutil will make
scmutil depend on "commands" or require "_formatparse" and "_getsimilar"
(and "difflib") to be moved as well. In the worker use-case, it is forked
when config and commands are fully loaded. So it should not care about those
exceptions.
[1]: https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-August/087116.html
According to POSIX specification, just having group write access to a
file causes EPERM at invocation of os.utime() with an explicit time
information (e.g. working on the repository shared by group access
permission).
To ignore EPERM at closing file object in such case, this patch makes
checkambigatclosing._checkambig() use filestat.avoidambig() introduced
by previous patch.
Some functions below imply this code path at truncation of an existing
(= might be owned by another user) file.
- strip() in repair.py, introduced by 4d0a08431b6f
- _playback() in transaction.py, introduced by 48fe04792102
This is a variant of issue5418.
According to POSIX specification, just having group write access to a
file causes EPERM at invocation of os.utime() with an explicit time
information (e.g. working on the repository shared by group access
permission).
To ignore EPERM at renaming in such case, this patch makes
vfs.rename() use filestat.avoidambig() introduced by previous patch.
In Mercurial source tree, opening a file in "a"/"a+" mode like below
doesn't specify atomictemp=True for vfs, and this avoids file stat
ambiguity check by atomictempfile.
- writing changes out in revlog layer uses "a+" mode
- truncation in repair.strip() uses "a" mode
- truncation in transaction._playback() uses "a" mode
If steps below occurs at "the same time in sec", all of mtime, ctime
and size are same between (1) and (3).
1. append data to revlog-style file (and close transaction)
2. discard appended data by truncation (strip or rollback)
3. append same size but different data to revlog-style file again
Therefore, cache validation doesn't work after (3) as expected.
This patch uses checkambigatclosing in checkambig=True but
atomictemp=False case, to check (and get rid of) file stat ambiguity
at closing.
This is a part of ExactCacheValidationPlan.
https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
In Mercurial source tree, opening a file in "a"/"a+" mode like below
doesn't specify atomictemp=True for vfs, and this avoids file stat
ambiguity check by atomictempfile.
- writing changes out in revlog layer uses "a+" mode
- truncation in repair.strip() uses "a" mode
- truncation in transaction._playback() uses "a" mode
If steps below occurs at "the same time in sec", all of mtime, ctime
and size are same between (1) and (3).
1. append data to revlog-style file (and close transaction)
2. discard appended data by truncation (strip or rollback)
3. append same size but different data to revlog-style file again
Therefore, cache validation doesn't work after (3) as expected.
This patch adds file object wrapper class checkambigatclosing to check
(and get rid of) ambiguity at closing. It is used by vfs in subsequent
patch.
This is a part of ExactCacheValidationPlan.
https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
BTW, checkambigatclosing is tested in test-filecache.py, even though
it doesn't use filecache itself, because filecache assumes that file
stat ambiguity never occurs (and there is no another test-*.py related
to filecache).
It appears crecord.py has its own termsize() function. I want to get rid of it.
The fallback height is chosen from the default of cmd.exe on Windows, and
VT100 on Unix.
I'm going to get rid of sys.stderr|out|in references from posix.termwidth().
In order to do that, termwidth() needs to take a ui, but functions in util.py
shouldn't depend on a ui object. So moves termwidth() to scmutil.py.
This patch make sure scmutil.rcpath() returns bytes independent of
which platform is used on Python 3. If we want to change type for windows we
can just conditionalize the return variable.
Previously in the worst case we iterated the files in matcher twice and
had a method only for this, which reimplemented logic in subdirmatchers
constructor. So we replaced the method with a subdirmatcher.files() call.
At 9069882b46bf, we had to optimize a parsed tree to resolve x^:y ambiguity.
Since we've moved the resolution of x^:y to parse(), we no longer have to call
optimize(). Therefore, (x:y) can be taken as a single expression, not an odd
range expression x:y.
This was breaking my ability to use treemanifests in bundlerepos, and
was deeply mysterious. We should probably just make the options
property a formal part of the vfs API, and make it a required
construction parameter. Sadly, I don't have time to dive into that
refactor right now.
I can never remember the differences between the various revset
APIs. I can never remember that scmutil.revrange() is the one I
want to use from user-facing commands.
Add some documentation to clarify this.
While we're here, the argument name for revrange() is changed to
"specs" because that's what it actually is.
To make it easier to patch the wrapped function, make it possible to access the
filecache descriptor directly on the class (rather than have to use
ClassObject.__dict__['attributename']). Returning `self` when the first
argument to `__get__` is `None` makes the descriptor behave the same way
`property` objects do.
All versions of Python we support or hope to support make the hash
functions available in the same way under the same name, so we may as
well drop the util forwards.
In some cases below, renaming from backup is used to restore original
contents of a file. If renaming keeps ctime, mtime and size of a file,
restoring is overlooked, and old contents cached before restoring
isn't invalidated as expected.
- failure of transaction before closing (only from '.hg/journal.dirstate')
- rollback of previous transaction (from '.hg/undo.*')
- failure in dirstateguard scope (from '.hg/dirstate.SUFFIX')
To avoid such problem, this patch makes vfs.rename() avoid ambiguity
of file stat, if needed.
Ambiguity check is executed, only if:
- checkambig=True is specified (not all renaming needs ambiguity check), and
- destination file exists before renaming
This patch is a part of preparation for "Exact Cache Validation Plan":
https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
Update makedirs() to ignore EEXIST in case someone else has already created the
directory in question. Previously the ensuredirs() function existed, and was
nearly identical to makedirs() except that it fixed this race. Unfortunately
ensuredirs() was only used in 3 places, and most code uses the racy makedirs()
function. This fixes makedirs() to be non-racy, and replaces calls to
ensuredirs() with makedirs().
In particular, mercurial.scmutil.origpath() used the racy makedirs() code,
which could cause failures during "hg update" as it tried to create backup
directories.
This does slightly change the behavior of call sites using ensuredirs():
previously ensuredirs() would throw EEXIST if the path existed but was a
regular file instead of a directory. It did this by explicitly checking
os.path.isdir() after getting EEXIST. The makedirs() code did not do this and
swallowed all EEXIST errors. I kept the makedirs() behavior, since it seemed
preferable to avoid the extra stat call in the common case where this directory
already exists. If the path does happen to be a file, the caller will almost
certainly fail with an ENOTDIR error shortly afterwards anyway. I checked
the 3 existing call sites of ensuredirs(), and this seems to be the case for
them.
Closing files that have been appended to is relatively slow on
Windows/NTFS. This makes several Mercurial operations slower on
Windows.
The workaround to this issue is conceptually simple: use multiple
threads for I/O. Unfortunately, Python doesn't scale well to multiple
threads because of the GIL. And, refactoring our code to use threads
everywhere would be a huge undertaking. So, we decide to tackle this
problem by starting small: establishing a thread pool for closing
files.
This patch establishes a mechanism for closing file handles on separate
threads. The coordinator object is basically a queue of file handles to
operate on and a thread pool consuming from the queue.
When files are opened through the VFS layer, the caller can specify
that delay closing is allowed.
A proxy class for file handles has been added. We must use a proxy
because it isn't possible to modify __class__ on built-in types. This
adds some overhead. But as future patches will show, this overhead
is cancelled out by the benefit of closing file handles on background
threads.
Now that we dropped support for Python 2.4, we are able to use context
managers. Let's replace the try..finally pattern in scmutil.py with
context managers, which close files automatically when the context
manager is exited.
There should be no change in behavior with this patch.
Why convert to context managers if nothing is broken? I'm working on
closing file handles in background threads to improve performance on
Windows. As part of this, I realized there could be some future issues
if the background file closing code isn't designed with context
managers in mind. So, I'd like to switch some code to context managers
so I can design an API that works with context managers.
Previously, we were using Python's native 'os.path.isfile' method which follows
symlinks. In this case, since we're operating on repo contents, we don't want
to follow symlinks.
There's a behaviour change here, as shown by the second part of the added test.
Consider a symlink 'f' pointing to a file containing 'abc'. If we try and
replace it with a file with contents 'abc', previously we would have let it
though. Now we don't. Although this breaks naive inspection with tools like
'cat' and 'diff', on balance I believe this is the right change.
When using 'extdiff --patch' to check the changes in a rebase, 'precursors(x)'
evaluated to an empty set because I forgot the --hidden flag, so the other
revision was used as the replacement for the empty set. The result was the
patch for the other revision was diffed against itself, and the tool saying
there were no differences. That's misleading since the expected diff args were
silently changed, so it's better to bail out.
The other uses of scmutil.revpair() are commands.diff and commands.status, and
it doesn't make sense to allow an empty revision there either. The code here
was suggested by Yuya Nishihara.
Since we have pushed back the performance issue related to general delta behind
another configuration (Still off by default), we can safely create new
repository with general delta support. As client are compatible with it since
Mercurial 1.9 (4.5 years ago) I do no expect any significant compatibility
issues.
The home of 'Abort' is 'error' not 'util' however, a lot of code seems to be
confused about that and gives all the credit to 'util' instead of the
hardworking 'error'. In a spirit of equity, we break the cycle of injustice and
give back to 'error' the respect it deserves. And screw that 'util' poser.
For great justice.