Some code paths use 'copyfiles' (full tree) for a single file to take advantage
of the best-effort-hard-linking parameter. We add similar parameter and logic
to 'copyfile' (single file) for this purpose.
The single file version have the advantage to overwrite the destination file if
it exists.
This allows taking advantage of Python 2.5+'s struct.Struct, which
provides a slightly faster unpack due to reusing formats. Sadly,
.unpack_from is significantly slower.
Garbage collection behave pathologically when creating a lot of containers. As
we do that more than once it become sensible to have a decorator for it. See
inline documentation for details.
This patch uses "False" as default value of "notindexed" argument,
even though "vfs.makedir()" uses "True" for it, because "os.mkdir()"
doesn't set "_FILE_ATTRIBUTE_NOT_CONTENT_INDEXED" attribute to newly
created directories.
Future patches will allow extensions to choose which order a namespace should
output in the log, so we add a way for sortdict to insert to a specific
location.
Future patches will start using sortdict for log operations where order is
important. Adding iteritems removes the headache of having to remember to use
items() if the object is a sortdict.
Previously, we'd scan through the entire directory listing looking for a
normalized match. This is O(N) in the number of files in the directory. If we
decide to call util.fspath on each file in it, the overall complexity works out
to O(N^2). This becomes a problem with directories a few thousand files or
larger.
Switch to using a dictionary instead. There is a slightly higher upfront cost
to pay, but for cases like the above this is amortized O(1). Plus there is a
lower constant factor because generator comprehensions are faster than for
loops, so overall it works out to be a very small loss in performance for 1
file, and a huge gain when there's more.
For a large repo with around 200k files in it on a case-insensitive file
system, for a large directory with over 30,000 files in it, the following
command was tested:
ls | shuf -n $COUNT | xargs hg status
This command leads to util.fspath being called on $COUNT files in the
directory.
COUNT before after
1 0.77s 0.78s
100 1.42s 0.80s
1000 6.3s 0.96s
I also tested with COUNT=10000, but before took too long so I gave up.
util.system() copies subprocess' output through pipe if output file is not
stdout. Because a file iterator has internal buffering, output won't be
flushed until enough data is available. Therefore, it could easily miss
important messages such as "waiting for lock".
This effectively backs out changeset 7582042d6cce.
The API change is done so that both util.sha1 and util.md5 can be called the
same way. The function is moved in order to use it for md5 checksumming for
an upcoming bundle2 feature.
templates, help and locale data is normally stored as sub folders in the
directory containing the source of the mercurial module. In a frozen build they
live as sub folders next to 'hg.exe' and 'library.zip'.
These different kind of data were handled in different ways. Unify that by
introducing util.datapath. The value is computed from the environment and is
always used, so we just calculate the value on module load.
Reading all available data from a pipe has a platform-dependent
implementation.
This patch establishes platform.readpipe() by copying the
inline implementation in sshpeer.readerr(). The implementations
for POSIX and Windows are currently identical. The POSIX
implementation will be changed in a subsequent patch.
The escape method in at least one of the modules called 're2' is in C. This
means it is significantly faster than the Python code written in 're'.
An upcoming patch will have benchmarks.
Before this patch, 'util.ellipsis' tried to avoid splitting at
intermediate multi-byte sequence, but its implementation was incorrect.
Internal function '_ellipsis' trims specified unicode sequence not at
most maxlength 'columns in display', but at most maxlength number of
'unicode characters'.
def _ellipsis(text, maxlength):
if len(text) <= maxlength:
return text, False
else:
return "%s..." % (text[:maxlength - 3]), True
In many encodings, number of unicode characters can be different from
columns in display.
This patch replaces 'ellipsis' implementation by 'encoding.trim',
which can trim string at most maxlength columns in display correctly,
even though specified string contains multi-byte characters.
'_ellipsis' is removed in this patch, because it is referred only from
'ellipsis'.
Before this patch, "util.cachefunc()" caches the value returned by the
specified function into dictionary "cache", even if the specified
function takes no arguments.
In such case, "cache" has at most one entry, and distinction between
entries in "cache" is meaningless.
This patch adds the code path to "cachefunc()" for the function taking
no arguments for efficiency: to store only one cached value, using
list "cache" is a little faster than using dictionary "cache".
Close another stream (default stdout, which often is buffered) before writing
to the primary stream (default stderr, which often is unbuffered). The primary
stream is also flushed after writing (in case it is buffered).
This fixes non-deterministic output order, especially on windows.
This is often very handy when hacking/debugging.
Calling util.debugstacktrace('hey') from a place in hg will give something like:
hey at:
./hg:38 in <module>
/home/user/hgsrc/mercurial/dispatch.py:28 in run
/home/user/hgsrc/mercurial/dispatch.py:65 in dispatch
/home/user/hgsrc/mercurial/dispatch.py:88 in _runcatch
/home/user/hgsrc/mercurial/dispatch.py:740 in _dispatch
/home/user/hgsrc/mercurial/dispatch.py:514 in runcommand
/home/user/hgsrc/mercurial/dispatch.py:830 in _runcommand
/home/user/hgsrc/mercurial/dispatch.py:801 in checkargs
/home/user/hgsrc/mercurial/dispatch.py:737 in <lambda>
/home/user/hgsrc/mercurial/util.py:472 in check
...
Backslashes (\) in paths were encoded to %C5 when converting from url to
string. This does not look nice for windows paths. And it introduces many
problems when running tests on windows.
Paths ending with \ will fail the verification introduced in 0bc0c17d663e when
checking out on Windows ... and if it didn't fail it would probably not do what
the user expected.
Propertycache used standard attribute assignment. In the repoview case, this
assignment was forwarded to the unfiltered repo. This result in:
(1) unfiltered repo got a potentially wrong cache value,
(2) repoview never reused the cached value.
This patch replaces the standard attribute assignment by an assignment to
`objc.__dict__` which will bypass the `repoview.__setattr__`. This will not
affects other `propertycache` users and it is actually closer to the semantic we
need.
The interaction of `propertycache` and `repoview` are now tested in a python
test file.