The assumption that dynamically computing the length of the buffer was N^2, but
negligible because fast was False. So we drop the dynamic computation and
manually keep track of the buffer length.
This comment is the remains of a intermediate implementation using
self._buffer += data
This implementation never made it to the repository and we can safely drop the
comment.
Python 2.6 introduced the "except type as instance" syntax, replacing
the "except type, instance" syntax that came before. Python 3 dropped
support for the latter syntax. Since we no longer support Python 2.4 or
2.5, we have no need to continue supporting the "except type, instance".
This patch mass rewrites the exception syntax to be Python 2.6+ and
Python 3 compatible.
This patch was produced by running `2to3 -f except -w -n .`.
We'll use it to detect when a sshpeer have server output to be displayed.
The implementation is super basic because all case support is not the focus of
this series.
To restore real time server output through ssh, we need to using polling feature
(like select) on the pipes used to communicate with the ssh client. However
we cannot use select alongside python level buffering of these pipe (because we
need to know if the buffer is non-empty before calling select).
However, unbuffered performance are terrible, presumably because the 'readline'
call is issuing 'read(1)' call until it find a '\n'. To work around that we
introduces our own overlay that do buffering by hand, exposing the state of the
buffer to the outside world.
The usage of polling IO will be introduced later in the 'sshpeer' module. All
its logic will be very specific to the way mercurial communicate over ssh and
does not belong to the generic 'util' module.
We will need unbuffered IO to restore real time output with ssh peer.
Changeset b61b215fcfa8 seems to indicate playing with this value could be
dangerous, but does not indicate why.
One case where that would happen is while trying to resolve a subrepo, if the
path to the subrepo was actually a broken symlink. This bug was exposed by an
hg-git test.
According to 6b1369445b7b introducing "windows._removedirs()":
If a hg repository including working directory is a reparse point
(directory symlinked or a junction point), then using
os.removedirs will remove the reparse point erroneously.
"windows._removedirs()" should be used instead of "os.removedirs()" on
Windows.
This patch adds "removedirs" as platform depending function to replace
"os.removedirs()" invocations for portability and safety
An upcoming commit requires that match.py be able to call scmutil.dirs(), but
when match.py imports scmutil, a dependency cycle is created. This commit
avoids the cycle by moving dirs() and its related finddirs() function from
scmutil to util, which match.py already depends on.
Hi there!
Fixed date names are helpful for automated systems. So it is possible to
use english date parameter even if the underlying system uses another
locale.
We have here a jenkins with build jobs on different slaves that will do
some operations with "dates" parameter. Some systems uses English locale
and some systems uses German locale. So we needed to configure the job to
uses other date names.
As this is really annoying to keep the systems locale in mind for some
operations I looked into util.py. It would be helpful for automated systems
if the "default English" date names would even usable on other
locales.
I attached a simple patch for this.
Best regards
André Klitzing
Some code paths use 'copyfiles' (full tree) for a single file to take advantage
of the best-effort-hard-linking parameter. We add similar parameter and logic
to 'copyfile' (single file) for this purpose.
The single file version have the advantage to overwrite the destination file if
it exists.
This allows taking advantage of Python 2.5+'s struct.Struct, which
provides a slightly faster unpack due to reusing formats. Sadly,
.unpack_from is significantly slower.
Garbage collection behave pathologically when creating a lot of containers. As
we do that more than once it become sensible to have a decorator for it. See
inline documentation for details.
This patch uses "False" as default value of "notindexed" argument,
even though "vfs.makedir()" uses "True" for it, because "os.mkdir()"
doesn't set "_FILE_ATTRIBUTE_NOT_CONTENT_INDEXED" attribute to newly
created directories.
Future patches will allow extensions to choose which order a namespace should
output in the log, so we add a way for sortdict to insert to a specific
location.
Future patches will start using sortdict for log operations where order is
important. Adding iteritems removes the headache of having to remember to use
items() if the object is a sortdict.
Previously, we'd scan through the entire directory listing looking for a
normalized match. This is O(N) in the number of files in the directory. If we
decide to call util.fspath on each file in it, the overall complexity works out
to O(N^2). This becomes a problem with directories a few thousand files or
larger.
Switch to using a dictionary instead. There is a slightly higher upfront cost
to pay, but for cases like the above this is amortized O(1). Plus there is a
lower constant factor because generator comprehensions are faster than for
loops, so overall it works out to be a very small loss in performance for 1
file, and a huge gain when there's more.
For a large repo with around 200k files in it on a case-insensitive file
system, for a large directory with over 30,000 files in it, the following
command was tested:
ls | shuf -n $COUNT | xargs hg status
This command leads to util.fspath being called on $COUNT files in the
directory.
COUNT before after
1 0.77s 0.78s
100 1.42s 0.80s
1000 6.3s 0.96s
I also tested with COUNT=10000, but before took too long so I gave up.
util.system() copies subprocess' output through pipe if output file is not
stdout. Because a file iterator has internal buffering, output won't be
flushed until enough data is available. Therefore, it could easily miss
important messages such as "waiting for lock".
This effectively backs out changeset 7582042d6cce.
The API change is done so that both util.sha1 and util.md5 can be called the
same way. The function is moved in order to use it for md5 checksumming for
an upcoming bundle2 feature.
templates, help and locale data is normally stored as sub folders in the
directory containing the source of the mercurial module. In a frozen build they
live as sub folders next to 'hg.exe' and 'library.zip'.
These different kind of data were handled in different ways. Unify that by
introducing util.datapath. The value is computed from the environment and is
always used, so we just calculate the value on module load.
Reading all available data from a pipe has a platform-dependent
implementation.
This patch establishes platform.readpipe() by copying the
inline implementation in sshpeer.readerr(). The implementations
for POSIX and Windows are currently identical. The POSIX
implementation will be changed in a subsequent patch.
The escape method in at least one of the modules called 're2' is in C. This
means it is significantly faster than the Python code written in 're'.
An upcoming patch will have benchmarks.