Previously, 'hg serve -d' was trying to exec the bundled python executable,
which failed with:
Unknown option: --
usage: python [option] ...
Try 'python -h'...
abort: child process failed to start
See the previous patch for details about the content of the various command
variables. Note that unlike the previous patch here an application bundling
Mercurial could set $HG in the environment to get the correct result, there
isn't anything that a bundling application could do to get the correct result
here.
'hg serve -d' now launches under TortoiseHg, and there is a process listed in
the background, but a client process cannot connect to it for some reason, so
more investigation is needed.
sys.executable is "$appbundle/Contents/MacOS/python" when Mercurial is bundled
in a frozen app bundle on OS X, so that isn't appropriate. It appears that this
was only visible for things launched via util.system(), like external hooks,
where $HG was set wrong.
It appears that Mercurial also uses 'sys.modules['__main__'].__file__' (here)
and 'sys.argv[0]' (in platform.gethgcmd()) to figure out the command to spawn.
In both cases, this points to "$appbundle/Contents/Resources/hg", which invokes
the system python since "/usr/bin/env python" is on the shebang line. On my
system with a screwed up python install, I get an error importing the os module
if this script is invoked.
We could take the dirname of sys.executable and join 'hg' instead of this if we
want to be paranoid, but py2app boostrap is setting the environment variable
since 0.1.6 (current version is 0.9), so it seems safe and we might as well use
it.
Apparently unlike py2exe, py2app copies the Mercurial source tree as-is to a
Contents/Resources subdirectory of an app bundle, and places its binary stub in
Contents/MacOS. (The Windows install has the 'hgext' and 'mercurial' modules in
'lib/library.zip', while the help and templates subdirectories have been moved
out of the mercurial directory to the root of the installation. I assume that
the python code living in a zip file is why "py2exe doesn't support __file__".)
Therefore, prior to this change, Mercurial in a frozen app bundle on OS X would
go looking for help *.txt, templates and locale info in Contents/MacOS, where
they don't exist.
There are only a handful of places that test for frozen, and not all of them are
wrong for OS X, so it seems wiser to handle them on a case by case basis, rather
that try to change mainfrozen(). The remaining cases are:
1) util.hgexecutable() wrongly points to the bundled python executable, and
affects $HG in util.system() launched processes (e.g. external hooks)
2) util.hgcmd() wrongly points to the bundled python executable, but it seems
to only affect 'hg serve -d'
3) hook._pythonhook() may be OK, since I didn't see anything outrageous when
printing sys.path from an internal hook. I'm not sure if this special
case is needed on OS X though.
4) sslutil._plainapplepython() is OK, because sys.executable is not
/usr/bin/python, nor is it in /System/Library/Frameworks
This is similar in spirit to contextlib.nested in Python <= 2.6,
but uses an extra level of indirection to avoid its inability to
clean up if an __enter__ method raises an exception.
Why add this mechanism? It greatly simplifies scoped resource
management, and lets us eliminate several hundred lines of try/finally
blocks. In many of these cases the "finally" is separated from the
"try" by hundreds of lines of code, which makes the connection
between resource acquisition and disposal difficult to follow.
(The preferred mechanism would be the "multi-with" syntax of 2.7+,
but Mercurial can't move to 2.7 for a while.)
Intended use:
>>> with ctxmanager(lambda: file('foo'), lambda: file('bar')) as c:
>>> f1, f2 = c()
This will open both foo and bar when c() is invoked, and will close
both upon exit from the block. If the attempt to open bar raises
an exception, the block will not be entered - but foo will still
be closed.
An old implementation of this class (possibly only in my local repo)
allocated nodes in the cache during construction time, making
__init__ slow for large cache capacities. The current implementation
lazily grow the cache size, making this comment wrong.
This diff implements the standard dict copy() method for lrucachedicts, which
will be used in the pushrebase extension to make a copy of the manifestcache.
Contrary to the comment, I didn't see any evidence that we were copying
atime/mtime at all. This adds a parameter to copyfile to optionally copy it and
other stat data, with the default being to not copy it.
Many systems don't support changing the timestamp of a symlink, but we don't
need that in general anyway -- copystat is mostly useful for editors, most of
which will dereference symlinks anyway.
As part of attempting to more aggressively use the existing
lrucachedict, collections.deque operations were frequently
showing up in profiling output, negating benefits of caching.
Searching the internet seems to tell me that the most efficient
way to implement an LRU cache in Python is to have a dict indexing
the cached entries and then to use a doubly linked list to track
freshness of each entry. So, this patch replaces our existing
lrucachedict with a version using such a pattern.
The recently introduced perflrucachedict command reveals the
following timings for 10,000 operations for the following cache
sizes for the existing cache:
n=4 init=0.004079 gets=0.003632 sets=0.005188 mixed=0.005402
n=8 init=0.004045 gets=0.003998 sets=0.005064 mixed=0.005328
n=16 init=0.004011 gets=0.004496 sets=0.005021 mixed=0.005555
n=32 init=0.004064 gets=0.005611 sets=0.005188 mixed=0.006189
n=64 init=0.003975 gets=0.007684 sets=0.005178 mixed=0.007245
n=128 init=0.004121 gets=0.012005 sets=0.005422 mixed=0.009471
n=256 init=0.004143 gets=0.020295 sets=0.005227 mixed=0.013612
n=512 init=0.004039 gets=0.036703 sets=0.005243 mixed=0.020685
n=1024 init=0.004193 gets=0.068142 sets=0.005251 mixed=0.033064
n=2048 init=0.004070 gets=0.133383 sets=0.005160 mixed=0.050359
n=4096 init=0.004053 gets=0.265194 sets=0.004868 mixed=0.048352
n=8192 init=0.004087 gets=0.542218 sets=0.004562 mixed=0.032753
n=16384 init=0.004106 gets=1.064055 sets=0.004179 mixed=0.020367
n=32768 init=0.004034 gets=2.097620 sets=0.004260 mixed=0.013031
n=65536 init=0.004108 gets=4.106390 sets=0.004268 mixed=0.010191
As the data shows, the existing cache's retrieval performance
diminishes linearly with cache size. (Keep in mind the microbenchmark
is testing 100% cache hit rate.)
The new cache implementation reveals the following:
n=4 init=0.006665 gets=0.006541 sets=0.005733 mixed=0.006876
n=8 init=0.006649 gets=0.006374 sets=0.005663 mixed=0.006899
n=16 init=0.006570 gets=0.006504 sets=0.005799 mixed=0.007057
n=32 init=0.006854 gets=0.006459 sets=0.005747 mixed=0.007034
n=64 init=0.006580 gets=0.006495 sets=0.005740 mixed=0.006992
n=128 init=0.006534 gets=0.006739 sets=0.005648 mixed=0.007124
n=256 init=0.006669 gets=0.006773 sets=0.005824 mixed=0.007151
n=512 init=0.006701 gets=0.007061 sets=0.006042 mixed=0.007372
n=1024 init=0.006641 gets=0.007620 sets=0.006387 mixed=0.007464
n=2048 init=0.006517 gets=0.008598 sets=0.006871 mixed=0.008077
n=4096 init=0.006720 gets=0.010933 sets=0.007854 mixed=0.008663
n=8192 init=0.007383 gets=0.015969 sets=0.010288 mixed=0.008896
n=16384 init=0.006660 gets=0.025447 sets=0.011208 mixed=0.008826
n=32768 init=0.006658 gets=0.044390 sets=0.011192 mixed=0.008943
n=65536 init=0.006836 gets=0.082736 sets=0.011151 mixed=0.008826
Let's go through the results.
The new cache takes longer to construct. ~6.6ms vs ~4.1ms. However,
this is measuring 10,000 __init__ calls, so the difference is
~0.2us/instance. We currently only create lrucachedict for manifest
instances, so this regression is not likely relevant.
The new cache is slightly slower for retrievals for cache sizes
< 1024. It's worth noting that the only existing use of lurcachedict
is in manifest.py and the default cache size is 4. This regression
is worrisome. However, for n=4, the delta is ~2.9s for 10,000 lookups,
or ~0.29us/op. Again, this is a marginal regression and likely not
relevant in the real world. Timing `hg log -p -l 100` for
mozilla-central reveals that cache lookup times are dominated by
decompression and fulltext resolution (even with lz4 manifests).
The new cache is significantly faster for retrievals at larger
capacities. Whereas the old implementation has retrieval performance
linear with cache capacity, the new cache is constant time until much
larger values. And, when it does start to increase significantly, it
is a few magnitudes faster than the current cache.
The new cache does appear to be slower for sets when capacity is large.
However, performance is similar for smaller capacities. Of course,
caches should generally be optimized for retrieval performance because
if a cache is getting more sets than gets, it doesn't really make
sense to cache. If this regression is worrisome, again, taking the
largest regression at n=65536 of ~6.9ms for 10,000 results in a
regression of ~0.68us/op. This is not significant in the grand scheme
of things.
Overall, the new cache is performant at retrievals at much larger
capacity values which makes it a generally more useful cache backend.
While there are regressions, their absolute value is extremely small.
Since we aren't using lrucachedict aggressively today, these
regressions should not be relevant. The improved scalability of
lrucachedict should enable us to more aggressively utilize
lrucachedict for more granular caching (read: higher capacity caches)
in the near future. The impetus for this patch is to establish a cache
of decompressed revlog revisions, notably manifest revisions. And since
delta chains can grow to >10,000 and cache hit rate can be high, the
improved retrieval performance of lrucachedict should be relevant.
hashlib was added in Python 2.5. As far as I can tell, SHA-512 is always
available in 2.6+. So move the hashlib import to the top of the file and
remove the one-off handling of SHA-512.
We were computing the quotient and remainder of a division operation
separately. The built-in divmod() function allows us to do this with
a single function call. Do that.
We've globablly forced stat to return integer times which agrees with
our extension code, so this is no longer needed.
This speeds up status on mozilla-central substantially:
$ hg perfstatus
! wall 0.190179 comb 0.180000 user 0.120000 sys 0.060000 (best of 53)
$ hg perfstatus
! wall 0.275729 comb 0.270000 user 0.210000 sys 0.060000 (best of 36)
Alternate fix for this issue which avoids putting extra function calls
and exception handling in the fast path.
For almost all purposes, integer timestamps are preferable to
Mercurial. It stores integer timestamps in the dirstate and would thus
like to avoid doing any float/int comparisons or conversions. We will
continue to have to deal with 1-second granularity on filesystems for
quite some time, so this won't significantly hinder our capabilities.
This has some impact on our file cache validation code in that it
lowers timestamp resolution. But as we still have to deal with
low-resolution filesystems, we're not relying on this anyway.
An alternate approach is to use stat[ST_MTIME], which is guaranteed to
be an integer. But since this support isn't already in our extension,
we can't depend on it being available without adding a hard Python->C
API dependency that's painful for people like yours truly who have
bisect regularly and people without compilers.
The home of 'Abort' is 'error' not 'util' however, a lot of code seems to be
confused about that and gives all the credit to 'util' instead of the
hardworking 'error'. In a spirit of equity, we break the cycle of injustice and
give back to 'error' the respect it deserves. And screw that 'util' poser.
For great justice.
Because st.st_mtime is computed as 'sec + 1e-9 * nsec' and double is too narrow
to represent nanoseconds, int(st.st_mtime) can be 'sec + 1'. Therefore, that
value could be different from the one got by osutils.listdir().
This patch fixes the problem by accessing to raw st_mtime by tuple index.
It catches TypeError to fall back to st.st_mtime because our osutil.stat does
not support tuple index. In dirstate.normal(), 'st' is always a Python stat,
but in dirstate.status(), it can be either a Python stat or an osutil.stat.
Thanks to vgatien-baron@janestreet.com for finding the root cause of this
subtle problem.
Previously, a read(N) where N was less than the length of the first
available chunk would mutate the deque instance twice and allocate a new
str from the slice of the existing chunk. Profiling drawed my attention
to these as a potential hot spot during changegroup reading.
This patch makes the code more complicated in order to avoid the
aforementioned 3 operations.
On a pre-generated mozilla-central gzip bundle, this series has the
following impact on `hg unbundle` performance on my MacBook Pro:
before: 358.21 real 317.69 user 38.49 sys
after: 301.57 real 262.69 user 37.11 sys
delta: -56.64 real -55.00 user -1.38 sys
The new code results in simpler logic within the while loop. It is also
faster since we avoid performing operations on the queue and buf
collections. However, there shouldn't be any super hot loops for this
since the whole point of chunkbuffer is to avoid reading large amounts
of data at once. This does, however, make it easier to optimize
chunkbuffer in a subsequent patch.
For "space saving", bundle1 "strip" the first two bytes of the BZ stream since
they always are 'BZ'. So the current code boostrap the uncompressor with 'BZ'.
This hack is impractical in more generic case so we move it in a dedicated
"decompression".
The assumption that dynamically computing the length of the buffer was N^2, but
negligible because fast was False. So we drop the dynamic computation and
manually keep track of the buffer length.
This comment is the remains of a intermediate implementation using
self._buffer += data
This implementation never made it to the repository and we can safely drop the
comment.
Python 2.6 introduced the "except type as instance" syntax, replacing
the "except type, instance" syntax that came before. Python 3 dropped
support for the latter syntax. Since we no longer support Python 2.4 or
2.5, we have no need to continue supporting the "except type, instance".
This patch mass rewrites the exception syntax to be Python 2.6+ and
Python 3 compatible.
This patch was produced by running `2to3 -f except -w -n .`.
We'll use it to detect when a sshpeer have server output to be displayed.
The implementation is super basic because all case support is not the focus of
this series.
To restore real time server output through ssh, we need to using polling feature
(like select) on the pipes used to communicate with the ssh client. However
we cannot use select alongside python level buffering of these pipe (because we
need to know if the buffer is non-empty before calling select).
However, unbuffered performance are terrible, presumably because the 'readline'
call is issuing 'read(1)' call until it find a '\n'. To work around that we
introduces our own overlay that do buffering by hand, exposing the state of the
buffer to the outside world.
The usage of polling IO will be introduced later in the 'sshpeer' module. All
its logic will be very specific to the way mercurial communicate over ssh and
does not belong to the generic 'util' module.
We will need unbuffered IO to restore real time output with ssh peer.
Changeset b61b215fcfa8 seems to indicate playing with this value could be
dangerous, but does not indicate why.
One case where that would happen is while trying to resolve a subrepo, if the
path to the subrepo was actually a broken symlink. This bug was exposed by an
hg-git test.
According to 6b1369445b7b introducing "windows._removedirs()":
If a hg repository including working directory is a reparse point
(directory symlinked or a junction point), then using
os.removedirs will remove the reparse point erroneously.
"windows._removedirs()" should be used instead of "os.removedirs()" on
Windows.
This patch adds "removedirs" as platform depending function to replace
"os.removedirs()" invocations for portability and safety
An upcoming commit requires that match.py be able to call scmutil.dirs(), but
when match.py imports scmutil, a dependency cycle is created. This commit
avoids the cycle by moving dirs() and its related finddirs() function from
scmutil to util, which match.py already depends on.
Hi there!
Fixed date names are helpful for automated systems. So it is possible to
use english date parameter even if the underlying system uses another
locale.
We have here a jenkins with build jobs on different slaves that will do
some operations with "dates" parameter. Some systems uses English locale
and some systems uses German locale. So we needed to configure the job to
uses other date names.
As this is really annoying to keep the systems locale in mind for some
operations I looked into util.py. It would be helpful for automated systems
if the "default English" date names would even usable on other
locales.
I attached a simple patch for this.
Best regards
André Klitzing
Some code paths use 'copyfiles' (full tree) for a single file to take advantage
of the best-effort-hard-linking parameter. We add similar parameter and logic
to 'copyfile' (single file) for this purpose.
The single file version have the advantage to overwrite the destination file if
it exists.
This allows taking advantage of Python 2.5+'s struct.Struct, which
provides a slightly faster unpack due to reusing formats. Sadly,
.unpack_from is significantly slower.
Garbage collection behave pathologically when creating a lot of containers. As
we do that more than once it become sensible to have a decorator for it. See
inline documentation for details.
This patch uses "False" as default value of "notindexed" argument,
even though "vfs.makedir()" uses "True" for it, because "os.mkdir()"
doesn't set "_FILE_ATTRIBUTE_NOT_CONTENT_INDEXED" attribute to newly
created directories.
Future patches will allow extensions to choose which order a namespace should
output in the log, so we add a way for sortdict to insert to a specific
location.
Future patches will start using sortdict for log operations where order is
important. Adding iteritems removes the headache of having to remember to use
items() if the object is a sortdict.
Previously, we'd scan through the entire directory listing looking for a
normalized match. This is O(N) in the number of files in the directory. If we
decide to call util.fspath on each file in it, the overall complexity works out
to O(N^2). This becomes a problem with directories a few thousand files or
larger.
Switch to using a dictionary instead. There is a slightly higher upfront cost
to pay, but for cases like the above this is amortized O(1). Plus there is a
lower constant factor because generator comprehensions are faster than for
loops, so overall it works out to be a very small loss in performance for 1
file, and a huge gain when there's more.
For a large repo with around 200k files in it on a case-insensitive file
system, for a large directory with over 30,000 files in it, the following
command was tested:
ls | shuf -n $COUNT | xargs hg status
This command leads to util.fspath being called on $COUNT files in the
directory.
COUNT before after
1 0.77s 0.78s
100 1.42s 0.80s
1000 6.3s 0.96s
I also tested with COUNT=10000, but before took too long so I gave up.
util.system() copies subprocess' output through pipe if output file is not
stdout. Because a file iterator has internal buffering, output won't be
flushed until enough data is available. Therefore, it could easily miss
important messages such as "waiting for lock".
This effectively backs out changeset 7582042d6cce.
The API change is done so that both util.sha1 and util.md5 can be called the
same way. The function is moved in order to use it for md5 checksumming for
an upcoming bundle2 feature.
templates, help and locale data is normally stored as sub folders in the
directory containing the source of the mercurial module. In a frozen build they
live as sub folders next to 'hg.exe' and 'library.zip'.
These different kind of data were handled in different ways. Unify that by
introducing util.datapath. The value is computed from the environment and is
always used, so we just calculate the value on module load.
Reading all available data from a pipe has a platform-dependent
implementation.
This patch establishes platform.readpipe() by copying the
inline implementation in sshpeer.readerr(). The implementations
for POSIX and Windows are currently identical. The POSIX
implementation will be changed in a subsequent patch.
The escape method in at least one of the modules called 're2' is in C. This
means it is significantly faster than the Python code written in 're'.
An upcoming patch will have benchmarks.
Before this patch, 'util.ellipsis' tried to avoid splitting at
intermediate multi-byte sequence, but its implementation was incorrect.
Internal function '_ellipsis' trims specified unicode sequence not at
most maxlength 'columns in display', but at most maxlength number of
'unicode characters'.
def _ellipsis(text, maxlength):
if len(text) <= maxlength:
return text, False
else:
return "%s..." % (text[:maxlength - 3]), True
In many encodings, number of unicode characters can be different from
columns in display.
This patch replaces 'ellipsis' implementation by 'encoding.trim',
which can trim string at most maxlength columns in display correctly,
even though specified string contains multi-byte characters.
'_ellipsis' is removed in this patch, because it is referred only from
'ellipsis'.
Before this patch, "util.cachefunc()" caches the value returned by the
specified function into dictionary "cache", even if the specified
function takes no arguments.
In such case, "cache" has at most one entry, and distinction between
entries in "cache" is meaningless.
This patch adds the code path to "cachefunc()" for the function taking
no arguments for efficiency: to store only one cached value, using
list "cache" is a little faster than using dictionary "cache".
Close another stream (default stdout, which often is buffered) before writing
to the primary stream (default stderr, which often is unbuffered). The primary
stream is also flushed after writing (in case it is buffered).
This fixes non-deterministic output order, especially on windows.
This is often very handy when hacking/debugging.
Calling util.debugstacktrace('hey') from a place in hg will give something like:
hey at:
./hg:38 in <module>
/home/user/hgsrc/mercurial/dispatch.py:28 in run
/home/user/hgsrc/mercurial/dispatch.py:65 in dispatch
/home/user/hgsrc/mercurial/dispatch.py:88 in _runcatch
/home/user/hgsrc/mercurial/dispatch.py:740 in _dispatch
/home/user/hgsrc/mercurial/dispatch.py:514 in runcommand
/home/user/hgsrc/mercurial/dispatch.py:830 in _runcommand
/home/user/hgsrc/mercurial/dispatch.py:801 in checkargs
/home/user/hgsrc/mercurial/dispatch.py:737 in <lambda>
/home/user/hgsrc/mercurial/util.py:472 in check
...
Backslashes (\) in paths were encoded to %C5 when converting from url to
string. This does not look nice for windows paths. And it introduces many
problems when running tests on windows.
Paths ending with \ will fail the verification introduced in 0bc0c17d663e when
checking out on Windows ... and if it didn't fail it would probably not do what
the user expected.
Propertycache used standard attribute assignment. In the repoview case, this
assignment was forwarded to the unfiltered repo. This result in:
(1) unfiltered repo got a potentially wrong cache value,
(2) repoview never reused the cached value.
This patch replaces the standard attribute assignment by an assignment to
`objc.__dict__` which will bypass the `repoview.__setattr__`. This will not
affects other `propertycache` users and it is actually closer to the semantic we
need.
The interaction of `propertycache` and `repoview` are now tested in a python
test file.
When running 'hg pull --rebase', I was seeing this exception 100% of the
time as the python process was closing down:
Exception TypeError: TypeError("'NoneType' object is not callable",) in
<bound method Popen.__del__ of <subprocess.Popen object at 0x937c10>> ignored
By storing the subprocess on the sshpeer, the subprocess seems to clean up
correctly, and I no longer see the exception. I have no idea why this actually
works, but I get a 0% repro if I store the subprocess in self.subprocess,
and a 100% repro if I store None in self.subprocess.
Possibly related to issue 2240.
I often want to measure the cost of a function call before/after
an optimization, where using top level "hg --time" timing introduces
enough other noise that I can't tell if my efforts are having an
effect.
This decorator allows a developer to measure a function's cost with
finer granularity.
With the new parallel update code, it is possible for multiple
workers to try to create a hierarchy of directories at the same
time. This is hard to trigger in general, but most likely during
initial checkout.
To deal with these races, we introduce a new ensuredirs function
whose contract is to ensure that a directory hierarchy exists - it
will ignore a failure that implies that the desired directory already
exists.
In certain cases we would like to have a cache of the last N results of a
given computation, where N is small. This will be used in an upcoming patch to
increase the size of the manifest cache from 1 to 3.
Adding support to parsedate in util module to understand the more idiomatic
dates 'today' and 'yesterday'.
Added unified tests and docstring tests for added functionality.
This makes a big difference to performance.
In a clean working directory containing 170,000 files, performance of
"hg --time diff" improves from 2.38 seconds to 1.69.
Some of the localrepo property caches must be computed unfiltered and
stored globally. Some others must see the filtered version and store data
relative to the current filtering.
This changeset introduces two classes `unfilteredpropertycache`
and `filteredpropertycache` for this purpose. A new function
`hasunfilteredcache` is introduced for unambiguous checking for cached
values on unfiltered repos.
A few tweaks are made to the property cache class to allow overriding
the way the computed value is stored on the object.
Some logic relative to _tagcaches is cleaned up in the process.
The old str-based += collector performed very nicely on Linux, but
turns out to be quadratically expensive on Windows, causing
chunkbuffer to dominate in profiles.
This list-based version has been measured to significantly improve
performance with large chunks on Windows, with negligible overall
overhead on Linux (though microbenchmarks show it to be about 50% slower).
This may increase memory overhead where += didn't behave quadratically. If we
want to gather up 1G of data to join, we temporarily have 1G in our
list and 1G in our string.
bffd8f8dfc85 claims this was needed "to avoid cyclic dependency", but there is
no cyclic dependency.
windows.py already imports encoding, posix.py can import it too, so we can
simply use encoding.upper in windows.py and in posix.py.
(this is a partial backout of bffd8f8dfc85)
There are two sets of Python re2 bindings available on the internet;
this code works with both.
Using re2 can greatly improve "hg status" performance when a .hgignore
file becomes even modestly complex.
Example: "hg status" on a clean tree with 134K files, where "hg
debugignore" reports a regexp 4256 bytes in size.
no .hgignore: 1.76 sec
Python re: 2.79
re2: 1.82
The overhead of regexp matching drops from 1.03 seconds with stock
re to 0.06 with re2.
(For comparison, a git repo with the same contents and .gitignore
file runs "git status -s" in 1.71 seconds, i.e. only slightly faster
than hg with re2.)
There have been quite a few places where we pop elements off the
front of a list. This can turn O(n) algorithms into something more
like O(n**2). Python has provided a deque type that can do this
efficiently since at least 2.4.
As an example of the difference a deque can make, it improves
perfancestors performance on a Linux repo from 0.50 seconds to 0.36.
This patch contains support for Plan 9 from Bell Labs. A README is
provided in contrib/plan9 which describes the port in greater detail.
A new extension is also provided named factotum which permits the
factotum(4) authentication agent to provide credentials for HTTP
repositories. This extension is also applicable to other POSIX
platforms which make use of Plan 9 from User Space (aka plan9ports).
Currently, the 'user' filter is using util.shortuser(text) (which clearly
doesn't extract only the user portion of an email address, even though the
help text says it does).
The new 'emailuser' filter uses the new util.emailuser(text) function which,
instead, does exactly that.
The help text on the 'user' filter has been modified accordingly.
this patch disuses length check against un-normcase()-ed filenames
gotten by "os.listdir()", because there is no assurance that
filesystem stores filenames normalized except in letter case, even
though some case insensitive filesystems (in some environment, for
some language setting) store them in such manner.
'dirstate._normalize()', the only caller of 'util.fspath()', has
already confirmed exsistance of specified file as relative to root.
so, this patch omits path-absoluteness/existance check from
'util.fspath()'.
some hg operation (e.g.: qpush) create new files after first
dirstate.walk()-ing, and it invalidates _fspathcache for fspath().
then, fspath() will fail to look up specified name in _fspathcache.
this causes case preservation breaking, because parts of already
normcase()-ed path are used as result at that time.
in this case, file creation and writing out should be done before
fspath() invocation, so the second invocation of os.listdir() has not
so much impact on runtime performance.
this patch uses encoding.lower/upper for case folding, because ones of
str can not fold case of non ascii characters correctly.
to avoid cyclic dependency and to encapsulate logic of normcase in
each platforms, this patch introduces encodinglower/encodingupper in
both posix/windows specific files.
this patch does not change implementation of normcase() in posix.py,
because we do not know the encoding of filenames on POSIX.
some "normcase()" are excluded from function wrap list in
hgext/win32mbcs.py, because they become encoding aware by this patch.
'dirstate._normalize()', the only caller of 'util.fspath()', has
already normcase()-ed path before invocation of it.
normcase()-ed root can be cached on dirstate side, too.
so, this patch changes 'util.fspath()' API specification to avoid
normcase()-ing in it.
Before:
>>> str(url('file:///c:/tmp/foo/bar'))
'file:c%3C/tmp/foo/bar'
After:
>>> str(url('file:///c:/tmp/foo/bar'))
'file:///c%3C/tmp/foo/bar'
The previous behaviour had no effect on mercurial itself (clone command for
instance) because we fortunately called .localpath() on the parsed URL.
hgsubversion was not so lucky and cloning a local subversion repository on
Windows no longer worked on the default branch (it works on stable because
2b62605189dc defeats the hasdriveletter() test in url class).
I do not know if the %3C is correct or not but svn accepts file:// URLs
containing it. Mads fixed it in 2b62605189dc, so we can always backport should
the need arise.
Python's time module sets timezone and altzone based on UTC offsets of
two dates: first and middle day of the current year. This approach
doesn't work on a year when DST rules change.
For example Russia abandoned winter time this year, so the correct UTC
offset should be +4 now, but time.timezone returns 3 hours difference
because that's what it was on 01.01.2011.
Related python issue: http://bugs.python.org/issue1647654
It was very elegant that httpsendfile implemented __len__ like a string. It was
however also dangerous because that protocol can't handle sizes bigger than 2 GB.
Mercurial tried to work around that, but it turned out to be too easy to
introduce new errors in this area.
With this change __len__ is no longer implemented at all and the code will work
the same way for short and long posts.
neither number of 'bytes' in any encoding nor 'characters' is
appropriate to calculate terminal columns for specified string.
this patch modifies MBTextWrapper for:
- overriding '_wrap_chunks()' to make it use not built-in 'len()'
but 'encoding.colwidth()' for columns of string
- fixing '_cutdown()' to make it use 'encoding.colwidth()' instead
of local, similar but incorrect implementation
this patch also modifies 'encoding.py':
- dividing 'colwith()' into 2 pieces: one for calculation columns of
specified UNICODE string, and another for rest part of original
one. the former is used from MBTextWrapper in 'util.py'.
- preventing 'colwidth()' from evaluating HGENCODINGAMBIGUOUS
configuration per each invocation: 'unicodedata.east_asian_width'
checking is kept intact for reducing startup cost.
This re-introduces the unicode conversion what was lost in e5976ee55f4b 5 years
ago and had the comment:
To avoid corrupting multi-byte characters in line, we must wrap
a Unicode string instead of a bytestring.
urllib2 password manager does not strip credentials from URIs registered with
add_password() and compare them with stripped URIs in find_password(). Remove
credentials from URIs returned by util.url.authinfo(). It sometimes works when
no port was specified as the URI host is registered too.
8264e5172141 made sure that paths that seemed to start with a windows drive
letter would not get an extra leading slash.
localpath should thus not try to handle this case by removing a leading slash,
and this special handling is thus removed.
(The localpath handling of this case was wrong anyway, because paths that look
like they start with a windows drive letter can't have a leading slash.)
A quick verification of this is to run 'hg id file:///c:/foo/bar/'.
The usual contract is that close() makes your writes permanent, so
atomictempfile's use of close() to *discard* writes (and rename() to
keep them) is rather unexpected. Thus, change it so close() makes
things permanent and add a new discard() method to throw them away.
discard() is only used internally, in __del__(), to ensure that writes
are discarded when an atomictempfile object goes out of scope.
I audited mercurial.*, hgext.*, and ~80 third-party extensions, and
found no one using the existing semantics of close() to discard
writes, so this should be safe.
Before, makedirs could call itself recursively with the same path name it was
given, relying on sane file system behavior to terminate the recursion. That
could cause infinite recursion on insane file systems.
Instead we now call mkdir explicitly after having created parent directory
recursively. Exceptions from this mkdir is not swallowed.
This re-introduces the unicode conversion what was lost in e5976ee55f4b 5 years
ago and had the comment:
To avoid corrupting multi-byte characters in line, we must wrap
a Unicode string instead of a bytestring.
reducing it to a NOP on Windows.
This eliminates a pointless stat call on Windows and reduces the risk of
interferring with other processes (e.g. AV-scanners, file change watchers).
See also http://mercurial.selenic.com/wiki/UnlinkingFilesOnWindows, item 2d
util is never imported by any other name than util, so this is mostly just a
simple search and replace from util.localpath to util.urllocalpath (assuming
other uses of util.localpath already has been renamed).