Commit Graph

704 Commits

Author SHA1 Message Date
Bryan O'Sullivan
58c82f12c9 osutil: write a C implementation of statfiles for unix
This makes a big difference to performance.

In a clean working directory containing 170,000 files, performance of
"hg --time diff" improves from 2.38 seconds to 1.69.
2012-12-03 12:40:24 -08:00
Pierre-Yves David
b25a880a8e clfilter: add a propertycache that must be unfiltered
Some of the localrepo property caches must be computed unfiltered and
stored globally. Some others must see the filtered version and store data
relative to the current filtering.

This changeset introduces two classes `unfilteredpropertycache`
and `filteredpropertycache` for this purpose. A new function
`hasunfilteredcache` is introduced for unambiguous checking for cached
values on unfiltered repos.

A few tweaks are made to the property cache class to allow overriding
the way the computed value is stored on the object.

Some logic relative to _tagcaches is cleaned up in the process.
2012-10-08 20:02:20 +02:00
Matt Mackall
af5b4b62cf util: make chunkbuffer non-quadratic on Windows
The old str-based += collector performed very nicely on Linux, but
turns out to be quadratically expensive on Windows, causing
chunkbuffer to dominate in profiles.

This list-based version has been measured to significantly improve
performance with large chunks on Windows, with negligible overall
overhead on Linux (though microbenchmarks show it to be about 50% slower).

This may increase memory overhead where += didn't behave quadratically. If we
want to gather up 1G of data to join, we temporarily have 1G in our
list and 1G in our string.
2012-11-26 15:42:52 -06:00
Bryan O'Sullivan
e3555667b8 util: implement a faster os.path.split for posix systems
This is not yet used.
2012-09-14 12:08:17 -07:00
Mads Kiilerich
2372d51b68 fix wording and not-completely-trivial spelling errors and bad docstrings 2012-08-15 22:39:18 +02:00
Mads Kiilerich
2f4504e446 fix trivial spelling errors 2012-08-15 22:38:42 +02:00
Ross Lagerwall
661779d660 util: replace util.nulldev with os.devnull
Python since 2.4 has supported os.devnull so having util.nulldev
is unnecessary.
2012-08-04 07:14:40 +02:00
Bryan O'Sullivan
9f3858de6e util: delegate seek and tell methods of atomictempfile 2012-07-23 15:38:43 -07:00
Adrian Buehlmann
0fe77b0110 util, posix: eliminate encodinglower and encodingupper
bffd8f8dfc85 claims this was needed "to avoid cyclic dependency", but there is
no cyclic dependency.

windows.py already imports encoding, posix.py can import it too, so we can
simply use encoding.upper in windows.py and in posix.py.

(this is a partial backout of bffd8f8dfc85)
2012-07-18 14:41:58 +02:00
Bryan O'Sullivan
3f45806d34 matcher: use re2 bindings if available
There are two sets of Python re2 bindings available on the internet;
this code works with both.

Using re2 can greatly improve "hg status" performance when a .hgignore
file becomes even modestly complex.

Example: "hg status" on a clean tree with 134K files, where "hg
debugignore" reports a regexp 4256 bytes in size.

  no .hgignore: 1.76 sec
  Python re:    2.79
  re2:          1.82

The overhead of regexp matching drops from 1.03 seconds with stock
re to 0.06 with re2.

(For comparison, a git repo with the same contents and .gitignore
file runs "git status -s" in 1.71 seconds, i.e. only slightly faster
than hg with re2.)
2012-06-01 15:26:20 -07:00
Bryan O'Sullivan
1de6d211c8 util: simplify queue management in chunkbuffer
This also fixes a small wire protocol performance regression.
2012-06-05 16:52:20 -07:00
Bryan O'Sullivan
abdf4a8227 util: subclass deque for Python 2.4 backwards compatibility
It turns out that Python 2.4's deque type is lacking a remove method.
We can't implement remove in terms of find, because it doesn't have
find either.
2012-06-01 17:05:31 -07:00
Bryan O'Sullivan
bef5b61512 cleanup: use the deque type where appropriate
There have been quite a few places where we pop elements off the
front of a list.  This can turn O(n) algorithms into something more
like O(n**2).  Python has provided a deque type that can do this
efficiently since at least 2.4.

As an example of the difference a deque can make, it improves
perfancestors performance on a Linux repo from 0.50 seconds to 0.36.
2012-05-15 10:46:23 -07:00
Matt Mackall
f4a789ba4d merge with stable 2012-05-21 17:35:28 -05:00
Augie Fackler
3dc5160169 util: fix bad variable use in bytecount introduced by ad5e3bec298e 2012-05-21 14:24:24 -05:00
Brodie Rao
7f47d4e347 check-code: ignore naked excepts with a "re-raise" comment
This also promotes the naked except check from a warning to an error.
2012-05-13 13:18:06 +02:00
Brodie Rao
46ce54af4d cleanup: replace more naked excepts with more specific ones 2012-05-13 13:17:31 +02:00
Brodie Rao
c577fac135 cleanup: replace naked excepts with more specific ones 2012-05-12 16:02:45 +02:00
Matt Mackall
d38924097e util: create bytecount array just once
This avoids tons of gettext calls on workloads that call bytecount a lot.
2012-04-12 20:22:18 -05:00
Steven Stallion
d79ff306e5 plan9: initial support for plan 9 from bell labs
This patch contains support for Plan 9 from Bell Labs. A README is
provided in contrib/plan9 which describes the port in greater detail.
A new extension is also provided named factotum which permits the
factotum(4) authentication agent to provide credentials for HTTP
repositories. This extension is also applicable to other POSIX
platforms which make use of Plan 9 from User Space (aka plan9ports).
2012-04-08 12:43:41 -07:00
Matteo Capobianco
8305e8d305 templates/filters: extracting the user portion of an email address
Currently, the 'user' filter is using util.shortuser(text) (which clearly
doesn't extract only the user portion of an email address, even though the
help text says it does).

The new 'emailuser' filter uses the new util.emailuser(text) function which,
instead, does exactly that.

The help text on the 'user' filter has been modified accordingly.
2012-03-28 16:06:20 +02:00
FUJIWARA Katsunori
3abfeb7e54 icasefs: rewrite comment to explain situtation precisely 2011-12-24 00:52:06 +09:00
FUJIWARA Katsunori
b180efb872 icasefs: follow standard cache look up pattern 2011-12-24 00:51:14 +09:00
FUJIWARA Katsunori
1edd7d1c6d icasefs: disuse length check against un-normcase()-ed filenames
this patch disuses length check against un-normcase()-ed filenames
gotten by "os.listdir()", because there is no assurance that
filesystem stores filenames normalized except in letter case, even
though some case insensitive filesystems (in some environment, for
some language setting) store them in such manner.
2011-12-24 00:50:56 +09:00
FUJIWARA Katsunori
2d248cd109 icasefs: avoid path-absoluteness/existance check in util.fspath() for efficiency
'dirstate._normalize()', the only caller of 'util.fspath()', has
already confirmed exsistance of specified file as relative to root.

so, this patch omits path-absoluteness/existance check from
'util.fspath()'.
2011-12-16 21:09:40 +09:00
FUJIWARA Katsunori
b5973249bd icasefs: retry directory scan once for already invalidated cache
some hg operation (e.g.: qpush) create new files after first
dirstate.walk()-ing, and it invalidates _fspathcache for fspath().

then, fspath() will fail to look up specified name in _fspathcache.

this causes case preservation breaking, because parts of already
normcase()-ed path are used as result at that time.

in this case, file creation and writing out should be done before
fspath() invocation, so the second invocation of os.listdir() has not
so much impact on runtime performance.
2011-12-16 21:09:40 +09:00
Matt Mackall
7cf4e6eacb merge with stable 2011-12-16 19:05:59 -06:00
FUJIWARA Katsunori
fe972435d4 i18n: use encoding.lower/upper for encoding aware case folding
this patch uses encoding.lower/upper for case folding, because ones of
str can not fold case of non ascii characters correctly.

to avoid cyclic dependency and to encapsulate logic of normcase in
each platforms, this patch introduces encodinglower/encodingupper in
both posix/windows specific files.

this patch does not change implementation of normcase() in posix.py,
because we do not know the encoding of filenames on POSIX.

some "normcase()" are excluded from function wrap list in
hgext/win32mbcs.py, because they become encoding aware by this patch.
2011-12-16 21:09:41 +09:00
FUJIWARA Katsunori
055136813d icasefs: avoid normcase()-ing in util.fspath() for efficiency
'dirstate._normalize()', the only caller of 'util.fspath()', has
already normcase()-ed path before invocation of it.

normcase()-ed root can be cached on dirstate side, too.

so, this patch changes 'util.fspath()' API specification to avoid
normcase()-ing in it.
2011-12-16 21:09:40 +09:00
FUJIWARA Katsunori
c71595845e icasefs: use util.normcase() instead of lower() or os.path.normcase in fspath
this also avoids lower()-ing on each path components by reuse the path
normcase()-ed at beginning of function.
2011-12-16 21:09:40 +09:00
FUJIWARA Katsunori
f9ca02bd18 icasefs: consider as case sensitive if there is no counterevidence, for safety
for safety, this patch prevents case-less name from misleading into
case insensitivity, even though such names should not be used.
2011-12-16 21:09:40 +09:00
Matt Mackall
9f8ee10163 util: don't mess with builtins to emulate buffer() 2011-12-15 15:27:11 -06:00
Matt Mackall
49b0ffe198 util: clean up function ordering 2011-12-15 14:59:22 -06:00
Patrick Mezard
3a0effcd7b util: fix url.__str__() for windows file URLs
Before:

  >>> str(url('file:///c:/tmp/foo/bar'))
  'file:c%3C/tmp/foo/bar'

After:

  >>> str(url('file:///c:/tmp/foo/bar'))
  'file:///c%3C/tmp/foo/bar'

The previous behaviour had no effect on mercurial itself (clone command for
instance) because we fortunately called .localpath() on the parsed URL.
hgsubversion was not so lucky and cloning a local subversion repository on
Windows no longer worked on the default branch (it works on stable because
2b62605189dc defeats the hasdriveletter() test in url class).

I do not know if the %3C is correct or not but svn accepts file:// URLs
containing it. Mads fixed it in 2b62605189dc, so we can always backport should
the need arise.
2011-12-04 18:22:25 +01:00
Dmitry Panov
6649925e57 makedate: wrong timezone offset if DST rules changed this year (issue2511)
Python's time module sets timezone and altzone based on UTC offsets of
two dates: first and middle day of the current year. This approach
doesn't work on a year when DST rules change.

For example Russia abandoned winter time this year, so the correct UTC
offset should be +4 now, but time.timezone returns 3 hours difference
because that's what it was on 01.01.2011.

Related python issue: http://bugs.python.org/issue1647654
2011-11-13 00:29:26 +00:00
Mads Kiilerich
5d7000644a url: handle file://localhost/c:/foo "correctly"
The path was parsed correctly, but localpath prepended an extra '/' (as in
'/c:/foo') because it assumed it was an absolute unix path.
2011-11-16 00:10:56 +01:00
Matt Mackall
3eab62750e dirstate: fix case-folding identity for traditional Unix
We used to use os.path.normcase which was a no-op, which was unhelpful
for cases like VFAT on Linux.
2011-11-15 14:25:11 -06:00
Matt Mackall
9580de9b45 util: add a doctest for empty sha() calls 2011-10-31 15:41:39 -05:00
Matt Mackall
e82c2e671f merge with stable 2011-12-05 17:48:40 -06:00
Matt Mackall
75db0d196a merge with stable 2011-11-17 16:53:17 -06:00
Matt Mackall
bbf72a4e6e util: allow sha1() with no args
Normally this works because we replace util.sha1 with hashlib.sha1
after first use, but if the first user doesn't provide an arg, it
breaks.
2011-10-31 14:22:11 -05:00
Matt Mackall
226e1ff7c0 util: don't complain about '..' in path components not working on Windows 2011-10-24 16:57:14 -05:00
Matt Mackall
3a9838cebc merge with stable 2011-11-15 14:33:06 -06:00
Mads Kiilerich
6485196281 util: don't encode ':' in url paths
':' has no special meaning in paths, so there is no need for encoding it.

Not encoding ':' makes it easier to test on windows.
2011-11-07 03:25:10 +01:00
Matt Mackall
e538620d00 merge with stable 2011-09-27 18:50:18 -05:00
Kevin Gessner
d0a563a1b5 util: fix crash converting an invalid future date to string
Post-2038 timestamps cannot be handled on 32-bit architectures. Clamp
such dates to the maximum 32-bit timestamp.
2011-09-23 09:02:27 -07:00
Mads Kiilerich
35dbb9abb2 http: handle push of bundles > 2 GB again (issue3017)
It was very elegant that httpsendfile implemented __len__ like a string. It was
however also dangerous because that protocol can't handle sizes bigger than 2 GB.
Mercurial tried to work around that, but it turned out to be too easy to
introduce new errors in this area.

With this change __len__ is no longer implemented at all and the code will work
the same way for short and long posts.
2011-09-21 22:52:00 +02:00
Matt Mackall
19be20e2ef url: parse fragments first (issue2997) 2011-09-10 17:49:19 -05:00
FUJIWARA Katsunori
5b5a083f16 i18n: calculate terminal columns by width information of each characters
neither number of 'bytes' in any encoding nor 'characters' is
appropriate to calculate terminal columns for specified string.

this patch modifies MBTextWrapper for:

  - overriding '_wrap_chunks()' to make it use not built-in 'len()'
    but 'encoding.colwidth()' for columns of string

  - fixing '_cutdown()' to make it use 'encoding.colwidth()' instead
    of local, similar but incorrect implementation

this patch also modifies 'encoding.py':

  - dividing 'colwith()' into 2 pieces: one for calculation columns of
    specified UNICODE string, and another for rest part of original
    one. the former is used from MBTextWrapper in 'util.py'.

  - preventing 'colwidth()' from evaluating HGENCODINGAMBIGUOUS
    configuration per each invocation: 'unicodedata.east_asian_width'
    checking is kept intact for reducing startup cost.
2011-08-27 04:56:12 +09:00
Mads Kiilerich
ec483cfecb util: wrap lines with multi-byte characters correctly (issue2943)
This re-introduces the unicode conversion what was lost in e5976ee55f4b 5 years
ago and had the comment:
  To avoid corrupting multi-byte characters in line, we must wrap
  a Unicode string instead of a bytestring.
2011-08-06 23:52:20 +02:00