Commit Graph

853 Commits

Author SHA1 Message Date
FUJIWARA Katsunori
d7b7f8358b doc: describe detail about checkambig optional argument
This is followup for patches below, which add checkambig argument to
existing function.

  - 6508a36c1e44
  - ccbeace526ad
  - 936ec05504bf
  - a9a2d0013b68
2016-06-13 05:11:56 +09:00
Augie Fackler
493fe9d7a6 util: drop local aliases for md5, sha1, sha256, and sha512
This used to be needed to paper over hashlib not being in all Pythons
we support, but that's not a problem anymore, so we can simplify
things a little bit.
2016-06-10 00:13:23 -04:00
Pulkit Goyal
5f52c722cf py3: conditionalize cPickle import by adding in util
The cPickle is renamed to _pickle in python3 and this C extension is available
 in pickle which was not included in earlier versions. So imports are conditionalized
 to import cPickle in py2 and pickle in py3. Moreover the use of pickle in py2 is
 switched to cPickle as the C extension is faster. The hack is added in util.py and
the modules import util.pickle
2016-06-04 14:38:00 +05:30
FUJIWARA Katsunori
01b2069695 util: add __ne__ to filestat class for consistency
This is follow up for 41ed93728910, which introduced filestat class.
2016-06-03 00:44:20 +09:00
Gregory Szorc
a3fac9baf7 util: add sha256
Upcoming patches will teach host fingerprint checking to verify
non-SHA1 fingerprints.

Many x509 certificates these days are SHA-256. And modern browsers
often display the SHA-256 fingerprint for certificates. Since
SHA-256 fingerprints are highly visible and easy to obtain, we
want to support them for fingerprint pinning. So add SHA-256
support to util.

I did not add SHA-256 to DIGESTS and DIGESTS_BY_STRENGTH because
this will advertise the algorithm on the wire protocol. I wasn't
sure if that would be appropriate. I'm playing it safe by leaving
it out for now.
2016-05-28 12:57:28 -07:00
FUJIWARA Katsunori
24a3947a41 util: make copyfile avoid ambiguity of file stat if needed
In some cases below, copying from backup is used to restore original
contents of a file. If copying keeps ctime, mtime and size of a file,
restoring is overlooked, and old contents cached before restoring
isn't invalidated as expected.

  - failure of transaction before closing (from '.hg/journal.backup.*')
  - rollback of previous transaction (from '.hg/undo.backup.*')

To avoid such problem, this patch makes copyfile() avoid ambiguity of
file stat, if needed.

Ambiguity check is executed, only if:

  - checkambig=True is specified (not all copying needs ambiguity check), and
  - destination file exists before copying

This patch also adds 'not (copystat and checkambig)' assertion,
because combination of copystat and checkambig is meaningless.

This patch is a part of preparation for "Exact Cache Validation Plan":

    https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
2016-05-19 00:20:38 +09:00
FUJIWARA Katsunori
0242ba3045 util: make atomictempfile avoid ambiguity of file stat if needed
Ambiguity check is executed at close(), only if:

  - atomictempfile is created with checkambig=True, and
  - target file exists before renaming

This restriction avoids performance decrement by needless examination
of file stat (for example, filelog doesn't need exact cache
validation, even though it uses atomictempfile to write changes out).

See description of filestat class for detail about why the logic in
this patch works as expected.

This patch is a part of preparation for "Exact Cache Validation Plan":

    https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
2016-05-19 00:20:38 +09:00
FUJIWARA Katsunori
70f0cd3fdd util: add filestat class to detect ambiguity of file stat
Current posix.cachestat implementation might overlook change of a
file, if changing keeps ctime, mtime and size of file. Comparison of
inode number also overlooks changing in such situation, because inode
number is rapidly reused.

Contents of a file cached before changing isn't invalidated as
expected, if change of a file is overlooked for this "ambiguity" of
file stat.

This patch adds filestat class to detect ambiguity of file stat.

This patch is a part of preparation for "Exact Cache Validation Plan":

    https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
2016-05-19 00:20:37 +09:00
Adam Simpkins
c92a68e75e util: fix race in makedirs()
Update makedirs() to ignore EEXIST in case someone else has already created the
directory in question.  Previously the ensuredirs() function existed, and was
nearly identical to makedirs() except that it fixed this race.  Unfortunately
ensuredirs() was only used in 3 places, and most code uses the racy makedirs()
function.  This fixes makedirs() to be non-racy, and replaces calls to
ensuredirs() with makedirs().

In particular, mercurial.scmutil.origpath() used the racy makedirs() code,
which could cause failures during "hg update" as it tried to create backup
directories.

This does slightly change the behavior of call sites using ensuredirs():
previously ensuredirs() would throw EEXIST if the path existed but was a
regular file instead of a directory.  It did this by explicitly checking
os.path.isdir() after getting EEXIST.  The makedirs() code did not do this and
swallowed all EEXIST errors.  I kept the makedirs() behavior, since it seemed
preferable to avoid the extra stat call in the common case where this directory
already exists.  If the path does happen to be a file, the caller will almost
certainly fail with an ENOTDIR error shortly afterwards anyway.  I checked
the 3 existing call sites of ensuredirs(), and this seems to be the case for
them.
2016-04-26 15:32:59 -07:00
timeless
109fcbc79e pycompat: switch to util.urlreq/util.urlerr for py3 compat 2016-04-06 23:22:12 +00:00
timeless
7816ecf261 pycompat: add util.urlerr util.urlreq classes for py3 compat
python3 url.request and url.error are mapped as util.urlreq/util.urlerr
python2 equivalents from urllib/urllib2 are mapped according to the py3
hierarchy
2016-04-07 00:05:48 +00:00
Adrian Buehlmann
86883218e1 util: add doctest to datestr() 2016-04-11 19:46:50 +02:00
Florent Gallaire
ecd665adf8 date: fix boundary check of negative integer 2016-04-12 00:30:28 +02:00
timeless
26ef04b4e1 pycompat: add util.stringio to handle py3 divergence
util.stringio = cStringIO.StringIO / io.StringIO
2016-04-06 20:31:31 +00:00
timeless
bf94a35755 util: use __code__ (available since py2.6) 2016-03-29 17:43:23 +00:00
Adrian Buehlmann
0262d50094 util: fix doc for datestr()
timezone parameter was removed with f88c82e9a297
2016-04-08 22:15:06 +02:00
Florent Gallaire
478ac5c37a date: reallow negative timestamp, fix for Windows buggy gmtime() (issue2513)
DVCS are very useful to store various texts (as legislation) written before
Unix epoch. Fri, 13 Dec 1901 is a nice gain over Thu, 01 Jan 1970.
Revert 856a0e92d107 and b47a679b4e83, fix b2228cbaa635. Add tests.
2016-04-08 14:11:03 +02:00
timeless
9bbc2a69f1 pycompat: add empty and queue to handle py3 divergence
While the pycompat module will actually handle divergence, please
access these properties from the util module:
util.queue = Queue.Queue / queue.Queue
util.empty = Queue.Empty / queue.Empty
2016-04-06 20:00:49 +00:00
timeless
4241ed1e2d util: refactor getstackframes 2016-03-11 17:22:04 +00:00
timeless
a762bae068 util: reword debugstacktrace comment 2016-03-11 16:50:14 +00:00
timeless
3502e556ef util: enable getpid to be replaced
This will enable tests to write stable process ids.
2016-02-03 09:11:22 +00:00
Bryan O'Sullivan
506a2494ba util: rename ctxmanager's __call__ method to enter 2016-01-14 09:31:01 -08:00
Bryan O'Sullivan
0393e42ac3 util: simplify file I/O functions using context managers 2016-01-12 14:49:35 -08:00
Bryan O'Sullivan
a6e4850fe2 util: replace file I/O with readfile 2016-01-12 16:16:19 -08:00
Matt Harbison
4fc4388f40 util: adjust hgcmd() to handle frozen Mercurial on OS X
Previously, 'hg serve -d' was trying to exec the bundled python executable,
which failed with:

    Unknown option: --
    usage: python [option] ...
    Try 'python -h'...
    abort: child process failed to start

See the previous patch for details about the content of the various command
variables.  Note that unlike the previous patch here an application bundling
Mercurial could set $HG in the environment to get the correct result, there
isn't anything that a bundling application could do to get the correct result
here.

'hg serve -d' now launches under TortoiseHg, and there is a process listed in
the background, but a client process cannot connect to it for some reason, so
more investigation is needed.
2016-01-10 18:15:39 -05:00
Matt Harbison
c25bbe3998 util: adjust hgexecutable() to handle frozen Mercurial on OS X
sys.executable is "$appbundle/Contents/MacOS/python" when Mercurial is bundled
in a frozen app bundle on OS X, so that isn't appropriate.  It appears that this
was only visible for things launched via util.system(), like external hooks,
where $HG was set wrong.

It appears that Mercurial also uses 'sys.modules['__main__'].__file__' (here)
and 'sys.argv[0]' (in platform.gethgcmd()) to figure out the command to spawn.
In both cases, this points to "$appbundle/Contents/Resources/hg", which invokes
the system python since "/usr/bin/env python" is on the shebang line.  On my
system with a screwed up python install, I get an error importing the os module
if this script is invoked.

We could take the dirname of sys.executable and join 'hg' instead of this if we
want to be paranoid, but py2app boostrap is setting the environment variable
since 0.1.6 (current version is 0.9), so it seems safe and we might as well use
it.
2016-01-10 17:56:08 -05:00
Matt Harbison
38b5ec9395 util: adjust 'datapath' to be correct in a frozen OS X package
Apparently unlike py2exe, py2app copies the Mercurial source tree as-is to a
Contents/Resources subdirectory of an app bundle, and places its binary stub in
Contents/MacOS.  (The Windows install has the 'hgext' and 'mercurial' modules in
'lib/library.zip', while the help and templates subdirectories have been moved
out of the mercurial directory to the root of the installation.  I assume that
the python code living in a zip file is why "py2exe doesn't support __file__".)
Therefore, prior to this change, Mercurial in a frozen app bundle on OS X would
go looking for help *.txt, templates and locale info in Contents/MacOS, where
they don't exist.

There are only a handful of places that test for frozen, and not all of them are
wrong for OS X, so it seems wiser to handle them on a case by case basis, rather
that try to change mainfrozen().  The remaining cases are:

  1) util.hgexecutable() wrongly points to the bundled python executable, and
       affects $HG in util.system() launched processes (e.g. external hooks)
  2) util.hgcmd() wrongly points to the bundled python executable, but it seems
       to only affect 'hg serve -d'
  3) hook._pythonhook() may be OK, since I didn't see anything outrageous when
       printing sys.path from an internal hook.  I'm not sure if this special
       case is needed on OS X though.
  4) sslutil._plainapplepython() is OK, because sys.executable is not
       /usr/bin/python, nor is it in /System/Library/Frameworks
2016-01-10 17:49:01 -05:00
Augie Fackler
68c57ea46e util: don't capture exception with a name since we don't use it
Spotted by pyflakes.
2016-01-13 14:41:10 -05:00
Bryan O'Sullivan
e1c353db12 util: introduce ctxmanager, to avoid nested try/finally blocks
This is similar in spirit to contextlib.nested in Python <= 2.6,
but uses an extra level of indirection to avoid its inability to
clean up if an __enter__ method raises an exception.

Why add this mechanism?  It greatly simplifies scoped resource
management, and lets us eliminate several hundred lines of try/finally
blocks.  In many of these cases the "finally" is separated from the
"try" by hundreds of lines of code, which makes the connection
between resource acquisition and disposal difficult to follow.

(The preferred mechanism would be the "multi-with" syntax of 2.7+,
but Mercurial can't move to 2.7 for a while.)

Intended use:

>>> with ctxmanager(lambda: file('foo'), lambda: file('bar')) as c:
>>>    f1, f2 = c()

This will open both foo and bar when c() is invoked, and will close
both upon exit from the block.  If the attempt to open bar raises
an exception, the block will not be entered - but foo will still
be closed.
2016-01-11 15:25:43 -08:00
Gregory Szorc
0708caa27a util: remove outdated comment about construction overhead
An old implementation of this class (possibly only in my local repo)
allocated nodes in the cache during construction time, making
__init__ slow for large cache capacities. The current implementation
lazily grow the cache size, making this comment wrong.
2016-01-05 20:52:34 -08:00
Eric Sumner
c24dabdde6 lrucachedict: add copy method
This diff implements the standard dict copy() method for lrucachedicts, which
will be used in the pushrebase extension to make a copy of the manifestcache.
2015-12-30 13:10:53 -08:00
Matt Mackall
e9718e4615 merge with stable 2015-12-16 17:40:01 -06:00
Siddharth Agarwal
5544ae6c91 copyfile: add an optional parameter to copy other stat data
Contrary to the comment, I didn't see any evidence that we were copying
atime/mtime at all. This adds a parameter to copyfile to optionally copy it and
other stat data, with the default being to not copy it.

Many systems don't support changing the timestamp of a symlink, but we don't
need that in general anyway -- copystat is mostly useful for editors, most of
which will dereference symlinks anyway.
2015-12-12 11:00:04 -08:00
Gregory Szorc
f72b4b4450 util: reimplement lrucachedict
As part of attempting to more aggressively use the existing
lrucachedict, collections.deque operations were frequently
showing up in profiling output, negating benefits of caching.

Searching the internet seems to tell me that the most efficient
way to implement an LRU cache in Python is to have a dict indexing
the cached entries and then to use a doubly linked list to track
freshness of each entry. So, this patch replaces our existing
lrucachedict with a version using such a pattern.

The recently introduced perflrucachedict command reveals the
following timings for 10,000 operations for the following cache
sizes for the existing cache:

n=4     init=0.004079 gets=0.003632 sets=0.005188 mixed=0.005402
n=8     init=0.004045 gets=0.003998 sets=0.005064 mixed=0.005328
n=16    init=0.004011 gets=0.004496 sets=0.005021 mixed=0.005555
n=32    init=0.004064 gets=0.005611 sets=0.005188 mixed=0.006189
n=64    init=0.003975 gets=0.007684 sets=0.005178 mixed=0.007245
n=128   init=0.004121 gets=0.012005 sets=0.005422 mixed=0.009471
n=256   init=0.004143 gets=0.020295 sets=0.005227 mixed=0.013612
n=512   init=0.004039 gets=0.036703 sets=0.005243 mixed=0.020685
n=1024  init=0.004193 gets=0.068142 sets=0.005251 mixed=0.033064
n=2048  init=0.004070 gets=0.133383 sets=0.005160 mixed=0.050359
n=4096  init=0.004053 gets=0.265194 sets=0.004868 mixed=0.048352
n=8192  init=0.004087 gets=0.542218 sets=0.004562 mixed=0.032753
n=16384 init=0.004106 gets=1.064055 sets=0.004179 mixed=0.020367
n=32768 init=0.004034 gets=2.097620 sets=0.004260 mixed=0.013031
n=65536 init=0.004108 gets=4.106390 sets=0.004268 mixed=0.010191

As the data shows, the existing cache's retrieval performance
diminishes linearly with cache size. (Keep in mind the microbenchmark
is testing 100% cache hit rate.)

The new cache implementation reveals the following:

n=4     init=0.006665 gets=0.006541 sets=0.005733 mixed=0.006876
n=8     init=0.006649 gets=0.006374 sets=0.005663 mixed=0.006899
n=16    init=0.006570 gets=0.006504 sets=0.005799 mixed=0.007057
n=32    init=0.006854 gets=0.006459 sets=0.005747 mixed=0.007034
n=64    init=0.006580 gets=0.006495 sets=0.005740 mixed=0.006992
n=128   init=0.006534 gets=0.006739 sets=0.005648 mixed=0.007124
n=256   init=0.006669 gets=0.006773 sets=0.005824 mixed=0.007151
n=512   init=0.006701 gets=0.007061 sets=0.006042 mixed=0.007372
n=1024  init=0.006641 gets=0.007620 sets=0.006387 mixed=0.007464
n=2048  init=0.006517 gets=0.008598 sets=0.006871 mixed=0.008077
n=4096  init=0.006720 gets=0.010933 sets=0.007854 mixed=0.008663
n=8192  init=0.007383 gets=0.015969 sets=0.010288 mixed=0.008896
n=16384 init=0.006660 gets=0.025447 sets=0.011208 mixed=0.008826
n=32768 init=0.006658 gets=0.044390 sets=0.011192 mixed=0.008943
n=65536 init=0.006836 gets=0.082736 sets=0.011151 mixed=0.008826

Let's go through the results.

The new cache takes longer to construct. ~6.6ms vs ~4.1ms. However,
this is measuring 10,000 __init__ calls, so the difference is
~0.2us/instance. We currently only create lrucachedict for manifest
instances, so this regression is not likely relevant.

The new cache is slightly slower for retrievals for cache sizes
< 1024. It's worth noting that the only existing use of lurcachedict
is in manifest.py and the default cache size is 4. This regression
is worrisome. However, for n=4, the delta is ~2.9s for 10,000 lookups,
or ~0.29us/op. Again, this is a marginal regression and likely not
relevant in the real world. Timing `hg log -p -l 100` for
mozilla-central reveals that cache lookup times are dominated by
decompression and fulltext resolution (even with lz4 manifests).

The new cache is significantly faster for retrievals at larger
capacities. Whereas the old implementation has retrieval performance
linear with cache capacity, the new cache is constant time until much
larger values. And, when it does start to increase significantly, it
is a few magnitudes faster than the current cache.

The new cache does appear to be slower for sets when capacity is large.
However, performance is similar for smaller capacities. Of course,
caches should generally be optimized for retrieval performance because
if a cache is getting more sets than gets, it doesn't really make
sense to cache. If this regression is worrisome, again, taking the
largest regression at n=65536 of ~6.9ms for 10,000 results in a
regression of ~0.68us/op. This is not significant in the grand scheme
of things.

Overall, the new cache is performant at retrievals at much larger
capacity values which makes it a generally more useful cache backend.
While there are regressions, their absolute value is extremely small.
Since we aren't using lrucachedict aggressively today, these
regressions should not be relevant. The improved scalability of
lrucachedict should enable us to more aggressively utilize
lrucachedict for more granular caching (read: higher capacity caches)
in the near future. The impetus for this patch is to establish a cache
of decompressed revlog revisions, notably manifest revisions. And since
delta chains can grow to >10,000 and cache hit rate can be high, the
improved retrieval performance of lrucachedict should be relevant.
2015-12-06 19:04:10 -08:00
Yuya Nishihara
b65525f9c0 util: rename argument of isatty()
In general, "fd" is a file descriptor, but isatty() expects a file object.
We should call it "fp" or "fh".
2015-12-13 18:48:35 +09:00
Gregory Szorc
7b3fa04da1 util: use absolute_import 2015-12-12 23:14:08 -08:00
Gregory Szorc
c63ae89667 util: make hashlib import unconditional
hashlib was added in Python 2.5. As far as I can tell, SHA-512 is always
available in 2.6+. So move the hashlib import to the top of the file and
remove the one-off handling of SHA-512.
2015-12-12 23:30:37 -05:00
Gregory Szorc
11a7cac430 util: add versiontuple() for returning parsed version information
We have similar code in dispatch.py. Another consumer is about to be
created, so establish a generic function in an accessible location.
2015-11-24 14:23:51 -08:00
Gregory Szorc
5f5da12c81 util.datestr: use divmod()
We were computing the quotient and remainder of a division operation
separately. The built-in divmod() function allows us to do this with
a single function call. Do that.
2015-11-14 17:30:10 -08:00
Matt Mackall
ab46ec3b99 util: drop statmtimesec
We've globablly forced stat to return integer times which agrees with
our extension code, so this is no longer needed.

This speeds up status on mozilla-central substantially:

$ hg perfstatus
! wall 0.190179 comb 0.180000 user 0.120000 sys 0.060000 (best of 53)
$ hg perfstatus
! wall 0.275729 comb 0.270000 user 0.210000 sys 0.060000 (best of 36)
2015-11-19 13:15:17 -06:00
Matt Mackall
62d0fc250b util: disable floating point stat times (issue4836)
Alternate fix for this issue which avoids putting extra function calls
and exception handling in the fast path.

For almost all purposes, integer timestamps are preferable to
Mercurial. It stores integer timestamps in the dirstate and would thus
like to avoid doing any float/int comparisons or conversions. We will
continue to have to deal with 1-second granularity on filesystems for
quite some time, so this won't significantly hinder our capabilities.

This has some impact on our file cache validation code in that it
lowers timestamp resolution. But as we still have to deal with
low-resolution filesystems, we're not relying on this anyway.

An alternate approach is to use stat[ST_MTIME], which is guaranteed to
be an integer. But since this support isn't already in our extension,
we can't depend on it being available without adding a hard Python->C
API dependency that's painful for people like yours truly who have
bisect regularly and people without compilers.
2015-11-19 13:21:24 -06:00
Sean Farley
f6841c5b51 util: also catch IndexError
This makes life so, so much easier for hgwatchman, which provides a named tuple
but throws an IndexError instead of a TypeError.
2015-10-13 16:05:30 -07:00
Pierre-Yves David
30913031d4 error: get Abort from 'error' instead of 'util'
The home of 'Abort' is 'error' not 'util' however, a lot of code seems to be
confused about that and gives all the credit to 'util' instead of the
hardworking 'error'. In a spirit of equity, we break the cycle of injustice and
give back to 'error' the respect it deserves. And screw that 'util' poser.

For great justice.
2015-10-08 12:55:45 -07:00
Yuya Nishihara
5715ebc02c util: use tuple accessor to get accurate st_mtime value (issue4836)
Because st.st_mtime is computed as 'sec + 1e-9 * nsec' and double is too narrow
to represent nanoseconds, int(st.st_mtime) can be 'sec + 1'. Therefore, that
value could be different from the one got by osutils.listdir().

This patch fixes the problem by accessing to raw st_mtime by tuple index.

It catches TypeError to fall back to st.st_mtime because our osutil.stat does
not support tuple index. In dirstate.normal(), 'st' is always a Python stat,
but in dirstate.status(), it can be either a Python stat or an osutil.stat.

Thanks to vgatien-baron@janestreet.com for finding the root cause of this
subtle problem.
2015-10-04 22:35:36 +09:00
Yuya Nishihara
ea5724ad42 util: extract stub function to get mtime with second accuracy
This function is trivial but will need a long comment why it can't use
st.st_mtime. See the next patch for details.
2015-10-04 22:25:29 +09:00
Matt Harbison
bb1dafe069 util: extract stringmatcher() from revset
This is used to match against tags, bookmarks, etc in revsets.  It will be used
in a future patch to do the same tag matching in templater.
2015-08-22 22:52:18 -04:00
Gregory Szorc
fd27bc1367 util.chunkbuffer: avoid extra mutations when reading partial chunks
Previously, a read(N) where N was less than the length of the first
available chunk would mutate the deque instance twice and allocate a new
str from the slice of the existing chunk. Profiling drawed my attention
to these as a potential hot spot during changegroup reading.

This patch makes the code more complicated in order to avoid the
aforementioned 3 operations.

On a pre-generated mozilla-central gzip bundle, this series has the
following impact on `hg unbundle` performance on my MacBook Pro:

before: 358.21 real       317.69 user        38.49 sys
after:  301.57 real       262.69 user        37.11 sys
delta:  -56.64 real       -55.00 user        -1.38 sys
2015-10-05 17:36:32 -07:00
Gregory Szorc
c98628be54 util.chunkbuffer: refactor chunk handling logic
This will make the next patch easier to read. It provides no benefit on
its own.
2015-10-05 16:34:47 -07:00
Gregory Szorc
fdf56bc121 util.chunkbuffer: special case reading everything
The new code results in simpler logic within the while loop. It is also
faster since we avoid performing operations on the queue and buf
collections. However, there shouldn't be any super hot loops for this
since the whole point of chunkbuffer is to avoid reading large amounts
of data at once. This does, however, make it easier to optimize
chunkbuffer in a subsequent patch.
2015-10-05 16:28:12 -07:00
Yuya Nishihara
697917e9d9 util.system: compare fileno to see if it needs stdout redirection
Future patches will reopen stdout to be line-buffered, so sys.stdout may
be different object than sys.__stdout__.
2015-10-03 14:57:24 +09:00