432 Commits

Author SHA1 Message Date
FUJIWARA Katsunori
5e2a0f5090 dirstate: read from pending file under HG_PENDING mode if it exists
True/False value of '_pendingmode' means whether 'dirstate.pending' is
used to initialize own '_map' and so on. When it is None, neither
'dirstate' nor 'dirstate.pending' is read in yet.

This is used to keep consistent view between '_pl()' and '_read()'.

Once '_pendingmode' is determined by reading one of 'dirstate' or
'dirstate.pending' in, '_pendingmode' is kept even if 'invalidate()'
is invoked. This should be reasonable, because:

  - effective 'invalidate()' invocation should occur only in wlock scope, and
  - wlock can't be gotten under HG_PENDING mode

'_trypending()' is defined as a normal function to factor similar code
path (in bookmarks and phases) out in the future easily.
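
The lookup described above can be sketched as follows. This is only an
illustration: 'try_pending' and its 'environ' parameter are hypothetical
names, and plain 'open()' stands in for Mercurial's vfs layer.

```python
import os
import tempfile


def try_pending(root, filename, environ=os.environ):
    """Return (file_object, pending_mode) for 'filename' under 'root'.

    Hypothetical sketch: the '.pending' variant is preferred only when
    HG_PENDING names this repository root and the pending file exists.
    """
    if environ.get("HG_PENDING") == root:
        try:
            return open(os.path.join(root, filename + ".pending"), "rb"), True
        except FileNotFoundError:
            pass  # no pending file: fall back to the normal one
    return open(os.path.join(root, filename), "rb"), False


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    for name, data in (("dirstate", b"committed"),
                       ("dirstate.pending", b"pending")):
        with open(os.path.join(root, name), "wb") as out:
            out.write(data)

    fp, plain_mode = try_pending(root, "dirstate", environ={})
    with fp:
        plain_data = fp.read()

    fp, pending_mode = try_pending(root, "dirstate",
                                   environ={"HG_PENDING": root})
    with fp:
        pending_data = fp.read()
```

Returning the mode together with the file object lets the caller remember
which file was read, which is the role '_pendingmode' plays above.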
2015-10-14 02:49:17 +09:00
FUJIWARA Katsunori
2ef2ab5d6d dirstate: make writing in-memory changes aware of transaction activity
This patch delays writing in-memory changes out if a transaction is
running.

'_getfsnow()' is defined as a function, to hook it easily for
ambiguous timestamp tests (see also fakedirstatewritetime.py)

The 'if tr:' code path in this patch is still disabled at this revision,
because there is no client invoking 'dirstate.write()' with a repo
object.

BTW, this patch changes 'dirstate.invalidate()' semantics around
'dirstate.write()' in a transaction scope:

  before:
    with repo.transaction():
        dirstate.CHANGE('A')
        dirstate.write() # change for A is written out here
        dirstate.CHANGE('B')
        dirstate.invalidate() # discards only change for B

  after:
    with repo.transaction():
        dirstate.CHANGE('A')
        dirstate.write() # change for A is still kept in memory
        dirstate.CHANGE('B')
        dirstate.invalidate() # discards changes for A and B

Fortunately, there is no code path expecting the former, at least, in
Mercurial itself, because 'dirstateguard' was introduced to remove
such 'dirstate.invalidate()'.
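
The "after" semantics can be modelled with a toy class. Everything here
('ToyDirstate', its attributes, the 'in_transaction' flag) is a hypothetical
stand-in and not Mercurial's implementation; it only shows why a deferred
write lets 'invalidate()' discard both changes.

```python
class ToyDirstate:
    """Toy model of a dirstate whose write is deferred in a transaction."""

    def __init__(self):
        self.on_disk = set()      # what has actually been written out
        self.in_memory = set()    # current in-memory view
        self.write_deferred = False

    def change(self, name):
        self.in_memory.add(name)

    def write(self, in_transaction):
        if in_transaction:
            # Defer: the transaction would write the file out at its end.
            self.write_deferred = True
            return
        self.on_disk = set(self.in_memory)

    def invalidate(self):
        # Drop every in-memory change and fall back to the on-disk view.
        self.in_memory = set(self.on_disk)
        self.write_deferred = False


ds = ToyDirstate()
ds.change("A")
ds.write(in_transaction=True)   # change for A is still kept in memory
ds.change("B")
ds.invalidate()                 # discards changes for A and B
remaining = ds.in_memory
```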
2015-10-14 02:49:17 +09:00
FUJIWARA Katsunori
bde9721b18 dirstate: make functions for backup aware of transaction activity
Some comments in this patch assume that a subsequent patch changes
'dirstate.write()' as below:

    def write(self, repo):
        if not self._dirty:
            return
        tr = repo.currenttransaction()
        if tr:
            tr.addfilegenerator('dirstate', (self._filename,),
                                self._writedirstate, location='plain')
            return # omit actual writing out
        st = self._opener('dirstate', "w", atomictemp=True)
        self._writedirstate(st)

This patch makes '_savebackup()' write in-memory changes out, and it
causes clearing 'self._dirty'. If dirstate isn't changed after
'_savebackup()', subsequent 'dirstate.write()' never invokes
'tr.addfilegenerator()' because 'not self._dirty' is true.

Then, 'tr.writepending()' unintentionally returns False, if there is
no other (e.g. changelog) changes pending, even though dirstate
changes are already written out at '_savebackup()'.

To avoid such a situation, this patch makes '_savebackup()' explicitly
invoke 'tr.addfilegenerator()' if a transaction is running.

'_savebackup()' should get awareness of transaction before 'write()',
because the former depends on the behavior of the latter before this
patch.
2015-10-14 02:49:17 +09:00
FUJIWARA Katsunori
af664f3cc0 dirstate: move code paths for backup from dirstateguard to dirstate
This can centralize the logic to write in-memory changes out correctly
according to transaction activity into dirstate.

Passing 'repo' object to newly added functions is needed to examine
current transaction activity in subsequent patches, because 'dirstate'
itself doesn't have direct reference to it.
2015-10-14 02:49:17 +09:00
FUJIWARA Katsunori
77975f1ce1 parsers: make pack_dirstate take now in integer for consistency
On recent OSes, 'stat.st_mtime' is a double precision floating point
value that represents nanoseconds, but it is not wide enough for an
actual file timestamp: nowadays, only 52 - 32 = 20 bits are available
for the fractional part of a second.

Therefore, casting it to 'int' may cause an unexpected result. See also
changeset 8102a3981272 fixing issue4836 for detail.

For example, changed file A may be treated as "clean" unexpectedly in
steps below. "rounded now" is the value gotten by rounding via
'int(st.st_mtime)' or so.

    ---------------------+--------------------+------------------------
    "now"                |                    | timestamp of A (time_t)
    float  rounded time_t| action             | FS       dirstate
    ------ ------- ------+--------------------+-------- ---------------
    N+.nnn   N       N   |                    | ---      ---
                         | update file A      |  N
                         | dirstate.normal(A) |           N
    N+.999   N+1     N   |                    |
                         | dirstate.write()   |           N (*1)
                         |    :               |
                         | change file A      |  N
                         |    :               |
    N+1.00   N+1    N+1  |                    |
                         | "hg status" (*2)   |  N        N
    ------ ------- ------+--------------------+-------- ---------------

Timestamp N of A in dirstate isn't dropped at (*1), because "rounded
now" is N+1 at that time, even if 'st_mtime' in 'time_t' is still N.

Then, file A is unexpectedly treated as "clean" at (*2) in this case.
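
The rounding effect can be reproduced with plain arithmetic. In this
sketch the timestamp N and the nanosecond value are arbitrary choices for
illustration; only the mask value 0x7fffffff comes from this message.

```python
# Arbitrary whole-second timestamp and a nanosecond part just below 1 s.
N = 1444758557
NSEC = 999_999_999

# A double cannot hold N + 0.999999999 exactly at this magnitude, so the
# sum rounds up to the next whole second.
float_mtime = float(N) + 1e-9 * NSEC
rounded_now = int(float_mtime)

# Integer arithmetic keeps the second that time_t would report.
exact_now = (N * 10**9 + NSEC) // 10**9

# Masking as done on the caller side keeps the value in signed 32-bit range.
RANGEMASK = 0x7FFFFFFF
masked_now = exact_now & RANGEMASK
```

Here 'rounded_now' is N + 1 while 'exact_now' stays at N, which is the
mismatch shown in the N+.999 row of the table.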

For consistent handling of 'stat.st_mtime', this patch makes
'pack_dirstate()' take 'now' argument not in floating point but in
integer.

This patch makes 'PyArg_ParseTuple()' in 'pack_dirstate()' use format
'i' (= checking type mismatch or overflow), even though it is ensured
that 'now' is in the range of 32bit signed integer by masking with
'_rangemask' (= 0x7fffffff) on caller side.

It should be cheap enough compared with packing itself, and it is
useful for detecting legacy code that invokes 'pack_dirstate()' with
'now' as a floating point value.
2015-10-14 02:40:04 +09:00
Matt Mackall
74b31a11a1 dirstate: batch calls to statfiles (issue4878)
This makes it more interruptible.
2015-10-06 16:26:20 -05:00
Pierre-Yves David
30913031d4 error: get Abort from 'error' instead of 'util'
The home of 'Abort' is 'error', not 'util'. However, a lot of code seems to be
confused about that and gives all the credit to 'util' instead of the
hardworking 'error'. In a spirit of equity, we break the cycle of injustice and
give back to 'error' the respect it deserves. And screw that 'util' poser.

For great justice.
2015-10-08 12:55:45 -07:00
FUJIWARA Katsunori
0602f8209b dirstate: split write to write changes into files other than .hg/dirstate
'_writedirstate()' is used mainly for "transactional dirstate". See
the wiki page below for detail about it.

    https://mercurial.selenic.com/wiki/DirstateTransactionPlan
2015-10-08 01:41:30 +09:00
Yuya Nishihara
ea5724ad42 util: extract stub function to get mtime with second accuracy
This function is trivial but will need a long comment explaining why it
can't use st.st_mtime. See the next patch for details.
2015-10-04 22:25:29 +09:00
Yuya Nishihara
d7b6a95763 hgweb: overwrite cwd to resolve file patterns relative to repo (issue4568)
It's useless to handle file patterns as relative to the cwd of the server
process. The only sensible way in hgweb is to resolve paths relative to the
repository root.

It seems dirstate.getcwd() isn't used to get a real file path, so this patch
won't cause a problem.
2015-09-20 20:11:31 +09:00
Yuya Nishihara
46aef63790 dirstate: state that getcwd() shouldn't be used to get real file path
hgweb will force it to be '' so that file patterns can be resolved relative
to the repository root. I want to clarify that is correct.
2015-09-20 20:08:22 +09:00
Matt Harbison
e8c6c3b443 dirstate: ensure mv source is marked deleted when walking icasefs (issue4760)
Previously, importing a case-only rename patch on a case insensitive filesystem
caused the original file to be marked as '!' in status.  The source was being
forgotten properly in patch.workingbackend.close(), but the call it makes to
scmutil.marktouched() then put the file back into the 'n' state (but it was
still missing from the filesystem).

The cause of this was scmutil._interestingfiles() would walk dirstate,
and since dirstate was able to lstat() the old file via the new name,
was treating this as a forgotten file, not a removed file.
scmutil.marktouched() re-adds forgotten files, so dirstate got out of
sync with the filesystem.

This could be handled with less code in the "kind == regkind or kind
== lnkkind" branch of dirstate._walkexplicit(), but this avoids
filesystem accesses unless case collisions occur. _discoverpath() is
used instead of normalize(), since the dirstate case is given first
precedence, and the old file is still in it. What matters is the
actual case in the filesystem.
2015-07-27 21:27:24 -04:00
Gregory Szorc
5380dea2a7 global: mass rewrite to use modern exception syntax
Python 2.6 introduced the "except type as instance" syntax, replacing
the "except type, instance" syntax that came before. Python 3 dropped
support for the latter syntax. Since we no longer support Python 2.4 or
2.5, we have no need to continue supporting the "except type, instance".

This patch mass rewrites the exception syntax to be Python 2.6+ and
Python 3 compatible.

This patch was produced by running `2to3 -f except -w -n .`.
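
For reference, a minimal example of the newer form. The function name
'describe_error' is made up for illustration; the old form appears only in
a comment because it no longer parses on Python 3.

```python
def describe_error(text):
    """Return 'ok' or the name of the exception raised by int(text)."""
    try:
        int(text)
    # Old form, rejected by Python 3:  except ValueError, exc:
    except ValueError as exc:
        return type(exc).__name__
    return "ok"


good = describe_error("42")
bad = describe_error("forty-two")
```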
2015-06-23 22:20:08 -07:00
Gregory Szorc
3aa1c73868 global: mass rewrite to use modern octal syntax
Python 2.6 introduced a new octal syntax: "0oXXX", replacing "0XXX". The
old syntax is not recognized in Python 3 and will result in a parse
error.

Mass rewrite all instances of the old octal syntax to the new syntax.

This patch was generated by `2to3 -f numliterals -w -n .` and the diff
was selectively recorded to exclude changes to "<N>l" syntax conversion,
which will be handled separately.
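
A minimal example of the newer literal, using a typical file mode as an
arbitrary value:

```python
# New-style octal literal; the old spelling 0644 is a syntax error on
# Python 3.
MODE = 0o644

# The literal is just another way to write the integer 420.
DECIMAL = 6 * 64 + 4 * 8 + 4
SPELLED = oct(MODE)
```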
2015-06-23 22:30:33 -07:00
Siddharth Agarwal
2da9583582 dirstate: use a presized dict for the dirstate
This uses a simple heuristic to avoid expensive resizes.

On a real-world repo with around 400,000 files, perfdirstate:

before: ! wall 0.155562 comb 0.160000 user 0.150000 sys 0.010000 (best of 64)
after:  ! wall 0.132638 comb 0.130000 user 0.120000 sys 0.010000 (best of 75)

On another real-world repo with around 250,000 files:

before: ! wall 0.098459 comb 0.100000 user 0.090000 sys 0.010000 (best of 100)
after:  ! wall 0.089084 comb 0.090000 user 0.080000 sys 0.010000 (best of 100)
2015-06-16 00:46:01 -07:00
Pierre-Yves David
4bbda309c5 dirstate: avoid invalidating every entries when list is empty
The default value was not tested with 'is None'. This made an empty list be
seen as the default value and resulted in the invalidation of every single entry
in the dirstate. On repos with hundreds of thousands of files, this results in
minutes of lookup time instead of nothing.

This is a textbook example of why we should test 'is None' if this is what we
mean.
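
The difference between the two tests can be shown with a pair of
hypothetical functions (the names and signatures are illustrative, not the
dirstate API):

```python
def buggy_targets(all_files, files=None):
    """Truthiness test: an empty list is mistaken for 'no argument'."""
    if not files:
        files = all_files
    return list(files)


def fixed_targets(all_files, files=None):
    """Identity test: only a missing argument selects every file."""
    if files is None:
        files = all_files
    return list(files)


tracked = ["a", "b", "c"]
buggy_result = buggy_targets(tracked, [])   # processes every file
fixed_result = fixed_targets(tracked, [])   # processes nothing
default_result = fixed_targets(tracked)     # no argument: every file
```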
2015-06-04 22:10:32 -07:00
Martin von Zweigbergk
54577d7073 dirstate: use match.prefix() instead of 'not match.anypats()'
It seems clearer to check for what it is than what it isn't.
2015-05-19 10:40:40 -07:00
Martin von Zweigbergk
4ac9e158ba dirstate: avoid match.files() in walk()
2015-05-19 10:13:43 -07:00
FUJIWARA Katsunori
1946bae3e9 dirstate: use open/read of vfs(opener) explicitly instead of read
This simplifies changes in a subsequent patch, which tries to open the
`.pending` file when the HG_PENDING environment variable is defined.
2015-05-20 01:06:09 +09:00
FUJIWARA Katsunori
0871e7fb66 dirstate: use self._filename instead of immediate string dirstate
This prevents the literal string `dirstate` from proliferating. This is
also a preparation for making dirstate aware of the PENDING mechanism in
subsequent patches.
2015-05-20 01:06:09 +09:00
Durham Goode
8e42385986 ignore: use 'include:' rules instead of custom syntax
Now that the matcher supports 'include:' rules, let's change the dirstate.ignore
creation to just create a matcher with a bunch of includes. This allows us to
completely delete ignore.py.

I moved some of the syntax documentation over to readpatternfile in match.py so
we don't lose it.
2015-05-16 16:06:22 -07:00
Durham Goode
9b71206828 ignore: remove .hgignore from ignore list if nonexistent
Previously we would always pass the root .hgignore path to the ignore parser.
The parser then had to be aware that the first path was special, and not warn if
it didn't exist.

In preparation for making the ignore file parser more generically usable, let's
make the parse logic not aware of this special case, and instead just not pass
the root .hgignore in if it doesn't exist.
2015-05-16 15:24:43 -07:00
Augie Fackler
9c2e980a64 cleanup: use __builtins__.all instead of util.all
2015-05-16 14:34:19 -04:00
FUJIWARA Katsunori
e36077823b dirstate: use pathutil.normasprefix to ensure os.sep at the end of root
419498227287 replaced "os.path.join(root, '')" by
"root.endswith(os.sep)" examination, because Python 2.7.9 changes
behavior of "os.path.join(path, '')" on UNC path.

But some problematic encodings use 0x5c (= "os.sep" on Windows) as the
tail byte of some multi-byte characters, and the replacement above
prevents Mercurial from working on a repository whose root path
ends with such a multi-byte character, regardless of enabling win32mbcs.
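
The byte-level trap can be seen with a single character. This sketch
assumes the cp932 encoding as one example of a problematic encoding, and
both helper functions are hypothetical, not Mercurial's.

```python
def ends_with_sep_bytes(path_bytes, sep=b"\\"):
    """Naive check on raw bytes, like str.endswith(os.sep) on Windows."""
    return path_bytes.endswith(sep)


def ends_with_sep_decoded(path_bytes, encoding, sep="\\"):
    """Check the last character rather than the last byte."""
    return path_bytes.decode(encoding).endswith(sep)


# In cp932 this character is encoded as 0x95 0x5c, so its tail byte is
# the same as a backslash.
name = "\u8868".encode("cp932")

naive = ends_with_sep_bytes(name)
careful = ends_with_sep_decoded(name, "cp932")
```

The naive check reports a trailing separator although the path ends with an
ordinary character, so no separator would be appended to the root.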

This patch uses "pathutil.normasprefix()" instead of
"root.endswith(os.sep)" examination, to ensure "os.sep" at the end of
"dirstate._rootdir" even with problematic encodings.

"root" of dirstate can be passed to "pathutil.normasprefix()" without
normalization, because it is always given from "repo.root" =
"repo.wvfs.base", which is normalized by "os.path.realpath()".

Using "util.endswithsep()" instead of "str.endswith(os.sep)" also
fixes this problem, but this patch chooses "pathutil.normasprefix()"
to centralize the "add os.sep unless endswith(os.sep)" logic into it.
2015-04-22 23:38:52 +09:00
Drew Gottlieb
901ac5e726 util: move dirs() and finddirs() from scmutil to util
An upcoming commit requires that match.py be able to call scmutil.dirs(), but
when match.py imports scmutil, a dependency cycle is created. This commit
avoids the cycle by moving dirs() and its related finddirs() function from
scmutil to util, which match.py already depends on.
2015-04-06 14:36:08 -07:00
Matt Mackall
c72258bf67 merge with stable
2015-04-06 17:16:55 -05:00
Martin von Zweigbergk
87d497396a dirstate.walk: don't report same file stat multiple times
dirstate.walk() generates pairs of filename and a stat-like
object. After "hg mv foo Foo", it generates one pair for "foo" and one
for "Foo", as it should. However, on case-insensitive file systems,
when it tries to stat to get the disk state as well, it gets the same
stat result for both names. This confuses at least
scmutil._interestingfiles(), making it think that "foo" was forgotten
rather than removed. That, in turn, makes "hg addremove" add "foo"
back, resulting in both cases in the dirstate, as reported in
issue4590.

This change only takes care of the "if unknown" branch. A similar fix
should perhaps be applied to the other branch.
2015-04-04 21:54:12 -07:00
Siddharth Agarwal
f6cb852638 dirstate: use parsers.make_file_foldmap when available
This is a significant performance win on large repositories. perffilefoldmap:

On Linux/gcc, on a test repo with over 500,000 files:
before: wall 0.605021 comb 0.600000 user 0.560000 sys 0.040000 (best of 17)
after:  wall 0.280530 comb 0.280000 user 0.250000 sys 0.030000 (best of 35)

On Mac OS X/clang, on a real-world repo with over 200,000 files:
before: wall 0.281103 comb 0.280000 user 0.260000 sys 0.020000 (best of 34)
after:  wall 0.133622 comb 0.140000 user 0.120000 sys 0.020000 (best of 65)

This visibly impacts status times on case-insensitive file systems. On the Mac
OS X repo, status goes from 3.64 seconds to 3.50.

With the third-party hgwatchman extension [1], 'hg status' on the same repo
goes from 0.80 seconds to 0.65.

[1] https://bitbucket.org/facebook/hgwatchman
2015-04-01 00:44:33 -07:00
Matt Harbison
a3480284f9 dirstate: don't require exact case when adding dirs on icasefs (issue4578)
We don't require it when adding files on a case insensitive filesystem, so don't
require it to add directories for consistency.

The problem with the previous code was that _walkexplicit() was only returning
the normalized directory.  The file(s) in the directory are then appended, and
passed to the matcher.  But if the user asks for 'capsdir1/capsdir', the matcher
will not accept 'CapsDir1/CapsDir/AbC.txt', and the name is dropped.  Matching
based on the non-normalized name is required.

If not normalizing, skip the extra string building for efficiency.  '.' is
replaced with '' so that the path being tested when no file is specified, isn't
prefixed with './' (and therefore fail the match).
2015-03-31 11:11:39 -04:00
Siddharth Agarwal
42e4ffbf78 dirstate._normalize: don't construct dirfoldmap if not necessary
Constructing the dirfoldmap is expensive, so if there's a hit in the
filefoldmap, don't construct the directory foldmap.

This helps with cases like 'hg add foo' where foo is already tracked: for a
large repository, the operation goes from 1.5 seconds to 1.2 (which is still
way too much, but that's a matter for another day.)
2015-03-31 19:34:37 -07:00
Siddharth Agarwal
d1935a56bf dirstate.walk: don't keep track of normalized files in parallel
Rev cce24e8019c8 changed the semantics of the work list to store (normalized,
non-normalized) pairs. All the tuple creation and destruction hurts perf: on a
large repo on OS X, 'hg status' went from 3.62 seconds to 3.78.

It also is unnecessary in most cases:
- it is clearly unnecessary on case-sensitive filesystems.
- it is also unnecessary when filenames have been read off of disk rather than
  being supplied by the user.

The only case where the non-normalized case is required at all is when the file
is unknown.

To eliminate most of the perf cost, keep track of whether the directory needs
to be normalized at all with a boolean called 'alreadynormed'. Pay the cost of
directory normalization only when necessary.

For the above large repo, 'hg status' goes to 3.63 seconds.
2015-03-31 19:29:39 -07:00
Siddharth Agarwal
b8549b5bd2 dirstate.walk: factor out directory traversal
This function will be used in upcoming patches.
2015-03-31 19:18:27 -07:00
Siddharth Agarwal
77486fd514 dirstate: fix order of initializing nf vs f
Result of a bad merge.
2015-03-31 15:41:02 -07:00
Matt Mackall
91b7caa71a merge with stable
2015-03-31 16:14:14 -05:00
Siddharth Agarwal
5c1db53305 dirstate.walk: use the file foldmap to normalize
Computing the set of directories in the dirstate is expensive. It turns out
that it isn't necessary for operations like 'hg status' at all.

Why? Consider the file 'foo/bar' on disk, which is represented in the dirstate
as 'FOO/BAR'.

On 'hg status', we'd walk down the directory tree, coming across 'foo' first.

Before: we'd normalize 'foo' to 'FOO', then add 'FOO' to our visited stack.
We'd then visit 'FOO', finding the file 'bar'. We'd normalize 'FOO/bar' to
'FOO/BAR', then add it to the results dict.

After: we wouldn't normalize 'foo' at all. We'd add it to our visited stack,
then visit 'foo', finding the file 'bar'. We'd normalize 'foo/bar' to
'FOO/BAR', then add it to the results dict.

So whether we normalize intermediate directories or not actually makes no
difference in most cases.
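
The 'foo/bar' example can be sketched with a dictionary keyed by the
case-folded file name. The helper names are hypothetical and 'str.upper()'
is only an assumed stand-in for the real case-folding function.

```python
def make_file_foldmap(tracked_files):
    """Map each case-folded file name to the name stored in the dirstate."""
    return {name.upper(): name for name in tracked_files}


def normalize_file(path, foldmap):
    """Return the tracked spelling of 'path', or 'path' itself if unknown."""
    return foldmap.get(path.upper(), path)


foldmap = make_file_foldmap(["FOO/BAR"])

# The directory 'foo' is never normalized on its own; only the full file
# path found while walking is looked up.
found = normalize_file("foo/bar", foldmap)
unknown = normalize_file("foo/new", foldmap)
```

Because the lookup key is the whole file path, no separate set of
directories has to be computed for this case.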

The only case where normalization matters at all is if a file is replaced with
a directory with the same case-folded name. In that case we can do a relatively
cheap file normalization instead and still get away with not computing the set
of directories.

This is a nice boost in status performance. On OS X with case-insensitive HFS+,
for a large repo with over 200,000 files, this brings down 'hg status' from
4.00 seconds to 3.62.
2015-03-29 19:47:16 -07:00
Siddharth Agarwal
efae199860 dirstate: split the foldmap into separate ones for files and directories
Computing the set of directories in the dirstate can be pretty expensive. For
'hg status' without arguments, it turns out we actually never need to figure
out the right case for directories in the foldmap. (An upcoming patch explains
why.)

This patch splits up the directory and file maps into separate ones, allowing
for the subsequent optimization in status.
2015-03-29 19:42:49 -07:00
Siddharth Agarwal
15b2067928 dirstate: introduce function to normalize just filenames
This will be used in upcoming patches to stop generating the set of directories
in many common cases.
2015-03-28 18:53:54 -07:00
Siddharth Agarwal
28d1a1fb85 dirstate: factor out code to discover normalized path
In upcoming patches we're going to reuse this code. The storemap is currently
always the foldmap, but will vary in future patches.
2015-03-29 19:23:05 -07:00
Siddharth Agarwal
fe84899cf2 dirstate._walkexplicit: don't bother normalizing '.'
The overwhelmingly common case is running commands like 'hg diff' with no
arguments. Therefore the only file that'll be listed is the root directory.
Normalizing that's just a waste of time.

This means that for a plain 'hg diff' we'll never need to construct the
foldmap, saving us a significant chunk of time.

On case-insensitive HFS+ on OS X, for a large repository with over 200,000
files, this brings down 'hg diff' from 2.97 seconds to 2.36.
2015-03-29 18:28:48 -07:00
Siddharth Agarwal
8adc467907 dirstate._walkexplicit: drop normpath calls
The paths the matcher returns are normalized already.
2015-03-29 23:28:30 -07:00
Siddharth Agarwal
0d03b12a57 dirstate._walkexplicit: indicate root as '.', not ''
'.' is the canonical way to represent the root, and it's apparently the only
transformation that normpath makes.
2015-03-29 23:27:25 -07:00
Yuya Nishihara
d646851eb8 dirstate: make sure rootdir ends with directory separator (issue4557)
ntpath.join() of Python 2.7.9 does not work as expected if root is a UNC path
to top of share.

This patch doesn't take care of os.altsep, '/' on Windows, because root should
be normalized by realpath().
2015-03-06 00:14:22 +09:00
Mads Kiilerich
3c0558f97b dirstate: ignore negative debug.dirstate.delaywrite values - they crashed it
Sleep can only travel forward in time, not back.
2015-01-14 01:15:26 +01:00
Martin von Zweigbergk
8874cd66a5 match: add isexact() method to hide internals
Comparing a function reference seems bad.
2014-10-29 08:43:39 -07:00
Matt Mackall
da0586aaf9 merge with stable
2015-03-05 15:52:07 -06:00
Mads Kiilerich
e51a6aa2aa dirstate: clarify comment about leaving normal files undef if changed 'now'
Clarify that they only are saved as undef if they were marked as normal and
changed in the same second.
2015-01-14 01:15:26 +01:00
Siddharth Agarwal
5bc4775669 ignore: resolve ignore files relative to repo root (issue4473) (BC)
Previously these would be considered to be relative to the current working
directory. That behavior is both undocumented and doesn't really make sense.
There are two reasonable options for how to resolve relative paths:
- relative to the repo root
- relative to the config file

Resolving these files relative to the repo root matches existing behavior with
hooks. An earlier discussion about this is available at
http://mercurial.markmail.org/thread/tvu7yhzsiywgkjzl.

Thanks to Isaac Jurado <diptongo@gmail.com> for the initial patchset that
spurred the discussion.
2014-12-16 14:34:53 -08:00
Pierre-Yves David
dd01dca5ec dirstate: use the 'nogc' decorator
Now that we have a generic way to disable the gc, we use it. However, we have to
use it in a baroque way. See inline comment for details.
2014-12-04 05:43:15 -08:00
Martin von Zweigbergk
c71ba3444e dirstate: speed up repeated missing directory checks
In a mozilla repo with tip at bb3ff09f52fe,

  hg update tip~1000 && time hg revert -nq -r tip .

displays ~4:20 minutes. With tip~100, it runs in ~11 s. With revision
100000, it did not finish in 12 minutes.

Revert calls dirstate.status() with a matcher that matches each file
in the target revision. The main problem [1] lies in
dirstate._walkexplicit(), which looks for matching deleted directories
by checking whether each path is a prefix of any path in the
dirstate. With m files in the dirstate and n files in the target
revision that are not in the dirstate, this is clearly O(m*n). Let's
improve by keeping a lazily initialized set of all the directories in
the dirstate, so the time becomes O(m+n).
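
A minimal sketch of the idea, assuming slash-separated paths. The names
'finddirs' and 'build_dirs' and the sample paths are illustrative, and the
lazy initialization is left out for brevity.

```python
def finddirs(path):
    """Yield every ancestor directory of a slash-separated path."""
    pos = path.rfind("/")
    while pos != -1:
        yield path[:pos]
        pos = path.rfind("/", 0, pos)


def build_dirs(files):
    """Collect all directories of all files once, in a single pass."""
    dirs = set()
    for name in files:
        dirs.update(finddirs(name))
    return dirs


tracked = ["a/b/c.txt", "a/d.txt", "e.txt"]
dirs = build_dirs(tracked)

# Each later check is one set lookup instead of a scan over all files.
is_dir_ab = "a/b" in dirs
is_dir_x = "x" in dirs
```

Building the set costs one pass over the tracked files, which would explain
the small slowdown reported for a single missing path.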

After this patch, the 4:20 minutes become 5.5 s, while for a single
missing path, it slows down from 1.092 s to 1.150 s (best of 4). The
>12 min case becomes 5.8 s.

 [1] A narrower optimization would be to make revert take the fast
     path for '.' and '--all'.
2014-11-19 23:15:07 -08:00
Martin von Zweigbergk
8b968ecfe2 status: update and move documentation of status types to status class
The various status types are currently documented on the
dirstate.status() method. Now that we have a class for the status
types, it makes sense to document the status types there
instead. Only leave the bits related to lookup/unsure in the status()
method documentation.
2014-10-10 10:14:35 -07:00