Commit Graph

408 Commits

Author SHA1 Message Date
Drew Gottlieb
901ac5e726 util: move dirs() and finddirs() from scmutil to util
An upcoming commit requires that match.py be able to call scmutil.dirs(), but
when match.py imports scmutil, a dependency cycle is created. This commit
avoids the cycle by moving dirs() and its related finddirs() function from
scmutil to util, which match.py already depends on.
2015-04-06 14:36:08 -07:00
Matt Mackall
c72258bf67 merge with stable 2015-04-06 17:16:55 -05:00
Martin von Zweigbergk
87d497396a dirstate.walk: don't report same file stat multiple times
dirstate.walk() generates pairs of filename and a stat-like
object. After "hg mv foo Foo", it generates one pair for "foo" and one
for "Foo", as it should. However, on case-insensitive file systems,
when it tries to stat to get the disk state as well, it gets the same
stat result for both names. This confuses at least
scmutil._interestingfiles(), making it think that "foo" was forgotten
rather than removed. That, in turn, makes "hg addremove" add "foo"
back, resulting in both cases in the dirstate, as reported in
issue4590.

This change only takes care of the "if unknown" branch. A similar fix
should perhaps be applied to the other branch.
2015-04-04 21:54:12 -07:00
Siddharth Agarwal
f6cb852638 dirstate: use parsers.make_file_foldmap when available
This is a significant performance win on large repositories. perffilefoldmap:

On Linux/gcc, on a test repo with over 500,000 files:
before: wall 0.605021 comb 0.600000 user 0.560000 sys 0.040000 (best of 17)
after:  wall 0.280530 comb 0.280000 user 0.250000 sys 0.030000 (best of 35)

On Mac OS X/clang, on a real-world repo with over 200,000 files:
before: wall 0.281103 comb 0.280000 user 0.260000 sys 0.020000 (best of 34)
after:  wall 0.133622 comb 0.140000 user 0.120000 sys 0.020000 (best of 65)

This visibly impacts status times on case-insensitive file systems. On the Mac
OS X repo, status goes from 3.64 seconds to 3.50.

With the third-party hgwatchman extension [1], 'hg status' on the same repo
goes from 0.80 seconds to 0.65.

[1] https://bitbucket.org/facebook/hgwatchman
2015-04-01 00:44:33 -07:00
Matt Harbison
a3480284f9 dirstate: don't require exact case when adding dirs on icasefs (issue4578)
We don't require it when adding files on a case insensitive filesystem, so don't
require it to add directories for consistency.

The problem with the previous code was that _walkexplicit() was only returning
the normalized directory.  The file(s) in the directory are then appended, and
passed to the matcher.  But if the user asks for 'capsdir1/capsdir', the matcher
will not accept 'CapsDir1/CapsDir/AbC.txt', and the name is dropped.  Matching
based on the non-normalized name is required.

If not normalizing, skip the extra string building for efficiency.  '.' is
replaced with '' so that the path being tested when no file is specified, isn't
prefixed with './' (and therefore fail the match).
2015-03-31 11:11:39 -04:00
Siddharth Agarwal
42e4ffbf78 dirstate._normalize: don't construct dirfoldmap if not necessary
Constructing the dirfoldmap is expensive, so if there's a hit in the
filefoldmap, don't construct the directory foldmap.

This helps with cases like 'hg add foo' where foo is already tracked: for a
large repository, the operation goes from 1.5 seconds to 1.2 (which is still
way too much, but that's a matter for another day.)
2015-03-31 19:34:37 -07:00
Siddharth Agarwal
d1935a56bf dirstate.walk: don't keep track of normalized files in parallel
Rev cce24e8019c8 changed the semantics of the work list to store (normalized,
non-normalized) pairs. All the tuple creation and destruction hurts perf: on a
large repo on OS X, 'hg status' went from 3.62 seconds to 3.78.

It also is unnecessary in most cases:
- it is clearly unnecessary on case-sensitive filesystems.
- it is also unnecessary when filenames have been read off of disk rather than
  being supplied by the user.

The only case where the non-normalized case is required at all is when the file
is unknown.

To eliminate most of the perf cost, keep trace of whether the directory needs
to be normalized at all with a boolean called 'alreadynormed'. Pay the cost of
directory normalization only when necessary.

For the above large repo, 'hg status' goes to 3.63 seconds.
2015-03-31 19:29:39 -07:00
Siddharth Agarwal
b8549b5bd2 dirstate.walk: factor out directory traversal
This function will be used in upcoming patches.
2015-03-31 19:18:27 -07:00
Siddharth Agarwal
77486fd514 dirstate: fix order of initializing nf vs f
Result of a bad merge.
2015-03-31 15:41:02 -07:00
Matt Mackall
91b7caa71a merge with stable 2015-03-31 16:14:14 -05:00
Siddharth Agarwal
5c1db53305 dirstate.walk: use the file foldmap to normalize
Computing the set of directories in the dirstate is expensive. It turns out
that it isn't necessary for operations like 'hg status' at all.

Why? Consider the file 'foo/bar' on disk, which is represented in the dirstate
as 'FOO/BAR'.

On 'hg status', we'd walk down the directory tree, coming across 'foo' first.

Before: we'd normalize 'foo' to 'FOO', then add 'FOO' to our visited stack.
We'd then visit 'FOO', finding the file 'bar'. We'd normalize 'FOO/bar' to
'FOO/BAR', then add it to the results dict.

After: we wouldn't normalize 'foo' at all. We'd add it to our visited stack,
then visit 'foo', finding the file 'bar'. We'd normalize 'foo/bar' to
'FOO/BAR', then add it to the results dict.

So whether we normalize intermediate directories or not actually makes no
difference in most cases.

The only case where normalization matters at all is if a file is replaced with
a directory with the same case-folded name. In that case we can do a relatively
cheap file normalization instead and still get away with not computing the set
of directories.

This is a nice boost in status performance. On OS X with case-insensitive HFS+,
for a large repo with over 200,000 files, this brings down 'hg status' from
4.00 seconds to 3.62.
2015-03-29 19:47:16 -07:00
Siddharth Agarwal
efae199860 dirstate: split the foldmap into separate ones for files and directories
Computing the set of directories in the dirstate can be pretty expensive. For
'hg status' without arguments, it turns out we actually never need to figure
out the right case for directories in the foldmap. (An upcoming patch explains
why.)

This patch splits up the directory and file maps into separate ones, allowing
for the subsequent optimization in status.
2015-03-29 19:42:49 -07:00
Siddharth Agarwal
15b2067928 dirstate: introduce function to normalize just filenames
This will be used in upcoming patches to stop generating the set of directories
in many common cases.
2015-03-28 18:53:54 -07:00
Siddharth Agarwal
28d1a1fb85 dirstate: factor out code to discover normalized path
In upcoming patches we're going to reuse this code. The storemap is currently
always the foldmap, but will vary in future patches.
2015-03-29 19:23:05 -07:00
Siddharth Agarwal
fe84899cf2 dirstate._walkexplicit: don't bother normalizing '.'
The overwhelmingly common case is running commands like 'hg diff' with no
arguments. Therefore the only file that'll be listed is the root directory.
Normalizing that's just a waste of time.

This means that for a plain 'hg diff' we'll never need to construct the
foldmap, saving us a significant chunk of time.

On case-insensitive HFS+ on OS X, for a large repository with over 200,000
files, this brings down 'hg diff' from 2.97 seconds to 2.36.
2015-03-29 18:28:48 -07:00
Siddharth Agarwal
8adc467907 dirstate._walkexplicit: drop normpath calls
The paths the matcher returns are normalized already.
2015-03-29 23:28:30 -07:00
Siddharth Agarwal
0d03b12a57 dirstate._walkexplicit: indicate root as '.', not ''
'.' is the canonical way to represent the root, and it's apparently the only
transformation that normpath makes.
2015-03-29 23:27:25 -07:00
Yuya Nishihara
d646851eb8 dirstate: make sure rootdir ends with directory separator (issue4557)
ntpath.join() of Python 2.7.9 does not work as expected if root is a UNC path
to top of share.

This patch doesn't take care of os.altsep, '/' on Windows, because root should
be normalized by realpath().
2015-03-06 00:14:22 +09:00
Mads Kiilerich
3c0558f97b dirstate: ignore negative debug.dirstate.delaywrite values - they crashed it
Sleep can only travel forward in time, not back.
2015-01-14 01:15:26 +01:00
Martin von Zweigbergk
8874cd66a5 match: add isexact() method to hide internals
Comparing a function reference seems bad.
2014-10-29 08:43:39 -07:00
Matt Mackall
da0586aaf9 merge with stable 2015-03-05 15:52:07 -06:00
Mads Kiilerich
e51a6aa2aa dirstate: clarify comment about leaving normal files undef if changed 'now'
Clarify that they only are saved as undef if they were marked as normal and
changed in the same second.
2015-01-14 01:15:26 +01:00
Siddharth Agarwal
5bc4775669 ignore: resolve ignore files relative to repo root (issue4473) (BC)
Previously these would be considered to be relative to the current working
directory. That behavior is both undocumented and doesn't really make sense.
There are two reasonable options for how to resolve relative paths:
- relative to the repo root
- relative to the config file

Resolving these files relative to the repo root matches existing behavior with
hooks. An earlier discussion about this is available at
http://mercurial.markmail.org/thread/tvu7yhzsiywgkjzl.

Thanks to Isaac Jurado <diptongo@gmail.com> for the initial patchset that
spurred the discussion.
2014-12-16 14:34:53 -08:00
Pierre-Yves David
dd01dca5ec dirstate: use the 'nogc' decorator
Now that we have a generic way to disable the gc, we use it. however, we have too
use it in a baroque way. See inline comment for details.
2014-12-04 05:43:15 -08:00
Martin von Zweigbergk
c71ba3444e dirstate: speed up repeated missing directory checks
In a mozilla repo with tip at bb3ff09f52fe,

  hg update tip~1000 && time hg revert -nq -r tip .

displays ~4:20 minutes. With tip~100, it runs in ~11 s. With revision
100000, it did not finish in 12 minutes.

Revert calls dirstate.status() with a matcher that matches each file
in the target revision. The main problem [1] lies in
dirstate._walkexplicit(), which looks for matching deleted directories
by checking whether each path is prefix of any path in the
dirstate. With m files in the dirstate and n files in the target
revision that are not in the dirstate, this is clearly O(m*n). Let's
improve by keeping a lazily initialized set of all the directories in
the dirstate, so the time becomes O(m+n).

After this patch, the 4:20 minutes become 5.5 s, while for a single
missing path, it slows down from 1.092 s to 1.150 s (best of 4). The
>12 min case becomes 5.8 s.

 [1] A narrower optimization would be to make revert take the fast
     path for '.' and '--all'.
2014-11-19 23:15:07 -08:00
Martin von Zweigbergk
8b968ecfe2 status: update and move documentation of status types to status class
The various status types are currently documented on the
dirstate.status() method. Now that we have a class for the status
types, it makese sense to document the status types there
instead. Only leave the bits related to lookup/unsure in the status()
method documentation.
2014-10-10 10:14:35 -07:00
Martin von Zweigbergk
41a4138ec7 status: create class for status lists
Callers of various status() methods (on dirstate, context, repo) get a
tuple of 7 elements, where each element is a list of files. This
results in lots of uses of indexes where names would be much more
readable. For example, "status.ignored" seems clearer than "status[4]"
[1]. So, let's introduce a simple named tuple containing the 7 status
fields: modified, added, removed, deleted, unknown, ignored, clean.

This patch introduces the class and updates the status methods to
return instances of it. Later patches will update the callers.

 [1] Did you even notice that it should have been "status[5]"?

(tweaked by mpm to introduce the class in scmutil and only change one user)
2014-10-10 14:32:36 -07:00
Martin von Zweigbergk
1a4e0a3d51 dirstate: separate 'lookup' status field from others
The status tuple returned from dirstate.status() has an additional
field compared to the other status tuples: lookup/unsure. This field
is just an optimization and not something most callers care about
(they want the resolved value of 'modified' or 'clean'). To prepare
for a single future status type, let's separate out the 'lookup' field
from the rest by having dirstate.status() return a pair: (lookup,
status).
2014-10-03 21:44:10 -07:00
Matt Mackall
8e8234eecc dirstate: merge falls through to otherparent
This lets us more correctly fix the state when we use setparents, as
demonstrated in the change in test-graft.t.
2014-10-11 14:05:09 -05:00
Matt Mackall
f7a8e82c62 dirstate: use 'm' state in otherparent to reduce ambiguity
In rebase-like operations where we abandon the second parent, we can
correctly fix up the state in setparents.
2014-10-10 13:31:06 -05:00
Matt Mackall
a44416ab0f dirstate: properly clean-up some more merge state on setparents 2014-10-10 13:05:50 -05:00
Siddharth Agarwal
0b13ea03ab dirstate: cache util.normcase while constructing the foldmap
This is a small win on OS X. hg perfdirstatefoldmap:

before: wall 0.399708 comb 0.410000 user 0.390000 sys 0.020000 (best of 25)
after:  wall 0.386331 comb 0.390000 user 0.370000 sys 0.020000 (best of 25)
2014-10-03 18:48:09 -07:00
Siddharth Agarwal
1394c04c63 dirstate: copyedit exception for no beginparentchange call 2014-09-17 13:08:03 -07:00
Durham Goode
1d292c944f dirstate: add exception when calling setparent without begin/end (API)
Adds an exception when calling dirstate.setparent without having first called
dirstate.beginparentchange. This will prevent people from writing code that
modifies the dirstate parent without considering the transactionality of their
change.

This will break third party extensions that call setparents.
2014-09-05 11:37:44 -07:00
Durham Goode
7a67b9c913 dirstate: add begin/endparentchange to dirstate
It's possible for the dirstate to become incoherent (issue4353) if there is an
exception in the middle of the dirstate parent and entries being written (like
if the user ctrl+c's). This change adds begin/endparentchange which a future
patch will require to be set before changing the dirstate parent.  This will
allow us to prevent writing the dirstate in the event of an exception while
changing the parent.
2014-09-05 11:34:29 -07:00
Siddharth Agarwal
e621257516 dirstate: add a method to efficiently filter by match
Current callers that require just this data call workingctx.walk, which calls
dirstate.walk, which stats all the files. Even worse, workingctx.walk looks for
unknown files, significantly slowing things down, even though callers might not
be interested in them at all.
2014-08-01 22:05:16 -07:00
FUJIWARA Katsunori
2fdcc0328a dirstate: delay writing out to ensure timestamp of each entries explicitly
Even though "dirstate.write()" is invoked explicitly after "normal"
invocations, timestamp field of entries may be still "unset" in the
"dirstate" file itself , because "pack_dirstate" drops it when it is
equal to the timestamp of "dirstate" file itself.

This can avoid overlooking modification of files, which are updated at
same time in the second. But on the other hand, this may hide timing
critical problems.

For example, incorrect "normal"-ing (or lack of "normallookup"-ing on
the already "normal"-ed entry) is visible only when:

  - the target file is modified in the working directory at T1, and
  - "dirstate" file is written out at T2 (!= T1)

Otherwise, T1 is dropped by "pack_dirstate" in "dirstate.write()"
invocation, and "unset" is stored into "dirstate" file.

It often fails to reproduce problems from incorrect "normal"-ing by
Mercurial testset, because automated actions in the small repository
almost always causes that T1 and T2 are same.

This patch adds the debug feature to delay writing out to ensure
timestamp of each entries explicitly.

This feature is used to make timing critical "dirstate" problems
reproducable in subsequent patches.
2014-07-22 23:59:30 +09:00
Siddharth Agarwal
337e9e8db0 dirstate.status: assign members one by one instead of unpacking the tuple
With this patch, hg status and hg diff regain their previous speed.

The following tests are run against a working copy with over 270,000 files.
Here, 'before' means without this or the previous patch applied.

Note that in this case `hg perfstatus` isn't representative since it doesn't
take dirstate parsing time into account.

$ time hg status  # best of 5
before: 2.03s user 1.25s system 99% cpu 3.290 total
after:  2.01s user 1.25s system 99% cpu 3.261 total

$ time hg diff    # best of 5
before: 1.32s user 0.78s system 99% cpu 2.105 total
after:  1.27s user 0.79s system 99% cpu 2.066 total
2014-05-27 21:02:16 -07:00
Siddharth Agarwal
f40a94a790 parsers: inline fields of dirstate values in C version
Previously, while unpacking the dirstate we'd create 3-4 new CPython objects
for most dirstate values:

- the state is a single character string, which is pooled by CPython
- the mode is a new object if it isn't 0 due to being in the lookup set
- the size is a new object if it is greater than 255
- the mtime is a new object if it isn't -1 due to being in the lookup set
- the tuple to contain them all

In some cases such as regular hg status, we actually look at all the objects.
In other cases like hg add, hg status for a subdirectory, or hg status with the
third-party hgwatchman enabled, we look at almost none of the objects.

This patch eliminates most object creation in these cases by defining a custom
C struct that is exposed to Python with an interface similar to a tuple. Only
when tuple elements are actually requested are the respective objects created.

The gains, where they're expected, are significant. The following tests are run
against a working copy with over 270,000 files.

parse_dirstate becomes significantly faster:

$ hg perfdirstate
before: wall 0.186437 comb 0.180000 user 0.160000 sys 0.020000 (best of 35)
after:  wall 0.093158 comb 0.100000 user 0.090000 sys 0.010000 (best of 95)

and as a result, several commands benefit:

$ time hg status  # with hgwatchman enabled
before: 0.42s user 0.14s system 99% cpu 0.563 total
after:  0.34s user 0.12s system 99% cpu 0.471 total

$ time hg add new-file
before: 0.85s user 0.18s system 99% cpu 1.033 total
after:  0.76s user 0.17s system 99% cpu 0.931 total

There is a slight regression in regular status performance, but this is fixed
in an upcoming patch.
2014-05-27 14:27:41 -07:00
Siddharth Agarwal
64ffd83be6 dirstate: add dirstatetuple to create dirstate values
Upcoming patches will switch away from using Python tuples for dirstate values
in compiled builds.  Make that easier by introducing a variable called
dirstatetuple, currently set to tuple. In upcoming patches, this will be set to
an object from the parsers module.
2014-05-27 17:10:28 -07:00
Mads Kiilerich
eb39238a97 dirstate: report bad subdirectories as match.bad, not just a warning (BC)
This seems simpler and more correct.

The only test coverage for this is test-permissions.t when it says:
  dir: Permission denied
2013-10-03 18:01:21 +02:00
Mads Kiilerich
67a2ed988d dirstate: improve documentation and readability of match and ignore in the walker 2013-10-03 18:01:21 +02:00
Mads Kiilerich
020377d078 dirstate: inline local finish function
Having it as a local function adds no value.
2013-04-27 23:19:52 +02:00
Yuya Nishihara
0cf2db5155 dirstate: remove double imports of errno 2014-03-03 15:50:41 +09:00
Pierre-Yves David
b563457396 rebase: do not crash in panic when cwd disapear in the process (issue4121)
Before this patch rebase crashed badly when it happend. (not abort, crash).

Fix courtesy of Matt Mackall.
2014-01-31 15:13:15 -08:00
Augie Fackler
213fff305a pathutil: tease out a new library to break an import cycle from canonpath use 2013-11-06 18:19:04 -05:00
Siddharth Agarwal
7decbaaad4 dirstate.status: return explicit unknown files even when not asked
dirstate.walk will return unknown files that were explicitly requested, even
if listunknown is false. There's no point in dropping these files on the
floor in dirstate.status.

This has no effect on any current callers, because all of them assume the
unknown list is empty and ignore it. Future callers may find it useful,
though.
2013-10-14 00:25:29 -04:00
Siddharth Agarwal
d8f5823b88 dirstate.status: don't ignore symlink placeholders in the normal set
On Windows, there are two ways symlinks can manifest themselves:
1. As placeholders: text files containing the symlink's target. This is what
   usually happens with fresh clones on Windows.
2. With their dereferenced contents. This happens with clones accessed over NFS
   or Samba.

In order to handle case 2, 28af3e0c54f0 made dirstate.status ignore all symlink
placeholders on Windows. It doesn't ignore symlinks in the lookup set, though,
since those don't have the link bit set. This is problematic because it
violates the invariant that `hg status` with every file in the normal set
produces the same output as `hg status` with every file in the lookup set.

With this change, symlink placeholders in the normal set are no longer ignored.
We instead rely on code in localrepo.status that uses heuristics to look for
suspect placeholders.

An upcoming patch will test this out by no longer adding files written in the
last second of an update to the lookup set.
2013-08-31 10:20:15 -07:00
Matt Mackall
0e295e1642 merge with stable 2013-05-17 17:22:08 -05:00
Matt Mackall
870fe2303d dirstate: don't overnormalize for ui.slash
This should fix the issue exposed by debugpathcomplete on the buildbot.
2013-05-17 14:31:06 -05:00