Commit Graph

375 Commits

Author SHA1 Message Date
Durham Goode
1d292c944f dirstate: add exception when calling setparent without begin/end (API)
Adds an exception when calling dirstate.setparent without having first called
dirstate.beginparentchange. This will prevent people from writing code that
modifies the dirstate parent without considering the transactionality of their
change.

This will break third party extensions that call setparents.
2014-09-05 11:37:44 -07:00
Durham Goode
7a67b9c913 dirstate: add begin/endparentchange to dirstate
It's possible for the dirstate to become incoherent (issue4353) if there is an
exception in the middle of the dirstate parent and entries being written (like
if the user ctrl+c's). This change adds begin/endparentchange which a future
patch will require to be set before changing the dirstate parent.  This will
allow us to prevent writing the dirstate in the event of an exception while
changing the parent.
2014-09-05 11:34:29 -07:00
Siddharth Agarwal
e621257516 dirstate: add a method to efficiently filter by match
Current callers that require just this data call workingctx.walk, which calls
dirstate.walk, which stats all the files. Even worse, workingctx.walk looks for
unknown files, significantly slowing things down, even though callers might not
be interested in them at all.
2014-08-01 22:05:16 -07:00
FUJIWARA Katsunori
2fdcc0328a dirstate: delay writing out to ensure timestamp of each entries explicitly
Even though "dirstate.write()" is invoked explicitly after "normal"
invocations, timestamp field of entries may be still "unset" in the
"dirstate" file itself , because "pack_dirstate" drops it when it is
equal to the timestamp of "dirstate" file itself.

This can avoid overlooking modification of files, which are updated at
same time in the second. But on the other hand, this may hide timing
critical problems.

For example, incorrect "normal"-ing (or lack of "normallookup"-ing on
the already "normal"-ed entry) is visible only when:

  - the target file is modified in the working directory at T1, and
  - "dirstate" file is written out at T2 (!= T1)

Otherwise, T1 is dropped by "pack_dirstate" in "dirstate.write()"
invocation, and "unset" is stored into "dirstate" file.

It often fails to reproduce problems from incorrect "normal"-ing by
Mercurial testset, because automated actions in the small repository
almost always causes that T1 and T2 are same.

This patch adds the debug feature to delay writing out to ensure
timestamp of each entries explicitly.

This feature is used to make timing critical "dirstate" problems
reproducable in subsequent patches.
2014-07-22 23:59:30 +09:00
Siddharth Agarwal
337e9e8db0 dirstate.status: assign members one by one instead of unpacking the tuple
With this patch, hg status and hg diff regain their previous speed.

The following tests are run against a working copy with over 270,000 files.
Here, 'before' means without this or the previous patch applied.

Note that in this case `hg perfstatus` isn't representative since it doesn't
take dirstate parsing time into account.

$ time hg status  # best of 5
before: 2.03s user 1.25s system 99% cpu 3.290 total
after:  2.01s user 1.25s system 99% cpu 3.261 total

$ time hg diff    # best of 5
before: 1.32s user 0.78s system 99% cpu 2.105 total
after:  1.27s user 0.79s system 99% cpu 2.066 total
2014-05-27 21:02:16 -07:00
Siddharth Agarwal
f40a94a790 parsers: inline fields of dirstate values in C version
Previously, while unpacking the dirstate we'd create 3-4 new CPython objects
for most dirstate values:

- the state is a single character string, which is pooled by CPython
- the mode is a new object if it isn't 0 due to being in the lookup set
- the size is a new object if it is greater than 255
- the mtime is a new object if it isn't -1 due to being in the lookup set
- the tuple to contain them all

In some cases such as regular hg status, we actually look at all the objects.
In other cases like hg add, hg status for a subdirectory, or hg status with the
third-party hgwatchman enabled, we look at almost none of the objects.

This patch eliminates most object creation in these cases by defining a custom
C struct that is exposed to Python with an interface similar to a tuple. Only
when tuple elements are actually requested are the respective objects created.

The gains, where they're expected, are significant. The following tests are run
against a working copy with over 270,000 files.

parse_dirstate becomes significantly faster:

$ hg perfdirstate
before: wall 0.186437 comb 0.180000 user 0.160000 sys 0.020000 (best of 35)
after:  wall 0.093158 comb 0.100000 user 0.090000 sys 0.010000 (best of 95)

and as a result, several commands benefit:

$ time hg status  # with hgwatchman enabled
before: 0.42s user 0.14s system 99% cpu 0.563 total
after:  0.34s user 0.12s system 99% cpu 0.471 total

$ time hg add new-file
before: 0.85s user 0.18s system 99% cpu 1.033 total
after:  0.76s user 0.17s system 99% cpu 0.931 total

There is a slight regression in regular status performance, but this is fixed
in an upcoming patch.
2014-05-27 14:27:41 -07:00
Siddharth Agarwal
64ffd83be6 dirstate: add dirstatetuple to create dirstate values
Upcoming patches will switch away from using Python tuples for dirstate values
in compiled builds.  Make that easier by introducing a variable called
dirstatetuple, currently set to tuple. In upcoming patches, this will be set to
an object from the parsers module.
2014-05-27 17:10:28 -07:00
Mads Kiilerich
eb39238a97 dirstate: report bad subdirectories as match.bad, not just a warning (BC)
This seems simpler and more correct.

The only test coverage for this is test-permissions.t when it says:
  dir: Permission denied
2013-10-03 18:01:21 +02:00
Mads Kiilerich
67a2ed988d dirstate: improve documentation and readability of match and ignore in the walker 2013-10-03 18:01:21 +02:00
Mads Kiilerich
020377d078 dirstate: inline local finish function
Having it as a local function adds no value.
2013-04-27 23:19:52 +02:00
Yuya Nishihara
0cf2db5155 dirstate: remove double imports of errno 2014-03-03 15:50:41 +09:00
Pierre-Yves David
b563457396 rebase: do not crash in panic when cwd disapear in the process (issue4121)
Before this patch rebase crashed badly when it happend. (not abort, crash).

Fix courtesy of Matt Mackall.
2014-01-31 15:13:15 -08:00
Augie Fackler
213fff305a pathutil: tease out a new library to break an import cycle from canonpath use 2013-11-06 18:19:04 -05:00
Siddharth Agarwal
7decbaaad4 dirstate.status: return explicit unknown files even when not asked
dirstate.walk will return unknown files that were explicitly requested, even
if listunknown is false. There's no point in dropping these files on the
floor in dirstate.status.

This has no effect on any current callers, because all of them assume the
unknown list is empty and ignore it. Future callers may find it useful,
though.
2013-10-14 00:25:29 -04:00
Siddharth Agarwal
d8f5823b88 dirstate.status: don't ignore symlink placeholders in the normal set
On Windows, there are two ways symlinks can manifest themselves:
1. As placeholders: text files containing the symlink's target. This is what
   usually happens with fresh clones on Windows.
2. With their dereferenced contents. This happens with clones accessed over NFS
   or Samba.

In order to handle case 2, 28af3e0c54f0 made dirstate.status ignore all symlink
placeholders on Windows. It doesn't ignore symlinks in the lookup set, though,
since those don't have the link bit set. This is problematic because it
violates the invariant that `hg status` with every file in the normal set
produces the same output as `hg status` with every file in the lookup set.

With this change, symlink placeholders in the normal set are no longer ignored.
We instead rely on code in localrepo.status that uses heuristics to look for
suspect placeholders.

An upcoming patch will test this out by no longer adding files written in the
last second of an update to the lookup set.
2013-08-31 10:20:15 -07:00
Matt Mackall
0e295e1642 merge with stable 2013-05-17 17:22:08 -05:00
Matt Mackall
870fe2303d dirstate: don't overnormalize for ui.slash
This should fix the issue exposed by debugpathcomplete on the buildbot.
2013-05-17 14:31:06 -05:00
Durham Goode
c5a04c46b0 hgignore: fix regression with hgignore directory matches (issue3921)
If a directory matched a regex in hgignore but the files inside the directory
did not match the regex, they would appear as deleted in hg status. This
change fixes them to appear normally in hg status.

Removing the ignore(nf) conditional here is ok because it just means we might
stat more files than we had before. My testing on a large repo shows this
causes no performance regression since the only additional files being stat'd
are the ones that are missing (i.e. status=!), which are generally rare.
2013-05-03 09:44:50 -07:00
FUJIWARA Katsunori
364377a413 icasefs: ignore removed files at building "dirstate._foldmap" up on icasefs
Before this patch, all files in dirstate are used to build "_foldmap"
up on case insensitive filesystem regardless of their statuses.

For example, when dirstate contains both removed file 'a' and added
file 'A', "_foldmap" may be updated finally by removed file 'a'. This
causes unexpected status information for added file 'A' at "hg status"
invocation.

This patch ignores removed files at building "_foldmap" up on case
insensitive filessytem.

This patch doesn't add any test, because this issue is difficult to
reproduce intentionally: it depends on iteration order of "dirstate._map".
2013-04-30 05:00:48 +09:00
Siddharth Agarwal
65a08dce5b dirstate.status: avoid full walks when possible 2013-04-23 14:16:33 -07:00
Siddharth Agarwal
b35c53428c dirstate.walk: add a flag to let extensions avoid full walks
Consider a hypothetical extension that implements walk in a more efficient
manner and skips some known-clean files. However, that can only be done under
some situations, such as when clean files are not being asked for and a
match.traversedir callback is not set. The full flag lets walk tell these two
cases apart.
2013-04-22 17:11:18 -07:00
Siddharth Agarwal
d9060ea694 dirstate._walkexplicit: inline dirsnotfound.append 2013-05-07 14:20:34 -07:00
Siddharth Agarwal
c1be698d70 dirstate._walkexplicit: rename work to dirsfound
Now that this code is factored out, work is too specific a name.
2013-05-07 14:19:04 -07:00
Siddharth Agarwal
819d02613a dirstate.walk: refactor explicit walk into separate function
This enables this code to be reused by extensions that implement the other,
more time-consuming bits of walk in different ways.
2013-05-07 10:02:55 -07:00
Siddharth Agarwal
b485dd87ab dirstate.walk: pull skipstep3 out of the explicit walk code
This is a move towards factoring out this code into a separate function.
2013-05-07 09:31:00 -07:00
Siddharth Agarwal
c0d29e5fe5 dirstate.walk: move dirignore filter out of explicit walk code
This is a move towards factoring this code out into a separate function.
2013-05-07 09:47:10 -07:00
Siddharth Agarwal
a143c1d00c dirstate.walk: maintain a list of dirs not found
Upcoming patches will factor out the walk over explicit files done in step 1.
This helps us get there.
2013-05-07 09:29:43 -07:00
Siddharth Agarwal
8e7066e5f2 match: make explicitdir and traversedir None by default
With this, extensions can easily tell when traversedir and/or explicitdir don't
need to be called.
2013-05-03 14:41:58 -07:00
Siddharth Agarwal
053c6e8fe3 dirstate.walk: cache match.explicitdir and traversedir locally 2013-05-03 14:39:28 -07:00
Siddharth Agarwal
cad3e8cacb dirstate.walk: call match.explicitdir or traversedir as appropriate 2013-04-28 21:25:41 -07:00
Bryan O'Sullivan
4a4a5dde94 scmutil: use new dirs class in dirstate and context
The multiset-of-directories code was open coded in each of these
modules; this change gets rid of the duplication.
2013-04-10 15:08:26 -07:00
Bryan O'Sullivan
00ff38c9c3 scmutil: migrate finddirs from dirstate 2013-04-10 15:08:25 -07:00
Bryan O'Sullivan
6f6047415e dirstate: only call lstat once per flags invocation
This makes a big difference to performance in some cases.

hg --time locate 'set:symlink()'

mozilla-central (70,000 files):

  before: 2.92 sec
  after:  2.47

another repo (170,000 files):

  before: 7.87 sec
  after:  6.86
2013-04-03 11:35:27 -07:00
Siddharth Agarwal
a6bf485dee dirstate.walk: fast path none-seen + match-always case for step 3
This case is a common one -- e.g. `hg diff`.

For a repository with 170,000 files, this speeds up perfstatus from 0.95
seconds to 0.88.
2013-03-22 17:03:49 -07:00
Siddharth Agarwal
5e52c298d1 dirstate.walk: fast path match-always case during traversal
This case is a common one -- e.g. `hg status`.

For a repository with 170,000 files, this speeds up perfstatus --unknown from
2.15 seconds to 2.09.
2013-03-22 17:03:00 -07:00
Siddharth Agarwal
68d46b6a0a dirstate.walk: remove subrepo and .hg from results before step 3
An upcoming patch will speed dirstate.walk up by not querying the results dict
when it is empty. This ensures it is in some common cases.

This should be safe because subrepos and .hg aren't part of the dirstate.
2013-03-25 14:12:39 -07:00
Bryan O'Sullivan
d521a241b2 completion: add a debugpathcomplete command
The bash_completion code uses "hg status" to generate a list of
possible completions for commands that operate on files in the
working directory. In a large working directory, this can result
in a single tab-completion being very slow (several seconds) as a
result of checking the status of every file, even when there is no
need to check status or no possible matches.

The new debugpathcomplete command gains performance in a few simple
ways:

* Allow completion to operate on just a single directory. When used
  to complete the right commands, this considerably reduces the
  number of completions returned, at no loss in functionality.

* Never check the status of files. For completions that really must
  know if a file is modified, it is faster to use status:

  hg status -nm 'glob:myprefix**'

Performance:

Here are the commands used by bash_completion to complete, run in
the root of the mozilla-central working dir (~77,000 files) and
another repo (~165,000 files):

All "normal state" files (used by e.g. remove, revert):

                            mozilla    other
  status -nmcd 'glob:**'       1.77     4.10 sec
  debugpathcomplete -f -n      0.53     1.26
  debugpathcomplete -n         0.17     0.41

("-f" means "complete full paths", rather than the current directory)

Tracked files matching "a":

                            mozilla    other
  status -nmcd 'glob:a**'      0.26     0.47
  debugpathcomplete -f -n a    0.10     0.24
  debugpathcomplete -n a       0.10     0.22

We should be able to further improve completion performance once
the critbit work lands. Right now, our performance is limited by
the need to iterate over all keys in the dirstate.
2013-03-21 16:31:28 -07:00
Durham Goode
83b3faf2ec strip: make --keep option not set all dirstate times to 0
hg strip -k was using dirstate.rebuild() which reset all the dirstate
entries timestamps to 0.  This meant that the next time hg status was
run every file was considered to be 'unsure', which caused it to do
expensive read operations on every filelog. On a repo with >150,000
files it took 70 seconds when everything was in memory.  From a cold
cache it took several minutes.

The fix is to only reset files that have changed between the working
context and the destination context.

For reference, --keep means the working directory is left alone during
the strip. We have users wanting to use this operation to store their
work-in-progress as a commit on a branch while they go work on another
branch, then come back later and be able to uncommit that work and
continue working.  They currently use 'git reset HARD^' to accomplish
this in git.
2013-03-06 20:13:09 -08:00
Durham Goode
5317973684 dirstate: fix generator/list error when using python 2.7
util.statfiles returns a generator on python 2.7 with c extensions disabled.
This caused util.statfiles(...) [0] to break in tests. Since we're only
stat'ing one file, I just changed it to call lstat directly.
2013-02-10 12:23:39 -08:00
Benoit Boissinot
431f8fbd16 merge crew and main 2013-02-11 01:21:24 +01:00
Siddharth Agarwal
98fa475598 dirstate: disable gc while parsing the dirstate
This prevents a performance regression an upcoming patch would otherwise
introduce because it indirectly delays parsing the dirstate a bit.
2013-02-10 16:23:14 +00:00
Durham Goode
b4e95fed5e dirstate: walk returns None for files that have a symlink in their path
Previously dirstate.walk would return a stat object for files in the dmap
that have a symlink to a directory in their path.  Now it will return None
to indicate that they are no longer considered part of the repository. This
currently only affects walks that traverse the entire directory tree (ex:
hg status) and not walks that only list the contents of the dmap (ex: hg diff).

In a situation like this:
  mkdir foo && touch foo/a && hg commit -Am "a"
  mv foo bar
  ln -s bar foo

'hg status' will now show '! foo/a', whereas before it incorrectly considered
'foo/a' to be unchanged.

In addition to making 'hg status' report the correct information, this will
allow callers to dirstate.walk to not have to detect symlinks themselves,
which can be very expensive.
2013-02-04 14:27:15 -08:00
Siddharth Agarwal
9334236621 dirstate: move pure python dirstate packing to pure/parsers.py 2013-01-17 23:46:08 -08:00
Idan Kamara
dd240808d9 dirstate: refresh _branch cache entry after writing it 2012-12-16 20:33:00 +02:00
Kevin Bullock
aaaf5fc704 merge with crew-stable 2012-12-16 23:02:54 -06:00
Tim Henigan
31b2db1033 dirstate: remove obsolete comment from setbranch
This comment should have been removed in 9d5893d31db9, when the call
to scmutil.checknewlabel was removed.
2012-11-29 08:44:54 -05:00
Idan Kamara
26e04e85c9 dirstate: don't rename branch file if writing it failed 2012-12-15 20:19:07 +02:00
Tim Henigan
68a1fc4488 update: allow update to existing branches with invalid names (issue3710)
Starting with 049792af94d6, users are no longer able to update a
working copy to a branch named with a "bad" character (such as ':').

Prior to v2.4, it was possible to create branch names using "bad"
characters, so this breaks backwards compatibility.

Mercurial must allow users to update to existing branches with bad
names.  However, it should continue to prevent the creation of new
branches with bad names.

A test was added to confirm that 'hg update' works as expected. The
test uses a bundled repo that was created with an earlier version of
Mercurial.
2012-11-27 08:47:35 -05:00
Siddharth Agarwal
cb69b98e9b dirstate: inline more properties and methods in status
hg perfstatus -u on a working directory with 170,000 files, without this
change:
! wall 1.839561 comb 1.830000 user 1.120000 sys 0.710000 (best of 6)

With this change:
! wall 1.804222 comb 1.790000 user 1.140000 sys 0.650000 (best of 6)

hg perfstatus on the same directory, without this change:
! wall 1.016609 comb 1.020000 user 0.670000 sys 0.350000 (best of 10)

With this change:
! wall 0.985573 comb 0.980000 user 0.650000 sys 0.330000 (best of 10)
2012-12-03 14:21:45 -08:00
Siddharth Agarwal
40de7ddbeb dirstate: test normalize is truthy instead of using a no-op lambda
hg perfstatus -u on a working directory with 170,000 files, without this
change:
! wall 1.869404 comb 1.850000 user 1.170000 sys 0.680000 (best of 6)

With this change:
! wall 1.839561 comb 1.830000 user 1.120000 sys 0.710000 (best of 6)
2012-12-04 10:29:18 -08:00