Commit Graph

345 Commits

Author SHA1 Message Date
Bryan O'Sullivan
4a4a5dde94 scmutil: use new dirs class in dirstate and context
The multiset-of-directories code was open coded in each of these
modules; this change gets rid of the duplication.
2013-04-10 15:08:26 -07:00
Bryan O'Sullivan
00ff38c9c3 scmutil: migrate finddirs from dirstate 2013-04-10 15:08:25 -07:00
Bryan O'Sullivan
6f6047415e dirstate: only call lstat once per flags invocation
This makes a big difference to performance in some cases.

hg --time locate 'set:symlink()'

mozilla-central (70,000 files):

  before: 2.92 sec
  after:  2.47

another repo (170,000 files):

  before: 7.87 sec
  after:  6.86
2013-04-03 11:35:27 -07:00
Siddharth Agarwal
a6bf485dee dirstate.walk: fast path none-seen + match-always case for step 3
This case is a common one -- e.g. `hg diff`.

For a repository with 170,000 files, this speeds up perfstatus from 0.95
seconds to 0.88.
2013-03-22 17:03:49 -07:00
Siddharth Agarwal
5e52c298d1 dirstate.walk: fast path match-always case during traversal
This case is a common one -- e.g. `hg status`.

For a repository with 170,000 files, this speeds up perfstatus --unknown from
2.15 seconds to 2.09.
2013-03-22 17:03:00 -07:00
Siddharth Agarwal
68d46b6a0a dirstate.walk: remove subrepo and .hg from results before step 3
An upcoming patch will speed dirstate.walk up by not querying the results dict
when it is empty. This ensures it is in some common cases.

This should be safe because subrepos and .hg aren't part of the dirstate.
2013-03-25 14:12:39 -07:00
Bryan O'Sullivan
d521a241b2 completion: add a debugpathcomplete command
The bash_completion code uses "hg status" to generate a list of
possible completions for commands that operate on files in the
working directory. In a large working directory, this can result
in a single tab-completion being very slow (several seconds) as a
result of checking the status of every file, even when there is no
need to check status or no possible matches.

The new debugpathcomplete command gains performance in a few simple
ways:

* Allow completion to operate on just a single directory. When used
  to complete the right commands, this considerably reduces the
  number of completions returned, at no loss in functionality.

* Never check the status of files. For completions that really must
  know if a file is modified, it is faster to use status:

  hg status -nm 'glob:myprefix**'

Performance:

Here are the commands used by bash_completion to complete, run in
the root of the mozilla-central working dir (~77,000 files) and
another repo (~165,000 files):

All "normal state" files (used by e.g. remove, revert):

                            mozilla    other
  status -nmcd 'glob:**'       1.77     4.10 sec
  debugpathcomplete -f -n      0.53     1.26
  debugpathcomplete -n         0.17     0.41

("-f" means "complete full paths", rather than the current directory)

Tracked files matching "a":

                            mozilla    other
  status -nmcd 'glob:a**'      0.26     0.47
  debugpathcomplete -f -n a    0.10     0.24
  debugpathcomplete -n a       0.10     0.22

We should be able to further improve completion performance once
the critbit work lands. Right now, our performance is limited by
the need to iterate over all keys in the dirstate.
2013-03-21 16:31:28 -07:00
Durham Goode
83b3faf2ec strip: make --keep option not set all dirstate times to 0
hg strip -k was using dirstate.rebuild() which reset all the dirstate
entries timestamps to 0.  This meant that the next time hg status was
run every file was considered to be 'unsure', which caused it to do
expensive read operations on every filelog. On a repo with >150,000
files it took 70 seconds when everything was in memory.  From a cold
cache it took several minutes.

The fix is to only reset files that have changed between the working
context and the destination context.

For reference, --keep means the working directory is left alone during
the strip. We have users wanting to use this operation to store their
work-in-progress as a commit on a branch while they go work on another
branch, then come back later and be able to uncommit that work and
continue working.  They currently use 'git reset HARD^' to accomplish
this in git.
2013-03-06 20:13:09 -08:00
Durham Goode
5317973684 dirstate: fix generator/list error when using python 2.7
util.statfiles returns a generator on python 2.7 with c extensions disabled.
This caused util.statfiles(...) [0] to break in tests. Since we're only
stat'ing one file, I just changed it to call lstat directly.
2013-02-10 12:23:39 -08:00
Benoit Boissinot
431f8fbd16 merge crew and main 2013-02-11 01:21:24 +01:00
Siddharth Agarwal
98fa475598 dirstate: disable gc while parsing the dirstate
This prevents a performance regression an upcoming patch would otherwise
introduce because it indirectly delays parsing the dirstate a bit.
2013-02-10 16:23:14 +00:00
Durham Goode
b4e95fed5e dirstate: walk returns None for files that have a symlink in their path
Previously dirstate.walk would return a stat object for files in the dmap
that have a symlink to a directory in their path.  Now it will return None
to indicate that they are no longer considered part of the repository. This
currently only affects walks that traverse the entire directory tree (ex:
hg status) and not walks that only list the contents of the dmap (ex: hg diff).

In a situation like this:
  mkdir foo && touch foo/a && hg commit -Am "a"
  mv foo bar
  ln -s bar foo

'hg status' will now show '! foo/a', whereas before it incorrectly considered
'foo/a' to be unchanged.

In addition to making 'hg status' report the correct information, this will
allow callers to dirstate.walk to not have to detect symlinks themselves,
which can be very expensive.
2013-02-04 14:27:15 -08:00
Siddharth Agarwal
9334236621 dirstate: move pure python dirstate packing to pure/parsers.py 2013-01-17 23:46:08 -08:00
Idan Kamara
dd240808d9 dirstate: refresh _branch cache entry after writing it 2012-12-16 20:33:00 +02:00
Kevin Bullock
aaaf5fc704 merge with crew-stable 2012-12-16 23:02:54 -06:00
Tim Henigan
31b2db1033 dirstate: remove obsolete comment from setbranch
This comment should have been removed in 9d5893d31db9, when the call
to scmutil.checknewlabel was removed.
2012-11-29 08:44:54 -05:00
Idan Kamara
26e04e85c9 dirstate: don't rename branch file if writing it failed 2012-12-15 20:19:07 +02:00
Tim Henigan
68a1fc4488 update: allow update to existing branches with invalid names (issue3710)
Starting with 049792af94d6, users are no longer able to update a
working copy to a branch named with a "bad" character (such as ':').

Prior to v2.4, it was possible to create branch names using "bad"
characters, so this breaks backwards compatibility.

Mercurial must allow users to update to existing branches with bad
names.  However, it should continue to prevent the creation of new
branches with bad names.

A test was added to confirm that 'hg update' works as expected. The
test uses a bundled repo that was created with an earlier version of
Mercurial.
2012-11-27 08:47:35 -05:00
Siddharth Agarwal
cb69b98e9b dirstate: inline more properties and methods in status
hg perfstatus -u on a working directory with 170,000 files, without this
change:
! wall 1.839561 comb 1.830000 user 1.120000 sys 0.710000 (best of 6)

With this change:
! wall 1.804222 comb 1.790000 user 1.140000 sys 0.650000 (best of 6)

hg perfstatus on the same directory, without this change:
! wall 1.016609 comb 1.020000 user 0.670000 sys 0.350000 (best of 10)

With this change:
! wall 0.985573 comb 0.980000 user 0.650000 sys 0.330000 (best of 10)
2012-12-03 14:21:45 -08:00
Siddharth Agarwal
40de7ddbeb dirstate: test normalize is truthy instead of using a no-op lambda
hg perfstatus -u on a working directory with 170,000 files, without this
change:
! wall 1.869404 comb 1.850000 user 1.170000 sys 0.680000 (best of 6)

With this change:
! wall 1.839561 comb 1.830000 user 1.120000 sys 0.710000 (best of 6)
2012-12-04 10:29:18 -08:00
Bryan O'Sullivan
09d0142d5e dirstate: avoid use of zip on big lists
In a clean working directory containing 170,000 tracked files, this
improves performance of "hg --time diff" from 1.69 seconds to 1.43.

This idea is due to Siddharth Agarwal.
2012-11-30 15:55:08 -08:00
Bryan O'Sullivan
a288702cfd dirstate: move file type filtering to its source
This prepares us to move to a much faster statfiles implementation on Unix.
2012-11-30 15:55:07 -08:00
Bryan O'Sullivan
2484b3b328 dirstate: handle dangling junctions on windows (issue2579) 2012-10-23 21:25:22 -07:00
Tim Henigan
e6d3f7258f dirstate: remove obsolete comment from setbranch
This comment should have been removed in 4d438984605c, when the call
to scmutil.checknewlabel was removed.
2012-11-29 08:44:54 -05:00
Kevin Bullock
13bf0fbbec scmutil: add bad character checking to checknewlabel
This factors out the checks from tags and bookmarks, and newly applies
the same prohibitions to branches. checknewlabel takes a new parameter,
kind, indicating the kind of label being checked.

Test coverage is added for all three types of labels.
2012-10-17 21:42:06 -05:00
Kevin Bullock
68efaf1422 dirstate: use scmutil.checknewlabel to check new branch name 2012-10-17 21:32:19 -05:00
Matt Mackall
38701c0855 dirstate: handle large dates and times with masking (issue2608)
Dates and times that are outside the 31-bit signed range are now
compared modulo 2^31. This should prevent it from behaving badly with
very large files or corrupt dates while still having a high
probability of detecting changes.
2012-10-08 17:50:42 -05:00
Matt Mackall
c49b56acb4 dirstate: drop assert 2012-07-16 16:19:53 -05:00
Adrian Buehlmann
3087eeedfb dirstate: eliminate redundant check parameter on _addpath()
state == 'a' implies check

I fail to see what the point of this check parameter is. Near as I can see,
the only _addpath call where it was set to True was in add(), but there, state
is 'a'.

This is a follow-up to 24a646d9943a.
2012-07-04 01:31:37 +02:00
Joshua Redstone
bb82354578 dirstate: factor common update code into _addpath
Factor update code common to all callers of _addpath into _addpath.
By centralizing the update code here, it provides one place to put
updates to new data structures - in a future patch.  It also removes
a few lines of duplicate code.
2012-06-18 08:06:42 -07:00
Bryan O'Sullivan
ce2a30609e parsers: add a C function to pack the dirstate
This is about 9 times faster than the Python dirstate packing code.
The relatively small speedup is due to the poor locality and memory
access patterns caused by traversing dicts and other boxed Python
values.
2012-05-30 12:55:33 -07:00
Brodie Rao
a7ef0a0cc5 cleanup: "not x in y" -> "x not in y" 2012-05-12 16:00:57 +02:00
Brodie Rao
d6a6abf2b0 cleanup: eradicate long lines 2012-05-12 15:54:54 +02:00
Patrick Mezard
2c65c226cf localrepo: add setparents() to adjust dirstate copies (issue3407)
The fix introduced in 3509b9cf8f86 was only partially successful. It is correct
to turn dirstate 'm' merge records into normal/dirty ones but copy records are
lost in the process. To adjust them as well, we need to look in the first
parent manifest to know which files were added and preserve only related
records. But the dirstate does not have access to changesets, the logic has to
moved at another level, in localrepo.
2012-04-29 22:25:55 +02:00
Patrick Mezard
68ff0fa6cf dirstate: preserve path components case on renames (issue3402)
The original issue was something like:

  $ hg init repo
  $ cd repo
  $ mkdir D
  $ echo a > D/a
  $ hg ci -Am adda
  adding D/a
  $ mv D temp
  $ mv temp d
  $ echo b > d/b
  $ hg add d/b
  adding D/b
  $ hg ci -m addb
  $ hg mv d/b d/c
  moving D/b to d/c
  $ hg st
  A d/c
  R D/b

Here we expected:

  A D/c
  R D/b

the logic being we try to preserve case of path components already known in the
dirstate. This is fixed by the current patch.

Note the following stories are not still not supported:

Changing directory case
  $ hg mv D d
  moving D/a to D/D/a
  moving D/b to D/D/b
  $ hg st
  A D/D/a
  A D/D/b
  R D/a
  R D/b

or:

  $ hg mv D/* d
  D/a: not overwriting - file exists
  D/b: not overwriting - file exists

And if they were, there are probably similar issues with diffing/patching.
2012-04-28 20:29:21 +02:00
Patrick Mezard
3c6f7e9f84 rebase: skip resolved but emptied revisions
When rebasing, if a conflict occurs and is resolved in a way the rebased
revision becomes empty, it is not skipped, unlike revisions being emptied
without conflicts.

The reason is:
- File 'x' is merged and resolved, merge.update() marks it as 'm' in the
  dirstate.
- rebase.concludenode() calls localrepo.commit(), which calls
  localrepo.status() which calls dirstate.status(). 'x' shows up as 'm' and is
  unconditionnally added to the modified files list, instead of being checked
  again.
- localrepo.commit() detects 'x' as changed an create a new revision where only
  the manifest parents and linkrev differ.

Marking 'x' as modified without checking it makes sense for regular merges. But
in rebase case, the merge looks normal but the second parent is usually
discarded. When this happens, 'm' files in dirstate are a bit irrelevant and
should be considered 'n' possibly dirty instead. That is what the current patch
does.

Another approach, maybe more efficient, would be to pass another flag to
merge.update() saying the 'branchmerge' is a bit of a lie and recordupdate()
should call dirstate.normallookup() instead of merge().

It is also tempting to add this logic to dirstate.setparents(), moving from two
to one parent is what invalidates the 'm' markers. But this is a far bigger
change to make.

v2: succumb to the temptation and move the logic in dirstate.setparents(). mpm
suggested trying _filecommit() first but it is called by commitctx() which
knows nothing about the dirstate and comes too late into the game. A second
approach was to rewrite the 'm' state into 'n' on the fly in dirstate.status()
which failed for graft in the following case:

  $ hg init repo
  $ cd repo
  $ echo a > a
  $ hg ci -qAm0
  $ echo a >> a
  $ hg ci -m1
  $ hg up 0
  1 files updated, 0 files merged, 0 files removed, 0 files unresolved
  $ hg mv a b
  $ echo c > b
  $ hg ci -m2
  created new head
  $ hg graft 1 --tool internal:local
  grafting revision 1
  $ hg --config extensions.graphlog= glog --template '{rev} {desc|firstline}\n'
  @  3 1
  |
  o  2 2
  |
  | o  1 1
  |/
  o  0 0

  $ hg log -r 3 --debug --patch --git --copies
  changeset:   3:19cd7d1417952af13161b94c32e901769104560c
  tag:         tip
  phase:       draft
  parent:      2:b5c505595c9e9a12d5dd457919c143e05fc16fb8
  parent:      -1:0000000000000000000000000000000000000000
  manifest:    3:3d27ce8d02241aa59b60804805edf103c5c0cda4
  user:        test
  date:        Thu Jan 01 00:00:00 1970 +0000
  extra:       branch=default
  extra:       source=a03df74c41413a75c0a42997fc36c2de97b26658
  description:
  1

Here, revision 3 is created because there is a copy record for 'b' in the
dirstate and thus 'b' is considered modified. But this information is discarded
at commit time since 'b' content is unchanged. I do not know if discarding this
information is correct or not, but at this time we cannot represent it anyway.

This patch therefore implements the last solution of moving the logic into
dirstate.setparents(). It does not sound crazy as 'm' files makes no sense with
only one parent. It also makes dirstate.merge() calls .lookupnormal() if there
is one parent, to preserve the invariant.

I am a bit concerned about introducing this kind of stateful behaviour to
existing code which historically treated setparents() as a basic setter without
side-effects. And doing that during the code freeze.
2012-04-22 20:06:36 +02:00
Idan Kamara
62440b9ac3 dirstate: write branch file atomically 2012-04-19 18:11:42 +03:00
FUJIWARA Katsunori
36e9e8107c dirstate: fix some problems for recursive case normalization (issue3342)
file in nested directory causes unexpected abort.

problems below should be fixed for recursive normalization route in
dirstate._normalize():

    1. rsplit() may cause unpacking into more than 2 elements.
       it should be called with 'maxsplit' argument to unpack
       into 'd, f'

    2. 'd' is replaced by normalized value prefixed with
       'self._root', but this makes 'folded' as absolute path,
       and it is unexpected one for caller of recursive
       normalization
2012-03-31 15:55:03 +09:00
FUJIWARA Katsunori
dba1f40821 dirstate: avoid normalizing letter case on icasefs for exact match (issue3340)
on icasefs, "hg qnew" fails to import changing letter case of filename
already occurred in working directory, for example:

    $ hg rename a tmp
    $ hg rename tmp A
    $ hg qnew casechange
    $ hg status
    R a
    $

"hg qnew" invokes 'dirstate.walk()' via 'localrepository.commit()'
with 'exact match' matching object having exact filenames of targets
in ones 'files()'.

current implementation of 'dirstate.walk()' always normalizes letter
case of filenames from 'match.files()' on icasefs, even though exact
matching is required.

then, files only different in letter case are treated as one file.

this patch prevents 'dirstate.walk()' from normalizing, if exact
matching is required, even on icasefs.

filenames for 'exact matching' are given not from user command line,
but from dirstate walk result, manifest of changecontext, patch files
or fixed list for specific system files (e.g.: '.hgtags').

in such case, case normalization should not be done, so this patch
works well.
2012-03-31 00:04:08 +09:00
Matt Mackall
b7a9e07a8c dirstate: normalize case of directory components
If we have an existing f/a, and rename f to F, adding F/b should be
normalized to f/b.
2012-03-28 19:24:13 -05:00
FUJIWARA Katsunori
71230ee2e8 icasefs: use case preserved root for 'util.fspath()' invocation (issue3302)
path to repo root may contains case sensitive part, even though repo
is located in case insensitive filesystem: e.g. repo in FAT32 device
mounted on Unix.

so, case normalized root causes failure of stat(2).

this patch uses case preserved root for 'util.fspath()' invocation to
avoid this problem.

case preserved root for 'util.fspath()' may decrease efficiency of
fspath cache, but 'util.fspath()' is currently called only from
dirstate, so this fix has less impact.
2012-03-15 00:46:37 +09:00
Idan Kamara
653ad0c355 dirstate: filecacheify _ignore (issue3278)
This still doesn't handle the case where a command is run with
--config ui.ignore=path since we only look for changes in .hgignore.
2012-03-01 17:49:59 +02:00
Idan Kamara
ca3ebc8b96 dirstate: filecacheify _branch
The opener is relative to .hg, use it in a subclass of filecache to compute
the final path.
2012-03-01 17:42:49 +02:00
Idan Kamara
8fa8539f01 dirstate: add filecache support 2012-03-01 17:39:58 +02:00
FUJIWARA Katsunori
a4ea09aa29 context: add 'dirs()' to changectx/workingctx for directory patterns
this patch adds 'dirs()' to changectx/workingctx, which returns map of
all directories deduced from manifest, to examine whether specified
pattern is related to the context as directory or not quickly.

'workingctx.dirs()' uses 'dirstate.dirs()' rather than building
another copy of it.
2012-02-23 00:07:54 +09:00
Matt Mackall
0fb748c56c merge with stable 2012-01-09 20:16:57 -06:00
Martin Geisler
ba8731035e Use explicit integer division
Found by running the test suite with the -3 flag to show places where
we have int / int division that can be replaced with int // int.
2012-01-08 18:15:54 +01:00
Pierre-Yves David
cbc6c76868 dirstate: propagate IOError other than ENOENT when reading branch 2012-01-06 07:37:59 +01:00
FUJIWARA Katsunori
055136813d icasefs: avoid normcase()-ing in util.fspath() for efficiency
'dirstate._normalize()', the only caller of 'util.fspath()', has
already normcase()-ed path before invocation of it.

normcase()-ed root can be cached on dirstate side, too.

so, this patch changes 'util.fspath()' API specification to avoid
normcase()-ing in it.
2011-12-16 21:09:40 +09:00
FUJIWARA Katsunori
167dff84e5 dirstate: prevent useless util.fspath() invocation for '.'
at first of dirstate.walk() on case insensitive filesystem,
normalization of '.' causes util.fspath() invocation, but '.' is not
cached in it.

this invocation is not only useless, but also harmful: initial "hg
tag" causes creation of ".hgtags" file after dirstate.walk(), and
looking up ".hgtags" in cache will fail, because directory contents of
root is already cached at util.fspath() invocation for '.'.
2011-12-16 21:09:40 +09:00