Commit Graph

13451 Commits

Author SHA1 Message Date
Siddharth Agarwal
5c1db53305 dirstate.walk: use the file foldmap to normalize
Computing the set of directories in the dirstate is expensive. It turns out
that it isn't necessary for operations like 'hg status' at all.

Why? Consider the file 'foo/bar' on disk, which is represented in the dirstate
as 'FOO/BAR'.

On 'hg status', we'd walk down the directory tree, coming across 'foo' first.

Before: we'd normalize 'foo' to 'FOO', then add 'FOO' to our visited stack.
We'd then visit 'FOO', finding the file 'bar'. We'd normalize 'FOO/bar' to
'FOO/BAR', then add it to the results dict.

After: we wouldn't normalize 'foo' at all. We'd add it to our visited stack,
then visit 'foo', finding the file 'bar'. We'd normalize 'foo/bar' to
'FOO/BAR', then add it to the results dict.

So whether we normalize intermediate directories or not actually makes no
difference in most cases.

The only case where normalization matters at all is if a file is replaced with
a directory with the same case-folded name. In that case we can do a relatively
cheap file normalization instead and still get away with not computing the set
of directories.

This is a nice boost in status performance. On OS X with case-insensitive HFS+,
for a large repo with over 200,000 files, this brings down 'hg status' from
4.00 seconds to 3.62.
2015-03-29 19:47:16 -07:00
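
The walk described in the commit above can be illustrated with a minimal sketch. It is not Mercurial's code: the names `build_file_foldmap`, `normalize_file`, and `walk` are hypothetical, `str.lower()` is assumed as a stand-in for the platform's real case folding, and a nested dict stands in for the on-disk tree. The point is only that directories are visited under their on-disk spelling while complete file paths are normalized through a file-only fold map.

```python
def build_file_foldmap(tracked_files):
    """Map the case-folded spelling of each tracked file to its recorded spelling."""
    # Assumption: str.lower() stands in for the platform's real case folding.
    return {name.lower(): name for name in tracked_files}


def normalize_file(path, file_foldmap):
    """Return the recorded spelling of a file path, or the path unchanged if untracked."""
    return file_foldmap.get(path.lower(), path)


def walk(disk, file_foldmap):
    """Walk a nested dict that stands in for the on-disk tree.

    Directories are visited under their on-disk spelling; only complete
    file paths are normalized, so no set of directories is ever built.
    """
    results = {}
    stack = [("", disk)]
    while stack:
        prefix, entries = stack.pop()
        for name, child in entries.items():
            path = prefix + name
            if isinstance(child, dict):
                # Directory: push it as found on disk, without normalizing.
                stack.append((path + "/", child))
            else:
                # File: normalize the full path through the file fold map.
                results[normalize_file(path, file_foldmap)] = True
    return results


foldmap = build_file_foldmap(["FOO/BAR"])
print(walk({"foo": {"bar": None}}, foldmap))
```

With 'foo/bar' on disk and 'FOO/BAR' recorded, the result dict holds the single key 'FOO/BAR', matching the "After" case in the message.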
Siddharth Agarwal
efae199860 dirstate: split the foldmap into separate ones for files and directories
Computing the set of directories in the dirstate can be pretty expensive. For
'hg status' without arguments, it turns out we actually never need to figure
out the right case for directories in the foldmap. (An upcoming patch explains
why.)

This patch splits up the directory and file maps into separate ones, allowing
for the subsequent optimization in status.
2015-03-29 19:42:49 -07:00
Siddharth Agarwal
15b2067928 dirstate: introduce function to normalize just filenames
This will be used in upcoming patches to stop generating the set of directories
in many common cases.
2015-03-28 18:53:54 -07:00
Siddharth Agarwal
28d1a1fb85 dirstate: factor out code to discover normalized path
In upcoming patches we're going to reuse this code. The storemap is currently
always the foldmap, but will vary in future patches.
2015-03-29 19:23:05 -07:00
Martin von Zweigbergk
3e235a01ca log: prefer 'wctx' over 'pctx' for working context 2015-03-18 21:44:25 -07:00
Matt Mackall
c2e689e49d merge with stable 2015-03-31 08:31:42 -05:00
Matt Mackall
1dee8dda3b tags: remove scary message about corrupt tags cache
Caches should be transparent. If a cache is damaged, it should
silently be rebuilt, much like if it were invalid. No one seems to
have ever hit this in the wild.
2015-03-31 08:04:42 -05:00
Yuya Nishihara
4a691471f2 templates: fix "log -q" output of phases style
It had the same problem as be4dab229c78, name conflicts of {node} keyword.
2015-03-28 20:22:03 +09:00
Martin von Zweigbergk
7dfcb254e9 manifestv2: implement slow readdelta() without revdiff
For manifest v2, revlog.revdiff() usually does not provide enough
information to produce a manifest. As a simple workaround, implement
readdelta() by reading both the old and the new manifest and use
manifest.diff() to find the difference. This is several times slower
than the current readdelta() for v1 manifests, but there seems to be
no other simple option, and this is still much faster than returning
the full manifest (at least for verify).
2015-03-27 20:41:30 -07:00
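
As a rough illustration of the workaround described above, the sketch below derives a delta by diffing two fully read manifests. Plain dicts mapping path to (node, flags) stand in for manifest objects, the function names are hypothetical, and it assumes that paths removed in the new manifest are simply left out of the delta.

```python
def diff_manifests(old, new):
    """Return {path: (old_entry, new_entry)} for every path whose entry differs.

    A missing entry is represented as None.
    """
    changes = {}
    for path in old.keys() | new.keys():
        old_entry = old.get(path)
        new_entry = new.get(path)
        if old_entry != new_entry:
            changes[path] = (old_entry, new_entry)
    return changes


def slow_readdelta(old, new):
    """Entries that were added or changed in `new` relative to `old`."""
    # Assumption: removed paths are omitted from the delta.
    return {
        path: new_entry
        for path, (_old_entry, new_entry) in diff_manifests(old, new).items()
        if new_entry is not None
    }


old = {"a": ("n1", ""), "b": ("n2", ""), "c": ("n3", "x")}
new = {"a": ("n1", ""), "b": ("n9", ""), "d": ("n4", "")}
print(slow_readdelta(old, new))
```

Both manifests have to be read in full, which is why this approach costs more than reading a stored delta directly.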
Martin von Zweigbergk
b7bfa722d1 manifestv2: disable fastdelta optimization
We may add support for the fastdelta optimization for manifest v2 at a
later point, but let's disable it for now, so we don't have to
implement it right away.
2015-03-27 17:07:24 -07:00
Martin von Zweigbergk
16d87fc88d manifestv2: add (unused) config option
With tree manifests, hashes will change anyway, so now is a good time
to also take up the old plans of a new manifest format. While there
should be little or no reason to use tree manifests with the current
manifest format (v1) once the new format (v2) is supported, we'll try
to keep the two dimensions (flat/tree and v1/v2) separate.

In preparation for adding the new format, let's add configuration
for it and propagate that configuration to the manifest revlog
subclass. The new configuration ("experimental.manifestv2") says in
what format to write the manifest data. We may later add other
configuration to choose how to hash it, either keeping the v1 hash for
BC or hashing the v2 content.

See http://mercurial.selenic.com/wiki/ManifestV2Plan for more details.
2015-03-27 16:19:44 -07:00
Martin von Zweigbergk
a7479ae566 manifest: extract method for creating manifest text
Similar to the previous change, this one extracts a method for
producing a manifest text from an iterator over (path, node, flags)
tuples.
2015-03-27 15:37:46 -07:00
Martin von Zweigbergk
a16ddaed87 manifest: extract method for parsing manifest
By extracting a method that generates (path, node, flags) tuples, we
can reuse the code for parsing a manifest without doing it via a
_lazymanifest like treemanifest currently does. It also prepares for
parsing the new manifest format.

Note that this makes parsing into treemanifest slower, since the
parsing is now always done in pure Python. Since treemanifests will be
expected (or even forced) to be used only with the new manifest
format, parsing via _lazymanifest was not an option anyway.
2015-03-27 15:02:43 -07:00
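
The two extractions above (parsing into tuples and producing text from tuples) can be sketched as a pair of small functions. This is an illustration only: it assumes the flat v1 text layout of one line per file, "path, NUL, 40 hex digits, optional flags, newline", and the function names `parse` and `to_text` are hypothetical rather than the names used in the patches.

```python
import binascii


def parse(text):
    """Yield (path, node, flags) tuples from flat manifest text (bytes)."""
    for line in text.split(b"\n"):
        if not line:
            continue  # skip the empty piece after the final newline
        path, rest = line.split(b"\0", 1)
        # The first 40 bytes after the NUL are the hex node; the rest are flags.
        yield path, binascii.unhexlify(rest[:40]), rest[40:]


def to_text(entries):
    """Build flat manifest text from an iterable of (path, node, flags) tuples."""
    return b"".join(
        b"%s\0%s%s\n" % (path, binascii.hexlify(node), flags)
        for path, node, flags in entries
    )


node = bytes(range(20))  # a made-up 20-byte node, for illustration only
entries = [(b"a/b", node, b"x"), (b"c", node, b"")]
text = to_text(entries)
print(list(parse(text)) == entries)
```

Working with an iterator of tuples keeps the parser independent of whichever container (flat or tree) ends up storing the entries.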
Siddharth Agarwal
fe84899cf2 dirstate._walkexplicit: don't bother normalizing '.'
The overwhelmingly common case is running commands like 'hg diff' with no
arguments. Therefore the only file that'll be listed is the root directory.
Normalizing that's just a waste of time.

This means that for a plain 'hg diff' we'll never need to construct the
foldmap, saving us a significant chunk of time.

On case-insensitive HFS+ on OS X, for a large repository with over 200,000
files, this brings down 'hg diff' from 2.97 seconds to 2.36.
2015-03-29 18:28:48 -07:00
Siddharth Agarwal
8adc467907 dirstate._walkexplicit: drop normpath calls
The paths the matcher returns are normalized already.
2015-03-29 23:28:30 -07:00
Siddharth Agarwal
0d03b12a57 dirstate._walkexplicit: indicate root as '.', not ''
'.' is the canonical way to represent the root, and it's apparently the only
transformation that normpath makes.
2015-03-29 23:27:25 -07:00
Laurent Charignon
9430b8ddec phases: add killswitch for native implementation 2015-03-30 12:57:55 -07:00
Laurent Charignon
5e18944310 phases: move pure phase computation in a function 2015-03-30 12:48:15 -07:00
Laurent Charignon
dfc226357c revset: add hook after tree parsing
This will be useful to execute actions after the tree is parsed and
before the revset returns a match. Finding symbols in the parse tree
will later allow hashes of hidden revisions to work on the command
line without the --hidden flag.
2015-03-24 14:24:55 -07:00
Gregory Szorc
7ba88d1a50 commands.debugrevlog: report max chain length
This is sometimes useful to know. Report it.
2015-03-28 12:58:44 -07:00
Martin von Zweigbergk
35cf546efe _lazymanifest: drop unnecessary call to sorted()
The entries returned from _lazymanifest.iterentries() are already
sorted.
2015-03-27 20:55:54 -07:00
André Sintzoff
aadf1ad8aa parsers.c: avoid implicit conversion loses integer warnings
These warnings are raised by Apple LLVM version 6.0 (clang-600.0.57)
(based on LLVM 3.5svn) and were introduced in 37171a30314d.
2015-03-29 19:06:23 +02:00
Drew Gottlieb
d2ab66f723 manifest: make manifest.intersectfiles() internal
manifest.intersectfiles() is just a utility used by manifest.matches(), and
a future commit removes intersectfiles for treemanifest for optimization
purposes.

This commit makes the intersectfiles methods on manifestdict and treemanifest
internal, and converts its test to a more generic testMatches(), which has the
exact same coverage.
2015-03-30 10:43:52 -07:00
Adrian Buehlmann
fa1861d8ff win32: add comment about WinError
Prevent reintroducing the bug that was added in a9badcbcfb79 (and fixed with
77bbc180dbf8).
2015-03-28 11:19:34 +01:00
Laurent Charignon
ff5c5b9b22 record_curses: fix ui bug for newly added file
With record's curses interface toggling and untoggling a newly added
file would lead to a confusing UI (the header was marked as partial
and the hunks as unselected). Tested additionally using the curses
interface with newly added, removed and modified files in a test repo.
2015-03-27 14:11:13 -07:00
Siddharth Agarwal
0a19443756 dirs.addpath: rework algorithm to search forward
This improves performance because it uses strchr rather than a loop.

For LLVM/clang version "Apple LLVM version 6.0 (clang-600.0.56) (based on LLVM
3.5svn)" on OS X, for a repo with over 200,000 files, this improves perfdirs
from 0.248 seconds to 0.230 (7.3%)

For gcc 4.4.6 on Linux, for a test repo with over 500,000 files, this improves
perfdirs from 0.704 seconds to 0.658 (6.5%).
2015-03-27 01:03:06 -07:00
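
A forward search over a path can be sketched in Python with `str.find`, which plays the role that strchr plays in C. This is only an illustration of the search direction: the function name `add_path` is hypothetical, and the counting rule (every ancestor directory is incremented once per added path) is an assumption, not necessarily what the C code does.

```python
def add_path(dirs, path):
    """Increment the count of every ancestor directory of `path`, front to back."""
    pos = path.find("/")
    while pos != -1:
        prefix = path[:pos]
        dirs[prefix] = dirs.get(prefix, 0) + 1
        # Jump straight to the next separator instead of scanning byte by byte.
        pos = path.find("/", pos + 1)


dirs = {}
add_path(dirs, "a/b/c")
add_path(dirs, "a/x")
print(dirs)
```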
Matt Harbison
e1c0e69712 win32: 'raise ctypes.WinError' -> 'raise ctypes.WinError()'
WinError is a function that creates an Error, not an Error itself.  This is a
partial backout of a9badcbcfb79.
2015-03-22 19:08:13 -04:00
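
The difference between raising a function and raising its result can be shown without Windows. In the sketch below, `make_error` is a hypothetical stand-in for a factory such as ctypes.WinError; the error number and message are made up.

```python
def make_error():
    """Hypothetical factory that builds an exception instance when called."""
    return OSError(5, "simulated failure")


def raise_uncalled():
    # Bug: this raises the function object itself, which is not an exception,
    # so Python reports a TypeError instead of the intended error.
    raise make_error


def raise_called():
    # Correct: call the factory and raise the exception it returns.
    raise make_error()


for func in (raise_uncalled, raise_called):
    try:
        func()
    except Exception as exc:
        print(func.__name__, "->", type(exc).__name__)
```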
Pierre-Yves David
41927328f0 mergecopies: reuse ancestry context when traversing file history (issue4537)
Merge copies traverses file history in search of copies and renames.
Since 3.3 we have been doing "linkrev adjustment" to ensure that duplicated
filelog entries do not confuse the traversal. This "linkrev adjustment" involves
ancestry testing and walking the changeset graph. If we do such a walk in the
changeset graph for each file, we end up with 'O(<changesets>x<files>)'
complexity, which creates a massive issue. For example, grafting a changeset in
Mozilla's repo went from 6 seconds to more than 3 minutes.

There is a mechanism to reuse such ancestor computations between all files, but
it has to be set up manually in situations where it makes sense to take such a
shortcut. This changeset sets this mechanism up and brings the graft time back
from 3 minutes to 8 seconds.

To do so, we need more control over the way 'filectx' objects are instantiated
during each 'checkcopies' call that 'mergecopies' makes. We add a new 'setupctx'
that configures and returns a 'filectx' factory. The function makes sure the
ancestry context is properly created, and the factory makes sure it is properly
installed on each returned 'filectx'.
2015-03-20 00:30:35 -07:00
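
The idea of a setup function that computes the ancestry once and hands back a factory can be sketched as follows. This is a simplified model, not Mercurial's implementation: `FileCtx`, `setup_ctx`, `ancestors`, and the parent map are all hypothetical, and the graph is a plain dict from revision to parent revisions.

```python
def ancestors(parents, revs):
    """Return `revs` plus all of their ancestors in a {rev: [parent revs]} map."""
    seen = set()
    stack = list(revs)
    while stack:
        rev = stack.pop()
        if rev not in seen:
            seen.add(rev)
            stack.extend(parents[rev])
    return seen


class FileCtx:
    """Minimal file context that carries a shared ancestry set."""

    def __init__(self, path, ancestry):
        self.path = path
        self.ancestry = ancestry

    def introduced_in_ancestry(self, linkrev):
        return linkrev in self.ancestry


def setup_ctx(parents, source_revs):
    """Walk the graph once, then return a factory that reuses the result."""
    shared = ancestors(parents, source_revs)  # the only graph walk

    def make_filectx(path):
        # Every context gets the same set object, so no per-file walk happens.
        return FileCtx(path, shared)

    return make_filectx


parents = {0: [], 1: [0], 2: [1], 3: [1]}
make_filectx = setup_ctx(parents, [2])
a, b = make_filectx("a.txt"), make_filectx("b.txt")
print(a.ancestry is b.ancestry, a.introduced_in_ancestry(3))
```

Sharing one precomputed set turns a per-file graph walk into a single walk followed by cheap membership tests.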
Pierre-Yves David
ea1a0fd29f adjustlinkrev: handle 'None' value as source
When the source rev value is 'None', the ctx is a working context. We
cannot compute the ancestors from there so we directly skip to its
parents. This will be necessary to allow 'None' value for
'_descendantrev' itself necessary to make all contexts used in
'mergecopies' reuse the same '_ancestrycontext'.
2015-03-19 23:57:34 -07:00
Pierre-Yves David
7605b9a4cf adjustlinkrev: prepare source revs for ancestry only once
We'll need some more complex initialisation to handle workingfilectx
case. We do this small change in a different patch for clarity.
2015-03-19 23:52:26 -07:00
Pierre-Yves David
78f35efc70 annotate: reuse ancestry context when adjusting linkrev (issue4532)
The linkrev adjustment will likely do the same ancestry walking multiple times,
so we already have an optional mechanism to take advantage of this. Since
4e4e9e954fae, linkrev adjustment has been done lazily to limit the performance
impact on rename computation. However, this laziness created a quadratic
situation in 'annotate'.

Mercurial repo: hg annotate mercurial/commands.py
before:   8.090
after:  36.300

Mozilla repo: hg annotate layout/generic/nsTextFrame.cpp
before:   1.190
after:  290.230


So we set up sharing of the ancestry context in the annotate case too. Linkrev
adjustment still has an impact, but it is a much more sensible one.

Mercurial repo: hg annotate mercurial/commands.py
before:  36.300
after:   10.230

Mozilla repo: hg annotate layout/generic/nsTextFrame.cpp
before: 290.230
after:    5.560
2015-03-19 19:52:23 -07:00
Yuya Nishihara
990aa241a1 templates: fix "log -q" output of default style
It was changed at ad92c202bbcd unintentionally due to name conflicts.
2015-03-14 22:34:27 +09:00
Yuya Nishihara
0e68ae9536 changeset_printer: use changectx to get status tuple
log.parents() can't handle the wdir() revision. Because repo.status() creates ctx
objects anyway, there would be no benefit in getting the parent node from the changelog.
2015-03-14 17:40:47 +09:00
Yuya Nishihara
414876e494 changeset_printer: replace _meaningful_parentrevs() by changeset_templater's
Because changeset_printer needs pctx object anyway, there would be no benefit
to avoid creation of pctx in _meaningful_parentrevs().
2015-03-14 17:23:51 +09:00
Yuya Nishihara
a8d2b740f9 changeset_printer: use context objects consistently to show parents
This prepares for merging changeset_printer._meaningful_parentrevs() with
changeset_templater's.
2015-03-14 17:19:04 +09:00
Matt Mackall
9fdd0e8abe verify: add a note about a paleo-bug
In the very early days of hg, it was possible to commit /dev/null because our
patch importer was too simple. Repos from this era may still
exist, so add a note about why we ignore this name.
2015-03-27 15:13:21 -05:00
Matt Mackall
82e10f2861 cmdutil: remove some excess vertical whitespace 2015-03-27 13:51:21 -05:00
Matt Mackall
61bcacd057 revert: move calculation of targetsubs earlier 2015-03-27 13:48:51 -05:00
Laurent Charignon
0ad4a0efef record: change return value of recording code
This makes it easier to add an interactive mode to more commands that
need a reference to the newly created node.
2015-03-25 15:51:57 -07:00
Laurent Charignon
ce53ca6b37 revert: fix --interactive on local modification (issue4576)
We were moving files during the backup phase, which was incompatible with the
way record/crecord works.
2015-03-25 14:01:14 -07:00
FUJIWARA Katsunori
3706cf02be update: replace workingctx.dirty and raising Abort by cmdutil.bailifchanged
This patch makes wrapping "commands.update()" by largefiles extension
useless, because "cmdutil.bailifchanged()" can detect changes of
largefiles in the working directory.

This patch also changes test-update-branches.t, because
"cmdutil.bailifchanged()" shows more detailed information about
dirty-ness of the working directory than "workingctx.dirty()".
2015-03-25 13:55:35 +09:00
FUJIWARA Katsunori
9592a103f7 cmdutil: allow bailifchanged to ignore merging in progress
In "commands.update()", "cmdutil.bailifchanged()" isn't used to
abort if the working directory is dirty, because it unconditionally
checks for a merge in progress.

"workingctx.dirty()" used in "commands.update()" can't detect changes
of largefiles in the working directory without "repo.lfstatus = True"
wrapping. This is the only reason "commands.update()" is wrapped by the
largefiles extension.

On the other hand, "cmdutil.bailifchanged()", which is already wrapped by the
largefiles extension, can detect changes of largefiles.

This patch is a preparation for replacing "workingctx.dirty()" and
raising Abort in "commands.update()" with "cmdutil.bailifchanged()". That
replacement can remove the redundant "commands.update()" wrapping.
2015-03-25 13:55:35 +09:00
FUJIWARA Katsunori
e269967115 subrepo: add bailifchanged to centralize raising Abort if subrepo is dirty
This patch also centralizes composing dirty reason message like
"uncommitted changes in subrepository 'xxxx'".
2015-03-25 13:55:35 +09:00
FUJIWARA Katsunori
d9071e6959 subrepo: add dirtyreason to centralize composing dirty reason message
This patch adds "dirtyreason()" to centralize composing dirty
reason messages like "uncommitted changes in subrepository 'xxxx'".

There are 3 similar messages below, and this patch is also part of the
preparations for unifying them into (1).

  1. uncommitted changes in subrepository 'XXXX'
  2. uncommitted changes in subrepository XXXX
  3. uncommitted changes in subrepo XXXX

This patch adds a new method "dirtyreason()" instead of making
"dirty()" return a "reason string", because:

  - some existing "dirty()" implementations are too complicated to do
    so simply, and

  - ill-mannered 3rd party subrepo classes whose "dirty()" doesn't
    return a "reason string" would cause a meaningless message (even
    though this is a rare case)
2015-03-25 13:55:32 +09:00
Martin von Zweigbergk
c7787f3a4e treemanifest: drop 22nd byte for consistency with manifestdict
When assigning a 22-byte hash to a nodeid in a manifest, manifestdict
drops the 22nd byte, while treemanifest keeps it. Let's make
treemanifest drop the 22nd byte as well.
2015-03-26 09:42:21 -07:00
Matt Harbison
1b58f58b45 revert: evaluate subrepos to revert against the working directory
Reverting to a revision where the subrepo didn't exist will now abort, and
matching subrepos against the working directory is consistent with how filesets
are evaluated since dd1c701aad4d.
2015-03-25 22:20:44 -04:00
Matt Harbison
31c3ab572e revert: handle subrepos missing in the given --rev
The list of subrepos to revert is currently based on the given --rev, so there
is currently no way for this to fail.  Using the --rev context is wrong though,
because if the subrepo doesn't exist in --rev, it is skipped, so it won't be
changed.  This change makes it so that the revert aborts, which is what happens
if a plain file is reverted to -1.  Finding matches based on --rev is also
inconsistent with evaluating files against the working directory (dd1c701aad4d).

This change is made now, so as to not cause breakage when the context is
switched in an upcoming patch.
2015-03-25 21:54:47 -04:00
Siddharth Agarwal
45755aa70f osutil: mark end of string with null char, not 0
Noticed this while working on other stuff in the area.
2015-03-25 16:21:58 -07:00
Siddharth Agarwal
886bf50396 osutil: use getdirentriesattr on OS X if possible
This is a significant win for large repositories on OS X, especially with a
cold cache. Unfortunately we need to keep the lstat-based implementation around
for two reasons:

- Not all filesystems support this call.
- There's an edge case in which it's best to fall back to avoid a retry loop.
  More about this in the comments.

The below tests are all performed on a Mac with an SSD running OS X 10.9, on a
repository with over 200k files. The results are best of 5 with simulated
best-effort conditions.

The gains with a hot cache are pretty impressive: 'hg status' goes from 5.18
seconds to 3.79 seconds.

However, a repository that large will probably already be using something like
hgwatchman [1], which helps much more (for this repo, 'hg status' with
hgwatchman is approximately 1 second). Where this really helps is when the
cache is cold [2]: hg status goes from 31.0 seconds to 9.66.

See http://lists.apple.com/archives/filesystem-dev/2014/Dec/msg00002.html for
some more discussion about this function.

This is based on a patch by Sean Farley <sean@farley.io>.

[1] https://bitbucket.org/facebook/hgwatchman

[2] There appears to be no easy way to clear the file cache (aka "vnodes") on
OS X short of rebooting. purge(8) purportedly does that but in my testing had
little effect. The workaround I came up with was to assume that vnode eviction
was LRU, make sure the kern.maxvnodes sysctl is smaller than the size of the
repository, then make sure we'd always miss the cache by running 'hg status' in
another clone of the repository before running it in the test repository.
2015-03-25 15:55:31 -07:00
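
The "fast path with a fallback" structure described above can be sketched in portable Python. Nothing here calls getdirentriesattr: `os.scandir` is used purely as a stand-in for a bulk listing call, and `Unsupported`, `listdir_bulk`, `listdir_stat`, and `listdir` are hypothetical names chosen for the illustration.

```python
import os
import stat


class Unsupported(Exception):
    """Raised by the fast path when it cannot serve a directory (hypothetical)."""


def listdir_stat(path):
    """Baseline: list names, then issue one lstat() call per entry."""
    entries = []
    for name in os.listdir(path):
        mode = os.lstat(os.path.join(path, name)).st_mode
        entries.append((name, stat.S_IFMT(mode)))
    return sorted(entries)


def listdir_bulk(path):
    """Stand-in for a bulk attribute call; here it is just os.scandir()."""
    with os.scandir(path) as it:
        return sorted(
            (e.name, stat.S_IFMT(e.stat(follow_symlinks=False).st_mode)) for e in it
        )


def listdir(path, fast=listdir_bulk):
    """Try the fast implementation and fall back to the lstat-based one."""
    try:
        return fast(path)
    except Unsupported:
        # For example, the filesystem does not support the bulk call.
        return listdir_stat(path)
```

Both implementations return the same sorted list of (name, file type) pairs, so callers do not need to know which one was used.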
Siddharth Agarwal
2af765a245 osutil._listdir: rename to _listdir_stat
In upcoming patches we'll add another implementation of listdir on OS X. That
implementation will have to fall back to this one under some circumstances,
though. We'll make _listdir be able to detect those circumstances and use the
right function as appropriate.
2015-03-25 16:43:29 -07:00