31972 Commits

Author SHA1 Message Date
Gregory Szorc
5d6e940365 revlog: rename _chunkraw to _getsegmentforrevs()
This completes our rename of internal revlog methods to
distinguish between low-level raw revlog data "segments" and
higher-level, per-revision "chunks."

perf.py has been updated to consult both names so it will work
against older Mercurial versions.
2017-05-06 12:12:53 -07:00
Gregory Szorc
e83c5e1628 perf: store reference to revlog._chunkraw in a local variable
To prepare for renaming revlog._chunkraw, we stuff a reference to this
method in a local variable. This does 2 things. First, it moves the
attribute lookup outside of a loop, which more accurately measures
the time of the code being invoked. Second, it allows us to alias
to different methods depending on their presence (perf.py needs to
support running against old Mercurial versions).

Removing an attribute lookup from a tight loop appears to shift the
numbers slightly with mozilla-central:

$ hg perfrevlogchunks -c

! read
! wall 0.354789 comb 0.340000 user 0.330000 sys 0.010000 (best of 28)
! wall 0.335932 comb 0.330000 user 0.290000 sys 0.040000 (best of 30)
! read w/ reused fd
! wall 0.342326 comb 0.340000 user 0.320000 sys 0.020000 (best of 29)
! wall 0.332857 comb 0.340000 user 0.290000 sys 0.050000 (best of 30)
! read batch
! wall 0.023623 comb 0.020000 user 0.000000 sys 0.020000 (best of 124)
! wall 0.023666 comb 0.020000 user 0.000000 sys 0.020000 (best of 125)
! read batch w/ reused fd
! wall 0.023828 comb 0.020000 user 0.000000 sys 0.020000 (best of 124)
! wall 0.023556 comb 0.020000 user 0.000000 sys 0.020000 (best of 126)
2017-05-06 12:02:31 -07:00
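
A minimal sketch of the aliasing technique described in the commit above,
not the actual perf.py code. OldRevlog, NewRevlog, resolve_segment_reader
and read_all are hypothetical stand-ins; only the two method names come
from the commit messages.

```python
class OldRevlog:
    """Hypothetical stand-in for a revlog that only has the old name."""

    def _chunkraw(self, startrev, endrev):
        return ("old", startrev, endrev)


class NewRevlog:
    """Hypothetical stand-in for a revlog that has the renamed method."""

    def _getsegmentforrevs(self, startrev, endrev):
        return ("new", startrev, endrev)


def resolve_segment_reader(rl):
    """Return the bound method, preferring the new name when present."""
    try:
        return rl._getsegmentforrevs
    except AttributeError:
        return rl._chunkraw


def read_all(rl, revs):
    # The attribute lookup happens once, outside the loop being timed.
    segmentforrevs = resolve_segment_reader(rl)
    return [segmentforrevs(rev, rev) for rev in revs]
```

The same caller then works against either class, which is the property
perf.py needs when it runs against older Mercurial versions.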
Gregory Szorc
46413ff643 revlog: rename internal functions containing "chunk" to use "segment"
Currently, "chunk" is overloaded in revlog terminology to mean
multiple things. One of them refers to a segment of raw data from
the revlog. This commit renames various methods only used within
revlog.py to have "segment" in their name instead of "chunk."

While I was here, I also made the names more descriptive. e.g.
"_loadchunk()" becomes "_readsegment()" because it actually does
I/O.
2017-05-06 12:02:12 -07:00
Jun Wu
4229b98381 fsmonitor: do not nuke dirstate filecache
In the future, chg may prefill the repo's dirstate filecache, so it is
valuable and should be kept. Previously we dropped both the filecache and the
property cache for dirstate during fsmonitor reposetup; this patch changes it
to drop only the property cache and keep the filecache.
2017-05-06 16:36:24 -07:00
Gregory Szorc
d84946776c perf: move gettimer() call
This is more consistent with other perf* functions.
2017-05-06 11:01:02 -07:00
Gregory Szorc
c89bd8ad72 perf: don't clobber startrev variable
Previously, the "startrev" argument would be ignored due to
"startrev = 0" in the benchmark function. This meant that
`hg perfrevlog` always started at revision 0.

Rename the local variable to "beginrev" so the variable does the
right thing.
2017-05-06 10:59:38 -07:00
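
The bug fixed above can be illustrated with a simplified sketch. The
function names and the list-slicing body are assumptions made for
illustration; the real benchmark in perf.py is structured differently.

```python
def bench_buggy(revs, startrev=0):
    # Reassigning the parameter silently discards the caller's value.
    startrev = 0
    return revs[startrev:]


def bench_fixed(revs, startrev=0):
    # A distinct local name leaves the argument untouched.
    beginrev = startrev
    return revs[beginrev:]
```

With revs = [10, 20, 30] and startrev=1, the buggy variant still starts
at the first element while the fixed one starts at the second.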
Pierre-Yves David
68b43a6db4 bundle: add optional 'tagsfnodecache' data to on disk bundle (issue5543)
This should help performance when unbundling.
2017-05-05 17:31:15 +02:00
Pierre-Yves David
a8c6292f9c bundle2: move tagsfnodecache generation in a generic function
This will help us reuse the logic for `hg bundle`.
2017-05-05 17:28:52 +02:00
Pierre-Yves David
6777304ca7 bundle: introduce an higher level function to write bundle on disk
The current function ('writebundle') focuses on getting an existing
changegroup to disk. There is no easy way to include more parts in the
generated bundle2. So we introduce a slightly higher level function that is fed
the 'outgoing' object (that defines the bundled spec) and the bundlespec
parameters (to control the changegroup generation and inclusion of other parts).

This creates a third piece of logic dedicated to creating a consistent bundle2
(the other 2 are the push code and the getbundle code). We should probably
reconcile them at some point, but they all take different types of input. So we
need to introduce an intermediate "object" that each different input could be
converted to. Such a unified "bundle2 specification" could be fed to some
unified code.

We start by having the `hg bundle` related code on its own to help define its
specific needs first. Once the common and specific parts of each piece of logic
are known, we can start unification.
2017-05-05 17:09:47 +02:00
Pierre-Yves David
e9ce49e179 bundle: handle compression earlier
We can also handle that part before starting any generation.
2017-05-04 21:47:03 +02:00
Pierre-Yves David
f76028d798 bundle: check changegroup version earlier
We can check if we know how to bundle this changegroup version before actually
starting to generate the changegroup.
2017-05-04 21:46:02 +02:00
Pierre-Yves David
f734e61f99 bundle: check lack of revs to bundle before generating the changegroup
We already have the information so we can check it earlier.
2017-05-04 21:44:36 +02:00
Matt Harbison
36f740cc43 extdiff: copy back files to the working directory if the size changed
In theory, it should be enough to pay attention only to the modification time
when detecting if a snapshotted working directory file changed.  In practice,
BeyondCompare preserves all file attributes when syncing files at the directory
level.  (If you open the file and sync individual hunks, then mtime does change,
and everything was being copied back as desired.)  I'm not sure how many other
synchronization tools would trigger this issue, but it's annoyingly inconsistent
(if a single file is diffed, it isn't snapshotted, so the same BeyondCompare
file sync operation _is_ visible, because wdir() is updated in place).

I filed a bug with them, and they stated it is on their wish list, but won't be
fixed in the near term.  This isn't a complete fix (there is still the case of
the size not changing), but this seems like a trivial enough change to fix most
of the problem.  I suppose we could fool around with making files in the other
snapshot readonly, and copy back if we see the readonly bit copied.  That seems
pretty hacky though, and only works if the external tool copies all attributes.
2017-05-06 23:00:57 -04:00
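
A rough sketch of the detection rule described above: treat a snapshotted
file as modified when either its mtime or its size differs. FileStamp,
stamp and needs_copy_back are hypothetical names, and this is not the
extdiff implementation itself.

```python
import os
from typing import NamedTuple


class FileStamp(NamedTuple):
    """The two attributes compared before and after the external tool runs."""

    mtime: float
    size: int


def stamp(path):
    """Record mtime and size for a file in the snapshot."""
    st = os.lstat(path)
    return FileStamp(st.st_mtime, st.st_size)


def needs_copy_back(before, after):
    # mtime alone misses tools that preserve timestamps when syncing;
    # comparing the size as well catches most of those edits.
    return before.mtime != after.mtime or before.size != after.size
```

As the commit message notes, an edit that preserves both mtime and size
still goes undetected under this rule.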
Matt Harbison
76f6c3e6b8 test-extdiff: enable a previously failing test on Windows
2017-05-06 22:48:06 -04:00
Matt Harbison
a8aad83714 test-extdiff: narrow the range of an '#if execbit' block
Now that output can be conditionalized, the few `chmod +x` specific outputs can
be conditionalized, and the rest of the tests run as normal.  Disable one test
that is failing on Windows for now.
2017-05-06 19:11:59 -04:00
Matt Harbison
73a6095bf9 test-extdiff: deduplicate tests
2017-05-06 14:36:26 -04:00
Matt Harbison
d12266faed test-extdiff: fill in a missing Windows test
2017-05-06 13:37:00 -04:00
Yuya Nishihara
5314481bbf policy: eliminate ".pure." from module name only if marked as dual
So we can switch cext/pure modules to new layout one by one.
2016-08-13 17:21:58 +09:00
Yuya Nishihara
276d62703a policy: add "cext" package which will host CPython extension modules
I'm going to restructure cext/pure modules and get rid of our hgimporter
hack. C extension modules will be moved to cext/ directory so old and new
compiled modules can coexist in development tree. This is necessary to
run 'hg bisect' without recompiling.

New extension modules will be loaded by an importer function:

  base85 = policy.importmod('base85')  # select pure.base85 or cext.base85

This will also allow us to split cffi from pure modules, which is currently
difficult because pure modules can't be imported by name.
2016-08-12 11:06:14 +09:00
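
One way such an importer function could work, sketched under assumptions:
it simply tries candidate packages in order. The real policy.importmod()
is not shown in this log and presumably consults the configured module
policy; the "packages" parameter here is hypothetical.

```python
import importlib


def importmod(name, packages=("cext", "pure")):
    """Return the first '<package>.<name>' module that can be imported.

    Simplified illustration: candidates are tried in the order given.
    """
    for pkg in packages:
        try:
            return importlib.import_module("%s.%s" % (pkg, name))
        except ImportError:
            continue  # fall through to the next candidate package
    raise ImportError("no implementation of %r in %r" % (name, packages))
```

Because the caller names the module rather than a fixed package path, a
compiled and a pure-Python implementation can sit side by side in the
tree.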
Yuya Nishihara
6a96222913 policy: mark all string literals as sysstr or bytes
The policy module won't be imported early in the future, which means string
literals will be processed by our Python 3 loader.
2017-05-02 18:35:09 +09:00
Yuya Nishihara
65560a3296 debuginstall: check C extensions only if they are loadable per policy
This check is useless in pure installation and I want to make it directly
import C extension modules.
2017-04-26 23:30:52 +09:00
Yuya Nishihara
cbe21a1cc9 osutil: proxy through util (and platform) modules (API)
See the previous commit for why. Marked as API change since osutil.listdir()
seems widely used in third-party extensions.

The win32mbcs extension is updated to wrap both util. and windows. aliases.
2017-04-26 22:26:28 +09:00
Yuya Nishihara
9adc269e31 mpatch: proxy through mdiff module
See the previous commit for why.
2017-04-26 22:05:59 +09:00
Yuya Nishihara
d9d64e114f bdiff: proxy through mdiff module
See the previous commit for why.

mdiff seems a good place to host bdiff functions. bdiff.bdiff was already
aliased as textdiff, so we use it.
2017-04-26 22:03:37 +09:00
Yuya Nishihara
ab046506ef base85: proxy through util module
I'm going to replace hgimporter with a simpler import function, so we can
access pure/cext modules by name:

  # util.py
  base85 = policy.importmod('base85')  # select pure.base85 or cext.base85

  # cffi/base85.py
  from ..pure.base85 import *  # may re-export pure.base85 functions

This means we'll have to use policy.importmod() function in place of the
standard import statement, but we wouldn't want to write it every place where
C extension modules are used. So this patch makes util host base85 functions.
2017-04-26 21:56:47 +09:00
Yuya Nishihara
32bd8b34ed mdiff: move re-exports to top
This style seems more common in our codebase.
2017-05-02 17:05:22 +09:00
Yuya Nishihara
dcf75add48 test-commit-interactive-curses: remove unused import of parsers
2017-05-02 19:10:55 +09:00
Durham Goode
fd9ba7b071 strip: make tree stripping O(changes) instead of O(repo)
The old tree stripping logic iterated over every tree revlog in the repo looking
for commits that had revs to be stripped. That's very inefficient in large
repos. Instead, let's look at what files are touched by the strip and only
inspect those revlogs.

I don't have actual perf numbers, since internally we don't use a true
treemanifest, but simply iterating over hundreds of thousands of revlogs takes
many, many seconds, so this should help tremendously when stripping only a few
commits.
2017-05-08 11:35:23 -07:00
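
The idea of deriving the affected tree revlogs from the touched files can
be sketched as follows. dirs_touched is a hypothetical helper, and the
trailing-slash directory naming is an assumption; the log does not show
how the strip code actually enumerates them.

```python
def dirs_touched(files):
    """Return every directory that contains one of the given files.

    Only the tree revlogs for these directories would need inspecting,
    so the work scales with the change size rather than the repo size.
    """
    dirs = set()
    for path in files:
        parts = path.split("/")[:-1]  # drop the file name
        for i in range(1, len(parts) + 1):
            dirs.add("/".join(parts[:i]) + "/")
    return dirs
```

Files at the top level contribute no subdirectory; the root manifest
would be handled separately in this sketch.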
Durham Goode
7f29b67f6b strip: move tree strip logic to it's own function
This will allow external extensions to modify tree strip behavior more
precisely.
2017-05-08 11:35:23 -07:00
Martin von Zweigbergk
53fada2d33 manifest: remove unused property _oldmanifest
The last use seems to have gone away in 9df18405feb6 (manifest: make
manifestlog use it's own cache, 2016-11-10).
2017-05-08 09:39:21 -07:00
Pulkit Goyal
60e05dd544 py3: convert key to str to make kwargs.pop work in mq
The keys are passed here and there as unicode and our transformer makes things
bytes. Because of that, mq was not popped, which results in an error on Py3.
Here we abuse r'' to make that str on Python 3.
2017-05-05 04:48:42 +05:30
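
The mismatch described above can be reproduced on Python 3 with a small
sketch. Both functions are hypothetical; the bytes literal in the first
one stands in for what the source transformer is described as producing.

```python
def pop_mq_broken(**kwargs):
    # Keyword names arrive as str, so a bytes key never matches.
    return kwargs.pop(b'mq', None)


def pop_mq_fixed(**kwargs):
    # A raw string literal stays a native str on Python 3.
    return kwargs.pop(r'mq', None)
```

Called with mq=True, the first variant returns None and leaves the key in
place, while the second returns True.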
Pulkit Goyal
c3219cc9ed py3: convert kwargs' keys to str before passing in cmdutil.getcommiteditor
2017-05-05 04:41:45 +05:30
Jun Wu
9147494fab diff: add a fast path to avoid loading binary contents
When diffing binary contents, with certain configs, we can show
"Binary file <name> has changed" without actual content.

That allows a fast path where we could avoid providing actual binary
contents. Note: in that case we still need to test if two contents are the
same, that's done by using "filectx.cmp", which could have its own fast
path.
2017-05-03 23:50:41 -07:00
Jun Wu
49fe7ea4f6 diff: correct binary testing logic
This seems to be more correct given the table drawn in the previous patch.

Namely, "losedatafn" and "opts.git" are removed, "not opts.text" is added.

  - losedatafn: diff output (binary) should not be affected by "losedatafn"
  - opts.git: binary testing is helpful for detecting a fast path in the
    next patch. the fast path can also be used if opts.git is False
  - opts.text: if it's set, we should treat the content as non-binary
2017-05-05 17:20:32 -07:00
Jun Wu
2b452ab2f5 diff: draw a table about binary diff behaviors
The table should make it easier to reason about future changes.
2017-05-05 16:48:58 -07:00
Jun Wu
119973f0ba diff: use fctx.size() to test empty
fctx.size() could have a fast path that does not require loading content.
2017-05-03 22:20:44 -07:00
Jun Wu
8b31c4d99f diff: use fctx.isbinary() to test binary
The end goal is to avoid calling fctx.data() when unnecessary. For example,
if diff.nobinary=1 and files are binary, the expected behavior is to print
"Binary file has changed". That could avoid reading fctx.data() sometimes.

This is mainly to enable an external LFS extension to skip expensive binary
file loading sometimes (read: most of the time with diff.nobinary=1 and
diff.text=0), without any behavior changes to mercurial (i.e. whether a file
is LFS or not does not change any behavior, LFS could be 100% transparent to
users).
2017-05-03 22:16:54 -07:00
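
A sketch of why asking the file context is cheaper than inspecting its
data. LazyFile and diffline are hypothetical; the real filectx API and
diff code are more involved, and the fallback line at the end is only a
placeholder for a real text diff.

```python
class LazyFile:
    """Hypothetical file context that knows cheap metadata up front."""

    def __init__(self, loader, binary):
        self._loader = loader
        self._binary = binary
        self.loads = 0  # counts how often the content was fetched

    def isbinary(self):
        return self._binary  # answered without loading content

    def data(self):
        self.loads += 1
        return self._loader()


def diffline(name, old, new, nobinary=True, text=False):
    if nobinary and not text and (old.isbinary() or new.isbinary()):
        # Fast path: no call to data() is needed for this message.
        return 'Binary file %s has changed' % name
    # Placeholder slow path that does need the content.
    return '%s: %d -> %d bytes' % (name, len(old.data()), len(new.data()))
```

An extension that stores large files elsewhere could answer isbinary()
from metadata in this way and skip the download entirely.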
Yuya Nishihara
ffd3ffbc1a pycompat: extract helper to raise exception with traceback
It uses "raise excobj, None, tb" form which I think is simpler and more
useful than "raise exctype, args, tb".
2017-04-20 22:16:12 +09:00
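
For comparison, the Python 3 spelling of "raise excobj, None, tb" is
exc.with_traceback(tb). The sketch below uses hypothetical names
(raise_with_tb, capture, frames_of) and is not the pycompat helper itself.

```python
import sys


def raise_with_tb(exc, tb):
    """Raise exc so that its traceback continues from tb (Python 3 form)."""
    raise exc.with_traceback(tb)


def capture():
    """Trigger an error and hand back its exc_info for later re-raising."""
    try:
        return 1 // 0
    except ZeroDivisionError:
        return sys.exc_info()


def frames_of(exc):
    """List the function names recorded in an exception's traceback."""
    names = []
    tb = exc.__traceback__
    while tb is not None:
        names.append(tb.tb_frame.f_code.co_name)
        tb = tb.tb_next
    return names
```

Re-raising a new exception object with the captured traceback keeps the
original failing frame visible to whoever catches it.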
Yuya Nishihara
e4989d80e7 check-code: ignore re-exports of os.environ in encoding.py
These are valid uses of os.environ.
2017-05-01 17:23:48 +09:00
Yuya Nishihara
edbbe128cc check-code: exclude demandimport.py and policy.py from Python 3 checks
These modules can't depend on pycompat.py, which means we have to write Py3
hacks in them.
2017-04-26 21:51:19 +09:00
Yuya Nishihara
33b4c27bff check-code: rewrite py3 exclusion pattern with negative lookahead
I want to add more patterns, but negative lookbehind requires patterns of
the same length, so it is not useful here.
2017-05-01 17:10:22 +09:00
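
The restriction mentioned above is easy to demonstrate with Python's re
module. The file names in the pattern are made up for illustration and
are not the check-code patterns.

```python
import re

# A negative lookahead may list alternatives of different lengths.
lookahead = re.compile(r'^(?!a\.py$|longer_name\.py$).*\.py$')


def lookbehind_compiles(pattern):
    """Report whether re accepts the given lookbehind pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False
```

Here lookbehind_compiles(r'(?<!ab|cd)x') succeeds because both
alternatives are two characters wide, while r'(?<!a|longer)x' is rejected
as not fixed-width.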
Yuya Nishihara
1310da54cd cleanup: remove useless re-raises of KeyboardInterrupt
KeyboardInterrupt is no longer a subclass of Exception since Python 2.5.

https://docs.python.org/2/whatsnew/2.5.html#pep-352-exceptions-as-new-style-classes
2017-05-03 11:16:55 +09:00
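
A short sketch of why the re-raises were unnecessary: a handler for
Exception never sees KeyboardInterrupt. The run and interrupt helpers are
hypothetical.

```python
def run(task):
    """Run task, swallowing ordinary errors only."""
    try:
        return task()
    except Exception:
        # KeyboardInterrupt derives from BaseException, not Exception,
        # so it is not caught here and needs no explicit re-raise.
        return 'handled'


def interrupt():
    raise KeyboardInterrupt


def fail():
    raise ValueError('ordinary error')
```

Calling run(fail) returns 'handled', whereas run(interrupt) lets the
KeyboardInterrupt propagate to the caller.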
Yuya Nishihara
b4362a7bf8 make: drop deprecated rule to process temporary copy of pure modules
Pure modules have not been copied to mercurial/ since d071f155c000.
2016-08-12 11:36:42 +09:00
Martin von Zweigbergk
da238dd9f0 dirstate: optimize walk() by using match.visitdir()
We already have the logic for restricting directory walks in
match.visitdir() that we use for treemanifests. We should take
advantage of it when walking the working copy as well.

This speeds up "hg st -I rootfilesin:." on the Firefox repo from
0.587s to 0.305s on warm disk (and much more on cold disk). After this
change, more time is spent reading the dirstate than walking the working copy.

I tried to find scenarios where calling match.visitdir() would be a
noticeable overhead, but I couldn't find any. I encourage the reader
to try for themselves, since this is performance-critical code.
2017-05-05 08:49:46 -07:00
Martin von Zweigbergk
41765f4b1b match: optimize visitdir() for patterns matching only root directory
_rootsanddirs() returns a list of directories to visit
recursively and a list of directories to visit non-recursively. For
patterns such as 'rootfilesin:foo/bar', we clearly need to visit the
directory foo/bar, but we also need to visit its parents. The method
therefore uses util.dirs() to find the parent directories of
'foo/bar'. That method does not include the root directory, but since
we obviously need to visit the root directory, we always added '.' to
the set of directories to visit non-recursively.

The visitdir() method had special handling to consider set(['.']) to
mean that no includes had been specified and would thus visit all
directories. However, when the pattern is 'rootfilesin:.', set(['.'])
is actually the real set of directories to visit and the special
handling of that set meant that all directories got visited instead of
just the root directory.

The fix is simple: add '.' to the set of parent directories in
_rootsanddirs() and stop treating set(['.']) specially. This makes

  hg files -r .  -I rootfilesin:.

in a treemanifest version of the Firefox repo go from 1.5s to 0.26s on
warm disk (and a *much* bigger improvement on cold disk).

Note that the -I is necessary for no good reason. We just haven't
optimized visitdir() for regular (non-include, non-exclude) patterns
yet.
2017-05-05 08:49:07 -07:00
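
The fix can be modelled with a simplified sketch that covers only
'rootfilesin:' patterns. The helper names and signatures are hypothetical
and do not mirror the real _rootsanddirs() or visitdir() in match.py.

```python
def _parents(path):
    """Return the proper ancestors of path, excluding the root."""
    parts = path.split('/')
    return {'/'.join(parts[:i]) for i in range(1, len(parts))}


def nonrecursive_dirs(rootfilesin):
    """Directories to visit non-recursively for rootfilesin: patterns."""
    dirs = {'.'}  # the root is an ancestor of everything
    for d in rootfilesin:
        if d != '.':
            dirs.add(d)
            dirs |= _parents(d)
    return dirs


def visitdir(dirs, candidate):
    # No special case for dirs == {'.'}: it means "only the root".
    return candidate in dirs
```

With the pattern 'rootfilesin:.', the set is just {'.'}, so
subdirectories such as 'foo' are no longer visited.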
Martin von Zweigbergk
a49529bbbf rebase: don't update state dict same way for each root
The update statement does not depend on anything in the loop, so just
move it before the loop and do it once. There are no cases where
update would happen 0 times before (and 1 now); the function returns
early in all such cases.
2017-03-11 12:25:56 -08:00
Martin von Zweigbergk
0a45a04c8a forget: access status fields by name, not index
2017-05-04 21:11:40 -07:00
Phil Cohen
0c884d7a6a demandimport: add urwid.command_map to ignore list
The useful pudb debugger can be used with Mercurial, but its import of urwid
fails when demandimport is enabled. Add urwid.command_map to the ignore list so
pudb can be used with hg without disabling all of demandimport.
2017-05-03 18:26:57 -07:00
Martin von Zweigbergk
612215ff94 outgoing: run on filtered repo
outgoing has been using an unfiltered repo since 07f64d64baf7 (discovery:
outgoing pass unfiltered repo to findcommonincoming (issue3776),
2013-01-28). If I'm reading code and history correctly, it should be
safe to run _outgoing() on a filtered repo since daf83ddd4afd
(discovery: run discovery on filtered repository, 2015-01-07). By
running _outgoing() on a filtered repo, we can also remove the
workaround there for ignoring filtered revisions.
2017-05-05 10:08:36 -07:00
Martin von Zweigbergk
62b3db1726 manifest: remove check for non-contexts in _dirmancache
It looks like the _dirmancache has contained only manifest contexts
since 4df3a0172646 (manifest: remove usages of manifest.read,
2016-11-10).
2017-05-05 14:10:58 -07:00