Commit Graph

517 Commits

Author SHA1 Message Date
Durham Goode
c83eaaa0f1 dirstate: avoid reading the map when possible (issue5713) (issue5717)
Before the recent refactor, we would not load the entire map until it was
accessed. As part of the refactor, that got lost and even just trying to load
the dirstate parents would load the whole map. This caused a perf regression
(issue5713) and a regression with static http serving (issue5717).

Making it lazy loaded again fixes both.

Differential Revision: https://phab.mercurial-scm.org/D1253
2017-10-26 16:15:36 -07:00
Durham Goode
6a075fddcc dirstate: move clear onto dirstatemap class
A future diff will move the lazy loading aspect of dirstate to the dirstatemap
class. This means it requires a slightly different strategy of clearing than
just reinstantiating the object (since just reinstantiating the object will
lazily load the on disk data again later instead of remaining permanently
empty).

So let's give it it's own clear function.

Differential Revision: https://phab.mercurial-scm.org/D1252
2017-10-26 16:15:31 -07:00
Durham Goode
1d4b04170c dirstate: move the _dirfoldmap to dirstatemap
Now that dirstatemap is the source of truth for the list of directories, let's
move _dirfoldmap on to it.

This pattern of moving cached variables onto the dirstate map makes it easier to
invalidate them, as seen by how the cache invalidation functions are slowly
shrinking to just be recreating the dirstatemap instance.

Differential Revision: https://phab.mercurial-scm.org/D983
2017-10-05 11:34:41 -07:00
Durham Goode
e37fd6649b dirstate: remove _dirs property cache
Now that dirs is source of truthed on the dirstatemap, let's get rid of the
_dirs propertycache on the dirstate.

Differential Revision: https://phab.mercurial-scm.org/D982
2017-10-05 11:34:41 -07:00
Durham Goode
1b9b3caa47 dirstate: remove _filefoldmap property cache
Now that the filefoldmap is source of truthed on the dirstatemap, let's get rid
of the property cache on the dirstate.

Differential Revision: https://phab.mercurial-scm.org/D981
2017-10-05 11:34:41 -07:00
Durham Goode
28f48a6846 dirstate: move identity to dirstatemap
Moving the identity function to the dirstatemap class will allow alternative
dirstate implementations to replace the implementation.

Differential Revision: https://phab.mercurial-scm.org/D980
2017-10-05 11:34:41 -07:00
Durham Goode
f97f617524 dirstate: move nonnormal and otherparent sets to dirstatemap
As part of separating dirstate business logic from storage, let's move the
nonnormal and otherparent storage to the dirstatemap class. This will allow
alternative dirstate storage to persist these sets instead of recomputing them.

Differential Revision: https://phab.mercurial-scm.org/D979
2017-10-05 11:34:41 -07:00
Durham Goode
99779c0bbb dirstate: move write into dirstatemap
As part of separating the dirstate business logic from the dirstate storage
logic, let's move the serialization code down into dirstatemap.

Differential Revision: https://phab.mercurial-scm.org/D978
2017-10-05 11:34:41 -07:00
Durham Goode
76bcc9b3be dirstate: move _read into dirstatemap
As part of separating the dirstate business logic from the storage, let's move
the read code into the new dirstatemap class.

Differential Revision: https://phab.mercurial-scm.org/D977
2017-10-05 11:34:41 -07:00
Boris Feld
568bdc4e58 configitems: register the 'debug.dirstate.delaywrite' config 2017-06-30 03:37:05 +02:00
Simon Whitaker
a83744bf9c dirstate: implement __len__ on dirstatemap (issue5695)
Differential Revision: https://phab.mercurial-scm.org/D884
2017-10-01 16:46:02 +01:00
Durham Goode
5a20b12a39 dirstate: move parents source of truth to dirstatemap
As part of moving dirstate storage to its own class, let's move the source of
truth for the dirstate parents to dirstatemap. This requires that dirstate._pl
no longer be a cache, and that all sets go through dirstatemap.setparents.

Differential Revision: https://phab.mercurial-scm.org/D759
2017-09-26 03:56:20 -07:00
Durham Goode
2f94dc7bc7 dirstate: move parent reading to the dirstatemap class
As part of moving dirstate storage logic to a separate class, let's move the
function that reads the parents from the file. This will allow extensions to
write dirstate's that store the parents in other ways.

Differential Revision: https://phab.mercurial-scm.org/D758
2017-09-26 03:56:20 -07:00
Durham Goode
deed83f53e dirstate: move opendirstatefile to dirstatemap
As part of moving the dirstate storage logic to another class, let's move
opendirstatefile to dirstatemap. This will allow extensions to replace the
pending abstraction.

Future patches will move the consumers of _opendirstatefile into dirstatemap as
well.

Differential Revision: https://phab.mercurial-scm.org/D757
2017-09-26 03:56:20 -07:00
Durham Goode
a7494d8102 dirstate: move _copymap to dirstatemap
As part of moving all dirstate storage to a new class, let's move the copymap
onto that class. In a future patch this will let us move the read/write
functions to the dirstatemap class, and for extensions this lets us replace the
copy storage with alternative storage.

Differential Revision: https://phab.mercurial-scm.org/D756
2017-09-26 03:56:20 -07:00
Durham Goode
ed35bfa154 dirstate: move _dirs to dirstatemap
As part of moving the dirstate storage logic to a new class, lets move the _dirs
computation onto the class so extensions can replace it with a persisted index
of directories.

Differential Revision: https://phab.mercurial-scm.org/D755
2017-09-26 03:56:20 -07:00
Durham Goode
f30d4eb317 dirstate: move filefoldmap to dirstatemap
As part of moving the dirstate storage logic to a separate class, lets move the
filfoldmap computation to that class. This will allow extensions to replace the
dirstate storage with something that persists the filefoldmap.

Differential Revision: https://phab.mercurial-scm.org/D754
2017-09-26 03:56:20 -07:00
Durham Goode
7995f4da69 dirstate: move nonnormalentries to dirstatemap
As part of moving dirstate storage to its own class, let's move the
nonnormalentries logic onto the dirstatemap class. This will let extensions
replace the nonnormalentries logic with a persisted cache.

Differential Revision: https://phab.mercurial-scm.org/D753
2017-09-26 03:56:20 -07:00
Durham Goode
a009efe7c5 dirstate: create new dirstatemap class
This is part of a larger refactor to move the dirstate storage logic to a
separate class, so it's easier to rewrite the dirstate storage layer without
having to rewrite all the algorithms as well.

Step one it to create the class, and replace dirstate._map with it.  The
abstraction bleeds through in a few places where the main dirstate class has to
access self._map._map, but those will be cleaned up in future patches.

Differential Revision: https://phab.mercurial-scm.org/D752
2017-09-26 03:56:20 -07:00
Michael Bolin
454c5f00c6 dirstate: perform transactions with _map using single call, where possible
This is in the same style as https://phab.mercurial-scm.org/D493.

In general, this replaces patterns such as:

```
f in self._map:
    entry = self._map[f]
```

with:

```
entry = self._map.get(f):
if entry is not None:
    # use entry
```

Test Plan:
`make tests`

Differential Revision: https://phab.mercurial-scm.org/D663
2017-09-14 09:41:22 -07:00
Augie Fackler
e2774d9258 python3: wrap all uses of <exception>.strerror with strtolocal
Our string literals are bytes, and we mostly want to %-format a
strerror into a one of those literals, so this fixes a ton of issues.
2017-08-22 20:03:07 -04:00
Michael Bolin
3b574ea901 dirstate: perform transactions with _copymap using single call, where possible
This replaces patterns such as this:

```
if f in self._copymap:
    del self._copymap[f]
```

with this:

```
self._copymap.pop(f, None)
```

Although eliminating the extra lookup/call may be a negligible performance win
in the standard dirstate, alternative implementations, such as
[sqldirstate](https://bitbucket.org/facebook/hg-experimental/src/default/sqldirstate/)
may see a bigger win where each of these calls results in an RPC,
so the savings is greater.

Test Plan:
`make tests`

Differential Revision: https://phab.mercurial-scm.org/D493
2017-08-23 18:24:57 +00:00
Augie Fackler
ef945af30b merge with stable 2017-08-10 18:55:33 -04:00
Yuya Nishihara
ba69ca47d4 pathauditor: disable cache of audited paths by default (issue5628)
The initial attempt was to discard cache when appropriate, but it appears
to be error prone. We had to carefully inspect all places where audit() is
called e.g. without actually updating filesystem, before removing files and
directories, etc.

So, this patch disables the cache of audited paths by default, and enables
it only for the following cases:

 - short-lived auditor objects
 - repo.vfs, repo.svfs, and repo.cachevfs, which are managed directories
   and considered sort of append-only (a file/directory would never be
   replaced with a symlink)

There would be more cacheable vfs objects (e.g. mq.queue.opener), but I
decided not to inspect all of them in this patch. We can make them cached
later.

Benchmark result:

- using old clone of http://selenic.com/repo/linux-2.6/ (38319 files)
- on tmpfs
- run HGRCPATH=/dev/null hg up -q --time tip && hg up -q null
- try 4 times and take the last three results

original:
real 7.480 secs (user 1.140+22.760 sys 0.150+1.690)
real 8.010 secs (user 1.070+22.280 sys 0.170+2.120)
real 7.470 secs (user 1.120+22.390 sys 0.120+1.910)

clearcache (the other series):
real 7.680 secs (user 1.120+23.420 sys 0.140+1.970)
real 7.670 secs (user 1.110+23.620 sys 0.130+1.810)
real 7.740 secs (user 1.090+23.510 sys 0.160+1.940)

enable cache only for vfs and svfs (this series):
real 8.730 secs (user 1.500+25.190 sys 0.260+2.260)
real 8.750 secs (user 1.490+25.170 sys 0.250+2.340)
real 9.010 secs (user 1.680+25.340 sys 0.280+2.540)

remove cache function at all (for reference):
real 9.620 secs (user 1.440+27.120 sys 0.250+2.980)
real 9.420 secs (user 1.400+26.940 sys 0.320+3.130)
real 9.760 secs (user 1.530+27.270 sys 0.250+2.970)
2017-07-26 22:10:15 +09:00
Alex Gaynor
9b1b200906 dirstate: simplify dirstate's __iter__
Probably also a performance win, but not measurable in perfdirstate.

Differential Revision: https://phab.mercurial-scm.org/D269
2017-08-08 18:53:13 +00:00
Adam Simpkins
2a467faccd dirstate: update backup functions to take full backup filename
Update the dirstate functions so that the caller supplies the full backup
filename rather than just a prefix and suffix.

The localrepo code was already hard-coding the fact that the backup name must
be (exactly prefix + "dirstate" + suffix): it relied on this in _journalfiles()
and undofiles().  Making the caller responsible for specifying the full backup
name removes the need for the localrepo code to assume that dirstate._filename
is always "dirstate".

Differential Revision: https://phab.mercurial-scm.org/D68
2017-07-12 15:24:07 -07:00
Gregory Szorc
a7c49e2ec2 dirstate: expose a sparse matcher on dirstate (API)
The sparse extension performs a lot of monkeypatching of dirstate
to make it sparse aware. Essentially, various operations need to
take the active sparse config into account. They do this by obtaining
a matcher representing the sparse config and filtering paths through
it.

The monkeypatching is done by stuffing a reference to a repo on
dirstate and calling sparse.matcher() (which takes a repo instance)
during each function call. The reason this function takes a repo
instance is because resolving the sparse config may require resolving
file contents from filelogs, and that requires a repo. (If the
current sparse config references "profile" files, the contents of
those files from the dirstate's parent revisions is resolved.)

I seem to recall people having strong opinions that the dirstate
object not have a reference to a repo. So copying what the sparse
extension does probably won't fly in core. Plus, the dirstate
modifications shouldn't require a full repo: they only need a matcher.
So there's no good reason to stuff a reference to the repo in
dirstate.

This commit exposes a sparse matcher to dirstate via a property that
when looked up will call a function that eventually calls
sparse.matcher(). The repo instance is bound in a closure, so it
isn't exposed to dirstate.

This approach is functionally similar to what the sparse extension does
today, except it hides the repo instance from dirstate. The approach
is not optimal because we have to call a proxy function and
sparse.matcher() on every property lookup. There is room to cache
the matcher instance in dirstate. After all, the matcher only changes
if the dirstate's parents change or if the sparse config changes. It
feels like we should be able to detect both events and update the
matcher when this occurs. But for now we preserve the existing
semantics so we can move the dirstate sparseness bits into core. Once
in core, refactoring becomes a bit easier since it will be clearer how
all these components interact.

The sparse extension has been updated to use the new property.
Because all references to the repo on dirstate have been removed,
the code for setting it has been removed.
2017-07-08 16:18:04 -07:00
FUJIWARA Katsunori
38ee15a9e6 dirstate: centralize _cwd handling into _cwd method
Before this patch, immediate value is assigned to dirstate._cwd, if
ui.forcecwd is specified at instantiation of dirstate.

But this doesn't work as expected in some cases.

For example, hgweb set ui.forcecwd after instantiation of repo object.
If an extension touches repo.dirstate in its reposetup(), dirstate is
instantiated without setting ui.forcecwd, and dirstate.getcwd()
returns incorrect result.

In addition to it, hgweb.__init__() can take already instantiated repo
object, too. In this case, repo.dirstate might be already
instantiated, even if all enabled extensions don't so in their own
reposetup().

To avoid such issue, this patch centralizes _cwd handling into _cwd
method.

This issue can be reproduced by running test-hgweb-commands.t with
fsmonitor-run-tests.py.
2017-07-03 02:52:40 +09:00
Siddharth Agarwal
e0e3b8e4ce filestat: move __init__ to frompath constructor
We're going to add a `fromfp` constructor soon, and this also allows a filestat
object for a non-existent file to be created.
2017-06-10 14:09:54 -07:00
FUJIWARA Katsunori
a3a205cf6c dirstate: add identity information to detect simultaneous changing in storage
This identity is used to examine whether dirstate is simultaneously
changed in storage after previous caching (see issue5584 for detail).

util.cachestat can't be used for this purpose, because it has no
valuable information on Windows.

On the other hand, util.filestat can detect changing dirstate in
storage certainly, regardless of platforms.

    https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan

Strictly speaking, if underlying filesystem doesn't support
ctime/mtime, util.filestat can't detect simultaneous changing in
storage as expected. But simultaneous changing on such (very rare)
platform can't be detected regardless of this patch series.

Therefore, util.filestat should be reasonable identity for almost all
usecases.
2017-06-09 13:07:48 +09:00
Siddharth Agarwal
b1a08c7ef4 dirstate: add docstring for invalidate
This always confuses me, and we already have a docstring on
localrepo.invalidatedirstate.
2017-06-04 16:08:50 -07:00
Siddharth Agarwal
9202c22f7c match: introduce nevermatcher for when no ignore files are present
c01965ab5195 introduced a deterministic `__repr__` for ignores. However, it
didn't account for when ignore was `util.never`. This broke fsmonitor's ignore
change detection -- with an empty hgignore, it would kick in all the time.

Introduce `nevermatcher` and switch to it. This neatly parallels
`alwaysmatcher`.
2017-06-01 00:40:52 -07:00
Augie Fackler
47e0320f8b cleanup: rename all iteritems methods to items and add iteritems alias
Due to a quirk of our module importer setup on Python 3, all calls and
definitions of methods named iteritems() get rewritten at import
time. Unfortunately, this means there's not a good portable way to
access these methods from non-module-loader'ed code like our unit
tests. This change fixes that, which also unblocks test-manifest.py
from passing under Python 3.

We don't presently define any itervalues methods, or we'd need to give
those similar treatment.
2017-05-29 00:00:02 -04:00
Yuya Nishihara
4563e16232 parsers: switch to policy importer
# no-check-commit
2016-08-13 12:23:56 +09:00
Augie Fackler
8197ec1496 dirstate: mark {begin,end}parentchange as deprecated (API) 2017-05-18 17:13:32 -04:00
Augie Fackler
dae08b1747 dirstate: introduce new context manager for marking dirstate parent changes 2017-05-18 17:10:30 -04:00
Yuya Nishihara
cbe21a1cc9 osutil: proxy through util (and platform) modules (API)
See the previous commit for why. Marked as API change since osutil.listdir()
seems widely used in third-party extensions.

The win32mbcs extension is updated to wrap both util. and windows. aliases.
2017-04-26 22:26:28 +09:00
Martin von Zweigbergk
da238dd9f0 dirstate: optimize walk() by using match.visitdir()
We already have the logic for restricting directory walks in
match.visitdir() that we use for treemanifests. We should take
advantage of it when walking the working copy as well.

This speeds up "hg st -I rootfilesin:." on the Firefox repo from
0.587s to 0.305s on warm disk (and much more on cold disk). More time
is spent reading the dirstate than walking the working copy after.

I tried to find scenarios where calling match.visitdir() would be a
noticeable overhead, but I couldn't find any. I encourage the reader
to try for themselves, since this is performance-critical code.
2017-05-05 08:49:46 -07:00
Ryan McElroy
b1c2e24234 dirstate: use tryunlink 2017-03-21 06:50:28 -07:00
Augie Fackler
4ee50ed840 dirstate: use future-proof next(iter) instead of iter.next
The latter has been removed in Python 3.
2017-03-19 01:08:17 -04:00
Pulkit Goyal
f9b7821158 dirstate: use list comprehension to get a list of keys
We have used dict.keys() which returns a dict_keys() object instead
of list on Python 3. So this patch replaces that with list comprehension
which works both on Python 2 and 3.
2017-03-16 09:00:27 +05:30
Durham Goode
832a191f75 dirstate: track otherparent files same as nonnormal
Calling dirstate.setparents() is expensive in a large repo because it iterates
over every file in the dirstate. It does so to undo any merge state or
otherparent state files. Merge state files are already covered by
dirstate._nonnormalset, so we just need to track otherparent files in a similar
manner to avoid the full iteration here.

Fixing this shaves 20-25% off histedit in large repos.

I tested this by adding temporary debug logic to verify that the old files
processed in the loop matched the new files processed in the loop and running
the test suite.
2017-03-08 17:35:20 -08:00
Jun Wu
3c38af4dff dirstate: avoid unnecessary load+dump during backup
Previously, dirstate.savebackup unconditionally dumps the dirstate map to
disk. It may require loading dirstate first to be able to dump it. Those
operations could be expensive if the dirstate is big, and could be avoided
if we know the dirstate file is up-to-date.

This patch avoids the read and write if the dirstate is clean. In that case,
we just do a plain copy without any serialization.

This should make commands which use transactions but do not touch dirstate
faster. For example, "hg bookmark -r REV NAME".
2017-03-01 18:21:06 -08:00
Jun Wu
0bbf2239d3 dirstate: try to use hardlink to backup dirstate
This should be more efficient once util.copyfile has real hardlink support.
2017-03-01 17:59:21 -08:00
Durham Goode
c93b79c0e6 dirstate: track updated files to improve write time
Previously, dirstate.write() would iterate over the entire dirstate to find any
entries that needed to be marked 'lookup' (i.e. if they have the same timestamp
as now). This was O(working copy) and slow in large repos. It was most visible
when rebasing or histediting multiple commits, since it gets executed once per
commit, even if the entire rebase/histedit is wrapped in a transaction.

The fix is to track which files have been editted, and only check those to see
if they need to be marked as 'lookup'. This saves 25% on histedit times in very
large repositories.

I tested this by adding temporary debug logic to verify that the old files
processed in the loop matched the new files processed in the loop and running
the test suite.
2017-03-05 16:20:07 -08:00
FUJIWARA Katsunori
c63bad51f6 txnutil: factor out the logic to read file in according to HG_PENDING
This patch adds new file txnutil.py, because:

  - transaction.py is too large to import small utility logic
  - scmutil.py or so causes cyclic importing in phases.py

mayhavepending() is defined separately for convenience in subsequent
patch.
2017-02-21 01:20:59 +09:00
Pulkit Goyal
f1c1938039 py3: replace os.environ with encoding.environ (part 1 of 5)
os.environ is a dictionary which has string elements on Python 3. We have
encoding.environ which take care of all these things. This is the first patch
of 5 patch series which tend to replace the occurences of os.environ with
encoding.environ as using os.environ will result in unusual behaviour.
2016-12-18 01:34:41 +05:30
Pulkit Goyal
a7d1fd0177 py3: replace os.sep with pycompat.ossep (part 2 of 4)
This part also replaces some chunks of os.sep with pycompat.ossep.
2016-12-17 20:02:50 +05:30
Pulkit Goyal
97f340e354 py3: use pycompat.getcwd() instead of os.getcwd()
We have pycompat.getcwd() which returns bytes path on Python 3. This patch
changes most of the occurences of the os.getcwd() with pycompat one.
2016-11-23 00:03:11 +05:30
Pulkit Goyal
af51d6f213 py3: use pycompat.ossep at certain places
Certain instances of os.sep has been converted to pycompat.ossep where it was
sure to use bytes only. There are more such instances which needs some more
attention and will get surely.
2016-11-06 04:10:33 +05:30