Commit Graph

508 Commits

Author SHA1 Message Date
Boris Feld
568bdc4e58 configitems: register the 'debug.dirstate.delaywrite' config 2017-06-30 03:37:05 +02:00
Simon Whitaker
a83744bf9c dirstate: implement __len__ on dirstatemap (issue5695)
Differential Revision: https://phab.mercurial-scm.org/D884
2017-10-01 16:46:02 +01:00
Durham Goode
5a20b12a39 dirstate: move parents source of truth to dirstatemap
As part of moving dirstate storage to its own class, let's move the source of
truth for the dirstate parents to dirstatemap. This requires that dirstate._pl
no longer be a cache, and that all sets go through dirstatemap.setparents.

Differential Revision: https://phab.mercurial-scm.org/D759
2017-09-26 03:56:20 -07:00
Durham Goode
2f94dc7bc7 dirstate: move parent reading to the dirstatemap class
As part of moving dirstate storage logic to a separate class, let's move the
function that reads the parents from the file. This will allow extensions to
write dirstate's that store the parents in other ways.

Differential Revision: https://phab.mercurial-scm.org/D758
2017-09-26 03:56:20 -07:00
Durham Goode
deed83f53e dirstate: move opendirstatefile to dirstatemap
As part of moving the dirstate storage logic to another class, let's move
opendirstatefile to dirstatemap. This will allow extensions to replace the
pending abstraction.

Future patches will move the consumers of _opendirstatefile into dirstatemap as
well.

Differential Revision: https://phab.mercurial-scm.org/D757
2017-09-26 03:56:20 -07:00
Durham Goode
a7494d8102 dirstate: move _copymap to dirstatemap
As part of moving all dirstate storage to a new class, let's move the copymap
onto that class. In a future patch this will let us move the read/write
functions to the dirstatemap class, and for extensions this lets us replace the
copy storage with alternative storage.

Differential Revision: https://phab.mercurial-scm.org/D756
2017-09-26 03:56:20 -07:00
Durham Goode
ed35bfa154 dirstate: move _dirs to dirstatemap
As part of moving the dirstate storage logic to a new class, lets move the _dirs
computation onto the class so extensions can replace it with a persisted index
of directories.

Differential Revision: https://phab.mercurial-scm.org/D755
2017-09-26 03:56:20 -07:00
Durham Goode
f30d4eb317 dirstate: move filefoldmap to dirstatemap
As part of moving the dirstate storage logic to a separate class, lets move the
filfoldmap computation to that class. This will allow extensions to replace the
dirstate storage with something that persists the filefoldmap.

Differential Revision: https://phab.mercurial-scm.org/D754
2017-09-26 03:56:20 -07:00
Durham Goode
7995f4da69 dirstate: move nonnormalentries to dirstatemap
As part of moving dirstate storage to its own class, let's move the
nonnormalentries logic onto the dirstatemap class. This will let extensions
replace the nonnormalentries logic with a persisted cache.

Differential Revision: https://phab.mercurial-scm.org/D753
2017-09-26 03:56:20 -07:00
Durham Goode
a009efe7c5 dirstate: create new dirstatemap class
This is part of a larger refactor to move the dirstate storage logic to a
separate class, so it's easier to rewrite the dirstate storage layer without
having to rewrite all the algorithms as well.

Step one it to create the class, and replace dirstate._map with it.  The
abstraction bleeds through in a few places where the main dirstate class has to
access self._map._map, but those will be cleaned up in future patches.

Differential Revision: https://phab.mercurial-scm.org/D752
2017-09-26 03:56:20 -07:00
Michael Bolin
454c5f00c6 dirstate: perform transactions with _map using single call, where possible
This is in the same style as https://phab.mercurial-scm.org/D493.

In general, this replaces patterns such as:

```
f in self._map:
    entry = self._map[f]
```

with:

```
entry = self._map.get(f):
if entry is not None:
    # use entry
```

Test Plan:
`make tests`

Differential Revision: https://phab.mercurial-scm.org/D663
2017-09-14 09:41:22 -07:00
Augie Fackler
e2774d9258 python3: wrap all uses of <exception>.strerror with strtolocal
Our string literals are bytes, and we mostly want to %-format a
strerror into a one of those literals, so this fixes a ton of issues.
2017-08-22 20:03:07 -04:00
Michael Bolin
3b574ea901 dirstate: perform transactions with _copymap using single call, where possible
This replaces patterns such as this:

```
if f in self._copymap:
    del self._copymap[f]
```

with this:

```
self._copymap.pop(f, None)
```

Although eliminating the extra lookup/call may be a negligible performance win
in the standard dirstate, alternative implementations, such as
[sqldirstate](https://bitbucket.org/facebook/hg-experimental/src/default/sqldirstate/)
may see a bigger win where each of these calls results in an RPC,
so the savings is greater.

Test Plan:
`make tests`

Differential Revision: https://phab.mercurial-scm.org/D493
2017-08-23 18:24:57 +00:00
Augie Fackler
ef945af30b merge with stable 2017-08-10 18:55:33 -04:00
Yuya Nishihara
ba69ca47d4 pathauditor: disable cache of audited paths by default (issue5628)
The initial attempt was to discard cache when appropriate, but it appears
to be error prone. We had to carefully inspect all places where audit() is
called e.g. without actually updating filesystem, before removing files and
directories, etc.

So, this patch disables the cache of audited paths by default, and enables
it only for the following cases:

 - short-lived auditor objects
 - repo.vfs, repo.svfs, and repo.cachevfs, which are managed directories
   and considered sort of append-only (a file/directory would never be
   replaced with a symlink)

There would be more cacheable vfs objects (e.g. mq.queue.opener), but I
decided not to inspect all of them in this patch. We can make them cached
later.

Benchmark result:

- using old clone of http://selenic.com/repo/linux-2.6/ (38319 files)
- on tmpfs
- run HGRCPATH=/dev/null hg up -q --time tip && hg up -q null
- try 4 times and take the last three results

original:
real 7.480 secs (user 1.140+22.760 sys 0.150+1.690)
real 8.010 secs (user 1.070+22.280 sys 0.170+2.120)
real 7.470 secs (user 1.120+22.390 sys 0.120+1.910)

clearcache (the other series):
real 7.680 secs (user 1.120+23.420 sys 0.140+1.970)
real 7.670 secs (user 1.110+23.620 sys 0.130+1.810)
real 7.740 secs (user 1.090+23.510 sys 0.160+1.940)

enable cache only for vfs and svfs (this series):
real 8.730 secs (user 1.500+25.190 sys 0.260+2.260)
real 8.750 secs (user 1.490+25.170 sys 0.250+2.340)
real 9.010 secs (user 1.680+25.340 sys 0.280+2.540)

remove cache function at all (for reference):
real 9.620 secs (user 1.440+27.120 sys 0.250+2.980)
real 9.420 secs (user 1.400+26.940 sys 0.320+3.130)
real 9.760 secs (user 1.530+27.270 sys 0.250+2.970)
2017-07-26 22:10:15 +09:00
Alex Gaynor
9b1b200906 dirstate: simplify dirstate's __iter__
Probably also a performance win, but not measurable in perfdirstate.

Differential Revision: https://phab.mercurial-scm.org/D269
2017-08-08 18:53:13 +00:00
Adam Simpkins
2a467faccd dirstate: update backup functions to take full backup filename
Update the dirstate functions so that the caller supplies the full backup
filename rather than just a prefix and suffix.

The localrepo code was already hard-coding the fact that the backup name must
be (exactly prefix + "dirstate" + suffix): it relied on this in _journalfiles()
and undofiles().  Making the caller responsible for specifying the full backup
name removes the need for the localrepo code to assume that dirstate._filename
is always "dirstate".

Differential Revision: https://phab.mercurial-scm.org/D68
2017-07-12 15:24:07 -07:00
Gregory Szorc
a7c49e2ec2 dirstate: expose a sparse matcher on dirstate (API)
The sparse extension performs a lot of monkeypatching of dirstate
to make it sparse aware. Essentially, various operations need to
take the active sparse config into account. They do this by obtaining
a matcher representing the sparse config and filtering paths through
it.

The monkeypatching is done by stuffing a reference to a repo on
dirstate and calling sparse.matcher() (which takes a repo instance)
during each function call. The reason this function takes a repo
instance is because resolving the sparse config may require resolving
file contents from filelogs, and that requires a repo. (If the
current sparse config references "profile" files, the contents of
those files from the dirstate's parent revisions is resolved.)

I seem to recall people having strong opinions that the dirstate
object not have a reference to a repo. So copying what the sparse
extension does probably won't fly in core. Plus, the dirstate
modifications shouldn't require a full repo: they only need a matcher.
So there's no good reason to stuff a reference to the repo in
dirstate.

This commit exposes a sparse matcher to dirstate via a property that
when looked up will call a function that eventually calls
sparse.matcher(). The repo instance is bound in a closure, so it
isn't exposed to dirstate.

This approach is functionally similar to what the sparse extension does
today, except it hides the repo instance from dirstate. The approach
is not optimal because we have to call a proxy function and
sparse.matcher() on every property lookup. There is room to cache
the matcher instance in dirstate. After all, the matcher only changes
if the dirstate's parents change or if the sparse config changes. It
feels like we should be able to detect both events and update the
matcher when this occurs. But for now we preserve the existing
semantics so we can move the dirstate sparseness bits into core. Once
in core, refactoring becomes a bit easier since it will be clearer how
all these components interact.

The sparse extension has been updated to use the new property.
Because all references to the repo on dirstate have been removed,
the code for setting it has been removed.
2017-07-08 16:18:04 -07:00
FUJIWARA Katsunori
38ee15a9e6 dirstate: centralize _cwd handling into _cwd method
Before this patch, immediate value is assigned to dirstate._cwd, if
ui.forcecwd is specified at instantiation of dirstate.

But this doesn't work as expected in some cases.

For example, hgweb set ui.forcecwd after instantiation of repo object.
If an extension touches repo.dirstate in its reposetup(), dirstate is
instantiated without setting ui.forcecwd, and dirstate.getcwd()
returns incorrect result.

In addition to it, hgweb.__init__() can take already instantiated repo
object, too. In this case, repo.dirstate might be already
instantiated, even if all enabled extensions don't so in their own
reposetup().

To avoid such issue, this patch centralizes _cwd handling into _cwd
method.

This issue can be reproduced by running test-hgweb-commands.t with
fsmonitor-run-tests.py.
2017-07-03 02:52:40 +09:00
Siddharth Agarwal
e0e3b8e4ce filestat: move __init__ to frompath constructor
We're going to add a `fromfp` constructor soon, and this also allows a filestat
object for a non-existent file to be created.
2017-06-10 14:09:54 -07:00
FUJIWARA Katsunori
a3a205cf6c dirstate: add identity information to detect simultaneous changing in storage
This identity is used to examine whether dirstate is simultaneously
changed in storage after previous caching (see issue5584 for detail).

util.cachestat can't be used for this purpose, because it has no
valuable information on Windows.

On the other hand, util.filestat can detect changing dirstate in
storage certainly, regardless of platforms.

    https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan

Strictly speaking, if underlying filesystem doesn't support
ctime/mtime, util.filestat can't detect simultaneous changing in
storage as expected. But simultaneous changing on such (very rare)
platform can't be detected regardless of this patch series.

Therefore, util.filestat should be reasonable identity for almost all
usecases.
2017-06-09 13:07:48 +09:00
Siddharth Agarwal
b1a08c7ef4 dirstate: add docstring for invalidate
This always confuses me, and we already have a docstring on
localrepo.invalidatedirstate.
2017-06-04 16:08:50 -07:00
Siddharth Agarwal
9202c22f7c match: introduce nevermatcher for when no ignore files are present
c01965ab5195 introduced a deterministic `__repr__` for ignores. However, it
didn't account for when ignore was `util.never`. This broke fsmonitor's ignore
change detection -- with an empty hgignore, it would kick in all the time.

Introduce `nevermatcher` and switch to it. This neatly parallels
`alwaysmatcher`.
2017-06-01 00:40:52 -07:00
Augie Fackler
47e0320f8b cleanup: rename all iteritems methods to items and add iteritems alias
Due to a quirk of our module importer setup on Python 3, all calls and
definitions of methods named iteritems() get rewritten at import
time. Unfortunately, this means there's not a good portable way to
access these methods from non-module-loader'ed code like our unit
tests. This change fixes that, which also unblocks test-manifest.py
from passing under Python 3.

We don't presently define any itervalues methods, or we'd need to give
those similar treatment.
2017-05-29 00:00:02 -04:00
Yuya Nishihara
4563e16232 parsers: switch to policy importer
# no-check-commit
2016-08-13 12:23:56 +09:00
Augie Fackler
8197ec1496 dirstate: mark {begin,end}parentchange as deprecated (API) 2017-05-18 17:13:32 -04:00
Augie Fackler
dae08b1747 dirstate: introduce new context manager for marking dirstate parent changes 2017-05-18 17:10:30 -04:00
Yuya Nishihara
cbe21a1cc9 osutil: proxy through util (and platform) modules (API)
See the previous commit for why. Marked as API change since osutil.listdir()
seems widely used in third-party extensions.

The win32mbcs extension is updated to wrap both util. and windows. aliases.
2017-04-26 22:26:28 +09:00
Martin von Zweigbergk
da238dd9f0 dirstate: optimize walk() by using match.visitdir()
We already have the logic for restricting directory walks in
match.visitdir() that we use for treemanifests. We should take
advantage of it when walking the working copy as well.

This speeds up "hg st -I rootfilesin:." on the Firefox repo from
0.587s to 0.305s on warm disk (and much more on cold disk). More time
is spent reading the dirstate than walking the working copy after.

I tried to find scenarios where calling match.visitdir() would be a
noticeable overhead, but I couldn't find any. I encourage the reader
to try for themselves, since this is performance-critical code.
2017-05-05 08:49:46 -07:00
Ryan McElroy
b1c2e24234 dirstate: use tryunlink 2017-03-21 06:50:28 -07:00
Augie Fackler
4ee50ed840 dirstate: use future-proof next(iter) instead of iter.next
The latter has been removed in Python 3.
2017-03-19 01:08:17 -04:00
Pulkit Goyal
f9b7821158 dirstate: use list comprehension to get a list of keys
We have used dict.keys() which returns a dict_keys() object instead
of list on Python 3. So this patch replaces that with list comprehension
which works both on Python 2 and 3.
2017-03-16 09:00:27 +05:30
Durham Goode
832a191f75 dirstate: track otherparent files same as nonnormal
Calling dirstate.setparents() is expensive in a large repo because it iterates
over every file in the dirstate. It does so to undo any merge state or
otherparent state files. Merge state files are already covered by
dirstate._nonnormalset, so we just need to track otherparent files in a similar
manner to avoid the full iteration here.

Fixing this shaves 20-25% off histedit in large repos.

I tested this by adding temporary debug logic to verify that the old files
processed in the loop matched the new files processed in the loop and running
the test suite.
2017-03-08 17:35:20 -08:00
Jun Wu
3c38af4dff dirstate: avoid unnecessary load+dump during backup
Previously, dirstate.savebackup unconditionally dumps the dirstate map to
disk. It may require loading dirstate first to be able to dump it. Those
operations could be expensive if the dirstate is big, and could be avoided
if we know the dirstate file is up-to-date.

This patch avoids the read and write if the dirstate is clean. In that case,
we just do a plain copy without any serialization.

This should make commands which use transactions but do not touch dirstate
faster. For example, "hg bookmark -r REV NAME".
2017-03-01 18:21:06 -08:00
Jun Wu
0bbf2239d3 dirstate: try to use hardlink to backup dirstate
This should be more efficient once util.copyfile has real hardlink support.
2017-03-01 17:59:21 -08:00
Durham Goode
c93b79c0e6 dirstate: track updated files to improve write time
Previously, dirstate.write() would iterate over the entire dirstate to find any
entries that needed to be marked 'lookup' (i.e. if they have the same timestamp
as now). This was O(working copy) and slow in large repos. It was most visible
when rebasing or histediting multiple commits, since it gets executed once per
commit, even if the entire rebase/histedit is wrapped in a transaction.

The fix is to track which files have been editted, and only check those to see
if they need to be marked as 'lookup'. This saves 25% on histedit times in very
large repositories.

I tested this by adding temporary debug logic to verify that the old files
processed in the loop matched the new files processed in the loop and running
the test suite.
2017-03-05 16:20:07 -08:00
FUJIWARA Katsunori
c63bad51f6 txnutil: factor out the logic to read file in according to HG_PENDING
This patch adds new file txnutil.py, because:

  - transaction.py is too large to import small utility logic
  - scmutil.py or so causes cyclic importing in phases.py

mayhavepending() is defined separately for convenience in subsequent
patch.
2017-02-21 01:20:59 +09:00
Pulkit Goyal
f1c1938039 py3: replace os.environ with encoding.environ (part 1 of 5)
os.environ is a dictionary which has string elements on Python 3. We have
encoding.environ which take care of all these things. This is the first patch
of 5 patch series which tend to replace the occurences of os.environ with
encoding.environ as using os.environ will result in unusual behaviour.
2016-12-18 01:34:41 +05:30
Pulkit Goyal
a7d1fd0177 py3: replace os.sep with pycompat.ossep (part 2 of 4)
This part also replaces some chunks of os.sep with pycompat.ossep.
2016-12-17 20:02:50 +05:30
Pulkit Goyal
97f340e354 py3: use pycompat.getcwd() instead of os.getcwd()
We have pycompat.getcwd() which returns bytes path on Python 3. This patch
changes most of the occurences of the os.getcwd() with pycompat one.
2016-11-23 00:03:11 +05:30
Pulkit Goyal
af51d6f213 py3: use pycompat.ossep at certain places
Certain instances of os.sep has been converted to pycompat.ossep where it was
sure to use bytes only. There are more such instances which needs some more
attention and will get surely.
2016-11-06 04:10:33 +05:30
Mads Kiilerich
a2ea53b6ed dirstate: fix debug.dirstate.delaywrite to use the new "now" after sleeping
It seems like the a regression has sneaked into debug.dirstate.delaywrite in
14bddc099338. It would sleep until no files were modified "now" any more, but
when writing the dirstate it would use the old "now" and still mark files as
'unset' instead of recording the timestamp that would make the file show up as
clean instead of unknown.

Instead of getting a new "now" from the file system, we trust the computed end
time as the new "now" and thus cause the actual modification time to be
writiten to the dirstate.

debug.dirstate.delaywrite is undocumented and only used in
test-largefiles-update.t . All tests seems to work fine for me without
debug.dirstate.delaywrite . Perhaps because it not really worked as intended
without the fix in this patch, and code and tests thus have evolved to do fine
without it? It could thus perhaps make sense to drop usage of this setting in
the tests. That could speed the test up a bit.

This functionality (or something very similar) can however apparently be very
convenient in setups where checking dirty-ness is expensive - such as when
using large files and have slow file filesystems or are CPU constrained. Now it
works and we can try it. (But ideally, for the largefile use case, it should
probably only delay lfdirstate writes - not ordinary dirstate.)
2016-10-18 16:52:35 +02:00
Mateusz Kwapich
016e1e5ef8 dirstate: rebuild should update dirstate properly
Updating dirstate by simply adding and dropping files from self._map doesn't
keep the other maps updated (think: _dirs, _copymap, _foldmap, _nonormalset)
thus introducing cache inconsistency.

This is also affecting the debugstate tests since now we don't even try to set
correct mode and mtime for the files because they are marked dirty anyway and
will be checked during next status call.
2016-08-30 15:16:28 -07:00
Martin von Zweigbergk
2fbf01764c util: rename checkcase() to fscasesensitive() (API)
I always read the name "checkcase(path)" as "do we need to check for
case folding at this path", but it's actually (I think) meant to be
read "check if the file system cares about case at this path". I'm
clearly not the only one confused by this as the dirstate has this
property:

  def _checkcase(self):
      return not util.checkcase(self._join('.hg'))

Maybe we should even inverse the function and call it fscasefolding()
since that's what all callers care about?
2016-08-30 09:22:53 -07:00
Mateusz Kwapich
de44e608e1 dirstate: add callback to notify extensions about wd parent change
The journal extension had to touch the dirstate internals to be notified about
wd parent change. To make that detection cleaner and reusable let's move it core.
Now the extension can register to be notified about parent changes.
2016-08-11 08:00:41 -07:00
Pierre-Yves David
9652a8ad03 deprecation: enforce thew 'tr' argument of 'dirstate.write' (API)
Compatibility was meant to be drop after 3.9 is released.
2016-08-02 16:51:27 +02:00
FUJIWARA Katsunori
19907e427c dirstate: make restoring from backup avoid ambiguity of file stat
File .hg/dirstate is restored by renaming from backup in failure
inside scopes below. If renaming keeps ctime, mtime and size of a
file, restoring is overlooked, and old contents cached before
restoring isn't invalidated as expected.

  - dirstateguard scope (from '.hg/dirstate.SUFFIX')
  - transaction scope (from '.hg/journal.dirstate')

To avoid ambiguity of file stat at restoring, this patch invokes
vfs.rename() with checkambig=True.

This patch is a part of "Exact Cache Validation Plan":

    https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
2016-06-13 05:11:56 +09:00
FUJIWARA Katsunori
f90241706d dirstate: make writing branch file out avoid ambiguity of file stat
Cached attribute dirstate._branch uses stat of '.hg/branch' file to
examine validity of cached contents. If writing '.hg/branch' file out
keeps ctime, mtime and size of it, change is overlooked, and old
contents cached before change isn't invalidated as expected.

To avoid ambiguity of file stat, this patch writes '.hg/branch' file
out with checkambig=True.

This patch is a part of "Exact Cache Validation Plan":

    https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
2016-06-03 00:44:20 +09:00
FUJIWARA Katsunori
00e016ed43 dirstate: make writing dirstate file out avoid ambiguity of file stat
Cached attribute repo.dirstate uses stat of '.hg/dirstate' file to
examine validity of cached contents. If writing '.hg/dirstate' file
out keeps ctime, mtime and size of it, change is overlooked, and old
contents cached before change isn't invalidated as expected.

To avoid ambiguity of file stat, this patch writes '.hg/dirstate' file
out with checkambig=True.

The former diff hunk changes the code path for "dirstate.write()", and
the latter changes the code path for "dirstate.savebackup()".

This patch is a part of "Exact Cache Validation Plan":

    https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
2016-06-03 00:44:20 +09:00
Mateusz Kwapich
6a0946ae30 distate: add assertions to backup functions
Those assertions will prevent the backup functions from overwriting
the dirstate file in case both: suffix and prefix are empty.

(foozy suggested making that change and I agree with him)
2016-05-26 17:36:44 -07:00