We used dict.keys(), which returns a dict_keys view object instead of a list
on Python 3. This patch replaces it with a list comprehension, which works on
both Python 2 and 3.
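The minimal illustration below shows why a list matters. The dict `entries` is made up for the example. On Python 3 the keys view stays tied to the dict, so deleting entries while iterating over it fails. A list built by a comprehension is an independent snapshot on both Python 2 and 3.

```python
entries = {'a': 1, 'b': 2, 'c': 3}

view = entries.keys()            # dict_keys view on Python 3, list on Python 2
snapshot = [k for k in entries]  # a real list on both Python 2 and 3

# Deleting entries while iterating over the view raises
# "RuntimeError: dictionary changed size during iteration" on Python 3;
# iterating over the list snapshot is safe on both versions.
for k in snapshot:
    if entries[k] % 2:
        del entries[k]
```

`list(entries)` would produce the same snapshot. The comprehension form is just one portable spelling.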
Calling dirstate.setparents() is expensive in a large repo because it iterates
over every file in the dirstate. It does so to undo any merge state or
otherparent state files. Merge state files are already covered by
dirstate._nonnormalset, so we just need to track otherparent files in a similar
manner to avoid the full iteration here.
Fixing this shaves 20-25% off histedit in large repos.
I tested this by adding temporary debug logic to verify that the files
processed in the old loop matched those processed in the new loop, and by
running the test suite.
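Below is a toy sketch of tracking otherparent entries incrementally, so that setparents() only visits candidate files. It makes several assumptions:

- Entries are (state, mode, size, mtime) tuples.
- A size of -2 marks an 'otherparent' entry.
- The class name MiniDirstate and the simplified state transitions are hypothetical, not Mercurial's actual code.

```python
class MiniDirstate(object):
    """Toy dirstate; entries are (state, mode, size, mtime) tuples.

    Assumption: size == -2 marks an 'otherparent' entry.
    """

    def __init__(self):
        self._map = {}
        self._nonnormalset = set()    # merge ('m'), added, removed, lookup
        self._otherparentset = set()  # normal entries taken from p2
        self._p1 = self._p2 = None

    def _addpath(self, f, state, mode=0, size=-1, mtime=-1):
        self._map[f] = (state, mode, size, mtime)
        # Keep both auxiliary sets in sync on every mutation.
        self._nonnormalset.discard(f)
        self._otherparentset.discard(f)
        if state != 'n' or mtime == -1:
            self._nonnormalset.add(f)
        if state == 'n' and size == -2:
            self._otherparentset.add(f)

    def setparents(self, p1, p2=None):
        if self._p2 is not None and p2 is None:
            # Leaving a merge: visit only the candidate files (a new set
            # built by the union), not every entry in the map.
            for f in self._nonnormalset | self._otherparentset:
                state, _, size, _ = self._map[f]
                if state == 'm':
                    self._addpath(f, 'n')   # back to normal, needs lookup
                elif state == 'n' and size == -2:
                    self._addpath(f, 'a')   # otherparent becomes added
        self._p1, self._p2 = p1, p2
```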
Previously, dirstate.savebackup unconditionally dumped the dirstate map to
disk. This may require loading the dirstate first so that it can be dumped.
Those operations can be expensive if the dirstate is big, and can be avoided
if we know the dirstate file is up to date.
This patch avoids the read and write if the dirstate is clean. In that case,
we just do a plain copy without any serialization.
This should make commands that use transactions but do not touch the dirstate
faster, for example "hg bookmark -r REV NAME".
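Below is a minimal sketch of the "plain copy when clean" idea. It assumes a state file whose contents are loaded lazily and a `_dirty` flag that is set whenever memory diverges from disk. The class BackupableState and its methods are hypothetical.

```python
import shutil

class BackupableState(object):
    """Hypothetical sketch of a state file with a lazily loaded copy."""

    def __init__(self, path):
        self.path = path
        self._data = None    # not loaded from disk until needed
        self._dirty = False  # True when memory differs from disk

    def set(self, data):
        self._data = data
        self._dirty = True

    def savebackup(self, backuppath):
        if self._dirty:
            # Memory differs from disk: serialize the in-memory state.
            with open(backuppath, 'wb') as fp:
                fp.write(self._data)
        else:
            # Disk is up to date: plain copy, with no load and no
            # serialization of the contents.
            shutil.copyfile(self.path, backuppath)
```

In the clean case the contents are never read into memory, which is where the savings come from for large files.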
Previously, dirstate.write() would iterate over the entire dirstate to find any
entries that needed to be marked 'lookup' (i.e. if they have the same timestamp
as now). This was O(working copy) and slow in large repos. It was most visible
when rebasing or histediting multiple commits, since it gets executed once per
commit, even if the entire rebase/histedit is wrapped in a transaction.
The fix is to track which files have been edited, and only check those to see
if they need to be marked as 'lookup'. This saves 25% on histedit times in very
large repositories.
I tested this by adding temporary debug logic to verify that the files
processed in the old loop matched those processed in the new loop, and by
running the test suite.
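Below is a sketch of the "only check edited files" loop. It assumes dmap maps filename to (state, mode, size, mtime) and uses an mtime of -1 to mean 'lookup'. The function name and the updatedfiles set are hypothetical stand-ins for the tracked set described above.

```python
def mark_ambiguous_entries(dmap, updatedfiles, now):
    """Set mtime to -1 ('lookup') for clean entries whose recorded mtime
    equals 'now', visiting only files changed since the last write.

    dmap maps filename -> (state, mode, size, mtime).
    """
    marked = []
    for f in updatedfiles:
        e = dmap.get(f)
        if e is not None and e[0] == 'n' and e[3] == now:
            # Same second as 'now': a later edit within this second could
            # go unnoticed, so force a content check on the next status.
            dmap[f] = (e[0], e[1], e[2], -1)
            marked.append(f)
    updatedfiles.clear()  # tracking restarts after each write
    return marked
```

The cost is now proportional to the number of edited files rather than to the size of the working copy.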
This patch adds a new file, txnutil.py, because:
- transaction.py is too large a module to import just for small utility logic
- importing from scmutil.py (or similar) causes cyclic imports in phases.py
mayhavepending() is defined separately for convenience in a subsequent
patch.
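Below is a sketch of what helpers like mayhavepending() and a companion trypending() could look like. It rests on these assumptions:

- HG_PENDING holds the repository root while hooks of a running transaction execute.
- `opener` is any callable that opens a file name relative to the repository.
- os.environ stands in for Mercurial's encoding.environ.

The trypending() name and both bodies are illustrative, not the module's actual code.

```python
import errno
import os

def mayhavepending(root):
    """Return whether 'root' may have pending changes visible to us."""
    return root == os.environ.get('HG_PENDING')

def trypending(root, opener, filename):
    """Open 'filename.pending' if pending changes may be visible,
    falling back to 'filename'. Returns (fileobj, ispending)."""
    if mayhavepending(root):
        try:
            return opener(filename + '.pending'), True
        except IOError as inst:
            # A missing pending file is fine; anything else is an error.
            if inst.errno != errno.ENOENT:
                raise
    return opener(filename), False
```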
os.environ is a mapping whose keys and values are str (unicode) on Python 3,
whereas Mercurial works with bytes. We have encoding.environ, which takes care
of this. This is the first patch of a 5-patch series that replaces occurrences
of os.environ with encoding.environ, since using os.environ results in
unexpected behaviour.
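Below is a Python 3 sketch of a bytes-only environment mapping, in the spirit of encoding.environ. It is not that module's implementation. On Windows it encodes with UTF-8, which is an assumption; Mercurial uses its own local encoding.

```python
import os

if os.supports_bytes_environ:
    # POSIX on Python 3: os.environb exposes the raw bytes environment.
    environ = os.environb
else:
    # No bytes environment (e.g. Windows): encode each str entry.
    # UTF-8 is an assumption made for this sketch.
    environ = dict((k.encode('utf-8'), v.encode('utf-8'))
                   for k, v in os.environ.items())
```

Code that reads `environ[b'HOME']` then receives bytes on every platform, instead of mixing str and bytes.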
Certain instances of os.sep have been converted to pycompat.ossep where we
were sure that only bytes are used. There are more such instances, which need
more attention and will be converted later.
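Below is a minimal sketch of a bytes separator like pycompat.ossep. The helper joinbytes() is hypothetical. On Python 3, os.sep is a str, so combining it with bytes paths raises TypeError.

```python
import os

# Bytes version of the path separator ('/' or '\\' are pure ASCII).
ossep = os.sep.encode('ascii')

def joinbytes(*parts):
    """Join bytes path components with the bytes separator."""
    return ossep.join(parts)
```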
It seems like a regression sneaked into debug.dirstate.delaywrite in
14bddc099338. It would sleep until no files were modified "now" any more, but
when writing the dirstate it would still use the old "now" and mark files as
'unset' instead of recording the timestamp that would make them show up as
clean instead of unknown.
Instead of getting a new "now" from the file system, we trust the computed end
time as the new "now", and thus cause the actual modification time to be
written to the dirstate.
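Below is a hypothetical sketch of the delaywrite logic. If any clean entry was recorded with mtime == now, it sleeps until the next multiple of `delaywrite` seconds. That end time, not the stale `now`, becomes the new "now". `clock` and `sleep` are injectable so the arithmetic can be checked without actually sleeping. The function name and signature are illustrative.

```python
import time

def delayednow(dmap, now, delaywrite, clock=None, sleep=time.sleep):
    """Return the 'now' to use when writing the dirstate.

    dmap maps filename -> (state, mode, size, mtime).
    """
    if delaywrite <= 0:
        return now
    for e in dmap.values():
        if e[0] == 'n' and e[3] == now:
            if clock is None:
                clock = time.time()
            # Sleep until the next multiple of 'delaywrite' seconds
            # rather than a fixed 'delaywrite' seconds.
            start = int(clock) - (int(clock) % delaywrite)
            end = start + delaywrite
            sleep(end - clock)
            # Trust the computed end as the new 'now' instead of the
            # value computed before sleeping.
            return end
    return now
```

For example, with delaywrite=2 and a clock of 10.5, start is 10, end is 12, and the sketch sleeps 1.5 seconds and returns 12.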
debug.dirstate.delaywrite is undocumented and only used in
test-largefiles-update.t. All tests seem to work fine for me without
debug.dirstate.delaywrite, perhaps because it never really worked as intended
without the fix in this patch, and code and tests have thus evolved to do fine
without it. It might therefore make sense to drop usage of this setting in
the tests, which could speed them up a bit.
This functionality (or something very similar) can, however, apparently be very
convenient in setups where checking dirtiness is expensive, such as when
using largefiles with slow filesystems or limited CPU. Now it works and we
can try it. (But ideally, for the largefiles use case, it should probably only
delay lfdirstate writes, not ordinary dirstate writes.)
Updating the dirstate by simply adding and dropping files from self._map
doesn't keep the other maps updated (think: _dirs, _copymap, _foldmap,
_nonnormalset), thus introducing cache inconsistency.
This also affects the debugstate tests, since we no longer even try to set the
correct mode and mtime for the files: they are marked dirty anyway and will be
checked during the next status call.
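Below is a toy sketch of why direct `_map` edits break derived caches. The `_dirs` counter mirrors the kind of cache named above, but the class CachedMap and its methods are hypothetical. Going through add()/drop() keeps `_dirs` in sync, while writing to `_map` directly leaves it stale.

```python
class CachedMap(object):
    """Map of files plus a derived count of tracked files per directory."""

    def __init__(self):
        self._map = {}
        self._dirs = {}   # directory -> number of tracked files under it

    @staticmethod
    def _parents(f):
        # 'a/b/c.txt' -> ['a', 'a/b']
        parts = f.split('/')[:-1]
        return ['/'.join(parts[:i]) for i in range(1, len(parts) + 1)]

    def add(self, f, entry):
        if f not in self._map:
            for d in self._parents(f):
                self._dirs[d] = self._dirs.get(d, 0) + 1
        self._map[f] = entry

    def drop(self, f):
        if self._map.pop(f, None) is not None:
            for d in self._parents(f):
                self._dirs[d] -= 1
                if not self._dirs[d]:
                    del self._dirs[d]
```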
I always read the name "checkcase(path)" as "do we need to check for
case folding at this path", but it's actually (I think) meant to be
read "check if the file system cares about case at this path". I'm
clearly not the only one confused by this as the dirstate has this
property:
    def _checkcase(self):
        return not util.checkcase(self._join('.hg'))
Maybe we should even invert the function and call it fscasefolding(),
since that's what all callers care about?
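Below is a hypothetical sketch of an inverted helper named fscasefolding(), as proposed above. It answers "does the file system fold case at this path?" by checking whether a case-swapped name resolves to the same file. `path` must exist. A last component without cased letters gives no evidence either way, so False is returned.

```python
import os

def fscasefolding(path):
    """Return True if the file system folds case at 'path'."""
    st1 = os.lstat(path)
    d, b = os.path.split(path)
    b2 = b.upper()
    if b == b2:
        b2 = b.lower()
        if b == b2:
            return False  # e.g. "123": nothing to swap
    try:
        st2 = os.lstat(os.path.join(d, b2))
    except OSError:
        return False  # swapped name does not exist: case-sensitive
    # Same device and inode: both spellings name the same file.
    return os.path.samestat(st1, st2)
```

With such a helper, the property above would read `return fscasefolding(self._join('.hg'))`.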
The journal extension had to touch the dirstate internals to be notified about
working directory parent changes. To make that detection cleaner and reusable,
let's move it into core. Now the extension can register to be notified about
parent changes.
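Below is a sketch of a category-keyed callback registry for parent changes. Callbacks are called with (dirstate, oldparents, newparents). The class ParentTracker is hypothetical. For simplicity it fires callbacks immediately in setparents(); a real dirstate might defer notification until the dirstate is written.

```python
class ParentTracker(object):
    """Notify registered callbacks when the parents change."""

    def __init__(self, parents):
        self._pl = parents
        self._plchangecallbacks = {}

    def addparentchangecallback(self, category, callback):
        # One callback per category; registering again replaces it.
        self._plchangecallbacks[category] = callback

    def setparents(self, p1, p2=None):
        old, new = self._pl, (p1, p2)
        self._pl = new
        if old != new:
            # Sorted categories give a deterministic call order.
            for category in sorted(self._plchangecallbacks):
                self._plchangecallbacks[category](self, old, new)
```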
On failure inside the scopes below, .hg/dirstate is restored by renaming it
from a backup. If the restored file has the same ctime, mtime and size as the
file it replaces, the restore is overlooked, and old contents cached before
restoring aren't invalidated as expected.
- dirstateguard scope (from '.hg/dirstate.SUFFIX')
- transaction scope (from '.hg/journal.dirstate')
To avoid file stat ambiguity when restoring, this patch invokes
vfs.rename() with checkambig=True.
This patch is a part of "Exact Cache Validation Plan":
https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
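Below is a rough, hypothetical sketch of a rename with ambiguity checking. If the file that replaces `dst` ends up with the same ctime (in whole seconds) as the old `dst`, stat-based caches may not notice the change. The sketch then bumps mtime one second past the old mtime so the stat tuple differs. The function name and the exact bump rule are illustrative.

```python
import os

def rename_checkambig(src, dst):
    """Rename src over dst, avoiding an ambiguous stat afterwards."""
    try:
        oldstat = os.stat(dst)
    except OSError:
        oldstat = None  # nothing could have been cached from dst
    os.replace(src, dst)
    if oldstat is not None:
        newstat = os.stat(dst)
        if int(newstat.st_ctime) == int(oldstat.st_ctime):
            # Ambiguous: advance mtime so (size, mtime, ctime) differs.
            advanced = (int(oldstat.st_mtime) + 1) & 0x7fffffff
            os.utime(dst, (advanced, advanced))
```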
Cached attribute dirstate._branch uses the stat of the '.hg/branch' file to
check the validity of cached contents. If writing the '.hg/branch' file out
keeps its ctime, mtime and size unchanged, the change is overlooked, and old
contents cached before the change aren't invalidated as expected.
To avoid file stat ambiguity, this patch writes the '.hg/branch' file
out with checkambig=True.
This patch is a part of "Exact Cache Validation Plan":
https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
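Below is a hypothetical sketch of a stat-validated cache like the one described above. Cached contents are reused while (size, mtime, ctime) is unchanged. A write that leaves all three unchanged would be missed, and that is the ambiguity checkambig=True is meant to remove. The class name is illustrative.

```python
import os

class StatCachedFile(object):
    """Cache a file's contents, revalidating by stat on each access."""

    def __init__(self, path):
        self.path = path
        self._key = None
        self._value = None

    def _statkey(self):
        st = os.stat(self.path)
        return (st.st_size, int(st.st_mtime), int(st.st_ctime))

    def get(self):
        key = self._statkey()
        if key != self._key:
            # Stat changed (or first access): reload from disk.
            with open(self.path, 'rb') as fp:
                self._value = fp.read()
            self._key = key
        return self._value
```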
Cached attribute repo.dirstate uses the stat of the '.hg/dirstate' file to
check the validity of cached contents. If writing the '.hg/dirstate' file
out keeps its ctime, mtime and size unchanged, the change is overlooked, and
old contents cached before the change aren't invalidated as expected.
To avoid file stat ambiguity, this patch writes the '.hg/dirstate' file
out with checkambig=True.
The former diff hunk changes the code path for "dirstate.write()", and
the latter changes the code path for "dirstate.savebackup()".
This patch is a part of "Exact Cache Validation Plan":
https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
These assertions prevent the backup functions from overwriting
the dirstate file in case both suffix and prefix are empty.
(foozy suggested making this change and I agree with him.)
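Below is a minimal sketch of the guard. The helper backupname() is hypothetical. With both prefix and suffix empty, the backup name would equal the dirstate file name itself, so the assertion refuses it.

```python
def backupname(filename, prefix='', suffix=''):
    """Compute a backup file name distinct from 'filename'."""
    assert len(suffix) > 0 or len(prefix) > 0, 'empty prefix and suffix'
    return prefix + filename + suffix
```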
The issue with using actualfilename is that a dirstate saved during a
transaction with "pending" in its filename will be impossible to recover from
outside the transaction, because the recover method will look for the name
without "pending".
As the copymap is a short-lived object regenerated from the dirstate on each
read, this didn't affect us in any serious way. But since I started working
on permanent storage of the copymap in my experiments with sqldirstate[1], I've
seen this bug leave copy information in the copymap after reverting file
moves and copies.
[1] https://www.mercurial-scm.org/wiki/SQLDirstatePlan
This would allow code that explicitly copies the dirstate to use this method instead.
Use of this method will increase encapsulation (the dirstate class will be sole
owner of its on-disk storage).
The recommended way to detect a default value (when None is not an option) is
a sentinel object. Identity testing against an integer is less explicit and not
guaranteed to work in all Python implementations.
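Below is a minimal sketch of the sentinel pattern. The helper getconfig() and the `_notset` name are hypothetical. A fresh `object()` is unique, so `default is _notset` can only be true when the caller passed nothing. None remains usable as a real default.

```python
_notset = object()  # unique sentinel: no caller can pass it by accident

def getconfig(values, name, default=_notset):
    """Look up 'name'; None is a legitimate default here."""
    if name in values:
        return values[name]
    if default is _notset:
        raise KeyError(name)
    return default
```

By contrast, a check like `default is -1` only happens to work because CPython caches small integers; other implementations make no such promise.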
When we introduced the develwarning, we did not have an official deprecation
API and infrastructure. We can now officially deprecate the old way with a
version deadline.
Before this patch, we were using python code for computing the nonnormal
dirstate entries. This patch makes us use the C implementation of the function
when it is available.
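Below is a sketch of "use C when available, else Python". Assumptions:

- A module object named `parsers` stands in for the compiled extension; here it is a hypothetical empty module, so the Python path runs.
- Non-normal means a state other than 'n', or an mtime of -1.

```python
import types

# Hypothetical stand-in for a compiled extension module; this one does
# not provide nonnormalentries, as an older build might not.
parsers = types.ModuleType('parsers')

def nonnormalentries(dmap):
    """Return the set of files whose dirstate entry is not 'normal'.

    dmap maps filename -> (state, mode, size, mtime).
    """
    cfunc = getattr(parsers, 'nonnormalentries', None)
    if cfunc is not None:
        return cfunc(dmap)
    # Pure Python fallback: any state other than 'n', or an mtime of -1
    # (the file must be looked up on the next status).
    return set(f for f, e in dmap.items() if e[0] != 'n' or e[3] == -1)
```

Looking the function up with getattr() avoids catching AttributeError around the call, which could otherwise hide errors raised inside the C function.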
Using the nonnormal set in hgwatchman improves hg status performance. Below
are the numbers for mozilla-central.
With the changes:
  $ hg perfstatus
  ! wall 0.010632 comb 0.000000 user 0.000000 sys 0.000000 (best of 246)
Without the changes:
  $ hg perfstatus
  ! wall 0.036442 comb 0.030000 user 0.030000 sys 0.000000 (best of 100)
On mozilla-central the improvement to hg status is ~20% (0.25s to 0.2s);
on our big repos at Facebook, the win is ~40% (1.2s to 0.72s).
Before this patch, we were only populating the non-normal set when parsing
or packing the dirstate. This was not enough to keep the non-normal set up to
date at all times, as we don't write and read the dirstate whenever a change
happens. This patch solves this issue by updating the non-normal set whenever
it should be updated. Note: pack_dirstate changes the dmap, and we have to keep
that behavior unchanged for backward compatibility, so we are forced to
recompute the non-normal set after calling it.
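Below is a toy sketch of keeping the non-normal set in sync on every mutation, instead of only when parsing or packing. The method names mirror dirstate's, but the class NonNormalTracker and all bodies are simplified and hypothetical. Entries are (state, mode, size, mtime) tuples.

```python
class NonNormalTracker(object):
    """Maintain _nonnormalset incrementally alongside _map."""

    def __init__(self):
        self._map = {}
        self._nonnormalset = set()

    def _set(self, f, state, mode=0, size=-1, mtime=-1):
        self._map[f] = (state, mode, size, mtime)
        if state != 'n' or mtime == -1:
            self._nonnormalset.add(f)
        else:
            self._nonnormalset.discard(f)

    def normal(self, f, mode, size, mtime):
        self._set(f, 'n', mode, size, mtime)

    def normallookup(self, f):
        self._set(f, 'n')  # mtime -1: content must be checked

    def add(self, f):
        self._set(f, 'a')

    def remove(self, f):
        self._set(f, 'r', 0, 0, 0)

    def drop(self, f):
        self._map.pop(f, None)
        self._nonnormalset.discard(f)
```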
This patch adds a new python function in the dirstate to compute the set of
non-normal files from the dmap. These files are useful to compute the repository
status.
Rather than sleeping for 2 seconds, we sleep until the next even-numbered
second, which has the same effect but makes tests faster. This
removes test-largefiles-update as the long pole of the test suite.
When debugrebuilddirstate --minimal is called, rebuilding the dirstate was done
outside of the appropriate rebuild function. This patch makes
debugrebuilddirstate use dirstate.rebuild.
This was done to allow our extension to become aware of debugrebuilddirstate
--minimal.
We've globally forced stat to return integer times, which agrees with
our extension code, so this is no longer needed.
This speeds up status on mozilla-central substantially:
  $ hg perfstatus
  ! wall 0.190179 comb 0.180000 user 0.120000 sys 0.060000 (best of 53)
  $ hg perfstatus
  ! wall 0.275729 comb 0.270000 user 0.210000 sys 0.060000 (best of 36)
The _filefoldmap is not updated when files are deleted from the dirstate. If a
file with the same name but different case is added afterwards, this renders
_filefoldmap incorrect. These steps must occur for the problem to
reproduce:
- call status (with listunknown=True),
- update the working directory to a commit which makes a casefolding change (A -> a),
- call status again (it will show the file "a" as deleted).
Unfortunately, I'm unable to write a test for it because I don't know any
core Mercurial command able to reproduce those steps.
The bug was originally spotted when hgwatchman was enabled. It caused the
changeset contents to change during hg rebase (one file unrelated to the
changeset was deleted in it after the rebase).
hgwatchman is able to hit it because, when hgignore changes, hgwatchman's
overridestatus calls the original status with listunknown=True.
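Below is a toy sketch of the fix's idea: when a file is dropped, its case-folded key must also leave the fold map. Otherwise a later unknown file "a" still normalizes to the dropped "A". Both the normcase() stand-in (lowercasing) and the class FoldingDirstate are hypothetical simplifications.

```python
def normcase(path):
    # Stand-in for platform case normalization (assumption: lowercase).
    return path.lower()

class FoldingDirstate(object):
    """_filefoldmap maps normcase(name) -> tracked name."""

    def __init__(self):
        self._map = {}
        self._filefoldmap = {}

    def add(self, f):
        self._map[f] = ('a', 0, -1, -1)
        self._filefoldmap[normcase(f)] = f

    def drop(self, f):
        if f in self._map:
            del self._map[f]
            # The fix: forget the folded name too, so it cannot shadow
            # a differently cased file that appears later.
            self._filefoldmap.pop(normcase(f), None)

    def normalize(self, path):
        return self._filefoldmap.get(normcase(path), path)
```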
This violation, which passes the repo object to dirstate, was introduced
by dc3ddedecae7.
This patch uses 'False' instead of 'None' as default value of 'tr'
argument, to distinguish "None as repo.currenttransaction() result"
from "legacy invocation without explicit tr passing".
A True/False value of '_pendingmode' indicates whether 'dirstate.pending' is
used to initialize its own '_map' and so on. When it is None, neither
'dirstate' nor 'dirstate.pending' has been read in yet.
This is used to keep a consistent view between '_pl()' and '_read()'.
Once '_pendingmode' is determined by reading one of 'dirstate' or
'dirstate.pending' in, '_pendingmode' is kept even if 'invalidate()'
is invoked. This should be reasonable, because:
- effective 'invalidate()' invocation should occur only in wlock scope, and
- wlock can't be gotten under HG_PENDING mode
'_trypending()' is defined as a normal function so that similar code paths
(in bookmarks and phases) can easily be factored out in the future.
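Below is a hypothetical sketch of fixing the pending mode at first read. The first read records whether the pending file was used. invalidate() keeps that decision, and a later read in the other mode is refused rather than mixing the two views. `trypending` is any callable returning (fileobj, ispending). The class name and error message are illustrative.

```python
class PendingAwareState(object):
    """Refuse to mix 'dirstate' and 'dirstate.pending' views."""

    def __init__(self, trypending):
        self._trypending = trypending
        self._pendingmode = None   # None: nothing read in yet

    def invalidate(self):
        # Cached data would be dropped here; _pendingmode is kept.
        pass

    def openstatefile(self):
        fp, mode = self._trypending()
        if self._pendingmode is not None and self._pendingmode != mode:
            fp.close()
            raise RuntimeError('working directory state may be '
                               'changed in parallel')
        self._pendingmode = mode
        return fp
```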
This patch delays writing in-memory changes out if a transaction is
running.
'_getfsnow()' is defined as a function so that it can easily be hooked for
ambiguous timestamp tests (see also fakedirstatewritetime.py).
The 'if tr:' code path in this patch is still disabled at this revision,
because there is no client invoking 'dirstate.write()' with a repo
object.
BTW, this patch changes 'dirstate.invalidate()' semantics around
'dirstate.write()' in a transaction scope:
before:
    with repo.transaction():
        dirstate.CHANGE('A')
        dirstate.write() # change for A is written out here
        dirstate.CHANGE('B')
        dirstate.invalidate() # discards only change for B
after:
    with repo.transaction():
        dirstate.CHANGE('A')
        dirstate.write() # change for A is still kept in memory
        dirstate.CHANGE('B')
        dirstate.invalidate() # discards changes for A and B
Fortunately, there is no code path expecting the former, at least in
Mercurial itself, because 'dirstateguard' was introduced to remove
such 'dirstate.invalidate()' calls.
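Below is a toy sketch of the new semantics. Inside a transaction, write() only registers a file generator, so changes stay in memory and invalidate() discards all of them. Without a transaction, write() persists immediately. FakeTransaction and TxnDirstate are hypothetical stand-ins. This write() takes the transaction directly rather than a repo.

```python
class FakeTransaction(object):
    """Collects file generators; nothing is written in this sketch."""

    def __init__(self):
        self.generators = {}

    def addfilegenerator(self, genid, filenames, genfunc):
        self.generators[genid] = (filenames, genfunc)

class TxnDirstate(object):
    def __init__(self):
        self.ondisk = {}   # what a reader would see on disk
        self._map = {}
        self._dirty = False

    def change(self, f):
        self._map[f] = 'changed'
        self._dirty = True

    def _writedirstate(self, fp=None):
        self.ondisk = dict(self._map)
        self._dirty = False

    def write(self, tr):
        if not self._dirty:
            return
        if tr is not None:
            tr.addfilegenerator('dirstate', ('dirstate',),
                                self._writedirstate)
            return  # omit actual writing out
        self._writedirstate()

    def invalidate(self):
        # Drop in-memory changes and reload what is on disk.
        self._map = dict(self.ondisk)
        self._dirty = False
```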
Some comments in this patch assume that a subsequent patch changes
'dirstate.write()' as below:
    def write(self, repo):
        if not self._dirty:
            return
        tr = repo.currenttransaction()
        if tr:
            tr.addfilegenerator('dirstate', (self._filename,),
                                self._writedirstate, location='plain')
            return # omit actual writing out
        st = self._opener('dirstate', "w", atomictemp=True)
        self._writedirstate(st)
This patch makes '_savebackup()' write in-memory changes out, which
clears 'self._dirty'. If the dirstate isn't changed after
'_savebackup()', a subsequent 'dirstate.write()' never invokes
'tr.addfilegenerator()', because 'not self._dirty' is true.
Then 'tr.writepending()' unintentionally returns False if there are
no other (e.g. changelog) changes pending, even though dirstate
changes have already been written out in '_savebackup()'.
To avoid this situation, this patch makes '_savebackup()' explicitly
invoke 'tr.addfilegenerator()' if a transaction is running.
'_savebackup()' should become transaction-aware before 'write()',
because the former depends on the behavior of the latter as it is before
this patch.
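Below is a toy sketch of the problem and the fix. FakeTransaction's writepending() reports whether any generator was registered. BackupDirstate.savebackup() clears `_dirty` by writing, so it registers the generator itself; a later write() returns early. All names are hypothetical stand-ins, and the backup is only recorded, not written.

```python
class FakeTransaction(object):
    """writepending() runs generators and reports if any exist."""

    def __init__(self):
        self._generators = {}

    def addfilegenerator(self, genid, filenames, genfunc):
        self._generators[genid] = (filenames, genfunc)

    def writepending(self):
        for filenames, genfunc in self._generators.values():
            genfunc()
        return bool(self._generators)

class BackupDirstate(object):
    def __init__(self):
        self._dirty = False
        self.backups = []

    def _writedirstate(self):
        self._dirty = False

    def write(self, tr):
        if not self._dirty:
            return  # nothing changed: no generator registered here
        if tr is not None:
            tr.addfilegenerator('dirstate', ('dirstate',),
                                self._writedirstate)
            return
        self._writedirstate()

    def savebackup(self, tr, backupname):
        if self._dirty:
            self._writedirstate()  # clears _dirty ...
        if tr is not None:
            # ... so register the generator explicitly; otherwise a
            # later write() returns early and writepending() misses it.
            tr.addfilegenerator('dirstate', ('dirstate',),
                                self._writedirstate)
        self.backups.append(backupname)
```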