prepushoutgoinghook was introduced in 8dfcd476a7f7 and largefiles is the only
in-tree use of it. Refactor it to be more useful for other use cases in
largefiles.
This patch extracts the code for determining requirements for a new
repo into a standalone function. By doing so, future code that will
perform an in-place repository upgrade (e.g. to generaldelta) can
examine the set of proposed new requirements and possibly take
additional actions (such as adding dotencode or fncache) when
performing the upgrade.
This patch is marked as API because _baserequirements (which was added
in 87e5a9d0fbb1 so extensions could override it) has been removed and
will presumably impact whatever extension it was added for. Consumers
should be able to monkeypatch the new function to achieve the same
functionality.
The "create" argument has been dropped because the function is only
called in one location and "create" is always true in that case.
While it makes logical sense for this code to be a method so extensions
can implement a custom repo class / method to override it, this won't
actually work. This is because requirements determination occurs during
localrepository.__init__ and this is before the "reposetup"
"callback" is fired. So, the only way for extensions to customize
requirements would be to overwrite localrepo.localrepository or to
monkeypatch a function on a module during extsetup(). Since we try to
keep localrepository small, we use a standalone function. There is
probably room to offer extensions a "hook" point to alter repository
creation. But that is scope bloat.
A future patch will refactor requirements determination into a
standalone function. To prepare for this, refactor the requirements
code to assign to a local variable instead of to self.requirements.
Before, the hook() closure (which is called as part of locking hooks)
would maintain a reference to a transaction instance (which should be
finalized by the time lock hooks are called). Because we accumulate
hook() instances when there are multiple transactions per lock, this
would result in holding references to the transaction instances which
would lead to higher memory utilization.
Creating a reference to the hook arguments dict minimizes the number
of objects that are kept alive until the lock release hook runs,
minimizing memory "leaks."
This further centralizes the handling of bookmark storage, and will
help get some lingering bookmarks business out of localrepo. Right
now, this change implies reading of the active bookmark to also imply
reading all bookmarks from disk - for users with many many bookmarks
this may be a measurable performance hit. In that case, we should
migrate bmstore to be able to lazy-read its properties from disk
rather than having to eagerly read them, but I decided to avoid doing
that to try and avoid some potentially complicated filecache decorator
issues.
This doesn't move the logic for writing the active bookmark into a
transaction, though that is probably the correct next step. Since the
API probably needs to morph a little more, I didn't bother marking
bookmarks.{activate,deactivate} as deprecated yet.
The 'peer.known' call (handled at the repository level) was applying its own
manual filtering (looking at phases) instead of relying on the repoview
mechanism. This led to the discovery finding more "common" node that
'getbundle' was willing to recognised. From there, bad things happen, issue4982
is a symptom of it. While situations like described in issue4982 can still
happen because of race conditions, fixing 'peer.known' is important for
consistency in all cases.
We update the code to use 'repoview' filtering. This lead to small changes in
the tests for exchanging obsolescence marker because the discovery yields
different results.
The test affected in 'test-obsolete-changeset-exchange.t' is a test for
issue4982 getting back to its expected state.
If acquisition of wlock waits for another "hg commit" process to
release it, dirstate will refer newly committed revision after
acquisition of wlock.
At that time, '00changelog.i' on the filesystem contains this new
revision, but in-memory 'repo.changelog' doesn't, if it is cached
without store lock (slock) before updating by another "hg commit".
This makes validating parents at re-loading 'repo.dirstate' from
'.hg/dirstate' replace such new revision with 'nullid'. Then,
'localrepository.commit()' creates "orphan" revision (see issue4368
for detail).
ec817376a322 makes 'commands.commit()' acquire both wlock and slock
before processing to avoid this issue at "hg commit".
But similar issue can occur even after ec817376a322, if 3rd party
extension does:
- refer 'repo.changelog' outside wlock scope, and
- invoke 'repo.commit()' directly (instead of 'commands.commit()')
This patch makes 'commit()' acquire slock before processing, to refer
recent changelog at validating parents of 'repo.dirstate'.
The function was dropped in d5d613de0f44. This API drop brokes three of my
extensions including some critical to my workflow like tortoisehg. Lets mark
this API for death and give people time to fix their code.
Previously we were only checking modified files for their resolve state. But a
file might be unresolved yet not in the modified state. Handle all such cases
properly.
Auditors keeps a cache of audited paths. Therefore we cannot use the same
auditor for working copy and history operation. We create a new one without
file system check for this purposes.
localrepo.parents() has relatively few users, and most of those were
actually implicitly looking at the wctx, which is now made explicit
via repo[None].
'repo.invalidate()' deletes 'filecache'-ed properties by
'filecache.__delete__()' below via 'delattr(unfiltered, k)'. But
cached objects are still kept in 'repo._filecache'.
def __delete__(self, obj):
try:
del obj.__dict__[self.name]
except KeyError:
raise AttributeError(self.name)
If 'repo' object is reused even after failure of command execution,
referring 'filecache'-ed property may reuse one kept in
'repo._filecache', even if reloading from a file is expected.
Executing command sequence on command server is a typical case of this
situation (e0a0f9ad3e4c also tried to fix this issue). For example:
1. start a command execution
2. 'changelog.delayupdate()' is invoked in a transaction scope
This replaces own 'opener' by '_divertopener()' for additional
accessing to '00changelog.i.a' (aka "pending file").
3. transaction is aborted, and command (1) execution is ended
After 'repo.invalidate()' at releasing store lock, changelog
object above (= 'opener' of it is still replaced) is deleted from
'repo.__dict__', but still kept in 'repo._filecache'.
4. start next command execution with same 'repo'
5. referring 'repo.changelog' may reuse changelog object kept in
'repo._filecache' according to timestamp of '00changelog.i'
'00changelog.i' is truncated at transaction failure (even though
this truncation is unintentional one, as described later), and
'st_mtime' of it is changed. But 'st_mtime' doesn't have enough
resolution to always detect this truncation, and invalid
changelog object kept in 'repo._filecache' is reused
occasionally.
Then, "No such file or directory" error occurs for
'00changelog.i.a', which is already removed at (3).
This patch discards objects in '_filecache' other than dirstate at
transaction failure.
Changes in 'invalidate()' can't be simplified by 'self._filecache =
{}', because 'invalidate()' should keep dirstate in 'self._filecache'
'repo.invalidate()' at "hg qpush" failure is removed in this patch,
because now it is redundant.
This patch doesn't make 'repo.invalidate()' always discard objects in
'_filecache', because 'repo.invalidate()' is invoked also at unlocking
store lock.
- "always discard objects in filecache at unlocking" may cause
serious performance problem for subsequent procedures at normal
execution
- but it is impossible to "discard objects in filecache at unlocking
only at failure", because 'releasefn' of lock can't know whether a
lock scope is terminated normally or not
BTW, using "with" statement described in PEP343 for lock may
resolve this ?
After this patch, truncation of '00changelog.i' still occurs at
transaction failure, even though newly added revisions exist only in
'00changelog.i.a' and size of '00changelog.i' isn't changed by this
truncation.
Updating 'st_mtime' of '00changelog.i' implied by this redundant
truncation also affects cache behavior as described above.
This will be fixed by dropping '00changelog.i' at aborting from the
list of files to be truncated in transaction.
This patch centralizes passing HG_PENDING to external hook process
into '_exthook()'. To make in-memory changes visible to external hook
process, this patch does:
- write (or schedule to write) in-memory dirstate changes, and
- set HG_PENDING environment variable, if:
- a transaction is running, and
- there are in-memory changes to be visible
This patch tests some commands with some hooks, because transaction
activity of a same hook differs from each other ("---": "not tested").
======== ========= ========= ============
command preupdate precommit pretxncommit
======== ========= ========= ============
unshelve o --- ---
backout x --- ---
import --- o o
qrefresh --- x o
======== ========= ========= ============
Each hooks are examined separately to prevent in-memory changes from
being visible to external process accidentally by side effect of hooks
previously invoked.
Now, 'dirstate.write(tr)' delays writing in-memory changes out, if a
transaction is running.
This may cause treating this revision as "the first bad one" at
bisecting in some cases using external hook process inside transaction
scope, because some external hooks and editor process are still
invoked without HG_PENDING and pending changes aren't visible to them.
'dirstate.write()' callers below in localrepo.py explicitly use 'None'
as 'tr', because they can assume that no transaction is running:
- just before starting transaction
- at closing transaction, or
- at unlocking wlock
This code will not currently be activated because there's no code to mark
files as driver-resolved in core. This point is also somewhat hard to plug into
from extensions.
Before this patch, making a commit on a local repo could move a bookmark and
both operations would not be grouped as one transaction. This patch makes both
operations part of one transaction. This is necessary to switch to the new api
to save bookmarks repo._bookmarks.recordchange if we don't want to change the
current behavior of rollback.
Dirstate change happening after the commit is done is now part of the
transaction mentioned above. This leads to a change in the expected output of
several tests.
The change to test-fncache happens because both lock are now released in the
same finally clause. The lock release is made explicitly buggy in this test.
Previously releasing lock would crash triggering release of wlock that crashes
too. Now lock release crash does not directly result in the release of wlock.
Instead wlock is released at garbage collection time and the error raised at
that time "confuses" python.
This patch delays writing in-memory changes out, if transaction is
running.
'_getfsnow()' is defined as a function, to hook it easily for
ambiguous timestamp tests (see also fakedirstatewritetime.py)
'if tr:' code path in this patch is still disabled at this revision,
because there is no client invoking 'dirstate.write()' with repo
object.
BTW, this patch changes 'dirstate.invalidate()' semantics around
'dirstate.write()' in a transaction scope:
before:
with repo.transaction():
dirstate.CHANGE('A')
dirstate.write() # change for A is written out here
dirstate.CHANGE('B')
dirstate.invalidate() # discards only change for B
after:
with repo.transaction():
dirstate.CHANGE('A')
dirstate.write() # change for A is still kept in memory
dirstate.CHANGE('B')
dirstate.invalidate() # discards changes for A and B
Fortunately, there is no code path expecting the former, at least, in
Mercurial itself, because 'dirstateguard' was introduced to remove
such 'dirstate.invalidate()'.
'localrepository.rollback()' explicilty restores dirstate, only if at
least one of current parents of the working directory is removed at
rollbacking (a.k.a "parent-gone").
After DirstateTransactionPlan, 'dirstate.write()' will cause marking
'.hg/dirstate' as a file to be restored at rollbacking.
https://mercurial.selenic.com/wiki/DirstateTransactionPlan
Then, 'transaction.rollback()' restores '.hg/dirstate' regardless of
parents of the working directory at that time, and this causes
unexpected dirstate changes if not "parent-gone" (e.g. "hg update" to
another branch after "hg commit" or so, then "hg rollback").
To avoid such situation, this patch restores dirstate to one before
rollbacking if not "parent-gone".
before:
b1. restore dirstate explicitly, if "parent-gone"
after:
a1. save dirstate before actual rollbacking via dirstateguard
a2. restore dirstate via 'transaction.rollback()'
a3. if "parent-gone"
- discard backup (a1)
- restore dirstate from 'undo.dirstate'
a4. otherwise, restore dirstate from backup (a1)
Even though restoring dirstate at (a3) after (a2) seems redundant,
this patch keeps this existing code path, because:
- it isn't ensured that 'dirstate.write()' was invoked at least once
while transaction running
If not, '.hg/dirstate' isn't restored at (a2).
In addition to it, rude 3rd party extension invoking
'dirstate.write()' without 'repo' while transaction running (see
subsequent patches for detail) may break consistency of a file
backup-ed by transaction.
- this patch mainly focuses on changes for DirstateTransactionPlan
Restoring dirstate at (a3) itself should be cheaper enough than
rollbacking itself. Redundancy will be removed in next step.
Newly added test is almost meaningless at this point. It will be used
to detect regression while implementing delayed dirstate write out.
The home of 'Abort' is 'error' not 'util' however, a lot of code seems to be
confused about that and gives all the credit to 'util' instead of the
hardworking 'error'. In a spirit of equity, we break the cycle of injustice and
give back to 'error' the respect it deserves. And screw that 'util' poser.
For great justice.
Before this patch, in-memory dirstate changes are still kept over a
transaction scope boundary regardless of the result of it.
For "all or nothing" policy of the transaction, in-memory dirstate
changes should be:
- written out at successful closing a transaction, because
subsequent 'dirstate.invalidate()' can lose them
- discarded at failure of a transaction, because outer
'wlock.release()' or so may write them out
To discard all changes in a transaction completely, this patch also
restores '.hg/dirstate' by '.hg/journal.dirstate' at failure, because
'transaction' itself does nothing for files related to '.hg/journal.*'
in such case (therefore, renaming in this patch is safe enough).
This is a part of preparations for "transactional dirstate". See also
the wiki page below for detail about it.
https://mercurial.selenic.com/wiki/DirstateTransactionPlan
This patch also removes redundant 'dirstate.invalidate()' just before
aborting a transaction for shelve/unshelve.
Review feedback from Pierre-Yves David. A separate line of work is working to
ensure that dirstate writes are written to a separate 'pending' file while a
transaction is active. Lock inheritance currently conflicts with that, so dodge
the issue by simply preventing inheritance while a transaction is running.
Custom merge drivers aren't going to run inside a transaction, so this doesn't
affect that.
This will be useful to pass around a reference to the lock to some functions
we're going to add to scmutil. We don't want those functions to live in
localrepo to avoid bloat.
This is part of a series that will allow locks to be inherited by subprocesses
in limited circumstances.
When allowed, the parent process will pass down requisite information to the
child process by way of this environment variable.
Stream clones are a special case of clones. Clones are a special case of
pull. Most of the logic for deciding what to do at pull time is in
exchange.py. It makes sense for the stream clone determination to live
there as well.
This patch moves the calling of the stream clone code into pull(). The
checks in streamclone.canperformstreamclone() ensure that we don't
perform a stream clone unless it is possible.
A future patch will convert maybeperformstreamclone() to accept a
pullop to make it consistent with everything else in pull(). It will
also grow some functionality (in case you doubted the necessity of a 4
line function).
An upcoming patch will move the invocation of stream cloning logic to
the normal pull code path (from localrepository.clone). In preparation
for this, we teach pull() and pulloperation about whether a streaming
clone is requested.
The return logic in localrepository.clone() has been reformatted
slightly because of line length issues.
This is the last remnants of streaming clone code in localrepo.py.
This is a mostly mechanical transplant of code to a new file. Only a
rewrite of "self" to "repo" was performed. The code will be
significantly refactored in upcoming patches. So don't scrutinize it too
closely.
Upcoming patches will modernize the streaming clone code. Streaming
clone data and code kind of lives in its own world. exchange.py is
arguably the most appropriate existing location for it. However, over
a dozen patches from now it became apparent that there was a lot of code
related to streaming clones and that having it contained within its own
module would make it easier to comprehend. So, we establish
streamclone.py.
It's worth noting that streamclone.py existed a long time ago, last seen
in the 1.6 release. It was removed in 2cd3dd86758c.
The function was renamed as part of the move because its old name was
redundant with the new module name. The only other content change was
"self" was renamed to "repo" and minor grammar in the docstring was
updated.
Because _phaserevs and _phasesets cache revision numbers, they must be
invalidated if there are new commits or stripped revisions. We could do
that by calling _phasecache.invalidate(), but it wasn't simple to be
integrated with the filecache mechanism.
So for now, phasecache will be recreated after repo.invalidate() if
00changelog.i was modified before.
Because _phaserevs and _phasesets cache revision numbers, they must be
invalidated if there are new commits or stripped revisions. We could do
that by calling _phasecache.invalidate(), but it wasn't simple to be
integrated with the filecache mechanism.
So for now, phasecache will be recreated after repo.invalidate() if
00changelog.i was modified before.
Mutable default arguments are know to the state of California to cause bugs. We
just added support of None for the underlying function, so nothing else the
required.