If commit is aborted by pretxncommit hook, in-memory changelog and manifest
have entries that would be added. So they must be discarded on invalidate().
But the mechanism introduced by 071f71da2fe2 doesn't handle this case well.
It tries to mitigate the penalty of invalidate() by marking in-memory cache
as "clean" on unlock assuming that they are identical to the stored data.
But this assumption is wrong if stored data are rolled back by tr.abort().
This patch moves the hook to post-close action so that it will never be
triggered on abort.
This bug was originally reported to thg, which is only reproducible in
command-server process on unix, evolve disabled.
https://bitbucket.org/tortoisehg/thg/issues/4285/
This adds an option for delta'ing against both p1 and p2 when applying merge
revisions and picking whichever is smallest.
Some before and after stats on manifest.d size:
internal large repo:
before: 1.2 GB
after: 930 MB
mozilla-central:
before: 261 MB
after: 92 MB
Before this patch, in-memory dirstate changes aren't written out at
opening transaction, even though 'journal.dirstate' is created
directly from '.hg/dirstate'.
Therefore, subsequent 'hg rollback' uses incomplete 'undo.dirstate' to
restore dirstate, if dirstate is changed and isn't written out before
opening transaction.
In cases below, the condition "dirstate is changed and isn't written
out before opening transaction" isn't satisfied and this problem
doesn't appear:
- "wlock scope" and "transaction scope" are almost equivalent
e.g. 'commit --amend', 'import' and so on
- dirstate changes are written out before opening transaction
e.g. 'rebase' (via 'dirstateguard') and 'commit -A' (by separated
wlock scopes)
On the other hand, 'backout' may satisfy the condition above.
To make 'journal.dirstate' contain in-memory changes before opening
transaction, this patch explicitly invokes 'dirstate.write()' in
'localrepository.transaction()'.
'dirstate.write()' is placed before not "writing journal files out"
but "invoking pretxnopen hooks" for visibility of dirstate changes to
external hook processes.
BTW, in the test script, 'touch -t 200001010000' and 'hg status' are
invoked to make file 'c' surely clean in dirstate, because "clean but
unsure" files indirectly cause 'dirstate.write()' at 'repo.status()'
in 'repo.commit()' (see e1d123a2ee1f for detail) and prevents from
certainly reproducing the issue.
This option will make repositories created as general delta by default but will
not make Mercurial aggressively recompute deltas for all incoming bundle.
Instead, the delta contained in the bundle will be used. This will allow us to
start having general delta repositories created everywhere without triggering
massive recomputation costs for all new clients cloning from old servers.
General delta is currently controlled by a single option, we will introduce a
new one in the next changeset.
We extract the logic in a function while it is simple.
This allows us to use the integer representation in revset. None doesn't
work well while computing revset because revset heavily depends on and
optimized for integer revisions.
Still repo[wdirrev].rev() is None, which means the canonical form of the
working-directory revision is None.
This patch doesn't add the case for the wdirid because we can't handle short
and ambiguous identifiers here. Perhaps, the wdirid will have to be handled
in the changelog layer.
Python 2.6 introduced the "except type as instance" syntax, replacing
the "except type, instance" syntax that came before. Python 3 dropped
support for the latter syntax. Since we no longer support Python 2.4 or
2.5, we have no need to continue supporting the "except type, instance".
This patch mass rewrites the exception syntax to be Python 2.6+ and
Python 3 compatible.
This patch was produced by running `2to3 -f except -w -n .`.
We have started to isolate extra usecases for developer-only output
that is not a warning. As the section has the fairly generic name
'devel' it makes sense to tuck them there. As a result, 'all' becomes
a bit misleading so we rename it to 'all-warnings'. This will break
some developer setups but the tests are still fine and developers will
likely spot this change.
Before this patch, hook argument `txnid` isn't passed to `pretxnopen`
hooks, even though `hooks` section of `hg help config` describes so.
``pretxnopen``
Run before any new repository transaction is open. The reason for the
transaction will be in ``$HG_TXNNAME`` and a unique identifier for the
transaction will be in ``HG_TXNID``. A non-zero status will prevent the
transaction from being opened.
Before this patch, transaction ID (TXNID) is calculated from
`transaction` object itself by `id()`, but this prevents TXNID from
being passed to `pretxnopen` hooks, which should be executed before
starting transaction processing (also any preparations for it, like
writing journal files out).
As a preparation for passing TXNID to `pretxnopen` hooks, this patch
separates calculation of TXNID from creation of `transaction` object.
This patch uses "random" library for reasonable unique ID. "uuid"
library can't be used, because it was introduced since Python 2.5 and
isn't suitable for Mercurial 3.4.x stable line.
`%f` formatting for `random.random()` is used with explicit precision
number 40, because default precision for `%f` is 6. 40 should be long
enough, even if 10**9 transactions are executed in a short time (a
second or less).
On the other hand, `time.time()` is used to ensures uniqueness of
TXNID in a long time, for safety.
BTW, platform not providing `/dev/urandom` or so may cause failure of
`import random` itself with some Python versions (see Python
issue15340 for detail http://bugs.python.org/issue15340).
But this patch uses "random" without any workaround, because:
- "random" is already used directly in some code paths,
- such platforms are very rare (e.g. Tru64 and HPUX), and
http://bugs.python.org/issue15340#msg170000
- updating Python runtime can avoid this issue
The existing stream_in method assumes a streaming clone is applied via
the wire protocol. Previous patches have enabled streaming clone data to
be produced and consumed outside the context of the wire protocol.
However, the consuming part was incomplete because it didn't deal with
things like updating the branch caches or writing out a requirements
file.
This patch finishes the separation of stream clone handling from the
wire protocol. After this patch, it is possible to consume stream clones
from arbitrary sources, including files. Mozilla plans to leverage this
to serve pre-generated stream clone files to consumers, drastically
reducing the wall and CPU time required to clone large repositories.
This will enable clones to be nearly as fast as `tar`.
For reasons outlined in the previous commit, we want to make the code
for consuming "stream bundles" reusable. This patch extracts the code
into a standalone function.
Before this patch, reverting a file to the revision other than the
parent doesn't update dirstate. This seems to expect that timestamp
and/or size will be changed by reverting.
But if (1) dirstate of file "f" is filled with timestamp before
reverting and (2) size and timestamp of file "f" isn't changed at
reverting, file "f" is recognized as CLEAN unexpectedly.
This patch applies "dirstate.normallookup()" on reverted file, if size
isn't changed.
Making "localrepository.wwrite()" return length of written data is
needed to avoid additional (and redundant) "lstat(2)" on the reverted
file. "filectx.size()" can't be used to know it, because data may be
decoded at being written out.
BTW, interactive reverting may cause similar problem, too. But this
patch doesn't focus on fixing it, because (1) interactive (maybe slow)
reverting changes one (or both) of size/timestamp of reverted files in
many usecases, and (2) changes to fix it seems not suitable for stable
branch.
The pre-pushkey hook will likely validate the pushkey based on element
previously changed in the same transaction. We need to make theses data
available for the hook.
When pushkey is called during a transaction, we include its 'hookargs' when
running the pre-pushkey hooks. Having more data cannot hurt, especially the
transaction ID.
If 'wlock' is taken, we should add 'afterlock' callback to the 'wlock' instead.
Otherwise, running post transaction hook after 'lock' is release but 'wlock' is
still taken lead to a deadlock (eg: 'hg update' during a hook).
This situation is much more common since: b7067abc16c0
push: acquire local 'wlock' if "pushback" is expected (BC) (issue4596)
In case of errors, output parts salvaged from the reply bundle need to be
processed for outputting their content. This concludes our quest for fixing
issue4594.
We are going to add output related logic in this function. We do the
indentation first to help next changeset readability. We need a new try except
because we want to handle output on any exception, including PushRaced ones.
This hook will be called whenever a transaction is aborted. This will make it
easy for people to clean up temporary content they may have created during a
transaction.
We are warning about lock acquired in the wrong order because this can create
dead-lock situation. But non-wait acquisition will not block and therefore not
create a dead-lock. So we do not need to wait in such case.
Lock must be acquired in a specific order to avoid dead-lock. This was
documented on the wiki, but having this information in the docstring is also
useful.
We change the wording of the developer warning:
- "lock" taken before "wlock"
+ "wlock" acquired after "lock"
The goals here are to:
- Put the "subject" as the first word,
- use "acquired" instead of "taken" since it seems more accurate.
Before this change, any call to 'wlock' after we acquired a 'lock' was issuing a
warning. This is wrong as the 'wlock' have been properly acquired before the
'lock' in a previous call.
We move the warning code to only issue such warnings when we are acquiring the
'wlock' -after- acquiring 'lock'. This is the expected behavior of this warning
from the start.
That way, any new server will be ready to accept bundle2 payload. The decision
for the client to use it is still off by default so this is not turning bundle2
everywhere.
We introduce a new kill switch for this in case stuff goes wrong.
It is finally time to freeze the bundle2 format! To do so we:
- rename HG2Y to HG20,
- drop "b2x:" prefix from all part names,
- rename capability to "bundle2-exp" to "bundle2"
- rename the hook flag from 'bundle2-exp' to 'bundle2'
To support multiple bundle2 formats, we will need a function returning
the proper unbundler according to the header. We introduce such aa
function and change the usage in the code base. The function will get
smarter in later changesets.
This is somewhat similar to the dispatching we do for 'HG10' and 'HG11'.
The main target is to allow HG2Y support in an extension to ease transition of
companies using the experimental protocol in production (yeah...) But I've no
doubt this will be useful when playing with a future HG21.
The 'format' argument was not used even when it was added in
05af7fe09faf (getbundle: pass arbitrary arguments all along the call
chain, 2014-04-17).
Note that by removing the argument, if any caller did pass a named
'format' argument, we will now pass that along to exchange.getbundle()
via the kwargs. If the idea was to remove such a key, that should have
been done explicitly.
While it should be safe to switch to the new manifest format on an
existing repo, let's keep it simple for now and make the configuration
have any effect only at repo creation time. If the configuration is
enabled then (at repo creation), we add an entry to requires and read
that instead of the configuration from then on.
With tree manifests, hashes will change anyway, so now is a good time
to also take up the old plans of a new manifest format. While there
should be little or no reason to use tree manifests with the current
manifest format (v1) once the new format (v2) is supported, we'll try
to keep the two dimensions (flat/tree and v1/v2) separate.
In preparation for adding a the new format, let's add configuration
for it and propagate that configuration to the manifest revlog
subclass. The new configuration ("experimental.manifestv2") says in
what format to write the manifest data. We may later add other
configuration to choose how to hash it, either keeping the v1 hash for
BC or hashing the v2 content.
See http://mercurial.selenic.com/wiki/ManifestV2Plan for more details.
This patch newly adds "dirtyreason()" to centralize composing dirty
reason message like "uncommitted changes in subrepository 'xxxx'".
There are 3 similar messages below, and this patch is a part of
preparations for unifying them into (1), too.
1. uncommitted changes in subrepository 'XXXX'
2. uncommitted changes in subrepository XXXX
3. uncommitted changes in subrepo XXXX
This patch chooses adding new method "dirtyreason()" instead of making
"dirty()" return "reason string", because:
- some of existing "dirty()" implementation is too complicated to do
so simply, and
- ill-mannered 3rd party subrepo classes, of which "dirty()" doesn't
return "reason string", cause meaningless message (even though it
is rare case)
Since manifests instances are cached on the manifest log instance, we
can cache directory manifests by caching the directory manifest
logs. The directory manifest log cache is a plain dict, so it never
expires; we assume that we can keep all the directories in memory.
The cache is kept on the root manifestlog, so access to directory
manifest logs now has to go through the root manifest log.
The caching will soon not be only an optimization. When we start
lazily loading directory manifests, we need to make sure we don't
create multiple instances of the log objects. The caching takes care
of that problem.
It should be possible to debug the submanifest revlogs without having
to know where they are stored (in .hg/store/meta/), so let's add a
--dir option for this purpose.
match.ispartial() will return the opposite of match.always() in core, but this
function will be extensible by extensions to produce another result even
if match.always() will be untouched.
This will be useful for narrowhg, where ispartial() will return False even if
the match won't always match. This would happen in the case where the only
time the match function is False is when the path is outside of the narrow
spec.
The new way to allow empty commit is to temporarily set the
'ui.allowemptycommit' config option.
allowemptyback = repo.ui.backupconfig('ui', 'allowemptycommit')
try:
repo.ui.setconfig('ui', 'allowemptycommit', True)
repo.commit(...)
finally:
repo.ui.restoreconfig(allowemptyback)
All known uses of force for allowing empty commits have been removed, so let's
remove it from the allowemptycommits condition.
This adds a config flag that enables a user to make empty commits.
This is useful in a number of cases.
For instance, automation that creates release branches via
bookmarks may want to make empty commits to that release bookmark so that it
can't be fast-forwarded and so it can record information about the release
bookmark's creation. This is already possible with named branches, so making it
possible for bookmarks makes sense.
Another case we've wanted it is for mirroring repositories into Mercurial. We
have automation that syncs commits into hg by running things from the command
line. The ability to produce empty commits is useful for syncing unusual commits
from other VCS's.
In general, allowing the user to create the DAG as they see fit seems useful,
and when I mentioned this in IRC more than one person piped up and said they
were already hacking around this limitation by using mq, import, and
commit-dummy-change-then-amend-the-content-away style solutions.
The empty commit condition was a messy if condition. Let's move it to a new line
and change it to 'or' statements so it's cleaner and more readable. A future
commit will add additional logic to this line.
Before this patch, releasing the store lock implies the actions below, when
the transaction is aborted:
1. "commithook()" scheduled in "localrepository.commit()" is invoked
2. "changectx.__init__()" is invoked via "self.__contains__()"
3. specified ID is examined against "repo.dirstate.p1()"
4. validation function is invoked in "dirstate.p1()"
In subsequent patches, "dirstate.invalidate()" invocations for
discarding changes are replaced with "dirstateguard", but discarding
changes by "dirstateguard" is executed after releasing the store lock:
resources are acquired in "wlock => dirstateguard => store lock" order,
and are released in reverse order.
This may cause that "dirstate.p1()" still refers to the changeset to be
rolled-back at (4) above: pushing multiple patches by "hg qpush" is
a typical case.
When releasing the store lock, such changesets are:
- not contained in "repo.changelog", if it is reloaded from
".hg/00changelog.i", as that file was already truncated by
"transaction.abort()"
- still contained in it, otherwise
(this "dirty read" problem is discussed in "Transaction Plan"
http://mercurial.selenic.com/wiki/TransactionPlan)
Validation function shows "unknown working parent" warning in the
former case, but reloading "repo.changelog" depends on the timestamp
of ".hg/00changelog.i". This causes occasional test failures.
In the case of scheduled "commithook()", it just wants to examine
whether "node ID" of committed changeset is still valid or not. Other
examinations implied in "changectx.__init__()" are meaningless.
To avoid showing the "unknown working parent" warning irregularly, this
patch uses "changelog.hasnode()" instead of "node in self" to examine
existence of committed changeset.
The very next changeset will start writing one revlog per directory
when tree manifests are enabled. That is backwards incompatible, so it
requires .hg/requires to be updated. Just like with generaldelta, we
want to update .hg/requires only when the repo is created. Updating
..hg/requires is bad for repos on shared disk. Instead, those who do
want to upgrade a repo to using treemanifest (or manifestv2, etc) can
run
hg clone --config experimental.treemanifest repo clone
which will create a new repo with the requirement set. Unlike the case
of e.g. generaldelta, it will not rewrite the changesets, since tree
manifests hash differently.
Today, the terms 'active' and 'current' are interchangeably used throughout the
codebase in reference to the active bookmark (the bookmark that will be updated
with the next commit). This leads to confusion among developers and users.
This patch is part of a series to standardize the usage to 'active' throughout
the mercurial codebase and user interface.
Today, the terms 'active' and 'current' are interchangeably used throughout the
codebase in reference to the active bookmark (the bookmark that will be updated
with the next commit). This leads to confusion among developers and users.
This patch is part of a series to standardize the usage to 'active' throughout
the mercurial codebase and user interface.
This change adds boolean configuration option
experimental.treemanifest. When the option is enabled, manifests are
parsed into the new treemanifest type.
Tests can be now run using treemanifest by switching the config option
default in localrepo._applyrequirements(). Tests pass even when made
to randomly choose between manifestdict and treemanifest, suggesting
that the two types produce identical manifests (so e.g. a manifest
revlog entry written from a treemanifest can be parsed by the
manifestdict code).
It is currently only amend and debugbuilddag that will pass a filectx to
localrepo._filecommit. Amend will usually not hit the case where a the filectx
is a parent that just can be reused. Future convert changes will use it more.
Nobody should start a transaction on an unlocked repository. If
developer warnings are enabled this will be reported. This use the
same config as bad locking order since this is closely related.
Previously we would only actually write the revbranchcache to disk if we were in
the middle of a write operation (like commit). Now we will also write it during
any read operation. The cache knows how to invalidate itself, so it shouldn't
become corrupt if multiple writers try at once (and the write-on-read
behavior/risk is the same as all our other caches).
Previously the revbranchcache was a field inside the branchmap. This is bad for
a couple reasons:
1) There can be multiple branchmaps per repo (one for each filter level). There
can only be one revbranchcache per repo. In fact, a revbranchcache could only
exist on a branchmap that was for the unfiltered view, so you could have
branchmaps exist for which you couldn't have a revbranchcache. It was funky.
2) The write lifecycle for the revbranchcache is going to be different from
the branchmap (branchmap is greedily written early on, revbranchcache
should be lazily computed and written).
This patch moves the revbranchcache to live as a field on the localrepo
(alongside self._branchmap). This will allow us to handle it's lifecycle
differently, which will let us move it to be lazily computed in future patches.
This is necessary to implement "wc" symbol for workingctx, that will be used
as follows:
$ hg annotate -r wc FILE
In principle, "rev in repo" should be True if "repo[rev]" can return a context
object. But when it was implemented by 354374e0dc2a, lookup() had a long logic
to map all sorts of changeids to nodes, and "None in repo" did crash because
lookup() could not accept None. So I assume that the case of changeid=None was
not considered.
Now "None in repo" doesn't crash, it should be True for workingctx revision.
Behavior of "changeid in repo":
revision "null" existing rev None (workingctx)
---------- ------ ------------ -----------------
original* True True TypeError
current True True False
this patch True True True
(*original: 354374e0dc2a)
Although Python supports `X = Y if COND else Z`, this was only
introduced in Python 2.5. Since we have to support Python 2.4, it was
a very common thing to write instead `X = COND and Y or Z`, which is a
bit obscure at a glance. It requires some intricate knowledge of
Python to understand how to parse these one-liners.
We change instead all of these one-liners to 4-liners. This was
executed with the following perlism:
find -name "*.py" -exec perl -pi -e 's,(\s*)([\.\w]+) = \(?(\S+)\s+and\s+(\S*)\)?\s+or\s+(\S*)$,$1if $3:\n$1 $2 = $4\n$1else:\n$1 $2 = $5,' {} \;
I tweaked the following cases from the automatic Perl output:
prev = (parents and parents[0]) or nullid
port = (use_ssl and 443 or 80)
cwd = (pats and repo.getcwd()) or ''
rename = fctx and webutil.renamelink(fctx) or []
ctx = fctx and fctx or ctx
self.base = (mapfile and os.path.dirname(mapfile)) or ''
I also added some newlines wherever they seemd appropriate for readability
There are probably a few ersatz ternary operators still in the code
somewhere, lurking away from the power of a simple regex.
We are adding a 'txnclose' hook that will be run right before a transaction is
closed. Hooks running at that time will have access to the full transaction
content through both 'hookargs' content and on-disk reading. They will be able
to abort the transaction.
We are adding generic hooking for all transactions. We may have useful
information about what happened during the transaction, user of the transaction
should have filled the 'hookargs' dictionnary of the transaction. This hook is
simple because it has no power to rollback the transaction.
We are adding generic hooking for all transactions. We do not really have any
useful information to include when opening the transaction but this is a
useful time to allow a hook anyway. We better let people abort transaction before
they happen than after multiple seconds/minutes of processing.
Running the tags function filtered will lead to different results with different
filter levels. This seems too dangerous to be done blindly as 39c37a1a9e2d did.
This change is intended to avoid exposing the implementation detail to
callers. I'm going to extend fullreposet to support "null" revision, so
these mfunc calls will have to use fullreposet() instead of spanset().
If a commit and a followup tag commit are pruned, there are no references to it
in any non obsolete version of .hgtags. Without this change however, the next
time a tag is added to another branch, the obsolete references are appended in
.hgtags before the new entries for the current tag command.
The annotation to unfilter localrepo._tag() has been around since 3da49fd631fb.
The log message for it mentions computing the tag cache though, so I'm not sure
if this was misplaced? It looks like branchmap was aware of filtering then, and
now tracks a cache per view.
On machines with lots of ram, it's beneficial to increase the lru size of the
manifest cache. On a large repo, configuring the lru to be size 10 can shave a
large rebase (~12 commits) down from 95s to 70s.
Previously, we had weird, nonsensical behavior when committing a file move
with a missing source. This removes that weird logic and tests that the bug
this strange behavior caused is fixed. Also adds a longish comment to prevent
some poor soul from accidentally re-implementing the bug in the future.
It is time for the transaction to be responsible for setting up the
undo data. It is necessary to move this logic into the transaction
because many more files are handled now, and the transaction is the
object tracking them all.
The value can be set to None if no undo should be set.
Previously, in the namespaces api, the only caller of branchtip was singlenode
which happened to raise the same exception that branchtip raised: KeyError.
This is a minor change but will allow upcoming patches to use repo.branchtip to
not raise an exception if a branch doesn't exist. After that, it will be
possible for extensions to use the namespace api in a stable way.
commitctx already showed notes with filenames but didn't provide any context.
It is just as relevant to know when manifest or changelog is committed.
So, in addition to filenames, also show headlines 'committing files:',
'committing manifest' and 'committing changelog'.
Before this patch, "workingctx" is also used for the context to be
committed. But "workingctx" works incorrectly in some cases.
For example, even when only some of changed files in the working
directory are committed, "status()" on "workingctx" object for
committing recognizes files not to be committed as changed, too.
As the preparation for fixing these issues, this patch chooses adding
new class "workingcommitctx" for exact context to be committed,
because switching by the flag (like "self._fixedstatus" or so) in some
code paths of "workingctx" is less readable and maintenancable.
This will make it possible to customize the behavior of the join method by
changing the vfs class (e.g. by using the altvfs" class introduced recently).
Note that we could have modified the VFS join methods to acept a set of optional
paths in the same way thta the localrepo join method does. However it seemed
simpler to simply call os.path.join before calling self.vfs.join.
Up until now we compared the "path" and "sharedpath" repository properties to
check if a repository is shared. This was relying an implementation detail of
shared repositories. In order to make it possible to change the way shared
repositories are implemented, we encapsulate this check into its own localrepo
method, called shared.
This new method returns None if the repository is shared, and something else
(for now a string describing the short of share) otherwise.
The reason why I did not call this method "isshared" and made it return a
boolean is that I plan to introduce a new type of shared repository soon.
# NOTES:
This is the first patch in a series whose purpose is to add support for
creating "full repository shares", which are repositories that share everything
with the repository source except their current revision, branch and bookmark.
This series is RFC because I am not very sure of some of the solutions I took.
Comments are welcome!
The pushkey operation used to be in its own wireprotocol command and (in
practice) always be lock free when running the hook. With bundle2, it happen in
a greater scheme and a hook running locking command would get stuck. We now run
such hooks after the lock release as similar hook do.
Bundle2 test are altered to ensure we are lockfree at hook running time.
It turns out that maintaining a reference of any sort (even weak!) to the repo
when constructed doesn't work because we may at some point pass in a repoview
filtered by something other than what the initial repo was.
This marks the first use of abstracting our different types of named objects
(bookmarks, tags, branches, etc.) and upcoming patches will use this to
simplify logic.