Before this patch, if the steps below occur within the same second,
mtime, ctime and size are all the same between (1) and (3).
1. append data to a revlog-style file (and close the transaction)
2. discard the appended data by truncation during strip
3. append different data of the same size to the revlog-style file again
Therefore, cache validation doesn't work as expected after (3).
To avoid such file stat ambiguity around truncation, this patch opens
a file with checkambig=True.
This patch also introduces "with" statement style, to ensure immediate
invocation of close() after truncation, because closing file is the
only trigger to check (and get rid of) file stat ambiguity.
This is a part of ExactCacheValidationPlan.
https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
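The ambiguity described above can be reproduced in miniature. This is only a sketch, not the code in this patch: stat_key() and bump_mtime_if_ambiguous() are hypothetical helpers, the key ignores ctime for simplicity, and advancing mtime by one second is an assumed way of resolving the ambiguity.

```python
import os
import tempfile


def stat_key(path):
    """Return the (size, mtime-in-seconds) pair a cache might validate against."""
    st = os.stat(path)
    return (st.st_size, int(st.st_mtime))


def bump_mtime_if_ambiguous(path, old_key):
    """Advance mtime by one second if the file is indistinguishable from old_key.

    Hypothetical helper: returns True when the timestamp had to be changed.
    """
    st = os.stat(path)
    if (st.st_size, int(st.st_mtime)) != old_key:
        return False
    os.utime(path, (st.st_atime, int(st.st_mtime) + 1))
    return True


def demo():
    """Rewrite a file with same-size data in the "same second" and resolve it."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as fp:
            fp.write(b"aaaa")  # step (1): append data
        os.utime(path, (1000, 1000))  # pin the timestamp for the demo
        before = stat_key(path)
        with open(path, "wb") as fp:  # steps (2) and (3): truncate, rewrite
            fp.write(b"bbbb")
        os.utime(path, (1000, 1000))  # simulate "the same second"
        ambiguous = stat_key(path) == before
        fixed = bump_mtime_if_ambiguous(path, before)
        return ambiguous, fixed, stat_key(path) != before
    finally:
        os.unlink(path)
```

demo() reports that the rewritten file was ambiguous, that the helper changed its timestamp, and that the stat key then differs from the old one.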
The partial bundle is not a subset of the full bundle, and the full
bundle is not full in any way that I see. The most obvious
interpretation of "full" I can think of is that it has all commits
back to the null revision, but that is not what the "full" bundle
is. The "full" bundle is simply a backup of what the user asked us to
strip (unless --no-backup). The "partial" bundle contains the
revisions we temporarily stripped because they had higher revision
numbers than some commit that the user asked us to strip.
The "full" bundle is already called "backup" in the code, so let's use
that in user-facing messages too. Let's call the "partial" bundle
"temporary" in the code.
If strip fails when applying the temporary bundle, the commits in the
temporary bundle have not yet been applied, so the user will almost
definitely want to apply the bundle. We should be more clear to the
user about that than our current "partial bundle stored in...".
Note that we will probably not be able to recover it automatically,
since whatever made it fail (e.g. a hook) will most likely make it
fail again. We need to give control back to the user to fix the
problem before trying again.
If strip fails while recovering the temporary bundle (e.g. because a
hook fails), we tell the user only about the backup bundle, not about
the temporary bundle. Since the user did not ask to strip the commits
in the temporary bundle, that's the more important bundle to mention,
so let's do that (and also mention the backup bundle as usual).
We check "if saveheads or savebases" in several places to see if we
should create, or have created, a bundle of the changesets to apply after
truncating the revlogs. One of the conditions is actually just "if
saveheads", but since there can't be savebases without saveheads, that
is effectively the same condition. It seems simpler to check only once
and from then on see if we created the file.
Stripping has only partly worked since f41815302d49 (repair: use cg3
for treemanifests, 2016-01-19): the bundle seems to have been created
correctly, but revlog entries in subdirectory revlogs were not
stripped. This meant that e.g. "hg verify" would fail after stripping
in a tree manifest repo.
To find the revisions to strip, we simply iterate over all directories
in the repo (included in store.datafiles()). This is inefficient for
stripping few commits, but efficient for stripping many commits. To
optimize for stripping few commits, we could instead walk the tree
from the root and find modified subdirectories, just like we do in the
changegroup code. I'm leaving that for another day.
All versions of Python we support or hope to support make the hash
functions available in the same way under the same name, so we may as
well drop the util forwards.
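As a minimal illustration of calling the standard library directly (digest() is a hypothetical helper that stands in for a call site that used to go through a util forward):

```python
import hashlib


def digest(data):
    """Hash bytes with hashlib directly instead of through a wrapper module."""
    return hashlib.sha1(data).hexdigest()
```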
When we remove a changeset from the changelog, the phase cache must be
invalidated, otherwise it could refer to changesets that are no longer in the
repo.
To reproduce the failure, I created an extension querying the phase cache after
the strip transaction is over.
To do that, I stripped two commits with a bookmark on one of them to force
another transaction (we open a transaction for moving bookmarks)
after the strip transaction.
Without the fix in this patch, the test leads to a stacktrace showing the issue:
repair.strip(ui, repo, revs, backup)
File "/Users/lcharignon/facebook-hg-rpms/hg-crew/mercurial/repair.py", line 205, in strip
tr.close()
File "/Users/lcharignon/facebook-hg-rpms/hg-crew/mercurial/transaction.py", line 44, in _active
return func(self, *args, **kwds)
File "/Users/lcharignon/facebook-hg-rpms/hg-crew/mercurial/transaction.py", line 490, in close
self._postclosecallback[cat](self)
File "$TESTTMP/crashstrip2.py", line 4, in test
[repo.changelog.node(r) for r in repo.revs("not public()")]
File "/Users/lcharignon/facebook-hg-rpms/hg-crew/mercurial/changelog.py", line 337, in node
return super(changelog, self).node(rev)
File "/Users/lcharignon/facebook-hg-rpms/hg-crew/mercurial/revlog.py", line 377, in node
return self.index[rev][7]
IndexError: revlog index out of range
The situation was encountered in inhibit (evolve's repo) where we would crash
following the volatile set invalidation submitted by Augie in
cbc52a99d057d11790cf5011e877c6f698bf57bf. Before his patch the issue was masked
as we were not accessing the phasecache after stripping a revision.
This bug uncovered another bug in histedit (see explanation in issue5235).
I changed the histedit test accordingly to avoid fixing two things at once.
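The failure mode can be modelled with a toy class. This is a sketch under simplifying assumptions: ToyChangelog and its methods are hypothetical, and a plain list of revision numbers stands in for the phase cache.

```python
class ToyChangelog:
    """Toy changelog with a cached list of non-public revision numbers."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._nonpublic = None  # stand-in for the phase cache

    def nonpublic(self):
        if self._nonpublic is None:
            # Every revision counts as non-public in this toy model.
            self._nonpublic = list(range(len(self.nodes)))
        return self._nonpublic

    def strip(self, rev, invalidate=True):
        """Remove rev and everything after it, optionally dropping the cache."""
        del self.nodes[rev:]
        if invalidate:
            self._nonpublic = None

    def nonpublic_nodes(self):
        """Map cached revision numbers back to nodes."""
        return [self.nodes[r] for r in self.nonpublic()]
```

Calling strip() with invalidate=False and then nonpublic_nodes() raises IndexError, because the stale cache still lists revisions that no longer exist.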
Since one of the original patches was accepted already and people on the
mailing list still have suggestions as to how this should be improved, I'm
implementing those suggestions in the following patches (this and the ones that
might follow).
writebundle() writes a bundle2 bundle or a plain changegroup1. Imagine
away the "2" in "bundle2.py" for a moment and this change should make
sense. The bundle wraps the changegroup, so it makes sense that it
knows about it. Another sign that this is correct is that the delayed
import of bundle2 in changegroup goes away.
I'll leave it for another time to remove the "2" in "bundle2.py"
(alternatively, extract a new bundle.py from it).
When I taught debugrebuildfncache about dirlogs in ebe9dacc63ba
(treemanifests: fix streaming clone, 2016-02-04), I added a
last-minute "if 'treemanifest' in repo" guard. That should have been
checking for "... in repo.requirements". Fix that and add tests for
it.
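To show why the two checks differ, here is a toy model. ToyRepo and uses_treemanifest() are hypothetical, and the assumption is that "in repo" looks up changeset identifiers rather than requirements.

```python
class ToyRepo:
    """Toy repository: "in" looks up changeset names, not requirements."""

    def __init__(self, requirements, changesets):
        self.requirements = set(requirements)
        self._changesets = set(changesets)

    def __contains__(self, changeid):
        return changeid in self._changesets


def uses_treemanifest(repo):
    """Check the requirement set rather than the repository itself."""
    return "treemanifest" in repo.requirements
```

With such a repo, "'treemanifest' in repo" is False even when the requirement is set, so the guarded code would never run.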
Similar to the previous patch, the .hg/store/meta/ directory does not
get copied when using "hg clone --uncompressed". Fix by including
"meta/" in store.datafiles(). This seems safe to do, as there are only
a few users of this method. "hg manifest" already filters the paths by
"data/" prefix. The calls from largefiles also seem safe. The use in
verify needs updating to prevent it from mistaking dirlogs for
orphaned filelogs. That change is included in this patch.
Since the dirlogs will now be in the fncache when using fncachestore,
let's also update debugrebuildfncache(). That will also allow any
existing treemanifest repos to get their dirlogs into the fncache.
Also update test-treemanifest.t to use a directory name that
requires dot-encoding and uppercase-encoding so we test that the path
encoding works.
The newly created helper changegroup.safeversion() knows to pick
version 03 if the repo uses treemanifests, so just using that means we
pick the right changegroup version.
This patch adds a comment making it clear that we should hold a lock before
calling repair.strip. The wording is the same as what we have for
obsolete.createmarkers.
Before this patch we were using the deprecated bookmarks.write api. This
patch replaces the call to bookmarks.write by a call to bookmarks.recordchange.
We move the bookmark code above the code removing the undo file because with
bookmarks.recordchange we have to create a transaction that would create an
undo file.
The home of 'Abort' is 'error', not 'util'. However, a lot of code seems to be
confused about that and gives all the credit to 'util' instead of the
hardworking 'error'. In a spirit of equity, we break the cycle of injustice and
give back to 'error' the respect it deserves. And screw that 'util' poser.
For great justice.
The bundle10 format (plain changegroup-01) does not support general delta and
results in expensive delta re-computation when stripping. If the repository is
general delta, we store backups as bundle20 containing a changegroup-02 payload.
We remove the experimental feature related to strip backup bundle format because
this achieves the same goal in a leaner way. Removing the experimental option is
fine; that is why it was experimental in the first place.
Compression of these bundles is coming in later changesets.
The previous code was calling 'abort' in all exception cases. This was wrong
when an exception was raised by a post-close callback on the transaction. Calling
'abort' on an already closed transaction resulted in an error, shadowing the
original error.
We now use the same pattern as everywhere else. 'tr.release()' will abort the
transaction if we escape the scope without closing it. We add a test to make
sure we do not regress.
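The pattern can be sketched with a toy transaction. The Transaction class and run() below are hypothetical stand-ins; only the shape of the try/finally with release() is the point.

```python
class Transaction:
    """Toy transaction: close() commits, release() aborts only if still open."""

    def __init__(self, postclose=None):
        self.state = "open"
        self.postclose = postclose

    def close(self):
        self.state = "closed"
        if self.postclose is not None:
            self.postclose()  # may raise after the transaction is closed

    def abort(self):
        if self.state != "open":
            raise RuntimeError("cannot abort a closed transaction")
        self.state = "aborted"

    def release(self):
        # Safe to call unconditionally: aborts only an unfinished transaction.
        if self.state == "open":
            self.abort()


def run(tr, work):
    """Run work inside tr using the try/finally + release() pattern."""
    try:
        work()
        tr.close()
    finally:
        tr.release()
```

If the post-close callback raises, release() is a no-op on the closed transaction and the original exception propagates unchanged. If work() raises, release() aborts the still-open transaction.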
Python 2.6 introduced the "except type as instance" syntax, replacing
the "except type, instance" syntax that came before. Python 3 dropped
support for the latter syntax. Since we no longer support Python 2.4 or
2.5, we have no need to continue supporting the "except type, instance" syntax.
This patch mass rewrites the exception syntax to be Python 2.6+ and
Python 3 compatible.
This patch was produced by running `2to3 -f except -w -n .`.
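For illustration, the rewritten form looks like this (parse_int() is a hypothetical function, not code touched by this patch):

```python
def parse_int(text):
    """Return (value, None) on success or (None, error message) on failure."""
    try:
        return int(text), None
    except ValueError as exc:  # formerly spelled "except ValueError, exc"
        # exc is bound to the exception instance, exactly as before.
        return None, str(exc)
```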
Currently, there is no way to recover from a missing or corrupt fncache
file in place (a clone is required). For certain use cases such as
servers and with large repositories, an in-place repair may be
desirable. This patch adds functionality for in-place repair of the
fncache.
The `hg debugrebuildfncache` command is introduced. It ensures the
fncache is up to date by reconstructing the fncache from all seen files
encountered during a brute force traversal of the repository's entire
history.
The command will add missing entries and will prune excess ones.
Currently, the command no-ops unless the repository has the fncache
requirement. The command could later grow the ability to "upgrade" an
existing repository to be fncache enabled, if desired.
When testing this patch on a local clone of the Firefox repository, it
removed a bunch of entries. Investigation revealed that removed entries
belonged to empty (0 byte size) .i filelogs. The functionality for
pruning fncache of stripped revlogs was introduced in 93ba76bfbe8a, so
the presence of these entries likely predates this feature.
All the test changes have been isolated and validated. We are free to turn on
bundle2 as the default exchange protocol.
"To reach a port we must set sail –
Sail, not tie at anchor
Sail, not drift."
String concatenation with the "+" operator prevents the messages to be
translated from being extracted.
Python automatically concatenates adjacent string literals separated
only by whitespace into one string.
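A small sketch of the difference. Here _() is a stand-in identity function assumed to play the role of a gettext-style wrapper, and the message text is made up.

```python
def _(message):
    """Stand-in for a gettext-style translation function (assumption)."""
    return message


# Adjacent literals are joined at compile time, so an extractor sees one
# complete message rather than a "+" expression it cannot evaluate.
WARNING = _("first half of a long message, "
            "second half of the same message")
```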
It is finally time to freeze the bundle2 format! To do so we:
- rename HG2Y to HG20,
- drop "b2x:" prefix from all part names,
- rename the capability from "bundle2-exp" to "bundle2",
- rename the hook flag from 'bundle2-exp' to 'bundle2'.
The node ID used in strip bundle names is currently taken as the last
iterated value in a list comprehension found much earlier in the function.
This change makes the node selection more explicit at the cost of redundancy.
addchangegroup() modifies its behavior based on the transaction source.
This is incorrect for bundle2 repair files, causing rebases to abort when this
option is enabled.
This diff specifies the source type in the way recommended by comments in
bundle2.py and adds a test to ensure that rebases with the experimental option
work successfully.
On IRC, rom1dep reported a traceback[1] from setting
experimental.strip-bundle2-version to True. This diff catches
unexpected values and falls back to the non-experimental bundle1
implementation after issuing a warning.
[1] http://gist.tamytro.org/_admin/gists/qXcdQLwtApgy6e3NwWgl
This adds an experimental option 'strip-bundle2-version' which causes backup
bundles to use bundle2 formatting. Especially for generaldelta repositories,
this should provide significant performance gains for any operation that needs
to write a backup.
The next diff will add support for writing bundle2 files to writebundle, but
the bundle2 generator wants access to a ui object. This changes the signature
and callsites to pass one in.
This change touches every module in which repository.sopener was being used, and
changes it for the equivalent repository.svfs.
It should now be possible to remove localrepo.sopener.
Previously, a backup bundle could overwrite an existing bundle and cause user
data loss. For instance, if you have A<-B<-C and strip B, it produces backup
bundle B-backup.hg. If you then hg pull -r B B-backup.hg and strip it again, it
overwrites the existing B-backup.hg and C is lost.
The fix is to add a hash of all the nodes inside that bundle to the filename.
Fixed up existing tests and added a new test in test-strip.t
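The naming idea can be sketched as follows. backup_name() is a hypothetical helper: the sorting, the hex-string inputs and the 12- and 8-character truncations are illustrative assumptions, not necessarily what this patch does.

```python
import hashlib


def backup_name(node, all_nodes, suffix="backup"):
    """Build a bundle file name that depends on every node in the bundle.

    node and all_nodes are hex strings in this sketch.
    """
    joined = "".join(sorted(all_nodes)).encode("ascii")
    total = hashlib.sha1(joined).hexdigest()
    return "%s-%s-%s.hg" % (node[:12], total[:8], suffix)
```

Stripping B while C is present and stripping B alone now produce different file names, so the second backup no longer overwrites the first.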
cset 21b4faf3787e has removed this option. This commit just tidies the
code that was associated with it. It also fixes the internal calls to
the strip() function.
Before this change, any function that thought it would want as a final
safety to keep a partial backup bundle (bundling changes not linearly
related to the current change being stripped), had to explicitly pass
a backup="strip" option. With this change, these backups are always
kept in case of an exception and always removed if there is no
exception. Only full backups can be specified with backup=True or no
full backups with backup=False.
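A rough sketch of the "keep on exception, remove on success" behaviour. apply_temporary_bundle() and demo() are hypothetical, and restore and warn are caller-supplied callbacks.

```python
import os
import tempfile


def apply_temporary_bundle(path, restore, warn):
    """Apply the bundle at path; delete it on success, keep it on failure."""
    try:
        restore(path)
    except Exception:
        # Keep the file: it holds changesets that still need to be restored.
        warn("temporary bundle kept at %s" % path)
        raise
    os.unlink(path)


def demo(fail):
    """Return whether the bundle file survives and how many warnings were issued."""
    fd, path = tempfile.mkstemp(suffix=".hg")
    os.close(fd)
    warnings = []

    def restore(bundle_path):
        if fail:
            raise RuntimeError("hook failed")

    try:
        apply_temporary_bundle(path, restore, warnings.append)
    except RuntimeError:
        pass
    kept = os.path.exists(path)
    if kept:
        os.unlink(path)  # clean up after the demo
    return kept, len(warnings)
```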
This patch makes "repair.strip()" treat bundle files via vfs.
This patch also avoids applying "vfs.join()" on the value returned by
"changegroup.writebundle()", to get relative path from "_bundle()".
This patch makes paths below in "_bundle()" relative to ".hg":
- backup directory ("strip-backup"), and
- bundle file under backup directory
"vfs" is passed to "changegroup.writebundle()" to use relative path
directly.
This patch applies "vfs.join()" on the value returned by "_bundle()",
because the caller expects it to return an absolute path.
This will be changed by a succeeding patch changing the caller side.
Before this patch, "localrepository.undofiles()" returns a list of
absolute filenames of undo files.
This patch makes it return a list of tuples "(vfs, relative filename)"
to access undo files via vfs.
This patch also changes "repair.strip()", which is the only user of
"localrepository.undofiles()".
This is a gratuitous code move aimed at reducing the localrepo bloat.
The method had few callers, not enough to be kept in local repo.
The peer API remains unchanged.
Previously the fncache was cleaned up at read time by noticing when it was out
of sync. This caused writes to happen outside the scope of transactions and
could have caused race conditions. With this change, we'll keep the fncache
up-to-date as we go by removing old entries during repair.strip.
The previous revlog strip computation would walk every rev in the revlog, from
the bottom to the top. Since we're usually stripping only the top few revs of
the revlog, this was needlessly expensive on large repos.
The new algorithm walks the exact number of revs that will be stripped, thus
making the operation not dependent on the number of revs in the repo.
This makes amend on a large repo go from 8.7 seconds to 6 seconds.
Writes the backup bundle paths to the blackbox so it's easy to see which
backup bundle is associated with which command when you are debugging an
issue.
Example output:
2013/03/13 10:39:56 durham> strip tip
2013/03/13 10:39:59 durham> saved backup bundle to /data/users/durham/www-hg/.hg/strip-backup/e5fac262363a-backup.hg
2013/03/13 10:40:03 durham> strip tip exited 0 after 7.97 seconds
The strip code used a trick to lower the cost of branchcache update after a
strip. However, it is less necessary since we have branchcache collaboration.
An invalid branchcache is likely to be cheaply rebuilt from a nearby subset of
the repo.
Moreover, this trick would need an update to be relevant in the now filtered
repository world. It currently updates the unfiltered branchcache that few people
care about. Making it smarter in that respect would need complex updates to the
calling logic.
So this mechanism is:
- Arguably needed,
- Currently irrelevant,
- Hard to update
and I'm dropping it.
We now update the branchcache in all cases, courtesy of the read-only reader.
This changeset has a few expected impacts on the test suite, as different caches
are updated.
The current query to get the new bookmark target for stripped revisions
involves multiple walks up the DAG, and is really expensive, taking over 2.5
seconds on a repository with over 400,000 changesets even if just one
changeset is being stripped.
A slightly simplified version of the current query is
max(heads(::<tostrip> - <tostrip>))
We make two observations here.
1. For any set s, max(heads(s)) == max(s). That is because revision numbers
define a topological order, so that the element with the highest revision
number in s will not have any children in s.
2. For any set s, max(::s - s) == max(parents(s) - s). In other words, the
ancestor of s with the highest revision number not in s is a parent of one
of the revs in s. Why? Because if it were an ancestor but not a parent of s,
it would have a descendant that would be a parent of s. This descendant
would have a higher revision number, leading to a contradiction.
Combining these two observations, we rewrite the revset query as
max(parents(<tostrip>) - <tostrip>)
The time complexity is now linear in the number of changesets being stripped.
For the above repository, the query now takes 0.1 seconds when one changeset
is stripped. This speeds up operations that use repair.strip, like the rebase
and strip commands.
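The equivalence of the two queries can be checked on a toy DAG. This is a sketch: the parents mapping and the helper names are hypothetical, and plain Python sets stand in for revsets.

```python
def ancestors(parents, revs):
    """Return revs plus all of their ancestors (the revset "::revs")."""
    seen = set()
    stack = list(revs)
    while stack:
        rev = stack.pop()
        if rev not in seen:
            seen.add(rev)
            stack.extend(parents[rev])
    return seen


def heads(parents, revs):
    """Return the members of revs that have no child inside revs."""
    revs = set(revs)
    with_child = {p for r in revs for p in parents[r] if p in revs}
    return revs - with_child


def slow_target(parents, tostrip):
    """Evaluate max(heads(::tostrip - tostrip))."""
    return max(heads(parents, ancestors(parents, tostrip) - set(tostrip)))


def fast_target(parents, tostrip):
    """Evaluate max(parents(tostrip) - tostrip)."""
    return max({p for r in tostrip for p in parents[r]} - set(tostrip))


# Toy history: revision number -> parent revision numbers (5 is a merge).
PARENTS = {0: [], 1: [0], 2: [1], 3: [1], 4: [2], 5: [3, 4]}
```

On this graph both functions agree for the strip sets {4, 5} and {2, 4, 5}. Only fast_target() avoids walking all ancestors.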
Strip is a "write" operation that needs to be aware of the whole repo's
content before destroying changesets.
Only the low level function is altered. The top level command will still
process its argument filtered (if any filtering is in place).
Bookmarks persistence still showed a fair amount of its legacy as a
monkeypatching extension. This encapsulates all bookmarks
serialization and parsing in a single class, and offers a single
location where other bookmarks storage engines can be substituted
in. As a result, many files no longer import the bookmarks module,
which strikes me as an encapsulation win.
This doesn't do anything to the current bookmark state yet, but I'm
hoping to put that in the bmstore class as well.
The `repair` code builds a giant revset query instead of using the "%lr" idiom.
This is inefficient and crashes when the number of stripped changesets is too big.
This changeset replaces the bad code with better revset usage.
If you've got this graph:
0-1-2
\
3
and 3 is checked out, 2 is bookmarked with "broken", and you do "hg
strip 2", the bookmark will move to 3, not 1. That's always struck me
as a bug.
This change makes bookmarks move to the tipmost ancestor of
the stripped set rather than the currently-checked-out revision, which
is what I always expected should happen.
This function augments strip to incrementally update the branchheads cache
rather than recompute it from scratch. This speeds up the performance of strip
and rebase on repos with long history. The performance optimization only
happens if the revisions stripped are all on the same branch and the parents of
the stripped revisions are also on that same branch.
This adds a few test cases, particularly one that reproduces the extra heads
that mpm observed.
Destroying history via strip used to invalidate the branchheads cache,
causing it to be regenerated the next time it is read. This is
expensive in large repos. This change converts strip to pass info to
localrepo.destroyed() to enable it to incrementally update the
cache, improving the performance of strip and other operations that
depend on it (e.g., rebase).
This change also strengthens a bit the integrity checking of the
branchheads cache when it is read, by rejecting the cache if it has
nodes in it that no longer exist.
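The stricter read-time check might look roughly like this. load_branchheads() is a hypothetical helper, and a dict of branch name to list of nodes stands in for the on-disk cache.

```python
def load_branchheads(cached, known_nodes):
    """Return the cached heads, or None when any cached node no longer exists."""
    for branch, nodes in cached.items():
        if any(node not in known_nodes for node in nodes):
            return None  # stale cache: the caller must recompute it
    return cached
```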
Cosmetic cleanups. Fix comment typo referring to the notion of multiple tips.
Make variable describing a generator end in 'gen'.
Fix another var containing a node not to end with 'rev'.
Calling strip() will eventually trigger localrepo.destroyed() which will
invalidate _parseroots. It will call filterunknown() upon reload.
Changes to test-keyword.t are related to commit --debug running after
either qpop or rollback.
Originally, mq.strip called repair.strip a single rev at a time.
repair.strip stores in a backup bundle any revision greater than
the revision being stripped, strips, then restores the backup with
repo.addchangegroup. So, when stripping revisions on more than one
topological branch, some could end up being restored from the backup
bundle, only to be later removed by a subsequent repair.strip call.
But repo.addchangegroup calls hooks for all those restore operations.
And 1671d21e8e41 changed it to delay all hook calls until the
repository lock was released - by mq.strip, after stripping all
revisions. Thus, the hooks could be called over revisions already
removed from the repository at that point.
By generating the revision lists at once inside repo.strip, we avoid
calling addchangegroup for temporary restores. Incidentally, this
also avoids creating many backup files for a single strip command.
Instead of computing the exact set of missing revlog revisions, we only
compute the set of missing/broken changesets. The resulting bundle can be
slightly bigger but we will be able to get rid of the ugly extranodes handling
in changegroupsubset.
A partial bundle is created to temporarily save revisions > rev but
not descending from the node to strip, to be able to restore the
changesets after stripping the changelog.
Since this bundle is not kept after the strip operation, and is not
user-visible, it is not necessary and should be faster to avoid
compression.
All callers to localrepo.transaction() must supply a transaction description.
The description and the existing repository tip are then stored
(transactionally) into .hg/undo.desc, where rollback can later find it.