It is finally time to freeze the bundle2 format! To do so we:
- rename HG2Y to HG20,
- drop "b2x:" prefix from all part names,
- rename the capability from "bundle2-exp" to "bundle2",
- rename the hook flag from 'bundle2-exp' to 'bundle2'.
The node ID used in strip bundle names is currently taken as the last
iterated value in a list comprehension found much earlier in the function.
This change makes the node selection more explicit at the cost of redundancy.
addchangegroup() modifies its behavior based on the transaction source.
This is incorrect for bundle2 repair files, causing rebases to abort when this
option is enabled.
This diff specifies the source type in the way recommended by comments in
bundle2.py and adds a test to ensure that rebases with the experimental option
work successfully.
On IRC, rom1dep reported a traceback[1] from setting
experimental.strip-bundle2-version to True. This diff catches
unexpected values and falls back to the non-experimental bundle1
implementation after issuing a warning.
[1] http://gist.tamytro.org/_admin/gists/qXcdQLwtApgy6e3NwWgl
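A minimal sketch of that fallback, assuming a hypothetical helper `strip_bundle_version()` that receives the raw config value and a `warn` callback (standing in for `ui.warn`). The tuple of accepted versions is an assumption for illustration, not Mercurial's actual list.

```python
def strip_bundle_version(configvalue, warn, supported=("01", "02")):
    """Return the changegroup version to write a backup with, or None.

    None means "use the non-experimental bundle1 code path".
    `supported` is an assumed set of valid versions (hypothetical).
    """
    if not configvalue:
        # Option unset: keep the default bundle1 behavior.
        return None
    if configvalue not in supported:
        # Unexpected value (e.g. "True"): warn instead of raising.
        warn("unknown strip-bundle2-version value %r, "
             "falling back to bundle1\n" % (configvalue,))
        return None
    return configvalue
```

With this shape, a bad value such as "True" produces one warning and a bundle1 backup rather than a traceback.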
This adds an experimental option 'strip-bundle2-version' which causes backup
bundles to use bundle2 formatting. Especially for generaldelta repositories,
this should provide significant performance gains for any operation that needs
to write a backup.
The next diff will add support for writing bundle2 files to writebundle, but
the bundle2 generator wants access to a ui object. This changes the signature
and callsites to pass one in.
This change touches every module in which repository.sopener was being used,
replacing it with the equivalent repository.svfs.
It should now be possible to remove localrepo.sopener.
Previously, a backup bundle could overwrite an existing bundle and cause user
data loss. For instance, if you have A<-B<-C and strip B, it produces backup
bundle B-backup.hg. If you then hg pull -r B B-backup.hg and strip it again, it
overwrites the existing B-backup.hg and C is lost.
The fix is to add a hash of all the nodes inside that bundle to the filename.
Fixed up existing tests and added a new test in test-strip.t
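To illustrate the naming fix, here is a sketch that hashes every node going into the bundle and adds a short digest to the filename. The helper name, the 12- and 8-character truncations, and the `<short>-<hash>-<suffix>.hg` layout are assumptions for illustration, not necessarily what repair.py writes.

```python
import hashlib


def backup_bundle_name(nodes, suffix="backup"):
    """Build a strip backup filename from the binary nodes it will contain.

    nodes[0] is taken as the stripped root; the digest covers every node,
    so stripping a different set from the same root yields a new name.
    The "<short>-<hash8>-<suffix>.hg" layout is hypothetical.
    """
    allhash = hashlib.sha1(b"".join(nodes)).hexdigest()
    return "%s-%s-%s.hg" % (nodes[0].hex()[:12], allhash[:8], suffix)
```

In the A<-B<-C scenario, the first strip of B bundles B and C, while the second strip bundles only B, so the two backups get different names and C is no longer overwritten. Because the digest depends on the order of `nodes`, callers would need to pass them in a deterministic order (for example, revision order).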
cset 21b4faf3787e has removed this option. This commit just tidies the
code that was associated with it. It also fixes the internal calls to
the strip() function.
Before this change, any function that wanted, as a final safety measure, to
keep a partial backup bundle (bundling changes not linearly related to the
current change being stripped) had to explicitly pass a backup="strip"
option. With this change, these backups are always kept in case of an
exception and always removed if there is no exception. Callers now only
decide whether a full backup is made: backup=True requests one, and
backup=False skips it.
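A sketch of the new policy, written with hypothetical callbacks (`savebundle`, `truncate`, `restore`, `unlink`, `warn`) in place of the real repository operations. The full backup is optional, while the partial bundle is kept if anything raises and removed on success. For simplicity the sketch always writes a partial bundle, even though real code would only need one when there are revisions to save.

```python
def strip(savebundle, truncate, restore, unlink, warn, backup=True):
    """Sketch of strip's backup policy; every callable is hypothetical.

    savebundle(kind) writes a bundle and returns its path. `kind` is
    "backup" for the full backup and "temp" for the partial bundle that
    holds unrelated revisions to re-add after truncation.
    """
    backupfile = savebundle("backup") if backup else None
    tmpbundle = savebundle("temp")
    try:
        truncate()
        restore(tmpbundle)
    except Exception:
        # Keep the partial bundle on failure: it may be the only
        # remaining copy of the unrelated changes.
        warn("strip failed, partial bundle stored in %s\n" % tmpbundle)
        raise
    # Success: the partial bundle was re-applied, so discard it.
    unlink(tmpbundle)
    return backupfile
```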
This patch makes "repair.strip()" treat bundle files via vfs.
This patch also avoids applying "vfs.join()" to the value returned by
"changegroup.writebundle()", so that "_bundle()" gets a relative path.
This patch makes paths below in "_bundle()" relative to ".hg":
- backup directory ("strip-backup"), and
- bundle file under backup directory
"vfs" is passed to "changegroup.writebundle()" so that it can use the
relative path directly.
This patch applies "vfs.join()" to the value returned by "_bundle()",
because the caller expects it to return an absolute path.
This will be changed by a succeeding patch that changes the caller side.
Before this patch, "localrepository.undofiles()" returns a list of
absolute filenames of undo files.
This patch makes it return a list of tuples "(vfs, relative filename)"
to access undo files via vfs.
This patch also changes "repair.strip()", which is the only user of
"localrepository.undofiles()".
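The sketch below shows the shape of the "(vfs, relative filename)" API, with a hypothetical `simplevfs` class standing in for Mercurial's vfs objects; the undo filenames are examples only. Missing undo files are ignored, and other removal errors only produce a warning.

```python
import errno
import os


class simplevfs:
    """Hypothetical stand-in for a vfs: resolves paths against a base."""

    def __init__(self, base):
        self.base = base

    def join(self, path):
        return os.path.join(self.base, path)

    def unlink(self, path):
        os.unlink(self.join(path))


def undofiles(svfs, vfs):
    # Pairs of (vfs, path relative to that vfs) instead of absolute paths.
    return [(svfs, "undo"), (vfs, "undo.dirstate")]


def remove_undo_files(pairs, warn):
    for undovfs, name in pairs:
        try:
            undovfs.unlink(name)
        except OSError as e:
            # A missing undo file is fine; anything else is reported.
            if e.errno != errno.ENOENT:
                warn("error removing %s: %s\n" % (undovfs.join(name), e))
```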
This is a gratuitous code move aimed at reducing localrepo bloat.
The method had few callers, not enough to justify keeping it in localrepo.
The peer API remains unchanged.
Previously the fncache was cleaned up at read time by noticing when it was out
of sync. This caused writes to happen outside the scope of transactions and
could have caused race conditions. With this change, we'll keep the fncache
up-to-date as we go by removing old entries during repair.strip.
The previous revlog strip computation would walk every rev in the revlog, from
the bottom to the top. Since we're usually stripping only the top few revs of
the revlog, this was needlessly expensive on large repos.
The new algorithm walks the exact number of revs that will be stripped, thus
making the operation not dependent on the number of revs in the repo.
This makes amend on a large repo go from 8.7 seconds to 6 seconds.
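A sketch of the top-down walk on a toy revlog, represented as a list of `(linkrev, p1, p2)` tuples (an assumption for illustration). It relies on linkrevs increasing along ancestry. Once no rev on the walk's frontier has a linkrev of at least `minlink`, no lower rev can have one either, so the loop visits only the revs at or above the strip point.

```python
NULLREV = -1


def getstrippoint(revlog, minlink):
    """Find where to truncate a toy revlog when stripping linkrevs >= minlink.

    revlog[rev] == (linkrev, p1, p2). Returns (strippoint, brokenrevs):
    every rev >= strippoint is truncated, and brokenrevs are truncated revs
    whose linkrev < minlink (they must be re-added afterwards).
    """
    parents = {p for _, p1, p2 in revlog for p in (p1, p2) if p != NULLREV}
    # Frontier of not-yet-visited revs, seeded with the revlog heads.
    frontier = {r: revlog[r][0] for r in range(len(revlog)) if r not in parents}
    # How many frontier revs still have a linkrev that must be stripped.
    pending = sum(1 for linkrev in frontier.values() if linkrev >= minlink)
    strippoint = len(revlog)
    brokenrevs = set()
    # Walk down from the top; stop once no frontier rev needs stripping.
    while pending:
        strippoint -= 1
        linkrev = frontier.pop(strippoint)
        if linkrev < minlink:
            brokenrevs.add(strippoint)
        else:
            pending -= 1
        for p in revlog[strippoint][1:]:
            if p != NULLREV and p not in frontier:
                frontier[p] = revlog[p][0]
                if frontier[p] >= minlink:
                    pending += 1
    return strippoint, brokenrevs
```

The number of loop iterations is `len(revlog) - strippoint`, so stripping the top few revs costs a few steps regardless of how long the revlog is.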
Writes the backup bundle paths to the blackbox so it's easy to see which
backup bundle is associated with which command when you are debugging an
issue.
Example output:
2013/03/13 10:39:56 durham> strip tip
2013/03/13 10:39:59 durham> saved backup bundle to /data/users/durham/www-hg/.hg/strip-backup/e5fac262363a-backup.hg
2013/03/13 10:40:03 durham> strip tip exited 0 after 7.97 seconds
The strip code used a trick to lower the cost of updating the branchcache after
a strip. However, it is less necessary now that we have branchcache
collaboration: an invalidated branchcache is likely to be cheaply rebuilt from
that of a nearby subset of the repo.
Moreover, this trick would need updating to stay relevant now that repositories
are filtered. It currently updates the unfiltered branchcache, which few people
care about. Making it smarter in that respect would require complex updates to
the calling logic.
So this mechanism is:
- Of debatable necessity,
- Currently irrelevant,
- Hard to update
and I'm dropping it.
We now update the branchcache in all cases, courtesy of the read-only reader.
This changeset has a few expected impacts on the test suite, as different
caches are updated.
The current query to get the new bookmark target for stripped revisions
involves multiple walks up the DAG, and is really expensive, taking over 2.5
seconds on a repository with over 400,000 changesets even if just one
changeset is being stripped.
A slightly simplified version of the current query is
max(heads(::<tostrip> - <tostrip>))
We make two observations here.
1. For any set s, max(heads(s)) == max(s). That is because revision numbers
define a topological order, so that the element with the highest revision
number in s will not have any children in s.
2. For any set s, max(::s - s) == max(parents(s) - s). In other words, the
ancestor of s with the highest revision number not in s is a parent of one
of the revs in s. Why? Because if it were an ancestor of s but not a parent
of any rev in s, it would have a descendant, itself not in s, that is a parent
of a rev in s. This descendant would have a higher revision number, leading to
a contradiction.
Combining these two observations, we rewrite the revset query as
max(parents(<tostrip>) - <tostrip>)
The time complexity is now linear in the number of changesets being stripped.
For the above repository, the query now takes 0.1 seconds when one changeset
is stripped. This speeds up operations that use repair.strip, like the rebase
and strip commands.
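The two observations can be checked mechanically. This sketch encodes both queries over a DAG given as a list of parent tuples (no null revision; an illustrative representation, not Mercurial's revset engine) and compares them on random graphs and random sets, since the argument holds for any set s.

```python
import random


def ancestors(parents, revs):
    """::revs, i.e. revs plus all of their ancestors."""
    seen, stack = set(), list(revs)
    while stack:
        rev = stack.pop()
        if rev not in seen:
            seen.add(rev)
            stack.extend(parents[rev])
    return seen


def heads(parents, revs):
    """Members of revs with no children in revs."""
    haschild = {p for r in revs for p in parents[r]}
    return {r for r in revs if r not in haschild}


def old_target(parents, tostrip):
    # max(heads(::tostrip - tostrip))
    cands = ancestors(parents, tostrip) - tostrip
    return max(heads(parents, cands)) if cands else None


def new_target(parents, tostrip):
    # max(parents(tostrip) - tostrip)
    cands = {p for r in tostrip for p in parents[r]} - tostrip
    return max(cands) if cands else None


def random_dag(n, rng):
    """parents[r] holds 0-2 distinct parents, each with a lower rev number."""
    return [tuple(rng.sample(range(r), min(r, rng.randint(0, 2))))
            for r in range(n)]


def check(trials=500, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, 30)
        parents = random_dag(n, rng)
        tostrip = set(rng.sample(range(n), rng.randint(1, n)))
        assert old_target(parents, tostrip) == new_target(parents, tostrip)
    return trials
```

Note that `new_target` only looks at the direct parents of the stripped revs, which is where the linear cost comes from.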
Strip is a "write" operation that needs to be aware of the whole repo's
content before destroying changesets.
Only the low-level function is altered. The top-level command still
processes its arguments with filtering applied (if any filtering is in place).
Bookmarks persistence still showed a fair amount of its legacy as a
monkeypatching extension. This encapsulates all bookmarks
serialization and parsing in a single class, and offers a single
location where other bookmarks storage engines can be substituted
in. As a result, many files no longer import the bookmarks module,
which strikes me as an encapsulation win.
This doesn't do anything to the current bookmark state yet, but I'm
hoping to put that in the bmstore class as well.
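A sketch of the encapsulation idea: a hypothetical dict subclass that owns both parsing and serialization, so other code only sees bookmark names and nodes. It assumes the `.hg/bookmarks` layout of one `<hex node> <name>` pair per line; the class body is illustrative, not the actual bmstore.

```python
class bmstore(dict):
    """Hypothetical sketch: bookmark name -> 40-char hex node.

    Parsing and serialization live here, so callers never touch the
    on-disk format. Assumes one "<hex node> <name>" pair per line.
    """

    @classmethod
    def parse(cls, text):
        store = cls()
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue
            # Split once: bookmark names may contain spaces.
            node, name = line.split(" ", 1)
            store[name] = node
        return store

    def serialize(self):
        return "".join("%s %s\n" % (node, name)
                       for name, node in sorted(self.items()))
```

Swapping in another storage engine would then mean replacing these two methods rather than touching every caller.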
The `repair` code builds a giant revset query instead of using the "%lr" idiom.
It is inefficient and crashes when the number of stripped changesets is too large.
This changeset replaces the bad code by a better revset usage.
If you've got this graph:
0-1-2
   \
    3
and 3 is checked out, 2 is bookmarked with "broken", and you do "hg
strip 2", the bookmark will move to 3, not 1. That's always struck me
as a bug.
This change makes bookmarks move to the tipmost ancestor of
the stripped set rather than the currently-checked-out revision, which
is what I always expected should happen.
This change augments strip to incrementally update the branchheads cache
rather than recompute it from scratch. This improves the performance of strip
and rebase on repos with long history. The performance optimization only
happens if the revisions stripped are all on the same branch and the parents of
the stripped revisions are also on that same branch.
This adds a few test cases, particularly one that reproduces the extra heads
that mpm observed.
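A sketch of the incremental update under the stated precondition, using plain dicts for parents, children, and branch names (hypothetical representation and function name). Stripped heads are removed, and a surviving parent becomes a head only if it no longer has a child on the same branch. When the precondition fails, the function returns False so the caller can fall back to a full recomputation.

```python
def strip_update_heads(heads, branchof, parents, children, stripped):
    """Incrementally update a branch -> set-of-head-revs cache after a strip.

    Returns False (caller must recompute) unless every stripped rev and
    every surviving parent of a stripped rev share one branch. `stripped`
    is assumed closed under descendants, as strip guarantees.
    """
    stripped = set(stripped)
    branches = {branchof[r] for r in stripped}
    rootparents = {p for r in stripped for p in parents[r]} - stripped
    if len(branches) != 1 or any(branchof[p] not in branches
                                 for p in rootparents):
        return False
    (branch,) = branches
    newheads = heads.get(branch, set()) - stripped
    for p in rootparents:
        # p becomes a head unless it keeps a child on the same branch.
        if not any(c not in stripped and branchof[c] == branch
                   for c in children[p]):
            newheads.add(p)
    if newheads:
        heads[branch] = newheads
    else:
        heads.pop(branch, None)
    return True
```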
Destroying history via strip used to invalidate the branchheads cache,
causing it to be regenerated the next time it is read. This is
expensive in large repos. This change converts strip to pass info to
localrepo.destroyed() to enable it to incrementally update the
cache, improving the performance of strip and other operations that
depend on it (e.g., rebase).
This change also slightly strengthens the integrity checking of the
branchheads cache when it is read, by rejecting the cache if it has
nodes in it that no longer exist.
Cosmetic cleanups. Fix comment typo referring to the notion of multiple tips.
Make variable describing a generator end in 'gen'.
Fix another var containing a node not to end with 'rev'.
Calling strip() will eventually trigger localrepo.destroyed() which will
invalidate _parseroots. It will call filterunknown() upon reload.
Changes to test-keyword.t are related to commit --debug running after
either qpop or rollback.
Originally, mq.strip called repair.strip a single rev at a time.
repair.strip stores in a backup bundle any revision greater than
the revision being stripped, strips, then restores the backup with
repo.addchangegroup. So, when stripping revisions on more than one
topological branch, some could end up being restored from the backup
bundle, only to be later removed by a subsequent repair.strip call.
But repo.addchangegroup calls hooks for all those restore operations.
And 1671d21e8e41 changed it to delay all hook calls until the
repository lock was released (by mq.strip, after stripping all
revisions). Thus, the hooks could be called on revisions already
removed from the repository at that point.
By generating the revision lists at once inside repo.strip, we avoid
calling addchangegroup for temporary restores. Incidentally, this
also avoids creating many backup files for a single strip command.
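The following sketch computes the whole strip set in one pass, the way a single repair.strip call sees it (hypothetical helper; parents are given as a list of tuples indexed by revision number). With 3 a child of 1 and 4 a child of 3, stripping {2, 4} together restores only 3, whereas stripping 2 on its own first would bundle and restore 4 only for the next call to remove it.

```python
def plan_strip(parents, roots):
    """Compute, in one pass, what a single strip call would do.

    parents[rev] lists the parent revs of rev (topologically numbered).
    Returns (tostrip, saved): the roots plus all their descendants, and
    the revs above the lowest stripped rev that must be bundled and
    restored after truncation.
    """
    roots = set(roots)
    tostrip = set()
    for rev in range(min(roots), len(parents)):
        # A rev is stripped if it is a root or descends from a stripped rev.
        if rev in roots or any(p in tostrip for p in parents[rev]):
            tostrip.add(rev)
    striprev = min(tostrip)
    saved = [rev for rev in range(striprev, len(parents)) if rev not in tostrip]
    return sorted(tostrip), saved
```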