Commit Graph

173 Commits

Author SHA1 Message Date
Kevin Bullock
f0dbcd6ad0 merge with stable 2017-09-18 14:12:20 -05:00
Martin von Zweigbergk
01e8a04410 repair: preserve phase also when not using generaldelta (issue5678)
It seems like we used to pick the oldest possible version of the
changegroup to use for bundles created by the repair module (used
e.g. by "hg strip" and for temporary bundles by "hg rebase"). I tried
to preserve that behavior when I created the changegroup.safeversion()
method in 77f74106b264 (changegroup: introduce safeversion(),
2016-01-19).

However, we have recently chagned our minds and decided that these
commands are only used locally and downgrades are unlikely. That
decicion allowed us to start adding obsmarker and phase information to
these bundles. However, as the bug report shows, it means we get
different behavior e.g. when generaldelta is not enabled (because when
it was enabled, it forced us to use bundle2). The commit that actually
caused the reported bug was 26d535788092 (strip: include phases in
bundle (BC), 2017-06-15).

So, since we now depend on having more information in the bundles,
let's make sure we instead pick the newest possible changegroup
version.

Differential Revision: https://phab.mercurial-scm.org/D715
2017-09-14 11:16:57 -07:00
Augie Fackler
4f3dc48acb repair: reliably obtain bytestr of node ids 2017-08-22 21:22:34 -04:00
Boris Feld
31cf043735 bookmark: use 'applychanges' in 'repair.strip' 2017-07-10 17:46:47 +02:00
Durham Goode
9c6e46253e repair: move manifest strip to a separate function
This moves manifest stripping to a separate function so implementations of the
manifest that don't support stripping can replace this function with a no-op.

I considered adding a strip api to the manifestlog, so other implementations
could make it a no-op there, but it seems like strip might be unique to the
revlog implementation, and therefore shouldn't be present on the generic api.

Differential Revision: https://phab.mercurial-scm.org/D292
2017-08-08 17:25:38 -07:00
Durham Goode
b2040ce2e5 repair: refactor broken linkrev collection
This refactors broken linkrev collection such that manifest collection is in a
separate function. This allows extensions to replace the manifest collection
with a non-revlog oriented version.

I considered moving the collect changes function onto the manifestlog itself, so
it would be behind the abstraction, but since the store we're building doesn't
even have the concept of strip, embeding that concept in the manifestlog api
seemed odd.

Differential Revision: https://phab.mercurial-scm.org/D291
2017-08-08 17:25:38 -07:00
Martin von Zweigbergk
8a965b4d0f strip: don't allow empty changegroup in bundle1
Applying an empty changegroup has been an error since the
beginning. The only exception was strip, which would allow to apply an
empty changegroup from the temporary bundle. However, the emptyok=True
option was only set for bundle1 bundles. In other words, temporary
bundle2 bundles would fail if they were empty.

Bundle2 has now been used enough that it seems safe to say that we
simply don't create bundle2 bundles with empty changegroups. That also
suggests that we never create bundle1 bundles with empty changegroups
(i.e. empty bundle1 bundles, since bundle1 is just a changegroup),
because, AFAICT, the code leading up to the application of the bundle
is the same for bundle1 and bundle2.

Therefore, let's stop passing emptyok=True, so we more clearly get the
same behavior for bundle1 and bundle2.
2017-06-30 23:58:31 -07:00
Pierre-Yves David
a14bfceb42 config: register the 'devel.strip-obsmarkers' config
The single explicit default that existed so far is converted to registered
config value.
2017-06-28 13:32:36 +02:00
Pierre-Yves David
8b519349e8 obsutil: move 'exclusivemarkers' to the new modules
We have a new 'obsutil' module now. We move the high level utility there to
bring 'obsolete.py' back to a more reasonable size.
2017-06-27 01:11:56 +02:00
Jun Wu
d15f755cab strip: respect the backup option in stripcallback
The backup option was mistakenly ignored. It should be respected.

Thanks Martin von Zweigbergk for finding this out!
2017-06-26 21:11:02 -07:00
Jun Wu
e38073e90f strip: add a delayedstrip method that works in a transaction
For long, the fact that strip does not work inside a transaction and some
code has to work with both obsstore and fallback to strip lead to duplicated
code like:

      with repo.transaction():
          ....
          if obsstore:
              obsstore.createmarkers(...)
      if not obsstore:
          repair.strip(...)

Things get more complex when you want to call something which may call strip
under the hood. Like you cannot simply write:

      with repo.transaction():
          ....
          rebasemod.rebase(...) # may call "strip", so this doesn't work

But you do want rebase to run inside a same transaction if possible, so the
code may look like:

      with repo.transaction():
          ....
          if obsstore:
              rebasemod.rebase(...)
              obsstore.createmarkers(...)
      if not obsstore:
          rebasemod.rebase(...)
          repair.strip(...)

That's ugly and error-prone. Ideally it's possible to just write:

      with repo.transaction():
          rebasemod.rebase(...)
          saferemovenodes(...)

This patch is the first step towards that. It adds a "delayedstrip" method
to repair.py which maintains a postclose callback in the transaction object.
2017-06-25 10:38:45 -07:00
Martin von Zweigbergk
2625447afc bundle: make applybundle() delegate v1 bundles to applybundle1() 2017-06-22 15:00:19 -07:00
Martin von Zweigbergk
00dfaaff56 bundle: transpose transaction scope with bundle type switch
This moves the transaction with-statements outside of the
per-bundle-version switches, so the next patch will be a little
simpler.
2017-06-22 21:27:57 -07:00
Martin von Zweigbergk
eae1a1d9e5 bundle: add a applybundle1() method
This is one step towards removing a bunch of "if isinstance(gen,
unbundle20)" by treating bundle1 and bundle2 more similarly.

The name may sounds ironic for a method in the bundle2 module, but I
didn't think it was worth it yet to create a new 'bundle' module that
depends on the 'bundle2' module. Besides, we'll inline the method
again later.
2017-06-16 10:25:11 -07:00
Martin von Zweigbergk
1f6af49532 strip: include phases in bundle (BC)
Before this patch, unbundling a stripped changeset would make it a
draft (unless the parent was secret). This meant that one would lose
phase information when stripping and unbundling secret changesets. The
same thing was true for public changesets. While stripping public
changesets is generally rare, it's done frequently by e.g. the
narrowhg extension.

We also include the phases in the temporary bundle, just in case
stripping were to fail after that point, so the user can still restore
the repo including phase information. Before this patch, the phases
were left untouched during the bundling and unbundling of the
temporary bundle. Only at the end of the transaction would
phasecache.filterunknown() be called to remove phase roots that were
no longer valid. We now need to call that also after the first
stripping, i.e. before applying the temporary bundle. Otherwise
unbundling the temporary bundle will cause a read of the phase cache
which has stripped changesets in the cache and that fails.

Like with obsmarkers, we unconditionally include the phases in the
bundle when stripping (when using bundle2, such as when generaldelta
is enabled). The reason for doing that for strip but not for bundle is
that strip bundles are not meant to be shared outside the repo, so we
don't care as much about compatibility.
2017-06-15 00:15:52 -07:00
Martin von Zweigbergk
560e5ce4f1 changegroup: let callers pass in transaction to apply() (API)
I think passing in the transaction makes it a little clearer and more
consistent with bundle2.
2017-06-15 22:46:38 -07:00
Martin von Zweigbergk
6700618bdb repair: create transaction for bundle1 unbundling earlier
See earlier patch for motivation.
2017-06-15 23:09:14 -07:00
Martin von Zweigbergk
28a143f25f repair: remove unnecessary locking for bookmarks
The caller has already locked the repo.
2017-06-19 11:24:49 -07:00
Martin von Zweigbergk
d8b160077a repair: move check for existing transaction earlier
Several benefits:

 * Gets close the comment describing it
 * Splits off unrelated comment about "backup" argument
 * Error checking is customarily done early
 * If we added an early return to the method, it would still
   consistently fail if there was an existing transaction (so
   we would find and fix that case quickly)

One test needs updating with for this change, because we no longer
create the backup bundle before we fail. I don't see much reason to
create that backup bundle. If some command was adding content and then
trying to strip it as well within the transaction, we would have a
backup for the user, but the risk of that not being discovered in
development seems very small.
2017-06-19 13:18:00 -07:00
Martin von Zweigbergk
64eed0f13b strip: remove unncessary "del" and inline variable 2017-06-19 13:13:28 -07:00
Martin von Zweigbergk
e070466004 repair: clarify in comment that caller must take lock, but not transaction
I have checked that all callers have already taken the lock (and if
they hadn't, we should have seen tests fail thanks to the 'transaction
requires locking' devel warning in localrepo.transaction()).
2017-06-19 11:24:21 -07:00
Martin von Zweigbergk
7059c57ed3 strip: remove a redundant setting of hookargs
bundle2.applybundle() will set both 'source' and 'url'.
2017-06-16 10:13:44 -07:00
Pierre-Yves David
8d733e89bc strip: strip obsmarkers exclusive to the stripped changeset
This is it, `hg strip --rev X` will now also remove obsolescence markers
exclusive to X. Since a previous changeset, the obsmarkers has been backed up
in the strip backup bundle, so it is possible to restore them.

Note: stripping obsmarkers means the precursors of the stripped changeset might no
longer be obsolete after the strip.

Stripping changeset without obsmarkers can be useful when building test case. So
It is possible to disable the stripping of obsmarkers using the
'devel.strip-obsmarkers' config option.

Test change have been carefully validated.
2017-05-20 16:19:59 +02:00
Pierre-Yves David
9461f1ce49 strip: do not include obsolescence markers for the temporary bundle
When stripping, we need to put all non-stripped revisions "above" the stripped
ones in a "temporary-bundle" while we strip the targets revision. Then we
reapply that bundle to restore these non-stripped revisions (with a new revision
numbers). We skip the inclusion of obsolescence markers in that bundle. This is
safe since all obsmarkers we plan to strip will be backed-up in the strip backup
bundle. Including the markers would create issue in some case were we try to
strip a prune markers that is "relevant" to a revision in the
"temporary-bundle".

(note: we do not strip obsmarkers yet)
2017-06-01 12:08:49 +02:00
Pierre-Yves David
e1977b120c strip: also backup obsmarkers
We are about to give 'strip' the ability to remove obsmarkers. Before we start
removing data we must make sure it is preserved somewhere. So the backup bundle
created by 'strip' now contains obsmarkers.
2017-05-20 15:06:10 +02:00
Pierre-Yves David
11f26a2c51 strip: use the 'writenewbundle' function to get bundle on disk
This will ensure the backup bundle use the best available logic (eg: includes
relevant caches so that we loose less of them on strip.)
2017-05-05 18:15:42 +02:00
Durham Goode
8712c20680 hg: backout optimizing for treemanifests
It turns out that the files list is not sufficient to identify with revlogs have
changed. In a merge commit, no files could've changed but directories would
have. For now let's just backout this optimization.
2017-05-15 18:55:58 -07:00
Durham Goode
fd9ba7b071 strip: make tree stripping O(changes) instead of O(repo)
The old tree stripping logic iterated over every tree revlog in the repo looking
for commits that had revs to be stripped. That's very inefficient in large
repos. Instead, let's look at what files are touched by the strip and only
inspect those revlogs.

I don't have actual perf numbers, since internally we don't use a true
treemanifest, but simply iterating over hundreds of thousands of revlogs takes
many, many seconds, so this should help tremendously when stripping only a few
commits.
2017-05-08 11:35:23 -07:00
Durham Goode
7f29b67f6b strip: move tree strip logic to it's own function
This will allow external extensions to modify tree strip behavior more
precisely.
2017-05-08 11:35:23 -07:00
Pierre-Yves David
64e5cd2f7e upgrade: extract code in its own module
Given about 2/3 or 'mercurial.repair' is now about repository upgrade, I think
it is fair to move it into its own module.

An expected benefit is the ability to drop the 'upgrade' prefix of many
functions. This will be done in coming changesets.
2017-04-07 18:53:17 +02:00
Jun Wu
809dfffea4 repair: use ProgrammingError 2017-03-26 16:53:28 -07:00
Matt Harbison
721da7fc07 repair: use context manager for lock management
If repo.lock() raised inside of the try block, 'tr' would have been None in the
finally block where it tries to release().  Modernize the syntax instead of just
winching the lock out of the try block.

I found several other instances of acquiring the lock inside of the 'try', but
those finally blocks handle None references.  I also started switching some
trivial try/finally blocks to context managers, but didn't get them all because
indenting over 3x for lock, wlock and transaction would have spilled over 80
characters.  That got me wondering if there should be a repo.rwlock(), to handle
locking and unlocking in the proper order.

It also looks like py27 supports supports multiple context managers for a single
'with' statement.  Should I hold off on the rest until py26 is dropped?
2017-03-23 23:47:23 -04:00
Pierre-Yves David
197ab7aeb0 repair: directly use repo.vfs.join
The 'repo.join' method is about to be deprecated.
2017-03-08 16:53:39 -08:00
Pierre-Yves David
37925b72f7 vfs: use 'vfs' module directly in 'mercurial.repair'
Now that the 'vfs' classes moved in their own module, lets use the new module
directly. We update code iteratively to help with possible bisect needs in the
future.
2017-03-02 13:29:43 +01:00
Simon Farnsworth
e0b70e4f7f mercurial: switch to util.timer for all interval timings
util.timer is now the best available interval timer, at the expense of not
having a known epoch. Let's use it whenever the epoch is irrelevant.
2017-02-15 13:17:39 -08:00
Gregory Szorc
abe1c0e17e repair: clean up stale lock file from store backup
Since we did a directory rename on the stores, the source
repository's lock path now references the dest repository's
lock path and the dest repository's lock path now references
a non-existent filename.

So releasing the lock on the source will unlock the dest and
releasing the lock on the dest will no-op because it fails due
to file not found. So we clean up the dest's lock manually.
2016-11-24 18:45:29 -08:00
Gregory Szorc
a400e3d753 repair: copy non-revlog store files during upgrade
The store contains more than just revlogs. This patch teaches the
upgrade code to copy regular files as well.

As the test changes demonstrate, the phaseroots file is now copied.
2016-11-24 18:34:50 -08:00
Gregory Szorc
93504084a0 repair: migrate revlogs during upgrade
Our next step for in-place upgrade is to migrate store data. Revlogs
are the biggest source of data within the store and a store is useless
without them, so we implement their migration first.

Our strategy for migrating revlogs is to walk the store and call
`revlog.clone()` on each revlog. There are some minor complications.

Because revlogs have different storage options (e.g. changelog has
generaldelta and delta chains disabled), we need to obtain the
correct class of revlog so inserted data is encoded properly for its
type.

Various attempts at implementing progress indicators that didn't lead
to frustration from false "it's almost done" indicators were made.

I initially used a single progress bar based on number of revlogs.
However, this quickly churned through all filelogs, got to 99% then
effectively froze at 99.99% when it got to the manifest.

So I converted the progress bar to total revision count. This was a
little bit better. But the manifest was still significantly slower
than filelogs and it took forever to process the last few percent.

I then tried both revision/chunk bytes and raw bytes as the
denominator. This had the opposite effect: because so much data is in
manifests, it would churn through filelogs without showing much
progress. When it got to manifests, it would fill in 90+% of the
progress bar.

I finally gave up having a unified progress bar and instead implemented
3 progress bars: 1 for filelog revisions, 1 for manifest revisions, and
1 for changelog revisions. I added extra messages indicating the total
number of revisions of each so users know there are more progress bars
coming.

I also added extra messages before and after each stage to give extra
details about what is happening. Strictly speaking, this isn't
necessary. But the numbers are impressive. For example, when converting
a non-generaldelta mozilla-central repository, the messages you see are:

   migrating 2475593 total revisions (1833043 in filelogs, 321156 in manifests, 321394 in changelog)
   migrating 1.67 GB in store; 2508 GB tracked data
   migrating 267868 filelogs containing 1833043 revisions (1.09 GB in store; 57.3 GB tracked data)
   finished migrating 1833043 filelog revisions across 267868 filelogs; change in size: -415776 bytes
   migrating 1 manifests containing 321156 revisions (518 MB in store; 2451 GB tracked data)

That "2508 GB" figure really blew me away. I had no clue that the raw
tracked data in mozilla-central was that large. Granted, 2451 GB is in
the manifest and "only" 57.3 GB is in filelogs. But still.

It's worth noting that gratuitous loading of source revlogs in order
to display numbers and progress bars does serve a purpose: it ensures
we can open all source revlogs. We don't want to spend several minutes
copying revlogs only to encounter a permissions error or similar later.

As part of this commit, we also add swapping of the store directory
to the upgrade function. After revlogs are converted, we move the
old store into the backup directory then move the temporary repo's
store into the old store's location. On well-behaved systems, this
should be 2 atomic operations and the window of inconsistency show be
very narrow.

There are still a few improvements to be made to store copying and
upgrading. But this commit gets the bulk of the work out of the way.
2016-12-18 17:00:15 -08:00
Gregory Szorc
b9b6954ea9 repair: begin implementation of in-place upgrading
Now that all the upgrade planning work is in place, we can start
doing the real work: actually upgrading a repository.

The main goal of this commit is to get the "framework" for running
in-place upgrade actions in place.

Rather than get too clever and low-level with regards to in-place
upgrades, our strategy is to create a new, temporary repository,
copy data to it, then replace the old data with the new. This allows
us to reuse a lot of code in localrepo.py around store interaction,
which will eventually consume the bulk of the upgrade code.

But we have to start small. This patch implements adding new
repository requirements. But it still sets up a temporary
repository and locks it and the source repo before performing the
requirements file swap. This means all the plumbing is in place
to implement store copying in subsequent commits.
2016-12-18 16:59:04 -08:00
Gregory Szorc
a3569d4b71 repair: determine what upgrade will do
This commit introduces code for determining what actions/improvements
an upgrade should perform.

The "upgradefindimprovements" function introduces a mechanism to
return a list of improvements that can be made to a repository.
Each improvement is effectively an action that an upgrade will
perform. Associated with each of these improvements is metadata
that will be used to inform users what's wrong and what an
upgrade will do.

Each "improvement" is categorized as a "deficiency" or an
"optimization." TBH, I'm not thrilled about the terminology and
am receptive to constructive bikeshedding. The main difference
between a "deficiency" and an "optimization" is a deficiency
is always corrected (if it deviates from the current config) and
an "optimization" is an optional action that goes above and beyond
to improve the state of the repository (usually by requiring more
CPU during upgrade).

Our initial set of improvements identifies missing repository
requirements, a single, easily correctable problem with
changelog storage, and a set of "optimizations" related to delta
recalculation.

The main "upgraderepo" function has been expanded to handle
improvements. It queries for the list of improvements and determines
which of them will run based on the current repository state and user

I went through numerous iterations of the output format before
settling on a ReST-inspired definition list format. (I used
bulleted lists in the first submission of this commit and could
not get it to format just right.) Even with the various iterations,
I'm still not super thrilled with the format. But, this is a debug*
command, so that should mean we can refine the output without BC
concerns.
2016-12-18 16:51:09 -08:00
Gregory Szorc
f42e2dcaac repair: implement requirements checking for upgrades
This commit introduces functionality for upgrading a repository in
place. The first part that's implemented is testing for upgrade
"compatibility." This is done by examining repository requirements.

There are 5 functions returning sets of requirements that control
upgrading. Why so many functions? Mainly to support extensions.
Functions are easier to monkeypatch than module variables.

Astute readers will see that we don't support "manifestv2" and
"treemanifest" requirements in the upgrade mechanism. I don't have
a great answer for why other than this is a complex set of patches
and I don't want to deal with the complexity of these experimental
features just yet. We can teach the upgrade mechanism about them
later, once the basic upgrade mechanism is in place.

This commit also introduces the "upgraderepo" function. This will be
our main routine for performing an in-place upgrade. Currently, it
just implements requirements checking. The structure of some code in
this function may look a bit weird (e.g. the inline function that is
only called once). But this will make sense after future commits.
2016-12-18 16:16:54 -08:00
Martin von Zweigbergk
e1f0ba8ef9 repair: combine two loops over changelog revisions
This just saves a few lines.
2017-01-04 10:35:04 -08:00
Martin von Zweigbergk
92d0334538 repair: speed up stripping of many roots
repair.strip() expects a set of root revisions to strip. It then
builds the full set of descedants by walking the descandants of
each. It is rare that more than a few roots get passed in, but if that
happens, it will wastefully walk the changelog for each root. So let's
just walk it once.

I noticed this because the narrowhg extension was passing not only
roots, but all the commits to strip. When there were tens of thousands
of commits to strip, this resulted in quadratic behavior with that
extension.
2017-01-04 10:07:12 -08:00
Durham Goode
52b8095f37 manifest: remove last uses of repo.manifest
Now that all the functionality has been moved to manifestlog/manifestrevlog/etc,
we can finally change all the uses of repo.manifest to use the new versions. A
future diff will then delete repo.manifest.

One additional change in this commit is to change repo.manifestlog to be a
@storecache property instead of @property. This is required by some uses of
repo.manifest require that it be settable (contrib/perf.py and the static http
server). We can't do this in a prior change because we can't use @storecache on
this until repo.manifest is no longer used anywhere.
2016-11-10 02:13:19 -08:00
Durham Goode
f980c11277 manifest: delete unused dirlog and _newmanifest functions
As part of migrating all manifest functionality out of manifest.manifest, let's
migrate a couple spots off of manifest.dirlog() to use the revlog specific
accessor. Then we can delete manifest.dirlog() and other unused functions.
2016-11-10 02:13:19 -08:00
Martin von Zweigbergk
422165fd86 repair: make strip() return backup file path
narrowhg wants to strip some commits and then re-apply them after
applying another bundle. Having repair.strip() return the bundle path
will be helpful for it.
2016-10-31 15:40:30 -07:00
FUJIWARA Katsunori
49079a5fce repair: open a file with checkambig=True to avoid file stat ambiguity
Before this patch, if steps below occurs at "the same time in sec",
all of mtime, ctime and size are same between (1) and (3).

  1. append data to revlog-style file (and close transaction)
  2. discard appended data by truncation of strip
  3. append same size but different data to revlog-style file again

Therefore, cache validation doesn't work after (3) as expected.

To avoid such file stat ambiguity around truncation, this patch opens
a file with checkambig=True.

This patch also introduces "with" statement style, to ensure immediate
invocation of close() after truncation, because closing file is the
only trigger to check (and get rid of) file stat ambiguity.

This is a part of ExactCacheValidationPlan.

    https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
2016-09-22 21:52:00 +09:00
Martin von Zweigbergk
e3fc1041dc strip: don't use "full" and "partial" to describe bundles
The partial bundle is not a subset of the full bundle, and the full
bundle is not full in any way that i see. The most obvious
interpretation of "full" I can think of is that it has all commits
back to the null revision, but that is not what the "full" bundle
is. The "full" bundle is simply a backup of what the user asked us to
strip (unless --no-backup). The "partial" bundle contains the
revisions we temporarily stripped because they had higher revision
numbers that some commit that the user asked us to strip.

The "full" bundle is already called "backup" in the code, so let's use
that in user-facing messages too. Let's call the "partial" bundle
"temporary" in the code.
2016-09-19 09:14:35 -07:00
Martin von Zweigbergk
fe544b2a62 strip: clarify that user action is required to recover temp bundle
If strip fails when applying the temporary bundle, the commits in the
temporary bundle have not yet been applied, so the user will almost
definitely want to apply the bundle. We should be more clear to the
user about that than our current "partial bundle stored in...".

Note that we will probably not be able to recover it automatically,
since whatever made it fail (e.g. a hook) will most likely make it
fail again. We need to give control back to the user to fix the
problem before trying again.
2016-09-19 09:14:32 -07:00
Martin von Zweigbergk
c435ce4f96 strip: report both bundle files in case of exception (issue5368)
If strip fails while recovering the temporary bundle (e.g. because a
hook fails), we tell the user only about the backup bundle, not about
the temporary bundle. Since the user did not ask to strip the commits
in the temporary bundle, that's the more important bundle to mention,
so let's do that (and also mention the backup bundle as usual).
2016-09-15 09:45:29 -07:00