Commit Graph

148 Commits

Author SHA1 Message Date
Pierre-Yves David
11f26a2c51 strip: use the 'writenewbundle' function to get bundle on disk
This will ensure the backup bundle use the best available logic (eg: includes
relevant caches so that we loose less of them on strip.)
2017-05-05 18:15:42 +02:00
Durham Goode
8712c20680 hg: backout optimizing for treemanifests
It turns out that the files list is not sufficient to identify with revlogs have
changed. In a merge commit, no files could've changed but directories would
have. For now let's just backout this optimization.
2017-05-15 18:55:58 -07:00
Durham Goode
fd9ba7b071 strip: make tree stripping O(changes) instead of O(repo)
The old tree stripping logic iterated over every tree revlog in the repo looking
for commits that had revs to be stripped. That's very inefficient in large
repos. Instead, let's look at what files are touched by the strip and only
inspect those revlogs.

I don't have actual perf numbers, since internally we don't use a true
treemanifest, but simply iterating over hundreds of thousands of revlogs takes
many, many seconds, so this should help tremendously when stripping only a few
commits.
2017-05-08 11:35:23 -07:00
Durham Goode
7f29b67f6b strip: move tree strip logic to it's own function
This will allow external extensions to modify tree strip behavior more
precisely.
2017-05-08 11:35:23 -07:00
Pierre-Yves David
64e5cd2f7e upgrade: extract code in its own module
Given about 2/3 or 'mercurial.repair' is now about repository upgrade, I think
it is fair to move it into its own module.

An expected benefit is the ability to drop the 'upgrade' prefix of many
functions. This will be done in coming changesets.
2017-04-07 18:53:17 +02:00
Jun Wu
809dfffea4 repair: use ProgrammingError 2017-03-26 16:53:28 -07:00
Matt Harbison
721da7fc07 repair: use context manager for lock management
If repo.lock() raised inside of the try block, 'tr' would have been None in the
finally block where it tries to release().  Modernize the syntax instead of just
winching the lock out of the try block.

I found several other instances of acquiring the lock inside of the 'try', but
those finally blocks handle None references.  I also started switching some
trivial try/finally blocks to context managers, but didn't get them all because
indenting over 3x for lock, wlock and transaction would have spilled over 80
characters.  That got me wondering if there should be a repo.rwlock(), to handle
locking and unlocking in the proper order.

It also looks like py27 supports supports multiple context managers for a single
'with' statement.  Should I hold off on the rest until py26 is dropped?
2017-03-23 23:47:23 -04:00
Pierre-Yves David
197ab7aeb0 repair: directly use repo.vfs.join
The 'repo.join' method is about to be deprecated.
2017-03-08 16:53:39 -08:00
Pierre-Yves David
37925b72f7 vfs: use 'vfs' module directly in 'mercurial.repair'
Now that the 'vfs' classes moved in their own module, lets use the new module
directly. We update code iteratively to help with possible bisect needs in the
future.
2017-03-02 13:29:43 +01:00
Simon Farnsworth
e0b70e4f7f mercurial: switch to util.timer for all interval timings
util.timer is now the best available interval timer, at the expense of not
having a known epoch. Let's use it whenever the epoch is irrelevant.
2017-02-15 13:17:39 -08:00
Gregory Szorc
abe1c0e17e repair: clean up stale lock file from store backup
Since we did a directory rename on the stores, the source
repository's lock path now references the dest repository's
lock path and the dest repository's lock path now references
a non-existent filename.

So releasing the lock on the source will unlock the dest and
releasing the lock on the dest will no-op because it fails due
to file not found. So we clean up the dest's lock manually.
2016-11-24 18:45:29 -08:00
Gregory Szorc
a400e3d753 repair: copy non-revlog store files during upgrade
The store contains more than just revlogs. This patch teaches the
upgrade code to copy regular files as well.

As the test changes demonstrate, the phaseroots file is now copied.
2016-11-24 18:34:50 -08:00
Gregory Szorc
93504084a0 repair: migrate revlogs during upgrade
Our next step for in-place upgrade is to migrate store data. Revlogs
are the biggest source of data within the store and a store is useless
without them, so we implement their migration first.

Our strategy for migrating revlogs is to walk the store and call
`revlog.clone()` on each revlog. There are some minor complications.

Because revlogs have different storage options (e.g. changelog has
generaldelta and delta chains disabled), we need to obtain the
correct class of revlog so inserted data is encoded properly for its
type.

Various attempts at implementing progress indicators that didn't lead
to frustration from false "it's almost done" indicators were made.

I initially used a single progress bar based on number of revlogs.
However, this quickly churned through all filelogs, got to 99% then
effectively froze at 99.99% when it got to the manifest.

So I converted the progress bar to total revision count. This was a
little bit better. But the manifest was still significantly slower
than filelogs and it took forever to process the last few percent.

I then tried both revision/chunk bytes and raw bytes as the
denominator. This had the opposite effect: because so much data is in
manifests, it would churn through filelogs without showing much
progress. When it got to manifests, it would fill in 90+% of the
progress bar.

I finally gave up having a unified progress bar and instead implemented
3 progress bars: 1 for filelog revisions, 1 for manifest revisions, and
1 for changelog revisions. I added extra messages indicating the total
number of revisions of each so users know there are more progress bars
coming.

I also added extra messages before and after each stage to give extra
details about what is happening. Strictly speaking, this isn't
necessary. But the numbers are impressive. For example, when converting
a non-generaldelta mozilla-central repository, the messages you see are:

   migrating 2475593 total revisions (1833043 in filelogs, 321156 in manifests, 321394 in changelog)
   migrating 1.67 GB in store; 2508 GB tracked data
   migrating 267868 filelogs containing 1833043 revisions (1.09 GB in store; 57.3 GB tracked data)
   finished migrating 1833043 filelog revisions across 267868 filelogs; change in size: -415776 bytes
   migrating 1 manifests containing 321156 revisions (518 MB in store; 2451 GB tracked data)

That "2508 GB" figure really blew me away. I had no clue that the raw
tracked data in mozilla-central was that large. Granted, 2451 GB is in
the manifest and "only" 57.3 GB is in filelogs. But still.

It's worth noting that gratuitous loading of source revlogs in order
to display numbers and progress bars does serve a purpose: it ensures
we can open all source revlogs. We don't want to spend several minutes
copying revlogs only to encounter a permissions error or similar later.

As part of this commit, we also add swapping of the store directory
to the upgrade function. After revlogs are converted, we move the
old store into the backup directory then move the temporary repo's
store into the old store's location. On well-behaved systems, this
should be 2 atomic operations and the window of inconsistency show be
very narrow.

There are still a few improvements to be made to store copying and
upgrading. But this commit gets the bulk of the work out of the way.
2016-12-18 17:00:15 -08:00
Gregory Szorc
b9b6954ea9 repair: begin implementation of in-place upgrading
Now that all the upgrade planning work is in place, we can start
doing the real work: actually upgrading a repository.

The main goal of this commit is to get the "framework" for running
in-place upgrade actions in place.

Rather than get too clever and low-level with regards to in-place
upgrades, our strategy is to create a new, temporary repository,
copy data to it, then replace the old data with the new. This allows
us to reuse a lot of code in localrepo.py around store interaction,
which will eventually consume the bulk of the upgrade code.

But we have to start small. This patch implements adding new
repository requirements. But it still sets up a temporary
repository and locks it and the source repo before performing the
requirements file swap. This means all the plumbing is in place
to implement store copying in subsequent commits.
2016-12-18 16:59:04 -08:00
Gregory Szorc
a3569d4b71 repair: determine what upgrade will do
This commit introduces code for determining what actions/improvements
an upgrade should perform.

The "upgradefindimprovements" function introduces a mechanism to
return a list of improvements that can be made to a repository.
Each improvement is effectively an action that an upgrade will
perform. Associated with each of these improvements is metadata
that will be used to inform users what's wrong and what an
upgrade will do.

Each "improvement" is categorized as a "deficiency" or an
"optimization." TBH, I'm not thrilled about the terminology and
am receptive to constructive bikeshedding. The main difference
between a "deficiency" and an "optimization" is a deficiency
is always corrected (if it deviates from the current config) and
an "optimization" is an optional action that goes above and beyond
to improve the state of the repository (usually by requiring more
CPU during upgrade).

Our initial set of improvements identifies missing repository
requirements, a single, easily correctable problem with
changelog storage, and a set of "optimizations" related to delta
recalculation.

The main "upgraderepo" function has been expanded to handle
improvements. It queries for the list of improvements and determines
which of them will run based on the current repository state and user

I went through numerous iterations of the output format before
settling on a ReST-inspired definition list format. (I used
bulleted lists in the first submission of this commit and could
not get it to format just right.) Even with the various iterations,
I'm still not super thrilled with the format. But, this is a debug*
command, so that should mean we can refine the output without BC
concerns.
2016-12-18 16:51:09 -08:00
Gregory Szorc
f42e2dcaac repair: implement requirements checking for upgrades
This commit introduces functionality for upgrading a repository in
place. The first part that's implemented is testing for upgrade
"compatibility." This is done by examining repository requirements.

There are 5 functions returning sets of requirements that control
upgrading. Why so many functions? Mainly to support extensions.
Functions are easier to monkeypatch than module variables.

Astute readers will see that we don't support "manifestv2" and
"treemanifest" requirements in the upgrade mechanism. I don't have
a great answer for why other than this is a complex set of patches
and I don't want to deal with the complexity of these experimental
features just yet. We can teach the upgrade mechanism about them
later, once the basic upgrade mechanism is in place.

This commit also introduces the "upgraderepo" function. This will be
our main routine for performing an in-place upgrade. Currently, it
just implements requirements checking. The structure of some code in
this function may look a bit weird (e.g. the inline function that is
only called once). But this will make sense after future commits.
2016-12-18 16:16:54 -08:00
Martin von Zweigbergk
e1f0ba8ef9 repair: combine two loops over changelog revisions
This just saves a few lines.
2017-01-04 10:35:04 -08:00
Martin von Zweigbergk
92d0334538 repair: speed up stripping of many roots
repair.strip() expects a set of root revisions to strip. It then
builds the full set of descedants by walking the descandants of
each. It is rare that more than a few roots get passed in, but if that
happens, it will wastefully walk the changelog for each root. So let's
just walk it once.

I noticed this because the narrowhg extension was passing not only
roots, but all the commits to strip. When there were tens of thousands
of commits to strip, this resulted in quadratic behavior with that
extension.
2017-01-04 10:07:12 -08:00
Durham Goode
52b8095f37 manifest: remove last uses of repo.manifest
Now that all the functionality has been moved to manifestlog/manifestrevlog/etc,
we can finally change all the uses of repo.manifest to use the new versions. A
future diff will then delete repo.manifest.

One additional change in this commit is to change repo.manifestlog to be a
@storecache property instead of @property. This is required by some uses of
repo.manifest require that it be settable (contrib/perf.py and the static http
server). We can't do this in a prior change because we can't use @storecache on
this until repo.manifest is no longer used anywhere.
2016-11-10 02:13:19 -08:00
Durham Goode
f980c11277 manifest: delete unused dirlog and _newmanifest functions
As part of migrating all manifest functionality out of manifest.manifest, let's
migrate a couple spots off of manifest.dirlog() to use the revlog specific
accessor. Then we can delete manifest.dirlog() and other unused functions.
2016-11-10 02:13:19 -08:00
Martin von Zweigbergk
422165fd86 repair: make strip() return backup file path
narrowhg wants to strip some commits and then re-apply them after
applying another bundle. Having repair.strip() return the bundle path
will be helpful for it.
2016-10-31 15:40:30 -07:00
FUJIWARA Katsunori
49079a5fce repair: open a file with checkambig=True to avoid file stat ambiguity
Before this patch, if steps below occurs at "the same time in sec",
all of mtime, ctime and size are same between (1) and (3).

  1. append data to revlog-style file (and close transaction)
  2. discard appended data by truncation of strip
  3. append same size but different data to revlog-style file again

Therefore, cache validation doesn't work after (3) as expected.

To avoid such file stat ambiguity around truncation, this patch opens
a file with checkambig=True.

This patch also introduces "with" statement style, to ensure immediate
invocation of close() after truncation, because closing file is the
only trigger to check (and get rid of) file stat ambiguity.

This is a part of ExactCacheValidationPlan.

    https://www.mercurial-scm.org/wiki/ExactCacheValidationPlan
2016-09-22 21:52:00 +09:00
Martin von Zweigbergk
e3fc1041dc strip: don't use "full" and "partial" to describe bundles
The partial bundle is not a subset of the full bundle, and the full
bundle is not full in any way that i see. The most obvious
interpretation of "full" I can think of is that it has all commits
back to the null revision, but that is not what the "full" bundle
is. The "full" bundle is simply a backup of what the user asked us to
strip (unless --no-backup). The "partial" bundle contains the
revisions we temporarily stripped because they had higher revision
numbers that some commit that the user asked us to strip.

The "full" bundle is already called "backup" in the code, so let's use
that in user-facing messages too. Let's call the "partial" bundle
"temporary" in the code.
2016-09-19 09:14:35 -07:00
Martin von Zweigbergk
fe544b2a62 strip: clarify that user action is required to recover temp bundle
If strip fails when applying the temporary bundle, the commits in the
temporary bundle have not yet been applied, so the user will almost
definitely want to apply the bundle. We should be more clear to the
user about that than our current "partial bundle stored in...".

Note that we will probably not be able to recover it automatically,
since whatever made it fail (e.g. a hook) will most likely make it
fail again. We need to give control back to the user to fix the
problem before trying again.
2016-09-19 09:14:32 -07:00
Martin von Zweigbergk
c435ce4f96 strip: report both bundle files in case of exception (issue5368)
If strip fails while recovering the temporary bundle (e.g. because a
hook fails), we tell the user only about the backup bundle, not about
the temporary bundle. Since the user did not ask to strip the commits
in the temporary bundle, that's the more important bundle to mention,
so let's do that (and also mention the backup bundle as usual).
2016-09-15 09:45:29 -07:00
Martin von Zweigbergk
19513cd917 strip: simplify some repeated conditions
We check "if saveheads or savebases" in several places to see if we
should or have created a bundle of the changesets to apply after
truncating the revlogs. One of the conditions is actually just "if
saveheads", but since there can't be savebases without saveheads, that
is effectively the same condition. It seems simpler to check only once
and from then on see if we created the file.
2016-09-15 10:18:56 -07:00
Augie Fackler
cc39e93327 repair: build dirlogs using manifest, rather than repo shortcut method
As before, this was rarely used, so let's get rid of the convenience method.
2016-08-05 13:01:01 -04:00
Martin von Zweigbergk
82a5e7d944 treemanifests: actually strip directory manifests
Stripping has only partly worked since f41815302d49 (repair: use cg3
for treemanifests, 2016-01-19): the bundle seems to have been created
correctly, but revlog entries in subdirectory revlogs were not
stripped. This meant that e.g. "hg verify" would fail after stripping
in a tree manifest repo.

To find the revisions to strip, we simply iterate over all directories
in the repo (included in store.datafiles()). This is inefficient for
stripping few commits, but efficient for stripping many commits. To
optimize for stripping few commits, we could instead walk the tree
from the root and find modified subdirectories, just like we do in the
changegroup code. I'm leaving that for another day.
2016-06-30 13:06:19 -07:00
Augie Fackler
ad67b99d20 cleanup: replace uses of util.(md5|sha1|sha256|sha512) with hashlib.\1
All versions of Python we support or hope to support make the hash
functions available in the same way under the same name, so we may as
well drop the util forwards.
2016-06-10 00:12:33 -04:00
Laurent Charignon
94c489ecf6 strip: invalidate phase cache after stripping changeset (issue5235)
When we remove a changeset from the changelog, the phase cache must be
invalidated, otherwise it could refer to changesets that are no longer in the
repo.

To reproduce the failure, I created an extension querying the phase cache after
the strip transaction is over.

To do that, I stripped two commits with a bookmark on one of them to force
another transaction (we open a transaction for moving bookmarks)
after the strip transaction.

Without the fix in this patch, the test leads to a stacktrace showing the issue:
      repair.strip(ui, repo, revs, backup)
    File "/Users/lcharignon/facebook-hg-rpms/hg-crew/mercurial/repair.py", line 205, in strip
      tr.close()
    File "/Users/lcharignon/facebook-hg-rpms/hg-crew/mercurial/transaction.py", line 44, in _active
      return func(self, *args, **kwds)
    File "/Users/lcharignon/facebook-hg-rpms/hg-crew/mercurial/transaction.py", line 490, in close
      self._postclosecallback[cat](self)
    File "$TESTTMP/crashstrip2.py", line 4, in test
      [repo.changelog.node(r) for r in repo.revs("not public()")]
    File "/Users/lcharignon/facebook-hg-rpms/hg-crew/mercurial/changelog.py", line 337, in node
      return super(changelog, self).node(rev)
    File "/Users/lcharignon/facebook-hg-rpms/hg-crew/mercurial/revlog.py", line 377, in node
      return self.index[rev][7]
  IndexError: revlog index out of range

The situation was encountered in inhibit (evolve's repo) where we would crash
following the volatile set invalidation submitted by Augie in
cbc52a99d057d11790cf5011e877c6f698bf57bf. Before his patch the issue was masked
as we were not accessing the phasecache after stripping a revision.

This bug uncovered another but in histedit (see explanation in issue5235).
I changed the histedit test accordingly to avoid fixing two things at once.
2016-05-12 06:13:59 -07:00
Kostia Balytskyi
47727221bc obsstore: move delete function from obsstore class to repair module
Since one of the original patches was accepted already and people on the
mailing list still have suggestions as to how this should be improved, I'm
implementing those suggestions in the following patches (this and the ones that
might follow).
2016-04-12 04:06:50 -07:00
Martin von Zweigbergk
4cc86f7b27 bundle: move writebundle() from changegroup.py to bundle2.py (API)
writebundle() writes a bundle2 bundle or a plain changegroup1. Imagine
away the "2" in "bundle2.py" for a moment and this change should makes
sense. The bundle wraps the changegroup, so it makes sense that it
knows about it. Another sign that this is correct is that the delayed
import of bundle2 in changegroup goes away.

I'll leave it for another time to remove the "2" in "bundle2.py"
(alternatively, extract a new bundle.py from it).
2016-03-28 14:41:29 -07:00
Anton Shestakov
e4ad4cc290 repair: specify unit for ui.progress in rebuildfncache() 2016-03-11 20:44:40 +08:00
Anton Shestakov
9c0d085179 repair: use 'rebuilding' progress topic in rebuildfncache() 2016-03-11 20:39:29 +08:00
Martin von Zweigbergk
d7ef7dd40a treemanifest: fix debugrebuildfncache
When I taught debugrebuildfncache about dirlogs in ebe9dacc63ba
(treemanifests: fix streaming clone, 2016-02-04), I added a
last-minute "if 'treemanifest' in repo" guard. That should have been
checking for "... in repo.requirements". Fix that and add tests for
it.
2016-02-07 21:44:38 -08:00
Martin von Zweigbergk
e50c296659 treemanifests: fix streaming clone
Similar to the previous patch, the .hg/store/meta/ directory does not
get copied when when using "hg clone --uncompressed". Fix by including
"meta/" in store.datafiles(). This seems safe to do, as there are only
a few users of this method. "hg manifest" already filters the paths by
"data/" prefix. The calls from largefiles also seem safe. The use in
verify needs updating to prevent it from mistaking dirlogs for
orphaned filelogs. That change is included in this patch.

Since the dirlogs will now be in the fncache when using fncachestore,
let's also update debugrebuildfncache(). That will also allow any
existing treemanifest repos to get their dirlogs into the fncache.

Also update test-treemanifest.t to use an a directory name that
requires dot-encoding and uppercase-encoding so we test that the path
encoding works.
2016-02-04 08:34:07 -08:00
Martin von Zweigbergk
857d2206c3 repair: use cg3 for treemanifests
The newly created helper changegroup.safeversion() knows to pick
version 03 if the repo uses treemanifests, so just using that means we
pick the right changegroup version.
2016-01-19 15:38:24 -08:00
Bryan O'Sullivan
541db2c882 with: use context manager for transaction in strip 2016-01-15 13:14:49 -08:00
Bryan O'Sullivan
8bfeb98530 with: use context manager for transaction in strip 2016-01-15 13:14:49 -08:00
Bryan O'Sullivan
da377129a6 with: use context manager in rebuildfncache 2016-01-15 13:14:49 -08:00
Bryan O'Sullivan
155e1de602 with: use context manager in rebuildfncache again 2016-01-15 13:14:49 -08:00
Laurent Charignon
0a74ada9c5 repair: improves documentation of strip regarding locks
This patch adds a comment making it clear that we should hold a lock before
calling repair.strip. The wording is the same than what we have for
obsolete.createmarkers
2015-12-29 10:21:39 -08:00
Laurent Charignon
56e13fa832 repair: use bookmarks.recordchange instead of bookmarks.write
Before this patch we were using the deprecated bookmarks.write api. This
patch replaces the call to bookmarks.write by a call to bookmarks.recordchange.
We move the bookmark code above the code removing the undo file because with
bookmarks.recordchange we have to create a transaction that would create an
undo file.
2015-11-30 16:38:29 -08:00
Pierre-Yves David
0089eccb8e strip: pass source and url to bundle2 processing
Restoring from a 'bundle2' was missing this data.
2015-10-20 16:01:33 +02:00
Augie Fackler
5888f3afd8 repair: use cg?unpacker.apply() instead of changegroup.addchangegroup() 2015-10-13 17:12:46 -04:00
Ryan McElroy
61f504197f strip: factor out revset calculation for strip -B
This will allow reusing it in evolve and overriding it in other extensions.
2015-10-09 14:48:59 -07:00
Pierre-Yves David
30913031d4 error: get Abort from 'error' instead of 'util'
The home of 'Abort' is 'error' not 'util' however, a lot of code seems to be
confused about that and gives all the credit to 'util' instead of the
hardworking 'error'. In a spirit of equity, we break the cycle of injustice and
give back to 'error' the respect it deserves. And screw that 'util' poser.

For great justice.
2015-10-08 12:55:45 -07:00
Pierre-Yves David
316994e165 strip: compress bundle2 backup using BZ
Storing uncompressed bundle on disk would be a regression. Strip backup using
bundle2 are now compressed when requested.
2015-09-29 14:42:03 -07:00
Pierre-Yves David
124ee320a0 strip: use bundle2 + cg2 by default when repository use general delta
The bundle10 format (plain changegroup-01) does not support general delta and
result into expensive delta re-computation when stripping. If the repository is
general delta, we store backups as bundle20 containing a changegroup-02 payload.

We remove the experimental feature related to strip backup bundle format because
this achieve the same goal in a leaner way. Removing the experimental option is
fine, that is why it experimental in the first place.

Compression of these bundles are coming in later changesets.
2015-09-29 13:16:51 -07:00
Matt Mackall
dc3c45835d merge with stable 2015-08-12 17:01:50 -05:00