201 Commits

Author SHA1 Message Date
Jun Wu
db43fda5b7 bundlerepo: always copy bundle parts before processing
This fixes treemanifest-infinitepush.
(grafted from 0364e7f0f9ef7eb92c7ca6f825cd252d1ecdac2e)
(grafted from c62ae0604a5124ae799f3ba7e309c23696fbd03d)
(grafted from e4b056930eb739c01d34a04e852643bf9c175d77)
2018-01-03 05:35:56 -08:00
Pulkit Goyal
16cd827e4d py3: handle keyword arguments correctly in bundlerepo.py
Differential Revision: https://phab.mercurial-scm.org/D1672
2017-12-10 06:36:35 +05:30
Gregory Szorc
2f77487f6f bundle2: don't use seekable bundle2 parts by default (issue5691)
The last commit removed the last use of the bundle2 part seek() API
in the generic bundle2 part iteration code. This means we can now
switch to using unseekable bundle2 parts by default and have the
special consumers that actually need the behavior request it.

This commit changes unbundle20.iterparts() to expose non-seekable
unbundlepart instances by default. If seekable parts are needed,
callers can pass "seekable=True." The bundlerepo class needs
seekable parts, so it does this.
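
The opt-in behavior described above can be sketched roughly as follows. This is not Mercurial's bundle2 code: the class names, the `read_all()` method, and the `chunk_index` attribute are hypothetical stand-ins, and only the `seekable=True` keyword comes from the commit message.

```python
import io


class UnseekablePart:
    """Hypothetical stand-in for a part that only streams forward."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)

    def read_all(self):
        # Concatenate chunks without remembering where each one started.
        return b"".join(self._chunks)


class SeekablePart(UnseekablePart):
    """Hypothetical variant that records one offset entry per chunk."""

    def __init__(self, chunks):
        super().__init__(chunks)
        self.chunk_index = []  # grows with the number of chunks read

    def read_all(self):
        data = io.BytesIO()
        for chunk in self._chunks:
            # Remember the payload offset at which this chunk begins.
            self.chunk_index.append(data.tell())
            data.write(chunk)
        return data.getvalue()


def iterparts(raw_parts, seekable=False):
    """Yield cheap parts by default; callers opt in to offset tracking."""
    cls = SeekablePart if seekable else UnseekablePart
    for chunks in raw_parts:
        yield cls(chunks)


raw = [[b"ab", b"cd", b"ef"]]
plain = next(iterparts(raw))
tracked = next(iterparts(raw, seekable=True))
plain_data = plain.read_all()
tracked_data = tracked.read_all()
```

Both parts return the same payload; only the seekable one pays for the per-chunk bookkeeping, which is the cost the default is meant to avoid.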

The interrupt handler is also changed to use a regular unbundlepart.
So, by default, all consumers except bundlerepo will see unseekable
parts.

Because the behavior of the iterparts() benchmark changed, we add
a variation to test seekable parts vs unseekable parts. And because
parts no longer have seek() unless "seekable=True," we update the
"part seek" benchmark.

Speaking of benchmarks, this change has the following impact on
`hg perfbundleread` on an uncompressed bundle of the Firefox repo
(6,070,036,163 bytes):

! read(8k)
! wall 0.722709 comb 0.720000 user 0.150000 sys 0.570000 (best of 14)
! read(16k)
! wall 0.602208 comb 0.590000 user 0.080000 sys 0.510000 (best of 17)
! read(32k)
! wall 0.554018 comb 0.560000 user 0.050000 sys 0.510000 (best of 18)
! read(128k)
! wall 0.520086 comb 0.530000 user 0.020000 sys 0.510000 (best of 20)
! bundle2 forwardchunks()
! wall 2.996329 comb 3.000000 user 2.300000 sys 0.700000 (best of 4)
! bundle2 iterparts()
! wall 8.070791 comb 8.060000 user 7.180000 sys 0.880000 (best of 3)
! wall 6.983756 comb 6.980000 user 6.220000 sys 0.760000 (best of 3)
! bundle2 iterparts() seekable
! wall 8.132131 comb 8.110000 user 7.160000 sys 0.950000 (best of 3)
! bundle2 part seek()
! wall 10.370142 comb 10.350000 user 7.430000 sys 2.920000 (best of 3)
! wall 10.860942 comb 10.840000 user 7.790000 sys 3.050000 (best of 3)
! bundle2 part read(8k)
! wall 8.599892 comb 8.580000 user 7.720000 sys 0.860000 (best of 3)
! wall 7.258035 comb 7.260000 user 6.470000 sys 0.790000 (best of 3)
! bundle2 part read(16k)
! wall 8.265361 comb 8.250000 user 7.360000 sys 0.890000 (best of 3)
! wall 7.099891 comb 7.080000 user 6.310000 sys 0.770000 (best of 3)
! bundle2 part read(32k)
! wall 8.290308 comb 8.280000 user 7.330000 sys 0.950000 (best of 3)
! wall 6.964685 comb 6.950000 user 6.130000 sys 0.820000 (best of 3)
! bundle2 part read(128k)
! wall 8.204900 comb 8.150000 user 7.210000 sys 0.940000 (best of 3)
! wall 6.852867 comb 6.850000 user 6.060000 sys 0.790000 (best of 3)

The significant speedup is due to not incurring the overhead to track
payload offset data. Of course, this overhead is proportional to
bundle2 part size. So a multiple gigabyte changegroup part is on the
extreme side of the spectrum for real-world impact.

In addition to the CPU efficiency wins, not tracking offset data
also means not using memory to hold that data. Using a bundle based on
the example BSD repository in issue 5691, this change has a drastic
impact to memory usage during `hg unbundle` (`hg clone` would behave
similarly). Before, memory usage incrementally increased for the
duration of bundle processing. In other words, as we advanced through
the changegroup and bundle2 part, we kept allocating more memory to
hold offset data. After this change, we still increase memory during
changegroup application. But the rate of increase is significantly
slower. (A bulk of the remaining gradual increase appears to be the
storing of revlog sizes in the transaction object to facilitate
rollback.)

The RSS at the end of filelog application is as follows:

Before: ~752 MB
After:  ~567 MB

So, we were storing ~185 MB of offset data that we never even used.
Talk about wasteful!

.. api::

   bundle2 parts are no longer seekable by default.

.. perf::

   bundle2 read I/O throughput significantly increased.

.. perf::

   Significant memory use reductions when reading from bundle2 bundles.

   On the BSD repository, peak RSS during changegroup application
   decreased by ~185 MB from ~752 MB to ~567 MB.

Differential Revision: https://phab.mercurial-scm.org/D1390
2017-11-13 21:10:37 -08:00
Gregory Szorc
d035774cab bundle2: only seek to beginning of part in bundlerepo
For reasons I still don't fully understand, bundlerepo
requires its changegroup bundle2 part to be rewound to the beginning
after part iteration. As far as I can tell, it is the only
bundle2 part consumer that relies on this behavior.

This seeking was performed in the generic iterparts() API. Again,
I don't fully understand why it was here and not in bundlerepo.
Probably historical reasons.

What I do know is that all other bundle2 part consumers don't
need this special behavior (assuming the tests are comprehensive).
So, we move the code from bundle2's iterparts() to bundlerepo's
consumption of iterparts().

Differential Revision: https://phab.mercurial-scm.org/D1389
2017-11-13 20:12:00 -08:00
Gregory Szorc
f5517d96d5 bundlerepo: rename "bundlefilespos" variable and attribute
Strictly speaking, this variable tracks offsets within the
changegroup, not the bundle.

While we're here, mark a class attribute as private because
it is.

.. api::

   Rename bundlerepo.bundlerepository.bundlefilespos to
   _cgfilespos.

Differential Revision: https://phab.mercurial-scm.org/D1384
2017-11-13 19:12:56 -08:00
Gregory Szorc
27bc3c9021 bundlerepo: rename "bundle" arguments to "cgunpacker"
"bundle" was appropriate for the bundle1 days where a bundle
was a changegroup. In a bundle2 world, changegroup readers
are referred to as "changegroup unpackers."

Differential Revision: https://phab.mercurial-scm.org/D1383
2017-11-13 19:12:17 -08:00
Gregory Szorc
8a88beb1d7 bundlerepo: use early return
I like avoiding patterns that lead to the pyramid of doom.
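
As a generic illustration of the pattern (not the bundlerepo code itself), the two hypothetical functions below return the same results; the second uses early returns so the main path is not nested.

```python
def describe_nested(path, exists, readable):
    # Nested form: each extra condition pushes the real work further right.
    if path:
        if exists:
            if readable:
                return "ok: " + path
            else:
                return "unreadable"
        else:
            return "missing"
    else:
        return "no path"


def describe_flat(path, exists, readable):
    # Early-return form: reject each bad case first, then do the work.
    if not path:
        return "no path"
    if not exists:
        return "missing"
    if not readable:
        return "unreadable"
    return "ok: " + path
```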

Differential Revision: https://phab.mercurial-scm.org/D1382
2017-11-11 18:55:04 -08:00
Gregory Szorc
a4f1d34829 bundlerepo: rename _bundle to _cgunpacker
_bundle is really a changegroup unpacker instance. Rename the
variable accordingly.

Differential Revision: https://phab.mercurial-scm.org/D1379
2017-11-11 18:41:14 -08:00
Gregory Szorc
f8da298c93 bundlerepo: assign bundle attributes in bundle type blocks
It is a bit wonky to assign the same object to multiple
attributes and then possibly overwrite them later.

Refactor the code to use a local variable and defer attribute
assignment until the final values are ready.

This required passing the bundle instance to _handlebundle2part().
The only use of this method I could find is Facebook's
treemanifest extension. Since it is a private method, I don't
think it warrants an API callout.

Differential Revision: https://phab.mercurial-scm.org/D1378
2017-11-11 18:34:50 -08:00
Gregory Szorc
d446e5fd4c bundlerepo: make bundle and bundlefile attributes private
These attributes are implementation details and shouldn't be
exposed outside the class.

.. api::

   bundlerepo.bundlerepository.bundle and
   bundlerepo.bundlerepository.bundlefile are now prefixed with an
   underscore.

Differential Revision: https://phab.mercurial-scm.org/D1377
2017-11-11 18:22:36 -08:00
Gregory Szorc
4303257e0a bundlerepo: don't assume there are only two bundle classes
exchange.readbundle() can return a type that represents a stream
clone bundle. Explicitly handle the bundle1 type and raise a
reasonable error message for unhandled bundle types.
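
A minimal sketch of the idea, assuming three placeholder classes rather than the real types returned by exchange.readbundle(): every known type is matched explicitly, and anything else fails with a clear error instead of being treated as the "other" bundle format.

```python
class Bundle1:
    """Placeholder for a bundle1 (plain changegroup) reader."""


class Bundle2:
    """Placeholder for a bundle2 reader."""


class StreamClone:
    """Placeholder for a stream clone bundle, which is not handled here."""


def classify(bundle):
    # Match each supported type explicitly instead of using a bare "else".
    if isinstance(bundle, Bundle2):
        return "bundle2"
    if isinstance(bundle, Bundle1):
        return "bundle1"
    raise ValueError("bundle type %s cannot be read" % type(bundle).__name__)
```

The exception type and message are assumptions made for this sketch only.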

Differential Revision: https://phab.mercurial-scm.org/D1376
2017-11-11 18:14:41 -08:00
Gregory Szorc
96510ffc31 bundlerepo: add docstring for bundlerepository class
Differential Revision: https://phab.mercurial-scm.org/D1375
2017-11-11 18:09:16 -08:00
Gregory Szorc
dfe2ba26ec bundlerepo: rename arguments to bundlerepository.__init__
To reflect what they actually are.

Differential Revision: https://phab.mercurial-scm.org/D1374
2017-11-11 18:05:02 -08:00
Gregory Szorc
ba1e1e4a72 bundlerepo: use suffix variable
It looks like the refactor in 0883a2ece555 attempted to establish
this method argument but failed to use it. My editor caught it.

Differential Revision: https://phab.mercurial-scm.org/D1373
2017-11-11 17:07:33 -08:00
Gregory Szorc
f9ea80f7a0 bundlerepo: make methods agree with base class
My editor was complaining about mismatches between method
signatures.

For methods that are implemented, we change arguments to match
the base. For those that aren't, we use variable arguments
because it shouldn't matter.
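
The approach can be illustrated with two hypothetical classes (the names and parameters below are not taken from bundlerepo.py): an implemented override repeats the base signature, while an unsupported one accepts variable arguments and refuses to run.

```python
class BaseLog:
    def revision(self, node, raw=False):
        return ("base", node, raw)

    def addrevision(self, text, transaction, link):
        return "stored"


class ReadOnlyLog(BaseLog):
    def revision(self, node, raw=False):
        # Implemented: parameters match the base class exactly.
        return ("readonly", node, raw)

    def addrevision(self, *args, **kwargs):
        # Not implemented: the exact parameters do not matter here.
        raise NotImplementedError("this log cannot be written to")
```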

Differential Revision: https://phab.mercurial-scm.org/D1372
2017-11-11 17:02:31 -08:00
Durham Goode
76228a6bf0 bundle: allow bundlerepo to support alternative manifest implementations
With our treemanifest logic, the manifests are no longer transported as part of
the changegroup and are no longer stored in a revlog. This means the
self.manifestlog line in bundlerepo.filestart no longer calls
_constructmanifest, and therefore does not consume the manifest portion of the
changegroup, which means filestart is not populated and we end up in an infinite
loop.

The fix is to make filestart aware that self.manifestlog might not consume the
changegroup part, and consume it manually if necessary.

There's currently no way to test this in core, but our treemanifest extension
has tests to cover this.

Differential Revision: https://phab.mercurial-scm.org/D1329
2017-11-07 10:16:53 -08:00
Durham Goode
fceec8eca5 bundlerepo: update to use new deltaiter api
Differential Revision: https://phab.mercurial-scm.org/D745
2017-09-20 09:39:03 -07:00
Yuya Nishihara
9443bfa027 revlog: update signature of dummy addgroup() in bundlerepo and unionrepo
Per e85296920485, 711178a106a3 and 4d58af51001a.
2017-09-15 23:58:45 +09:00
Durham Goode
8075a9d667 bundlerepo: move bundle2 part handling out to a function
This moves the bundle2 part handling for bundlerepo out to a separate function
so extensions can participate in bundlerepo setup when using bundle2 bundles.

Differential Revision: https://phab.mercurial-scm.org/D290
2017-08-23 12:35:03 -07:00
Durham Goode
22fc2e18a8 bundle2: seek part back during iteration
Previously, iterparts would yield the part to users, then consume the part. This
changed the part after the user was given it and left it at the end, both of
which seem unexpected.  Let's seek back to the beginning after we've consumed
it. I tried not seeking to the end at all, but that seems important for the
overall bundle2 consumption.

This is used in a future patch to let us move the bundlerepo
bundle2-changegroup-part to be handled entirely within the for loop, instead of
having to do a seek back to 0 after the entire loop finishes.

Differential Revision: https://phab.mercurial-scm.org/D289
2017-08-23 12:35:03 -07:00
Durham Goode
538824ea3e bundlerepo: move temp bundle creation to a separate function
A future patch will refactor certain parts of bundlerepo initialization such
that we need to create temp bundles from another function. Let's move this to
another function to support that.

Differential Revision: https://phab.mercurial-scm.org/D288
2017-08-23 12:34:56 -07:00
Pierre-Yves David
a0f6321906 configitems: register the 'bundle.mainreporoot' config
2017-06-30 03:31:26 +02:00
Jun Wu
f5ab365fb4 bundlerepo: use raw revision in revdiff()
This is similar to "revlog: use raw revisions in revdiff". revdiff()
generates raw text used in revlog directly.

This makes test-flagprocessor.t happy.
2017-04-03 09:31:39 -07:00
Jun Wu
52198a3918 bundlerepo: fix raw handling in revision()
Similar to fixes in revlog.py, this patch uses "rawtext" to explicitly label
contents expected to be raw, and makes sure content stored in _cache is raw
text.

Now test-flagprocessor.t points us to another issue.
2017-04-06 17:45:47 -07:00
Jun Wu
062c44135c bundlerepo: build revlog index with flags
This fixes bundlerevlog.flags(rev) for any revisions provided by the bundle.

Now test-flagprocessor.t points us to another issue.
2017-04-06 18:06:42 -07:00
Jun Wu
336e7d4e7c bundlerepo: make baserevision return raw text
"baserevision" returns the text that will be used to apply deltas. Since
deltas are against raw texts, "baserevision" should return raw text.

Now test-flagprocessor.t points us to a new error.
2017-04-06 17:43:29 -07:00
Jun Wu
9c45c07c8b bundlerepo: avoid unnecessary node -> rev conversion
2017-03-29 16:28:00 -07:00
Augie Fackler
9a15a28705 py3: use bytearray() instead of array('c', ...) constructions
Portable from 2.6-3.6.
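
For context, the 'c' typecode of the array module is not available on Python 3, whereas bytearray is a builtin on both major versions. A small sketch of using bytearray as a mutable byte buffer:

```python
# A mutable buffer of bytes.
buf = bytearray(b"hello")
buf[0] = ord("j")      # item assignment takes an integer byte value
buf.extend(b"!")       # the buffer can grow in place
result = bytes(buf)    # freeze into an immutable bytes object
```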
2017-03-12 03:32:21 -04:00
Pierre-Yves David
c8445658f5 vfs: use 'vfs' module directly in 'mercurial.bundlerepo'
Now that the 'vfs' classes have moved into their own module, let's use the new
module directly. We update code iteratively to help with possible bisect needs
in the future.
2017-03-02 14:47:03 +01:00
Pulkit Goyal
07314d0686 py3: convert the mode argument of os.fdopen to unicodes (1 of 2)
os.fdopen() does not accept bytes as its second argument, which represents the
mode in which the file is to be opened. This patch makes sure unicode strings
are passed on py3 by using pycompat.sysstr().
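
A minimal sketch of the conversion, assuming Python 3. The `to_sysstr()` helper below is a hypothetical approximation written for this example; it is not the actual pycompat.sysstr() implementation.

```python
import os
import tempfile


def to_sysstr(mode):
    """Hypothetical helper: turn a bytes mode such as b"wb" into str."""
    if isinstance(mode, bytes):
        return mode.decode("latin-1")
    return mode


fd, path = tempfile.mkstemp()
try:
    # os.fdopen() needs a str mode on Python 3, so convert before the call.
    with os.fdopen(fd, to_sysstr(b"wb")) as fh:
        fh.write(b"payload")
    with open(path, "rb") as fh:
        written = fh.read()
finally:
    os.remove(path)
```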
2017-02-13 20:06:38 +05:30
Remi Chaintron
dfc79cbfc3 revlog: flag processor
Add the ability for revlog objects to process revision flags and apply
registered transforms on read/write operations.

This patch introduces:
- the 'revlog._processflags()' method that looks at revision flags and applies
  flag processors registered on them. Due to the need to handle non-commutative
  operations, flag transforms are applied in stable order but the order in which
  the transforms are applied is reversed between read and write operations.
- the 'addflagprocessor()' method, which registers processors on flags.
  Flag processors are defined as a 3-tuple of (read, write, raw) functions to be
  applied depending on the operation being performed.
- an update on 'revlog.addrevision()' behavior. The current flagprocessor design
  relies on extensions to wrap around 'addrevision()' to set flags on revision
  data, and on the flagprocessor to perform the actual transformation of its
  contents. In the lfs case, this means we need to process flags before we meet
  the 2GB size check, leading to performing some operations before it happens:
  - if flags are set on the revision data, we assume some extensions might be
    modifying the contents using the flag processor next, and we compute the
    node for the original revision data (still allowing extension to override
    the node by wrapping around 'addrevision()').
  - we then invoke the flag processor to apply registered transforms (in lfs's
    case, drastically reducing the size of large blobs).
  - finally, we proceed with the 2GB size check.
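
Why the order must be reversed between write and read can be shown with a toy example. Everything below is a hypothetical sketch, not revlog code: the flag values, the `PROCESSORS` table, and `process()` are invented for illustration, and only the (read, write, raw) 3-tuple shape follows the description above.

```python
import base64


def _strip_tag(text):
    if not text.startswith(b"tag:"):
        raise ValueError("missing tag prefix")
    return text[len(b"tag:"):]


def _add_tag(text):
    return b"tag:" + text


# Each processor is a (read, write, raw) 3-tuple. The raw callables here
# simply report that raw text is usable as-is (an assumption of this sketch).
PROCESSORS = {
    1: (_strip_tag, _add_tag, lambda text: True),
    2: (base64.b64decode, base64.b64encode, lambda text: True),
}
FLAG_ORDER = [1, 2]  # stable order used when writing


def process(text, flags, operation):
    # Reads undo the transforms, so they walk the flags in reverse order.
    order = FLAG_ORDER if operation == "write" else list(reversed(FLAG_ORDER))
    for flag in order:
        if flags & flag:
            read, write, _raw = PROCESSORS[flag]
            text = write(text) if operation == "write" else read(text)
    return text


stored = process(b"hello", 1 | 2, "write")
restored = process(stored, 1 | 2, "read")
```

Tagging and base64 encoding do not commute, so reading in the write order would try to strip the tag from still-encoded data and fail.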

Note: In the case a cachedelta is passed to 'addrevision()' and we detect the
flag processor modified the revision data, we chose to trust the flag processor
and drop the cachedelta.
2017-01-10 16:15:21 +00:00
Remi Chaintron
6d11b9177b revlog: add 'raw' argument to revision and _addrevision
This patch introduces a new 'raw' argument (defaults to False) to revlog's
revision() and _addrevision() methods.
When the 'raw' argument is set to True, it indicates the revision data should be
handled as raw data by the flagprocessor.

Note: Given revlog.addgroup() calls are restricted to changegroup generation, we
can always set raw to True when calling revlog._addrevision() from
revlog.addgroup().
2017-01-05 17:16:07 +00:00
Remi Chaintron
cc88d4a3c4 revlog: merge hash checking subfunctions
This patch factors the behavior of both methods into 'checkhash'.
2016-12-13 14:21:36 +00:00
Pulkit Goyal
97f340e354 py3: use pycompat.getcwd() instead of os.getcwd()
We have pycompat.getcwd(), which returns a bytes path on Python 3. This patch
replaces most occurrences of os.getcwd() with the pycompat one.
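
To illustrate the difference on Python 3 using only the standard library: os.getcwd() returns str, while os.getcwdb() returns bytes. Treating os.getcwdb() as the equivalent of pycompat.getcwd() is an assumption of this sketch.

```python
import os

cwd_text = os.getcwd()    # str on Python 3
cwd_bytes = os.getcwdb()  # bytes on Python 3

# Code that handles paths as bytes can use the bytes form directly.
joined = os.path.join(cwd_bytes, b".hg")
```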
2016-11-23 00:03:11 +05:30
Durham Goode
52b8095f37 manifest: remove last uses of repo.manifest
Now that all the functionality has been moved to manifestlog/manifestrevlog/etc,
we can finally change all the uses of repo.manifest to use the new versions. A
future diff will then delete repo.manifest.

One additional change in this commit is to change repo.manifestlog to be a
@storecache property instead of @property. This is required because some uses of
repo.manifest require that it be settable (contrib/perf.py and the static http
server). We can't do this in a prior change because we can't use @storecache on
this until repo.manifest is no longer used anywhere.
2016-11-10 02:13:19 -08:00
Durham Goode
57cfc4515a manifest: add bundlemanifestlog support
As part of deprecating manifest.manifest we need to make bundlerepo support
manifestlog.
2016-11-11 01:15:59 -08:00
Durham Goode
757b6fb5aa manifest: move manifest creation to a helper function
A future patch will be moving manifest creation to be inside manifestlog as part
of improving our cache guarantees. bundlerepo and unionrepo currently rely on
being able to hook into manifest creation, so let's temporarily move the actual
manifest creation to a helper function for them to intercept.

In the future manifest.manifest() will disappear entirely and this can
disappear.
2016-10-18 17:32:51 -07:00
Durham Goode
34f86b9344 manifest: make one use of _mancache avoid manifestctxs
In a future patch we will change manifestctx and treemanifestctx to no longer
derive from manifestdict and treemanifest, respectively. This means that
consumers of the _mancache will now need to be aware of the difference between
the two, until we get rid of the manifest entirely and the _mancache becomes
only filled with ctxs.

This fixes one case of it that can be fixed by using the other cache. Future
patches will address the other uses using the upcoming manifestctx.read()
function.
2016-09-12 14:29:09 -07:00
Pierre-Yves David
cb4c54634b manifest: backed out changeset 3e5e08efafc9
There is some suspicious failure in evolution tests. This changeset was supposed
to be dropped until we investigate.
2016-09-10 01:42:05 +02:00
Durham Goode
e8a39ee6a7 manifest: make uses of _mancache aware of contexts
In a future patch we will change manifestctx and treemanifestctx to no longer
derive from manifestdict and treemanifest, respectively. This means that
consumers of the _mancache will now need to be aware of the difference between
the two, until we get rid of the manifest entirely and the _mancache becomes
only filled with ctxs.
2016-08-29 18:02:09 -07:00
Durham Goode
9dfdbc1f92 manifest: break mancache into two caches
The old manifest cache cached both the inmemory representation and the raw text.
As part of the manifest refactor we want to separate the storage format from the
in memory representation, so let's split this cache into two caches.

This will let other manifest implementations participate in the in memory cache,
while allowing the revlog based implementations to still depend on the full text
caching where necessary.
2016-08-17 13:25:13 -07:00
Augie Fackler
ba4d11b62e bundlerepo: add support for treemanifests in cg3 bundles
This is a little messier than I'd like, and I'll probably come back
and do some more refactoring later, but as it is this unblocks
narrowhg. An alternative approach (which I may do as part of the
mentioned refactoring) would be to construct *all* dirlog instances up
front, so that we don't have to keep track of the linkmapper
method. This would avoid a reference cycle between the bundlemanifest
and the bundlerepository, but I was hesitant to do all the work up
front like that.

With this change, it's possible to do 'hg incoming' and 'hg pull' from
bundles in .hg/strip-backup in a treemanifest repository. Sadly, this
doesn't make it possible to 'hg clone' one of those (if you do 'hg
strip 0'), because the cg3 in the bundle gets written without a
treemanifest flag. Since that's going to be an involved refactor in a
different part of the code (which I *suspect* won't touch any of the
code I've just written here), let's leave it as an idea for Later.
2016-08-05 13:08:11 -04:00
Augie Fackler
7b4dc2c6d6 bundlerepo: use supportedincomingversions instead of allsupportedversions
Since bundlerepo is really a pull-like operation, this is the correct
method to use here.
2016-08-04 14:13:35 -04:00
Augie Fackler
7c233d9381 bundlerepo: introduce method to find file starts and use it
This moves us to the modern iter() technique instead of the `while
True` pattern since it's easy. Factored out as a function because I'm
about to need this in a second place.
2016-08-05 13:07:58 -04:00
Augie Fackler
ce135bfec2 bundlerevlog: use for loop over iterator instead of while True
The iter() builtin has a neat pattern where you give it a callable of
no arguments and a sentinel value, and you can then loop over the
function calls like a normal iterator. This cleans up the code a
little.
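
The two-argument form of iter() mentioned above looks like this in isolation. The stream and chunk size are made up for the example and are not taken from bundlerepo.

```python
import io

stream = io.BytesIO(b"abcdefgh")

# iter(callable, sentinel) keeps calling the function until it returns
# the sentinel, which replaces a "while True" loop with a manual break.
chunks = []
for chunk in iter(lambda: stream.read(3), b""):
    chunks.append(chunk)
```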
2016-08-05 13:09:50 -04:00
Augie Fackler
3bb87a6688 bundlerepo: use for loop over iterator instead of while True
The iter() builtin has a neat pattern where you give it a callable of
no arguments and a sentinel value, and you can then loop over the
function calls like a normal iterator. This cleans up the code a
little.
2016-08-05 13:09:24 -04:00
Pierre-Yves David
000dd50a40 bundle2: remove 'experimental.bundle2-exp' boolean config (BC)
All users are migrated to 'devel.legacy.exchange', we can clean up the
experimental namespace.

Marking as (BC) because I know some large installations have bundle2 off and I
want to make sure they notice the change.
2016-08-03 16:23:26 +02:00
Pierre-Yves David
6236bcaa4b bundlerepo: also read the 'devel.legacy.exchange' config
Bundlerepo does its own bundle2 related logic.
2016-08-03 16:42:10 +02:00
liscju
c7ec9d159e i18n: translate abort messages
I found a few places where the message given to abort is
not translated. I can't find any reason not to translate
them.
2016-06-14 11:53:55 +02:00
liscju
f82ff5ff29 bundle: warn when update to revision existing only in a bundle (issue5004)
Currently this is done silently, so unless the user really knows what they are
doing, they will be surprised to find that 'hg status' doesn't work after the
update. This commit also makes the merge operation warn about a missing parent
when the revision to merge exists only in the bundle.
2016-03-23 08:55:22 +01:00