Commit Graph

17 Commits

Author SHA1 Message Date
Gregory Szorc
45174d8965 commands: config option to control bundle compression level
Currently, bundle compression uses the default compression level
for the active compression engine. The default compression level
is tuned as a compromise between speed and size.

Some scenarios may call for a different compression level. For
example, with clone bundles, bundles are generated once and used
several times. Since the cost to generate is paid infrequently,
server operators may wish to trade extra CPU time for better
compression ratios.

This patch introduces an experimental and undocumented config
option to control the bundle compression level. As the inline
comment says, this approach is a bit hacky. I'd prefer for
the compression level to be encoded in the bundle spec. e.g.
"zstd-v2;complevel=15." However, given that the 4.1 freeze is
imminent, I'm not comfortable implementing this user-facing
change without much time to test and consider the implications.
So, we're going with the quick and dirty solution for now.

Having this option in the 4.1 release will enable Mozilla to
easily produce and test zlib and zstd bundles with non-default
compression levels in production. This will help drive future
development of the feature and zstd integration with Mercurial.
2017-01-10 11:20:32 -08:00
Gregory Szorc
520beaf8a9 util: implement zstd compression engine
Now that zstd is vendored and being built (in some configurations), we
can implement a compression engine for zstd!

The zstd engine is a little different from existing engines. Because
it may not always be present, we have to defer load the module in case
importing it fails. We facilitate this via a cached property that holds
a reference to the module or None. The "available" method is
implemented to reflect reality.

The zstd engine declares its ability to handle bundles using the
"zstd" human name and the "ZS" internal name. The latter was chosen
because internal names are 2 characters (by only convention I think)
and "ZS" seems reasonable.

The engine, like others, supports specifying the compression level.
However, there are no consumers of this API that yet pass in that
argument. I have plans to change that, so stay tuned.

Since all we need to do to support bundle generation with a new
compression engine is implement and register the compression engine,
bundle generation with zstd "just works!" Tests demonstrating this
have been added.

How does performance of zstd for bundle generation compare? On the
mozilla-unified repo, `hg bundle --all -t <engine>-v2` yields the
following on my i7-6700K on Linux:

engine        CPU time     bundle size   vs orig size   throughput
none            97.0s     4,054,405,584     100.0%       41.8 MB/s
bzip2 (l=9)    393.6s       975,343,098      24.0%       10.3 MB/s
gzip (l=6)     184.0s     1,140,533,074      28.1%       22.0 MB/s
zstd (l=1)     108.2s     1,119,434,718      27.6%       37.5 MB/s
zstd (l=2)     111.3s     1,078,328,002      26.6%       36.4 MB/s
zstd (l=3)     113.7s     1,011,823,727      25.0%       35.7 MB/s
zstd (l=4)     116.0s     1,008,965,888      24.9%       35.0 MB/s
zstd (l=5)     121.0s       977,203,148      24.1%       33.5 MB/s
zstd (l=6)     131.7s       927,360,198      22.9%       30.8 MB/s
zstd (l=7)     139.0s       912,808,505      22.5%       29.2 MB/s
zstd (l=12)    198.1s       854,527,714      21.1%       20.5 MB/s
zstd (l=18)    681.6s       789,750,690      19.5%        5.9 MB/s

On compression, zstd for bundle generation delivers:

* better compression than gzip with significantly less CPU utilization
* better than bzip2 compression ratios while still being significantly
  faster than gzip
* ability to aggressively tune compression level to achieve
  significantly smaller bundles

That last point is important. With clone bundles, a server can
pre-generate a bundle file, upload it to a static file server, and
redirect clients to transparently download it during clone. The server
could choose to produce a zstd bundle with the highest compression
settings possible. This would take a very long time - a magnitude
longer than a typical zstd bundle generation - but the result would
be hundreds of megabytes smaller! For the clone volume we do at
Mozilla, this could translate to petabytes of bandwidth savings
per year and faster clones (due to smaller transfer size).

I don't have detailed numbers to report on decompression. However,
zstd decompression is fast: >1 GB/s output throughput on this machine,
even through the Python bindings. And it can do that regardless of the
compression level of the input. By the time you have enough data to
worry about overhead of decompression, you have plenty of other things
to worry about performance wise.

zstd is wins all around. I can't wait to implement support for it
on the wire protocol and in revlogs.
2016-11-11 01:10:07 -08:00
timeless
dd077e79ae bundle: use single quotes in use warning 2016-09-20 23:46:15 +00:00
Gregory Szorc
4ad5f2e492 bundle2: store changeset count when creating file bundles
The bundle2 changegroup part has an advisory param saying how many
changesets are in the part. Before this patch, we were setting
this part when generating bundle2 parts via the wire protocol but
not when generating local bundle2 files.

A side effect of not setting the changeset count part is that progress
bars don't work when applying changesets. As the tests show, this
impacted clone bundles, shelve, backup bundles, `hg unbundle`, and
anything touching bundle2 files.

This patch adds a backdoor to allow us to pass state from
changegroup generation into the unbundler. We store the number
of changesets in the changegroup in this state and use it to
populate the aforementioned advisory part parameter when generating
the bundle2 bundle.

I concede that I'm not thrilled by how state is being passed in
changegroup.py (it feels a bit hacky). I would love to overhaul the
rather confusing set of functions in changegroup.py with something that
passes rich objects around instead of e.g. low-level generators.
However, given the code freeze for 3.9 is imminent, I'd rather not
undertake this endeavor right now. This feels like the easiest way
to get the parameter added to the changegroup part.
2016-07-17 15:13:51 -07:00
Gregory Szorc
f5105c6a41 util: implement a deterministic __repr__ on sortdict
`hg debugbundle` is calling repr() on bundle2 part params, which are
now util.sortdict instances. Unfortunately, repr() doesn't appear
to be deterministic for util.sortdict. So, we implement one.

We include the type name because that's the common convention for
__repr__ implementations. Having the type name in `hg debugbundle`
is a bit ugly. But it's a debug command and I don't care enough to
fix it.
2016-07-17 15:10:30 -07:00
Gregory Szorc
3a890f3e32 commands: teach debugbundle to print bundle specification
This seems like the most logical place to put this functionality.

Test coverage over existing known bundle specs has been added.
2016-01-14 22:57:55 -08:00
Pierre-Yves David
0341332573 test: use generaldelta in 'test-bundle-type.t'
Generaldelta changes some of the default targets for 'hg bundle'. All cases are
already properly tested but some ambiguous specifications are affected.
2015-10-20 11:48:49 +02:00
Danek Duvall
d23d8113a8 test: test-bundle-type.t needs to work more universally
The cut and head utilities on Solaris have weird differences from the GNU
versions.  The f helper script does a dump more nicely than those tools,
anyway.
2015-11-10 09:58:10 -08:00
Gregory Szorc
c55df1f741 exchange: refactor bundle specification parsing
The old code was tailored to `hg bundle` usage and not appropriate for
use as a general API, which clone bundles will require. The code has
been rewritten to make it more generally suitable.

We introduce dedicated error types to represent invalid and unsupported
bundle specifications. The reason we need dedicated error types (rather
than error.Abort) is because clone bundles will want to catch these
exception as part of filtering entries. We don't want to swallow
error.Abort on principle.
2015-10-13 10:57:54 -07:00
Pierre-Yves David
11cc1a1216 getsubset: get the unpacker version from the bundler
The current setup requires to pass both a packer and, optionally, the version
of the unpacker. This is confusing and error prone as the two value cannot
mismatch. Instead, we simply grab the version from the packer. This fixes a bug
where requesting a cg2 from 'hg bundle' were reported as changegroup 1.

I should have caught that in the initial changeset but I missed it somehow.
2015-10-09 14:59:37 -07:00
Pierre-Yves David
8f390ad0b6 bundle: extend the format of --type to support version and compression
We had some basic undocumented support for uncompressed bundle2 support. We now
have an official extensible syntax to specify both format type and compression
(eg: bzip2-v2).

In practice, this changeset introduce the 'v1' and 'v2' identifier to make it
possible to combine format and compression. The default format is still 'v1'.
We'll care about picking 'v1' or 'v2' in regard with general delta in the next
changesets.
2015-10-01 19:16:00 -07:00
Pierre-Yves David
50e5ae9a7a test-bundle-type: replace unbundle with debugbundle
We now have a convenient command to look at bundle contents, let's use
it.
2015-10-01 20:15:00 -07:00
Thomas Arendsen Hein
498dfaa7d6 pull: print "pulling from foo" before accessing the other repo
1. This is consistent with pushing.
2. This allows to see the URL of the other repo in case accessing the repo
   fails, e.g. wrong ssh path or issues with the https certificate, without
   using --debug or showconfig paths.

Additionally add test for this in the context of ssh with a wrong path.
2015-02-24 10:55:24 +01:00
Bryan O'Sullivan
9b32b66805 commands: move bundle type validation earlier
Checking the bundle type late in the command's execution can mean
that we do work for a long time before complaining about incorrect
user input and aborting.  Guess how I discovered this.
2012-04-13 11:01:07 -07:00
Matt Mackall
08439e0f2d tests: add exit codes to unified tests 2010-09-16 17:51:32 -05:00
Matt Mackall
60627b799a bundlerepo: remove duplication of bundle decompressors 2010-08-25 16:55:54 -05:00
Martin Geisler
9e6d9afbda tests: unify test-bundle-type 2010-08-14 03:23:56 +02:00