Commit Graph

8 Commits

Author SHA1 Message Date
Gregory Szorc
c70926c31e zstd: vendor python-zstandard 0.8.1
This contains a fix for compilation on BSDs.

#no-check-commit
2017-04-08 14:33:20 -07:00
Gregory Szorc
6758d40c63 zstd: vendor python-zstandard 0.8.0
Commit 81e1f5bbf1fc54808649562d3ed829730765c540 from
https://github.com/indygreg/python-zstandard is imported without
modifications (other than removing unwanted files).

Updates relevant to Mercurial include:

* Support for multi-threaded compression (we can use this for
  bundle and wire protocol compression).
* APIs for batch compression and decompression operations using
  multiple threads and optimal memory allocation mechanism. (Can
  be useful for revlog perf improvements.)
* A ``BufferWithSegments`` type that models a single memory buffer
  containing N discrete items of known lengths. This type can be
  used for very efficient 0-copy data operations.

# no-check-commit
2017-04-01 15:24:03 -07:00
Gregory Szorc
3f02063f8e zstd: vendor python-zstandard 0.7.0
Commit 3054ae3a66112970a091d3939fee32c2d0c1a23e from
https://github.com/indygreg/python-zstandard is imported without
modifications (other than removing unwanted files).

The vendored zstd library within has been upgraded from 1.1.2 to
1.1.3. This version introduced new APIs for threads, thread
pools, multi-threaded compression, and a new dictionary
builder (COVER). These features are not yet used by
python-zstandard (or Mercurial for that matter). However,
that will likely change in the next python-zstandard release
(and I think there are opportunities for Mercurial to take
advantage of the multi-threaded APIs).

Relevant to Mercurial, the CFFI bindings are now fully
implemented. This means zstd should "just work" with PyPy
(although I haven't tried). The python-zstandard test suite also
runs all tests against both the C extension and CFFI bindings to
ensure feature parity.

There is also a "decompress_content_dict_chain()" API. This was
derived from discussions with Yann Collet on list about alternate
ways of encoding delta chains.

The change most relevant to Mercurial is a performance enhancement in
the simple decompression API to reuse a data structure across
operations. This makes decompression of multiple inputs significantly
faster. (This scenario occurs when reading revlog delta chains, for
example.)

Using python-zstandard's bench.py to measure the performance
difference...

On changelog chunks in the mozilla-unified repo:

decompress discrete decompress() reuse zctx
1.262243 wall; 1.260000 CPU; 1.260000 user; 0.000000 sys 170.43 MB/s (best of 3)
0.949106 wall; 0.950000 CPU; 0.950000 user; 0.000000 sys 226.66 MB/s (best of 4)

decompress discrete dict decompress() reuse zctx
0.692170 wall; 0.690000 CPU; 0.690000 user; 0.000000 sys 310.80 MB/s (best of 5)
0.437088 wall; 0.440000 CPU; 0.440000 user; 0.000000 sys 492.17 MB/s (best of 7)

On manifest chunks in the mozilla-unified repo:

decompress discrete decompress() reuse zctx
1.367284 wall; 1.370000 CPU; 1.370000 user; 0.000000 sys 274.01 MB/s (best of 3)
1.086831 wall; 1.080000 CPU; 1.080000 user; 0.000000 sys 344.72 MB/s (best of 3)

decompress discrete dict decompress() reuse zctx
0.993272 wall; 0.990000 CPU; 0.990000 user; 0.000000 sys 377.19 MB/s (best of 3)
0.678651 wall; 0.680000 CPU; 0.680000 user; 0.000000 sys 552.06 MB/s (best of 5)

That should make reads on zstd revlogs a bit faster ;)

# no-check-commit
2017-02-07 23:24:47 -08:00
Gregory Szorc
4225a4e399 zstd: prevent potential free() of uninitialized memory
This is a cherry pick of an upstream fix. The free() of uninitialed
memory could likely only occur if a malloc() inside zstd fails.

The patched functions aren't currently used by Mercurial. But I don't
like leaving footguns sitting around.
2017-01-17 10:17:13 -08:00
Gregory Szorc
c3cb00b3e9 zstd: vendor python-zstandard 0.6.0
Commit 63c68d6f5fc8de4afd9bde81b13b537beb4e47e8 from
https://github.com/indygreg/python-zstandard is imported without
modifications (other than removing unwanted files).

This includes minor performance and feature improvements. It also
changes the vendored zstd library from 1.1.1 to 1.1.2.

# no-check-commit
2017-01-14 19:41:43 -08:00
Danek Duvall
267cb097d4 zstd: fix compilation with Solaris Studio
Without these changes, Solaris Studio (12.4) gives us "syntax error: empty
declaration" on these two lines.
2016-11-22 13:32:05 -08:00
Gregory Szorc
2005375cdc zstd: vendor python-zstandard 0.5.0
As the commit message for the previous changeset says, we wish
for zstd to be a 1st class citizen in Mercurial. To make that
happen, we need to enable Python to talk to the zstd C API. And
that requires bindings.

This commit vendors a copy of existing Python bindings. Why do we
need to vendor? As the commit message of the previous commit says,
relying on systems in the wild to have the bindings or zstd present
is a losing proposition. By distributing the zstd and bindings with
Mercurial, we significantly increase our chances that zstd will
work. Since zstd will deliver a better end-user experience by
achieving better performance, this benefits our users. Another
reason is that the Python bindings still aren't stable and the
API is somewhat fluid. While Mercurial could be coded to target
multiple versions of the Python bindings, it is safer to bundle
an explicit, known working version.

The added Python bindings are mostly a fully-featured interface
to the zstd C API. They allow one-shot operations, streaming,
reading and writing from objects implements the file object
protocol, dictionary compression, control over low-level compression
parameters, and more. The Python bindings work on Python 2.6,
2.7, and 3.3+ and have been tested on Linux and Windows. There are
CFFI bindings, but they are lacking compared to the C extension.
Upstream work will be needed before we can support zstd with PyPy.
But it will be possible.

The files added in this commit come from Git commit
e637c1b214d5f869cf8116c550dcae23ec13b677 from
https://github.com/indygreg/python-zstandard and are added without
modifications. Some files from the upstream repository have been
omitted, namely files related to continuous integration.

In the spirit of full disclosure, I'm the maintainer of the
"python-zstandard" project and have authored 100% of the code
added in this commit. Unfortunately, the Python bindings have
not been formally code reviewed by anyone. While I've tested
much of the code thoroughly (I even have tests that fuzz APIs),
there's a good chance there are bugs, memory leaks, not well
thought out APIs, etc. If someone wants to review the code and
send feedback to the GitHub project, it would be greatly
appreciated.

Despite my involvement with both projects, my opinions of code
style differ from Mercurial's. The code in this commit introduces
numerous code style violations in Mercurial's linters. So, the code
is excluded from most lints. However, some violations I agree with.
These have been added to the known violations ignore list for now.
2016-11-10 22:15:58 -08:00
Gregory Szorc
d7da329834 zstd: vendor zstd 1.1.1
zstd is a new compression format and it is awesome, yielding
higher compression ratios and significantly faster compression
and decompression operations compared to zlib (our current
compression engine of choice) across the board.

We want zstd to be a 1st class citizen in Mercurial and to eventually
be the preferred compression format for various operations.

This patch starts the formal process of supporting zstd by vendoring
a copy of zstd. Why do we need to vendor zstd? Good question.

First, zstd is relatively new and not widely available yet. If we
didn't vendor zstd or distribute it with Mercurial, most users likely
wouldn't have zstd installed or even available to install. What good
is a feature if you can't use it? Vendoring and distributing the zstd
sources gives us the highest liklihood that zstd will be available to
Mercurial installs.

Second, the Python bindings to zstd (which will be vendored in a
separate changeset) make use of zstd APIs that are only available
via static linking. One reason they are only available via static
linking is that they are unstable and could change at any time.
While it might be possible for the Python bindings to attempt to
talk to different versions of the zstd C library, the safest thing to
do is link against a specific, known-working version of zstd. This
is why the Python zstd bindings themselves vendor zstd and why we
must as well. This also explains why the added files are in a
"python-zstandard" directory.

The added files are from the 1.1.1 release of zstd (Git commit
4c0b44f8ced84c4c8edfa07b564d31e4fa3e8885 from
https://github.com/facebook/zstd) and are added without modifications.
Not all files from the zstd "distribution" have been added. Notably
missing are files to support interacting with "legacy," pre-1.0
versions of zstd. The decision of which files to include is made by
the upstream python-zstandard project (which I'm the author of). The
files in this commit are a snapshot of the files from the 0.5.0
release of that project, Git commit
e637c1b214d5f869cf8116c550dcae23ec13b677 from
https://github.com/indygreg/python-zstandard.
2016-11-10 21:45:29 -08:00