Commit Graph

12 Commits

Author SHA1 Message Date
Arun Kulshreshtha
d3839ffb07 revisionstore: use Bytes instead of Box<[u8]> in Delta and DataEntry
Summary: Boxed byte slices (e.g., `Box<[u8]>`, `Rc<[u8]>`) are not very ergonomic to use and are somewhat unusual in Rust code. Use the more common and easier to use `Bytes` type instead. Since this type supports shallow, reference-counted copies, there shouldn't be any new O(n) copying behavior compared to `Rc<[u8]>`.

Reviewed By: markbt

Differential Revision: D13754730

fbshipit-source-id: d5fbc8e39c84c56d30174f4bb194ee21a14bf944
2019-01-22 14:03:17 -08:00
Arun Kulshreshtha
6a00abcfb0 lz4-pyframe: use failure::Fallible
Summary: Use `failure::Fallible<T>` in place of `Result<T, failure::Error>`.

Reviewed By: singhsrb

Differential Revision: D13754688

fbshipit-source-id: cfbe418f5213884816d4837d1077cd90a17359b6
2019-01-21 18:00:57 -08:00
Arun Kulshreshtha
7c93df4d3b lz4-pyframe: migrate to rust 2018
Summary: Migrate crate to Rust 2018.

Reviewed By: singhsrb

Differential Revision: D13754665

fbshipit-source-id: d2ce3994874afa1149229d481084ea66b5e312f8
2019-01-21 18:00:57 -08:00
Jun Wu
22e9000fc9 lz4-pyframe: add compresshc
Summary:
Unfortunately, the required symbols are not exposed by lz4-sys, so we just
declare them ourselves.

Make sure it compresses better:

  In [1]: c=open('/bin/bash').read();
  In [2]: from mercurial.rust import lz4
  In [3]: len(lz4.compress(c))
  Out[3]: 762906
  In [4]: len(lz4.compresshc(c))
  Out[4]: 626970

However, it's much slower for larger data (and compresshc is slower than pylz4):

  Benchmarking (easy to compress data, 20MB)...
            pylz4.compress: 10328.03 MB/s
       rustlz4.compress_py:  9373.84 MB/s
          pylz4.compressHC:  1666.80 MB/s
     rustlz4.compresshc_py:  8298.57 MB/s
          pylz4.decompress:  3953.03 MB/s
     rustlz4.decompress_py:  3935.57 MB/s
  Benchmarking (hard to compress data, 0.2MB)...
            pylz4.compress:  4357.88 MB/s
       rustlz4.compress_py:  4193.34 MB/s
          pylz4.compressHC:  3740.40 MB/s
     rustlz4.compresshc_py:  2730.71 MB/s
          pylz4.decompress:  5600.94 MB/s
     rustlz4.decompress_py:  5362.96 MB/s
  Benchmarking (hard to compress data, 20MB)...
            pylz4.compress:  5156.72 MB/s
       rustlz4.compress_py:  5447.00 MB/s
          pylz4.compressHC:    33.70 MB/s
     rustlz4.compresshc_py:    22.25 MB/s
          pylz4.decompress:  2375.42 MB/s
     rustlz4.decompress_py:  5755.46 MB/s

Note that python-lz4 was using an ancient version of lz4, so there could be differences.
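
For readers who want to reproduce numbers of this shape, here is a minimal sketch of a MB/s throughput measurement. It is not the benchmark used in this commit: `throughput_mb_per_s` is a hypothetical helper, and `zlib.compress` stands in for the lz4 functions only because it ships with Python, so the absolute figures will not match the ones above.

```python
import os
import time
import zlib


def throughput_mb_per_s(func, data, repeats=3):
    """Return the best observed throughput of func(data) in MB/s."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(data)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading on very coarse clocks.
    return len(data) / 1e6 / max(best, 1e-9)


# zlib is a stand-in for lz4 here; swap in the function under test.
easy = b"\x00" * 2_000_000   # highly compressible input
hard = os.urandom(200_000)   # effectively incompressible input

for label, data in (("easy", easy), ("hard", hard)):
    rate = throughput_mb_per_s(zlib.compress, data)
    print(f"{label:>4} zlib.compress: {rate:10.2f} MB/s")
```

Taking the best of several runs reduces noise from other processes; the input sizes are kept small here so the sketch finishes quickly.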

Reviewed By: DurhamG

Differential Revision: D13528200

fbshipit-source-id: 6be1c1dd71f57d40dcffcc8d212d40a853583254
2018-12-20 17:54:22 -08:00
Jun Wu
6e88ac4794 lz4-pyframe: provide decompress_into API
Summary:
This allows decompressing into a pre-allocated buffer. After some experiments,
it seems `bytearray` will just break too many things, e.g.:

- bytearray is not hashable
- bytearray[index] returns an int
- a = bytearray('x'); b = a; b += '3' # will mutate 'a'
- ''.join([bytearray('')]) will raise TypeError
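
Two of the pitfalls above can be reproduced with a short sketch. It is written for Python 3, whereas the list uses Python 2 literals; the indexing and `join` items depend on Python 2 `str` semantics and are not reproduced here. The variable names are illustrative only.

```python
buf = bytearray(b"x")

# bytearray is mutable, so it is not hashable and cannot be a dict key.
try:
    hash(buf)
    hashable = True
except TypeError:
    hashable = False

# Augmented assignment mutates in place, so every alias sees the change.
alias = buf
alias += b"3"
aliased_mutation = buf == bytearray(b"x3")

# bytes is immutable: += rebinds the name and leaves the original untouched.
original = b"x"
copy = original
copy += b"3"
bytes_unchanged = original == b"x"

print(hashable, aliased_mutation, bytes_unchanged)  # prints: False True True
```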

Therefore we have to use zero-copy `bytes` instead, which is less elegant. But
this API change is a step forward.

Reviewed By: DurhamG

Differential Revision: D13528201

fbshipit-source-id: 1cfaf5d55efdc0d6c0df85df9960fe9682028b08
2018-12-20 17:54:22 -08:00
Jun Wu
35c85018cd lz4-pyframe: add a benchmark
Summary:
This gives some sense of how fast it is.

Background: I was trying to get rid of python-lz4 by exposing this to Python.
However, I noticed it's 10x slower than python-lz4. Therefore I added a
benchmark here to test whether it's the wrapper or the Rust lz4 code.

It does not seem to be this crate:

```
  # Pure Rust
  compress (100M)                77.170 ms
  decompress (~100M)             67.043 ms

  # python-lz4
  In [1]: import lz4, os
  In [2]: b=os.urandom(100000000);
  In [3]: %timeit lz4.compress(b)
  10 loops, best of 3: 87.4 ms per loop
```

Reviewed By: DurhamG

Differential Revision: D13516205

fbshipit-source-id: f55f94bbecc3b49667ed12174f7000b1aa29e7c4
2018-12-20 17:54:21 -08:00
Jun Wu
3adc813687 codemod: add copyright headers
Summary: This is just the result of running `./contrib/fix-code.py $(hg files .)`

Reviewed By: ikostia

Differential Revision: D10213075

fbshipit-source-id: 88577c9b9588a5b44fcf1fe6f0082815dfeb363a
2018-10-26 15:09:12 -07:00
Jun Wu
e33154698b Back out "Reuse pylz4 encoding between hg and Mononoke into a separate library"
Summary:
Back out D9124508.

This is actually more complex than it seems. It breaks non-buck builds
everywhere:

- hgbuild on all platforms. POSIX platforms break because `hg archive` will
  miss `scm/common`. The Windows build breaks because of a symlink.
- `make local` on GitHub repo because `failure_ext` is not public. The `pylz4`
  Cargo.toml has missing dependencies.

Fixing them correctly seems non-trivial. Therefore let's back out the change to
unblock builds quickly.

The linter change is kept in case we'd like to try again in the future.

Reviewed By: simpkins

Differential Revision: D9225955

fbshipit-source-id: 4170a5f7664ac0f6aa78f3b32f61a09d65e19f63
2018-08-08 12:20:54 -07:00
Tuan Tran
f50d617d2d Reuse pylz4 encoding between hg and Mononoke into a separate library
Summary: Moved the lz4 compression code into a separate module in `scm/common/pylz4` and redirected code referencing the former two files to the new module.

Reviewed By: quark-zju, mitrandir77

Differential Revision: D9124508

fbshipit-source-id: e4796cf36d16c3a8c60314c75f26ee942d2f9e65
2018-08-08 10:08:11 -07:00
Durham Goode
51cca830f8 lz4-pyframe: fix compression of 0 length strings
Summary:
The python lz4 framing logic chooses to include no data when the input
string is 0 length. We need to match that logic in order to be compatible with
it.

See https://github.com/steeve/python-lz4/blob/master/src/python-lz4.c#L75
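
The behavior can be sketched as follows. This is a minimal illustration, not the crate's code: it assumes the frame is a 4-byte little-endian uncompressed length followed by the lz4 block (a detail not stated in this message), and `frame` and `identity_block` are hypothetical names, with `identity_block` standing in for a real lz4 block compressor.

```python
import struct

# Assumed frame header: 4-byte little-endian uncompressed length.
HEADER = struct.Struct("<I")


def frame(data, compress_block):
    """Prefix the block with the uncompressed length; emit no block for empty input."""
    header = HEADER.pack(len(data))
    if not data:
        # Match the Python framing: header only, no compressed payload.
        return header
    return header + compress_block(data)


def identity_block(data):
    # Stand-in for the real lz4 block compressor, used only for illustration.
    return bytes(data)


print(frame(b"", identity_block))     # header only
print(frame(b"abc", identity_block))  # header followed by the block
```

The point of the special case is that a block compressor may still emit bytes for empty input, which would make the output differ from what the Python library produces.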

Reviewed By: quark-zju

Differential Revision: D8773951

fbshipit-source-id: 9bc60fc0779eb923f7c663d7e516b519963e8056
2018-07-09 18:02:58 -07:00
Jun Wu
d0c1b6d014 cargo: add a workspace
Summary:
Make `lib` a cargo workspace so that builds in subprojects share a
`target` directory and `cargo doc` builds documentation for all
subprojects.

Reviewed By: DurhamG

Differential Revision: D8741175

fbshipit-source-id: 512325bcb23d51e866e764bdc76dddb22c59ef05
2018-07-05 16:06:35 -07:00
Durham Goode
8a6a929876 lz4: add rust lz4 bindings
Summary:
The crates.io lz4 bindings only support the lz4 framed format, while
our python lz4 library produces custom framed compressed blobs. Let's add a new
wrapper around lz4-sys that handles our special framing. We can migrate to the
standard framing later.

Reviewed By: quark-zju

Differential Revision: D7855502

fbshipit-source-id: 04abb1bc784c6be7f22bcd80645d1b50debc93bd
2018-05-16 09:13:18 -07:00