Summary: Boxed byte slices (e.g., `Box<[u8]>`, `Rc<[u8]>`) are not very ergonomic and are somewhat unusual in Rust code. Use the more common and easier-to-use `Bytes` type instead. Since this type supports shallow, reference-counted copies, there shouldn't be any new O(n) copying behavior compared to `Rc<[u8]>`.
Reviewed By: markbt
Differential Revision: D13754730
fbshipit-source-id: d5fbc8e39c84c56d30174f4bb194ee21a14bf944
Summary: Use `failure::Fallible<T>` in place of `Result<T, failure::Error>`.
Reviewed By: singhsrb
Differential Revision: D13754688
fbshipit-source-id: cfbe418f5213884816d4837d1077cd90a17359b6
Summary:
Unfortunately, the required symbols are not exposed by lz4-sys, so we just declare
them ourselves.
Make sure it compresses better:
```
In [1]: c=open('/bin/bash').read();
In [2]: from mercurial.rust import lz4
In [3]: len(lz4.compress(c))
Out[3]: 762906
In [4]: len(lz4.compresshc(c))
Out[4]: 626970
```
However, it's much slower for larger data (and compresshc is slower than pylz4):
```
Benchmarking (easy to compress data, 20MB)...
pylz4.compress: 10328.03 MB/s
rustlz4.compress_py: 9373.84 MB/s
pylz4.compressHC: 1666.80 MB/s
rustlz4.compresshc_py: 8298.57 MB/s
pylz4.decompress: 3953.03 MB/s
rustlz4.decompress_py: 3935.57 MB/s

Benchmarking (hard to compress data, 0.2MB)...
pylz4.compress: 4357.88 MB/s
rustlz4.compress_py: 4193.34 MB/s
pylz4.compressHC: 3740.40 MB/s
rustlz4.compresshc_py: 2730.71 MB/s
pylz4.decompress: 5600.94 MB/s
rustlz4.decompress_py: 5362.96 MB/s

Benchmarking (hard to compress data, 20MB)...
pylz4.compress: 5156.72 MB/s
rustlz4.compress_py: 5447.00 MB/s
pylz4.compressHC: 33.70 MB/s
rustlz4.compresshc_py: 22.25 MB/s
pylz4.decompress: 2375.42 MB/s
rustlz4.decompress_py: 5755.46 MB/s
```
Note python-lz4 was using an ancient version of lz4. So there could be differences.
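For reference, MB/s figures of this kind can be produced with a small timing helper. This is only a sketch and not the benchmark script used for the numbers above: `throughput_mb_per_s` is a hypothetical helper, `zlib.compress` stands in for the lz4 functions (it ships with Python), and 1 MB is assumed to mean 10^6 bytes.

```python
import time
import zlib


def throughput_mb_per_s(func, data, repeats=3):
    """Return the best observed throughput of func(data) in MB/s."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(data)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    # Guard against a zero reading from a coarse timer.
    best = max(best, 1e-9)
    # Assumption: 1 MB = 10**6 bytes of *input* data.
    return len(data) / 1e6 / best


# Stand-in codec and input; swap in the real compress function to compare.
easy = b"a" * 2_000_000
print("zlib.compress: %.2f MB/s" % throughput_mb_per_s(zlib.compress, easy))
```

Taking the best of several runs reduces noise from other processes, which matters most for the small (0.2MB) inputs.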
Reviewed By: DurhamG
Differential Revision: D13528200
fbshipit-source-id: 6be1c1dd71f57d40dcffcc8d212d40a853583254
Summary:
This allows decompressing into a pre-allocated buffer. After some experiments,
it seems `bytearray` will just break too many things, e.g.:
- bytearray is not hashable
- bytearray[index] returns an int
- a = bytearray('x'); b = a; b += '3' # will mutate 'a'
- ''.join([bytearray('')]) will raise TypeError
Therefore we have to use zero-copy `bytes` instead, which is less elegant. But
this API change is a step forward.
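The `bytearray` pitfalls listed above can be reproduced with a short script. This is a sketch written for Python 3, while the list describes Python 2 behavior; `bytearray_pitfalls` is a hypothetical helper name. The `join` case is left out because `b''.join` accepts `bytearray` items on Python 3, and note that on Python 3 indexing `bytes` also returns an int.

```python
def bytearray_pitfalls():
    """Collect the bytearray behaviors listed above, as seen on Python 3."""
    results = {}

    # bytearray is mutable, so it is not hashable.
    try:
        hash(bytearray(b"x"))
        results["hashable"] = True
    except TypeError:
        results["hashable"] = False

    # Indexing yields an int rather than a length-1 string.
    results["index_type"] = type(bytearray(b"x")[0]).__name__

    # In-place += mutates the shared object, so the alias sees the change.
    a = bytearray(b"x")
    b = a
    b += b"3"
    results["alias_mutated"] = bytes(a)

    # The same steps with immutable bytes leave the original untouched.
    c = b"x"
    d = c
    d += b"3"
    results["bytes_unchanged"] = c

    return results


print(bytearray_pitfalls())
```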
Reviewed By: DurhamG
Differential Revision: D13528201
fbshipit-source-id: 1cfaf5d55efdc0d6c0df85df9960fe9682028b08
Summary:
This gives some sense about how fast it is.
Background: I was trying to get rid of python-lz4, by exposing this to Python.
However, I noticed it's 10x slower than python-lz4. Therefore I added a
benchmark here to test whether the slowdown comes from the wrapper or the Rust lz4 code.
It does not seem to be this crate:
```
# Pure Rust
compress (100M) 77.170 ms
decompress (~100M) 67.043 ms
# python-lz4
In [1]: import lz4, os
In [2]: b=os.urandom(100000000);
In [3]: %timeit lz4.compress(b)
10 loops, best of 3: 87.4 ms per loop
```
Reviewed By: DurhamG
Differential Revision: D13516205
fbshipit-source-id: f55f94bbecc3b49667ed12174f7000b1aa29e7c4
Summary: This is just the result of running `./contrib/fix-code.py $(hg files .)`
Reviewed By: ikostia
Differential Revision: D10213075
fbshipit-source-id: 88577c9b9588a5b44fcf1fe6f0082815dfeb363a
Summary:
Backout D9124508.
This is actually more complex than it seems. It breaks non-buck builds
everywhere:
- hgbuild on all platforms. POSIX platforms break because `hg archive` will
miss `scm/common`. The Windows build breaks because of a symlink.
- `make local` on the GitHub repo because `failure_ext` is not public. The `pylz4`
Cargo.toml has missing dependencies.
Fixing them correctly seems non-trivial. Therefore let's back out the change to
unblock builds quickly.
The linter change is kept in case we'd like to try again in the future.
Reviewed By: simpkins
Differential Revision: D9225955
fbshipit-source-id: 4170a5f7664ac0f6aa78f3b32f61a09d65e19f63
Summary: Moved the lz4 compression code into a separate module in `scm/common/pylz4` and redirected code referencing the former two files to the new module.
Reviewed By: quark-zju, mitrandir77
Differential Revision: D9124508
fbshipit-source-id: e4796cf36d16c3a8c60314c75f26ee942d2f9e65
Summary:
The python lz4 framing logic chooses to include no data when the input
string is 0 length. We need to match that logic in order to be compatible with
it.
See https://github.com/steeve/python-lz4/blob/master/src/python-lz4.c#L75
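A minimal sketch of the empty-input special case, under two assumptions that this message does not spell out: that the frame is a 4-byte little-endian original length followed by the compressed block, and that "no data" means the header is emitted alone. `compress_framed` and `decompress_framed` are hypothetical names, and `zlib` stands in for the lz4 block codec so the example runs with the standard library only.

```python
import struct
import zlib

# Assumed header layout: 4-byte little-endian original length.
HEADER = struct.Struct("<I")


def compress_framed(data):
    header = HEADER.pack(len(data))
    if not data:
        # Empty input: emit the header alone, with no compressed payload.
        return header
    # zlib is a stand-in for the real lz4 block compressor.
    return header + zlib.compress(data)


def decompress_framed(blob):
    (size,) = HEADER.unpack_from(blob)
    if size == 0:
        # Nothing follows the header, so the codec is not called at all.
        return b""
    data = zlib.decompress(blob[HEADER.size:])
    if len(data) != size:
        raise ValueError("decompressed length does not match header")
    return data


print(compress_framed(b"").hex())
print(decompress_framed(compress_framed(b"hello")))
```

Both directions need the special case: a reader that always hands the payload to the codec would fail on a blob that contains only the header.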
Reviewed By: quark-zju
Differential Revision: D8773951
fbshipit-source-id: 9bc60fc0779eb923f7c663d7e516b519963e8056
Summary:
Make `lib` a cargo workspace so building in subprojects would share a
`target` directory and `cargo doc` will build documentation for all
subprojects.
Reviewed By: DurhamG
Differential Revision: D8741175
fbshipit-source-id: 512325bcb23d51e866e764bdc76dddb22c59ef05
Summary:
The crates.io lz4 bindings only support the lz4 framed format, while
our python lz4 library produces custom framed compressed blobs. Let's add a new
wrapper around lz4-sys that handles our special framing. We can migrate to the
standard framing later.
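To illustrate why the two formats are not interchangeable, here is a sketch that tells them apart. The standard LZ4 frame starts with the magic number 0x184D2204 stored little-endian; the custom layout used here (a 4-byte little-endian original length followed by a raw lz4 block) is an assumption for illustration, and both function names are hypothetical.

```python
import struct

# Magic number of the standard LZ4 frame format, 0x184D2204, little-endian.
LZ4_FRAME_MAGIC = b"\x04\x22\x4d\x18"


def looks_like_standard_frame(blob):
    """True if the blob starts with the standard LZ4 frame magic number."""
    return blob[:4] == LZ4_FRAME_MAGIC


def split_custom_blob(blob):
    """Split an assumed custom blob into (original_length, raw_block)."""
    if len(blob) < 4:
        raise ValueError("blob is shorter than the 4-byte header")
    (original_len,) = struct.unpack_from("<I", blob)
    return original_len, blob[4:]


custom = struct.pack("<I", 5) + b"abc"  # placeholder bytes, not real lz4 data
print(looks_like_standard_frame(custom))
print(split_custom_blob(custom))
```

A custom blob whose declared length happens to equal the magic number would be misclassified, so a check like this is a heuristic rather than a reliable format detector.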
Reviewed By: quark-zju
Differential Revision: D7855502
fbshipit-source-id: 04abb1bc784c6be7f22bcd80645d1b50debc93bd