Commit Graph

5 Commits

Author SHA1 Message Date
Mark Thomas
ae0a81f2c2 rust: move bindings to a single python extension
Summary:
Move all Rust bindings to a single python extension, `bindings`.  This should
improve compilation time and make things simpler.

Reviewed By: quark-zju

Differential Revision: D13923866

fbshipit-source-id: 560592b5a6c0c4f1b836c755ef123666a1059164
2019-02-01 17:53:22 -08:00
Jun Wu
9dc21f8d0b codemod: import from the edenscm package
Summary:
D13853115 adds `edenscm/` to `sys.path`, but code still uses `import mercurial`.
That causes nasty problems if both `import mercurial` and
`import edenscm.mercurial` are used, because Python treats `mercurial.foo`
and `edenscm.mercurial.foo` as different modules, so code like
`try: ... except mercurial.error.Foo: ...` or `isinstance(x, mercurial.foo.Bar)`
would fail to handle the `edenscm.mercurial` version. There is also some
module-level state (e.g. `extensions._extensions`) that would cause trouble if
it exists in multiple versions in a single process.
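
The double-import problem described above can be reproduced in isolation. The following is a minimal sketch, not code from this commit: the package names `outerpkg` and `innerpkg` are hypothetical stand-ins for `edenscm` and `mercurial`, and the tree is built in a temporary directory.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway tree: <tmp>/outerpkg/innerpkg/error.py
# (hypothetical names standing in for edenscm/mercurial/error.py).
root = tempfile.mkdtemp()
outer = os.path.join(root, "outerpkg")
inner = os.path.join(outer, "innerpkg")
os.makedirs(inner)
for directory in (outer, inner):
    open(os.path.join(directory, "__init__.py"), "w").close()
with open(os.path.join(inner, "error.py"), "w") as f:
    f.write("class Foo(Exception):\n    pass\n")

# Put both the root and the outer package directory on sys.path, so the
# same file is reachable under two different module names.
sys.path[:0] = [root, outer]
importlib.invalidate_caches()

short = importlib.import_module("innerpkg.error")
full = importlib.import_module("outerpkg.innerpkg.error")

same_file = os.path.samefile(short.__file__, full.__file__)
same_module = short is full
same_class = short.Foo is full.Foo


def caught_by_short_name():
    """Raise the 'full' Foo and try to catch it through the 'short' name."""
    try:
        try:
            raise full.Foo("boom")
        except short.Foo:
            return True
    except full.Foo:
        return False


print(same_file, same_module, same_class, caught_by_short_name())
# prints: True False False False
```

Both names resolve to the same file on disk, yet Python keeps two module objects and therefore two distinct `Foo` classes, which is why the `except` clause written against the short name misses the exception.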

Change imports to use the `edenscm` package so that, ideally, the top-level
`mercurial` package is no longer imported at all. Add checks in extensions.py
to catch unexpected extensions importing modules from the old (wrong)
locations when running tests.
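
A check of this kind could look roughly like the sketch below. It is an assumption for illustration only: the helper name `find_legacy_imports` is hypothetical and the actual check added to extensions.py is not shown here. The sketch scans a mapping of loaded module names for anything under the old top-level package.

```python
import types


def find_legacy_imports(modules, legacy_root="mercurial"):
    """Return sorted module names that live under the legacy top-level package.

    `modules` is any mapping keyed by module name, e.g. a copy of sys.modules.
    """
    prefix = legacy_root + "."
    return sorted(
        name
        for name in modules
        if name == legacy_root or name.startswith(prefix)
    )


# Simulated snapshot of loaded modules (hypothetical contents).
loaded = {
    name: types.ModuleType(name)
    for name in (
        "edenscm",
        "edenscm.mercurial",
        "edenscm.mercurial.error",
        "mercurial",
        "mercurial.error",
        "mercurialish",  # unrelated package that merely shares a prefix
    )
}

offenders = find_legacy_imports(loaded)
print(offenders)  # ['mercurial', 'mercurial.error']
```

In a live process the same helper could be given `dict(sys.modules)` after loading an extension, and a test run could fail if the returned list is non-empty.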

Reviewed By: phillco

Differential Revision: D13868981

fbshipit-source-id: f4e2513766957fd81d85407994f7521a08e4de48
2019-01-29 17:25:32 -08:00
Jun Wu
22e9000fc9 lz4-pyframe: add compresshc
Summary:
Unfortunately, the required symbols are not exposed by lz4-sys, so we just
declare them ourselves.

Make sure it compresses better:

  In [1]: c=open('/bin/bash').read();
  In [2]: from mercurial.rust import lz4
  In [3]: len(lz4.compress(c))
  Out[3]: 762906
  In [4]: len(lz4.compresshc(c))
  Out[4]: 626970

However, it is much slower on large, hard-to-compress data (where the Rust compresshc is also slower than pylz4's compressHC):

  Benchmarking (easy to compress data, 20MB)...
            pylz4.compress: 10328.03 MB/s
       rustlz4.compress_py:  9373.84 MB/s
          pylz4.compressHC:  1666.80 MB/s
     rustlz4.compresshc_py:  8298.57 MB/s
          pylz4.decompress:  3953.03 MB/s
     rustlz4.decompress_py:  3935.57 MB/s
  Benchmarking (hard to compress data, 0.2MB)...
            pylz4.compress:  4357.88 MB/s
       rustlz4.compress_py:  4193.34 MB/s
          pylz4.compressHC:  3740.40 MB/s
     rustlz4.compresshc_py:  2730.71 MB/s
          pylz4.decompress:  5600.94 MB/s
     rustlz4.decompress_py:  5362.96 MB/s
  Benchmarking (hard to compress data, 20MB)...
            pylz4.compress:  5156.72 MB/s
       rustlz4.compress_py:  5447.00 MB/s
          pylz4.compressHC:    33.70 MB/s
     rustlz4.compresshc_py:    22.25 MB/s
          pylz4.decompress:  2375.42 MB/s
     rustlz4.decompress_py:  5755.46 MB/s

Note that python-lz4 was using an ancient version of lz4, so there could be differences.
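
For readers who want to reproduce the shape of such a comparison without the lz4 bindings, here is a minimal sketch of a throughput measurement. It is not the benchmark script used for the numbers above: it assumes MB/s means megabytes of input per second, and it uses the standard library's `zlib` at levels 1 and 9 as a stand-in for a fast mode versus a high-compression mode.

```python
import os
import time
import zlib


def throughput_mb_s(func, data, repeat=3):
    """Best-of-`repeat` throughput of func(data), in MB of input per second."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func(data)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero elapsed time on very coarse clocks.
    return len(data) / (1024 * 1024) / max(best, 1e-9)


SIZE = 2 * 1024 * 1024
easy = b"a" * SIZE        # highly compressible input
hard = os.urandom(SIZE)   # essentially incompressible input

# Stand-ins only: zlib levels, not lz4.compress / lz4.compresshc.
codecs = {
    "fast (zlib level 1)": lambda d: zlib.compress(d, 1),
    "high (zlib level 9)": lambda d: zlib.compress(d, 9),
}

for label, data in (("easy", easy), ("hard", hard)):
    print("Benchmarking (%s to compress data)..." % label)
    for name, func in codecs.items():
        speed = throughput_mb_s(func, data)
        print("%24s: %10.2f MB/s, %d bytes out" % (name, speed, len(func(data))))
```

Reporting the output size next to the speed makes the trade-off visible in one table: the high-compression mode is only worth its cost when the size reduction matters more than throughput.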

Reviewed By: DurhamG

Differential Revision: D13528200

fbshipit-source-id: 6be1c1dd71f57d40dcffcc8d212d40a853583254
2018-12-20 17:54:22 -08:00
Jun Wu
08981fee2e rustlz4: use zero-copy return type
Summary:
Use the newly added zero-copy method to improve Rust lz4 performance. It's now
roughly as fast as python-lz4 when tested by stresstest-compress.py:

  Benchmarking (easy to compress data)...
            pylz4.compress: 10461.62 MB/s
       rustlz4.compress_py:  9379.41 MB/s
          pylz4.decompress:  3802.85 MB/s
     rustlz4.decompress_py:  3975.61 MB/s
  Benchmarking (hard to compress data)...
            pylz4.compress:  5341.69 MB/s
       rustlz4.compress_py:  5012.30 MB/s
          pylz4.decompress:  6768.17 MB/s
     rustlz4.decompress_py:  6651.08 MB/s

(Note: decompress can be visibly faster if we return `bytearray` instead of
`bytes`. However, a lot of places expect `bytes`.)

Previously, the results looked like:

  Benchmarking (easy to compress data)...
            pylz4.compress: 10810.05 MB/s
       rustlz4.compress_py: 11175.36 MB/s
          pylz4.decompress:  3868.92 MB/s
     rustlz4.decompress_py:   634.56 MB/s
  Benchmarking (hard to compress data)...
            pylz4.compress:  4565.91 MB/s
       rustlz4.compress_py:   622.94 MB/s
          pylz4.decompress:  6887.76 MB/s
     rustlz4.decompress_py:  2854.79 MB/s

Note that this changes the return type of the `compress` function from `bytes`
to `bytearray`. `decompress` still returns `bytes`, which is important for
compatibility. Zero-copy `bytes` cannot be implemented for `compress`, because
the size of the `PyBytes` is unknown in advance and so it cannot be
pre-allocated.
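
The compatibility concern with `bytearray` can be illustrated from the Python side. This is a small sketch using only built-in types. The `compressed` value is a made-up stand-in for a `compress` result, and `as_bytes` is a hypothetical helper, not part of the bindings.

```python
def as_bytes(buf):
    """Return `buf` unchanged if it is already bytes, else copy it into bytes."""
    return buf if isinstance(buf, bytes) else bytes(buf)


# Made-up stand-in for what a bytearray-returning compress() might hand back.
compressed = bytearray(b"\x04\x00\x00\x00data")

# Code that type-checks for bytes does not accept a bytearray.
is_bytes = isinstance(compressed, bytes)

# bytearray is mutable and therefore unhashable, so it cannot be a dict key.
try:
    hash(compressed)
    hashable = True
except TypeError:
    hashable = False

# Buffer-protocol consumers work without copying.
view = memoryview(compressed)

# Concatenating onto bytes yields bytes (this copies the payload).
joined = b"header:" + compressed

print(is_bytes, hashable, len(view), type(joined).__name__)
# prints: False False 8 bytes
print(type(as_bytes(compressed)).__name__)
# prints: bytes
```

Callers that only write the result to a file or socket can use the `bytearray` directly, while callers that need hashing or a strict `bytes` type pay for one explicit copy.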

Reviewed By: DurhamG

Differential Revision: D13516211

fbshipit-source-id: b21f852c390722c086aa2f37a758bf3f58af31b4
2018-12-20 17:54:22 -08:00
Jun Wu
3b35a77fe8 rustlz4: expose lz4-pyframe to Python
Summary:
This is intended to replace the python-lz4 library so we have a unified code
path.

However, the added benchmark indicates the Rust version is significantly slower
than python-lz4:

  Benchmarking (easy to compress data)...
            pylz4.compress: 10964.14 MB/s
       rustlz4.compress_py: 12126.00 MB/s
          pylz4.decompress:  3908.29 MB/s
     rustlz4.decompress_py:   798.68 MB/s
  Benchmarking (hard to compress data)...
            pylz4.compress:  5615.86 MB/s
       rustlz4.compress_py:   740.32 MB/s
          pylz4.decompress:  6145.68 MB/s
     rustlz4.decompress_py:  2423.99 MB/s

The only case where the Rust version is fine is when the returned data is
small. That suggests rust-cpython was likely doing some memcpy unnecessarily.

Reviewed By: DurhamG

Differential Revision: D13516207

fbshipit-source-id: 72150b15c38bc8d8c7e7717a56a41f48d114db19
2018-12-20 17:54:21 -08:00