Commit Graph

44837 Commits

Author SHA1 Message Date
Maksim Solovjov
a42978939e Add exclude functionality to dirsync
Reviewed By: DurhamG

Differential Revision: D13607512

fbshipit-source-id: 80d48eab8fb49d209f856dd82ba084bf2c059150
2019-01-15 07:14:23 -08:00
Mark Thomas
9d6b58b5cd localrepo: add automigrate mechanism
Summary:
Generalise the `migrateonpull` mechanism of `treestate` into a generic
`automigrate` step that is invoked at the start of pulling.  This will be used
for other migrations in the future.

Reviewed By: liubov-dmitrieva

Differential Revision: D13608718

fbshipit-source-id: d558dc21176a6b8d786836d06414e3fc88a20d47
2019-01-15 07:05:46 -08:00
Mark Thomas
3b9eb801e1 types: use Fallible
Summary:
Use the `Fallible` type alias provided by `failure` rather than defining our
own.

Differential Revision: D13657313

fbshipit-source-id: ae249bc15037cc2be019ce7ce8a440c153aa31cc
2019-01-15 03:50:47 -08:00
Mark Thomas
3570402d79 watchman_client: use Fallible
Summary:
Use the `Fallible` type alias provided by `failure` rather than defining our
own.

Differential Revision: D13657312

fbshipit-source-id: 55134ee93f1f3aaaeefe5644a4a1f2285603bc1c
2019-01-15 03:50:47 -08:00
Mark Thomas
7f1258f091 commitcloudsubscriber: use Fallible
Summary:
Use the `Fallible` type alias provided by `failure` rather than defining our
own.

Differential Revision: D13657314

fbshipit-source-id: f1a379089972f7f0066c49ddedf606d36b7ac260
2019-01-15 03:50:47 -08:00
Mark Thomas
d3709fde5b mononokeapi: use Fallible
Summary:
Use the `Fallible` type alias provided by `failure` rather than defining our
own.

Differential Revision: D13657310

fbshipit-source-id: cae73fc239a6ad30bb6ef56a664d1ef5a2a19b5f
2019-01-15 03:50:47 -08:00
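The change in each of the four crates above is the same: drop the locally defined alias and use the one re-exported by `failure`. A minimal sketch of the pattern (the function below is illustrative, not code from the repo):

```
use failure::{format_err, Fallible};

// Instead of a locally defined `type Fallible<T> = Result<T, failure::Error>`,
// use the alias re-exported by the `failure` crate.
fn parse_port(s: &str) -> Fallible<u16> {
    // `?` converts the parse error into `failure::Error`.
    let port: u16 = s.parse()?;
    if port == 0 {
        return Err(format_err!("port must be non-zero"));
    }
    Ok(port)
}

fn main() -> Fallible<()> {
    println!("{}", parse_port("8080")?);
    Ok(())
}
```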
Xavier Deguillard
f170cceea2 revisionstore: Repackable::delete now takes the ownership of self.
Summary:
On some platforms, removing a file can fail if it's still mapped or opened. In
mercurial, this can happen during repack as the datapacks are removed while
still being mapped.

Reviewed By: DurhamG

Differential Revision: D13615938

fbshipit-source-id: fdc1ff9370e2767e52ee1828552f4598105f784f
2019-01-14 21:14:13 -08:00
Xavier Deguillard
736d0ba7d4 pyrevisionstore: Use a RefCell<Option<_>> instead of the {Data,History}Pack
Summary:
The cpython crate forces all the methods to take a `&self`, which forbids
modification of the embedded pack data structure. Its documentation recommends
using interior mutability for this purpose. Most of the code is wrapped to
avoid lots of boilerplate.

Reviewed By: DurhamG

Differential Revision: D13640638

fbshipit-source-id: 3b7513b6117d429322efe32868e683239c68806e
2019-01-14 21:14:13 -08:00
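A minimal sketch of the interior-mutability pattern being described, in plain Rust rather than the actual `py_class!` bindings; `DataPackHandle` and its methods are illustrative stand-ins:

```
use std::cell::RefCell;

// Stand-in for the real DataPack type.
struct DataPack {
    path: String,
}

struct DataPackHandle {
    // Interior mutability: the pack can be replaced or consumed even though
    // the binding methods only receive `&self`.
    inner: RefCell<Option<DataPack>>,
}

impl DataPackHandle {
    fn new(path: &str) -> Self {
        DataPackHandle {
            inner: RefCell::new(Some(DataPack { path: path.to_string() })),
        }
    }

    // A `&self` method that takes ownership of the wrapped pack, e.g. for delete().
    fn take_pack(&self) -> Option<DataPack> {
        self.inner.borrow_mut().take()
    }
}

fn main() {
    let handle = DataPackHandle::new("store/pack-1");
    let pack = handle.take_pack().expect("pack already taken");
    println!("took {}", pack.path);
}
```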
Durham Goode
dbca4b04d8 hgsubversion: don't circumvent revmap abstraction in updatemeta
Summary:
In D13433558 we moved lastpulled into the sqlite database for sqlite
backed revmaps. It turns out the update/rebuildmeta code circumvents the revmap
abstraction and attempts to read the lastpulled file directly from disk. In a
sql backed world, this attempt fails and it then rebuilds the entire revmap
which is very slow.

There was a comment stating the code was intentionally not reading from the
revmap, but I don't believe it applies anymore, and the tests pass fine with
this new change.

Reviewed By: singhsrb

Differential Revision: D13662697

fbshipit-source-id: 2db8f346d89053604d34fbda8f531f688cf71210
2019-01-14 20:31:34 -08:00
Xavier Deguillard
678fd5c0fe remotefilelog: Remove old temporary files.
Summary:
We've observed some users with very large hgcache directories that were filled
with temporary pack files that for some reason were not removed/renamed on a
previous repack.

These files can appear for a variety of reasons, such as forcibly killing hg, a
host power-off, or simply a bug in mercurial. The latter case is likely what
causes some of the hgcache directories to grow, and this patch doesn't attempt
to find the underlying mercurial issue. Rather, let's alleviate the problem by
simply removing temporary files older than 24h.

Reviewed By: ikostia

Differential Revision: D13646642

fbshipit-source-id: faa0605e322d440a75187e2517cbbcb13031dae0
2019-01-14 14:56:54 -08:00
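A rough sketch of this kind of cleanup using only std APIs; the directory, prefix, and function name are illustrative, not the actual remotefilelog logic:

```
use std::fs;
use std::io;
use std::time::{Duration, SystemTime};

/// Delete files in `dir` whose name starts with `prefix` and whose
/// modification time is older than `max_age`.
fn remove_stale_temp_files(dir: &str, prefix: &str, max_age: Duration) -> io::Result<()> {
    let now = SystemTime::now();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let name = entry.file_name();
        if !name.to_string_lossy().starts_with(prefix) {
            continue;
        }
        let modified = entry.metadata()?.modified()?;
        if now.duration_since(modified).unwrap_or_default() > max_age {
            // Ignore errors: another process may have removed the file already.
            let _ = fs::remove_file(entry.path());
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Hypothetical cache path and temp-file prefix; 24 hours as in the commit.
    remove_stale_temp_files("/tmp/hgcache", "tmp", Duration::from_secs(24 * 60 * 60))
}
```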
Aida Getoeva
7046f81a00 bisect -c: fix empty changeset after sparse skip
Summary: If there is no changeset left after a sparse skip in `--command` mode, show the result and return, as is done in manual testing mode.

Reviewed By: markbt

Differential Revision: D13650568

fbshipit-source-id: 8e867a38858d84d9a10078b74e2087318c81b01e
2019-01-14 11:55:11 -08:00
Aida Getoeva
7775a9c220 bisect: test -c in case when all nodes skipped
Summary: Add a new test showing that a sparse skip with `--command` doesn't produce the correct answer.

Reviewed By: markbt

Differential Revision: D13650567

fbshipit-source-id: f4b6670fe67d6ef2543efedd91d9760e1e6bc74c
2019-01-14 11:55:11 -08:00
Xavier Deguillard
da3dd2319f revisionstore: remove repacked pack files
Summary:
After repacking the data/history packs, we need to clean up the
repacked files. This was an omission from D13363853.

Reviewed By: markbt

Differential Revision: D13577592

fbshipit-source-id: 36e7d5b8e86affe47cdd10d33a769969f02b8a62
2019-01-11 16:54:15 -08:00
Xavier Deguillard
ce16778656 remotefilelog: set proper file permissions on closed mutable packs.
Summary:
The Python version of the mutable packs sets their permissions to read-only
after writing them, while the Rust version keeps them writable. Let's make the
Rust one consistent with that.

Reviewed By: markbt

Differential Revision: D13573572

fbshipit-source-id: 61256994562aa09058a88a7935c16dfd7ddf9d18
2019-01-11 16:54:15 -08:00
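For reference, a minimal sketch of marking a finished pack read-only with std APIs (the path is illustrative):

```
use std::fs;
use std::io;

fn mark_read_only(path: &str) -> io::Result<()> {
    let mut perms = fs::metadata(path)?.permissions();
    // Clear the write bits so the closed pack cannot be modified in place.
    perms.set_readonly(true);
    fs::set_permissions(path, perms)
}

fn main() -> io::Result<()> {
    mark_read_only("store/packs/abcdef.datapack")
}
```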
Liubov Dmitrieva
603daf671b Back out "[mercurial] enable modern way to transport phases on client side"
Summary:
Original commit changeset: 63c72654a345

Backing out because there are issues in the implementation of this transport on the Mononoke side.

It requires better testing.

Reviewed By: StanislavGlebik

Differential Revision: D13635800

fbshipit-source-id: 1cdd43658889f68cd13df757ca6e21de01140dc9
2019-01-11 08:01:44 -08:00
Harvey Hunt
cc3b3d869a archival: Use manifest.matches to improve extdiff performance
Summary:
Previously, archival.py ran a matchfn on all files reported by the manifest. This is slow for treemanifest repos.
Update this to allow the manifest to run the matchfn itself, which is considerably quicker.

Reviewed By: DurhamG, markbt

Differential Revision: D13624052

fbshipit-source-id: eca022170e6bb0c0cf3c845bcafcb4e35207e825
2019-01-11 03:12:59 -08:00
Chad Austin
a948959e2e hg: improve connection closed early error message
Summary:
Several people observed this "connection closed early" message last week
and were unable to go further, so perhaps it would be helpful to
include a bit of additional data in the error message.

Reviewed By: quark-zju

Differential Revision: D13165697

fbshipit-source-id: 8f7d9d29d52697393a8474d6b8697099b0e33442
2019-01-10 21:21:59 -08:00
Liubov Dmitrieva
2086ce2ed6 enable modern way to transport phases on client side
Summary:
Upstream hg started using a separate bundle2 part to push phases, but we are
stuck with the old way.

The old way was enabled in facebook.rc but nowhere in the tests except for the
one (where I disabled it) proving that the new way works.

The original commit that introduced the config option as temporary mentions
pushrebase:

changeset:   877f6928075428a4470bd399c82c9b1e9eaba9ad  D6156039
user:        Durham Goode <durham@fb.com>
date:        Wed, 25 Oct 2017 16:39:02 -0800

    configs: set devel.legacy.exchange=phases

    Summary:
      Upstream has enable sending phase bundle parts by default now, but
      our server doesn't have the pushrebase fix to make this work.  Let's
      temporarily disable this until we've updated the servers.

I see that pushrebase support was implemented, but the temporary disabling
ended up being permanent.

changeset:   f6cf77230612bdf8be0c7fdd6ab21737fdaf35bf  D1204
user:        Stanislau Hlebik <stash@fb.com>
date:        Mon, 23 Oct 2017 09:36:16 -0800

    pushrebase: handle pushing phases through separate bundle2 part

    Differential Revision: https://phab.mercurial-scm.org/D1204

I would like to switch to this transport because it is the one we support on
the Mononoke side, and I don't want client-side configs to differ depending on
whether Mononoke is used or not.

Reviewed By: DurhamG

Differential Revision: D13622994

fbshipit-source-id: 63c72654a34584ad31d17b174660166b46f087fb
2019-01-10 14:58:28 -08:00
Mark Thomas
30eb09c931 commitcloud: don't autojoin users who have manually disconnected
Summary:
If a user manually disconnects from Commit Cloud Sync, their next background
backup will automatically reconnect them if `commitcloud.autocloudjoin` is set.

Make the `autocloudjoin` setting only work if the user has never connected
to a workspace before.  Distinguish the two cases by leaving a
`commitcloudrc` file in place after disconnecting.

Reviewed By: liubov-dmitrieva

Differential Revision: D13621476

fbshipit-source-id: ffccd473cb3da592e5b991dd863b8afed45dc83a
2019-01-10 06:37:20 -08:00
Mark Thomas
3298b1c7ab commitcloud: don't updateonmove if backups are disabled
Summary:
If `commitcloud.updateonmove` is set, but backups are disabled, we shouldn't
still follow any new obsmarkers that have appeared (e.g. because of
pullcreatemarkers) when cloud sync runs.

Reviewed By: liubov-dmitrieva

Differential Revision: D13599478

fbshipit-source-id: 10c5c190a08fe5cf72cdfd0165ca61928c0d1800
2019-01-09 07:56:54 -08:00
Mark Thomas
093bb82cca commitcloud: don't updateonmove if the new commit is public
Summary:
The pullcreatemarkers extension interacts badly with commit cloud sync's
updateonmove.  If the current commit has been landed, the next cloud sync
will follow the marker to the landed commit.  This is not usually what the user
wants, and slows down usage as they have to wait for the update to finish
before they can continue working.

Reviewed By: liubov-dmitrieva

Differential Revision: D13599477

fbshipit-source-id: f25d50d9dcb023894f2459e632fbd5ff4d172dd0
2019-01-09 06:12:53 -08:00
Mark Thomas
98417b1ffb configparser: fix warning about unused Result
Summary:
Use of `write!` requires checking for errors; however, in this case there is no
need to use `write!`, as we just want the error as a string.

Reviewed By: ikostia

Differential Revision: D13596497

fbshipit-source-id: 5892025344936936188cf3a8ca227e71eff57d55
2019-01-08 06:19:55 -08:00
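A minimal illustration of the warning and the fix: `write!` into a `String` returns a `Result` that must be handled, while `format!` produces the string directly. The error type and functions below are made up for the example:

```
use std::fmt::Write;

#[derive(Debug)]
struct ParseError {
    line: usize,
    message: String,
}

fn describe_with_write(err: &ParseError) -> String {
    let mut out = String::new();
    // `write!` returns a Result; dropping it triggers the unused_must_use
    // warning, so the Result has to be checked or explicitly ignored.
    let _ = write!(out, "line {}: {}", err.line, err.message);
    out
}

fn describe_with_format(err: &ParseError) -> String {
    // No Result to worry about: we just want the error as a string.
    format!("line {}: {}", err.line, err.message)
}

fn main() {
    let err = ParseError { line: 3, message: "unexpected token".to_string() };
    assert_eq!(describe_with_write(&err), describe_with_format(&err));
    println!("{}", describe_with_format(&err));
}
```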
Adam Simpkins
47b14e9c2e have fsmonitor explicitly avoid wrapping Eden repository objects
Summary:
Teach the fsmonitor extension about Eden, and have it explicitly avoid
wrapping repository objects for Eden-backed repositories.

Reviewed By: quark-zju

Differential Revision: D13523302

fbshipit-source-id: d1114b24311a933fe46baef74d3e514778bd400b
2019-01-07 13:01:35 -08:00
Mark Thomas
4ce2783f74 remotefilelog: don't prune commits with null linknodes
Summary:
When building changegroups, remotefilelog omits filenodes for which the
linknode is known to be available at the server.  Since linknodes may
now be null, we need to include these filenodes, as we don't know
whether the linknode is available or not.

Reviewed By: quark-zju

Differential Revision: D13504563

fbshipit-source-id: 8d7106d32f4ec3f2e006b253d68b32c031638b4d
2019-01-02 04:43:58 -08:00
Mark Thomas
bde717a925 treemanifest: fixup linknodes when sending to the server
Summary:
When sending a bundle to the server, ensure that the treemanifest linknodes
correctly refer to a commit that is in the bundle.

Treemanifest ignores the linknodes of the subtrees - instead they inherit the
linknode of the root manifest in which they were introduced, so we only need to
fix up the linknode for the root manifest and propagate that to the subtrees.

Reviewed By: quark-zju

Differential Revision: D13504564

fbshipit-source-id: 2f481a4939239784d84d5db12c70d473b3045610
2019-01-02 04:43:58 -08:00
Mark Thomas
00951e42c2 treemanifest: add test demonstrating problem with linknodes
Summary:
When we send a bundle to the server, the treemanifest nodes are sent with
whatever linknode they have.  This linknode might not refer to a commit that
the server knows about, which makes the bundle invalid.  Add a test that
demonstrates that the server aborts in this case.

Pushrebase works fine as it rewrites the commits, generating new linknodes for
all of the trees.

Reviewed By: quark-zju

Differential Revision: D13504565

fbshipit-source-id: 39894d367c111aea5cef7de2d7da122e39d9debe
2019-01-02 04:43:58 -08:00
Mark Thomas
2915dd2086 remotefilelog: don't store draft remotefilelog blobs in memcache
Summary:
If we receive a remotefilelog blob from the server where the history
information contains a null linknode, don't store that blob in memcache.

Reviewed By: quark-zju

Differential Revision: D13517715

fbshipit-source-id: 6ea68391ab2488db223ca261e8303ea24e091915
2019-01-02 04:43:58 -08:00
Mark Thomas
6ae8397b26 pymononokeapi: use cpython-failure for PyErr generation
Summary:
Use the cpython-failure crate for generating `PyResult` from `Result` by
mapping to a `PyErr`.

Reviewed By: DurhamG

Differential Revision: D13464988

fbshipit-source-id: d927f89c111dce737b59905ceeab1d30381a8510
2019-01-02 04:13:20 -08:00
Jun Wu
f6158659f8 configparser: use hardcoded system config path on Windows
Summary:
When I was debugging an eden importer issue with Puneet, we saw errors caused
by important extensions (ex. remotefilelog, lz4revlog) not being loaded.  It
turned out that configparser was checking the "exe dir" to decide where to
load "system configs". For example, if we run:

  C:\open\fbsource\fbcode\scm\hg\build\pythonMSVC2015\python.exe eden_import_helper.py

The "exe dir" is "C:\open\fbsource\fbcode\scm\hg\build", and system config is
not there.

Instead of copying "mercurial.ini" to every possible "exe dir", this diff just
switches to a hard-coded system config path. It's now consistent with what we
do on POSIX systems.

The logic to copy "mercurial.ini" to "C:\open\fbsource\fbcode\scm\hg" or
"C:\tools\hg" becomes unnecessary and is removed.

Reviewed By: singhsrb

Differential Revision: D13542939

fbshipit-source-id: 5fb50d8e42d36ec6da28af29de89966628fe5549
2018-12-22 01:53:03 -08:00
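A rough sketch of the behavioural change; the commit does not state the exact hard-coded path, so the paths and function names below are purely illustrative:

```
use std::env;
use std::path::PathBuf;

/// Old behaviour: derive the system config location from the running executable.
fn system_config_from_exe_dir() -> Option<PathBuf> {
    let exe = env::current_exe().ok()?;
    Some(exe.parent()?.join("mercurial.ini"))
}

/// New behaviour (sketch): use one hard-coded location on Windows, mirroring
/// what is done on POSIX systems. Both paths here are illustrative only.
fn system_config() -> PathBuf {
    if cfg!(windows) {
        PathBuf::from(r"C:\ProgramData\Facebook\Mercurial\mercurial.ini")
    } else {
        PathBuf::from("/etc/mercurial/hgrc")
    }
}

fn main() {
    println!("old: {:?}", system_config_from_exe_dir());
    println!("new: {:?}", system_config());
}
```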
Saurabh Singh
b193e23dd2 test-check-fix-code: unbreak test by fixing copyrights
Summary:
`test-check-fix-code.t` was failing due to the copyright header missing
from certain files. This commit fixes the files by running

```
contrib/fix-code.py FILE
```

as suggested in the failure message.

Reviewed By: DurhamG

Differential Revision: D13538506

fbshipit-source-id: d8063c9a0e665377a9976abeccb68fbef6781950
2018-12-21 10:03:26 -08:00
Jun Wu
c2b973b47a absorb: fix message when there are only deleted commits
Summary:
Even if no new commits are created, absorb might still have "applied" some
changes by deleting commits. Let's fix the end-user message.

Reviewed By: DurhamG

Differential Revision: D13531959

fbshipit-source-id: 4d942f3ccd8201e8b62c8bc1c86227d41021b5f9
2018-12-20 17:54:23 -08:00
Jun Wu
5e0fc4c563 absorb: use scmutil.cleanupnodes
Summary:
`scmutil.cleanupnodes` was initially ported from absorb to simplify other
commands. Now use it to simplify absorb itself.

This fixes a crash when `self.finalnode` is empty (e.g. no new commits are
created, only commits deleted).

Reviewed By: DurhamG

Differential Revision: D13531961

fbshipit-source-id: 7006b5ac5dfc4db897413d18ccd26eedde3c98d9
2018-12-20 17:54:22 -08:00
Jun Wu
a74541c40c scmutil: make cleanupnodes return calculated moves
Summary: It's used in the upcoming patches.

Reviewed By: DurhamG

Differential Revision: D13531962

fbshipit-source-id: 9caf6c3d5ef079082c9fc677ff2c2ef0e492a1db
2018-12-20 17:54:22 -08:00
Jun Wu
de6a5ca10d scmutil: revise cleanupnodes behavior when moving bookmarks backwards
Summary:
Previously, when a commit did not have a replacement, its bookmark would be
moved far back to a commit that is not being replaced. For example, when A::C is
rebased to A'::C' and B disappears due to being empty, the old code would move
BOOK-B to Z, while the new code moves BOOK-B to A':

  C             C'
  |             |
  B BOOK-B  ->  |
  |             |
  A      ------ A' <- new BOOK-B
  |     /
  Z ----           <- old BOOK-B

Note that the current `rebase` implementation overrides the "moves" calculation
used in cleanupnodes and already has the new behavior, so there are no changes
in the rebase tests.

Right now, the real intended user is absorb, so I'm not adding new tests here.
Without this change, when absorb migrates to cleanupnodes, its tests would
break.

Reviewed By: DurhamG

Differential Revision: D13531964

fbshipit-source-id: 03b6afa116e1a7b08b33a2c8856f2e52d6f8043a
2018-12-20 17:54:22 -08:00
Jun Wu
e07d80c6af absorb: stop writing absorb_source metadata
Summary:
This was a workaround for the upstream obsmarker design, which cannot support
cycles. Our internal obsmarkers can now handle cycles just fine, and the
upcoming mutation metadata makes it impossible to have cycles, so drop the
workaround.

Reviewed By: DurhamG

Differential Revision: D13531960

fbshipit-source-id: d569172f0d2d5a3b4e1f6589be44ac21a09604f3
2018-12-20 17:54:22 -08:00
Jun Wu
22ee659eec absorb: default to "yes" for prompting changes
Summary: Otherwise absorb never works with HGPLAIN=1.

Reviewed By: DurhamG

Differential Revision: D13531963

fbshipit-source-id: af598b985501db425405f0c851e196e9eddc2350
2018-12-20 17:54:22 -08:00
Jun Wu
1251bb0736 tests: add a test logging files with filenode collision
Summary:
I have been wondering how hg behaves with filenode collisions, so I added some
tricky-looking tests about it. They actually show that the existing logic is
problematic :(

Reviewed By: DurhamG

Differential Revision: D13011554

fbshipit-source-id: fffb026e05adc8d8de4a1e5692bbee57293cce4e
2018-12-20 17:54:22 -08:00
Jun Wu
03a4b9d606 setup: embed Cython
Summary:
Use an `asset` to download Cython on demand, so we don't need to install Cython
as a build dependency on all supported platforms, or maintain the "Cython"
package for those platforms.

Also upgrade to the latest Cython while we're at it.

Reviewed By: singhsrb

Differential Revision: D13513514

fbshipit-source-id: 5ebe9a3e5b785a8f85cd51624663f9cc1e5c66fd
2018-12-20 17:54:22 -08:00
Jun Wu
94565d0386 lz4: use Rust lz4 binding
Summary:
Drop the dependency on `python-lz4`.

Add some conversions from bytearray to bytes to keep the code compatible.

Reviewed By: DurhamG

Differential Revision: D13516212

fbshipit-source-id: 89beb0aa92be4c5442a8e837f509e1eb17bb1512
2018-12-20 17:54:22 -08:00
Jun Wu
fafc7c6b1c tests: re-open store after datapack truncation
Summary:
When I replaced lz4 with rust.lz4, the test failed; this change fixes it. That
also means datapack corruption detection is not that reliable. However, usually
those files are not changed once they are loaded into an in-memory store, so
it's probably fine.

Also note that the python-lz4 used in production has strange behavior when
compressing an empty string:

  In [6]: lz4.compressHC('')
  Out[6]: '\x00\x00\x00\x00\xa0#\xd9\x040\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00'
  In [7]: lz4.compress('')
  Out[7]: '\x00\x00\x00\x00\xa0#\xd9\x040\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00'

  In [9]: rustlz4.compress('')
  Out[9]: bytearray(b'\x00\x00\x00\x00')
  In [10]: rustlz4.compresshc('')
  Out[10]: bytearray(b'\x00\x00\x00\x00')

  In [13]: lz4.compress('1')
  Out[13]: '\x01\x00\x00\x00\x101'
  In [14]: rustlz4.compress('1')
  Out[14]: bytearray(b'\x01\x00\x00\x00\x101')

Reviewed By: DurhamG

Differential Revision: D13528199

fbshipit-source-id: 9b3e8674f989062928900766156a97d28262c8cb
2018-12-20 17:54:22 -08:00
Jun Wu
22e9000fc9 lz4-pyframe: add compresshc
Summary:
Unfortunately, the required symbols are not exposed by lz4-sys, so we just
declare them ourselves.

Verify that it compresses better:

  In [1]: c=open('/bin/bash').read();
  In [2]: from mercurial.rust import lz4
  In [3]: len(lz4.compress(c))
  Out[3]: 762906
  In [4]: len(lz4.compresshc(c))
  Out[4]: 626970

It is much slower for larger data, though (and compresshc is slower than pylz4):

  Benchmarking (easy to compress data, 20MB)...
            pylz4.compress: 10328.03 MB/s
       rustlz4.compress_py:  9373.84 MB/s
          pylz4.compressHC:  1666.80 MB/s
     rustlz4.compresshc_py:  8298.57 MB/s
          pylz4.decompress:  3953.03 MB/s
     rustlz4.decompress_py:  3935.57 MB/s
  Benchmarking (hard to compress data, 0.2MB)...
            pylz4.compress:  4357.88 MB/s
       rustlz4.compress_py:  4193.34 MB/s
          pylz4.compressHC:  3740.40 MB/s
     rustlz4.compresshc_py:  2730.71 MB/s
          pylz4.decompress:  5600.94 MB/s
     rustlz4.decompress_py:  5362.96 MB/s
  Benchmarking (hard to compress data, 20MB)...
            pylz4.compress:  5156.72 MB/s
       rustlz4.compress_py:  5447.00 MB/s
          pylz4.compressHC:    33.70 MB/s
     rustlz4.compresshc_py:    22.25 MB/s
          pylz4.decompress:  2375.42 MB/s
     rustlz4.decompress_py:  5755.46 MB/s

Note that python-lz4 was using an ancient version of lz4, so there could be differences.

Reviewed By: DurhamG

Differential Revision: D13528200

fbshipit-source-id: 6be1c1dd71f57d40dcffcc8d212d40a853583254
2018-12-20 17:54:22 -08:00
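A sketch of what declaring the missing symbol by hand looks like. The `LZ4_compress_HC` and `LZ4_compressBound` signatures match the public liblz4 C API, but the wrapper below is illustrative, not the actual lz4-pyframe code, and it assumes a system liblz4 is available to link against:

```
use std::os::raw::{c_char, c_int};

// Declared manually because the binding crate does not expose them.
// Assumes liblz4 is linked (e.g. installed on the system).
#[link(name = "lz4")]
extern "C" {
    fn LZ4_compress_HC(
        src: *const c_char,
        dst: *mut c_char,
        src_size: c_int,
        dst_capacity: c_int,
        compression_level: c_int,
    ) -> c_int;
    fn LZ4_compressBound(input_size: c_int) -> c_int;
}

/// Illustrative safe wrapper: block-compress `data` at the given HC level.
fn compress_hc(data: &[u8], level: i32) -> Vec<u8> {
    unsafe {
        let bound = LZ4_compressBound(data.len() as c_int) as usize;
        let mut out = vec![0u8; bound];
        let written = LZ4_compress_HC(
            data.as_ptr() as *const c_char,
            out.as_mut_ptr() as *mut c_char,
            data.len() as c_int,
            bound as c_int,
            level as c_int,
        );
        out.truncate(written as usize);
        out
    }
}

fn main() {
    let compressed = compress_hc(b"hello hello hello hello hello", 9);
    println!("{} compressed bytes", compressed.len());
}
```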
Jun Wu
4f24bffdde cpython-ext: move pybuf to cpython-ext
Summary:
`pybuf` provides a way to read `bytes`, `bytearray`, and some `buffer` types in
a zero-copy way. The main benefit is using the same code to support different
input types. It's currently copied to a couple of places; let's move it to
`cpython-ext`.

Reviewed By: DurhamG

Differential Revision: D13516206

fbshipit-source-id: f58881c4bfe651a6fdb84cf317a74c3c8d7a4961
2018-12-20 17:54:22 -08:00
Jun Wu
08981fee2e rustlz4: use zero-copy return type
Summary:
Use the newly added zero-copy method to improve Rust lz4 performance. It's now
roughly as fast as python-lz4 when tested by stresstest-compress.py:

  Benchmarking (easy to compress data)...
            pylz4.compress: 10461.62 MB/s
       rustlz4.compress_py:  9379.41 MB/s
          pylz4.decompress:  3802.85 MB/s
     rustlz4.decompress_py:  3975.61 MB/s
  Benchmarking (hard to compress data)...
            pylz4.compress:  5341.69 MB/s
       rustlz4.compress_py:  5012.30 MB/s
          pylz4.decompress:  6768.17 MB/s
     rustlz4.decompress_py:  6651.08 MB/s

(Note: decompress can be visibly faster if we return `bytearray` instead of
`bytes`. However a lot of places expect `bytes`)

Previously, the result looks like:

  Benchmarking (easy to compress data)...
            pylz4.compress: 10810.05 MB/s
       rustlz4.compress_py: 11175.36 MB/s
          pylz4.decompress:  3868.92 MB/s
     rustlz4.decompress_py:   634.56 MB/s
  Benchmarking (hard to compress data)...
            pylz4.compress:  4565.91 MB/s
       rustlz4.compress_py:   622.94 MB/s
          pylz4.decompress:  6887.76 MB/s
     rustlz4.decompress_py:  2854.79 MB/s

Note this changes the return type from `bytes` to `bytearray` for the
`compress` function. `decompress` still returns `bytes`, which is important for
compatibility. Note that zero-copy `bytes` cannot be implemented for `compress`
- the size of the `PyBytes` is unknown and cannot be pre-allocated.

Reviewed By: DurhamG

Differential Revision: D13516211

fbshipit-source-id: b21f852c390722c086aa2f37a758bf3f58af31b4
2018-12-20 17:54:22 -08:00
Jun Wu
f23c6bc7e3 cpython-ext: add a way to pre-allocate PyBytes
Summary: Make it possible to write content directly into a PyBytes buffer.

Reviewed By: DurhamG

Differential Revision: D13528202

fbshipit-source-id: 8c0a4ed030439a8dc40cdfbd72b1f6734a8b2036
2018-12-20 17:54:22 -08:00
Jun Wu
6e88ac4794 lz4-pyframe: provide decompress_into API
Summary:
This allows decompressing into a pre-allocated buffer. After some experiments,
it seems `bytearray` would just break too many things, e.g.:

- bytearray is not hashable
- bytearray[index] returns an int
- a = bytearray('x'); b = a; b += '3' # will mutate 'a'
- ''.join([bytearray('')]) will raise TypeError

Therefore we have to use zero-copy `bytes` instead, which is less elegant. But
this API change is a step forward.

Reviewed By: DurhamG

Differential Revision: D13528201

fbshipit-source-id: 1cfaf5d55efdc0d6c0df85df9960fe9682028b08
2018-12-20 17:54:22 -08:00
Jun Wu
7831e2a4ce cpython-ext: add ways to zero-copy Vec<u8> into a Python object
Summary:
I need to convert `Vec<u8>` to a Python object in a zero-copy way for rustlz4
performance.

Assuming Python and Rust use the same memory allocator, it's possible to
transfer control of a malloc-ed pointer from Rust to Python. Use this to
implement zero-copy. PyByteArrayObject is chosen because its struct contains
such a pointer. PyBytes cannot be used, as it embeds the bytes inline rather
than using a pointer.

Sadly, there are no CPython APIs to do this job, so we have to write to the raw
structures. That means the code will crash if python is replaced by
python-debug (due to the Python object header change). However, that seems less
of an issue given the performance wins. If python-debug does become a problem,
we can try vendoring libpython directly.

I didn't implement a feature-rich `PyByteArray` Rust object. It's not easy to
do so outside the cpython crate. Most helper macros to declare types cannot be
reused, because they refer to `::python`, which is not available in the current
crate.

Reviewed By: DurhamG

Differential Revision: D13516209

fbshipit-source-id: 9aa089b309beb71d4d21f6c63fcb97dbc798b5f8
2018-12-20 17:54:22 -08:00
Jun Wu
3b35a77fe8 rustlz4: expose lz4-pyframe to Python
Summary:
This is intended to replace the python-lz4 library so we have a unified code
path.

However, the added benchmark indicates the Rust version is significantly slower
than python-lz4:

  Benchmarking (easy to compress data)...
            pylz4.compress: 10964.14 MB/s
       rustlz4.compress_py: 12126.00 MB/s
          pylz4.decompress:  3908.29 MB/s
     rustlz4.decompress_py:   798.68 MB/s
  Benchmarking (hard to compress data)...
            pylz4.compress:  5615.86 MB/s
       rustlz4.compress_py:   740.32 MB/s
          pylz4.decompress:  6145.68 MB/s
     rustlz4.decompress_py:  2423.99 MB/s

The only case where the Rust version is fine is when the returned data is
small. That suggests rust-cpython was likely doing some unnecessary memcpy.

Reviewed By: DurhamG

Differential Revision: D13516207

fbshipit-source-id: 72150b15c38bc8d8c7e7717a56a41f48d114db19
2018-12-20 17:54:21 -08:00
Jun Wu
35c85018cd lz4-pyframe: add a benchmark
Summary:
This gives some sense of how fast it is.

Background: I was trying to get rid of python-lz4 by exposing this crate to
Python. However, I noticed it was 10x slower than python-lz4, so I added a
benchmark here to test whether the wrapper or the Rust lz4 code is at fault.

It does not seem to be this crate:

```
  # Pure Rust
  compress (100M)                77.170 ms
  decompress (~100M)             67.043 ms

  # python-lz4
  In [1]: import lz4, os
  In [2]: b=os.urandom(100000000);
  In [3]: %timeit lz4.compress(b)
  10 loops, best of 3: 87.4 ms per loop
```

Reviewed By: DurhamG

Differential Revision: D13516205

fbshipit-source-id: f55f94bbecc3b49667ed12174f7000b1aa29e7c4
2018-12-20 17:54:21 -08:00
Jun Wu
b3893b3d3c indexedlog: add methods on Log to do prefix lookups
Summary:
This exposes the underlying lookup functions from `Index`.

Alternatively we can allow access to `Index` and provide an `iter_started_from`
method on `Log` which takes a raw offset. I have been trying to avoid exposing
raw offsets in public interfaces, as they would change after `flush()` and cause
problems.

Reviewed By: markbt

Differential Revision: D13498303

fbshipit-source-id: 8b00a2a36a9383e3edb6fd7495a005bc985fd461
2018-12-20 15:50:55 -08:00
Jun Wu
3237b77e4c indexedlog: add APIs to lookup by prefix
Summary:
This is the missing API before `indexedlog::Index` can fit the
`changelog.partialmatch` case. It's actually more flexible, as it can provide
some example commit hashes while the existing revlog.c or radixbuf
implementations just error out saying "ambiguous prefix".

It can also be "abused" for the semantics of sorted "sub-keys": replace
"key" with "key + subkey" when inserting into the index, and looking up by
"key" then returns a lazy result list (`PrefixIter`) sorted by "subkey". Note:
the radix tree is NOT efficient (in both time and space) when there are common
prefixes, so this use case needs care.

Reviewed By: markbt

Differential Revision: D13498301

fbshipit-source-id: 637856ebd761734d68b20c15866424b1d4518ad6
2018-12-20 15:50:55 -08:00
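The "sub-key" trick is independent of indexedlog itself; here is a minimal illustration of the same semantics using a std `BTreeSet` as a stand-in for the prefix-searchable index:

```
use std::collections::BTreeSet;
use std::ops::Bound;

fn main() {
    // Insert "key + subkey" as a single entry.
    let mut index: BTreeSet<Vec<u8>> = BTreeSet::new();
    for (key, subkey) in [("file.txt", "node3"), ("file.txt", "node1"), ("other", "node2")] {
        let mut entry = key.as_bytes().to_vec();
        entry.extend_from_slice(subkey.as_bytes());
        index.insert(entry);
    }

    // A prefix lookup on "key" yields the sub-keys in sorted order, which is
    // the semantics described above for `PrefixIter`.
    let prefix = b"file.txt".to_vec();
    let mut upper = prefix.clone();
    upper.push(0xff); // crude exclusive upper bound, fine for the illustration
    for entry in index.range((Bound::Included(prefix.clone()), Bound::Excluded(upper))) {
        println!("{}", String::from_utf8_lossy(&entry[prefix.len()..]));
    }
}
```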