Summary: Per comments on D22429347, add a new `ExtractInnerRef` trait that is similar to `ExtractInner`, but returns a reference to the underlying value. A default implementation is provided for types whose inner value is `Clone + 'static`, so in practice most types will only need to implement `ExtractInnerRef`, whereas the callsite may choose whether it needs a reference or an owned value.
Reviewed By: quark-zju
Differential Revision: D22464158
fbshipit-source-id: 7b97329aedcddb0e51fd242b519e79eba2eed350
Summary: Ensure that all of the components of an EdenAPI response use the same error type.
Reviewed By: quark-zju
Differential Revision: D22443029
fbshipit-source-id: 3e00a8b83677beb5ef2d90630fe9b85760874186
Summary: Add an `add_entry` convenience method to `HgIdMutableDeltaStore`, similar to the one present in `HgIdMutableHistoryStore`.
Reviewed By: quark-zju
Differential Revision: D22443031
fbshipit-source-id: 84fdaae9fbd51e6f2df466b0441ec5f7ce6715f7
Summary:
A common pattern in Mercurial's data storage layer Python bindings is to have a Python object that wraps a Rust object. These Python objects are often passed across the FFI boundary to Rust code, which then may need to access the underlying Rust value.
Previously, the objects that used this pattern did so in an ad-hoc manner, typically by providing an `into_inner` or `to_inner` inherent method. This diff introduces a new `ExtractInner` trait that standardizes this pattern into a single interface, which in turn allows this pattern to be used with generics.
Reviewed By: quark-zju
Differential Revision: D22429347
fbshipit-source-id: cab4c24b8b98c6ef8307f72a9b4726aabdc829cc
Summary:
D22396026 made it so that `HttpClient::send_async` no longer consumes `self`. This means that instead of creating a new HTTP client for each request, we can reuse the same one.
This has the benefit of allowing for connection reuse (which was the point of D22396026), resulting in lower latency for serial fetches.
Reviewed By: quark-zju
Differential Revision: D22397768
fbshipit-source-id: 9d066c1ec64a6aa1b36ec674ef294030c1f90b41
Summary: Allow passing multiple JSON requests to the EdenAPI CLI. The requests will be performed serially, which allows for testing the performance of serial EdenAPI calls.
Reviewed By: quark-zju
Differential Revision: D22397769
fbshipit-source-id: c59e5abf53eee9c2014010672183e202b6f180fc
Summary:
Add a pool of `Multi` handles that the client can reuse across requests.
Previously, `HttpClient`'s async functions had to consume the client in order to have a `'static` lifetime (since `Future`s generally cannot hold references to things outside of themselves). This meant that the each async operation would use its own `Multi` handle, preventing connection reuse across operations since the `Multi` handle maintains a connection cache internally.
With this change, the client can reuse the `Multi` session after an async operation, thereby benefitting from libcurl's caching. Note that the same `Multi` handle still cannot be used by concurrently running `Future`s (as this [would not be thread safe](https://curl.haxx.se/libcurl/c/threadsafe.html)), but once a `Future` has completed its `Multi` handle will return to the pool for use by subsequent requests.
---
(Somewhat tangential)
As is noted in the code comments, `libcurl`'s C API provides a way to share caches across multiple multi sessions: [the "share" interface](https://curl.haxx.se/libcurl/c/libcurl-share.html).
While using this would seems preferable to an ad-hoc solution like this diff, it turns out that the `curl` crate does not provide safe bindings to the share interface. This means that in order to use the share interface, we'd need to directly use the unsafe bindings from `curl-sys`.
In addition to the difficulty of working with unsafe FFI code, the API expects the application to handle synchronization by passing it function pointers to handle locking/unlocking shared resources.
Ultimately, I came to the conclusion that managing lifetimes and synchronization in unsafe code across an FFI boundary would be nontrivial, and ensuring correctness would require a lot of effort that could be avoided by implementing an ad-hoc solution on top of the safe API instead. However, it might make sense to change this to use the share interface in the future.
Reviewed By: quark-zju
Differential Revision: D22396026
fbshipit-source-id: 06eea2ffacdc791527eac9ce4becc457af5c0480
Summary: Update the `revisionstore` and `backingstore` crates to use the new EdenAPI crate.
Reviewed By: quark-zju
Differential Revision: D22378330
fbshipit-source-id: 989f34827b744ff4b4ac0aa10d004f03dbe9058f
Summary: Add a new `EdenApiBlocking` trait that exposes blocking versions of the `EdenApi` trait's methods, for use in non-async code.
Reviewed By: quark-zju
Differential Revision: D22305396
fbshipit-source-id: d0f3a73cad1a23a4f0892a17f18267374e63108e
Summary:
This diff adds an EdenAPI CLI program that allows manually sending requests to the server.
Requests are read from stdin in a JSON format (the same format used by the `make_req` tool and the EdenAPI server integration tests). This makes it easy to create and edit requests during debugging.
Responses are re-serialized as CBOR and written to stdout. (The program will refuse to write output if stdout is a TTY.) These responses can then be analyzed using the `read_res` tool (also used by the EdenAPI server integration tests).
The program prints real-time download statistics during data fetching, allow the user to debug performance in addition to correctness.
The program uses standard `hgrc` files to configure the EdenAPI client, which means that one can simulate production settings by specifying a production `hgrc`. By default, it will read from `~/.hgrc.edenapi` rather than `~/.hgrc` since the user will most likely want to configure this program independently of Mercurial.
Reviewed By: quark-zju
Differential Revision: D22370163
fbshipit-source-id: 5d9974bc05fa960d26cd2c87810f4646e2bc55b4
Summary:
Renamed xdiff functions to avoid linking issues when using both libgit2-sys and xdiff.
When using repo_import tool (https://fburl.com/diffusion/8p6fhjt2) we have libgit2-sys dependency for importing git repos. However, when we derive blame data types, we need to use xdiff functionalities (from_no_parents: https://fburl.com/diffusion/pitukmyo -> diff_hunks: https://fburl.com/diffusion/9f8caan9 -> xdl_diff: https://fburl.com/diffusion/260x66hf). Both libgit2 and eden/scm have vendored versions of xdiff library. Therefore, libgit2-sys and eden/scm share functions with the same signatures, but have different behaviours and when we tried to derive blame, it used libgit2-sys's xdl_diff instead of eden's. This resulted in getting segfaults (https://fburl.com/paste/04gwalpo).
Note: repo_import is the first tool that has tried to import both and the first to run into this issue.
Reviewed By: StanislavGlebik
Differential Revision: D22432330
fbshipit-source-id: f2b965f3926a2dc45de1bf20e41dad70ca09cdfd
Summary:
This diff is a complete, ground-up rewrite of the EdenAPI client. Rather than attempting to use `libcurl` directly, it relies on the new `http_client` crate, which makes the code considerably simpler and allows for a proper async interface.
The most notable change is that `EdenApi` is now an async trait. A blocking API is added later in the stack for use in non-async contexts.
Reviewed By: quark-zju
Differential Revision: D22305397
fbshipit-source-id: 4c1e5d3091d6dd04cf13291e7b7a4217dfdd249f
Summary:
As was pointed out in the review for D22280745 (d73c63d862), `CborStream` is inefficient in situations where the underlying stream produces chunks that are much smaller than the size of the serialized items. To avoid pathological behavior, make `CborStream` buffer the incoming data, and only attempt deserialization if enough data has accumulated.
For now, the buffer size is fixed (with a default of 1MB, chosen arbitrarily). In the future, it might make sense to have the stream adjust the buffer size based on the average size of observed deserialized values.
Reviewed By: quark-zju
Differential Revision: D22370164
fbshipit-source-id: ed940c56ca2cbbfc07f01d47becf6f1d71872872
Summary: On Windows a mmap file cannot be replaced. Detect that and delete manually.
Reviewed By: farnz
Differential Revision: D22428731
fbshipit-source-id: 4d308a07aae02dcaf2aedb7b0267a535c2e09c92
Summary: The 0.3 version (currently being used only in one crate eden/scm/lib/commitcloudsubscriber) is using an old openssl crate which doesn't work with openssl library installed on most machines (Both in FB and on GitHub Actions).
Reviewed By: mitrandir77
Differential Revision: D22430649
fbshipit-source-id: b8fa930841dbcdd4c085d8c9488d768b3526e1c4
Summary: D22381744 updated the version of `futures` in third-party/rust to 0.3.5, but did not regenerate the autocargo-managed Cargo.toml files in the repo. Although this is a semver-compatible change (and therefore should not break anything), it means that affected projects would see changes to all of their Cargo.toml files the next time they ran `cargo autocargo`.
Reviewed By: dtolnay
Differential Revision: D22403809
fbshipit-source-id: eb1fdbaf69c99549309da0f67c9bebcb69c1131b
Summary:
The key is "hit_count" not "hits". This typo caused the trace to always claim
that no data was fetched from memcache, which is obviously not true as the
getpack trace that follows listed significantly less requested keys.
Reviewed By: kulshrax
Differential Revision: D22401592
fbshipit-source-id: ab2ea3e7f8ff3a9c7322678afc8a174e09d6dc09
Summary:
If the revlog on disk was changed to include new commits, read them and avoid
writing duplicated commits (which breaks nodemap building).
Reviewed By: sfilipco
Differential Revision: D22323187
fbshipit-source-id: cdd65f31e65865d9f3868e43416633297896c0f9
Summary: This is used by the next diff.
Reviewed By: sfilipco
Differential Revision: D21944139
fbshipit-source-id: 184c4e97aaeca36c3608665defd1473c9300fb5b
Summary: This will satisfy some use-cases.
Reviewed By: sfilipco
Differential Revision: D21854225
fbshipit-source-id: 76758716b35cfd31dc3843c118917c0fb7609027
Summary: This will help move more Python logic to Rust.
Reviewed By: sfilipco
Differential Revision: D21854224
fbshipit-source-id: b03cbacedc11d77e8c56262437a8d10bd9a89e59
Summary: This is discovered by using it in Python world.
Reviewed By: sfilipco
Differential Revision: D22323186
fbshipit-source-id: 295811e0950b94ad2ad73ad242228b6a3f9765d0
Summary: Adding a same commit multiple times is a no-op.
Reviewed By: sfilipco
Differential Revision: D22323190
fbshipit-source-id: 61a06335581a9cad32dc7e929b841ec69b551a9c
Summary: This adds some test coverage for the revlog DagAlgorithm implementation.
Reviewed By: sfilipco
Differential Revision: D22249157
fbshipit-source-id: a1d347b4d90d0e7f8fb229c317cc75c2b8e16242
Summary:
This makes RevlogIndex compatible with the generic DAG testing API from the dag
crate.
Reviewed By: sfilipco
Differential Revision: D22249156
fbshipit-source-id: 54a3c458e85804968964174eab674e494a6fa8a2
Summary: Some DAG implementations does not support it.
Reviewed By: sfilipco
Differential Revision: D22249158
fbshipit-source-id: ebcdf164677ee647ef44aa1ee3cfd318bac658b0
Summary:
Different implementation might return different orders. They should be
considered correct.
Reviewed By: sfilipco
Differential Revision: D22249159
fbshipit-source-id: 36e4cadf814366f7ee2ed8a778948ff810760550
Summary: This makes it possible to run tests for other DAGs, like the revlog.
Reviewed By: sfilipco
Differential Revision: D22249155
fbshipit-source-id: 205579eeaccd42a21297d965973957168bb8726e
Summary:
For revlog, calculating `only` can have some fast paths that do not scan the
entire changelog.
Reviewed By: sfilipco
Differential Revision: D21944136
fbshipit-source-id: 58391636350f8f19643d59c46d663f55861d6de3
Summary:
This will be used to maintain narrow-heads phase calculation and sunsetting the
revlog-specific changelog.index2.
Reviewed By: sfilipco
Differential Revision: D21944131
fbshipit-source-id: a8bbd1fd24546f4891ffa677476bff750c3faf5f
Summary: The values of `pending_nodes_index` should start from 0 instead of 1.
Reviewed By: sfilipco
Differential Revision: D21944133
fbshipit-source-id: a2a332868f16b398037289c81bf8076d1400c0a7
Summary:
This drops the `file` parameter from the `raw_data` API, making
RevlogIndex easier to use.
Reviewed By: sfilipco
Differential Revision: D21854228
fbshipit-source-id: 259726524d1cc6a1f9d00783e22f9502c7decdeb
Summary:
The reverse `to_id_set` exists.
It turns out that the Python land wants this in many places.
Reviewed By: sfilipco
Differential Revision: D22240175
fbshipit-source-id: b6a3a3a3869dc0c521a21b1d86394421b816632b
Summary:
This provides a way for implementations to optimize the operation.
For segmented changelog, the default implementation is good enough.
For revlog, `only` can have a fast path that does not iterate through the
entire changelog.
A related API `only_both` is added. For revlog it has multiple use-cases,
including narrow-heads phase calculation and revlog.findcommonmissing used by
discovery.
Reviewed By: markbt
Differential Revision: D21944132
fbshipit-source-id: d11660dae85ea6158977eb00d1ceaceddf1d8234
Summary: Use `thiserror` to provide a more ergonomic API for `DataEntry` hash checking. The `.data()` method now simply returns a `Result` rather than a tuple with an ad-hoc enum.
Reviewed By: quark-zju
Differential Revision: D22376164
fbshipit-source-id: fc39cb212ec1ee5830292db4aa5eca18f2c16a2b
Summary: Tidy up some comments in this file.
Reviewed By: ikostia
Differential Revision: D22376165
fbshipit-source-id: ce4760776048aa8e72809b4f828d0ea426fcf878
Summary:
In order to do what the title says, this diff does:
1. Add the `eden/oss/.../third-party/rust/.../Cargo.toml` files. As mentioned in the previous diff, those are required by GitHub so that the third party dependencies that are local in fbsource are properly defined with a "git" dependency in order for Cargo to "link" crates properly.
2. Changes to `eden/scm/Makefile` to add build/install commands for getdeps to invoke. Those command knowing that they are called from withing getdeps context they link the dependencies brought by getdeps into their proper places that match their folder layout in fbsource. Those Makefile commands also pass a GETDEPS_BUILD env to the setup.py invocations so that it knows it is being called withing a getdeps build.
3. Changes to `eden/scm/setup.py` that add "thriftasset" that makes use of the getdeps.py provided "thrift" binary to build .py files out of thrift files.
4. Changes to `distutils_rust` to use the vendored crates dir provided by getdeps.
5. Changes to `getdeps/builder.py` and `getdeps/manifest.py` that enable more fine-grained configuratior of how Makefile builds are invoked.
6. Changes to `getdeps/buildopts.py` and `getdeps/manifest.py` to disable overriding PATH and pkgconfig env, so that "eden/scm" builds in getdeps using system libraries rather than getdeps-provided ones (NOTE: I've tried to use getdeps provided libraries, but the trickiest bit was that Rust links with Python, which is currently not providable by getdeps, so if you try to build everything the system provided Python libraries will collide with getdeps provided ones)
7. Added `opensource/fbcode_builder/manifests/eden_scm` for the getdeps build.
Reviewed By: quark-zju
Differential Revision: D22336485
fbshipit-source-id: 244d10c9e06ee83de61e97e62a1f2a2184d2312f
Summary:
Similar to D7121487 (af8ecd5f80) but works for mutation store. This makes sure at the Rust
layer, mutation entries won't get lost after rebasing or metaeditting a set of
commits where a subset of the commits being edited has mutation relations.
Unlike the Python layer, the Rust layer works for mutation chains. Therefore
some of the tests changes.
Reviewed By: markbt
Differential Revision: D22174991
fbshipit-source-id: d62f7c1071fc71f939ec8771ac5968b992aa253c
Summary:
Enforcing 1:1 handling is causing trouble for multiple reasons:
- Test cases like "Metaedit with descendant folded commits" in
test-mutation.t requires handling non-1:1 entries.
- Currently operations like `metaedit-copy` uses two predecessors (edited,
copied). It is considered 1:1 rewrite (rewriting 1 commit to
another 1 commit). However, it does not use multiple records, therefore
cannot be distinguished from `fold`, which is not 1:1 rewrite and also
has multiple predecessors. We want to include the `metaedit-copy`
entries in the 1:1 DAG too.
Therefore lift the 1:1 restriction and let's see how it works.
Reviewed By: markbt
Differential Revision: D22175008
fbshipit-source-id: 84d870dbcc433a0d0e5611a83c93781bfa59d035
Summary:
This makes it easier to remove cycles in other places.
There are probably fancier and more efficient algorithm for this.
For now I just wrote one that is easy to verify correctness.
Reviewed By: markbt
Differential Revision: D22174975
fbshipit-source-id: 8a2dc755e4bc0b066eda5f42a51208c92409f2f9
Summary: Add a `ToJson` trait as a counterpart to the `FromJson` trait introduced in the last diff. The primary use case for this will be to allow recording and replaying the data fetches that occur during Mercurial operations. A JSON representation was chosen so that the format will be directly compatible with tools like `make_req` used in EdenAPI integration tests.
Reviewed By: markbt
Differential Revision: D22344599
fbshipit-source-id: 52c888bde93a8e86b6dd76cb862337f716b007eb
Summary:
Add a `FromJson` trait to `edenapi_types` and use it instead of the parsing functions when parsing requests from JSON.
Some design commentary:
I've created a custom trait rather than using `TryFrom<serde_json::Value>` for two reasons:
- From a design standpoint, I'd like users to have to explicitly opt-in to this functionality by importing this `FromJson` trait, since this is different from the usual way of deserializing with serde, and it might cause confusion.
- From an implementation standpoint, it turns out that using `TryFrom` in a trait bound causes difficulties with type inference in some situations (in particular, around the associated `Error` type), which necessitates type annotations, making the API less ergonomic to use.
Why not just use `serde::Deserialize` directly?
- The representation here doesn't actually directly match the structure of the underlying type. Instead, the JSON representation has been designed to be easy for humans to edit when writing tests. (For example, keys are replaced with simply arrays, and hashes are represented as hex rather than as byte arrays.)
- In this case, we'd like to work with `serde_json::Value` rather than going directly between bytes and structs, since this allows for a level of dynamism that will be useful later in the stack.
Reviewed By: markbt
Differential Revision: D22320576
fbshipit-source-id: 64b6bed42e1ec599a0da61ae5d55feb7c90101a4