Summary:
A common pattern in Mercurial's data storage layer Python bindings is to have a Python object that wraps a Rust object. These Python objects are often passed across the FFI boundary to Rust code, which then may need to access the underlying Rust value.
Previously, the objects that used this pattern did so in an ad-hoc manner, typically by providing an `into_inner` or `to_inner` inherent method. This diff introduces a new `ExtractInner` trait that standardizes this pattern into a single interface, which in turn allows this pattern to be used with generics.
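The pattern described above can be sketched as follows. This is an illustrative reconstruction, not the actual Mercurial bindings code: `PyWrapper` and `total_len` are hypothetical stand-ins for a Python wrapper object and a generic consumer.

```rust
// Hypothetical sketch of the ExtractInner pattern: a wrapper type
// exposes its underlying Rust value through a single trait instead of
// an ad-hoc `into_inner` inherent method.
trait ExtractInner {
    type Inner;
    fn extract_inner(self) -> Self::Inner;
}

// A stand-in for a Python object wrapping a Rust value.
struct PyWrapper {
    inner: Vec<u8>,
}

impl ExtractInner for PyWrapper {
    type Inner = Vec<u8>;
    fn extract_inner(self) -> Self::Inner {
        self.inner
    }
}

// Generic code can now accept any wrapper that yields a given inner type.
fn total_len<T: ExtractInner<Inner = Vec<u8>>>(wrappers: Vec<T>) -> usize {
    wrappers.into_iter().map(|w| w.extract_inner().len()).sum()
}
```

Because the interface is a trait rather than an inherent method, functions like `total_len` can be written once against the trait bound and reused for every wrapper type.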
Reviewed By: quark-zju
Differential Revision: D22429347
fbshipit-source-id: cab4c24b8b98c6ef8307f72a9b4726aabdc829cc
Summary:
Bundle2 chunks have to fit under 2GB, but we have code that simply
returns entire buffers as a chunk, which may be over 2GB. Let's split that up
into smaller chunks.
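The splitting itself can be sketched like this (a minimal illustration, not the actual Mononoke code; the chunk-size limit here is arbitrary rather than the real 2GB cap):

```rust
// Split an arbitrarily large buffer into chunks no larger than a fixed
// limit, so that no single emitted chunk exceeds the bundle2 size cap.
fn split_into_chunks(buf: &[u8], max_chunk_size: usize) -> Vec<&[u8]> {
    assert!(max_chunk_size > 0);
    // `chunks` yields consecutive slices of at most `max_chunk_size`
    // bytes; only the final slice may be shorter.
    buf.chunks(max_chunk_size).collect()
}
```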
Reviewed By: quark-zju
Differential Revision: D21235286
fbshipit-source-id: f52366fb5ecebf4f9f00914e044c46e147873bec
Summary: Remove all of the old EdenAPI Python code from Mercurial. For the new EdenAPI client, we intend to expose HTTP fetching through the Rust storage interfaces rather than putting conditionals throughout the Python code.
Reviewed By: quark-zju
Differential Revision: D22405579
fbshipit-source-id: d3c9ed02d9f624b9490e9280b8b0b4f8a127a9b5
Summary:
D22396026 made it so that `HttpClient::send_async` no longer consumes `self`. This means that instead of creating a new HTTP client for each request, we can reuse the same one.
This has the benefit of allowing for connection reuse (which was the point of D22396026), resulting in lower latency for serial fetches.
Reviewed By: quark-zju
Differential Revision: D22397768
fbshipit-source-id: 9d066c1ec64a6aa1b36ec674ef294030c1f90b41
Summary: Allow passing multiple JSON requests to the EdenAPI CLI. The requests will be performed serially, which allows for testing the performance of serial EdenAPI calls.
Reviewed By: quark-zju
Differential Revision: D22397769
fbshipit-source-id: c59e5abf53eee9c2014010672183e202b6f180fc
Summary:
Add a pool of `Multi` handles that the client can reuse across requests.
Previously, `HttpClient`'s async functions had to consume the client in order to have a `'static` lifetime (since `Future`s generally cannot hold references to things outside of themselves). This meant that each async operation would use its own `Multi` handle, preventing connection reuse across operations since the `Multi` handle maintains a connection cache internally.
With this change, the client can reuse the `Multi` session after an async operation, thereby benefiting from libcurl's caching. Note that the same `Multi` handle still cannot be used by concurrently running `Future`s (as this [would not be thread safe](https://curl.haxx.se/libcurl/c/threadsafe.html)), but once a `Future` has completed, its `Multi` handle will return to the pool for use by subsequent requests.
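The pool itself can be modeled with a few lines of std Rust. This is a hedged sketch of the shape of the idea, not the actual `http_client` code; `HandlePool` is a hypothetical name and `T` stands in for a curl `Multi` session:

```rust
use std::sync::Mutex;

// Illustrative pool of reusable handles. A completed operation returns
// its handle so later requests can reuse the cached connections it holds.
struct HandlePool<T> {
    handles: Mutex<Vec<T>>,
}

impl<T> HandlePool<T> {
    fn new() -> Self {
        HandlePool { handles: Mutex::new(Vec::new()) }
    }

    // Take a handle from the pool, or create a fresh one if the pool is empty.
    fn get(&self, create: impl FnOnce() -> T) -> T {
        self.handles.lock().unwrap().pop().unwrap_or_else(create)
    }

    // Return a handle once the operation using it has completed.
    fn put(&self, handle: T) {
        self.handles.lock().unwrap().push(handle);
    }
}
```

The `Mutex` only guards the pool's vector; the handle itself is owned exclusively by whichever operation checked it out, which mirrors the constraint that a `Multi` handle is never shared between concurrently running futures.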
---
(Somewhat tangential)
As is noted in the code comments, `libcurl`'s C API provides a way to share caches across multiple multi sessions: [the "share" interface](https://curl.haxx.se/libcurl/c/libcurl-share.html).
While using this would seem preferable to an ad-hoc solution like this diff, it turns out that the `curl` crate does not provide safe bindings to the share interface. This means that in order to use the share interface, we'd need to directly use the unsafe bindings from `curl-sys`.
In addition to the difficulty of working with unsafe FFI code, the API expects the application to handle synchronization by passing it function pointers to handle locking/unlocking shared resources.
Ultimately, I came to the conclusion that managing lifetimes and synchronization in unsafe code across an FFI boundary would be nontrivial, and ensuring correctness would require a lot of effort that could be avoided by implementing an ad-hoc solution on top of the safe API instead. However, it might make sense to change this to use the share interface in the future.
Reviewed By: quark-zju
Differential Revision: D22396026
fbshipit-source-id: 06eea2ffacdc791527eac9ce4becc457af5c0480
Summary: Update the EdenAPI Python bindings to use the new client. This is mostly just a stopgap measure to allow us to delete the old client code; nothing in production actually uses these bindings anymore, and the new client will primarily be used from Rust.
Reviewed By: quark-zju
Differential Revision: D22379476
fbshipit-source-id: 953e0ffc2ce682869ee234d672a154046b373c1e
Summary: Update the `revisionstore` and `backingstore` crates to use the new EdenAPI crate.
Reviewed By: quark-zju
Differential Revision: D22378330
fbshipit-source-id: 989f34827b744ff4b4ac0aa10d004f03dbe9058f
Summary: Add a new `EdenApiBlocking` trait that exposes blocking versions of the `EdenApi` trait's methods, for use in non-async code.
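The shape of such a blocking companion trait can be sketched with std only. Everything here is illustrative: `Api`, `ApiBlocking`, and `files` are hypothetical stand-ins for the real `EdenApi`/`EdenApiBlocking` traits, and `block_on` is a minimal busy-poll executor rather than whatever runtime the real code uses:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// A no-op waker so we can poll a future to completion on this thread.
fn noop_clone(_: *const ()) -> RawWaker { noop_raw_waker() }
fn noop(_: *const ()) {}
static NOOP_VTABLE: RawWakerVTable = RawWakerVTable::new(noop_clone, noop, noop, noop);
fn noop_raw_waker() -> RawWaker { RawWaker::new(std::ptr::null(), &NOOP_VTABLE) }

// Minimal blocking executor: repeatedly poll until the future is ready.
fn block_on<F: Future>(mut fut: F) -> F::Output {
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    // SAFETY: `fut` is shadowed here and never moved afterwards.
    let mut fut = unsafe { Pin::new_unchecked(&mut fut) };
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        std::thread::yield_now();
    }
}

// Hypothetical async trait shape (method names are illustrative).
trait Api {
    fn files(&self, keys: Vec<String>) -> Pin<Box<dyn Future<Output = usize> + '_>>;
}

// Blanket impl: every Api automatically gets blocking method versions.
trait ApiBlocking: Api {
    fn files_blocking(&self, keys: Vec<String>) -> usize {
        block_on(self.files(keys))
    }
}
impl<T: Api> ApiBlocking for T {}

// A trivial implementation for demonstration.
struct DummyClient;
impl Api for DummyClient {
    fn files(&self, keys: Vec<String>) -> Pin<Box<dyn Future<Output = usize> + '_>> {
        Box::pin(async move { keys.len() })
    }
}
```

The blanket impl is the key design point: non-async callers depend only on the blocking trait, while every async implementation gets the blocking surface for free.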
Reviewed By: quark-zju
Differential Revision: D22305396
fbshipit-source-id: d0f3a73cad1a23a4f0892a17f18267374e63108e
Summary:
This diff adds an EdenAPI CLI program that allows manually sending requests to the server.
Requests are read from stdin in a JSON format (the same format used by the `make_req` tool and the EdenAPI server integration tests). This makes it easy to create and edit requests during debugging.
Responses are re-serialized as CBOR and written to stdout. (The program will refuse to write output if stdout is a TTY.) These responses can then be analyzed using the `read_res` tool (also used by the EdenAPI server integration tests).
The program prints real-time download statistics during data fetching, allowing the user to debug performance in addition to correctness.
The program uses standard `hgrc` files to configure the EdenAPI client, which means that one can simulate production settings by specifying a production `hgrc`. By default, it will read from `~/.hgrc.edenapi` rather than `~/.hgrc` since the user will most likely want to configure this program independently of Mercurial.
Reviewed By: quark-zju
Differential Revision: D22370163
fbshipit-source-id: 5d9974bc05fa960d26cd2c87810f4646e2bc55b4
Summary:
There was some missed usage of `Path.resolve`. This diff should cover it all.
```
cli $ rg -F ".resolve"
main.py
967: uid = self.resolve_uid(args.uid)
968: gid = self.resolve_gid(args.gid)
util.py
622: `Path.resolve`. This is a helper method to work around that by using
628: return path.resolve(strict=strict)
```
Reviewed By: chadaustin
Differential Revision: D22459188
fbshipit-source-id: c2a1b132f752cc399ebf34723f26123559939f2a
Summary:
Apparently some of these tests still run in py2. Let's let it fall back
to the old mysql-connector connector.
Reviewed By: xavierd
Differential Revision: D22458822
fbshipit-source-id: add3da42cbd18e6cb5b34b3038d96cf52c7c6387
Summary:
proxy_import_helper.py exists for compatibility with older EdenFS
builds. None of those builds are running anymore, so remove it.
Reviewed By: genevievehelsel
Differential Revision: D22451196
fbshipit-source-id: 4d258b3fafe13bb67bd11259f5d1193a7e5575e6
Summary: This diff defines `Overlaychecker::ProgressCallback` to replace repetitive function type declaration.
Reviewed By: genevievehelsel
Differential Revision: D22243160
fbshipit-source-id: ea05e451817a760b5266879b956eaea48dc8d85e
Summary:
Previously, the backfill_batch_dangerous method called the internal derive_impl() method
directly. That wasn't great (after all, we were calling a function whose name suggests it should only be called from inside the derive data crate), and this diff changes it so that we call the batch_derive() method instead.
This gives a few benefits:
1) We no longer call internal derive_impl function
2) It allows different types of derived data to override batching behaviour.
For example, we've already overridden it for fsnodes and the next diff will override
it for blame as well.
To make it compatible with derive_impl(), batch_derive() now accepts the derive data mode and mapping.
Reviewed By: krallin
Differential Revision: D22435044
fbshipit-source-id: a4d911606284676566583a94199195860ffe2ecf
Summary:
From the RocksDB documentation:
> When opening a DB in a read-write mode, you need to specify all Column
Families that currently exist in a DB. If that's not the case, DB::Open call
will return Status::InvalidArgument()
This can cause problems for us in a couple of situations:
- When we need to rollback from an eden version where we added a column to
our configuration for RocksDB
- When we delete a column from our configuration for RocksDB
To avoid this error, we need to make sure that we still
open all the columns existing in the database, even if they are not in our
configured list of column families.
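The core of the fix is a union of the configured and on-disk column lists. A minimal sketch of that idea (not the actual EdenFS code, and with illustrative column names):

```rust
// When opening the DB, open the union of the configured column
// families and whatever already exists on disk, so the open call
// never fails with InvalidArgument after a rollback or after a
// column is removed from the configuration.
fn columns_to_open(configured: &[&str], existing_on_disk: &[&str]) -> Vec<String> {
    let mut columns: Vec<String> = configured.iter().map(|c| c.to_string()).collect();
    for col in existing_on_disk {
        // Keep any on-disk column we no longer configure.
        if !columns.iter().any(|c| c == col) {
            columns.push(col.to_string());
        }
    }
    columns
}
```

Columns opened this way but absent from the configuration would simply be carried along unused until a later cleanup.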
Reviewed By: wez
Differential Revision: D22425310
fbshipit-source-id: 9822b22cfedf4633f65bbed96f95a546dd3614f4
Summary:
D22206317 (9a6ed4b6ca) added requesting of predecessor information for suspected primordials
by the successor ID. This allows recovery of earlier predecessors when partial
data upload resulted in the history of a commit being extended backwards.
Unfortunately, while the individual requests are fast, the combined request
using `OR` in SQL ended up being very slow for some requests.
Separate out the requests at the application level, and aggregate the results
by concatenating them. `collect_entries` already handles duplicates should any
arise.
Most of the time the successor query will very quickly return no rows, as
it only matters when history is extended backwards, which is expected to be
rare.
Reviewed By: ikostia
Differential Revision: D22456062
fbshipit-source-id: 1e6094b4ac1590a5824e9ae6ef48468766560188
Summary:
Renamed xdiff functions to avoid linking issues when using both libgit2-sys and xdiff.
When using repo_import tool (https://fburl.com/diffusion/8p6fhjt2) we have libgit2-sys dependency for importing git repos. However, when we derive blame data types, we need to use xdiff functionalities (from_no_parents: https://fburl.com/diffusion/pitukmyo -> diff_hunks: https://fburl.com/diffusion/9f8caan9 -> xdl_diff: https://fburl.com/diffusion/260x66hf). Both libgit2 and eden/scm have vendored versions of xdiff library. Therefore, libgit2-sys and eden/scm share functions with the same signatures, but have different behaviours and when we tried to derive blame, it used libgit2-sys's xdl_diff instead of eden's. This resulted in getting segfaults (https://fburl.com/paste/04gwalpo).
Note: repo_import is the first tool that has tried to import both and the first to run into this issue.
Reviewed By: StanislavGlebik
Differential Revision: D22432330
fbshipit-source-id: f2b965f3926a2dc45de1bf20e41dad70ca09cdfd
Summary:
Currently when we are resolving the full command line for a client pid, we only
read the first 256 bytes of the command.
This means that some commands will be truncated, which has come up in some
of our recently added logs. This ups the buffer size so that we can
hopefully get the full command line.
The longer-term solution would be to implement something fancier, as mentioned
in the comment in the code copied below, though that also has drawbacks.
> // Could do something fancy if the entire buffer is filled, but it's better
// if this code does as few syscalls as possible, so just truncate the
// result
Reviewed By: wez
Differential Revision: D22436219
fbshipit-source-id: 80a9aecfe148aa3e333ca480c6a8cb8b9c5c86f2
Summary:
Bypass truncation-based transaction if narrow-heads is on.
The transaction abort still works logically because commit references stay
unchanged on abort.
Related EdenFS and Mononoke tests are updated. Mononoke tests probably
shouldn't rely on revlog / fncache implementation details in hg.
Reviewed By: DurhamG
Differential Revision: D22240186
fbshipit-source-id: f97efd60855467b52c9fb83e7c794ded269e9617
Summary:
With narrow-heads, visible heads are explicitly controlled by commit
references. Adding commits can be just writing them out directly.
This mainly removes the "buffered" writes of `00changelog.i`.
Instead of writing pending changes to `00changelog.i.a`, they
are directly written to `00changelog.i` (or buffered in memory
with future changes).
This does not bypass all transaction logic. Truncation can still
happen. Strip is also unaffected.
The change is incomplete. In the future, pending changes will
be written in-memory to the Rust HgCommits struct and we no
longer write directly to revlog.
Reviewed By: DurhamG
Differential Revision: D22240176
fbshipit-source-id: ac9d20ab95ff304fb285a503d2d3db815942d5b3
Summary: This makes pyre aware that `istest` exists on `util`.
Reviewed By: DurhamG
Differential Revision: D22421141
fbshipit-source-id: 50dd264988ffe0e93597df2d540f3de03e8aea4d
Summary:
With modern configs, repo is unfiltered and `ctx.children()` returns unfiltered
commits. Use the revset function `children` instead so invisible children won't
trigger auto restack.
Reviewed By: DurhamG
Differential Revision: D22421689
fbshipit-source-id: 3ec8f616c17254ee9ccfcad96673d209b9163da6
Summary: The test demonstrates an issue with the current auto restack logic.
Reviewed By: DurhamG
Differential Revision: D22421690
fbshipit-source-id: e035cd3212357f24322f8eb9ec5941767ad780d9
Summary:
This diff is a complete, ground-up rewrite of the EdenAPI client. Rather than attempting to use `libcurl` directly, it relies on the new `http_client` crate, which makes the code considerably simpler and allows for a proper async interface.
The most notable change is that `EdenApi` is now an async trait. A blocking API is added later in the stack for use in non-async contexts.
Reviewed By: quark-zju
Differential Revision: D22305397
fbshipit-source-id: 4c1e5d3091d6dd04cf13291e7b7a4217dfdd249f
Summary:
As was pointed out in the review for D22280745 (d73c63d862), `CborStream` is inefficient in situations where the underlying stream produces chunks that are much smaller than the size of the serialized items. To avoid pathological behavior, make `CborStream` buffer the incoming data, and only attempt deserialization if enough data has accumulated.
For now, the buffer size is fixed (with a default of 1MB, chosen arbitrarily). In the future, it might make sense to have the stream adjust the buffer size based on the average size of observed deserialized values.
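The buffering strategy can be sketched as follows. This is only an illustration of the idea, not the `CborStream` code: the stand-in wire format is u32 length-prefixed items rather than CBOR, and the names are hypothetical.

```rust
// Accumulate incoming chunks and only attempt deserialization once at
// least `threshold` bytes are buffered (or the input has ended), so a
// stream of tiny chunks doesn't trigger a failed parse attempt each time.
struct BufferedDecoder {
    buffer: Vec<u8>,
    threshold: usize,
    decode_attempts: usize,
}

impl BufferedDecoder {
    fn new(threshold: usize) -> Self {
        BufferedDecoder { buffer: Vec::new(), threshold, decode_attempts: 0 }
    }

    // Feed one incoming chunk; return any fully decodable items.
    fn feed(&mut self, chunk: &[u8], end_of_input: bool) -> Vec<Vec<u8>> {
        self.buffer.extend_from_slice(chunk);
        if self.buffer.len() < self.threshold && !end_of_input {
            return Vec::new(); // not enough data yet: skip the parse attempt
        }
        self.decode_attempts += 1;
        let mut items = Vec::new();
        loop {
            if self.buffer.len() < 4 {
                break; // not even a full length prefix
            }
            let len = u32::from_be_bytes([
                self.buffer[0], self.buffer[1], self.buffer[2], self.buffer[3],
            ]) as usize;
            if self.buffer.len() < 4 + len {
                break; // partial item: wait for more data
            }
            items.push(self.buffer[4..4 + len].to_vec());
            self.buffer.drain(..4 + len);
        }
        items
    }
}
```

With a 1MB threshold, as in the diff, even a stream of byte-sized network chunks results in roughly one decode attempt per megabyte instead of one per chunk.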
Reviewed By: quark-zju
Differential Revision: D22370164
fbshipit-source-id: ed940c56ca2cbbfc07f01d47becf6f1d71872872
Summary: On Windows a mmap file cannot be replaced. Detect that and delete manually.
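The workaround follows a common shape, sketched below. This is a hedged illustration rather than the actual code; on non-Windows platforms the first rename simply succeeds and the fallback path is never taken.

```rust
use std::fs;
use std::io;
use std::path::Path;

// On Windows, renaming over a file that is currently memory-mapped
// fails, so fall back to deleting the destination explicitly and
// retrying the rename.
fn replace_file(src: &Path, dst: &Path) -> io::Result<()> {
    match fs::rename(src, dst) {
        Ok(()) => Ok(()),
        Err(_) => {
            // Destination may be mmap-locked: remove it manually, then retry.
            fs::remove_file(dst)?;
            fs::rename(src, dst)
        }
    }
}
```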
Reviewed By: farnz
Differential Revision: D22428731
fbshipit-source-id: 4d308a07aae02dcaf2aedb7b0267a535c2e09c92
Summary:
Diff D22140187 (74da65a38f) upgraded mysql-connector-python, which enabled ssl by
default. Our db doesn't support this, so we disabled it for hgsql but forgot to
do so for infinitepush and pushrebase. Let's do it for them too.
Reviewed By: krallin
Differential Revision: D22416533
fbshipit-source-id: bc91ccd2ab4d9bc8ba423c8e60fc0191c7ff78c6
Summary:
The goal is to make it easier to implement unit tests, which depend on `LiveCommitSyncConfig`. Specifically, `scs` has a piece of code that instantiates `mononoke_api::Repo` with a test version of `CommitSyncConfig`. To migrate it to `LiveCommitSyncConfig`, I need to be able to create a test version of that. It **is** possible now, but would require me to turn a supplied instance of `CommitSyncConfig` back into `json`, which is cumbersome. Using a `dyn LiveCommitSyncConfig` there, instead of a concrete struct, seems like a good idea.
Note also that we are using this technique in many places: most (all?) of our DB tables are traits, which we then implement for SQL-specific structs.
Finally, this diff does not actually migrate all of the current users of `LiveCommitSyncConfig` (the struct) to be users of `LiveCommitSyncConfig` (the trait), and instead makes them use `CfgrLiveCommitSyncConfig` (the trait impl). The idea is that we can migrate bits to use traits when needed (for example, in an upcoming `scs` diff). When not needed, it's fine to use concrete structs. Again, this is already the case in a few places: we sometimes use the `SqlSyncedCommitMapping` struct directly, instead of `T: SyncedCommitMapping` or `dyn SyncedCommitMapping`.
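The trait-object technique described above looks roughly like this. All names here are illustrative stand-ins, not the real Mononoke types:

```rust
// Consumers hold a `dyn` trait, so tests can substitute an in-memory
// implementation instead of round-tripping a config through json.
trait SyncConfigSource {
    fn current_version(&self, repo: &str) -> Option<String>;
}

// Production-style source (here just a stub standing in for the
// configerator-backed implementation).
struct ConfigeratorSource;
impl SyncConfigSource for ConfigeratorSource {
    fn current_version(&self, _repo: &str) -> Option<String> {
        None // the real implementation would consult live config here
    }
}

// Test source constructed directly from in-memory data.
struct TestSource {
    version: String,
}
impl SyncConfigSource for TestSource {
    fn current_version(&self, _repo: &str) -> Option<String> {
        Some(self.version.clone())
    }
}

// Generic consumer: works with either implementation.
fn describe(source: &dyn SyncConfigSource, repo: &str) -> String {
    match source.current_version(repo) {
        Some(v) => format!("{}: {}", repo, v),
        None => format!("{}: no config", repo),
    }
}
```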
Reviewed By: StanislavGlebik
Differential Revision: D22383859
fbshipit-source-id: 8657fa39b11101684c1baae9f26becad6f890302
Summary:
This updates the AsyncRead implementations we use in hgproto and
mercurial_bundles to use a LimitedAsyncRead. The upshot of this change is that
we eliminate O(N^2) behavior when parsing the data we receive from clients.
See the earlier diff on this stack for more detail on where this happens, but
the bottom line is that Framed presents a full-size buffer that we zero out
every time we try to read data. With this change, the buffer we zero out is
comparable to the amount of data we are reading.
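The wrapper idea can be shown with the synchronous `Read` trait (the real code wraps `AsyncRead`; `LimitedRead` is a hypothetical name for this sketch):

```rust
use std::io::{self, Read};

// Cap how much of the caller's buffer is exposed to each read, so the
// buffer that gets zeroed and filled is proportional to the data
// actually read rather than to the full frame size.
struct LimitedRead<R> {
    inner: R,
    limit: usize,
}

impl<R: Read> Read for LimitedRead<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let cap = buf.len().min(self.limit);
        // Only the first `cap` bytes of the caller's buffer are touched.
        self.inner.read(&mut buf[..cap])
    }
}
```

Even if the framing layer presents a full-size buffer on every call, the inner reader only ever sees (and the framing layer only ever zeroes usefully) a bounded window of it.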
This matters in commit cloud because bundles might be really big, and a single
big bundle is enough to take an entire core for a spin for 20 minutes (and
achieve nothing but a timeout in the end). That being said, it's also useful for
non-commit cloud bundles: we do occasionally receive big bundles (especially
for WWW codemods), and those will benefit from the exact same speedup.
One final thing I should mention: this is all in a busy CPU poll loop, and as I noted
in my earlier diff, the effect persists across our bundle receiving code. This means
it will sometimes result in not polling other futures we might have going.
Reviewed By: farnz
Differential Revision: D22432350
fbshipit-source-id: 33f1a035afb8cdae94c2ecb8e03204c394c67a55
Summary: The 0.3 version (currently used only in one crate, eden/scm/lib/commitcloudsubscriber) uses an old openssl crate which doesn't work with the openssl library installed on most machines (both in FB and on GitHub Actions).
Reviewed By: mitrandir77
Differential Revision: D22430649
fbshipit-source-id: b8fa930841dbcdd4c085d8c9488d768b3526e1c4
Summary:
The dirstate code did not prevent absolute paths from being added to
the structure, but they would cause problems later when those paths were passed
to Rust. We should move the dirstate to use the Rust path type, but for now
let's just block absolute paths.
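The guard is conceptually a one-line check before a path enters the dirstate. A minimal sketch (the real check lives in the Python dirstate code; this Rust version is illustrative):

```rust
use std::path::Path;

// Reject absolute paths before they enter the dirstate, since the
// Rust path types downstream expect repo-relative paths.
fn validate_dirstate_path(path: &str) -> Result<(), String> {
    if Path::new(path).is_absolute() {
        Err(format!("cannot add absolute path to dirstate: {}", path))
    } else {
        Ok(())
    }
}
```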
Reviewed By: quark-zju, xavierd
Differential Revision: D22426592
fbshipit-source-id: 4ae9f004237e4c54336beb03aab29517254ae441
Summary:
We've seen a handful of users complaining about clone failing and not being
able to recover from it. From looking at the various reports and the
stacktraces, I believe this is caused by a flaky connection on the user end
that causes the Python code to retry the getpack calls. Before retrying, the
code will figure out what still needs fetching and this is done via the
getmissing API. When LFS pointers were fetched, the LFS blobs aren't yet
present on disk, and thus the underlying ContentStore::get_missing will return a set
of keys that contain some StoreKey::Content keys. The code would previously
fail at this point, but since the key also contains the original key, we can
simply return it; the pointers might be refetched, but these are fairly small.
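The fix can be modeled like this. This is a simplified illustration of the idea, not the real `revisionstore` types: the real `StoreKey` carries hgid/content variants with richer key types, and the field names here are stand-ins.

```rust
// A store key is either an hg key or a content key that still carries
// the original hg key it was derived from.
#[derive(Clone, Debug, PartialEq)]
enum StoreKey {
    HgId(String),
    Content { hash: String, original: String },
}

impl StoreKey {
    // Map any key back to an hg key the Python retry logic understands,
    // instead of failing on Content keys as the old code did.
    fn original_key(&self) -> String {
        match self {
            StoreKey::HgId(key) => key.clone(),
            StoreKey::Content { original, .. } => original.clone(),
        }
    }
}
```

Returning the original key means the pointer may be refetched on retry, which is the cheap, acceptable trade-off the summary describes.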
Taking a step back from this bug, the issue really is that the retry logic is
done in code that cannot understand content-keys, and moving it to a part of
the code that understands this would also resolve the issue.
I went with the simple approach for now, but since other remote stores
(EdenAPI, the LFS one, etc) would also benefit from the retry logic, we may
want to move the logic into Rust and remove the getmissing API from the Python
exposed ContentStore.
Reviewed By: DurhamG
Differential Revision: D22425600
fbshipit-source-id: 69c2898cc302d2170cd0f206c89189c341db5278
Summary:
Make zsh_completion complete standard aliases like `checkout`.
This restores the behavior before D18463299 (54451585ce) stack.
Reviewed By: farnz
Differential Revision: D22396737
fbshipit-source-id: 745761041d6d1dec6adba2efb102e2021a01b36b