Summary:
We'd like to remove `{rev}` from templates.
This makes it easier to copy, edit, review, and switch the built templates.
Reviewed By: DurhamG
Differential Revision: D22468670
fbshipit-source-id: 59580c413f97dfbee898b97ee64d6812f82d2014
Summary:
This extends the metadata importer to work on corp.
Prefetching metadata for the entries in a tree when we fetch it saves us
an extra round trip to the server to fetch a blob when only the metadata
for that blob is fetched. This is particularly important on corp where the
latencies to the server are larger.
Requesting metadata without the blob itself happens often during the parsing
phase of buck commands, so this metadata prefetch should improve
build times and other workflows that rely heavily on buck queries.
Reviewed By: chadaustin
Differential Revision: D21698728
fbshipit-source-id: 4072be23f2fa7df33cf46879e8c1d8ddd6c316ba
Summary:
Buck uses the content SHA-1 to identify each of the source files for a target.
During the parsing phase it needs these SHAs, though the content of the
files is not yet needed, and may never be needed if the file has already
been built and is in the buck cache.
Currently, if we do not already have metadata cached for a file when
requested, we fetch the contents of the file and compute the hash.
We want to avoid this.
Eventually this data will be available from the Mononoke EdenAPI server,
but for now we want a temporary solution to unblock the Buck team, and
ship benefits early.
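What the cache buys us can be sketched in Python (the helper names and the dict-backed cache are illustrative, not EdenFS's actual API): on a metadata hit we return the hash without ever touching the blob; on a miss we pay for a full content fetch just to hash it.

```python
import hashlib

# Hypothetical in-memory metadata cache keyed by file id.
_metadata_cache = {}

def get_sha1(file_id, fetch_contents):
    """Return the content SHA-1 for file_id, fetching the blob only on a miss."""
    if file_id in _metadata_cache:
        return _metadata_cache[file_id]
    # Cache miss: today we must fetch the whole blob just to hash it.
    contents = fetch_contents(file_id)
    sha1 = hashlib.sha1(contents).hexdigest()
    _metadata_cache[file_id] = sha1
    return sha1
```

The miss path is exactly the cost this stack avoids by fetching the hashes from the server ahead of time.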
Reviewed By: chadaustin
Differential Revision: D21820913
fbshipit-source-id: 56a7e32519f0fb04881518306d94aaed33527fd9
Summary:
Prefetching metadata for the entries in a tree when we fetch it saves us
an extra round trip to the server to fetch a blob when only the metadata
for that blob is fetched. (This can happen often while parsing targets in
builds)
We want to prefetch the metadata for each of the entries in a tree when
we fetch the tree, and store the metadata for each entry under that
entry's id (to make looking up entry metadata by id quick).
However, we also don't want to unnecessarily fetch data from
the server if we have already done so.
To accomplish this, we also store the metadata for each entry under the tree
id in the local store. This will: 1) allow us to check whether we have already
fetched the metadata from the server when we are fetching a tree (only the
tree id is easily available at that point, so storing the metadata under the tree
id makes this check much easier and cheaper); 2) allow us to refill the
metadata stored under each entry's blob id if it has been cleared
from the local store (this may happen if the local store grows large and gets
partially cleaned to reclaim space).
This implements the method to store tree metadata for all entries under the tree
id and under the blob id for each entry.
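The dual-index scheme can be sketched with a dict standing in for the local store (names are illustrative; the real implementation writes RocksDB rows):

```python
# Minimal sketch of the dual-indexing scheme (dict in place of the local
# store; names are illustrative, not EdenFS's actual API).
local_store = {}

def put_tree_metadata(tree_id, entry_metadata):
    """Store metadata under the tree id and under each entry's blob id."""
    # Under the tree id: lets a tree fetch cheaply check "already prefetched?".
    local_store[("tree_meta", tree_id)] = entry_metadata
    # Under each blob id: makes per-blob metadata lookups fast, and lets us
    # refill these rows from the tree row if they get evicted.
    for blob_id, meta in entry_metadata.items():
        local_store[("blob_meta", blob_id)] = meta

def have_tree_metadata(tree_id):
    return ("tree_meta", tree_id) in local_store
```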
Reviewed By: chadaustin
Differential Revision: D22239173
fbshipit-source-id: d4e0ffd642ce0b4034188cfc4eeaf2ea05f54e77
Summary:
Prefetching metadata for the entries in a tree when we fetch it saves us
an extra round trip to the server to fetch a blob when only the metadata
for that blob is fetched. (This can happen often while parsing targets in
builds)
This implements a custom metadata fetcher to fetch this data when we
fetch a tree from the server.
Reviewed By: chadaustin
Differential Revision: D22086639
fbshipit-source-id: 5fe31d375bf6f7376eb67496d553d6b4540fc0c9
Summary:
This will allow adding custom MetadataImporters in different eden builds.
DefaultMetadataImporter provides a no-op version of the interface to be
used by default.
Reviewed By: chadaustin
Differential Revision: D21960834
fbshipit-source-id: aec8a3627ab1223f74466b92a0ebe3290b67b7ed
Summary:
Previously the BackingStore kept a raw pointer to the LocalStore; we relied on EdenServer to ensure the lifetime of the LocalStore exceeded that of the BackingStore.
This makes the LocalStore pointer a shared pointer, explicitly ensuring that the LocalStore's lifetime matches the BackingStore's lifetime.
Reviewed By: chadaustin
Differential Revision: D22394597
fbshipit-source-id: c81cb26c6fc8f834bc46d8576ced06ba6a96ac2c
Summary:
This introduces a class to manipulate the metadata for all the entries in a
tree. This adds serialization and deserialization to this class so that it can
be written to the local store.
Why do we need this? We need some way to easily check when we have already
fetched metadata for a tree and do not need to refetch this from the server to
avoid expensive network requests. Later diffs add functionality to store the metadata
for tree entries in the local store under the tree hash using this class.
Reviewed By: chadaustin
Differential Revision: D21959015
fbshipit-source-id: 0c0e8750737f3076c1f9604d0319cab7f2658656
Summary:
In following diffs we will use scs to prefetch metadata for files, so that this data
will be available without fetching the file content (which will improve build times
on eden).
This builds up the proxy hash index that serves as a conversion between eden-specific
identifiers and (commit, path) pairs, which we will use to index into scs.
Reviewed By: chadaustin
Differential Revision: D21820909
fbshipit-source-id: 17891f6772f49c7c183061d7a4df2fe0a3be9d25
Summary:
In following diffs we will use scs to prefetch metadata for files, so that this data
will be available without fetching the file content (which will improve build times
on eden).
SCS indexes trees by an scs-specific hash (a blake2 content hash) or by commit
hash and path. Since this differs from the eden and mercurial hashes, we need
another index to go between the ids we currently have in eden and the
identifiers scs expects.
This introduces a proxy hash that serves as this conversion. Because we already
have commit hashes around in eden, this is currently the easier route to
indexing into scs.
Reviewed By: chadaustin
Differential Revision: D21237648
fbshipit-source-id: 79115ac034a5f062ae879713cd2c1a17f348c725
Summary:
Fixes include:
1. Passing the "GETDEPS_BUILD_DIR" and "GETDEPS_INSTALL_DIR" env variables and using them in eden/scm/Makefile rather than assuming the source code is always in the same place regardless of getdeps arguments (it isn't).
2. Added "fbthrift-source" and "fb303-source" to avoid unnecessary compilation (at least of fb303) and to put fbthrift and fb303 source code in an easy to locate place inside getdeps' "installed" folder.
Pull Request resolved: https://github.com/facebookexperimental/eden/pull/25
Test Plan: sandcastle, check oss-eden_scm-darwin-getdeps
Reviewed By: farnz
Differential Revision: D22431872
Pulled By: lukaspiatkowski
fbshipit-source-id: 8ccbb090713ec085a5dd56df509eb58ab6fb9e34
Summary: Stage 1 of a migration - the next step is to make all users of this trait use new futures; then I can come back, add lifetimes and references, and leave it modernised.
Reviewed By: StanislavGlebik
Differential Revision: D22460164
fbshipit-source-id: 94591183912c0b006b7bcd7388a3d7c296e60577
Summary: Use `pub(crate)` visibility to share the `SelectBookmark` query between modules.
Reviewed By: StanislavGlebik
Differential Revision: D22464059
fbshipit-source-id: 269561f5ab936b730ce2052e50173134ce241ff8
Summary:
Remove the `repo_id` parameter from the `Bookmarks` trait methods.
The `repo_id` parameter was intended to allow a single `Bookmarks` implementation
to serve multiple repos. In practice, however, each repo has its own config, which
results in a separate `Bookmarks` instance for each repo. The `repo_id` parameter
complicates the API and provides no benefit.
To make this work, we switch to the `Builder` pattern for `SqlBookmarks`, which
allows us to inject the `repo_id` at construction time. In fact nothing here
prevents us from adding back-end sharing later on, as these `SqlBookmarks` objects
are free to share data in their implementation.
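The Builder change can be illustrated with a small Python stand-in for the Rust code (all names here are hypothetical): `repo_id` is injected once at construction and disappears from every method signature.

```python
class SqlBookmarksBuilder:
    """Builds per-repo bookmark stores over a possibly shared backend."""

    def __init__(self, connection):
        self.connection = connection

    def with_repo_id(self, repo_id):
        # repo_id is bound here, once, at construction time.
        return SqlBookmarks(self.connection, repo_id)

class SqlBookmarks:
    def __init__(self, connection, repo_id):
        self.connection = connection  # backend may still be shared
        self.repo_id = repo_id

    def get(self, name):
        # Note: repo_id no longer appears in the method signature.
        return self.connection.get((self.repo_id, name))
```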
Reviewed By: StanislavGlebik
Differential Revision: D22437089
fbshipit-source-id: d20e08ce6313108b74912683c620d25d6bf7ca01
Summary:
The dbbookmarks crate is getting too large for a single file. Split it up into
a `store` module, which implements the bookmarks traits, and a `transaction`
module, which handles bookmark transactions.
Reviewed By: krallin
Differential Revision: D22437088
fbshipit-source-id: 629b62de151400cdbf56d502aef061df46c3da81
Summary:
Separate out the `BundleReplayData` from the `BookmarkUpdateReason` enum. There's
no real need for this to be part of the reason, and removing it means we can
abstract away the remaining dependency on Mercurial changeset IDs from
the main bookmarks traits.
Reviewed By: mitrandir77, ikostia
Differential Revision: D22417659
fbshipit-source-id: c8e5af7ba57d10a90c86437b59c0d48e587e730e
Summary: For populating the XDB blobstore, we'd like to copy data from Manifold - the easiest way to do that is to exploit MultiplexedBlobstore's scrub mode to copy data directly.
Reviewed By: krallin
Differential Revision: D22373838
fbshipit-source-id: 550a9c73e79059380337fa35ac94fe1134378196
Summary: All file content should be bytes.
Reviewed By: quark-zju
Differential Revision: D22468694
fbshipit-source-id: 9febc60d0fd1c49c2f5812f3ba9f10b4782ffd11
Summary: Per comments on D22429347, add a new `ExtractInnerRef` trait that is similar to `ExtractInner`, but returns a reference to the underlying value. A default implementation is provided for types whose inner value is `Clone + 'static`, so in practice most types will only need to implement `ExtractInnerRef`, whereas the callsite may choose whether it needs a reference or an owned value.
Reviewed By: quark-zju
Differential Revision: D22464158
fbshipit-source-id: 7b97329aedcddb0e51fd242b519e79eba2eed350
Summary: Add an `edenapistore` class that wraps an `EdenApiHgIdRemoteStore`. This class exists purely as a means to set up the stores from Python code and to get an `Arc<EdenApiHgIdRemoteStore>` to the Rust content store. It has no functionality of its own.
Reviewed By: quark-zju
Differential Revision: D22449702
fbshipit-source-id: ad2094c79da523071b6ed8344c8dde706e448c95
Summary: This is effectively a complete rewrite of the EdenAPI Python bindings to use the new client.
Reviewed By: quark-zju
Differential Revision: D22442903
fbshipit-source-id: c3cf2b2b8291e24d6d4d3a3546ccc69472510567
Summary: Ensure that all of the components of an EdenAPI response use the same error type.
Reviewed By: quark-zju
Differential Revision: D22443029
fbshipit-source-id: 3e00a8b83677beb5ef2d90630fe9b85760874186
Summary: Add an `add_entry` convenience method to `HgIdMutableDeltaStore`, similar to the one present in `HgIdMutableHistoryStore`.
Reviewed By: quark-zju
Differential Revision: D22443031
fbshipit-source-id: 84fdaae9fbd51e6f2df466b0441ec5f7ce6715f7
Summary:
A common pattern in Mercurial's data storage layer Python bindings is to have a Python object that wraps a Rust object. These Python objects are often passed across the FFI boundary to Rust code, which then may need to access the underlying Rust value.
Previously, the objects that used this pattern did so in an ad-hoc manner, typically by providing an `into_inner` or `to_inner` inherent method. This diff introduces a new `ExtractInner` trait that standardizes this pattern into a single interface, which in turn allows this pattern to be used with generics.
Reviewed By: quark-zju
Differential Revision: D22429347
fbshipit-source-id: cab4c24b8b98c6ef8307f72a9b4726aabdc829cc
Summary:
Bundle2 chunks have to fit under 2GB, but we have code that simply
returns an entire buffer as a single chunk, which may be over 2GB. Let's split
that up into smaller chunks.
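The splitting itself is simple; a Python sketch (the constant mirrors the bundle2 limit, the helper name is made up):

```python
MAX_CHUNK = 2 * 1024 ** 3 - 1  # stay strictly under bundle2's 2GB chunk limit

def split_chunks(buf, limit=MAX_CHUNK):
    """Yield slices of buf, each no larger than limit bytes."""
    for start in range(0, len(buf), limit):
        yield buf[start:start + limit]
```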
Reviewed By: quark-zju
Differential Revision: D21235286
fbshipit-source-id: f52366fb5ecebf4f9f00914e044c46e147873bec
Summary: Remove all of the old EdenAPI Python code from Mercurial. For the new EdenAPI client, we intend to expose HTTP fetching through the Rust storage interfaces rather than putting conditionals throughout the Python code.
Reviewed By: quark-zju
Differential Revision: D22405579
fbshipit-source-id: d3c9ed02d9f624b9490e9280b8b0b4f8a127a9b5
Summary:
D22396026 made it so that `HttpClient::send_async` no longer consumes `self`. This means that instead of creating a new HTTP client for each request, we can reuse the same one.
This has the benefit of allowing for connection reuse (which was the point of D22396026), resulting in lower latency for serial fetches.
Reviewed By: quark-zju
Differential Revision: D22397768
fbshipit-source-id: 9d066c1ec64a6aa1b36ec674ef294030c1f90b41
Summary: Allow passing multiple JSON requests to the EdenAPI CLI. The requests will be performed serially, which allows for testing the performance of serial EdenAPI calls.
Reviewed By: quark-zju
Differential Revision: D22397769
fbshipit-source-id: c59e5abf53eee9c2014010672183e202b6f180fc
Summary:
Add a pool of `Multi` handles that the client can reuse across requests.
Previously, `HttpClient`'s async functions had to consume the client in order to have a `'static` lifetime (since `Future`s generally cannot hold references to things outside of themselves). This meant that each async operation would use its own `Multi` handle, preventing connection reuse across operations, since the `Multi` handle maintains a connection cache internally.
With this change, the client can reuse the `Multi` session after an async operation, thereby benefitting from libcurl's caching. Note that the same `Multi` handle still cannot be used by concurrently running `Future`s (as this [would not be thread safe](https://curl.haxx.se/libcurl/c/threadsafe.html)), but once a `Future` has completed its `Multi` handle will return to the pool for use by subsequent requests.
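The pooling idea can be sketched in Python (a stand-in for the Rust code; names are illustrative): a handle is taken for the duration of one operation and returned afterwards, so its internal caches survive for the next request.

```python
import threading

class HandlePool:
    """Pool of reusable handles; creates a new one only when the pool is empty."""

    def __init__(self, factory):
        self._factory = factory  # creates a new handle on pool exhaustion
        self._free = []
        self._lock = threading.Lock()

    def take(self):
        # Each handle is held exclusively while in use (no concurrent sharing).
        with self._lock:
            return self._free.pop() if self._free else self._factory()

    def give_back(self, handle):
        # Returning the handle lets the next operation reuse its caches.
        with self._lock:
            self._free.append(handle)
```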
---
(Somewhat tangential)
As is noted in the code comments, `libcurl`'s C API provides a way to share caches across multiple multi sessions: [the "share" interface](https://curl.haxx.se/libcurl/c/libcurl-share.html).
While using this would seem preferable to an ad-hoc solution like this diff, it turns out that the `curl` crate does not provide safe bindings to the share interface. This means that in order to use the share interface, we'd need to directly use the unsafe bindings from `curl-sys`.
In addition to the difficulty of working with unsafe FFI code, the API expects the application to handle synchronization by passing it function pointers to handle locking/unlocking shared resources.
Ultimately, I came to the conclusion that managing lifetimes and synchronization in unsafe code across an FFI boundary would be nontrivial, and ensuring correctness would require a lot of effort that could be avoided by implementing an ad-hoc solution on top of the safe API instead. However, it might make sense to change this to use the share interface in the future.
Reviewed By: quark-zju
Differential Revision: D22396026
fbshipit-source-id: 06eea2ffacdc791527eac9ce4becc457af5c0480
Summary: Update the EdenAPI Python bindings to use the new client. This is mostly just a stopgap measure to allow us to delete the old client code; nothing in production actually uses these bindings anymore, and the new client will primarily be used from Rust.
Reviewed By: quark-zju
Differential Revision: D22379476
fbshipit-source-id: 953e0ffc2ce682869ee234d672a154046b373c1e
Summary: Update the `revisionstore` and `backingstore` crates to use the new EdenAPI crate.
Reviewed By: quark-zju
Differential Revision: D22378330
fbshipit-source-id: 989f34827b744ff4b4ac0aa10d004f03dbe9058f
Summary: Add a new `EdenApiBlocking` trait that exposes blocking versions of the `EdenApi` trait's methods, for use in non-async code.
Reviewed By: quark-zju
Differential Revision: D22305396
fbshipit-source-id: d0f3a73cad1a23a4f0892a17f18267374e63108e
Summary:
This diff adds an EdenAPI CLI program that allows manually sending requests to the server.
Requests are read from stdin in a JSON format (the same format used by the `make_req` tool and the EdenAPI server integration tests). This makes it easy to create and edit requests during debugging.
Responses are re-serialized as CBOR and written to stdout. (The program will refuse to write output if stdout is a TTY.) These responses can then be analyzed using the `read_res` tool (also used by the EdenAPI server integration tests).
The program prints real-time download statistics during data fetching, allowing the user to debug performance in addition to correctness.
The program uses standard `hgrc` files to configure the EdenAPI client, which means that one can simulate production settings by specifying a production `hgrc`. By default, it will read from `~/.hgrc.edenapi` rather than `~/.hgrc` since the user will most likely want to configure this program independently of Mercurial.
Reviewed By: quark-zju
Differential Revision: D22370163
fbshipit-source-id: 5d9974bc05fa960d26cd2c87810f4646e2bc55b4
Summary:
There was some missed usage of `Path.resolve`. This diff should cover it all.
```
cli $ rg -F ".resolve"
main.py
967: uid = self.resolve_uid(args.uid)
968: gid = self.resolve_gid(args.gid)
util.py
622: `Path.resolve`. This is a helper method to work around that by using
628: return path.resolve(strict=strict)
```
Reviewed By: chadaustin
Differential Revision: D22459188
fbshipit-source-id: c2a1b132f752cc399ebf34723f26123559939f2a
Summary:
Apparently some of these tests still run in py2. Let's let them fall back
to the old mysql-connector connector.
Reviewed By: xavierd
Differential Revision: D22458822
fbshipit-source-id: add3da42cbd18e6cb5b34b3038d96cf52c7c6387
Summary:
proxy_import_helper.py exists for compatibility with older EdenFS
builds. None of those builds are running anymore, so remove it.
Reviewed By: genevievehelsel
Differential Revision: D22451196
fbshipit-source-id: 4d258b3fafe13bb67bd11259f5d1193a7e5575e6
Summary: This diff defines `Overlaychecker::ProgressCallback` to replace repetitive function type declaration.
Reviewed By: genevievehelsel
Differential Revision: D22243160
fbshipit-source-id: ea05e451817a760b5266879b956eaea48dc8d85e
Summary:
Previously the backfill_batch_dangerous method was calling the internal derive_impl() method
directly. That wasn't great (after all, we were calling a function whose name suggests it should only be called from inside the derived data crate), so this diff changes it to call the batch_derive() method instead.
This gives a few benefits:
1) We no longer call the internal derive_impl function
2) It allows different types of derived data to override batching behaviour.
For example, we've already overridden it for fsnodes, and the next diff will override
it for blame as well.
To make it compatible with derive_impl(), batch_derive() now accepts the derive data mode and mapping.
Reviewed By: krallin
Differential Revision: D22435044
fbshipit-source-id: a4d911606284676566583a94199195860ffe2ecf
Summary:
From the RocksDB documentation:
> When opening a DB in a read-write mode, you need to specify all Column
Families that currently exist in a DB. If that's not the case, DB::Open call
will return Status::InvalidArgument()
This can cause problems for us in a couple of situations:
- When we need to rollback from an eden version where we added a column to
our configuration for RocksDB
- When we delete a column from our configuration for RocksDB
To make sure we do not encounter this error, we need to make sure that we still
open all the columns existing in the database, even if they are not in our
configured list of column families.
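A sketch of the selection logic in Python (hypothetical helper; the real code is C++ against the RocksDB API): list what exists on disk, then open the union, keeping our configured order.

```python
def columns_to_open(configured, on_disk):
    """Open every column family that exists on disk: our configured columns
    in their configured order, plus any extras found in the database (e.g.
    left behind by a newer eden version after a rollback)."""
    extras = [c for c in on_disk if c not in configured]
    return list(configured) + extras
```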
Reviewed By: wez
Differential Revision: D22425310
fbshipit-source-id: 9822b22cfedf4633f65bbed96f95a546dd3614f4
Summary:
D22206317 (9a6ed4b6ca) added requesting of predecessor information for suspected primordials
by the successor ID. This allows recovery of earlier predecessors when partial
data upload resulted in the history of a commit being extended backwards.
Unfortunately, while the individual requests are fast, the combined request
using `OR` in SQL ended up being very slow for some requests.
Separate out the requests at the application level, and aggregate the results
by concatenating them. `collect_entries` already handles duplicates should any
arise.
Most of the time the successor query will very quickly return no rows, as
it only matters when history is extended backwards, which is expected to be
rare.
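The application-level split can be sketched in Python (the `query_one` callable is hypothetical; the real code issues one SQL query per key):

```python
def fetch_entries(query_one, ids, successor_ids):
    """Run one fast query per key instead of a single slow OR query, then
    concatenate the results; downstream dedup (collect_entries) handles
    any duplicates."""
    results = []
    for i in ids:
        results.extend(query_one("id", i))
    for s in successor_ids:
        # Usually returns no rows: only matters when history was extended
        # backwards by a partial data upload.
        results.extend(query_one("successor", s))
    return results
```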
Reviewed By: ikostia
Differential Revision: D22456062
fbshipit-source-id: 1e6094b4ac1590a5824e9ae6ef48468766560188
Summary:
Renamed xdiff functions to avoid linking issues when using both libgit2-sys and xdiff.
When using repo_import tool (https://fburl.com/diffusion/8p6fhjt2) we have libgit2-sys dependency for importing git repos. However, when we derive blame data types, we need to use xdiff functionalities (from_no_parents: https://fburl.com/diffusion/pitukmyo -> diff_hunks: https://fburl.com/diffusion/9f8caan9 -> xdl_diff: https://fburl.com/diffusion/260x66hf). Both libgit2 and eden/scm have vendored versions of xdiff library. Therefore, libgit2-sys and eden/scm share functions with the same signatures, but have different behaviours and when we tried to derive blame, it used libgit2-sys's xdl_diff instead of eden's. This resulted in getting segfaults (https://fburl.com/paste/04gwalpo).
Note: repo_import is the first tool that has tried to import both and the first to run into this issue.
Reviewed By: StanislavGlebik
Differential Revision: D22432330
fbshipit-source-id: f2b965f3926a2dc45de1bf20e41dad70ca09cdfd
Summary:
Currently when we are resolving the full command line for a client pid, we only
read the first 256 bytes of the command.
This means that some commands will be truncated; this has come up in some
of our recently added logs. This ups the buffer size so that we can
hopefully get the full command line.
The longer-term solution would be to implement something fancier, as mentioned
in the comment in the code copied below, but that also has drawbacks, as mentioned there.
> // Could do something fancy if the entire buffer is filled, but it's better
// if this code does as few syscalls as possible, so just truncate the
// result
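For reference, `/proc/<pid>/cmdline` separates (and terminates) argv entries with NUL bytes; a Python sketch of the parsing, with the enlarged buffer size shown as a constant (values illustrative, the real code is C++):

```python
CMDLINE_BUF_SIZE = 4096  # was effectively 256 bytes; larger to avoid truncation

def parse_cmdline(raw):
    """Split a /proc/<pid>/cmdline buffer into argv entries.

    If the single read() that filled `raw` used too small a buffer, the
    last entry may be truncated - the failure mode this diff mitigates.
    """
    if not raw:
        return []
    return raw.rstrip(b"\0").split(b"\0")
```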
Reviewed By: wez
Differential Revision: D22436219
fbshipit-source-id: 80a9aecfe148aa3e333ca480c6a8cb8b9c5c86f2
Summary:
Bypass truncation-based transaction if narrow-heads is on.
The transaction abort still works logically because commit references stay
unchanged on abort.
Related EdenFS and Mononoke tests are updated. Mononoke tests probably
shouldn't rely on revlog / fncache implementation details in hg.
Reviewed By: DurhamG
Differential Revision: D22240186
fbshipit-source-id: f97efd60855467b52c9fb83e7c794ded269e9617
Summary:
With narrow-heads, visible heads are explicitly controlled by commit
references. Adding commits can be just writing them out directly.
This mainly removes the "buffered" writes of `00changelog.i`.
Instead of writing pending changes to `00changelog.i.a`, they
are directly written to `00changelog.i` (or buffered in memory
with future changes).
This does not bypass all transaction logic. Truncation can still
happen. Strip is also unaffected.
The change is incomplete. In the future, pending changes will
be written in-memory to the Rust HgCommits struct and we no
longer write directly to revlog.
Reviewed By: DurhamG
Differential Revision: D22240176
fbshipit-source-id: ac9d20ab95ff304fb285a503d2d3db815942d5b3
Summary: This makes pyre aware that `istest` exists on `util`.
Reviewed By: DurhamG
Differential Revision: D22421141
fbshipit-source-id: 50dd264988ffe0e93597df2d540f3de03e8aea4d
Summary:
With modern configs, repo is unfiltered and `ctx.children()` returns unfiltered
commits. Use the revset function `children` instead so invisible children won't
trigger auto restack.
Reviewed By: DurhamG
Differential Revision: D22421689
fbshipit-source-id: 3ec8f616c17254ee9ccfcad96673d209b9163da6
Summary: The test demonstrates an issue with the current auto restack logic.
Reviewed By: DurhamG
Differential Revision: D22421690
fbshipit-source-id: e035cd3212357f24322f8eb9ec5941767ad780d9
Summary:
This diff is a complete, ground-up rewrite of the EdenAPI client. Rather than attempting to use `libcurl` directly, it relies on the new `http_client` crate, which makes the code considerably simpler and allows for a proper async interface.
The most notable change is that `EdenApi` is now an async trait. A blocking API is added later in the stack for use in non-async contexts.
Reviewed By: quark-zju
Differential Revision: D22305397
fbshipit-source-id: 4c1e5d3091d6dd04cf13291e7b7a4217dfdd249f
Summary:
As was pointed out in the review for D22280745 (d73c63d862), `CborStream` is inefficient in situations where the underlying stream produces chunks that are much smaller than the size of the serialized items. To avoid pathological behavior, make `CborStream` buffer the incoming data, and only attempt deserialization if enough data has accumulated.
For now, the buffer size is fixed (with a default of 1MB, chosen arbitrarily). In the future, it might make sense to have the stream adjust the buffer size based on the average size of observed deserialized values.
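The buffering strategy can be sketched in Python, using newline-delimited JSON as a stand-in for CBOR (the names and the toy framing are illustrative): decoding is only attempted once the buffer crosses the threshold or the stream ends, instead of retrying a parse on every tiny chunk.

```python
import json

def decode_stream(chunks, threshold=1024 * 1024):
    """Buffer incoming chunks; attempt deserialization only once enough
    data has accumulated (or the stream is exhausted)."""
    buf = b""
    pending = list(chunks)
    while pending:
        buf += pending.pop(0)
        if len(buf) < threshold and pending:
            continue  # not enough data yet; keep buffering
        while b"\n" in buf:
            line, buf = buf.split(b"\n", 1)
            yield json.loads(line)
    if buf:
        # Flush whatever remains at end of stream.
        yield json.loads(buf)
```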
Reviewed By: quark-zju
Differential Revision: D22370164
fbshipit-source-id: ed940c56ca2cbbfc07f01d47becf6f1d71872872
Summary: On Windows, an mmapped file cannot be replaced. Detect that and delete it manually.
Reviewed By: farnz
Differential Revision: D22428731
fbshipit-source-id: 4d308a07aae02dcaf2aedb7b0267a535c2e09c92
Summary:
Diff D22140187 (74da65a38f) upgraded mysql-connector-python, which enabled ssl by
default. Our db doesn't support this, so we disabled it for hgsql but forgot to
do so for infinitepush and pushrebase. Let's do it for them too.
Reviewed By: krallin
Differential Revision: D22416533
fbshipit-source-id: bc91ccd2ab4d9bc8ba423c8e60fc0191c7ff78c6
Summary:
The goal is to make it easier to implement unit tests, which depend on `LiveCommitSyncConfig`. Specifically, `scs` has a piece of code, which instantiates `mononoke_api::Repo` with a test version of `CommitSyncConfig`. To migrate it to `LiveCommitSyncConfig`, I need to be able to create a test version of that. It **is** possible now, but would require me to turn a supplied instance of `CommitSyncConfig` back into `json`, which is cumbersome. Using a `dyn LiveCommitSyncConfig` there, instead of a concrete struct seems like a good idea.
Note also that we are using this technique in many places: most (all?) of our DB tables are traits, which we then implement for SQL-specific structs.
Finally, this diff does not actually migrate all of the current users of `LiveCommitSyncConfig` (the struct) to be users of `LiveCommitSyncConfig` (the trait), and instead makes them use `CfgrLiveCommitSyncConfig` (the trait impl). The idea is that we can migrate bits to use traits when needed (for example, in an upcoming `scs` diff). When not needed, it's fine to use concrete structs. Again, this is already the case in a few places: we sometimes use the `SqlSyncedCommitMapping` struct directly, instead of `T: SyncedCommitMapping` or `dyn SyncedCommitMapping`.
Reviewed By: StanislavGlebik
Differential Revision: D22383859
fbshipit-source-id: 8657fa39b11101684c1baae9f26becad6f890302
Summary:
This updates the AsyncRead implementations we use in hgproto and
mercurial_bundles to use a LimitedAsyncRead. The upshot of this change is that
we eliminate O(N^2) behavior when parsing the data we receive from clients.
See the earlier diff on this stack for more detail on where this happens, but
the bottom line is that Framed presents a full-size buffer that we zero out
every time we try to read data. With this change, the buffer we zero out is
comparable to the amount of data we are reading.
This matters in commit cloud because bundles might be really big, and a single
big bundle is enough to take an entire core for a spin for 20 minutes (and
achieve nothing but a timeout in the end). That being said, it's also useful for
non-commit cloud bundles: we do occasionally receive big bundles (especially
for WWW codemods), and those will benefit from the exact same speedup.
One final thing I should mention: this is all in a busy CPU poll loop, and as I noted
in my earlier diff, the effect persists across our bundle receiving code. This means
it will sometimes result in not polling other futures we might have going.
Reviewed By: farnz
Differential Revision: D22432350
fbshipit-source-id: 33f1a035afb8cdae94c2ecb8e03204c394c67a55
Summary: The 0.3 version (currently used only in one crate, eden/scm/lib/commitcloudsubscriber) uses an old openssl crate which doesn't work with the openssl library installed on most machines (both at FB and on GitHub Actions).
Reviewed By: mitrandir77
Differential Revision: D22430649
fbshipit-source-id: b8fa930841dbcdd4c085d8c9488d768b3526e1c4
Summary:
The dirstate code did not prevent absolute paths from being added to
the structure, but they would cause problems later when those paths were passed
to Rust. We should move the dirstate to use the Rust path type, but for now
let's just block absolute paths.
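The guard itself is tiny; a Python sketch (hypothetical function, with a dict standing in for the dirstate):

```python
import os.path

def dirstate_add(dirstate, path):
    """Reject absolute paths before they enter the dirstate, since they
    cause problems once passed to the Rust path type later on."""
    if os.path.isabs(path):
        raise ValueError("cannot add absolute path to dirstate: %r" % path)
    dirstate[path] = "n"  # 'n' = normal, as in Mercurial's dirstate states
```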
Reviewed By: quark-zju, xavierd
Differential Revision: D22426592
fbshipit-source-id: 4ae9f004237e4c54336beb03aab29517254ae441
Summary:
We've seen a handful of users complaining about clone failing and not being
able to recover from it. From looking at the various reports and the
stacktraces, I believe this is caused by a flaky connection on the user end
that causes the Python code to retry the getpack calls. Before retrying, the
code will figure out what still needs fetching and this is done via the
getmissing API. When LFS pointers were fetched, the LFS blobs aren't yet
present on disk, and thus the underlying ContentStore::get_missing will a set
of keys that contain some StoreKey::Content keys. The code would previously
fail at this point, but since the key also contains the original key, we can
simply return this, the pointers might be refetched but these are fairly small.
Taking a step back from this bug, the issue really is that the retry logic is
done in code that cannot understand content-keys, and moving it to a part of
the code that understands this would also resolve the issue.
I went with the simple approach for now, but since other remote stores
(EdenAPI, the LFS one, etc) would also benefit from the retry logic, we may
want to move the logic into Rust and remove the getmissing API from the Python
exposed ContentStore.
Reviewed By: DurhamG
Differential Revision: D22425600
fbshipit-source-id: 69c2898cc302d2170cd0f206c89189c341db5278
Summary:
Make zsh_completion complete standard aliases like `checkout`.
This restores the behavior before D18463299 (54451585ce) stack.
Reviewed By: farnz
Differential Revision: D22396737
fbshipit-source-id: 745761041d6d1dec6adba2efb102e2021a01b36b
Summary:
Rather than dynamically allocating an event loop in the systemd async
code, make all the corresponding functions async, so the caller is
responsible for threading an event loop down.
Reviewed By: genevievehelsel
Differential Revision: D21894106
fbshipit-source-id: 398c769c30c85a3bb210dbc209f34f9f7336996c
Summary: I'd eventually like to use this in the edenfs_monitor, so I'm moving it to `proc_utils` for shareability.
Reviewed By: chadaustin
Differential Revision: D21998763
fbshipit-source-id: 052e78fb8e58515f98eb465b8041fd0e621fc9da
Summary: I'd eventually like to use this in the edenfs_monitor, so I'm adding this to `proc_utils` for future ease of use.
Reviewed By: chadaustin
Differential Revision: D21987390
fbshipit-source-id: 076672b44311c2a1e0cac934c0674a18a87649af
Summary: On macOS, an "Icon?" file (aka "Icon\r") is sometimes added. This file is weird to ignore: it should be ignored using "Icon\r\r" or "Icon[\r]", and won't be hidden by a plain "Icon\r" pattern.
Reviewed By: chadaustin
Differential Revision: D22050682
fbshipit-source-id: 51d7d4c2414a07b959120455ae991d2425c1ea4d
Summary:
I want to update the health check to stop averaging averages (like in
D22394014). To do this, I need those counters.
Reviewed By: ahornby
Differential Revision: D22410196
fbshipit-source-id: aa5cbfe6607be3b953887f1639e1de54baac7389
Summary:
Just knowing the number of fetched undesired files doesn't give the full
picture; e.g. fetching lots of small files is better than fetching a single
multi-GB file.
So knowing the size of the files is helpful as well.
Reviewed By: krallin
Differential Revision: D22408400
fbshipit-source-id: 7653c1cdceccf50aeda9ce8a4880ee5178d4b107
Summary:
This is causing the Mononoke and hg tests to break on an undefined variable.
This looks like it might have been a refactoring accident when the code
defining this got extracted into D22240176 (but the code using it landed in
D22240177 (ec58e72903)).
This diff unbreaks the tests by defaulting the parameter to False (which
seems coherent with the idea of that latter diff), and puts it in a place
that'll trigger a merge conflict for quark-zju when he rebases D22240176.
Reviewed By: farnz
Differential Revision: D22408588
fbshipit-source-id: 496808742a13dfeb17989123742a0aa8bae17b38
Summary: [edenfs] Some minor modifications to verbiage in documentation
Reviewed By: chadaustin
Differential Revision: D22394129
fbshipit-source-id: 2d662e56d621fd5e5d5ba6de284ca3d08f8bd4e5
Summary:
checkunknown is quite expensive since it has to read the contents of
every untracked file, which can be tens of thousands of non-parallel stats and
reads. For files that don't exist in the working copy, it's just wasted work to
stat for the files at all. Status can efficiently tell us what files are
unknown, so let's use that to triage most "unknown" files to normal writes
before we even get to checkunknown.
The downside of this approach is that it makes an additional call to status,
which is not cached (only non-unknown+non-ignore+non-clean status calls are
cached). We could add more caching if this is a problem.
This doesn't help the case where a user might have 10k+ untracked files due to a
ctrl+c'd checkout, but we'll improve that in a future diff.
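The triage described above amounts to something like this minimal sketch (the names are illustrative, not hg's internal API):

```python
def triage_targets(targets, status_unknown):
    """Split checkout targets into plain writes and files that need the
    expensive content comparison in checkunknown. Files that status did
    not report as unknown cannot conflict, so they are written directly."""
    unknown = set(status_unknown)
    plain_writes = [f for f in targets if f not in unknown]
    need_check = [f for f in targets if f in unknown]
    return plain_writes, need_check
```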
Reviewed By: quark-zju
Differential Revision: D22366758
fbshipit-source-id: b54fec113dc162f97a35e705ed083ddd14babe55
Summary: D22381744 updated the version of `futures` in third-party/rust to 0.3.5, but did not regenerate the autocargo-managed Cargo.toml files in the repo. Although this is a semver-compatible change (and therefore should not break anything), it means that affected projects would see changes to all of their Cargo.toml files the next time they ran `cargo autocargo`.
Reviewed By: dtolnay
Differential Revision: D22403809
fbshipit-source-id: eb1fdbaf69c99549309da0f67c9bebcb69c1131b
Summary:
Automatic run to suppress type errors.
#pyreupgrade
Differential Revision: D22404242
fbshipit-source-id: a7eabe6c6eb0dc9f29b3cf01f780c34fff1c6810
Summary:
The key is "hit_count" not "hits". This typo caused the trace to always claim
that no data was fetched from memcache, which is obviously not true as the
getpack trace that follows lists significantly fewer requested keys.
Reviewed By: kulshrax
Differential Revision: D22401592
fbshipit-source-id: ab2ea3e7f8ff3a9c7322678afc8a174e09d6dc09
Summary:
Mercurial's concept of a `null` revision (hardcoded as 20 zeros) is a
headache to special-case. See https://www.mercurial-scm.org/wiki/RevsetVirtualRevisionPlan.
The Rust DAG layer cannot handle it. Make pydag drop the nullid or nullrev when
crossing the Python -> Rust boundary.
A cleaner way to handle `null` might be:
- Create a new vertex in the DAG in memory that has empty content.
Calculate its commit hash normally. The commit is isolated from other parts
of the commit graph. It has no parents and no children.
The vertex has an assigned Id, which is not zero if the repo is not empty.
- Assign the `null` special name (like how we do for `tip`) to the commit.
- Remove all hard-coded special cases of the 20-zero `nullid`.
That would allow things like `hg up null` and `hg diff -r null -r X` to
continue to work without special-casing them in the commit graph layer.
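The boundary filtering amounts to something like this sketch (`NULL_ID` is Mercurial's hardcoded 20-zero node; the function name is illustrative):

```python
NULL_ID = b"\0" * 20  # Mercurial's hardcoded null node: 20 zero bytes

def drop_null(nodes):
    """Filter out the null node before handing a node list to the Rust
    DAG layer, which has no concept of a null revision."""
    return [n for n in nodes if n != NULL_ID]
```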
Reviewed By: sfilipco
Differential Revision: D22240188
fbshipit-source-id: 707af47cbf36a7df60097a17d69094aae89d3250
Summary:
The new Rust structure has enough features to replace index2.
This will eventually allow us to delete index2 related logic (namely,
pyrevlogindex).
Reviewed By: sfilipco
Differential Revision: D22240178
fbshipit-source-id: 1af9e6045c8d8d1a220a6abad6d33b129a3afa70
Summary:
The old revlog C code works with strip. The new Rust code is never designed to
work with strip (especially the segmented changelog does not support strip).
In the strip code path, just reload the changelog after strip to avoid issues.
Reviewed By: sfilipco
Differential Revision: D22323188
fbshipit-source-id: c4f790c66372c28a71173cf16910ad1d7cb89223
Summary:
Strip is a special case. `tr.changes["revs"]` can contain invalidated
revision numbers. Do not use it.
Reviewed By: sfilipco
Differential Revision: D22240179
fbshipit-source-id: 6b9d29e099f821a7fc7aa6055dc8eccf4597ffd0
Summary:
This exposes the Rust HgCommits object. It will replace changelog operations
gradually.
This diff only makes changelog load the new Rust HgCommits structure, and
maintain it side-by-side with the original data structures when there are
changes. It does not replace real changelog operations yet.
Reviewed By: sfilipco
Differential Revision: D22240177
fbshipit-source-id: b585c1585defdc133d2b9ef2fda4aea8702152bf
Summary:
If the revlog on disk was changed to include new commits, read them and avoid
writing duplicated commits (which breaks nodemap building).
Reviewed By: sfilipco
Differential Revision: D22323187
fbshipit-source-id: cdd65f31e65865d9f3868e43416633297896c0f9
Summary:
Change pydag from using concrete `namedag` and `memnamedag` to trait objects:
- `commits`: High-level read-write commits storage, supports Rust `HgCommits`
(segmented changelog), `MemHgCommits`, and `RevlogCommits`.
- `dagalgo`: maps to the `DagAlgorithm` Rust trait.
- `idmap`: maps to the `IdConvert + PrefixLookup` Rust traits.
The idea is that we move the revlog / segmented changelog difference from Python
to behind Rust trait objects so the Python code looks overall cleaner, the Rust
revset alternative gets exercised early, and switching from revlog to segmented
changelog becomes easier.
Reviewed By: sfilipco
Differential Revision: D21796242
fbshipit-source-id: 3a4a3ff3d9e7e46059d1ed3461a55003c352e82d
Summary: This is used by the next diff.
Reviewed By: sfilipco
Differential Revision: D21944139
fbshipit-source-id: 184c4e97aaeca36c3608665defd1473c9300fb5b
Summary: This will satisfy some use-cases.
Reviewed By: sfilipco
Differential Revision: D21854225
fbshipit-source-id: 76758716b35cfd31dc3843c118917c0fb7609027
Summary: This will help move more Python logic to Rust.
Reviewed By: sfilipco
Differential Revision: D21854224
fbshipit-source-id: b03cbacedc11d77e8c56262437a8d10bd9a89e59
Summary: This is discovered by using it in Python world.
Reviewed By: sfilipco
Differential Revision: D22323186
fbshipit-source-id: 295811e0950b94ad2ad73ad242228b6a3f9765d0
Summary: Adding a same commit multiple times is a no-op.
Reviewed By: sfilipco
Differential Revision: D22323190
fbshipit-source-id: 61a06335581a9cad32dc7e929b841ec69b551a9c
Summary: This adds some test coverage for the revlog DagAlgorithm implementation.
Reviewed By: sfilipco
Differential Revision: D22249157
fbshipit-source-id: a1d347b4d90d0e7f8fb229c317cc75c2b8e16242
Summary:
This makes RevlogIndex compatible with the generic DAG testing API from the dag
crate.
Reviewed By: sfilipco
Differential Revision: D22249156
fbshipit-source-id: 54a3c458e85804968964174eab674e494a6fa8a2
Summary: Some DAG implementations do not support it.
Reviewed By: sfilipco
Differential Revision: D22249158
fbshipit-source-id: ebcdf164677ee647ef44aa1ee3cfd318bac658b0
Summary:
Different implementations might return different orders. They should all be
considered correct.
Reviewed By: sfilipco
Differential Revision: D22249159
fbshipit-source-id: 36e4cadf814366f7ee2ed8a778948ff810760550
Summary: This makes it possible to run tests for other DAGs, like the revlog.
Reviewed By: sfilipco
Differential Revision: D22249155
fbshipit-source-id: 205579eeaccd42a21297d965973957168bb8726e
Summary:
For revlog, calculating `only` can have some fast paths that do not scan the
entire changelog.
Reviewed By: sfilipco
Differential Revision: D21944136
fbshipit-source-id: 58391636350f8f19643d59c46d663f55861d6de3
Summary:
This will be used to maintain the narrow-heads phase calculation and to sunset
the revlog-specific changelog.index2.
Reviewed By: sfilipco
Differential Revision: D21944131
fbshipit-source-id: a8bbd1fd24546f4891ffa677476bff750c3faf5f
Summary: The values of `pending_nodes_index` should start from 0 instead of 1.
Reviewed By: sfilipco
Differential Revision: D21944133
fbshipit-source-id: a2a332868f16b398037289c81bf8076d1400c0a7
Summary:
This drops the `file` parameter from the `raw_data` API, making
RevlogIndex easier to use.
Reviewed By: sfilipco
Differential Revision: D21854228
fbshipit-source-id: 259726524d1cc6a1f9d00783e22f9502c7decdeb
Summary:
The reverse, `to_id_set`, already exists.
It turns out that the Python side wants this in many places.
Reviewed By: sfilipco
Differential Revision: D22240175
fbshipit-source-id: b6a3a3a3869dc0c521a21b1d86394421b816632b
Summary:
This provides a way for implementations to optimize the operation.
For segmented changelog, the default implementation is good enough.
For revlog, `only` can have a fast path that does not iterate through the
entire changelog.
A related API `only_both` is added. For revlog it has multiple use-cases,
including narrow-heads phase calculation and revlog.findcommonmissing used by
discovery.
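In terms of ancestor sets, the semantics can be sketched like this (with a caller-supplied `ancestors` function standing in for the real DAG; not the actual Rust trait):

```python
def only(ancestors, x_heads, y_heads):
    """Commits reachable from x_heads but not from y_heads (`::X - ::Y`)."""
    return ancestors(x_heads) - ancestors(y_heads)

def only_both(ancestors, x_heads, y_heads):
    """Return (only, ancestors_of_y). Callers like findcommonmissing need
    both sets, so computing them together avoids a second traversal."""
    ys = ancestors(y_heads)
    return ancestors(x_heads) - ys, ys
```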
Reviewed By: markbt
Differential Revision: D21944132
fbshipit-source-id: d11660dae85ea6158977eb00d1ceaceddf1d8234
Summary:
Sometimes you want to fetch a file. Using curl and the LFS server works, but
this really should be part of Mononoke admin.
Reviewed By: ikostia
Differential Revision: D22397472
fbshipit-source-id: 17decf4aa2017a2c1be52605a254692f293d1bcd
Summary:
This got broken when we moved to Tokio 0.2. Let's fix it and add a test to make
sure it does not regress.
Reviewed By: ikostia
Differential Revision: D22396261
fbshipit-source-id: a8359aee33b4d6d840581f57f91af6c03125fd6a
Summary: Use `thiserror` to provide a more ergonomic API for `DataEntry` hash checking. The `.data()` method now simply returns a `Result` rather than a tuple with an ad-hoc enum.
Reviewed By: quark-zju
Differential Revision: D22376164
fbshipit-source-id: fc39cb212ec1ee5830292db4aa5eca18f2c16a2b
Summary:
The streaming pager expects bytes as inputs, but we were sending
strings to the progress buffer. This fixes it.
Reviewed By: quark-zju
Differential Revision: D22394395
fbshipit-source-id: 4acbfc08ca624ca3c794e6e369df669e370e5b42
Summary:
The test relies on Python revlog implementation details which
do not exist in the Rust revlog implementation.
Reviewed By: DurhamG
Differential Revision: D22240183
fbshipit-source-id: b245b35e561c3364618a0e199244df030cc47942
Summary:
The original test is unmaintainable. Rewrite it to test key features.
I dropped detailed tests about merge conflict / content handling.
In the future we probably will have a clean Rust implementation of "applying
diff between X and Y to Z" which can replace various unmaintainable patch
application logic in Python. We can test that Rust library extensively and
commands will just use the clean library (ex. revert, backout).
Reviewed By: sfilipco
Differential Revision: D22240184
fbshipit-source-id: 4d6c65fe02ccc92e64c62a48f702187678973086
Summary:
`debugstrip` is an operation that depends on multiple legacy components (revlog
strip, truncate-based transaction). They are incompatible with modern configs
(no truncation, heads-based visibility, metalog-based transaction).
Avoid using it in the test.
Reviewed By: DurhamG
Differential Revision: D22240187
fbshipit-source-id: ec215d75fb766957a3d6f58e491ef815f5bedbdc
Summary:
Changed by `fix-revnum.py`. Part of the tests use `hg debugstrip`,
which I'm trying to avoid.
Reviewed By: DurhamG
Differential Revision: D22240181
fbshipit-source-id: a569b712fe4b985378e5c61c000deecccefbc488
Summary:
Tests do not use it, and we have long disabled it in production, so let's
just disable the command unconditionally.
Reviewed By: DurhamG
Differential Revision: D22368834
fbshipit-source-id: 7ebc5b07c4044b6809defc06437cda7256cb2ebf
Summary:
`hg rollback` has long been disabled in the production setup. It has weird behavior and
is likely incompatible with modern transaction frameworks. Remove its usage in
tests.
Reviewed By: DurhamG
Differential Revision: D22240180
fbshipit-source-id: 453684ebbc77132e09b1b717b6ad1e106dcad214
Summary:
End-users have been using visibleheads + narrowheads for a while, hgsql does
not require any filtering, and most tests are migrated to modern configs
(visibility + narrow heads). Now it's time to consider removing the repoview
layer.
layer.
This removes complexities around `changelog.filteredrevs` and the various
different `repoview` objects with caching problems (ex. I have seen `repo` and
`unfi` have inconsistent phasecache, causing them to calculate phases
differently, which is quite hard to reason about confidently).
This will also make it easier to migrate to segmented changelog.
Reviewed By: DurhamG
Differential Revision: D22201084
fbshipit-source-id: 3661c26dd72a64b5005d86e164af4da5a6895649
Summary:
This diff adds two new bits of functionality to `LiveCommitSyncConfig`:
- getting all possible versions of `CommitSyncConfig` for a given repo
- getting `CommitSyncConfig` for a repo by version name
These bits are meant to be used in:
- `commit_validator` and `bookmarks_validator`, which would
need to run validation against a specific config version
- `mononoke_admin`, which would need to be able to query all versions,
display the version used to sync two commits and so on
Reviewed By: StanislavGlebik
Differential Revision: D22235381
fbshipit-source-id: 42326fe853b588849bce0185b456a5365f3d8dff
Summary:
For various reasons (ex. wrong configs like investigating test repos) the
initialization can fail. Ignore them.
Reviewed By: DurhamG
Differential Revision: D22368942
fbshipit-source-id: ae01dcc499f63f373b0f7bec00554ea8074ae7cf
Summary:
This updates the virtually_sharded_blobstore to deduplicate puts only if the
data being put is actually the data we have put in the past. This is done by
keeping track of the hash of things we've put in the presence cache.
This has 2 benefits:
- This is safer. We only dedupe puts we 100% know succeeded (because this
particular instance was the one to attempt the put).
- This creates fewer surprises; notably, it lets us overwrite data in the
backing store (if we are writing something different).
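A minimal sketch of the hash-checked presence cache described above (the class and method names are hypothetical, and sha256 stands in for whatever hash the real store uses):

```python
import hashlib

class DedupingPutCache:
    """A put is skipped only if *this* instance already wrote identical
    bytes for the key; different data for the same key is written through."""

    def __init__(self):
        self._seen = {}  # key -> content hash of what we put

    def should_put(self, key, data):
        digest = hashlib.sha256(data).hexdigest()
        if self._seen.get(key) == digest:
            return False  # identical data already put by us: dedupe
        self._seen[key] = digest
        return True  # new key, or different data: write through
```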
Reviewed By: StanislavGlebik
Differential Revision: D22392809
fbshipit-source-id: d76a49baa9a5749b0fb4865ee1fc1aa5016791bc
Summary:
Running those on my devserver, I noticed they can be a bit flaky. They are
racy on purpose, but let's relax them a bit.
We have a lot of margin here: our blobstore is rate limited at one request
every 10ms, and we need to do 100 requests (the goal is to show that they don't
all wait), so 100ms is fine to prove that they're not rate limited when sharing
the same data.
Reviewed By: StanislavGlebik
Differential Revision: D22392810
fbshipit-source-id: 2e3c9cdf19b0e4ab979dfc000fbfa8da864c4fd6
Summary:
When we look up how a commit was synced, we frequently need to know which version of `CommitSyncConfig` was used to sync it. Specifically, this is useful for admin tooling and commit validator, which I am planning to migrate to use versioned `CommitSyncConfig` in the near future.
Later I will also include this information in the `RewrittenAs` variant of `CommitSyncOutcome`, so that we expose it to real users. I did not do it in this diff to keep it small and easy to review. And because the other part is not ready :P
Reviewed By: StanislavGlebik
Differential Revision: D22255785
fbshipit-source-id: 4312e9b75e2c5f92ba018ff9ed9149efd3e7b7bc
Summary: When I implemented this method, I didn't test that it preserves the order of the input changesets, and I noticed my mistake while testing the scmquery part.
Reviewed By: StanislavGlebik
Differential Revision: D22374981
fbshipit-source-id: 4529f01370798377b27e4b6a706fc192a1ea928e
Summary:
Add the `scsc list-bookmarks` command, which lists bookmarks in a repository.
If a commit id is also provided, `list-bookmarks` will be limited to bookmarks
that point to that commit or one of its descendants.
Reviewed By: mitrandir77
Differential Revision: D22361240
fbshipit-source-id: 17067ba47f9285b8137a567a70a87fadcaabec80
Summary:
There is inevitably interaction between caching, deduplication and rate
limiting:
- You don't want the rate limiting to be above caching (in the blobstore stack,
that is), because you shouldn't rate-limit cache hits (this is where we are
today).
- You don't want the rate limiting to be below deduplication, because then you get
priority inversion where a low-priority rate-limited request might hold the
semaphore while a higher-priority, non rate limited request wants to do the
same fetch (we could have moved rate limiting here prior to introducing
deduplication, but I didn't do it earlier because I wanted to eventually
introduce deduplication).
So, now that we have caching and deduplication in the same blobstore, let's
also incorporate rate limiting there!
Note that this also brings a potential motivation for moving Memcache into this
blobstore, in case we don't want rate limiting to apply to requests before they
go to the _actual_ blobstore (I did not do this in this diff).
The design here when accessing the blobstore is as follows:
- Get the semaphore
- Check if the data is in cache, if so release the semaphore and return the
data.
- Otherwise, check if we are rate limited.
Then, if we are rate limited:
- Release the semaphore
- Wait for our turn
- Acquire the semaphore again
- Check the cache again (someone might have put the data we want while we were
waiting).
- If the data is there, then return our rate limit token.
- If the data isn't there, then proceed to query the blobstore.
If we aren't rate limited, then we just proceed to query the blobstore.
There are a couple subtle aspects of this:
- If we have a "late" cache hit (i.e. after we waited for rate limiting), then
we'll have waited but we won't need to query the blobstore.
- This is important when a large number of requests from the same key
arrive at the same time and get rate limited. If we don't do this second
cache check or if we don't return the token, then we'll consume a rate
limiting token for each request (instead of 1 for the first request).
- If a piece of data isn't cacheable, we should treat it like a cache hit with
regard to semaphores (i.e. release early), but like a miss with regard to
rate limits (i.e. wait).
Both of those are captured in the code by returning the `Ticket` on a
cache hit. We can then choose to either return the ticket on a cache hit, or
wait for it on a cache miss.
(all of this logic is covered by unit tests: removing any of the blocks
there in `Shards::acquire` makes a test fail)
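The access flow above can be sketched roughly as follows (the semaphore handling is elided, and `limiter`/`fetch` are stand-ins for the real types, not Mononoke's actual API):

```python
def get_blob(key, cache, limiter, fetch):
    """Sketch: check cache, wait for the rate limit, re-check cache,
    and only consume the rate-limit ticket on a genuine blobstore query.
    Assumes the semaphore is held on entry."""
    if key in cache:
        limiter.give_back_ticket()  # early hit: no blobstore query needed
        return cache[key]
    limiter.wait_turn()             # rate limited: wait for our turn
    if key in cache:                # late hit: filled while we waited
        limiter.give_back_ticket()
        return cache[key]
    value = fetch(key)              # miss: the ticket is consumed here
    cache[key] = value
    return value
```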
Reviewed By: farnz
Differential Revision: D22374606
fbshipit-source-id: c3a48805d3cdfed2a885bec8c47c173ee7ebfe2d
Summary:
Sometimes we take a token, then realize we don't want it. In this case, giving it back is convenient.
This diff adds that!
Reviewed By: farnz
Differential Revision: D22374607
fbshipit-source-id: ccf47e6c75c37d154704645c9e826f514d6f49f6
Summary:
This is a mirror image of a diff, which made backsyncer use `LiveCommitSyncConfig`: we want to use configerator-based live configs, when we run in the continuous tailing mode.
As no-op iteration time used to be 10s and that's a bit wasteful for tests, this diff changes it to be configurable.
Finally, because of instantiating various additional `CommitSyncerArgs` structs, this diff globs out some of the `using repo` logs (which aren't very useful as test signals anyway, IMO).
Reviewed By: StanislavGlebik
Differential Revision: D22209205
fbshipit-source-id: fa46802418a431781593c41ee36f468dee9eefba
Summary: Tidy up some comments in this file.
Reviewed By: ikostia
Differential Revision: D22376165
fbshipit-source-id: ce4760776048aa8e72809b4f828d0ea426fcf878
Summary: This diff actually starts to use the option.
Reviewed By: krallin
Differential Revision: D22373943
fbshipit-source-id: fe23da9c3daa1f9f91a5ee5e368b33e0091aa9c1
Summary:
Previously, if a blame request was rejected (e.g. because a file was too
large), we returned BlameError::Error.
This doesn't look correct, because there's BlameError::Rejected. This diff
makes the fetch_blame function return BlameError::Rejected instead.
Reviewed By: aslpavel
Differential Revision: D22373948
fbshipit-source-id: 4859809dc315b8fd66f94016c6bd5156cffd7cc2
Summary:
In the next diffs we'll need to read override_blame_filesize_limit from the
derived data config, and this config is stored in BlobRepo.
This diff makes a small refactoring to pass BlobRepo to fetch_full_file_content.
Reviewed By: krallin
Differential Revision: D22373946
fbshipit-source-id: b209abce82c0279d41173b5b25f6761659a92f3d
Summary: This will make adding the blame file size limit override in the next diffs easier.
Reviewed By: krallin
Differential Revision: D22373945
fbshipit-source-id: 4857e43c5d80596340878753ea90bf31d7bb3367
Summary:
We're always yielding zero or one child during traversal, so bounded traversal
is unnecessary here.
Differential Revision: D22242148
fbshipit-source-id: b4c8a1279ef7bd15e9d0b3b2063683f45e30a97a
Summary:
Let's use the new option in the CLI. Unfortunately we can't easily accept
commit ids in named params, so it has to be a positional one.
Differential Revision: D22234412
fbshipit-source-id: a9c27422fa65ae1c42cb1c243c7694507a957437
Summary:
If anything were to go wrong, we'd be happy to know which puts we ignored. So,
let's log them.
Reviewed By: farnz
Differential Revision: D22356714
fbshipit-source-id: 5687bf0fc426421c5f28b99a9004d87c97106695
Summary:
Eventually, I plan to make this the default, but for now I'd like to make it
something we can choose to turn on or off as a cmd argument (so we can start
with the experimental tier and Fastreplay).
Note that this mixes volatile vs. non-volatile pools when accessing the pools
for cacheblob. In practice, those pools are actually volatile, it's just that
things don't break if you access them as non-volatile.
Reviewed By: farnz
Differential Revision: D22356537
fbshipit-source-id: 53071b6b21ca5727d422e10f685061c709114ae7
Summary:
I canaried this on Fastreplay, but unfortunately that showed that sometimes we
just deadlock, or get so slow we might as well be deadlocked (and it happens
pretty quickly, after ~20 minutes). I tried spawning all the `get()` futures,
and that fixes the problem (but it makes gettreepack noticeably slower), so
that suggests something somewhere is creating futures, polling them a little
bit, then never driving them to completion.
For better or worse, I'd experienced the exact same problem with the
ContextConcurrencyBlobstore (my initial attempt at QOS, which also used a
semaphore), so I was kinda expecting this to happen.
In a sense, this is nice because we've suspected there were things like that in
the codebase for a while (e.g. with the occasional SQL timeout we see where it
looks like MySQL responds fast but we don't actually poll it until past the
timeout), and it gives us a somewhat convenient repro.
In another sense, it's annoying because it blocks this work :)
So, to work around the problem, for now, let's spawn futures to force the work
to complete when a semaphore is held. I originally had an unconditional spawn
here, but that is too expensive for the cache-hit code path and slows things
down (by about ~2x).
However, having it only if we'll query the blobstore isn't as expensive,
and that seems to be fine (in fact, it is a ~20% p99 perf improvement,
though the exact number depends on the number of shards we use for this, which I've had to tweak a bit).
https://pxl.cl/1c18H
I did find what I think is one potential instance of this problem in
`bounded_traversal_stream`, which is that we never try to poll `scheduled` to
completion. Instead, we just poll for the next ready future in our
FuturesUnordered, and if that turns out to be synchronous work then we'll just
re-enqueue more stuff (and sort of starve async work in this FuturesUnordered).
I tried updating bounded traversal to try a fairer implementation (which polls
everything), but that wasn't sufficient to make the problem go away, so I think
this is something we have to just accept for now (note that this actually has
some interesting perf impact in isolation: it's a free ~20% perf improvement on
p95+: https://pxl.cl/1c192).
See 976b6b92293a0912147c09aa222b2957873ef0df if you're curious.
Reviewed By: farnz
Differential Revision: D22332478
fbshipit-source-id: 885b84cda1abc15c51fbc5dd34473e49338e13f4
Summary: Like it says in the title. Those are useful!
Reviewed By: farnz
Differential Revision: D22332479
fbshipit-source-id: f9bddad75fcbed2593c675f9ba45965bd87f1575
Summary:
The goal of this blobstore is to dedupe reads by waiting for them to finish and
hit cache instead (and also to dedupe writes, but that's not relevant here).
However, this is not a desirable feature if a blob cannot be stored in cache,
because then we're serializing accesses for no good reason. So, when that
happens, we store "this cannot be stored in cache", and we release reads
immediately.
Reviewed By: farnz
Differential Revision: D22285269
fbshipit-source-id: be7f1c73dc36b6d58c5075172e5e3c5764eed894
Summary:
I'm going to store things that aren't quite the exact blobs in here, so on the
off chance that we somehow have two caching blobstores (the old one and this
one) that use the same pools, we should avoid collisions by using a prefix.
And, since I'm going to use a prefix, I'm adding a newtype wrapper to not use
the prefixed key as the blobstore key by accident.
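The newtype idea, sketched in Python for illustration (the `vsb.` prefix is made up here, not the real one):

```python
class CachelibKey:
    """A prefixed cache key with its own type, so it cannot be passed by
    accident where a plain blobstore key is expected."""

    __slots__ = ("value",)
    PREFIX = "vsb."  # hypothetical prefix, purely illustrative

    def __init__(self, blobstore_key):
        self.value = self.PREFIX + blobstore_key
```

With distinct types, a function that takes a blobstore key string will not silently accept a `CachelibKey`, which is the collision-avoidance point of the wrapper.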
Differential Revision: D22285271
fbshipit-source-id: e352ba107f205958fa33af829c8a46896c24027e
Summary:
This introduces a caching blobstore that deduplicates reads and writes. The
underlying motivation is to improve performance for processes that might find
themselves inadvertently reading the same data concurrently from a bunch of
independent callsites (most of Mononoke), or writing the same bit of data over
and over again.
The latter is particularly useful for things like commit cloud backfilling in
WWW, where some logger commits include the same blob being written hundreds or
thousands of times, and cause us to overload the underlying Zippy shard in
Manifold. This is however a problem we've also encountered in the past in e.g.
the deleted files manifest and had to solve there. This blobstore is a little
different in the sense that it solves that problem for all writers.
This comes at the cost of writes being dropped if they're known to be
redundant, which prevents updates through this blobstore. This is desirable for
most of Mononoke, but not all (notably, for skiplist updates it's not great).
For now, I'm going to add this behind an opt-in flag, and later on I'm planning
to make it opt-out and turn it off there (I'm thinking to use the CoreContext
for this).
Reviewed By: farnz
Differential Revision: D22285270
fbshipit-source-id: 4e3502ab2da52a3a0e0e471cd9bc4c10b84a3cc5
Summary: This allowed me to compare two alternative approaches to queue draining, and generally seems like a useful thing to do.
Reviewed By: krallin
Differential Revision: D22364733
fbshipit-source-id: b6c76295c85b4dec6f0bfd7107c30bb4e4a28942
Summary: It's useful to derive all enabled derived data at once
Reviewed By: krallin
Differential Revision: D22336338
fbshipit-source-id: 54bc27ab2c23c175913fc02e6bf05d18a54c249c
Summary:
We've recently added an option to perform a stack move in megarepolib. A
"stack move" is a stack of commits that move files according to a mover. Now
let's expose it in the megarepotool.
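A rough sketch of what a "stack move" computes (assuming `mover` maps an old path to a new path or `None`, and that moves are chunked into one commit's worth each; the names and chunking are illustrative, not megarepolib's API):

```python
def stack_move_chunks(files, mover, chunk_size):
    """Apply the mover to every file and split the resulting
    (old_path, new_path) moves into chunks, one chunk per commit."""
    moves = [(f, mover(f)) for f in files if mover(f) is not None]
    return [moves[i:i + chunk_size] for i in range(0, len(moves), chunk_size)]
```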
Reviewed By: ikostia
Differential Revision: D22312486
fbshipit-source-id: 878d4b2575ed2930bbbf0b9b35e51bb41393e622
Summary:
Printing it can take too much time.
Use a larger threshold for `dict` than for `list` or `set`, since `hg` commonly
uses a dict for command-line options, and that can exceed 8 items.
As we're here, also fix the traceback order so it's "most recent call last".
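A sketch of the thresholding (the function name and the exact thresholds here are illustrative, not the ones in the diff):

```python
def safe_repr(obj, threshold=8, dict_threshold=64):
    """Repr a value, but summarize large containers instead of printing
    them in full; dicts get a larger threshold than lists/sets."""
    limit = dict_threshold if isinstance(obj, dict) else threshold
    if isinstance(obj, (list, set, dict)) and len(obj) > limit:
        return "<%s of length %d>" % (type(obj).__name__, len(obj))
    return repr(obj)
```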
Reviewed By: kulshrax
Differential Revision: D22362003
fbshipit-source-id: 3d2f4664bec6b4cfaf42b8e5d2fc47b0f3d96411
Summary:
In order to do what the title says, this diff does:
1. Add the `eden/oss/.../third-party/rust/.../Cargo.toml` files. As mentioned in the previous diff, those are required by GitHub so that the third party dependencies that are local in fbsource are properly defined with a "git" dependency in order for Cargo to "link" crates properly.
2. Changes to `eden/scm/Makefile` to add build/install commands for getdeps to invoke. Those commands, knowing that they are called from within a getdeps context, link the dependencies brought by getdeps into their proper places that match their folder layout in fbsource. Those Makefile commands also pass a GETDEPS_BUILD env to the setup.py invocations so that it knows it is being called within a getdeps build.
3. Changes to `eden/scm/setup.py` that add "thriftasset" that makes use of the getdeps.py provided "thrift" binary to build .py files out of thrift files.
4. Changes to `distutils_rust` to use the vendored crates dir provided by getdeps.
5. Changes to `getdeps/builder.py` and `getdeps/manifest.py` that enable more fine-grained configuration of how Makefile builds are invoked.
6. Changes to `getdeps/buildopts.py` and `getdeps/manifest.py` to disable overriding PATH and pkgconfig env, so that "eden/scm" builds in getdeps using system libraries rather than getdeps-provided ones (NOTE: I've tried to use getdeps-provided libraries, but the trickiest bit was that Rust links with Python, which is currently not providable by getdeps, so if you try to build everything, the system-provided Python libraries will collide with the getdeps-provided ones).
7. Added `opensource/fbcode_builder/manifests/eden_scm` for the getdeps build.
Reviewed By: quark-zju
Differential Revision: D22336485
fbshipit-source-id: 244d10c9e06ee83de61e97e62a1f2a2184d2312f
Summary:
The test case assumed that clone would return data in order from the server.
That is not a valid assumption and Mononoke doesn't return data in order.
Reviewed By: xavierd
Differential Revision: D22364636
fbshipit-source-id: abfcbe0074a08c9a76c42d351ce5c792eb65e24f