Summary:
In a future diff we want to start allowing a background thread to fetch
tree data while the diff algorithm is running. To do so, we need the GIL to not
be held during the diff algorithm. But the diff algorithm needs to run the
matcher, and the matcher may be in Python. So let's introduce a new matcher
wrapper that acquires the GIL.
In the future we should get rid of the Python matchers entirely, which will
eliminate this problem.
Reviewed By: kulshrax
Differential Revision: D29971826
fbshipit-source-id: 8a9ba0ea65a0b4748e39178cdf4a08c922755b02
Summary:
In a future diff we'll want to run the treemanifest diff algorithm on a
separate thread from Python, so Python can be used for fetching from a parallel
thread. To do this, we need the tree to be accessible across threads, so let's
put it behind an Arc<RwLock<>>.
Reviewed By: quark-zju
Differential Revision: D29971825
fbshipit-source-id: 6b3ef1025eb7840b905bf60785e05da96980a2d7
Summary:
Previously treemanifest kept a reference to the Python store, and any
call would have to go up to Python then come back down to Rust. Since our stores
are now 100% Rust (at least at the top layer), let's just downcast it to the
appropriate Rust store and store that instead.
This helps in a future diff where we want to access the store without taking the
gil.
Reviewed By: kulshrax
Differential Revision: D29971828
fbshipit-source-id: 77ff11897045282c9e6a6029b126dcdd20c8e9db
Summary:
The diff introduces several small changes:
* It adds logging for the blob size, which can be useful to analyze latency of the `put`/`get` operations.
* It logs the multiplex id of the multiplexed blobstore configuration that was used.
* I also added sampling for `get`/`is_present` with the same rate as is used in the blobstore trace table (it seems reasonable to me). `put` is not sampled, because it's not in the blobstore trace. Errors and "some failed others none" are not sampled either.
* Also some small refactoring to make the code look better.
Reviewed By: StanislavGlebik
Differential Revision: D30490848
fbshipit-source-id: a4fef8a1f1f7622054c75afbe09fe4a55d44ac19
Summary: Added a kill switch to enable/disable predictive prefetch profiles similar to the existing one for regular prefetch profiles (D24803728 (7dccb8a49f)). This can be set manually in a user's config or via the cli `eden prefetch-profile disable-predictive/enable-predictive` commands.
Reviewed By: genevievehelsel
Differential Revision: D30404139
fbshipit-source-id: 01900f4030ef6991124f89a67ea404ff2f07ffeb
Summary:
Added eden prefetch-profile activate-predictive/deactivate-predictive subcommands to activate and deactivate predictive prefetch profiles. This will update the checkout config to indicate if predictive prefetch profiles are currently active or not, and stores the overridden num_dirs if specified on activate (--num-dirs N). If activate is called twice with different num_dirs, the value is updated (only one is stored). Unless --skip-prefetch is specified, a predictive prefetch with num_dirs globs (or the default inferred in the daemon) is run.
Also added fetch-predictive [--num-dirs N], which will:
1. if num_dirs is specified: fetch num_dirs globs predictively
2. if num_dirs is not specified, and predictive fetch is active: get the active num_dirs from the checkout config and fetch globs predictively
3. if num_dirs is not specified, and predictive fetch is not active: fetch the default num_dirs (inferred in the daemon)
Added --if-active to fetch-predictive. If set, fetch will not run if predictive prefetch profiles have not been activated (predictive-prefetch-active in checkout-config). This is used for the post-pull hook.
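The num_dirs resolution described above can be sketched as follows. This is a minimal illustration, not the actual EdenFS CLI code; the function and config key names are hypothetical:

```python
def resolve_num_dirs(cli_num_dirs, checkout_config):
    """Resolve how many directories fetch-predictive should prefetch.

    Returning None means "let the daemon infer the default num_dirs".
    """
    if cli_num_dirs is not None:
        # 1. --num-dirs N was specified: fetch that many globs predictively.
        return cli_num_dirs
    if checkout_config.get("predictive-prefetch-active"):
        # 2. Predictive prefetch is active: use the num_dirs stored in the
        #    checkout config (may itself be unset, falling back to the default).
        return checkout_config.get("predictive-prefetch-num-dirs")
    # 3. Not specified and not active: use the daemon's inferred default.
    return None
```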
Reviewed By: genevievehelsel
Differential Revision: D30306235
fbshipit-source-id: ba02c2bc976128704c8ab0c3d567637265b7c95d
Summary:
Part of Rust-analyzer update.
Updated the affected sources (migrated from `Url#into_string` to `Into#into`).
Reviewed By: jsgf
Differential Revision: D30344564
fbshipit-source-id: fc3ccbe25d7b3d9369a01dfb6b7f8e6a200a7083
Summary:
Made changes to ensure that numResults is always a 32-bit unsigned int, and startTime and endTime are 64-bit unsigned ints. This is to ensure consistency between the smartservice and the endpoint in the daemon.
Also, updated the scuba query in the smartservice to only consider dirs with > 1 access (may update this later to accept a configurable lower bound on access count, but for now, including access=1 doesn't make sense).
Reviewed By: genevievehelsel
Differential Revision: D30396526
fbshipit-source-id: 10e7bd969928da91ab29d413280a1ff956db438c
Summary:
This is now only used in HgQueuedBackingStore::logBackingStoreFetch, and
manually inlining it allows the lock to be taken once instead of once per
path.
Differential Revision: D30494771
fbshipit-source-id: 2d59d0343e48051e4d9c4fc196e66bcb79e7ac71
Summary: While `eden trace hg` already prints queue time when it's over 1ms, this diff adds fb303 counters for import tree/block queue time so that we can have the percentiles.
Reviewed By: xavierd
Differential Revision: D30492275
fbshipit-source-id: 3601aeb9b51b2f55f189a0e0a753fd6ef29d7341
Summary: Currently, the store loops through the requests, calls HgImporter, then waits with `getTry`. This diff makes the change to kick off all tree imports from HgImporter, then wait for future fulfillment with `collectAll`.
Reviewed By: xavierd
Differential Revision: D30486459
fbshipit-source-id: 918e52be818a2064cf04d24f455d23c1ca618434
Summary:
This allows us to get more insights into race condition issues (e.g. a pull
reverting part of what cloud sync did).
Reviewed By: andll
Differential Revision: D30415135
fbshipit-source-id: c99ce77d2748e503aea523e485be5b7a57ee8b98
Summary: The "this" and "other" changes should be swapped.
Reviewed By: andll
Differential Revision: D30415134
fbshipit-source-id: 7c14294c6a5926547960e236983879f3c6b746bd
Summary:
Instead of having 2 functions with one taking a single proxy hash, and the
other taking a vector, we can simply have a single function taking a
`folly::Range` and pass a range of one for the single proxy hash case.
Reviewed By: chadaustin
Differential Revision: D30490724
fbshipit-source-id: 5d57f5a5ffc2a5085369c61a2318edd54b24b448
Summary:
Many diffs include multiple tags as prefixes; however, the current implementation only takes the prefix to be the substring up to the first `]`.
## Issues
Current behaviour for
- **non-tag right bracket:** `My fake diff title with [link]()` -> `My fake diff title with [link]`
- **multiple tags:** `[hg][extensions] fake diff title` -> `[hg]`
## Solution
Use regex to capture all prefix tags:
```
(?:\[.*?\])+
```
## Explanation
- Non capturing group ( `(?: ... )` )
- matching the open bracket ( `\[` )
- and any character as few times as necessary (lazy) ( `.*?` )
- and the closing bracket ( `\]` )
- matching the group as many times as needed (greedy) ( `+` )
Also note `re.match()` matches from the beginning of the string (unlike `re.search()`)
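A quick illustration of the new behaviour, as a standalone sketch using the stdlib `re` module (the helper name is illustrative):

```python
import re

# Capture one or more leading [tag] groups, lazily within each bracket pair.
TAG_PREFIX = re.compile(r"(?:\[.*?\])+")

def tag_prefix(title):
    """Return the full run of [tag] prefixes at the start of title, or ''."""
    m = TAG_PREFIX.match(title)  # match() anchors at the start of the string
    return m.group(0) if m else ""

# Multiple tags are now captured as a whole:
# tag_prefix("[hg][extensions] fake diff title") -> "[hg][extensions]"
# A non-tag bracket later in the title is not treated as a prefix:
# tag_prefix("My fake diff title with [link]()") -> ""
```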
Reviewed By: ronmrdechai
Differential Revision: D30415867
fbshipit-source-id: e09d4e6d2759d0106d41d1a5d4e607ec34eef3fa
Summary:
By default, atomics use the strictest memory ordering, forcing barriers
to be emitted. Since this atomic doesn't need any ordering, we can make it
relaxed.
Reviewed By: chadaustin
Differential Revision: D30459630
fbshipit-source-id: ff50aac919031d9bae8b870b41a6134331546a5f
Summary:
recordFetch is an implementation detail of a BackingStore, and thus we don't
need to explicitly make it virtual.
Differential Revision: D30459635
fbshipit-source-id: 34f847ca906f81924c99c26b4e8af646e91fd735
Summary:
When prefetching a large number of blobs, repeatedly checking whether we should
log accesses to files can become expensive. Since the state of the config isn't
expected to change during the entire batch, we can simply test it once and bail
if logging isn't enabled.
Reviewed By: chadaustin
Differential Revision: D30458698
fbshipit-source-id: b48b9e0ad24585a76d8ce5948f5831db27e08eab
Summary: Looks like we never use this, thus let's simply remove it.
Differential Revision: D30454812
fbshipit-source-id: 28242a2144da4bab9d24debc1a60eeebcdcbaad5
Summary:
When a prefetch request is transformed into many blob requests, we query
RocksDB sequentially for all the proxy hashes. This can be quite expensive and
is also far less efficient than querying RocksDB concurrently with all the
hashes.
As a bonus, this also futurizes the code a bit.
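The shape of the change, sketched generically (the `get`/`multi_get` store interface here is hypothetical, standing in for RocksDB's per-key and batched lookups):

```python
def lookup_sequential(store, hashes):
    # Before: one round trip into the store per proxy hash.
    return [store.get(h) for h in hashes]

def lookup_batched(store, hashes):
    # After: hand the whole list of hashes to the store at once,
    # letting it service the lookups together instead of one by one.
    return store.multi_get(hashes)
```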
Reviewed By: chadaustin
Differential Revision: D30454068
fbshipit-source-id: 5fd238b752a662919e739451c0c1e92f66919ebf
Summary:
Since these are always used as SemiFuture, let's simply make them SemiFuture
from the get go.
Differential Revision: D30452901
fbshipit-source-id: b0863f363ce0cdb921a73d02c43fc82c1614a3dc
Summary:
Looking at strobelight when performing an `eden prefetch` shows that a lot of
time is spent copying data around. The list of hashes to prefetch is, for
instance, copied 4 times; let's reduce this to a single copy when converting
Hash to a ByteRange.
Reviewed By: chadaustin
Differential Revision: D30433285
fbshipit-source-id: 922e6e5c095bd700ee133e9bb219904baf2ae1ac
Summary:
Once the request has been dequeued, we no longer need to hold the lock, thus
let's release it to allow other threads to enqueue/dequeue requests.
Differential Revision: D30409797
fbshipit-source-id: a527c67a6bd9f47da5a3930364fd8fae0d1bc427
Summary:
In the vast majority of cases I expect file/directory history to be linear,
i.e. no merges. In that case there's no need to fetch generation numbers.
Since fetching generation numbers can trigger reads from the db, I'd rather
avoid doing that when it's not necessary, and that's what this diff does: it
doesn't start fetching generation numbers while the history is linear.
Reviewed By: mitrandir77
Differential Revision: D30483093
fbshipit-source-id: 526fd33619c70cc4e0bb033a0048250b650fb2be
Summary:
To be precise, it's ordering by generation number and parent order, i.e. we'd
like to show first parents ahead of second parents.
Note that parent ordering is very basic at the moment, and won't always order
commits correctly when both parents have the same generation numbers. We can
improve it in the future, but I believe it shouldn't be a big issue now.
Reviewed By: mitrandir77
Differential Revision: D30483089
fbshipit-source-id: 67e13b5757831d652b57d6ad42b6135005a0b621
Summary:
There were two variables that controlled the output of `list_file_history`:
one was the `history` variable, the other was `bfs`. The `history` variable
collected all immediate ancestors, while `bfs` was used to fetch new ancestors.
This split makes it hard to add generation number ordering, so in this diff I
suggest removing the `history` variable altogether and just using `bfs` to
control the order of the history.
Reviewed By: mitrandir77
Differential Revision: D30483094
fbshipit-source-id: 0a4cac771383e17e61f58354a30d4e6db7e6547f
Summary:
They will need to become async in the next diffs; let's make them async now to
keep those diffs smaller.
Reviewed By: mitrandir77
Differential Revision: D30483091
fbshipit-source-id: 8174a2d4618a7dd2721d00d7acd7d700bd57afd1
Summary:
In the next diffs I'd like to make it possible to change the ordering of
commits - currently we only support "bfs order", while I'd like to add
"generation number order".
Reviewed By: mitrandir77
Differential Revision: D30483090
fbshipit-source-id: 82d5a14b26495f5583ca38793023ce3521682237
Summary:
A simple binary that can be used for syncing changesets to hg servers. It's
very simple: it just moves a bookmark, and then waits until the hg sync job
syncs it (which obviously means that if the sync job is not running, nothing
is going to be synced).
It also supports syncing only a straight line of commits with no merges;
merges add complexity here, so I decided not to deal with that complexity for
now.
Reviewed By: mitrandir77
Differential Revision: D30447234
fbshipit-source-id: e4624586e4fc53212c1b13a2cd622aa9474a20b8
Summary:
This diff is a step towards uploading snapshots to the ephemeral blobstore.
It adds:
- EphemeralChangesets implementation. This is the trait used for storing changesets, and also their generation numbers. Here we are using a SQL table to store mappings between snapshots and bubbles, as well as their generation numbers. It also fetches information from the blobstore, which differs from the "non-snapshots" case, but this can later be optimised to use another table if necessary.
- EphemeralRepoView, a container that has a changesets object and a repo_blobstore, both of which first check the ephemeral blobstore, and then the persistent blobstore, and are useful for dealing with snapshots.
Reviewed By: StanislavGlebik
Differential Revision: D30370979
fbshipit-source-id: bf8e1d3c111d307c1ffbad56e1255a77a4871591
Summary:
It's `O(tr.get('nodes'))`, which does not scale. With BFS prefetch, it's not
that slow if the tree isn't prefetched. It has already been disabled for major
repos (D23912965), and causes slowness with lazy changelog (see below).
Reviewed By: StanislavGlebik
Differential Revision: D30454407
fbshipit-source-id: 8027b5e5f1ee09a5f1ffe98a638585345464dd3d
Summary: This will let ```SetPathRootId``` support files
Reviewed By: chadaustin
Differential Revision: D29978308
fbshipit-source-id: df22af8bce4a707a7db51ef543c0e3e78cdcef06
Summary: This diff renames ```SetPathRootId``` to ```SetPathObjectId``` as we want to support blobs
Reviewed By: chadaustin
Differential Revision: D30404536
fbshipit-source-id: f34446ec20aeaf87f5f61e29e421a9bceb0b2a4a
Summary: This will add the same getTreeEntryForRootId to ObjectStore
Reviewed By: chadaustin
Differential Revision: D29920475
fbshipit-source-id: 15bfc6a2ba70cce2095dfcf1f434fd7087605e04
Summary: Add a new method to backingstore so we can get TreeEntry by rootID
Reviewed By: chadaustin
Differential Revision: D29889482
fbshipit-source-id: 93e63624e75c7d559c4de6f68821a8efa0e0c184
Summary: The nth ancestor revset was crashing with multiple base revisions, e.g. "(. + .^)~". Fix variable shadowing issue in revset.ancestorspec.
Differential Revision: D30375233
fbshipit-source-id: 37a78bf1000a40872600e587733a84029f68343b
Summary: This is an unintended pop and would throw if there are no stale APFS volumes (and would remove one fewer volume when there are stale volumes).
Reviewed By: xavierd
Differential Revision: D30432642
fbshipit-source-id: 193d9c15f393a66bc8b43b5f31579c1fe972a7f1
Summary:
Restructure the interface of `http-client::AsyncResponse` to make it easier to avoid misuse.
Specifically, both async and non-async responses now consist of two parts: a head (represented by the new `Head` type) and a body. This solves the problem of being able to access the response headers while consuming the response body: there is now an `into_parts` method on `AsyncResponse` that returns `(Head, AsyncBody)`, decoupling ownership of the parts. This approach was inspired by `hyper::Response`.
Previously, this was accomplished by allowing the body to be moved out of the response and replaced with an empty body. This meant that subsequent calls could incorrectly receive an empty body.
Additionally, `AsyncBody` is now an actual type (instead of an alias) which exposes `raw` and `decoded` methods for accessing the body stream. This makes it very explicit what's happening under the hood, and also minimizes the chance of the user forgetting to decode the response.
The new interface looks like:
```
(head, body) = res.into_parts();
// Choose one of the following:
let decoded_content = body.decoded(); // Automatically decompressed content.
let cbor_content = body.cbor(); // Content as deserialized CBOR entries.
let raw_content = body.raw(); // Raw on-wire content.
// Can still access response headers and status.
let status = head.status();
```
One-line usage is still possible with this interface:
```
let content = res.into_body().decoded().try_concat().await?;
```
Reviewed By: yancouto
Differential Revision: D30436322
fbshipit-source-id: 59911afc34b356a9e3295828ac63da5e295f77a6
Summary: Now that Mercurial's `http-client` crate has built-in support for decompressing responses, use that instead of manually doing it in the LFS code.
Reviewed By: andll
Differential Revision: D30269969
fbshipit-source-id: 9189aa1193e947625c1c98735303e0e038b88901
Summary:
In order to support compressed EdenAPI responses, Mercurial's `http-client` needs to be able to understand the `Content-Encoding` response header.
Since we're using libcurl under the hood, ordinarily we'd just need to set `CURLOPT_ACCEPT_ENCODING`, which sets the `Accept-Encoding` header in the request, and causes libcurl to automatically decompress the response.
Unfortunately, it seems that the Rust bindings build libcurl without support for modern compression algorithms like `zstd` and `brotli`. (When I tested it, it seemed to only support `gzip` and `deflate`.) Since we explicitly want to support `zstd` compression, we have no choice but to decompress the received data ourselves.
Reviewed By: andll
Differential Revision: D30267341
fbshipit-source-id: 8627471ec38669fd9836622cd127423c67f2458e
Summary: Add getters for accessing the fields of `Response` and `AsyncResponse`, and make the fields private. This will make it easier to add support for automatic content decompression.
Reviewed By: yancouto
Differential Revision: D30270216
fbshipit-source-id: 8717f127775286ae799df6bcbe0c47b3aa46aa8d
Summary:
The RelativePath is always built from a valid one, thus re-validating it
is not necessary.
Reviewed By: chadaustin
Differential Revision: D30410686
fbshipit-source-id: 3e46359f68b1693a0a2af310466fc73d105cf2c0
Summary:
This adds allowlisted configs to FS trace event samples, which will facilitate A/B testing and parameter tuning. For example, if we want to verify whether a larger `hg:import-batch-size` would speed up read operations, we can:
1. split users into two groups, one with a batch size of 16 and the other with 32.
2. make sure `hg:import-batch-size` is included in `telemetry:request-sampling-config-allowlist` config.
3. wait for events to populate and compare the durations.
Reviewed By: xavierd
Differential Revision: D30322855
fbshipit-source-id: b3cbdcb64f78d35b8708948db495b2d956cab327
Summary:
The current fb303 counters only report aggregated latency, while we want to track Eden performance under different versions, OSes, channels, and configs. So I am setting up a new logging mechanism for this purpose.
This diff introduces the class `FsEventLogger` for sampling and logging. There are 3 configs introduced by this diff. The configs are reloaded every 30 minutes.
1. `telemetry:request-sampling-config-allowlist`
A list of config keys that we want to attach to scuba events.
2. `telemetry:request-samples-per-minute`
Max number of events logged to scuba per minute per mount.
3. `telemetry:request-sampling-group-denominators`
* Each type of operation has a "sampling group" (defaulted to 0, which drops everything).
* We use this sampling group as an index to look up its denominator in this config.
* The denominator is then used for sampling, e.g. `1/x` of the events are sent to scuba, if we haven't reached the cap specified by #2.
Example workflow:
1. receive tracing event
2. look up denominator of the sampling group of the operation type
3. sample based on the denominator
4. check that we have not exceeded the logging cap per min
5. create sample and send to scribe
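The decision part of the workflow above (steps 2-4), as a minimal sketch. The function and parameter names are illustrative, not the actual FsEventLogger API:

```python
import random

def should_log(sampling_group, denominators, logged_this_minute, samples_per_minute):
    """Decide whether a tracing event should be sampled for logging.

    sampling_group indexes into denominators; a denominator of 0
    (the default for group 0) means "drop all events".
    """
    # 2. look up the denominator for this operation's sampling group
    denominator = denominators[sampling_group]
    if denominator == 0:
        return False  # default group: drop everything
    # 3. sample: keep roughly 1/denominator of the events
    if random.randrange(denominator) != 0:
        return False
    # 4. enforce the per-mount per-minute logging cap
    return logged_this_minute < samples_per_minute
```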
Reviewed By: xavierd
Differential Revision: D30288054
fbshipit-source-id: 8f2b95c11c718550a8162f4d1259a25628f499ff
Summary:
For large megarepos that are the result of merging multiple smaller repos,
navigating to pre-merge history is usually not what the user wants. The
checkout will be slow (in non-edenfs repos), tooling that expects a certain
repo structure won't work, etc.
Reviewed By: StanislavGlebik
Differential Revision: D30394205
fbshipit-source-id: 23fc4fc31bf01d4cc14f6e3baa1e1165a26a1896