Summary:
The blobstore healer has logic that prevents it from doing busy work when the
queue is empty. It is implemented by checking whether the DB query
fetched the whole `LIMIT` of values. Or that is the idea, at least. In
practice, here's what happens:
1. The DB query is a nested one: first it gets at most `LIMIT` distinct
`operation_key` entries, then it gets all rows with such entries. In practice
this almost always means `# of blobstores * LIMIT` rows, as we almost always
succeed in writing to every blobstore.
2. Once this query is done, the rows are grouped by `blobstore_key`, and a
future is created for each such group (for simplicity, ignore that a future may
not be created).
3. We then compare the number of created futures with `LIMIT` and report an
incomplete batch if the numbers are different.
This logic has a flaw: the same `blobstore_key` may be written multiple times with
different `operation_key` values. One example of this: `GitSha1` keys for
identical contents. When this happens, grouping from step 2 above will produce
fewer than `LIMIT` groups, and we'll end up sleeping for nothing.
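The miscount can be reproduced with a minimal sketch. It is written in Python for brevity and is not the healer's code; the row layout, the tiny `LIMIT`, and the helper names are assumptions made for illustration, and the second check is only one plausible form of a corrected comparison.

```python
from collections import defaultdict

LIMIT = 3  # hypothetical value, kept tiny for illustration

# (operation_key, blobstore_key) pairs, as if returned by the nested query.
# "op1" and "op2" wrote identical contents, so they share a blobstore_key.
rows = [
    ("op1", "git_sha1.aaa"),
    ("op2", "git_sha1.aaa"),
    ("op3", "content.bbb"),
]


def groups_by_blobstore_key(rows):
    """Group operation keys under the blobstore key they wrote."""
    groups = defaultdict(list)
    for operation_key, blobstore_key in rows:
        groups[blobstore_key].append(operation_key)
    return groups


def unique_operation_keys(rows):
    """Distinct operation keys, which is the quantity LIMIT actually bounds."""
    return {operation_key for operation_key, _ in rows}


# Flawed check: compares the number of groups (one future each) with LIMIT.
old_batch_is_full = len(groups_by_blobstore_key(rows)) == LIMIT
# Alternative check: compares the number of distinct operation keys with LIMIT.
new_batch_is_full = len(unique_operation_keys(rows)) == LIMIT
```

Here the query returned a full batch of three operation keys, yet grouping yields only two groups, so the flawed check reports an incomplete batch.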
This is not a huge deal, but let's fix it anyway.
My fix also adds some strictly speaking unnecessary logging, but I found it
helpful during this investigation, so let's keep it.
The price of this change is two `unique_by` calls, each of which
allocates a temporary hash set [1]. The larger one holds up to
`LIMIT * len(blobstore_key) * # blobstores` bytes of key data (the other one
holds `operation_key` values). For `LIMIT=100_000`, `len(blobstore_key)=255`,
`# blobstores = 3` that is 100,000 * 255 * 3 = 76.5 MB (about 73 MiB) for the
larger one, which should be ok.
[1] https://docs.rs/itertools/0.9.0/itertools/trait.Itertools.html#method.unique
Reviewed By: ahornby
Differential Revision: D22293204
fbshipit-source-id: bafb7817359e2c867cf33c319a886653b974d43f
Summary:
Besides the obvious typos in the code, it would also never go past the first
\n, and the first element of the split list would be an empty file...
Reviewed By: DurhamG
Differential Revision: D22324297
fbshipit-source-id: 339df3a4e4acc4e0630ffa3a7b0b3d266a20b666
Summary:
To check if dynamicconfigs are being generated or not, let's log the
age of the config to dev_command_timers. We also log the age of the remote_cache
so we can detect if hosts are not fetching data from the server.
Reviewed By: quark-zju
Differential Revision: D22323516
fbshipit-source-id: 1fdeef3eaa5d58566606bfc57d8ad96e752db811
Summary: Move the old EdenAPI crate to `scm/lib/edenapi/old` to make room for the new crate. This old code will eventually be deleted once all references to it are removed from the codebase.
Reviewed By: quark-zju
Differential Revision: D22305173
fbshipit-source-id: 45d211340900192d0488543ba13d9bf84909ce53
Summary: This diff adds a new `CborStream` type which wraps a `TryStream` of bytes and attempts to deserialize the data from CBOR as it is received. This allows the caller to begin processing deserialized values while the download is still going on, which is important in situations where a server might return a large streaming response composed of many small entries (as is the case with EdenAPI).
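The idea of yielding values while bytes are still arriving can be sketched as follows. This is not `CborStream`: it is a Python illustration in which length-prefixed JSON stands in for CBOR (the Python standard library has no CBOR decoder, and real CBOR items are self-delimiting rather than length-prefixed). `IncrementalDecoder` and `encode` are hypothetical names.

```python
import json
import struct


def encode(value):
    """Frame a value as a 4-byte big-endian length followed by JSON bytes."""
    payload = json.dumps(value).encode("utf-8")
    return struct.pack(">I", len(payload)) + payload


class IncrementalDecoder:
    """Buffers incoming bytes and returns each value once it is complete."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk):
        """Add a chunk from the network; return the values completed by it."""
        self._buf.extend(chunk)
        values = []
        while len(self._buf) >= 4:
            (length,) = struct.unpack(">I", self._buf[:4])
            if len(self._buf) < 4 + length:
                break  # the next value is still incomplete; wait for more data
            payload = bytes(self._buf[4:4 + length])
            del self._buf[:4 + length]
            values.append(json.loads(payload))
        return values
```

A caller feeds each chunk as it arrives and can process the returned values immediately, without waiting for the end of the response.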
Reviewed By: quark-zju
Differential Revision: D22280745
fbshipit-source-id: 9d7fa52e4e67cf7de6bed37c28e5e7afd69449cd
Summary: It is possible to properly initialize the response buffer without using `once_cell`, so remove it to simplify the code.
Reviewed By: quark-zju
Differential Revision: D22264747
fbshipit-source-id: 5890cac7fa598fc80cfd017eae111c0b9fdc6227
Summary:
Add the methods `send_async` and `send_async_with_progress` to `HttpClient`. These methods provide a `Futures`-based async interface that will make it easy to use the client from async Rust code.
Just like in D22231555, this is built on top of the previously introduced streaming API. When a batch request is sent, the client will start a Multi session on a task in a blocking worker thread using `tokio::task::spawn_blocking`. This means that right now, the implementation is not /truly/ async, but it should be possible to change the implementation in the future to avoid using any blocking I/O without needing to change the public interface.
Since all of the requests are part of the same Multi session, they will all proceed concurrently and, if possible, be multiplexed over the same connection in the case of multiple HTTP/2 requests to the same server (which is going to be our main use-case).
Unfortunately, since libcurl does not have any internal synchronization, ownership of the Multi session needs to be passed to the worker thread, meaning that the Multi handle will be dropped once the requests are complete. This means that connections will not be re-used when these methods are called several times serially. The API should make it obvious to the user that the internal state is not preserved since these methods both consume the `HttpClient` itself.
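The "async facade over a blocking call" shape described above can be sketched in Python with `asyncio`, whose `run_in_executor` plays a role similar to `tokio::task::spawn_blocking`. This is an analogy, not the client's code; `send_blocking` is a hypothetical stand-in for the blocking batch send.

```python
import asyncio
import time


def send_blocking(requests):
    """Hypothetical blocking batch send: returns once all requests are done."""
    time.sleep(0.01)  # stands in for blocking network I/O
    return [f"response to {request}" for request in requests]


async def send_async(requests):
    """Async wrapper: the blocking work runs on a worker thread."""
    loop = asyncio.get_running_loop()
    # None selects the loop's default thread pool executor.
    return await loop.run_in_executor(None, send_blocking, requests)


def demo():
    return asyncio.run(send_async(["a", "b"]))
```

As in the commit, the public interface is async while the implementation still blocks a thread, so the internals could later be replaced without changing callers.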
Reviewed By: quark-zju
Differential Revision: D22251488
fbshipit-source-id: 37caf64024cb12b95df5124379209550899d093d
Summary: In order to maximize the surface area of the client that gets exercised by the `http_cli` binary, make it use the new async interface (even though it is not strictly necessary here). Since the async interface is ultimately built on top of the synchronous interface, sending requests this way will still exercise parts of the synchronous interface too.
Reviewed By: quark-zju
Differential Revision: D22242944
fbshipit-source-id: 80d73cc152d0c38673436457c1e1040e4198095e
Summary:
This diff adds `AsyncResponse`, an asynchronous version of `Response` built on top of the `Streaming` handler along with a new `ChannelReceiver` that forwards the received data into async channels. The result is essentially a shim that makes libcurl usable with async/await syntax.
`ChannelReceiver` currently uses unbounded channels. This should be OK because the space usage of the channel should increase [at most] at the same rate as if we used `Buffered`. Essentially, in the worst case (if nothing was consuming items from the channel), this would have the same behavior as if we had simply used a non-streaming handler.
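A rough Python analogue of forwarding data from a blocking transfer into an unbounded async channel is shown below. The class only borrows the `ChannelReceiver` name; the `chunk`/`done` methods, `blocking_transfer`, and the `None` end marker are assumptions made for illustration.

```python
import asyncio
import threading


class ChannelReceiver:
    """Forwards data from a worker thread into an asyncio queue."""

    def __init__(self, loop, queue):
        self._loop = loop
        self._queue = queue

    def chunk(self, data):
        # asyncio queues are not thread-safe, so schedule the put on the loop.
        self._loop.call_soon_threadsafe(self._queue.put_nowait, data)

    def done(self):
        # None is used here as an end-of-stream marker.
        self._loop.call_soon_threadsafe(self._queue.put_nowait, None)


def blocking_transfer(receiver, chunks):
    """Hypothetical stand-in for the blocking transfer driving the receiver."""
    for data in chunks:
        receiver.chunk(data)
    receiver.done()


async def collect(chunks):
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue()  # default maxsize of 0 means unbounded
    receiver = ChannelReceiver(loop, queue)
    worker = threading.Thread(target=blocking_transfer, args=(receiver, chunks))
    worker.start()
    received = []
    while True:
        item = await queue.get()
        if item is None:
            break
        received.append(item)
    worker.join()
    return received
```

Because the queue is unbounded, a slow consumer makes it grow by at most the total response size, which matches the worst case described above.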
Reviewed By: quark-zju
Differential Revision: D22231555
fbshipit-source-id: 6ee767355bfce6d400f447ee690d974666751f16
Summary: Add a new `Header` type that represents a parsed header line. Notably, `libcurl` treats both the initial status line and trailing CRLF as headers; the new struct handles these cases in a strongly-typed way. This is particularly useful when working with streaming responses, as this means that the `Receiver` can tell the status code upfront rather than waiting until the end of the request, and the `Receiver` can tell once all headers (except for trailers) have been received.
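The three cases can be sketched as follows. This is a Python illustration of the idea, not the `Header` type from the diff; the class names and the parsing rules are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Status:
    version: str
    code: int


@dataclass
class HeaderField:
    name: str
    value: str


@dataclass
class EndOfHeaders:
    pass


def parse_header_line(line: bytes):
    """Classify one line handed to a libcurl-style header callback."""
    text = line.decode("latin-1")
    if text in ("\r\n", "\n"):
        return EndOfHeaders()  # the blank line that terminates the headers
    text = text.rstrip("\r\n")
    if text.startswith("HTTP/"):
        parts = text.split(" ", 2)  # version, status code, optional reason
        return Status(version=parts[0], code=int(parts[1]))
    name, _, value = text.partition(":")
    return HeaderField(name=name.strip(), value=value.strip())
```

With this shape, a receiver sees the `Status` first and can act on the code immediately, and `EndOfHeaders` marks the point where the body begins.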
Reviewed By: quark-zju
Differential Revision: D22228632
fbshipit-source-id: 06f1a21d7af25b37269bb449a1e53237ec74490a
Summary:
Add a new `Streaming` handler that allows for asynchronously processing streaming responses. A user of this crate can create a struct that implements the new `Receiver` trait, and this struct will be provided with data as it comes in from the network.
Two new methods have been added to `HttpClient` to make streaming requests: `stream` and `stream_with_progress`. Notably, these new methods are not themselves asynchronous. Just like `HttpClient::send` and `send_with_progress`, these methods will block until all of the given requests have completed.
The difference is that in this case, rather than buffering all of the data for every request in the batch, each request's associated `Receiver` will be called every time the request makes progress. This will avoid excessive buffering and allow data to be processed as it arrives.
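A minimal Python sketch of this calling pattern follows. It is not the crate's API: `CollectingReceiver`, its `chunk`/`done` methods, and this `stream` function are hypothetical, and the round-robin loop merely simulates several transfers making progress concurrently.

```python
class CollectingReceiver:
    """Example receiver: records each chunk as soon as it is delivered."""

    def __init__(self):
        self.chunks = []
        self.finished = False

    def chunk(self, data):
        self.chunks.append(data)

    def done(self):
        self.finished = True


def stream(requests):
    """Drive every (chunks, receiver) pair; block until all have completed."""
    pending = [(iter(chunks), receiver) for chunks, receiver in requests]
    while pending:
        still_pending = []
        for chunks, receiver in pending:
            try:
                data = next(chunks)
            except StopIteration:
                receiver.done()  # this transfer is complete
            else:
                receiver.chunk(data)  # deliver progress right away
                still_pending.append((chunks, receiver))
        pending = still_pending
```

The call itself is synchronous, but no response is ever buffered in full by `stream`: each receiver decides what to do with its data as it arrives.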
Reviewed By: quark-zju
Differential Revision: D22201914
fbshipit-source-id: ce586b05a008b371557c84957aa5aa59afcc5c7c
Summary:
D22235561 (c1a6d6fd21) changed eden's dirstate.py to expect unicode strings in
python 3, but in eden_dirstate_map we were intentionally encoding them first.
Let's fix that to use unicode strings all the way through until the
serialization.
Reviewed By: quark-zju
Differential Revision: D22319105
fbshipit-source-id: e3c53fc46150ea81b2878c3bfac2cc1ad60e5590
Summary:
We want to start using remote configs now, but the current dynamicconfig validation logic will remove any configs that don't exactly match a preexisting config. To fix this, let's add remote configs to our allow list. This will prevent the validation logic from removing them as mismatches.
This logic is temporary and will be removed once we finish the migration.
Reviewed By: quark-zju
Differential Revision: D22313015
fbshipit-source-id: 3247b28e7d529dac0983cc021127da36600fda4e
Summary:
An assertion error is raised if `eden doctor` is run in the middle of a merge. This is because we enter a specific "if" condition in the case that mercurial has two parent commits, and EdenFS only ever tracks `p0`, so EdenFS simply sets `p1` to the null commit in `_select_new_parents()`. Specifically, this is in the case in which both `_old_dirstate_parents` and `_old_snapshot` are not None.
Because `_old_dirstate_parents` has `p1` set to nonnull, and Eden thinks it is null, the check `self._new_parents != self._old_dirstate_parents` would be `True` even though there was actually no error.
Reviewed By: chadaustin
Differential Revision: D22048525
fbshipit-source-id: 9a19cc092e2bd80db0e01fb38533a1007640bee6
Summary: None of this code is used, just remove it.
Reviewed By: genevievehelsel
Differential Revision: D22083565
fbshipit-source-id: ad24a0e8ea04797e146ebb9b99c125539197cb89
Summary:
With D22303305 (4868f5bf5b) the test is no longer valid - the mmap happened in Rust and
cannot be tested from Python wrapping functions. Remove the test.
Reviewed By: DurhamG
Differential Revision: D22316110
fbshipit-source-id: f7e16e2ac72908c836a7aeeefa1fb0ef035d01fc
Summary:
Previous commit: D22233127 (fa1caa8c4e)
In this diff, I added rewrite commit path functionality using Mover https://fburl.com/diffusion/6rnf9q2f to repo_import.
Given a prefix (e.g. new_repo), we prepend the paths of the files extracted from the bonsaichangesets given by gitimport (e.g. folder1/file1 => new_repo/folder1/file1). Previously, we did this manually when importing a git repo (https://www.internalfb.com/intern/wiki/Mercurial/Admin/ImportingRepos/) using convert extension.
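The path rewrite amounts to the following, shown as a Python sketch rather than the actual Mover-based code; `prepend_prefix` and `rewrite_paths` are hypothetical helpers and the dictionary stands in for a changeset's file changes.

```python
import posixpath


def prepend_prefix(prefix, path):
    """Return the repo-relative `path` relocated under `prefix`."""
    return posixpath.join(prefix, path)


def rewrite_paths(prefix, file_changes):
    """Apply the prefix to every path of a {path: change} mapping."""
    return {
        prepend_prefix(prefix, path): change
        for path, change in file_changes.items()
    }
```

For example, `prepend_prefix("new_repo", "folder1/file1")` gives `new_repo/folder1/file1`, matching the example above.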
Reviewed By: StanislavGlebik
Differential Revision: D22307039
fbshipit-source-id: 322533e5d6cbaf5d7eec589c8cba0c1b9c79d7af
Summary: Also fix up the parser test that now fails with this change
Reviewed By: StanislavGlebik
Differential Revision: D22306340
fbshipit-source-id: 820aad48068471b03cbc1c42107c443bfa680607
Summary:
D21626209 (38d6c6a819) changed revlogindex to read `00changelog.i` on its own instead of
taking the data from Python. That turns out to be racy. The `00changelog.i`
might be changed between the Rust and Python reads and that caused issues.
This diff makes Python re-use the indexdata read by Rust so they are guaranteed
to be the same.
Reviewed By: DurhamG
Differential Revision: D22303305
fbshipit-source-id: 823bf3aefc970a4a6ce8ab58bccf972a78f6de70
Summary:
This will be used by the next change.
The reason we use a `buffer` or `memoryview` instead of Python `bytes` is to expose
the buffer in a zero-copy way. That is important for startup performance.
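The zero-copy property is easy to observe in plain Python. The buffer contents below are made up for illustration; a `bytearray` stands in for the data owned by the native side.

```python
# Stand-in for a buffer owned elsewhere (for example, data read by native code).
data = bytearray(b"index-data-read-once")

view = memoryview(data)
entry = view[0:5]           # slicing a memoryview does not copy the bytes
copied = bytes(data[0:5])   # converting to bytes makes an independent copy

data[0] = ord("I")          # mutate the underlying buffer in place

entry_after = bytes(entry)  # the view reflects the mutation
```

After the mutation, `entry_after` is `b"Index"` while `copied` is still `b"index"`, which shows that the view shares memory with the original buffer and the `bytes` object does not.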
Reviewed By: DurhamG
Differential Revision: D22303306
fbshipit-source-id: 3f7c8dff3575b998e025cd5940faa0c183b11626
Summary:
If we're running commands from a user that only has read access, the
debugdynamicconfig commands are going to fail. Let's exit early and quickly if
that's the case, instead of spending a lot of cpu generating a config only to
fail.
Reviewed By: quark-zju
Differential Revision: D22244127
fbshipit-source-id: 24f806772ba5c08e400efb3abc7ebda228d473a5
Summary:
Since we are querying intern for remote configs, we don't want to spam
the servers with requests if they're down. Therefore let's implement some basic
rate limiting to prevent us from querying the server too often. The default
behavior is limiting it to once every 5 minutes.
We only generate new configs once every 15 minutes, so generally this rate limit
shouldn't have any effect, but if there are errors in the generation process
it's possible for generation to happen much more frequently, so this will guard
us from hitting the server too frequently.
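A minimal sketch of such a limiter, assuming an in-memory timestamp, is shown below. The real implementation would need to persist the last request time across `hg` invocations, which this sketch does not attempt; `RateLimiter` and its parameters are hypothetical names.

```python
import time


class RateLimiter:
    """Allows at most one action per `min_interval_secs`."""

    def __init__(self, min_interval_secs=300, clock=time.monotonic):
        self._min_interval = min_interval_secs
        self._clock = clock  # injectable so the behavior can be tested
        self._last = None

    def allow(self):
        """Return True and record the time if enough time has passed."""
        now = self._clock()
        if self._last is not None and now - self._last < self._min_interval:
            return False
        self._last = now
        return True
```

With the default of 300 seconds and generation every 15 minutes, `allow()` returns True on every normal run; it only starts returning False when generation is retried more often than once every 5 minutes.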
Reviewed By: quark-zju
Differential Revision: D22243316
fbshipit-source-id: bbccaf63da95af1edc3128f4d2047a32f90e53ba
Summary:
The HG_TEST_REMOTE_CONFIG environment variable was added to allow tests
to declare custom remote config values, but we can also use it to make canarying
easier.
With this change, users can do `HG_TEST_REMOTE_CONFIG=configerator hg
debugdynamicconfig` to test a change, after running arc build in their
configerator.
We might want to simplify this further in the future to some sort of hidden dev
command line flag, like `hg debugdynamicconfig --canary-remote`
Reviewed By: quark-zju
Differential Revision: D22081459
fbshipit-source-id: 07977097347af9d5872402beeda0ed9160176e7e
Summary: Now that we fetch remote configs, let's apply them locally.
Reviewed By: quark-zju
Differential Revision: D22079767
fbshipit-source-id: aafc9a2e1e6a60b7b6087eaf256dafce30ca5a1e
Summary:
Fetches configs from a remote endpoint and caches them locally. If the
remote endpoint fails to respond, we use the cached version.
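The fetch-then-fall-back flow can be sketched like this. It is an illustration only: the JSON cache format, `FetchError`, `load_remote_config`, and the sample config value are all assumptions.

```python
import json
import os
import tempfile


class FetchError(Exception):
    """Hypothetical error raised when the remote endpoint does not respond."""


def load_remote_config(fetch, cache_path):
    """Return the remote config, falling back to the cached copy on failure."""
    try:
        config = fetch()
    except FetchError:
        # Raises FileNotFoundError if nothing has ever been cached.
        with open(cache_path) as f:
            return json.load(f)
    with open(cache_path, "w") as f:
        json.dump(config, f)  # refresh the cache on every successful fetch
    return config


def demo():
    def healthy():
        return {"ui.username": "test"}

    def down():
        raise FetchError("endpoint unavailable")

    with tempfile.TemporaryDirectory() as directory:
        cache_path = os.path.join(directory, "remote_cache.json")
        first = load_remote_config(healthy, cache_path)
        second = load_remote_config(down, cache_path)
    return first, second
```

In `demo`, the second call fails to reach the endpoint and returns the value cached by the first call.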
Reviewed By: quark-zju
Differential Revision: D22010684
fbshipit-source-id: bd6d4349d185d7450a3d18f9db2709967edc2971
Summary:
Adds the hg client config thrift structure to thrift-types so we can
use it in both buck and make local.
Reviewed By: quark-zju
Differential Revision: D21875370
fbshipit-source-id: 45e585ca5a90307cbeb68240f210006986ec7e84
Summary: This will be used for commits_between replacement
Differential Revision: D22234236
fbshipit-source-id: c0c8550d97a9e8b42034d605e24ff54251fbd13e
Summary: Some SCMQuery queries need just a list of commit hashes instead of full coverage.
Reviewed By: markbt
Differential Revision: D22165006
fbshipit-source-id: 9eeeab72bc4c88ce040d9d2f1a7df555a11fb5ae
Summary: This way we can go from a list of changesets to the changeset ids that we're returning as an answer in a few queries.
Differential Revision: D22165005
fbshipit-source-id: 4da8ab2a89be0de34b2870044e44d35424be5510
Summary: It can be useful in other places as well, not only in blobimport
Reviewed By: krallin
Differential Revision: D22307314
fbshipit-source-id: f7d8c91101edc2ed4f230f7ef6796e39fbea5117
Summary: Convert the bookmarks traits to use new-style `BoxFuture<'static>` and `BoxStream<'static>`. This is a step along the path to full `async`/`await`.
Reviewed By: farnz
Differential Revision: D22244489
fbshipit-source-id: b1bcb65a6d9e63bc963d9faf106db61cd507e452
Summary:
Older versions of EdenFS do not return the `fetchCountsByPid` field in the
`getAccessCounts()` response.
The Python thrift client code returns this as `None` instead of as an empty
dictionary. This behavior arguably seems like a bug in the thrift code, since
the field is not marked optional. However, updating the thrift behavior would
have much wider implications for other projects. Additionally it's probably
not worth putting a lot of effort into the older "py" thrift generator code.
Update the `edenfsctl` code to explicitly use an empty dictionary if the value
received from the thrift call is `None`.
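The normalization is a one-liner. In this sketch `SimpleNamespace` stands in for the thrift result object, and `fetch_counts_by_pid` is a hypothetical helper name.

```python
from types import SimpleNamespace


def fetch_counts_by_pid(access_counts):
    """Return the per-pid fetch counts, treating a missing value as empty."""
    counts = access_counts.fetchCountsByPid
    return {} if counts is None else counts


# Stand-ins for replies from an older and a newer EdenFS daemon.
old_reply = SimpleNamespace(fetchCountsByPid=None)
new_reply = SimpleNamespace(fetchCountsByPid={1234: 7})
```

Callers can then iterate over the result without special-casing older daemons.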
Reviewed By: fanzeyi
Differential Revision: D22302992
fbshipit-source-id: eced35a19d86e34174f73e27fdc61f1e2ba6a57f
Summary:
`gcdir` is racy. Use `tryunlink` instead of `unlink` so files deleted by other
processes won't crash hg.
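The difference between the two calls can be sketched with the standard library. This `tryunlink` is a simplified stand-in written for illustration, not Mercurial's own helper.

```python
import os
import tempfile


def tryunlink(path):
    """Remove `path`, ignoring the case where it is already gone."""
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass  # another process deleted it first; nothing left to do


def demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    tryunlink(path)  # removes the file
    tryunlink(path)  # file is already gone; a plain os.unlink would raise here
    return os.path.exists(path)
```

Only the "file not found" case is swallowed; other errors, such as permission problems, still propagate.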
Reviewed By: kulshrax
Differential Revision: D22288395
fbshipit-source-id: c3a162871dd569ca7248df86f43d6287ca6d9aab
Summary: It was removed by D22129585 (1020f76e7d). Skip testing it.
Reviewed By: kulshrax
Differential Revision: D22288183
fbshipit-source-id: 07b483028f75df5af9565c9ed693f2299d43f4b2
Summary:
Sometimes pressing `Ctrl+C` does not fully stop the test runner. From gdb it seems
the test runner is waiting for a thread which might have deadlocked. The
progress thread does not have anything critical that needs to sync back to
the main program. Avoid waiting for it to make Ctrl+C work better.
Reviewed By: kulshrax
Differential Revision: D22290453
fbshipit-source-id: bdc5260cbd339cc392728834330609343c0048d3
Summary: Use `TryInto` to convert from a `Request` to a `curl::Easy2` handle, rather than using an inherent method. As with the previous diffs in this stack, the intent is to make it possible to work with handlers in a generic manner.
Reviewed By: quark-zju
Differential Revision: D22201913
fbshipit-source-id: 707c110334b41834f161abf625006a8b81e9d4eb
Summary: Add a new `Configure` trait that provides a common interface for configuring handlers. This will allow handlers to be used in generic contexts, which will be important once we have more than one handler type.
Reviewed By: quark-zju
Differential Revision: D22201916
fbshipit-source-id: 3c297439d398e30a882889c51ea3b6cc33e7d12e
Summary: Move the code that splits headers into a new util submodule, so that it can be shared between handlers.
Reviewed By: quark-zju
Differential Revision: D22201911
fbshipit-source-id: ff3bcd1e166042593f3715fee67e87942e4f72f3
Summary: In preparation for adding a streaming handler, create a handler directory and move `Buffered` to a submodule therein.
Reviewed By: quark-zju
Differential Revision: D22201915
fbshipit-source-id: f90bb6a24dd2137900df825bd23a12201107e9cc