Summary: Use autocargo for all EdenAPI code.
Reviewed By: quark-zju
Differential Revision: D22344400
fbshipit-source-id: 522e82fcc76792ed01ca2cdbfa169c23be5bf38f
Summary:
When rebasing a long stack, it's common to have a long obsoleted (`x`) stack.
The `x` stack is likely duplicated by its successors and is therefore generally
not that useful. These commits take a lot of space, making it harder to find useful
commits. Collapse them in smartlog output by only showing their heads and
roots.
This is disabled for automation as VSCode @ FB has issues rendering the
"unstable" commits.
Screenshot of before vs after:
{F242018914}
Reviewed By: markbt
Differential Revision: D22289661
fbshipit-source-id: 1afce073e14abe8c23a05d52d60847fc5e0bfb66
Summary:
The backtrace looks like:
Traceback (most recent call last):
  File "edenscm/mercurial/dispatch.py", line 688, in _callcatch
    return scmutil.callcatch(ui, func)
  File "edenscm/mercurial/scmutil.py", line 147, in callcatch
    ui.traceback()
  File "edenscm/mercurial/ui.py", line 1462, in traceback
    tb = util.smarttraceback(exc[2])
  File "edenscm/mercurial/util.py", line 4651, in smarttraceback
    reprvalue = _render(value)
  File "edenscm/mercurial/util.py", line 4572, in _render
    result = repr(value)
  File "edenscm/mercurial/context.py", line 100, in __repr__
    return r"<%s %s>" % (type(self).__name__, self)
  File "edenscm/mercurial/context.py", line 94, in __str__
    return short(self.node())
  File "edenscm/mercurial/node.py", line 55, in short
    return hex(node[:6])
TypeError: 'NoneType' object has no attribute '__getitem__'
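A minimal sketch of the failure and a defensive fix (the class and helper below are illustrative stand-ins, not the actual edenscm code):

```python
from binascii import hexlify

def short(node):
    # Hex-encode the first 6 bytes of a 20-byte binary hash.
    return hexlify(node[:6]).decode()

class ctx:
    # Illustrative stand-in for a context object whose node() may be None.
    def __init__(self, node):
        self._node = node

    def node(self):
        return self._node

    def __repr__(self):
        # Guard against a None node so that repr() itself cannot raise
        # while the traceback renderer is formatting local variables.
        n = self.node()
        return "<ctx %s>" % (short(n) if n is not None else "None")
```

With the guard in place, `repr(ctx(None))` yields `<ctx None>` instead of raising.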
Reviewed By: DurhamG
Differential Revision: D22342419
fbshipit-source-id: 0b0df98903b97657acff61b2d02121dfb667be50
Summary:
The subcommands it ran returned bytes, which were then treated as strings.
Let's decode them to strings.
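The class of fix involved can be sketched as follows (a generic example, not the actual command in question):

```python
import subprocess
import sys

# On Python 3, check_output returns bytes; decode before treating the
# result as a string (e.g. for splitting or comparison).
out = subprocess.check_output([sys.executable, "-c", "print('hello')"])
text = out.decode("utf-8").strip()
```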
Reviewed By: quark-zju
Differential Revision: D22339397
fbshipit-source-id: d9d503bd97271c649ad67c74d098b572c1bd7dea
Summary:
The (re)construction process for the IdMap will generate millions of rows
to be inserted into our database. We want to throttle the inserts so that
the database doesn't topple over.
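The throttling idea can be sketched like this (a hypothetical helper; the real change lives in the Rust IdMap code):

```python
import time

def insert_in_batches(rows, insert_batch, batch_size=100, delay=0.0):
    # Insert rows in fixed-size chunks, optionally sleeping between
    # chunks so millions of inserts don't overwhelm the database.
    for i in range(0, len(rows), batch_size):
        insert_batch(rows[i:i + batch_size])
        if delay and i + batch_size < len(rows):
            time.sleep(delay)
```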
Reviewed By: ikostia
Differential Revision: D22104349
fbshipit-source-id: 73b7c2bab12ae0cd836080bcf1eb64586116e70f
Summary:
Simple implementation that queries the MyAdmin service to fetch replication
lag.
Caching like in sqlblob::facebook::myadmin will probably come in a follow-up
change.
Reviewed By: StanislavGlebik
Differential Revision: D22104350
fbshipit-source-id: fbd90174d528ddae4045e957c343e6c213f70d26
Summary:
ReplicaLagMonitor aims to generalize over different strategies of fetching
the replication lag in a SQL database. Querying a set of connections is one
such strategy.
Reviewed By: ikostia
Differential Revision: D22104348
fbshipit-source-id: bbbeccb55a664e60b3c14ee17f404982d09f2b25
Summary:
Hardlinks aren't supported on EdenFS, so let's not silently allow them, since this
can cause unintended behavior. It's possible that this might break some
workflows, but it's better to be strict about this.
Reviewed By: genevievehelsel
Differential Revision: D22250849
fbshipit-source-id: 9f90e6a8e9c9715bb4b925c3d7f952e965683081
Summary:
Errors are usually a sign that we're doing something wrong. Ignoring them can
thus lead to weird behavior, hence let's not ignore them. This may break user
applications if they were doing something we're not handling properly, but in
that case EdenFS was likely already broken in a subtle way. We should then be
able to better understand what's wrong and fix it properly.
Reviewed By: genevievehelsel
Differential Revision: D22250852
fbshipit-source-id: 91daa60065b5621542aee1ca3207d1d7c54f8639
Summary:
When files are moved in and out of the repo, the paths being passed in might be
empty. For code clarity, let's treat these as file creation and removal.
We also do not need to actually tell ProjectedFS about removed files, since
ProjectedFS is actually telling us that the file is moved/removed.
Reviewed By: fanzeyi
Differential Revision: D22250851
fbshipit-source-id: a678e58eb9c36d4dcbf1bb41d56b38fef8943284
Summary:
SQLite's `LIKE` operator is case insensitive by default. This doesn't match MySQL, and
also seems like a surprising default. Set the pragma on every connection to make it
case sensitive.
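The behavior can be demonstrated with Python's built-in sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# By default, SQLite's LIKE is case-insensitive for ASCII characters.
default = conn.execute("SELECT 'ABC' LIKE 'abc'").fetchone()[0]
# Setting the pragma on the connection makes LIKE case-sensitive.
conn.execute("PRAGMA case_sensitive_like = ON")
strict = conn.execute("SELECT 'ABC' LIKE 'abc'").fetchone()[0]
```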
Reviewed By: farnz
Differential Revision: D22332419
fbshipit-source-id: 4f503eeaa874e110c03c27300467ddc02dc9b365
Summary:
Whether a bookmark is publishing or not is not specific to Mercurial - it also affects
whether a commit is draft, so it is interesting to the Bonsai world.
Rename `BookmarkHgKind` to `BookmarkKind` to make this clear.
Since negatives are more awkward to work with, rename `PublishingNotPullDefault` to
`Publishing` and `PullDefault` to `PullDefaultPublishing` to make it clearer that
pull-default bookmarks are also publishing.
We can't rename the database column, so that remains as `hg_kind`.
Reviewed By: StanislavGlebik
Differential Revision: D22307782
fbshipit-source-id: 9e686a98cc5eaf9af722fa62fac5ffd4844967fd
Summary:
If we don't wait on these futures, they might never execute, which may lead to
EdenFS not being aware of files no longer present in the overlay.
For now, just add `.get()` to these futures, but really, we need to make these
be futures themselves and use continuation-style code.
Reviewed By: fanzeyi
Differential Revision: D22326570
fbshipit-source-id: 7584981bec284f35f8d2689a3a5dafd1a111a3d8
Summary:
The blobstore healer has logic that prevents it from doing busy work when the
queue is empty. This is implemented by means of checking whether the DB query
fetched the whole `LIMIT` of values. Or that is the idea, at least. In
practice, here's what happens:
1. DB query is a nested one: first it gets at most `LIMIT` distinct
`operation_key` entries, then it gets all rows with such entries. In practice
this almost always means `# of blobstores * LIMIT` rows, as we almost always
succeed writing to every blobstore
2. Once this query is done, the rows are grouped by the `blobstore_key`, and a
future is created for each such row (for simplicity, ignore that a future may
not be created).
3. We then compare the number of created futures with `LIMIT` and report an
incomplete batch if the numbers are different.
This logic has a flaw: same `blobstore_key` may be written multiple times with
different `operation_key` values. One example of this: `GitSha1` keys for
identical contents. When this happens, grouping from step 2 above will produce
fewer than `LIMIT` groups, and we'll end up sleeping for nothing.
This is not a huge deal, but let's fix it anyway.
My fix also adds some strictly speaking unnecessary logging, but I found it
helpful during this investigation, so let's keep it.
The price of this change is collecting two `unique_by` calls, each of which
allocates a temporary hash set [1] of size `LIMIT * len(blobstore_key) * #
blobstores` (and another one with `operation_key`). For `LIMIT=100_000`,
`len(blobstore_key)=255`, `# blobstores = 3`, we have roughly 70 MB for the
larger one, which should be OK.
[1] https://docs.rs/itertools/0.9.0/itertools/trait.Itertools.html#method.unique
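The miscount can be illustrated with toy data (not the real schema):

```python
# Rows fetched from the queue: (blobstore_key, operation_key) pairs.
# The same blobstore key can appear under different operation keys,
# e.g. GitSha1 keys for identical contents.
rows = [("key1", "op1"), ("key1", "op2"), ("key2", "op3")]

# Old check: group rows by blobstore_key and compare the group count
# with LIMIT -- duplicates collapse, so a full batch looks incomplete.
groups = {key for key, _ in rows}

# Distinct operation keys reflect how much of the LIMIT the fetch
# actually used.
ops = {op for _, op in rows}
```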
Reviewed By: ahornby
Differential Revision: D22293204
fbshipit-source-id: bafb7817359e2c867cf33c319a886653b974d43f
Summary:
Besides the obvious typos in the code, it would also never go past the first
`\n`, and the first element of the split list would be an empty file.
Reviewed By: DurhamG
Differential Revision: D22324297
fbshipit-source-id: 339df3a4e4acc4e0630ffa3a7b0b3d266a20b666
Summary:
To check if dynamicconfigs are being generated or not, let's log the
age of the config to dev_command_timers. We also log the age of the remote_cache
so we can detect if hosts are not fetching data from the server.
Reviewed By: quark-zju
Differential Revision: D22323516
fbshipit-source-id: 1fdeef3eaa5d58566606bfc57d8ad96e752db811
Summary: Move the old EdenAPI crate to `scm/lib/edenapi/old` to make room for the new crate. This old code will eventually be deleted once all references to it are removed from the codebase.
Reviewed By: quark-zju
Differential Revision: D22305173
fbshipit-source-id: 45d211340900192d0488543ba13d9bf84909ce53
Summary: This diff adds a new `CborStream` type which wraps a `TryStream` of bytes and attempts to deserialize the data from CBOR as it is received. This allows the caller to begin processing deserialized values while the download is still going on, which is important in situations where a server might return a large streaming response composed of many small entries (as is the case with EdenAPI).
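The idea can be illustrated with a simple length-prefixed framing standing in for CBOR (the real crate decodes CBOR items; this is only a sketch of incremental decoding):

```python
def decode_stream(chunks):
    # Yield each complete value as soon as enough bytes have arrived,
    # instead of buffering the entire response before decoding.
    buf = b""
    for chunk in chunks:
        buf += chunk
        # First byte of each frame is its payload length.
        while buf and len(buf) >= 1 + buf[0]:
            length = buf[0]
            yield buf[1:1 + length]
            buf = buf[1 + length:]

# Values split arbitrarily across network chunks are still recovered.
frames = list(decode_stream([b"\x03ab", b"c\x02de"]))
```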
Reviewed By: quark-zju
Differential Revision: D22280745
fbshipit-source-id: 9d7fa52e4e67cf7de6bed37c28e5e7afd69449cd
Summary: It is possible to properly initialize the response buffer without using `once_cell`, so remove it to simplify the code.
Reviewed By: quark-zju
Differential Revision: D22264747
fbshipit-source-id: 5890cac7fa598fc80cfd017eae111c0b9fdc6227
Summary:
Add the methods `send_async` and `send_async_with_progress` to `HttpClient`. These methods provide a `Futures`-based async interface that will make it easy to use the client from async Rust code.
Just like in D22231555, this is built on top of the previously introduced streaming API. When a batch request is sent, the client will start a Multi session on a task in a blocking worker thread using `tokio::task::spawn_blocking`. This means that right now, the implementation is not /truly/ async, but it should be possible to change the implementation in the future to avoid using any blocking I/O without needing to change the public interface.
Since all of the requests are part of the same Multi session, they will all proceed concurrently and, if possible, be multiplexed over the same connection in the case of multiple HTTP/2 requests to the same server (which is going to be our main use-case).
Unfortunately, since libcurl does not have any internal synchronization, ownership of the Multi session needs to be passed to the worker thread, meaning that the Multi handle will be dropped once the requests are complete. This means that connections will not be re-used when these methods are called several times serially. The API should make it obvious to the user that the internal state is not preserved since these methods both consume the `HttpClient` itself.
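In Python terms, the pattern of wrapping a blocking transfer in an async interface looks roughly like this, with `asyncio`'s `run_in_executor` playing the role of `tokio::task::spawn_blocking` (names below are illustrative):

```python
import asyncio

def blocking_transfer():
    # Stand-in for driving a libcurl Multi session to completion on a
    # worker thread.
    return "response"

async def send_async():
    # The async caller awaits the result without blocking the event
    # loop, even though the transfer itself is synchronous.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_transfer)

result = asyncio.run(send_async())
```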
Reviewed By: quark-zju
Differential Revision: D22251488
fbshipit-source-id: 37caf64024cb12b95df5124379209550899d093d
Summary: In order to maximize the surface area of the client that gets exercised by the `http_cli` binary, make it use the new async interface (even though it is not strictly necessary here). Since the async interface is ultimately built on top of the synchronous interface, sending requests this way will still exercise parts of the synchronous interface too.
Reviewed By: quark-zju
Differential Revision: D22242944
fbshipit-source-id: 80d73cc152d0c38673436457c1e1040e4198095e
Summary:
This diff adds `AsyncResponse`, an asynchronous version of `Response` built on top of the `Streaming` handler along with a new `ChannelReceiver` that forwards the received data into async channels. The result is essentially a shim that makes libcurl usable with async/await syntax.
`ChannelReceiver` currently uses unbounded channels. This should be OK because the space usage of the channel should increase [at most] at the same rate as if we used `Buffered`. Essentially, in the worst case (if nothing was consuming items from the channel), this would have the same behavior as if we had simply used a non-streaming handler.
Reviewed By: quark-zju
Differential Revision: D22231555
fbshipit-source-id: 6ee767355bfce6d400f447ee690d974666751f16
Summary: Add a new `Header` type that represents a parsed header line. Notably, `libcurl` treats both the initial status line and trailing CRLF as headers; the new struct handles these cases in a strongly-typed way. This is particularly useful when working with streaming responses, as this means that the `Receiver` can tell the status code upfront rather than waiting until the end of the request, and the `Receiver` can tell once all headers (except for trailers) have been received.
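A sketch of the classification (illustrative Python; the actual type is a Rust struct):

```python
def parse_header(line):
    # libcurl's header callback also receives the status line and the
    # blank CRLF line that terminates the header section; classify all
    # three cases instead of treating every line as a name/value pair.
    line = line.strip()
    if not line:
        return ("end_of_headers", None)
    if line.startswith("HTTP/"):
        # e.g. "HTTP/1.1 200 OK" -> the receiver learns the status
        # code upfront rather than at the end of the request.
        return ("status", int(line.split()[1]))
    name, _, value = line.partition(":")
    return ("header", (name.strip(), value.strip()))
```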
Reviewed By: quark-zju
Differential Revision: D22228632
fbshipit-source-id: 06f1a21d7af25b37269bb449a1e53237ec74490a
Summary:
Add a new `Streaming` handler that allows for asynchronously processing streaming responses. A user of this crate can create a struct that implements the new `Receiver` trait, and this struct will be provided with data as it comes in from the network.
Two new methods have been added to `HttpClient` to make streaming requests: `stream` and `stream_with_progress`. Notably, these new methods are not themselves asynchronous. Just like `HttpClient::send` and `send_with_progress`, these methods will block until all of the given requests have completed.
The difference is that in this case, rather than buffering all of the data for every request in the batch, each request's associated `Receiver` will be called every time the request makes progress. This will avoid excessive buffering and allow data to be processed as it arrives.
Reviewed By: quark-zju
Differential Revision: D22201914
fbshipit-source-id: ce586b05a008b371557c84957aa5aa59afcc5c7c
Summary:
D22235561 (c1a6d6fd21) changed eden's dirstate.py to expect unicode strings in
python 3, but in eden_dirstate_map we were intentionally encoding them first.
Let's fix that to use unicode strings all the way through until the
serialization.
Reviewed By: quark-zju
Differential Revision: D22319105
fbshipit-source-id: e3c53fc46150ea81b2878c3bfac2cc1ad60e5590
Summary:
We want to start using remote configs now, but the current dynamicconfig validation logic will remove any configs that don't exactly match a preexisting config. To fix this, let's add remote configs to our allow list. This will prevent the validation logic from removing them as mismatches.
This logic is temporary and will be removed once we finish the migration.
Reviewed By: quark-zju
Differential Revision: D22313015
fbshipit-source-id: 3247b28e7d529dac0983cc021127da36600fda4e
Summary:
An assertion error is raised if `eden doctor` is run in the middle of a merge. This is because we enter a specific "if" condition in the case that mercurial has two parent commits, and EdenFS only ever tracks `p0`, so EdenFS simply sets `p1` to the null commit in `_select_new_parents()`. Specifically, this is the case in which both `_old_dirstate_parents` and `_old_snapshot` are not None.
Because `_old_dirstate_parents` has `p1` set to nonnull while EdenFS thinks it is null, the check `self._new_parents != self._old_dirstate_parents` would be `True` even though there was actually no error.
Reviewed By: chadaustin
Differential Revision: D22048525
fbshipit-source-id: 9a19cc092e2bd80db0e01fb38533a1007640bee6
Summary: None of this code is used, just remove it.
Reviewed By: genevievehelsel
Differential Revision: D22083565
fbshipit-source-id: ad24a0e8ea04797e146ebb9b99c125539197cb89
Summary:
With D22303305 (4868f5bf5b) the test is no longer valid - the mmap happened in Rust and
cannot be tested from Python wrapping functions. Remove the test.
Reviewed By: DurhamG
Differential Revision: D22316110
fbshipit-source-id: f7e16e2ac72908c836a7aeeefa1fb0ef035d01fc
Summary:
Previous commit: D22233127 (fa1caa8c4e)
In this diff, I added rewrite commit path functionality using Mover https://fburl.com/diffusion/6rnf9q2f to repo_import.
Given a prefix (e.g. new_repo), we prepend the paths of the files extracted from the bonsaichangesets given by gitimport (e.g. folder1/file1 => new_repo/folder1/file1). Previously, we did this manually when importing a git repo (https://www.internalfb.com/intern/wiki/Mercurial/Admin/ImportingRepos/) using the convert extension.
Reviewed By: StanislavGlebik
Differential Revision: D22307039
fbshipit-source-id: 322533e5d6cbaf5d7eec589c8cba0c1b9c79d7af
Summary: Also fix up the parser test that now fails with this change
Reviewed By: StanislavGlebik
Differential Revision: D22306340
fbshipit-source-id: 820aad48068471b03cbc1c42107c443bfa680607
Summary:
D21626209 (38d6c6a819) changed revlogindex to read `00changelog.i` on its own instead of
taking the data from Python. That turns out to be racy. The `00changelog.i`
might be changed between the Rust and Python reads and that caused issues.
This diff makes Python re-use the indexdata read by Rust so they are guaranteed
the same.
Reviewed By: DurhamG
Differential Revision: D22303305
fbshipit-source-id: 823bf3aefc970a4a6ce8ab58bccf972a78f6de70
Summary:
This will be used by the next change.
The reason we use a `buffer` or `memoryview` instead of Python `bytes` is to expose
the buffer in a zero-copy way. That is important for startup performance.
Reviewed By: DurhamG
Differential Revision: D22303306
fbshipit-source-id: 3f7c8dff3575b998e025cd5940faa0c183b11626
Summary:
If we're running commands from a user that only has read access, the
debugdynamicconfig commands are going to fail. Let's exit early and quickly if
that's the case, instead of spending a lot of cpu generating a config only to
fail.
Reviewed By: quark-zju
Differential Revision: D22244127
fbshipit-source-id: 24f806772ba5c08e400efb3abc7ebda228d473a5
Summary:
Since we are querying intern for remote configs, we don't want to spam
the servers with requests if they're down. Therefore let's implement some basic
rate limiting to prevent us from querying the server too often. The default
behavior is limiting it to once every 5 minutes.
We only generate new configs once every 15 minutes, so generally this rate limit
shouldn't have any effect, but if there are errors in the generation process
it's possible for generation to happen much more frequently, so this will guard
us from hitting the server too frequently.
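The rate-limiting scheme can be sketched as follows (a hypothetical helper, with an injectable clock for testing):

```python
import time

class RateLimiter:
    # Allow at most one fetch per `interval` seconds; callers skip the
    # remote request whenever allow() returns False.
    def __init__(self, interval, clock=time.monotonic):
        self.interval = interval
        self.clock = clock
        self.last = None

    def allow(self):
        now = self.clock()
        if self.last is None or now - self.last >= self.interval:
            self.last = now
            return True
        return False
```

With `interval=300`, a second fetch attempted within five minutes is skipped.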
Reviewed By: quark-zju
Differential Revision: D22243316
fbshipit-source-id: bbccaf63da95af1edc3128f4d2047a32f90e53ba
Summary:
The HG_TEST_REMOTE_CONFIG environment variable was added to allow tests
to declare custom remote config values, but we can also use it to make canarying
easier.
With this change, users can do `HG_TEST_REMOTE_CONFIG=configerator hg
debugdynamicconfig` to test a change, after running arc build in their
configerator.
We might want to simplify this further in the future to some sort of hidden dev
command line flag, like `hg debugdynamicconfig --canary-remote`
Reviewed By: quark-zju
Differential Revision: D22081459
fbshipit-source-id: 07977097347af9d5872402beeda0ed9160176e7e
Summary: Now that we fetch remote configs, let's apply them locally.
Reviewed By: quark-zju
Differential Revision: D22079767
fbshipit-source-id: aafc9a2e1e6a60b7b6087eaf256dafce30ca5a1e
Summary:
Fetches configs from a remote endpoint and caches them locally. If the
remote endpoint fails to respond, we use the cached version.
Reviewed By: quark-zju
Differential Revision: D22010684
fbshipit-source-id: bd6d4349d185d7450a3d18f9db2709967edc2971
Summary:
Adds the hg client config thrift structure to thrift-types so we can
use it in both buck and make local.
Reviewed By: quark-zju
Differential Revision: D21875370
fbshipit-source-id: 45e585ca5a90307cbeb68240f210006986ec7e84
Summary: This will be used for the commits_between replacement.
Differential Revision: D22234236
fbshipit-source-id: c0c8550d97a9e8b42034d605e24ff54251fbd13e
Summary: Some SCMQuery queries need just a list of commit hashes instead of full coverage.
Reviewed By: markbt
Differential Revision: D22165006
fbshipit-source-id: 9eeeab72bc4c88ce040d9d2f1a7df555a11fb5ae
Summary: This way we can convert a list of changesets into the changeset ids that we return as an answer in a few queries.
Differential Revision: D22165005
fbshipit-source-id: 4da8ab2a89be0de34b2870044e44d35424be5510
Summary: It can be useful in other places as well, not only in blobimport
Reviewed By: krallin
Differential Revision: D22307314
fbshipit-source-id: f7d8c91101edc2ed4f230f7ef6796e39fbea5117
Summary: Convert the bookmarks traits to use new-style `BoxFuture<'static>` and `BoxStream<'static>`. This is a step along the path to full `async`/`await`.
Reviewed By: farnz
Differential Revision: D22244489
fbshipit-source-id: b1bcb65a6d9e63bc963d9faf106db61cd507e452
Summary:
Older versions of EdenFS do not return the `fetchCountsByPid` field in the
`getAccessCounts()`.
The Python thrift client code returns this as `None` instead of as an empty
dictionary. This behavior arguably seems like a bug in the thrift code, since
the field is not marked optional. However, updating the thrift behavior would
have much wider implications for other projects. Additionally, it's probably
not worth putting a lot of effort into the older "py" thrift generator code.
Update the `edenfsctl` code to explicitly use an empty dictionary if the value
received from the thrift call is `None`.
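The fix boils down to normalizing the `None` case at the call site (the class below is a toy stand-in for the thrift result type):

```python
class AccessCounts:
    # Stand-in for the thrift getAccessCounts() result; older servers
    # leave fetchCountsByPid unset, which the "py" generator surfaces
    # as None rather than an empty dict.
    def __init__(self, fetchCountsByPid=None):
        self.fetchCountsByPid = fetchCountsByPid

def fetch_counts_by_pid(result):
    # Treat a missing field from an older server as an empty mapping.
    return result.fetchCountsByPid or {}
```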
Reviewed By: fanzeyi
Differential Revision: D22302992
fbshipit-source-id: eced35a19d86e34174f73e27fdc61f1e2ba6a57f
Summary:
`gcdir` is racy. Use `tryunlink` instead of `unlink` so files deleted by other
processes won't crash hg.
Reviewed By: kulshrax
Differential Revision: D22288395
fbshipit-source-id: c3a162871dd569ca7248df86f43d6287ca6d9aab
Summary:
Sometimes pressing `Ctrl+C` does not fully stop the test runner. From gdb it
seems the test runner is waiting for a thread which might have deadlocked. The
progress thread does not have anything critical that needs to sync back to
the main program. Avoid waiting for it to make `Ctrl+C` work better.
Reviewed By: kulshrax
Differential Revision: D22290453
fbshipit-source-id: bdc5260cbd339cc392728834330609343c0048d3
Summary: Use `TryInto` to convert from a `Request` to a `curl::Easy2` handle, rather than using an inherent method. As with the previous diffs in this stack, the intent is to make it possible to work with handlers in a generic manner.
Reviewed By: quark-zju
Differential Revision: D22201913
fbshipit-source-id: 707c110334b41834f161abf625006a8b81e9d4eb
Summary: Add a new `Configure` trait that provides a common interface for configuring handlers. This will allow handlers to be used in generic contexts, which will be important once we have more than one handler type.
Reviewed By: quark-zju
Differential Revision: D22201916
fbshipit-source-id: 3c297439d398e30a882889c51ea3b6cc33e7d12e
Summary: Move the code that splits headers into a new util submodule, so that it can be shared between handlers.
Reviewed By: quark-zju
Differential Revision: D22201911
fbshipit-source-id: ff3bcd1e166042593f3715fee67e87942e4f72f3
Summary: In preparation for adding a streaming handler, create a handler directory and move `Buffered` to a submodule therein.
Reviewed By: quark-zju
Differential Revision: D22201915
fbshipit-source-id: f90bb6a24dd2137900df825bd23a12201107e9cc
Summary: Instead of returning a bool, make the client callback return a `Result`. A new error type called `Abort` has been added which the client can return to signal that the driver should abort all active transfers and return early. This results in more idiomatic code, and gives the callback the ability to specify the reason for the abort.
Reviewed By: quark-zju
Differential Revision: D22201912
fbshipit-source-id: 46d72e28f754132a4ef30defa40c4a5d09fe8e07
Summary: Move stats collection into `MultiDriver`. This makes sense from a design standpoint since the driver is the thing performing the requests. It will also reduce code duplication when streaming responses are added later in the stack.
Reviewed By: quark-zju
Differential Revision: D22201910
fbshipit-source-id: fd3174bb6f5a452901b405341b2c001dca1a9832
Summary: Implement `Send` for `HttpClient`. I've split this out into its own diff since unsafe code warrants additional scrutiny. See the comment in the code for details on correctness.
Reviewed By: quark-zju
Differential Revision: D22157713
fbshipit-source-id: 1a92ebb51142a98d3996686197b77ad7500c19db
Summary: Report transfer stats such as bytes downloaded, uploaded, elapsed time, and number of requests upon completion of a batch of requests. This code was adapted from the EdenAPI client.
Reviewed By: quark-zju
Differential Revision: D22157711
fbshipit-source-id: 57b814e7a923f85467a79ee48bddd48b2f2b253c
Summary:
Add an `HttpClient` struct that is intended as the primary API for this crate. This struct is a wrapper around libcurl's "multi" interface, which allows sending several requests concurrently. Notably, if HTTP/2 is used, this allows for potentially multiplexing
many requests to the same server over the same connection.
This code is essentially a generalized version of the `multi_request` function from the EdenAPI crate.
Reviewed By: quark-zju
Differential Revision: D22157710
fbshipit-source-id: cbd7f82555444c7d1e1932944257a6949c60f34e
Summary: Add the ability for the client to monitor the collective progress of a set of transfers. This will be used in the next diff to allow monitoring several concurrent requests. Most of this code was adapted from the EdenAPI client's progress module, with some modifications.
Reviewed By: quark-zju
Differential Revision: D22157709
fbshipit-source-id: d474cd46db29bebf64049629dce69d975e220e3a
Summary: The method name is symlink, not rmdir.
Reviewed By: genevievehelsel
Differential Revision: D22291297
fbshipit-source-id: 5dc37b053e06c965fd47df79990fc40adc097f87
Summary: Most of them need extra server-side bookmarks for deciding phases.
Reviewed By: DurhamG
Differential Revision: D22117739
fbshipit-source-id: 711bf96063913fd6148125a5628f0b0f4efbf489
Summary:
Enable narrow-heads.
Changed log revset from `:` to `all()` to make the test compatible.
Reviewed By: krallin
Differential Revision: D22200495
fbshipit-source-id: 148a82e77c953b9e7dbed055ef464c318e56cafa
Summary:
Enable narrow-heads, and mutation. Disable obsmarker related features.
Change phase manipulation to `debugmakepublic` which works with narrow-heads.
Reviewed By: krallin
Differential Revision: D22200511
fbshipit-source-id: 8dec050f137e6cc055015fe084eb4cc67faa1216
Summary:
Enable narrow-heads.
The test output seems a bit unstable - sometimes I got 28 there. So I globbed
it out.
Reviewed By: krallin
Differential Revision: D22200497
fbshipit-source-id: f005381a341d88c0bcbb09150e7d1878df7a38f3
Summary:
Enable narrow-heads.
Change the revset `:` to `all()`. With narrow-heads, `:` selects all commits
including those that are not referred by visible heads. The `all()` revset
only selects commits reachable from visible heads.
Reviewed By: krallin
Differential Revision: D22200498
fbshipit-source-id: beb863d42069ae898e419a4a75b3a707c72ae1f9
Summary:
Enable remotenames, selectivepull, and narrow-heads. Use the new stream clone
code path.
Selectivepull makes a difference: `hg pull -r HASH` also pulls the selected
bookmarks, so an extra `pull` was unnecessary. Change the clone command to use
`-U` to trigger the new clone code path.
Reviewed By: krallin
Differential Revision: D22200499
fbshipit-source-id: 764202098c7e8afdbb5e2ee83679da7570c08c90
Summary:
Enable remotenames and narrow-heads.
Local bookmarks are replaced by remote bookmarks, causing the test change.
Reviewed By: krallin
Differential Revision: D22200500
fbshipit-source-id: aeee528d1766e0642c12e78a6c1a50cadc8a579a
Summary:
Enable remotenames and narrow-heads.
The commits become 'draft' because there are no remote bookmarks.
Reviewed By: krallin
Differential Revision: D22200514
fbshipit-source-id: 04d0befa7c22756e936a28ffdcdf1305057cf062
Summary:
Enable remotenames and narrow-heads.
The test was migrated cleanly. The only change is that local bookmarks are
replaced by remote bookmarks.
Reviewed By: krallin
Differential Revision: D22200510
fbshipit-source-id: f5b8cd2ed125e9fc4e5daac897851d91fef5693f
Summary:
Enable remotenames and narrow-heads.
Local bookmarks are replaced by remote bookmarks.
Reviewed By: krallin
Differential Revision: D22200503
fbshipit-source-id: 41ac4f4f606011dcaf6d0d9867b01fb77b9a79d8
Summary:
Enable remotenames and narrow-heads.
Phase exchange is gone because of narrow-heads.
The remotenames extension was written suboptimally, so it issued a second
bookmarks request (which hopefully can be removed by having selective
pull everywhere and migrating pull to use the new API).
Reviewed By: krallin
Differential Revision: D22200506
fbshipit-source-id: c522bb9fc1396d813e0f1f380c4290445bab3db3
Summary:
Enable remotenames and narrow-heads. The `master_bookmark` is no longer a local
bookmark in the client repo.
Reviewed By: krallin
Differential Revision: D22200513
fbshipit-source-id: bc3c1715ce21f45a35bc67148eb00e44944bea6e
Summary:
Enable remotenames and narrow-heads. The server gets one more request from
remotenames.
Reviewed By: krallin
Differential Revision: D22200502
fbshipit-source-id: 26bc28b19438c7be4a19eae6be728c83b113f822
Summary:
Enable remotenames and narrow-heads. The client gets remote bookmarks instead
of local bookmarks during clone and phases are decided by remote bookmarks.
Reviewed By: krallin
Differential Revision: D22200515
fbshipit-source-id: 12a9e892855b3a8f62f01758565de5f224c4942b
Summary:
Change the template to show remote bookmarks, which will be more relevant once
we migrate to modern configs. Namely, phases will be decided by remote bookmarks.
The named branches logic was mostly removed from the code base. Therefore
drop the `{branches}` template.
Reviewed By: StanislavGlebik
Differential Revision: D22200512
fbshipit-source-id: 8eca3a71ff88b8614023f4920a448156fcd712d5
Summary:
Most tests pass without changes. Some incompatible tests are added to the
special list.
Reviewed By: krallin
Differential Revision: D22200505
fbshipit-source-id: 091464bbc7c9c532fed9ef91f2c955d6e4f2df0b
Summary: This is the final diff of the stack - it starts logging pushed commits to scribe
Reviewed By: farnz
Differential Revision: D22212755
fbshipit-source-id: ec09728408468acaeb1c214d43f930faac30899b
Summary:
Failing the push if we failed to log to scribe doesn't make a lot of sense. By
that time the ship has sailed - the commit has already been pushed, and by
failing the request we can't undo that. It will just create an annoyance for
whoever is pushing.
Instead, let's log it to scuba.
Reviewed By: farnz
Differential Revision: D22256687
fbshipit-source-id: 2428bbf1db4cef6fa80777ad65184fab1804fa9c
Summary:
At the moment we can't test logging to scribe easily - we don't have a way to
mock it. Scribe is supposed to help with that.
It will let us configure all scribe logs to go to a directory on a
filesystem, similar to the way we configure scuba. The Scribe itself will
be stored in CoreContext.
Reviewed By: farnz
Differential Revision: D22237730
fbshipit-source-id: 144340bcfb1babc3577026191428df48e30a0bb6
Summary: Many tests are incompatible, but many are passing.
Reviewed By: kulshrax
Differential Revision: D22052475
fbshipit-source-id: 1f30ac2b0fe034175d5ae818ec2be098dbd5283d
Summary:
With narrow-heads, phases are not tracked by remotenames. There is no need to
do a separate phase exchange.
Reviewed By: kulshrax
Differential Revision: D22130170
fbshipit-source-id: 6ee8cda338253aea5df09e779ecaf3d29910f622
Summary:
Many tests want to change commit phases to public. We don't want to leak this
feature to end-users. Therefore, make it a debug command.
Reviewed By: kulshrax
Differential Revision: D22052481
fbshipit-source-id: 4ce6a61120da0d74374465e97ac0ec0ff9638088
Summary:
The test depends on too much details of the obsstore and is hard to
migrate to mutation. The undo command in production actually uses
a different code path (with narrow-heads, it rewrites metalog directly),
which is somewhat covered by test-undo-narrow-heads. Ideally we extend
the undo test with more cases under modern configs. For now let's just
remove the test with legacy configs to unblock migration progress.
Reviewed By: kulshrax
Differential Revision: D22174961
fbshipit-source-id: 57baf5e6f59938069b44a5db8c5ae1662f094f84
Summary:
It has so many dependencies on legacy components that it's too hard to
migrate. `test-rebase-mutation` is the test that is more relevant.
Reviewed By: kulshrax
Differential Revision: D22174996
fbshipit-source-id: 40889bc2f80021839ce0ffe13098404e8aa97fbb
Summary:
The test is about syncing with obsmarkers. We no longer use obsmarkers so the
test is no longer relevant.
Reviewed By: kulshrax
Differential Revision: D22174999
fbshipit-source-id: 7615852b734319bc838b238f4c9b54252ba5c355
Summary:
The test is mostly about manipulating phases and phase exchange in bundle2.
Both of them are gone with narrow-heads, which makes phases immutable.
Practically, phases affect whether commits are immutable, and that is covered
by individual tests (ex. amend tests check amend fails on public commits).
Reviewed By: kulshrax
Differential Revision: D22174955
fbshipit-source-id: cabc0de552f0ede54a23a6e6eec47b14c2a863b9
Summary:
With narrow-heads (enabled in production and most tests) hiddenerror will
never happen. Remove the extension.
Reviewed By: kulshrax
Differential Revision: D22174987
fbshipit-source-id: e3e3823d3da89dfd51b60f65f8e138ea723cc00f
Summary:
We don't use the flag processor framework for things other than LFS in the
non-Rust code. It is coupled with revlog, which is going away. Our LFS usage is
covered by other tests.
Reviewed By: kulshrax
Differential Revision: D22130168
fbshipit-source-id: 78a04df49961e2faca0def71daeec6a40e45c4b5
Summary:
We have narrow-heads on in production. Phases are decided by remote bookmarks
instead of exchanged separately. Remove the test since it's no longer relevant.
Reviewed By: sfilipco
Differential Revision: D22052477
fbshipit-source-id: c60382448d5e4d3d2d1225762b6c133873016881
Summary:
The return orders of `index2.headsancestors` and `index.headrevs` are different.
Fix it. This affects the default commit to update to after `hg clone` in some
tests.
Reviewed By: sfilipco
Differential Revision: D22200496
fbshipit-source-id: 273201461c814241ce8dafa6f7ca00f72ea83eab
Summary:
If the repo is newly created, do not show the migration message.
This makes migration smoother.
Reviewed By: sfilipco
Differential Revision: D22052480
fbshipit-source-id: 4857bcb3e38094501eaa63a52f09d49853cf761b
Summary:
For Lua hooks, we needed to know whether to run the hook per file or per changeset. Rust hooks know this implicitly, as they're built into the server.
Stop having the tests set an unnecessary config.
Reviewed By: krallin
Differential Revision: D22282799
fbshipit-source-id: c9f6f6325823d06d03341f04ecf7152999fcdbe7
Summary:
D21642461 (46d2b44c0e) converted Mononoke server to use the
`--mononoke-config-path` common argument style to select a config path.
Now that this change has been running for a while, remove the extra logic in
the server that allowed it to accept both the deprecated `--config_path / -P`
and the new arg.
Reviewed By: ikostia
Differential Revision: D22257386
fbshipit-source-id: 7da4ed4e0039d3659f8872693fa4940c58bae844
Summary:
`get_entry_with_small_repo_mapings` is a function that turns a `CommitEntry`
struct into `CommitEntryWithSmallReposMapped` struct - the idea being that this
function looks up hashes of commits into which the original commit from the
large repo got rewritten (in practice rewriting may have happened in the small
-> large direction, but it is not important for the purpose of this job). So it
establishes a mapping. Before this
diff, it actually established `Large<ChangesetId> ->
Option<(Small<ChangesetId>, Option<BookmarkName>)>` mapping, meaning that it
recorded into which bookmark the large repo's bookmark was rewritten. This was
useless information (as evidenced by the fact that it was ignored by the
`prepare_entry` function, which turns `CommitEntryWithSmallReposMapped` into
`EntryPreparedForValidation`). It is useless because bookmarks are mutable and
it is impossible to do historic validation of the correctness of bookmark
renaming: bookmarks may have been correctly renamed when commits were pushed,
but they may be incorrectly renamed now, and vice-versa. To deal with bookmarks,
we have a separate job, `bookmarks_validator`.
So this diff stops recording this useless information. As a bonus, this will
make migration onto `LiveCommitSyncConfig` easier.
Reviewed By: StanislavGlebik
Differential Revision: D22235389
fbshipit-source-id: c02b3f104a8cbd1aaf76100aa0930efeac475d42
Summary: We need to be able to query `synced_commit_mapping` to understand which `version_name` was used to sync commits. That `version_name` will be needed to produce `CommitSyncConfig` by utilizing upcoming `LiveCommitSyncConfig` APIs. And `CommitSyncConfig` is needed to create `CommitSyncer`. So let's extract this fn out of `CommitSyncer`, as it's really independent functionality.
Reviewed By: farnz
Differential Revision: D22244952
fbshipit-source-id: 53e55139efd423174176720c8bf7e3ecc0dcb0d7
Summary:
This allows narrow-heads head calculation without the remotenames extension.
The test about related internal APIs is deleted. Other tests will cover the heads calculation indirectly.
Reviewed By: sfilipco
Differential Revision: D22052476
fbshipit-source-id: 99ad01b075b93e135d7f6c1bc27d837b19337726
Summary:
In configerator, extras might be non-utf8, plain bytes. This causes two
issues: the first one is when extracting the file list out of the change, the
other one is when dealing with the extras directly. For the first one, only
decode the file list; for the second one, use the surrogateescape error
handler when decoding/encoding them.
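A small illustrative example of the surrogateescape round-trip (the extras value here is made up):

```python
# Sketch: non-UTF-8 bytes survive a decode/encode round-trip when the
# surrogateescape error handler is used on both sides.
raw = b"branch\xffname"  # hypothetical non-UTF-8 extras value
text = raw.decode("utf-8", "surrogateescape")
roundtripped = text.encode("utf-8", "surrogateescape")
assert roundtripped == raw
```

A strict `raw.decode("utf-8")` would raise `UnicodeDecodeError` on the same input.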
Reviewed By: quark-zju
Differential Revision: D22260068
fbshipit-source-id: 19ef0248c69ea2b75bb91e59c3cdaed869110950
Summary:
The bad type on the default value is causing crashes when there are no "local:commits" fields on a phabricator version.
Created from Diffusion's 'Open in Editor' feature.
Reviewed By: chadaustin, grakkpl
Differential Revision: D22259485
fbshipit-source-id: b11a82b2b099aea373734af96cbd4834f5c5dcbc
Summary:
This diff migrates `backsyncer_cmd` (the thing that runs in the separate backsyncer job, as opposed to the backsyncer triggered from push-redirector) onto `LiveCommitSyncConfig`. Specifically, this means that on every iteration of the loop, which calls `backsync_latest`, we reload `CommitSyncConfig` from configerator, build a new `CommitSyncer` from it, and then pass that `CommitSyncer` to `backsync_latest`.
One choice made here is to *not* create `CommitSyncer` on every iteration of the inner loop of `backsync_latest` and handle live configs outside. The reason for this is twofold:
- `backsync_latest` is called from `PushRedirector` methods, and `PushRedirector` is recreated on each `unbundle` using `LiveCommitSyncConfig`. That call provides an instance of `CommitSyncer` used to push-redirect a commit we want to backsync. It seems strictly incorrect to try and maybe use a different instance.
- because of some other consistency concerns (different jobs getting `CommitSyncConfig` updates at different times), any sync config change needs to go through the following loop:
- lock the repo
- land the change
- wait some time, until all the possible queues (x-repo sync and backsync) are drained
- unlock the repo
- this means that it's ok to have the config refreshed outside of `backsync_latest`
Reviewed By: farnz
Differential Revision: D22206992
fbshipit-source-id: 83206c3ebdcb2effad7b689597a4522f9fd8148a
Summary: Add a simple CLI to allow the HTTP client to be tested manually.
Reviewed By: quark-zju
Differential Revision: D22228930
fbshipit-source-id: 12fea3131ec6d8c3df4457fb74a09ea52f42c066
Summary: Allow setting the CA certificate bundle, similar to the curl command's `--cacert` flag. This is required for integration tests since those servers use dummy certificates signed by a fake CA.
Reviewed By: quark-zju
Differential Revision: D22203833
fbshipit-source-id: 261e6c2904504c3a98f95b4a5b5b6ed24cb7402d
Summary:
Add a simple HTTP client library, based on libcurl. This crate is essentially an attempt to factor out the HTTP code from the Eden API client, since over time it had accumulated all of the pieces needed for a general-purpose HTTP client library. Factoring it out will help clean up the EdenAPI code base and allow code reuse by other crates that also need to work with HTTP.
This initial diff introduces the `Request` and `Response` types which can be used to build, send, and see the results of individual HTTP requests. In this diff, requests can only be made serially, but later in the stack it will be possible to run many requests concurrently (potentially multiplexed over the same connection in the case of HTTP/2).
The HTTP functionality in the EdenAPI client had very little unit test coverage (it relied primarily on Mononoke's integration tests). With this crate, I've added many unit tests (often involving mocked HTTP servers) to help ensure correctness.
Reviewed By: quark-zju
Differential Revision: D22157712
fbshipit-source-id: 3b0823ece26b19979980841727f1eefcf0519ad5
Summary:
While suppressing notifications should in theory be faster than receiving
them, it causes a couple of issues, because EdenFS simply does not create
inodes for any of the files in these directories.
Renaming a file in and out of these directories would for instance fail, due to
the inode not being present.
Since these directories are very low traffic overall and on Linux/macOS these
are not treated specifically (well, except for .eden, but that's a different
story), let's do the same on Windows.
Reviewed By: chadaustin
Differential Revision: D22250850
fbshipit-source-id: c13ed29faedc33c98b1a30227e44afc3f2c84c89
Summary: Compacts metalog by copying current root into new metalog and creating (or updating) a metalog-internal pointer file
Reviewed By: quark-zju
Differential Revision: D22100213
fbshipit-source-id: 7cea17dde46ac4fa2c84da873df68c536dca4119
Summary:
Before this diff only the main Mononoke server binary was able to use fs-based
`ConfigStore`, which is pretty useful in integration tests.
Reviewed By: farnz
Differential Revision: D22256618
fbshipit-source-id: 493a064a279250d01469c9ff7f747585581caf51
Summary: We designed the schema to make this simple to implement - it's literally a metadata read and a metadata write.
Reviewed By: ikostia
Differential Revision: D22233922
fbshipit-source-id: b392b4a3a23859c6106934f73ef60084cc4de62c
Summary:
Eventually, we want everything to be `async`/`await`; as a stepping stone in that direction, switch the remaining blobstore traits to new-style futures.
This just pushes the `.compat()` out to old-style futures, but it makes the move to non-'static lifetimes easier, as all the compile errors will relate to lifetime issues.
Reviewed By: krallin
Differential Revision: D22183228
fbshipit-source-id: 3fe3977f4469626f55cbf5636d17fff905039827
Summary:
This is to avoid passing `String` around. Will be useful in one of the next
diffs, where I add querying `LiveCommitSyncConfig` by versions.
Reviewed By: krallin
Differential Revision: D22243254
fbshipit-source-id: c3fa92b62ae32e06d7557ec486d211900ff3964f
Summary: I have previously moved the gitimport functionality (D22159880 (2cf5388835)) into a separate library, since repo_import shares similar behaviours. In this diff, I set up repo_import to be able to call gitimport to get the commits and changes. (Next steps include using Mover to set the paths of the files in the commits given by gitimport)
Reviewed By: StanislavGlebik
Differential Revision: D22233127
fbshipit-source-id: 4680c518943936f3e29d21c91a2bad60108e49dd
Summary:
This makes it possible to implement sparse profile based target determinator
cleanly. The old approach is to change `.hg/sparse` every time, and run
`hg log --sparse FILE`, which is hacky and less efficient.
Reviewed By: kulshrax
Differential Revision: D15798327
fbshipit-source-id: 5d46e5b2619f70a911324776b39829446e87b932
Summary:
This was causing `hg mv` to fail due to trying to hash a unicode path, but
Python3 refuses to hash anything but bytes.
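For illustration (the path value is made up), Python 3's hashlib only accepts bytes, so a unicode path must be encoded before hashing:

```python
import hashlib

path = "dir/f\u00efle.txt"  # hypothetical unicode path
# hashlib.sha1("...") raises TypeError on str in Python 3; encode first.
digest = hashlib.sha1(path.encode("utf-8")).hexdigest()
assert len(digest) == 40
```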
Reviewed By: DurhamG
Differential Revision: D22235561
fbshipit-source-id: 3eb80b8e02d442a4036ab7be7ea5c139bd24ff5e
Summary:
The new `atomic_write_symlink` API handles platform weirdness especially on
Windows with mmap. Use it to avoid issues.
Reviewed By: DurhamG
Differential Revision: D22225317
fbshipit-source-id: c04a3948c30834e1025a541fc66b371654ed77e4
Summary:
This diff aims to solve `atomic_write` issues on Windows. Namely:
- `tempfile` leftovers if temp files are not deleted on `Drop`.
- `tempfile` does an unnecessary `chmod`.
- For mmapped files, the file has to be deleted before `atomic_write`, giving
readers a chance to see inconsistent data.
This diff solves the above issues by:
- Using extra GC to clean up older files. Do not rely on a successful `Drop`.
- Not using `tempfile` and not setting permissions.
- Using a symlink, so the symlink can still be atomically replaced while the real
content is being mmapped.
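A minimal Python sketch of the symlink approach (the helper name and details are hypothetical; the real `atomic_write_symlink` is Rust, and the real code also GCs old content files):

```python
import os
import tempfile

def atomic_write_symlink(link, data):
    # Write the real content to a fresh file, then atomically swap a
    # symlink to point at it. Readers mmapping the previous content file
    # keep a valid view; the old file is left behind for a later GC pass.
    d = os.path.dirname(os.path.abspath(link)) or "."
    fd, real = tempfile.mkstemp(dir=d)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    tmplink = real + ".lnk"
    os.symlink(os.path.basename(real), tmplink)
    os.replace(tmplink, link)  # atomic rename over the old symlink
```

Each write leaves the previous content file untouched, which is what makes the replacement safe under mmap.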
Reviewed By: DurhamG
Differential Revision: D22225039
fbshipit-source-id: d45bb198a53f8beeef71798cdb9ae57f9b4b8cd3
Summary:
Eventually, we want everything to be `async`/`await`; as a stepping stone in that direction, switch some of the blobstore interfaces to new-style `BoxFuture` with a `'static` lifetime.
This does not enable any fixes at this point, but does mean that `.compat()` moves to the places that need old-style futures instead of new. It also means that the work needed to make the transition fully complete is changed from a full conversion to new futures, to simply changing the lifetimes involved and fixing the resulting compile failures.
Reviewed By: krallin
Differential Revision: D22164315
fbshipit-source-id: dc655c36db4711d84d42d1e81b76e5dddd16f59d
Summary: Make crecord python 3 compatible by using bytes and floor division.
Reviewed By: quark-zju
Differential Revision: D22201151
fbshipit-source-id: b7a69aa9cfaa30c75d016f2e0d51f5b955fcc4c0
Summary:
If the first client to send mutation data for a commit is only aware of partial
history for that commit, the primordial commit that is determined will be the
earliest of those commits. If another client comes along later with a longer
history, the new set of commits will be assigned a different primordial commit.
Make sure that when this happens, we still fetch the full history. We do this
by including the successor in the search-by-primordial case, which allows us
to join together disconnected histories at the cost of one extra round-trip to
the database.
Note that the fast path for addition of a single mutation will not fill in the
missing history. This is an acceptable trade-off for the faster performance
in the usual case.
Reviewed By: mitrandir77
Differential Revision: D22206317
fbshipit-source-id: 49141d38844d6cddc543b6388f0c31dbc70dcbc5
Summary:
By design, the mutation history of a commit should not have any cycles. However,
synthetic entries created by backfilling obsmarkers may inadvertently create
erroneous cycles, which must be correctly ignored by the mutation store.
The mutation store is affected by cycles in two ways:
* Self-referential entries (created by backfilling "revive" obsmarkers) must
be dropped early on, as these will overwrite any real mutation data for
that successor.
* Larger cycles will prevent determination of the primordial commit for
primordial optimization. Here we drop all entries that are part of the cycle.
These entries will not be shareable via the mutation store.
Note that it is still possible for cycles to form in the store if they are
added in multiple requests - the first request with a partial cycle will
allow determination of a primordial commit which is then used in subsequent
requests. That's ok, as client-side cycle detection will break the cycle in
these entries.
As we move away from history that has been backfilled from obsmarkers, this
will become less of a concern, as cycles in pure mutation data are impossible
to create.
Reviewed By: mitrandir77
Differential Revision: D22206318
fbshipit-source-id: a57f30a19c482c7cde01cbd26deac53b7bb5973f
Summary:
Push supported multiple bookmarks in theory, but in practice we never used it.
Since we want to start logging pushed commits in the next diffs, we need to
decide what to do with bookmarks. At the moment we can log only a single
bookmark to scribe, so let's just allow a single bookmark per push.
Reviewed By: farnz
Differential Revision: D22212674
fbshipit-source-id: 8191ee26337445ce2ef43adf1a6ded3e3832cc97
Summary:
In the next diffs it will be passed to unbundle processing so that we can use
the scribe category to log pushed commits.
Reviewed By: krallin
Differential Revision: D22212616
fbshipit-source-id: 17552bda11f102041a043f810125dc381e478611
Summary:
Remove data collection for obsmarker-related things:
* The obsstore size.
* The last 100 lines of `hg debugobsolete`.
* The unfiltered smartlog. The data normally available here is replaced by the
`hg debugmetalog` and `hg debugmutation` output. This is also usually a very
slow command.
Reviewed By: quark-zju
Differential Revision: D22207980
fbshipit-source-id: 4f7c0fe6571ad06ac331ced2540752c1937fb0eb
Summary: That was like 50% of the point of this change, and somehow I forgot to do it.
Reviewed By: farnz
Differential Revision: D22231923
fbshipit-source-id: 4a4daaeaa844acd219680907c0b5a5fdacdf535c
Summary:
Similarly to how we have `PushRedirectorArgs`, we need `CommitSyncerArgs`: a struct, which a long-living process can own and periodically create a real `CommitSyncer` out of it, by consuming freshly reloaded `CommitSyncConfig`.
It is a little unfortunate that I am introducing yet another struct to `commit_rewriting/cross_repo_sync`, as it's already pretty confusing with `CommitSyncer` and `CommitSyncRepos`, but hopefully `CommitSyncerArgs`'s purpose is simple enough that it can be inferred from the name. Note that this struct does have a few convenience methods, as we need to access things like `target_repo` and various repo ids before we even create a real `CommitSyncer`. This makes its purpose a little less singular, but still fine IMO.
Reviewed By: StanislavGlebik
Differential Revision: D22197123
fbshipit-source-id: e2d993e186075e33acec00200d2aab10fb893ffd
Summary:
This fn is not used anywhere except tests, and its only difference from
`backsync_all_latest` is in the fact that it accepts a limit. So let's rename
`backsync_all_latest` into `backsync_latest` and make it accept a limit arg.
I decided to use a custom enum instead of `Option` so that people don't have to
open fn definition to understand what `BacksyncLimit::Limit(2)` or
`BacksyncLimit::NoLimit` mean.
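A hypothetical Python analogue of the idea (the real code is Rust; the names mirror the diff's enum, and the function body is made up):

```python
from dataclasses import dataclass
from typing import List, Union

# An explicit limit type reads better at call sites than a bare
# Optional[int] whose None meaning you'd have to look up.
@dataclass(frozen=True)
class Limit:
    count: int

@dataclass(frozen=True)
class NoLimit:
    pass

BacksyncLimit = Union[Limit, NoLimit]

def backsync_latest(entries: List[int], limit: BacksyncLimit) -> List[int]:
    # Process at most `limit.count` entries, or all of them for NoLimit.
    if isinstance(limit, Limit):
        entries = entries[: limit.count]
    return list(entries)
```

Compare `backsync_latest(entries, Limit(2))` with a hypothetical `backsync_latest(entries, 2)` or `backsync_latest(entries, None)`: the intent is visible without opening the function definition.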
Reviewed By: StanislavGlebik
Differential Revision: D22187118
fbshipit-source-id: 6bd97bd6e6f3776e46c6031f775739ca6788ec8c
Summary:
This diff enables `unbundle` flow to start creating `push_redirector` structs from hot-reloaded `CommitSyncConfig` (by using the `LiveCommitSyncConfig` struct).
Using `LiveCommitSyncConfig` unfortunately means that we need to make sure those tests, which don't use standard fixtures, need to have both the `.toml` and the `.json` commit sync configs present, which is a little verbose. But it's not too horrible.
Reviewed By: StanislavGlebik
Differential Revision: D21962960
fbshipit-source-id: d355210b5dac50d1b3ad277f99af5bab56c9b62e
Summary:
`LiveCommitSyncConfig` is intended to be a fundamental struct, on which live push-redirection and commit sync config for the push-redirector, x-repo sync job, backsyncer, commit and bookmark validators are based.
The struct wraps a few `ConfigStore` handles, which allows it to query latest values every time one of the public methods is called. Callers receive parsed structs/values (`true`/`false` for push redirection config, `CommitSyncConfig` for the rest), which they later need to use to build things like `Mover`, `BookmarkRenamer`, `CommitSyncer`, `CommitRepos` and so on. For now the idea is to rebuild these derived structs every time, but we can later add a memoization layer, if the overhead is going to be large.
Reviewed By: StanislavGlebik
Differential Revision: D22095975
fbshipit-source-id: 58e1f1d8effe921b0dc264fffa785593ef188665
Summary:
It seems the `tempfile` crate sometimes fails to delete temporary files.
Work around it by scanning for and deleting them on Windows.
Add logging so we can know when to remove the bandaid.
Reviewed By: xavierd
Differential Revision: D22222339
fbshipit-source-id: c322134a1e3425294d85578f4649ca75a0e18a76
Summary:
When a file is mmap'ed, removing it will always fail, even with all the rename
magic. The only option that works is to ask the OS to remove the file when
there are no other file handles to it. In Python, we can use O_TEMPORARY for
that.
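A sketch of the flag usage (illustrative only: `O_TEMPORARY` exists only on Windows, so this falls back to 0 elsewhere, and the file path is made up):

```python
import os
import tempfile

# O_TEMPORARY asks the OS to delete the file once the last handle to it
# closes, which sidesteps delete failures on files that are still mmapped.
O_TEMPORARY = getattr(os, "O_TEMPORARY", 0)  # Windows-only; 0 elsewhere

path = os.path.join(tempfile.mkdtemp(), "scratch.bin")
fd = os.open(path, os.O_RDWR | os.O_CREAT | O_TEMPORARY, 0o600)
os.write(fd, b"scratch")
os.close(fd)  # on Windows, the file is deleted here
```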
Reviewed By: quark-zju
Differential Revision: D22224572
fbshipit-source-id: bee564a3006c8389f506633da5622aa7a27421ac
Summary:
On Windows, paths are case insensitive (but the filesystem is case preserving),
and thus `open("FILE.TXT")` and `open("file.txt")` refer to the same file. When
that file is not materialized and its parent directory isn't yet enumerated,
PrjFS will call the PRJ_GET_PLACEHOLDER_INFO_CB with the file name passed in to
the `open` call. In this callback, if the passed in name refers to a valid
file, it needs to call PrjWritePlaceholderInfo to populate the directory entry.
Here is what the documentation for that function states:
"For example, if the PRJ_GET_PLACEHOLDER_INFO_CB callback specifies
dir1\dir1\FILE.TXT in callbackData->FilePathName, and the provider's backing
store contains a file called File.txt in the dir1\dir2 directory, and
PrjFileNameCompare returns 0 when comparing the names FILE.TXT and
File.txt, then the provider specifies dir1\dir2\File.txt as the value of
this parameter."
While the documentation doesn't state how that name is used internally, we can
infer (and test) that the returned case will be used as the canonical
representation of that file, i.e. the one that a directory listing will see.
Since the PathMap code already does a case insensitive search, we just need to
make sure to use what it returns instead of re-using the name used for the search.
The only caveat to all of this is the original comment that describes that
`metadata.name` can't be used as it causes crashes. From what I can tell, this
was written in late 2018, and I believe it is no longer relevant: the
`metadata.name` field was simply not populated.
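A hypothetical Python sketch of the case-insensitive, case-preserving lookup idea (not EdenFS's actual C++ PathMap; names and structure are made up):

```python
class PathMap:
    """Case-insensitive search that preserves and returns the stored,
    canonical casing of each name."""

    def __init__(self):
        self._entries = {}

    def insert(self, name, value):
        # Key by folded name, but remember the original casing.
        self._entries[name.lower()] = (name, value)

    def lookup(self, name):
        # Return the canonical (stored) name, not the query's casing,
        # mirroring how the placeholder callback should answer PrjFS.
        return self._entries.get(name.lower())
```

So a lookup for `FILE.TXT` against a stored `File.txt` yields `File.txt`, which becomes the casing a directory listing sees.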
Reviewed By: wez
Differential Revision: D21799627
fbshipit-source-id: aee877cc2d5f057944fcd39b1d59f0e97de6315c
Summary: Not that useful and does not align with the direction we are headed.
Reviewed By: quark-zju
Differential Revision: D22213796
fbshipit-source-id: ffd86fc1a9207c134448836d0e54e48510a11135
Summary:
updated `eden top` to:
- obtain PID-fetchCounts data from the updated `getAccessCounts` thrift call in the previous diff
- display that data in a new column `FUSE FETCH`
Reviewed By: kmancini
Differential Revision: D22101430
fbshipit-source-id: 6584e71ce3a4629c73469607ca0a4c6ffd63e46f
Summary:
This diff does three things:
- moves existing `CommitSyncConfig` validation from `config.rs` into
`convert/commit_sync.rs`, so that any user of `impl Convert for
RawCommitSyncConfig` gets it for free
- adds another thing to validate `CommitSyncConfig` against (large repo is one
of the small repos)
- adds `RawCommitSyncConfig` validation for something that can be lost when
looking at `CommitSyncConfig` (no duplicate small repos).
Reviewed By: markbt
Differential Revision: D22211897
fbshipit-source-id: a9820cc8baf427da66ce7dfc943e25eb67e1fd6e
Summary:
This diff updates `getAccessCounts` to
- obtain the PID-fetchCount map data added in the previous diff
- put that data into its `result`
Reviewed By: kmancini
Differential Revision: D22101232
fbshipit-source-id: 1d41715339d418b03c17f6c93a7a497b432973ae
Summary:
Both `optional` and `pid_t` weren't found, and the right includes needed to be
provided. On Windows, the ProcessNameCache isn't compiled (yet), and since it
looks like the process name is optional in the BackingStoreLogger, let's not
provide it for now.
Reviewed By: fanzeyi
Differential Revision: D22215581
fbshipit-source-id: 31a7e7be62cd3d14108dc437d3dfabfb9e62f8d5
Summary:
The sshaskpass extension forks the current process and attempts to do
some tty magic. If there was an exception though, the exception could go up the
stack and trigger the resumption of the pull logic, resulting in pull executing
twice.
The fix is to move the `_silentexception(terminate=True)` context to wrap the
entire child process, so we're guaranteed to exit in all cases.
I also fixed the str-vs-byte issue that caused the original exception.
Reviewed By: xavierd
Differential Revision: D22211476
fbshipit-source-id: 5f5ca6b33b425e517650f9a83cab605a4c9783de
Summary: The extension was deprecated by the eol extension.
Reviewed By: DurhamG
Differential Revision: D22129826
fbshipit-source-id: 293a57b4039f424154955454e0a7a74dc7d23069
Summary:
The revision is passed down as a 40-byte ASCII string, and therefore it needs
decoding before being usable.
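A sketch with a made-up revision value:

```python
from binascii import unhexlify

# The revision arrives as 40 ASCII hex bytes; decode to str before use.
raw = b"0123456789abcdef0123456789abcdef01234567"
rev = raw.decode("ascii")
assert len(rev) == 40
assert len(unhexlify(rev)) == 20  # the underlying 20-byte node
```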
Reviewed By: DurhamG
Differential Revision: D22210203
fbshipit-source-id: b84bca5f89cbe4f267de1281c1a9ed55409174d2
Summary:
In python3, sys.std{in,out} are opened in text mode, but when actual fds are
passed in, we fdopen them in bytes mode. Since the code expects these to be
bytes, let's fdopen them in bytes mode too.
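A minimal sketch of the binary-mode `fdopen` behavior, using a pipe rather than real stdio fds:

```python
import os

# fdopen raw fds in binary mode so the surrounding code, which expects
# bytes, reads and writes bytes rather than str.
r, w = os.pipe()
with os.fdopen(w, "wb") as fout:
    fout.write(b"payload")
with os.fdopen(r, "rb") as fin:
    assert fin.read() == b"payload"
```

Opening the same fds in the default text mode would hand the code `str` objects and break its byte-oriented assumptions.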
Reviewed By: DurhamG
Differential Revision: D22196541
fbshipit-source-id: b913267918af1c3bdf819e243a384312a2a27df0
Summary: When running integration tests we should make the paths absolute, but so far we kept them relative, which breaks the tests.
Reviewed By: krallin
Differential Revision: D22209498
fbshipit-source-id: 54ca3def84abf313db32aecfac503c7a42ed6576
Summary:
Define an exit code so automation can detect abandoned transactions and run
`hg recover`. None of the codes in sysexits.h seemed to fit this case, so
start with 90.
Reviewed By: quark-zju
Differential Revision: D22198980
fbshipit-source-id: 5f267a2671c843f350668daaa14de34752244d4b
Summary:
If we don't read the body for a response, then Hyper cannot return the
connection to the pool. So, let's do it automatically upon dropping. This will
typically happen when we send a request to upstream then don't read the
response.
I seem to remember this used to work fine at some point, but looking at the
code I think it's actually broken now and we don't reuse upstream connections
if we skip waiting for upstream in a batch request. So, let's fix it once and
for all with a more robust abstraction.
Reviewed By: HarveyHunt
Differential Revision: D22206742
fbshipit-source-id: 2da1c008556e1d964c1cc337d58f06f8d691a916
Summary:
This was old Tokio 0.1 code that needed channels for spawns, but in 0.2 that
functionality is built into `tokio::spawn`, so let's use it.
Reviewed By: HarveyHunt
Differential Revision: D22206738
fbshipit-source-id: 8f89ca4f7afc8dd888fe289e8d597148976cc54c
Summary:
This fixes a bit of a tech debt item in the LFS Server. We've had this
`discard_stream` function for a while, which was necessary because if you just
drop the data stream, you get an error on the sending end.
This makes the code more complex than it needs to be, since you need to always
explicitly discard data streams you don't want instead of just dropping them.
This fixes that by letting us support a sender that tolerates the receiver
being closed, and just ignores those errors.
Reviewed By: HarveyHunt
Differential Revision: D22206739
fbshipit-source-id: d209679b20a3724bcd2e082ebd0d2ce10e9ac481
Summary:
We have a lot of integration tests for LFS, but a handful of unit tests don't
hurt for some simpler changes. Let's make it easier to write those.
Reviewed By: HarveyHunt
Differential Revision: D22206741
fbshipit-source-id: abcb73b35c01f28dd54cc543cd0a746327d3787b
Summary:
This diff is probably going to sound weird ... but xavierd and I both think
this is the best approach for where we are right now. Here is why this is
necessary.
Consider the following scenario
- A client creates a LFS object. They upload it to Mononoke LFS, but not
upstream.
- The client shares this (e.g. with Sandcastle), and includes a LFS pointer.
- The client tries to push this commit
When this happens, the client might not actually have the object locally.
Indeed, the only pieces of data the client is guaranteed to have is
locally-authored data.
Even if the client does have the blob, that's going to be in the hgcache, and
uploading from the hgcache is a bit sketchy (because, well, it's a cache, so
it's not like it's normally guaranteed to just hold data there for us to push
it to the server).
The problem boils down to a mismatch of assumptions between client and server:
- The client assumes that if the data wasn't locally authored, then the server
must have it, and will never request this piece of data again.
- The server assumes that if the client offers a blob for upload, it can
request this blob from the client (and the client will send it).
Those assumptions are obviously not compatible, since we can serve
not-locally-authored data from LFS and yet want the client to upload it, either
because it is missing in upstream or locally.
This leaves us with a few options:
- Upload from the hg cache. As noted above, this isn't desirable, because the
data might not be there to begin with! Populating the cache on demand (from
the server) just to push data back to the server would be quite messy.
- Skip the upload entirely, either by having the server not request the upload
if the data is missing, by having the server report that the upload is
optional, or by having the client not offer LFS blobs it doesn't have to the
server, or finally by having the client simply disobey the server if it
doesn't have the data the server is asking for.
So, why can we not just skip the upload? The answer is: for the same reason we
upload to upstream to begin with. Consider the following scenario:
- Misconfigured client produces a commit, and uploads it to upstream.
- Misconfigured client shares the commit with Sandcastle, and includes a LFS
pointer.
- Sandcastle wants to push to master, so it goes to check if the blob is
present in LFS. It isn't (Mononoke LFS checks both upstream and internal, and
only finds the blob in upstream, so it requests that the client submit the
blob), but it's also not locally authored, so we skip the push.
- The client tries to push to Mononoke
This push will fail, because it'll reference LFS data that is not present in
Mononoke (it's only in upstream).
As for how we fix this: the key guarantee made by our proxying mechanism is
that if you write to either LFS server, your data is readable in both (the way
we do this is that if you write to Mononoke LFS, we write it to upstream too,
and if you write to upstream, we can read it from Mononoke LFS too).
What does not matter there is where the data came from. So, when the client
uploads, we simply let it submit a zero-length blob, and if so, we take that to
mean that the client doesn't think it authored data (and thinks we have it), so
we try to figure out where the blob is on the server side.
Reviewed By: xavierd
Differential Revision: D22192005
fbshipit-source-id: bf67e33e2b7114dfa26d356f373b407f2d00dc70
Summary:
In Python3, array indexing into a byte string returns a int, not a string.
Let's instead use the struct module to extract a byte string out of it that we
can then decode afterwards.
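For illustration (the values here are made up), the indexing difference and the `struct`-based workaround look like:

```python
import struct

data = b"Mdata"
# Indexing a bytes object yields an int in Python 3:
assert data[0] == ord("M")
# struct.unpack_from yields a one-byte bytes object instead, which can
# then be decoded afterwards:
(tag,) = struct.unpack_from(">1s", data, 0)
assert tag == b"M"
```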
Reviewed By: DurhamG
Differential Revision: D22097226
fbshipit-source-id: e6b306b4d3bcf2ba08422296603b56fcadbb636e
Summary:
These were broken mostly due to a test issue (using bytes instead of str), and a
small difference when printing "units_per_sec", which was an int in python2,
and was computed as a float in python3.
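For illustration, the Python 2 vs Python 3 division difference behind a value like `units_per_sec` (the numbers are made up):

```python
units, secs = 10, 3
# Python 2: int / int floored to an int; Python 3: / is true division.
units_per_sec = units / secs
assert isinstance(units_per_sec, float)
assert units_per_sec != 3
# Floor division restores the old integer result where it's wanted:
assert units // secs == 3
```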
Reviewed By: DurhamG
Differential Revision: D22095813
fbshipit-source-id: 8af8332dad5366d2c168485b120a984ff1ba558a
Summary:
Due to the Thrift design of "include" statements in fbcode, the thrift structures have to be contained in folders that are identical to the folder layout inside fbcode.
This diff changes the folder layout in Cargo.toml files and in fbcode_builder; a next diff will change this for ShipIt as well.
Reviewed By: ikostia
Differential Revision: D22208707
fbshipit-source-id: 65f9cafed2f0fcc8398a3887dfa622de9e139f68
Summary:
We've had a case where a client sent a bundle that contained an LFS pointer to
Sandcastle. On that original repo, the LFS blob and pointer were in the
hgcache, which signified that the server has it. When that bundle gets applied
on Sandcastle, the pointer then makes its way onto the local lfs pointer store
and the blob will be fetched from the LFS server itself properly.
Where it gets tricky is that Sandcastle then tried to re-bundle the change and
thus tried to re-upload it. The expectation from the client perspective is that
since the blob was fetched from the server in the first place, the server will
just not ask for the blob to be re-uploaded. That assumption proved to be
semi-inaccurate in the case where the LFS server mirrors all the writes to a
secondary LFS server. When that secondary LFS server is standalone and not
Mononoke, it may not have the blob in question, and thus the LFS server may
request the blob to be re-uploaded so it can write it to the secondary LFS
server.
Unfortunately, things start breaking down at this point. Since the blob isn't
present in the local store the client can't rely on being able to read it and
fetching it from the server to then upload it is silly and a bit too complex.
Instead, what we can do is teach the server of this specific scenario by
manually setting the Content-Length to 0 in the case of an upload where the
blob wasn't found in the local store. The server recognizes this and will
manually copy the blob to the secondary LFS server.
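A minimal sketch of the client-side convention (the helper name and URL here are hypothetical; the real logic lives in the LFS client code):

```python
import urllib.request


def build_upload_request(url, blob):
    """Build an LFS upload request.

    `blob` is None when the pointer is known locally but the blob data
    itself is absent from the local store; in that case we send a
    zero-length body, which the server takes as a signal to copy the
    blob to the secondary LFS server itself."""
    body = blob if blob is not None else b""
    return urllib.request.Request(url, data=body, method="PUT")
```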
Reviewed By: krallin
Differential Revision: D22192718
fbshipit-source-id: 67c5ba1a751cc07d5d5d51e07703282d8175b010
Summary:
If a commit changes modes (i.e. executable, symlink or regular) of a lot of files but
doesn't change their content then we don't need to put these filenodes to the
generated bundle. Mercurial stores mode in manifest, so changing the mode
doesn't change the filenode.
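This follows from how filenodes are hashed. A hedged, simplified sketch of Mercurial's revlog hash (the real code also handles copy metadata and other details):

```python
import hashlib

NULLID = b"\x00" * 20


def filenode(text, p1=NULLID, p2=NULLID):
    # SHA-1 over the sorted parent nodes followed by the file text.
    # File mode is not an input: it lives in the manifest, so flipping
    # the executable bit alone leaves the filenode unchanged.
    a, b = sorted([p1, p2])
    return hashlib.sha1(a + b + text).hexdigest()
```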
Reviewed By: ikostia
Differential Revision: D22206736
fbshipit-source-id: f64ee8a34281cd207c92653b927bf9109ccbe1b4
Summary:
The `amend` command can take a list of files (or file patterns) to limit the
amend to a subset of files. Make this clear in the help text.
Reviewed By: ikostia, krallin
Differential Revision: D22206906
fbshipit-source-id: 9f26b95a628bb5bbf5eae92c16f8852f44207d25
Summary:
I landed D22118926 (e288354caf) yesterday, which expected those messages, at about the same time
xavierd landed D21987918 (4d13ce1bcc), which removed them. This removes them from the
tests.
Reviewed By: StanislavGlebik
Differential Revision: D22204980
fbshipit-source-id: 6b1d696c93a07e942f86cd8df9a8e43037688728
Summary:
If the nodemap file is seriously out of sync, it's harder to recover on Windows,
because nodemap mmap in other processes prevents `atomic_write`.
Use `util::path::remove_file` to remove the old mmap files and improve the
success rate of nodemap rewrites.
Reviewed By: xavierd
Differential Revision: D22168134
fbshipit-source-id: 577df978b3175eac3257794f5e5ff7522cf21871
Summary:
Make it possible to delete files that are mmap-ed. This avoids the extra
complexity of cleaning leftover files.
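A hedged Python sketch of the same rename-then-unlink idea (the actual change is in Rust's `util::path::remove_file`; the names below are illustrative):

```python
import os
import uuid


def remove_file(path):
    # On Windows, open or mmap-ed files generally cannot be unlinked in
    # place, but they can usually be renamed. Rename the file out of the
    # way first so the original name is immediately reusable, then try to
    # delete the renamed copy, tolerating failure (the leftover *.tmp
    # file can be cleaned up by a later attempt).
    tmp = "%s.%s.tmp" % (path, uuid.uuid4().hex)
    os.rename(path, tmp)
    try:
        os.unlink(tmp)
    except OSError:
        pass
```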
Reviewed By: xavierd
Differential Revision: D22168135
fbshipit-source-id: edb33a75638c668945feae49608c3fff25e742a4
Summary:
Update snapshot command for python3 compatibility.
This one was a bit weird, and there are some future changes we could make if we care. First, our CBOR implementation doesn't support CBOR strings (UTF-8 strings), only CBOR bytestrings (raw byte sequences). Support would be easy to add, but would be binary-incompatible with previous snapshots if we made no other changes.
We also don't take full advantage of the supported types we already have, i.e. we encoded version (an integer in Python) as a string, then wrote that string as a CBOR bytestring. That is what produces the very awkward encodeutf8(str(version)) you see in this change, instead of just plain version, which as far as I can tell would round-trip perfectly fine but would break compatibility with existing snapshots.
As is, I made only the most conservative change: Python 2 snapshots should be binary-identical to Python 3 snapshots, and both should be compatible with snapshots made before this change. We could get rid of the vast majority of the conversions by simply adding CBOR string support, but that would be a breaking change without special handling.
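For reference, a minimal sketch of CBOR's short-form encoding (major type 2 is a byte string, major type 0 an unsigned integer); this is illustrative only, not our implementation:

```python
# CBOR short-form headers: a byte string of length < 24 is encoded as
# (0x40 | length) followed by the bytes; an unsigned int < 24 is encoded
# as its own value in a single byte.

def cbor_bytestring(b):
    assert len(b) < 24  # short form only, for illustration
    return bytes([0x40 | len(b)]) + b


def cbor_uint(n):
    assert 0 <= n < 24  # short form only
    return bytes([n])


version = 1
# What the snapshot code effectively emits today (kept for compatibility):
legacy = cbor_bytestring(str(version).encode("utf-8"))
# What encoding the integer directly would look like:
direct = cbor_uint(version)
```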
Reviewed By: quark-zju
Differential Revision: D22195085
fbshipit-source-id: cead42db0c5f84b99e305e5a7f1d839a35becac6
Summary:
Move the utility function _getoptbwrapper out of a Python2-gated block.
Convert undo.py to use integer division in both python2 and python3.
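For reference, the behavioral difference being papered over (values are illustrative):

```python
# Python 2's `/` on two ints truncates toward zero; Python 3's `/` returns
# a float. `//` floor-divides in both, so it is the safe spelling for
# code that must run under both interpreters.
assert 7 // 2 == 3
assert -7 // 2 == -4   # note: floor, not truncation
assert 7 / 2 == 3.5    # Python 3 behavior
```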
Reviewed By: xavierd
Differential Revision: D22125861
fbshipit-source-id: 9a283065c98eb122c4488285870e026748257b3c
Summary:
With the Rust stores being enabled by default in the code, time to actually
remove the code that wasn't using it. A lot of code still remains as the
treemanifest code still uses it. Migrating the treemanifest code to use the
Rust store would allow for even more code to be removed!
Reviewed By: DurhamG
Differential Revision: D21987918
fbshipit-source-id: 8e0becd4a1fb2ac81e26b8517d13e6d06ab06f65
Summary:
Now that the workers are in Rust, we no longer need the forker version in
Python. For now, the Python LFS extension still uses the threaded worker so
keep this one for now, when that extension will be removed we can remove the
rest of the worker code.
In theory, not all repositories would benefit from the Rust workers, but the
ones that wouldn't are not supported at FB due to not being remotefilelog based.
Reviewed By: DurhamG
Differential Revision: D21987295
fbshipit-source-id: d17b9730651671608cf13f7abe6a9bb32251e140
Summary:
The Rust store code has been enabled everywhere for a few weeks now, let's
enable it by default in the code. Future changes will remove the config as well
as all the code associated with the non Rust store code.
The various test changes are due to small differences between the Rust code and
the Python one, the biggest being its handling of corrupted packfiles. The
old code silently ignored them, while the new one errors out for local
packfiles. The test-lfs-bundle.t difference is just due to an ordering
difference between Python and Rust.
Reviewed By: kulshrax
Differential Revision: D21985744
fbshipit-source-id: 10410560193476bc303a72e7583f84924a6de820
Summary:
Whenever data is redacted on the server, a specific tombstone is returned when
fetching it. Make sure that whenever we update the file on disk, we write a
nice message to the user instead of the tombstone itself.
While this code could have been moved into the Rust store code itself, I prefer
to leave it to its users to decide what to do with redacted data. EdenFS for
instance may want to prevent access to it instead of showing the redacted
message.
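A hedged sketch of the caller-side substitution (the tombstone value and message below are made up; the real ones live in the Rust store and its users):

```python
# Hypothetical marker; the real tombstone is a specific magic byte string
# returned by the server for redacted content.
TOMBSTONE = b"<<redacted>>"


def materialize(data):
    """Return the bytes to write to disk for fetched file content,
    replacing the raw tombstone with a friendlier message."""
    if data == TOMBSTONE:
        return b"This file's content has been redacted.\n"
    return data
```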
Reviewed By: kulshrax
Differential Revision: D21999345
fbshipit-source-id: 39a83cdf5ea4567628a13fbd59520b9677aba749
Summary:
On Windows, `pathlib.Path.resolve` has a use-of-uninitialized-memory bug in Python 3.6.2. Lego Windows has this version of Python 3 installed.
When `eden prefetch` runs, it will first attempt to read the config file at the root of the repository (`$REPOROOT/.eden/config`). Because of this bug, when we normalize the path in edenfsctl, it will get some random bytes as a result. This subsequently causes `toml.read` to fail, as it is unable to read a path containing a NUL byte.
As you can see, this is the same exception as we see on Windows:
```
In [1]: open("test\x00")
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-1-57764b20d660> in <module>
----> 1 open("test\x00")
ValueError: embedded null byte
```
Reviewed By: xavierd
Differential Revision: D22177997
fbshipit-source-id: ab2565c8946d809bc15bc1597b749fb5e9440ca0
Summary:
The dynamicconfig logic had an assert checking that a config was not
added by the dynamicconfig twice. This isn't actually a valid assert since the
caller could load the config as many times as it wanted.
This happened in hg doctor, where it loaded the ui once manually (without a repo
object) then it loaded it again during the creation of the repo object. I went
ahead and cleaned up hg doctor to not do this, but let's get rid of the assert
anyway.
Reviewed By: quark-zju
Differential Revision: D22194273
fbshipit-source-id: 9491d35fe14523ad3d9cb69b4ca0d615149dd0f0
Summary:
D22009699 recently introduced some logic that accesses repo.localvfs
during repo setup. This doesn't exist for peer repos, so let's exit early if
this is a peer.
Reviewed By: xavierd
Differential Revision: D22197586
fbshipit-source-id: 8f717aa21a0b45f8f85a1ff800ee4a1f86b94bdf
Summary: This makes it easier to run tests with non-buck build.
Reviewed By: sfilipco
Differential Revision: D22174992
fbshipit-source-id: 11100d608cb78973ae7a63056667f0be4a86ae00
Summary: This might make it easier for us to see what processes are hanging.
Reviewed By: sfilipco
Differential Revision: D22009696
fbshipit-source-id: a1c1f97976084160a1b4e4a3ff7d2f9781240f6f
Summary:
There are lots of "hanging" questions in the support group. While not all of
them are our fault (ex. mergedriver), we don't have an easy reply to tell
users what's actually going on.
This diff adds a way to write sigtrace periodically so we can include it in
rage output.
Reviewed By: sfilipco
Differential Revision: D22009699
fbshipit-source-id: 5349f613b08507ed02cc0e3fa00963fd7d4c3d21
Summary:
A lot of extensions wrap runcommand just to get the command name.
Let's define a proper API for it - ui.cmdname provides the resolved
command name.
Reviewed By: sfilipco
Differential Revision: D22009694
fbshipit-source-id: 4b8744ee5314924ea09c9f325570ce53c849fef5
Summary: The tracing APIs and error context APIs can achieve similar effects.
Reviewed By: xavierd
Differential Revision: D22129585
fbshipit-source-id: 0626e3f4c1a552c69c046ff06ac36f5e98a6c3d8
Summary:
`eden top` often adds extra quotes to the end of the commands in the process
table; this removes those pesky quotes.
Reviewed By: genevievehelsel
Differential Revision: D21440316
fbshipit-source-id: f1200a28a5345691fcce517526d119f44e5993d0
Summary: Make test-test-commit-interactive.t python 3 compatible. Note I wasn't sure if it was worthwhile to spend too much time on the translation portion of the test. If someone objects to the deletion, I can spend more time on it.
Reviewed By: DurhamG
Differential Revision: D22122960
fbshipit-source-id: d232971581fca0532ed05c8ca1fdb7b5e8bd99e0
Summary:
Eden can sometimes unexpectedly fetch files from the server, and we want
to know why this is happening. This adds logging for the source of
data fetching in Eden's backing store to help clarify why these fetches
are happening.
This temporarily adds the logging in the HgQueuedBackingStore to get a naive
version of logging rolled out sooner. Follow up changes will move this logging
closer to the data fetching itself if possible (in HgDatapackStore and HgImporter).
Reviewed By: chadaustin
Differential Revision: D22012572
fbshipit-source-id: b1b012ce4ee133fbacecd586b7365c3c5a5386df
Summary:
We have seen that eden will unexpectedly fetch data, we want to know why.
This adds the plumbing to interact with Eden's current logging to be able to
log when eden fetches data from the server and what caused eden to do this
fetch. Later changes will use the classes created here to log the cause of data
fetches.
Reviewed By: chadaustin
Differential Revision: D22051013
fbshipit-source-id: 27d377d7057e66f3e7a304cd7004f8aa44f8ba62
Summary:
Recently the server team added an unused directory as a test for the upcoming
repo merge, expecting that Eden would not fetch it. They saw that these files
were fetched a non-trivial number of times. We want to know why Eden is causing
these fetches.
This adds the pid and interface through which the client is interacting with eden to
ObjectFetchContext for this purpose. This information will be logged in later
changes.
Note: currently this is only for fetches through FUSE (Thrift interface to follow).
Reviewed By: chadaustin
Differential Revision: D22050919
fbshipit-source-id: 49b93257a0e6d910f48b1e8ec6471527e056d22a
Summary:
This passes ObjectFetchContext into the backing store to prepare for adding
logging for the cause of server fetches.
In following changes I will add logging in the HgQueuedBackingStore.
Ultimately we will want to move this logging to be closer to the data fetching
(in HgDatapackStore and HgImporter), but I plan to temporarily add logging to
the HgQueuedBackingStore to simplify so that we can more quickly roll out.
Reviewed By: chadaustin
Differential Revision: D22022992
fbshipit-source-id: ccb428458cbf7a1e33aaf9be9d0d766c45acedb3
Summary:
In following changes I will be threading ObjectFetchContext into the backing
store importing process, since this will start to be used more outside of the
ObjectStore, I am moving this class into its own files.
Reviewed By: chadaustin
Differential Revision: D22022488
fbshipit-source-id: 1a291fea6e0fd56855936962363dfc9f6de8533d
Summary: Let's not heal 10000 blobs in parallel, that's a little too much data.
Reviewed By: farnz
Differential Revision: D22186543
fbshipit-source-id: 939fb5bc83b283090e979ac5fe3efc96191826d3
Summary:
If we're going to iterate through the whole manifest, we should probably
prefetch it. Otherwise, we might end up doing a whole lot of sequential
fetching. We saw this happen this week when a change landed in sparse profiles that
caused requests to Mononoke to increase 100-fold.
Unfortunately, I don't think we can selectively only fetch the things we are
missing, so this just goes ahead and fetches everything unconditionally. If
there is a better way to do this, I'm all ears.
Reviewed By: StanislavGlebik, xavierd
Differential Revision: D22118926
fbshipit-source-id: f809fa48a7ff7b449866b42b247bf1da30097caa
Summary: This makes it easier to understand why we're fetching so much data.
Reviewed By: quark-zju
Differential Revision: D22114905
fbshipit-source-id: 7cb6d5c9aebcc8c9089891e030b02176e208bd0f
Summary: This got broken in D22115015 — this fixes it.
Reviewed By: farnz
Differential Revision: D22186138
fbshipit-source-id: 54c05466cdbd3be4f6887a852f099351ea5e891e