Summary:
This struct is intended to be a single entry-point to the megarepo logic. It is also intended to be owned directly by scs_server, without the `Mononoke` struct (from `mononoke_api`) as an intermediary. In effect, this means that the Mononoke server won't be affected by `MegarepoApi` at all.
Apart from adding this struct, this diff also adds instantiation of prod `AsyncMethodRequestQueue` and wires it up to the scs_server to enqueue and poll requests.
Reviewed By: StanislavGlebik
Differential Revision: D28356563
fbshipit-source-id: b67ee48387d794fd333d106d3ffd40124086c97e
Summary: I use tags extensively and would love for them to be supported as well.
Reviewed By: asm89
Differential Revision: D28348565
fbshipit-source-id: 7d94d048b734c91e7d74a1c3efeefc87943066ad
Summary:
With the `MegarepoApi` struct in play, there is a genuine need to have two repo
factories in a single process: this allows the structure to be self-sufficient
and instantiated without any references to `Mononoke` from `mononoke_api`.
While this need could be solved by just wrapping a `RepoFactory` in an `Arc`,
most of it is already clonable, so let's make it fully clonable by fixing the
few remaining places. (I prefer this over `Arc` because it requires less
refactoring of unrelated code.) Given that a single process will likely
instantiate only a single-digit number of repo factories, the difference
between a single arc clone (`Arc<RepoFactory>`) and ~10 arc clones (what I
did) is negligible.
Differential Revision: D28379860
fbshipit-source-id: fbddbdc913fedcd5846366344bc2f2c1ec4bd91e
Summary: This way implementing MegarepoApi is more convenient.
Reviewed By: krallin
Differential Revision: D28355487
fbshipit-source-id: e7643e854ee46fe6cb9c4a882f6c677bf4e77262
Summary:
Partial getbundle optimization didn't work correctly if one merge parent was an ancestor
of another - it might return a parent commit before a child commit. Say we had a graph like this
```
C
| \
| B
| /
A
```
Previously the partial getbundle optimization could have visited A before it
visited B, and returned commits in the wrong order, which in turn would lead to
"hg pull" failures. The reason is that we didn't order commits by generation number.
This diff fixes that by using UniqueHeap to sort commits by generation number
before returning them. Also note that, as the name suggests, UniqueHeap stores
only unique values, so we don't need to keep a separate `visited` hashset.
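The idea can be sketched with a minimal `UniqueHeap`-style wrapper (hypothetical, simplified types; not Mononoke's actual implementation): commits are popped highest-generation-first, so a child is always returned before its ancestors, and duplicates are collapsed so no separate `visited` set is needed.

```rust
use std::collections::{BinaryHeap, HashSet};
use std::hash::Hash;

/// Minimal sketch of a heap that stores each value at most once.
/// Values pop largest-first; with (generation, commit) pairs, a
/// higher-generation (child) commit always pops before its ancestors.
struct UniqueHeap<T: Ord + Hash + Clone> {
    heap: BinaryHeap<T>,
    seen: HashSet<T>,
}

impl<T: Ord + Hash + Clone> UniqueHeap<T> {
    fn new() -> Self {
        Self { heap: BinaryHeap::new(), seen: HashSet::new() }
    }

    fn push(&mut self, value: T) {
        // Duplicates are dropped here, which is why no separate
        // `visited` hashset is needed by the caller.
        if self.seen.insert(value.clone()) {
            self.heap.push(value);
        }
    }

    fn pop(&mut self) -> Option<T> {
        self.heap.pop()
    }
}
```

In the graph above, pushing `(2, "B")` and `(1, "A")` (in any order, even with `A` pushed twice) pops `B` before `A`, so the child is emitted before its ancestor.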
Reviewed By: krallin
Differential Revision: D28378145
fbshipit-source-id: 9467fb7cfa8386e9e2725b80f386240b522ff3ee
Summary:
Just as with the `rewrite_commit` function, which I moved to commit_transformation
not long ago (D28259214 (df340221a0)), let's also move `copy_file_contents`. The motivation
is that we are going to use it in the next diff for the sync_changeset method.
Reviewed By: ikostia
Differential Revision: D28352402
fbshipit-source-id: 12288a51540c9793d988e4063735bcbc1c3b1a7f
Summary:
Those blobs are designed to fit in cachelib, so we shouldn't attempt to
zstd-encode them (not to mention that they usually don't compress very well,
since many of those blobs come from binary files, though that's not true of
all of them).
However, we weren't actually doing this. There were a few reasons:
- Our threshold didn't allow enough headroom. I don't know when exactly this
got introduced (or indeed whether it has ever worked since we introduced
cachelib compression).
- We serialize a bunch of extra metadata that we really don't need as it's a
bit meaningless once it's gone through the cache (we notably don't serialize
this in Memcache). This diff updates us to just store bytes here.
Differential Revision: D28350387
fbshipit-source-id: 4d684ab58cea137044e20951ec4cbb21240b8dfc
Summary: See the next diff for motivation: this makes it easier to implement.
Differential Revision: D28350388
fbshipit-source-id: 026605cf8296a945d6cc81b7f36d9198325bf13c
Summary:
Test is flaky: https://www.internalfb.com/intern/test/281474993296146?ref_report_id=0
I suppose this happens due to the mechanics of how we measure qps: with low volume, averaging or bucketing might not work as precisely as it does with the high qps we have in normal prod scenarios.
Lowering the threshold by 1 should fix this.
Reviewed By: ahornby
Differential Revision: D28350150
fbshipit-source-id: 694bfb8cce1935704e35b27f7d4455439d4bfffe
Summary:
I should've made them structs from the beginning, but of course I thought that
I know better and these tokens can definitely not be richer than just strings.
Well, it turns out we need them to be richer. The specific reason is that, in theory,
a single Mononoke (or scs_server) instance can run with multiple storage
configs. For us this means that one target's requests may be stored in one
db, while another target's requests live in another one. For blobstores this is
even used in practice, while for xdb it's just a theoretical thing, but we need
to support it nevertheless.
To do so, let's add the ability to query the target (and, correspondingly, the
Mononoke repo) from any kind of params our async methods receive: ThriftParams
or Token implementors.
In addition, this diff implements `AddScubaParams` and `AddScubaResponse` for more things than before.
Finally, apart from making tokens structured, this diff also changes an interface in two more ways:
- it adds optional `message` fields to several params structs
- it adds `changesets_to_merge` to `MegarepoChangeTargetConfigParams`
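The token change can be sketched like this (hypothetical names and simplified types, not the actual Thrift/Rust definitions): a token is no longer a bare string but a struct that also remembers its target, and a shared trait lets the async-methods layer query the target uniformly from ThriftParams and Token implementors alike.

```rust
/// Hypothetical stand-in for the real target type.
#[derive(Debug, Clone, PartialEq)]
struct Target {
    repo_id: i64,
    bookmark: String,
}

/// Before: a token was just an opaque string.
/// After: a struct that also knows which target (and therefore which
/// storage config / db) the request belongs to.
struct MegarepoAddTargetToken {
    id: String,
    target: Target,
}

/// Shared behaviour for ThriftParams and Token implementors:
/// anything the queue handles can tell us its target.
trait HasTarget {
    fn target(&self) -> &Target;
}

impl HasTarget for MegarepoAddTargetToken {
    fn target(&self) -> &Target {
        &self.target
    }
}
```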
Reviewed By: StanislavGlebik
Differential Revision: D28333999
fbshipit-source-id: 99bd19b040b59ee788ef661dda3171cc56254d33
Summary: Instead of passing a client certificate path to libcurl, load the certificate into memory and pass it to libcurl as a blob using `CURLOPT_SSLCERT_BLOB`. This allows us to convert the certificate format in-memory from PEM to PKCS#12, the latter of which is supported by the TLS engines on all platforms (notably SChannel on Windows, which does not support PEM certificates).
Reviewed By: quark-zju
Differential Revision: D27637069
fbshipit-source-id: f7f8eaafcd1498fabf2ee91c172e896a97ceba7e
Summary:
The Rust `openssl` crate will use dynamic linking by default when built with `cargo`. This is a problem on Windows, since we only support cargo-based builds on that platform, but OpenSSL is not present in the system's shared library search paths.
Since we already have a copy of OpenSSL uploaded to LFS, the simplest solution is to just copy the required DLLs right next to the Mercurial executable so that they will be found at launch.
A better solution would probably be to use static linking here. From reading the crate's documentation (and build script), it seems like setting `OPENSSL_STATIC=1` during the build should force static linking, but in practice I have not been able to get this to work.
Reviewed By: DurhamG
Differential Revision: D28368579
fbshipit-source-id: 3fceaa8d081650d60356bc45ebee9c91ef474319
Summary:
Split full sync into 3 steps.
Commit cloud by default pulls only 30 days of commits.
Users often see that some of their commits are missing from their smartlog.
I discovered that most users know about the '--full' option (`hg cloud sync --full`) but not about the 'max_sync_age' config option.
So they try the --full option, but it can fail due to very old commits that we haven't migrated to Mononoke.
Users often don't really need those commits, but it's not nice that the whole sync run fails.
We know that at least the latest 2 years of commits are present in Mononoke.
So if we split up how sync with the --full option works, it would at least result in a partially successful sync for the latest 2-3 years of commits.
Reviewed By: mitrandir77
Differential Revision: D28352355
fbshipit-source-id: b5bacd7d5256191528613e3c0bcbb21b0104ac3c
Summary:
Deprecate the 4-commits-at-a-time limitation for unhydrated pulls.
This could speed up cloud join commands significantly (by many times), as well as the `hg cloud sync --full` command.
Reviewed By: farnz
Differential Revision: D28351849
fbshipit-source-id: f9f3d7a5c07d61cb51a5bb6284afaad963662c94
Summary: This is going to be used in the next diffs.
Reviewed By: StanislavGlebik
Differential Revision: D28333977
fbshipit-source-id: ad52d307e13ae9bd662209ef7ec6afdcf0ee24c7
Summary:
While ImmediateFutures are expected to be used on values that are mostly
immediate, there are cases where they won't be. In those cases we need a way to
wait for the computation/IO to complete. To achieve this, we need to be able to
transform an ImmediateFuture into a SemiFuture.
Reviewed By: fanzeyi
Differential Revision: D28293941
fbshipit-source-id: 227c0acf1e22e4f23a948ca03f2c92ccc160c862
Summary:
When a T can be default constructed, make an ImmediateFuture default
constructible.
Reviewed By: fanzeyi
Differential Revision: D28292874
fbshipit-source-id: 4c239cc9c3f448652b2bcdc103ea1a81ace46402
Summary: This should help the compiler generate even better code.
Reviewed By: chadaustin
Differential Revision: D28153979
fbshipit-source-id: b1d84c92af4fa760c92624c53d7f57330d7706fa
Summary:
This has been here for a little while, but it's worth changing. Currently, we
entirely discard logs coming via a CoreContext in EdenAPI.
We don't typically log many of those anywhere in Mononoke, but when we do they
tend to be for error conditions, warnings, or aggregated reporting, and can be
quite meaningful as a result.
So, let's update this to not discard them. To make it easy to differentiate those
logs from EdenAPI request-level logs (even though both will have `service` =
`edenapi`), let's give the latter a Log Tag (which is consistent with what
we do in the repo client).
Differential Revision: D28350733
fbshipit-source-id: 3b12a4b56f28435460186e1f7578163ca7bdaebc
Summary:
Previously it was possible to write configs only, now it's possible to read
them as well.
Reviewed By: ikostia
Differential Revision: D28326571
fbshipit-source-id: d946201a384cc3998d1c197b7eabb77b9f35129d
Summary:
Adding a mapping to keep track of two things:
1) the latest source commit that was synced into a given target - this will be used during the sync_changeset() method to check whether the parent changeset of a given changeset was already synced
2) which source commit maps to which target commit
Reviewed By: ikostia
Differential Revision: D28319908
fbshipit-source-id: f776d294d779695e99d644bf5f0a5a331272cc14
Summary: This is the same change as D27137328 (a9a1b73418) but for macFUSE.
Reviewed By: kmancini
Differential Revision: D28328029
fbshipit-source-id: c58e146dba2e7e3bdb320f2b5e80946e4a7b3afe
Summary: With the addition of the ability to "background" the prefetches in the daemon itself, we can remove the subprocess backgrounding in the python layer and just depend on the internal backgrounding.
Reviewed By: chadaustin
Differential Revision: D27825274
fbshipit-source-id: aa01dc24c870704272186476be34d668dfff6de5
Summary: getTreeForManifest is no longer called, so remove it.
Reviewed By: genevievehelsel
Differential Revision: D28306796
fbshipit-source-id: e51a32fa7d75c54b2e3525e88c162247b4496560
Summary:
This is going to be used to rewrite (or transform) commits from source to
target. This diff does a few things:
1) adds a MultiMover type and a function that produces a mover given a config. This is similar to the Mover type we used for the fbsource<->ovrsource megarepo sync, though this time it can produce multiple target paths for a given source path.
2) moves the `rewrite_commit` function from cross_repo_sync to megarepo_api, and makes it work with MultiMover.
Reviewed By: ikostia
Differential Revision: D28259214
fbshipit-source-id: 16ba106dc0c65cb606df10c1a210578621c62367
Summary:
This crate is a foundation for the async requests support in megarepo service.
The idea is to be able to store serialized parameters in the blobstore upon
request arrival, and to be able to query request results from the blobstore
while polling.
This diff manipulates the following classes of types:
- param types for async methods: self-explanatory
- response types: these contain only a resulting value of a completed successful execution
- stored result types: these contain a result value of a completed execution. It may either be successful or failed. These types exist for the purpose of preserving execution result in the blobstore.
- poll-response types: these contain an optional response. If the optional value is empty, the request is not yet ready
- polling tokens: these are used by the client to ask about the processing status for a submitted request
Apart from that, some of these types have both Rust and Thrift counterparts, mainly for the purposes of us being able to implement traits for Rust types.
Relationships between these types are encoded in various traits and their associated types.
The lifecycle of an async request is as follows therefore:
1. the request is submitted by the client, and enqueued
1. params are serialized and saved into a blobstore
1. an entry is created in the SQL table
1. the key from that table is used to create a polling token
1. some external system processes the request [completely absent from this diff]
1. it notices a new entry in the queue
1. it reads request's params from the blobstore
1. it processes the request
1. it preserves either the success or the failure of the request in the blobstore
1. it updates the SQL table to mention that the request is now ready to be polled
1. the client polls the request
1. queue struct receives a polling token
1. out of that token it constructs DB keys
1. it looks up the request row and checks if it is in the ready state
1. if that is the case, it reads the result_blobstore_key value and fetches serialized result object
1. now it has to turn this serialized result into a poll response:
1. if the result is absent, poll response is a success with an empty payload
1. if the result is present and successful, poll response is a success with the result's successful variant as a payload
1. if the result is present and is a failure, the polling call throws a thrift exception with that failure
Note: why is there yet another .thrift file introduced in this diff? I felt that these types aren't part of the scs interface, so they don't belong in `source_control.thrift`. On the other hand, they wrap things defined in `source_control.thrift`, so I needed to include it.
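The final polling steps above boil down to one mapping from the stored result to the poll response. A minimal sketch (simplified generic types; the real code works with Thrift unions and blobstore fetches, and surfaces the failure as a Thrift exception):

```rust
/// Maps a stored execution result to a poll response:
/// - no stored result yet  -> success with an empty payload
/// - stored success        -> success with the result as payload
/// - stored failure        -> the polling call returns the error
fn poll_response<T, E>(stored: Option<Result<T, E>>) -> Result<Option<T>, E> {
    match stored {
        None => Ok(None),
        Some(Ok(value)) => Ok(Some(value)),
        Some(Err(e)) => Err(e),
    }
}
```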
Reviewed By: StanislavGlebik
Differential Revision: D27964822
fbshipit-source-id: fc1a33a799d01c908bbe18a5394eba034b780174
Summary: Log the blobstore id as part of sampled pack info. This allows running the walker pack info logging directly against a multiplex rather than invoking it for one component at a time.
Reviewed By: farnz
Differential Revision: D28264093
fbshipit-source-id: 0502175200190527b7cc1cf3c48b8154c8b27c90
Summary:
When sampling multiplex stores, it's interesting to know which component of the store one is sampling.
This adds a new SamplingBlobstorePutOps struct, which implements the BlobstorePutOps trait that the multiplex blobstore requires. It's connected up to the blobstore factory in the next diff.
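The shape of the wrapper can be sketched like this (hypothetical, synchronous trait and names; the real BlobstorePutOps is async and richer):

```rust
/// Hypothetical, synchronous stand-in for the real async BlobstorePutOps.
trait PutOps {
    fn put(&mut self, key: &str, value: Vec<u8>);
}

/// A simple in-memory store, used only to illustrate the wrapper.
struct MemStore(Vec<(String, Vec<u8>)>);

impl PutOps for MemStore {
    fn put(&mut self, key: &str, value: Vec<u8>) {
        self.0.push((key.to_string(), value));
    }
}

/// Wraps an inner store and tags every sampled put with the id of the
/// multiplex component that handled it, so samples can be attributed.
struct SamplingPutOps<T: PutOps> {
    inner: T,
    blobstore_id: u64,
    sampled: Vec<(u64, String)>,
}

impl<T: PutOps> PutOps for SamplingPutOps<T> {
    fn put(&mut self, key: &str, value: Vec<u8>) {
        // Record which component is being written to, then delegate.
        self.sampled.push((self.blobstore_id, key.to_string()));
        self.inner.put(key, value);
    }
}
```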
Reviewed By: farnz
Differential Revision: D28264444
fbshipit-source-id: 560de455854b6a6794b969d02046d67d372efd37
Summary: What we're trying to do here is all explained in the inline comments.
Reviewed By: farnz
Differential Revision: D28287486
fbshipit-source-id: 605c5272118b9d0b76f6284f4e81febe4b6f652e
Summary: Right now this is not very useful. Let's make it more useful.
Reviewed By: DurhamG
Differential Revision: D28281653
fbshipit-source-id: ef3d7acb61522549cca397048c841d1afb089b9b
Summary:
These are undermaintained, and need an update for oncall support. Start by moving to CXX, which makes maintenance easier.
In the process, I've fixed a couple of oddities in the API that were either due to the age of the code, or due to misunderstandings propagating through bindgen that CXX blocks, and fixed up the users of those APIs.
Reviewed By: dtolnay
Differential Revision: D28264737
fbshipit-source-id: d18c3fc5bfce280bd69ea2a5205242607ef23f28
Summary:
Because cachelib is not initialised at this point, this returns `None` unconditionally.
I'm refactoring the cachelib bindings so that this returns an error instead - take it out completely for now, leaving room to add it back if caching turns out to be useful here.
Reviewed By: sfilipco
Differential Revision: D28286986
fbshipit-source-id: cd9f43425a9ae8f0eef6fd32b8cd0615db9af5f6
Summary: This wants to use Scuba so it needs this.
Reviewed By: StanislavGlebik
Differential Revision: D28282511
fbshipit-source-id: 6d3a2b6316084f7e16f5a2f92cfae1d101a9c2d3
Summary:
This makes it easier to see what builder functions were registered:
```
% EDENSCM_LOG=edenapi=debug lhg log -r .
May 06 16:40:29.355 DEBUG edenapi::builder: registered eagerepo::api::edenapi_from_config to edenapi Builder
```
Reviewed By: DurhamG
Differential Revision: D28271366
fbshipit-source-id: f6c7c3aa9f29c3e47c2449e3d5fc16474aa338b0
Summary:
Adding support for the `stables` template keyword in the stablerev extension.
This keyword calls out to a script specified in the `stablerev.stables_cmd` config to get a list of stable aliases for a given revision.
Reviewed By: quark-zju
Differential Revision: D28204529
fbshipit-source-id: 3c5b21846ce6f686afddd00d3326a54b85be87dd
Summary:
server1 was not used after D27629318 (ba7e1c6952), even though the test intentionally wants to
exercise graph isomorphism. So let's revive server1 in the test.
Reviewed By: andll
Differential Revision: D28269926
fbshipit-source-id: 0a04031415f559f8a6eb81f1e2f2530329a2a3bc
Summary:
In the blobstore factory we can end up with duplicate layers of wrapper blobstores like ReadOnlyBlobstore.
For the multiplex, its inner stores get throttling, readonly, etc. wrappers, and it itself only writes to its queue if an inner store succeeds - which it can't do when an inner store has a ReadOnlyBlobstore wrapper.
Differential Revision: D28250832
fbshipit-source-id: 5a3f85584b9cce17ca7ce4b83cdb2117644850db
Summary: Add support for logging the inner parts of a multiplex blobstore stack. This helps understand what wrappers have been applied.
Differential Revision: D28230927
fbshipit-source-id: 873ee30ec00fdc2dfc79b47e5831231c51e2ce0d
Summary:
Fixing the bulkops fetch size at MAX_FETCH_STEP means we can use the chunk size option to control how many changesets are walked together without affecting query performance.
This will allow more accurate lookup of the first referencing commit time for files and manifests, as all commits in a chunk could possibly discover them. With smaller chunks the discovered mtime becomes less approximate, at the possible cost of some walk-rate performance if one runs with very small chunk sizes.
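Decoupling the fetch step from the walk chunk can be sketched as follows (illustrative names and a tiny step size; the real walker streams changesets from bulkops):

```rust
/// Illustrative only; the real fetch step is much larger. Fetching in
/// one large fixed step keeps the SQL queries efficient, while the
/// separate chunk size controls how many changesets are walked together.
const MAX_FETCH_STEP: usize = 4;

/// Fetch the next large batch of changeset ids from storage.
fn fetch_step(all_ids: &[u64], offset: usize) -> &[u64] {
    let end = (offset + MAX_FETCH_STEP).min(all_ids.len());
    &all_ids[offset..end]
}

/// Re-chunk a fetched batch into smaller walk chunks.
fn walk_chunks(ids: &[u64], chunk_size: usize) -> Vec<Vec<u64>> {
    ids.chunks(chunk_size).map(|c| c.to_vec()).collect()
}
```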
Differential Revision: D28120030
fbshipit-source-id: 0010d0672288c6cc4e19f5e51fd8b543a087a74a
Summary: Knowing the numeric changeset id is useful in the next diff, where chunking in the walker loads from bulkops in large chunks but then walks commits in smaller chunks.
Differential Revision: D28127581
fbshipit-source-id: c5b3e6c2a94e33833d701540428e1ff4f8898225
Summary: Found this useful while debugging the pack sampling.
Differential Revision: D28118243
fbshipit-source-id: d94b0b87125a9863f56f72029c484909a3696329
Summary:
We were only incrementing this on `readline`, which resulted in very low
numbers. While in there, I also removed `self._totalbytes` as that was unused.
Reviewed By: johansglock
Differential Revision: D28260141
fbshipit-source-id: 6d9008f9342adaf75eecc8ed8c872f64212cd1f7
Summary:
In an NFS mount, the InodeNumber are sent to the client as the unique
identifier for a file. For caching purposes, the client will issue a GETATTR
call on that InodeNumber prior to opening it, to see if the file changed and
thus whether its cache needs to be invalidated.
In EdenFS, the checkout process unfortunately replaces file inodes
entirely, causing new InodeNumbers to be created. Thus, after an update, an
NFS client would not realize that the content changed, and would return
the old content to the application. To solve this, we could approach it in 2
different ways:
- Build a different kind of handle to hand over to the NFS client
- Keep InodeNumber constant during checkout.
After trying the first option, it became clear that this would effectively need
to duplicate a lot of functionality from the InodeMap, but with added memory
consumption. This diff implements the second option.
Reviewed By: chadaustin
Differential Revision: D28132721
fbshipit-source-id: 94d470e33174bb9ffd7db00e1b37924096aac8e9