Summary:
Like it says in the title, this updates our HTTP stack to Hyper. There are a
few reasons to do this here:
- We can remove all the manual parsing & generation of HTTP, and instead let
Hyper (i.e. an HTTP framework) handle HTTP for us.
- We can reuse more pre-existing abstractions for the things where we have to
implement HTTP handling ourselves (rather than just upgrading to a websocket
ASAP), like the net speed test.
- And finally, my main motivation for this is that this will make it much
easier to load EdenAPI into Mononoke Server as a service. At this point, we
have a `Request` to pass into a function that returns a `Response`, which is
exactly what EdenAPI is, so hooking it in will be trivial.
There's a lot going on in this diff, so here is an overview. Overall, our
connection handling process is:
- Accept connection
- Perform TLS handshake
- Check if the remote is trusted
- Check ALPN:
- If hgcli, then read the preamble then run wireproto over the socket.
- If http, hand off the socket to Hyper. Hyper will call into our
MononokeHttpService (which is a Hyper Service) once per request.
- If websocket upgrade, accept the upgrade, then run wireproto over the
resulting I/O (i.e. the upgraded connection). An upgrade takes over the
connection, so implicitly this means there won't be further requests.
- If health check or net speed test, handle it. There might be multiple
requests here via connection reuse.
- This is where hooking EdenAPI will happen. We can instantiate Gotham
here: it also is a Hyper Service, so we just need to call it.
While in there, I've modelled those various states using structs instead of
passing a handful of arguments around.
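For reference, here is a minimal sketch (not the actual Mononoke code; the
service body and names are illustrative) of what handing an ALPN-negotiated
`http` socket to Hyper looks like with hyper 0.14:
```
use hyper::service::service_fn;
use hyper::{Body, Request, Response};

// Hand an accepted (already TLS-terminated) socket to Hyper. Hyper drives the
// connection and calls the service once per request, so connection reuse and
// upgrades are handled for us.
async fn serve_http<S>(socket: S) -> Result<(), hyper::Error>
where
    S: tokio::io::AsyncRead + tokio::io::AsyncWrite + Unpin + Send + 'static,
{
    let service = service_fn(|_req: Request<Body>| async {
        // Dispatch point: health check, net speed test, websocket upgrade to
        // wireproto, or (eventually) EdenAPI via Gotham.
        Ok::<_, hyper::Error>(Response::new(Body::from("ok")))
    });
    hyper::server::conn::Http::new()
        .serve_connection(socket, service)
        .with_upgrades()
        .await
}
```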
Reviewed By: johansglock
Differential Revision: D26018641
fbshipit-source-id: dd757d72fe0f17f7c98c1948a6aa34d0c4d95cbf
Summary:
Like it says in the title, this relaxes our Globalrevs implementation a
little: it allows you to create and move bookmarks as long as they point to
ancestors of the "main" Globalrevs bookmark (but NOT to pushrebase to them
later, because we only want to allow ONE Globalrevs-publishing bookmark per
repo).
While in there, I also deduplicated how we instantiate pushrebase hooks a
little. If anything, this could live in the pushrebase crate, but that would
create a circular dependency between pushrebase & bookmarks movement.
Eventually, the callsites should probably use bookmarks movement anyway, so
leaving pushrebase as the low-level crate and bookmarks movement as the
high-level one seems reasonable.
Reviewed By: StanislavGlebik
Differential Revision: D26020274
fbshipit-source-id: 5ff6c1a852800b491a16d16f632462ce9453c89a
Summary:
Prior to this diff, only the Mononoke server initialized
`MononokeScubaSampleBuilder` in a way that used the observability context, and
therefore respected verbosity settings.
Let's make the generic sample-initializing function use this config too.
Reviewed By: ahornby
Differential Revision: D26156986
fbshipit-source-id: 632bda279e7f3905367b82db5b36f81264156de3
Summary: This is more flexible.
Reviewed By: StanislavGlebik
Differential Revision: D26168559
fbshipit-source-id: 5946b8b06b3a577f1a8398a228467925f748acf7
Summary: Just as we have global strings/ints, let's have per-repo ones.
Reviewed By: StanislavGlebik
Differential Revision: D26168541
fbshipit-source-id: f31cb4d556231d8f13f7d7dd521086497d52288b
Summary:
Please see the added test. Without this diff, the test does not even compile:
`new_values_by_repo` is moved out by `self.#names.swap(Arc::new(new_values_by_repo));` after processing the first tunable (line 202).
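A hypothetical reduction of the bug (using `arc_swap` for illustration): the
map is moved into the first tunable's `swap`, so updating a second tunable no
longer compiles; cloning per tunable fixes it.
```
use std::collections::HashMap;
use std::sync::Arc;

use arc_swap::ArcSwap;

fn update_tunables(
    tunables: &[ArcSwap<HashMap<String, i64>>],
    new_values_by_repo: HashMap<String, i64>,
) {
    for tunable in tunables {
        // Before the fix this moved `new_values_by_repo`, so only the first
        // iteration compiled; cloning keeps the map available for the rest.
        tunable.swap(Arc::new(new_values_by_repo.clone()));
    }
}
```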
Reviewed By: StanislavGlebik
Differential Revision: D26168371
fbshipit-source-id: 3cd9d77b72554eb97927662bc631611fa91eaecb
Summary:
If your disk1 is an external HFS-formatted disk, then
eden_apfs_mount_helper will fail to create APFS subvolumes on
it. Instead, use the disk backing the mount.
Reviewed By: fanzeyi
Differential Revision: D26096296
fbshipit-source-id: baa45181afb6610a095c864eb3183e5af76ec4e0
Summary:
It is already broken with segmented changelog (it assumes `0..len(repo)` are
valid revs). It is super slow and cannot be optimized efficiently. The _only_
non-zero-exit-code usage in the past month looks like:
hg log -r 'reverse(children(ancestors(remote/master) and branchpoint()) and draft() and age("<4d"))'
which takes 40 to 100s and can be rewritten using more efficient queries like `parents(roots(draft()))`.
Reviewed By: singhsrb
Differential Revision: D26158011
fbshipit-source-id: 7957710f27af8a83920021a228e4fa00439b6f3d
Summary:
Migrate most crates to tokio 1.0. The exception is edenfs-client, which has
some dependencies on `//common/rust/shed/fbthrift_ext` and seems non-trivial
to upgrade. It creates a separate tokio runtime so it shouldn't be affected
feature-wise.
Reviewed By: singhsrb
Differential Revision: D26152862
fbshipit-source-id: c84c43b1b1423eabe3543bccde34cc489b7805be
Summary: Configure segmented changelog to use caching when caching is requested.
Reviewed By: krallin
Differential Revision: D26121496
fbshipit-source-id: d0711a5939b5178b3a93d081019cfab47996da40
Summary:
Add caching for the IdMap to speed things up.
Values for a key do not change, and the IdMap is versioned for a given
repository, so we use the IdMap version in the generation of the cache keys,
setting the "site version" to be the IdMap version.
Reviewed By: krallin
Differential Revision: D26121498
fbshipit-source-id: 7e82e40b818d1132a7e86f4cd7365dd38056348e
Summary: This implementation is used for all things that are cached in Mononoke.
Reviewed By: quark-zju
Differential Revision: D26121497
fbshipit-source-id: a0088b539f3c3656921ab9a7a25c6442996aed18
Summary:
Logging all these throttling notifications is not necessary. There can
sometimes be big batches of fetches (hundreds of thousands). Let's reduce this
by a factor of 1000.
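The sampling amounts to something like this (sketched here in Rust for
illustration; the real change lives in EdenFS's logging path):
```
use std::sync::atomic::{AtomicU64, Ordering};

// Count every throttling event, but only log one in a thousand, so a burst
// of hundreds of thousands of fetches produces a few hundred log lines.
static THROTTLE_EVENTS: AtomicU64 = AtomicU64::new(0);

fn should_log_throttle() -> bool {
    THROTTLE_EVENTS.fetch_add(1, Ordering::Relaxed) % 1000 == 0
}
```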
Note that we would also like to log which process triggered these fetches,
which endpoint they used, etc. This will help us identify the workflows
causing them, so we can address them or skip aux data fetching in these code
paths. But that requires some fiddling with ObjectFetchContext and the logging
code, so it's gonna take a bit longer :(
Reviewed By: genevievehelsel
Differential Revision: D25505654
fbshipit-source-id: e7c40164db86fadf4baf0afd0c52879e0cb2568b
Summary:
For `repo.transaction("tr-name")`, this records `Transaction: tr-name` to
metalog commit description.
It can be helpful to narrow things down for commands with multiple
transactions.
In the future we might want to attach more data to the logging (ex. what the
commit cloud local, remote states at the time of syncing). However I didn't
do it now since metalog is designed to hold repository data, not too much
logging data. With a better logging infra we might want to move `config` out
from metalog, associated with metalog root ids.
Reviewed By: DurhamG
Differential Revision: D25984805
fbshipit-source-id: 59c074272cff555c6ff11dd755f7e3ce9a292eb6
Summary:
On startup, SCS initializes the repos concurrently: warming up derived data for each repo, warming the bookmark cache, and fetching skiplists.
Fetching skiplists is an expensive operation with two steps: an async fetch of a large blob from the blobstore, followed by synchronous deserialization of that blob.
When it runs on the same task as the bookmark cache warming, it takes all the CPU, and the other futures have to wait: they can't process results returned by MySQL queries or connect to the DB. Thus SCS eventually fails to acquire a new connection or to perform a query in a reasonable time, and terminates.
Spawning the skiplist work on a separate task unblocks the thread where the warming is running.
This was first noticed in TW tasks because, after the MySQL rollout, some of the SCS tasks started to take an hour to start.
To debug and localize the issue, we added debug output to see what exactly blocks the repo initialization; it turned out that once skiplist fetching started, everything else was blocked.
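A minimal sketch of the fix, with hypothetical names (assuming tokio and
anyhow): the CPU-heavy deserialization moves onto its own task so it cannot
starve the other warmers.
```
// Hypothetical stand-ins for the real skiplist types.
struct Skiplist;
fn deserialize_skiplist(_blob: &[u8]) -> anyhow::Result<Skiplist> {
    Ok(Skiplist)
}

// The fetch stays async; the synchronous, CPU-bound deserialization is
// spawned so the current task (shared with bookmark-cache warming and DB
// connections) stays responsive.
async fn load_skiplist(blob: Vec<u8>) -> anyhow::Result<Skiplist> {
    tokio::task::spawn_blocking(move || deserialize_skiplist(&blob)).await?
}
```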
Reviewed By: StanislavGlebik
Differential Revision: D26128171
fbshipit-source-id: fe9e1882af898950cf16d8e939dc6bc6be56510e
Summary:
`async_common.py` needs to be ignored for the Python 2 build, since it
cannot be parsed by Python 2.
Reviewed By: xavierd
Differential Revision: D26142682
fbshipit-source-id: f921e7a35781b3336ba745e886380afc26d5ca36
Summary:
`blobstore_healer` works by healing blobs in turn, with some level of concurrency. Healing each blob consumes at least `O(blob_size)` of memory, so healing multiple blobs consumes their combined size in memory. Because blob sizes are not distributed uniformly, we cannot just calculate the desired concurrency level once and for all. Prior to this diff, that is what we did, and whenever a few multi-GB blobs ended up in the same concurrently-healed batch, the healer OOMed. To help with this problem, this diff starts using dynamic concurrency: it assigns a weight to each healed blob and only concurrently heals up to a certain total weight of blobs. This way, we can limit the total amount of memory consumed by the healer.
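The mechanism can be sketched with a tokio semaphore (this is an illustration
of the idea, not the diff's `buffered_weight_limited` implementation):
```
use std::sync::Arc;

use tokio::sync::Semaphore;

// Total healing "weight" in 1KB units; one permit ~ 1KB of blob.
const TOTAL_WEIGHT_KB: u32 = 1 << 20;

async fn heal_weight_limited(sem: Arc<Semaphore>, blob_size: Option<u64>) {
    // Unknown sizes default to 1KB, the most common size range in the queue.
    let weight_kb = (blob_size.unwrap_or(1024) / 1024).max(1);
    // Clamp oversized blobs so they are still healed rather than starved.
    let permits = weight_kb.min(TOTAL_WEIGHT_KB as u64) as u32;
    let _permit = sem
        .acquire_many_owned(permits)
        .await
        .expect("semaphore closed");
    // heal_blob(...) would run here while the permits are held.
}
```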
This solution is not perfect for a variety of reasons:
- if a single blob is larger than the total allowed weight, we'll still let it through. It's better than never healing it, but it means that OOMs are still possible in theory.
- we do not yet know the sizes of all the blobs in the queue. To mitigate that, I took a look at the known size distribution and saw that 0 to 2KB is the most common size range, so I defaulted the size of an unknown blob to 1KB.
Note 1: I had to make `heal_blob` consume its args instead of borrowing them, because `buffered_weight_limited` needs the `'static` lifetime for the futures.
Note 2: When using `futures_ext`, I explicitly rename it to `futures_03_ext`, despite the fact that `blobstore_healer` does not depend on the older version. This is because `Cargo.toml` uses the same `[dependencies]` section for the combined dependencies of all the targets in the same `TARGETS` file. As other targets claim the name `futures_ext` for the 0.1 version, I decided it's easier to just use the `_03_` name here than to fix it in other places. We can always change that later, of course.
Reviewed By: krallin
Differential Revision: D26106044
fbshipit-source-id: 4931d86d6e85d055ed0eefdd357b9ba6266a1c37
Summary: This was just a thin wrapper around `with_metadata_database_config`, and it was using old futures, so remove it.
Differential Revision: D26100512
fbshipit-source-id: 22aa40ed73df2555645ba1d639fee3ae3dd38a09
Summary:
Currently the data layer eats all errors from remote stores and treats
them as KeyErrors. This hides connection issues from users behind obscure
KeyErrors. Let's make it so that any non-key error reported by the remote store
is propagated up as a legitimate error.
This diff makes HTTP errors from EdenApi show up with a nicer error message,
suggesting that the user run fixmyserver.
Further fixes will probably be necessary to categorize other errors from the
remote store more nicely.
Reviewed By: quark-zju
Differential Revision: D26117726
fbshipit-source-id: 7d7dee6ec101c6a1d226185bb27423d977096050
Summary:
Before this change, we could only throttle based on a single identity matching one of the identities in the user's identity set.
Now we can match against a subset of the user's identities.
Depends on D26125638.
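A sketch of the new matching rule, with illustrative types:
```
use std::collections::HashSet;

// A throttle limit now matches when all of its required identities are
// present in the client's identity set, not just any single one.
fn limit_applies(required: &HashSet<String>, client_identities: &HashSet<String>) -> bool {
    required.is_subset(client_identities)
}
```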
Reviewed By: krallin
Differential Revision: D26125637
fbshipit-source-id: 534326264b9093e46fbdda846516fdaceb40c931
Summary:
For fsnodes and skeleton manifests it should be possible to allow gaps in the
commits that are backfilled. Derived data for any commit in one of these gaps
can be quickly computed from the nearest ancestor that is derived. Since each
commit's derived data is independent, there is no loss from omitting them.
Add the `--gap-size` option to `backfill_derived_data`. This allows `fsnodes`,
`skeleton_manifests` and other derived data types that implement it in the
future to skip some of the commits during backfill. This will speed up
backfilling and reduce the amount of data that is stored for infrequently
accessed historical commits.
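A rough sketch of gap selection over a linear stack of commits (a hypothetical
helper; the real logic works on the commit graph):
```
// Derive only every `gap_size`-th changeset, always including the head;
// skipped commits can later be derived from the nearest derived ancestor.
fn select_for_derivation(csids: &[u64], gap_size: usize) -> Vec<u64> {
    let mut picked: Vec<u64> = csids.iter().copied().step_by(gap_size.max(1)).collect();
    if let Some(&head) = csids.last() {
        if picked.last() != Some(&head) {
            picked.push(head);
        }
    }
    picked
}
```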
Reviewed By: StanislavGlebik
Differential Revision: D25997854
fbshipit-source-id: bf4df3f5c16a913c13f732f6837fc4c817664e29
Summary:
While looking into scs performance we noticed that fetching skiplist from
blobstore takes a lot of cpu. And looks like the slow part comes from lots of
allocation that we are doing in sql blob. Even though we might have successfully
fetched the skiplist from manifold, we'd still waste cpu time trying to assemble
object in xdb blobstore.
This diff fixes it by doing pre-allocation - however note that we don't have the precise size we need to
allocate, only an upper bound (number of chunks * max chunk size).
I opted to allocate less so that we don't waste memory on small requests (i.e.
it's not great to allocate 1mb buffer for 100 bytes object).
Note that I suspect that underlying problem might be in BytesMut extend_from_slice()
implementation - it might be unoptimized. However this is just a hunch, I haven't investigated it.For now let's do preallocation on our side.
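A sketch of the preallocation (the halving factor here is illustrative):
```
use bytes::BytesMut;

// We only know an upper bound (chunk count * max chunk size), so allocate a
// fraction of it: small objects don't pin megabyte buffers, and BytesMut
// still grows if the guess was low.
fn assemble_chunks(chunks: &[Vec<u8>], max_chunk_size: usize) -> BytesMut {
    let upper_bound = chunks.len().saturating_mul(max_chunk_size);
    let mut buf = BytesMut::with_capacity(upper_bound / 2);
    for chunk in chunks {
        buf.extend_from_slice(chunk);
    }
    buf
}
```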
Reviewed By: krallin
Differential Revision: D26145078
fbshipit-source-id: a50ba72656ffe6053af993fdec07ce55ddddacf3
Summary:
Replace `common::Job` by using `futures::Join` and `futures::Ready`.
We still need a heterogeneous variant of `Either`, where the output types of the
two futures differ, so extract this from `Job` as `common::Either2`, which
returns `either::Either<LeftFuture::Out, RightFuture::Out>`.
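One way to express the heterogeneous case (a sketch; the diff's `Either2`
presumably avoids the boxing used here):
```
use either::Either;
use futures::future::{BoxFuture, FutureExt};
use std::future::Future;

// Pick one of two futures with different output types; the outputs are
// unified by wrapping them in either::Either.
fn either2<'a, L, R>(
    pick_left: bool,
    left: L,
    right: R,
) -> BoxFuture<'a, Either<L::Output, R::Output>>
where
    L: Future + Send + 'a,
    R: Future + Send + 'a,
{
    if pick_left {
        left.map(Either::Left).boxed()
    } else {
        right.map(Either::Right).boxed()
    }
}
```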
Reviewed By: ahornby
Differential Revision: D25867668
fbshipit-source-id: 13c90b212c64ca5eae67217a1cecd9aee5e40a38
Summary:
krallin noticed that we aren't using the warm bookmark cache anymore. It
turned out the reason was that the client uses the `listkeyspatterns` call to
fetch bookmarks, not `listkeys`. This diff makes `listkeyspatterns` use the
warm bookmark cache as well.
Reviewed By: markbt
Differential Revision: D26124605
fbshipit-source-id: 637db8d66934cabc1793f9f615fefddd07c3af62
Summary:
When comparing file contents with the version on disk, we were
recreating the wctx and parent ctx for each file. This prevented us from
benefiting from the caching inside the ctx objects, and with the new changelog
backend (which lacks certain caching) it could slow down hg status by 5x when
dealing with 10k+ pending changes.
Let's reuse the ctx objects to avoid this.
Reviewed By: quark-zju
Differential Revision: D26135419
fbshipit-source-id: ac25c69c26965362d0e5bfa92288d21f3e230be7
Summary:
Add a `edenscm.tracing` Python module that mirrors some of the APIs in the Rust
tracing eco-system:
- span, {info,debug,trace,warn,error}_span (for tracing)
- event, {info,debug,trace,warn,error} (for logging)
- instrument (decorator for function tracing)
One thing to be careful about is that callsites are meant to be static:
multiple calls from the same code location should reuse a single callsite.
That is achieved by using (id(code), line) from the Python stack as the key.
Reviewed By: sfilipco
Differential Revision: D26013750
fbshipit-source-id: 63b89a73186d876af9fc1b311fbd7948d0a6e27a
Summary:
Similar to Rust tracing's `instrument`, this is a decorator for functions
that generates spans when they are called.
Reviewed By: sfilipco
Differential Revision: D26021958
fbshipit-source-id: bbb648ab07d6db233cd16f56a5f0441df07e1c2e
Summary:
This allows Python to create span callsites understood by Rust tracing,
and operate on the spans (ex. enter, exit, record, etc.).
Reviewed By: sfilipco
Differential Revision: D26013747
fbshipit-source-id: 2783b29750e3279c5481422bddc83366ff7a3548
Summary:
This diff adds "EventCallsite" for logging one-off events, which allows
Python to create callsites for events understood by Rust.
Reviewed By: sfilipco
Differential Revision: D26013749
fbshipit-source-id: 2520928dc360852afe2780267036e9d22c212191
Summary:
In practice there can be a lot of duplicated strings. For example, the
`module` name could be the same for all functions in a file, and `cat` could
be a commonly used field name. Intern the strings so we don't waste too much space.
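A minimal interning sketch: identical strings share one allocation.
```
use std::collections::HashSet;
use std::sync::Arc;

#[derive(Default)]
struct StringInterner {
    strings: HashSet<Arc<str>>,
}

impl StringInterner {
    // Returns a shared Arc<str>; repeated module/field names cost one
    // allocation total instead of one per span or event.
    fn intern(&mut self, s: &str) -> Arc<str> {
        if let Some(existing) = self.strings.get(s) {
            return existing.clone();
        }
        let interned: Arc<str> = Arc::from(s);
        self.strings.insert(interned.clone());
        interned
    }
}
```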
Reviewed By: sfilipco
Differential Revision: D26013748
fbshipit-source-id: 2740319019ce5058acfd7a9d2bb972c08a9386de
Summary:
To make Python code integrate more deeply with the Rust tracing eco-system
(e.g. being aware of layered subscribers and filters), add APIs to create
tracing Callsites on the fly.
Unfortunately, the tracing APIs make this a bit verbose.
Reviewed By: sfilipco
Differential Revision: D26003017
fbshipit-source-id: c3a36bcbaf4b7bbeafc3e2a8ddfdc7193c0972d3
Summary:
Limit the span or event count that a callsite can log so we can handle highly
repetitive calls. This is similar to D23307793 (d8e775f423) but applies to the Rust tracing
APIs.
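The shape of the limit, sketched with an illustrative constant:
```
use std::sync::atomic::{AtomicUsize, Ordering};

const CALLSITE_LOG_LIMIT: usize = 10_000;

struct CallsiteCounter {
    logged: AtomicUsize,
}

impl CallsiteCounter {
    // Each callsite counts what it has logged and goes quiet past the
    // limit, so a hot loop cannot flood the subscribers.
    fn should_log(&self) -> bool {
        self.logged.fetch_add(1, Ordering::Relaxed) < CALLSITE_LOG_LIMIT
    }
}
```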
Reviewed By: sfilipco
Differential Revision: D26090876
fbshipit-source-id: 394f52d697fc5e9ba5b900cebc9405a9f29220ce
Summary: Those APIs are internal details. Public APIs do not need them.
Reviewed By: sfilipco
Differential Revision: D26090877
fbshipit-source-id: b6e3ac76de719d1db71c2b57da3e8ef76015089e
Summary:
The name (`binary_search_by`) is being taken by the stdlib:
warning: a method with this name may be added to the standard library in the future
--> eden/scm/lib/dag/src/spanset.rs:228:14
|
228 | .binary_search_by(|probe| span.low.cmp(&probe.low))
| ^^^^^^^^^^^^^^^^
|
= note: `#[warn(unstable_name_collisions)]` on by default
= warning: once this method is added to the standard library, the ambiguity may cause an error or change in behavior!
= note: for more information, see issue #48919 <https://github.com/rust-lang/rust/issues/48919>
= help: call with fully qualified syntax `BinarySearchBy::binary_search_by(...)` to keep using the current method
= help: add `#![feature(vecdeque_binary_search)]` to the crate attributes to enable `VecDeque::<T>::binary_search_by`
Reviewed By: sfilipco
Differential Revision: D26092424
fbshipit-source-id: d2cdf7d73d2f808f038817c9dc9f4c531ff643bd
Summary:
Yesterday we had an alarm when the blobstore sync queue got overloaded again.
This time it was caused by a large commit cloud commit landing and writing
lots of content and alias blobs.
As we discussed before, let's add an option that allows us to not write to the
blobstore sync queue for commit cloud pushes of content and aliases.
It will slightly increase latency, but will protect the blobstore sync queue
from overloading.
Reviewed By: farnz
Differential Revision: D26129038
fbshipit-source-id: 0e96887e3aa3cf26880899c820f556bb16c437cb
Summary:
When walking large repos, it's useful to be able to reduce memory usage by clearing all or part of the walk state between chunks.
This adds the ability to clear the node data by node type, and to clear the interning maps by the type of interned data.
The only interned type that is not clearable is the bonsai changeset id, as that is used to maintain the list of deferred edges for later chunks to process.
Reviewed By: krallin
Differential Revision: D25867960
fbshipit-source-id: 4b48f03b1a1b8fef1c5ded952bdcd6b1241dcc32
Summary:
Split a TailingWalkVisitor trait out of the WalkVisitor trait to allow `&mut self` references on the methods called from
tail.rs.
This diff also has logging changes to show the chunk bounds and introduces a chunking log tag, which helped in testing.
Reviewed By: krallin
Differential Revision: D26054258
fbshipit-source-id: c74af558f1da98689a38ca61363baf7ee52a265e
Summary:
If the LFS server is overloaded, we send 429s to clients. However,
we also do this for our health checks, which results in the load balancer
thinking that we're unhealthy.
Update the throttling middleware so that it doesn't throttle health checks.
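The check amounts to something like this sketch (the path and names are
illustrative, not the middleware's actual API):
```
// Health checks bypass throttling so load-balancer probes never see 429s,
// even when the server is shedding client load.
fn should_throttle(path: &str, over_capacity: bool) -> bool {
    over_capacity && path != "/health_check"
}
```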
Reviewed By: krallin
Differential Revision: D26125384
fbshipit-source-id: 19260caf27c6c84b6b51e6a75a5533e220d429aa
Summary:
When the SQL GC starts up, all blobs written since the last run
have an unknown generation. Set them in a bulk operation, so that we end the
mark process with all blobs belonging to a generation.
Reviewed By: ahornby
Differential Revision: D26102600
fbshipit-source-id: f6403623182f92486abdac91e5ae7322678246bb
Summary: Not much to add. Guess we gotta update a stub here.
Reviewed By: ahornby
Differential Revision: D26124590
fbshipit-source-id: efc4f324b5fed15cff46b358c2b491480e9b73fb
Summary:
The LFS server exports ODS stats for both client and server errors for
every combination of repo and method. However, it's possible for there to be
errors without a relevant repo or method (e.g. accessing a random path or
rejecting client requests due to rate limiting).
I'd like to add a detector to catch an increase in 429s (the result of rate
limiting clients) so we can detect issues like S220624 earlier.
Update the ODS middleware to first log HTTP status before moving into per-repo
and per-method stats. I've also included explicit counters for 429s and 404s.
NOTE: I have left the repo and method counters as we use them in our detectors.
However, I intend to update our dashboards to no longer use them. In the future
we may want to delete them.
Reviewed By: krallin
Differential Revision: D26102576
fbshipit-source-id: a2ff9c4d9a80bb11f2b2cf647041b035b687d798
Summary: New warnings on the docs side; fix them up.
Reviewed By: StanislavGlebik
Differential Revision: D26108490
fbshipit-source-id: d7d74837570778e392c7662fd56c770f0b52988d
Summary:
Add a command to import the whole git tree as a single bonsai changeset. This
command may be useful if we don't need the whole git history.
Reviewed By: krallin
Differential Revision: D26076645
fbshipit-source-id: 21b712776af1906ca7b06af088d98848bab907b8
Summary: I'm going to need this functionality in the next diff, so let's move it.
Reviewed By: krallin
Differential Revision: D26105565
fbshipit-source-id: bc31242713a4a37a3a48e17f6aae7690da7087f3
Summary:
We're hitting master overload while trying to mark generations.
Make use of a read replica to short-circuit the work when the replica already
knows it's all OK, so that we catch up over time by moving load to the replicas.
Reviewed By: ahornby
Differential Revision: D26003713
fbshipit-source-id: f18458c8b10d66e3a945391d226848810691762a
Summary:
Provide a command that measures per-generation sizes, to be run in
Chronos.
Reviewed By: ahornby
Differential Revision: D25981511
fbshipit-source-id: 7d9877dd5d04ac6881b5024e0ba132c234499d5a
Summary:
In prod, I see errors that imply a master move or a one-off DB
failure, like:
```
Error: While executing InsertGeneration query
Caused by:
Mysql Query failed: Failed The MySQL server is running with the --read-only
(super) option so it cannot execute this statement.
```
and
```
Error: While executing InsertGeneration query
Caused by:
Mysql Query failed: TimedOut [7002](MySQL Client) Query timed out (no rows seen, took 10000 ms)
```
Teach garbage marking to retry after a random delay.
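A sketch of the retry loop, assuming the `rand` and `tokio` crates; the names
and delay bounds are illustrative:
```
use std::time::Duration;

use anyhow::Result;
use rand::Rng;

// Retry a fallible async operation, sleeping a random interval between
// attempts so retries from many hosts don't synchronize.
async fn retry_with_jitter<F, Fut, T>(mut attempt: F, max_tries: usize) -> Result<T>
where
    F: FnMut() -> Fut,
    Fut: std::future::Future<Output = Result<T>>,
{
    for _ in 1..max_tries {
        match attempt().await {
            Ok(value) => return Ok(value),
            Err(_) => {
                let delay_ms = rand::thread_rng().gen_range(1_000u64..10_000);
                tokio::time::sleep(Duration::from_millis(delay_ms)).await;
            }
        }
    }
    attempt().await
}
```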
Reviewed By: StanislavGlebik
Differential Revision: D25974428
fbshipit-source-id: d905ea2c9da19de79b1b8322193831774c3f1ce9
Summary:
We have fallback logic to go to the leader if the data we want is missing in
the replica, but right now it's backwards: we go to the leader for data we
actually *did* find in the replica (and we don't go to the leader for
missing data).
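A reduction of the bug with hypothetical helpers: the two arms were
effectively swapped.
```
// Stubs standing in for the real replica/leader reads.
async fn read_replica(_key: u64) -> Option<Vec<u8>> { None }
async fn read_leader(_key: u64) -> Option<Vec<u8>> { None }

// After the fix: only keys the replica is missing fall back to the leader.
// Before, the leader was queried for exactly the keys the replica *found*.
async fn get_with_fallback(key: u64) -> Option<Vec<u8>> {
    match read_replica(key).await {
        Some(value) => Some(value),
        None => read_leader(key).await,
    }
}
```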
Reviewed By: sfilipco
Differential Revision: D26103898
fbshipit-source-id: 535abab2a3093165f1d55359d102a7a7cb542a9c