Summary:
`ObservabilityContext` is a structure that helps logging facilities within Mononoke to make logging decisions. Specifically, the upcoming `DynamicLoggingDrain` and already existing `MononokeScubaSampleBuilder` will have this structure as a component and decide whether a particular logging statement (slog or scuba) should go ahead.
Internally, `ObservabilityContext` is a wrapper around data received from a [configerator endpoint](https://www.internalfb.com/intern/configerator/edit/?path=scm%2Fmononoke%2Fobservability%2Fobservability_config.cconf).
This diff makes a few unobvious decisions about how this is organized. My goals were:
1. to have production (i.e. reading from configerator), static (i.e. emulating current prod behavior) and test variants of `ObservabilityContext`
1. to avoid having consumers know which of these variants are used
1. to avoid making all consumers of `ObservabilityContext` necessarily generic
1. to avoid using dynamic dispatch of `ObservabilityContext`'s methods
Points 3 and 4 mean that `ObservabilityContext` cannot be a trait. `enum` is a common solution in such cases. However, if `ObservabilityContext` is an `enum`, consumers will know which variant they are using (because `enum` variants are public). So the solution is to use a private enum wrapped in a struct.
Reviewed By: mitrandir77
Differential Revision: D25287759
fbshipit-source-id: da034c71570137e8a8fb7749b1e4ad43be482f66
Summary:
A user ran into an issue where some buck/build related automation had
added a redirection named `../../../Users/username/reponame/.lsp-buck-out`.
The valid name for this is `.lsp-buck-out` because redirections must always be
repo-root-relative.
This commit adds missing validation for this, and makes it slightly more
convenient for consuming tools to work with by allowing an absolute path to the
redirection to be specified.
Why not simply resolve the path from the cwd and use that? The command has
always been defined in terms of repo-root-relative paths, and there are a
number of cases where CI and other build tools can start in unexpected
subdirectories and that means that paths like `../buck-out` can have an
ambiguous or surprising resolution. There are tools that follow the
documentation and correctly specify repo-root-relative paths that would either
be broken outright or left with newly ambiguous behavior if we suddenly started
to accept and resolve relative paths for ourselves.
I'm not opposed to changing the behavior in that way in the future, but it
requires a bit more of a coordinated effort than I have time to wrangle right
now, so this diff is aimed at surfacing breakage rather than magically
resolving it.
This commit fixes up the behavior so that paths like `../foo` are explicitly
checked for and generate an error, but absolute paths like
`/Users/username/reponame/foo` (which are unambiguous) are now resolved to the
appropriate repo-root-relative path automatically.
That makes it a bit easier for consuming tools to swing through a behavior
change; they can now simply pass the absolute path rather than messing around
with trying relativize the paths to the repo root.
Reviewed By: kmancini
Differential Revision: D25261490
fbshipit-source-id: 1706b54fc15c2dad334cdf6c75cca5e6e44ed97a
Summary:
A while back, we saw that concurrent directory creation would lead to EdenFS
being confused and failing to record some of the created directories. This then
caused EdenFS to no longer being in sync with what was on disk. To handle this
case, we've had to manually creating these directories recursively.
What I didn't realize at the time was that these concurrent notifications could
also happen on removal this time, and if a directory removal notification wins
the race against the removal of its last children, that directory wouldn't be
removed and EdenFS would once again be confused about the state of the
repository.
Fixing this is a bit trickier than directory creation as it's more racier.
Consider a directory that is being removed, and then immediately recreated with
a file in it in a different process. The naive approach of simply force
removing all of the children of a directory when handling the removal
notification would clash with the file creation. We could argue that nobody
should be doing this, but there would be an unhandled race, and thus a bug
where data would potentially be lost[0].
We can however fix this bug slightly differently. For file/directory removal,
we can actually hook onto the pre-callback, ie: one that happens before the
file/directory is no longer visible on disk. This inherently eliminate the race
altogether as the callback will be guaranteed to run when none of its children
are present, and if a race happens with a file creation in it, we can simply
fail the removal properly.
The only tricky bit is for the renaming logic, as renaming a file is logically
a removal followed by a creation. For that reason, I've moved part of the
renaming bits to the pre-callback too.
In theory, this change may negatively affect workloads that do concurrent
directory removal as the duration during which a file/directory is visible
ondisk now includes the EdenFS callback while it didn't before. Such workflows
should be fairly rare and/or redirected to avoid EdenFS altogether if
performance matters.
[0]: This left-over file that EdenFS wouldn't be aware of would also later
cause the checkout code to fail due to invalidation failures triggered when
trying to invalidate that directory. This would be fairly hard to debug.
Reviewed By: fanzeyi
Differential Revision: D25112381
fbshipit-source-id: 9300499ce872ad93d0a687f0e61b7e2a9caf9556
Summary:
On Windows, the FS refcount is used to indicate that ProjectedFS knows about
this inode and either has a placeholder on disk, or a plain file. The first
event only occurs on lookup (similarly to Linux/macOS), while the second one
happens when files are created by the user and we receive a notification about
it.
In order to avoid races and to miss necessary invalidation, the refcount has to
be incremented after the placeholder has been created, and the refcount is
decremented before the invalidation is performed. This is straightforward to
achieve for notifications, but requires passing a callback to the PrfsChannel.
Reviewed By: chadaustin
Differential Revision: D24800819
fbshipit-source-id: 0e7ea7ed3a9ca0414e3e727fba975045546d82d1
Summary:
On Windows, the refcount will be increased when a placeholder is written on
disk, and decremented when it is invalidated. Since the invalidation completely
clears the inode on disk, we need to make sure we don't keep an inode loaded
after it is invalidated. For that reason, let's just allow the refcount to be
either 0 or 1.
Reviewed By: chadaustin
Differential Revision: D24764567
fbshipit-source-id: 5d09fb9f53bf36d517d0bfb083107f176c33c2a7
Summary: Can just pass on the iterator
Reviewed By: ikostia
Differential Revision: D25216892
fbshipit-source-id: 79c08737477ac7ed1f824c50105d5977ee592126
Summary: Its now the default for this binary, might as well shorten the test command lines
Reviewed By: ikostia
Differential Revision: D25219717
fbshipit-source-id: 8074145c6f05f26ab7fa18d2ff399482ad592885
Summary: Reduces boilerplate for binaries usually run in this mode, notably the walker
Reviewed By: ikostia
Differential Revision: D25216883
fbshipit-source-id: e31d2a6aec7da3baafd8bcf208cf79cc696752c0
Summary: This is useful to prevent accidentally consuming too much. Enabled it for the walker
Reviewed By: ikostia
Differential Revision: D25216880
fbshipit-source-id: e80f490d6ece40d64cc8609e7d6b80d0ecbb1671
Summary: Reduces boiler plate on command line for binaries like walker that want different default
Reviewed By: krallin
Differential Revision: D25216876
fbshipit-source-id: 0df474568d28e0726be223e9dc0a760523063d21
Summary: Darkisilon cell consists of multiple hosts which shares underlying storage, so write to one of them is visible for all hosts. Lets spread requests between all these hosts. I'll get list of hosts from the smc tier and will randomly connect to one on each request.
Reviewed By: krallin
Differential Revision: D25163782
fbshipit-source-id: b28085dd37b15972469b7334a47def473e10f34e
Summary: This allows us to be more flexible in choosing authentication and expands variables used in configuration.
Reviewed By: singhsrb
Differential Revision: D25304008
fbshipit-source-id: 636893a9eaec31ca5acfa02f72931d5e56b695d0
Summary:
Update pywatchman/bser.c to define `PY_SSIZE_T_CLEAN` before including
`Python.h` and then receive parsed string lenghts as `Py_ssize_t` rather than
`int`. This eliminates some runtime Python warnings about the old `int`
behavior being deprecated.
Reviewed By: chadaustin
Differential Revision: D25294778
fbshipit-source-id: 7db678ed918db3bf4745ba716585ed6475d1c910
Summary:
EdenFS can now show notifications to the user in case something wrong is
happening.
Reviewed By: chadaustin
Differential Revision: D24864354
fbshipit-source-id: fabc30f14bc022b4367af562481235fe984df458
Summary:
One of the main sub-par user experience on Windows is the lack of notification
of any kind when EdenFS can't reach the Mercurial servers. Prior to this diff,
the callbacks would never return, causing commands to simply hangs for the
user.
As a first step, let's add a timeout, a later step will hook the notification
mechanism used on macOS/Linux to display a notification when timeouts occurs.
The only callback that doesn't have a proper timeout is the notification one,
as timing out on these would mean that EdenFS won't have registered that some
files/directories have been materialized which will lead to inconsistencies
later.
Reviewed By: kmancini
Differential Revision: D24809645
fbshipit-source-id: 0ddd9d443a17db405a3edbaa8edecf3764c31d37
Summary:
As described in the previous diff, unmounting a repo while a request is pending
would lead to a use after free. To solve this, we can wrap the inner channel
with a shared_ptr, and set it to NULL whenever unmount is in progress.
While this solution has a fairly large overhead due to requiring at least 2
atomics per callbacks (one for the lock, the second one for the shared_ptr
copy), it is correct. A future improvement will swap these with an RCU pointer
to reduce the callback cost to almost nothing.
Reviewed By: chadaustin
Differential Revision: D25071423
fbshipit-source-id: 77d14a38403bef3e276d3e5e48e6fd95dd641964
Summary:
There is currently a race condition where unmounting a repo can happen
concurrently with a ProjectedFS notification/callback. Depending on who wins
that race, this can lead to a use-after-free as the PrjfsChannel/EdenMount
would be freed but the callback would still have reference to it.
To solve this, we need to keep track of inflight requests, and in particular
make sure that memory isn't freed before all the pending callbacks have
completed. And that effectively means that we need to refcount the channel used
by these callbacks so we only free the memory when nobody else is using it.
The first step towards this is splitting the channel in 2 halves.
Reviewed By: chadaustin
Differential Revision: D25071422
fbshipit-source-id: 743f38c9b19ba534961d06ea6f2ddc96b685fe19
Summary: convert blobrepo:blobrepo_common to new type futures
Reviewed By: StanislavGlebik
Differential Revision: D25245426
fbshipit-source-id: d3db0e417545b2b0043e48a536737586005ac4bb
Summary:
I'd like to experiment with splitting this into its own service. To do that, I
don't want to have to include a path, since it's only used for reporting an
error that will never occur (because for that service I'll be using the
"generate" variant of the filenode id). Let's just make it optional.
Reviewed By: lukaspiatkowski
Differential Revision: D25220901
fbshipit-source-id: 6d3cf70a63b077de18a7d43f3b65766b453c425e
Summary: Like it says in the title. Let's turn this into an async fn.
Reviewed By: lukaspiatkowski
Differential Revision: D25220902
fbshipit-source-id: b5de783adaca05919eb5cd6858c8b0aaf03ddfc2
Summary:
This returns a Result of a tuple, but:
- This never errs.
- Nothing ever reads the left side of the tuple.
So let's stop doing that.
Reviewed By: StanislavGlebik
Differential Revision: D25219887
fbshipit-source-id: f33dcf6f6e68cb17b40c4638470312afae0662e6
Summary:
Like it says in the title. We've had this comment about potentially using this
for a couple of years now. It seems a bit unlikely at this point that we're
going to use this, but currently it makes the code that provides uploading hg
entries more complex than it needs to be, so let's just get rid of this.
Reviewed By: lukaspiatkowski, StanislavGlebik
Differential Revision: D25219728
fbshipit-source-id: 61d254edef16d24a4c29f96f983f894863b5232a
Summary:
This modifies our object popularity mechanism to account for the size of the
objects being downloaded. Indeed, considering our bottleneck is bandwidth, we
forcing similar routing for 10 downloads of a 10MB object and 10 downloads of a
1GB object doesn't make that much sense.
This diffs updates our counting so that we now record the object size instead
of a count. I'll set up routing so that we disallow consistent routing when a
single object exceeds 250MiB/s of throughput ( = 1/4th of a task).
It's worth noting that this will be equivalent to what we have right now for
our most problematic objects (GraphQL schemas in Fbsource, 35M each), given
that we "unroute" at 150 requests / 20 seconds (`150 * 35 / 20 = 262`).
The key difference here is that this will work for all objects.
This does mean LFS needs to cache and know about content metadata. That's not
actually a big deal. Indeed, over a week, we serve 25K distinct objects
(https://fburl.com/scuba/mononoke_lfs/a2d26s1a), so considering content
metadata is a hundred bytes (and so is the key), we are looking at a few MBs of
cache space here.
As part of this, I've also refactored our config handling to stop duplicating
structures in Configerator and LFS by using the Thrift objects directly (we
still have a few dedicated structs when post-processing is necessary, but we
no longer have anything that deserializes straight from JSON).
Note that one further refinement here would be to consistently route but to
more tasks (i.e. return one of 2 routing tokens for an object that is being downloaded
at 500MiB/s). We'll see if we need that.
Reviewed By: HarveyHunt
Differential Revision: D24361314
fbshipit-source-id: 49e1f86cf49357f60f1eac298a753e0c1fcbdbe5
Summary:
D24761026 (caa684450f) formatted the default cachelib size with the specifier
`{:3}`. This specifier pads the left side of the string with spaces if there
are less than 3 digits.
Unfortunately, this means that attempting to parse the string into an `f64`
fails. Here's a minimal example:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=11a1e819f9919f7d02565cb8fa561b85
Remove the format specifier and instead call `.to_string()`.
Reviewed By: ahornby
Differential Revision: D25302079
fbshipit-source-id: 461dd628a312a967f6cf5958d2e5d51b72b0ffd8
Summary:
Right now this is the same as our Filestore threshold so we always attempt to
compress Filestore chunks even though they were designed to fit in cache.
Whoops.
Reviewed By: ikostia
Differential Revision: D25274080
fbshipit-source-id: f17b54710fc36ca7c11c74247d038bf73777f7f9
Summary:
Cachelib wasn't initialized so a production configuration would crash because
the cache pools wouldn't be found.
Reviewed By: singhsrb
Differential Revision: D25291880
fbshipit-source-id: 382db0efda072b11da587e863566e816bd5393ca
Summary: Convert `ChangesetFetcher` to new type futures
Reviewed By: StanislavGlebik
Differential Revision: D25244213
fbshipit-source-id: 4207386d81397a930a566db008019bb8f31bf602
Summary:
The `repo.names` namespace lookup should not return `[None]`. Detect and avoid
that to avoid crashes.
Reviewed By: singhsrb
Differential Revision: D25261462
fbshipit-source-id: c7a0bba557717ab292a05d753569c68112e7f0b0
Summary:
If skeleton manifests are not enabled for a repository, ignore the tunables
that delay casefolding checks until land-time.
Reviewed By: StanislavGlebik
Differential Revision: D25271607
fbshipit-source-id: dcaf9291da31d0f57b3b632888ed688ecd6cecda
Summary:
Create `BlobstoreMapping` as a trait with the common implementations for
derived data mappings that are stored in the blobstore.
Reviewed By: StanislavGlebik
Differential Revision: D25099915
fbshipit-source-id: 8a62fbb809918045336944c8cd3584b109811012
Summary:
Batch derivation was disabled as it caused problems for other derived data types.
This was because the default batch implementation was wrong: it attempted to derive
potentially linear stacks concurrently, which would cause O(n^2) derivations.
Fix the default implementation to be a naive sequential iteration, and re-enable
batch derivation for fsnodes and skeleton manifests.
Reviewed By: StanislavGlebik
Differential Revision: D25218195
fbshipit-source-id: 730555829f092cc36e1c81cf02c2b4962a3904da
Summary:
On the second and subsequent linear stack in a batch, the parent fsnode may
have been derived in the previous iteration of the loop. Since we haven't
completed this batch yet, the mappings have not been stored, and so the
attempt to derive the parent will result in the parent being derived again,
repeating the work of this batch sequentially.
Apply the optimization used in skeleton manifests and fetch the parent fsnodes
out of the result we are accumulating.
Reviewed By: StanislavGlebik
Differential Revision: D25218194
fbshipit-source-id: 5cc49204b53984f8aa73542f1a794a6251eb2b2e
Summary:
Similar to fsnodes, allow skeleton manifests to be derived in parallel in large
batches by splitting the changesets into linear stacks with no internal
conflicts, and deriving each changeset in that batch in parallel.
Reviewed By: StanislavGlebik
Differential Revision: D25218196
fbshipit-source-id: e578de9ffd472e732abb1e2ef9cd19c073280cd4
Summary:
The Mononoke SCS tests can be flaky as the server starts to accept SSL
connections some time before it is fully ready to accept requests, which can
result in request timeouts.
Use a simple request to test for the server being fully up and ready to
handle requests.
Reviewed By: krallin
Differential Revision: D25250245
fbshipit-source-id: b79106ed51e5163ebe5cd1db7b0deaab0035b9bc
Summary: Update the few test cases remaining on older option to the newer option
Reviewed By: ikostia
Differential Revision: D25219710
fbshipit-source-id: 50af0dcac7ed980ec4c7180cf81e3c00ecc18b95
Summary: Remove this now it is the walker default. Makes command lines shorter
Reviewed By: ikostia
Differential Revision: D25219551
fbshipit-source-id: bc5ad4237cad35218a0b4c54aa81eb20edb3f3e1
Summary:
This will reduce boilerplate command line for the walker, as most of the time we want to run it with readonly storage
Because the existing --readonly-storage flag can't take a value this introduces a new --with-readonly-storage=<true|false> option
Reviewed By: krallin
Differential Revision: D25216871
fbshipit-source-id: e1b83b428a9c3787f48c18fd396d23ac95991b77
Summary:
Previously needed to pass in cachelib settings once to MononokeAppBuilder and once to parse_and_init_cachelib.
This change adds a MononokeClapApp and MononokeMatches that preserve the settings, thus preventing the need to pass them in twice (and thus avoiding possible inconsistency)
MononokeMatches uses MaybeOwned to hold the inner ArgMatches, which allows us to hold both the usual reference case from get_matches and an owned case for get_matches_from which is used in test cases.
Reviewed By: krallin
Differential Revision: D24788450
fbshipit-source-id: aad5fff2edda305177dcefa4b3a98ab99bc2d811
Summary:
Show cachelib cache_size default in --help usage so its clear what you'll get if no command line args passed
Because we need to convert from bytes to GiB, the lifetime of the help string isn't long enough for clap's reference recieving default_value, so use OnceCell to be able to pass a static reference.
Reviewed By: krallin
Differential Revision: D24761026
fbshipit-source-id: 81b5e7ceb832d5cb55cc9faef59c5e6432f7c4b0
Summary:
Move expected_item_size_byte into CachelibSettings, seems like it should be there.
To enable its use also exposes a parse_and_init_cachelib method for callers that have different defaults to default cachelibe settings.
Reviewed By: krallin
Differential Revision: D24761024
fbshipit-source-id: 440082ab77b5b9f879c99b8f764e71bec68d270e
Summary:
We moved to APFS subvolumes a while back, so it's time
to kill this bit.
Reviewed By: chadaustin
Differential Revision: D25260825
fbshipit-source-id: 7487fb6d477f2650e2aedb0c4dfc4f45a10c9807
Summary:
Introduce `edenapithin` crate, which offers C bindings to EdenAPI Client.
There are two top-level `owned` types, `EdenApiClient` and `TreeEntryFetch`, which represent `Box`ed results from calls to EdenAPI's blocking client. The C / C++ code is responsible for calling the associated `free` functions for these types, as well as `OwnedString`, which is used to represent error variants of a `Result` as a string.
Most functionality is provided through functions which operate on and return references into these top-level owned types, providing access into Rust `Result`s and `Vec`s (manually-monomorphized), and EdenApi's `TreeEntry` and `TreeChildEntry`.
Additional non-pointer types are defined in the `types` module, which do not require manual memory management.
C++ bindings are not included currently, but will be introduced soon.
Reviewed By: fanzeyi
Differential Revision: D24866065
fbshipit-source-id: bb15127b84cdbc6487b2d0e1798f37ef62e5b32d
Summary:
Introduce a new API type, `TreeAttributes`, corresponding to the existing type `WireTreeAttributesRequest`, which exposes which optional attributes are available for fetching. An `Option<TreeAttributes>` parameter is added to the `trees` API, and if set to `None`, the client will make a request with TreeAttributes::default().
The Python bindings accept a dictionary for the attributes parameter, and any fields present will overwrite the default settings from TreeAttributes::default(). Unrecognized attributes will be silently ignored.
Reviewed By: kulshrax
Differential Revision: D25041255
fbshipit-source-id: 5c581c20aac06eeb0428fff42bfd93f6aecbb629