Summary:
Both the getBlobMetadata and fetchBlobMetadata were publishing to the same counter, making them indistinguishable.
Created from CodeHub with https://fburl.com/edit-in-codehub
Reviewed By: genevievehelsel
Differential Revision: D46343472
fbshipit-source-id: 04e09372c21dd5cb09336259ff34a5394d10d3b4
Summary:
We've seen a case where a large `hg update` was taking an absurdly long time in
`ObjectStore::getTree` but the telemetry was showing us that most of the time
wasn't spent fetching trees from Mercurial! The suspicion is that most of the
time was spent in the LocalStore but with no evidence to prove it.
Let's thus add some timing telemetry to various LocalStore read requests to
fill this gap.
Reviewed By: mshroyer
Differential Revision: D46154456
fbshipit-source-id: b439ac48889ed3db71db136ff6c1cc91f48c926a
Summary:
Telemetry exist for where Blob and BlobMetadata are coming from, but not for
Tree. Let's bring the Tree telemetry up to par.
Reviewed By: genevievehelsel
Differential Revision: D46154455
fbshipit-source-id: ab3c31d55b6a91009289b3b07853fa574bbaa137
Summary:
We almost never intend to include folly/String.h, especially in
headers, and it's a somewhat expensive header.
We sometimes use the string transformation functions in .cpp files,
but rarely in .h. Therefore, remove that dependency where we can.
Reviewed By: genevievehelsel
Differential Revision: D45672957
fbshipit-source-id: 11743156388aff5c61cfe6b46e385a95687a7a31
Summary: This include is no longer necessary, so we can remove it.
Reviewed By: genevievehelsel
Differential Revision: D45620895
fbshipit-source-id: d8d9ae00cc38ad1331f8f6fb1f1c55b9d4276095
Summary: Newer versions of fmt require that you pass an iterator rather than a direct memory buffer as an output to the `format_to()` family of functions. This does that to unblock an upgrade to a newer fmt version.
Differential Revision: D45555419
fbshipit-source-id: a4e2217299ada0ab5e56942dacc9107153e4260d
Summary:
TraceBus isn't easy to use correctly. Clarify some comments to make
the rules more explicit and unify the two constructors to allow
(noexcept) entry construction in place.
Reviewed By: xavierd
Differential Revision: D44657217
fbshipit-source-id: 1fa7c0e3c4ffb169be2b7b0cd1ffa2ea07dfeeb1
Summary:
This should reduce the number of allocations when pushing and popping
events into ActivityBuffer.
Also, remove -inl.h because our clangd does not support it.
Reviewed By: xavierd
Differential Revision: D44593366
fbshipit-source-id: 7dc0f2aa457b44bebe9471edd3c7e688d09534f5
Summary:
Now that the Mercurial backingstore knows how to fetch aux data, let's thread
this through the ObjectStore.
Reviewed By: kmancini
Differential Revision: D44110102
fbshipit-source-id: c57da05066d80fee199e45b4a4223168a196e3de
Summary:
The external OSS build is broken because thrift has a really long path and
cargo/git on Windows do not lick such long paths.
I'm not fixing that. But the external OSS build is the only one we run in CI any
more. In the meantime the internal build has broken because no one has been
watching.
There were 4 different breakages.
Reviewed By: chadaustin
Differential Revision: D44189633
fbshipit-source-id: 2eedbc2b3bbf5d1def075d99f11f2273dbb1f4ab
Summary: We are running into fileNotification() failures way more often than we expect on Sandcastle (5% of certain jobs are failing). Let's add some logging here to see if the failures are also happening to users.
Reviewed By: xavierd
Differential Revision: D44350819
fbshipit-source-id: 3be7578dccc297071280f528b53270a63994a99d
Summary:
DurationScope has a pair of atomic reference count operations at the
beginning and end of every recorded duration. To avoid that overhead,
reference EdenStats with RefPtr and, in production EdenFS, store a
global, unowned EdenStats object.
This gives us the benefits of explicit dependencies and no global
state in tests, while avoiding atomic RMWs in the release build.
Reviewed By: xavierd
Differential Revision: D44323723
fbshipit-source-id: 1b3384d2e6a0a2959fd79774a8ba46afc4c176ca
Summary:
During checkout, these 2 are called a large amount of time, let's make sure we
have some telemetry so we can understand how much time is spent doing
invalidation during checkout.
Reviewed By: chadaustin
Differential Revision: D44311711
fbshipit-source-id: 5af62fe7fd4b37972458bc545bfaa2f4b4d2ca53
Summary: The counter should be has_overlay_file, not has_overlay_dir.
Reviewed By: fanzeyi
Differential Revision: D44274455
fbshipit-source-id: 55563a0da14ec0ef1a089e3073cc8fd290d2f35d
Summary: Add this so we know where events in edenfs_events came from (Python CLI, Rust CLI, or the daemon).
Reviewed By: mshroyer
Differential Revision: D44044474
fbshipit-source-id: e23af5e186121657dfabf25c5c50882cc9aec923
Summary:
We've seen several cases where the sqlite database is corrupted causing EdenFS
to fail to start and requiring manual remediation. On Windows, we can always
reconstruct the sqlite database from scratch due to FSCK being able to build it
from scratch. Thus, we can simply delete the database on disk and continue
starting up.
Reviewed By: chadaustin
Differential Revision: D44155034
fbshipit-source-id: de05c814796ab8f76fd3cd9a3e98df438431c657
Summary:
Nothing is being enqueued just yet, but this teaches EdenFS how to fetch
batches of aux data.
In the case where the aux data cannot be fetched, a nullptr auxdata is returned
to the higher level which will then trigger a blob fetch to compute aux data.
Reviewed By: chadaustin
Differential Revision: D44110104
fbshipit-source-id: 53df1496addf3a9dae521ffcdba5b060b8fce16a
Summary:
Our telemetry doesn't capture any data regarding how long Overlay operations
are taking. Since there is a suspicion that checkout is bounded on the Overlay,
let's add these counters to better understand the cost of the Overlay.
Reviewed By: genevievehelsel
Differential Revision: D44118779
fbshipit-source-id: 2e8658c9b0629a0a4f9c6f4535ec876527386b5f
Summary:
Understanding how many trees and blobs were accessed but not fetched during
checkout is a useful metric that informs us as to how much was served from
caches. This is useful as checkout is more or less proportional in time to the
number of accessed inodes and the number of accessed trees+blobs will thus tell
us how many TreeInode and FileInode were loaded and traversed.
Reviewed By: chadaustin
Differential Revision: D44117153
fbshipit-source-id: 56991c51f2e4f501486d64ab5598f149f2708b77
Summary:
As pointed out in the comments of StructuredLoggerFactory, the previous logic wasn't actually recording anything to ODS, it was simply incrementing a local counter.
Instead, let's use EdenStats + EdenFB303Collector to ensure the counter is recorded to ODS correctly.
Reviewed By: chadaustin
Differential Revision: D43803768
fbshipit-source-id: ecd277766e0d8106c1e573a787b6a65aa0f3fd28
Summary:
This will allow to better understand how many inodes were invalidated as well
as how long GC ran for.
Reviewed By: mshroyer
Differential Revision: D43199251
fbshipit-source-id: 5e598ba9ac02fe32b999284eba0baa7aa64d3770
Summary: Telemetry isn't strictly required when starting Eden, so let's not throw an exception when we fail to start our subprocess logger.
Reviewed By: xavierd
Differential Revision: D43544234
fbshipit-source-id: 36d5a538eb43b3fb880e047229cbfa4906d2796b
Summary:
Currently EdenFs does not realize that truncated files have changed because
PrjFS sends a "file closed no modification" notification instead of a
"file closed and modified" notification.
Microsoft is fixing this, but in the meantime to work around this we are
listening to a new notification "file about to be converted to a full file".
This notification is sent anytime a placeholder is converted to a full file.
So we are going to be generally listening to more notifications. Notifications
are handled on a background thread, so there should not be a big filesystem io
perfomance hit from this.
However, we do wait to process all notifications before we answer thrift
requests. more notifications could mena this slows down. so I am adding a counter
here to be able to directly monitor this.
Mike's rolling out ODS counter collection on windows currently (D43198806), so
we should be able to monitor this to make sure we are not taking to severe a
performance hit.
Reviewed By: MichaelCuevas
Differential Revision: D43286103
fbshipit-source-id: 46a10c0ed1f1447bff48a9f82fc092939ebdaab0
Summary:
Andrew Krieger reported that eden doesn't seem to notice when files are
truncated on windows. And it repros pretty easily:
```
fbclone fbsource tmpfbsource
cd tmpfbsource/arvr/libraries/avatar/Libraries/asset/items/gltf
Clear-Content src/GltfCompactSkinningData.cpp
hg st
# empty
```
Eden is getting a file closed and NOT modified notification. Microsoft has
confirmed that this is unexpected and would fix it.
But in the meantime there is something we can do to work around it. We do get
the notification "pre convert to full". Full files should be materialized in
EdenFS, so we can use this notification to mark the inode as materialized early.
Handing notifications is async, so this should not effect performance.
Benchmarking creating a lot of files, shows that this doesn't seem to make much
of a difference.
We will roll this out with a config, and then roll it back once microsoft's fix
rolls out.
Reviewed By: xavierd
Differential Revision: D42465547
fbshipit-source-id: cc41fa8c7ad7fbf96ae4a9af5b3a7e6f477fc449
Summary: This diff adds a `TraceBus::hasSubscription` method so we can quickly check if a TraceBus has any subscriber. This way we can avoid paying the cost of constructing tracing events if there is no subscriber.
Reviewed By: chadaustin
Differential Revision: D41526562
fbshipit-source-id: f89f43ee7739149d51c6a483bd582c7c93e9fcd9
Summary:
When investigating the `QueueTimeout` issue with `globFiles` in EdenFS, it is handy to understand which thread is a code block being executed on. `folly::Future/SemiFuture` is subtle and easy to make mistakes when it comes to thread scheduling.
This diff introduces a new type of `TraceBus` that generates events based on `TaskTraceBlock` -- a scope guard style tracing utility that will report which thread the given block is executed on, along with some other interesting information.
The trace_stream binary is going to produce a Chrome JSON trace format so you can easily analyze the result with tools like `chrome://tracing` or Perfetto. (Note: `task` mode produces JSON for inspection, `task-chrome-trace` will produce Chrome JSON trace format).
Reviewed By: xavierd
Differential Revision: D41390587
fbshipit-source-id: 7fdc19b9ec87318f4c6dc8b196153ffb2568c8ba
Summary:
Now that EdenFS will ignore failures to start the e-menu, we should still keep
track of how often this occurs so we can alert on it if the failure rate is too
high and correct the issue proactively.
Reviewed By: MichaelCuevas
Differential Revision: D42273427
fbshipit-source-id: d86033e3f2811eef840711ff401f79210370d81b
Summary:
I was looking at adding telemetry for when the E-Menu fails to start in
D42272618 (eda170e8b4) but remembered that in-process telemetry wasn't enabled on Windows.
Attempting to make it compile succeeded and with a trivial tweak was even able
to get telemetry!
My understanding as to why this "just works" is probably historical: this code
uses to rely on folly::Subprocess which only runs on Linux and macOS as it
relies on fork+exec. However, the code was switched to use SpawnProcess which
also runs on Windows, and thus removing the ifdef was sufficient.
Reviewed By: chadaustin
Differential Revision: D42273007
fbshipit-source-id: 8377328599349da0687414314ea13d2a54bbce94
Summary:
This re=implements the backed-out D41393677 (6384a0f39a) with some changes:
- Only log if the size metadata is _less than_ the size of the file contents
we're returning. This way we avoid erroneously logging partial reads, as
happened in T139036919.
- Only log once per Nfsd3ServerProcessor instance.
Reviewed By: chadaustin
Differential Revision: D41751040
fbshipit-source-id: 2503ecb488b0600bdeabbb11e532e4590edbd280
Summary:
get_tree and get_blob were misleading counters. They did not include
reads from disk. In preparation for adding encompassing get_tree and
get_blob counters, rename to fetch_tree and fetch_blob.
Reviewed By: xavierd
Differential Revision: D41566434
fbshipit-source-id: eae98035f4d0c366f58c6c22eda42ae5c537c8e4
Summary:
Log to text logs and Scuba if there's an NFS READ where the size of the file
contents didn't match the size in metadata. This way we can see whether size
mismatches are still happening much in practice, in order to know how to
prioritize further investigation.
Based on kmancini's D41239395, but adds Scuba logging too.
Reviewed By: chadaustin, kmancini
Differential Revision: D41393677
fbshipit-source-id: 968c2866d8fd353e89ad871080a57308cacdc102
Summary:
We've had a couple of failures after landing D40818724 (f199e93924) due to some places still
building AbsolutePath from a literal. On Windows, this is almost always a bug
due to paths usually not being UNC.
To avoid this issue, we can simply forbid an AbsolutePath from being built in
the first place from a path, instead forcing the use of canonicalPath. This was
suggeted by mshroyer in the orignal diff.
Reviewed By: chadaustin
Differential Revision: D41229194
fbshipit-source-id: 3451bdcba276c87f98326b025e05f6a4eddbd1b7
Summary:
Almost by definition, any deleter of the form `ptr->method(); delete
ptr` can have `method` moved into the destructor. Make that
refactoring to RequestContext so we can manage its lifetime without
`std::shared_ptr`.
Reviewed By: kmancini
Differential Revision: D40312841
fbshipit-source-id: 5795b2a9064a8a6cf829f02114e3dcc9edf4a4a8
Summary: Sometimes its hard to debug how a watchman query is sent to EdenFS, this logs known expensive queries to the local log file as well as externally using the `StructuredLogger`
Reviewed By: kmancini
Differential Revision: D38961075
fbshipit-source-id: 51234867f7a9eb97a379860eddb584bf3a970013
Summary:
We record too many time series. In particular, while quantile stats
are useful for durations (or any statistic that represents a gauge or
level value), we don't need them for counters. For counters, we're
primarily interested in sums and rates.
Therefore, stop recording quantiles for counters.
Reviewed By: genevievehelsel
Differential Revision: D40117924
fbshipit-source-id: f89a8b74277a972eaca046d636740c97a6d1bda8
Summary:
This allows us to remove ifdefs and begin enforcing consistent
prefixes in StatsGroup.
Reviewed By: genevievehelsel
Differential Revision: D39782327
fbshipit-source-id: c453e0ada5d5d9d3b221c707f67dce7d1e0a970d
Summary:
It semes likely that many of EdenFS's recent performance woes are
caused by the hg backing store not returning aux metadata. To catch
these situations sooner going forward, add a counter for aux metadata
lookup misses.
Reviewed By: genevievehelsel
Differential Revision: D39780833
fbshipit-source-id: 94653fb56509ef7564e64ff3bd822a6027bbb8db
Summary:
Before we can delete support for proxy hashes, it's important to know
how often they're accessed. Add a counter which records each time we
attempt to read a proxy hash from LocalStore.
Reviewed By: xavierd
Differential Revision: D39742152
fbshipit-source-id: a9a28be0b1724b84bb902a3693f3ec3f4b71eea9
Summary:
ActivityRecorder's use of Stat can be replaced with direct usage of
fb303 QuantileStatsWrapper.
Reviewed By: xavierd
Differential Revision: D39713538
fbshipit-source-id: 3b354eefa3d9b4f63f9915d921fbefb8f14a11d3
Summary:
The public EdenStats API was both too chatty and exposed too many
implementation details. Don't mention thread-locals in the public API,
and replace uses of `addValue` with explicit `addDuration` on Duration
and `increment` on Counter.
Reviewed By: genevievehelsel
Differential Revision: D39661497
fbshipit-source-id: 8f6f2a9d72a4c385cde32e72fba0cebf8f5b4274
Summary:
This is a small change that paves the way for the future brevity
improvements.
Reviewed By: genevievehelsel
Differential Revision: D39661360
fbshipit-source-id: 282e410a18f12841b7aa310dc0ad211377352b3a
Summary:
StatsGroup more clearly indicate these structs exist to semantically
group statistics.
Reviewed By: genevievehelsel
Differential Revision: D39642306
fbshipit-source-id: 9914cc4f770dcfd2ff6ac85d23b5d54be2c3795e
Summary:
Add an overall duration statistic around ObjectStore::getTree,
ObjectStore::getBlob, and ObjectStore::getBlobMetadata. We have
statistics for individual BackingStore requests, but not ObjectStore,
including the cache hit times.
Reviewed By: MichaelCuevas
Differential Revision: D39640315
fbshipit-source-id: e4c1a2d2ad9f553fe32029846cab167866230fcc
Summary:
Rename "channel" to "fs channel", rename StatPtr to DurationStatPtr,
and generalize DurationStatPtr to every stat category.
Reviewed By: xavierd
Differential Revision: D39631597
fbshipit-source-id: 147f9a9d9fe97a39c0d38ccecf57e73c93af0d8c