Commit Graph

157 Commits

Author SHA1 Message Date
Xavier Deguillard
1c20e574cc telemetry: fix typo in counter name
Summary:
Both the getBlobMetadata and fetchBlobMetadata were publishing to the same counter, making them indistinguishable.

Created from CodeHub with https://fburl.com/edit-in-codehub

Reviewed By: genevievehelsel

Differential Revision: D46343472

fbshipit-source-id: 04e09372c21dd5cb09336259ff34a5394d10d3b4
2023-05-31 20:38:26 -07:00
Xavier Deguillard
fdbf7eadfb store: add telemetry for LocalStore operations
Summary:
We've seen a case where a large `hg update` was taking an absurdly long time in
`ObjectStore::getTree` but the telemetry was showing us that most of the time
wasn't spent fetching trees from Mercurial! The suspicion is that most of the
time was spent in the LocalStore but with no evidence to prove it.

Let's thus add some timing telemetry to various LocalStore read requests to
fill this gap.

Reviewed By: mshroyer

Differential Revision: D46154456

fbshipit-source-id: b439ac48889ed3db71db136ff6c1cc91f48c926a
2023-05-25 15:48:11 -07:00
Xavier Deguillard
e73ab91be0 store: add telemetry for where Tree were obtained from
Summary:
Telemetry exist for where Blob and BlobMetadata are coming from, but not for
Tree. Let's bring the Tree telemetry up to par.

Reviewed By: genevievehelsel

Differential Revision: D46154455

fbshipit-source-id: ab3c31d55b6a91009289b3b07853fa574bbaa137
2023-05-24 17:08:42 -07:00
Chad Austin
8c7994003d break EdenServer.h's dependency on EdenMount.h
Reviewed By: kmancini

Differential Revision: D45924234

fbshipit-source-id: e979f7e3c4e902c48c7ea8b9ea8eb0bc5565a572
2023-05-18 09:44:50 -07:00
Chad Austin
3f9cee4f2a remove all but one inclusion of folly/String.h in headers
Summary:
We almost never intend to include folly/String.h, especially in
headers, and it's a somewhat expensive header.

We sometimes use the string transformation functions in .cpp files,
but rarely in .h. Therefore, remove that dependency where we can.

Reviewed By: genevievehelsel

Differential Revision: D45672957

fbshipit-source-id: 11743156388aff5c61cfe6b46e385a95687a7a31
2023-05-11 11:10:01 -07:00
Chad Austin
641d1544df fix getdeps build
Summary: This include is no longer necessary, so we can remove it.

Reviewed By: genevievehelsel

Differential Revision: D45620895

fbshipit-source-id: d8d9ae00cc38ad1331f8f6fb1f1c55b9d4276095
2023-05-09 10:27:42 -07:00
Scott Ramsby
229990b06b Use iterators when calling format_to() on memory buffers
Summary: Newer versions of fmt require that you pass an iterator rather than a direct memory buffer as an output to the `format_to()` family of functions. This does that to unblock an upgrade to a newer fmt version.

Differential Revision: D45555419

fbshipit-source-id: a4e2217299ada0ab5e56942dacc9107153e4260d
2023-05-04 15:09:43 -07:00
Chad Austin
a65d1cbd11 small clarifications to TraceBus
Summary:
TraceBus isn't easy to use correctly. Clarify some comments to make
the rules more explicit and unify the two constructors to allow
(noexcept) entry construction in place.

Reviewed By: xavierd

Differential Revision: D44657217

fbshipit-source-id: 1fa7c0e3c4ffb169be2b7b0cd1ffa2ea07dfeeb1
2023-04-05 16:27:02 -07:00
Chad Austin
7eacc91e4f use RingBuffer in ActivityBuffer to avoid allocations
Summary:
This should reduce the number of allocations when pushing and popping
events into ActivityBuffer.

Also, remove -inl.h because our clangd does not support it.

Reviewed By: xavierd

Differential Revision: D44593366

fbshipit-source-id: 7dc0f2aa457b44bebe9471edd3c7e688d09534f5
2023-04-03 23:40:55 -07:00
Xavier Deguillard
e8c214db57 store: support fetching aux data via ObjectStore
Summary:
Now that the Mercurial backingstore knows how to fetch aux data, let's thread
this through the ObjectStore.

Reviewed By: kmancini

Differential Revision: D44110102

fbshipit-source-id: c57da05066d80fee199e45b4a4223168a196e3de
2023-03-29 15:35:52 -07:00
Katie Mancini
ce0287d81c get oss windows build back to functioning
Summary:
The external OSS build is broken because thrift has a really long path and
cargo/git on Windows do not lick such long paths.

I'm not fixing that. But the external OSS build is the only one we run in CI any
more. In the meantime the internal build has broken because no one has been
watching.

There were 4 different breakages.

Reviewed By: chadaustin

Differential Revision: D44189633

fbshipit-source-id: 2eedbc2b3bbf5d1def075d99f11f2273dbb1f4ab
2023-03-29 12:52:44 -07:00
Michael Cuevas
96622ff4bd PrjFS: add LogEvent for fileNotification failures
Summary: We are running into fileNotification() failures way more often than we expect on Sandcastle (5% of certain jobs are failing). Let's add some logging here to see if the failures are also happening to users.

Reviewed By: xavierd

Differential Revision: D44350819

fbshipit-source-id: 3be7578dccc297071280f528b53270a63994a99d
2023-03-24 21:21:47 -07:00
Chad Austin
e419dd2799 remove the reference-counting overhead from DurationScope
Summary:
DurationScope has a pair of atomic reference count operations at the
beginning and end of every recorded duration. To avoid that overhead,
reference EdenStats with RefPtr and, in production EdenFS, store a
global, unowned EdenStats object.

This gives us the benefits of explicit dependencies and no global
state in tests, while avoiding atomic RMWs in the release build.

Reviewed By: xavierd

Differential Revision: D44323723

fbshipit-source-id: 1b3384d2e6a0a2959fd79774a8ba46afc4c176ca
2023-03-24 13:50:40 -07:00
Xavier Deguillard
d1d5134f70 prjfs: add telemetry for invalidation times
Summary:
During checkout, these 2 are called a large amount of time, let's make sure we
have some telemetry so we can understand how much time is spent doing
invalidation during checkout.

Reviewed By: chadaustin

Differential Revision: D44311711

fbshipit-source-id: 5af62fe7fd4b37972458bc545bfaa2f4b4d2ca53
2023-03-22 20:04:41 -07:00
Xavier Deguillard
cacde4e855 telemetry: fix typo for hasOverlayFile counter
Summary: The counter should be has_overlay_file, not has_overlay_dir.

Reviewed By: fanzeyi

Differential Revision: D44274455

fbshipit-source-id: 55563a0da14ec0ef1a089e3073cc8fd290d2f35d
2023-03-21 17:05:40 -07:00
Michael Cuevas
c1e781d702 add logged_by field to edenfs events.
Summary: Add this so we know where events in edenfs_events came from (Python CLI, Rust CLI, or the daemon).

Reviewed By: mshroyer

Differential Revision: D44044474

fbshipit-source-id: e23af5e186121657dfabf25c5c50882cc9aec923
2023-03-21 15:50:30 -07:00
Xavier Deguillard
fecc45e91e inodes: recover from corrupted sqlite database
Summary:
We've seen several cases where the sqlite database is corrupted causing EdenFS
to fail to start and requiring manual remediation. On Windows, we can always
reconstruct the sqlite database from scratch due to FSCK being able to build it
from scratch. Thus, we can simply delete the database on disk and continue
starting up.

Reviewed By: chadaustin

Differential Revision: D44155034

fbshipit-source-id: de05c814796ab8f76fd3cd9a3e98df438431c657
2023-03-17 21:38:48 -07:00
Xavier Deguillard
27e7b2b340 telemetry: remove unused stats
Summary: These are unused, let's remove them.

Reviewed By: chadaustin

Differential Revision: D44110103

fbshipit-source-id: 7c292f7be0a79f645afc80ef3e6f405e7edbf282
2023-03-17 14:13:02 -07:00
Xavier Deguillard
8eb72c47a7 store: allow batch of aux data to be fetched
Summary:
Nothing is being enqueued just yet, but this teaches EdenFS how to fetch
batches of aux data.

In the case where the aux data cannot be fetched, a nullptr auxdata is returned
to the higher level which will then trigger a blob fetch to compute aux data.

Reviewed By: chadaustin

Differential Revision: D44110104

fbshipit-source-id: 53df1496addf3a9dae521ffcdba5b060b8fce16a
2023-03-17 14:13:02 -07:00
Xavier Deguillard
36407b2f7e inodes: add ODS telemetry for the Overlay
Summary:
Our telemetry doesn't capture any data regarding how long Overlay operations
are taking. Since there is a suspicion that checkout is bounded on the Overlay,
let's add these counters to better understand the cost of the Overlay.

Reviewed By: genevievehelsel

Differential Revision: D44118779

fbshipit-source-id: 2e8658c9b0629a0a4f9c6f4535ec876527386b5f
2023-03-17 11:41:26 -07:00
Xavier Deguillard
5479f3c911 inode: enhance checkout telemetry
Summary:
Understanding how many trees and blobs were accessed but not fetched during
checkout is a useful metric that informs us as to how much was served from
caches. This is useful as checkout is more or less proportional in time to the
number of accessed inodes and the number of accessed trees+blobs will thus tell
us how many TreeInode and FileInode were loaded and traversed.

Reviewed By: chadaustin

Differential Revision: D44117153

fbshipit-source-id: 56991c51f2e4f501486d64ab5598f149f2708b77
2023-03-16 11:27:06 -07:00
Michael Cuevas
10ee757717 log SubprocessScribeLogger failures as ERR
Reviewed By: genevievehelsel

Differential Revision: D43803735

fbshipit-source-id: 22e09ce7d7d179e06b8fef7c3c37f21731574d8c
2023-03-07 02:32:19 -08:00
Michael Cuevas
2e9638d1d8 use EdenStats in StructuredLoggerFactory
Summary:
As pointed out in the comments of StructuredLoggerFactory, the previous logic wasn't actually recording anything to ODS, it was simply incrementing a local counter.

Instead, let's use EdenStats + EdenFB303Collector to ensure the counter is recorded to ODS correctly.

Reviewed By: chadaustin

Differential Revision: D43803768

fbshipit-source-id: ecd277766e0d8106c1e573a787b6a65aa0f3fd28
2023-03-07 02:32:19 -08:00
Xavier Deguillard
5793abfa0b service: add telemetry for working copy GC
Summary:
This will allow to better understand how many inodes were invalidated as well
as how long GC ran for.

Reviewed By: mshroyer

Differential Revision: D43199251

fbshipit-source-id: 5e598ba9ac02fe32b999284eba0baa7aa64d3770
2023-02-24 10:48:53 -08:00
Michael Cuevas
714e5d1a9c avoid failing on startup when telemetry fails
Summary: Telemetry isn't strictly required when starting Eden, so let's not throw an exception when we fail to start our subprocess logger.

Reviewed By: xavierd

Differential Revision: D43544234

fbshipit-source-id: 36d5a538eb43b3fb880e047229cbfa4906d2796b
2023-02-23 13:45:33 -08:00
Katie Mancini
0a2ccea5ff add counter for filesystem sync duration
Summary:
Currently EdenFs does not realize that truncated files have changed because
PrjFS sends a "file closed no modification" notification instead of a
"file closed and modified" notification.

Microsoft is fixing this, but in the meantime to work around this we are
listening to a new notification "file about to be converted to a full file".

This notification is sent anytime a placeholder is converted to a full file.
So we are going to be generally listening to more notifications. Notifications
are handled on a background thread, so there should not be a big filesystem io
perfomance hit from this.

However, we do wait to process all notifications before we answer thrift
requests. more notifications could mena this slows down. so I am adding a counter
here to be able to directly monitor this.

Mike's rolling out ODS counter collection on windows currently (D43198806), so
we should be able to monitor this to make sure we are not taking to severe a
performance hit.

Reviewed By: MichaelCuevas

Differential Revision: D43286103

fbshipit-source-id: 46a10c0ed1f1447bff48a9f82fc092939ebdaab0
2023-02-15 14:15:58 -08:00
Katie Mancini
cb5244b09f listen to PRE_CONVERT_TO_FULL
Summary:
Andrew Krieger reported that eden doesn't seem to notice when files are
truncated on windows. And it repros pretty easily:

```
fbclone fbsource tmpfbsource
cd tmpfbsource/arvr/libraries/avatar/Libraries/asset/items/gltf
Clear-Content src/GltfCompactSkinningData.cpp
hg st
# empty
```

Eden is getting a file closed and NOT modified notification. Microsoft has
confirmed that this is unexpected and would fix it.

But in the meantime there is something we can do to work around it. We do get
the notification "pre convert to full". Full files should be materialized in
EdenFS, so we can use this notification to mark the inode as materialized early.

Handing notifications is async, so this should not effect performance.
Benchmarking creating a lot of files, shows that this doesn't seem to make much
of a difference.

We will roll this out with a config, and then roll it back once microsoft's fix
rolls out.

Reviewed By: xavierd

Differential Revision: D42465547

fbshipit-source-id: cc41fa8c7ad7fbf96ae4a9af5b3a7e6f477fc449
2023-02-14 16:31:47 -08:00
Zeyi (Rice) Fan
0b7baa9025 telemetry: add hasSubscription to TraceBus and optimize TaskTraceBlock
Summary: This diff adds a `TraceBus::hasSubscription` method so we can quickly check if a TraceBus has any subscriber. This way we can avoid paying the cost of constructing tracing events if there is no subscriber.

Reviewed By: chadaustin

Differential Revision: D41526562

fbshipit-source-id: f89f43ee7739149d51c6a483bd582c7c93e9fcd9
2023-01-18 16:41:25 -08:00
Zeyi (Rice) Fan
c0de090e8e trace: introduce TaskTraceBlock for tracing tasks
Summary:
When investigating the `QueueTimeout` issue with `globFiles` in EdenFS, it is handy to understand which thread is a code block being executed on. `folly::Future/SemiFuture` is subtle and easy to make mistakes when it comes to thread scheduling.

This diff introduces a new type of `TraceBus` that generates events based on `TaskTraceBlock` -- a scope guard style tracing utility that will report which thread the given block is executed on, along with some other interesting information.

The trace_stream binary is going to produce a Chrome JSON trace format so you can easily analyze the result with tools like `chrome://tracing` or Perfetto. (Note: `task` mode produces JSON for inspection, `task-chrome-trace` will produce Chrome JSON trace format).

Reviewed By: xavierd

Differential Revision: D41390587

fbshipit-source-id: 7fdc19b9ec87318f4c6dc8b196153ffb2568c8ba
2023-01-18 16:41:25 -08:00
Xavier Deguillard
efedbe98bd service: add telemetry when e-menu fails to start
Summary:
Now that EdenFS will ignore failures to start the e-menu, we should still keep
track of how often this occurs so we can alert on it if the failure rate is too
high and correct the issue proactively.

Reviewed By: MichaelCuevas

Differential Revision: D42273427

fbshipit-source-id: d86033e3f2811eef840711ff401f79210370d81b
2023-01-03 15:33:33 -08:00
Xavier Deguillard
4346acbdce telemetry: enable SubprocessScribeLogger on Windows
Summary:
I was looking at adding telemetry for when the E-Menu fails to start in
D42272618 (eda170e8b4) but remembered that in-process telemetry wasn't enabled on Windows.
Attempting to make it compile succeeded and with a trivial tweak was even able
to get telemetry!

My understanding as to why this "just works" is probably historical: this code
uses to rely on folly::Subprocess which only runs on Linux and macOS as it
relies on fork+exec. However, the code was switched to use SpawnProcess which
also runs on Windows, and thus removing the ifdef was sufficient.

Reviewed By: chadaustin

Differential Revision: D42273007

fbshipit-source-id: 8377328599349da0687414314ea13d2a54bbce94
2023-01-03 15:33:33 -08:00
Mark Shroyer
a9322aeae7 Restore metadata size mismatch logging
Summary:
This re=implements the backed-out D41393677 (6384a0f39a) with some changes:

  - Only log if the size metadata is _less than_ the size of the file contents
    we're returning.  This way we avoid erroneously logging partial reads, as
    happened in T139036919.

  - Only log once per Nfsd3ServerProcessor instance.

Reviewed By: chadaustin

Differential Revision: D41751040

fbshipit-source-id: 2503ecb488b0600bdeabbb11e532e4590edbd280
2022-12-14 03:18:23 -08:00
Chad Austin
2c1bf63afe add a get_tree and get_blob counter
Summary: Add store.hg duration counters that cover indexedlog hits.

Reviewed By: xavierd

Differential Revision: D41566559

fbshipit-source-id: 94cb35f59527a2b2bdcb5e8e29f2793b19830a68
2022-12-01 15:02:14 -08:00
Chad Austin
4c23260289 rename the inaccurate hg get_blob and get_tree counters to fetch_blob and fetch_tree
Summary:
get_tree and get_blob were misleading counters. They did not include
reads from disk. In preparation for adding encompassing get_tree and
get_blob counters, rename to fetch_tree and fetch_blob.

Reviewed By: xavierd

Differential Revision: D41566434

fbshipit-source-id: eae98035f4d0c366f58c6c22eda42ae5c537c8e4
2022-12-01 15:02:14 -08:00
Mark Shroyer
83bb997ee9 Back out "Log metadata size mismatch on NFS READ"
Summary:
Original commit changeset: 968c2866d8fd

Original Phabricator Diff: D41393677 (6384a0f39a)

Reviewed By: xavierd

Differential Revision: D41624043

fbshipit-source-id: 051981d201d28905d97385012eecf58e75ce8634
2022-11-30 14:56:35 -08:00
Mark Shroyer
6384a0f39a Log metadata size mismatch on NFS READ
Summary:
Log to text logs and Scuba if there's an NFS READ where the size of the file
contents didn't match the size in metadata.  This way we can see whether size
mismatches are still happening much in practice, in order to know how to
prioritize further investigation.

Based on kmancini's D41239395, but adds Scuba logging too.

Reviewed By: chadaustin, kmancini

Differential Revision: D41393677

fbshipit-source-id: 968c2866d8fd353e89ad871080a57308cacdc102
2022-11-28 14:42:03 -08:00
Xavier Deguillard
b7d87d807a utils: deny building an AbsolutePath from a literal
Summary:
We've had a couple of failures after landing D40818724 (f199e93924) due to some places still
building AbsolutePath from a literal. On Windows, this is almost always a bug
due to paths usually not being UNC.

To avoid this issue, we can simply forbid an AbsolutePath from being built in
the first place from a path, instead forcing the use of canonicalPath. This was
suggeted by mshroyer in the orignal diff.

Reviewed By: chadaustin

Differential Revision: D41229194

fbshipit-source-id: 3451bdcba276c87f98326b025e05f6a4eddbd1b7
2022-11-22 16:20:30 -08:00
Chad Austin
f9688dece5 remove finishRequest because it can be a destructor
Summary:
Almost by definition, any deleter of the form `ptr->method(); delete
ptr` can have `method` moved into the destructor. Make that
refactoring to RequestContext so we can manage its lifetime without
`std::shared_ptr`.

Reviewed By: kmancini

Differential Revision: D40312841

fbshipit-source-id: 5795b2a9064a8a6cf829f02114e3dcc9edf4a4a8
2022-10-25 14:09:02 -07:00
Genevieve (Genna) Helsel
2f059592f1 log if a ** glob was requested
Summary: Sometimes its hard to debug how a watchman query is sent to EdenFS, this logs known expensive queries to the local log file as well as externally using the `StructuredLogger`

Reviewed By: kmancini

Differential Revision: D38961075

fbshipit-source-id: 51234867f7a9eb97a379860eddb584bf3a970013
2022-10-14 09:31:06 -07:00
Chad Austin
e48d9fe586 stop recording quantile stats for counters
Summary:
We record too many time series. In particular, while quantile stats
are useful for durations (or any statistic that represents a gauge or
level value), we don't need them for counters. For counters, we're
primarily interested in sums and rates.

Therefore, stop recording quantiles for counters.

Reviewed By: genevievehelsel

Differential Revision: D40117924

fbshipit-source-id: f89a8b74277a972eaca046d636740c97a6d1bda8
2022-10-05 15:19:10 -07:00
Chad Austin
6857ef5676 split FsChannelStats into FuseStats, NfsStats, PrjfsStats
Summary:
This allows us to remove ifdefs and begin enforcing consistent
prefixes in StatsGroup.

Reviewed By: genevievehelsel

Differential Revision: D39782327

fbshipit-source-id: c453e0ada5d5d9d3b221c707f67dce7d1e0a970d
2022-09-30 16:48:23 -07:00
Chad Austin
7ad33cdcde remove the redundant 'hgBackingStore' prefix from the counter names in HgBackingStoreStats
Reviewed By: genevievehelsel

Differential Revision: D39780986

fbshipit-source-id: 803ecdfabe7840c7f2f4f583239f0f6e7fc12998
2022-09-30 16:48:23 -07:00
Chad Austin
2637f83999 add a counter for aux metadata misses
Summary:
It semes likely that many of EdenFS's recent performance woes are
caused by the hg backing store not returning aux metadata. To catch
these situations sooner going forward, add a counter for aux metadata
lookup misses.

Reviewed By: genevievehelsel

Differential Revision: D39780833

fbshipit-source-id: 94653fb56509ef7564e64ff3bd822a6027bbb8db
2022-09-30 16:48:23 -07:00
Chad Austin
b1a29d9401 add a counter for proxy hash loads
Summary:
Before we can delete support for proxy hashes, it's important to know
how often they're accessed. Add a counter which records each time we
attempt to read a proxy hash from LocalStore.

Reviewed By: xavierd

Differential Revision: D39742152

fbshipit-source-id: a9a28be0b1724b84bb902a3693f3ec3f4b71eea9
2022-09-30 11:35:05 -07:00
Chad Austin
4c5ed8a35a break ActivityRecorder dependency on EdenStats
Summary:
ActivityRecorder's use of Stat can be replaced with direct usage of
fb303 QuantileStatsWrapper.

Reviewed By: xavierd

Differential Revision: D39713538

fbshipit-source-id: 3b354eefa3d9b4f63f9915d921fbefb8f14a11d3
2022-09-30 11:35:05 -07:00
Chad Austin
50b9eafa6b remove the thread-local implementation detail from the public EdenStats API
Summary:
The public EdenStats API was both too chatty and exposed too many
implementation details. Don't mention thread-locals in the public API,
and replace uses of `addValue` with explicit `addDuration` on Duration
and `increment` on Counter.

Reviewed By: genevievehelsel

Differential Revision: D39661497

fbshipit-source-id: 8f6f2a9d72a4c385cde32e72fba0cebf8f5b4274
2022-09-30 11:35:05 -07:00
Chad Austin
a504c0010e remove some StatsGroup boilerplate
Summary:
This is a small change that paves the way for the future brevity
improvements.

Reviewed By: genevievehelsel

Differential Revision: D39661360

fbshipit-source-id: 282e410a18f12841b7aa310dc0ad211377352b3a
2022-09-30 11:35:05 -07:00
Chad Austin
88412a776e rename EdenThreadStats to StatsGroup and use a type-indexed accessor
Summary:
StatsGroup more clearly indicate these structs exist to semantically
group statistics.

Reviewed By: genevievehelsel

Differential Revision: D39642306

fbshipit-source-id: 9914cc4f770dcfd2ff6ac85d23b5d54be2c3795e
2022-09-30 11:35:05 -07:00
Chad Austin
c87d4d24d1 record overall ObjectStore lookup durations
Summary:
Add an overall duration statistic around ObjectStore::getTree,
ObjectStore::getBlob, and ObjectStore::getBlobMetadata. We have
statistics for individual BackingStore requests, but not ObjectStore,
including the cache hit times.

Reviewed By: MichaelCuevas

Differential Revision: D39640315

fbshipit-source-id: e4c1a2d2ad9f553fe32029846cab167866230fcc
2022-09-30 11:35:05 -07:00
Chad Austin
142166abaf light EdenStats refactoring
Summary:
Rename "channel" to "fs channel", rename StatPtr to DurationStatPtr,
and generalize DurationStatPtr to every stat category.

Reviewed By: xavierd

Differential Revision: D39631597

fbshipit-source-id: 147f9a9d9fe97a39c0d38ccecf57e73c93af0d8c
2022-09-23 17:45:28 -07:00