Commit Graph

33 Commits

Author SHA1 Message Date
Xavier Deguillard
37df55b270 telemetry: consolidate Fuse/PrjFS stats in ChannelThreadStats
Summary:
This helps make RequestData slightly more generic by depending less on Fuse
specific types/code.

Reviewed By: chadaustin

Differential Revision: D23467487

fbshipit-source-id: 830f8269e2114c2968dcc49d3b5574e687191e4d
2020-09-02 15:28:39 -07:00
Wez Furlong
154d7309c9 eden: introduce SpawnedProcess
Summary:
This commit introduces a new process spawning class derived
from the ChildProcess class in the watchman codebase.

`SpawnedProcess` is similar to folly::Subprocess but is designed around the
idea that we will use a system provided spawning API to start a process, rather
than assuming the use of `fork`.

`fork` is to be avoided because it can be expensive for processes with large
address spaces and also because it interacts poorly with threads on macOS.  In
particular, we see the objC runtime terminating our process in some scenarios
where fork and threads are mixed.

There are some important differences from `folly::Subprocess` and that means
that some assumptions and uses need to be altered slightly from their prior
workings.  For example, detaching a SpawnedProcess moves the responsibility of
waiting on the child to a periodic task as there is no way to detach via
posix_spawn without also using fork.

On the plus side, this commit allows unifying spawning between posix and
windows systems, which simplifies the code!

Reviewed By: xavierd

Differential Revision: D23287763

fbshipit-source-id: b662af1d7eaaa9ed445c42f6c5765ae9af975eea
2020-09-01 13:31:32 -07:00
Katie Mancini
3827b9787d add fetch type to data fetch logging
Summary:
Having the type of data fetched can help in debugging where these fetches are
comming from. In the currently logs figuring out if a data fetch is blob or
tree requires some manual work. When looking at a big bunch of fetches this is
not super practical.

So this includes this info in our logging.

Reviewed By: chadaustin

Differential Revision: D23243444

fbshipit-source-id: 9abe5180c5d2afc0d02b27ba6a6b76401e86556e
2020-08-21 17:38:14 -07:00
Xavier Deguillard
e5558221ac store: plumb CMD_CAT_TREE
Summary:
With Mercurial now supporting CMD_CAT_TREE for efficiently fetching and reading
trees, we can plumb this onto EdenFS. At startup time, we detect whether
Mercurial supports CMD_CAT_TREE and use that method, otherwise, we fallback to
the old CMD_FETCH_TREE.

Reviewed By: wez

Differential Revision: D23044953

fbshipit-source-id: 9aea5c5b82e97039a75ef18976a155dcb6e150bc
2020-08-12 08:17:25 -07:00
Xavier Deguillard
07df8faf5e win: when creating a file/directory, create the parents too
Summary:
As opposed to FUSE, ProjectedFS sends notifications for file/directory creation
after the fact, and for directory that means these will be visible on disk before
EdenFS may be aware of it. While EdenFS usually process it quickly, a heavily
multi-threaded application that tries to concurrently create a directory
hierarchy may end up sending notifications to EdenFS in a somewhat out of order
fashion.

Since this should be a very rare occurence, we make this a very slow path by
being optimistic and calling `getInode` first, and then only if that fails, we
aggressively create all the parent directories. During a buck build of ~1k
jobs, this happened only 3 times.

If we fully think this through, this change doesn't fully fix the race, as a
similar race can now happen when a create and remove/rename operations are
concurrent. However, a client performing these operations concurrently is
either aware that this is racy and should handle these properly, or is most
likely buggy. Both of these should significantly reduce the likelyhod of this
happening, thus, I'm leaving this unfixed for now.

To better understand how frequently this happens, I've added a stat counter.
For now, these aren't published to ODS, but this will be tackled later.

Reviewed By: wez

Differential Revision: D22783484

fbshipit-source-id: ea3aafc2f77b65d3967f697f68114921d5909137
2020-07-29 12:17:17 -07:00
Ailin Zhang
36fd61dbaa send fetch heavy events to Scuba
Summary: This diff updated `ObjectStore` to send a `FetchHeavy` event to Scuba when the number of fetching requests of a process has reached 2000.

Reviewed By: fanzeyi

Differential Revision: D22292104

fbshipit-source-id: b7ac48412868216ea960c8681a5fb71c587952bc
2020-07-02 07:57:15 -07:00
Xavier Deguillard
bd26254f79 eden: fix windows build
Summary:
Both optional and pid_t weren't found and the right includes needed to be
provided. On Windows, the ProcessNameCache isn't compiled (yet), and since it
looks like the process name is optional in the BackingStoreLogger, let's not
provide it for now.

Reviewed By: fanzeyi

Differential Revision: D22215581

fbshipit-source-id: 31a7e7be62cd3d14108dc437d3dfabfb9e62f8d5
2020-06-24 15:12:47 -07:00
Katie Mancini
8d32611a23 add data fetch logger
Summary:
We have seen that eden will unexpectedly fetch data, we want to know why.

This adds the plumbing to interact with edens current logging to be able to
log when eden fetches data from the server and what caused eden to do this
fetch. Later changes will use the classes created here to log the cause of data
fetches.

Reviewed By: chadaustin

Differential Revision: D22051013

fbshipit-source-id: 27d377d7057e66f3e7a304cd7004f8aa44f8ba62
2020-06-23 10:02:41 -07:00
Chad Austin
407c817d7a switch from folly benchmark to google benchmark
Summary:
Google Benchmark is easier to use, has more built-in functionality,
and more accurate default behavior than Folly Benchmark, so switch
EdenFS to use it.;

Reviewed By: simpkins

Differential Revision: D20273672

fbshipit-source-id: c90c49878592620a83d2821ed4bc75c20e599a75
2020-04-30 09:36:01 -07:00
Chad Austin
61e738cd84 use enumValue instead of static_cast<int>
Summary:
Where appropriate, replace uses of `static_cast<int>` with
`enumValue`.

Reviewed By: simpkins

Differential Revision: D20975196

fbshipit-source-id: 581643366ea7eda5d1961238b0693cf45c4eec94
2020-04-28 18:59:34 -07:00
Katie Mancini
48804631f7 expose number pending fuse requests
Summary:
This sets up the counters that will allow us to expose the number
of pending FUSE requests in Eden top.

As D20846826 mentions adding metrics for FUSE request gives
visibility into fuse requests and overall health of eden.

This provides more insight beyond the metrics for live FUSE requests
since it shows the kernels view of FUSE requests. Looking at the difference
between the number of pending and live request, can identify issues
that arise at the interface between eden and FUSE and monitor how
quickly fuse workers are processing requests.

**note**: this is only for linux since macos has no equivalent to
`/sys/fs/fuse/connections`

Reviewed By: chadaustin

Differential Revision: D21074489

fbshipit-source-id: c0951f0dfd4fa764be28d8686d08cd0dd807db37
2020-04-28 13:28:01 -07:00
Katie Mancini
37c264f457 make FUSE metrics more efficient
Summary:
Tl;DR Reduces costs of fuse request mertics by reducing lock contention.

D20922194 adds tracking for FUSE request metrics, this makes tracking those
metrics more efficient.  Since every user request goes through the FUSE channel,
we want to reduce the cost of these metrics as much  as possible (originally
mentioned in a comment D20922194).

The synchronization used to track the metrics is costly especially
when the lock is contended.

To reduce the cost, each FuseChannel will have a thread local copy of metrics.
Each will still be synchronized to allow for reading the metrics and for
Requests moved to other threads that will need to access the metrics. However,
the lock should be contended less often since only requests from a single fuse
channel thread will access it.

Reviewed By: chadaustin

Differential Revision: D21043792

fbshipit-source-id: ce58a0cbce334095976233bfac7578d39c81bb55
2020-04-23 10:46:16 -07:00
Katie Mancini
d850667e19 expose live FUSE requests
Summary:
This exposes metrics for the live FUSE requests (the duration
of the longest outstanding request and the number of outstanding
requests).

Because FUSE is the interface through which the user mostly interacts
with the file system they provide good metrics to judge if the perfomance
of eden is normal, or there may be an issue.

Exposing these counters this way will send them to ods, so it will not only
allow for debuging current issues, but can be used to look back at previous
problems. This data could also be used for alerting or more proactive
remediation.

Metrics are exposed per checkout to allow seeing which checkout was
having issues. This data will aggregated in `eden top` to be used as
an overall health indicator, but should more information be needed it
will be logged in ods.

Reviewed By: chadaustin

Differential Revision: D20922194

fbshipit-source-id: 16208883417acb77b62bf712cfdd9068c5420303
2020-04-22 12:33:34 -07:00
Katie Mancini
16b5e289ae track live FUSE requests
Summary:
This sets up the counters that will allow us to expose metrics for
live FUSE requests in Eden top.

This will allow us to track the number of live FUSE requests and the duration
of the longest FUSE request. If the duration of the longest FUSE request has
become very long, the number of FUSE requests is stuck at some value or
increasingly growing these can indicate there is a problem with the Fuse Channel
or something effecting this.

This provides more insight beyond the metrics for imports since many of the
FUSE requests will not cause imports, so this monitors more functionality.
Further because all user interaction with the filesystem goes through FUSE,
these metrics will give a good overall indication weather Eden is performing
normally.

Reviewed By: chadaustin

Differential Revision: D20846826

fbshipit-source-id: 34d834d193833a3c91d95fbb87361ddd863f64ca
2020-04-15 10:17:58 -07:00
Katie Mancini
e299b71988 Explicitly track current/pending imports -- refactor and expose current metrics
Summary:
As mentioned in D20629833, adding metrics for live
imports in `eden top` gives more transparency to the
imports process and makes identifying import related
issues easier. This is set up to expose metrics for live
imports like those for pending imports in `eden top`.

Similar to D20611728 exposing this via these counters
will log this data. Having this data persisted will allow
tracking the performance of imports, and does the set
up for more pro-active fixing of issues. Further we can
look back to see issues that are no longer occurring, but
still of interest.

This also refactors the registration code so that it requires
no copy pasting to add a new counter. Avoiding copy paste
errors when adding more counters and making it easier to
maintain.

Reviewed By: chadaustin

Differential Revision: D20630813

fbshipit-source-id: 8a7a2a0135c7b7a5cde960b84dcb434c6c99eaeb
2020-04-09 12:35:22 -07:00
Katie Mancini
0137a8ccb4 Explicitly track current/pending imports -- add current metrics
Summary:
Previously `eden top` shows metrics for imports that are queued
and fetching data (though there is a bug here fixed above), we
want to provide more granularity to help with debugging:

This explicitly separates "current" and "pending" imports:
- "live": The import is live, ie it is currently importing data
  - having this metric allows debugging the case that there
is a problem fetching data
- "pending": The import is queued, live, or getting data from the
cache
  - having this metric allows debugging the case that there is a
problem initiating the request, for example a request is being
starved on the queue

**note**: I moved the watches closer to the import function call
instead of renaming the currently broken pending import watches
to make it more clear in the code what these are suppose to be
timing

Reviewed By: chadaustin

Differential Revision: D20629833

fbshipit-source-id: 84ef75057149a648a51418a5cc93be87e3b3d6b5
2020-04-06 11:45:58 -07:00
Katie Mancini
3a035094f8 Record Mercurial tree import time
Summary: - added logging only around the import tree call to capture non-queue related wait time

Reviewed By: chadaustin, fanzeyi

Differential Revision: D20207472

fbshipit-source-id: d88bb34ce224a26ff2be100d7789ddeff608006d
2020-03-03 11:44:28 -08:00
Katie Mancini
52e211fe8e Record Mercurial file import time
Summary:
- added logging only around the import blob call to capture non-queue related wait time
- added to `test_reading_file_gets_file_from_hg` in `integration.stats_test.HgBackingStoreStatsTest`  to test import blob logging in addition to the get blob loging

(not yet done for importing trees, will do in next diff)

Reviewed By: chadaustin

Differential Revision: D20201215

fbshipit-source-id: c89281fe7d3d6e89d111ac8cce9014adff44ac40
2020-03-03 11:44:27 -08:00
Genevieve Helsel
9b8960eef4 add success field to daemon start logging
Summary: records if a start was successful or not

Reviewed By: simpkins

Differential Revision: D19817810

fbshipit-source-id: b67253099781bb534b7e2fb26a09ba41c1f0bd69
2020-02-18 08:05:51 -08:00
Genevieve Helsel
f52ee41ae9 log daemon shutdowns
Summary: logs information about daemon shutdowns, including duration, if it was a graceful restart or not, and if it was successful or not.

Reviewed By: fanzeyi

Differential Revision: D19817811

fbshipit-source-id: a851de8872f98952d046fb148a27fbf9f05b2212
2020-02-11 09:04:33 -08:00
Chad Austin
4e1f60fc24 log checkout type and fetch counts to scuba
Summary: In addition to duration and success, log object fetch counts and checkout type to Scuba.

Reviewed By: fanzeyi

Differential Revision: D19334276

fbshipit-source-id: dabf52427f2ebda2b58df93194df39d52f4fcb4f
2020-02-05 16:05:30 -08:00
Yedidya Feldblum
8e6f529d4a Use cacheline_aligned in Tracer
Summary: [Eden] Use `cacheline_aligned` in `Tracer`.

Reviewed By: aary

Differential Revision: D19628287

fbshipit-source-id: 4730c995bc9c11a3d1e485971eef54cb9b329aa8
2020-02-03 14:47:07 -08:00
Chad Austin
7c6efd32e7 rename LogEvent fields so names in code match names in the log
Summary:
Multiple times I've gotten mixed up by the different casing of the
LogEvent fields versus the names in the structured log. Change the
field names to match the log names.

Reviewed By: genevievehelsel

Differential Revision: D19585307

fbshipit-source-id: 3ccbb9ec5e18155367bc85e9db00bc8d556812c4
2020-01-31 10:42:26 -08:00
Chad Austin
e033f203f7 report successful and unsuccessful mounts, including whether the overlay was clean or not
Summary:
I have a suspicion that most edenfs starts are unclean. To test that
hypothesis, add a "mount" structured event type that records whether
the mount was cleanly shut down, how long mounting took, and whether
it's a graceful restart.

Reviewed By: genevievehelsel

Differential Revision: D19585161

fbshipit-source-id: 16eedfeb6181742e64b45bca1ed3cce3edb663cb
2020-01-31 10:42:26 -08:00
Genevieve Helsel
283c6de0df log mismatched parents during staus
Summary: scuba logging for mismatched parent commits, it would be nice to have a count on how often this is encountered so we can pinpoint spikes of this case. I am not sure how helpful having the parent commits would be in the event, my thought there was that it could help see how far away the commits are and if one of the commtis doesn't exist (like in the case of stripping commtis with a cancelled hg split), but I am open to having an empty event as well if storing these strings is too expensive.

Reviewed By: fanzeyi

Differential Revision: D19432378

fbshipit-source-id: fa3e7cd6aef832c4f918f339b781590b7484cfdc
2020-01-16 12:01:11 -08:00
Chad Austin
d0ce25afba log to scuba upon automatic gc
Summary:
A spike in automatic GCs usually implies something has gone wrong. Log
an event for each one, recording the cache size prior to the GC and
the cache size after.

Reviewed By: simpkins

Differential Revision: D18902580

fbshipit-source-id: 158b2635733a415a9fcc7c412b2c0f44ed04aa01
2019-12-10 16:16:35 -08:00
Chad Austin
21ab8fcb67 report daemon starts to scuba
Summary: Report edenfs startup time to Scuba.

Reviewed By: fanzeyi

Differential Revision: D18093269

fbshipit-source-id: d2dce804a998e76f67b2bcfa3417f46ae0004dd6
2019-11-13 20:16:37 -08:00
Chad Austin
62169acd37 report checkout durations to scuba
Summary: Log checkout durations to the structured logger.

Reviewed By: fanzeyi

Differential Revision: D18071996

fbshipit-source-id: 6c5a3ceeb6925058d02eebc4a44f0473680b31b5
2019-11-13 20:16:37 -08:00
Chad Austin
987dd994b6 add a StructuredLogger instance to ServerState
Summary:
Allocate and hook up a StructuredLogger at startup if the EdenConfig
has the appropriate values set.

Reviewed By: simpkins

Differential Revision: D18071821

fbshipit-source-id: 3996b6644bbf0c16bb3b9950d57a79d4619a1656
2019-11-13 15:23:38 -08:00
Chad Austin
ca37294995 introduce a StructuredLogger
Summary:
Introduce a framework that allows recording structured log events
which are encoded as JSON and piped to a configured command line
program.

Reviewed By: pkaush

Differential Revision: D18025183

fbshipit-source-id: ab6b4d510a905a30252f2cff85d107a0d32d149e
2019-10-25 19:29:01 -07:00
Chad Austin
4f360eafd2 clang-format
Summary: Formatting had diverged in a few places. Fix that up.

Reviewed By: fanzeyi

Differential Revision: D18123219

fbshipit-source-id: 832cdd70789642f665a029196998928a9173be81
2019-10-24 14:56:43 -07:00
Chad Austin
9a5f0093f2 introduce a scribe logger that communicates with scribe_cat
Summary: Introduce a new ScribeLogger class that spawns and maintains a process. Log messages are newline-delimited and written to the process's stdin. If the process stops responding or responds too slowly, log messages are dropped.

Reviewed By: pkaush

Differential Revision: D17777215

fbshipit-source-id: c998d10c73fc103122d69ae19c5d84f58b7939d2
2019-10-22 12:42:54 -07:00
Chad Austin
65c93484e2 rename tracing to telemetry
Summary: Tracing was not an accurate name for what this directory had become. So rename it to telemetry.

Reviewed By: wez

Differential Revision: D17923303

fbshipit-source-id: fca07e8447d9b9b3ea5d860809a2d377e3c4f9f2
2019-10-15 13:39:41 -07:00