Summary:
Google Benchmark is easier to use, has more built-in functionality,
and more accurate default behavior than Folly Benchmark, so switch
EdenFS to use it.;
Reviewed By: simpkins
Differential Revision: D20273672
fbshipit-source-id: c90c49878592620a83d2821ed4bc75c20e599a75
Summary:
This sets up the counters that will allow us to expose the number
of pending FUSE requests in Eden top.
As D20846826 mentions adding metrics for FUSE request gives
visibility into fuse requests and overall health of eden.
This provides more insight beyond the metrics for live FUSE requests
since it shows the kernels view of FUSE requests. Looking at the difference
between the number of pending and live request, can identify issues
that arise at the interface between eden and FUSE and monitor how
quickly fuse workers are processing requests.
**note**: this is only for linux since macos has no equivalent to
`/sys/fs/fuse/connections`
Reviewed By: chadaustin
Differential Revision: D21074489
fbshipit-source-id: c0951f0dfd4fa764be28d8686d08cd0dd807db37
Summary:
Tl;DR Reduces costs of fuse request mertics by reducing lock contention.
D20922194 adds tracking for FUSE request metrics, this makes tracking those
metrics more efficient. Since every user request goes through the FUSE channel,
we want to reduce the cost of these metrics as much as possible (originally
mentioned in a comment D20922194).
The synchronization used to track the metrics is costly especially
when the lock is contended.
To reduce the cost, each FuseChannel will have a thread local copy of metrics.
Each will still be synchronized to allow for reading the metrics and for
Requests moved to other threads that will need to access the metrics. However,
the lock should be contended less often since only requests from a single fuse
channel thread will access it.
Reviewed By: chadaustin
Differential Revision: D21043792
fbshipit-source-id: ce58a0cbce334095976233bfac7578d39c81bb55
Summary:
This exposes metrics for the live FUSE requests (the duration
of the longest outstanding request and the number of outstanding
requests).
Because FUSE is the interface through which the user mostly interacts
with the file system they provide good metrics to judge if the perfomance
of eden is normal, or there may be an issue.
Exposing these counters this way will send them to ods, so it will not only
allow for debuging current issues, but can be used to look back at previous
problems. This data could also be used for alerting or more proactive
remediation.
Metrics are exposed per checkout to allow seeing which checkout was
having issues. This data will aggregated in `eden top` to be used as
an overall health indicator, but should more information be needed it
will be logged in ods.
Reviewed By: chadaustin
Differential Revision: D20922194
fbshipit-source-id: 16208883417acb77b62bf712cfdd9068c5420303
Summary:
This sets up the counters that will allow us to expose metrics for
live FUSE requests in Eden top.
This will allow us to track the number of live FUSE requests and the duration
of the longest FUSE request. If the duration of the longest FUSE request has
become very long, the number of FUSE requests is stuck at some value or
increasingly growing these can indicate there is a problem with the Fuse Channel
or something effecting this.
This provides more insight beyond the metrics for imports since many of the
FUSE requests will not cause imports, so this monitors more functionality.
Further because all user interaction with the filesystem goes through FUSE,
these metrics will give a good overall indication weather Eden is performing
normally.
Reviewed By: chadaustin
Differential Revision: D20846826
fbshipit-source-id: 34d834d193833a3c91d95fbb87361ddd863f64ca
Summary:
As mentioned in D20629833, adding metrics for live
imports in `eden top` gives more transparency to the
imports process and makes identifying import related
issues easier. This is set up to expose metrics for live
imports like those for pending imports in `eden top`.
Similar to D20611728 exposing this via these counters
will log this data. Having this data persisted will allow
tracking the performance of imports, and does the set
up for more pro-active fixing of issues. Further we can
look back to see issues that are no longer occurring, but
still of interest.
This also refactors the registration code so that it requires
no copy pasting to add a new counter. Avoiding copy paste
errors when adding more counters and making it easier to
maintain.
Reviewed By: chadaustin
Differential Revision: D20630813
fbshipit-source-id: 8a7a2a0135c7b7a5cde960b84dcb434c6c99eaeb
Summary:
Previously `eden top` shows metrics for imports that are queued
and fetching data (though there is a bug here fixed above), we
want to provide more granularity to help with debugging:
This explicitly separates "current" and "pending" imports:
- "live": The import is live, ie it is currently importing data
- having this metric allows debugging the case that there
is a problem fetching data
- "pending": The import is queued, live, or getting data from the
cache
- having this metric allows debugging the case that there is a
problem initiating the request, for example a request is being
starved on the queue
**note**: I moved the watches closer to the import function call
instead of renaming the currently broken pending import watches
to make it more clear in the code what these are suppose to be
timing
Reviewed By: chadaustin
Differential Revision: D20629833
fbshipit-source-id: 84ef75057149a648a51418a5cc93be87e3b3d6b5
Summary: - added logging only around the import tree call to capture non-queue related wait time
Reviewed By: chadaustin, fanzeyi
Differential Revision: D20207472
fbshipit-source-id: d88bb34ce224a26ff2be100d7789ddeff608006d
Summary:
- added logging only around the import blob call to capture non-queue related wait time
- added to `test_reading_file_gets_file_from_hg` in `integration.stats_test.HgBackingStoreStatsTest` to test import blob logging in addition to the get blob loging
(not yet done for importing trees, will do in next diff)
Reviewed By: chadaustin
Differential Revision: D20201215
fbshipit-source-id: c89281fe7d3d6e89d111ac8cce9014adff44ac40
Summary: records if a start was successful or not
Reviewed By: simpkins
Differential Revision: D19817810
fbshipit-source-id: b67253099781bb534b7e2fb26a09ba41c1f0bd69
Summary: logs information about daemon shutdowns, including duration, if it was a graceful restart or not, and if it was successful or not.
Reviewed By: fanzeyi
Differential Revision: D19817811
fbshipit-source-id: a851de8872f98952d046fb148a27fbf9f05b2212
Summary: In addition to duration and success, log object fetch counts and checkout type to Scuba.
Reviewed By: fanzeyi
Differential Revision: D19334276
fbshipit-source-id: dabf52427f2ebda2b58df93194df39d52f4fcb4f
Summary:
Multiple times I've gotten mixed up by the different casing of the
LogEvent fields versus the names in the structured log. Change the
field names to match the log names.
Reviewed By: genevievehelsel
Differential Revision: D19585307
fbshipit-source-id: 3ccbb9ec5e18155367bc85e9db00bc8d556812c4
Summary:
I have a suspicion that most edenfs starts are unclean. To test that
hypothesis, add a "mount" structured event type that records whether
the mount was cleanly shut down, how long mounting took, and whether
it's a graceful restart.
Reviewed By: genevievehelsel
Differential Revision: D19585161
fbshipit-source-id: 16eedfeb6181742e64b45bca1ed3cce3edb663cb
Summary: scuba logging for mismatched parent commits, it would be nice to have a count on how often this is encountered so we can pinpoint spikes of this case. I am not sure how helpful having the parent commits would be in the event, my thought there was that it could help see how far away the commits are and if one of the commtis doesn't exist (like in the case of stripping commtis with a cancelled hg split), but I am open to having an empty event as well if storing these strings is too expensive.
Reviewed By: fanzeyi
Differential Revision: D19432378
fbshipit-source-id: fa3e7cd6aef832c4f918f339b781590b7484cfdc
Summary:
A spike in automatic GCs usually implies something has gone wrong. Log
an event for each one, recording the cache size prior to the GC and
the cache size after.
Reviewed By: simpkins
Differential Revision: D18902580
fbshipit-source-id: 158b2635733a415a9fcc7c412b2c0f44ed04aa01
Summary:
Allocate and hook up a StructuredLogger at startup if the EdenConfig
has the appropriate values set.
Reviewed By: simpkins
Differential Revision: D18071821
fbshipit-source-id: 3996b6644bbf0c16bb3b9950d57a79d4619a1656
Summary:
Introduce a framework that allows recording structured log events
which are encoded as JSON and piped to a configured command line
program.
Reviewed By: pkaush
Differential Revision: D18025183
fbshipit-source-id: ab6b4d510a905a30252f2cff85d107a0d32d149e
Summary: Formatting had diverged in a few places. Fix that up.
Reviewed By: fanzeyi
Differential Revision: D18123219
fbshipit-source-id: 832cdd70789642f665a029196998928a9173be81
Summary: Introduce a new ScribeLogger class that spawns and maintains a process. Log messages are newline-delimited and written to the process's stdin. If the process stops responding or responds too slowly, log messages are dropped.
Reviewed By: pkaush
Differential Revision: D17777215
fbshipit-source-id: c998d10c73fc103122d69ae19c5d84f58b7939d2
Summary: Tracing was not an accurate name for what this directory had become. So rename it to telemetry.
Reviewed By: wez
Differential Revision: D17923303
fbshipit-source-id: fca07e8447d9b9b3ea5d860809a2d377e3c4f9f2