Summary:
To support better telemetry and logging in watchman we want to use Eden's components. Lets migrate and detangle the needed pieces.
This change extracts DynamicEvent from LogEvent so that it can be moved to edencommon in a later diff.
Reviewed By: kmancini
Differential Revision: D54046152
fbshipit-source-id: b50bae2b155f8640a8b319f9b0353e7f44110100
Summary:
To support better telemetry and logging in watchman we want to use Eden's components. Lets migrate and detangle the needed pieces.
This change moves Throw.h and it's related tests from eden to edencommon.
Reviewed By: genevievehelsel
Differential Revision: D54046153
fbshipit-source-id: 669d702c13e70536d9c0b58ff8ff17f826237851
Summary:
Currently, EdenFS crashes if ProjFS requests to read more bytes than we have
available for a file. I am able to repro this, by blocking the read so that
it starts before an `hg checkout` that shortens file contents and completes
after.
I am making eden return an error when we don't have enough data available.
Unfortunately, it seems there is no error code that indicates to projfs it's
view is stale and make it flush it's caches and retry. All error codes seem to
be propagated to the user. These ones are the most promising to make PrjFS do
something we want: https://learn.microsoft.com/en-us/windows/win32/projfs/provider-overview#callback-return-codes
But all those still result in errors to the user. I tried a few more common
HRESULT return codes: https://learn.microsoft.com/en-us/windows/win32/seccrypto/common-hresult-values
But all errors still get propagated to the user. Invalid argument seems to be
the most accurate reflection of the issue, so I'm going with that for now.
It would be better if we could just return the amount of data that we have, and
not have to error. However, PrjFS will still think the file is the longer
length and will add null bytes. This will leave the file in an inconsistent
state and leave `hg status` dirty. This seems worse than erroring because it
causes EdenFS to corrupt the file.
Ideally we would fix this by preventing the length from getting out of sync
from the contents we should read the data from. As far as I see there are two
options, but they are both more involved:
- Failing the checkout when we try to invalidate the contents and
it fails. This would be a bit disruptive to the user as background reads
can interfere with a checkout. but it matches the what a native filesystem would
do.
- Keeping the version of the file to read in the placeholder so the version and
length are never out of sync. The file contents would then be the old contents.
This would leave the file modified and make status dirty, which is still not a
great user experience, but at least the file would be in a consistent state.
I've reached out to ProjFS about other solutions -- maybe there is or could be
a particular error that eden can send that will make ProjFS retry and
gracefully deal.
For now error is better than crash
Reviewed By: mshroyer, MichaelCuevas
Differential Revision: D50193962
fbshipit-source-id: 6ca05a4195c29996a0cd255bfd461a98f7dd5fa6
Summary: With the removal of all HgImporter code, we no longer need to differentiate fetch_miss sources. If in the future we have other data stores that fetch from we can add this back. For now, it is just taking up extra space in our scuba table.
Reviewed By: genevievehelsel
Differential Revision: D52404111
fbshipit-source-id: 792d424d5b0459cdd0e47dad45d4bfb1298a1b3d
Summary: HgImporter was the only way to get the repo name. With the removal of it, we a new way. Adding that to SaplingNativeBackingStore.
Reviewed By: xavierd
Differential Revision: D51911195
fbshipit-source-id: 7fcb61d72219d4e5851b31507c53c62f68df8476
Summary: We have counters for retry success, but not for failure. Now that failure will not always result in a fallback to HgImporter we should add them.
Reviewed By: genevievehelsel
Differential Revision: D51794512
fbshipit-source-id: 108463e55c4feb20c19f7ab602365c089930601f
Summary:
Our object fetch miss reporting done via two different events: EdenApiMiss and HgImportFailure. EdenApiMiss is misleading (no pun intented) as it represents backingstore misses (local and remote). Neither of these events capture all that we want and EdenApiMiss had some wonky names.
This change fixes that, replacing both with ObjectFetchMiss. Additonally, it adds miss events for BlobMetadata.
Up next: we need to expose the failure causes from single object fetches through HgDatapackStore->SaplingNativeStore. Today they are eaten and only a pointer is returned.
Reviewed By: genevievehelsel
Differential Revision: D51733914
fbshipit-source-id: c592520da56a58a1d3e5a6198695857fe6c19451
Summary:
`-Wextra-semi` or `-Wextra-semi-stmt`
If the code compiles, this is safe to land.
Reviewed By: dmm-fb
Differential Revision: D51777971
fbshipit-source-id: 773e23e38af97640a244306fa4205d0f2c9e40c5
Summary:
EdenFS currently falls back to HgImporter (aka `hg debugedenimporthelper`) to rety any failures when fetching object from backingstore. We would like to remove all fall back paths to HgImporter and, eventualy, remove HgImporter (EdenFs) and debugedenimporthelper (hg) entirely.
This change adds support for retrying tree fetching one time using backingstore on the HgImporter thread. The idea being that we know some reties will succeed, but it is unclear how many and whether they will succeed via backingstore or only via HgImporter.
In the worst case, where retrying fails backingstore, this change will adds some latency (10s for timeouts). However, this is understood and expected impact is much smaller now that we have eliminated duplicates (99% of all failures). We choose to accept this added latency to gain the insights of what fails retry via backingstore and succeeds via HgImporter; as well as what fails all paths.
In cases where retrying via backingstore succeeds, we should see a very slight speedup. Future improvements will allow us to retry from within backingstore while not blocking the calling thread pool.
Reviewed By: xavierd
Differential Revision: D50531089
fbshipit-source-id: 3cde80ef73c96c78e2986ffe1ec3cc714054b137
Summary:
EdenFS currently falls back to HgImporter (aka `hg debugedenimporthelper`) to rety any failures when fetching object from backingstore. We would like to remove all fall back paths to HgImporter and, eventualy, remove HgImporter (EdenFs) and debugedenimporthelper (hg) entirely.
This change adds support for retrying blob fetching one time using backingstore on the HgImporter thread. The idea being that we know some reties will succeed, but it is unclear how many and whether they will succeed via backingstore or only via HgImporter.
In the worst case, where retrying fails backingstore, this change will adds some latency (10s for timeouts). However, this is understood and expected impact is much smaller now that we have eliminated duplicates (99% of all failures). We choose to accept this added latency to gain the insights of what fails retry via backingstore and succeeds via HgImporter; as well as what fails all paths.
In cases where retrying via backingstore succeeds, we should see a very slight speedup. Future improvements will allow us to retry from within backingstore while not blocking the calling thread pool.
Reviewed By: genevievehelsel
Differential Revision: D50423842
fbshipit-source-id: abafb15193544bcf2861f67945a038aceb9bc709
Summary: Mostly the same as streamChangesSince but we construct a object store with filtered backing store and pass it to diff tree
Reviewed By: kmancini
Differential Revision: D50207793
fbshipit-source-id: 646e1a3ccff48a4b2abf381940e971f3b1984836
The internal and external repositories are out of sync. This Pull Request attempts to brings them back in sync by patching the GitHub repository. Please carefully review this patch. You must disable ShipIt for your project in order to merge this pull request. DO NOT IMPORT this pull request. Instead, merge it directly on GitHub using the MERGE BUTTON. Re-enable ShipIt after merging.
Summary:
As part of the project to remove HgImporter adding some basic telemetry for failures returned from HgImporters. This will pair up with the remaining errors returned from EdenApiMiss (which is more accurately described as backing store miss).
This should not overwhelm any collectors as it comes after we have removed the highest contributor of edenapi_miss events. That said, once landed we can contrain the data retention of this event (1 day/50GB).
Reviewed By: xavierd
Differential Revision: D49935347
fbshipit-source-id: 92d815d802d083181a9cbc74e9df025b611e1d4c
Summary: We previously had counters to provide visibility into the in-memory blob metadata cache's hit rate, but we lacked corresponding counters for the in-memory blob and tree caches. This adds BlobCache hit counters.
Reviewed By: genevievehelsel
Differential Revision: D49564626
fbshipit-source-id: 2fdd1e51fb8944607a032e0aef8610b9bf8dabcf
Summary:
We previously had counters to provide visibility into the in-memory blob
metadata cache's hit rate, but we lacked corresponding counters for the
in-memory blob and tree caches. This adds TreeCache hit counters.
Reviewed By: chadaustin
Differential Revision: D49396499
fbshipit-source-id: 5622874b8df9f8aece61df24989a1a12c9ff4331
Summary:
Add logging for `hg edendebugimporter`
HgBackingStore will:
- Attempt to get the data from local cache
- Fetch it remotely from Mononoke
- Or, as a last resort, fall back to using `hg edendebugimporter`. This is a path we would like to deprecate and logging accesses will help us understand who is falling back to this.
Reviewed By: jdelliot
Differential Revision: D49424377
fbshipit-source-id: 959c6a42bb5a5c3df6b9bc206527d0ed28253bbe
Summary:
We would like to know how effective our various caching entities are. Here we are adding counters in Overlay to track hits and misses when loading an overlay directory.
File operations in the Overaly are only used when the overlay checker is being ran - as such not adding counters for those.
Reviewed By: MichaelCuevas
Differential Revision: D49552392
fbshipit-source-id: 1d2e699b4b9c5a1bad3d47ef1e7945a5f7f37999
Summary: We would like to know how effective our various caching entities are. Here we are adding counters in EdenStats to track lookup hits and misses in InodeMetadataTable.
Reviewed By: MichaelCuevas
Differential Revision: D49548923
fbshipit-source-id: bd48b71213ecb28d6f0316a7d69955b80aefdb37
Summary: InodeMap, in subsequent diffs, will report metrics when looking up inodes (hits, misses, errors). Adding a new stats subtype with counters for this purpose.
Reviewed By: chadaustin
Differential Revision: D49477890
fbshipit-source-id: 18386cd18e83da57053f39f6eea2966de48dbfb5
Summary: Adding metrics to enable a full picture of EdenFS cache hits top to bottom. In this case, adding cache hits (success) to LocalStore. This will be paired with existing cache misses (failure). We also add errors (exceptions) to the reporting.
Reviewed By: clara-9
Differential Revision: D49396438
fbshipit-source-id: e28bd8e2785c48f9e9acce5761b01aaeb47da59c
Summary:
Today, `EdenApiMiss` events in scuba only tell us what was missed (blob or tree), but not why they were missed. We do have error information (capatured in an exception) that we can report. This will add that error information to the `EdenApaMiss` events.
As a result, our scuba usage will surge again. To remediate this, another diff will update configerator to adjust the retention of just this new column to 1 day or 100GB whichever is hit first.
See D49212103 for related configerator changes.
Reviewed By: MichaelCuevas
Differential Revision: D49211246
fbshipit-source-id: a075eb2e968e00ded206ae9c978fe58efd359101
Summary: In an effort to observe the impact of mount type changes added the overlay type (InodeCatalogType) to the mount event as an enum (int64_t). Additionally, added a derived column ("Overlay Type Str") in scuba that maps from enum to human readable string.
Reviewed By: kmancini
Differential Revision: D48689819
fbshipit-source-id: c308fb169ee9d1b3d0f29afc50098712aed81cef
Summary: We now know that these events are occurring quite frequently. Let's log some additional info here so we can tell if the InodeMetadataTable entry is completely corrupted, or if just a small part of the entry is incorrect.
Reviewed By: xavierd
Differential Revision: D48164675
fbshipit-source-id: 8bb0991a8a916278c4add1f7feab7cd2348e4b0a
Summary:
clang-tidy had some automated suggestions for our code. Apply the ones
that make sense.
Some of them didn't, like removal of all uses of `volatile`. I
manually reverted those changes.
Reviewed By: genevievehelsel
Differential Revision: D41051052
fbshipit-source-id: 3fe22a91e929d3bb8e6346126c2c7bf9f027eb32
Summary:
Now that ProcessId exists, we should use it instead of pid_t in
ObjectFetchContext.
Reviewed By: genevievehelsel
Differential Revision: D42037216
fbshipit-source-id: 34cd89f78be35a15d73b26edc840e917fd642723
Summary:
* Added new LogEvent - NfsCrawlDetected
* Refactored code to produce a single string output for logging
* Refactored some improperly named local variables (too much Rust recently)
Reviewed By: mshroyer
Differential Revision: D47280602
fbshipit-source-id: 3254eff3f4be417d969fb41116484e350b249530
Summary: Adding to track possible corruption issues we've been getting user reports about
Reviewed By: chadaustin
Differential Revision: D46706164
fbshipit-source-id: 6a868d9f9ffd3f72303123486bbb8312f2002072
Summary:
Both the getBlobMetadata and fetchBlobMetadata were publishing to the same counter, making them indistinguishable.
Created from CodeHub with https://fburl.com/edit-in-codehub
Reviewed By: genevievehelsel
Differential Revision: D46343472
fbshipit-source-id: 04e09372c21dd5cb09336259ff34a5394d10d3b4
Summary:
We've seen a case where a large `hg update` was taking an absurdly long time in
`ObjectStore::getTree` but the telemetry was showing us that most of the time
wasn't spent fetching trees from Mercurial! The suspicion is that most of the
time was spent in the LocalStore but with no evidence to prove it.
Let's thus add some timing telemetry to various LocalStore read requests to
fill this gap.
Reviewed By: mshroyer
Differential Revision: D46154456
fbshipit-source-id: b439ac48889ed3db71db136ff6c1cc91f48c926a
Summary:
Telemetry exist for where Blob and BlobMetadata are coming from, but not for
Tree. Let's bring the Tree telemetry up to par.
Reviewed By: genevievehelsel
Differential Revision: D46154455
fbshipit-source-id: ab3c31d55b6a91009289b3b07853fa574bbaa137
Summary:
We almost never intend to include folly/String.h, especially in
headers, and it's a somewhat expensive header.
We sometimes use the string transformation functions in .cpp files,
but rarely in .h. Therefore, remove that dependency where we can.
Reviewed By: genevievehelsel
Differential Revision: D45672957
fbshipit-source-id: 11743156388aff5c61cfe6b46e385a95687a7a31
Summary: This include is no longer necessary, so we can remove it.
Reviewed By: genevievehelsel
Differential Revision: D45620895
fbshipit-source-id: d8d9ae00cc38ad1331f8f6fb1f1c55b9d4276095
Summary: Newer versions of fmt require that you pass an iterator rather than a direct memory buffer as an output to the `format_to()` family of functions. This does that to unblock an upgrade to a newer fmt version.
Differential Revision: D45555419
fbshipit-source-id: a4e2217299ada0ab5e56942dacc9107153e4260d
Summary:
TraceBus isn't easy to use correctly. Clarify some comments to make
the rules more explicit and unify the two constructors to allow
(noexcept) entry construction in place.
Reviewed By: xavierd
Differential Revision: D44657217
fbshipit-source-id: 1fa7c0e3c4ffb169be2b7b0cd1ffa2ea07dfeeb1
Summary:
This should reduce the number of allocations when pushing and popping
events into ActivityBuffer.
Also, remove -inl.h because our clangd does not support it.
Reviewed By: xavierd
Differential Revision: D44593366
fbshipit-source-id: 7dc0f2aa457b44bebe9471edd3c7e688d09534f5
Summary:
Now that the Mercurial backingstore knows how to fetch aux data, let's thread
this through the ObjectStore.
Reviewed By: kmancini
Differential Revision: D44110102
fbshipit-source-id: c57da05066d80fee199e45b4a4223168a196e3de
Summary:
The external OSS build is broken because thrift has a really long path and
cargo/git on Windows do not lick such long paths.
I'm not fixing that. But the external OSS build is the only one we run in CI any
more. In the meantime the internal build has broken because no one has been
watching.
There were 4 different breakages.
Reviewed By: chadaustin
Differential Revision: D44189633
fbshipit-source-id: 2eedbc2b3bbf5d1def075d99f11f2273dbb1f4ab
Summary: We are running into fileNotification() failures way more often than we expect on Sandcastle (5% of certain jobs are failing). Let's add some logging here to see if the failures are also happening to users.
Reviewed By: xavierd
Differential Revision: D44350819
fbshipit-source-id: 3be7578dccc297071280f528b53270a63994a99d
Summary:
DurationScope has a pair of atomic reference count operations at the
beginning and end of every recorded duration. To avoid that overhead,
reference EdenStats with RefPtr and, in production EdenFS, store a
global, unowned EdenStats object.
This gives us the benefits of explicit dependencies and no global
state in tests, while avoiding atomic RMWs in the release build.
Reviewed By: xavierd
Differential Revision: D44323723
fbshipit-source-id: 1b3384d2e6a0a2959fd79774a8ba46afc4c176ca