Summary:
folly:format is deprecated in lieu of fmt and std::format. Migrate
most of EdenFS to fmt instead.
Differential Revision: D31025948
fbshipit-source-id: 82ed674d5e255ac129995b56bc8b9731a5fbf82e
Summary:
The tree metadata fetching evolution goes as follow
(1) (commit, path) scs query
(2) tree manifest scs query [we are here]
(3) eden api manifest query [in development]
Option (1) is no longer used and is the only placed that required scs proxy hash.
Removing it will simplify transition from (2) to (3) and also cleans up bunch of unused code.
It also comes with minor performance improvement, saving about 5% on file access time.
To be precise, this is measured by running fsprobe [this is probably too little to measure in high noise benchmark like running arc focus]:
```
fsprobe.sh run cat.targets --parallel 24
```
Results:
```
W/ scshash:
P24: 0.1044 0.1007 0.1005 (hot) 0.1019 avg
W/o scshash:
P24: 0.0954 0.0964 0.1008 (hot) 0.0975 avg
```
This performance improvement comes from the fact, that even though scs hash was never created or used, we still attempted to load it from scs table, and even though this load always failed it contributed to execution time.
Reviewed By: xavierd
Differential Revision: D30942663
fbshipit-source-id: af84f1e5658e7d8d9fb6853cbb88f02b49cd050b
Summary:
LocalStore no longer special-cases Tree objects with kZeroHash
ids. Instead, unconditionally write into LocalStore with the Tree's
hash.
Reviewed By: xavierd
Differential Revision: D29155470
fbshipit-source-id: aee3840fe8dfd7aa46305b6db6f7950efb2e41d2
Summary:
In preparation for expanding to variable-width hashes, rename the
existing hash type to Hash20.
Reviewed By: genevievehelsel
Differential Revision: D28967365
fbshipit-source-id: 8ca8c39bf03bd97475628545c74cebf0deb8e62f
Summary:
Since this method wasn't overriden, EdenFS would never periodically flush data
to disk.
Reviewed By: fanzeyi
Differential Revision: D30784400
fbshipit-source-id: d88e535250a476582868dd82e57137a0ac38f921
Summary:
Having the same queue for all three makes the dequeue code overly complicated
as it needs to keep track of the kind of request that needs to be dequeued.
Incidently, the previous code had a bug where request in "putback" would be
requeued at the end of the queue, even though there were at the beginning of it
if they all had the same priorities.
This is theory should also improve the dequeue performance when the queue has a
mix of blobs/tree requests, but I haven't measured.
Reviewed By: genevievehelsel
Differential Revision: D30560490
fbshipit-source-id: b27e5429105c07e5f9eab482c12e5699ca3413f7
Summary: Currently, tree imports are queued regardless of whether they are in the `hgcache`. This adds unnecessary delay, especially if the queue is busy (importer takes a long time and causes queue to backlog). This diff adds the logic to check if the tree is in `hgcache` before enqueuing a tree import request.
Reviewed By: xavierd
Differential Revision: D30514871
fbshipit-source-id: eb23f64b7f059832571f957fb67d18c3821d2844
Summary:
Some code in the HgDatapackStore is overly complicated due to the fact that
revHash returns a owned Hash and this forces the code to thus copy it onto a
temporary vector. By having a method that can directly return a slice to the
hash, this issue disappears, thus let's add it.
Reviewed By: chadaustin
Differential Revision: D30582458
fbshipit-source-id: dc102117bc82ab72378293c0abfe9acfd862e9e6
Summary:
I noticed that every time we fetch blob from hg, we calculate sha hash and put it into metadata table.
Both calculating sha1 of content and writing it to rocks is fairly expensive, and it would be nice if we can skip doing so in some cases.
In this diff I use inexpensive cache check to see if we already calculated metadata for given blob and skip recalculation
In terms of performance, it reduces blob access time in hot case from **0.62 ms to 0.22 ms**.
[still need to do some testing with buck, but I think this should not block the diff since it seem farily trivial]
This is short-medium term fix, the longer term solution will be keeping hashes in mercurial and fetching them via eden api, but this will take some time to implement
Reviewed By: chadaustin, xavierd
Differential Revision: D30587132
fbshipit-source-id: 3b24ec88fb02e1ea514568b4e2c8f9fd784a0f10
Summary:
In order to enqueue and find an element in a hash table, the key needs to be
hashed. Hashing a HgProxyHash relies on hashing a string which is significantly
more expensive than hashing a Hash directly. Note that they both represent the
same data and thus there shouldn't be more collisions.
Reviewed By: chadaustin
Differential Revision: D30520223
fbshipit-source-id: 036007c445c28686f777aa170d0344346e7348b0
Summary:
Allocations are expensive, especially when done under a lock as this increase
the critical section, reducing the potential concurrency. While this yields to
a 1.25x speedup, this is more of a sideway improvement as the allocation is now
done prior to enqueuing. This also means that de-duplicating requests is now
more expensive, as no allocation would be done before, but at the same time,
de-duplication is the non-common code path, so the tradeoff is worthwhile.
Reviewed By: chadaustin
Differential Revision: D30520228
fbshipit-source-id: 99dea65e828f9c896fdfca6b308106554c989282
Summary: The F14 hashmap are significantly faster than the std::unordered_map.
Reviewed By: chadaustin
Differential Revision: D30520225
fbshipit-source-id: d986908c5eac17f66ae2c7589f134c430a3c656e
Summary:
When turning on the native prefetch, EdenFS will enqueue tons of blob requests
to the import request queue. The expectation is then that the threads will
dequeue batch of requests and run them. What is being observed is however
vastly different: the dequeued batches are barely bigger than 10, far lower
than the batch capacity, leading to fetching inefficiencies. The reason for
this is that enqueuing is too costly.
The first step in making enqueuing less costly is to reduce the number of times
the lock needs to be acquired by moving the de-duplication inside the enqueue
function itself. On top of reducing the number of times the lock is held, it
also halves the number of allocation done under the lock.
Reviewed By: chadaustin
Differential Revision: D30520226
fbshipit-source-id: 52f6e3c1ec45caa5c47e3fd122b3a933b0448e7c
Summary:
It turns out that we do want to use a Future to make sure that the tracebus and
watches are completed on the producer and not on the consumer of the future. We
could use a `.via(inline executor)` but the code becomes less readable, so
let's just revert the diff.
Reviewed By: chadaustin
Differential Revision: D30545721
fbshipit-source-id: 524033ab4dbd16be0c377647f7f81f7cd57c206d
Summary:
The documentation allows for not having to test in enqueue if the queue is
still running: if called in the destructor of the owner, no enqueue can
logically happen, and thus we do not need to protect against it.
Reviewed By: chadaustin
Differential Revision: D30520227
fbshipit-source-id: 9d6280ccd7fe875cd06b0746151a2897d1f98d61
Summary:
This is now only used in HgQueuedBackingStore::logBackingStoreFetch, and
manually inlining it allows for the lock to be taken once instead of once per
path, reducing the number of times the lock needs to be acquired.
Differential Revision: D30494771
fbshipit-source-id: 2d59d0343e48051e4d9c4fc196e66bcb79e7ac71
Summary: While `eden trace hg` already prints queue time when it's over 1ms, this diff adds fb303 counters for import tree/block queue time so that we can have the percentiles.
Reviewed By: xavierd
Differential Revision: D30492275
fbshipit-source-id: 3601aeb9b51b2f55f189a0e0a753fd6ef29d7341
Summary: Currently, the store loops through the requests, calls HgImporter, then waits with `getTry`. This diff makes the change to kickoff all tree imports from HgImporter then waits for future fulfillment with `collectAll`.
Reviewed By: xavierd
Differential Revision: D30486459
fbshipit-source-id: 918e52be818a2064cf04d24f455d23c1ca618434
Summary:
Instead of having 2 functions with one taking a single proxy hash, and the
other taking a vector, we can simply have a single function taking a
`folly::Range` and pass a range of one for the single proxy hash case.
Reviewed By: chadaustin
Differential Revision: D30490724
fbshipit-source-id: 5d57f5a5ffc2a5085369c61a2318edd54b24b448
Summary:
By default, atomics are using the most strict memory ordering forcing barriers
to be used. Since this atomic doesn't need any ordering, we can make it
relaxed.
Reviewed By: chadaustin
Differential Revision: D30459630
fbshipit-source-id: ff50aac919031d9bae8b870b41a6134331546a5f
Summary:
The recordFetch is an implementation detail of a BackingStore and thus we don't
need to explicitely make it virtual.
Differential Revision: D30459635
fbshipit-source-id: 34f847ca906f81924c99c26b4e8af646e91fd735
Summary:
When prefetching a large number of blobs, repeatly checking whether we should
log accesses to files can become expensive. Since the state of the config isn't
expected to change in the entire batch, we can simply test it once and bail if
logging isn't enabled.
Reviewed By: chadaustin
Differential Revision: D30458698
fbshipit-source-id: b48b9e0ad24585a76d8ce5948f5831db27e08eab
Summary: Looks like we never use this, thus let's simply remove it.
Differential Revision: D30454812
fbshipit-source-id: 28242a2144da4bab9d24debc1a60eeebcdcbaad5
Summary:
When a prefetch request is transformed into many blob requests, we query
RocksDB sequentially for all the proxy hashes, this can be quite expensive and
is also far less efficient than querying RocksDB concurrently with all the
hashes.
As a bonus, this also futurize the code a bit.
Reviewed By: chadaustin
Differential Revision: D30454068
fbshipit-source-id: 5fd238b752a662919e739451c0c1e92f66919ebf
Summary:
Since these are always used as SemiFuture, let's simply make them SemiFuture
from the get go.
Differential Revision: D30452901
fbshipit-source-id: b0863f363ce0cdb921a73d02c43fc82c1614a3dc
Summary:
Looking at strobelight when performing an `eden prefetch` shows that a lot of
time is spent copying data around. The list of hash to prefetch is for instance
copied 4 times, let's reduce this to only one time when converting Hash to a
ByteRange.
Reviewed By: chadaustin
Differential Revision: D30433285
fbshipit-source-id: 922e6e5c095bd700ee133e9bb219904baf2ae1ac
Summary:
Once the request has been dequeued, we no longer need to hold the lock, thus
let's release it to allow other threads to enqueue/dequeue requests.
Differential Revision: D30409797
fbshipit-source-id: a527c67a6bd9f47da5a3930364fd8fae0d1bc427
Summary: This will add the same getTreeEntryForRootId to ObjectStore
Reviewed By: chadaustin
Differential Revision: D29920475
fbshipit-source-id: 15bfc6a2ba70cce2095dfcf1f434fd7087605e04
Summary: Add a new method to backingstore so we can get TreeEntry by rootID
Reviewed By: chadaustin
Differential Revision: D29889482
fbshipit-source-id: 93e63624e75c7d559c4de6f68821a8efa0e0c184
Summary:
The RelativePath is always built from a valid valid one, thus re-validating it
is not necessary.
Reviewed By: chadaustin
Differential Revision: D30410686
fbshipit-source-id: 3e46359f68b1693a0a2af310466fc73d105cf2c0
Summary:
By default, a prefetch will always go through the Mercurial importer, but
when store:use-eden-native-prefetch is set, prefetch is simply pushing tons of
blob requests to the HgQueuedBackingStore, and the internal batching takes care
of efficiently fetching.
The one drawback is that prior to pushing a blob request to the queue, getBlob
will query Mercurial to see if a blob is present locally. In the context of a
FUSE access, this is totally expected as this allows for low latency blob
access. But for a prefetch call, throughput matters significantly more and the
local check can negatively affect this.
Reviewed By: genevievehelsel
Differential Revision: D30404965
fbshipit-source-id: 113883993fa641caf7095a5bc8b7dd802f33348d
Summary:
As I'm looking through the code, it took me a bit of time to fully grasp what
the commented code was for, document it for future readers.
Reviewed By: chadaustin
Differential Revision: D30399723
fbshipit-source-id: bdf448b725192d7541b1d7de7e043ff97700dbce
Summary: This is never called, no need to keep it around.
Reviewed By: chadaustin
Differential Revision: D30399722
fbshipit-source-id: bbc169141b58976031fcae224f24ea23897c6f21
Summary:
If a wrong type is passed in, an exception would be thrown at runtime, while
the static_assert would fire at compiled time, reducing the time to find the
error for the developer.
Reviewed By: chadaustin
Differential Revision: D30398255
fbshipit-source-id: fd021f96063565f83c55a9bf3f175bf879afa6ed
Summary:
This is a re-submit of D29915585 (6c5c7055ce), I've merely fixed the bug that it introduced,
thus the credit goes to markbt. Below is the original commit message:
If Mercurial asks EdenFS to update to a commit that it has just created, this
can cause a long delay while EdenFS tries to import the commit.
EdenFS needs to get the trees out of the Hg data store. But these also won't
know about the new trees until the data store is refreshed or synced.
To fix this, call the refresh method on the store if we fail to find the tree,
and try again. To make this work, we must first only look locally. To keep
things simple, we only do this for the root tree.
However, currently indexedlogdatastore doesn't actually do anything when you
ask it to refresh.
To fix these, we call flush(), which actually does a sync operation and loads
the latest data from disk, too.
Reviewed By: chadaustin
Differential Revision: D30387805
fbshipit-source-id: 3fdbd27b306f03df53b68a0bcc5ee5dc140326bb
Summary:
The SmartPlatform service that queries for a user's most used directories allows optional parameters of: os, startTime, endTime, and sandcastleAlias instead of user. This diff extends the current predictive prefetch option which queries based on the current user, mount repository, and a default numResults, to allow specification of all parameters including the optional ones.
If a user and/or repo is not specified these are determined from the server state and mount, respectively. If numResults is not specified, a default value is used (predictivePrefetchProfileSize, currently 10,000).
For sandcastle aliases, we check if the SANDCASTLE_ALIAS environment variable is set, and if so, use the value as a parameter. If a sandcastle alias is specified, the smartservice will ignore the user and query based on the alias, otherwise a user is assumed.
Differential Revision: D30160507
fbshipit-source-id: 174797f0a6f840bb33f669c8d1bb61d76ff7a309
Summary:
Original commit changeset: 34fe02ddf580
We are seeing reports of EdenFS only showing partial directories to our user. Local testing shows this commit seems to be causing the issue. Reverting it for now.
Reviewed By: kmancini
Differential Revision: D30136949
fbshipit-source-id: 7fcc81506c132055a5b639a383b8c9be68118dc5
Summary:
This improves buck build(and likely other tools) execution time significantly on the cold cache, with eden api enabled(e.g. no import helper)
I have noticed during the tests that we don't urilize network well when accessing files not in cache.
Adding some instrumentation shows that we only do up to 8 parallel fetches on mercurial side.
This seem to be limited by number of hg queue workers, simply boosting this number from 8->32 improves performance a lot, and it is still manageable number of threads
We probably need some better solution to allow for greater parallelism.
With this diff time for `buck build fb4a` on the cold cache is reduced **from 121 to 95 minutes(-30%)**.
'Parsing buck files' stage is reduced **from 07:16 min to 05:53 min(-25%)**.
During buck build we get to actual parallelism of about 16 fetches which is limited by number of buck threads.
Bumping number of buck threads with `-j 32` further reduces build time to **54 minutes (-60% from original)**
Reviewed By: xavierd
Differential Revision: D30025720
fbshipit-source-id: 6bcae1f353a0d31d8ce632ccb991c2a02f3136fc
Summary:
During a diff operation, files that have a case change should not reported as
changed for case insensitive mount. This is a follow up to D29150552 (37ccaa9231) where the
TreeInode code was taught to ignore case changes for case insensitive mounts.
Reviewed By: kmancini
Differential Revision: D29952464
fbshipit-source-id: e5fa8eccc1c1984a680838fd8589a0278989d9d0
Summary:
While not strictly necessary, we should be providing the correct `Key` for
trees when fetching them from the Hg datapack store. Pass the `path` through
to the backing store so it can be used to construct the `Key` for the tree.
Reviewed By: DurhamG
Differential Revision: D29933214
fbshipit-source-id: d9631ea37b5ffa3f7051112d12cf3161c7c677ef
Summary:
If Mercurial asks EdenFS to update to a commit that it has just created, this
can cause a long delay while EdenFS tries to import the commit.
EdenFS needs to get the trees out of the Hg data store. But these also
won't know about the new trees until the data store is refreshed or synced.
To fix this, call the refresh method on the store if we fail to find the tree,
and try again. To make this work, we must first only look locally. To keep
things simple, we only do this for the root tree.
However, currently indexedlogdatastore doesn't actually do anything when you
ask it to refresh.
To fix these, we call `flush()`, which actually does a `sync` operation and
loads the latest data from disk, too.
Reviewed By: DurhamG
Differential Revision: D29915585
fbshipit-source-id: 34fe02ddf5804be425d9cabe1a56069f22e5e4d4
Summary:
If Mercurial asks EdenFS to update to a commit that it has just created, this
can cause a long delay while EdenFS tries to import the commit.
EdenFS needs to resolve the commit to a root manifest. It does this via the
import helper, but the import helper won't know about the commit until it is
restarted, which takes a long time.
To fix this, we add an optional "root manifest" parameter to the checkout or
reset parents thrift calls. This allows the Mercurial client to inform EdenFS
of the root manifest that it already knows about, allowing EdenFS to skip this
step.
Reviewed By: chadaustin
Differential Revision: D29845604
fbshipit-source-id: 61736d84971cd2dd9a8fdaa29a1578386246e4bf
Summary: This adds counters for memory and disk counts in addition to import count so that we can understand cache hit rates during local investigation or output this in ActivityRecorder.
Reviewed By: genevievehelsel
Differential Revision: D29805637
fbshipit-source-id: 34261f91c33d6bd4bcb4b85b17d2e68360410896
Summary: We want to support different batch sizes for blob or tree. This diff moves the batch size read logic into `HgImportRequestQueue`, adds a new config `import-batch-size-tree` (added in D29714772), and updates tests accordingly.
Reviewed By: kmancini
Differential Revision: D29703450
fbshipit-source-id: b85666838a0a8c1857b9d1de4f6c47128063633a
Summary:
Channel is the general term, and FS is the distinguishing term. Rename
cause to FS, with the option for more specificity later: FUSE vs. NFS
vs. Projected FS.
Reviewed By: xavierd
Differential Revision: D29467592
fbshipit-source-id: 78fe4663f7a9896cf21f7d067bd33ab859ecdec9
Summary:
See this graph - https://fburl.com/scuba/mononoke_blobstore_trace/77nyw174. It
shows that there was a huge jumps in the number of "hgmanifest" fetches from
scs that didn't exist in the manifold. This is not great for mononoke, because
it uses a significant chunk of our manifold qps capacity (~30K qps on average,
with spikes going up to 100K qps). The main suspect here is
repo_list_hg_manifest method, because few other scs methods (if any) fetch hg
data.
It correlates with the release of new eden version on June, 23, which added
getTreeBatch() method, and looks like it was passing manifest hash and eden id
in the wrong order. See https://fburl.com/code/tsjwss8a
```
proxyHash.revHash(), // this is really the manifest node
```
This is a comment in `getTree()` method which says that `proxyHash.revHash()`
is manifest id, which is exactly what we need to send to scs.
[ScsMetadataImport::getTreeMetadata](https://fburl.com/code/xammvq2k) suggests
that the second parameter should be manifestId (i.e. proxyHash.revHash()), but
we were passing them in the wrong order. Let's fix it.
Reviewed By: xavierd
Differential Revision: D29428581
fbshipit-source-id: d041008f00c7519504c6c67173ca85709e9dc415
Summary:
I'd like to be able to override this from unit tests. The underlying goal is
to let us make this path absolute in unit tests, so that we can make Eden unit
tests that rely on hgPath work on Buck v2, where paths are relative.
xavierd suggested overriding the glfag would be the easiest course of action
so that's what this does.
Reviewed By: xavierd
Differential Revision: D29060827
fbshipit-source-id: c4084072e3eef9cc5c9d404d10b3e19028f177ea