Summary:
If multiple requests to fetch a tree come in at the same time fast enough (where the first request hasn't had the chance to retrieve the data from Mercurial and save it in the cache), we will request to download the tree multiple times. If the tree is not in the hgcache, this means we will make an extra round trip to the server. There is no limit to how many concurrent requests can be made, meaning we could make a large amount of round trips to the server for the same tree.
This adds a tracking mechanism in which we track in progress tree fetches, and if we get a request that is already in the queue or being processed, we just return a future that will be fulfilled by the first request, instead of placing this duplicate request in the queue as well.
This also makes sure if we get a duplicate request, but the duplicate request has a higher priority than the request already in the queue, we will update the priority of the request in the queue.
Reviewed By: chadaustin
Differential Revision: D26355499
fbshipit-source-id: 8d3192cf0f5628c650715f4597c92fc8c9238650
Summary:
"value not present in store" is an unhelpful error message once
serialized through Thrift etc. Provide more useful context when a key
is missing.
Reviewed By: fanzeyi
Differential Revision: D26678102
fbshipit-source-id: 514ac2fe580d1dd7c67fc20c89b75e5d8121c329
Summary: This diff sets up the ability for us to track requests as they are shuffled around in the `std::vector<> queue`. Since the queue is a vector, and since it is sorted everytime a new element is added or removed, we cannot keep track of elements in the queue with indices or references. Instead, we will store the requests in a shared_ptr so we can maintain a pointer to the request no matter where the request is moved around to.
Reviewed By: kmancini
Differential Revision: D26355907
fbshipit-source-id: d714d689963106a4f495221dbcfcbab758ffc7b2
Summary:
It's silly to use `eden prefetch --no-prefetch` to efficiently glob
for filenames. Introduce an `eden glob` command which resolves a glob
relative to the current working directory.
Reviewed By: genevievehelsel
Differential Revision: D25450358
fbshipit-source-id: 45d6dc870d21510e51d5662c75e80385886899fc
Summary:
Logging all these throttling notifications is not necessary. There can
sometimes be big batches of fetches (like 100s of K). Lets reduce this by a
factor of 1000.
Note we also would like to add logging of what process triggered these fetches
what endpoint they use etc. This will help us identify the workflows causing it,
so we could address them or skip aux data fetching in these code paths.
But this requires some fiddling with ObjectFetchContext and the logging
code, so its gonna take a bit longer :(
Reviewed By: genevievehelsel
Differential Revision: D25505654
fbshipit-source-id: e7c40164db86fadf4baf0afd0c52879e0cb2568b
Summary:
With certain compiler flags:
```
#include <atomic>
struct ImportPriority {
ImportPriority() {}
};
std::atomic<ImportPriority> f() {
throw "";
}
```
Fails with with:
```
libgcc/include/c++/trunk/atomic:194:7: error: exception specification of explicitly defaulted default constructor does not match the calculated one
atomic() noexcept = default;
^
Demo.cpp:15:29: note: in instantiation of template class 'std::atomic<ImportPriority>' requested here
std::atomic<ImportPriority> f() {
^
1 error generated.
```
Reading the C++ spec, constructors default to noexcept, but it's not clear if that's true when they have a body. There are flags you can set that make this compile, but let's be maximally compatible and add `noexcept`.
Reviewed By: fanzeyi
Differential Revision: D26076451
fbshipit-source-id: 2f63256377fb31fd7867d7b03e7572e033f72dfc
Summary: Update name to match usage of Try as tri-state, since this method also throws if the Try is empty
Reviewed By: yfeldblum
Differential Revision: D25737810
fbshipit-source-id: a4166153362f07353d212216fbaf7105867eef2a
Summary:
The StringPiece constructor is untyped, and was only used in test. We can
afford to build the PathComponent in tests instead to avoid future headaches.
Reviewed By: genevievehelsel
Differential Revision: D25434556
fbshipit-source-id: 4b10bf2576870e81412d76c4b9755b45e26986b3
Summary:
Mercurial support files with `\` in their name, which can't be represented on
Windows due to `\` being the path separator. Currently, EdenFS will throw
errors at the user when such file are encountered, let's simply warn, and
continue.
Reviewed By: chadaustin
Differential Revision: D25430523
fbshipit-source-id: 4167b4cd81380226aead8e4f4850a7738087fd95
Summary:
On OSX, if Mercurial is built from fbcode, these environment variables
(which point specifically to Eden's own par file data) can break Mercurial's
ability to load dynamic libraries. Let's unset them.
Reviewed By: xavierd
Differential Revision: D25783552
fbshipit-source-id: 74e6232d225856fedc0382abc6cd223a6c47d8bc
Summary:
HowToEven believes that both path and manifestNode might be used after being
moved and thus complains about it as that's often what is intended. However,
in C++17, this lint is spurious as both of these variables will be moved after
being copied properly in the first lambda. To silence the linter, let's just
split the combinator chain in 2.
Reviewed By: genevievehelsel
Differential Revision: D25627413
fbshipit-source-id: 1a93ca039310dfd04a3f11bd9c7de32e93057517
Summary:
The code still took a dependency on Mercurial's old manifest code to parse
manifests. It turns out the manifests have a very simple format that we could
parse directly.
This avoids various copies, conversions, std::list, removes ~1k lines of code,
at the expense of adding ~100 lines of code (some of them being C++
boilerplate).
Reviewed By: fanzeyi
Differential Revision: D25385018
fbshipit-source-id: 90d4cda2b7797584bc48c086d5592a7ecaa05dfc
Summary:
Mercurial is incompatible with TSAN at the moment, due to Rust/C++
compilation issues, Python multiprocess, and Tokio false positives. So
disable our HG tests when running under TSAN.
Reviewed By: fanzeyi
Differential Revision: D25109413
fbshipit-source-id: 86e51ebd287e10f92d6e3b8e7bf191e7946c709a
Summary:
Add a TraceBus to HgQueuedBackingStore and allow tracing import events over Thrift.
This powers a new `eden trace hg` command that allows a live view of
tree and blob imports.
Reviewed By: genevievehelsel
Differential Revision: D24732059
fbshipit-source-id: 525152fe39047160a68c1706217a06a00a6dbef1
Summary: The test doesn't compile on Windows, let's just ifdef it.
Reviewed By: genevievehelsel
Differential Revision: D25033804
fbshipit-source-id: 4f312f010f9d0db42cc9ae19df3f668e8e1c4665
Summary: This is unused, except in tests, so let's just remove it.
Reviewed By: chadaustin
Differential Revision: D24930011
fbshipit-source-id: cb132962e1dff9d12ce12e7eb75bd34a026c58b7
Summary:
This is the plumbing to allow us to skip Metadata prefetching during eden
prefetches. These can trigger a bunch of wasted network requests
when we are fetching files anyways. (These network requests are wasted since we
fetch the file contents and most of them are being throttled on sandcastle anyways.)
We won't necessarily want to skip metadata prefetching always, we will still want it
for the watchman queries, but for `eden prefetch` will probably want to skip it. This
is why we are making it an option in the GlobParams.
Reviewed By: chadaustin
Differential Revision: D24640754
fbshipit-source-id: 20db62d4c0e59fe17cb6535c86ac8f1e3877879c
Summary:
The EdenFS codebase uses folly/logging/xlog to log, but we were still relying
on glog for the various CHECK macros. Since xlog also contains equivalent CHECK
macros, let's just rely on them instead.
This is mostly codemodded + arc lint + various fixes to get it compile.
Reviewed By: chadaustin
Differential Revision: D24871174
fbshipit-source-id: 4d2a691df235d6dbd0fbd8f7c19d5a956e86b31c
Summary:
For logging and analytics, it's convenient for the
HgQueuedBackingStore to know the manifest hash and path early in the
import process. Since every object fetch requires looking up the
HgProxyHash anyway, fetch it immediately and thread it down to the
importer.
Reviewed By: kmancini
Differential Revision: D24524319
fbshipit-source-id: 0d91d55655e5ee25a010f7664e80125b7c50cf84
Summary:
Use the empty string to indicate the moved-from state, which makes
HgProxyHash moves noexcept. I plan to look up HgProxyHash early and
move it into HgImportRequest.
Reviewed By: kmancini
Differential Revision: D24522612
fbshipit-source-id: 037b4012ad6a51ad7ebd6a96de2e391cd570771b
Summary:
Stop pretending that HgBackingStore is a standalone backing store
implementation, and instead indicate it's an implementation class used
by HgQueuedBackingStore.
Reviewed By: kmancini
Differential Revision: D24514247
fbshipit-source-id: 90c3a442d01647fa6d1337cfd814f5bf4b480137
Summary:
We had some unnamed threads that made profiling performance on macOS a
bit harder. Give them a semblance of a useful name, at least.
Reviewed By: kmancini
Differential Revision: D24640223
fbshipit-source-id: 7dd74b30a081753006df681bf97ac96147b1896c
Summary:
This will enable to gather a bit more debugging regarding what processes are
fetching data. The one missing bit on Windows is to collect the process name,
for now, a "NOT IMPLEMENTED" placeholder is put in place.
Reviewed By: wez
Differential Revision: D23946258
fbshipit-source-id: 9f7642c7b9207c5b48ffff0f4eb0333af00bc7d5
Summary: Since this is only used in the manifest target, fold it into it.
Reviewed By: DurhamG
Differential Revision: D24062629
fbshipit-source-id: c3241b53bde7abba8a80a2945661d1a24b7e3034
Summary:
This is no longer about datapack, but only about parsing manifest entries, thus
renaming.
Reviewed By: DurhamG
Differential Revision: D24062634
fbshipit-source-id: 5c52b784d20437e87012dd4bc6cb13d879da9cb9
Summary:
This code is effectively unused. The only bit still relevant is that EdenFS
still depends on the Manifest class to parse a manifest.
Reviewed By: DurhamG
Differential Revision: D24037723
fbshipit-source-id: 901ae2ffc8960a95ec655a2e14d79afb8d32dcab
Summary:
It was used as a fallback path only for a while now. Since Mercurial has
support CMD_CAT_TREE for quite some time, let's get rid of it.
Reviewed By: fanzeyi
Differential Revision: D24037004
fbshipit-source-id: 69887e6d8508419a22d68d062c78676aacba3b24
Summary:
If the data isn't found in the Rust one, it can't be found in the non-Rust one.
Since the non-Rust one will issue a filesystem rescan, this is a fairly
expensive operation which shows up in strobelight when trying to walk the
entire repo with: `rg --files`.
There is one last place that still use the non-Rust stores and that's as a
fallack for when Mercurial doesn't support CMD_CAT_TREE. Since this has been
supported for a bit, I'll make a followup change to completely get rid of the
non-Rust stores.
Reviewed By: fanzeyi
Differential Revision: D24035451
fbshipit-source-id: acd9741a16f3786796d329a4cddfe4ee435bcad9
Summary:
It looks like clang in mode/win choke on the code otherwise: P143369452. It's
definitively not clear to me why. Since fmt::format should pretty much
similarly to folly::to<std::string>, just use that instead.
Reviewed By: chadaustin, fanzeyi
Differential Revision: D23884978
fbshipit-source-id: f4a7e47169c8b78011340ec0a41871c9fa4b7181
Summary: We only care about the files we need when recording prefetch profiles (since we don't want to fetch top level directories). So let's skip recording `Tree` object types.
Reviewed By: kmancini
Differential Revision: D23693533
fbshipit-source-id: 9af5437ff6571a34597425ca5f657e7126671ba9
Summary:
The ProcessNameCache code is compiled on Windows now, this definiton could
cause issues with different cpp files compiling different version of the
ProcessNameCache. To avoid this, let's remove it from Stub.h, this also removes
a bunch of #ifdef.
Reviewed By: chadaustin
Differential Revision: D23693490
fbshipit-source-id: 8f3f7b1128235b9a60f850e688b9e98910c202fc
Summary:
Currently we use a single path prefix to configure data fetch logging in eden
(i.e if the path of a file which we fetch is an extension of our configured
path, then we log that data fetch. )
There is some interest in extending this to multiple path prefixes, so that we
can log separate parts repo.
Reviewed By: StanislavGlebik
Differential Revision: D22877942
fbshipit-source-id: f6eb3dcb4fa460b4acab09677e972caf9421ddff
Summary:
In the code base, "channel" is used to denote the OS mechanism that sends
EdenFS notifications. In macOS and Linux, that's Fuse, on Windows, that's
ProjectedFS. To avoid platform specific naming in ObjectedFetchContext, let's
rename the fetch cause enum.
Reviewed By: kmancini
Differential Revision: D23462460
fbshipit-source-id: 3ac68cdf4999e6a3b4ff4ee266f94e1f9736df39
Summary:
This commit introduces a new process spawning class derived
from the ChildProcess class in the watchman codebase.
`SpawnedProcess` is similar to folly::Subprocess but is designed around the
idea that we will use a system provided spawning API to start a process, rather
than assuming the use of `fork`.
`fork` is to be avoided because it can be expensive for processes with large
address spaces and also because it interacts poorly with threads on macOS. In
particular, we see the objC runtime terminating our process in some scenarios
where fork and threads are mixed.
There are some important differences from `folly::Subprocess` and that means
that some assumptions and uses need to be altered slightly from their prior
workings. For example, detaching a SpawnedProcess moves the responsibility of
waiting on the child to a periodic task as there is no way to detach via
posix_spawn without also using fork.
On the plus side, this commit allows unifying spawning between posix and
windows systems, which simplifies the code!
Reviewed By: xavierd
Differential Revision: D23287763
fbshipit-source-id: b662af1d7eaaa9ed445c42f6c5765ae9af975eea
Summary:
As previous diffs in the stack show there were at least one place in the
codebase which used incorrect object context logger and that resulted in "blind
spots" in undesired file fetches logging i.e. undesired file fetches were
logged, but neither pid nor cmd-line was logged.
There are quite a few places in the codebase that use null
object fetch context, and threading the correct object fetch context to all of
them might be hard. Threading the context is a bit annoying, so it would be good to know something like "EdenDispatcher code is responsible for most of the blind spots, so let's thread the correct context there first". Or it would be equally good to know that none of the null object context are responsible for blind spots.
This diff might help us decide where we need to thread real object fetch context
first. Instead of passing null object fetch context let's pass null object
fetch context with causeDetail field. This field will be logged to scuba (see
BackingStoreLogger::logImport code), and instead of getting "Unknown" interface
we'll get e.g. "Unknown - EdenDispatcher::create", and that would highlight
where we need to thread the context.
A note about implementation - getNullContextWithCauseDetail returns a raw pointer
which is expected to be static i.e. it should work similarly to current
getNullContext implementation. It's quite a hack, but allows us to get rid of
memory allocations (we'd have one memory allocation per place in the code where
getNullContextWithCauseDetail). Let me know if you are ok with this hack.
Reviewed By: kmancini
Differential Revision: D23422526
fbshipit-source-id: e576bba9fc09e160fc42771c7589cdd1694d93c0
Summary:
Enabling hg dynamicconfigs in D23309090 (d643f48c8c) changed the output of `hg
manifest --debug` and broke HgImportTest. Set TESTTMP to avoid
production configs.
Reviewed By: DurhamG
Differential Revision: D23335847
fbshipit-source-id: 7ffd0394aa7a8466b266000b18f8742ed4a6b53f
Summary: Some clean up to do. `Process` will crash the entire process if `Pipe` is ever `std::nullptr`. So let's not give it a default argument `std::nullptr`.
Reviewed By: xavierd
Differential Revision: D22958765
fbshipit-source-id: 0c35e805f24a0d572bbc08efc97e59a37d0cbf88
Summary:
Having the type of data fetched can help in debugging where these fetches are
comming from. In the currently logs figuring out if a data fetch is blob or
tree requires some manual work. When looking at a big bunch of fetches this is
not super practical.
So this includes this info in our logging.
Reviewed By: chadaustin
Differential Revision: D23243444
fbshipit-source-id: 9abe5180c5d2afc0d02b27ba6a6b76401e86556e
Summary:
Previously pieces of the command line for a process were seperated by `\0`.
This makes them a bit hard to read and also makes running queries on them
harder. Converts these `\0` back to spaces to fix this.
see https://fb.workplace.com/groups/edenfs/permalink/1446711485499079/ for
more motivation.
Reviewed By: wez
Differential Revision: D23266909
fbshipit-source-id: e4a9284e04039fcd971bed0d6e21d220e946acdb
Summary: Previously we use HgImporter prefetch request in `prefetchBlobs()`, but using `getBlob()` can give us more control over the prefetch process later. So now `getBlob()` is used in `prefetchBlobs()` when `useEdenNativePrefetch` is configured as true.
Reviewed By: kmancini
Differential Revision: D22984848
fbshipit-source-id: 0bd0b1c5b50bb16da36f188915904d0223827dc3
Summary:
With Mercurial now supporting CMD_CAT_TREE for efficiently fetching and reading
trees, we can plumb this onto EdenFS. At startup time, we detect whether
Mercurial supports CMD_CAT_TREE and use that method, otherwise, we fallback to
the old CMD_FETCH_TREE.
Reviewed By: wez
Differential Revision: D23044953
fbshipit-source-id: 9aea5c5b82e97039a75ef18976a155dcb6e150bc
Summary:
Previously we fetched metadata by commit hash and path. We knew this would be a
little extra expensive, but turns out this is a lot extra expensive.
Wait why is it expensive?
In short: lots of extra lookups that are not satisfied by cache :(
In long:
1. Each piece of the path would require a read to fetch the fsnode for that tree.
So this means asking for the metadata of a/b/c/d/e means 5 reads.
2. Normally these reads could be cached, but often we would make these requests
with a commit hash for a draft commit. On the server side this info is not
cached for a draft commit, this means a lot of database reads and recalculating.
(Most of the real uses of metadata prefetching is when an engineer is working
on a local commit. We just use the commit hash of the commit the user was on
when fetching metadata for a tree, even if that tree hasn't changed since a public
commit. so this means lots of requests with draft commit hashes).
Fetching by manifest id we are able to bypass this sequential path look up.
(and even if we are on a draft commit, if the tree has not locally changed
since a public commit, the manifest id will be the same as the public commit
avoiding this whole draft commit issue).
This allows us to query scs with a manifest id for a tree.
Reviewed By: wez
Differential Revision: D22990687
fbshipit-source-id: aa81d67de1f1d04a14d174774ee216f5ac6be5ba
Summary:
This makes it similar to the Unix one, which reduces the ifdef a tiny bit.
Ideally I'd want to move the pipe handling into its own class so callers won't
have to care about windows/linux specificities.
Reviewed By: fanzeyi
Differential Revision: D22954056
fbshipit-source-id: c92a25b6abe084a7c7496c0d6e07795779e0abad
Summary:
after `startRecordingFetch()` is called by `HgQueuedBackingStore`, record the path for each fetched file. When `stopRecordingFetch()` is called, stop recording and return the collected file paths.
`startRecordingFetch()` and `stopRecordingFetch()` will be used in the next diff.
Reviewed By: chadaustin
Differential Revision: D22797037
fbshipit-source-id: a1fe30424d3c2884ffe139a0062b1e36328fd4fe
Summary:
Previously we log a process to Scuba when it does 2000 (fetchThreshold_) fetchs, but then in Scuba all processes have fetch_count = 2000. In order to see how many fetches a process really did approximately, we log the same process to Scuba every time it does 2000 more fetches.
Note: this change could make the total count of fetch-heavy events in Scuba inaccurate, as we log the same process more than once. So when users want to see how many fetch-heavy events happened, instead of setting "type = fetch_heavy", they should set exactly "fetch_count = 2000".
Reviewed By: chadaustin
Differential Revision: D22867679
fbshipit-source-id: ae3c768a8d3b03628db6a77263e715303a814e3d