Summary:
D5067763 introduced a potential deadlock. The issue is that all of the FUSE threads were blocked
on the HgImporter thread pool, which completes its futures back on serverEventThread_. The
FUSE threads were blocked on Future::get() in ensureDataLoaded().
Eventually, the right fix is some combination of eliminating ensureDataLoaded() and
replacing it with an explicitly-asynchronous API.
Reviewed By: bolinfest
Differential Revision: D6212858
fbshipit-source-id: 42b17d3e20a200f26b87588784edb5ee51e96a4a
Summary:
The result of this call was never used, and the call does not appear to have
any side-effects.
Reviewed By: chadaustin
Differential Revision: D6212076
fbshipit-source-id: e87858b75d9c0482e5f544411a85ebf1a66d63ba
Summary:
For a while, the recommended approach (e.g. from folks on the thrift team, from folks on the javafoundations team) has been to use `languages = ["java-swift"]`. This renaming brings the code in line with the recommendation. There are no actual changes to any builds, this is just a rename.
This is the second diff in a small stack. It's a fully automated change that uses buildozer to rename `*-java` rules to `*-javadeprecated`.
Differential Revision: D6189950
fbshipit-source-id: a08ba86cb92293a07eceb4b6b29385c7bf647b72
Summary:
For a while, the recommended approach (e.g. from folks on the thrift team, from
folks on the javafoundations team) has been to use `languages = ["java-swift"]`.
This renaming brings the code in line with the recommendation. There are no
actual changes to any builds, this is just a rename.
This is the first diff in a small stack. It edits the `JavaThriftConverter` to
use the `-javadeprecated` naming, but with a `java_library` shim that retains
the `-java` naming temporarily (to ease the transition). There's also a little
shim when processing the `languages` parameter, to accept either `java` or
`javadeprecated`.
Reviewed By: yfeldblum
Differential Revision: D6204449
fbshipit-source-id: 3e7f3ca7a4d5e1976176479ff4298da72af9e685
Summary:
There was a lock ordering violation in the combination of StreamingSubscriber and Journal. I changed StreamingSubscriber to always acquire the Journal lock before StreamingSubscriber's State lock.
I also simplified StreamingSubscriber's API to a single static function to isolate all of the lifetime and thread safety concerns in one implementation file.
This fixes a regression caused by D6162717
Reviewed By: simpkins
Differential Revision: D6202770
fbshipit-source-id: 326269db15bf3200bd6321edf372daf286784fb5
Summary:
Using this option prints the rage report to stdout rather than to the default
reporter. Sometimes it's easier to get the report on stdout so you can pipe it
to other commands.
Reviewed By: simpkins
Differential Revision: D6195684
fbshipit-source-id: 59e1ae6b19b951eac67377cdd5f8c0560e0d181a
Summary:
Relying on the toString of an Exception in Python seems a little gross,
especially when the `stderr` field is available directly. Cleaned up two
instances of this so it doesn't get copypasta'd further.
Reviewed By: simpkins
Differential Revision: D6195633
fbshipit-source-id: 9ae77796c287a454cb169ebf6de2953909a1e6c3
Summary:
This was an error that an end-user ran into. Previously, we did not fail
gracefully and the user was faced with an intimidating stacktrace.
Reviewed By: simpkins
Differential Revision: D6195529
fbshipit-source-id: bde3c2a3e6f49457a4c6ac5c87103cf52cd227c2
Summary:
In previous discussions, it has been pointed out that `folly::RequestContext::setContext` consumes considerable amount of CPU cycles in production environments. After some investigation, we thought that might be caused by looping over all `RequestData` instances can calling the virtual `onSet` and `onUnset` callback of them. Both the iteration and invoking virtual methods are not cheap.
As you can see from this change, most of the derived classes of `RequestData` don't override the `onSet` and `onUnset` methods. Mostly likely they are only used for per-Request tracking. So the natural idea is to skip those instances when iterating and avoid the unnecessary virtual method invoke.
I have explored the solution to dynamically examine if the `RequestData` instance added has `onSet` and `onUnset` method overridden. That is possible with GCC's PMF extension, but not [[ http://lists.llvm.org/pipermail/llvm-bugs/2015-July/041164.html | for Clang ]] and probably many other compilers. This definitely won't be very good for `folly`'s probability, even if we gate it by compiler flags.
Therefore, this Diff adds the `hasCallback` method to `RequestData` class indicating whether the instance would have `onSet` and `onUnset` overridden. To make it clear to users that they need to correctly override it in order for their `onSet` and `onUnset` callback to work, making it abstract so that user must override it to something and would aware of that.
Also made some improvements on documentation.
Reviewed By: myreg
Differential Revision: D6144049
fbshipit-source-id: 4c9fd72e9efaeb6763d55f63760eaf582ee4839e
Summary:
I ran into this issue while manually testing Eden.
Currently, this integration test fails, so it is tagged with `unittest.skip`.
There are substantial changes to our distate logic coming in D6179950, so I
will attempt to make the test pass as part of that revision.
Reviewed By: simpkins
Differential Revision: D6199789
fbshipit-source-id: cd7ce48b72bf0b54e13547b23823f4d496fa5b0b
Summary:
Upstream, some new merge actions were added:
* `p` https://phab.mercurial-scm.org/D776
* `pr` https://phab.mercurial-scm.org/D777
We must include entries for these in the list of `actions` that we build up in
`eden/hg/eden/__init__.py` because the `actions` dict gets passed through to
Mercurial's own `applyupdates()` function in `merge.py` that contains this line:
```
for f, args, msg in actions['p']:
```
Therefore, without an entry for `p` in `actions` here, we get a `KeyError`.
Reviewed By: markbt
Differential Revision: D6199215
fbshipit-source-id: a7408e5ef84a659f37e7771a7c15f6a4b14ae0f9
Summary:
In practice, if the `hg graft` succeeds in a weird way, `assert_status_empty()`
tells a lot more about what went wrong than the number of commits not matching up.
While here, I also added the following entry to the default `.hgrc` used in integration tests:
```
[ui]
origbackuppath=.hg/origbackups
```
I needed this for the change to `graft_test.py`. As we were already setting this option in
the `histedit_command.py` utility as a one-off and this is the default value of this setting
for our internal Mercurial use at Facebook, it seemed best to make it the default for all
of our integration tests. As such, I removed the one-off setting in `histedit_command.py`.
Reviewed By: simpkins
Differential Revision: D6180342
fbshipit-source-id: 6f0487624a1824459403126997ea52d1a7921feb
Summary:
Previously we flushed the pending transaction data in
eden_dirstate.setparents(). However, some dirstate code paths (particularly
dirstate.rebuild()) can directly call eden_dirstate_map.setparents().
We need to make sure the transaction data is flushed in this case.
Reviewed By: bolinfest
Differential Revision: D6175410
fbshipit-source-id: 256cb07f57ada02d6c1f118ec5075fb8ac93506c
Summary:
Now that we make use of the `.hg/dirstate` file, we should ensure that it is
initialized correctly. This takes the commit hash from the `SNAPSHOT` file and
passes it as an argument to the `post-clone` hook. In the Hg `post-clone` hook,
it writes the `.hg/dirstate` file such that the first 20 bytes are the binary
version of the commit hash and the next 20 bytes are nul bytes.
Reviewed By: simpkins
Differential Revision: D6174440
fbshipit-source-id: 29d17f88ca3f63fa6490e1db88bc2121aced74d6
Summary:
We have a couple of issues with the watchman/eden integration:
1. If you "win" the race, you can cause a segfault during shutdown
by changing files during unmount. This causes the journal updating
code to trigger a send to the client, but the associated eventBase
has already been destroyed.
2. We don't proactively send any signal to the subscriber (in practice: watchman)
when we unmount. Watchman detects the eden shutdown by noticing that its
socket has closed but has no way to detect an unmount.
This diff tries to connect the unmount with the set of subscribers and tries
to cause the thrift socket to close out.
Reviewed By: bolinfest
Differential Revision: D6162717
fbshipit-source-id: 42d4a005089cd9cddf204997b1643570488f04c3
Summary: The last missing request. It actually sends two requests - one to resolve the root tree manifest id, another to get Tree by tree manifest id
Reviewed By: simpkins
Differential Revision: D6099727
fbshipit-source-id: eec27441d19424de4170bc00aa45e81627f9d4ff
Summary:
Previous to this change, the user got an inscrutable error message. It turns out
that it is easy to make this mistake, typing `eden clone fbsource/` instead of
`eden clone fbsource` if you accidentally use tab completion.
Reviewed By: simpkins
Differential Revision: D6153889
fbshipit-source-id: 3642fdd207d6abf896d6a12891d5eb68ad984acc
Summary:
Previously, the `savebackup()` and `restorebackup()` methods in `eden_dirstate`
only retained the parent commit hashes. With this change, now the dirstate tuples
and entries in the copymap for the dirstate are also included as part of the saved
state.
Failing to restore all of the state caused issues when doing things like aborting
an `hg split`, as observed by one of our users. Although this fix works, we ultimately
plan to move the responsibility for persisting dirstate data out of Eden and into the
Hg extension. Then the data will live in `.hg/dirstate` like it would for the default
dirstate implementation.
Reviewed By: simpkins
Differential Revision: D6145420
fbshipit-source-id: baa077dee73847a47cc171cd980cdd272b3a3a99
Summary:
Add an `eden debug hg_dirstate` command to dump the contents of the Hg dirstate.
This data is stored in a binary format, so we need a custom command to view it
easily.
Reviewed By: simpkins
Differential Revision: D6139172
fbshipit-source-id: 622c0b7bcaa471a88483c6c4ddef7e0be95a3dfa
Summary:
When performing an source control checkout operation, we attempt to remove
directories that are empty after the checkout. However, this code path was
missing a call to flush the kernel cache for these directories.
As a result, even though eden thought the directory not longer existed, and
would not report it in `readdir()` results, the kernel would return stale
information from its cache when explicitly accessing this path.
Reviewed By: bolinfest
Differential Revision: D6151543
fbshipit-source-id: 6031feb268ff6f980c885efb26c3d43243dec3f4
Summary:
Add a few extra debug logs to record `processCheckoutEntry()` and
`saveOverlayPostCheckout()` calls.
Reviewed By: bolinfest
Differential Revision: D6151544
fbshipit-source-id: ca6faa8fd1fe53df1e70305f5527360c918841d1
Summary: This diff helps some common pitfalls when using set_log_level.
Reviewed By: simpkins
Differential Revision: D6142849
fbshipit-source-id: 7fa35392dda148af90d0aefdb872b6e8a8b770db
Summary:
Thess flags are not used (neither in code, nor passed in on command line). List of flags that are getting killed:
thriftMaxNumMessagesInQueue
max_queue_message_num
thrift_max_unprocessed_message
thrift_set_max_num_messages_in_queue
decorator_thriftSetMaxNumMessagesInQueue
atlas_pinger_imps_thriftSetMaxNumMessagesInQueue
tesseract_ocr_max_queue_msgs_per_thread
max_msgs_in_queue
max_num_messages_in_queue
maxNumMessagesInQueue
thrift_max_messages
thrift_queue_len
max_accepts_in_pipe
max_pending_connections_per_io_thread
thriftMaxNumPendingConnectionsPerWorker
lookup_thriftSetMaxNumMessagesInQueue
thriftSetMaxNumPendingConnectionsPerWorker
max_queue_msgs_per_thread
ds_thrift_max_num_messages_in_queue
zdb_thrift_max_num_messages_in_queue
update_control_proxy_queue_length
safety_enforcer_proxy_queue_length
maxRequestsInQueue
max_num_messages_in_pipe
Reviewed By: yfeldblum
Differential Revision: D6143500
fbshipit-source-id: 541507d50100817590b91cdd48e39a29e7c465ea
Summary:
Have HgBackingStore hold multiple HgImporters in its own thread pool. Incoming
requests are processed by threads in the thread pool.
Reviewed By: wez
Differential Revision: D5067763
fbshipit-source-id: d666b1026455d13442367673010b5934ff4cdb48
Summary:
While debugging SRProxy issue I found a bug in ThriftServer initialization (more details in D6115379), due to that bug we always have accept queue of 1k and the value passed to `setMaxNumPendingConnectionsPerWorker` is ignored, thus making all call sites bogus.
We have ~350 calls to this function in fbcode, all of them pass random numbers such as 50, 100, 50k, etc. I’m removing all of them before landing the fix for Thrift to avoid causing production issues.
Unfortunately almost every such caller has a gflag for it (that are usually have confusing names, e.g. FLAGS_thriftSetMaxNumMessagesInQueue). I’m not touching these gflags, because they may be used in different configs and removing them would cause breakages.
Reviewed By: yfeldblum
Differential Revision: D6129569
fbshipit-source-id: 1550b5073bac42d0c6fb7bdffa1835bf523609c8
Summary: looks like we missed ms2316's changes from august
Reviewed By: simpkins
Differential Revision: D6113294
fbshipit-source-id: 5dd43bd554f581397af3eb9853fa965e6127f429
Summary: this is all non-hphp includes that are going in container/
Reviewed By: mzlee, yfeldblum
Differential Revision: D6121745
fbshipit-source-id: b024bde8835fc7f332686793d75eb8e71591c912
Summary:
CodeMod: Replace `wangle/concurrency` with `folly/executors`.
The headers in `wangle/concurrency/` are now but shims to equivalent headers in `folly/executors/`.
Reviewed By: jsedgwick
Differential Revision: D6120852
fbshipit-source-id: 358ceabea7ad79f84b803ed8e3aecb2a57fdd077
Summary:
Everything that's going in system/ besides CpuId and Subprocess,
which are included in hphp
Reviewed By: mzlee
Differential Revision: D6102263
fbshipit-source-id: 564ef584c341a4ac79db14a9d58fe23ce51e78b3
Summary:
During the initial phases of a large Buck build, we expect to get a lot of
`glob()` calls during the parse phase (assuming the Buck project is configured
with `glob_handler = watchman`) and a lot of `getSHA1()` calls during the
building phase. Being able to dump this to the log file will make this easier to
audit.
Reviewed By: chadaustin
Differential Revision: D6097764
fbshipit-source-id: 4cb1bb4f6b21830c4d830fcdcf87023eab859f57
Summary:
[Wangle] Dead Code: `wangle/concurrent/SerialExecutor.h`.
It was just a shim around `folly/executors/SerialExecutor.h` but the shim is no longer required.
Reviewed By: Orvid
Differential Revision: D6075932
fbshipit-source-id: a5df32c7a27f8c60427120325016890544ffceba
Summary:
This is needed to unblock some other work on folly.
```
foundation/dependency_management/ensure-explicit-dependencies.sh -e '("boost", None, "boost_filesystem")' boost/filesystem.hpp"
```
Reviewed By: yfeldblum
Differential Revision: D6094896
fbshipit-source-id: 3d4f7b42fa54514cf8c055f5fc477c3a2463bb0a
Summary:
I changed this around at the last minute before landing D5896494 and
it seemed to work, but the optimized build is pretty consistent at falling over
at runtime with a zero page non-null address.
Reviewed By: bolinfest
Differential Revision: D6092224
fbshipit-source-id: 93448615ac3bbea30bfe7729085808d9926f7dab
Summary:
We have encountered cases where `eden health` reported
`"edenfs not healthy: edenfs not running"` even though the `edenfs` process is
still running. Because the existing implementation of `eden health` bases its
health check on the output of a `getStatus()` Thrift call, it will erroneously
report `"edenfs not running"` even if Eden is running but its Thrift server is
not running. This type of false negative could occur if `edenfs` has shutdown
the Thrift server, but not the rest of the process (quite possibly, its
shutdown is blocked on calls to `umount2()`).
This is further problematic because `eden daemon` checks `eden health`
before attempting to start the daemon. If it gets a false negative, then
`eden daemon` will forge ahead, trying to launch a new instance of the daemon,
but it will fail with a nasty error like the following:
```
I1017 11:59:25.188414 3064499 main.cpp:81] Starting edenfs. UID=5256, GID=100, PID=3064499
terminate called after throwing an instance of 'std::runtime_error'
what(): another instance of Eden appears to be running for /home/mbolin/local/.eden
*** Aborted at 1508266765 (Unix time, try 'date -d 1508266765') ***
*** Signal 6 (SIGABRT) (0x1488002ec2b3) received by PID 3064499 (pthread TID 0x7fd0d3787d40) (linux TID 3064499) (maybe from PID 30644
99, UID 5256), stack trace: ***
@ 000000000290d3cd folly::symbolizer::(anonymous namespace)::signalHandler(int, siginfo_t*, void*)
@ 00007fd0d133cacf (unknown)
@ 00007fd0d093e7c8 __GI_raise
@ 00007fd0d0940590 __GI_abort
@ 00007fd0d1dfeecc __gnu_cxx::__verbose_terminate_handler()
@ 00007fd0d1dfcdc5 __cxxabiv1::__terminate(void (*)())
@ 00007fd0d1dfce10 std::terminate()
@ 00007fd0d1dfd090 __cxa_throw
@ 00000000015fe8ca facebook::eden::EdenServer::acquireEdenLock()
@ 000000000160f27b facebook::eden::EdenServer::prepare()
@ 00000000016107d5 facebook::eden::EdenServer::run()
@ 000000000042c4ee main
@ 00007fd0d0929857 __libc_start_main
@ 0000000000548ad8 _start
Aborted
```
By providing more accurate information to `eden daemon`, if the user tries to
run it while the daemon is already running, they will get a more polite error
like the following:
```
error: edenfs is already running (pid 274205)
```
This revision addresses this issue by writing the PID of `edenfs` in the
lockfile. It updated the implementation of `eden health` to use the PID in the
lockfile to assess the health of Eden if the call to `getStatus()` fails. It
does this by running:
```
ps -p PID -o comm=
```
and applying some heuristics on the output to assess whether the command
associated with that process is the `edenfs` command. If it is, then
`eden health` reports the status as `STOPPED` whereas previously it would report
it as `DEAD`.
Reviewed By: wez
Differential Revision: D6086473
fbshipit-source-id: 825421a6818b56ddd7deea257a92c070c2232bdd
Summary:
This should help ensure that `eden shutdown` exits cleanly without resorting to
the use of `SIGKILL`. See D6080929 for context.
Note that D5367407 changed things so that we used `MNT_DETACH` by default, but
we've seen cases where `eden shutdown` fails to terminate within the default
timeout, so we're taking a more aggressive approach by switching to `MNT_FORCE`.
Reviewed By: wez
Differential Revision: D6085508
fbshipit-source-id: 807e4927eec640aed0c3f7d3d0d3119023ac7dd0
Summary:
In practice, we have seen cases where `eden shutdown` does not succeed within
15 seconds (the default timeout), and then we see the following message printed
to the console:
error: sent shutdown request, but edenfs did not exit within 15.0 seconds
Sometimes we end up in a state where the Thrift server has successfully shutdown
but the `edenfs` process is still running. Because `eden health` is implemented
using a Thrift call, it reports that `edenfs` is not running even though it
still is. This is confusing because a subsequent `eden daemon` request will
likely fail with something like:
```
terminate called after throwing an instance of 'std::runtime_error'
what(): another instance of Eden appears to be running for /home/mbolin/local/.eden
```
To address this, we make `eden shutdown` more aggressive in killing `edenfs`
by sending `SIGKILL`.
Follow-up work:
- Make Eden's shutdown process more robust so that `SIGKILL` is rarely, if ever,
necessary.
- Store the PID of the `edenfs` process in the lock file so that Eden does not
rely upon the Thrift server to be running in order to check whether the
`edenfs` process is running.
Reviewed By: wez
Differential Revision: D6080929
fbshipit-source-id: 462a6ec506cdb6a51211d2c64b777eadbcee9fc1
Summary:
Our existing values for these resources appear to be too conservative for
massively parallel builds. Although we expected Eden to behave more like a small
application, there are times when it has to be able to respond more like a
well-provisioned service (particularly during the parse phase of a large Buck
build).
Reviewed By: simpkins
Differential Revision: D6075340
fbshipit-source-id: 7c26e9b0f785358968430527115c63c6d8cdedc8
Summary: Request json from mononoke server and parses it.
Reviewed By: wez
Differential Revision: D5974634
fbshipit-source-id: 92095e0440d2bb018a7791e173d69542906a7e36
Summary:
First stab at implementing client for mononoke backing store.
Currently eden uses hg_importer to get hg data, and it can be slow. Mononoke server is going to be faster. Also with Mononoke server we will be able to replace mercurial sha-1 hashes with sha-256.
This backing store uses proxygen HTTP client to connect to mononoke server.
For now only getblob request is supported, and this request is synchronous. It will be made asynchronous in the later diffs.
Reviewed By: wez
Differential Revision: D5932192
fbshipit-source-id: a5464fd5b0e176cfb2da3ef4e348a62d1c06f51f
Summary:
Since adding prefetch support, we're triggering this code
each time we perform a lookup operation on a dir which means that
the eden logs are made a bit more noisy with this output than
previously.
So, let's remove the warning.
Reviewed By: simpkins
Differential Revision: D5905454
fbshipit-source-id: 0abc9b42284a5224cef908293ab377d8858977ec
Summary:
We were previously generating a simple JournalDelta consisting of
just the from/to snapshot hashes. This is great from a `!O(repo)` perspective
when recording what changed but makes it difficult for clients downstream
to reason about changes that are not tracked in source control.
This diff adds a concept of `uncleanPaths` to the journal; these are paths
that we think are/were different from the hashes in the journal entry.
Since JournalDelta needs to be able to be merged I've opted for a simple
list of the paths that have a differing status; I'm not including all of
the various dirstate states for this because it is not obvious how to
reconcile the state across successive snapshot change events.
The `uncleanPaths` set is populated with an initial set of different paths as
the first part of the checkout call (prior to changing the hash), and then is
updated after the hash has changed to capture any additional differences.
Care needs to be taken to avoid recursively attempting to grab the parents lock
so I'm replicating just a little bit of the state management glue in the
`performDiff` method.
The Journal was not setting the from/to snapshot hashes when merging deltas.
This manifested in the watchman integration tests; we'd see the null revision
as the `from` and the `to` revision held the `from` revision(!).
On the watchman side we need to ask source control to expand the list of
files that changed when the from/to hashes are different; I've added code
to handle this. This doesn't do anything smart in the case that the
source control aware queries are in use. We'll look at that in a following
diff as it isn't strictly eden specific.
`watchman clock` was returning a basically empty clock unconditionally,
which meant that most since queries would report everything since the start
of time. This is most likely contributing to poor Buck performance, although
I have not investigated the performance aspect of this. It manifested itself
in the watchman integration tests.
Reviewed By: simpkins
Differential Revision: D5896494
fbshipit-source-id: a88be6448862781a1d8f5e15285ca07b4240593a