mirror of
https://github.com/facebook/sapling.git
synced 2024-10-11 09:17:30 +03:00
2c46c59ad6
Summary:
We have encountered cases where `eden health` reported `"edenfs not healthy: edenfs not running"` even though the `edenfs` process was still running.

Because the existing implementation of `eden health` bases its health check on the output of a `getStatus()` Thrift call, it erroneously reports `"edenfs not running"` when Eden is running but its Thrift server is not. This type of false negative can occur if `edenfs` has shut down the Thrift server but not the rest of the process (quite possibly because its shutdown is blocked on calls to `umount2()`).

This is further problematic because `eden daemon` checks `eden health` before attempting to start the daemon. If it gets a false negative, `eden daemon` forges ahead and tries to launch a new instance of the daemon, which fails with a nasty error like the following:

```
I1017 11:59:25.188414 3064499 main.cpp:81] Starting edenfs. UID=5256, GID=100, PID=3064499
terminate called after throwing an instance of 'std::runtime_error'
  what(): another instance of Eden appears to be running for /home/mbolin/local/.eden
*** Aborted at 1508266765 (Unix time, try 'date -d 1508266765') ***
*** Signal 6 (SIGABRT) (0x1488002ec2b3) received by PID 3064499 (pthread TID 0x7fd0d3787d40) (linux TID 3064499) (maybe from PID 3064499, UID 5256), stack trace: ***
    @ 000000000290d3cd folly::symbolizer::(anonymous namespace)::signalHandler(int, siginfo_t*, void*)
    @ 00007fd0d133cacf (unknown)
    @ 00007fd0d093e7c8 __GI_raise
    @ 00007fd0d0940590 __GI_abort
    @ 00007fd0d1dfeecc __gnu_cxx::__verbose_terminate_handler()
    @ 00007fd0d1dfcdc5 __cxxabiv1::__terminate(void (*)())
    @ 00007fd0d1dfce10 std::terminate()
    @ 00007fd0d1dfd090 __cxa_throw
    @ 00000000015fe8ca facebook::eden::EdenServer::acquireEdenLock()
    @ 000000000160f27b facebook::eden::EdenServer::prepare()
    @ 00000000016107d5 facebook::eden::EdenServer::run()
    @ 000000000042c4ee main
    @ 00007fd0d0929857 __libc_start_main
    @ 0000000000548ad8 _start
Aborted
```

With more accurate information available to `eden daemon`, a user who tries to run it while the daemon is already running gets a more polite error like the following:

```
error: edenfs is already running (pid 274205)
```

This revision addresses the issue by writing the PID of `edenfs` in the lockfile. It also updates the implementation of `eden health` to use the PID in the lockfile to assess the health of Eden if the call to `getStatus()` fails. It does this by running:

```
ps -p PID -o comm=
```

and applying some heuristics to the output to assess whether the command associated with that process is the `edenfs` command. If it is, `eden health` reports the status as `STOPPED`, whereas previously it would have reported it as `DEAD`.

Reviewed By: wez

Differential Revision: D6086473

fbshipit-source-id: 825421a6818b56ddd7deea257a92c070c2232bdd
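The fallback described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code from this revision: the function names (`read_pid`, `ps_comm`, `looks_like_edenfs`, `fallback_status`) are hypothetical, the lockfile is assumed to contain only the PID as decimal text, and the basename comparison stands in for the unspecified "heuristics" the commit message mentions.

```python
import os
import subprocess
from typing import Callable, Optional


def read_pid(lockfile_path: str) -> Optional[int]:
    """Return the PID stored in the lockfile, or None if missing or malformed.

    Assumption: the lockfile holds just the PID as decimal text.
    """
    try:
        with open(lockfile_path, "r") as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def ps_comm(pid: int) -> str:
    """Run `ps -p PID -o comm=` and return its stripped output ('' on failure)."""
    try:
        out = subprocess.check_output(["ps", "-p", str(pid), "-o", "comm="])
    except (subprocess.CalledProcessError, OSError):
        # ps exits non-zero when no process with that PID exists.
        return ""
    return out.decode("utf-8", errors="replace").strip()


def looks_like_edenfs(comm: str) -> bool:
    """Hypothetical heuristic: accept 'edenfs' or any path ending in '/edenfs'."""
    return os.path.basename(comm.strip()) == "edenfs"


def fallback_status(
    lockfile_path: str, get_comm: Callable[[int], str] = ps_comm
) -> str:
    """Status to report once the getStatus() Thrift call has already failed."""
    pid = read_pid(lockfile_path)
    if pid is None:
        return "DEAD"
    # The process exists and looks like edenfs, but its Thrift server is down.
    return "STOPPED" if looks_like_edenfs(get_comm(pid)) else "DEAD"
```

Passing `get_comm` as a parameter keeps the `ps` invocation separate from the decision logic, so the `STOPPED` versus `DEAD` distinction can be exercised without a running daemon.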
| Name |
|---|
| test |
| cmd_util.py |
| config.py |
| configinterpolator.py |
| debug.py |
| main.py |
| rage.py |
| stats_print.py |
| stats.py |
| TARGETS |
| util.py |