sapling/eden/cli
Michael Bolin 2c46c59ad6 Write the PID to the lockfile and update eden health to use it.
Summary:
We have encountered cases where `eden health` reported
`"edenfs not healthy: edenfs not running"` even though the `edenfs` process is
still running. Because the existing implementation of `eden health` bases its
health check on the output of a `getStatus()` Thrift call, it will erroneously
report `"edenfs not running"` even if Eden is running but its Thrift server is
not running. This type of false negative could occur if `edenfs` has shut down
the Thrift server but not the rest of the process (quite possibly because its
shutdown is blocked on calls to `umount2()`).

This is further problematic because `eden daemon` checks `eden health`
before attempting to start the daemon. If it gets a false negative, then
`eden daemon` will forge ahead, trying to launch a new instance of the daemon,
but it will fail with a nasty error like the following:

```
I1017 11:59:25.188414 3064499 main.cpp:81] Starting edenfs.  UID=5256, GID=100, PID=3064499
terminate called after throwing an instance of 'std::runtime_error'
  what():  another instance of Eden appears to be running for /home/mbolin/local/.eden
*** Aborted at 1508266765 (Unix time, try 'date -d 1508266765') ***
*** Signal 6 (SIGABRT) (0x1488002ec2b3) received by PID 3064499 (pthread TID 0x7fd0d3787d40) (linux TID 3064499) (maybe from PID 3064499, UID 5256), stack trace: ***
    @ 000000000290d3cd folly::symbolizer::(anonymous namespace)::signalHandler(int, siginfo_t*, void*)
    @ 00007fd0d133cacf (unknown)
    @ 00007fd0d093e7c8 __GI_raise
    @ 00007fd0d0940590 __GI_abort
    @ 00007fd0d1dfeecc __gnu_cxx::__verbose_terminate_handler()
    @ 00007fd0d1dfcdc5 __cxxabiv1::__terminate(void (*)())
    @ 00007fd0d1dfce10 std::terminate()
    @ 00007fd0d1dfd090 __cxa_throw
    @ 00000000015fe8ca facebook::eden::EdenServer::acquireEdenLock()
    @ 000000000160f27b facebook::eden::EdenServer::prepare()
    @ 00000000016107d5 facebook::eden::EdenServer::run()
    @ 000000000042c4ee main
    @ 00007fd0d0929857 __libc_start_main
    @ 0000000000548ad8 _start
Aborted
```

With more accurate health information available, a user who runs `eden daemon`
while the daemon is already running will get a more polite error like the
following:

```
error: edenfs is already running (pid 274205)
```
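
The pre-launch check described above can be sketched roughly as follows. This is an illustration only, not the code in this revision: the function name `start_error` is hypothetical, and it assumes that any status other than `DEAD` should block a new launch.

```python
from typing import Optional


def start_error(status: str, pid: Optional[int]) -> Optional[str]:
    """Return the message to print instead of launching, or None if it is safe to start.

    Hypothetical helper: `status` is the string reported by the health check.
    """
    # Assumption: only DEAD means that no edenfs process exists.
    if status == "DEAD":
        return None
    if pid is None:
        return "error: edenfs is already running"
    return f"error: edenfs is already running (pid {pid})"
```

With a check like this, a `STOPPED` result (process alive, Thrift server down) stops the launcher before it reaches the lock acquisition that produced the abort shown earlier.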

This revision addresses the issue by writing the PID of `edenfs` to the
lockfile. It also updates the implementation of `eden health` to use the PID in
the lockfile to assess the health of Eden if the call to `getStatus()` fails. It
does this by running:

```
ps -p PID -o comm=
```

and applying some heuristics on the output to assess whether the command
associated with that process is the `edenfs` command. If it is, then
`eden health` reports the status as `STOPPED` whereas previously it would report
it as `DEAD`.
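
As a rough illustration of this fallback, the sketch below reads a PID from the lockfile contents, runs the `ps` command shown above, and maps the result to `STOPPED` or `DEAD`. The function names and the exact matching rule are assumptions made for the example; the actual heuristics in `config.py` may differ.

```python
import subprocess
from typing import Optional


def parse_lockfile_pid(contents: str) -> Optional[int]:
    """Return the PID stored in the lockfile text, or None if it is not usable."""
    try:
        pid = int(contents.strip())
    except ValueError:
        return None
    return pid if pid > 0 else None


def looks_like_edenfs(comm: str) -> bool:
    """Heuristic (an assumption): accept a bare `edenfs` or a path ending in it."""
    name = comm.strip()
    return name == "edenfs" or name.endswith("/edenfs")


def status_from_lockfile(contents: str) -> str:
    """Fallback used only after the getStatus() Thrift call has failed."""
    pid = parse_lockfile_pid(contents)
    if pid is None:
        return "DEAD"
    try:
        out = subprocess.check_output(
            ["ps", "-p", str(pid), "-o", "comm="],
            stderr=subprocess.DEVNULL,
        )
    except (subprocess.CalledProcessError, OSError):
        # ps exits non-zero when no such process exists.
        return "DEAD"
    if looks_like_edenfs(out.decode(errors="replace")):
        return "STOPPED"
    return "DEAD"
```

Checking the command name guards against the case where the PID in a stale lockfile has since been reused by an unrelated process.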

Reviewed By: wez

Differential Revision: D6086473

fbshipit-source-id: 825421a6818b56ddd7deea257a92c070c2232bdd
2017-10-18 11:29:43 -07:00
| Name | Last commit | Date |
| --- | --- | --- |
| test | Diagnostic tool to report Stat information of EdenFs | 2017-08-25 12:49:35 -07:00 |
| cmd_util.py | Use --home-dir to compute the config dir, if specified. | 2017-10-06 12:28:41 -07:00 |
| config.py | Write the PID to the lockfile and update eden health to use it. | 2017-10-18 11:29:43 -07:00 |
| configinterpolator.py | move eden/fs/cli to eden/cli | 2017-04-14 11:39:01 -07:00 |
| debug.py | add a debug CLI command to set a log category's level | 2017-10-16 16:37:10 -07:00 |
| main.py | eden shutdown now sends SIGKILL if the shutdown() Thrift request fails to terminate Eden. | 2017-10-18 11:29:43 -07:00 |
| rage.py | Make the destination of eden rage output configurable in ~/.edenrc. | 2017-10-16 11:56:23 -07:00 |
| stats_print.py | Diagnostic tool to report Stat information of EdenFs | 2017-08-25 12:49:35 -07:00 |
| stats.py | Diagnostic tool to report Stat information of EdenFs | 2017-08-25 12:49:35 -07:00 |
| TARGETS | Diagnostic tool to report Stat information of EdenFs | 2017-08-25 12:49:35 -07:00 |
| util.py | move eden/fs/cli to eden/cli | 2017-04-14 11:39:01 -07:00 |