Summary:
* This adds an `EdenServer::recover()` method to start back up after an unsuccessful takeover data send.
* On an unsuccessful ping, fulfill the `shutdownPromise` with a `TakeoverSendError` containing the constructed `TakeoverData`. After this `recover` function is called, `takeoverPromise_` is reset, `takeoverShutdown` is set to `false`, and `runningState_` is set to `RUNNING`. By taking over from the returned `TakeoverData`, the user will not encounter `Transport not connected` errors on recovery.
* This adds an `EdenServer::closeStorage()` method to defer closing the `backingStore_` and `localStore_` until after our ready handshake is successful.
* This defers the shutdown of the `PrivHelper` until a successful ready handshake.

I also update the takeover documentation here with the new logic (and fix some formatting issues).

Reviewed By: simpkins

Differential Revision: D20433433

fbshipit-source-id: f59e660922674d281957e80aee5049735b901a2c
Takeover
The takeover directory holds the logic for the Takeover Client (the new EdenFS process) and Server (the old EdenFS process), which are used during a graceful restart.
Structure
There are 5 main components in the takeover directory: thrift serialization library, client, server, data, and handler.
Thrift serialization library
There are three main message classes that are exchanged over the takeover socket:
- `struct TakeoverVersionQuery`
  - A list of takeover data serialization versions that the client supports.
- empty "ready" ping
  - An empty ping sent by the server to ensure the client is still alive and ready to receive takeover data.
- `union SerializedTakeoverData`
  - A list of `SerializedMountInfo` or a string error.

Supporting structures:
- `struct SerializedMountInfo`
  - Contains the mount path, state directory, a list of bind mount paths (which is no longer used), connection information, and a `SerializedInodeMap`.
- `struct SerializedInodeMap`
  - A list of unloaded inodes, each a `SerializedInodeMapEntry`.
- `struct SerializedInodeMapEntry`
  - Contains inode information such as `inodeNumber`, `parentInode`, `name`, `isUnlinked`, `numFuseReferences`, `hash`, and `mode`.
- `struct SerializedFileHandleMap`
  - Currently empty.
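To make the nesting of these messages concrete, here is a minimal C++ sketch that mirrors them as plain structs. It is not the Thrift-generated code: the field names and types (for example `connInfo` as a string and `unloadedInodes` as the list name) are assumptions, and `std::variant` is swapped in for the Thrift `union`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

// Hypothetical plain-C++ mirror of the Thrift structures described above.
// Field names and types are assumptions for illustration only.
struct SerializedInodeMapEntry {
  uint64_t inodeNumber = 0;
  uint64_t parentInode = 0;
  std::string name;
  bool isUnlinked = false;
  int64_t numFuseReferences = 0;
  std::string hash;
  int32_t mode = 0;
};

struct SerializedInodeMap {
  // Inodes that were not loaded when the old process shut down.
  std::vector<SerializedInodeMapEntry> unloadedInodes;
};

struct SerializedMountInfo {
  std::string mountPath;
  std::string stateDirectory;
  std::vector<std::string> bindMountPaths;  // no longer used
  std::string connInfo;                     // connection information (assumed type)
  SerializedInodeMap inodeMap;
};

// Either the list of mounts being handed over, or an error message.
using SerializedTakeoverData =
    std::variant<std::vector<SerializedMountInfo>, std::string>;

// Returns true if the payload carries an error instead of mount data.
inline bool isError(const SerializedTakeoverData& data) {
  return std::holds_alternative<std::string>(data);
}
```

A payload holding a list of mounts reports `isError(...) == false`, while one holding only an error string reports `true`.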
Client
The client has one function, `takeoverMounts`. This function requests to take over mount points from an existing edenfs process. On success, it returns a `TakeoverData` object; on error, it throws an exception. It takes three parameters: a `socketPath`, a bool `shouldPing`, and a set of integers of supported takeover versions. The last two parameters are for testing purposes and should not be used in production builds.
This has a takeover timeout of 5 minutes for receiving takeover data from the old process.
We connect to the socket at the given path, then send our protocol version so that the server knows whether we're capable of handshaking successfully. We then wait for the server to send us a "ready" ping, which checks that we are still listening on the socket. We respond to this ping and then wait for the takeover data response. It is possible that we will not receive this ping, and will instead receive the takeover data response directly.
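The optional ping can be sketched as a small receive step. This is a hypothetical model, not the EdenFS client code: `FakeSocket` replaces the Unix domain socket, an empty payload stands for the "ready" ping, the client's reply is assumed to be an empty message, and the 5 minute timeout is left out.

```cpp
#include <cassert>
#include <deque>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical message model: an empty payload is the server's "ready" ping,
// anything else is the serialized takeover data (or an error).
struct Message {
  std::string payload;
  bool isPing() const { return payload.empty(); }
};

// Hypothetical stand-in for the takeover socket: messages from the server are
// queued in `inbox`, and replies from the client are recorded in `outbox`.
struct FakeSocket {
  std::deque<Message> inbox;
  std::vector<Message> outbox;

  std::optional<Message> receive() {
    if (inbox.empty()) {
      return std::nullopt;  // models a closed socket / no message
    }
    Message msg = inbox.front();
    inbox.pop_front();
    return msg;
  }
  void send(Message msg) { outbox.push_back(std::move(msg)); }
};

// After the supported versions have been sent: answer an optional "ready"
// ping, then return the takeover data payload.
std::string receiveTakeoverData(FakeSocket& sock) {
  std::optional<Message> msg = sock.receive();
  if (msg && msg->isPing()) {
    sock.send(Message{});  // reply so the server knows we are still listening
    msg = sock.receive();
  }
  if (!msg) {
    throw std::runtime_error("no takeover data received");
  }
  return msg->payload;
}
```

The same function handles both orderings: with a ping it replies once before reading the data, and without one it reads the data immediately and sends nothing.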
After we get the takeover data response, we throw an exception if we did not get a message; otherwise, we deserialize the message and check its contents. We throw an exception if the message is not the expected size (the number of mount points plus 2, one for the lock file and one for the Thrift socket). Otherwise, if all is well, we save the lock file, the Thrift socket, and all the mount points.
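The size check is simple arithmetic: with three mount points, the client expects 3 + 2 = 5. A minimal sketch, where `expectedTakeoverMessageSize` and `checkTakeoverMessageSize` are hypothetical helpers rather than functions from the takeover code:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: one entry per mount point, plus one for the lock file
// and one for the Thrift server socket.
inline std::size_t expectedTakeoverMessageSize(std::size_t numMountPoints) {
  return numMountPoints + 2;
}

// Throws if the received message does not have the expected size.
inline void checkTakeoverMessageSize(std::size_t numMountPoints,
                                     std::size_t receivedSize) {
  const std::size_t expected = expectedTakeoverMessageSize(numMountPoints);
  if (receivedSize != expected) {
    throw std::runtime_error("unexpected takeover message size: expected " +
                             std::to_string(expected) + ", got " +
                             std::to_string(receivedSize));
  }
}
```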
Server
A helper class that listens on a unix domain socket for clients that wish to perform graceful takeover of this `EdenServer`'s mount points. This class uses the `EdenServer`'s main `EventBase` for driving its I/O.
It has a few functions:
- public function:
  - `start` - This is called when the EdenFS daemon first starts. It begins listening on the takeover socket, waiting for a client to connect and request to initiate a graceful restart. When a client connects, it verifies that the client process is from the same user ID, and that the client and server support a compatible takeover protocol version. If the versions are compatible, the server initiates shutdown by calling `server_->getTakeoverHandler()->startTakeoverShutdown()`. After the shutdown is completed, the takeover server pings the takeover client to ensure it is still waiting for the data. If the ping is unsuccessful (timeout, error, etc.), the takeover server stops the takeover process and returns the untransmitted `TakeoverData` in an exception so that the `EdenServer` can recover itself and start serving again. Otherwise, it closes its storage (local and backing stores) and sends the takeover data over the takeover socket, serializing either the takeover information (version, lock file, Thrift socket, and mount file descriptors) or an error.
- private functions:
  - `connectionAccepted` - callback function for allocating a connection handler when the server gets a client.
  - `acceptError` - callback function that simply logs an `accept()` error on the takeover socket.
  - `connectionDone` - callback function that is declared in the .h file but is currently not defined.
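The ping-then-send ordering in `start` can be sketched synchronously. `TakeoverSendError` is named after the error described in the commit summary, but its layout here, the `ServerHooks` callbacks, and `finishTakeover` are hypothetical simplifications; the real server does this asynchronously on its `EventBase`.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for TakeoverData: just the mount paths being handed over.
struct TakeoverData {
  std::vector<std::string> mountPoints;
};

// Carries the untransmitted data so the server can recover and keep serving.
struct TakeoverSendError : std::runtime_error {
  TakeoverData data;
  TakeoverSendError(const std::string& what, TakeoverData d)
      : std::runtime_error(what), data(std::move(d)) {}
};

// Hypothetical hooks; the real server uses its handler and the takeover socket.
struct ServerHooks {
  std::function<bool()> pingClient;               // true if the client answered
  std::function<void()> closeStorage;             // release local/backing stores
  std::function<void(const TakeoverData&)> send;  // serialize and send the data
};

// What happens after shutdown has produced the TakeoverData.
void finishTakeover(TakeoverData data, const ServerHooks& hooks) {
  if (!hooks.pingClient()) {
    // Storage is still open, so the caller can recover from `data`.
    throw TakeoverSendError("takeover client did not answer ping",
                            std::move(data));
  }
  hooks.closeStorage();
  hooks.send(data);
}
```

Storage is closed only after the ping succeeds, so a failed handshake leaves the old process able to keep serving from the data carried by the exception.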
Data
This holds the set of versions supported by this build. It also holds the lock file, the server socket, the mount points, and a takeover complete promise that will be fulfilled by the `TakeoverServer` code once the `TakeoverData` has been sent to the remote process. It also has functions to serialize and deserialize the `TakeoverData`.
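A simplified sketch of this data, assuming `std::promise` in place of the futures library EdenFS uses and strings in place of file descriptors. `markSent` is a hypothetical helper standing for the server's last step after a successful send; serialization is omitted.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified model of the takeover data described above.
struct TakeoverData {
  std::set<int> supportedVersions;  // versions this build can serialize
  std::string lockFile;
  std::string serverSocket;
  std::vector<std::string> mountPoints;
  // Fulfilled by the server code once the data has been sent.
  std::promise<void> takeoverComplete;
};

// Hypothetical helper: signal that the data reached the new process.
inline void markSent(TakeoverData& data) { data.takeoverComplete.set_value(); }
```

The old process can hold the future from `takeoverComplete.get_future()` and wait on it; it becomes ready only once `markSent` runs.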
Handler
`TakeoverHandler` is a pure virtual interface for classes that want to implement graceful takeover functionality. This is primarily implemented by the `EdenServer` class. However, there are also alternative implementations used for unit testing.
It has two pure virtual functions: `startTakeoverShutdown()` and `closeStorage()`.
`startTakeoverShutdown()` will be called when a graceful shutdown has been requested, with a remote process attempting to take over the currently running mount points.
When implemented, this should return a `Future` that will produce the `TakeoverData` to send to the remote edenfs process once the edenfs process is ready to transfer its mounts.
`closeStorage()` will be called before sending the `TakeoverData` to the client, and only after a successful ready handshake (if applicable). This function should close storage used by the server. In the case of an `EdenServer`, this function allows locks to be released so that the new process can take over this storage.
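A sketch of the interface and a test double, assuming `std::future` in place of the `Future` type the real interface returns. `FakeTakeoverHandler` is hypothetical, loosely modeled on the idea of the unit-test implementations mentioned above.

```cpp
#include <cassert>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for TakeoverData.
struct TakeoverData {
  std::vector<std::string> mountPoints;
};

// Interface sketch using std::future instead of the real Future type.
class TakeoverHandler {
 public:
  virtual ~TakeoverHandler() = default;
  // Called when a remote process asks to take over the mounts; the future
  // yields the data to send once this process is ready to hand them over.
  virtual std::future<TakeoverData> startTakeoverShutdown() = 0;
  // Called before sending, only after a successful ready handshake.
  virtual void closeStorage() = 0;
};

// Hypothetical test double that hands over a fixed set of mounts.
class FakeTakeoverHandler : public TakeoverHandler {
 public:
  explicit FakeTakeoverHandler(std::vector<std::string> mounts)
      : mounts_(std::move(mounts)) {}

  std::future<TakeoverData> startTakeoverShutdown() override {
    std::promise<TakeoverData> promise;
    promise.set_value(TakeoverData{mounts_});
    return promise.get_future();
  }

  void closeStorage() override { storageClosed = true; }

  bool storageClosed = false;

 private:
  std::vector<std::string> mounts_;
};
```

A test can drive the server-side flow through the `TakeoverHandler&` reference and then check `storageClosed` to confirm that storage was released only at the expected point.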