sapling/eden/fs/takeover/TakeoverServer.cpp


/*
 * Copyright (c) Facebook, Inc. and its affiliates.
 *
 * This software may be used and distributed according to the terms of the
 * GNU General Public License version 2.
 */
#ifndef _WIN32
#include "eden/fs/takeover/TakeoverServer.h"
#include <chrono>
#include <folly/FileUtil.h>
#include <folly/Range.h>
#include <folly/SocketAddress.h>
#include <folly/futures/Future.h>
#include <folly/io/Cursor.h>
#include <folly/io/IOBuf.h>
#include <folly/io/async/EventBase.h>
#include <folly/io/async/EventHandler.h>
#include <folly/logging/xlog.h>
/*
 * History annotation (2018-01-31 01:16:05 +03:00):
 * add version handshake to takeover protocol
 *
 * Summary: Whilst chatting with simpkins we realized that we lost the
 * handshake portion of the takeover protocol during a refactor. The
 * handshake is important for a few reasons:
 *
 * 1. It prevents unmounting and losing all the mounts in the case that
 *    someone decides to netcat or otherwise connect to the socket.
 * 2. It gives us an opportunity to short-circuit any heavy lifting if we
 *    know that it will be impossible to succeed.
 * 3. It allows us to roll back to earlier builds with older versions of
 *    the takeover protocol.
 *
 * This diff adds a little bit of machinery to enable passing a set of
 * supported takeover protocol version numbers. The intent is to retain
 * support for two of these at a time; any time we change the
 * encoding/protocol for takeover we'll bump the version number and add
 * supporting code to handle the new format, retaining support for the prior
 * version. Retaining the ability to handle the prior version allows us to
 * downgrade to an earlier build gracefully if/when the need arises.
 *
 * I opted to do this here rather than by bumping the `kProtocolID` constant
 * in `UnixSocket.h` because we're not really changing the lowest level of
 * the protocol; just the takeover-specific portions.
 *
 * I haven't actually changed the takeover serialization in this diff, but do
 * have some work on that happening in D6733406; that diff will be amended to
 * take advantage and demonstrate how this versioning scheme works.
 *
 * A key thing to note about the implementation of this diff is that the
 * client sends the version number to the server, but doesn't add any
 * explicit version encoding in the response we receive. This is deliberate
 * and allows us to upgrade prior builds to this new scheme. I'll add a more
 * definitive check for this situation when I actually rev the format in the
 * following diff.
 *
 * Reviewed By: simpkins
 * Differential Revision: D6743065
 * fbshipit-source-id: c991cebfee918daad098105ca6bcfef76374c0ff
 */
#include <thrift/lib/cpp2/protocol/Serializer.h>
#include "eden/fs/takeover/TakeoverData.h"
#include "eden/fs/takeover/TakeoverHandler.h"
#include "eden/fs/utils/FutureUnixSocket.h"
using apache::thrift::CompactSerializer;
using folly::AsyncServerSocket;
using folly::exceptionStr;
using folly::Future;
using folly::makeFuture;
using folly::SocketAddress;
using folly::Unit;
/*
 * History annotation (2020-04-07 19:50:06 +03:00):
 * add additional takeover "ready" handshake
 *
 * Summary: For graceful restart takeovers, we would like to implement an
 * additional handshake. This handshake will occur right after the takeover
 * data is ready to be sent to the client, but before actually sending it.
 * This is to make sure the old daemon can recover if the client is not
 * responsive (the client replies back to the server, and if no response is
 * received in 5 seconds, the server will recover).
 *
 * There are a few cases here:
 *
 * - Server sends ping (two cases discussed below):
 *   I introduced a new ProtocolVersion. Daemons with this change will now
 *   have ProtocolVersion4. The server checks the max version of the client,
 *   and if this version is ProtocolVersion4, we know the client can listen
 *   for pings, so we will send the ping. Otherwise, we don't send a ping.
 *   With this, we will only send pings if we know the client will be
 *   listening for one. The case in which a client isn't listening is if we
 *   adopt this change and we downgrade past the change.
 * - Server does not send ping and client knows to listen for ping:
 *   This will be a common case immediately after this change. The client
 *   will parse the sent data and check if it matches the "ready" ping, and
 *   if it doesn't, the client assumes the server simply sent the takeover
 *   data.
 * - Server does not send ping and client doesn't know to listen for ping:
 *   This is the case before this change.
 *
 * Reviewed By: simpkins
 * Differential Revision: D20290271
 * fbshipit-source-id: b68e4df6264fb071d770671a80e28c90ddb0d3f2
 */
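The ready-ping cases listed in the commit message above can be summarized in a small decision function. This is only an illustrative sketch, not code from this file: `ReadyHandshake`, `classifyHandshake`, `readyPingExamples`, and the two version constants are hypothetical names, and the sketch assumes the server pings exactly when the negotiated version is 4.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants standing in for takeover protocol versions 3 and 4.
constexpr int32_t kSketchVersionThree = 3;
constexpr int32_t kSketchVersionFour = 4;

// The three situations described in the commit message.
enum class ReadyHandshake {
  PingThenData, // server pings, waits for the reply, then sends the data
  DataClientChecks, // no ping; a ping-aware client inspects the message
  DataLegacy, // no ping; the client predates the ready handshake
};

// Assumption: the server pings only when version 4 was negotiated.
constexpr ReadyHandshake classifyHandshake(
    int32_t negotiatedVersion,
    bool clientKnowsPing) {
  if (negotiatedVersion == kSketchVersionFour) {
    return ReadyHandshake::PingThenData;
  }
  return clientKnowsPing ? ReadyHandshake::DataClientChecks
                         : ReadyHandshake::DataLegacy;
}

// Illustrative checks for the three cases.
inline void readyPingExamples() {
  assert(
      classifyHandshake(kSketchVersionFour, true) ==
      ReadyHandshake::PingThenData);
  assert(
      classifyHandshake(kSketchVersionThree, true) ==
      ReadyHandshake::DataClientChecks);
  assert(
      classifyHandshake(kSketchVersionThree, false) ==
      ReadyHandshake::DataLegacy);
}
```

Keying the decision on the negotiated version means the server never sends a ping to a client that would misread it as takeover data.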
DEFINE_int32(
    pingReceiveTimeout,
    5,
    "Timeout for receiving ready ping from new process in seconds");

namespace facebook {
namespace eden {

/**
 * ConnHandler handles a single connection received on the TakeoverServer
 * socket.
 */
class TakeoverServer::ConnHandler {
 public:
  ConnHandler(TakeoverServer* server, folly::File socket)
      : server_{server}, socket_{server_->getEventBase(), std::move(socket)} {}

  /**
   * start() begins processing data on this connection.
   *
   * Returns a Future that will complete successfully when this connection
   * finishes gracefully taking over the EdenServer's mount points.
   */
  FOLLY_NODISCARD folly::Future<folly::Unit> start() noexcept;

 private:
  FOLLY_NODISCARD folly::Future<folly::Unit> sendError(
      const folly::exception_wrapper& error);

  FOLLY_NODISCARD folly::Future<folly::Unit> pingThenSendTakeoverData(
      TakeoverData&& data);

  FOLLY_NODISCARD folly::Future<folly::Unit> sendTakeoverData(
      TakeoverData&& data);

  template <typename... Args>
  [[noreturn]] void fail(Args&&... args) {
    auto msg = folly::to<std::string>(std::forward<Args>(args)...);
    XLOG(ERR) << "takeover socket error: " << msg;
    throw std::runtime_error(msg);
  }

  bool shouldPing_{false};
  TakeoverServer* const server_{nullptr};
  FutureUnixSocket socket_;
  int32_t protocolVersion_{
      TakeoverData::kTakeoverProtocolVersionNeverSupported};
};
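
The version handshake that `start()` performs depends on finding a protocol version both sides understand. The sketch below shows one way such a selection could work. It is a hypothetical stand-in and not the actual `TakeoverData::computeCompatibleVersion()`: the names `pickCompatibleVersion` and `pickCompatibleVersionExamples` are invented here, and the sketch assumes the highest version common to both sets is chosen.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <set>

// Hypothetical helper: returns the newest version present in both sets, or
// std::nullopt when the two sides share no version at all.
std::optional<int32_t> pickCompatibleVersion(
    const std::set<int32_t>& clientVersions,
    const std::set<int32_t>& serverVersions) {
  // std::set iterates in ascending order, so walk it in reverse to try the
  // newest client version first.
  for (auto it = clientVersions.rbegin(); it != clientVersions.rend(); ++it) {
    if (serverVersions.count(*it) != 0) {
      return *it;
    }
  }
  return std::nullopt;
}

// Illustrative checks; the version numbers are made up for the example.
inline void pickCompatibleVersionExamples() {
  assert(pickCompatibleVersion({3, 4}, {3, 4}) == 4);
  assert(pickCompatibleVersion({1, 3}, {3, 4}) == 3);
  assert(!pickCompatibleVersion({1}, {3, 4}).has_value());
}
```

An empty result corresponds to the error path in `start()`, where the server reports both version lists and asks for a full restart instead of changing any state.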
Future<Unit> TakeoverServer::ConnHandler::start() noexcept {
  try {
    // Check the remote endpoint's credentials.
    // We only allow transferring our mount points to another process
    // owned by the same user.
    auto uid = socket_.getRemoteUID();
    if (uid != getuid()) {
      return makeFuture<Unit>(std::runtime_error(folly::to<std::string>(
          "invalid takeover request from incorrect user: current UID=",
          getuid(),
          ", got request from UID ",
          uid)));
    }

    // Check to see if we are speaking a compatible takeover protocol
    // version. If not, error out so that we don't change any state.
    // The client should send us the version information, but clients
    // prior to the revision where this check was added will never send
    // us the version data. We use a short timeout for receiving the
    // version data; in practice it will appear immediately or will
    // never be received.
    auto timeout = std::chrono::seconds(5);
    return socket_.receive(timeout)
        .thenTry([this](folly::Try<UnixSocket::Message>&& msg) {
          if (msg.hasException()) {
            // Most likely cause: timed out waiting for the client to
            // send the protocol version. FutureUnixSocket::receiveTimeout()
            // will close the socket unconditionally, so we can't send
            // an error message back to the peer. However, for the sake
            // of clarity in the control flow we bubble up the error
            // as if we could do that.
            XLOG(ERR) << "Exception while waiting for takeover version from "
                         "the client. Most likely reason is a client version "
                         "mismatch, you may need to perform a full "
                         "`eden shutdown ; eden daemon` restart to migrate. "
                      << msg.exception();
            return folly::makeFuture<TakeoverData>(msg.exception());
          }

          auto query =
              CompactSerializer::deserialize<TakeoverVersionQuery>(&msg->data);
          auto supported =
              TakeoverData::computeCompatibleVersion(*query.versions_ref());
          if (!supported.has_value()) {
            auto clientVersionList = folly::join(", ", *query.versions_ref());
            auto serverVersionList =
                folly::join(", ", kSupportedTakeoverVersions);

            return folly::makeFuture<TakeoverData>(
                folly::make_exception_wrapper<std::runtime_error>(
                    folly::to<std::string>(
                        "The client and the server do not share a common "
                        "takeover protocol implementation. Use "
                        "`eden shutdown ; eden daemon` to migrate. "
                        "clientVersions=[",
                        clientVersionList,
                        "], "
                        "serverVersions=[",
                        serverVersionList,
                        "]")));
          }

          // Initiate the takeover shutdown.
          protocolVersion_ = supported.value();
          shouldPing_ =
              (protocolVersion_ == TakeoverData::kTakeoverProtocolVersionFour);
return server_->getTakeoverHandler()->startTakeoverShutdown();
})
.thenTryInline(folly::makeAsyncTask(
server_->eventBase_, [this](folly::Try<TakeoverData>&& data) {
if (!data.hasValue()) {
return sendError(data.exception());
}
if (shouldPing_) {
XLOG(DBG7) << "sending ready ping to takeover client";
return pingThenSendTakeoverData(std::move(data.value()));
} else {
XLOG(DBG7) << "not sending ready ping to takeover client";
return sendTakeoverData(std::move(data.value()));
}
}));
} catch (const std::exception& ex) {
return makeFuture<Unit>(
folly::exception_wrapper{std::current_exception(), ex});
}
}
Future<Unit> TakeoverServer::ConnHandler::sendError(
const folly::exception_wrapper& error) {
XLOG(ERR) << "error while performing takeover shutdown: " << error;
if (socket_) {
// Send the error to the client.
return socket_.send(TakeoverData::serializeError(protocolVersion_, error));
}
// Socket was closed (likely by a receive timeout above), so don't
// try to send again in here lest we break; instead just pass up
// the error.
return makeFuture<Unit>(error);
}
Future<Unit> TakeoverServer::ConnHandler::pingThenSendTakeoverData(
TakeoverData&& data) {
// Send a message to ping the takeover client process.
// This ensures that the client is still connected and ready to receive data.
// If the client disconnected while we were pausing our checkout mounts and
// preparing the takeover, we want to resume our mounts rather than trying to
// transfer them to the now-disconnected process.
UnixSocket::Message msg;
msg.data = TakeoverData::serializePing();
return socket_.send(std::move(msg))
.thenValue([this](auto&&) {
// Wait for the ping reply. Here we just give it a few seconds to
// respond.
auto timeout = std::chrono::seconds(FLAGS_pingReceiveTimeout);
return server_->faultInjector_.checkAsync("takeover", "ping_receive")
.via(server_->eventBase_)
.thenValue(
[this, timeout](auto&&) { return socket_.receive(timeout); });
})
.thenTryInline(folly::makeAsyncTask(
server_->eventBase_,
[this, data = std::move(data)](
folly::Try<UnixSocket::Message>&& msg) mutable {
if (msg.hasException()) {
// If sending the ping or receiving the reply failed, bubble up the
// exception so that we can recover.
// Move the takeoverComplete promise out of the TakeoverData first,
// since we then move the TakeoverData itself into that promise.
// EdenServer waits on this promise to decide whether to recover.
auto takeoverPromise = std::move(data.takeoverComplete);
takeoverPromise.setValue(std::move(data));
return makeFuture<Unit>(msg.exception());
}
return sendTakeoverData(std::move(data));
}));
}
Future<Unit> TakeoverServer::ConnHandler::sendTakeoverData(
TakeoverData&& data) {
// Before sending the takeover data, we must close the server's
// local and backing stores. This ensures the RocksDB lock is
// released so the client can take over.
server_->getTakeoverHandler()->closeStorage();
UnixSocket::Message msg;
try {
msg.data = data.serialize(protocolVersion_);
msg.files.push_back(std::move(data.lockFile));
msg.files.push_back(std::move(data.thriftSocket));
for (auto& mount : data.mountPoints) {
msg.files.push_back(std::move(mount.fuseFD));
}
} catch (const std::exception& ex) {
auto ew = folly::exception_wrapper{std::current_exception(), ex};
data.takeoverComplete.setException(ew);
return socket_.send(TakeoverData::serializeError(protocolVersion_, ew));
}
XLOG(INFO) << "Sending takeover data to new process: "
<< msg.data.computeChainDataLength() << " bytes";
return socket_.send(std::move(msg))
.thenTry([promise = std::move(data.takeoverComplete)](
folly::Try<Unit>&& sendResult) mutable {
if (sendResult.hasException()) {
promise.setException(sendResult.exception());
} else {
// Set an uninitialized optional here to avoid an attempted recovery.
promise.setValue(std::nullopt);
}
});
}
TakeoverServer::TakeoverServer(
folly::EventBase* eventBase,
AbsolutePathPiece socketPath,
TakeoverHandler* handler,
FaultInjector* faultInjector)
: eventBase_{eventBase},
handler_{handler},
socketPath_{socketPath},
faultInjector_(*faultInjector) {
start();
}
TakeoverServer::~TakeoverServer() {}
void TakeoverServer::start() {
// Build the address for the takeover socket.
SocketAddress address;
address.setFromPath(socketPath_.stringPiece());
// Remove any old file at this path, so we can bind to it.
auto rc = unlink(socketPath_.value().c_str());
if (rc != 0 && errno != ENOENT) {
folly::throwSystemError("error removing old takeover socket");
}
socket_.reset(new AsyncServerSocket{eventBase_});
socket_->bind(address);
socket_->listen(/* backlog */ 1024);
socket_->addAcceptCallback(this, nullptr);
socket_->startAccepting();
}
void TakeoverServer::connectionAccepted(
folly::NetworkSocket fdNetworkSocket,
const folly::SocketAddress& /* clientAddr */) noexcept {
int fd = fdNetworkSocket.toFd();
folly::File socket(fd, /* ownsFd */ true);
std::unique_ptr<ConnHandler> handler;
try {
handler.reset(new ConnHandler{this, std::move(socket)});
} catch (const std::exception& ex) {
XLOG(ERR) << "error allocating connection handler for new takeover "
"connection: "
<< exceptionStr(ex);
return;
}
XLOG(INFO) << "takeover socket connection received";
auto* handlerRawPtr = handler.get();
handlerRawPtr->start()
.thenError([](const folly::exception_wrapper& ew) {
XLOG(ERR) << "error processing takeover connection request: "
<< folly::exceptionStr(ew);
})
.ensure([h = std::move(handler)] {});
}
void TakeoverServer::acceptError(const std::exception& ex) noexcept {
XLOG(ERR) << "accept() error on takeover socket: " << exceptionStr(ex);
}
} // namespace eden
} // namespace facebook
#endif