sapling/eden/fs/inodes/Dirstate.cpp
Michael Bolin 57f5d72a27 Reimplement dirstate used by Eden's Hg extension as a subclass of Hg's dirstate.
Summary:
This is a major change to Eden's Hg extension.

Our initial attempt to implement `edendirstate` was to create a "clean room"
implementation that did not share code with `mercurial/dirstate.py`. This was
helpful in uncovering the subset of the dirstate API that matters for Eden. It
also provided a better safeguard against upstream changes to `dirstate.py` in
Mercurial itself.

In that initial implementation, the state transition management was mostly done
on the server in `Dirstate.cpp`. We also made a modest attempt to make
`Dirstate.cpp` "SCM-agnostic" such that the same APIs could be used for
Git at some point.

However, as we have tried to support more of the sophisticated functionality
in Mercurial, particularly `hg histedit`, achieving parity between the clean room
implementation and Mercurial's internals has become more challenging.
Ultimately, the clean room implementation is likely the right way to go for Eden,
but for now, we need to prioritize having feature parity with vanilla Hg when
using Eden. Once we have a more complete set of integration tests in place,
we can reimplement Eden's dirstate more aggressively to optimize things.

Fortunately, the [[ https://bitbucket.org/facebook/hg-experimental/src/default/sqldirstate/ | sqldirstate ]]
extension has already demonstrated that it is possible to provide a faithful
dirstate implementation that subclasses the original `dirstate` while using a different
storage mechanism. As such, I used `sqldirstate` as a model when implementing
the new `eden_dirstate` (distinguishing it from our v1 implementation, `edendirstate`).

In particular, `sqldirstate` uses SQL tables as storage for the following private fields
of `dirstate`: `_map`, `_dirs`, `_copymap`, `_filefoldmap`, `_dirfoldmap`. Because
`_filefoldmap` and `_dirfoldmap` exist to deal with case-insensitivity issues, we
do not support them in `eden_dirstate` and add code to ensure the codepaths that
would access them in `dirstate` never get exercised. Similarly, we also implemented
`eden_dirstate` so that it never accesses `_dirs`. (`_dirs` is a multiset of all directories in the
dirstate, which is an O(repo) data structure, so we do not want to maintain it in Eden.
It appears to be primarily used for checking whether a path to a file already exists in
the dirstate as a directory. We can protect against that in more efficient ways.)

That leaves only `_map` and `_copymap` to worry about. `_copymap` contains the set
of files that have been marked "copied" in the current dirstate, so it is fairly small and
can be stored on disk or in memory with little concern. `_map` is a bit trickier because
it is expected to have an entry for every file in the dirstate. In `sqldirstate`, it is stored
across two tables: `files` and `nonnormalfiles`. For Eden, we already represent the data
analogous to the `files` table in RocksDB/the overlay, so we do not need to create a new
equivalent to the `files` table. We do, however, need an equivalent to the `nonnormalfiles`
table, which we store as Thrift-serialized data in an ordinary file along with the `_copymap`
data.
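
As a rough illustration of the "only persist non-normal entries" idea (a sketch, not
the code in this revision): `FileStatus`, `MergeState`, `Tuple`, and `NonnormalStore`
are hypothetical stand-ins for the Thrift-generated types, and a `std::unordered_map`
stands in for the serialized file.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins for the Thrift-generated dirstate types.
enum class FileStatus { Normal, MarkedForAddition, MarkedForRemoval, NotTracked };
enum class MergeState { NotApplicable, BothParents, OtherParent };

struct Tuple {
  FileStatus status;
  MergeState mergeState;
};

// Holds only the entries that differ from the implicit "normal" state.
class NonnormalStore {
 public:
  // Storing a plain normal entry removes any explicit record, because that
  // state is already implied by the working copy data kept elsewhere.
  void set(const std::string& path, const Tuple& tuple) {
    if (tuple.status == FileStatus::Normal &&
        tuple.mergeState == MergeState::NotApplicable) {
      entries_.erase(path);
    } else {
      entries_[path] = tuple;
    }
  }

  bool contains(const std::string& path) const {
    return entries_.count(path) != 0;
  }

  std::size_t size() const {
    return entries_.size();
  }

 private:
  std::unordered_map<std::string, Tuple> entries_;
};
```

With this shape, the stored map stays proportional to the number of pending changes
rather than to the size of the repository.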

In our Hg extension, our implementation of `_map` is `eden_dirstate_map`, which is defined
in a Python file of the same name. Our implementation of `_copymap` is `dummy_copymap`,
which is defined in `eden_dirstate.py`. Both of these collections are simple pass-through data
structures that translate their method calls to Thrift server calls. I expect we will want to
optimize this in the future via some client-side caching, as well as creating batch APIs for talking
to the server via Thrift.

One advantage of this new implementation is that it enables us to delete
`eden/hg/eden/overrides.py`, which overrode the entry points for `hg add` and `hg remove`.
Between the recent implementation of `dirstate.walk()` for Eden and this switch
to the real dirstate, we can now use the default implementation of `hg add` and `hg remove`
(although we have to play some tricks, like in the implementation of `eden_dirstate.status()`,
in order to make `hg remove` work).

In the course of doing this revision, I discovered that I had to make a minor fix to
`EdenMatchInfo.make_glob_list()` because `hg add foo` was being treated as
`hg add foo/**/*` even when `foo` was just a file (as opposed to a directory), in which
case the glob was not matching `foo`!
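
The intent of that fix can be sketched as follows. This is an illustration only: the
real `make_glob_list()` lives in the Python extension and is not shown here, so the
function name `makeGlobList` and the `isDirectory` callback are assumptions made for
the example.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Builds one glob per requested path. `isDirectory` is a hypothetical
// callback that reports whether a path names a directory.
std::vector<std::string> makeGlobList(
    const std::vector<std::string>& paths,
    const std::function<bool(const std::string&)>& isDirectory) {
  std::vector<std::string> globs;
  globs.reserve(paths.size());
  for (const auto& path : paths) {
    if (isDirectory(path)) {
      // A directory matches everything beneath it.
      globs.push_back(path + "/**/*");
    } else {
      // A plain file must match itself; "foo/**/*" would never match "foo".
      globs.push_back(path);
    }
  }
  return globs;
}
```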

I also had to do some work in `eden_dirstate.status()` in which the `match` argument
was previously largely ignored. It turns out that `dirstate.py` uses `status()` for a number
of things with the `match` specified as a filter, so the output of `status()` must be filtered
by `match` accordingly. Ultimately, this seems like work that would be better done on the
server, but for simplicity, we're just going to do it in Python, for now.
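
For reference, the kind of filtering described here looks roughly like the sketch
below. It is written in C++ to match `Dirstate.cpp` even though the actual filtering
is done in Python; `Matcher` and `filterStatus` are hypothetical names, and status
codes are reduced to single characters for brevity.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical matcher type: returns true for paths the caller cares about.
using Matcher = std::function<bool(const std::string&)>;

// Returns only the status entries whose path is accepted by `match`.
std::map<std::string, char> filterStatus(
    const std::map<std::string, char>& entries,
    const Matcher& match) {
  std::map<std::string, char> filtered;
  for (const auto& entry : entries) {
    if (match(entry.first)) {
      filtered.insert(entry);
    }
  }
  return filtered;
}
```

Doing this on the server instead would avoid sending entries over Thrift only to
discard them on the client.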

For the reasons explained above, this revision deletes a lot of code from `Dirstate.cpp`.
As such, `DirstateTest.cpp` does not seem worth refactoring, though the scenarios it was
testing should probably be converted to integration tests. At a high level, the role of
`DirstatePersistence` has not changed, but the exact data it writes is much different.
Its corresponding unit test is also disabled, for now.

Note that this revision does not change the name of the file where "dirstate data" is written
(this is defined as `kDirstateFile` in `ClientConfig.cpp`), so we should blow away any existing
instances of this file once this change lands. (It is still early enough in the project that it does
not seem worth the overhead of a proper migration.)

The true test of the success of this new approach is the ease with which we can write more
integration tests for things like `hg histedit` and `hg graft`. Ideally, these should require very
few changes to `eden_dirstate.py`.

Reviewed By: simpkins

Differential Revision: D5071778

fbshipit-source-id: e8fec4d393035d80f36516ac050cad025dc3ba31
2017-05-26 12:05:29 -07:00

383 lines
13 KiB
C++

/*
* Copyright (c) 2016-present, Facebook, Inc.
* All rights reserved.
*
* This source code is licensed under the BSD-style license found in the
* LICENSE file in the root directory of this source tree. An additional grant
* of patent rights can be found in the PATENTS file in the same directory.
*
*/
#include "Dirstate.h"
#include <folly/Format.h>
#include <folly/MapUtil.h>
#include <folly/Range.h>
#include <folly/Unit.h>
#include <folly/experimental/StringKeyedUnorderedMap.h>
#include "eden/fs/config/ClientConfig.h"
#include "eden/fs/fuse/MountPoint.h"
#include "eden/fs/inodes/DirstatePersistence.h"
#include "eden/fs/inodes/EdenMount.h"
#include "eden/fs/inodes/FileInode.h"
#include "eden/fs/inodes/InodeBase.h"
#include "eden/fs/inodes/InodeDiffCallback.h"
#include "eden/fs/inodes/Overlay.h"
#include "eden/fs/inodes/TreeInode.h"
#include "eden/fs/store/ObjectStore.h"
#include "eden/fs/store/ObjectStores.h"
using folly::Future;
using folly::makeFuture;
using folly::StringKeyedUnorderedMap;
using folly::StringPiece;
using folly::Unit;
using facebook::eden::hgdirstate::DirstateNonnormalFileStatus;
using facebook::eden::hgdirstate::DirstateMergeState;
using facebook::eden::hgdirstate::DirstateTuple;
using std::string;
namespace facebook {
namespace eden {
namespace {
class ThriftStatusCallback : public InodeDiffCallback {
public:
explicit ThriftStatusCallback(
const folly::StringKeyedUnorderedMap<DirstateTuple>& hgDirstateTuples)
: data_{folly::construct_in_place, hgDirstateTuples} {}
void ignoredFile(RelativePathPiece path) override {
processChangedFile(
path,
DirstateNonnormalFileStatus::MarkedForAddition,
StatusCode::ADDED,
StatusCode::IGNORED);
}
void untrackedFile(RelativePathPiece path) override {
auto data = data_.wlock();
auto dirstateTuple =
folly::get_ptr(data->hgDirstateTuples, path.stringPiece());
auto statusCode = StatusCode::NOT_TRACKED;
if (dirstateTuple != nullptr) {
auto nnFileStatus = dirstateTuple->get_status();
if (nnFileStatus == DirstateNonnormalFileStatus::MarkedForAddition) {
statusCode = StatusCode::ADDED;
} else if (nnFileStatus == DirstateNonnormalFileStatus::Normal) {
auto mergeState = dirstateTuple->get_mergeState();
// TODO(mbolin): Also need to set to ADDED if path is in the copymap.
if (mergeState == DirstateMergeState::OtherParent) {
statusCode = StatusCode::ADDED;
}
}
}
data->status.emplace(path.stringPiece().str(), statusCode);
}
void removedFile(
RelativePathPiece path,
const TreeEntry& /* sourceControlEntry */) override {
processChangedFile(
path,
DirstateNonnormalFileStatus::MarkedForRemoval,
StatusCode::REMOVED,
StatusCode::MISSING);
}
void modifiedFile(
RelativePathPiece path,
const TreeEntry& /* sourceControlEntry */) override {
processChangedFile(
path,
DirstateNonnormalFileStatus::MarkedForRemoval,
StatusCode::REMOVED,
StatusCode::MODIFIED);
}
void diffError(RelativePathPiece path, const folly::exception_wrapper& ew)
override {
// TODO: It would be nice to have a mechanism to return error info as part
// of the thrift result.
LOG(WARNING) << "error computing status data for " << path << ": "
<< folly::exceptionStr(ew);
}
/**
* Extract the ThriftHgStatus object from this callback.
*
* This method should be called no more than once, as this destructively
* moves the results out of the callback. It should only be invoked after
* the diff operation has completed.
*/
ThriftHgStatus extractStatus() {
ThriftHgStatus status;
{
auto data = data_.wlock();
status.entries.swap(data->status);
// Process any remaining user directives that weren't seen during the diff
// walk.
//
// TODO: I believe this isn't really right, but it should be good enough
// for initial testing.
//
// We really need to also check if these entries exist currently on
// disk and in source control. For files that are removed but exist on
// disk we also need to check their ignored status.
//
// - UserStatusDirective::Add, exists on disk, and in source control:
// -> skip
// - UserStatusDirective::Add, exists on disk, not in SCM, but ignored:
// -> ADDED
// - UserStatusDirective::Add, not on disk or in source control:
// -> MISSING
// - UserStatusDirective::Remove, exists on disk, and in source control:
// -> REMOVED
// - UserStatusDirective::Remove, exists on disk, not in SCM, but ignored:
// -> skip
// - UserStatusDirective::Remove, not on disk, not in source control:
// -> skip
for (const auto& entry : data->hgDirstateTuples) {
auto nnFileStatus = entry.second.get_status();
if (nnFileStatus != DirstateNonnormalFileStatus::MarkedForAddition &&
nnFileStatus != DirstateNonnormalFileStatus::MarkedForRemoval) {
// TODO(mbolin): Handle this case.
continue;
}
auto hgStatusCode =
(nnFileStatus == DirstateNonnormalFileStatus::MarkedForAddition)
? StatusCode::MISSING
: StatusCode::REMOVED;
status.entries.emplace(entry.first.str(), hgStatusCode);
}
}
return status;
}
private:
/**
* The implementation used for ignoredFile(), removedFile(), and
* modifiedFile().
*
* The logic is:
* - If the file is present in hgDirstateTuples with a status of
* userDirectiveType, then remove it from hgDirstateTuples and report the
* status as userDirectiveStatus.
* - Otherwise, report the status as defaultStatus.
*/
void processChangedFile(
RelativePathPiece path,
DirstateNonnormalFileStatus userDirectiveType,
StatusCode userDirectiveStatus,
StatusCode defaultStatus) {
auto data = data_.wlock();
auto iter = data->hgDirstateTuples.find(path.stringPiece());
if (iter != data->hgDirstateTuples.end()) {
if (iter->second.get_status() == userDirectiveType) {
data->status.emplace(path.stringPiece().str(), userDirectiveStatus);
data->hgDirstateTuples.erase(iter);
return;
}
}
data->status.emplace(path.stringPiece().str(), defaultStatus);
}
struct Data {
explicit Data(const folly::StringKeyedUnorderedMap<DirstateTuple>& ud)
: hgDirstateTuples(ud) {}
std::map<std::string, StatusCode> status;
StringKeyedUnorderedMap<DirstateTuple> hgDirstateTuples;
};
folly::Synchronized<Data> data_;
};
} // unnamed namespace
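Dirstate::Dirstate(EdenMount* mount)
: mount_(mount),
persistence_(mount->getConfig()->getDirstateStoragePath()) {
// NOTE: loadedData is not used after this point in the constructor, so the
// persisted state is read but never copied into data_ here.
auto loadedData = persistence_.load();
}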
Dirstate::~Dirstate() {}
ThriftHgStatus Dirstate::getStatus(bool listIgnored) const {
ThriftStatusCallback callback(data_.rlock()->hgDirstateTuples);
mount_->diff(&callback, listIgnored).get();
return callback.extractStatus();
}
namespace {
static bool isMagicPath(RelativePathPiece path) {
// If any component of the path name is .eden, then this path is a magic
// path that we won't allow to be checked in or show up in the dirstate.
for (auto c : path.components()) {
if (c.stringPiece() == kDotEdenName) {
return true;
}
}
return false;
}
}
Future<Unit> Dirstate::onSnapshotChanged(const Tree* rootTree) {
LOG(INFO) << "Dirstate::onSnapshotChanged(" << rootTree->getHash() << ")";
{
auto data = data_.wlock();
bool madeChanges = false;
if (!data->hgDestToSourceCopyMap.empty()) {
// For now, we blindly assume that when the snapshot changes, the copymap
// data is no longer valid.
data->hgDestToSourceCopyMap.clear();
madeChanges = true;
}
// For now, we also blindly assume that when the snapshot changes, we can
// remove all dirstate tuples except for those that have a merge state of
// OtherParent.
auto iter = data->hgDirstateTuples.begin();
while (iter != data->hgDirstateTuples.end()) {
// Advance before erasing: erase() invalidates only iterators that point to
// the erased element, so other iterators are unaffected.
auto current = iter;
++iter;
if (current->second.get_mergeState() != DirstateMergeState::OtherParent) {
data->hgDirstateTuples.erase(current);
madeChanges = true;
}
}
if (madeChanges) {
persistence_.save(*data);
}
}
return makeFuture();
}
DirstateTuple Dirstate::hgGetDirstateTuple(const RelativePathPiece filename) {
auto data = data_.rlock();
auto& hgDirstateTuples = data->hgDirstateTuples;
auto* ptr = folly::get_ptr(hgDirstateTuples, filename.stringPiece());
if (ptr != nullptr) {
return *ptr;
} else if (
filename == RelativePathPiece{".hgsub"} ||
filename == RelativePathPiece{".hgsubstate"}) {
// Currently, these are the only files that Hg appears to ask about that are
// not expected to be in the dirstate when the request is made. This is
// admittedly pretty sloppy, but since we don't seem to be planning to
// support subrepos in Eden, this seems to have the desired effect as it is
// ultimately reflected as a KeyError in the Hg extension (though it could
// be swallowing a real logical error in that case, as well).
throw std::domain_error(filename.stringPiece().str());
} else {
// TODO(mbolin): Make sure the file exists in the working copy and set the
// appropriate values in the HgDirstateTuple. Most likely the biggest
// question is whether NotTracked or Normal should be returned.
DirstateTuple tuple;
tuple.set_status(DirstateNonnormalFileStatus::NotTracked);
tuple.set_mode(0644); // TODO(mbolin): Check executable bit!
tuple.set_mergeState(DirstateMergeState::NotApplicable);
return tuple;
}
}
void Dirstate::hgSetDirstateTuple(
const RelativePathPiece filename,
const DirstateTuple* tuple) {
auto data = data_.wlock();
if (tuple->get_status() == hgdirstate::DirstateNonnormalFileStatus::Normal &&
tuple->get_mergeState() ==
hgdirstate::DirstateMergeState::NotApplicable) {
data->hgDirstateTuples.erase(filename.stringPiece());
} else {
data->hgDirstateTuples[filename.stringPiece()] = *tuple;
}
persistence_.save(*data);
}
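std::unordered_map<RelativePath, DirstateTuple> Dirstate::hgGetNonnormalFiles()
const {
std::unordered_map<RelativePath, DirstateTuple> out;
// Keep the read lock alive for the whole loop. Binding a reference to
// data_.rlock()->hgDirstateTuples would release the lock at the end of that
// statement and leave the iteration unsynchronized.
auto data = data_.rlock();
for (const auto& pair : data->hgDirstateTuples) {
out.emplace(RelativePath{pair.first}, pair.second);
}
return out;
}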
void Dirstate::hgCopyMapPut(
const RelativePathPiece dest,
const RelativePathPiece source) {
auto data = data_.wlock();
// Drop any existing entry first: emplace() does not overwrite, so without
// this a second put for the same dest would silently keep the old source.
data->hgDestToSourceCopyMap.erase(dest.stringPiece());
if (!source.empty()) {
data->hgDestToSourceCopyMap.emplace(dest.stringPiece(), source.copy());
}
persistence_.save(*data);
}
RelativePath Dirstate::hgCopyMapGet(const RelativePathPiece dest) const {
// Hold the read lock until the result has been copied into the return value.
auto data = data_.rlock();
return folly::get_or_throw(data->hgDestToSourceCopyMap, dest.stringPiece());
}
folly::StringKeyedUnorderedMap<RelativePath> Dirstate::hgCopyMapGetAll() const {
return data_.rlock()->hgDestToSourceCopyMap;
}
std::ostream& operator<<(
std::ostream& os,
const DirstateAddRemoveError& error) {
return os << error.errorMessage;
}
const char kStatusCodeCharClean = 'C';
const char kStatusCodeCharModified = 'M';
const char kStatusCodeCharAdded = 'A';
const char kStatusCodeCharRemoved = 'R';
const char kStatusCodeCharMissing = '!';
const char kStatusCodeCharNotTracked = '?';
const char kStatusCodeCharIgnored = 'I';
char hgStatusCodeChar(StatusCode code) {
switch (code) {
case StatusCode::CLEAN:
return kStatusCodeCharClean;
case StatusCode::MODIFIED:
return kStatusCodeCharModified;
case StatusCode::ADDED:
return kStatusCodeCharAdded;
case StatusCode::REMOVED:
return kStatusCodeCharRemoved;
case StatusCode::MISSING:
return kStatusCodeCharMissing;
case StatusCode::NOT_TRACKED:
return kStatusCodeCharNotTracked;
case StatusCode::IGNORED:
return kStatusCodeCharIgnored;
}
throw std::runtime_error(folly::to<std::string>(
"Unrecognized StatusCode: ",
static_cast<typename std::underlying_type<StatusCode>::type>(code)));
}
std::ostream& operator<<(std::ostream& os, const ThriftHgStatus& status) {
os << "{";
for (const auto& pair : status.get_entries()) {
os << hgStatusCodeChar(pair.second) << " " << pair.first << "; ";
}
os << "}";
return os;
}
}
}