sapling/eden/fs/service/GlobNode.cpp
Wez Furlong 932ef52a55 implement glob thrift method
Summary:
This is to facilitate the watchman integration and draws on the
watchman glob implementation; the approach is to split the glob strings into
path components and evaluate the components step by step as the tree is walked.
Components that do not include any glob special characters can be handled as
a direct lookup from the directory contents (O(1) rather than O(num-entries)).
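
To illustrate the component-wise approach described above, here is a minimal standalone sketch. It is not the code in this commit: `Component`, `splitGlob`, `wildMatch`, and `matchLevel` are hypothetical names, the directory is modeled as a plain `std::unordered_set` of names, and `wildMatch` is a simplified stand-in for a real matcher that understands only `*` and `?`.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <unordered_set>
#include <vector>

// One path component of a glob pattern (hypothetical type for this sketch).
struct Component {
  std::string text;
  bool hasSpecials; // true if the component contains *, ?, [ or a backslash
};

// Split a glob on '/' and record which components contain glob characters.
// A component starting with "**" keeps the rest of the pattern, because a
// recursive match cannot be evaluated one directory level at a time.
std::vector<Component> splitGlob(const std::string& pattern) {
  const std::size_t npos = std::string::npos;
  std::vector<Component> out;
  std::size_t pos = 0;
  while (pos < pattern.size()) {
    if (pattern.compare(pos, 2, "**") == 0) {
      out.push_back({pattern.substr(pos), true});
      break;
    }
    std::size_t slash = pattern.find('/', pos);
    std::string token =
        pattern.substr(pos, slash == npos ? npos : slash - pos);
    bool specials = token.find_first_of("*?[\\") != npos;
    out.push_back({token, specials});
    if (slash == npos) {
      break;
    }
    pos = slash + 1;
  }
  return out;
}

// Simplified matcher (assumption): supports only '*' and '?'.
bool wildMatch(const char* p, const char* s) {
  assert(p != nullptr && s != nullptr);
  if (*p == '\0') {
    return *s == '\0';
  }
  if (*p == '*') {
    // '*' matches nothing, or consumes one character and tries again.
    return wildMatch(p + 1, s) || (*s != '\0' && wildMatch(p, s + 1));
  }
  return *s != '\0' && (*p == '?' || *p == *s) && wildMatch(p + 1, s + 1);
}

// Names in one directory that satisfy one component.
std::set<std::string> matchLevel(
    const std::unordered_set<std::string>& entries,
    const Component& c) {
  std::set<std::string> hits;
  if (!c.hasSpecials) {
    // Plain name: a single hash lookup, O(1) on average.
    if (entries.count(c.text) != 0) {
      hits.insert(c.text);
    }
    return hits;
  }
  // Glob characters present: every entry must be tested, O(num-entries).
  for (const auto& name : entries) {
    if (wildMatch(c.text.c_str(), name.c_str())) {
      hits.insert(name);
    }
  }
  return hits;
}
```

With these definitions, `splitGlob("src/*.cpp")` yields the plain component `src` followed by the special component `*.cpp`, so only the second one requires a scan of the directory entries.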

The glob method returns a set of filenames that match a list of
glob patterns.

Recursive globs are supported.  It is worth noting that a glob like "**/*" will
return a list of every entry in the filesystem.  This is potentially expensive
and should be avoided.  simpkins is in favor of disallowing this as a forcing
function to encourage tool-makers to adopt patterns that don't rely on a
complete listing of the filesystem.

For now I'd like to get this in without such a restriction; it's also worth
noting that running `find .` in the root of the mount point has a similar
effect and we can't prevent that from happening, so the effect of the overly
broad glob is something that we need to be able to withstand in any case.

Unrestricted recursive globs will make it easier to connect certain watchman
queries in the interim, until we have a more expressive thrift API for walking
and filtering the list of files.

Note: I've removed the wildmatch flags that I'd put in the API when I stubbed
it out originally.  Since this is built on top of our GlobMatcher code and that
doesn't have those flags, I thought it would be simplest to just remove them.
If we find that we need them, we can figure out how to add them later.

Also Note: the evaluation of the glob is parallel-ready but currently limited
to 1 at a time by constraining the folly::window call to 1.  We could make this
larger but would need a more intelligent constraint.  For example, a recursive
glob could initiate N concurrent futures per level where N is the number of
sub-dirs at a given level.  Using a custom Executor for these futures may be a
better option to set an upper bound on the number of concurrent jobs allowed
for a given glob call.
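
As a rough illustration of why a larger per-call window needs care, the sketch below estimates the worst case under the assumption that every directory has at least `window` sub-dirs and that each level opens its own window. `worstCaseInFlight` is a hypothetical helper, not part of this change.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Upper bound on the directory evaluations in flight at a given depth when
// each level of the walk allows `window` concurrent children: window^depth.
// The result saturates at the largest uint64_t value instead of overflowing.
std::uint64_t worstCaseInFlight(std::uint64_t window, unsigned depth) {
  const std::uint64_t kMax = std::numeric_limits<std::uint64_t>::max();
  std::uint64_t total = 1;
  for (unsigned i = 0; i < depth; ++i) {
    if (window != 0 && total > kMax / window) {
      return kMax;
    }
    total *= window;
  }
  return total;
}
```

With a window of 1 the bound stays at 1 regardless of depth, which is the behavior in this change. A window of 8 over a tree 4 levels deep already allows up to 4096 evaluations in flight, which is why a single shared limit per glob call may be a better fit than a per-level window.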

Depends on D4361197

Reviewed By: simpkins

Differential Revision: D4371934

fbshipit-source-id: 444735600bc16d2c2185f2277ddc5b51f672600a
2017-01-26 12:47:05 -08:00

256 lines
7.6 KiB
C++

/*
 * Copyright (c) 2016-present, Facebook, Inc.
 * All rights reserved.
 *
 * This source code is licensed under the BSD-style license found in the
 * LICENSE file in the root directory of this source tree. An additional grant
 * of patent rights can be found in the PATENTS file in the same directory.
 *
 */
#include "GlobNode.h"
#include "EdenError.h"
#include "eden/fs/inodes/TreeInode.h"
using std::string;
using std::unique_ptr;
using std::vector;
using folly::Future;
using folly::makeFuture;
using folly::make_unique;
using folly::StringPiece;
using std::unordered_set;
namespace facebook {
namespace eden {
GlobNode::GlobNode(StringPiece pattern, bool hasSpecials)
    : pattern_(pattern), hasSpecials_(hasSpecials) {
  if (pattern_ == "**" || pattern_ == "*") {
    alwaysMatch_ = true;
  } else {
    auto compiled = GlobMatcher::create(pattern);
    if (compiled.hasError()) {
      throw newEdenError(
          EINVAL,
          "failed to compile pattern `{}` to GlobMatcher: {}",
          pattern,
          compiled.error());
    }
    matcher_ = std::move(compiled.value());
  }
}
void GlobNode::parse(StringPiece pattern) {
  GlobNode* parent = this;
  while (!pattern.empty()) {
    StringPiece token;
    auto* container = &parent->children_;
    bool hasSpecials;
    if (pattern.startsWith("**")) {
      // Recursive match defeats most optimizations; we have to stop
      // tokenizing here.
      token = pattern;
      pattern = StringPiece();
      container = &parent->recursiveChildren_;
      hasSpecials = true;
    } else {
      token = tokenize(pattern, &hasSpecials);
    }
    auto node = lookupToken(container, token);
    if (!node) {
      container->emplace_back(std::make_unique<GlobNode>(token, hasSpecials));
      node = container->back().get();
    }
    // If there are no more tokens remaining then we have a leaf node
    // that will emit results. Update the node to reflect this.
    // Note that this may convert a pre-existing node from an earlier
    // glob specification to a leaf node.
    if (pattern.empty()) {
      node->isLeaf_ = true;
    }
    // Continue parsing the remainder of the pattern using this
    // (possibly new) node as the parent.
    parent = node;
  }
}
Future<unordered_set<RelativePath>> GlobNode::evaluate(
    RelativePathPiece rootPath,
    TreeInodePtr root) {
  unordered_set<RelativePath> results =
      evaluateRecursiveComponent(rootPath, root).get();
  vector<std::pair<PathComponent, GlobNode*>> recurse;
  {
    auto contents = root->getContents().rlock();
    for (auto& node : children_) {
      if (!node->hasSpecials_) {
        // We can try a lookup for the exact name
        auto it = contents->entries.find(PathComponentPiece(node->pattern_));
        if (it != contents->entries.end()) {
          // Matched!
          if (node->isLeaf_) {
            results.emplace((rootPath + it->first));
            continue;
          }
          // Not the leaf of a pattern; if this is a dir, we need to recurse
          if (S_ISDIR(it->second->mode)) {
            recurse.emplace_back(std::make_pair(it->first, node.get()));
          }
        }
      } else {
        // We need to match it out of the entries in this inode
        for (auto& entry : contents->entries) {
          if (node->alwaysMatch_ ||
              node->matcher_.match(entry.first.stringPiece())) {
            if (node->isLeaf_) {
              results.emplace((rootPath + entry.first));
              continue;
            }
            // Not the leaf of a pattern; if this is a dir, we need to
            // recurse
            if (S_ISDIR(entry.second->mode)) {
              recurse.emplace_back(std::make_pair(entry.first, node.get()));
            }
          }
        }
      }
    }
  }
  // Recursively load child inodes and evaluate matches with a concurrency
  // constraint.
  // For now we only evaluate 1 child dir at a time.
  // We could go larger but need to be careful about how this expands
  // with the depth of the tree.
  const constexpr size_t kConcurrency = 1;
  auto childInodes = folly::window(
      std::move(recurse),
      [ rootPath = rootPath.copy(), root, this ](
          const std::pair<PathComponent, GlobNode*>& item) {
        auto candidateName = rootPath + item.first;
        return root->getOrLoadChildTree(item.first).then([
          candidateName,
          node = item.second
        ](TreeInodePtr dir) { return node->evaluate(candidateName, dir); });
      },
      kConcurrency);
  // Merge the results to yield a de-duplicated set of matches
  return folly::unorderedReduce(
      std::move(childInodes),
      std::move(results),
      [](unordered_set<RelativePath> result,
         const unordered_set<RelativePath>& matches) {
        result.insert(matches.begin(), matches.end());
        return result;
      });
}
StringPiece GlobNode::tokenize(StringPiece& pattern, bool* hasSpecials) {
  *hasSpecials = false;
  for (auto it = pattern.begin(); it != pattern.end(); ++it) {
    switch (*it) {
      case '*':
      case '?':
      case '[':
      case '\\':
        *hasSpecials = true;
        break;
      case '/':
        // token is the input up-to-but-not-including the current position,
        // which is a '/' character
        StringPiece token(pattern.begin(), it);
        // update the pattern to be the text after the slash
        pattern = StringPiece(it + 1, pattern.end());
        return token;
    }
  }
  // No slash found, so the rest of the pattern is the token
  StringPiece token = pattern;
  pattern = StringPiece();
  return token;
}
GlobNode* GlobNode::lookupToken(
    vector<unique_ptr<GlobNode>>* container,
    StringPiece token) {
  for (auto& child : *container) {
    if (child->pattern_ == token) {
      return child.get();
    }
  }
  return nullptr;
}
Future<unordered_set<RelativePath>> GlobNode::evaluateRecursiveComponent(
    RelativePathPiece rootPath,
    TreeInodePtr root) {
  unordered_set<RelativePath> results;
  if (recursiveChildren_.empty()) {
    return results;
  }
  vector<RelativePath> subDirNames;
  {
    auto contents = root->getContents().rlock();
    for (auto& entry : contents->entries) {
      auto candidateName = rootPath + entry.first;
      for (auto& node : recursiveChildren_) {
        if (node->alwaysMatch_ ||
            node->matcher_.match(candidateName.stringPiece())) {
          results.emplace(candidateName);
          // No sense running multiple matches for this same file.
          break;
        }
      }
      // Remember to recurse through child dirs after we've released
      // the lock on the contents.
      if (S_ISDIR(entry.second->mode)) {
        subDirNames.emplace_back(candidateName);
      }
    }
  }
  // Recursively load child inodes and evaluate matches with a concurrency
  // constraint.
  // For now we only evaluate 1 child dir at a time.
  // We could go larger but need to be careful about how this expands
  // with the depth of the tree.
  const constexpr size_t kConcurrency = 1;
  auto childInodes = folly::window(
      std::move(subDirNames),
      [ rootPath = rootPath.copy(), root, this ](
          const RelativePath& candidateName) {
        return root->getOrLoadChildTree(candidateName.basename())
            .then([candidateName, this](TreeInodePtr dir) {
              return evaluateRecursiveComponent(candidateName, dir);
            });
      },
      kConcurrency);
  // Merge the results to yield a de-duplicated set of matches
  return folly::unorderedReduce(
      childInodes,
      results,
      [](unordered_set<RelativePath> result,
         const unordered_set<RelativePath>& matches) {
        result.insert(matches.begin(), matches.end());
        return result;
      });
}
}
}