sapling/eden/fs/service/GlobNode.h
Wez Furlong 932ef52a55 implement glob thrift method
Summary:
This is to facilitate the watchman integration and draws on the
watchman glob implementation; the approach is to split the glob strings into
path components and evaluate the components step by step as the tree is walked.
Components that do not include any glob special characters can be handled as
a direct lookup from the directory contents (O(1) rather than O(num-entries)).
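
The component split described above can be sketched roughly as follows. This is a standalone illustration, not the code in this commit: `Component`, `splitGlob`, and the metacharacter set `*?[\` are assumptions made for the example, and it uses `std::string_view` (C++17) instead of `folly::StringPiece`.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper type: one path component of a glob pattern.
struct Component {
  std::string text;
  bool hasSpecials; // true if the component needs pattern matching
};

// Split a glob on '/' and flag the components that contain glob
// metacharacters. The metacharacter set is an assumption for illustration.
std::vector<Component> splitGlob(std::string_view pattern) {
  std::vector<Component> out;
  while (true) {
    const auto slash = pattern.find('/');
    // substr(0, npos) yields the whole remaining string.
    const std::string_view token = pattern.substr(0, slash);
    const bool specials =
        token.find_first_of("*?[\\") != std::string_view::npos;
    out.push_back({std::string(token), specials});
    if (slash == std::string_view::npos) {
      break;
    }
    pattern.remove_prefix(slash + 1);
  }
  return out;
}
```

For a pattern such as "src/*.cpp", the first component ("src") has no special characters and could be resolved with a single directory lookup, while the second ("*.cpp") has to be matched against each entry of that directory.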

The glob method returns the set of filenames that match a list of
glob patterns.

Recursive globs are supported.  It is worth noting that a glob like "**/*" will
return a list of every entry in the filesystem.  This is potentially expensive
and should be avoided.  simpkins is in favor of disallowing this as a forcing
function to encourage tool-makers to adopt patterns that don't rely on a
complete listing of the filesystem.

For now I'd like to get this in without such a restriction; it's also worth
noting that running `find .` in the root of the mount point has a similar
effect and we can't prevent that from happening, so the effect of the overly
broad glob is something that we need to be able to withstand in any case.

Unrestricted recursive globs will make it easier to connect certain watchman
queries in the interim, until we have a more expressive thrift API for walking
and filtering the list of files.

Note: I've removed the wildmatch flags that I'd put in the API when I stubbed
it out originally.  Since this is built on top of our GlobMatcher code and that
doesn't have those flags, I thought it would be simplest to just remove them.
If we find that we need them, we can figure out how to add them later.

Also Note: the evaluation of the glob is parallel-ready but is currently limited
to one future at a time by passing a window size of 1 to folly::window.  We could make this
larger but would need a more intelligent constraint.  For example, a recursive
glob could initiate N concurrent futures per level where N is the number of
sub-dirs at a given level.  Using a custom Executor for these futures may be a
better option to set an upper bound on the number of concurrent jobs allowed
for a given glob call.

Depends on D4361197

Reviewed By: simpkins

Differential Revision: D4371934

fbshipit-source-id: 444735600bc16d2c2185f2277ddc5b51f672600a
2017-01-26 12:47:05 -08:00


/*
* Copyright (c) 2016-present, Facebook, Inc.
* All rights reserved.
*
* This source code is licensed under the BSD-style license found in the
* LICENSE file in the root directory of this source tree. An additional grant
* of patent rights can be found in the PATENTS file in the same directory.
*
*/
#pragma once
#include <folly/futures/Future.h>
#include <memory>
#include <unordered_set>
#include <vector>
#include "eden/fs/inodes/InodePtrFwd.h"
#include "eden/fs/model/git/GlobMatcher.h"
#include "eden/utils/PathFuncs.h"
namespace facebook {
namespace eden {
/** Represents the compiled state of a tree-walking glob operation.
* We split the glob into path components and build a tree of name
* matching operations.
* For non-recursive globs this allows an efficient walk and compare
* as we work through the tree. Path components that have no glob
* special characters can be looked up directly from the directory
* contents as a hash lookup, rather than by repeatedly matching the
* pattern against each entry.
*/
class GlobNode {
public:
// Default constructor is intended to create the root of a set of globs
// that will be parsed into the overall glob tree.
GlobNode() = default;
GlobNode(folly::StringPiece pattern, bool hasSpecials);
// Compile and add a new glob pattern to the tree.
// Compilation splits the pattern at each directory separator,
// producing one node per path component.
void parse(folly::StringPiece pattern);
// This is a recursive function to evaluate the compiled glob against
// the provided input path and inode.
// It returns the set of matching file names.
// Note: the caller is responsible for ensuring that this
// GlobNode exists until the returned Future is resolved.
folly::Future<std::unordered_set<RelativePath>> evaluate(
RelativePathPiece rootPath,
TreeInodePtr root);
private:
// Returns the next glob node token.
// This is the text from the start of pattern up to the first
// slash, or the end of the string if there was no slash.
// pattern is advanced to the start of the next token.
// hasSpecials is set to true if the returned token contains
// any special glob characters, false otherwise.
static folly::StringPiece tokenize(
folly::StringPiece& pattern,
bool* hasSpecials);
// Look up the child corresponding to a token.
// Returns nullptr if it does not exist.
// This is a simple brute force walk of the vector; the cardinality
// of the glob nodes is typically very low so this is fine.
GlobNode* lookupToken(
std::vector<std::unique_ptr<GlobNode>>* container,
folly::StringPiece token);
// Evaluates any recursive glob entries associated with this node.
// This is a recursive function which evaluates the current GlobNode against
// the recursive set of children.
// By contrast, evaluate() walks down through the GlobNodes AND the
// inode children.
// The two differ because a pattern like "**/foo" must be recursively
// matched against all the children of the inode.
folly::Future<std::unordered_set<RelativePath>> evaluateRecursiveComponent(
RelativePathPiece rootPath,
TreeInodePtr root);
// The pattern fragment for this node
folly::StringPiece pattern_;
// The compiled pattern
GlobMatcher matcher_;
// List of non-** child rules
std::vector<std::unique_ptr<GlobNode>> children_;
// List of ** child rules
std::vector<std::unique_ptr<GlobNode>> recursiveChildren_;
// If true, generate results for matches. Only applies
// to non-recursive glob patterns.
bool isLeaf_{false};
// If false we can try a name lookup of pattern rather
// than walking the children and applying the matcher
bool hasSpecials_{false};
// If true, this node is **
bool alwaysMatch_{false};
};
}
}
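
A rough, self-contained sketch of the non-recursive walk described in the class comment, run against an in-memory directory rather than Eden inodes. Everything here is an assumption made for illustration: `Dir`, `Node`, `addPattern`, and `wildMatch` are hypothetical names, `wildMatch` supports only `*` and `?` and stands in for `GlobMatcher`, the walk is synchronous rather than future-based, and `**` is not handled.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory directory: name -> child (nullptr means a file).
struct Dir {
  std::map<std::string, std::shared_ptr<Dir>> entries;
};

// Minimal matcher for '*' and '?' only; a stand-in for GlobMatcher.
bool wildMatch(const char* p, const char* s) {
  if (*p == '\0') return *s == '\0';
  if (*p == '*') {
    return wildMatch(p + 1, s) || (*s != '\0' && wildMatch(p, s + 1));
  }
  return *s != '\0' && (*p == '?' || *p == *s) && wildMatch(p + 1, s + 1);
}

// Simplified analogue of GlobNode: one node per path component.
struct Node {
  std::string pattern;
  bool hasSpecials = false;
  bool isLeaf = false;
  std::vector<std::unique_ptr<Node>> children;
};

// Add a glob to the tree, reusing nodes for shared leading components
// (a brute force scan of the children, as in lookupToken).
void addPattern(Node& root, const std::string& glob) {
  Node* parent = &root;
  std::size_t start = 0;
  while (true) {
    const std::size_t slash = glob.find('/', start);
    const std::string token = glob.substr(
        start, slash == std::string::npos ? std::string::npos : slash - start);
    Node* found = nullptr;
    for (auto& c : parent->children) {
      if (c->pattern == token) found = c.get();
    }
    if (!found) {
      auto n = std::make_unique<Node>();
      n->pattern = token;
      n->hasSpecials = token.find_first_of("*?") != std::string::npos;
      found = n.get();
      parent->children.push_back(std::move(n));
    }
    parent = found;
    if (slash == std::string::npos) {
      parent->isLeaf = true; // last component: matches produce results
      break;
    }
    start = slash + 1;
  }
}

// Walk the node tree and the directory tree together, collecting matches.
void evaluate(const Node& node, const Dir& dir, const std::string& prefix,
              std::set<std::string>& out) {
  for (const auto& child : node.children) {
    auto visit = [&](const std::string& name,
                     const std::shared_ptr<Dir>& sub) {
      const std::string path = prefix.empty() ? name : prefix + "/" + name;
      if (child->isLeaf) out.insert(path);
      if (sub && !child->children.empty()) evaluate(*child, *sub, path, out);
    };
    if (!child->hasSpecials) {
      // No glob characters: a single keyed lookup in the directory.
      auto it = dir.entries.find(child->pattern);
      if (it != dir.entries.end()) visit(it->first, it->second);
    } else {
      // Glob characters present: test the pattern against every entry.
      for (const auto& entry : dir.entries) {
        if (wildMatch(child->pattern.c_str(), entry.first.c_str())) {
          visit(entry.first, entry.second);
        }
      }
    }
  }
}
```

Adding both "src/*.cpp" and "src/b.h" to the same root yields a single "src" node with two children, so the "src" directory is looked up once and both sub-patterns are evaluated against its contents.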