sapling/eden/fs/inodes/GlobNode.h


implement glob thrift method

Summary: This is to facilitate the watchman integration and draws on the watchman glob implementation; the approach is to split the glob strings into path components and evaluate the components step by step as the tree is walked. Components that do not include any glob special characters can be handled as a direct lookup from the directory contents (O(1) rather than O(num-entries)). The glob method returns a set of filenames that match a list of glob patterns. Recursive globs are supported.

It is worth noting that a glob like "**/*" will return a list of every entry in the filesystem. This is potentially expensive and should be avoided. simpkins is in favor of disallowing this as a forcing function to encourage tool-makers to adopt patterns that don't rely on a complete listing of the filesystem. For now I'd like to get this in without such a restriction; it's also worth noting that running `find .` in the root of the mount point has a similar effect and we can't prevent that from happening, so the effect of the overly broad glob is something that we need to be able to withstand in any case. Unrestricted recursive globs will make it easier to connect certain watchman queries in the interim, until we have a more expressive thrift API for walking and filtering the list of files.

Note: I've removed the wildmatch flags that I'd put in the API when I stubbed it out originally. Since this is built on top of our GlobMatcher code and that doesn't have those flags, I thought it would be simplest to just remove them. If we find that we need them, we can figure out how to add them later.

Also note: the evaluation of the glob is parallel-ready but currently limited to 1 at a time by constraining the folly::window call to 1. We could make this larger but would need a more intelligent constraint. For example, a recursive glob could initiate N concurrent futures per level, where N is the number of sub-dirs at a given level. Using a custom Executor for these futures may be a better option to set an upper bound on the number of concurrent jobs allowed for a given glob call.

Depends on D4361197
Reviewed By: simpkins
Differential Revision: D4371934
fbshipit-source-id: 444735600bc16d2c2185f2277ddc5b51f672600a
2017-01-26 23:45:50 +03:00
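The core idea above — split each glob pattern on the directory separator and flag which components need real pattern matching versus a direct directory-entry lookup — can be sketched in isolation. This is a hypothetical stand-alone illustration, not the eden implementation; the `Component` struct and `splitPattern` helper are invented for this sketch:

```cpp
#include <cassert>
#include <string>
#include <vector>

// One slash-delimited piece of a glob pattern. Components without
// glob special characters can be resolved by a direct O(1) lookup
// in the directory contents instead of an O(num-entries) match.
struct Component {
  std::string text;
  bool hasSpecials; // true if this component must be pattern-matched
};

std::vector<Component> splitPattern(const std::string& pattern) {
  std::vector<Component> out;
  std::string current;
  bool specials = false;
  for (char c : pattern) {
    if (c == '/') {
      // End of a path component: record it and start the next one.
      out.push_back({current, specials});
      current.clear();
      specials = false;
    } else {
      if (c == '*' || c == '?' || c == '[' || c == '\\') {
        specials = true;
      }
      current += c;
    }
  }
  out.push_back({current, specials});
  return out;
}
```

For example, `splitPattern("src/*.c")` yields a literal `src` component (direct lookup) followed by a `*.c` component that requires matching against each entry.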
/*
* Copyright (c) Facebook, Inc. and its affiliates.
*
* This software may be used and distributed according to the terms of the
* GNU General Public License version 2.
*/
#pragma once
#include <folly/futures/Future.h>
#include <ostream>
#include "eden/fs/inodes/InodePtrFwd.h"
add `eden prefetch` command

Summary: This is a first pass at a prefetcher. The idea is simple, but the execution is impeded by some unfortunate slowness in different parts of mercurial. The idea is that you pass a list of glob patterns and we'll do something to make accessing files that match those patterns ideally faster than if you didn't give us the prefetch hint. In theory we could run `hg prefetch -I PATTERN` for this, but prefetch takes several minutes materializing and walking the whole manifest to find matches, checking outgoing revs and various other overheads. There is a revision flag that can be specified to try to reduce this effort, but it still takes more than a minute.

This diff:
* Removes a `Future::get()` call in the GlobNode code
* Makes `globFiles` use Futures directly rather than `Future::get()`
* Adds a `prefetchFiles` parameter to `globFiles`
* Adds `eden prefetch` to the CLI and makes it call `globFiles` with `prefetchFiles=true`
* Adds the ability to glob over `Tree` as well as the existing `TreeInode`. This means that we can avoid allocating inodes for portions of the tree that have not yet been loaded.

When `prefetchFiles` is set we'll ask ObjectStore to load the blob for matching files. I'm not currently doing this in the `TreeInode` case on the assumption that we already did this earlier when its `TreeInode::prefetch` method was called. The glob executor joins the blob prefetches at each GlobNode level. It may be possible to observe higher throughput if we join the complete set at the end.

Reviewed By: chadaustin
Differential Revision: D7825423
fbshipit-source-id: d2ae03d0f62f00090537198095661475056e968d
2018-05-25 23:47:46 +03:00
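The earlier note about bounding concurrency (constraining `folly::window` to 1, or using a custom Executor to cap concurrent jobs per glob call) amounts to a classic bounded-concurrency pattern. A minimal stand-in sketch, using plain threads and a condition variable rather than folly futures — `runBounded` and its parameters are invented for this illustration and are not part of eden:

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Run `jobs` tasks with at most `maxConcurrent` executing at once,
// and return the peak concurrency actually observed. This mirrors
// the effect of a bounded window or a fixed-size executor pool.
int runBounded(int jobs, int maxConcurrent) {
  std::mutex m;
  std::condition_variable cv;
  int running = 0;
  int peak = 0;
  std::vector<std::thread> workers;
  for (int i = 0; i < jobs; ++i) {
    workers.emplace_back([&] {
      {
        // Wait for a free slot before starting work.
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] { return running < maxConcurrent; });
        ++running;
        peak = std::max(peak, running);
      }
      // ...per-directory glob evaluation would happen here...
      {
        std::lock_guard<std::mutex> lock(m);
        --running;
      }
      cv.notify_one(); // release the slot to one waiting task
    });
  }
  for (auto& t : workers) {
    t.join();
  }
  return peak;
}
```

With `runBounded(8, 2)` the eight tasks all complete, but no more than two are ever in flight at once, which is the upper-bound guarantee the custom-Executor idea is after.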
#include "eden/fs/model/Hash.h"
#include "eden/fs/model/Tree.h"
#include "eden/fs/model/git/GlobMatcher.h"
#include "eden/fs/store/ObjectStore.h"
#include "eden/fs/utils/DirType.h"
#include "eden/fs/utils/EnumValue.h"
#include "eden/fs/utils/PathFuncs.h"
namespace facebook {
namespace eden {
/** Represents the compiled state of a tree-walking glob operation.
 * We split the glob into path components and build a tree of name
 * matching operations.
 * For non-recursive globs this allows an efficient walk and compare
 * as we work through the tree. Path components that have no glob
 * special characters can be looked up directly from the directory
 * contents as a hash lookup, rather than by repeatedly matching the
 * pattern against each entry.
 */
class GlobNode {
 public:
  // Single parameter constructor is intended to create the root of a set of
  // globs that will be parsed into the overall glob tree.
  explicit GlobNode(bool includeDotfiles) : includeDotfiles_(includeDotfiles) {}
  using PrefetchList = std::shared_ptr<folly::Synchronized<std::vector<Hash>>>;

  GlobNode(folly::StringPiece pattern, bool includeDotfiles, bool hasSpecials);
  struct GlobResult {
    RelativePath name;
    dtype_t dtype;

    // Currently this is the commit hash for the commit to which this file
    // belongs. But should eden move away from commit hashes this may become
    // the tree hash of the root tree to which this file belongs.
    // This should never become a dangling reference because the caller
    // of GlobResult::evaluate ensures that the hashes have a lifetime that
    // exceeds that of the GlobResults returned.
    const Hash* originHash;

    // Comparison operator for testing purposes
    bool operator==(const GlobResult& other) const noexcept {
      return name == other.name && dtype == other.dtype &&
          originHash == other.originHash;
    }
    bool operator!=(const GlobResult& other) const noexcept {
      return !(*this == other);
    }
    bool operator<(const GlobResult& other) const noexcept {
      return name < other.name || (name == other.name && dtype < other.dtype) ||
          (name == other.name && dtype == other.dtype &&
           originHash < other.originHash);
    }

    // originHash should never become a dangling reference because the caller
    // of GlobResult::evaluate ensures that the hashes have a lifetime that
    // exceeds that of the GlobResults returned.
    GlobResult(RelativePathPiece name, dtype_t dtype, const Hash& originHash)
        : name(name.copy()), dtype(dtype), originHash(&originHash) {}

    GlobResult(
        RelativePath&& name,
        dtype_t dtype,
        const Hash& originHash) noexcept
        : name(std::move(name)), dtype(dtype), originHash(&originHash) {}
  };
  // Compile and add a new glob pattern to the tree.
  // Compilation splits the pattern into nodes, with one node for each
  // directory separator separated path component.
  void parse(folly::StringPiece pattern);
// This is a recursive function to evaluate the compiled glob against
// the provided input path and inode.
// It returns the set of matching file names.
// Note 1: the caller is responsible for ensuring that this
// GlobNode exists until the returned Future is resolved.
// Note 2: the caller is also responsible for ensuring that the originHash's
// lifetime exceeds that of all the returned GlobResults. These GlobResults
// will hold pointers to this originHash.
// If prefetchFiles is true, each matching file will have its content
// prefetched via the ObjectStore layer. This will not change the
// materialization or overlay state for children that already have
// inodes assigned.
folly::Future<std::vector<GlobResult>> evaluate(
const ObjectStore* store,
ObjectFetchContext& context,
RelativePathPiece rootPath,
TreeInodePtr root,
PrefetchList fileBlobsToPrefetch,
const Hash& originHash);
// This is the Tree version of the method above.
folly::Future<std::vector<GlobResult>> evaluate(
const ObjectStore* store,
ObjectFetchContext& context,
RelativePathPiece rootPath,
const std::shared_ptr<const Tree>& tree,
PrefetchList fileBlobsToPrefetch,
const Hash& originHash);
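  // Usage sketch (illustrative only; the variable names below are
  // hypothetical, not part of this API):
  //
  //   globRoot->evaluate(objectStore, fetchContext, RelativePathPiece{},
  //                      rootTree, prefetchList, commitHash)
  //       .thenValue([](std::vector<GlobResult> results) {
  //         // consume the matched relative paths
  //       });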
/**
* Print a human-readable description of this GlobNode to stderr.
*
* For debugging purposes only.
*/
void debugDump() const;
private:
// Returns the next glob node token.
// This is the text from the start of pattern up to the first
  // slash, or the end of the string if there was no slash.
// pattern is advanced to the start of the next token.
// hasSpecials is set to true if the returned token contains
// any special glob characters, false otherwise.
static folly::StringPiece tokenize(
folly::StringPiece& pattern,
bool* hasSpecials);
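  // Illustrative example (not normative): for pattern = "src/*.cpp",
  // successive calls consume one component at a time:
  //
  //   tokenize(pattern, &hasSpecials);  // returns "src",   hasSpecials = false
  //   tokenize(pattern, &hasSpecials);  // returns "*.cpp", hasSpecials = true
  //
  // After the second call, pattern is empty.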
// Look up the child corresponding to a token.
// Returns nullptr if it does not exist.
// This is a simple brute force walk of the vector; the cardinality
  // of the glob nodes is typically very low, so this is fine.
GlobNode* lookupToken(
std::vector<std::unique_ptr<GlobNode>>* container,
folly::StringPiece token);
// Evaluates any recursive glob entries associated with this node.
// This is a recursive function which evaluates the current GlobNode against
// the recursive set of children.
// By contrast, evaluate() walks down through the GlobNodes AND the
// inode children.
// The difference is because a pattern like "**/foo" must be recursively
// matched against all the children of the inode.
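  // Illustrative example: for the pattern "**/foo", the "**" component is
  // recursive, so this method visits every descendant directory and matches
  // the remainder ("foo") against each entry name; e.g. "a/foo" and
  // "a/b/c/foo" would both match.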
template <typename ROOT>
folly::Future<std::vector<GlobResult>> evaluateRecursiveComponentImpl(
const ObjectStore* store,
ObjectFetchContext& context,
RelativePathPiece rootPath,
ROOT&& root,
PrefetchList fileBlobsToPrefetch,
const Hash& originHash);
template <typename ROOT>
folly::Future<std::vector<GlobResult>> evaluateImpl(
const ObjectStore* store,
ObjectFetchContext& context,
RelativePathPiece rootPath,
ROOT&& root,
PrefetchList fileBlobsToPrefetch,
const Hash& originHash);
void debugDump(int currentDepth) const;
// The pattern fragment for this node
std::string pattern_;
// The compiled pattern
GlobMatcher matcher_;
// List of non-** child rules
std::vector<std::unique_ptr<GlobNode>> children_;
// List of ** child rules
std::vector<std::unique_ptr<GlobNode>> recursiveChildren_;
// For a child GlobNode that is added to this GlobNode (presumably via
// parse()), the GlobMatcher pattern associated with the child node should use
// this value for its includeDotfiles parameter.
bool includeDotfiles_;
// If true, generate results for matches. Only applies
// to non-recursive glob patterns.
bool isLeaf_{false};
  // If false, we can try a direct name lookup of the pattern rather
  // than walking the children and applying the matcher to each entry
  bool hasSpecials_{false};
// true when both of the following hold:
// - this node is "**" or "*"
// - it was created with includeDotfiles=true.
bool alwaysMatch_{false};
};
// Streaming operators for logging and printing
inline std::ostream& operator<<(
std::ostream& stream,
const GlobNode::GlobResult& a) {
stream << "GlobResult{\"" << a.name.stringPiece()
<< "\", dtype=" << enumValue(a.dtype) << "}";
return stream;
}
} // namespace eden
} // namespace facebook