sapling/eden/fs/inodes/GlobNode.cpp

519 lines
17 KiB
C++

implement glob thrift method

Summary: This is to facilitate the watchman integration and draws on the watchman glob implementation; the approach is to split the glob strings into path components and evaluate the components step by step as the tree is walked. Components that do not include any glob special characters can be handled as a direct lookup from the directory contents (O(1) rather than O(num-entries)).

The glob method returns a set of filenames that match a list of glob patterns. Recursive globs are supported.

It is worth noting that a glob like "**/*" will return a list of every entry in the filesystem. This is potentially expensive and should be avoided. simpkins is in favor of disallowing this as a forcing function to encourage tool-makers to adopt patterns that don't rely on a complete listing of the filesystem. For now I'd like to get this in without such a restriction; it's also worth noting that running `find .` in the root of the mount point has a similar effect and we can't prevent that from happening, so the effect of the overly broad glob is something that we need to be able to withstand in any case. Unrestricted recursive globs will make it easier to connect certain watchman queries in the interim, until we have a more expressive thrift API for walking and filtering the list of files.

Note: I've removed the wildmatch flags that I'd put in the API when I stubbed it out originally. Since this is built on top of our GlobMatcher code and that doesn't have those flags, I thought it would be simplest to just remove them. If we find that we need them, we can figure out how to add them later.

Also Note: the evaluation of the glob is parallel-ready but currently limited to 1 at a time by constraining the folly::window call to 1. We could make this larger but would need a more intelligent constraint. For example, a recursive glob could initiate N concurrent futures per level where N is the number of sub-dirs at a given level. Using a custom Executor for these futures may be a better option to set an upper bound on the number of concurrent jobs allowed for a given glob call.

Depends on D4361197
Reviewed By: simpkins
Differential Revision: D4371934
fbshipit-source-id: 444735600bc16d2c2185f2277ddc5b51f672600a
2017-01-26 23:45:50 +03:00
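The component-splitting approach described in the summary can be sketched as follows. This is a minimal illustration under stated assumptions, not the GlobNode implementation: `GlobToken` and `tokenizeGlob` are hypothetical names, and the set of characters treated as special (`*`, `?`, `[`, `\`) is an assumption. A component without specials can be resolved by a direct directory lookup, and everything from a leading `**` onward is kept as a single token.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Hypothetical token type, for illustration only.
struct GlobToken {
  std::string text;
  bool hasSpecials; // true if the token needs pattern matching
};

// Split a glob into path components.
// A component with no special characters can be looked up directly.
// Once a component starts with "**", tokenizing stops and the rest of
// the pattern is kept as one token.
std::vector<GlobToken> tokenizeGlob(std::string_view pattern) {
  std::vector<GlobToken> tokens;
  while (!pattern.empty()) {
    if (pattern.substr(0, 2) == "**") {
      tokens.push_back({std::string(pattern), true});
      break;
    }
    auto slash = pattern.find('/');
    std::string_view component = pattern.substr(0, slash);
    // Assumed set of glob special characters.
    bool specials =
        component.find_first_of("*?[\\") != std::string_view::npos;
    tokens.push_back({std::string(component), specials});
    if (slash == std::string_view::npos) {
      break;
    }
    pattern.remove_prefix(slash + 1);
  }
  return tokens;
}
```

With this sketch, `src/*.cpp` yields a literal token `src` followed by a pattern token `*.cpp`, while `a/**/b` yields `a` followed by the single token `**/b`.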
/*
* Copyright (c) 2016-present, Facebook, Inc.
* All rights reserved.
*
* This source code is licensed under the BSD-style license found in the
* LICENSE file in the root directory of this source tree. An additional grant
* of patent rights can be found in the PATENTS file in the same directory.
*
*/
#include "GlobNode.h"
#include "eden/fs/inodes/TreeInode.h"
using folly::Future;
using folly::makeFuture;
using folly::StringPiece;
using std::make_unique;
using std::string;
using std::unique_ptr;
using std::vector;
namespace facebook {
namespace eden {
add `eden prefetch` command

Summary: This is a first pass at a prefetcher. The idea is simple, but the execution is impeded by some unfortunate slowness in different parts of mercurial.

The idea is that you pass a list of glob patterns and we'll do something to make accessing files that match those patterns ideally faster than if you didn't give us the prefetch hint.

In theory we could run `hg prefetch -I PATTERN` for this, but prefetch takes several minutes materializing and walking the whole manifest to find matches, checking outgoing revs and various other overheads. There is a revision flag that can be specified to try to reduce this effort, but it still takes more than a minute.

This diff:
* Removes a `Future::get()` call in the GlobNode code
* Makes `globFiles` use Futures directly rather than `Future::get()`
* Adds a `prefetchFiles` parameter to `globFiles`
* Adds `eden prefetch` to the CLI and makes it call `globFiles` with `prefetchFiles=true`
* Adds the ability to glob over `Tree` as well as the existing `TreeInode`. This means that we can avoid allocating inodes for portions of the tree that have not yet been loaded.

When `prefetchFiles` is set we'll ask ObjectStore to load the blob for matching files. I'm not currently doing this in the `TreeInode` case on the assumption that we already did this earlier when its `TreeInode::prefetch` method was called.

The glob executor joins the blob prefetches at each GlobNode level. It may be possible to observe higher throughput if we join the complete set at the end.

Reviewed By: chadaustin
Differential Revision: D7825423
fbshipit-source-id: d2ae03d0f62f00090537198095661475056e968d
2018-05-25 23:47:46 +03:00
namespace {
// Policy objects to help avoid duplicating the core globbing logic.
// We can walk over two different kinds of trees; either TreeInodes
// or raw Trees from the storage layer. While they have similar
// properties, accessing them is a little different. These policy
// objects are thin shims that make access more uniform.
/** TreeInodePtrRoot wraps a TreeInodePtr for globbing.
* A TreeInode requires that a lock be held while its entries
* are iterated.
* We only need to prefetch children of TreeInodes that are
* not materialized.
*/
struct TreeInodePtrRoot {
TreeInodePtr root;
explicit TreeInodePtrRoot(TreeInodePtr root) : root(root) {}
/** Return an object that holds a lock over the children */
auto lockContents() {
return root->getContents().rlock();
}
/** Given the return value from lockContents and a name,
* return a pointer to the child with that name, or nullptr
* if there is no match */
template <typename CONTENTS>
const DirEntry* FOLLY_NULLABLE
lookupEntry(CONTENTS& contents, PathComponentPiece name) {
auto it = contents->entries.find(name);
if (it != contents->entries.end()) {
return &it->second;
}
return nullptr;
}
/** Return an object that can be used in a range-based for
* loop to iterate over the contents. You must supply
* the CONTENTS object you obtained via lockContents().
* Iterating over the returned object yields ENTRY elements
* that can be used with the entryXXX methods below. */
template <typename CONTENTS>
auto& iterate(CONTENTS& contents) {
return contents->entries;
}
/** Arrange to load a child TreeInode */
Future<TreeInodePtr> getOrLoadChildTree(PathComponentPiece name) {
return root->getOrLoadChildTree(name);
}
/** Returns true if we should call getOrLoadChildTree() for the given
* ENTRY. We only do this if the child is already materialized */
template <typename ENTRY>
bool entryShouldLoadChildTree(const ENTRY& entry) {
return entry.second.isMaterialized();
}
bool entryShouldLoadChildTree(const DirEntry* entry) {
return entry->isMaterialized();
}
/** Returns the name for a given ENTRY */
template <typename ENTRY>
PathComponentPiece entryName(const ENTRY& entry) {
return entry.first;
}
/** Returns true if the given ENTRY is a tree */
template <typename ENTRY>
bool entryIsTree(const ENTRY& entry) {
return entry.second.isDirectory();
}
/** Returns true if the given ENTRY is a tree (pointer version) */
bool entryIsTree(const DirEntry* entry) {
return entry->isDirectory();
}
/** Returns true if we should prefetch the blob content for the entry.
* We only do this if the child is not already materialized */
template <typename ENTRY>
bool entryShouldPrefetch(const ENTRY& entry) {
return !entry.second.isMaterialized();
}
bool entryShouldPrefetch(const DirEntry* entry) {
return !entry->isMaterialized();
}
/** Returns the hash for the given ENTRY */
template <typename ENTRY>
const Hash entryHash(const ENTRY& entry) {
return entry.second.getHash();
}
const Hash entryHash(const DirEntry* entry) {
return entry->getHash();
}
};
/** TreeRoot wraps a Tree for globbing.
* The entries do not need to be locked, but to satisfy the interface
* we return the entries when lockContents() is called.
*/
struct TreeRoot {
std::shared_ptr<const Tree> tree;
explicit TreeRoot(const std::shared_ptr<const Tree>& tree) : tree(tree) {}
/** We don't need to lock the contents, so we just return a reference
* to the entries */
auto& lockContents() {
return tree->getTreeEntries();
}
/** Return an object that can be used in a range-based for
* loop to iterate over the contents. You must supply
* the CONTENTS object you obtained via lockContents().
* Iterating over the returned object yields ENTRY elements
* that can be used with the entryXXX methods below. */
template <typename CONTENTS>
auto& iterate(CONTENTS& contents) {
return contents;
}
/** We can never load a TreeInodePtr from a raw Tree, so this always
* fails. We never call this method because entryShouldLoadChildTree()
* always returns false. */
folly::Future<TreeInodePtr> getOrLoadChildTree(PathComponentPiece) {
throw std::runtime_error("impossible to get here");
}
template <typename ENTRY>
bool entryShouldLoadChildTree(const ENTRY&) {
return false;
}
template <typename CONTENTS>
auto* FOLLY_NULLABLE lookupEntry(CONTENTS&, PathComponentPiece name) {
return tree->getEntryPtr(name);
}
template <typename ENTRY>
PathComponentPiece entryName(const ENTRY& entry) {
return entry.getName();
}
template <typename ENTRY>
bool entryIsTree(const ENTRY& entry) {
return entry.isTree();
}
bool entryIsTree(const TreeEntry* entry) {
return entry->isTree();
}
// We always need to prefetch children of a raw Tree
template <typename ENTRY>
bool entryShouldPrefetch(const ENTRY&) {
return true;
}
template <typename ENTRY>
const Hash entryHash(const ENTRY& entry) {
return entry.getHash();
}
const Hash entryHash(const TreeEntry* entry) {
return entry->getHash();
}
};
} // namespace
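The policy-object idea used by the two structs above can be shown in isolation. The following is a simplified sketch, not Eden code: `LockedRoot`, `PlainRoot`, `PlainEntry`, and `listSubdirs` are hypothetical names, with a `std::mutex`-guarded map standing in for a TreeInode and a plain vector standing in for a raw Tree. The generic walker is written once and only talks to the shim methods.

```cpp
#include <map>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for a TreeInode-like root: the entries must be
// locked while they are read.
struct LockedRoot {
  std::mutex mutex;
  std::map<std::string, bool> entries; // name -> is directory

  auto lockContents() {
    return std::unique_lock<std::mutex>(mutex);
  }
  template <typename CONTENTS>
  auto& iterate(CONTENTS&) {
    return entries;
  }
  template <typename ENTRY>
  const std::string& entryName(const ENTRY& entry) {
    return entry.first;
  }
  template <typename ENTRY>
  bool entryIsTree(const ENTRY& entry) {
    return entry.second;
  }
};

// Hypothetical stand-in for a raw Tree: immutable, so no lock is needed.
struct PlainEntry {
  std::string name;
  bool isTree;
};

struct PlainRoot {
  std::vector<PlainEntry> entries;

  // "Locking" just hands back the entries to satisfy the interface.
  auto& lockContents() {
    return entries;
  }
  template <typename CONTENTS>
  auto& iterate(CONTENTS& contents) {
    return contents;
  }
  template <typename ENTRY>
  const std::string& entryName(const ENTRY& entry) {
    return entry.name;
  }
  template <typename ENTRY>
  bool entryIsTree(const ENTRY& entry) {
    return entry.isTree;
  }
};

// Generic logic written once against the shim interface.
template <typename ROOT>
std::vector<std::string> listSubdirs(ROOT& root) {
  std::vector<std::string> result;
  // Holds the lock (or a reference to the entries) for this scope.
  auto&& contents = root.lockContents();
  for (auto& entry : root.iterate(contents)) {
    if (root.entryIsTree(entry)) {
      result.push_back(root.entryName(entry));
    }
  }
  return result;
}
```

The design choice here is compile-time dispatch: each root type exposes the same method names, so the walker needs no virtual calls and no common base class.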
GlobNode::GlobNode(StringPiece pattern, bool includeDotfiles, bool hasSpecials)
: pattern_(pattern.str()),
includeDotfiles_(includeDotfiles),
hasSpecials_(hasSpecials) {
if (includeDotfiles && (pattern == "**" || pattern == "*")) {
alwaysMatch_ = true;
} else {
auto options =
includeDotfiles ? GlobOptions::DEFAULT : GlobOptions::IGNORE_DOTFILES;
auto compiled = GlobMatcher::create(pattern, options);
if (compiled.hasError()) {
throw std::system_error(
EINVAL,
std::generic_category(),
folly::sformat(
"failed to compile pattern `{}` to GlobMatcher: {}",
pattern,
compiled.error()));
}
matcher_ = std::move(compiled.value());
}
}
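The constructor's two paths (an always-match shortcut versus a compiled matcher that may ignore dotfiles) can be sketched with a toy matcher. This is an illustration only: `wildcardMatch` and `ComponentMatcher` are hypothetical, the matcher supports just `*` and `?`, and the rule that a wildcard never matches a leading dot when dotfiles are excluded is an assumption about the intended behavior, not a description of GlobMatcher.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical single-component matcher supporting only '*' and '?'.
inline bool wildcardMatch(std::string_view pattern, std::string_view name) {
  constexpr std::size_t npos = std::string_view::npos;
  std::size_t p = 0, n = 0, starP = npos, starN = 0;
  while (n < name.size()) {
    if (p < pattern.size() && pattern[p] == '*') {
      // Remember the star so we can backtrack to it later.
      starP = p++;
      starN = n;
    } else if (
        p < pattern.size() && (pattern[p] == '?' || pattern[p] == name[n])) {
      ++p;
      ++n;
    } else if (starP != npos) {
      // Let the last star absorb one more character and retry.
      p = starP + 1;
      n = ++starN;
    } else {
      return false;
    }
  }
  while (p < pattern.size() && pattern[p] == '*') {
    ++p;
  }
  return p == pattern.size();
}

// Hypothetical per-component matcher mirroring the constructor's branches.
class ComponentMatcher {
 public:
  ComponentMatcher(std::string pattern, bool includeDotfiles)
      : pattern_(std::move(pattern)),
        includeDotfiles_(includeDotfiles),
        // "*" with dotfiles included matches every name, so skip matching.
        alwaysMatch_(includeDotfiles && pattern_ == "*") {}

  bool matches(std::string_view name) const {
    if (alwaysMatch_) {
      return true;
    }
    // Assumption: without includeDotfiles, only a pattern that itself
    // starts with '.' may match a name that starts with '.'.
    if (!includeDotfiles_ && !name.empty() && name[0] == '.' &&
        (pattern_.empty() || pattern_[0] != '.')) {
      return false;
    }
    return wildcardMatch(pattern_, name);
  }

 private:
  std::string pattern_;
  bool includeDotfiles_;
  bool alwaysMatch_;
};
```

The shortcut matters because a bare `*` is a very common component, and skipping the matcher entirely avoids per-entry work when every name would match anyway.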
void GlobNode::parse(StringPiece pattern) {
GlobNode* parent = this;
string normalizedPattern;
while (!pattern.empty()) {
StringPiece token;
auto* container = &parent->children_;
bool hasSpecials;
if (pattern.startsWith("**")) {
// Recursive match defeats most optimizations; we have to stop
// tokenizing here.
// HACK: We special-case "**" if includeDotfiles=false. In this case, we
// need to create a GlobMatcher for this pattern, but GlobMatcher is
// designed to reject "**". As a workaround, we use "**/*", which is
// functionally equivalent in this case because there are no other
// "tokens" in the pattern following the "**" at this point.
if (pattern == "**" && !includeDotfiles_) {
normalizedPattern = "**/*";
token = normalizedPattern;
} else {
token = pattern;
}
pattern = StringPiece();
container = &parent->recursiveChildren_;
hasSpecials = true;
} else {
token = tokenize(pattern, &hasSpecials);
}
auto node = lookupToken(container, token);
if (!node) {
container->emplace_back(
std::make_unique<GlobNode>(token, includeDotfiles_, hasSpecials));
node = container->back().get();
}
// If there are no more tokens remaining then we have a leaf node
// that will emit results. Update the node to reflect this.
// Note that this may convert a pre-existing node from an earlier
// glob specification to a leaf node.
if (pattern.empty()) {
node->isLeaf_ = true;
}
// Continue parsing the remainder of the pattern using this
// (possibly new) node as the parent.
parent = node;
}
}
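The parse loop above can be illustrated with a minimal, self-contained sketch. It is not the EdenFS implementation: `Node`, `nextToken`, and `findOrAdd` are hypothetical stand-ins for `GlobNode`, `tokenize`, and `lookupToken`, the set of "special" characters is an assumption, and the `includeDotfiles` handling (including the `**` to `**/*` workaround) is left out.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical, simplified stand-in for GlobNode.
struct Node {
  std::string pattern;
  bool hasSpecials = false;
  bool isLeaf = false;
  std::vector<std::unique_ptr<Node>> children;
  std::vector<std::unique_ptr<Node>> recursiveChildren;
};

// Hypothetical tokenizer: splits off the next '/'-delimited component,
// advances `pattern` past it, and reports whether the component contains
// any character assumed here to be a glob special.
std::string_view nextToken(std::string_view& pattern, bool& hasSpecials) {
  const auto slash = pattern.find('/');
  const std::string_view token = pattern.substr(0, slash);
  pattern = (slash == std::string_view::npos) ? std::string_view{}
                                              : pattern.substr(slash + 1);
  hasSpecials = token.find_first_of("*?[\\") != std::string_view::npos;
  return token;
}

// Returns the existing node for `token` in `container`, or appends a new one.
Node* findOrAdd(
    std::vector<std::unique_ptr<Node>>& container,
    std::string_view token,
    bool hasSpecials) {
  for (auto& child : container) {
    if (std::string_view(child->pattern) == token) {
      return child.get();
    }
  }
  auto node = std::make_unique<Node>();
  node->pattern = std::string(token);
  node->hasSpecials = hasSpecials;
  container.push_back(std::move(node));
  return container.back().get();
}

// Inserts one glob pattern into the tree rooted at `root`.
void parse(Node& root, std::string_view pattern) {
  Node* parent = &root;
  while (!pattern.empty()) {
    bool hasSpecials = false;
    std::string_view token;
    auto* container = &parent->children;
    if (pattern.substr(0, 2) == "**") {
      // A recursive match consumes the rest of the pattern as one token.
      token = pattern;
      pattern = {};
      container = &parent->recursiveChildren;
      hasSpecials = true;
    } else {
      token = nextToken(pattern, hasSpecials);
    }
    Node* node = findOrAdd(*container, token, hasSpecials);
    if (pattern.empty()) {
      // No tokens remain, so this node emits results. This may convert a
      // node created by an earlier pattern into a leaf.
      node->isLeaf = true;
    }
    parent = node;
  }
}
```

In this sketch, parsing `src/*.cpp` and then `src/**/*.h` into the same root yields a single `src` child that holds `*.cpp` in `children` and `**/*.h` in `recursiveChildren`, so patterns with a shared prefix share nodes.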
add `eden prefetch` command Summary: This is a first pass at a prefetcher. The idea is simple, but the execution is impeded by some unfortunate slowness in different parts of mercurial. The idea is that you pass a list of glob patterns and we'll do something to make accessing files that match those patterns ideally faster than if you didn't give us the prefetch hint. In theory we could run `hg prefetch -I PATTERN` for this, but prefetch takes several minutes materializing and walking the whole manifest to find matches, checking outgoing revs and various other overheads. There is a revision flag that can be specified to try to reduce this effort, but it still takes more than a minute. This diff: * Removes a `Future::get()` call in the GlobNode code * Makes `globFiles` use Futures directly rather than `Future::get()` * Adds a `prefetchFiles` parameter to `globFiles` * Adds `eden prefetch` to the CLI and makes it call `globFiles` with `prefetchFiles=true` * Adds the abillity to glob over `Tree` as well as the existing `TreeInode`. This means that we can avoid allocating inodes for portions of the tree that have not yet been loaded. When `prefetchFiles` is set we'll ask ObjectStore to load the blob for matching files. I'm not currently doing this in the `TreeInode` case on the assumption that we already did this earlier when its `TreeInode::prefetch` method was called. The glob executor joins the blob prefetches at each GlobNode level. It may be possible to observe higher throughput if we join the complete set at the end. Reviewed By: chadaustin Differential Revision: D7825423 fbshipit-source-id: d2ae03d0f62f00090537198095661475056e968d
2018-05-25 23:47:46 +03:00
template <typename ROOT>
Future<vector<RelativePath>> GlobNode::evaluateImpl(
const ObjectStore* store,
RelativePathPiece rootPath,
ROOT&& root,
GlobNode::PrefetchList fileBlobsToPrefetch) {
vector<RelativePath> results;
vector<std::pair<PathComponentPiece, GlobNode*>> recurse;
vector<Future<vector<RelativePath>>> futures;
futures.emplace_back(evaluateRecursiveComponentImpl(
store, rootPath, root, fileBlobsToPrefetch));
{
auto contents = root.lockContents();
for (auto& node : children_) {
if (!node->hasSpecials_) {
// We can try a lookup for the exact name
auto name = PathComponentPiece(node->pattern_);
auto entry = root.lookupEntry(contents, name);
if (entry) {
// Matched!
if (node->isLeaf_) {
results.emplace_back(rootPath + name);
continue;
}
// Not the leaf of a pattern; if this is a dir, we need to recurse
if (root.entryIsTree(entry)) {
if (root.entryShouldLoadChildTree(entry)) {
recurse.emplace_back(name, node.get());
} else {
auto candidateName = rootPath + name;
futures.emplace_back(
store->getTree(root.entryHash(entry))
.then([candidateName,
store,
innerNode = node.get(),
fileBlobsToPrefetch](
std::shared_ptr<const Tree> dir) {
return innerNode->evaluateImpl(
store,
candidateName,
TreeRoot(dir),
fileBlobsToPrefetch);
}));
}
} else if (fileBlobsToPrefetch && root.entryShouldPrefetch(entry)) {
fileBlobsToPrefetch->wlock()->emplace_back(root.entryHash(entry));
}
}
} else {
// We need to match it out of the entries in this inode
for (auto& entry : root.iterate(contents)) {
auto name = root.entryName(entry);
if (node->alwaysMatch_ || node->matcher_.match(name.stringPiece())) {
if (node->isLeaf_) {
results.emplace_back(rootPath + name);
continue;
}
// Not the leaf of a pattern; if this is a dir, we need to
// recurse
if (root.entryIsTree(entry)) {
if (root.entryShouldLoadChildTree(entry)) {
recurse.emplace_back(std::make_pair(name, node.get()));
} else {
auto candidateName = rootPath + name;
futures.emplace_back(
store->getTree(root.entryHash(entry))
.then([candidateName,
store,
innerNode = node.get(),
fileBlobsToPrefetch](
std::shared_ptr<const Tree> dir) {
return innerNode->evaluateImpl(
store,
candidateName,
TreeRoot(dir),
fileBlobsToPrefetch);
}));
}
} else if (fileBlobsToPrefetch && root.entryShouldPrefetch(entry)) {
fileBlobsToPrefetch->wlock()->emplace_back(root.entryHash(entry));
}
}
}
}
}
}
// Recursively load child inodes and evaluate matches
for (auto& item : recurse) {
auto candidateName = rootPath + item.first;
futures.emplace_back(root.getOrLoadChildTree(item.first)
.then([store,
candidateName,
node = item.second,
fileBlobsToPrefetch](TreeInodePtr dir) {
return node->evaluateImpl(
store,
candidateName,
TreeInodePtrRoot(dir),
fileBlobsToPrefetch);
}));
}
return folly::collect(futures).then(
[shadowResults = std::move(results)](
vector<vector<RelativePath>>&& matchVector) mutable {
for (auto& matches : matchVector) {
shadowResults.insert(
shadowResults.end(),
std::make_move_iterator(matches.begin()),
std::make_move_iterator(matches.end()));
}
return shadowResults;
});
}
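/*
Illustrative sketch, not part of EdenFS: the final step of evaluateImpl appends
each child's matches to the results collected at this level, moving the elements
rather than copying them. The version below assumes std::string in place of
RelativePath and a plain vector in place of the collected futures; the names
mergeMatchesSketch and demoMergeMatches are hypothetical.

```cpp
#include <cassert>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Appends every child match list to `results`, moving each element.
// `results` is taken by value so the caller can hand over ownership,
// much like the move-captured shadowResults in the lambda above.
std::vector<std::string> mergeMatchesSketch(
    std::vector<std::string> results,
    std::vector<std::vector<std::string>>&& matchVector) {
  for (auto& matches : matchVector) {
    results.insert(
        results.end(),
        std::make_move_iterator(matches.begin()),
        std::make_move_iterator(matches.end()));
  }
  return results;
}

// Small self-check: matches found at this level come first, followed by
// the matches of each child in order.
void demoMergeMatches() {
  std::vector<std::vector<std::string>> children = {{"a/b", "a/c"}, {}, {"d/e"}};
  auto merged = mergeMatchesSketch({"top"}, std::move(children));
  assert(merged.size() == 4);
  assert(merged.front() == "top");
  assert(merged.back() == "d/e");
}
```

The order of the merged list follows the order of the futures, so it is not
sorted; a caller that needs sorted output would have to sort it separately.
*/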
Future<vector<RelativePath>> GlobNode::evaluate(
const ObjectStore* store,
RelativePathPiece rootPath,
TreeInodePtr root,
GlobNode::PrefetchList fileBlobsToPrefetch) {
return evaluateImpl(
store, rootPath, TreeInodePtrRoot(root), fileBlobsToPrefetch);
}
folly::Future<vector<RelativePath>> GlobNode::evaluate(
const ObjectStore* store,
RelativePathPiece rootPath,
const std::shared_ptr<const Tree>& tree,
GlobNode::PrefetchList fileBlobsToPrefetch) {
return evaluateImpl(store, rootPath, TreeRoot(tree), fileBlobsToPrefetch);
}
StringPiece GlobNode::tokenize(StringPiece& pattern, bool* hasSpecials) {
*hasSpecials = false;
for (auto it = pattern.begin(); it != pattern.end(); ++it) {
switch (*it) {
case '*':
case '?':
case '[':
case '\\':
*hasSpecials = true;
break;
case '/':
// token is the input up-to-but-not-including the current position,
// which is a '/' character
StringPiece token(pattern.begin(), it);
// update the pattern to be the text after the slash
pattern = StringPiece(it + 1, pattern.end());
return token;
}
}
// No slash found, so the rest of the pattern is the token
StringPiece token = pattern;
pattern = StringPiece();
return token;
}
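/*
Illustrative sketch, not part of EdenFS: the splitting loop in tokenize can be
exercised outside the codebase. This version assumes std::string_view in place
of folly::StringPiece; the names tokenizeSketch and demoTokenize are
hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Returns the text before the first '/' and advances `pattern` past it.
// `hasSpecials` reports whether the returned token contains a glob
// special character.
std::string_view tokenizeSketch(std::string_view& pattern, bool* hasSpecials) {
  *hasSpecials = false;
  for (std::size_t i = 0; i < pattern.size(); ++i) {
    switch (pattern[i]) {
      case '*':
      case '?':
      case '[':
      case '\\':
        *hasSpecials = true;
        break;
      case '/': {
        // The token is everything before the slash.
        std::string_view token = pattern.substr(0, i);
        // The remaining pattern is everything after the slash.
        pattern = pattern.substr(i + 1);
        return token;
      }
    }
  }
  // No slash found: the whole remaining pattern is the final token.
  std::string_view token = pattern;
  pattern = std::string_view();
  return token;
}

// Splits a two-component pattern and checks each step.
void demoTokenize() {
  std::string_view pattern = "src/[a-z]*.cpp";
  bool hasSpecials = false;

  std::string_view first = tokenizeSketch(pattern, &hasSpecials);
  assert(first == "src");
  assert(!hasSpecials);
  assert(pattern == "[a-z]*.cpp");

  std::string_view second = tokenizeSketch(pattern, &hasSpecials);
  assert(second == "[a-z]*.cpp");
  assert(hasSpecials);
  assert(pattern.empty());
}
```

Each call consumes one path component and leaves the remainder in the pattern
argument, so a caller can loop until the pattern is empty.
*/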
GlobNode* GlobNode::lookupToken(
    vector<unique_ptr<GlobNode>>* container,
    StringPiece token) {
  for (auto& child : *container) {
    if (child->pattern_ == token) {
      return child.get();
    }
  }
  return nullptr;
}
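
// Illustration only, not part of GlobNode: a minimal sketch of the linear
// lookup above, written against a hypothetical NodeSketch type. The
// findOrAddSketch helper is an assumption about how a caller might use the
// lookup to avoid creating duplicate nodes for a repeated token; it is not
// taken from this file.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical node type holding only the pattern text for one component.
struct NodeSketch {
  explicit NodeSketch(std::string p) : pattern(std::move(p)) {}
  std::string pattern;
};

using NodeListSketch = std::vector<std::unique_ptr<NodeSketch>>;

// Linear search for a child whose pattern equals `token`; nullptr if absent.
NodeSketch* lookupTokenSketch(NodeListSketch& container, std::string_view token) {
  auto it = std::find_if(
      container.begin(), container.end(), [&](const auto& child) {
        return child->pattern == token;
      });
  return it == container.end() ? nullptr : it->get();
}

// Assumed usage: reuse an existing child for `token`, otherwise append one.
NodeSketch* findOrAddSketch(NodeListSketch& container, std::string_view token) {
  if (NodeSketch* existing = lookupTokenSketch(container, token)) {
    return existing;
  }
  container.push_back(std::make_unique<NodeSketch>(std::string(token)));
  return container.back().get();
}
```

// The search is O(number of children), which is usually small for a set of
// glob patterns; a map keyed by pattern text would be an alternative if that
// assumption did not hold.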
add `eden prefetch` command Summary: This is a first pass at a prefetcher. The idea is simple, but the execution is impeded by some unfortunate slowness in different parts of mercurial. The idea is that you pass a list of glob patterns and we'll do something to make accessing files that match those patterns ideally faster than if you didn't give us the prefetch hint. In theory we could run `hg prefetch -I PATTERN` for this, but prefetch takes several minutes materializing and walking the whole manifest to find matches, checking outgoing revs and various other overheads. There is a revision flag that can be specified to try to reduce this effort, but it still takes more than a minute. This diff: * Removes a `Future::get()` call in the GlobNode code * Makes `globFiles` use Futures directly rather than `Future::get()` * Adds a `prefetchFiles` parameter to `globFiles` * Adds `eden prefetch` to the CLI and makes it call `globFiles` with `prefetchFiles=true` * Adds the abillity to glob over `Tree` as well as the existing `TreeInode`. This means that we can avoid allocating inodes for portions of the tree that have not yet been loaded. When `prefetchFiles` is set we'll ask ObjectStore to load the blob for matching files. I'm not currently doing this in the `TreeInode` case on the assumption that we already did this earlier when its `TreeInode::prefetch` method was called. The glob executor joins the blob prefetches at each GlobNode level. It may be possible to observe higher throughput if we join the complete set at the end. Reviewed By: chadaustin Differential Revision: D7825423 fbshipit-source-id: d2ae03d0f62f00090537198095661475056e968d
2018-05-25 23:47:46 +03:00
template <typename ROOT>
Future<vector<RelativePath>> GlobNode::evaluateRecursiveComponentImpl(
add `eden prefetch` command Summary: This is a first pass at a prefetcher. The idea is simple, but the execution is impeded by some unfortunate slowness in different parts of mercurial. The idea is that you pass a list of glob patterns and we'll do something to make accessing files that match those patterns ideally faster than if you didn't give us the prefetch hint. In theory we could run `hg prefetch -I PATTERN` for this, but prefetch takes several minutes materializing and walking the whole manifest to find matches, checking outgoing revs and various other overheads. There is a revision flag that can be specified to try to reduce this effort, but it still takes more than a minute. This diff: * Removes a `Future::get()` call in the GlobNode code * Makes `globFiles` use Futures directly rather than `Future::get()` * Adds a `prefetchFiles` parameter to `globFiles` * Adds `eden prefetch` to the CLI and makes it call `globFiles` with `prefetchFiles=true` * Adds the abillity to glob over `Tree` as well as the existing `TreeInode`. This means that we can avoid allocating inodes for portions of the tree that have not yet been loaded. When `prefetchFiles` is set we'll ask ObjectStore to load the blob for matching files. I'm not currently doing this in the `TreeInode` case on the assumption that we already did this earlier when its `TreeInode::prefetch` method was called. The glob executor joins the blob prefetches at each GlobNode level. It may be possible to observe higher throughput if we join the complete set at the end. Reviewed By: chadaustin Differential Revision: D7825423 fbshipit-source-id: d2ae03d0f62f00090537198095661475056e968d
2018-05-25 23:47:46 +03:00
const ObjectStore* store,
implement glob thrift method Summary: This is to facilitate the watchman integration and draws on the watchman glob implementation; the approach is to split the glob strings into path components and evaluate the components step by step as the tree is walked. Components that do not include any glob special characters can be handled as a direct lookup from the directory contents (O(1) rather than O(num-entries)). The glob method returns a set of filenames that match a list of of glob patterns. Recursive globs are supported. It is worth noting that a glob like "**/*" will return a list of every entry in the filesystem. This is potentially expensive and should be avoided. simpkins is in favor of disallowing this as a forcing function to encourage tool-makers to adopt patterns that don't rely on a complete listing of the filesystem. For now I'd like to get this in without such a restriction; it's also worth noting that running `find .` in the root of the mount point has a similar effect and we can't prevent that from happening, so the effect of the overly broad glob is something that we need to be able to withstand in any case. Unrestricted recursive globs will make it easier to connect certain watchman queries in the interim, until we have a more expressive thrift API for walking and filtering the list of files. Note: I've removed the wildmatch flags that I'd put in the API when I stubbed it out originally. Since this is built on top of our GlobMatcher code and that doesn't have those flags, I thought it would be simplest to just remove them. If we find that we need them, we can figure out how to add them later. Also Note: the evaluation of the glob is parallel-ready but currently limited to 1 at a time by constraining the folly::window call to 1. We could make this larger but would need a more intelligent constraint. For example, a recursive glob could initiate N concurrent futures per level where N is the number of sub-dirs at a given level. 
Using a custom Executor for these futures may be a better option to set an upper bound on the number of concurrent jobs allowed for a given glob call. Depends on D4361197 Reviewed By: simpkins Differential Revision: D4371934 fbshipit-source-id: 444735600bc16d2c2185f2277ddc5b51f672600a
2017-01-26 23:45:50 +03:00
RelativePathPiece rootPath,
add `eden prefetch` command Summary: This is a first pass at a prefetcher. The idea is simple, but the execution is impeded by some unfortunate slowness in different parts of mercurial. The idea is that you pass a list of glob patterns and we'll do something to make accessing files that match those patterns ideally faster than if you didn't give us the prefetch hint. In theory we could run `hg prefetch -I PATTERN` for this, but prefetch takes several minutes materializing and walking the whole manifest to find matches, checking outgoing revs and various other overheads. There is a revision flag that can be specified to try to reduce this effort, but it still takes more than a minute. This diff: * Removes a `Future::get()` call in the GlobNode code * Makes `globFiles` use Futures directly rather than `Future::get()` * Adds a `prefetchFiles` parameter to `globFiles` * Adds `eden prefetch` to the CLI and makes it call `globFiles` with `prefetchFiles=true` * Adds the abillity to glob over `Tree` as well as the existing `TreeInode`. This means that we can avoid allocating inodes for portions of the tree that have not yet been loaded. When `prefetchFiles` is set we'll ask ObjectStore to load the blob for matching files. I'm not currently doing this in the `TreeInode` case on the assumption that we already did this earlier when its `TreeInode::prefetch` method was called. The glob executor joins the blob prefetches at each GlobNode level. It may be possible to observe higher throughput if we join the complete set at the end. Reviewed By: chadaustin Differential Revision: D7825423 fbshipit-source-id: d2ae03d0f62f00090537198095661475056e968d
2018-05-25 23:47:46 +03:00
ROOT&& root,
GlobNode::PrefetchList fileBlobsToPrefetch) {
vector<RelativePath> results;
implement glob thrift method Summary: This is to facilitate the watchman integration and draws on the watchman glob implementation; the approach is to split the glob strings into path components and evaluate the components step by step as the tree is walked. Components that do not include any glob special characters can be handled as a direct lookup from the directory contents (O(1) rather than O(num-entries)). The glob method returns a set of filenames that match a list of of glob patterns. Recursive globs are supported. It is worth noting that a glob like "**/*" will return a list of every entry in the filesystem. This is potentially expensive and should be avoided. simpkins is in favor of disallowing this as a forcing function to encourage tool-makers to adopt patterns that don't rely on a complete listing of the filesystem. For now I'd like to get this in without such a restriction; it's also worth noting that running `find .` in the root of the mount point has a similar effect and we can't prevent that from happening, so the effect of the overly broad glob is something that we need to be able to withstand in any case. Unrestricted recursive globs will make it easier to connect certain watchman queries in the interim, until we have a more expressive thrift API for walking and filtering the list of files. Note: I've removed the wildmatch flags that I'd put in the API when I stubbed it out originally. Since this is built on top of our GlobMatcher code and that doesn't have those flags, I thought it would be simplest to just remove them. If we find that we need them, we can figure out how to add them later. Also Note: the evaluation of the glob is parallel-ready but currently limited to 1 at a time by constraining the folly::window call to 1. We could make this larger but would need a more intelligent constraint. For example, a recursive glob could initiate N concurrent futures per level where N is the number of sub-dirs at a given level. 
Using a custom Executor for these futures may be a better option to set an upper bound on the number of concurrent jobs allowed for a given glob call. Depends on D4361197 Reviewed By: simpkins Differential Revision: D4371934 fbshipit-source-id: 444735600bc16d2c2185f2277ddc5b51f672600a
2017-01-26 23:45:50 +03:00
if (recursiveChildren_.empty()) {
return results;
}
vector<RelativePath> subDirNames;
vector<Future<vector<RelativePath>>> futures;
implement glob thrift method Summary: This is to facilitate the watchman integration and draws on the watchman glob implementation; the approach is to split the glob strings into path components and evaluate the components step by step as the tree is walked. Components that do not include any glob special characters can be handled as a direct lookup from the directory contents (O(1) rather than O(num-entries)). The glob method returns a set of filenames that match a list of of glob patterns. Recursive globs are supported. It is worth noting that a glob like "**/*" will return a list of every entry in the filesystem. This is potentially expensive and should be avoided. simpkins is in favor of disallowing this as a forcing function to encourage tool-makers to adopt patterns that don't rely on a complete listing of the filesystem. For now I'd like to get this in without such a restriction; it's also worth noting that running `find .` in the root of the mount point has a similar effect and we can't prevent that from happening, so the effect of the overly broad glob is something that we need to be able to withstand in any case. Unrestricted recursive globs will make it easier to connect certain watchman queries in the interim, until we have a more expressive thrift API for walking and filtering the list of files. Note: I've removed the wildmatch flags that I'd put in the API when I stubbed it out originally. Since this is built on top of our GlobMatcher code and that doesn't have those flags, I thought it would be simplest to just remove them. If we find that we need them, we can figure out how to add them later. Also Note: the evaluation of the glob is parallel-ready but currently limited to 1 at a time by constraining the folly::window call to 1. We could make this larger but would need a more intelligent constraint. For example, a recursive glob could initiate N concurrent futures per level where N is the number of sub-dirs at a given level. 
Using a custom Executor for these futures may be a better option to set an upper bound on the number of concurrent jobs allowed for a given glob call. Depends on D4361197 Reviewed By: simpkins Differential Revision: D4371934 fbshipit-source-id: 444735600bc16d2c2185f2277ddc5b51f672600a
2017-01-26 23:45:50 +03:00
{
add `eden prefetch` command Summary: This is a first pass at a prefetcher. The idea is simple, but the execution is impeded by some unfortunate slowness in different parts of mercurial. The idea is that you pass a list of glob patterns and we'll do something to make accessing files that match those patterns ideally faster than if you didn't give us the prefetch hint. In theory we could run `hg prefetch -I PATTERN` for this, but prefetch takes several minutes materializing and walking the whole manifest to find matches, checking outgoing revs and various other overheads. There is a revision flag that can be specified to try to reduce this effort, but it still takes more than a minute. This diff: * Removes a `Future::get()` call in the GlobNode code * Makes `globFiles` use Futures directly rather than `Future::get()` * Adds a `prefetchFiles` parameter to `globFiles` * Adds `eden prefetch` to the CLI and makes it call `globFiles` with `prefetchFiles=true` * Adds the abillity to glob over `Tree` as well as the existing `TreeInode`. This means that we can avoid allocating inodes for portions of the tree that have not yet been loaded. When `prefetchFiles` is set we'll ask ObjectStore to load the blob for matching files. I'm not currently doing this in the `TreeInode` case on the assumption that we already did this earlier when its `TreeInode::prefetch` method was called. The glob executor joins the blob prefetches at each GlobNode level. It may be possible to observe higher throughput if we join the complete set at the end. Reviewed By: chadaustin Differential Revision: D7825423 fbshipit-source-id: d2ae03d0f62f00090537198095661475056e968d
2018-05-25 23:47:46 +03:00
auto contents = root.lockContents();
for (auto& entry : root.iterate(contents)) {
auto candidateName = rootPath + root.entryName(entry);
implement glob thrift method Summary: This is to facilitate the watchman integration and draws on the watchman glob implementation; the approach is to split the glob strings into path components and evaluate the components step by step as the tree is walked. Components that do not include any glob special characters can be handled as a direct lookup from the directory contents (O(1) rather than O(num-entries)). The glob method returns a set of filenames that match a list of of glob patterns. Recursive globs are supported. It is worth noting that a glob like "**/*" will return a list of every entry in the filesystem. This is potentially expensive and should be avoided. simpkins is in favor of disallowing this as a forcing function to encourage tool-makers to adopt patterns that don't rely on a complete listing of the filesystem. For now I'd like to get this in without such a restriction; it's also worth noting that running `find .` in the root of the mount point has a similar effect and we can't prevent that from happening, so the effect of the overly broad glob is something that we need to be able to withstand in any case. Unrestricted recursive globs will make it easier to connect certain watchman queries in the interim, until we have a more expressive thrift API for walking and filtering the list of files. Note: I've removed the wildmatch flags that I'd put in the API when I stubbed it out originally. Since this is built on top of our GlobMatcher code and that doesn't have those flags, I thought it would be simplest to just remove them. If we find that we need them, we can figure out how to add them later. Also Note: the evaluation of the glob is parallel-ready but currently limited to 1 at a time by constraining the folly::window call to 1. We could make this larger but would need a more intelligent constraint. For example, a recursive glob could initiate N concurrent futures per level where N is the number of sub-dirs at a given level. 
Using a custom Executor for these futures may be a better option to set an upper bound on the number of concurrent jobs allowed for a given glob call. Depends on D4361197 Reviewed By: simpkins Differential Revision: D4371934 fbshipit-source-id: 444735600bc16d2c2185f2277ddc5b51f672600a
2017-01-26 23:45:50 +03:00
for (auto& node : recursiveChildren_) {
if (node->alwaysMatch_ ||
node->matcher_.match(candidateName.stringPiece())) {
results.emplace_back(candidateName);
implement glob thrift method Summary: This is to facilitate the watchman integration and draws on the watchman glob implementation; the approach is to split the glob strings into path components and evaluate the components step by step as the tree is walked. Components that do not include any glob special characters can be handled as a direct lookup from the directory contents (O(1) rather than O(num-entries)). The glob method returns a set of filenames that match a list of of glob patterns. Recursive globs are supported. It is worth noting that a glob like "**/*" will return a list of every entry in the filesystem. This is potentially expensive and should be avoided. simpkins is in favor of disallowing this as a forcing function to encourage tool-makers to adopt patterns that don't rely on a complete listing of the filesystem. For now I'd like to get this in without such a restriction; it's also worth noting that running `find .` in the root of the mount point has a similar effect and we can't prevent that from happening, so the effect of the overly broad glob is something that we need to be able to withstand in any case. Unrestricted recursive globs will make it easier to connect certain watchman queries in the interim, until we have a more expressive thrift API for walking and filtering the list of files. Note: I've removed the wildmatch flags that I'd put in the API when I stubbed it out originally. Since this is built on top of our GlobMatcher code and that doesn't have those flags, I thought it would be simplest to just remove them. If we find that we need them, we can figure out how to add them later. Also Note: the evaluation of the glob is parallel-ready but currently limited to 1 at a time by constraining the folly::window call to 1. We could make this larger but would need a more intelligent constraint. For example, a recursive glob could initiate N concurrent futures per level where N is the number of sub-dirs at a given level. 
Using a custom Executor for these futures may be a better option to set an upper bound on the number of concurrent jobs allowed for a given glob call. Depends on D4361197 Reviewed By: simpkins Differential Revision: D4371934 fbshipit-source-id: 444735600bc16d2c2185f2277ddc5b51f672600a
2017-01-26 23:45:50 +03:00
// No sense running multiple matches for this same file.
break;
}
}
// Remember to recurse through child dirs after we've released
// the lock on the contents.
add `eden prefetch` command Summary: This is a first pass at a prefetcher. The idea is simple, but the execution is impeded by some unfortunate slowness in different parts of mercurial. The idea is that you pass a list of glob patterns and we'll do something to make accessing files that match those patterns ideally faster than if you didn't give us the prefetch hint. In theory we could run `hg prefetch -I PATTERN` for this, but prefetch takes several minutes materializing and walking the whole manifest to find matches, checking outgoing revs and various other overheads. There is a revision flag that can be specified to try to reduce this effort, but it still takes more than a minute. This diff: * Removes a `Future::get()` call in the GlobNode code * Makes `globFiles` use Futures directly rather than `Future::get()` * Adds a `prefetchFiles` parameter to `globFiles` * Adds `eden prefetch` to the CLI and makes it call `globFiles` with `prefetchFiles=true` * Adds the abillity to glob over `Tree` as well as the existing `TreeInode`. This means that we can avoid allocating inodes for portions of the tree that have not yet been loaded. When `prefetchFiles` is set we'll ask ObjectStore to load the blob for matching files. I'm not currently doing this in the `TreeInode` case on the assumption that we already did this earlier when its `TreeInode::prefetch` method was called. The glob executor joins the blob prefetches at each GlobNode level. It may be possible to observe higher throughput if we join the complete set at the end. Reviewed By: chadaustin Differential Revision: D7825423 fbshipit-source-id: d2ae03d0f62f00090537198095661475056e968d
2018-05-25 23:47:46 +03:00
if (root.entryIsTree(entry)) {
if (root.entryShouldLoadChildTree(entry)) {
subDirNames.emplace_back(candidateName);
} else {
futures.emplace_back(
store->getTree(root.entryHash(entry))
.then([candidateName, store, this, fileBlobsToPrefetch](
add `eden prefetch` command Summary: This is a first pass at a prefetcher. The idea is simple, but the execution is impeded by some unfortunate slowness in different parts of mercurial. The idea is that you pass a list of glob patterns and we'll do something to make accessing files that match those patterns ideally faster than if you didn't give us the prefetch hint. In theory we could run `hg prefetch -I PATTERN` for this, but prefetch takes several minutes materializing and walking the whole manifest to find matches, checking outgoing revs and various other overheads. There is a revision flag that can be specified to try to reduce this effort, but it still takes more than a minute. This diff: * Removes a `Future::get()` call in the GlobNode code * Makes `globFiles` use Futures directly rather than `Future::get()` * Adds a `prefetchFiles` parameter to `globFiles` * Adds `eden prefetch` to the CLI and makes it call `globFiles` with `prefetchFiles=true` * Adds the abillity to glob over `Tree` as well as the existing `TreeInode`. This means that we can avoid allocating inodes for portions of the tree that have not yet been loaded. When `prefetchFiles` is set we'll ask ObjectStore to load the blob for matching files. I'm not currently doing this in the `TreeInode` case on the assumption that we already did this earlier when its `TreeInode::prefetch` method was called. The glob executor joins the blob prefetches at each GlobNode level. It may be possible to observe higher throughput if we join the complete set at the end. Reviewed By: chadaustin Differential Revision: D7825423 fbshipit-source-id: d2ae03d0f62f00090537198095661475056e968d
2018-05-25 23:47:46 +03:00
const std::shared_ptr<const Tree>& tree) {
return evaluateRecursiveComponentImpl(
store,
candidateName,
TreeRoot(tree),
fileBlobsToPrefetch);
add `eden prefetch` command Summary: This is a first pass at a prefetcher. The idea is simple, but the execution is impeded by some unfortunate slowness in different parts of mercurial. The idea is that you pass a list of glob patterns and we'll do something to make accessing files that match those patterns ideally faster than if you didn't give us the prefetch hint. In theory we could run `hg prefetch -I PATTERN` for this, but prefetch takes several minutes materializing and walking the whole manifest to find matches, checking outgoing revs and various other overheads. There is a revision flag that can be specified to try to reduce this effort, but it still takes more than a minute. This diff: * Removes a `Future::get()` call in the GlobNode code * Makes `globFiles` use Futures directly rather than `Future::get()` * Adds a `prefetchFiles` parameter to `globFiles` * Adds `eden prefetch` to the CLI and makes it call `globFiles` with `prefetchFiles=true` * Adds the abillity to glob over `Tree` as well as the existing `TreeInode`. This means that we can avoid allocating inodes for portions of the tree that have not yet been loaded. When `prefetchFiles` is set we'll ask ObjectStore to load the blob for matching files. I'm not currently doing this in the `TreeInode` case on the assumption that we already did this earlier when its `TreeInode::prefetch` method was called. The glob executor joins the blob prefetches at each GlobNode level. It may be possible to observe higher throughput if we join the complete set at the end. Reviewed By: chadaustin Differential Revision: D7825423 fbshipit-source-id: d2ae03d0f62f00090537198095661475056e968d
2018-05-25 23:47:46 +03:00
}));
}
} else if (fileBlobsToPrefetch && root.entryShouldPrefetch(entry)) {
fileBlobsToPrefetch->wlock()->emplace_back(root.entryHash(entry));
implement glob thrift method Summary: This is to facilitate the watchman integration and draws on the watchman glob implementation; the approach is to split the glob strings into path components and evaluate the components step by step as the tree is walked. Components that do not include any glob special characters can be handled as a direct lookup from the directory contents (O(1) rather than O(num-entries)). The glob method returns a set of filenames that match a list of of glob patterns. Recursive globs are supported. It is worth noting that a glob like "**/*" will return a list of every entry in the filesystem. This is potentially expensive and should be avoided. simpkins is in favor of disallowing this as a forcing function to encourage tool-makers to adopt patterns that don't rely on a complete listing of the filesystem. For now I'd like to get this in without such a restriction; it's also worth noting that running `find .` in the root of the mount point has a similar effect and we can't prevent that from happening, so the effect of the overly broad glob is something that we need to be able to withstand in any case. Unrestricted recursive globs will make it easier to connect certain watchman queries in the interim, until we have a more expressive thrift API for walking and filtering the list of files. Note: I've removed the wildmatch flags that I'd put in the API when I stubbed it out originally. Since this is built on top of our GlobMatcher code and that doesn't have those flags, I thought it would be simplest to just remove them. If we find that we need them, we can figure out how to add them later. Also Note: the evaluation of the glob is parallel-ready but currently limited to 1 at a time by constraining the folly::window call to 1. We could make this larger but would need a more intelligent constraint. For example, a recursive glob could initiate N concurrent futures per level where N is the number of sub-dirs at a given level. 
Using a custom Executor for these futures may be a better option to set an upper bound on the number of concurrent jobs allowed for a given glob call. Depends on D4361197 Reviewed By: simpkins Differential Revision: D4371934 fbshipit-source-id: 444735600bc16d2c2185f2277ddc5b51f672600a
2017-01-26 23:45:50 +03:00
}
}
}
// Recursively load child inodes and evaluate matches
for (auto& candidateName : subDirNames) {
add `eden prefetch` command Summary: This is a first pass at a prefetcher. The idea is simple, but the execution is impeded by some unfortunate slowness in different parts of mercurial. The idea is that you pass a list of glob patterns and we'll do something to make accessing files that match those patterns ideally faster than if you didn't give us the prefetch hint. In theory we could run `hg prefetch -I PATTERN` for this, but prefetch takes several minutes materializing and walking the whole manifest to find matches, checking outgoing revs and various other overheads. There is a revision flag that can be specified to try to reduce this effort, but it still takes more than a minute. This diff: * Removes a `Future::get()` call in the GlobNode code * Makes `globFiles` use Futures directly rather than `Future::get()` * Adds a `prefetchFiles` parameter to `globFiles` * Adds `eden prefetch` to the CLI and makes it call `globFiles` with `prefetchFiles=true` * Adds the abillity to glob over `Tree` as well as the existing `TreeInode`. This means that we can avoid allocating inodes for portions of the tree that have not yet been loaded. When `prefetchFiles` is set we'll ask ObjectStore to load the blob for matching files. I'm not currently doing this in the `TreeInode` case on the assumption that we already did this earlier when its `TreeInode::prefetch` method was called. The glob executor joins the blob prefetches at each GlobNode level. It may be possible to observe higher throughput if we join the complete set at the end. Reviewed By: chadaustin Differential Revision: D7825423 fbshipit-source-id: d2ae03d0f62f00090537198095661475056e968d
2018-05-25 23:47:46 +03:00
futures.emplace_back(
root.getOrLoadChildTree(candidateName.basename())
.then([candidateName, store, this, fileBlobsToPrefetch](
add `eden prefetch` command Summary: This is a first pass at a prefetcher. The idea is simple, but the execution is impeded by some unfortunate slowness in different parts of mercurial. The idea is that you pass a list of glob patterns and we'll do something to make accessing files that match those patterns ideally faster than if you didn't give us the prefetch hint. In theory we could run `hg prefetch -I PATTERN` for this, but prefetch takes several minutes materializing and walking the whole manifest to find matches, checking outgoing revs and various other overheads. There is a revision flag that can be specified to try to reduce this effort, but it still takes more than a minute. This diff: * Removes a `Future::get()` call in the GlobNode code * Makes `globFiles` use Futures directly rather than `Future::get()` * Adds a `prefetchFiles` parameter to `globFiles` * Adds `eden prefetch` to the CLI and makes it call `globFiles` with `prefetchFiles=true` * Adds the abillity to glob over `Tree` as well as the existing `TreeInode`. This means that we can avoid allocating inodes for portions of the tree that have not yet been loaded. When `prefetchFiles` is set we'll ask ObjectStore to load the blob for matching files. I'm not currently doing this in the `TreeInode` case on the assumption that we already did this earlier when its `TreeInode::prefetch` method was called. The glob executor joins the blob prefetches at each GlobNode level. It may be possible to observe higher throughput if we join the complete set at the end. Reviewed By: chadaustin Differential Revision: D7825423 fbshipit-source-id: d2ae03d0f62f00090537198095661475056e968d
2018-05-25 23:47:46 +03:00
TreeInodePtr dir) {
return evaluateRecursiveComponentImpl(
store,
candidateName,
TreeInodePtrRoot(dir),
fileBlobsToPrefetch);
}));
}
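// Once all of the pending futures complete, append their matches to the
// results accumulated so far, moving the paths instead of copying them.
return folly::collect(futures).then(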
[shadowResults = std::move(results)](
vector<vector<RelativePath>>&& matchVector) mutable {
for (auto& matches : matchVector) {
shadowResults.insert(
shadowResults.end(),
std::make_move_iterator(matches.begin()),
std::make_move_iterator(matches.end()));
}
return shadowResults;
});
}
} // namespace eden
} // namespace facebook