sapling/eden/fs/service/eden.thrift

544 lines
16 KiB
Thrift

include "common/fb303/if/fb303.thrift"
Reimplement dirstate used by Eden's Hg extension as a subclass of Hg's dirstate. Summary: This is a major change to Eden's Hg extension. Our initial attempt to implement `edendirstate` was to create a "clean room" implementation that did not share code with `mercurial/dirstate.py`. This was helpful in uncovering the subset of the dirstate API that matters for Eden. It also provided a better safeguard against upstream changes to `dirstate.py` in Mercurial itself. In this implementation, the state transition management was mostly done on the server in `Dirstate.cpp`. We also made a modest attempt to make `Dirstate.cpp` "SCM-agnostic" such that the same APIs could be used for Git at some point. However, as we have tried to support more of the sophisticated functionality in Mercurial, particularly `hg histedit`, achieving parity between the clean room implementation and Mercurial's internals has become more challenging. Ultimately, the clean room implementation is likely the right way to go for Eden, but for now, we need to prioritize having feature parity with vanilla Hg when using Eden. Once we have a more complete set of integration tests in place, we can reimplement Eden's dirstate more aggressively to optimize things. Fortunately, the [[ https://bitbucket.org/facebook/hg-experimental/src/default/sqldirstate/ | sqldirstate ]] extension has already demonstrated that it is possible to provide a faithful dirstate implementation that subclasses the original `dirstate` while using a different storage mechanism. As such, I used `sqldirstate` as a model when implementing the new `eden_dirstate` (distinguishing it from our v1 implementation, `edendirstate`). In particular, `sqldirstate` uses SQL tables as storage for the following private fields of `dirstate`: `_map`, `_dirs`, `_copymap`, `_filefoldmap`, `_dirfoldmap`. 
Because `_filefoldmap` and `_dirfoldmap` exist to deal with case-insensitivity issues, we do not support them in `eden_dirstate` and add code to ensure the codepaths that would access them in `dirstate` never get exercised. Similarly, we also implemented `eden_dirstate` so that it never accesses `_dirs`. (`_dirs` is a multiset of all directories in the dirstate, which is an O(repo) data structure, so we do not want to maintain it in Eden. It appears to be primarily used for checking whether a path to a file already exists in the dirstate as a directory. We can protect against that in more efficient ways.) That leaves only `_map` and `_copymap` to worry about. `_copymap` contains the set of files that have been marked "copied" in the current dirstate, so it is fairly small and can be stored on disk or in memory with little concern. `_map` is a bit trickier because it is expected to have an entry for every file in the dirstate. In `sqldirstate`, it is stored across two tables: `files` and `nonnormalfiles`. For Eden, we already represent the data analogous to the `files` table in RocksDB/the overlay, so we do not need to create a new equivalent to the `files` table. We do, however, need an equivalent to the `nonnormalfiles` table, which we store as Thrift-serialized data in an ordinary file along with the `_copymap` data. In our Hg extension, our implementation of `_map` is `eden_dirstate_map`, which is defined in a Python file of the same name. Our implementation of `_copymap` is `dummy_copymap`, which is defined in `eden_dirstate.py`. Both of these collections are simple pass-through data structures that translate their method calls to Thrift server calls. I expect we will want to optimize this in the future via some client-side caching, as well as creating batch APIs for talking to the server via Thrift. 
One advantage of this new implementation is that it enables us to delete `eden/hg/eden/overrides.py`, which overrode the entry points for `hg add` and `hg remove`. Between the recent implementation of `dirstate.walk()` for Eden and this switch to the real dirstate, we can now use the default implementation of `hg add` and `hg remove` (although we have to play some tricks, like in the implementation of `eden_dirstate.status()`, in order to make `hg remove` work). In the course of doing this revision, I discovered that I had to make a minor fix to `EdenMatchInfo.make_glob_list()` because `hg add foo` was being treated as `hg add foo/**/*` even when `foo` was just a file (as opposed to a directory), in which case the glob was not matching `foo`! I also had to do some work in `eden_dirstate.status()`, in which the `match` argument was previously largely ignored. It turns out that `dirstate.py` uses `status()` for a number of things with the `match` specified as a filter, so the output of `status()` must be filtered by `match` accordingly. Ultimately, this seems like work that would be better done on the server, but for simplicity, we're just going to do it in Python for now. For the reasons explained above, this revision deletes a lot of code from `Dirstate.cpp`. As such, `DirstateTest.cpp` does not seem worth refactoring, though the scenarios it was testing should probably be converted to integration tests. At a high level, the role of `DirstatePersistence` has not changed, but the exact data it writes is much different. Its corresponding unit test is also disabled for now. Note that this revision does not change the name of the file where "dirstate data" is written (this is defined as `kDirstateFile` in `ClientConfig.cpp`), so we should blow away any existing instances of this file once this change lands. (It is still early enough in the project that it does not seem worth the overhead of a proper migration.) 
The true test of the success of this new approach is the ease with which we can write more integration tests for things like `hg histedit` and `hg graft`. Ideally, these should require very few changes to `eden_dirstate.py`. Reviewed By: simpkins Differential Revision: D5071778 fbshipit-source-id: e8fec4d393035d80f36516ac050cad025dc3ba31
2017-05-26 21:51:30 +03:00
include "eden/fs/inodes/hgdirstate.thrift"
namespace cpp2 facebook.eden
namespace java com.facebook.eden.thrift
namespace py facebook.eden
additional query API for our thrift interface Summary: This diff adds a couple more things to our thrift interface: 1. Introduces JournalPosition 2. Adds methods to query the current JournalPosition and obtain a delta since a given JournalPosition 3. Augments getMaterializedFiles to also return the current JournalPosition 4. Adds a method to evaluate a `glob` against Eden 5. Adds a method using thrift streaming to subscribe to realtime changes Could probably finesse the naming a little bit. The JournalPosition allows reasoning about changes to files that are not part of an Eden snapshot. Internally the journal position is just the SequenceNumber from the journal datastructures, but when we expose it to clients we need to be able to distinguish between a sequence number from the current instance of the eden service and a prior incarnation (eg: if the process has been restarted, and we have no way to recreate the journal we need to be able to indicate this to the client if they ask about changes in that range). For the convenience of the client we also include the `toHash` (the most recent hash from the journal entry) which is likely useful for the `hg` dirstate operations; it is useful to know that the snapshot may have changed since the last query about the dirstate. The `getFileInformation` method returns the instantaneously available `stat()` like information about the requested list of files. Since we simply don't have historical data on how files in the overlay looked (only how they look now), this method does not allow passing in a JournalPosition. When it comes to comparing historical data, we will need to add an API that accepts two snapshot hashes and generates the results from there. This particular method is geared up to understanding the current state of the world; the obvious use case is plugging in the file list from `getFilesChangedSince` into this function to figure out what's what. 
* Do we want a function that combines `getFilesChangedSince` + `getFileInformation` into a single RPC? Why is there a glob method? It's to support a use-case in the watchman/buck integration. I'm just sketching it out in the thrift interface at this stage. In the future we also need to be able to express how to carry out a tree walk, but that will require some query predicates that I don't want to get hung up on specifying immediately. Why is the streaming stuff in its own thrift file? We can't generate code for it in java or perhaps also python. It's only needed to plumb data into watchman so it's broken out into its own definition. Nothing depends on that file yet, so it's probably not specified quite right. The important thing is how the subscribe method looks: it's essentially the same as the method to query a delta, but it keeps emitting deltas as they are produced. This is another API that will benefit from query predicates when we get around to specifying them. I've added `JournalDelta::fromHash` and `JournalDelta::toHash` to hold the appropriate snapshot ids in the journal entry; this will allow us to indicate when we've checked out a new snapshot, or created a new snapshot. We have no way to populate these yet; I commented on D3762646 about storing the `snapshotID` that we have during `EdenServiceHandler::mountImpl` into either the `EdenMount` or the proposed `RootInode` class. Once we have that we can simply sample it and store it as we generate `JournalDelta`s. Reviewed By: simpkins Differential Revision: D3860804 fbshipit-source-id: 896c24c354e6f58328fb45c24b16915d9e937108
2016-09-19 22:48:12 +03:00
/** Thrift doesn't really do unsigned numbers, but we can sort of fake it.
* This type is serialized as an integer value that is 64-bits wide and
* should round-trip with full fidelity for C++ client/server, but for
* other runtimes will have crazy results if the sign bit is ever set.
* In practice it is impossible for us to have files that large in eden,
* and sequence numbers will take an incredibly long time to ever roll
* over and cause problems.
* Once t13345978 is done, we can uncomment the cpp.type below.
*/
typedef i64 /* (cpp.type = "std::uint64_t") */ unsigned64
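The comment above warns that non-C++ runtimes see odd values once the sign bit is set. As a minimal sketch of what a Python client could do about that, the helpers below reinterpret between the signed `i64` that travels on the wire and the unsigned value it is meant to represent. The function names `to_wire_i64` and `from_wire_i64` are hypothetical and are not part of the generated Thrift bindings.

```python
# Hypothetical helpers: these names are illustrative and do not come from
# the Eden codebase or from Thrift's generated Python code.
_MASK64 = (1 << 64) - 1


def to_wire_i64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as the signed i64 sent on the wire."""
    if not 0 <= value <= _MASK64:
        raise ValueError("value does not fit in 64 unsigned bits")
    # Values with the top bit set wrap around to negative numbers.
    return value - (1 << 64) if value >= (1 << 63) else value


def from_wire_i64(value: int) -> int:
    """Recover the unsigned 64-bit value from a signed i64 read off the wire."""
    return value & _MASK64
```

Values below 2^63 pass through unchanged, which matches the comment's observation that file sizes and sequence numbers are unlikely to hit the problematic range in practice.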
/**
* A source control hash, as a 20-byte binary value.
*/
typedef binary BinaryHash
exception EdenError {
1: required string message
2: optional i32 errorCode
} (message = 'message')
exception NoValueForKeyError {
1: string key
}
struct MountInfo {
1: string mountPoint
2: string edenClientPath
}
union SHA1Result {
1: BinaryHash sha1
2: EdenError error
}
/**
* Effectively a `struct timespec`
*/
struct TimeSpec {
1: i64 seconds
2: i64 nanoSeconds
}
/**
* Information that we return when querying entries
*/
struct FileInformation {
1: unsigned64 size // wish thrift had unsigned numbers
2: TimeSpec mtime
3: i32 mode // mode_t
}
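Since `mode` carries a `mode_t` and `mtime` is effectively a `struct timespec`, a client can decode both with standard tools. The following is a rough sketch in Python, using plain dataclasses as stand-ins for the generated Thrift types (the dataclasses and the helper names `mtime_ns` and `describe` are assumptions made for illustration).

```python
import stat
from dataclasses import dataclass


# Stand-ins for the generated Thrift types; the real bindings will differ.
@dataclass(frozen=True)
class TimeSpec:
    seconds: int
    nanoSeconds: int


@dataclass(frozen=True)
class FileInformation:
    size: int
    mtime: TimeSpec
    mode: int  # mode_t, as returned by stat()


def mtime_ns(info: FileInformation) -> int:
    """Collapse the timespec into a single nanosecond count."""
    return info.mtime.seconds * 1_000_000_000 + info.mtime.nanoSeconds


def describe(info: FileInformation) -> str:
    """Summarize the entry type, permissions, and size."""
    if stat.S_ISDIR(info.mode):
        kind = "directory"
    elif stat.S_ISLNK(info.mode):
        kind = "symlink"
    elif stat.S_ISREG(info.mode):
        kind = "file"
    else:
        kind = "other"
    return f"{kind} {stat.filemode(info.mode)} {info.size} bytes"
```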
/** Holds information about a file, or an error in retrieving that info.
* The most likely error will be ENOENT, implying that the file doesn't exist.
*/
union FileInformationOrError {
1: FileInformation info
2: EdenError error
}
/** References a point in time in the journal.
* This can be used to reason about a point in time in a given mount point.
* The mountGeneration value is opaque to the client.
*/
struct JournalPosition {
/** An opaque but unique number within the scope of a given mount point.
* This is used to determine when sequenceNumber has been invalidated. */
1: i64 mountGeneration
/** Monotonically incrementing number
* Each journalled change causes this number to increment. */
2: unsigned64 sequenceNumber
/** Records the snapshot hash at the appropriate point in the journal */
3: BinaryHash snapshotHash
}
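The field comments say that `mountGeneration` tells a client when a `sequenceNumber` has been invalidated, for example after the service restarts and the journal cannot be recreated. A minimal sketch of how a client might apply that rule is shown below; the dataclass mirrors the struct's fields, while `StaleJournalPosition` and `has_changes_since` are hypothetical names, not part of the Eden API.

```python
from dataclasses import dataclass


# Stand-in for the generated Thrift type; field names follow the struct.
@dataclass(frozen=True)
class JournalPosition:
    mountGeneration: int
    sequenceNumber: int
    snapshotHash: bytes


class StaleJournalPosition(Exception):
    """Hypothetical error: the older position comes from a different generation."""


def has_changes_since(since: JournalPosition, current: JournalPosition) -> bool:
    """Report whether any journalled change happened after `since`.

    Sequence numbers are only comparable within one mountGeneration, so a
    mismatch means the client must discard its cached state and start over.
    """
    if since.mountGeneration != current.mountGeneration:
        raise StaleJournalPosition("journal was reset; sequence numbers differ in meaning")
    return current.sequenceNumber > since.sequenceNumber
```

The generation is compared only for equality because the struct documents it as opaque to the client.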
/** Holds information about a set of paths that changed between two points.
* fromPosition, toPosition define the time window.
* paths holds the list of paths that changed in that window.
*/
struct FileDelta {
/** The fromPosition passed to getFilesChangedSince */
1: JournalPosition fromPosition
/** The current position at the time that getFilesChangedSince was called */
2: JournalPosition toPosition
/** The complete list of paths from both the snapshot and the overlay that
* changed between fromPosition and toPosition */
do a better job at reporting "new" in watchman results. Summary: We're seeing that this is always set to true for eden, which is causing buck to run slower than it should. To make this work correctly, I've augmented our journal data structure so that it can track create, change and remove events for the various paths. I've also plumbed rename events into the journal. This requires a slightly more complex merge routine, so I've refactored the two call sites that were merging in slightly different contexts so that they can now share the same guts of the merge routine. Perhaps slightly counterintuitive in the merge code is that we merge a record from the past into the state for now and this is a bit backwards compared to how people think. I've expanded the eden integration test to check that we don't mix up create/change/removes for the same path in a given window. On the watchman side, we use the presence of the filename in the createdPaths set as a hint that the file is new. In that case we will set the watchman `ctime` (which is not the state ctime but is really the *created clock time*) to match the current journal position if the file is new, or leave it set to 0 if the file is not known to be new. This will cause the `is_new` flag to be set appropriately by the code in `watchman/query/eval.cpp`; if the sequence is 0 then it should never be set to true. Otherwise (when the file was in the `createPaths` set) it will be set to the current journal position and this will be seen as newer than the `since` constraint on the query and cause the file to show as `new`. Reviewed By: bolinfest Differential Revision: D5608538 fbshipit-source-id: 8d78f7da05e5e53110108aca220c3a97794f8cc2
2017-08-11 22:51:51 +03:00
3: list<string> changedPaths
4: list<string> createdPaths
5: list<string> removedPaths
}
enum StatusCode {
CLEAN = 0x0,
MODIFIED = 0x1,
ADDED = 0x2,
REMOVED = 0x3,
MISSING = 0x4,
NOT_TRACKED = 0x5,
IGNORED = 0x6,
}
struct ThriftHgStatus {
1: map<string, StatusCode> entries
}
enum ConflictType {
/**
* We failed to update this particular path due to an error
*/
ERROR = 0,
/**
* A locally modified file was deleted in the new Tree
*/
MODIFIED_REMOVED = 1,
/**
* An untracked local file exists in the new Tree
*/
UNTRACKED_ADDED = 2,
/**
* The file was removed locally, but modified in the new Tree
*/
REMOVED_MODIFIED = 3,
/**
* The file was removed locally, and also removed in the new Tree.
*/
MISSING_REMOVED = 4,
/**
* A locally modified file was modified in the new Tree.
* This may be contents modifications, or a file type change (directory to
* file or vice-versa), or permissions changes.
*/
MODIFIED_MODIFIED = 5,
/**
* A directory was supposed to be removed or replaced with a file,
* but it contains untracked files preventing us from updating it.
*/
DIRECTORY_NOT_EMPTY = 6,
}
/**
* Details about conflicts or errors that occurred during a checkout operation
*/
struct CheckoutConflict {
1: string path
2: ConflictType type
3: string message
}
struct ScmBlobMetadata {
1: i64 size
2: BinaryHash contentsSha1
}
struct ScmTreeEntry {
1: binary name
2: i32 mode
3: BinaryHash id
}
struct HgNonnormalFile {
1: string relativePath
2: hgdirstate.DirstateTuple tuple
}
struct TreeInodeEntryDebugInfo {
/**
* The entry name. This is just a PathComponent, not the full path
*/
1: binary name
/**
* The inode number, or 0 if no inode number has been assigned to
* this entry
*/
2: i64 inodeNumber
/**
* The entry mode_t value
*/
3: i32 mode
/**
* True if an InodeBase object exists for this inode.
*/
4: bool loaded
/**
* True if the inode is materialized in the overlay
*/
5: bool materialized
/**
* If materialized is false, hash contains the ID of the underlying source
* control Blob or Tree.
*/
6: BinaryHash hash
}
struct WorkingDirectoryParents {
1: BinaryHash parent1
2: optional BinaryHash parent2
}
struct TreeInodeDebugInfo {
1: i64 inodeNumber
2: binary path
3: bool materialized
4: BinaryHash treeHash
5: list<TreeInodeEntryDebugInfo> entries
6: i64 refcount
}
struct InodePathDebugInfo {
1: string path
2: bool loaded
3: bool linked
}
/**
* Struct to store information about inodes in a mount point.
*/
struct MountInodeInfo {
1: i64 loadedInodeCount
2: i64 unloadedInodeCount
3: i64 materializedInodeCount
}
/**
* Struct to store fb303 counters from ServiceData.getCounters() and inode
* information of all the mount points.
*/
struct InternalStats {
1: i64 periodicUnloadCount
/**
* counters is a map of fb303 counters: the key is the counter name and the
* value is the counter value.
*/
2: map<string, i64> counters
/**
* mountPointInfo is a map whose key is the path of the mount point and whose
* value holds details for that mount point: the number of loaded inodes,
* unloaded inodes, and materialized inodes.
*/
3: map<string, MountInodeInfo> mountPointInfo
}
service EdenService extends fb303.FacebookService {
list<MountInfo> listMounts() throws (1: EdenError ex)
void mount(1: MountInfo info) throws (1: EdenError ex)
void unmount(1: string mountPoint) throws (1: EdenError ex)
/**
* Get the parent commit(s) of the working directory
*/
WorkingDirectoryParents getParentCommits(1: string mountPoint)
throws (1: EdenError ex)
/**
* Check out the specified snapshot.
*
* This updates the contents of the mount point so that they match the
* contents of the given snapshot.
*
* Returns a list of conflicts and errors that occurred when performing the
* checkout operation.
*
* If the force parameter is true, the working directory will be forcibly
* updated to the contents of the new snapshot, even if there were conflicts.
* Conflicts will still be reported in the return value, but the files will
* be updated to their new state. If the force parameter is false, files with
* conflicts will be left unmodified. Files that are untracked in both the
* source and destination snapshots are always left unchanged, even if force
* is true.
*
* On successful return from this function the mount point will point to the
* new commit, even if some paths had conflicts or errors. The caller is
* responsible for taking appropriate action to update these paths as desired
* after checkOutRevision() returns.
*/
list<CheckoutConflict> checkOutRevision(
1: string mountPoint,
2: BinaryHash snapshotHash,
3: bool force)
throws (1: EdenError ex)
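/*
 * Illustrative use of checkOutRevision(). This is pseudocode, not part of
 * the interface: `client` stands for a hypothetical generated EdenService
 * client and `resolve` for a hypothetical caller-side helper.
 *
 *   conflicts = client.checkOutRevision(mountPoint, snapshotHash, false)
 *   for c in conflicts:
 *     if c.type == ConflictType.ERROR:
 *       report c.path and c.message as a failure
 *     else:
 *       // With force = false the conflicting file was left unmodified,
 *       // so the caller decides how to bring c.path up to date.
 *       resolve(c.path, c.type)
 *
 * The mount point refers to the new commit as soon as the call returns,
 * even when the returned list is non-empty.
 */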
/**
* Reset the working directory's parent commits, without changing the working
* directory contents.
*
* This operation is equivalent to `git reset --soft` or `hg reset --keep`
*/
void resetParentCommits(
1: string mountPoint,
2: WorkingDirectoryParents parents)
throws (1: EdenError ex)
/**
* For each path, returns an EdenError instead of the SHA-1 if any of the
* following occur:
* - path is the empty string.
* - path identifies a non-existent file.
* - path identifies something that is not an ordinary file (e.g., symlink
* or directory).
*/
list<SHA1Result> getSHA1(1: string mountPoint, 2: list<string> paths)
throws (1: EdenError ex)
/**
* Returns a list of paths relative to the mountPoint.
*/
list<string> getBindMounts(1: string mountPoint)
throws (1: EdenError ex)
/** Returns the journal position at the time the method is called,
* i.e. the instantaneous value of the journal sequence number.
*/
JournalPosition getCurrentJournalPosition(1: string mountPoint)
throws (1: EdenError ex)
/** Returns the set of files (and dirs) that changed since a prior point.
* If fromPosition.mountGeneration is mismatched with the current
* mountGeneration, throws an EdenError with errorCode = ERANGE.
* This indicates that eden cannot compute the delta for the requested
* range. The client will need to recompute a new baseline using
* other available functions in EdenService.
*/
FileDelta getFilesChangedSince(
1: string mountPoint,
2: JournalPosition fromPosition)
throws (1: EdenError ex)
/** Returns a subset of the stat() information for a list of paths.
* The returned list of information corresponds to the input list of
* paths; e.g., result[0] holds the information for paths[0].
* We only support returning the instantaneous information about
* these paths, as we cannot answer with historical information about
* files in the overlay.
*/
list<FileInformationOrError> getFileInformation(
1: string mountPoint,
2: list<string> paths)
throws (1: EdenError ex)
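/*
 * Illustrative polling flow combining the journal methods with
 * getFileInformation(). This is pseudocode, not part of the interface:
 * `client` stands for a hypothetical generated EdenService client.
 *
 *   pos = client.getCurrentJournalPosition(mountPoint)
 *   loop:
 *     try:
 *       delta = client.getFilesChangedSince(mountPoint, pos)
 *     catch EdenError ex where ex.errorCode == ERANGE:
 *       // Eden cannot compute the delta for this range. Discard any
 *       // cached state and start again from a fresh baseline.
 *       pos = client.getCurrentJournalPosition(mountPoint)
 *       continue
 *     infos = client.getFileInformation(mountPoint, delta.changedPaths)
 *     // infos[i] describes delta.changedPaths[i] as it looks now.
 *     pos = delta.toPosition
 */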
/** Returns a list of files that match the input globs.
 * There are no duplicate values in the result.
 */
list<string> glob(
1: string mountPoint,
2: list<string> globs)
throws (1: EdenError ex)
//////// Source Control APIs ////////
// TODO(mbolin): `hg status` has a ton of command line flags to support.
ThriftHgStatus scmGetStatus(
1: string mountPoint,
2: bool listIgnored,
) throws (1: EdenError ex)
void hgClearDirstate(
1: string mountPoint,
) throws (1: EdenError ex)
void hgSetDirstateTuple(
1: string mountPoint,
2: string relativePath,
3: hgdirstate.DirstateTuple tuple,
) throws (1: EdenError ex)
// Throws NoValueForKeyError if there is no entry for relativePath.
hgdirstate.DirstateTuple hgGetDirstateTuple(
1: string mountPoint,
2: string relativePath,
) throws (
1: EdenError ex,
2: NoValueForKeyError noValueForKeyError
)
/** Return a boolean indicating whether something was actually deleted. */
bool hgDeleteDirstateTuple(
1: string mountPoint,
2: string relativePath,
) throws (1: EdenError ex)
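/*
Illustrative sketch only, not part of the interface. The three dirstate-tuple
calls above can be read as operations on a per-mount map keyed by relativePath.
The Python model below is a hypothetical in-memory stand-in: DirstateModel, its
method names, and the dict storage are assumptions made for illustration, and
tuples are treated as opaque values. It mirrors only what the signatures and
comments state: a lookup for a missing key is assumed to raise
NoValueForKeyError, and a delete reports whether an entry was actually removed.

```python
class NoValueForKeyError(Exception):
    """Stand-in for the Thrift exception of the same name."""


class DirstateModel:
    """Hypothetical in-memory model of one mount's dirstate tuples."""

    def __init__(self):
        self._tuples = {}

    def set_tuple(self, relative_path, value):
        # Models hgSetDirstateTuple; insert-or-overwrite is an assumption.
        self._tuples[relative_path] = value

    def get_tuple(self, relative_path):
        # Models hgGetDirstateTuple; a missing key raises NoValueForKeyError.
        if relative_path not in self._tuples:
            raise NoValueForKeyError(relative_path)
        return self._tuples[relative_path]

    def delete_tuple(self, relative_path):
        # Models hgDeleteDirstateTuple; True only if an entry was removed.
        if relative_path in self._tuples:
            del self._tuples[relative_path]
            return True
        return False
```

A second delete of the same path returns False in this model, which is the
case the boolean return value exists to distinguish.
*/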
list<HgNonnormalFile> hgGetNonnormalFiles(
1: string mountPoint,
) throws (1: EdenError ex)
// If relativePathSource is the empty string, remove the entry in the map for
// relativePathDest.
void hgCopyMapPut(
1: string mountPoint,
2: string relativePathDest,
3: string relativePathSource,
)
string hgCopyMapGet(
1: string mountPoint,
2: string relativePathDest,
) throws (1: NoValueForKeyError noValueForKeyError)
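/*
Illustrative sketch only, not part of the interface. The copy-map calls above
overload the empty string: passing "" as relativePathSource to hgCopyMapPut
removes the entry for relativePathDest instead of storing it. The Python model
below is a hypothetical in-memory stand-in (CopyMapModel, its method names, and
the dict storage are assumptions) that shows this convention together with the
NoValueForKeyError declared by hgCopyMapGet. Treating removal of an absent
entry as a no-op is also an assumption.

```python
class NoValueForKeyError(Exception):
    """Stand-in for the Thrift exception of the same name."""


class CopyMapModel:
    """Hypothetical in-memory model of one mount's copy map (dest -> source)."""

    def __init__(self):
        self._copies = {}

    def put(self, relative_path_dest, relative_path_source):
        # Models hgCopyMapPut: an empty source removes the destination entry.
        if relative_path_source == "":
            self._copies.pop(relative_path_dest, None)
        else:
            self._copies[relative_path_dest] = relative_path_source

    def get(self, relative_path_dest):
        # Models hgCopyMapGet: a missing destination raises NoValueForKeyError.
        if relative_path_dest not in self._copies:
            raise NoValueForKeyError(relative_path_dest)
        return self._copies[relative_path_dest]
```

One consequence of this convention is that the empty string can never be
stored as a copy source, so callers need no separate "delete" method.
*/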
2017-05-26 21:51:30 +03:00
/**
* In practice, this map should be fairly small.
*/
map<string, string> hgCopyMapGetAll(
1: string mountPoint,
)
//////// Debugging APIs ////////
/**
* Get the contents of a source control Tree.
*
* This can be used to confirm if eden's LocalStore contains information
* for the tree, and that the information is correct.
*
* If localStoreOnly is true, the data is loaded directly from the
* LocalStore, and an error will be raised if it is not already present in
* the LocalStore. If localStoreOnly is false, the data may be retrieved
* from the BackingStore if it is not already present in the LocalStore.
*/
list<ScmTreeEntry> debugGetScmTree(
1: string mountPoint,
2: BinaryHash id,
3: bool localStoreOnly,
) throws (1: EdenError ex)
/**
* Get the contents of a source control Blob.
*
* This can be used to confirm if eden's LocalStore contains information
* for the blob, and that the information is correct.
*/
binary debugGetScmBlob(
1: string mountPoint,
2: BinaryHash id,
3: bool localStoreOnly,
) throws (1: EdenError ex)
/**
* Get the metadata about a source control Blob.
*
* This returns the size and content SHA-1 of the blob, which eden stores
* separately from the blob itself. It can be a useful alternative to
* debugGetScmBlob() when inspecting extremely large blobs, because only the
* metadata is returned rather than the blob contents.
*/
ScmBlobMetadata debugGetScmBlobMetadata(
1: string mountPoint,
2: BinaryHash id,
3: bool localStoreOnly,
) throws (1: EdenError ex)
/**
* Get the status of currently loaded inode objects.
*
* This returns details about all currently loaded inode objects under the
* given path.
*
* If the path argument is the empty string, data will be returned about all
* inodes in the entire mount point. Otherwise, the path argument should
* refer to a subdirectory, and data will be returned for all inodes under
* the specified subdirectory.
*
* The rename lock is not held while gathering this information, so the path
* name information returned may not always be internally consistent. If
* renames were taking place while gathering the data, some inodes may show
* up under multiple parents. It's also possible that we may miss some
* inodes during the tree walk if they were renamed from a directory that was
* not yet walked into a directory that has already been walked.
*
* This API cannot return data about inodes that have been unlinked but still
* have outstanding references.
*/
list<TreeInodeDebugInfo> debugInodeStatus(
1: string mountPoint,
2: string path,
) throws (1: EdenError ex)
/**
* Get the InodePathDebugInfo for the inode that corresponds to the given
* inode number. This provides the path for the inode and also indicates
* whether the inode is currently loaded or not. Requires that the Eden
* mountPoint be specified.
*/
InodePathDebugInfo debugGetInodePath(
1: string mountPoint,
2: i64 inodeNumber,
) throws (1: EdenError ex)
/**
* Sets the log level for a given category at runtime.
*/
void debugSetLogLevel(
1: string category,
2: string level,
) throws (1: EdenError ex)
/**
* Unloads unused inodes under a directory inside a mountPoint, provided
* their last access time is older than the specified age.
*
* The age parameter is a relative time to be subtracted from the current
* (wall clock) time.
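*
* As an illustration only (the times below are hypothetical, not taken from
* the implementation): if the current wall clock time is 12:00:00 and age
* represents 10 minutes, the cutoff is 12:00:00 - 00:10:00 = 11:50:00. An
* unused inode last accessed at 11:45:00 would be unloaded, while one last
* accessed at 11:55:00 would be kept.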
*/
i64 unloadInodeForPath(
1: string mountPoint,
2: string path,
3: TimeSpec age,
) throws (1: EdenError ex)
/**
* Flush all thread-local stats to the main ServiceData object.
*
* Thread-local counters are normally flushed to the main ServiceData once
* a second. flushStatsNow() can be used to flush thread-local counters on
* demand, in addition to the normal once-a-second flush.
*
* This is mainly useful for unit and integration tests that want to ensure
* they see up-to-date counter information without waiting for the normal
* flush interval.
*/
void flushStatsNow() throws (1: EdenError ex)
/**
* Invalidate the kernel cache for the inode at the given path.
*/
void invalidateKernelInodeCache(
1: string mountPoint,
2: string path,
) throws (1: EdenError ex)
/**
* Gets the number of inodes unloaded by the periodic job on an EdenMount.
*/
InternalStats getStatInfo() throws (1: EdenError ex)
}