sapling/eden/fs/service/eden.thrift

680 lines
20 KiB
Thrift
Raw Normal View History

include "common/fb303/if/fb303.thrift"
namespace cpp2 facebook.eden
namespace java com.facebook.eden.thrift
namespace py facebook.eden
additional query API for our thrift interface Summary: This diff adds a couple more things to our thrift interface: 1. Introduces JournalPosition 2. Adds methods to query the current JournalPosition and obtain a delta since a given JournalPosition 3. Augments getMaterializedFiles to also return the current JournalPosition 4. Adds a method to evaluate a `glob` against Eden 5. Adds a method using thrift streaming to subscribe to realtime changes Could probably finesse the naming a little bit. The JournalPosition allows reasoning about changes to files that are not part of an Eden snapshot. Internally the journal position is just the SequenceNumber from the journal datastructures, but when we expose it to clients we need to be able to distinguish between a sequence number from the current instance of the eden service and a prior incarnation (eg: if the process has been restarted, and we have no way to recreate the journal we need to be able to indicate this to the client if they ask about changes in that range). For the convenience of the client we also include the `toHash` (the most recent hash from the journal entry) which is likely useful for the `hg` dirstate operations; it is useful to know that the snapshot may have changed since the last query about the dirstate. The `getFileInformation` method returns the instantaneously available `stat()` like information about the requested list of files. Since we simply don't have historical data on how files in the overlay looked (only how they look now), this method does not allow passing in a JournalPosition. When it comes to comparing historical data, we will need to add an API that accepts two snapshot hashes and generates the results from there. This particular method is geared up to understanding the current state of the world; the obvious use case is plugging in the file list from `getFilesChangedSince` into this function to figure out what's what. * Do we want a function that combines `getFilesChangedSince` + `getFileInformation` into a single RPC? Why is there a glob method? It's to support a use-case in the watchman/buck integration. I'm just sketching it out in the thrift interface at this stage. In the future we also need to be able to express how to carry out a tree walk, but that will require some query predicates that I don't want to get hung up on specifying immediately. Why is the streaming stuff in its own thrift file? We can't generate code for it in java or perhaps also python. It's only needed to plumb data into watchman so it's broken out into its own definition. Nothing depends on that file yet, so it's probably not specified quite right. The important thing is how the subscribe method looks: it's essentially the same as the method to query a delta, but it keeps emitting deltas as they are produced. This is another API that will benefit from query predicates when we get around to specifying them. I've added `JournalDelta::fromHash` and `JournalDelta::toHash` to hold the appropriate snapshot ids in the journal entry; this will allow us to indicate when we've checked out a new snapshot, or created a new snapshot. We have no way to populate these yet; I commented on D3762646 about storing the `snapshotID` that we have during `EdenServiceHandler::mountImpl` into either the `EdenMount` or the proposed `RootInode` class. Once we have that we can simply sample it and store it as we generate `JournalDelta`s. Reviewed By: simpkins Differential Revision: D3860804 fbshipit-source-id: 896c24c354e6f58328fb45c24b16915d9e937108
2016-09-19 22:48:12 +03:00
/** Thrift doesn't really do unsigned numbers, but we can sort of fake it.
* This type is serialized as an integer value that is 64-bits wide and
* should round-trip with full fidelity for C++ client/server, but for
* other runtimes will have crazy results if the sign bit is ever set.
* In practice it is impossible for us to have files that large in eden,
* and sequence numbers will take an incredibly long time to ever roll
* over and cause problems.
* Once t13345978 is done, we can uncomment the cpp.type below.
*/
typedef i64 /* (cpp.type = "std::uint64_t") */ unsigned64
/**
* A source control hash.
*
* This should normally be a 20-byte binary value, however the edenfs server
* will accept BinaryHash arguments as 40-byte hexadecimal strings as well.
* Data returned by the edenfs server in a BinaryHash field will always be a
* 20-byte binary value.
*/
typedef binary BinaryHash
exception EdenError {
1: required string message
2: optional i32 errorCode
} (message = 'message')
exception NoValueForKeyError {
1: string key
}
struct MountInfo {
1: string mountPoint
2: string edenClientPath
}
union SHA1Result {
1: BinaryHash sha1
2: EdenError error
}
/**
* Effectively a `struct timespec`
*/
struct TimeSpec {
1: i64 seconds
2: i64 nanoSeconds
}
/**
* Information that we return when querying entries
*/
struct FileInformation {
additional query API for our thrift interface Summary: This diff adds a couple more things to our thrift interface: 1. Introduces JournalPosition 2. Adds methods to query the current JournalPosition and obtain a delta since a given JournalPosition 3. Augments getMaterializedFiles to also return the current JournalPosition 4. Adds a method to evaluate a `glob` against Eden 5. Adds a method using thrift streaming to subscribe to realtime changes Could probably finesse the naming a little bit. The JournalPosition allows reasoning about changes to files that are not part of an Eden snapshot. Internally the journal position is just the SequenceNumber from the journal datastructures, but when we expose it to clients we need to be able to distinguish between a sequence number from the current instance of the eden service and a prior incarnation (eg: if the process has been restarted, and we have no way to recreate the journal we need to be able to indicate this to the client if they ask about changes in that range). For the convenience of the client we also include the `toHash` (the most recent hash from the journal entry) which is likely useful for the `hg` dirstate operations; it is useful to know that the snapshot may have changed since the last query about the dirstate. The `getFileInformation` method returns the instantaneously available `stat()` like information about the requested list of files. Since we simply don't have historical data on how files in the overlay looked (only how they look now), this method does not allow passing in a JournalPosition. When it comes to comparing historical data, we will need to add an API that accepts two snapshot hashes and generates the results from there. This particular method is geared up to understanding the current state of the world; the obvious use case is plugging in the file list from `getFilesChangedSince` into this function to figure out what's what. * Do we want a function that combines `getFilesChangedSince` + `getFileInformation` into a single RPC? Why is there a glob method? It's to support a use-case in the watchman/buck integration. I'm just sketching it out in the thrift interface at this stage. In the future we also need to be able to express how to carry out a tree walk, but that will require some query predicates that I don't want to get hung up on specifying immediately. Why is the streaming stuff in its own thrift file? We can't generate code for it in java or perhaps also python. It's only needed to plumb data into watchman so it's broken out into its own definition. Nothing depends on that file yet, so it's probably not specified quite right. The important thing is how the subscribe method looks: it's essentially the same as the method to query a delta, but it keeps emitting deltas as they are produced. This is another API that will benefit from query predicates when we get around to specifying them. I've added `JournalDelta::fromHash` and `JournalDelta::toHash` to hold the appropriate snapshot ids in the journal entry; this will allow us to indicate when we've checked out a new snapshot, or created a new snapshot. We have no way to populate these yet; I commented on D3762646 about storing the `snapshotID` that we have during `EdenServiceHandler::mountImpl` into either the `EdenMount` or the proposed `RootInode` class. Once we have that we can simply sample it and store it as we generate `JournalDelta`s. Reviewed By: simpkins Differential Revision: D3860804 fbshipit-source-id: 896c24c354e6f58328fb45c24b16915d9e937108
2016-09-19 22:48:12 +03:00
1: unsigned64 size // wish thrift had unsigned numbers
2: TimeSpec mtime
3: i32 mode // mode_t
}
additional query API for our thrift interface Summary: This diff adds a couple more things to our thrift interface: 1. Introduces JournalPosition 2. Adds methods to query the current JournalPosition and obtain a delta since a given JournalPosition 3. Augments getMaterializedFiles to also return the current JournalPosition 4. Adds a method to evaluate a `glob` against Eden 5. Adds a method using thrift streaming to subscribe to realtime changes Could probably finesse the naming a little bit. The JournalPosition allows reasoning about changes to files that are not part of an Eden snapshot. Internally the journal position is just the SequenceNumber from the journal datastructures, but when we expose it to clients we need to be able to distinguish between a sequence number from the current instance of the eden service and a prior incarnation (eg: if the process has been restarted, and we have no way to recreate the journal we need to be able to indicate this to the client if they ask about changes in that range). For the convenience of the client we also include the `toHash` (the most recent hash from the journal entry) which is likely useful for the `hg` dirstate operations; it is useful to know that the snapshot may have changed since the last query about the dirstate. The `getFileInformation` method returns the instantaneously available `stat()` like information about the requested list of files. Since we simply don't have historical data on how files in the overlay looked (only how they look now), this method does not allow passing in a JournalPosition. When it comes to comparing historical data, we will need to add an API that accepts two snapshot hashes and generates the results from there. This particular method is geared up to understanding the current state of the world; the obvious use case is plugging in the file list from `getFilesChangedSince` into this function to figure out what's what. * Do we want a function that combines `getFilesChangedSince` + `getFileInformation` into a single RPC? Why is there a glob method? It's to support a use-case in the watchman/buck integration. I'm just sketching it out in the thrift interface at this stage. In the future we also need to be able to express how to carry out a tree walk, but that will require some query predicates that I don't want to get hung up on specifying immediately. Why is the streaming stuff in its own thrift file? We can't generate code for it in java or perhaps also python. It's only needed to plumb data into watchman so it's broken out into its own definition. Nothing depends on that file yet, so it's probably not specified quite right. The important thing is how the subscribe method looks: it's essentially the same as the method to query a delta, but it keeps emitting deltas as they are produced. This is another API that will benefit from query predicates when we get around to specifying them. I've added `JournalDelta::fromHash` and `JournalDelta::toHash` to hold the appropriate snapshot ids in the journal entry; this will allow us to indicate when we've checked out a new snapshot, or created a new snapshot. We have no way to populate these yet; I commented on D3762646 about storing the `snapshotID` that we have during `EdenServiceHandler::mountImpl` into either the `EdenMount` or the proposed `RootInode` class. Once we have that we can simply sample it and store it as we generate `JournalDelta`s. Reviewed By: simpkins Differential Revision: D3860804 fbshipit-source-id: 896c24c354e6f58328fb45c24b16915d9e937108
2016-09-19 22:48:12 +03:00
/** Holds information about a file, or an error in retrieving that info.
* The most likely error will be ENOENT, implying that the file doesn't exist.
*/
union FileInformationOrError {
1: FileInformation info
2: EdenError error
}
/** reference a point in time in the journal.
* This can be used to reason about a point in time in a given mount point.
* The mountGeneration value is opaque to the client.
additional query API for our thrift interface Summary: This diff adds a couple more things to our thrift interface: 1. Introduces JournalPosition 2. Adds methods to query the current JournalPosition and obtain a delta since a given JournalPosition 3. Augments getMaterializedFiles to also return the current JournalPosition 4. Adds a method to evaluate a `glob` against Eden 5. Adds a method using thrift streaming to subscribe to realtime changes Could probably finesse the naming a little bit. The JournalPosition allows reasoning about changes to files that are not part of an Eden snapshot. Internally the journal position is just the SequenceNumber from the journal datastructures, but when we expose it to clients we need to be able to distinguish between a sequence number from the current instance of the eden service and a prior incarnation (eg: if the process has been restarted, and we have no way to recreate the journal we need to be able to indicate this to the client if they ask about changes in that range). For the convenience of the client we also include the `toHash` (the most recent hash from the journal entry) which is likely useful for the `hg` dirstate operations; it is useful to know that the snapshot may have changed since the last query about the dirstate. The `getFileInformation` method returns the instantaneously available `stat()` like information about the requested list of files. Since we simply don't have historical data on how files in the overlay looked (only how they look now), this method does not allow passing in a JournalPosition. When it comes to comparing historical data, we will need to add an API that accepts two snapshot hashes and generates the results from there. This particular method is geared up to understanding the current state of the world; the obvious use case is plugging in the file list from `getFilesChangedSince` into this function to figure out what's what. * Do we want a function that combines `getFilesChangedSince` + `getFileInformation` into a single RPC? Why is there a glob method? It's to support a use-case in the watchman/buck integration. I'm just sketching it out in the thrift interface at this stage. In the future we also need to be able to express how to carry out a tree walk, but that will require some query predicates that I don't want to get hung up on specifying immediately. Why is the streaming stuff in its own thrift file? We can't generate code for it in java or perhaps also python. It's only needed to plumb data into watchman so it's broken out into its own definition. Nothing depends on that file yet, so it's probably not specified quite right. The important thing is how the subscribe method looks: it's essentially the same as the method to query a delta, but it keeps emitting deltas as they are produced. This is another API that will benefit from query predicates when we get around to specifying them. I've added `JournalDelta::fromHash` and `JournalDelta::toHash` to hold the appropriate snapshot ids in the journal entry; this will allow us to indicate when we've checked out a new snapshot, or created a new snapshot. We have no way to populate these yet; I commented on D3762646 about storing the `snapshotID` that we have during `EdenServiceHandler::mountImpl` into either the `EdenMount` or the proposed `RootInode` class. Once we have that we can simply sample it and store it as we generate `JournalDelta`s. Reviewed By: simpkins Differential Revision: D3860804 fbshipit-source-id: 896c24c354e6f58328fb45c24b16915d9e937108
2016-09-19 22:48:12 +03:00
*/
struct JournalPosition {
/** An opaque but unique number within the scope of a given mount point.
* This is used to determine when sequenceNumber has been invalidated. */
1: i64 mountGeneration
additional query API for our thrift interface Summary: This diff adds a couple more things to our thrift interface: 1. Introduces JournalPosition 2. Adds methods to query the current JournalPosition and obtain a delta since a given JournalPosition 3. Augments getMaterializedFiles to also return the current JournalPosition 4. Adds a method to evaluate a `glob` against Eden 5. Adds a method using thrift streaming to subscribe to realtime changes Could probably finesse the naming a little bit. The JournalPosition allows reasoning about changes to files that are not part of an Eden snapshot. Internally the journal position is just the SequenceNumber from the journal datastructures, but when we expose it to clients we need to be able to distinguish between a sequence number from the current instance of the eden service and a prior incarnation (eg: if the process has been restarted, and we have no way to recreate the journal we need to be able to indicate this to the client if they ask about changes in that range). For the convenience of the client we also include the `toHash` (the most recent hash from the journal entry) which is likely useful for the `hg` dirstate operations; it is useful to know that the snapshot may have changed since the last query about the dirstate. The `getFileInformation` method returns the instantaneously available `stat()` like information about the requested list of files. Since we simply don't have historical data on how files in the overlay looked (only how they look now), this method does not allow passing in a JournalPosition. When it comes to comparing historical data, we will need to add an API that accepts two snapshot hashes and generates the results from there. This particular method is geared up to understanding the current state of the world; the obvious use case is plugging in the file list from `getFilesChangedSince` into this function to figure out what's what. * Do we want a function that combines `getFilesChangedSince` + `getFileInformation` into a single RPC? Why is there a glob method? It's to support a use-case in the watchman/buck integration. I'm just sketching it out in the thrift interface at this stage. In the future we also need to be able to express how to carry out a tree walk, but that will require some query predicates that I don't want to get hung up on specifying immediately. Why is the streaming stuff in its own thrift file? We can't generate code for it in java or perhaps also python. It's only needed to plumb data into watchman so it's broken out into its own definition. Nothing depends on that file yet, so it's probably not specified quite right. The important thing is how the subscribe method looks: it's essentially the same as the method to query a delta, but it keeps emitting deltas as they are produced. This is another API that will benefit from query predicates when we get around to specifying them. I've added `JournalDelta::fromHash` and `JournalDelta::toHash` to hold the appropriate snapshot ids in the journal entry; this will allow us to indicate when we've checked out a new snapshot, or created a new snapshot. We have no way to populate these yet; I commented on D3762646 about storing the `snapshotID` that we have during `EdenServiceHandler::mountImpl` into either the `EdenMount` or the proposed `RootInode` class. Once we have that we can simply sample it and store it as we generate `JournalDelta`s. Reviewed By: simpkins Differential Revision: D3860804 fbshipit-source-id: 896c24c354e6f58328fb45c24b16915d9e937108
2016-09-19 22:48:12 +03:00
/** Monotonically incrementing number
* Each journalled change causes this number to increment. */
2: unsigned64 sequenceNumber
/** Records the snapshot hash at the appropriate point in the journal */
3: BinaryHash snapshotHash
additional query API for our thrift interface Summary: This diff adds a couple more things to our thrift interface: 1. Introduces JournalPosition 2. Adds methods to query the current JournalPosition and obtain a delta since a given JournalPosition 3. Augments getMaterializedFiles to also return the current JournalPosition 4. Adds a method to evaluate a `glob` against Eden 5. Adds a method using thrift streaming to subscribe to realtime changes Could probably finesse the naming a little bit. The JournalPosition allows reasoning about changes to files that are not part of an Eden snapshot. Internally the journal position is just the SequenceNumber from the journal datastructures, but when we expose it to clients we need to be able to distinguish between a sequence number from the current instance of the eden service and a prior incarnation (eg: if the process has been restarted, and we have no way to recreate the journal we need to be able to indicate this to the client if they ask about changes in that range). For the convenience of the client we also include the `toHash` (the most recent hash from the journal entry) which is likely useful for the `hg` dirstate operations; it is useful to know that the snapshot may have changed since the last query about the dirstate. The `getFileInformation` method returns the instantaneously available `stat()` like information about the requested list of files. Since we simply don't have historical data on how files in the overlay looked (only how they look now), this method does not allow passing in a JournalPosition. When it comes to comparing historical data, we will need to add an API that accepts two snapshot hashes and generates the results from there. This particular method is geared up to understanding the current state of the world; the obvious use case is plugging in the file list from `getFilesChangedSince` into this function to figure out what's what. * Do we want a function that combines `getFilesChangedSince` + `getFileInformation` into a single RPC? Why is there a glob method? It's to support a use-case in the watchman/buck integration. I'm just sketching it out in the thrift interface at this stage. In the future we also need to be able to express how to carry out a tree walk, but that will require some query predicates that I don't want to get hung up on specifying immediately. Why is the streaming stuff in its own thrift file? We can't generate code for it in java or perhaps also python. It's only needed to plumb data into watchman so it's broken out into its own definition. Nothing depends on that file yet, so it's probably not specified quite right. The important thing is how the subscribe method looks: it's essentially the same as the method to query a delta, but it keeps emitting deltas as they are produced. This is another API that will benefit from query predicates when we get around to specifying them. I've added `JournalDelta::fromHash` and `JournalDelta::toHash` to hold the appropriate snapshot ids in the journal entry; this will allow us to indicate when we've checked out a new snapshot, or created a new snapshot. We have no way to populate these yet; I commented on D3762646 about storing the `snapshotID` that we have during `EdenServiceHandler::mountImpl` into either the `EdenMount` or the proposed `RootInode` class. Once we have that we can simply sample it and store it as we generate `JournalDelta`s. Reviewed By: simpkins Differential Revision: D3860804 fbshipit-source-id: 896c24c354e6f58328fb45c24b16915d9e937108
2016-09-19 22:48:12 +03:00
}
/** Holds information about a set of paths that changed between two points.
* fromPosition, toPosition define the time window.
* paths holds the list of paths that changed in that window.
*/
struct FileDelta {
/** The fromPosition passed to getFilesChangedSince */
1: JournalPosition fromPosition
/** The current position at the time that getFilesChangedSince was called */
2: JournalPosition toPosition
/** The complete list of paths from both the snapshot and the overlay that
* changed between fromPosition and toPosition */
do a better job at reporting "new" in watchman results. Summary: We're seeing that this is always set to true for eden, which is causing buck to run slower than it should. To make this work correctly, I've augmented our journal data structure so that it can track create, change and remove events for the various paths. I've also plumbed rename events into the journal. This requires a slightly more complex merge routine, so I've refactored the two call sites that were merging in slightly different contexts so that they can now share the same guts of the merge routine. Perhaps slightly counterintuitive in the merge code is that we merge a record from the past into the state for now and this is a bit backwards compared to how people think. I've expanded the eden integration test to check that we don't mix up create/change/removes for the same path in a given window. On the watchman side, we use the presence of the filename in the createdPaths set as a hint that the file is new. In that case we will set the watchman `ctime` (which is not the state ctime but is really the *created clock time*) to match the current journal position if the file is new, or leave it set to 0 if the file is not known to be new. This will cause the `is_new` flag to be set appropriately by the code in `watchman/query/eval.cpp`; if the sequence is 0 then it should never be set to true. Otherwise (when the file was in the `createPaths` set) it will be set to the current journal position and this will be seen as newer than the `since` constraint on the query and cause the file to show as `new`. Reviewed By: bolinfest Differential Revision: D5608538 fbshipit-source-id: 8d78f7da05e5e53110108aca220c3a97794f8cc2
2017-08-11 22:51:51 +03:00
3: list<string> changedPaths
4: list<string> createdPaths
5: list<string> removedPaths
augment JournalDelta with unclean paths on snapshot hash change Summary: We were previously generating a simple JournalDelta consisting of just the from/to snapshot hashes. This is great from a `!O(repo)` perspective when recording what changed but makes it difficult for clients downstream to reason about changes that are not tracked in source control. This diff adds a concept of `uncleanPaths` to the journal; these are paths that we think are/were different from the hashes in the journal entry. Since JournalDelta needs to be able to be merged I've opted for a simple list of the paths that have a differing status; I'm not including all of the various dirstate states for this because it is not obvious how to reconcile the state across successive snapshot change events. The `uncleanPaths` set is populated with an initial set of different paths as the first part of the checkout call (prior to changing the hash), and then is updated after the hash has changed to capture any additional differences. Care needs to be taken to avoid recursively attempting to grab the parents lock so I'm replicating just a little bit of the state management glue in the `performDiff` method. The Journal was not setting the from/to snapshot hashes when merging deltas. This manifested in the watchman integration tests; we'd see the null revision as the `from` and the `to` revision held the `from` revision(!). On the watchman side we need to ask source control to expand the list of files that changed when the from/to hashes are different; I've added code to handle this. This doesn't do anything smart in the case that the source control aware queries are in use. We'll look at that in a following diff as it isn't strictly eden specific. `watchman clock` was returning a basically empty clock unconditionally, which meant that most since queries would report everything since the start of time. This is most likely contributing to poor Buck performance, although I have not investigated the performance aspect of this. It manifested itself in the watchman integration tests. Reviewed By: simpkins Differential Revision: D5896494 fbshipit-source-id: a88be6448862781a1d8f5e15285ca07b4240593a
2017-10-17 08:22:18 +03:00
/** When fromPosition.snapshotHash != toPosition.snapshotHash this holds
Store Hg dirstate data in Hg instead of Eden. Summary: This is a major change to how we manage the dirstate in Eden's Hg extension. Previously, the dirstate information was stored under `$EDEN_CONFIG_DIR`, which is Eden's private storage. Any time the Mercurial extension wanted to read or write the dirstate, it had to make a Thrift request to Eden to do so on its behalf. The upside is that Eden could answer dirstate-related questions independently of the Python code. This was sufficiently different than how Mercurial's default dirstate worked that our subclass, `eden_dirstate`, had to override quite a bit of behavior. Failing to manage the `.hg/dirstate` file in a way similar to the way Mercurial does has exposed some "unofficial contracts" that Mercurial has. For example, tools like Nuclide rely on changes to the `.hg/dirstate` file as a heuristic to determine when to invalidate its internal caches for Mercurial data. Today, Mercurial has a well-factored `dirstatemap` abstraction that is primarily responsible for the transactions with the dirstate's data. With this split, we can focus on putting most of our customizations in our `eden_dirstate_map` subclass while our `eden_dirstate` class has to override fewer methods. Because the data is managed through the `.hg/dirstate` file, transaction logic in Mercurial that relies on renaming/copying that file will work out-of-the-box. This change also reduces the number of Thrift calls the Mercurial extension has to make for operations like `hg status` or `hg add`. In this revision, we introduce our own binary format for the `.hg/dirstate` file. The logic to read and write this file is in `eden/py/dirstate.py`. After the first 40 bytes, which are used for the parent hashes, the next four bytes are reserved for a version number for the file format so we can manage file format changes going forward. Admittedly one downside of this change is that it is a breaking change. Ideally, users should commit all of their local changes in their existing mounts, shutdown Eden, delete the old mounts, restart Eden, and re-clone. In the end, this change deletes a number of Mercurial-specific code and Thrift APIs from Eden. This is a better separation of concerns that makes Eden more SCM-agnostic. For example, this change removes `Dirstate.cpp` and `DirstatePersistance.cpp`, replacing them with the much simpler and more general `Differ.cpp`. The Mercurial-specific logic from `Dirstate.cpp` that turned a diff into an `hg status` now lives in the Mercurial extension in `EdenThriftClient.getStatus()`, which is much more appropriate. Note that this reverts the changes that were recently introduced in D6116105: we now need to intercept `localrepo.localrepository.dirstate` once again. Reviewed By: simpkins Differential Revision: D6179950 fbshipit-source-id: 5b78904909b669c9cc606e2fe1fd118ef6eaab95
2017-11-07 06:44:24 +03:00
* the union of the set of files whose ScmFileStatus differed from the
augment JournalDelta with unclean paths on snapshot hash change Summary: We were previously generating a simple JournalDelta consisting of just the from/to snapshot hashes. This is great from a `!O(repo)` perspective when recording what changed but makes it difficult for clients downstream to reason about changes that are not tracked in source control. This diff adds a concept of `uncleanPaths` to the journal; these are paths that we think are/were different from the hashes in the journal entry. Since JournalDelta needs to be able to be merged I've opted for a simple list of the paths that have a differing status; I'm not including all of the various dirstate states for this because it is not obvious how to reconcile the state across successive snapshot change events. The `uncleanPaths` set is populated with an initial set of different paths as the first part of the checkout call (prior to changing the hash), and then is updated after the hash has changed to capture any additional differences. Care needs to be taken to avoid recursively attempting to grab the parents lock so I'm replicating just a little bit of the state management glue in the `performDiff` method. The Journal was not setting the from/to snapshot hashes when merging deltas. This manifested in the watchman integration tests; we'd see the null revision as the `from` and the `to` revision held the `from` revision(!). On the watchman side we need to ask source control to expand the list of files that changed when the from/to hashes are different; I've added code to handle this. This doesn't do anything smart in the case that the source control aware queries are in use. We'll look at that in a following diff as it isn't strictly eden specific. `watchman clock` was returning a basically empty clock unconditionally, which meant that most since queries would report everything since the start of time. This is most likely contributing to poor Buck performance, although I have not investigated the performance aspect of this. It manifested itself in the watchman integration tests. Reviewed By: simpkins Differential Revision: D5896494 fbshipit-source-id: a88be6448862781a1d8f5e15285ca07b4240593a
2017-10-17 08:22:18 +03:00
* committed fromPosition hash before the hash changed, and the set of
Store Hg dirstate data in Hg instead of Eden. Summary: This is a major change to how we manage the dirstate in Eden's Hg extension. Previously, the dirstate information was stored under `$EDEN_CONFIG_DIR`, which is Eden's private storage. Any time the Mercurial extension wanted to read or write the dirstate, it had to make a Thrift request to Eden to do so on its behalf. The upside is that Eden could answer dirstate-related questions independently of the Python code. This was sufficiently different than how Mercurial's default dirstate worked that our subclass, `eden_dirstate`, had to override quite a bit of behavior. Failing to manage the `.hg/dirstate` file in a way similar to the way Mercurial does has exposed some "unofficial contracts" that Mercurial has. For example, tools like Nuclide rely on changes to the `.hg/dirstate` file as a heuristic to determine when to invalidate its internal caches for Mercurial data. Today, Mercurial has a well-factored `dirstatemap` abstraction that is primarily responsible for the transactions with the dirstate's data. With this split, we can focus on putting most of our customizations in our `eden_dirstate_map` subclass while our `eden_dirstate` class has to override fewer methods. Because the data is managed through the `.hg/dirstate` file, transaction logic in Mercurial that relies on renaming/copying that file will work out-of-the-box. This change also reduces the number of Thrift calls the Mercurial extension has to make for operations like `hg status` or `hg add`. In this revision, we introduce our own binary format for the `.hg/dirstate` file. The logic to read and write this file is in `eden/py/dirstate.py`. After the first 40 bytes, which are used for the parent hashes, the next four bytes are reserved for a version number for the file format so we can manage file format changes going forward. Admittedly one downside of this change is that it is a breaking change. Ideally, users should commit all of their local changes in their existing mounts, shutdown Eden, delete the old mounts, restart Eden, and re-clone. In the end, this change deletes a number of Mercurial-specific code and Thrift APIs from Eden. This is a better separation of concerns that makes Eden more SCM-agnostic. For example, this change removes `Dirstate.cpp` and `DirstatePersistance.cpp`, replacing them with the much simpler and more general `Differ.cpp`. The Mercurial-specific logic from `Dirstate.cpp` that turned a diff into an `hg status` now lives in the Mercurial extension in `EdenThriftClient.getStatus()`, which is much more appropriate. Note that this reverts the changes that were recently introduced in D6116105: we now need to intercept `localrepo.localrepository.dirstate` once again. Reviewed By: simpkins Differential Revision: D6179950 fbshipit-source-id: 5b78904909b669c9cc606e2fe1fd118ef6eaab95
2017-11-07 06:44:24 +03:00
* files whose ScmFileStatus differed from the committed toPosition hash
augment JournalDelta with unclean paths on snapshot hash change Summary: We were previously generating a simple JournalDelta consisting of just the from/to snapshot hashes. This is great from a `!O(repo)` perspective when recording what changed but makes it difficult for clients downstream to reason about changes that are not tracked in source control. This diff adds a concept of `uncleanPaths` to the journal; these are paths that we think are/were different from the hashes in the journal entry. Since JournalDelta needs to be able to be merged I've opted for a simple list of the paths that have a differing status; I'm not including all of the various dirstate states for this because it is not obvious how to reconcile the state across successive snapshot change events. The `uncleanPaths` set is populated with an initial set of different paths as the first part of the checkout call (prior to changing the hash), and then is updated after the hash has changed to capture any additional differences. Care needs to be taken to avoid recursively attempting to grab the parents lock so I'm replicating just a little bit of the state management glue in the `performDiff` method. The Journal was not setting the from/to snapshot hashes when merging deltas. This manifested in the watchman integration tests; we'd see the null revision as the `from` and the `to` revision held the `from` revision(!). On the watchman side we need to ask source control to expand the list of files that changed when the from/to hashes are different; I've added code to handle this. This doesn't do anything smart in the case that the source control aware queries are in use. We'll look at that in a following diff as it isn't strictly eden specific. `watchman clock` was returning a basically empty clock unconditionally, which meant that most since queries would report everything since the start of time. This is most likely contributing to poor Buck performance, although I have not investigated the performance aspect of this. It manifested itself in the watchman integration tests. Reviewed By: simpkins Differential Revision: D5896494 fbshipit-source-id: a88be6448862781a1d8f5e15285ca07b4240593a
2017-10-17 08:22:18 +03:00
* after the hash was changed. This list of files represents files
* whose state may have changed as part of an update operation, but
* in ways that may not be able to be extracted solely by performing
* source control diff operations on the from/to hashes. */
6: list<string> uncleanPaths
additional query API for our thrift interface Summary: This diff adds a couple more things to our thrift interface: 1. Introduces JournalPosition 2. Adds methods to query the current JournalPosition and obtain a delta since a given JournalPosition 3. Augments getMaterializedFiles to also return the current JournalPosition 4. Adds a method to evaluate a `glob` against Eden 5. Adds a method using thrift streaming to subscribe to realtime changes Could probably finesse the naming a little bit. The JournalPosition allows reasoning about changes to files that are not part of an Eden snapshot. Internally the journal position is just the SequenceNumber from the journal datastructures, but when we expose it to clients we need to be able to distinguish between a sequence number from the current instance of the eden service and a prior incarnation (eg: if the process has been restarted, and we have no way to recreate the journal we need to be able to indicate this to the client if they ask about changes in that range). For the convenience of the client we also include the `toHash` (the most recent hash from the journal entry) which is likely useful for the `hg` dirstate operations; it is useful to know that the snapshot may have changed since the last query about the dirstate. The `getFileInformation` method returns the instantaneously available `stat()` like information about the requested list of files. Since we simply don't have historical data on how files in the overlay looked (only how they look now), this method does not allow passing in a JournalPosition. When it comes to comparing historical data, we will need to add an API that accepts two snapshot hashes and generates the results from there. This particular method is geared up to understanding the current state of the world; the obvious use case is plugging in the file list from `getFilesChangedSince` into this function to figure out what's what. * Do we want a function that combines `getFilesChangedSince` + `getFileInformation` into a single RPC? Why is there a glob method? It's to support a use-case in the watchman/buck integration. I'm just sketching it out in the thrift interface at this stage. In the future we also need to be able to express how to carry out a tree walk, but that will require some query predicates that I don't want to get hung up on specifying immediately. Why is the streaming stuff in its own thrift file? We can't generate code for it in java or perhaps also python. It's only needed to plumb data into watchman so it's broken out into its own definition. Nothing depends on that file yet, so it's probably not specified quite right. The important thing is how the subscribe method looks: it's essentially the same as the method to query a delta, but it keeps emitting deltas as they are produced. This is another API that will benefit from query predicates when we get around to specifying them. I've added `JournalDelta::fromHash` and `JournalDelta::toHash` to hold the appropriate snapshot ids in the journal entry; this will allow us to indicate when we've checked out a new snapshot, or created a new snapshot. We have no way to populate these yet; I commented on D3762646 about storing the `snapshotID` that we have during `EdenServiceHandler::mountImpl` into either the `EdenMount` or the proposed `RootInode` class. Once we have that we can simply sample it and store it as we generate `JournalDelta`s. Reviewed By: simpkins Differential Revision: D3860804 fbshipit-source-id: 896c24c354e6f58328fb45c24b16915d9e937108
2016-09-19 22:48:12 +03:00
}
struct DebugGetRawJournalParams {
1: string mountPoint
2: JournalPosition fromPosition
3: JournalPosition toPosition
}
struct DebugGetRawJournalResponse {
1: list<FileDelta> deltas
}
Store Hg dirstate data in Hg instead of Eden. Summary: This is a major change to how we manage the dirstate in Eden's Hg extension. Previously, the dirstate information was stored under `$EDEN_CONFIG_DIR`, which is Eden's private storage. Any time the Mercurial extension wanted to read or write the dirstate, it had to make a Thrift request to Eden to do so on its behalf. The upside is that Eden could answer dirstate-related questions independently of the Python code. This was sufficiently different than how Mercurial's default dirstate worked that our subclass, `eden_dirstate`, had to override quite a bit of behavior. Failing to manage the `.hg/dirstate` file in a way similar to the way Mercurial does has exposed some "unofficial contracts" that Mercurial has. For example, tools like Nuclide rely on changes to the `.hg/dirstate` file as a heuristic to determine when to invalidate its internal caches for Mercurial data. Today, Mercurial has a well-factored `dirstatemap` abstraction that is primarily responsible for the transactions with the dirstate's data. With this split, we can focus on putting most of our customizations in our `eden_dirstate_map` subclass while our `eden_dirstate` class has to override fewer methods. Because the data is managed through the `.hg/dirstate` file, transaction logic in Mercurial that relies on renaming/copying that file will work out-of-the-box. This change also reduces the number of Thrift calls the Mercurial extension has to make for operations like `hg status` or `hg add`. In this revision, we introduce our own binary format for the `.hg/dirstate` file. The logic to read and write this file is in `eden/py/dirstate.py`. After the first 40 bytes, which are used for the parent hashes, the next four bytes are reserved for a version number for the file format so we can manage file format changes going forward. Admittedly one downside of this change is that it is a breaking change. Ideally, users should commit all of their local changes in their existing mounts, shutdown Eden, delete the old mounts, restart Eden, and re-clone. In the end, this change deletes a number of Mercurial-specific code and Thrift APIs from Eden. This is a better separation of concerns that makes Eden more SCM-agnostic. For example, this change removes `Dirstate.cpp` and `DirstatePersistance.cpp`, replacing them with the much simpler and more general `Differ.cpp`. The Mercurial-specific logic from `Dirstate.cpp` that turned a diff into an `hg status` now lives in the Mercurial extension in `EdenThriftClient.getStatus()`, which is much more appropriate. Note that this reverts the changes that were recently introduced in D6116105: we now need to intercept `localrepo.localrepository.dirstate` once again. Reviewed By: simpkins Differential Revision: D6179950 fbshipit-source-id: 5b78904909b669c9cc606e2fe1fd118ef6eaab95
2017-11-07 06:44:24 +03:00
/**
* Classifies the change of the state of a file between and old and new state
* of the repository. Most commonly, the "old state" is the parent commit while
* the "new state" is the working copy.
*/
enum ScmFileStatus {
/**
* File is present in the new state, but was not present in old state.
*/
ADDED = 0x0,
/**
* File is present in both the new and old states, but its contents or
* file permissions have changed.
*/
MODIFIED = 0x1,
Store Hg dirstate data in Hg instead of Eden. Summary: This is a major change to how we manage the dirstate in Eden's Hg extension. Previously, the dirstate information was stored under `$EDEN_CONFIG_DIR`, which is Eden's private storage. Any time the Mercurial extension wanted to read or write the dirstate, it had to make a Thrift request to Eden to do so on its behalf. The upside is that Eden could answer dirstate-related questions independently of the Python code. This was sufficiently different than how Mercurial's default dirstate worked that our subclass, `eden_dirstate`, had to override quite a bit of behavior. Failing to manage the `.hg/dirstate` file in a way similar to the way Mercurial does has exposed some "unofficial contracts" that Mercurial has. For example, tools like Nuclide rely on changes to the `.hg/dirstate` file as a heuristic to determine when to invalidate its internal caches for Mercurial data. Today, Mercurial has a well-factored `dirstatemap` abstraction that is primarily responsible for the transactions with the dirstate's data. With this split, we can focus on putting most of our customizations in our `eden_dirstate_map` subclass while our `eden_dirstate` class has to override fewer methods. Because the data is managed through the `.hg/dirstate` file, transaction logic in Mercurial that relies on renaming/copying that file will work out-of-the-box. This change also reduces the number of Thrift calls the Mercurial extension has to make for operations like `hg status` or `hg add`. In this revision, we introduce our own binary format for the `.hg/dirstate` file. The logic to read and write this file is in `eden/py/dirstate.py`. After the first 40 bytes, which are used for the parent hashes, the next four bytes are reserved for a version number for the file format so we can manage file format changes going forward. Admittedly one downside of this change is that it is a breaking change. Ideally, users should commit all of their local changes in their existing mounts, shutdown Eden, delete the old mounts, restart Eden, and re-clone. In the end, this change deletes a number of Mercurial-specific code and Thrift APIs from Eden. This is a better separation of concerns that makes Eden more SCM-agnostic. For example, this change removes `Dirstate.cpp` and `DirstatePersistance.cpp`, replacing them with the much simpler and more general `Differ.cpp`. The Mercurial-specific logic from `Dirstate.cpp` that turned a diff into an `hg status` now lives in the Mercurial extension in `EdenThriftClient.getStatus()`, which is much more appropriate. Note that this reverts the changes that were recently introduced in D6116105: we now need to intercept `localrepo.localrepository.dirstate` once again. Reviewed By: simpkins Differential Revision: D6179950 fbshipit-source-id: 5b78904909b669c9cc606e2fe1fd118ef6eaab95
2017-11-07 06:44:24 +03:00
/**
* File was present in the old state, but is not present in the new state.
*/
REMOVED = 0x2,
/**
* File is present in the new state, but it is ignored according to the rules
* of the new state.
*/
IGNORED = 0x3,
}
Store Hg dirstate data in Hg instead of Eden. Summary: This is a major change to how we manage the dirstate in Eden's Hg extension. Previously, the dirstate information was stored under `$EDEN_CONFIG_DIR`, which is Eden's private storage. Any time the Mercurial extension wanted to read or write the dirstate, it had to make a Thrift request to Eden to do so on its behalf. The upside is that Eden could answer dirstate-related questions independently of the Python code. This was sufficiently different than how Mercurial's default dirstate worked that our subclass, `eden_dirstate`, had to override quite a bit of behavior. Failing to manage the `.hg/dirstate` file in a way similar to the way Mercurial does has exposed some "unofficial contracts" that Mercurial has. For example, tools like Nuclide rely on changes to the `.hg/dirstate` file as a heuristic to determine when to invalidate its internal caches for Mercurial data. Today, Mercurial has a well-factored `dirstatemap` abstraction that is primarily responsible for the transactions with the dirstate's data. With this split, we can focus on putting most of our customizations in our `eden_dirstate_map` subclass while our `eden_dirstate` class has to override fewer methods. Because the data is managed through the `.hg/dirstate` file, transaction logic in Mercurial that relies on renaming/copying that file will work out-of-the-box. This change also reduces the number of Thrift calls the Mercurial extension has to make for operations like `hg status` or `hg add`. In this revision, we introduce our own binary format for the `.hg/dirstate` file. The logic to read and write this file is in `eden/py/dirstate.py`. After the first 40 bytes, which are used for the parent hashes, the next four bytes are reserved for a version number for the file format so we can manage file format changes going forward. Admittedly one downside of this change is that it is a breaking change. Ideally, users should commit all of their local changes in their existing mounts, shutdown Eden, delete the old mounts, restart Eden, and re-clone. In the end, this change deletes a number of Mercurial-specific code and Thrift APIs from Eden. This is a better separation of concerns that makes Eden more SCM-agnostic. For example, this change removes `Dirstate.cpp` and `DirstatePersistance.cpp`, replacing them with the much simpler and more general `Differ.cpp`. The Mercurial-specific logic from `Dirstate.cpp` that turned a diff into an `hg status` now lives in the Mercurial extension in `EdenThriftClient.getStatus()`, which is much more appropriate. Note that this reverts the changes that were recently introduced in D6116105: we now need to intercept `localrepo.localrepository.dirstate` once again. Reviewed By: simpkins Differential Revision: D6179950 fbshipit-source-id: 5b78904909b669c9cc606e2fe1fd118ef6eaab95
2017-11-07 06:44:24 +03:00
struct ScmStatus {
1: map<string, ScmFileStatus> entries
/**
* A map of { path -> error message }
*
* If any errors occured while computing the diff they will be reported here.
* The results listed in the entries field may not be accurate for any paths
* listed in this error field.
*
* This map will be empty if no errors occurred.
*/
2: map<string, string> errors
}
Change how the UNTRACKED_ADDED conflict and merges are handled. Summary: Previously, we used the Mercurial code `g` when faced with an `UNTRACKED_ADDED` file conflict, but that was allowing merges to silently succeed that should not have. This revision changes our logic to use the code `m` for merge, which unearthed that we were not honoring the user's `update.check` setting properly. Because we use `update.check=noconflict` internally at Facebook, we changed the Eden integration tests to default to verifying Hg running with this setting. To support it properly, we had to port this code from `update.py` in Mercurial to our own `_determine_actions_for_conflicts()` function: ``` if updatecheck == 'noconflict': for f, (m, args, msg) in actionbyfile.iteritems(): if m not in ('g', 'k', 'e', 'r', 'pr'): msg = _("conflicting changes") hint = _("commit or update --clean to discard changes") raise error.Abort(msg, hint=hint) ``` However, this introduced an interesting issue where the `checkOutRevision()` Thrift call from Hg would update the `SNAPSHOT` file on the server, but `.hg/dirstate` would not get updated with the new parents until the update completed on the client. With the new call to `raise error.Abort` on the client, we could get in a state where the `SNAPSHOT` file had the hash of the commit assuming the update succeeded, but `.hg/dirstate` reflected the reality where it failed. To that end, we changed `checkOutRevision()` to take a new parameter, `checkoutMode`, which can take on one of three values: `NORMAL`, `DRY_RUN`, and `FORCE`. Now if the user tries to do an ordinary `hg update` with `update.check=noconflict`, we first do a `DRY_RUN` and examine the potential conflicts. Only if the conflicts should not block the update do we proceed with a call to `checkOutRevision()` in `NORMAL` mode. To make this work, we had to make a number of changes to `CheckoutAction`, `CheckoutContext`, `EdenMount`, and `TreeInode` to keep track of the `checkoutMode` and ensure that no changes are made to the working copy when a `DRY_RUN` is in effect. One minor issue (for which there is a `TODO`) is that a `DRY_RUN` will not report any `DIRECTORY_NOT_EMPTY` conflicts that may exist. As `TreeInode` is implemented today, it is a bit messy to report this type of conflict without modifying the working copy along the way. Finally, any `UNTRACKED_ADDED` conflict should cause an update to abort to match the behavior in stock Mercurial if the user has the following config setting: ``` [commands] update.check = noconflict ``` Though the original name for this setting was: ``` [experimental] updatecheck = noconflict ``` Although I am on Mercurial 4.4.1, the `update.check` setting does not seem to take effect when I run the integration tests, but the `updatecheck` setting does, so for now, I set both in `hg_extension_test_base.py` with a `TODO` to remove `updatecheck` once I can get `update.check` to do its job. Reviewed By: simpkins Differential Revision: D6366007 fbshipit-source-id: bb3ecb1270e77d59d7d9e7baa36ada61971bbc49
2017-11-30 08:38:12 +03:00
/** Option for use with checkOutRevision(). */
enum CheckoutMode {
/**
* Perform a "normal" checkout, analogous to `hg checkout` in Mercurial. Files
* in the working copy will be changed to reflect the destination snapshot,
* though files with conflicts will not be modified.
*/
NORMAL = 0,
/**
* Do not checkout: exercise the checkout logic to discover potential
* conflicts.
*/
DRY_RUN = 1,
/**
* Perform a "forced" checkout, analogous to `hg checkout --clean` in
* Mercurial. Conflicts between the working copy and destination snapshot will
* be forcibly ignored in favor of the state of the new snapshot.
*/
FORCE = 2,
}
enum ConflictType {
/**
* We failed to update this particular path due to an error
*/
ERROR = 0,
/**
* A locally modified file was deleted in the new Tree
*/
MODIFIED_REMOVED = 1,
/**
* An untracked local file exists in the new Tree
*/
UNTRACKED_ADDED = 2,
/**
* The file was removed locally, but modified in the new Tree
*/
REMOVED_MODIFIED = 3,
/**
* The file was removed locally, and also removed in the new Tree.
*/
MISSING_REMOVED = 4,
/**
* A locally modified file was modified in the new Tree
* This may be contents modifications, or a file type change (directory to
* file or vice-versa), or permissions changes.
*/
MODIFIED_MODIFIED = 5,
/**
* A directory was supposed to be removed or replaced with a file,
* but it contains untracked files preventing us from updating it.
*/
DIRECTORY_NOT_EMPTY = 6,
}
/**
* Details about conflicts or errors that occurred during a checkout operation
*/
struct CheckoutConflict {
1: string path
2: ConflictType type
3: string message
}
struct ScmBlobMetadata {
1: i64 size
2: BinaryHash contentsSha1
}
struct ScmTreeEntry {
1: binary name
2: i32 mode
3: BinaryHash id
}
struct TreeInodeEntryDebugInfo {
/**
* The entry name. This is just a PathComponent, not the full path
*/
1: binary name
/**
* The inode number, or 0 if no inode number has been assigned to
* this entry
*/
2: i64 inodeNumber
/**
* The entry mode_t value
*/
3: i32 mode
/**
* True if an InodeBase object exists for this inode or not.
*/
4: bool loaded
/**
* True if an the inode is materialized in the overlay
*/
5: bool materialized
/**
* If materialized is false, hash contains the ID of the underlying source
* control Blob or Tree.
*/
6: BinaryHash hash
}
struct WorkingDirectoryParents {
1: BinaryHash parent1
2: optional BinaryHash parent2
}
struct TreeInodeDebugInfo {
1: i64 inodeNumber
2: binary path
3: bool materialized
4: BinaryHash treeHash
5: list<TreeInodeEntryDebugInfo> entries
6: i64 refcount
}
struct InodePathDebugInfo {
1: string path
2: bool loaded
3: bool linked
}
struct SetLogLevelResult {
1: bool categoryCreated
}
/**
* Struct to store Information about inodes in a mount point.
*/
struct MountInodeInfo {
1: i64 loadedInodeCount
2: i64 unloadedInodeCount
3: i64 materializedInodeCount
4: i64 loadedFileCount
5: i64 loadedTreeCount
}
/**
* Struct to store fb303 counters from ServiceData.getCounters() and inode
* information of all the mount points.
*/
struct InternalStats {
1: i64 periodicUnloadCount
/**
* counters is the list of fb303 counters, key is the counter name, value is the
* counter value.
*/
2: map<string, i64> counters
/**
* mountPointInfo is a map whose key is the path of the mount point and value
* is the details like number of loaded inodes,unloaded inodes in that mount
* and number of materialized inodes in that mountpoint.
*/
3: map<string, MountInodeInfo> mountPointInfo
/**
* Linux-only: the contents of /proc/self/smaps, to be parsed by the caller.
*/
4: binary smaps
}
Store Hg dirstate data in Hg instead of Eden. Summary: This is a major change to how we manage the dirstate in Eden's Hg extension. Previously, the dirstate information was stored under `$EDEN_CONFIG_DIR`, which is Eden's private storage. Any time the Mercurial extension wanted to read or write the dirstate, it had to make a Thrift request to Eden to do so on its behalf. The upside is that Eden could answer dirstate-related questions independently of the Python code. This was sufficiently different than how Mercurial's default dirstate worked that our subclass, `eden_dirstate`, had to override quite a bit of behavior. Failing to manage the `.hg/dirstate` file in a way similar to the way Mercurial does has exposed some "unofficial contracts" that Mercurial has. For example, tools like Nuclide rely on changes to the `.hg/dirstate` file as a heuristic to determine when to invalidate its internal caches for Mercurial data. Today, Mercurial has a well-factored `dirstatemap` abstraction that is primarily responsible for the transactions with the dirstate's data. With this split, we can focus on putting most of our customizations in our `eden_dirstate_map` subclass while our `eden_dirstate` class has to override fewer methods. Because the data is managed through the `.hg/dirstate` file, transaction logic in Mercurial that relies on renaming/copying that file will work out-of-the-box. This change also reduces the number of Thrift calls the Mercurial extension has to make for operations like `hg status` or `hg add`. In this revision, we introduce our own binary format for the `.hg/dirstate` file. The logic to read and write this file is in `eden/py/dirstate.py`. After the first 40 bytes, which are used for the parent hashes, the next four bytes are reserved for a version number for the file format so we can manage file format changes going forward. Admittedly one downside of this change is that it is a breaking change. Ideally, users should commit all of their local changes in their existing mounts, shutdown Eden, delete the old mounts, restart Eden, and re-clone. In the end, this change deletes a number of Mercurial-specific code and Thrift APIs from Eden. This is a better separation of concerns that makes Eden more SCM-agnostic. For example, this change removes `Dirstate.cpp` and `DirstatePersistance.cpp`, replacing them with the much simpler and more general `Differ.cpp`. The Mercurial-specific logic from `Dirstate.cpp` that turned a diff into an `hg status` now lives in the Mercurial extension in `EdenThriftClient.getStatus()`, which is much more appropriate. Note that this reverts the changes that were recently introduced in D6116105: we now need to intercept `localrepo.localrepository.dirstate` once again. Reviewed By: simpkins Differential Revision: D6179950 fbshipit-source-id: 5b78904909b669c9cc606e2fe1fd118ef6eaab95
2017-11-07 06:44:24 +03:00
struct ManifestEntry {
/* mode_t */
1: i32 mode
}
struct FuseCall {
1: i32 len
2: i32 opcode
3: i64 unique
4: i64 nodeid
5: i32 uid
6: i32 gid
7: i32 pid
}
/** Params for globFiles(). */
struct GlobParams {
1: string mountPoint,
2: list<string> globs,
3: bool includeDotfiles,
// if true, prefetch matching blobs
add `eden prefetch` command Summary: This is a first pass at a prefetcher. The idea is simple, but the execution is impeded by some unfortunate slowness in different parts of mercurial. The idea is that you pass a list of glob patterns and we'll do something to make accessing files that match those patterns ideally faster than if you didn't give us the prefetch hint. In theory we could run `hg prefetch -I PATTERN` for this, but prefetch takes several minutes materializing and walking the whole manifest to find matches, checking outgoing revs and various other overheads. There is a revision flag that can be specified to try to reduce this effort, but it still takes more than a minute. This diff: * Removes a `Future::get()` call in the GlobNode code * Makes `globFiles` use Futures directly rather than `Future::get()` * Adds a `prefetchFiles` parameter to `globFiles` * Adds `eden prefetch` to the CLI and makes it call `globFiles` with `prefetchFiles=true` * Adds the abillity to glob over `Tree` as well as the existing `TreeInode`. This means that we can avoid allocating inodes for portions of the tree that have not yet been loaded. When `prefetchFiles` is set we'll ask ObjectStore to load the blob for matching files. I'm not currently doing this in the `TreeInode` case on the assumption that we already did this earlier when its `TreeInode::prefetch` method was called. The glob executor joins the blob prefetches at each GlobNode level. It may be possible to observe higher throughput if we join the complete set at the end. Reviewed By: chadaustin Differential Revision: D7825423 fbshipit-source-id: d2ae03d0f62f00090537198095661475056e968d
2018-05-25 23:47:46 +03:00
4: bool prefetchFiles,
// if true, don't populate matchingFiles in the Glob
// results. This only really makes sense with prefetchFiles.
5: bool suppressFileList,
}
struct Glob {
/**
* This list cannot contain duplicate values and is not guaranteed to be
* sorted.
*/
1: list<string> matchingFiles,
}
service EdenService extends fb303.FacebookService {
list<MountInfo> listMounts() throws (1: EdenError ex)
void mount(1: MountInfo info) throws (1: EdenError ex)
void unmount(1: string mountPoint) throws (1: EdenError ex)
/**
Change how the UNTRACKED_ADDED conflict and merges are handled. Summary: Previously, we used the Mercurial code `g` when faced with an `UNTRACKED_ADDED` file conflict, but that was allowing merges to silently succeed that should not have. This revision changes our logic to use the code `m` for merge, which unearthed that we were not honoring the user's `update.check` setting properly. Because we use `update.check=noconflict` internally at Facebook, we changed the Eden integration tests to default to verifying Hg running with this setting. To support it properly, we had to port this code from `update.py` in Mercurial to our own `_determine_actions_for_conflicts()` function: ``` if updatecheck == 'noconflict': for f, (m, args, msg) in actionbyfile.iteritems(): if m not in ('g', 'k', 'e', 'r', 'pr'): msg = _("conflicting changes") hint = _("commit or update --clean to discard changes") raise error.Abort(msg, hint=hint) ``` However, this introduced an interesting issue where the `checkOutRevision()` Thrift call from Hg would update the `SNAPSHOT` file on the server, but `.hg/dirstate` would not get updated with the new parents until the update completed on the client. With the new call to `raise error.Abort` on the client, we could get in a state where the `SNAPSHOT` file had the hash of the commit assuming the update succeeded, but `.hg/dirstate` reflected the reality where it failed. To that end, we changed `checkOutRevision()` to take a new parameter, `checkoutMode`, which can take on one of three values: `NORMAL`, `DRY_RUN`, and `FORCE`. Now if the user tries to do an ordinary `hg update` with `update.check=noconflict`, we first do a `DRY_RUN` and examine the potential conflicts. Only if the conflicts should not block the update do we proceed with a call to `checkOutRevision()` in `NORMAL` mode. To make this work, we had to make a number of changes to `CheckoutAction`, `CheckoutContext`, `EdenMount`, and `TreeInode` to keep track of the `checkoutMode` and ensure that no changes are made to the working copy when a `DRY_RUN` is in effect. One minor issue (for which there is a `TODO`) is that a `DRY_RUN` will not report any `DIRECTORY_NOT_EMPTY` conflicts that may exist. As `TreeInode` is implemented today, it is a bit messy to report this type of conflict without modifying the working copy along the way. Finally, any `UNTRACKED_ADDED` conflict should cause an update to abort to match the behavior in stock Mercurial if the user has the following config setting: ``` [commands] update.check = noconflict ``` Though the original name for this setting was: ``` [experimental] updatecheck = noconflict ``` Although I am on Mercurial 4.4.1, the `update.check` setting does not seem to take effect when I run the integration tests, but the `updatecheck` setting does, so for now, I set both in `hg_extension_test_base.py` with a `TODO` to remove `updatecheck` once I can get `update.check` to do its job. Reviewed By: simpkins Differential Revision: D6366007 fbshipit-source-id: bb3ecb1270e77d59d7d9e7baa36ada61971bbc49
2017-11-30 08:38:12 +03:00
* Potentially check out the specified snapshot, reporting conflicts (and
* possibly errors), as appropriate.
*
Change how the UNTRACKED_ADDED conflict and merges are handled. Summary: Previously, we used the Mercurial code `g` when faced with an `UNTRACKED_ADDED` file conflict, but that was allowing merges to silently succeed that should not have. This revision changes our logic to use the code `m` for merge, which unearthed that we were not honoring the user's `update.check` setting properly. Because we use `update.check=noconflict` internally at Facebook, we changed the Eden integration tests to default to verifying Hg running with this setting. To support it properly, we had to port this code from `update.py` in Mercurial to our own `_determine_actions_for_conflicts()` function: ``` if updatecheck == 'noconflict': for f, (m, args, msg) in actionbyfile.iteritems(): if m not in ('g', 'k', 'e', 'r', 'pr'): msg = _("conflicting changes") hint = _("commit or update --clean to discard changes") raise error.Abort(msg, hint=hint) ``` However, this introduced an interesting issue where the `checkOutRevision()` Thrift call from Hg would update the `SNAPSHOT` file on the server, but `.hg/dirstate` would not get updated with the new parents until the update completed on the client. With the new call to `raise error.Abort` on the client, we could get in a state where the `SNAPSHOT` file had the hash of the commit assuming the update succeeded, but `.hg/dirstate` reflected the reality where it failed. To that end, we changed `checkOutRevision()` to take a new parameter, `checkoutMode`, which can take on one of three values: `NORMAL`, `DRY_RUN`, and `FORCE`. Now if the user tries to do an ordinary `hg update` with `update.check=noconflict`, we first do a `DRY_RUN` and examine the potential conflicts. Only if the conflicts should not block the update do we proceed with a call to `checkOutRevision()` in `NORMAL` mode. To make this work, we had to make a number of changes to `CheckoutAction`, `CheckoutContext`, `EdenMount`, and `TreeInode` to keep track of the `checkoutMode` and ensure that no changes are made to the working copy when a `DRY_RUN` is in effect. One minor issue (for which there is a `TODO`) is that a `DRY_RUN` will not report any `DIRECTORY_NOT_EMPTY` conflicts that may exist. As `TreeInode` is implemented today, it is a bit messy to report this type of conflict without modifying the working copy along the way. Finally, any `UNTRACKED_ADDED` conflict should cause an update to abort to match the behavior in stock Mercurial if the user has the following config setting: ``` [commands] update.check = noconflict ``` Though the original name for this setting was: ``` [experimental] updatecheck = noconflict ``` Although I am on Mercurial 4.4.1, the `update.check` setting does not seem to take effect when I run the integration tests, but the `updatecheck` setting does, so for now, I set both in `hg_extension_test_base.py` with a `TODO` to remove `updatecheck` once I can get `update.check` to do its job. Reviewed By: simpkins Differential Revision: D6366007 fbshipit-source-id: bb3ecb1270e77d59d7d9e7baa36ada61971bbc49
2017-11-30 08:38:12 +03:00
* If the checkoutMode is FORCE, the working directory will be forcibly
* updated to the contents of the new snapshot, even if there were conflicts.
* Conflicts will still be reported in the return value, but the files will be
* updated to their new state.
*
Change how the UNTRACKED_ADDED conflict and merges are handled. Summary: Previously, we used the Mercurial code `g` when faced with an `UNTRACKED_ADDED` file conflict, but that was allowing merges to silently succeed that should not have. This revision changes our logic to use the code `m` for merge, which unearthed that we were not honoring the user's `update.check` setting properly. Because we use `update.check=noconflict` internally at Facebook, we changed the Eden integration tests to default to verifying Hg running with this setting. To support it properly, we had to port this code from `update.py` in Mercurial to our own `_determine_actions_for_conflicts()` function: ``` if updatecheck == 'noconflict': for f, (m, args, msg) in actionbyfile.iteritems(): if m not in ('g', 'k', 'e', 'r', 'pr'): msg = _("conflicting changes") hint = _("commit or update --clean to discard changes") raise error.Abort(msg, hint=hint) ``` However, this introduced an interesting issue where the `checkOutRevision()` Thrift call from Hg would update the `SNAPSHOT` file on the server, but `.hg/dirstate` would not get updated with the new parents until the update completed on the client. With the new call to `raise error.Abort` on the client, we could get in a state where the `SNAPSHOT` file had the hash of the commit assuming the update succeeded, but `.hg/dirstate` reflected the reality where it failed. To that end, we changed `checkOutRevision()` to take a new parameter, `checkoutMode`, which can take on one of three values: `NORMAL`, `DRY_RUN`, and `FORCE`. Now if the user tries to do an ordinary `hg update` with `update.check=noconflict`, we first do a `DRY_RUN` and examine the potential conflicts. Only if the conflicts should not block the update do we proceed with a call to `checkOutRevision()` in `NORMAL` mode. To make this work, we had to make a number of changes to `CheckoutAction`, `CheckoutContext`, `EdenMount`, and `TreeInode` to keep track of the `checkoutMode` and ensure that no changes are made to the working copy when a `DRY_RUN` is in effect. One minor issue (for which there is a `TODO`) is that a `DRY_RUN` will not report any `DIRECTORY_NOT_EMPTY` conflicts that may exist. As `TreeInode` is implemented today, it is a bit messy to report this type of conflict without modifying the working copy along the way. Finally, any `UNTRACKED_ADDED` conflict should cause an update to abort to match the behavior in stock Mercurial if the user has the following config setting: ``` [commands] update.check = noconflict ``` Though the original name for this setting was: ``` [experimental] updatecheck = noconflict ``` Although I am on Mercurial 4.4.1, the `update.check` setting does not seem to take effect when I run the integration tests, but the `updatecheck` setting does, so for now, I set both in `hg_extension_test_base.py` with a `TODO` to remove `updatecheck` once I can get `update.check` to do its job. Reviewed By: simpkins Differential Revision: D6366007 fbshipit-source-id: bb3ecb1270e77d59d7d9e7baa36ada61971bbc49
2017-11-30 08:38:12 +03:00
* If the checkoutMode is NORMAL, files with conflicts will be left
* unmodified. Files that are untracked in both the source and destination
* snapshots are always left unchanged, even if force is true.
*
Change how the UNTRACKED_ADDED conflict and merges are handled. Summary: Previously, we used the Mercurial code `g` when faced with an `UNTRACKED_ADDED` file conflict, but that was allowing merges to silently succeed that should not have. This revision changes our logic to use the code `m` for merge, which unearthed that we were not honoring the user's `update.check` setting properly. Because we use `update.check=noconflict` internally at Facebook, we changed the Eden integration tests to default to verifying Hg running with this setting. To support it properly, we had to port this code from `update.py` in Mercurial to our own `_determine_actions_for_conflicts()` function: ``` if updatecheck == 'noconflict': for f, (m, args, msg) in actionbyfile.iteritems(): if m not in ('g', 'k', 'e', 'r', 'pr'): msg = _("conflicting changes") hint = _("commit or update --clean to discard changes") raise error.Abort(msg, hint=hint) ``` However, this introduced an interesting issue where the `checkOutRevision()` Thrift call from Hg would update the `SNAPSHOT` file on the server, but `.hg/dirstate` would not get updated with the new parents until the update completed on the client. With the new call to `raise error.Abort` on the client, we could get in a state where the `SNAPSHOT` file had the hash of the commit assuming the update succeeded, but `.hg/dirstate` reflected the reality where it failed. To that end, we changed `checkOutRevision()` to take a new parameter, `checkoutMode`, which can take on one of three values: `NORMAL`, `DRY_RUN`, and `FORCE`. Now if the user tries to do an ordinary `hg update` with `update.check=noconflict`, we first do a `DRY_RUN` and examine the potential conflicts. Only if the conflicts should not block the update do we proceed with a call to `checkOutRevision()` in `NORMAL` mode. To make this work, we had to make a number of changes to `CheckoutAction`, `CheckoutContext`, `EdenMount`, and `TreeInode` to keep track of the `checkoutMode` and ensure that no changes are made to the working copy when a `DRY_RUN` is in effect. One minor issue (for which there is a `TODO`) is that a `DRY_RUN` will not report any `DIRECTORY_NOT_EMPTY` conflicts that may exist. As `TreeInode` is implemented today, it is a bit messy to report this type of conflict without modifying the working copy along the way. Finally, any `UNTRACKED_ADDED` conflict should cause an update to abort to match the behavior in stock Mercurial if the user has the following config setting: ``` [commands] update.check = noconflict ``` Though the original name for this setting was: ``` [experimental] updatecheck = noconflict ``` Although I am on Mercurial 4.4.1, the `update.check` setting does not seem to take effect when I run the integration tests, but the `updatecheck` setting does, so for now, I set both in `hg_extension_test_base.py` with a `TODO` to remove `updatecheck` once I can get `update.check` to do its job. Reviewed By: simpkins Differential Revision: D6366007 fbshipit-source-id: bb3ecb1270e77d59d7d9e7baa36ada61971bbc49
2017-11-30 08:38:12 +03:00
* If the checkoutMode is DRY_RUN, then no files are modified in the working
* copy and the current snapshot does not change. However, potential conflicts
* are still reported in the return value.
*
Change how the UNTRACKED_ADDED conflict and merges are handled. Summary: Previously, we used the Mercurial code `g` when faced with an `UNTRACKED_ADDED` file conflict, but that was allowing merges to silently succeed that should not have. This revision changes our logic to use the code `m` for merge, which unearthed that we were not honoring the user's `update.check` setting properly. Because we use `update.check=noconflict` internally at Facebook, we changed the Eden integration tests to default to verifying Hg running with this setting. To support it properly, we had to port this code from `update.py` in Mercurial to our own `_determine_actions_for_conflicts()` function: ``` if updatecheck == 'noconflict': for f, (m, args, msg) in actionbyfile.iteritems(): if m not in ('g', 'k', 'e', 'r', 'pr'): msg = _("conflicting changes") hint = _("commit or update --clean to discard changes") raise error.Abort(msg, hint=hint) ``` However, this introduced an interesting issue where the `checkOutRevision()` Thrift call from Hg would update the `SNAPSHOT` file on the server, but `.hg/dirstate` would not get updated with the new parents until the update completed on the client. With the new call to `raise error.Abort` on the client, we could get in a state where the `SNAPSHOT` file had the hash of the commit assuming the update succeeded, but `.hg/dirstate` reflected the reality where it failed. To that end, we changed `checkOutRevision()` to take a new parameter, `checkoutMode`, which can take on one of three values: `NORMAL`, `DRY_RUN`, and `FORCE`. Now if the user tries to do an ordinary `hg update` with `update.check=noconflict`, we first do a `DRY_RUN` and examine the potential conflicts. Only if the conflicts should not block the update do we proceed with a call to `checkOutRevision()` in `NORMAL` mode. To make this work, we had to make a number of changes to `CheckoutAction`, `CheckoutContext`, `EdenMount`, and `TreeInode` to keep track of the `checkoutMode` and ensure that no changes are made to the working copy when a `DRY_RUN` is in effect. One minor issue (for which there is a `TODO`) is that a `DRY_RUN` will not report any `DIRECTORY_NOT_EMPTY` conflicts that may exist. As `TreeInode` is implemented today, it is a bit messy to report this type of conflict without modifying the working copy along the way. Finally, any `UNTRACKED_ADDED` conflict should cause an update to abort to match the behavior in stock Mercurial if the user has the following config setting: ``` [commands] update.check = noconflict ``` Though the original name for this setting was: ``` [experimental] updatecheck = noconflict ``` Although I am on Mercurial 4.4.1, the `update.check` setting does not seem to take effect when I run the integration tests, but the `updatecheck` setting does, so for now, I set both in `hg_extension_test_base.py` with a `TODO` to remove `updatecheck` once I can get `update.check` to do its job. Reviewed By: simpkins Differential Revision: D6366007 fbshipit-source-id: bb3ecb1270e77d59d7d9e7baa36ada61971bbc49
2017-11-30 08:38:12 +03:00
* On successful return from this function (unless it is a DRY_RUN), the mount
* point will point to the new snapshot, even if some paths had conflicts or
* errors. The caller is responsible for taking appropriate action to update
* these paths as desired after checkOutRevision() returns.
*/
list<CheckoutConflict> checkOutRevision(
1: string mountPoint,
2: BinaryHash snapshotHash,
Change how the UNTRACKED_ADDED conflict and merges are handled. Summary: Previously, we used the Mercurial code `g` when faced with an `UNTRACKED_ADDED` file conflict, but that was allowing merges to silently succeed that should not have. This revision changes our logic to use the code `m` for merge, which unearthed that we were not honoring the user's `update.check` setting properly. Because we use `update.check=noconflict` internally at Facebook, we changed the Eden integration tests to default to verifying Hg running with this setting. To support it properly, we had to port this code from `update.py` in Mercurial to our own `_determine_actions_for_conflicts()` function: ``` if updatecheck == 'noconflict': for f, (m, args, msg) in actionbyfile.iteritems(): if m not in ('g', 'k', 'e', 'r', 'pr'): msg = _("conflicting changes") hint = _("commit or update --clean to discard changes") raise error.Abort(msg, hint=hint) ``` However, this introduced an interesting issue where the `checkOutRevision()` Thrift call from Hg would update the `SNAPSHOT` file on the server, but `.hg/dirstate` would not get updated with the new parents until the update completed on the client. With the new call to `raise error.Abort` on the client, we could get in a state where the `SNAPSHOT` file had the hash of the commit assuming the update succeeded, but `.hg/dirstate` reflected the reality where it failed. To that end, we changed `checkOutRevision()` to take a new parameter, `checkoutMode`, which can take on one of three values: `NORMAL`, `DRY_RUN`, and `FORCE`. Now if the user tries to do an ordinary `hg update` with `update.check=noconflict`, we first do a `DRY_RUN` and examine the potential conflicts. Only if the conflicts should not block the update do we proceed with a call to `checkOutRevision()` in `NORMAL` mode. To make this work, we had to make a number of changes to `CheckoutAction`, `CheckoutContext`, `EdenMount`, and `TreeInode` to keep track of the `checkoutMode` and ensure that no changes are made to the working copy when a `DRY_RUN` is in effect. One minor issue (for which there is a `TODO`) is that a `DRY_RUN` will not report any `DIRECTORY_NOT_EMPTY` conflicts that may exist. As `TreeInode` is implemented today, it is a bit messy to report this type of conflict without modifying the working copy along the way. Finally, any `UNTRACKED_ADDED` conflict should cause an update to abort to match the behavior in stock Mercurial if the user has the following config setting: ``` [commands] update.check = noconflict ``` Though the original name for this setting was: ``` [experimental] updatecheck = noconflict ``` Although I am on Mercurial 4.4.1, the `update.check` setting does not seem to take effect when I run the integration tests, but the `updatecheck` setting does, so for now, I set both in `hg_extension_test_base.py` with a `TODO` to remove `updatecheck` once I can get `update.check` to do its job. Reviewed By: simpkins Differential Revision: D6366007 fbshipit-source-id: bb3ecb1270e77d59d7d9e7baa36ada61971bbc49
2017-11-30 08:38:12 +03:00
3: CheckoutMode checkoutMode)
throws (1: EdenError ex)
/**
* Reset the working directory's parent commits, without changing the working
* directory contents.
*
* This operation is equivalent to `git reset --soft` or `hg reset --keep`
*/
void resetParentCommits(
1: string mountPoint,
2: WorkingDirectoryParents parents)
throws (1: EdenError ex)
/**
* For each path, returns an EdenError instead of the SHA-1 if any of the
* following occur:
* - path is the empty string.
* - path identifies a non-existent file.
* - path identifies something that is not an ordinary file (e.g., symlink
* or directory).
*/
list<SHA1Result> getSHA1(1: string mountPoint, 2: list<string> paths)
throws (1: EdenError ex)
/**
* Returns a list of paths relative to the mountPoint.
*/
list<string> getBindMounts(1: string mountPoint)
throws (1: EdenError ex)
additional query API for our thrift interface Summary: This diff adds a couple more things to our thrift interface: 1. Introduces JournalPosition 2. Adds methods to query the current JournalPosition and obtain a delta since a given JournalPosition 3. Augments getMaterializedFiles to also return the current JournalPosition 4. Adds a method to evaluate a `glob` against Eden 5. Adds a method using thrift streaming to subscribe to realtime changes Could probably finesse the naming a little bit. The JournalPosition allows reasoning about changes to files that are not part of an Eden snapshot. Internally the journal position is just the SequenceNumber from the journal datastructures, but when we expose it to clients we need to be able to distinguish between a sequence number from the current instance of the eden service and a prior incarnation (eg: if the process has been restarted, and we have no way to recreate the journal we need to be able to indicate this to the client if they ask about changes in that range). For the convenience of the client we also include the `toHash` (the most recent hash from the journal entry) which is likely useful for the `hg` dirstate operations; it is useful to know that the snapshot may have changed since the last query about the dirstate. The `getFileInformation` method returns the instantaneously available `stat()` like information about the requested list of files. Since we simply don't have historical data on how files in the overlay looked (only how they look now), this method does not allow passing in a JournalPosition. When it comes to comparing historical data, we will need to add an API that accepts two snapshot hashes and generates the results from there. This particular method is geared up to understanding the current state of the world; the obvious use case is plugging in the file list from `getFilesChangedSince` into this function to figure out what's what. * Do we want a function that combines `getFilesChangedSince` + `getFileInformation` into a single RPC? Why is there a glob method? It's to support a use-case in the watchman/buck integration. I'm just sketching it out in the thrift interface at this stage. In the future we also need to be able to express how to carry out a tree walk, but that will require some query predicates that I don't want to get hung up on specifying immediately. Why is the streaming stuff in its own thrift file? We can't generate code for it in java or perhaps also python. It's only needed to plumb data into watchman so it's broken out into its own definition. Nothing depends on that file yet, so it's probably not specified quite right. The important thing is how the subscribe method looks: it's essentially the same as the method to query a delta, but it keeps emitting deltas as they are produced. This is another API that will benefit from query predicates when we get around to specifying them. I've added `JournalDelta::fromHash` and `JournalDelta::toHash` to hold the appropriate snapshot ids in the journal entry; this will allow us to indicate when we've checked out a new snapshot, or created a new snapshot. We have no way to populate these yet; I commented on D3762646 about storing the `snapshotID` that we have during `EdenServiceHandler::mountImpl` into either the `EdenMount` or the proposed `RootInode` class. Once we have that we can simply sample it and store it as we generate `JournalDelta`s. Reviewed By: simpkins Differential Revision: D3860804 fbshipit-source-id: 896c24c354e6f58328fb45c24b16915d9e937108
2016-09-19 22:48:12 +03:00
/** Returns the sequence position at the time the method is called.
* Returns the instantaneous value of the journal sequence number.
*/
JournalPosition getCurrentJournalPosition(1: string mountPoint)
throws (1: EdenError ex)
additional query API for our thrift interface Summary: This diff adds a couple more things to our thrift interface: 1. Introduces JournalPosition 2. Adds methods to query the current JournalPosition and obtain a delta since a given JournalPosition 3. Augments getMaterializedFiles to also return the current JournalPosition 4. Adds a method to evaluate a `glob` against Eden 5. Adds a method using thrift streaming to subscribe to realtime changes Could probably finesse the naming a little bit. The JournalPosition allows reasoning about changes to files that are not part of an Eden snapshot. Internally the journal position is just the SequenceNumber from the journal datastructures, but when we expose it to clients we need to be able to distinguish between a sequence number from the current instance of the eden service and a prior incarnation (eg: if the process has been restarted, and we have no way to recreate the journal we need to be able to indicate this to the client if they ask about changes in that range). For the convenience of the client we also include the `toHash` (the most recent hash from the journal entry) which is likely useful for the `hg` dirstate operations; it is useful to know that the snapshot may have changed since the last query about the dirstate. The `getFileInformation` method returns the instantaneously available `stat()` like information about the requested list of files. Since we simply don't have historical data on how files in the overlay looked (only how they look now), this method does not allow passing in a JournalPosition. When it comes to comparing historical data, we will need to add an API that accepts two snapshot hashes and generates the results from there. This particular method is geared up to understanding the current state of the world; the obvious use case is plugging in the file list from `getFilesChangedSince` into this function to figure out what's what. * Do we want a function that combines `getFilesChangedSince` + `getFileInformation` into a single RPC? Why is there a glob method? It's to support a use-case in the watchman/buck integration. I'm just sketching it out in the thrift interface at this stage. In the future we also need to be able to express how to carry out a tree walk, but that will require some query predicates that I don't want to get hung up on specifying immediately. Why is the streaming stuff in its own thrift file? We can't generate code for it in java or perhaps also python. It's only needed to plumb data into watchman so it's broken out into its own definition. Nothing depends on that file yet, so it's probably not specified quite right. The important thing is how the subscribe method looks: it's essentially the same as the method to query a delta, but it keeps emitting deltas as they are produced. This is another API that will benefit from query predicates when we get around to specifying them. I've added `JournalDelta::fromHash` and `JournalDelta::toHash` to hold the appropriate snapshot ids in the journal entry; this will allow us to indicate when we've checked out a new snapshot, or created a new snapshot. We have no way to populate these yet; I commented on D3762646 about storing the `snapshotID` that we have during `EdenServiceHandler::mountImpl` into either the `EdenMount` or the proposed `RootInode` class. Once we have that we can simply sample it and store it as we generate `JournalDelta`s. Reviewed By: simpkins Differential Revision: D3860804 fbshipit-source-id: 896c24c354e6f58328fb45c24b16915d9e937108
2016-09-19 22:48:12 +03:00
/** Returns the set of files (and dirs) that changed since a prior point.
* If fromPosition.mountGeneration is mismatched with the current
* mountGeneration, throws an EdenError with errorCode = ERANGE.
additional query API for our thrift interface Summary: This diff adds a couple more things to our thrift interface: 1. Introduces JournalPosition 2. Adds methods to query the current JournalPosition and obtain a delta since a given JournalPosition 3. Augments getMaterializedFiles to also return the current JournalPosition 4. Adds a method to evaluate a `glob` against Eden 5. Adds a method using thrift streaming to subscribe to realtime changes Could probably finesse the naming a little bit. The JournalPosition allows reasoning about changes to files that are not part of an Eden snapshot. Internally the journal position is just the SequenceNumber from the journal datastructures, but when we expose it to clients we need to be able to distinguish between a sequence number from the current instance of the eden service and a prior incarnation (eg: if the process has been restarted, and we have no way to recreate the journal we need to be able to indicate this to the client if they ask about changes in that range). For the convenience of the client we also include the `toHash` (the most recent hash from the journal entry) which is likely useful for the `hg` dirstate operations; it is useful to know that the snapshot may have changed since the last query about the dirstate. The `getFileInformation` method returns the instantaneously available `stat()` like information about the requested list of files. Since we simply don't have historical data on how files in the overlay looked (only how they look now), this method does not allow passing in a JournalPosition. When it comes to comparing historical data, we will need to add an API that accepts two snapshot hashes and generates the results from there. This particular method is geared up to understanding the current state of the world; the obvious use case is plugging in the file list from `getFilesChangedSince` into this function to figure out what's what. * Do we want a function that combines `getFilesChangedSince` + `getFileInformation` into a single RPC? Why is there a glob method? It's to support a use-case in the watchman/buck integration. I'm just sketching it out in the thrift interface at this stage. In the future we also need to be able to express how to carry out a tree walk, but that will require some query predicates that I don't want to get hung up on specifying immediately. Why is the streaming stuff in its own thrift file? We can't generate code for it in java or perhaps also python. It's only needed to plumb data into watchman so it's broken out into its own definition. Nothing depends on that file yet, so it's probably not specified quite right. The important thing is how the subscribe method looks: it's essentially the same as the method to query a delta, but it keeps emitting deltas as they are produced. This is another API that will benefit from query predicates when we get around to specifying them. I've added `JournalDelta::fromHash` and `JournalDelta::toHash` to hold the appropriate snapshot ids in the journal entry; this will allow us to indicate when we've checked out a new snapshot, or created a new snapshot. We have no way to populate these yet; I commented on D3762646 about storing the `snapshotID` that we have during `EdenServiceHandler::mountImpl` into either the `EdenMount` or the proposed `RootInode` class. Once we have that we can simply sample it and store it as we generate `JournalDelta`s. Reviewed By: simpkins Differential Revision: D3860804 fbshipit-source-id: 896c24c354e6f58328fb45c24b16915d9e937108
2016-09-19 22:48:12 +03:00
* This indicates that eden cannot compute the delta for the requested
* range. The client will need to recompute a new baseline using
* other available functions in EdenService.
*/
FileDelta getFilesChangedSince(
1: string mountPoint,
2: JournalPosition fromPosition)
throws (1: EdenError ex)
additional query API for our thrift interface Summary: This diff adds a couple more things to our thrift interface: 1. Introduces JournalPosition 2. Adds methods to query the current JournalPosition and obtain a delta since a given JournalPosition 3. Augments getMaterializedFiles to also return the current JournalPosition 4. Adds a method to evaluate a `glob` against Eden 5. Adds a method using thrift streaming to subscribe to realtime changes Could probably finesse the naming a little bit. The JournalPosition allows reasoning about changes to files that are not part of an Eden snapshot. Internally the journal position is just the SequenceNumber from the journal datastructures, but when we expose it to clients we need to be able to distinguish between a sequence number from the current instance of the eden service and a prior incarnation (eg: if the process has been restarted, and we have no way to recreate the journal we need to be able to indicate this to the client if they ask about changes in that range). For the convenience of the client we also include the `toHash` (the most recent hash from the journal entry) which is likely useful for the `hg` dirstate operations; it is useful to know that the snapshot may have changed since the last query about the dirstate. The `getFileInformation` method returns the instantaneously available `stat()` like information about the requested list of files. Since we simply don't have historical data on how files in the overlay looked (only how they look now), this method does not allow passing in a JournalPosition. When it comes to comparing historical data, we will need to add an API that accepts two snapshot hashes and generates the results from there. This particular method is geared up to understanding the current state of the world; the obvious use case is plugging in the file list from `getFilesChangedSince` into this function to figure out what's what. * Do we want a function that combines `getFilesChangedSince` + `getFileInformation` into a single RPC? Why is there a glob method? It's to support a use-case in the watchman/buck integration. I'm just sketching it out in the thrift interface at this stage. In the future we also need to be able to express how to carry out a tree walk, but that will require some query predicates that I don't want to get hung up on specifying immediately. Why is the streaming stuff in its own thrift file? We can't generate code for it in java or perhaps also python. It's only needed to plumb data into watchman so it's broken out into its own definition. Nothing depends on that file yet, so it's probably not specified quite right. The important thing is how the subscribe method looks: it's essentially the same as the method to query a delta, but it keeps emitting deltas as they are produced. This is another API that will benefit from query predicates when we get around to specifying them. I've added `JournalDelta::fromHash` and `JournalDelta::toHash` to hold the appropriate snapshot ids in the journal entry; this will allow us to indicate when we've checked out a new snapshot, or created a new snapshot. We have no way to populate these yet; I commented on D3762646 about storing the `snapshotID` that we have during `EdenServiceHandler::mountImpl` into either the `EdenMount` or the proposed `RootInode` class. Once we have that we can simply sample it and store it as we generate `JournalDelta`s. Reviewed By: simpkins Differential Revision: D3860804 fbshipit-source-id: 896c24c354e6f58328fb45c24b16915d9e937108
2016-09-19 22:48:12 +03:00
/**
* Returns the journal entries for the specified params. Useful for auditing
* the changes that Eden has sent to Watchman. Note that the most recent
* journal entries will be at the front of the list in
* DebugGetRawJournalResponse.
*/
DebugGetRawJournalResponse debugGetRawJournal(
1: DebugGetRawJournalParams params,
) throws (1: EdenError ex)
additional query API for our thrift interface Summary: This diff adds a couple more things to our thrift interface: 1. Introduces JournalPosition 2. Adds methods to query the current JournalPosition and obtain a delta since a given JournalPosition 3. Augments getMaterializedFiles to also return the current JournalPosition 4. Adds a method to evaluate a `glob` against Eden 5. Adds a method using thrift streaming to subscribe to realtime changes Could probably finesse the naming a little bit. The JournalPosition allows reasoning about changes to files that are not part of an Eden snapshot. Internally the journal position is just the SequenceNumber from the journal datastructures, but when we expose it to clients we need to be able to distinguish between a sequence number from the current instance of the eden service and a prior incarnation (eg: if the process has been restarted, and we have no way to recreate the journal we need to be able to indicate this to the client if they ask about changes in that range). For the convenience of the client we also include the `toHash` (the most recent hash from the journal entry) which is likely useful for the `hg` dirstate operations; it is useful to know that the snapshot may have changed since the last query about the dirstate. The `getFileInformation` method returns the instantaneously available `stat()` like information about the requested list of files. Since we simply don't have historical data on how files in the overlay looked (only how they look now), this method does not allow passing in a JournalPosition. When it comes to comparing historical data, we will need to add an API that accepts two snapshot hashes and generates the results from there. This particular method is geared up to understanding the current state of the world; the obvious use case is plugging in the file list from `getFilesChangedSince` into this function to figure out what's what. * Do we want a function that combines `getFilesChangedSince` + `getFileInformation` into a single RPC? Why is there a glob method? It's to support a use-case in the watchman/buck integration. I'm just sketching it out in the thrift interface at this stage. In the future we also need to be able to express how to carry out a tree walk, but that will require some query predicates that I don't want to get hung up on specifying immediately. Why is the streaming stuff in its own thrift file? We can't generate code for it in java or perhaps also python. It's only needed to plumb data into watchman so it's broken out into its own definition. Nothing depends on that file yet, so it's probably not specified quite right. The important thing is how the subscribe method looks: it's essentially the same as the method to query a delta, but it keeps emitting deltas as they are produced. This is another API that will benefit from query predicates when we get around to specifying them. I've added `JournalDelta::fromHash` and `JournalDelta::toHash` to hold the appropriate snapshot ids in the journal entry; this will allow us to indicate when we've checked out a new snapshot, or created a new snapshot. We have no way to populate these yet; I commented on D3762646 about storing the `snapshotID` that we have during `EdenServiceHandler::mountImpl` into either the `EdenMount` or the proposed `RootInode` class. Once we have that we can simply sample it and store it as we generate `JournalDelta`s. Reviewed By: simpkins Differential Revision: D3860804 fbshipit-source-id: 896c24c354e6f58328fb45c24b16915d9e937108
2016-09-19 22:48:12 +03:00
/** Returns a subset of the stat() information for a list of paths.
* The returned list of information corresponds to the input list of
* paths; eg; result[0] holds the information for paths[0].
* We only support returning the instantaneous information about
* these paths, as we cannot answer with historical information about
* files in the overlay.
*/
list<FileInformationOrError> getFileInformation(
1: string mountPoint,
2: list<string> paths)
throws (1: EdenError ex)
additional query API for our thrift interface Summary: This diff adds a couple more things to our thrift interface: 1. Introduces JournalPosition 2. Adds methods to query the current JournalPosition and obtain a delta since a given JournalPosition 3. Augments getMaterializedFiles to also return the current JournalPosition 4. Adds a method to evaluate a `glob` against Eden 5. Adds a method using thrift streaming to subscribe to realtime changes Could probably finesse the naming a little bit. The JournalPosition allows reasoning about changes to files that are not part of an Eden snapshot. Internally the journal position is just the SequenceNumber from the journal datastructures, but when we expose it to clients we need to be able to distinguish between a sequence number from the current instance of the eden service and a prior incarnation (eg: if the process has been restarted, and we have no way to recreate the journal we need to be able to indicate this to the client if they ask about changes in that range). For the convenience of the client we also include the `toHash` (the most recent hash from the journal entry) which is likely useful for the `hg` dirstate operations; it is useful to know that the snapshot may have changed since the last query about the dirstate. The `getFileInformation` method returns the instantaneously available `stat()` like information about the requested list of files. Since we simply don't have historical data on how files in the overlay looked (only how they look now), this method does not allow passing in a JournalPosition. When it comes to comparing historical data, we will need to add an API that accepts two snapshot hashes and generates the results from there. This particular method is geared up to understanding the current state of the world; the obvious use case is plugging in the file list from `getFilesChangedSince` into this function to figure out what's what. * Do we want a function that combines `getFilesChangedSince` + `getFileInformation` into a single RPC? Why is there a glob method? It's to support a use-case in the watchman/buck integration. I'm just sketching it out in the thrift interface at this stage. In the future we also need to be able to express how to carry out a tree walk, but that will require some query predicates that I don't want to get hung up on specifying immediately. Why is the streaming stuff in its own thrift file? We can't generate code for it in java or perhaps also python. It's only needed to plumb data into watchman so it's broken out into its own definition. Nothing depends on that file yet, so it's probably not specified quite right. The important thing is how the subscribe method looks: it's essentially the same as the method to query a delta, but it keeps emitting deltas as they are produced. This is another API that will benefit from query predicates when we get around to specifying them. I've added `JournalDelta::fromHash` and `JournalDelta::toHash` to hold the appropriate snapshot ids in the journal entry; this will allow us to indicate when we've checked out a new snapshot, or created a new snapshot. We have no way to populate these yet; I commented on D3762646 about storing the `snapshotID` that we have during `EdenServiceHandler::mountImpl` into either the `EdenMount` or the proposed `RootInode` class. Once we have that we can simply sample it and store it as we generate `JournalDelta`s. Reviewed By: simpkins Differential Revision: D3860804 fbshipit-source-id: 896c24c354e6f58328fb45c24b16915d9e937108
2016-09-19 22:48:12 +03:00
/**
* DEPRECATED: Prefer globFiles().
* Returns a list of files that match the input globs.
additional query API for our thrift interface Summary: This diff adds a couple more things to our thrift interface: 1. Introduces JournalPosition 2. Adds methods to query the current JournalPosition and obtain a delta since a given JournalPosition 3. Augments getMaterializedFiles to also return the current JournalPosition 4. Adds a method to evaluate a `glob` against Eden 5. Adds a method using thrift streaming to subscribe to realtime changes Could probably finesse the naming a little bit. The JournalPosition allows reasoning about changes to files that are not part of an Eden snapshot. Internally the journal position is just the SequenceNumber from the journal datastructures, but when we expose it to clients we need to be able to distinguish between a sequence number from the current instance of the eden service and a prior incarnation (eg: if the process has been restarted, and we have no way to recreate the journal we need to be able to indicate this to the client if they ask about changes in that range). For the convenience of the client we also include the `toHash` (the most recent hash from the journal entry) which is likely useful for the `hg` dirstate operations; it is useful to know that the snapshot may have changed since the last query about the dirstate. The `getFileInformation` method returns the instantaneously available `stat()` like information about the requested list of files. Since we simply don't have historical data on how files in the overlay looked (only how they look now), this method does not allow passing in a JournalPosition. When it comes to comparing historical data, we will need to add an API that accepts two snapshot hashes and generates the results from there. This particular method is geared up to understanding the current state of the world; the obvious use case is plugging in the file list from `getFilesChangedSince` into this function to figure out what's what. * Do we want a function that combines `getFilesChangedSince` + `getFileInformation` into a single RPC? Why is there a glob method? It's to support a use-case in the watchman/buck integration. I'm just sketching it out in the thrift interface at this stage. In the future we also need to be able to express how to carry out a tree walk, but that will require some query predicates that I don't want to get hung up on specifying immediately. Why is the streaming stuff in its own thrift file? We can't generate code for it in java or perhaps also python. It's only needed to plumb data into watchman so it's broken out into its own definition. Nothing depends on that file yet, so it's probably not specified quite right. The important thing is how the subscribe method looks: it's essentially the same as the method to query a delta, but it keeps emitting deltas as they are produced. This is another API that will benefit from query predicates when we get around to specifying them. I've added `JournalDelta::fromHash` and `JournalDelta::toHash` to hold the appropriate snapshot ids in the journal entry; this will allow us to indicate when we've checked out a new snapshot, or created a new snapshot. We have no way to populate these yet; I commented on D3762646 about storing the `snapshotID` that we have during `EdenServiceHandler::mountImpl` into either the `EdenMount` or the proposed `RootInode` class. Once we have that we can simply sample it and store it as we generate `JournalDelta`s. Reviewed By: simpkins Differential Revision: D3860804 fbshipit-source-id: 896c24c354e6f58328fb45c24b16915d9e937108
2016-09-19 22:48:12 +03:00
* There are no duplicate values in the result.
*/
list<string> glob(
1: string mountPoint,
implement glob thrift method Summary: This is to facilitate the watchman integration and draws on the watchman glob implementation; the approach is to split the glob strings into path components and evaluate the components step by step as the tree is walked. Components that do not include any glob special characters can be handled as a direct lookup from the directory contents (O(1) rather than O(num-entries)). The glob method returns a set of filenames that match a list of of glob patterns. Recursive globs are supported. It is worth noting that a glob like "**/*" will return a list of every entry in the filesystem. This is potentially expensive and should be avoided. simpkins is in favor of disallowing this as a forcing function to encourage tool-makers to adopt patterns that don't rely on a complete listing of the filesystem. For now I'd like to get this in without such a restriction; it's also worth noting that running `find .` in the root of the mount point has a similar effect and we can't prevent that from happening, so the effect of the overly broad glob is something that we need to be able to withstand in any case. Unrestricted recursive globs will make it easier to connect certain watchman queries in the interim, until we have a more expressive thrift API for walking and filtering the list of files. Note: I've removed the wildmatch flags that I'd put in the API when I stubbed it out originally. Since this is built on top of our GlobMatcher code and that doesn't have those flags, I thought it would be simplest to just remove them. If we find that we need them, we can figure out how to add them later. Also Note: the evaluation of the glob is parallel-ready but currently limited to 1 at a time by constraining the folly::window call to 1. We could make this larger but would need a more intelligent constraint. For example, a recursive glob could initiate N concurrent futures per level where N is the number of sub-dirs at a given level. Using a custom Executor for these futures may be a better option to set an upper bound on the number of concurrent jobs allowed for a given glob call. Depends on D4361197 Reviewed By: simpkins Differential Revision: D4371934 fbshipit-source-id: 444735600bc16d2c2185f2277ddc5b51f672600a
2017-01-26 23:45:50 +03:00
2: list<string> globs)
throws (1: EdenError ex)
/**
* Returns a list of files that match the GlobParams, notably,
* the list of glob patterns.
* There are no duplicate values in the result.
*/
Glob globFiles(
1: GlobParams params,
) throws (1: EdenError ex)
/**
* Get the status of the working directory against the specified commit.
*
Store Hg dirstate data in Hg instead of Eden. Summary: This is a major change to how we manage the dirstate in Eden's Hg extension. Previously, the dirstate information was stored under `$EDEN_CONFIG_DIR`, which is Eden's private storage. Any time the Mercurial extension wanted to read or write the dirstate, it had to make a Thrift request to Eden to do so on its behalf. The upside is that Eden could answer dirstate-related questions independently of the Python code. This was sufficiently different than how Mercurial's default dirstate worked that our subclass, `eden_dirstate`, had to override quite a bit of behavior. Failing to manage the `.hg/dirstate` file in a way similar to the way Mercurial does has exposed some "unofficial contracts" that Mercurial has. For example, tools like Nuclide rely on changes to the `.hg/dirstate` file as a heuristic to determine when to invalidate its internal caches for Mercurial data. Today, Mercurial has a well-factored `dirstatemap` abstraction that is primarily responsible for the transactions with the dirstate's data. With this split, we can focus on putting most of our customizations in our `eden_dirstate_map` subclass while our `eden_dirstate` class has to override fewer methods. Because the data is managed through the `.hg/dirstate` file, transaction logic in Mercurial that relies on renaming/copying that file will work out-of-the-box. This change also reduces the number of Thrift calls the Mercurial extension has to make for operations like `hg status` or `hg add`. In this revision, we introduce our own binary format for the `.hg/dirstate` file. The logic to read and write this file is in `eden/py/dirstate.py`. After the first 40 bytes, which are used for the parent hashes, the next four bytes are reserved for a version number for the file format so we can manage file format changes going forward. Admittedly one downside of this change is that it is a breaking change. Ideally, users should commit all of their local changes in their existing mounts, shutdown Eden, delete the old mounts, restart Eden, and re-clone. In the end, this change deletes a number of Mercurial-specific code and Thrift APIs from Eden. This is a better separation of concerns that makes Eden more SCM-agnostic. For example, this change removes `Dirstate.cpp` and `DirstatePersistance.cpp`, replacing them with the much simpler and more general `Differ.cpp`. The Mercurial-specific logic from `Dirstate.cpp` that turned a diff into an `hg status` now lives in the Mercurial extension in `EdenThriftClient.getStatus()`, which is much more appropriate. Note that this reverts the changes that were recently introduced in D6116105: we now need to intercept `localrepo.localrepository.dirstate` once again. Reviewed By: simpkins Differential Revision: D6179950 fbshipit-source-id: 5b78904909b669c9cc606e2fe1fd118ef6eaab95
2017-11-07 06:44:24 +03:00
* This may exclude special files according to the rules of the underlying
* SCM system, such as the .git folder in Git and the .hg folder in Mercurial.
*/
Store Hg dirstate data in Hg instead of Eden. Summary: This is a major change to how we manage the dirstate in Eden's Hg extension. Previously, the dirstate information was stored under `$EDEN_CONFIG_DIR`, which is Eden's private storage. Any time the Mercurial extension wanted to read or write the dirstate, it had to make a Thrift request to Eden to do so on its behalf. The upside is that Eden could answer dirstate-related questions independently of the Python code. This was sufficiently different than how Mercurial's default dirstate worked that our subclass, `eden_dirstate`, had to override quite a bit of behavior. Failing to manage the `.hg/dirstate` file in a way similar to the way Mercurial does has exposed some "unofficial contracts" that Mercurial has. For example, tools like Nuclide rely on changes to the `.hg/dirstate` file as a heuristic to determine when to invalidate its internal caches for Mercurial data. Today, Mercurial has a well-factored `dirstatemap` abstraction that is primarily responsible for the transactions with the dirstate's data. With this split, we can focus on putting most of our customizations in our `eden_dirstate_map` subclass while our `eden_dirstate` class has to override fewer methods. Because the data is managed through the `.hg/dirstate` file, transaction logic in Mercurial that relies on renaming/copying that file will work out-of-the-box. This change also reduces the number of Thrift calls the Mercurial extension has to make for operations like `hg status` or `hg add`. In this revision, we introduce our own binary format for the `.hg/dirstate` file. The logic to read and write this file is in `eden/py/dirstate.py`. After the first 40 bytes, which are used for the parent hashes, the next four bytes are reserved for a version number for the file format so we can manage file format changes going forward. Admittedly one downside of this change is that it is a breaking change. Ideally, users should commit all of their local changes in their existing mounts, shutdown Eden, delete the old mounts, restart Eden, and re-clone. In the end, this change deletes a number of Mercurial-specific code and Thrift APIs from Eden. This is a better separation of concerns that makes Eden more SCM-agnostic. For example, this change removes `Dirstate.cpp` and `DirstatePersistance.cpp`, replacing them with the much simpler and more general `Differ.cpp`. The Mercurial-specific logic from `Dirstate.cpp` that turned a diff into an `hg status` now lives in the Mercurial extension in `EdenThriftClient.getStatus()`, which is much more appropriate. Note that this reverts the changes that were recently introduced in D6116105: we now need to intercept `localrepo.localrepository.dirstate` once again. Reviewed By: simpkins Differential Revision: D6179950 fbshipit-source-id: 5b78904909b669c9cc606e2fe1fd118ef6eaab95
2017-11-07 06:44:24 +03:00
ScmStatus getScmStatus(
1: string mountPoint,
Store Hg dirstate data in Hg instead of Eden. Summary: This is a major change to how we manage the dirstate in Eden's Hg extension. Previously, the dirstate information was stored under `$EDEN_CONFIG_DIR`, which is Eden's private storage. Any time the Mercurial extension wanted to read or write the dirstate, it had to make a Thrift request to Eden to do so on its behalf. The upside is that Eden could answer dirstate-related questions independently of the Python code. This was sufficiently different than how Mercurial's default dirstate worked that our subclass, `eden_dirstate`, had to override quite a bit of behavior. Failing to manage the `.hg/dirstate` file in a way similar to the way Mercurial does has exposed some "unofficial contracts" that Mercurial has. For example, tools like Nuclide rely on changes to the `.hg/dirstate` file as a heuristic to determine when to invalidate its internal caches for Mercurial data. Today, Mercurial has a well-factored `dirstatemap` abstraction that is primarily responsible for the transactions with the dirstate's data. With this split, we can focus on putting most of our customizations in our `eden_dirstate_map` subclass while our `eden_dirstate` class has to override fewer methods. Because the data is managed through the `.hg/dirstate` file, transaction logic in Mercurial that relies on renaming/copying that file will work out-of-the-box. This change also reduces the number of Thrift calls the Mercurial extension has to make for operations like `hg status` or `hg add`. In this revision, we introduce our own binary format for the `.hg/dirstate` file. The logic to read and write this file is in `eden/py/dirstate.py`. After the first 40 bytes, which are used for the parent hashes, the next four bytes are reserved for a version number for the file format so we can manage file format changes going forward. Admittedly one downside of this change is that it is a breaking change. Ideally, users should commit all of their local changes in their existing mounts, shutdown Eden, delete the old mounts, restart Eden, and re-clone. In the end, this change deletes a number of Mercurial-specific code and Thrift APIs from Eden. This is a better separation of concerns that makes Eden more SCM-agnostic. For example, this change removes `Dirstate.cpp` and `DirstatePersistance.cpp`, replacing them with the much simpler and more general `Differ.cpp`. The Mercurial-specific logic from `Dirstate.cpp` that turned a diff into an `hg status` now lives in the Mercurial extension in `EdenThriftClient.getStatus()`, which is much more appropriate. Note that this reverts the changes that were recently introduced in D6116105: we now need to intercept `localrepo.localrepository.dirstate` once again. Reviewed By: simpkins Differential Revision: D6179950 fbshipit-source-id: 5b78904909b669c9cc606e2fe1fd118ef6eaab95
2017-11-07 06:44:24 +03:00
2: bool listIgnored,
3: BinaryHash commit,
) throws (1: EdenError ex)
/**
* Computes the status between two specified revisions.
* This does not care about the state of the working copy.
*/
ScmStatus getScmStatusBetweenRevisions(
1: string mountPoint,
2: BinaryHash oldHash,
3: BinaryHash newHash,
) throws (1: EdenError ex)
Store Hg dirstate data in Hg instead of Eden. Summary: This is a major change to how we manage the dirstate in Eden's Hg extension. Previously, the dirstate information was stored under `$EDEN_CONFIG_DIR`, which is Eden's private storage. Any time the Mercurial extension wanted to read or write the dirstate, it had to make a Thrift request to Eden to do so on its behalf. The upside is that Eden could answer dirstate-related questions independently of the Python code. This was sufficiently different than how Mercurial's default dirstate worked that our subclass, `eden_dirstate`, had to override quite a bit of behavior. Failing to manage the `.hg/dirstate` file in a way similar to the way Mercurial does has exposed some "unofficial contracts" that Mercurial has. For example, tools like Nuclide rely on changes to the `.hg/dirstate` file as a heuristic to determine when to invalidate its internal caches for Mercurial data. Today, Mercurial has a well-factored `dirstatemap` abstraction that is primarily responsible for the transactions with the dirstate's data. With this split, we can focus on putting most of our customizations in our `eden_dirstate_map` subclass while our `eden_dirstate` class has to override fewer methods. Because the data is managed through the `.hg/dirstate` file, transaction logic in Mercurial that relies on renaming/copying that file will work out-of-the-box. This change also reduces the number of Thrift calls the Mercurial extension has to make for operations like `hg status` or `hg add`. In this revision, we introduce our own binary format for the `.hg/dirstate` file. The logic to read and write this file is in `eden/py/dirstate.py`. After the first 40 bytes, which are used for the parent hashes, the next four bytes are reserved for a version number for the file format so we can manage file format changes going forward. Admittedly one downside of this change is that it is a breaking change. Ideally, users should commit all of their local changes in their existing mounts, shutdown Eden, delete the old mounts, restart Eden, and re-clone. In the end, this change deletes a number of Mercurial-specific code and Thrift APIs from Eden. This is a better separation of concerns that makes Eden more SCM-agnostic. For example, this change removes `Dirstate.cpp` and `DirstatePersistance.cpp`, replacing them with the much simpler and more general `Differ.cpp`. The Mercurial-specific logic from `Dirstate.cpp` that turned a diff into an `hg status` now lives in the Mercurial extension in `EdenThriftClient.getStatus()`, which is much more appropriate. Note that this reverts the changes that were recently introduced in D6116105: we now need to intercept `localrepo.localrepository.dirstate` once again. Reviewed By: simpkins Differential Revision: D6179950 fbshipit-source-id: 5b78904909b669c9cc606e2fe1fd118ef6eaab95
2017-11-07 06:44:24 +03:00
//////// SCM Commit-Related APIs ////////
/**
Store Hg dirstate data in Hg instead of Eden. Summary: This is a major change to how we manage the dirstate in Eden's Hg extension. Previously, the dirstate information was stored under `$EDEN_CONFIG_DIR`, which is Eden's private storage. Any time the Mercurial extension wanted to read or write the dirstate, it had to make a Thrift request to Eden to do so on its behalf. The upside is that Eden could answer dirstate-related questions independently of the Python code. This was sufficiently different than how Mercurial's default dirstate worked that our subclass, `eden_dirstate`, had to override quite a bit of behavior. Failing to manage the `.hg/dirstate` file in a way similar to the way Mercurial does has exposed some "unofficial contracts" that Mercurial has. For example, tools like Nuclide rely on changes to the `.hg/dirstate` file as a heuristic to determine when to invalidate its internal caches for Mercurial data. Today, Mercurial has a well-factored `dirstatemap` abstraction that is primarily responsible for the transactions with the dirstate's data. With this split, we can focus on putting most of our customizations in our `eden_dirstate_map` subclass while our `eden_dirstate` class has to override fewer methods. Because the data is managed through the `.hg/dirstate` file, transaction logic in Mercurial that relies on renaming/copying that file will work out-of-the-box. This change also reduces the number of Thrift calls the Mercurial extension has to make for operations like `hg status` or `hg add`. In this revision, we introduce our own binary format for the `.hg/dirstate` file. The logic to read and write this file is in `eden/py/dirstate.py`. After the first 40 bytes, which are used for the parent hashes, the next four bytes are reserved for a version number for the file format so we can manage file format changes going forward. Admittedly one downside of this change is that it is a breaking change. Ideally, users should commit all of their local changes in their existing mounts, shutdown Eden, delete the old mounts, restart Eden, and re-clone. In the end, this change deletes a number of Mercurial-specific code and Thrift APIs from Eden. This is a better separation of concerns that makes Eden more SCM-agnostic. For example, this change removes `Dirstate.cpp` and `DirstatePersistance.cpp`, replacing them with the much simpler and more general `Differ.cpp`. The Mercurial-specific logic from `Dirstate.cpp` that turned a diff into an `hg status` now lives in the Mercurial extension in `EdenThriftClient.getStatus()`, which is much more appropriate. Note that this reverts the changes that were recently introduced in D6116105: we now need to intercept `localrepo.localrepository.dirstate` once again. Reviewed By: simpkins Differential Revision: D6179950 fbshipit-source-id: 5b78904909b669c9cc606e2fe1fd118ef6eaab95
2017-11-07 06:44:24 +03:00
* If the relative path exists in the manifest (i.e., the current commit),
* then return the corresponding ManifestEntry; otherwise, throw
* NoValueForKeyError.
*
Store Hg dirstate data in Hg instead of Eden. Summary: This is a major change to how we manage the dirstate in Eden's Hg extension. Previously, the dirstate information was stored under `$EDEN_CONFIG_DIR`, which is Eden's private storage. Any time the Mercurial extension wanted to read or write the dirstate, it had to make a Thrift request to Eden to do so on its behalf. The upside is that Eden could answer dirstate-related questions independently of the Python code. This was sufficiently different than how Mercurial's default dirstate worked that our subclass, `eden_dirstate`, had to override quite a bit of behavior. Failing to manage the `.hg/dirstate` file in a way similar to the way Mercurial does has exposed some "unofficial contracts" that Mercurial has. For example, tools like Nuclide rely on changes to the `.hg/dirstate` file as a heuristic to determine when to invalidate its internal caches for Mercurial data. Today, Mercurial has a well-factored `dirstatemap` abstraction that is primarily responsible for the transactions with the dirstate's data. With this split, we can focus on putting most of our customizations in our `eden_dirstate_map` subclass while our `eden_dirstate` class has to override fewer methods. Because the data is managed through the `.hg/dirstate` file, transaction logic in Mercurial that relies on renaming/copying that file will work out-of-the-box. This change also reduces the number of Thrift calls the Mercurial extension has to make for operations like `hg status` or `hg add`. In this revision, we introduce our own binary format for the `.hg/dirstate` file. The logic to read and write this file is in `eden/py/dirstate.py`. After the first 40 bytes, which are used for the parent hashes, the next four bytes are reserved for a version number for the file format so we can manage file format changes going forward. Admittedly one downside of this change is that it is a breaking change. Ideally, users should commit all of their local changes in their existing mounts, shutdown Eden, delete the old mounts, restart Eden, and re-clone. In the end, this change deletes a number of Mercurial-specific code and Thrift APIs from Eden. This is a better separation of concerns that makes Eden more SCM-agnostic. For example, this change removes `Dirstate.cpp` and `DirstatePersistance.cpp`, replacing them with the much simpler and more general `Differ.cpp`. The Mercurial-specific logic from `Dirstate.cpp` that turned a diff into an `hg status` now lives in the Mercurial extension in `EdenThriftClient.getStatus()`, which is much more appropriate. Note that this reverts the changes that were recently introduced in D6116105: we now need to intercept `localrepo.localrepository.dirstate` once again. Reviewed By: simpkins Differential Revision: D6179950 fbshipit-source-id: 5b78904909b669c9cc606e2fe1fd118ef6eaab95
2017-11-07 06:44:24 +03:00
* Note that we are still experimenting with the type of SCM information Eden
* should be responsible for reporting, so this method is subject to change,
* or may go away entirely. At a minimum, it should take a commit as a
* parameter rather than assuming the current commit.
*/
Store Hg dirstate data in Hg instead of Eden. Summary: This is a major change to how we manage the dirstate in Eden's Hg extension. Previously, the dirstate information was stored under `$EDEN_CONFIG_DIR`, which is Eden's private storage. Any time the Mercurial extension wanted to read or write the dirstate, it had to make a Thrift request to Eden to do so on its behalf. The upside is that Eden could answer dirstate-related questions independently of the Python code. This was sufficiently different than how Mercurial's default dirstate worked that our subclass, `eden_dirstate`, had to override quite a bit of behavior. Failing to manage the `.hg/dirstate` file in a way similar to the way Mercurial does has exposed some "unofficial contracts" that Mercurial has. For example, tools like Nuclide rely on changes to the `.hg/dirstate` file as a heuristic to determine when to invalidate its internal caches for Mercurial data. Today, Mercurial has a well-factored `dirstatemap` abstraction that is primarily responsible for the transactions with the dirstate's data. With this split, we can focus on putting most of our customizations in our `eden_dirstate_map` subclass while our `eden_dirstate` class has to override fewer methods. Because the data is managed through the `.hg/dirstate` file, transaction logic in Mercurial that relies on renaming/copying that file will work out-of-the-box. This change also reduces the number of Thrift calls the Mercurial extension has to make for operations like `hg status` or `hg add`. In this revision, we introduce our own binary format for the `.hg/dirstate` file. The logic to read and write this file is in `eden/py/dirstate.py`. After the first 40 bytes, which are used for the parent hashes, the next four bytes are reserved for a version number for the file format so we can manage file format changes going forward. Admittedly one downside of this change is that it is a breaking change. Ideally, users should commit all of their local changes in their existing mounts, shutdown Eden, delete the old mounts, restart Eden, and re-clone. In the end, this change deletes a number of Mercurial-specific code and Thrift APIs from Eden. This is a better separation of concerns that makes Eden more SCM-agnostic. For example, this change removes `Dirstate.cpp` and `DirstatePersistance.cpp`, replacing them with the much simpler and more general `Differ.cpp`. The Mercurial-specific logic from `Dirstate.cpp` that turned a diff into an `hg status` now lives in the Mercurial extension in `EdenThriftClient.getStatus()`, which is much more appropriate. Note that this reverts the changes that were recently introduced in D6116105: we now need to intercept `localrepo.localrepository.dirstate` once again. Reviewed By: simpkins Differential Revision: D6179950 fbshipit-source-id: 5b78904909b669c9cc606e2fe1fd118ef6eaab95
2017-11-07 06:44:24 +03:00
ManifestEntry getManifestEntry(
1: string mountPoint
2: string relativePath
) throws (
1: EdenError ex
2: NoValueForKeyError noValueForKeyError
)
//////// Debugging APIs ////////
/**
* Get the contents of a source control Tree.
*
* This can be used to confirm if eden's LocalStore contains information
* for the tree, and that the information is correct.
*
* If localStoreOnly is true, the data is loaded directly from the
* LocalStore, and an error will be raised if it is not already present in
* the LocalStore. If localStoreOnly is false, the data may be retrieved
* from the BackingStore if it is not already present in the LocalStore.
*/
list<ScmTreeEntry> debugGetScmTree(
1: string mountPoint,
2: BinaryHash id,
3: bool localStoreOnly,
) throws (1: EdenError ex)
/**
* Get the contents of a source control Blob.
*
* This can be used to confirm if eden's LocalStore contains information
* for the blob, and that the information is correct.
*/
binary debugGetScmBlob(
1: string mountPoint,
2: BinaryHash id,
3: bool localStoreOnly,
) throws (1: EdenError ex)
/**
* Get the metadata about a source control Blob.
*
* This retrieves the metadata about a source control Blob. This returns
* the size and contents SHA1 of the blob, which eden stores separately from
* the blob itself. This can also be a useful alternative to
* debugGetScmBlob() when getting data about extremely large blobs.
*/
ScmBlobMetadata debugGetScmBlobMetadata(
1: string mountPoint,
2: BinaryHash id,
3: bool localStoreOnly,
) throws (1: EdenError ex)
/**
* Get status about currently loaded inode objects.
*
* This returns details about all currently loaded inode objects under the
* given path.
*
* If the path argument is the empty string data will be returned about all
* inodes in the entire mount point. Otherwise the path argument should
* refer to a subdirectory, and data will be returned for all inodes under
* the specified subdirectory.
*
* The rename lock is not held while gathering this information, so the path
* name information returned may not always be internally consistent. If
* renames were taking place while gathering the data, some inodes may show
* up under multiple parents. It's also possible that we may miss some
* inodes during the tree walk if they were renamed from a directory that was
* not yet walked into a directory that has already been walked.
*
* This API cannot return data about inodes that have been unlinked but still
* have outstanding references.
*/
list<TreeInodeDebugInfo> debugInodeStatus(
1: string mountPoint,
2: string path,
) throws (1: EdenError ex)
/**
* Get the list of outstanding fuse requests
*
* This will return the list of FuseCall structure containing the data from
* fuse_in_header.
*/
list<FuseCall> debugOutstandingFuseCalls(
1: string mountPoint,
)
/**
* Get the InodePathDebugInfo for the inode that corresponds to the given
* inode number. This provides the path for the inode and also indicates
* whether the inode is currently loaded or not. Requires that the Eden
* mountPoint be specified.
*/
InodePathDebugInfo debugGetInodePath(
1: string mountPoint,
2: i64 inodeNumber,
) throws (1: EdenError ex)
/**
* Sets the log level for a given category at runtime.
*/
SetLogLevelResult debugSetLogLevel(
1: string category,
2: string level,
) throws (1: EdenError ex)
/**
* Clears all data from the LocalStore that can be populated from the upstream
* backing store.
*/
void debugClearLocalStoreCaches() throws (1: EdenError ex)
/**
* Asks RocksDB to perform a compaction.
*/
void debugCompactLocalStorage() throws (1: EdenError ex)
/**
* Unloads unused Inodes from a directory inside a mountPoint whose last
* access time is older than the specified age.
*
* The age parameter is a relative time to be subtracted from the current
* (wall clock) time.
*/
i64 unloadInodeForPath(
1: string mountPoint,
2: string path,
3: TimeSpec age,
) throws (1: EdenError ex)
/**
* Flush all thread-local stats to the main ServiceData object.
*
* Thread-local counters are normally flushed to the main ServiceData once
* a second. flushStatsNow() can be used to flush thread-local counters on
* demand, in addition to the normal once-a-second flush.
*
* This is mainly useful for unit and integration tests that want to ensure
* they see up-to-date counter information without waiting for the normal
* flush interval.
*/
void flushStatsNow() throws (1: EdenError ex)
/**
* Invalidate kernel cache for inode.
*/
void invalidateKernelInodeCache(
1: string mountPoint,
2: string path
)
throws (1: EdenError ex)
/**
* Gets the number of inodes unloaded by periodic job on an EdenMount.
*/
InternalStats getStatInfo() throws (1: EdenError ex)
}