sapling/eden/fs/service/eden.thrift

1202 lines
35 KiB
Thrift
Raw Normal View History

/*
* Copyright (c) Facebook, Inc. and its affiliates.
*
* This software may be used and distributed according to the terms of the
* GNU General Public License version 2.
*/
include "eden/fs/config/eden_config.thrift"
include "fb303/thrift/fb303_core.thrift"
namespace cpp2 facebook.eden
namespace java com.facebook.eden.thrift
namespace py facebook.eden
namespace py3 eden.fs.service
additional query API for our thrift interface Summary: This diff adds a couple more things to our thrift interface: 1. Introduces JournalPosition 2. Adds methods to query the current JournalPosition and obtain a delta since a given JournalPosition 3. Augments getMaterializedFiles to also return the current JournalPosition 4. Adds a method to evaluate a `glob` against Eden 5. Adds a method using thrift streaming to subscribe to realtime changes Could probably finesse the naming a little bit. The JournalPosition allows reasoning about changes to files that are not part of an Eden snapshot. Internally the journal position is just the SequenceNumber from the journal datastructures, but when we expose it to clients we need to be able to distinguish between a sequence number from the current instance of the eden service and a prior incarnation (eg: if the process has been restarted, and we have no way to recreate the journal we need to be able to indicate this to the client if they ask about changes in that range). For the convenience of the client we also include the `toHash` (the most recent hash from the journal entry) which is likely useful for the `hg` dirstate operations; it is useful to know that the snapshot may have changed since the last query about the dirstate. The `getFileInformation` method returns the instantaneously available `stat()` like information about the requested list of files. Since we simply don't have historical data on how files in the overlay looked (only how they look now), this method does not allow passing in a JournalPosition. When it comes to comparing historical data, we will need to add an API that accepts two snapshot hashes and generates the results from there. This particular method is geared up to understanding the current state of the world; the obvious use case is plugging in the file list from `getFilesChangedSince` into this function to figure out what's what. * Do we want a function that combines `getFilesChangedSince` + `getFileInformation` into a single RPC? Why is there a glob method? It's to support a use-case in the watchman/buck integration. I'm just sketching it out in the thrift interface at this stage. In the future we also need to be able to express how to carry out a tree walk, but that will require some query predicates that I don't want to get hung up on specifying immediately. Why is the streaming stuff in its own thrift file? We can't generate code for it in java or perhaps also python. It's only needed to plumb data into watchman so it's broken out into its own definition. Nothing depends on that file yet, so it's probably not specified quite right. The important thing is how the subscribe method looks: it's essentially the same as the method to query a delta, but it keeps emitting deltas as they are produced. This is another API that will benefit from query predicates when we get around to specifying them. I've added `JournalDelta::fromHash` and `JournalDelta::toHash` to hold the appropriate snapshot ids in the journal entry; this will allow us to indicate when we've checked out a new snapshot, or created a new snapshot. We have no way to populate these yet; I commented on D3762646 about storing the `snapshotID` that we have during `EdenServiceHandler::mountImpl` into either the `EdenMount` or the proposed `RootInode` class. Once we have that we can simply sample it and store it as we generate `JournalDelta`s. Reviewed By: simpkins Differential Revision: D3860804 fbshipit-source-id: 896c24c354e6f58328fb45c24b16915d9e937108
2016-09-19 22:48:12 +03:00
/** Thrift doesn't really do unsigned numbers, but we can sort of fake it.
* This type is serialized as an integer value that is 64-bits wide and
* should round-trip with full fidelity for C++ client/server, but for
* other runtimes will have crazy results if the sign bit is ever set.
* In practice it is impossible for us to have files that large in eden,
* and sequence numbers will take an incredibly long time to ever roll
* over and cause problems.
* Once t13345978 is done, we can uncomment the cpp.type below.
*/
typedef i64 /* (cpp.type = "std::uint64_t") */ unsigned64
typedef i32 pid_t
/**
* A source control hash.
*
* This should normally be a 20-byte binary value, however the edenfs server
* will accept BinaryHash arguments as 40-byte hexadecimal strings as well.
* Data returned by the edenfs server in a BinaryHash field will always be a
* 20-byte binary value.
*/
typedef binary BinaryHash
/**
* So, you thought that a path was a string?
* Paths in posix are arbitrary byte strings with some pre-defined special
* characters. On modern systems they tend to be represented as UTF-8 but
* there is no guarantee. We use the `PathString` type as symbolic way
* to indicate that you may need to perform special processing to safely
* interpret the path data on your system.
*/
typedef binary PathString
/**
* A customizable type to be returned with an EdenError, helpful for catching
* and having custom client logic to handle specfic error cases
*/
enum EdenErrorType {
/** The errorCode property is a posix errno value */
POSIX_ERROR = 0,
/** The errorCode property is a win32 error value */
WIN32_ERROR = 1,
/** The errorCode property is a windows NT HResult error value */
HRESULT_ERROR = 2,
/**
* An argument passed to thrift was invalid. errorCode will be set to EINVAL
*/
ARGUMENT_ERROR = 3,
/** An error occurred. errorCode will be not set */
GENERIC_ERROR = 4,
/** The mount generation changed. errorCode will be set to ERANGE */
MOUNT_GENERATION_CHANGED = 5,
/** The journal has been truncated. errorCode will be set to EDOM */
JOURNAL_TRUNCATED = 6,
/**
* The thrift funtion that receives this in an error is being called while
* a checkout is in progress. errorCode will not be set.
*/
CHECKOUT_IN_PROGRESS = 7,
/**
* The thrift function that receives this is an error is being called with a
* parent that is not the current parent. errorCode will not be set.
*/
OUT_OF_DATE_PARENT = 8,
}
exception EdenError {
1: required string message
2: optional i32 errorCode
3: EdenErrorType errorType
} (message = 'message')
exception NoValueForKeyError {
1: string key
}
/**
* Information about the running edenfs daemon.
*/
struct DaemonInfo {
1: i32 pid
/**
* List of command line arguments, including the executable name,
* given to the edenfs process.
*/
2: list<string> commandLine
/**
* The service status.
* This is the same data reported by fb303_core.getStatus()
*/
3: optional fb303_core.fb303_status status
/**
* The uptime of the edenfs daemon
* Same data from /proc/pid/stat
* This will not be populated in Windows build
*/
4: optional float uptime
}
/**
* The current running state of an EdenMount.
*/
enum MountState {
/**
* The EdenMount object has been constructed but has not started
* initializing.
*/
UNINITIALIZED = 0,
/**
* The mount point is currently initializing and loading necessary state
* (such as the root directory contents) before it can ask the kernel to
* mount it.
*/
INITIALIZING = 1,
/**
* The mount point has loaded its local state needed to start mounting
* but has not actually started mounting yet.
*/
INITIALIZED = 2,
/**
* Starting to mount fuse.
*/
STARTING = 3,
/**
* The EdenMount is running normally.
*/
RUNNING = 4,
/**
* Encountered an error while starting fuse mount.
*/
FUSE_ERROR = 5,
/**
* EdenMount::shutdown() has been called, but it is not complete yet.
*/
SHUTTING_DOWN = 6,
/**
* EdenMount::shutdown() has completed, but there are still outstanding
* references so EdenMount::destroy() has not been called yet.
*
* When EdenMount::destroy() is called the object can be destroyed
* immediately.
*/
SHUT_DOWN = 7,
/**
* EdenMount::destroy() has been called, but the shutdown is not complete
* yet. There are no remaining references to the EdenMount at this point,
* so when the shutdown completes it will be automatically destroyed.
*/
DESTROYING = 8,
/**
* An error occurred during mount initialization.
*
* This state is used for errors that occur during the INITIALIZING phase,
* before we have attempted to start the FUSE mount.
*/
INIT_ERROR = 9,
} (cpp2.enum_type = 'uint32_t')
struct MountInfo {
1: PathString mountPoint
2: PathString edenClientPath
3: MountState state
}
struct MountArgument {
1: PathString mountPoint
2: PathString edenClientPath
3: bool readOnly
}
union SHA1Result {
1: BinaryHash sha1
2: EdenError error
}
/**
* Effectively a `struct timespec`
*/
struct TimeSpec {
1: i64 seconds
2: i64 nanoSeconds
}
/**
* Information about filesystem entries that can be retrieved solely
* from the tree structure, without having to fetch the actual child
* objects from source control.
*/
struct EntryInformation {
1: Dtype dtype
}
union EntryInformationOrError {
1: EntryInformation info
2: EdenError error
}
/**
* Subset of stat() data returned from getFileInformation())
*/
struct FileInformation {
additional query API for our thrift interface Summary: This diff adds a couple more things to our thrift interface: 1. Introduces JournalPosition 2. Adds methods to query the current JournalPosition and obtain a delta since a given JournalPosition 3. Augments getMaterializedFiles to also return the current JournalPosition 4. Adds a method to evaluate a `glob` against Eden 5. Adds a method using thrift streaming to subscribe to realtime changes Could probably finesse the naming a little bit. The JournalPosition allows reasoning about changes to files that are not part of an Eden snapshot. Internally the journal position is just the SequenceNumber from the journal datastructures, but when we expose it to clients we need to be able to distinguish between a sequence number from the current instance of the eden service and a prior incarnation (eg: if the process has been restarted, and we have no way to recreate the journal we need to be able to indicate this to the client if they ask about changes in that range). For the convenience of the client we also include the `toHash` (the most recent hash from the journal entry) which is likely useful for the `hg` dirstate operations; it is useful to know that the snapshot may have changed since the last query about the dirstate. The `getFileInformation` method returns the instantaneously available `stat()` like information about the requested list of files. Since we simply don't have historical data on how files in the overlay looked (only how they look now), this method does not allow passing in a JournalPosition. When it comes to comparing historical data, we will need to add an API that accepts two snapshot hashes and generates the results from there. This particular method is geared up to understanding the current state of the world; the obvious use case is plugging in the file list from `getFilesChangedSince` into this function to figure out what's what. * Do we want a function that combines `getFilesChangedSince` + `getFileInformation` into a single RPC? Why is there a glob method? It's to support a use-case in the watchman/buck integration. I'm just sketching it out in the thrift interface at this stage. In the future we also need to be able to express how to carry out a tree walk, but that will require some query predicates that I don't want to get hung up on specifying immediately. Why is the streaming stuff in its own thrift file? We can't generate code for it in java or perhaps also python. It's only needed to plumb data into watchman so it's broken out into its own definition. Nothing depends on that file yet, so it's probably not specified quite right. The important thing is how the subscribe method looks: it's essentially the same as the method to query a delta, but it keeps emitting deltas as they are produced. This is another API that will benefit from query predicates when we get around to specifying them. I've added `JournalDelta::fromHash` and `JournalDelta::toHash` to hold the appropriate snapshot ids in the journal entry; this will allow us to indicate when we've checked out a new snapshot, or created a new snapshot. We have no way to populate these yet; I commented on D3762646 about storing the `snapshotID` that we have during `EdenServiceHandler::mountImpl` into either the `EdenMount` or the proposed `RootInode` class. Once we have that we can simply sample it and store it as we generate `JournalDelta`s. Reviewed By: simpkins Differential Revision: D3860804 fbshipit-source-id: 896c24c354e6f58328fb45c24b16915d9e937108
2016-09-19 22:48:12 +03:00
1: unsigned64 size // wish thrift had unsigned numbers
2: TimeSpec mtime
3: i32 mode // mode_t
}
additional query API for our thrift interface Summary: This diff adds a couple more things to our thrift interface: 1. Introduces JournalPosition 2. Adds methods to query the current JournalPosition and obtain a delta since a given JournalPosition 3. Augments getMaterializedFiles to also return the current JournalPosition 4. Adds a method to evaluate a `glob` against Eden 5. Adds a method using thrift streaming to subscribe to realtime changes Could probably finesse the naming a little bit. The JournalPosition allows reasoning about changes to files that are not part of an Eden snapshot. Internally the journal position is just the SequenceNumber from the journal datastructures, but when we expose it to clients we need to be able to distinguish between a sequence number from the current instance of the eden service and a prior incarnation (eg: if the process has been restarted, and we have no way to recreate the journal we need to be able to indicate this to the client if they ask about changes in that range). For the convenience of the client we also include the `toHash` (the most recent hash from the journal entry) which is likely useful for the `hg` dirstate operations; it is useful to know that the snapshot may have changed since the last query about the dirstate. The `getFileInformation` method returns the instantaneously available `stat()` like information about the requested list of files. Since we simply don't have historical data on how files in the overlay looked (only how they look now), this method does not allow passing in a JournalPosition. When it comes to comparing historical data, we will need to add an API that accepts two snapshot hashes and generates the results from there. This particular method is geared up to understanding the current state of the world; the obvious use case is plugging in the file list from `getFilesChangedSince` into this function to figure out what's what. * Do we want a function that combines `getFilesChangedSince` + `getFileInformation` into a single RPC? Why is there a glob method? It's to support a use-case in the watchman/buck integration. I'm just sketching it out in the thrift interface at this stage. In the future we also need to be able to express how to carry out a tree walk, but that will require some query predicates that I don't want to get hung up on specifying immediately. Why is the streaming stuff in its own thrift file? We can't generate code for it in java or perhaps also python. It's only needed to plumb data into watchman so it's broken out into its own definition. Nothing depends on that file yet, so it's probably not specified quite right. The important thing is how the subscribe method looks: it's essentially the same as the method to query a delta, but it keeps emitting deltas as they are produced. This is another API that will benefit from query predicates when we get around to specifying them. I've added `JournalDelta::fromHash` and `JournalDelta::toHash` to hold the appropriate snapshot ids in the journal entry; this will allow us to indicate when we've checked out a new snapshot, or created a new snapshot. We have no way to populate these yet; I commented on D3762646 about storing the `snapshotID` that we have during `EdenServiceHandler::mountImpl` into either the `EdenMount` or the proposed `RootInode` class. Once we have that we can simply sample it and store it as we generate `JournalDelta`s. Reviewed By: simpkins Differential Revision: D3860804 fbshipit-source-id: 896c24c354e6f58328fb45c24b16915d9e937108
2016-09-19 22:48:12 +03:00
/** Holds information about a file, or an error in retrieving that info.
* The most likely error will be ENOENT, implying that the file doesn't exist.
*/
union FileInformationOrError {
1: FileInformation info
2: EdenError error
}
/** reference a point in time in the journal.
* This can be used to reason about a point in time in a given mount point.
* The mountGeneration value is opaque to the client.
additional query API for our thrift interface Summary: This diff adds a couple more things to our thrift interface: 1. Introduces JournalPosition 2. Adds methods to query the current JournalPosition and obtain a delta since a given JournalPosition 3. Augments getMaterializedFiles to also return the current JournalPosition 4. Adds a method to evaluate a `glob` against Eden 5. Adds a method using thrift streaming to subscribe to realtime changes Could probably finesse the naming a little bit. The JournalPosition allows reasoning about changes to files that are not part of an Eden snapshot. Internally the journal position is just the SequenceNumber from the journal datastructures, but when we expose it to clients we need to be able to distinguish between a sequence number from the current instance of the eden service and a prior incarnation (eg: if the process has been restarted, and we have no way to recreate the journal we need to be able to indicate this to the client if they ask about changes in that range). For the convenience of the client we also include the `toHash` (the most recent hash from the journal entry) which is likely useful for the `hg` dirstate operations; it is useful to know that the snapshot may have changed since the last query about the dirstate. The `getFileInformation` method returns the instantaneously available `stat()` like information about the requested list of files. Since we simply don't have historical data on how files in the overlay looked (only how they look now), this method does not allow passing in a JournalPosition. When it comes to comparing historical data, we will need to add an API that accepts two snapshot hashes and generates the results from there. This particular method is geared up to understanding the current state of the world; the obvious use case is plugging in the file list from `getFilesChangedSince` into this function to figure out what's what. * Do we want a function that combines `getFilesChangedSince` + `getFileInformation` into a single RPC? Why is there a glob method? It's to support a use-case in the watchman/buck integration. I'm just sketching it out in the thrift interface at this stage. In the future we also need to be able to express how to carry out a tree walk, but that will require some query predicates that I don't want to get hung up on specifying immediately. Why is the streaming stuff in its own thrift file? We can't generate code for it in java or perhaps also python. It's only needed to plumb data into watchman so it's broken out into its own definition. Nothing depends on that file yet, so it's probably not specified quite right. The important thing is how the subscribe method looks: it's essentially the same as the method to query a delta, but it keeps emitting deltas as they are produced. This is another API that will benefit from query predicates when we get around to specifying them. I've added `JournalDelta::fromHash` and `JournalDelta::toHash` to hold the appropriate snapshot ids in the journal entry; this will allow us to indicate when we've checked out a new snapshot, or created a new snapshot. We have no way to populate these yet; I commented on D3762646 about storing the `snapshotID` that we have during `EdenServiceHandler::mountImpl` into either the `EdenMount` or the proposed `RootInode` class. Once we have that we can simply sample it and store it as we generate `JournalDelta`s. Reviewed By: simpkins Differential Revision: D3860804 fbshipit-source-id: 896c24c354e6f58328fb45c24b16915d9e937108
2016-09-19 22:48:12 +03:00
*/
struct JournalPosition {
/** An opaque but unique number within the scope of a given mount point.
* This is used to determine when sequenceNumber has been invalidated. */
1: i64 mountGeneration
additional query API for our thrift interface Summary: This diff adds a couple more things to our thrift interface: 1. Introduces JournalPosition 2. Adds methods to query the current JournalPosition and obtain a delta since a given JournalPosition 3. Augments getMaterializedFiles to also return the current JournalPosition 4. Adds a method to evaluate a `glob` against Eden 5. Adds a method using thrift streaming to subscribe to realtime changes Could probably finesse the naming a little bit. The JournalPosition allows reasoning about changes to files that are not part of an Eden snapshot. Internally the journal position is just the SequenceNumber from the journal datastructures, but when we expose it to clients we need to be able to distinguish between a sequence number from the current instance of the eden service and a prior incarnation (eg: if the process has been restarted, and we have no way to recreate the journal we need to be able to indicate this to the client if they ask about changes in that range). For the convenience of the client we also include the `toHash` (the most recent hash from the journal entry) which is likely useful for the `hg` dirstate operations; it is useful to know that the snapshot may have changed since the last query about the dirstate. The `getFileInformation` method returns the instantaneously available `stat()` like information about the requested list of files. Since we simply don't have historical data on how files in the overlay looked (only how they look now), this method does not allow passing in a JournalPosition. When it comes to comparing historical data, we will need to add an API that accepts two snapshot hashes and generates the results from there. This particular method is geared up to understanding the current state of the world; the obvious use case is plugging in the file list from `getFilesChangedSince` into this function to figure out what's what. * Do we want a function that combines `getFilesChangedSince` + `getFileInformation` into a single RPC? Why is there a glob method? It's to support a use-case in the watchman/buck integration. I'm just sketching it out in the thrift interface at this stage. In the future we also need to be able to express how to carry out a tree walk, but that will require some query predicates that I don't want to get hung up on specifying immediately. Why is the streaming stuff in its own thrift file? We can't generate code for it in java or perhaps also python. It's only needed to plumb data into watchman so it's broken out into its own definition. Nothing depends on that file yet, so it's probably not specified quite right. The important thing is how the subscribe method looks: it's essentially the same as the method to query a delta, but it keeps emitting deltas as they are produced. This is another API that will benefit from query predicates when we get around to specifying them. I've added `JournalDelta::fromHash` and `JournalDelta::toHash` to hold the appropriate snapshot ids in the journal entry; this will allow us to indicate when we've checked out a new snapshot, or created a new snapshot. We have no way to populate these yet; I commented on D3762646 about storing the `snapshotID` that we have during `EdenServiceHandler::mountImpl` into either the `EdenMount` or the proposed `RootInode` class. Once we have that we can simply sample it and store it as we generate `JournalDelta`s. Reviewed By: simpkins Differential Revision: D3860804 fbshipit-source-id: 896c24c354e6f58328fb45c24b16915d9e937108
2016-09-19 22:48:12 +03:00
/** Monotonically incrementing number
* Each journalled change causes this number to increment. */
2: unsigned64 sequenceNumber
/** Records the snapshot hash at the appropriate point in the journal */
3: BinaryHash snapshotHash
additional query API for our thrift interface Summary: This diff adds a couple more things to our thrift interface: 1. Introduces JournalPosition 2. Adds methods to query the current JournalPosition and obtain a delta since a given JournalPosition 3. Augments getMaterializedFiles to also return the current JournalPosition 4. Adds a method to evaluate a `glob` against Eden 5. Adds a method using thrift streaming to subscribe to realtime changes Could probably finesse the naming a little bit. The JournalPosition allows reasoning about changes to files that are not part of an Eden snapshot. Internally the journal position is just the SequenceNumber from the journal datastructures, but when we expose it to clients we need to be able to distinguish between a sequence number from the current instance of the eden service and a prior incarnation (eg: if the process has been restarted, and we have no way to recreate the journal we need to be able to indicate this to the client if they ask about changes in that range). For the convenience of the client we also include the `toHash` (the most recent hash from the journal entry) which is likely useful for the `hg` dirstate operations; it is useful to know that the snapshot may have changed since the last query about the dirstate. The `getFileInformation` method returns the instantaneously available `stat()` like information about the requested list of files. Since we simply don't have historical data on how files in the overlay looked (only how they look now), this method does not allow passing in a JournalPosition. When it comes to comparing historical data, we will need to add an API that accepts two snapshot hashes and generates the results from there. This particular method is geared up to understanding the current state of the world; the obvious use case is plugging in the file list from `getFilesChangedSince` into this function to figure out what's what. * Do we want a function that combines `getFilesChangedSince` + `getFileInformation` into a single RPC? Why is there a glob method? It's to support a use-case in the watchman/buck integration. I'm just sketching it out in the thrift interface at this stage. In the future we also need to be able to express how to carry out a tree walk, but that will require some query predicates that I don't want to get hung up on specifying immediately. Why is the streaming stuff in its own thrift file? We can't generate code for it in java or perhaps also python. It's only needed to plumb data into watchman so it's broken out into its own definition. Nothing depends on that file yet, so it's probably not specified quite right. The important thing is how the subscribe method looks: it's essentially the same as the method to query a delta, but it keeps emitting deltas as they are produced. This is another API that will benefit from query predicates when we get around to specifying them. I've added `JournalDelta::fromHash` and `JournalDelta::toHash` to hold the appropriate snapshot ids in the journal entry; this will allow us to indicate when we've checked out a new snapshot, or created a new snapshot. We have no way to populate these yet; I commented on D3762646 about storing the `snapshotID` that we have during `EdenServiceHandler::mountImpl` into either the `EdenMount` or the proposed `RootInode` class. Once we have that we can simply sample it and store it as we generate `JournalDelta`s. Reviewed By: simpkins Differential Revision: D3860804 fbshipit-source-id: 896c24c354e6f58328fb45c24b16915d9e937108
2016-09-19 22:48:12 +03:00
}
/** Holds information about a set of paths that changed between two points.
* fromPosition, toPosition define the time window.
* paths holds the list of paths that changed in that window.
*
* This type is quasi-deprecated. It has multiple API problems and should be
* rethought when we have a chance to make a breaking change.
additional query API for our thrift interface Summary: This diff adds a couple more things to our thrift interface: 1. Introduces JournalPosition 2. Adds methods to query the current JournalPosition and obtain a delta since a given JournalPosition 3. Augments getMaterializedFiles to also return the current JournalPosition 4. Adds a method to evaluate a `glob` against Eden 5. Adds a method using thrift streaming to subscribe to realtime changes Could probably finesse the naming a little bit. The JournalPosition allows reasoning about changes to files that are not part of an Eden snapshot. Internally the journal position is just the SequenceNumber from the journal datastructures, but when we expose it to clients we need to be able to distinguish between a sequence number from the current instance of the eden service and a prior incarnation (eg: if the process has been restarted, and we have no way to recreate the journal we need to be able to indicate this to the client if they ask about changes in that range). For the convenience of the client we also include the `toHash` (the most recent hash from the journal entry) which is likely useful for the `hg` dirstate operations; it is useful to know that the snapshot may have changed since the last query about the dirstate. The `getFileInformation` method returns the instantaneously available `stat()` like information about the requested list of files. Since we simply don't have historical data on how files in the overlay looked (only how they look now), this method does not allow passing in a JournalPosition. When it comes to comparing historical data, we will need to add an API that accepts two snapshot hashes and generates the results from there. This particular method is geared up to understanding the current state of the world; the obvious use case is plugging in the file list from `getFilesChangedSince` into this function to figure out what's what. * Do we want a function that combines `getFilesChangedSince` + `getFileInformation` into a single RPC? Why is there a glob method? It's to support a use-case in the watchman/buck integration. I'm just sketching it out in the thrift interface at this stage. In the future we also need to be able to express how to carry out a tree walk, but that will require some query predicates that I don't want to get hung up on specifying immediately. Why is the streaming stuff in its own thrift file? We can't generate code for it in java or perhaps also python. It's only needed to plumb data into watchman so it's broken out into its own definition. Nothing depends on that file yet, so it's probably not specified quite right. The important thing is how the subscribe method looks: it's essentially the same as the method to query a delta, but it keeps emitting deltas as they are produced. This is another API that will benefit from query predicates when we get around to specifying them. I've added `JournalDelta::fromHash` and `JournalDelta::toHash` to hold the appropriate snapshot ids in the journal entry; this will allow us to indicate when we've checked out a new snapshot, or created a new snapshot. We have no way to populate these yet; I commented on D3762646 about storing the `snapshotID` that we have during `EdenServiceHandler::mountImpl` into either the `EdenMount` or the proposed `RootInode` class. Once we have that we can simply sample it and store it as we generate `JournalDelta`s. Reviewed By: simpkins Differential Revision: D3860804 fbshipit-source-id: 896c24c354e6f58328fb45c24b16915d9e937108
2016-09-19 22:48:12 +03:00
*/
struct FileDelta {
/** The fromPosition passed to getFilesChangedSince */
1: JournalPosition fromPosition
/** The current position at the time that getFilesChangedSince was called */
2: JournalPosition toPosition
/** The union of changedPaths and createdPaths contains the total set of paths
* changed in the overlay between fromPosition and toPosition.
* Disjoint with createdPaths.
*/
3: list<PathString> changedPaths
/** The set of paths created between fromPosition and toPosition.
* Used by Watchman to search for cookies and to populate its 'new' field.
* Disjoint with changedPaths.
*/
4: list<PathString> createdPaths
/** Deprecated - always empty. */
5: list<PathString> removedPaths
augment JournalDelta with unclean paths on snapshot hash change Summary: We were previously generating a simple JournalDelta consisting of just the from/to snapshot hashes. This is great from a `!O(repo)` perspective when recording what changed but makes it difficult for clients downstream to reason about changes that are not tracked in source control. This diff adds a concept of `uncleanPaths` to the journal; these are paths that we think are/were different from the hashes in the journal entry. Since JournalDelta needs to be able to be merged I've opted for a simple list of the paths that have a differing status; I'm not including all of the various dirstate states for this because it is not obvious how to reconcile the state across successive snapshot change events. The `uncleanPaths` set is populated with an initial set of different paths as the first part of the checkout call (prior to changing the hash), and then is updated after the hash has changed to capture any additional differences. Care needs to be taken to avoid recursively attempting to grab the parents lock so I'm replicating just a little bit of the state management glue in the `performDiff` method. The Journal was not setting the from/to snapshot hashes when merging deltas. This manifested in the watchman integration tests; we'd see the null revision as the `from` and the `to` revision held the `from` revision(!). On the watchman side we need to ask source control to expand the list of files that changed when the from/to hashes are different; I've added code to handle this. This doesn't do anything smart in the case that the source control aware queries are in use. We'll look at that in a following diff as it isn't strictly eden specific. `watchman clock` was returning a basically empty clock unconditionally, which meant that most since queries would report everything since the start of time. This is most likely contributing to poor Buck performance, although I have not investigated the performance aspect of this. It manifested itself in the watchman integration tests. Reviewed By: simpkins Differential Revision: D5896494 fbshipit-source-id: a88be6448862781a1d8f5e15285ca07b4240593a
2017-10-17 08:22:18 +03:00
/** When fromPosition.snapshotHash != toPosition.snapshotHash this holds
Store Hg dirstate data in Hg instead of Eden. Summary: This is a major change to how we manage the dirstate in Eden's Hg extension. Previously, the dirstate information was stored under `$EDEN_CONFIG_DIR`, which is Eden's private storage. Any time the Mercurial extension wanted to read or write the dirstate, it had to make a Thrift request to Eden to do so on its behalf. The upside is that Eden could answer dirstate-related questions independently of the Python code. This was sufficiently different than how Mercurial's default dirstate worked that our subclass, `eden_dirstate`, had to override quite a bit of behavior. Failing to manage the `.hg/dirstate` file in a way similar to the way Mercurial does has exposed some "unofficial contracts" that Mercurial has. For example, tools like Nuclide rely on changes to the `.hg/dirstate` file as a heuristic to determine when to invalidate its internal caches for Mercurial data. Today, Mercurial has a well-factored `dirstatemap` abstraction that is primarily responsible for the transactions with the dirstate's data. With this split, we can focus on putting most of our customizations in our `eden_dirstate_map` subclass while our `eden_dirstate` class has to override fewer methods. Because the data is managed through the `.hg/dirstate` file, transaction logic in Mercurial that relies on renaming/copying that file will work out-of-the-box. This change also reduces the number of Thrift calls the Mercurial extension has to make for operations like `hg status` or `hg add`. In this revision, we introduce our own binary format for the `.hg/dirstate` file. The logic to read and write this file is in `eden/py/dirstate.py`. After the first 40 bytes, which are used for the parent hashes, the next four bytes are reserved for a version number for the file format so we can manage file format changes going forward. Admittedly one downside of this change is that it is a breaking change. Ideally, users should commit all of their local changes in their existing mounts, shutdown Eden, delete the old mounts, restart Eden, and re-clone. In the end, this change deletes a number of Mercurial-specific code and Thrift APIs from Eden. This is a better separation of concerns that makes Eden more SCM-agnostic. For example, this change removes `Dirstate.cpp` and `DirstatePersistance.cpp`, replacing them with the much simpler and more general `Differ.cpp`. The Mercurial-specific logic from `Dirstate.cpp` that turned a diff into an `hg status` now lives in the Mercurial extension in `EdenThriftClient.getStatus()`, which is much more appropriate. Note that this reverts the changes that were recently introduced in D6116105: we now need to intercept `localrepo.localrepository.dirstate` once again. Reviewed By: simpkins Differential Revision: D6179950 fbshipit-source-id: 5b78904909b669c9cc606e2fe1fd118ef6eaab95
2017-11-07 06:44:24 +03:00
* the union of the set of files whose ScmFileStatus differed from the
augment JournalDelta with unclean paths on snapshot hash change Summary: We were previously generating a simple JournalDelta consisting of just the from/to snapshot hashes. This is great from a `!O(repo)` perspective when recording what changed but makes it difficult for clients downstream to reason about changes that are not tracked in source control. This diff adds a concept of `uncleanPaths` to the journal; these are paths that we think are/were different from the hashes in the journal entry. Since JournalDelta needs to be able to be merged I've opted for a simple list of the paths that have a differing status; I'm not including all of the various dirstate states for this because it is not obvious how to reconcile the state across successive snapshot change events. The `uncleanPaths` set is populated with an initial set of different paths as the first part of the checkout call (prior to changing the hash), and then is updated after the hash has changed to capture any additional differences. Care needs to be taken to avoid recursively attempting to grab the parents lock so I'm replicating just a little bit of the state management glue in the `performDiff` method. The Journal was not setting the from/to snapshot hashes when merging deltas. This manifested in the watchman integration tests; we'd see the null revision as the `from` and the `to` revision held the `from` revision(!). On the watchman side we need to ask source control to expand the list of files that changed when the from/to hashes are different; I've added code to handle this. This doesn't do anything smart in the case that the source control aware queries are in use. We'll look at that in a following diff as it isn't strictly eden specific. `watchman clock` was returning a basically empty clock unconditionally, which meant that most since queries would report everything since the start of time. This is most likely contributing to poor Buck performance, although I have not investigated the performance aspect of this. It manifested itself in the watchman integration tests. Reviewed By: simpkins Differential Revision: D5896494 fbshipit-source-id: a88be6448862781a1d8f5e15285ca07b4240593a
2017-10-17 08:22:18 +03:00
* committed fromPosition hash before the hash changed, and the set of
Store Hg dirstate data in Hg instead of Eden. Summary: This is a major change to how we manage the dirstate in Eden's Hg extension. Previously, the dirstate information was stored under `$EDEN_CONFIG_DIR`, which is Eden's private storage. Any time the Mercurial extension wanted to read or write the dirstate, it had to make a Thrift request to Eden to do so on its behalf. The upside is that Eden could answer dirstate-related questions independently of the Python code. This was sufficiently different than how Mercurial's default dirstate worked that our subclass, `eden_dirstate`, had to override quite a bit of behavior. Failing to manage the `.hg/dirstate` file in a way similar to the way Mercurial does has exposed some "unofficial contracts" that Mercurial has. For example, tools like Nuclide rely on changes to the `.hg/dirstate` file as a heuristic to determine when to invalidate its internal caches for Mercurial data. Today, Mercurial has a well-factored `dirstatemap` abstraction that is primarily responsible for the transactions with the dirstate's data. With this split, we can focus on putting most of our customizations in our `eden_dirstate_map` subclass while our `eden_dirstate` class has to override fewer methods. Because the data is managed through the `.hg/dirstate` file, transaction logic in Mercurial that relies on renaming/copying that file will work out-of-the-box. This change also reduces the number of Thrift calls the Mercurial extension has to make for operations like `hg status` or `hg add`. In this revision, we introduce our own binary format for the `.hg/dirstate` file. The logic to read and write this file is in `eden/py/dirstate.py`. After the first 40 bytes, which are used for the parent hashes, the next four bytes are reserved for a version number for the file format so we can manage file format changes going forward. Admittedly one downside of this change is that it is a breaking change. Ideally, users should commit all of their local changes in their existing mounts, shutdown Eden, delete the old mounts, restart Eden, and re-clone. In the end, this change deletes a number of Mercurial-specific code and Thrift APIs from Eden. This is a better separation of concerns that makes Eden more SCM-agnostic. For example, this change removes `Dirstate.cpp` and `DirstatePersistance.cpp`, replacing them with the much simpler and more general `Differ.cpp`. The Mercurial-specific logic from `Dirstate.cpp` that turned a diff into an `hg status` now lives in the Mercurial extension in `EdenThriftClient.getStatus()`, which is much more appropriate. Note that this reverts the changes that were recently introduced in D6116105: we now need to intercept `localrepo.localrepository.dirstate` once again. Reviewed By: simpkins Differential Revision: D6179950 fbshipit-source-id: 5b78904909b669c9cc606e2fe1fd118ef6eaab95
2017-11-07 06:44:24 +03:00
* files whose ScmFileStatus differed from the committed toPosition hash
augment JournalDelta with unclean paths on snapshot hash change Summary: We were previously generating a simple JournalDelta consisting of just the from/to snapshot hashes. This is great from a `!O(repo)` perspective when recording what changed but makes it difficult for clients downstream to reason about changes that are not tracked in source control. This diff adds a concept of `uncleanPaths` to the journal; these are paths that we think are/were different from the hashes in the journal entry. Since JournalDelta needs to be able to be merged I've opted for a simple list of the paths that have a differing status; I'm not including all of the various dirstate states for this because it is not obvious how to reconcile the state across successive snapshot change events. The `uncleanPaths` set is populated with an initial set of different paths as the first part of the checkout call (prior to changing the hash), and then is updated after the hash has changed to capture any additional differences. Care needs to be taken to avoid recursively attempting to grab the parents lock so I'm replicating just a little bit of the state management glue in the `performDiff` method. The Journal was not setting the from/to snapshot hashes when merging deltas. This manifested in the watchman integration tests; we'd see the null revision as the `from` and the `to` revision held the `from` revision(!). On the watchman side we need to ask source control to expand the list of files that changed when the from/to hashes are different; I've added code to handle this. This doesn't do anything smart in the case that the source control aware queries are in use. We'll look at that in a following diff as it isn't strictly eden specific. `watchman clock` was returning a basically empty clock unconditionally, which meant that most since queries would report everything since the start of time. This is most likely contributing to poor Buck performance, although I have not investigated the performance aspect of this. It manifested itself in the watchman integration tests. Reviewed By: simpkins Differential Revision: D5896494 fbshipit-source-id: a88be6448862781a1d8f5e15285ca07b4240593a
2017-10-17 08:22:18 +03:00
* after the hash was changed. This list of files represents files
* whose state may have changed as part of an update operation, but
* in ways that may not be able to be extracted solely by performing
* source control diff operations on the from/to hashes. */
6: list<PathString> uncleanPaths
additional query API for our thrift interface Summary: This diff adds a couple more things to our thrift interface: 1. Introduces JournalPosition 2. Adds methods to query the current JournalPosition and obtain a delta since a given JournalPosition 3. Augments getMaterializedFiles to also return the current JournalPosition 4. Adds a method to evaluate a `glob` against Eden 5. Adds a method using thrift streaming to subscribe to realtime changes Could probably finesse the naming a little bit. The JournalPosition allows reasoning about changes to files that are not part of an Eden snapshot. Internally the journal position is just the SequenceNumber from the journal datastructures, but when we expose it to clients we need to be able to distinguish between a sequence number from the current instance of the eden service and a prior incarnation (eg: if the process has been restarted, and we have no way to recreate the journal we need to be able to indicate this to the client if they ask about changes in that range). For the convenience of the client we also include the `toHash` (the most recent hash from the journal entry) which is likely useful for the `hg` dirstate operations; it is useful to know that the snapshot may have changed since the last query about the dirstate. The `getFileInformation` method returns the instantaneously available `stat()` like information about the requested list of files. Since we simply don't have historical data on how files in the overlay looked (only how they look now), this method does not allow passing in a JournalPosition. When it comes to comparing historical data, we will need to add an API that accepts two snapshot hashes and generates the results from there. This particular method is geared up to understanding the current state of the world; the obvious use case is plugging in the file list from `getFilesChangedSince` into this function to figure out what's what. * Do we want a function that combines `getFilesChangedSince` + `getFileInformation` into a single RPC? Why is there a glob method? It's to support a use-case in the watchman/buck integration. I'm just sketching it out in the thrift interface at this stage. In the future we also need to be able to express how to carry out a tree walk, but that will require some query predicates that I don't want to get hung up on specifying immediately. Why is the streaming stuff in its own thrift file? We can't generate code for it in java or perhaps also python. It's only needed to plumb data into watchman so it's broken out into its own definition. Nothing depends on that file yet, so it's probably not specified quite right. The important thing is how the subscribe method looks: it's essentially the same as the method to query a delta, but it keeps emitting deltas as they are produced. This is another API that will benefit from query predicates when we get around to specifying them. I've added `JournalDelta::fromHash` and `JournalDelta::toHash` to hold the appropriate snapshot ids in the journal entry; this will allow us to indicate when we've checked out a new snapshot, or created a new snapshot. We have no way to populate these yet; I commented on D3762646 about storing the `snapshotID` that we have during `EdenServiceHandler::mountImpl` into either the `EdenMount` or the proposed `RootInode` class. Once we have that we can simply sample it and store it as we generate `JournalDelta`s. Reviewed By: simpkins Differential Revision: D3860804 fbshipit-source-id: 896c24c354e6f58328fb45c24b16915d9e937108
2016-09-19 22:48:12 +03:00
}
struct DebugGetRawJournalParams {
1: PathString mountPoint
2: optional i32 limit
3: i32 fromSequenceNumber
}
struct DebugPathChangeInfo {
1: bool existedBefore
2: bool existedAfter
}
/**
* A fairly direct modeling of the underlying JournalDelta data structure.
*/
struct DebugJournalDelta {
1: JournalPosition fromPosition
2: JournalPosition toPosition
3: map<PathString, DebugPathChangeInfo> changedPaths
4: set<PathString> uncleanPaths
}
struct DebugGetRawJournalResponse {
2: list<DebugJournalDelta> allDeltas
}
Store Hg dirstate data in Hg instead of Eden. Summary: This is a major change to how we manage the dirstate in Eden's Hg extension. Previously, the dirstate information was stored under `$EDEN_CONFIG_DIR`, which is Eden's private storage. Any time the Mercurial extension wanted to read or write the dirstate, it had to make a Thrift request to Eden to do so on its behalf. The upside is that Eden could answer dirstate-related questions independently of the Python code. This was sufficiently different than how Mercurial's default dirstate worked that our subclass, `eden_dirstate`, had to override quite a bit of behavior. Failing to manage the `.hg/dirstate` file in a way similar to the way Mercurial does has exposed some "unofficial contracts" that Mercurial has. For example, tools like Nuclide rely on changes to the `.hg/dirstate` file as a heuristic to determine when to invalidate its internal caches for Mercurial data. Today, Mercurial has a well-factored `dirstatemap` abstraction that is primarily responsible for the transactions with the dirstate's data. With this split, we can focus on putting most of our customizations in our `eden_dirstate_map` subclass while our `eden_dirstate` class has to override fewer methods. Because the data is managed through the `.hg/dirstate` file, transaction logic in Mercurial that relies on renaming/copying that file will work out-of-the-box. This change also reduces the number of Thrift calls the Mercurial extension has to make for operations like `hg status` or `hg add`. In this revision, we introduce our own binary format for the `.hg/dirstate` file. The logic to read and write this file is in `eden/py/dirstate.py`. After the first 40 bytes, which are used for the parent hashes, the next four bytes are reserved for a version number for the file format so we can manage file format changes going forward. Admittedly one downside of this change is that it is a breaking change. Ideally, users should commit all of their local changes in their existing mounts, shutdown Eden, delete the old mounts, restart Eden, and re-clone. In the end, this change deletes a number of Mercurial-specific code and Thrift APIs from Eden. This is a better separation of concerns that makes Eden more SCM-agnostic. For example, this change removes `Dirstate.cpp` and `DirstatePersistance.cpp`, replacing them with the much simpler and more general `Differ.cpp`. The Mercurial-specific logic from `Dirstate.cpp` that turned a diff into an `hg status` now lives in the Mercurial extension in `EdenThriftClient.getStatus()`, which is much more appropriate. Note that this reverts the changes that were recently introduced in D6116105: we now need to intercept `localrepo.localrepository.dirstate` once again. Reviewed By: simpkins Differential Revision: D6179950 fbshipit-source-id: 5b78904909b669c9cc606e2fe1fd118ef6eaab95
2017-11-07 06:44:24 +03:00
/**
* Classifies the change of the state of a file between and old and new state
* of the repository. Most commonly, the "old state" is the parent commit while
* the "new state" is the working copy.
*/
enum ScmFileStatus {
/**
* File is present in the new state, but was not present in old state.
*/
ADDED = 0x0,
/**
* File is present in both the new and old states, but its contents or
* file permissions have changed.
*/
MODIFIED = 0x1,
Store Hg dirstate data in Hg instead of Eden. Summary: This is a major change to how we manage the dirstate in Eden's Hg extension. Previously, the dirstate information was stored under `$EDEN_CONFIG_DIR`, which is Eden's private storage. Any time the Mercurial extension wanted to read or write the dirstate, it had to make a Thrift request to Eden to do so on its behalf. The upside is that Eden could answer dirstate-related questions independently of the Python code. This was sufficiently different than how Mercurial's default dirstate worked that our subclass, `eden_dirstate`, had to override quite a bit of behavior. Failing to manage the `.hg/dirstate` file in a way similar to the way Mercurial does has exposed some "unofficial contracts" that Mercurial has. For example, tools like Nuclide rely on changes to the `.hg/dirstate` file as a heuristic to determine when to invalidate its internal caches for Mercurial data. Today, Mercurial has a well-factored `dirstatemap` abstraction that is primarily responsible for the transactions with the dirstate's data. With this split, we can focus on putting most of our customizations in our `eden_dirstate_map` subclass while our `eden_dirstate` class has to override fewer methods. Because the data is managed through the `.hg/dirstate` file, transaction logic in Mercurial that relies on renaming/copying that file will work out-of-the-box. This change also reduces the number of Thrift calls the Mercurial extension has to make for operations like `hg status` or `hg add`. In this revision, we introduce our own binary format for the `.hg/dirstate` file. The logic to read and write this file is in `eden/py/dirstate.py`. After the first 40 bytes, which are used for the parent hashes, the next four bytes are reserved for a version number for the file format so we can manage file format changes going forward. Admittedly one downside of this change is that it is a breaking change. Ideally, users should commit all of their local changes in their existing mounts, shutdown Eden, delete the old mounts, restart Eden, and re-clone. In the end, this change deletes a number of Mercurial-specific code and Thrift APIs from Eden. This is a better separation of concerns that makes Eden more SCM-agnostic. For example, this change removes `Dirstate.cpp` and `DirstatePersistance.cpp`, replacing them with the much simpler and more general `Differ.cpp`. The Mercurial-specific logic from `Dirstate.cpp` that turned a diff into an `hg status` now lives in the Mercurial extension in `EdenThriftClient.getStatus()`, which is much more appropriate. Note that this reverts the changes that were recently introduced in D6116105: we now need to intercept `localrepo.localrepository.dirstate` once again. Reviewed By: simpkins Differential Revision: D6179950 fbshipit-source-id: 5b78904909b669c9cc606e2fe1fd118ef6eaab95
2017-11-07 06:44:24 +03:00
/**
* File was present in the old state, but is not present in the new state.
*/
REMOVED = 0x2,
/**
* File is present in the new state, but it is ignored according to the rules
* of the new state.
*/
IGNORED = 0x3,
}
Store Hg dirstate data in Hg instead of Eden. Summary: This is a major change to how we manage the dirstate in Eden's Hg extension. Previously, the dirstate information was stored under `$EDEN_CONFIG_DIR`, which is Eden's private storage. Any time the Mercurial extension wanted to read or write the dirstate, it had to make a Thrift request to Eden to do so on its behalf. The upside is that Eden could answer dirstate-related questions independently of the Python code. This was sufficiently different than how Mercurial's default dirstate worked that our subclass, `eden_dirstate`, had to override quite a bit of behavior. Failing to manage the `.hg/dirstate` file in a way similar to the way Mercurial does has exposed some "unofficial contracts" that Mercurial has. For example, tools like Nuclide rely on changes to the `.hg/dirstate` file as a heuristic to determine when to invalidate its internal caches for Mercurial data. Today, Mercurial has a well-factored `dirstatemap` abstraction that is primarily responsible for the transactions with the dirstate's data. With this split, we can focus on putting most of our customizations in our `eden_dirstate_map` subclass while our `eden_dirstate` class has to override fewer methods. Because the data is managed through the `.hg/dirstate` file, transaction logic in Mercurial that relies on renaming/copying that file will work out-of-the-box. This change also reduces the number of Thrift calls the Mercurial extension has to make for operations like `hg status` or `hg add`. In this revision, we introduce our own binary format for the `.hg/dirstate` file. The logic to read and write this file is in `eden/py/dirstate.py`. After the first 40 bytes, which are used for the parent hashes, the next four bytes are reserved for a version number for the file format so we can manage file format changes going forward. Admittedly one downside of this change is that it is a breaking change. Ideally, users should commit all of their local changes in their existing mounts, shutdown Eden, delete the old mounts, restart Eden, and re-clone. In the end, this change deletes a number of Mercurial-specific code and Thrift APIs from Eden. This is a better separation of concerns that makes Eden more SCM-agnostic. For example, this change removes `Dirstate.cpp` and `DirstatePersistance.cpp`, replacing them with the much simpler and more general `Differ.cpp`. The Mercurial-specific logic from `Dirstate.cpp` that turned a diff into an `hg status` now lives in the Mercurial extension in `EdenThriftClient.getStatus()`, which is much more appropriate. Note that this reverts the changes that were recently introduced in D6116105: we now need to intercept `localrepo.localrepository.dirstate` once again. Reviewed By: simpkins Differential Revision: D6179950 fbshipit-source-id: 5b78904909b669c9cc606e2fe1fd118ef6eaab95
2017-11-07 06:44:24 +03:00
struct ScmStatus {
1: map<PathString, ScmFileStatus> entries
/**
* A map of { path -> error message }
*
* If any errors occured while computing the diff they will be reported here.
* The results listed in the entries field may not be accurate for any paths
* listed in this error field.
*
* This map will be empty if no errors occurred.
*/
2: map<PathString, string> errors
}
Change how the UNTRACKED_ADDED conflict and merges are handled. Summary: Previously, we used the Mercurial code `g` when faced with an `UNTRACKED_ADDED` file conflict, but that was allowing merges to silently succeed that should not have. This revision changes our logic to use the code `m` for merge, which unearthed that we were not honoring the user's `update.check` setting properly. Because we use `update.check=noconflict` internally at Facebook, we changed the Eden integration tests to default to verifying Hg running with this setting. To support it properly, we had to port this code from `update.py` in Mercurial to our own `_determine_actions_for_conflicts()` function: ``` if updatecheck == 'noconflict': for f, (m, args, msg) in actionbyfile.iteritems(): if m not in ('g', 'k', 'e', 'r', 'pr'): msg = _("conflicting changes") hint = _("commit or update --clean to discard changes") raise error.Abort(msg, hint=hint) ``` However, this introduced an interesting issue where the `checkOutRevision()` Thrift call from Hg would update the `SNAPSHOT` file on the server, but `.hg/dirstate` would not get updated with the new parents until the update completed on the client. With the new call to `raise error.Abort` on the client, we could get in a state where the `SNAPSHOT` file had the hash of the commit assuming the update succeeded, but `.hg/dirstate` reflected the reality where it failed. To that end, we changed `checkOutRevision()` to take a new parameter, `checkoutMode`, which can take on one of three values: `NORMAL`, `DRY_RUN`, and `FORCE`. Now if the user tries to do an ordinary `hg update` with `update.check=noconflict`, we first do a `DRY_RUN` and examine the potential conflicts. Only if the conflicts should not block the update do we proceed with a call to `checkOutRevision()` in `NORMAL` mode. To make this work, we had to make a number of changes to `CheckoutAction`, `CheckoutContext`, `EdenMount`, and `TreeInode` to keep track of the `checkoutMode` and ensure that no changes are made to the working copy when a `DRY_RUN` is in effect. One minor issue (for which there is a `TODO`) is that a `DRY_RUN` will not report any `DIRECTORY_NOT_EMPTY` conflicts that may exist. As `TreeInode` is implemented today, it is a bit messy to report this type of conflict without modifying the working copy along the way. Finally, any `UNTRACKED_ADDED` conflict should cause an update to abort to match the behavior in stock Mercurial if the user has the following config setting: ``` [commands] update.check = noconflict ``` Though the original name for this setting was: ``` [experimental] updatecheck = noconflict ``` Although I am on Mercurial 4.4.1, the `update.check` setting does not seem to take effect when I run the integration tests, but the `updatecheck` setting does, so for now, I set both in `hg_extension_test_base.py` with a `TODO` to remove `updatecheck` once I can get `update.check` to do its job. Reviewed By: simpkins Differential Revision: D6366007 fbshipit-source-id: bb3ecb1270e77d59d7d9e7baa36ada61971bbc49
2017-11-30 08:38:12 +03:00
/** Option for use with checkOutRevision(). */
enum CheckoutMode {
/**
* Perform a "normal" checkout, analogous to `hg checkout` in Mercurial. Files
* in the working copy will be changed to reflect the destination snapshot,
* though files with conflicts will not be modified.
*/
NORMAL = 0,
/**
* Do not checkout: exercise the checkout logic to discover potential
* conflicts.
*/
DRY_RUN = 1,
/**
* Perform a "forced" checkout, analogous to `hg checkout --clean` in
* Mercurial. Conflicts between the working copy and destination snapshot will
* be forcibly ignored in favor of the state of the new snapshot.
*/
FORCE = 2,
}
enum ConflictType {
/**
* We failed to update this particular path due to an error
*/
ERROR = 0,
/**
* A locally modified file was deleted in the new Tree
*/
MODIFIED_REMOVED = 1,
/**
* An untracked local file exists in the new Tree
*/
UNTRACKED_ADDED = 2,
/**
* The file was removed locally, but modified in the new Tree
*/
REMOVED_MODIFIED = 3,
/**
* The file was removed locally, and also removed in the new Tree.
*/
MISSING_REMOVED = 4,
/**
* A locally modified file was modified in the new Tree
* This may be contents modifications, or a file type change (directory to
* file or vice-versa), or permissions changes.
*/
MODIFIED_MODIFIED = 5,
/**
* A directory was supposed to be removed or replaced with a file,
* but it contains untracked files preventing us from updating it.
*/
DIRECTORY_NOT_EMPTY = 6,
}
/**
* Details about conflicts or errors that occurred during a checkout operation
*/
struct CheckoutConflict {
1: PathString path
2: ConflictType type
3: string message
}
struct ScmBlobMetadata {
1: i64 size
2: BinaryHash contentsSha1
}
struct ScmTreeEntry {
1: binary name
2: i32 mode
3: BinaryHash id
}
struct TreeInodeEntryDebugInfo {
/**
* The entry name. This is just a PathComponent, not the full path
*/
1: binary name
/**
* The inode number, or 0 if no inode number has been assigned to
* this entry
*/
2: i64 inodeNumber
/**
* The entry mode_t value
*/
3: i32 mode
/**
* True if an InodeBase object exists for this inode or not.
*/
4: bool loaded
/**
* True if an the inode is materialized in the overlay
*/
5: bool materialized
/**
* If materialized is false, hash contains the ID of the underlying source
* control Blob or Tree.
*/
6: BinaryHash hash
/**
* Size of the file in bytes, won't be set for directories
*/
7: optional i64 fileSize
}
struct WorkingDirectoryParents {
1: BinaryHash parent1
2: optional BinaryHash parent2
}
struct TreeInodeDebugInfo {
1: i64 inodeNumber
2: binary path
3: bool materialized
4: BinaryHash treeHash
5: list<TreeInodeEntryDebugInfo> entries
6: i64 refcount
}
struct InodePathDebugInfo {
1: PathString path
2: bool loaded
3: bool linked
}
struct SetLogLevelResult {
1: bool categoryCreated
}
struct JournalInfo {
1: i64 entryCount
// The estimated memory used by the journal in bytes
2: i64 memoryUsage
// The duration of the journal in seconds
3: i64 durationSeconds
}
/**
* Struct to store Information about inodes in a mount point.
*/
struct MountInodeInfo {
2: i64 unloadedInodeCount
4: i64 loadedFileCount
5: i64 loadedTreeCount
}
struct CacheStats {
1: i64 entryCount
2: i64 totalSizeInBytes
3: i64 hitCount
4: i64 missCount
5: i64 evictionCount
6: i64 dropCount
}
/**
* Struct to store fb303 counters from ServiceData.getCounters() and inode
* information of all the mount points.
*/
struct InternalStats {
1: i64 periodicUnloadCount
/**
* counters is the list of fb303 counters, key is the counter name, value is the
* counter value.
*/
2: map<string, i64> counters
/**
* mountPointInfo is a map whose key is the path of the mount point and value
* is the details like number of loaded inodes,unloaded inodes in that mount
* and number of materialized inodes in that mountpoint.
*/
3: map<PathString, MountInodeInfo> mountPointInfo
/**
* Linux-only: the contents of /proc/self/smaps, to be parsed by the caller.
*/
4: binary smaps
/**
* Linux-only: privateBytes populated from contents of /proc/self/smaps.
* Populated with current value (the fb303 counters value is an average).
*/
5: i64 privateBytes
/**
* Linux-only: vmRSS bytes is populated from contents of /proc/self/stats.
* Populated with current value (the fb303 counters value is an average).
*/
6: i64 vmRSSBytes
/**
* Statistics about the in-memory blob cache.
*/
7: CacheStats blobCacheStats
/**
* mountPointJournalInfo is a map whose key is the path of the mount point
* and whose value is information about the journal on that mount
*/
8: map<PathString, JournalInfo> mountPointJournalInfo
}
Store Hg dirstate data in Hg instead of Eden. Summary: This is a major change to how we manage the dirstate in Eden's Hg extension. Previously, the dirstate information was stored under `$EDEN_CONFIG_DIR`, which is Eden's private storage. Any time the Mercurial extension wanted to read or write the dirstate, it had to make a Thrift request to Eden to do so on its behalf. The upside is that Eden could answer dirstate-related questions independently of the Python code. This was sufficiently different than how Mercurial's default dirstate worked that our subclass, `eden_dirstate`, had to override quite a bit of behavior. Failing to manage the `.hg/dirstate` file in a way similar to the way Mercurial does has exposed some "unofficial contracts" that Mercurial has. For example, tools like Nuclide rely on changes to the `.hg/dirstate` file as a heuristic to determine when to invalidate its internal caches for Mercurial data. Today, Mercurial has a well-factored `dirstatemap` abstraction that is primarily responsible for the transactions with the dirstate's data. With this split, we can focus on putting most of our customizations in our `eden_dirstate_map` subclass while our `eden_dirstate` class has to override fewer methods. Because the data is managed through the `.hg/dirstate` file, transaction logic in Mercurial that relies on renaming/copying that file will work out-of-the-box. This change also reduces the number of Thrift calls the Mercurial extension has to make for operations like `hg status` or `hg add`. In this revision, we introduce our own binary format for the `.hg/dirstate` file. The logic to read and write this file is in `eden/py/dirstate.py`. After the first 40 bytes, which are used for the parent hashes, the next four bytes are reserved for a version number for the file format so we can manage file format changes going forward. Admittedly one downside of this change is that it is a breaking change. Ideally, users should commit all of their local changes in their existing mounts, shutdown Eden, delete the old mounts, restart Eden, and re-clone. In the end, this change deletes a number of Mercurial-specific code and Thrift APIs from Eden. This is a better separation of concerns that makes Eden more SCM-agnostic. For example, this change removes `Dirstate.cpp` and `DirstatePersistance.cpp`, replacing them with the much simpler and more general `Differ.cpp`. The Mercurial-specific logic from `Dirstate.cpp` that turned a diff into an `hg status` now lives in the Mercurial extension in `EdenThriftClient.getStatus()`, which is much more appropriate. Note that this reverts the changes that were recently introduced in D6116105: we now need to intercept `localrepo.localrepository.dirstate` once again. Reviewed By: simpkins Differential Revision: D6179950 fbshipit-source-id: 5b78904909b669c9cc606e2fe1fd118ef6eaab95
2017-11-07 06:44:24 +03:00
struct ManifestEntry {
/* mode_t */
1: i32 mode
}
struct FuseCall {
1: i32 len
2: i32 opcode
3: i64 unique
4: i64 nodeid
5: i32 uid
6: i32 gid
7: pid_t pid
}
struct GetConfigParams {
// Whether to reload the config from disk to make sure it is up-to-date
1: eden_config.ConfigReloadBehavior reload =
eden_config.ConfigReloadBehavior.AutoReload
}
/**
* A representation of the system-dependent dirent::d_type field.
* The bits and their interpretation is system dependent.
* This value is u8 on all systems that implement it. We
* use i16 to pass this through thrift, which doesn't have unsigned
* numbers
*/
typedef i16 OsDtype
/**
* These numbers match up with Linux and macOS.
* Windows doesn't have dtype_t, but a subset of these map to and from
* the GetFileType and dwFileAttributes equivalents.
*
* Dtype and OsDtype can be cast between each other on all platforms.
*/
enum Dtype {
UNKNOWN = 0
FIFO = 1 // DT_FIFO
CHAR = 2 // DT_CHR
DIR = 4 // DT_DIR
BLOCK = 6 // DT_BLK
REGULAR = 8 // DT_REG
LINK = 10 // DT_LNK
SOCKET = 12 // DT_SOCK
WHITEOUT = 14 // DT_WHT
}
/** Params for globFiles(). */
struct GlobParams {
1: PathString mountPoint,
2: list<string> globs,
3: bool includeDotfiles,
// if true, prefetch matching blobs
add `eden prefetch` command Summary: This is a first pass at a prefetcher. The idea is simple, but the execution is impeded by some unfortunate slowness in different parts of mercurial. The idea is that you pass a list of glob patterns and we'll do something to make accessing files that match those patterns ideally faster than if you didn't give us the prefetch hint. In theory we could run `hg prefetch -I PATTERN` for this, but prefetch takes several minutes materializing and walking the whole manifest to find matches, checking outgoing revs and various other overheads. There is a revision flag that can be specified to try to reduce this effort, but it still takes more than a minute. This diff: * Removes a `Future::get()` call in the GlobNode code * Makes `globFiles` use Futures directly rather than `Future::get()` * Adds a `prefetchFiles` parameter to `globFiles` * Adds `eden prefetch` to the CLI and makes it call `globFiles` with `prefetchFiles=true` * Adds the abillity to glob over `Tree` as well as the existing `TreeInode`. This means that we can avoid allocating inodes for portions of the tree that have not yet been loaded. When `prefetchFiles` is set we'll ask ObjectStore to load the blob for matching files. I'm not currently doing this in the `TreeInode` case on the assumption that we already did this earlier when its `TreeInode::prefetch` method was called. The glob executor joins the blob prefetches at each GlobNode level. It may be possible to observe higher throughput if we join the complete set at the end. Reviewed By: chadaustin Differential Revision: D7825423 fbshipit-source-id: d2ae03d0f62f00090537198095661475056e968d
2018-05-25 23:47:46 +03:00
4: bool prefetchFiles,
// if true, don't populate matchingFiles in the Glob
// results. This only really makes sense with prefetchFiles.
5: bool suppressFileList,
6: bool wantDtype,
}
struct Glob {
/**
* This list cannot contain duplicate values and is not guaranteed to be
* sorted.
*/
1: list<PathString> matchingFiles,
2: list<OsDtype> dtypes,
}
struct AccessCounts {
1: i64 fuseTotal
2: i64 fuseReads
3: i64 fuseWrites
4: i64 fuseBackingStoreImports
5: i64 fuseDurationNs
}
struct MountAccesses {
1: map<pid_t, AccessCounts> accessCountsByPid
2: map<pid_t, i64> fetchCountsByPid
}
struct GetAccessCountsResult {
1: map<pid_t, binary> cmdsByPid
2: map<PathString, MountAccesses> accessesByMount
// TODO: Count the number of thrift requests
// 3: map<pid_t, AccessCount> thriftAccesses
}
enum TracePointEvent {
// Start of a new block
START = 0;
// End of a block
STOP = 1;
}
struct TracePoint {
// Holds nanoseconds since the epoch
1: i64 timestamp,
// Opaque identifier for the entire trace - used to associate this
// tracepoint with other tracepoints across an entire request
2: i64 traceId,
// Opaque identifier for this "block" where a block is some logical
// piece of work with a well-defined start and stop point
3: i64 blockId,
// Opaque identifer for the parent block from which the current
// block was constructed - used to create causal relationships
// between blocks
4: i64 parentBlockId,
// The name of the block, only set on the tracepoint starting the
// block, must point to a statically allocated cstring
5: string name = "",
// What event this trace point represents
6: TracePointEvent event,
}
struct FaultDefinition {
1: string keyClass
2: string keyValueRegex
// If count is non-zero this fault will automatically expire after it has
// been hit count times.
3: i64 count
// If block is true the fault will block until explicitly unblocked later.
// delayMilliseconds and errorMessage will be ignored if block is true
4: bool block
5: i64 delayMilliseconds
6: optional string errorType
7: optional string errorMessage
}
struct RemoveFaultArg {
1: string keyClass
2: string keyValueRegex
}
struct UnblockFaultArg {
1: optional string keyClass
2: optional string keyValueRegex
3: optional string errorType
4: optional string errorMessage
}
add a new getScmStatusV2() API to replace getScmStatus() Summary: Add a new thrift API for computing the difference between the working directory and a given source control commit. This has the following differences from the old getScmStatus() commit: - The parameters are accepted in a GetScmStatusParams structure now. This makes it easier for the server-side C++ implementation to tell which parameters have actually been specified by the caller. This will make it easier to extend this API in the future without having to replace it with a new function call again. - The return value is a GetScmStatusResult, which includes both the ScmStatus and the EdenFS version number. This will allow code like `hg status` to get both the status results and the EdenFS version in a single call, without needing to make multiple separate thrift calls. - This new call will return an error if the caller requests the status against a commit that disagrees with EdenFS's view of the current commit. Because the individual `hg` command line processes do not perform any synchronization of their own when reading the working directory parent, they can often call EdenFS with stale parent information, or while a checkout is currently in progress. This new behavior will reject the request with an error, rather than having EdenFS perform a potentially very expensive status computation when the results probably aren't actually useful to the caller anyway. Reviewed By: chadaustin Differential Revision: D15110218 fbshipit-source-id: ebc2f74dafc090d4fd245de8e4d62e2b086500dd
2019-10-29 19:54:26 +03:00
struct GetScmStatusResult {
1: ScmStatus status
// The version of the EdenFS daemon.
// This is returned since we usually want status calls to be able to check
// the current EdenFS version and warn the user if EdenFS is running an old
// or known-bad version.
2: string version
}
struct GetScmStatusParams {
/**
* The Eden checkout to query
*/
1: PathString mountPoint
/**
* The commit ID of the current working directory parent commit.
*
* An error will be returned if this is not actually the current parent
* commit. This behavior exists to support callers that do not perform their
* own external synchronization around access to the current parent commit,
* like Mercurial.
*/
2: BinaryHash commit
/**
* Whether ignored files should be reported in the results.
*
* Some special source-control related files (e.g., inside the .hg or .git
* directory) will never be reported even when listIgnored is true.
*/
3: bool listIgnored = false
}
service EdenService extends fb303_core.BaseService {
list<MountInfo> listMounts() throws (1: EdenError ex)
void mount(1: MountArgument info) throws (1: EdenError ex)
void unmount(1: PathString mountPoint) throws (1: EdenError ex)
/**
Change how the UNTRACKED_ADDED conflict and merges are handled. Summary: Previously, we used the Mercurial code `g` when faced with an `UNTRACKED_ADDED` file conflict, but that was allowing merges to silently succeed that should not have. This revision changes our logic to use the code `m` for merge, which unearthed that we were not honoring the user's `update.check` setting properly. Because we use `update.check=noconflict` internally at Facebook, we changed the Eden integration tests to default to verifying Hg running with this setting. To support it properly, we had to port this code from `update.py` in Mercurial to our own `_determine_actions_for_conflicts()` function: ``` if updatecheck == 'noconflict': for f, (m, args, msg) in actionbyfile.iteritems(): if m not in ('g', 'k', 'e', 'r', 'pr'): msg = _("conflicting changes") hint = _("commit or update --clean to discard changes") raise error.Abort(msg, hint=hint) ``` However, this introduced an interesting issue where the `checkOutRevision()` Thrift call from Hg would update the `SNAPSHOT` file on the server, but `.hg/dirstate` would not get updated with the new parents until the update completed on the client. With the new call to `raise error.Abort` on the client, we could get in a state where the `SNAPSHOT` file had the hash of the commit assuming the update succeeded, but `.hg/dirstate` reflected the reality where it failed. To that end, we changed `checkOutRevision()` to take a new parameter, `checkoutMode`, which can take on one of three values: `NORMAL`, `DRY_RUN`, and `FORCE`. Now if the user tries to do an ordinary `hg update` with `update.check=noconflict`, we first do a `DRY_RUN` and examine the potential conflicts. Only if the conflicts should not block the update do we proceed with a call to `checkOutRevision()` in `NORMAL` mode. To make this work, we had to make a number of changes to `CheckoutAction`, `CheckoutContext`, `EdenMount`, and `TreeInode` to keep track of the `checkoutMode` and ensure that no changes are made to the working copy when a `DRY_RUN` is in effect. One minor issue (for which there is a `TODO`) is that a `DRY_RUN` will not report any `DIRECTORY_NOT_EMPTY` conflicts that may exist. As `TreeInode` is implemented today, it is a bit messy to report this type of conflict without modifying the working copy along the way. Finally, any `UNTRACKED_ADDED` conflict should cause an update to abort to match the behavior in stock Mercurial if the user has the following config setting: ``` [commands] update.check = noconflict ``` Though the original name for this setting was: ``` [experimental] updatecheck = noconflict ``` Although I am on Mercurial 4.4.1, the `update.check` setting does not seem to take effect when I run the integration tests, but the `updatecheck` setting does, so for now, I set both in `hg_extension_test_base.py` with a `TODO` to remove `updatecheck` once I can get `update.check` to do its job. Reviewed By: simpkins Differential Revision: D6366007 fbshipit-source-id: bb3ecb1270e77d59d7d9e7baa36ada61971bbc49
2017-11-30 08:38:12 +03:00
* Potentially check out the specified snapshot, reporting conflicts (and
* possibly errors), as appropriate.
*
Change how the UNTRACKED_ADDED conflict and merges are handled. Summary: Previously, we used the Mercurial code `g` when faced with an `UNTRACKED_ADDED` file conflict, but that was allowing merges to silently succeed that should not have. This revision changes our logic to use the code `m` for merge, which unearthed that we were not honoring the user's `update.check` setting properly. Because we use `update.check=noconflict` internally at Facebook, we changed the Eden integration tests to default to verifying Hg running with this setting. To support it properly, we had to port this code from `update.py` in Mercurial to our own `_determine_actions_for_conflicts()` function: ``` if updatecheck == 'noconflict': for f, (m, args, msg) in actionbyfile.iteritems(): if m not in ('g', 'k', 'e', 'r', 'pr'): msg = _("conflicting changes") hint = _("commit or update --clean to discard changes") raise error.Abort(msg, hint=hint) ``` However, this introduced an interesting issue where the `checkOutRevision()` Thrift call from Hg would update the `SNAPSHOT` file on the server, but `.hg/dirstate` would not get updated with the new parents until the update completed on the client. With the new call to `raise error.Abort` on the client, we could get in a state where the `SNAPSHOT` file had the hash of the commit assuming the update succeeded, but `.hg/dirstate` reflected the reality where it failed. To that end, we changed `checkOutRevision()` to take a new parameter, `checkoutMode`, which can take on one of three values: `NORMAL`, `DRY_RUN`, and `FORCE`. Now if the user tries to do an ordinary `hg update` with `update.check=noconflict`, we first do a `DRY_RUN` and examine the potential conflicts. Only if the conflicts should not block the update do we proceed with a call to `checkOutRevision()` in `NORMAL` mode. To make this work, we had to make a number of changes to `CheckoutAction`, `CheckoutContext`, `EdenMount`, and `TreeInode` to keep track of the `checkoutMode` and ensure that no changes are made to the working copy when a `DRY_RUN` is in effect. One minor issue (for which there is a `TODO`) is that a `DRY_RUN` will not report any `DIRECTORY_NOT_EMPTY` conflicts that may exist. As `TreeInode` is implemented today, it is a bit messy to report this type of conflict without modifying the working copy along the way. Finally, any `UNTRACKED_ADDED` conflict should cause an update to abort to match the behavior in stock Mercurial if the user has the following config setting: ``` [commands] update.check = noconflict ``` Though the original name for this setting was: ``` [experimental] updatecheck = noconflict ``` Although I am on Mercurial 4.4.1, the `update.check` setting does not seem to take effect when I run the integration tests, but the `updatecheck` setting does, so for now, I set both in `hg_extension_test_base.py` with a `TODO` to remove `updatecheck` once I can get `update.check` to do its job. Reviewed By: simpkins Differential Revision: D6366007 fbshipit-source-id: bb3ecb1270e77d59d7d9e7baa36ada61971bbc49
2017-11-30 08:38:12 +03:00
* If the checkoutMode is FORCE, the working directory will be forcibly
* updated to the contents of the new snapshot, even if there were conflicts.
* Conflicts will still be reported in the return value, but the files will be
* updated to their new state.
*
Change how the UNTRACKED_ADDED conflict and merges are handled. Summary: Previously, we used the Mercurial code `g` when faced with an `UNTRACKED_ADDED` file conflict, but that was allowing merges to silently succeed that should not have. This revision changes our logic to use the code `m` for merge, which unearthed that we were not honoring the user's `update.check` setting properly. Because we use `update.check=noconflict` internally at Facebook, we changed the Eden integration tests to default to verifying Hg running with this setting. To support it properly, we had to port this code from `update.py` in Mercurial to our own `_determine_actions_for_conflicts()` function: ``` if updatecheck == 'noconflict': for f, (m, args, msg) in actionbyfile.iteritems(): if m not in ('g', 'k', 'e', 'r', 'pr'): msg = _("conflicting changes") hint = _("commit or update --clean to discard changes") raise error.Abort(msg, hint=hint) ``` However, this introduced an interesting issue where the `checkOutRevision()` Thrift call from Hg would update the `SNAPSHOT` file on the server, but `.hg/dirstate` would not get updated with the new parents until the update completed on the client. With the new call to `raise error.Abort` on the client, we could get in a state where the `SNAPSHOT` file had the hash of the commit assuming the update succeeded, but `.hg/dirstate` reflected the reality where it failed. To that end, we changed `checkOutRevision()` to take a new parameter, `checkoutMode`, which can take on one of three values: `NORMAL`, `DRY_RUN`, and `FORCE`. Now if the user tries to do an ordinary `hg update` with `update.check=noconflict`, we first do a `DRY_RUN` and examine the potential conflicts. Only if the conflicts should not block the update do we proceed with a call to `checkOutRevision()` in `NORMAL` mode. To make this work, we had to make a number of changes to `CheckoutAction`, `CheckoutContext`, `EdenMount`, and `TreeInode` to keep track of the `checkoutMode` and ensure that no changes are made to the working copy when a `DRY_RUN` is in effect. One minor issue (for which there is a `TODO`) is that a `DRY_RUN` will not report any `DIRECTORY_NOT_EMPTY` conflicts that may exist. As `TreeInode` is implemented today, it is a bit messy to report this type of conflict without modifying the working copy along the way. Finally, any `UNTRACKED_ADDED` conflict should cause an update to abort to match the behavior in stock Mercurial if the user has the following config setting: ``` [commands] update.check = noconflict ``` Though the original name for this setting was: ``` [experimental] updatecheck = noconflict ``` Although I am on Mercurial 4.4.1, the `update.check` setting does not seem to take effect when I run the integration tests, but the `updatecheck` setting does, so for now, I set both in `hg_extension_test_base.py` with a `TODO` to remove `updatecheck` once I can get `update.check` to do its job. Reviewed By: simpkins Differential Revision: D6366007 fbshipit-source-id: bb3ecb1270e77d59d7d9e7baa36ada61971bbc49
2017-11-30 08:38:12 +03:00
* If the checkoutMode is NORMAL, files with conflicts will be left
* unmodified. Files that are untracked in both the source and destination
* snapshots are always left unchanged, even if force is true.
*
Change how the UNTRACKED_ADDED conflict and merges are handled. Summary: Previously, we used the Mercurial code `g` when faced with an `UNTRACKED_ADDED` file conflict, but that was allowing merges to silently succeed that should not have. This revision changes our logic to use the code `m` for merge, which unearthed that we were not honoring the user's `update.check` setting properly. Because we use `update.check=noconflict` internally at Facebook, we changed the Eden integration tests to default to verifying Hg running with this setting. To support it properly, we had to port this code from `update.py` in Mercurial to our own `_determine_actions_for_conflicts()` function: ``` if updatecheck == 'noconflict': for f, (m, args, msg) in actionbyfile.iteritems(): if m not in ('g', 'k', 'e', 'r', 'pr'): msg = _("conflicting changes") hint = _("commit or update --clean to discard changes") raise error.Abort(msg, hint=hint) ``` However, this introduced an interesting issue where the `checkOutRevision()` Thrift call from Hg would update the `SNAPSHOT` file on the server, but `.hg/dirstate` would not get updated with the new parents until the update completed on the client. With the new call to `raise error.Abort` on the client, we could get in a state where the `SNAPSHOT` file had the hash of the commit assuming the update succeeded, but `.hg/dirstate` reflected the reality where it failed. To that end, we changed `checkOutRevision()` to take a new parameter, `checkoutMode`, which can take on one of three values: `NORMAL`, `DRY_RUN`, and `FORCE`. Now if the user tries to do an ordinary `hg update` with `update.check=noconflict`, we first do a `DRY_RUN` and examine the potential conflicts. Only if the conflicts should not block the update do we proceed with a call to `checkOutRevision()` in `NORMAL` mode. To make this work, we had to make a number of changes to `CheckoutAction`, `CheckoutContext`, `EdenMount`, and `TreeInode` to keep track of the `checkoutMode` and ensure that no changes are made to the working copy when a `DRY_RUN` is in effect. One minor issue (for which there is a `TODO`) is that a `DRY_RUN` will not report any `DIRECTORY_NOT_EMPTY` conflicts that may exist. As `TreeInode` is implemented today, it is a bit messy to report this type of conflict without modifying the working copy along the way. Finally, any `UNTRACKED_ADDED` conflict should cause an update to abort to match the behavior in stock Mercurial if the user has the following config setting: ``` [commands] update.check = noconflict ``` Though the original name for this setting was: ``` [experimental] updatecheck = noconflict ``` Although I am on Mercurial 4.4.1, the `update.check` setting does not seem to take effect when I run the integration tests, but the `updatecheck` setting does, so for now, I set both in `hg_extension_test_base.py` with a `TODO` to remove `updatecheck` once I can get `update.check` to do its job. Reviewed By: simpkins Differential Revision: D6366007 fbshipit-source-id: bb3ecb1270e77d59d7d9e7baa36ada61971bbc49
2017-11-30 08:38:12 +03:00
* If the checkoutMode is DRY_RUN, then no files are modified in the working
* copy and the current snapshot does not change. However, potential conflicts
* are still reported in the return value.
*
Change how the UNTRACKED_ADDED conflict and merges are handled. Summary: Previously, we used the Mercurial code `g` when faced with an `UNTRACKED_ADDED` file conflict, but that was allowing merges to silently succeed that should not have. This revision changes our logic to use the code `m` for merge, which unearthed that we were not honoring the user's `update.check` setting properly. Because we use `update.check=noconflict` internally at Facebook, we changed the Eden integration tests to default to verifying Hg running with this setting. To support it properly, we had to port this code from `update.py` in Mercurial to our own `_determine_actions_for_conflicts()` function: ``` if updatecheck == 'noconflict': for f, (m, args, msg) in actionbyfile.iteritems(): if m not in ('g', 'k', 'e', 'r', 'pr'): msg = _("conflicting changes") hint = _("commit or update --clean to discard changes") raise error.Abort(msg, hint=hint) ``` However, this introduced an interesting issue where the `checkOutRevision()` Thrift call from Hg would update the `SNAPSHOT` file on the server, but `.hg/dirstate` would not get updated with the new parents until the update completed on the client. With the new call to `raise error.Abort` on the client, we could get in a state where the `SNAPSHOT` file had the hash of the commit assuming the update succeeded, but `.hg/dirstate` reflected the reality where it failed. To that end, we changed `checkOutRevision()` to take a new parameter, `checkoutMode`, which can take on one of three values: `NORMAL`, `DRY_RUN`, and `FORCE`. Now if the user tries to do an ordinary `hg update` with `update.check=noconflict`, we first do a `DRY_RUN` and examine the potential conflicts. Only if the conflicts should not block the update do we proceed with a call to `checkOutRevision()` in `NORMAL` mode. To make this work, we had to make a number of changes to `CheckoutAction`, `CheckoutContext`, `EdenMount`, and `TreeInode` to keep track of the `checkoutMode` and ensure that no changes are made to the working copy when a `DRY_RUN` is in effect. One minor issue (for which there is a `TODO`) is that a `DRY_RUN` will not report any `DIRECTORY_NOT_EMPTY` conflicts that may exist. As `TreeInode` is implemented today, it is a bit messy to report this type of conflict without modifying the working copy along the way. Finally, any `UNTRACKED_ADDED` conflict should cause an update to abort to match the behavior in stock Mercurial if the user has the following config setting: ``` [commands] update.check = noconflict ``` Though the original name for this setting was: ``` [experimental] updatecheck = noconflict ``` Although I am on Mercurial 4.4.1, the `update.check` setting does not seem to take effect when I run the integration tests, but the `updatecheck` setting does, so for now, I set both in `hg_extension_test_base.py` with a `TODO` to remove `updatecheck` once I can get `update.check` to do its job. Reviewed By: simpkins Differential Revision: D6366007 fbshipit-source-id: bb3ecb1270e77d59d7d9e7baa36ada61971bbc49
2017-11-30 08:38:12 +03:00
* On successful return from this function (unless it is a DRY_RUN), the mount
* point will point to the new snapshot, even if some paths had conflicts or
* errors. The caller is responsible for taking appropriate action to update
* these paths as desired after checkOutRevision() returns.
*/
list<CheckoutConflict> checkOutRevision(
1: PathString mountPoint,
2: BinaryHash snapshotHash,
Change how the UNTRACKED_ADDED conflict and merges are handled. Summary: Previously, we used the Mercurial code `g` when faced with an `UNTRACKED_ADDED` file conflict, but that was allowing merges to silently succeed that should not have. This revision changes our logic to use the code `m` for merge, which unearthed that we were not honoring the user's `update.check` setting properly. Because we use `update.check=noconflict` internally at Facebook, we changed the Eden integration tests to default to verifying Hg running with this setting. To support it properly, we had to port this code from `update.py` in Mercurial to our own `_determine_actions_for_conflicts()` function: ``` if updatecheck == 'noconflict': for f, (m, args, msg) in actionbyfile.iteritems(): if m not in ('g', 'k', 'e', 'r', 'pr'): msg = _("conflicting changes") hint = _("commit or update --clean to discard changes") raise error.Abort(msg, hint=hint) ``` However, this introduced an interesting issue where the `checkOutRevision()` Thrift call from Hg would update the `SNAPSHOT` file on the server, but `.hg/dirstate` would not get updated with the new parents until the update completed on the client. With the new call to `raise error.Abort` on the client, we could get in a state where the `SNAPSHOT` file had the hash of the commit assuming the update succeeded, but `.hg/dirstate` reflected the reality where it failed. To that end, we changed `checkOutRevision()` to take a new parameter, `checkoutMode`, which can take on one of three values: `NORMAL`, `DRY_RUN`, and `FORCE`. Now if the user tries to do an ordinary `hg update` with `update.check=noconflict`, we first do a `DRY_RUN` and examine the potential conflicts. Only if the conflicts should not block the update do we proceed with a call to `checkOutRevision()` in `NORMAL` mode. To make this work, we had to make a number of changes to `CheckoutAction`, `CheckoutContext`, `EdenMount`, and `TreeInode` to keep track of the `checkoutMode` and ensure that no changes are made to the working copy when a `DRY_RUN` is in effect. One minor issue (for which there is a `TODO`) is that a `DRY_RUN` will not report any `DIRECTORY_NOT_EMPTY` conflicts that may exist. As `TreeInode` is implemented today, it is a bit messy to report this type of conflict without modifying the working copy along the way. Finally, any `UNTRACKED_ADDED` conflict should cause an update to abort to match the behavior in stock Mercurial if the user has the following config setting: ``` [commands] update.check = noconflict ``` Though the original name for this setting was: ``` [experimental] updatecheck = noconflict ``` Although I am on Mercurial 4.4.1, the `update.check` setting does not seem to take effect when I run the integration tests, but the `updatecheck` setting does, so for now, I set both in `hg_extension_test_base.py` with a `TODO` to remove `updatecheck` once I can get `update.check` to do its job. Reviewed By: simpkins Differential Revision: D6366007 fbshipit-source-id: bb3ecb1270e77d59d7d9e7baa36ada61971bbc49
2017-11-30 08:38:12 +03:00
3: CheckoutMode checkoutMode)
throws (1: EdenError ex)
/**
* Reset the working directory's parent commits, without changing the working
* directory contents.
*
* This operation is equivalent to `git reset --soft` or `hg reset --keep`
*/
void resetParentCommits(
1: PathString mountPoint,
2: WorkingDirectoryParents parents)
throws (1: EdenError ex)
/**
* For each path, returns an EdenError instead of the SHA-1 if any of the
* following occur:
* - path is the empty string.
* - path identifies a non-existent file.
* - path identifies something that is not an ordinary file (e.g., symlink
* or directory).
*/
list<SHA1Result> getSHA1(1: PathString mountPoint, 2: list<PathString> paths)
throws (1: EdenError ex)
/**
* Returns a list of paths relative to the mountPoint. DEPRECATED!
*/
list<PathString> getBindMounts(1: PathString mountPoint)
throws (1: EdenError ex)
/**
* On systems that support bind mounts, establish a bind mount within the
* repo such that `mountPoint / repoPath` is redirected to `targetPath`.
* If `repoPath` is already a bind mount managed by eden, this function
* will throw an error.
* If `repoPath` is not a directory then it will be created similar to
* running `mkdir -p mountPoint/repoPath` and then the bind mount
* will be established.
* If `repoPath` exists and is not a directory, an error will be thrown.
* If the bind mount cannot be set up, an error will be thrown.
*/
void addBindMount(1: PathString mountPoint,
2: PathString repoPath,
3: PathString targetPath) throws (1: EdenError ex)
/**
* Removes the bind mount specified by `repoPath` from the set of managed
* bind mounts.
* If `repoPath` is not a bind mount managed by eden, this function
* will throw an error.
* If the bind mount cannot be removed, an error will be thrown.
*/
void removeBindMount(1: PathString mountPoint, 2: PathString repoPath)
throws (1: EdenError ex)
additional query API for our thrift interface Summary: This diff adds a couple more things to our thrift interface: 1. Introduces JournalPosition 2. Adds methods to query the current JournalPosition and obtain a delta since a given JournalPosition 3. Augments getMaterializedFiles to also return the current JournalPosition 4. Adds a method to evaluate a `glob` against Eden 5. Adds a method using thrift streaming to subscribe to realtime changes Could probably finesse the naming a little bit. The JournalPosition allows reasoning about changes to files that are not part of an Eden snapshot. Internally the journal position is just the SequenceNumber from the journal datastructures, but when we expose it to clients we need to be able to distinguish between a sequence number from the current instance of the eden service and a prior incarnation (eg: if the process has been restarted, and we have no way to recreate the journal we need to be able to indicate this to the client if they ask about changes in that range). For the convenience of the client we also include the `toHash` (the most recent hash from the journal entry) which is likely useful for the `hg` dirstate operations; it is useful to know that the snapshot may have changed since the last query about the dirstate. The `getFileInformation` method returns the instantaneously available `stat()` like information about the requested list of files. Since we simply don't have historical data on how files in the overlay looked (only how they look now), this method does not allow passing in a JournalPosition. When it comes to comparing historical data, we will need to add an API that accepts two snapshot hashes and generates the results from there. This particular method is geared up to understanding the current state of the world; the obvious use case is plugging in the file list from `getFilesChangedSince` into this function to figure out what's what. * Do we want a function that combines `getFilesChangedSince` + `getFileInformation` into a single RPC? Why is there a glob method? It's to support a use-case in the watchman/buck integration. I'm just sketching it out in the thrift interface at this stage. In the future we also need to be able to express how to carry out a tree walk, but that will require some query predicates that I don't want to get hung up on specifying immediately. Why is the streaming stuff in its own thrift file? We can't generate code for it in java or perhaps also python. It's only needed to plumb data into watchman so it's broken out into its own definition. Nothing depends on that file yet, so it's probably not specified quite right. The important thing is how the subscribe method looks: it's essentially the same as the method to query a delta, but it keeps emitting deltas as they are produced. This is another API that will benefit from query predicates when we get around to specifying them. I've added `JournalDelta::fromHash` and `JournalDelta::toHash` to hold the appropriate snapshot ids in the journal entry; this will allow us to indicate when we've checked out a new snapshot, or created a new snapshot. We have no way to populate these yet; I commented on D3762646 about storing the `snapshotID` that we have during `EdenServiceHandler::mountImpl` into either the `EdenMount` or the proposed `RootInode` class. Once we have that we can simply sample it and store it as we generate `JournalDelta`s. Reviewed By: simpkins Differential Revision: D3860804 fbshipit-source-id: 896c24c354e6f58328fb45c24b16915d9e937108
2016-09-19 22:48:12 +03:00
/** Returns the sequence position at the time the method is called.
* Returns the instantaneous value of the journal sequence number.
*/
JournalPosition getCurrentJournalPosition(1: PathString mountPoint)
throws (1: EdenError ex)
additional query API for our thrift interface Summary: This diff adds a couple more things to our thrift interface: 1. Introduces JournalPosition 2. Adds methods to query the current JournalPosition and obtain a delta since a given JournalPosition 3. Augments getMaterializedFiles to also return the current JournalPosition 4. Adds a method to evaluate a `glob` against Eden 5. Adds a method using thrift streaming to subscribe to realtime changes Could probably finesse the naming a little bit. The JournalPosition allows reasoning about changes to files that are not part of an Eden snapshot. Internally the journal position is just the SequenceNumber from the journal datastructures, but when we expose it to clients we need to be able to distinguish between a sequence number from the current instance of the eden service and a prior incarnation (eg: if the process has been restarted, and we have no way to recreate the journal we need to be able to indicate this to the client if they ask about changes in that range). For the convenience of the client we also include the `toHash` (the most recent hash from the journal entry) which is likely useful for the `hg` dirstate operations; it is useful to know that the snapshot may have changed since the last query about the dirstate. The `getFileInformation` method returns the instantaneously available `stat()` like information about the requested list of files. Since we simply don't have historical data on how files in the overlay looked (only how they look now), this method does not allow passing in a JournalPosition. When it comes to comparing historical data, we will need to add an API that accepts two snapshot hashes and generates the results from there. This particular method is geared up to understanding the current state of the world; the obvious use case is plugging in the file list from `getFilesChangedSince` into this function to figure out what's what. * Do we want a function that combines `getFilesChangedSince` + `getFileInformation` into a single RPC? Why is there a glob method? It's to support a use-case in the watchman/buck integration. I'm just sketching it out in the thrift interface at this stage. In the future we also need to be able to express how to carry out a tree walk, but that will require some query predicates that I don't want to get hung up on specifying immediately. Why is the streaming stuff in its own thrift file? We can't generate code for it in java or perhaps also python. It's only needed to plumb data into watchman so it's broken out into its own definition. Nothing depends on that file yet, so it's probably not specified quite right. The important thing is how the subscribe method looks: it's essentially the same as the method to query a delta, but it keeps emitting deltas as they are produced. This is another API that will benefit from query predicates when we get around to specifying them. I've added `JournalDelta::fromHash` and `JournalDelta::toHash` to hold the appropriate snapshot ids in the journal entry; this will allow us to indicate when we've checked out a new snapshot, or created a new snapshot. We have no way to populate these yet; I commented on D3762646 about storing the `snapshotID` that we have during `EdenServiceHandler::mountImpl` into either the `EdenMount` or the proposed `RootInode` class. Once we have that we can simply sample it and store it as we generate `JournalDelta`s. Reviewed By: simpkins Differential Revision: D3860804 fbshipit-source-id: 896c24c354e6f58328fb45c24b16915d9e937108
2016-09-19 22:48:12 +03:00
/** Returns the set of files (and dirs) that changed since a prior point.
* If fromPosition.mountGeneration is mismatched with the current
* mountGeneration, throws an EdenError with errorCode = ERANGE.
* If the domain required by fromPosition goes past the Journal's memory,
* throws an EdenError with errorCode = EDOM.
additional query API for our thrift interface Summary: This diff adds a couple more things to our thrift interface: 1. Introduces JournalPosition 2. Adds methods to query the current JournalPosition and obtain a delta since a given JournalPosition 3. Augments getMaterializedFiles to also return the current JournalPosition 4. Adds a method to evaluate a `glob` against Eden 5. Adds a method using thrift streaming to subscribe to realtime changes Could probably finesse the naming a little bit. The JournalPosition allows reasoning about changes to files that are not part of an Eden snapshot. Internally the journal position is just the SequenceNumber from the journal datastructures, but when we expose it to clients we need to be able to distinguish between a sequence number from the current instance of the eden service and a prior incarnation (eg: if the process has been restarted, and we have no way to recreate the journal we need to be able to indicate this to the client if they ask about changes in that range). For the convenience of the client we also include the `toHash` (the most recent hash from the journal entry) which is likely useful for the `hg` dirstate operations; it is useful to know that the snapshot may have changed since the last query about the dirstate. The `getFileInformation` method returns the instantaneously available `stat()` like information about the requested list of files. Since we simply don't have historical data on how files in the overlay looked (only how they look now), this method does not allow passing in a JournalPosition. When it comes to comparing historical data, we will need to add an API that accepts two snapshot hashes and generates the results from there. This particular method is geared up to understanding the current state of the world; the obvious use case is plugging in the file list from `getFilesChangedSince` into this function to figure out what's what. * Do we want a function that combines `getFilesChangedSince` + `getFileInformation` into a single RPC? Why is there a glob method? It's to support a use-case in the watchman/buck integration. I'm just sketching it out in the thrift interface at this stage. In the future we also need to be able to express how to carry out a tree walk, but that will require some query predicates that I don't want to get hung up on specifying immediately. Why is the streaming stuff in its own thrift file? We can't generate code for it in java or perhaps also python. It's only needed to plumb data into watchman so it's broken out into its own definition. Nothing depends on that file yet, so it's probably not specified quite right. The important thing is how the subscribe method looks: it's essentially the same as the method to query a delta, but it keeps emitting deltas as they are produced. This is another API that will benefit from query predicates when we get around to specifying them. I've added `JournalDelta::fromHash` and `JournalDelta::toHash` to hold the appropriate snapshot ids in the journal entry; this will allow us to indicate when we've checked out a new snapshot, or created a new snapshot. We have no way to populate these yet; I commented on D3762646 about storing the `snapshotID` that we have during `EdenServiceHandler::mountImpl` into either the `EdenMount` or the proposed `RootInode` class. Once we have that we can simply sample it and store it as we generate `JournalDelta`s. Reviewed By: simpkins Differential Revision: D3860804 fbshipit-source-id: 896c24c354e6f58328fb45c24b16915d9e937108
2016-09-19 22:48:12 +03:00
* This indicates that eden cannot compute the delta for the requested
* range. The client will need to recompute a new baseline using
* other available functions in EdenService.
*/
FileDelta getFilesChangedSince(
1: PathString mountPoint,
additional query API for our thrift interface Summary: This diff adds a couple more things to our thrift interface: 1. Introduces JournalPosition 2. Adds methods to query the current JournalPosition and obtain a delta since a given JournalPosition 3. Augments getMaterializedFiles to also return the current JournalPosition 4. Adds a method to evaluate a `glob` against Eden 5. Adds a method using thrift streaming to subscribe to realtime changes Could probably finesse the naming a little bit. The JournalPosition allows reasoning about changes to files that are not part of an Eden snapshot. Internally the journal position is just the SequenceNumber from the journal datastructures, but when we expose it to clients we need to be able to distinguish between a sequence number from the current instance of the eden service and a prior incarnation (eg: if the process has been restarted, and we have no way to recreate the journal we need to be able to indicate this to the client if they ask about changes in that range). For the convenience of the client we also include the `toHash` (the most recent hash from the journal entry) which is likely useful for the `hg` dirstate operations; it is useful to know that the snapshot may have changed since the last query about the dirstate. The `getFileInformation` method returns the instantaneously available `stat()` like information about the requested list of files. Since we simply don't have historical data on how files in the overlay looked (only how they look now), this method does not allow passing in a JournalPosition. When it comes to comparing historical data, we will need to add an API that accepts two snapshot hashes and generates the results from there. This particular method is geared up to understanding the current state of the world; the obvious use case is plugging in the file list from `getFilesChangedSince` into this function to figure out what's what. * Do we want a function that combines `getFilesChangedSince` + `getFileInformation` into a single RPC? Why is there a glob method? It's to support a use-case in the watchman/buck integration. I'm just sketching it out in the thrift interface at this stage. In the future we also need to be able to express how to carry out a tree walk, but that will require some query predicates that I don't want to get hung up on specifying immediately. Why is the streaming stuff in its own thrift file? We can't generate code for it in java or perhaps also python. It's only needed to plumb data into watchman so it's broken out into its own definition. Nothing depends on that file yet, so it's probably not specified quite right. The important thing is how the subscribe method looks: it's essentially the same as the method to query a delta, but it keeps emitting deltas as they are produced. This is another API that will benefit from query predicates when we get around to specifying them. I've added `JournalDelta::fromHash` and `JournalDelta::toHash` to hold the appropriate snapshot ids in the journal entry; this will allow us to indicate when we've checked out a new snapshot, or created a new snapshot. We have no way to populate these yet; I commented on D3762646 about storing the `snapshotID` that we have during `EdenServiceHandler::mountImpl` into either the `EdenMount` or the proposed `RootInode` class. Once we have that we can simply sample it and store it as we generate `JournalDelta`s. Reviewed By: simpkins Differential Revision: D3860804 fbshipit-source-id: 896c24c354e6f58328fb45c24b16915d9e937108
2016-09-19 22:48:12 +03:00
2: JournalPosition fromPosition)
throws (1: EdenError ex)
additional query API for our thrift interface Summary: This diff adds a couple more things to our thrift interface: 1. Introduces JournalPosition 2. Adds methods to query the current JournalPosition and obtain a delta since a given JournalPosition 3. Augments getMaterializedFiles to also return the current JournalPosition 4. Adds a method to evaluate a `glob` against Eden 5. Adds a method using thrift streaming to subscribe to realtime changes Could probably finesse the naming a little bit. The JournalPosition allows reasoning about changes to files that are not part of an Eden snapshot. Internally the journal position is just the SequenceNumber from the journal datastructures, but when we expose it to clients we need to be able to distinguish between a sequence number from the current instance of the eden service and a prior incarnation (eg: if the process has been restarted, and we have no way to recreate the journal we need to be able to indicate this to the client if they ask about changes in that range). For the convenience of the client we also include the `toHash` (the most recent hash from the journal entry) which is likely useful for the `hg` dirstate operations; it is useful to know that the snapshot may have changed since the last query about the dirstate. The `getFileInformation` method returns the instantaneously available `stat()` like information about the requested list of files. Since we simply don't have historical data on how files in the overlay looked (only how they look now), this method does not allow passing in a JournalPosition. When it comes to comparing historical data, we will need to add an API that accepts two snapshot hashes and generates the results from there. This particular method is geared up to understanding the current state of the world; the obvious use case is plugging in the file list from `getFilesChangedSince` into this function to figure out what's what. * Do we want a function that combines `getFilesChangedSince` + `getFileInformation` into a single RPC? Why is there a glob method? It's to support a use-case in the watchman/buck integration. I'm just sketching it out in the thrift interface at this stage. In the future we also need to be able to express how to carry out a tree walk, but that will require some query predicates that I don't want to get hung up on specifying immediately. Why is the streaming stuff in its own thrift file? We can't generate code for it in java or perhaps also python. It's only needed to plumb data into watchman so it's broken out into its own definition. Nothing depends on that file yet, so it's probably not specified quite right. The important thing is how the subscribe method looks: it's essentially the same as the method to query a delta, but it keeps emitting deltas as they are produced. This is another API that will benefit from query predicates when we get around to specifying them. I've added `JournalDelta::fromHash` and `JournalDelta::toHash` to hold the appropriate snapshot ids in the journal entry; this will allow us to indicate when we've checked out a new snapshot, or created a new snapshot. We have no way to populate these yet; I commented on D3762646 about storing the `snapshotID` that we have during `EdenServiceHandler::mountImpl` into either the `EdenMount` or the proposed `RootInode` class. Once we have that we can simply sample it and store it as we generate `JournalDelta`s. Reviewed By: simpkins Differential Revision: D3860804 fbshipit-source-id: 896c24c354e6f58328fb45c24b16915d9e937108
2016-09-19 22:48:12 +03:00
/** Sets the memory limit on the journal such that the journal will forget
* old data to keep itself under a certain estimated memory use.
*/
void setJournalMemoryLimit(
1: PathString mountPoint,
2: i64 limit)
throws (1: EdenError ex)
/** Gets the memory limit on the journal
*/
i64 getJournalMemoryLimit(
1: PathString mountPoint,
) throws (1: EdenError ex)
/** Forces the journal to flush, sending a truncated result to subscribers
*/
void flushJournal(
1: PathString mountPoint,
) throws (1: EdenError ex)
/**
* Returns the journal entries for the specified params. Useful for auditing
* the changes that Eden has sent to Watchman. Note that the most recent
* journal entries will be at the front of the list in
* DebugGetRawJournalResponse.
*/
DebugGetRawJournalResponse debugGetRawJournal(
1: DebugGetRawJournalParams params,
) throws (1: EdenError ex)
/**
* Returns the subset of information about a list of paths that can
* be determined from each's parent directory tree. For now, that
* includes whether the entry exists and its dtype.
*/
list<EntryInformationOrError> getEntryInformation(
1: PathString mountPoint,
2: list<PathString> paths)
throws (1: EdenError ex)
/**
* Returns a subset of the stat() information for a list of paths.
additional query API for our thrift interface Summary: This diff adds a couple more things to our thrift interface: 1. Introduces JournalPosition 2. Adds methods to query the current JournalPosition and obtain a delta since a given JournalPosition 3. Augments getMaterializedFiles to also return the current JournalPosition 4. Adds a method to evaluate a `glob` against Eden 5. Adds a method using thrift streaming to subscribe to realtime changes Could probably finesse the naming a little bit. The JournalPosition allows reasoning about changes to files that are not part of an Eden snapshot. Internally the journal position is just the SequenceNumber from the journal datastructures, but when we expose it to clients we need to be able to distinguish between a sequence number from the current instance of the eden service and a prior incarnation (eg: if the process has been restarted, and we have no way to recreate the journal we need to be able to indicate this to the client if they ask about changes in that range). For the convenience of the client we also include the `toHash` (the most recent hash from the journal entry) which is likely useful for the `hg` dirstate operations; it is useful to know that the snapshot may have changed since the last query about the dirstate. The `getFileInformation` method returns the instantaneously available `stat()` like information about the requested list of files. Since we simply don't have historical data on how files in the overlay looked (only how they look now), this method does not allow passing in a JournalPosition. When it comes to comparing historical data, we will need to add an API that accepts two snapshot hashes and generates the results from there. This particular method is geared up to understanding the current state of the world; the obvious use case is plugging in the file list from `getFilesChangedSince` into this function to figure out what's what. * Do we want a function that combines `getFilesChangedSince` + `getFileInformation` into a single RPC? Why is there a glob method? It's to support a use-case in the watchman/buck integration. I'm just sketching it out in the thrift interface at this stage. In the future we also need to be able to express how to carry out a tree walk, but that will require some query predicates that I don't want to get hung up on specifying immediately. Why is the streaming stuff in its own thrift file? We can't generate code for it in java or perhaps also python. It's only needed to plumb data into watchman so it's broken out into its own definition. Nothing depends on that file yet, so it's probably not specified quite right. The important thing is how the subscribe method looks: it's essentially the same as the method to query a delta, but it keeps emitting deltas as they are produced. This is another API that will benefit from query predicates when we get around to specifying them. I've added `JournalDelta::fromHash` and `JournalDelta::toHash` to hold the appropriate snapshot ids in the journal entry; this will allow us to indicate when we've checked out a new snapshot, or created a new snapshot. We have no way to populate these yet; I commented on D3762646 about storing the `snapshotID` that we have during `EdenServiceHandler::mountImpl` into either the `EdenMount` or the proposed `RootInode` class. Once we have that we can simply sample it and store it as we generate `JournalDelta`s. Reviewed By: simpkins Differential Revision: D3860804 fbshipit-source-id: 896c24c354e6f58328fb45c24b16915d9e937108
2016-09-19 22:48:12 +03:00
* The returned list of information corresponds to the input list of
* paths; eg; result[0] holds the information for paths[0].
* We only support returning the instantaneous information about
* these paths, as we cannot answer with historical information about
* files in the overlay.
*/
list<FileInformationOrError> getFileInformation(
1: PathString mountPoint,
2: list<PathString> paths)
throws (1: EdenError ex)
additional query API for our thrift interface Summary: This diff adds a couple more things to our thrift interface: 1. Introduces JournalPosition 2. Adds methods to query the current JournalPosition and obtain a delta since a given JournalPosition 3. Augments getMaterializedFiles to also return the current JournalPosition 4. Adds a method to evaluate a `glob` against Eden 5. Adds a method using thrift streaming to subscribe to realtime changes Could probably finesse the naming a little bit. The JournalPosition allows reasoning about changes to files that are not part of an Eden snapshot. Internally the journal position is just the SequenceNumber from the journal datastructures, but when we expose it to clients we need to be able to distinguish between a sequence number from the current instance of the eden service and a prior incarnation (eg: if the process has been restarted, and we have no way to recreate the journal we need to be able to indicate this to the client if they ask about changes in that range). For the convenience of the client we also include the `toHash` (the most recent hash from the journal entry) which is likely useful for the `hg` dirstate operations; it is useful to know that the snapshot may have changed since the last query about the dirstate. The `getFileInformation` method returns the instantaneously available `stat()` like information about the requested list of files. Since we simply don't have historical data on how files in the overlay looked (only how they look now), this method does not allow passing in a JournalPosition. When it comes to comparing historical data, we will need to add an API that accepts two snapshot hashes and generates the results from there. This particular method is geared up to understanding the current state of the world; the obvious use case is plugging in the file list from `getFilesChangedSince` into this function to figure out what's what. * Do we want a function that combines `getFilesChangedSince` + `getFileInformation` into a single RPC? Why is there a glob method? It's to support a use-case in the watchman/buck integration. I'm just sketching it out in the thrift interface at this stage. In the future we also need to be able to express how to carry out a tree walk, but that will require some query predicates that I don't want to get hung up on specifying immediately. Why is the streaming stuff in its own thrift file? We can't generate code for it in java or perhaps also python. It's only needed to plumb data into watchman so it's broken out into its own definition. Nothing depends on that file yet, so it's probably not specified quite right. The important thing is how the subscribe method looks: it's essentially the same as the method to query a delta, but it keeps emitting deltas as they are produced. This is another API that will benefit from query predicates when we get around to specifying them. I've added `JournalDelta::fromHash` and `JournalDelta::toHash` to hold the appropriate snapshot ids in the journal entry; this will allow us to indicate when we've checked out a new snapshot, or created a new snapshot. We have no way to populate these yet; I commented on D3762646 about storing the `snapshotID` that we have during `EdenServiceHandler::mountImpl` into either the `EdenMount` or the proposed `RootInode` class. Once we have that we can simply sample it and store it as we generate `JournalDelta`s. Reviewed By: simpkins Differential Revision: D3860804 fbshipit-source-id: 896c24c354e6f58328fb45c24b16915d9e937108
2016-09-19 22:48:12 +03:00
/**
* DEPRECATED: Prefer globFiles().
* Returns a list of files that match the input globs.
additional query API for our thrift interface Summary: This diff adds a couple more things to our thrift interface: 1. Introduces JournalPosition 2. Adds methods to query the current JournalPosition and obtain a delta since a given JournalPosition 3. Augments getMaterializedFiles to also return the current JournalPosition 4. Adds a method to evaluate a `glob` against Eden 5. Adds a method using thrift streaming to subscribe to realtime changes Could probably finesse the naming a little bit. The JournalPosition allows reasoning about changes to files that are not part of an Eden snapshot. Internally the journal position is just the SequenceNumber from the journal datastructures, but when we expose it to clients we need to be able to distinguish between a sequence number from the current instance of the eden service and a prior incarnation (eg: if the process has been restarted, and we have no way to recreate the journal we need to be able to indicate this to the client if they ask about changes in that range). For the convenience of the client we also include the `toHash` (the most recent hash from the journal entry) which is likely useful for the `hg` dirstate operations; it is useful to know that the snapshot may have changed since the last query about the dirstate. The `getFileInformation` method returns the instantaneously available `stat()` like information about the requested list of files. Since we simply don't have historical data on how files in the overlay looked (only how they look now), this method does not allow passing in a JournalPosition. When it comes to comparing historical data, we will need to add an API that accepts two snapshot hashes and generates the results from there. This particular method is geared up to understanding the current state of the world; the obvious use case is plugging in the file list from `getFilesChangedSince` into this function to figure out what's what. * Do we want a function that combines `getFilesChangedSince` + `getFileInformation` into a single RPC? Why is there a glob method? It's to support a use-case in the watchman/buck integration. I'm just sketching it out in the thrift interface at this stage. In the future we also need to be able to express how to carry out a tree walk, but that will require some query predicates that I don't want to get hung up on specifying immediately. Why is the streaming stuff in its own thrift file? We can't generate code for it in java or perhaps also python. It's only needed to plumb data into watchman so it's broken out into its own definition. Nothing depends on that file yet, so it's probably not specified quite right. The important thing is how the subscribe method looks: it's essentially the same as the method to query a delta, but it keeps emitting deltas as they are produced. This is another API that will benefit from query predicates when we get around to specifying them. I've added `JournalDelta::fromHash` and `JournalDelta::toHash` to hold the appropriate snapshot ids in the journal entry; this will allow us to indicate when we've checked out a new snapshot, or created a new snapshot. We have no way to populate these yet; I commented on D3762646 about storing the `snapshotID` that we have during `EdenServiceHandler::mountImpl` into either the `EdenMount` or the proposed `RootInode` class. Once we have that we can simply sample it and store it as we generate `JournalDelta`s. Reviewed By: simpkins Differential Revision: D3860804 fbshipit-source-id: 896c24c354e6f58328fb45c24b16915d9e937108
2016-09-19 22:48:12 +03:00
* There are no duplicate values in the result.
*/
list<PathString> glob(
1: PathString mountPoint,
implement glob thrift method Summary: This is to facilitate the watchman integration and draws on the watchman glob implementation; the approach is to split the glob strings into path components and evaluate the components step by step as the tree is walked. Components that do not include any glob special characters can be handled as a direct lookup from the directory contents (O(1) rather than O(num-entries)). The glob method returns a set of filenames that match a list of of glob patterns. Recursive globs are supported. It is worth noting that a glob like "**/*" will return a list of every entry in the filesystem. This is potentially expensive and should be avoided. simpkins is in favor of disallowing this as a forcing function to encourage tool-makers to adopt patterns that don't rely on a complete listing of the filesystem. For now I'd like to get this in without such a restriction; it's also worth noting that running `find .` in the root of the mount point has a similar effect and we can't prevent that from happening, so the effect of the overly broad glob is something that we need to be able to withstand in any case. Unrestricted recursive globs will make it easier to connect certain watchman queries in the interim, until we have a more expressive thrift API for walking and filtering the list of files. Note: I've removed the wildmatch flags that I'd put in the API when I stubbed it out originally. Since this is built on top of our GlobMatcher code and that doesn't have those flags, I thought it would be simplest to just remove them. If we find that we need them, we can figure out how to add them later. Also Note: the evaluation of the glob is parallel-ready but currently limited to 1 at a time by constraining the folly::window call to 1. We could make this larger but would need a more intelligent constraint. For example, a recursive glob could initiate N concurrent futures per level where N is the number of sub-dirs at a given level. Using a custom Executor for these futures may be a better option to set an upper bound on the number of concurrent jobs allowed for a given glob call. Depends on D4361197 Reviewed By: simpkins Differential Revision: D4371934 fbshipit-source-id: 444735600bc16d2c2185f2277ddc5b51f672600a
2017-01-26 23:45:50 +03:00
2: list<string> globs)
throws (1: EdenError ex)
/**
* Returns a list of files that match the GlobParams, notably,
* the list of glob patterns.
* There are no duplicate values in the result.
*/
Glob globFiles(
1: GlobParams params,
) throws (1: EdenError ex)
/**
* Chowns all files in the requested mount to the requested uid and gid
*/
void chown(1: PathString mountPoint, 2: i32 uid, 3: i32 gid)
add a new getScmStatusV2() API to replace getScmStatus() Summary: Add a new thrift API for computing the difference between the working directory and a given source control commit. This has the following differences from the old getScmStatus() commit: - The parameters are accepted in a GetScmStatusParams structure now. This makes it easier for the server-side C++ implementation to tell which parameters have actually been specified by the caller. This will make it easier to extend this API in the future without having to replace it with a new function call again. - The return value is a GetScmStatusResult, which includes both the ScmStatus and the EdenFS version number. This will allow code like `hg status` to get both the status results and the EdenFS version in a single call, without needing to make multiple separate thrift calls. - This new call will return an error if the caller requests the status against a commit that disagrees with EdenFS's view of the current commit. Because the individual `hg` command line processes do not perform any synchronization of their own when reading the working directory parent, they can often call EdenFS with stale parent information, or while a checkout is currently in progress. This new behavior will reject the request with an error, rather than having EdenFS perform a potentially very expensive status computation when the results probably aren't actually useful to the caller anyway. Reviewed By: chadaustin Differential Revision: D15110218 fbshipit-source-id: ebc2f74dafc090d4fd245de8e4d62e2b086500dd
2019-10-29 19:54:26 +03:00
/**
* Return the list of files that are different from the specified source
* control commit.
*/
GetScmStatusResult getScmStatusV2(
1: GetScmStatusParams params
) throws (1: EdenError ex)
/**
* Get the status of the working directory against the specified commit.
*
add a new getScmStatusV2() API to replace getScmStatus() Summary: Add a new thrift API for computing the difference between the working directory and a given source control commit. This has the following differences from the old getScmStatus() commit: - The parameters are accepted in a GetScmStatusParams structure now. This makes it easier for the server-side C++ implementation to tell which parameters have actually been specified by the caller. This will make it easier to extend this API in the future without having to replace it with a new function call again. - The return value is a GetScmStatusResult, which includes both the ScmStatus and the EdenFS version number. This will allow code like `hg status` to get both the status results and the EdenFS version in a single call, without needing to make multiple separate thrift calls. - This new call will return an error if the caller requests the status against a commit that disagrees with EdenFS's view of the current commit. Because the individual `hg` command line processes do not perform any synchronization of their own when reading the working directory parent, they can often call EdenFS with stale parent information, or while a checkout is currently in progress. This new behavior will reject the request with an error, rather than having EdenFS perform a potentially very expensive status computation when the results probably aren't actually useful to the caller anyway. Reviewed By: chadaustin Differential Revision: D15110218 fbshipit-source-id: ebc2f74dafc090d4fd245de8e4d62e2b086500dd
2019-10-29 19:54:26 +03:00
* DEPRECATED: Prefer using getScmStatusV2() in new code. Callers may still
* need to fall back to getScmStatus() if talking to an older edenfs daemon
* that does not support getScmStatusV2() yet.
*/
Store Hg dirstate data in Hg instead of Eden. Summary: This is a major change to how we manage the dirstate in Eden's Hg extension. Previously, the dirstate information was stored under `$EDEN_CONFIG_DIR`, which is Eden's private storage. Any time the Mercurial extension wanted to read or write the dirstate, it had to make a Thrift request to Eden to do so on its behalf. The upside is that Eden could answer dirstate-related questions independently of the Python code. This was sufficiently different than how Mercurial's default dirstate worked that our subclass, `eden_dirstate`, had to override quite a bit of behavior. Failing to manage the `.hg/dirstate` file in a way similar to the way Mercurial does has exposed some "unofficial contracts" that Mercurial has. For example, tools like Nuclide rely on changes to the `.hg/dirstate` file as a heuristic to determine when to invalidate its internal caches for Mercurial data. Today, Mercurial has a well-factored `dirstatemap` abstraction that is primarily responsible for the transactions with the dirstate's data. With this split, we can focus on putting most of our customizations in our `eden_dirstate_map` subclass while our `eden_dirstate` class has to override fewer methods. Because the data is managed through the `.hg/dirstate` file, transaction logic in Mercurial that relies on renaming/copying that file will work out-of-the-box. This change also reduces the number of Thrift calls the Mercurial extension has to make for operations like `hg status` or `hg add`. In this revision, we introduce our own binary format for the `.hg/dirstate` file. The logic to read and write this file is in `eden/py/dirstate.py`. After the first 40 bytes, which are used for the parent hashes, the next four bytes are reserved for a version number for the file format so we can manage file format changes going forward. Admittedly one downside of this change is that it is a breaking change. Ideally, users should commit all of their local changes in their existing mounts, shutdown Eden, delete the old mounts, restart Eden, and re-clone. In the end, this change deletes a number of Mercurial-specific code and Thrift APIs from Eden. This is a better separation of concerns that makes Eden more SCM-agnostic. For example, this change removes `Dirstate.cpp` and `DirstatePersistance.cpp`, replacing them with the much simpler and more general `Differ.cpp`. The Mercurial-specific logic from `Dirstate.cpp` that turned a diff into an `hg status` now lives in the Mercurial extension in `EdenThriftClient.getStatus()`, which is much more appropriate. Note that this reverts the changes that were recently introduced in D6116105: we now need to intercept `localrepo.localrepository.dirstate` once again. Reviewed By: simpkins Differential Revision: D6179950 fbshipit-source-id: 5b78904909b669c9cc606e2fe1fd118ef6eaab95
2017-11-07 06:44:24 +03:00
ScmStatus getScmStatus(
1: PathString mountPoint,
Store Hg dirstate data in Hg instead of Eden. Summary: This is a major change to how we manage the dirstate in Eden's Hg extension. Previously, the dirstate information was stored under `$EDEN_CONFIG_DIR`, which is Eden's private storage. Any time the Mercurial extension wanted to read or write the dirstate, it had to make a Thrift request to Eden to do so on its behalf. The upside is that Eden could answer dirstate-related questions independently of the Python code. This was sufficiently different than how Mercurial's default dirstate worked that our subclass, `eden_dirstate`, had to override quite a bit of behavior. Failing to manage the `.hg/dirstate` file in a way similar to the way Mercurial does has exposed some "unofficial contracts" that Mercurial has. For example, tools like Nuclide rely on changes to the `.hg/dirstate` file as a heuristic to determine when to invalidate its internal caches for Mercurial data. Today, Mercurial has a well-factored `dirstatemap` abstraction that is primarily responsible for the transactions with the dirstate's data. With this split, we can focus on putting most of our customizations in our `eden_dirstate_map` subclass while our `eden_dirstate` class has to override fewer methods. Because the data is managed through the `.hg/dirstate` file, transaction logic in Mercurial that relies on renaming/copying that file will work out-of-the-box. This change also reduces the number of Thrift calls the Mercurial extension has to make for operations like `hg status` or `hg add`. In this revision, we introduce our own binary format for the `.hg/dirstate` file. The logic to read and write this file is in `eden/py/dirstate.py`. After the first 40 bytes, which are used for the parent hashes, the next four bytes are reserved for a version number for the file format so we can manage file format changes going forward. Admittedly one downside of this change is that it is a breaking change. Ideally, users should commit all of their local changes in their existing mounts, shutdown Eden, delete the old mounts, restart Eden, and re-clone. In the end, this change deletes a number of Mercurial-specific code and Thrift APIs from Eden. This is a better separation of concerns that makes Eden more SCM-agnostic. For example, this change removes `Dirstate.cpp` and `DirstatePersistance.cpp`, replacing them with the much simpler and more general `Differ.cpp`. The Mercurial-specific logic from `Dirstate.cpp` that turned a diff into an `hg status` now lives in the Mercurial extension in `EdenThriftClient.getStatus()`, which is much more appropriate. Note that this reverts the changes that were recently introduced in D6116105: we now need to intercept `localrepo.localrepository.dirstate` once again. Reviewed By: simpkins Differential Revision: D6179950 fbshipit-source-id: 5b78904909b669c9cc606e2fe1fd118ef6eaab95
2017-11-07 06:44:24 +03:00
2: bool listIgnored,
3: BinaryHash commit,
) throws (1: EdenError ex)
/**
* DEPRECATED
*
* Computes the status between two specified revisions.
* This does not care about the state of the working copy.
*/
ScmStatus getScmStatusBetweenRevisions(
1: PathString mountPoint,
2: BinaryHash oldHash,
3: BinaryHash newHash,
) throws (1: EdenError ex)
Store Hg dirstate data in Hg instead of Eden. Summary: This is a major change to how we manage the dirstate in Eden's Hg extension. Previously, the dirstate information was stored under `$EDEN_CONFIG_DIR`, which is Eden's private storage. Any time the Mercurial extension wanted to read or write the dirstate, it had to make a Thrift request to Eden to do so on its behalf. The upside is that Eden could answer dirstate-related questions independently of the Python code. This was sufficiently different than how Mercurial's default dirstate worked that our subclass, `eden_dirstate`, had to override quite a bit of behavior. Failing to manage the `.hg/dirstate` file in a way similar to the way Mercurial does has exposed some "unofficial contracts" that Mercurial has. For example, tools like Nuclide rely on changes to the `.hg/dirstate` file as a heuristic to determine when to invalidate its internal caches for Mercurial data. Today, Mercurial has a well-factored `dirstatemap` abstraction that is primarily responsible for the transactions with the dirstate's data. With this split, we can focus on putting most of our customizations in our `eden_dirstate_map` subclass while our `eden_dirstate` class has to override fewer methods. Because the data is managed through the `.hg/dirstate` file, transaction logic in Mercurial that relies on renaming/copying that file will work out-of-the-box. This change also reduces the number of Thrift calls the Mercurial extension has to make for operations like `hg status` or `hg add`. In this revision, we introduce our own binary format for the `.hg/dirstate` file. The logic to read and write this file is in `eden/py/dirstate.py`. After the first 40 bytes, which are used for the parent hashes, the next four bytes are reserved for a version number for the file format so we can manage file format changes going forward. Admittedly one downside of this change is that it is a breaking change. Ideally, users should commit all of their local changes in their existing mounts, shutdown Eden, delete the old mounts, restart Eden, and re-clone. In the end, this change deletes a number of Mercurial-specific code and Thrift APIs from Eden. This is a better separation of concerns that makes Eden more SCM-agnostic. For example, this change removes `Dirstate.cpp` and `DirstatePersistance.cpp`, replacing them with the much simpler and more general `Differ.cpp`. The Mercurial-specific logic from `Dirstate.cpp` that turned a diff into an `hg status` now lives in the Mercurial extension in `EdenThriftClient.getStatus()`, which is much more appropriate. Note that this reverts the changes that were recently introduced in D6116105: we now need to intercept `localrepo.localrepository.dirstate` once again. Reviewed By: simpkins Differential Revision: D6179950 fbshipit-source-id: 5b78904909b669c9cc606e2fe1fd118ef6eaab95
2017-11-07 06:44:24 +03:00
//////// SCM Commit-Related APIs ////////
/**
* DEPRECATED: Remove when Mercurial has migrated to not calling
* getManifestEntry, probably by July 2020.
*
Store Hg dirstate data in Hg instead of Eden. Summary: This is a major change to how we manage the dirstate in Eden's Hg extension. Previously, the dirstate information was stored under `$EDEN_CONFIG_DIR`, which is Eden's private storage. Any time the Mercurial extension wanted to read or write the dirstate, it had to make a Thrift request to Eden to do so on its behalf. The upside is that Eden could answer dirstate-related questions independently of the Python code. This was sufficiently different than how Mercurial's default dirstate worked that our subclass, `eden_dirstate`, had to override quite a bit of behavior. Failing to manage the `.hg/dirstate` file in a way similar to the way Mercurial does has exposed some "unofficial contracts" that Mercurial has. For example, tools like Nuclide rely on changes to the `.hg/dirstate` file as a heuristic to determine when to invalidate its internal caches for Mercurial data. Today, Mercurial has a well-factored `dirstatemap` abstraction that is primarily responsible for the transactions with the dirstate's data. With this split, we can focus on putting most of our customizations in our `eden_dirstate_map` subclass while our `eden_dirstate` class has to override fewer methods. Because the data is managed through the `.hg/dirstate` file, transaction logic in Mercurial that relies on renaming/copying that file will work out-of-the-box. This change also reduces the number of Thrift calls the Mercurial extension has to make for operations like `hg status` or `hg add`. In this revision, we introduce our own binary format for the `.hg/dirstate` file. The logic to read and write this file is in `eden/py/dirstate.py`. After the first 40 bytes, which are used for the parent hashes, the next four bytes are reserved for a version number for the file format so we can manage file format changes going forward. Admittedly one downside of this change is that it is a breaking change. Ideally, users should commit all of their local changes in their existing mounts, shutdown Eden, delete the old mounts, restart Eden, and re-clone. In the end, this change deletes a number of Mercurial-specific code and Thrift APIs from Eden. This is a better separation of concerns that makes Eden more SCM-agnostic. For example, this change removes `Dirstate.cpp` and `DirstatePersistance.cpp`, replacing them with the much simpler and more general `Differ.cpp`. The Mercurial-specific logic from `Dirstate.cpp` that turned a diff into an `hg status` now lives in the Mercurial extension in `EdenThriftClient.getStatus()`, which is much more appropriate. Note that this reverts the changes that were recently introduced in D6116105: we now need to intercept `localrepo.localrepository.dirstate` once again. Reviewed By: simpkins Differential Revision: D6179950 fbshipit-source-id: 5b78904909b669c9cc606e2fe1fd118ef6eaab95
2017-11-07 06:44:24 +03:00
* If the relative path exists in the manifest (i.e., the current commit),
* then return the corresponding ManifestEntry; otherwise, throw
* NoValueForKeyError.
*
Store Hg dirstate data in Hg instead of Eden. Summary: This is a major change to how we manage the dirstate in Eden's Hg extension. Previously, the dirstate information was stored under `$EDEN_CONFIG_DIR`, which is Eden's private storage. Any time the Mercurial extension wanted to read or write the dirstate, it had to make a Thrift request to Eden to do so on its behalf. The upside is that Eden could answer dirstate-related questions independently of the Python code. This was sufficiently different than how Mercurial's default dirstate worked that our subclass, `eden_dirstate`, had to override quite a bit of behavior. Failing to manage the `.hg/dirstate` file in a way similar to the way Mercurial does has exposed some "unofficial contracts" that Mercurial has. For example, tools like Nuclide rely on changes to the `.hg/dirstate` file as a heuristic to determine when to invalidate its internal caches for Mercurial data. Today, Mercurial has a well-factored `dirstatemap` abstraction that is primarily responsible for the transactions with the dirstate's data. With this split, we can focus on putting most of our customizations in our `eden_dirstate_map` subclass while our `eden_dirstate` class has to override fewer methods. Because the data is managed through the `.hg/dirstate` file, transaction logic in Mercurial that relies on renaming/copying that file will work out-of-the-box. This change also reduces the number of Thrift calls the Mercurial extension has to make for operations like `hg status` or `hg add`. In this revision, we introduce our own binary format for the `.hg/dirstate` file. The logic to read and write this file is in `eden/py/dirstate.py`. After the first 40 bytes, which are used for the parent hashes, the next four bytes are reserved for a version number for the file format so we can manage file format changes going forward. Admittedly one downside of this change is that it is a breaking change. Ideally, users should commit all of their local changes in their existing mounts, shutdown Eden, delete the old mounts, restart Eden, and re-clone. In the end, this change deletes a number of Mercurial-specific code and Thrift APIs from Eden. This is a better separation of concerns that makes Eden more SCM-agnostic. For example, this change removes `Dirstate.cpp` and `DirstatePersistance.cpp`, replacing them with the much simpler and more general `Differ.cpp`. The Mercurial-specific logic from `Dirstate.cpp` that turned a diff into an `hg status` now lives in the Mercurial extension in `EdenThriftClient.getStatus()`, which is much more appropriate. Note that this reverts the changes that were recently introduced in D6116105: we now need to intercept `localrepo.localrepository.dirstate` once again. Reviewed By: simpkins Differential Revision: D6179950 fbshipit-source-id: 5b78904909b669c9cc606e2fe1fd118ef6eaab95
2017-11-07 06:44:24 +03:00
* Note that we are still experimenting with the type of SCM information Eden
* should be responsible for reporting, so this method is subject to change,
* or may go away entirely. At a minimum, it should take a commit as a
* parameter rather than assuming the current commit.
*/
Store Hg dirstate data in Hg instead of Eden. Summary: This is a major change to how we manage the dirstate in Eden's Hg extension. Previously, the dirstate information was stored under `$EDEN_CONFIG_DIR`, which is Eden's private storage. Any time the Mercurial extension wanted to read or write the dirstate, it had to make a Thrift request to Eden to do so on its behalf. The upside is that Eden could answer dirstate-related questions independently of the Python code. This was sufficiently different than how Mercurial's default dirstate worked that our subclass, `eden_dirstate`, had to override quite a bit of behavior. Failing to manage the `.hg/dirstate` file in a way similar to the way Mercurial does has exposed some "unofficial contracts" that Mercurial has. For example, tools like Nuclide rely on changes to the `.hg/dirstate` file as a heuristic to determine when to invalidate its internal caches for Mercurial data. Today, Mercurial has a well-factored `dirstatemap` abstraction that is primarily responsible for the transactions with the dirstate's data. With this split, we can focus on putting most of our customizations in our `eden_dirstate_map` subclass while our `eden_dirstate` class has to override fewer methods. Because the data is managed through the `.hg/dirstate` file, transaction logic in Mercurial that relies on renaming/copying that file will work out-of-the-box. This change also reduces the number of Thrift calls the Mercurial extension has to make for operations like `hg status` or `hg add`. In this revision, we introduce our own binary format for the `.hg/dirstate` file. The logic to read and write this file is in `eden/py/dirstate.py`. After the first 40 bytes, which are used for the parent hashes, the next four bytes are reserved for a version number for the file format so we can manage file format changes going forward. Admittedly one downside of this change is that it is a breaking change. Ideally, users should commit all of their local changes in their existing mounts, shutdown Eden, delete the old mounts, restart Eden, and re-clone. In the end, this change deletes a number of Mercurial-specific code and Thrift APIs from Eden. This is a better separation of concerns that makes Eden more SCM-agnostic. For example, this change removes `Dirstate.cpp` and `DirstatePersistance.cpp`, replacing them with the much simpler and more general `Differ.cpp`. The Mercurial-specific logic from `Dirstate.cpp` that turned a diff into an `hg status` now lives in the Mercurial extension in `EdenThriftClient.getStatus()`, which is much more appropriate. Note that this reverts the changes that were recently introduced in D6116105: we now need to intercept `localrepo.localrepository.dirstate` once again. Reviewed By: simpkins Differential Revision: D6179950 fbshipit-source-id: 5b78904909b669c9cc606e2fe1fd118ef6eaab95
2017-11-07 06:44:24 +03:00
ManifestEntry getManifestEntry(
1: PathString mountPoint
2: PathString relativePath
) throws (
1: EdenError ex
2: NoValueForKeyError noValueForKeyError
)
//////// Administrative APIs ////////
/**
* Returns information about the running process, including pid and command
* line.
*/
DaemonInfo getDaemonInfo() throws (1: EdenError ex)
/**
* DEPRECATED
*
* Returns the pid of the running edenfs daemon. New code should call
* getDaemonInfo instead. This method exists for Thrift clients that
* predate getDaemonInfo, such as older versions of the CLI.
*/
i64 getPid() throws (1: EdenError ex)
/**
* Ask the server to shutdown and provide it some context for its logs
*/
void initiateShutdown(1: string reason) throws (1: EdenError ex)
/**
* Get the current configuration settings
*/
eden_config.EdenConfigData getConfig(1: GetConfigParams params)
throws (1: EdenError ex)
/**
* Ask eden to reload its configuration data from disk.
*/
void reloadConfig() throws (1: EdenError ex)
//////// Debugging APIs ////////
/**
* Get the contents of a source control Tree.
*
* This can be used to confirm if eden's LocalStore contains information
* for the tree, and that the information is correct.
*
* If localStoreOnly is true, the data is loaded directly from the
* LocalStore, and an error will be raised if it is not already present in
* the LocalStore. If localStoreOnly is false, the data may be retrieved
* from the BackingStore if it is not already present in the LocalStore.
*/
list<ScmTreeEntry> debugGetScmTree(
1: PathString mountPoint,
2: BinaryHash id,
3: bool localStoreOnly,
) throws (1: EdenError ex)
/**
* Get the contents of a source control Blob.
*
* This can be used to confirm if eden's LocalStore contains information
* for the blob, and that the information is correct.
*/
binary debugGetScmBlob(
1: PathString mountPoint,
2: BinaryHash id,
3: bool localStoreOnly,
) throws (1: EdenError ex)
/**
* Get the metadata about a source control Blob.
*
* This retrieves the metadata about a source control Blob. This returns
* the size and contents SHA1 of the blob, which eden stores separately from
* the blob itself. This can also be a useful alternative to
* debugGetScmBlob() when getting data about extremely large blobs.
*/
ScmBlobMetadata debugGetScmBlobMetadata(
1: PathString mountPoint,
2: BinaryHash id,
3: bool localStoreOnly,
) throws (1: EdenError ex)
/**
* Get status about currently loaded inode objects.
*
* This returns details about all currently loaded inode objects under the
* given path.
*
* If the path argument is the empty string data will be returned about all
* inodes in the entire mount point. Otherwise the path argument should
* refer to a subdirectory, and data will be returned for all inodes under
* the specified subdirectory.
*
* The rename lock is not held while gathering this information, so the path
* name information returned may not always be internally consistent. If
* renames were taking place while gathering the data, some inodes may show
* up under multiple parents. It's also possible that we may miss some
* inodes during the tree walk if they were renamed from a directory that was
* not yet walked into a directory that has already been walked.
*
* This API cannot return data about inodes that have been unlinked but still
* have outstanding references.
*/
list<TreeInodeDebugInfo> debugInodeStatus(
1: PathString mountPoint,
2: PathString path,
) throws (1: EdenError ex)
/**
* Get the list of outstanding fuse requests
*
* This will return the list of FuseCall structure containing the data from
* fuse_in_header.
*/
list<FuseCall> debugOutstandingFuseCalls(
1: PathString mountPoint,
)
/**
* Get the InodePathDebugInfo for the inode that corresponds to the given
* inode number. This provides the path for the inode and also indicates
* whether the inode is currently loaded or not. Requires that the Eden
* mountPoint be specified.
*/
InodePathDebugInfo debugGetInodePath(
1: PathString mountPoint,
2: i64 inodeNumber,
) throws (1: EdenError ex)
/**
* Clear pidFetchCounts_ in ObjectStore to start a new recording of process
* fetch counts.
*/
void clearFetchCounts() throws (1: EdenError ex)
void clearFetchCountsByMount(1: PathString mountPath) throws (1: EdenError ex)
/**
* Queries all of the live Eden mounts for the processes that accessed FUSE
* over the last `duration` seconds.
*
* Note that eden only maintains a few seconds worth of accesses.
*/
GetAccessCountsResult getAccessCounts(1: i64 duration)
throws (1: EdenError ex)
/**
* Column by column, clears and compacts the LocalStore. All columns are
* compacted, but only columns that contain ephemeral data are cleared.
*
* Even though the behavior of this method is identical to
* debugClearLocalStoreCaches followed by debugCompactLocalStorage(), it is
* separate so it can clear and compact each column in order to minimize the
* risk of running out of disk space. Since RocksDB is a write-ahead logging
* database, clearing a column increases its disk usage until it's compacted.
*/
void clearAndCompactLocalStore() throws (1: EdenError ex)
/**
* Clears all data from the LocalStore that can be populated from the upstream
* backing store.
*/
void debugClearLocalStoreCaches() throws (1: EdenError ex)
/**
* Asks RocksDB to perform a compaction.
*/
void debugCompactLocalStorage() throws (1: EdenError ex)
/**
* Unloads unused Inodes from a directory inside a mountPoint whose last
* access time is older than the specified age.
*
* The age parameter is a relative time to be subtracted from the current
* (wall clock) time.
*/
i64 unloadInodeForPath(
1: PathString mountPoint,
2: PathString path,
3: TimeSpec age,
) throws (1: EdenError ex)
/**
* Flush all thread-local stats to the main ServiceData object.
*
* Thread-local counters are normally flushed to the main ServiceData once
* a second. flushStatsNow() can be used to flush thread-local counters on
* demand, in addition to the normal once-a-second flush.
*
* This is mainly useful for unit and integration tests that want to ensure
* they see up-to-date counter information without waiting for the normal
* flush interval.
*/
void flushStatsNow() throws (1: EdenError ex)
/**
* Invalidate kernel cache for inode.
*/
void invalidateKernelInodeCache(
1: PathString mountPoint,
2: PathString path
)
throws (1: EdenError ex)
/**
* Gets the number of inodes unloaded by periodic job on an EdenMount.
*/
InternalStats getStatInfo() throws (1: EdenError ex)
void enableTracing()
void disableTracing()
list<TracePoint> getTracePoints()
/**
* Configure a new fault in Eden's fault injection framework.
*
* This throws an exception if the fault injection framework was not enabled
* when edenfs was started.
*/
void injectFault(1: FaultDefinition fault) throws (1: EdenError ex)
/**
* Remove a fault previously defined with injectFault()
*
* Returns true if a matching fault was found and remove, and false
* if no matching fault was found.
*/
bool removeFault(1: RemoveFaultArg fault) throws (1: EdenError ex)
/**
* Unblock fault injection checks pending on a block fault.
*
* Returns the number of pending calls that were unblocked
*/
i64 unblockFault(1: UnblockFaultArg info) throws (1: EdenError ex)
}