Summary:
The takeover tests are failing in a couple ways.
First, there are failures:
multiprocessing seems to behave differently on mac than linux.
The process calls cause locking issues when "pickling". multiprocessing seems
kind un reliable, and we don't really need it in either of the used places.
Second. there are timeouts:
accessing an fd that was open before takeover seems to hang sometimes.
I can not manually repro on my M1, but don't have time to dig in right now, so
I will just leave a comment with some info on the issue and leave these disabled
for now.
Reviewed By: mshroyer
Differential Revision: D44000288
fbshipit-source-id: 76ef085967a495ffd3ab0a8aae337960368d75e0
Summary:
The most common crash on macOS is when a graceful restart is performed while a
mount is ongoing. In this case, an EDEN_BUG is raised in EdenFS leading to
EdenFS to crash and restart.
As a very easy solution, let's check prior to starting graceful restart if
there are any ongoing mounts, and if so, just abort the graceful restart. Note
that there is still a potential race if a mount occurs right after this check,
but that should be significantly more rare and unlikely to happen in practice.
Reviewed By: kmancini
Differential Revision: D41640271
fbshipit-source-id: c0fd0cc0305cbcdd941aa0e2e422706af799d5b3
Summary:
Applies new import merging and sorting from µsort v1.0.
When merging imports, µsort will make a best-effort to move associated
comments to match merged elements, but there are known limitations due to
the diynamic nature of Python and developer tooling. These changes should
not produce any dangerous runtime changes, but may require touch-ups to
satisfy linters and other tooling.
Note that µsort uses case-insensitive, lexicographical sorting, which
results in a different ordering compared to isort. This provides a more
consistent sorting order, matching the case-insensitive order used when
sorting import statements by module name, and ensures that "frog", "FROG",
and "Frog" always sort next to each other.
For details on µsort's sorting and merging semantics, see the user guide:
https://usort.readthedocs.io/en/stable/guide.html#sorting
Reviewed By: lisroach
Differential Revision: D36402162
fbshipit-source-id: 6d180e9003d466c4f866fc9d454c6531766ca1dd
Summary:
This makes it consistent with the naming in the CLI, and will allow introducing
a get_thrift_client for streaming APIs.
Reviewed By: genevievehelsel
Differential Revision: D35767106
fbshipit-source-id: deb45090875b0366111e16e27abba22858b8f370
Summary:
The restart tests have started failing recently due to various timeouts. The
root cause is well understood to be caused by an increase in the dependency
graph of EdenFS. Let's increase the timeout while we figure out a way to
decrease the dependencies.
Reviewed By: genevievehelsel
Differential Revision: D35829901
fbshipit-source-id: d7e03ea68308b9eb36e3584d3c073b41378a78de
Summary:
The current serialization type we use for the graceful restart protocol only
allows us to send mount specific information in the non error case.
If we want to send general takeover information, like the order of the file
descriptors we send along with the takeover message, we are out of luck.
I am changing the final type we send during takeover to be a struct so that
we can add non-mount specific information. This allows us to create an explicit
order of our file descriptors and will let us send a dynamic number of general
file descriptors.
Currently, we have to send all the expected general file descriptors
(thift socket, lock file, mountd socket) in a fixed order.
And those file descriptors have to be valid because we cannot send an invalid
file descriptor with sendmsg.
When NFS is not enabled we do not initialize the mountd socket so we cannot
send a valid filedescriptor for the mountd socket (unless we sent some dummy
file descriptor which feels really hacky and not a good permanent solution).
To be able to fix the bug we have in graceful restart due to not having
a mountd socket we need to be able to send only a portion of the general
file descriptors.
When NFS is not enabled, we do not initialize the NFS Server. This causes
a seg fault during takeover because we try to shutdown the nfs server.
We should only shutdown the NFS Server when we initialized it. This
means we only pass the mountd server socket during takeover when NFS was enabled
when we started the server.
To allow use to enable the NFS Server across graceful restart we only
perform a takeover start when the mountd server socket was received during
takeover.
Reviewed By: xavierd
Differential Revision: D33755447
fbshipit-source-id: cc471c3051de8e41c21a4d1d8e42d93d5295e90e
Summary:
With Facebook having been renamed Meta Platforms, we need to change the license
headers.
Reviewed By: fanzeyi
Differential Revision: D33407812
fbshipit-source-id: b11bfbbf13a48873f0cea75f212cc7b07a68fb2e
Summary:
It feels a bit weird for getDaemonInfo().status's meaning to diverge
from fb303's getStatus(), but fb303 is being restructured so remove
the custom getStatus implementation.
Reviewed By: praihan
Differential Revision: D33176576
fbshipit-source-id: 5a7b7f3394b123b1073191508c2c0c1ebcda5bb5
Summary:
Now that graceful restart is supported, we can enable the integration tests!
These are all passing. The IO test is a little flakey, but the fuse version is
as well. So I haven't invested a lot of time in it.
One test is only compatible with version 1 and 2 of the takeover protocol
which is not compatible with takeover for nfs mounts. So that test will remain
disabled.
Reviewed By: xavierd
Differential Revision: D31738202
fbshipit-source-id: 0160310355138db59a8e671d3888b948e896eefa
Summary:
The changes in this diff makes the code to "run" and skip correctly according to our rules.
All integration tests are now runnable on Windows with mode/win
Differential Revision: D30143255
fbshipit-source-id: b2ddbff7268f182274b3755f4b28df6ac6cdeef4
Summary:
This applies the formatting changes from black v21.4b2 to all covered
projects in fbsource. Most changes are to single line docstrings, as black
will now remove leading and trailing whitespace to match PEP8. Any other
formatting changes are likely due to files that landed without formatting,
or files that previously triggered errors in black.
Any changes to code should be AST identical. Any test failures are likely
due to bad tests, or testing against the output of pyfmt.
Reviewed By: thatch
Differential Revision: D28204910
fbshipit-source-id: 804725bcd14f763e90c5ddff1d0418117c15809a
Summary: This just makes it more obvious _where_ `--force` should be passed.
Reviewed By: genevievehelsel
Differential Revision: D28119590
fbshipit-source-id: 1fbdb4428e9b89e7b66c959f874067485a91d534
Summary:
The Python 2-and-3 Thrift API is sort of deprecated and does not
handle binary data in `binary` fields. In advance of migrating to the
modern Python 3 API, remane eden.thrift to eden.thrift.legacy.
Reviewed By: fanzeyi
Differential Revision: D21889697
fbshipit-source-id: a745ee8977999acbfb383a4edebe81d8deb1734e
Summary:
This restructures the `ThriftServer server_` stopping logic in the graceful restart case. Instead of stopping the server, we stop listening then explicitly stop the server. This refactor should exhibit the same behavior as today since we block on `stopListening()`, but this will allow for simpler refactoring of thrift call queueing logic in the future (by removing the `stopListening()` call and replacing it with a `startQueueingAndWaitForOutstandingCallsToFinish()` type of call).
In terms of layout, this consolidates all the stopping code into one function `startTakeoverShutdown()`. This function now returns the `TakeoverData` itself instead returning a future that is fulfilled after stopping the Thrift server. The TakeoverServer still communicates via `takeoverComplete`, but now that future is stored during `startTakeoverShutdown()` instead of being the return value of `performTakeoverShutdown()`. This also generally eliminates `TakeoverPromise`.
Reviewed By: simpkins
Differential Revision: D20744151
fbshipit-source-id: 60f0c273b4f3889b53586d79efd95bfb27256e1b
Summary:
* This adds a `EdenServer::recover()` method to start back up on unsuccessful takeover data send.
* On an unsuccessful ping, filfill the `shutdownPromise` with a `TakeoverSendError` continaing the constructed `TakeoverData`. After this `recover` function is called, `takeoverPromise_` is reset, `takeoverShutdown` is set to `false`, and the `runningState_` is set to `RUNNING`.
With taking over from the returned `TakeoverData`, the user will not encounter `Transport not connected` errors on recovery.
* This adds a `EdenServer::closeStorage()` method to defer closing the `backingStore_` and `localStore_` until after our ready handshake is successful.
* This defers the shutdown of the `PrivHelper` until a successful ready handshake.
I also update the takeover documentation here with the new logic (and fix some formatting issues)
Reviewed By: simpkins
Differential Revision: D20433433
fbshipit-source-id: f59e660922674d281957e80aee5049735b901a2c
Summary:
For graceful restart takeovers, we would like to implement an additional handshake. This handshake will occur right after the takeover data is ready to be sent to the client, but before actually sending it. This is to make sure the old daemon can recover in case of the client not being responsive (the client replies back to the server, and if no response is recieved in 5 seconds, the server will recover).
There are a few cases here:
* **Server sends ping (two cases discussed below)**
I introduced a new ProtocolVersion. Daemons with this change will now have ProtocolVersion4. The Server checks the max version of the client, and if this version is ProtocolVersion4, we know the client can listen for pings. So we will send the ping. Otherwise, we don't send a ping. With this, we will only send pings if we know the client will be listening for one. The case in which a client isn't listening is if we adopt this change and we downgrade past the change.
* **Server does not send ping and Client knows to listen for ping**
This will be a common case immediately after this change. The client will parse the sent data and check if it matches the "ready" ping, and if it doesn't, the client assumes the server simply sent the Takeover Data.
* **Server does not sends ping and Client doesn't know to listen for ping**
This is the case before this change.
Reviewed By: simpkins
Differential Revision: D20290271
fbshipit-source-id: b68e4df6264fb071d770671a80e28c90ddb0d3f2
Summary:
D17135557 added a bunch of `pyre-fixme` comments to the EdenFS integration
tests for cases where Pyre cannot detect that some attributes are initialized
by the test case `setUp()` method.
It looks like Pyre's handling of `setUp()` is somewhat incorrect: it looks
like if a class has a `setUp()` method this currently suppresses all
uninitialized attribute errors (even if some attributes really are never
initialized). However, Pyre does not detect `setUp()` methods inherited from
parent classes, and always warns about uninitialized attributes in this case
even they are initialized.
Lets change these comments from `pyre-fixme` to `pyre-ignore` since this
appears to be an issue with Pyre rather than with this code. T62487924 is
open to track adding support for annotating custom constructor methods, which
might help here. I've also posted in Pyre Q&A about incorrect handling of
`setUp()` in derived classes.
Reviewed By: grievejia
Differential Revision: D19963118
fbshipit-source-id: 9fd13fc8665367e0780f871a5a0d9a8fe50cc687
Summary:
Two bugs conspired to cause edenfs after a graceful restart to think
the kernel supported FUSE_NO_OPENDIR_SUPPORT when it didn't: the
connection info struct wasn't zeroed, and FUSE connection capabilities
weren't properly mirrored into the Dispatcher upon graceful
restart. Fix both and add an integration test.
Reviewed By: simpkins
Differential Revision: D18903761
fbshipit-source-id: 23f4db3e240ee7d035f707820072c606a45f1138
Summary:
Update the copyright & license headers in Python files to reflect the
relicensing to GPLv2
Reviewed By: wez
Differential Revision: D15487088
fbshipit-source-id: 9f2138dff41048d2c35f15e09a04ae5a9c9c80dd
Summary:
If TreeInode::startLoadingInode() is in progress, and EdenServer::startTakeoverShutdown() is called, edenfs can deadlock:
1. Thread A: A FUSE request calls TreeInode::readdir() -> TreeInode::prefetch() -> TreeInode::startLoadingInode() on the children TreeInode-s -> RocksDbLocalStore::getFuture().
2. Thread B: A takeover request calls EdenServer::performTakeoverShutdown() -> InodeMap::shutdown().
3. Thread C: RocksDbLocalStore::getFuture() (called in step 1) completes -> TreeInode::inodeLoadComplete(). (The inodeLoadComplete continuation was registered by TreeInode::registerInodeLoadComplete().)
4. Thread C: After TreeInode::inodeLoadComplete() returns, the TreeInode's InodePtr is destructed, dropping the reference count to 0.
5. Thread C: InodeMap::onInodeUnreferenced() -> InodeMap::shutdownComplete() -> EdenMount::shutdown() (called in step 2) completes -> EdenServer::performTakeoverShutdown().
6. Thread C: EdenServer::performTakeoverShutdown() -> localStore_.reset() -> RocksDbLocalStore::~RocksDbLocalStore().
7. Thread C: RocksDbLocalStore::~RocksDbLocalStore() signals the thread pool to exit and waits for the pool's threads to exit. Because thread C is one of the threads managed by RocksDbLocalStore's thread pool, the signal is never handled and RocksDbLocalStore::~RocksDbLocalStore() never finishes.
Fix this deadlock by executing EdenServer::shutdown()'s callback (in EdenServer::performTakeoverShutdown()) on a different thread.
Reviewed By: simpkins
Differential Revision: D14337058
fbshipit-source-id: 1d63b4e7d8f5103a2dde31e329150bf763be3db7
Summary:
Update most of the `eden/cli/config.py` to use `Path` instead of `str` where
appropriate. This also updates several of the APIs in `util.py` that were
affected as well.
Reviewed By: chadaustin
Differential Revision: D14356543
fbshipit-source-id: a8f6d15b8870bf689eeb78f9fc0e9a0c65c97218
Summary:
When graceful restart was first implemented we forgot to update the
lock file with the new pid, resulting in occasional unexpected output
from tools like eden doctor.
Reviewed By: simpkins
Differential Revision: D13744411
fbshipit-source-id: cdc758ed6ac1201fd2ff3e9d7805bb5ab6f83e8a
Summary:
Add type annotations for class member variables. The pyre type checker has
some limited automatic type detection for member variables set in
`__init__()`, but in general it expects member variables to be explicitly
declared at the top-level of the class.
Reviewed By: strager
Differential Revision: D13051092
fbshipit-source-id: 080259ab3f422ffae2b908ed610062237105ccbe
Summary:
The thrift APIs accept path names and commit IDs as binary data (python bytes)
rather than unicode strings. Our python code got this wrong in several
locations. It looks like mypy didn't previously flag this since mypy doesn't
actually figure out the correct type for the thrift `client` object, and
seemed to just be largely ignoring it. I plan to update the code so that mypy
can figure out out the client type correctly. Fixing these type errors is
required to make sure we won't get type errors once that is changed.
This just simply uses `encode("utf-8")` for now. In the future the path
arguments should be converted to `pathlib.Path`, which will do a slightly
smarter conversion, and avoid errors on non-UTF-8 binary data. In the
meantime, I believe that just using `encode("utf-8")` preserves what the
thrift code was doing implicitly before, and does not make handling of
non-UTF-8 data any worse than it was before.
Reviewed By: strager
Differential Revision: D13051094
fbshipit-source-id: 94cb62f3dd78b8e854a72a392fe8fdfad5ffd4cb
Summary:
I kept running into issues trying to get graceful restart and
flush_cache to work together in the hg integration suite, so add a
test to ensure flush_cache succeeds after a graceful restart in the
main integration suite.
Also, to make the test's output easier to follow, add logging when
invalidating inodes.
Reviewed By: simpkins
Differential Revision: D8215961
fbshipit-source-id: 33db4292af3969ae23940c3027ba513ed20c53fb
Summary: Mostly empty lines removed and added. A few bugfixes on excessive line splitting.
Reviewed By: cooperlees
Differential Revision: D8198776
fbshipit-source-id: 4361faf4a2b9347d57fb6e1342c494575f2beb67
Summary:
Once a mount point has been unmounted we no longer need to care about
outstanding FUSE reference counts--we can treat them as if they are all zero.
This updates EdenMount to tell the InodeMap when the mount point is unloaded,
and changes InodeMap::unloadInode() to make use of this information when
deciding if it needs to remember the inode information.
Previously InodeMap would save information for inodes with outstanding FUSE
reference counts. Writing all of this state to the overlay could take a
non-trivial amount of time.
Reviewed By: chadaustin
Differential Revision: D7555998
fbshipit-source-id: 0896f867ce850ab3e61c262776d536de003685ff
Summary:
We already had type annotations on most of the `hg` integration tests. This
adds them for the top-level (non-source-control-specific) tests.
typeseverywhere
Reviewed By: wez
Differential Revision: D7459281
fbshipit-source-id: 41266b232ded510d6b63dd3e62c272a0cd6a0e1a
Summary:
Update the eden_repo_test decorator so that it no longer automatically adds
`EdenRepoTestBase` as a parent class. Individual test classes still specify
`EdenRepoTest` as their parent now.
This enables `mypy` to correctly figure out that the individual test classes
derive from `unittest.TestCase`.
This basically does the same thing as D6268258 for the top-level integration
tests.
Reviewed By: wez
Differential Revision: D7459280
fbshipit-source-id: 5d18bd241dad77d55541ac3fa1d169496ffe7003
Summary:
Fix FuseChannel::processSession() to always process all FUSE requests that it
reads. Previously it checked to see if it should stop immediately after
reading FUSE request. It was possible for the old process to successfully read
a FUSE request, see that it was supposed to stop, and then exit this worker
thread without ever processing this FUSE request. This would cause the client
that sent the request to hang indefinitely, since no response would ever be
sent.
Reviewed By: wez
Differential Revision: D7436867
fbshipit-source-id: c58c2f6c49102fa6b66ac83fc1639595a5277ce0
Summary:
Add a new integration test that performs a graceful restart after invoking the
getScmStatusBetweenRevisions() thrift call.
Prior to D7341609 this would cause edenfs to crash on shutdown with the error
"!!BUG!! After InodeMap::shutdown() finished, 2 inodes still loaded; they must
all (except the root) have been unloaded for this to succeed!"
Before D7341609 the `EdenMount::diffRevisions()` created a new temporary inode
tree solely to perform a diff. This resulted in multiple root inodes that all
pointed to the same EdenMount, but the EdenMount didn't know about any of these
alternate root inodes. These temporary inode trees never got destroyed,
causing this error on shutdown.
Reviewed By: chadaustin, wez
Differential Revision: D7333005
fbshipit-source-id: 8406d2e2ceb00264050b0aceec583baae2da69ec
Summary:
This makes the default `memory` for speed and minimal
flakiness, but allows a test to select a different engine where
appropriate (eg: restart and remount tests).
Reviewed By: chadaustin
Differential Revision: D6944207
fbshipit-source-id: 1fb11387beda02d059a796dad5a42d56ddcf6e88
Summary:
To me, restart implied shutting down the daemon and restarting it
again. Perhaps instead of `eden daemon --takeover` we should have
`eden restart`. But if people typed `eden restart` I imagine they're
trying to debug a problem, so that's probably not the right verb.
Reviewed By: wez
Differential Revision: D6929166
fbshipit-source-id: d568a1940d67f755e4c3656098c58fc81e0a3156