sapling/eden/docs/Rename.md
Adam Simpkins 8b4c984b28 always load affected inodes in rename(), and update lock ordering
Summary:
This updates the TreeInode::rename() code to handle concurrency better.
In particular:

- The code now ensures that both the source inode being renamed and destination
  inode (if it exists) are loaded.  This simplifies issues when an inode is
  being loaded at the same time a rename is in progress.  This ensures that any
  pending load is processed before the rename takes place.  (All promises for
  the load might not be fulfilled before the rename completes, but the relevant
  TreeInode and InodeMap data structures are updated before the rename occurs.)

  This does mean that the rename code potentially might have to retry several
  times if the inode it began loading is no longer the affected source or
  destination or child once the load completes.  However, this seems like a
  reasonable trade-off, compared to dealing with the complications that would
  arise with the load code having to handle renames occuring before load
  completion.

- The code now implements proper lock ordering, to avoid acquiring locks in
  conflicting orders that might cause deadlock with other threads also trying
  to acquire the same locks.  The InodeLocks.md document has been updated to
  clarify the TreeInode lock ordering requirements.

Reviewed By: wez

Differential Revision: D4493526

fbshipit-source-id: 627393fafad90eb551aea62be7762d59ed043abe
2017-02-03 18:35:03 -08:00

3.5 KiB

A few notes about renames:

Rename Lock

There is a mountpoint-wide rename lock that is held during any rename or unlink operation. An Inode's path cannot be changed without holding this lock.

However, we currently do not hold the rename lock when creating new files or directories. Therefore TreeEntry contents_.entries fields may change even when the rename lock is not held. (We could potentially revisit this choice later and require holding the rename lock even when creating new inodes.)

Renaming over directories

Rename supports renaming one directory over an existing directory, as long as the destination directory is empty. This means we must (a) be able to safely check if the directory is currently empty, and (b) be able to prevent new files or directories from being created inside the destination directory once the rename has started.

We currently achieve this by acquiring the destination directory's contents_ lock. This does mean that a rename operation may hold up to 3 TreeInode locks concurrently: the source directory, the destination parent directory, and the destination child directory. The InodeLocks document describes the lock ordering requirements for acquiring these 3 locks.

This also means that create() and mkdir() operations must check if the parent directory is unlinked after acquiring the parent directory's contents lock.

Handling unloaded children

When rename() (and unlink()/rmdir()) is invoked, the parent directories have already been loaded (typically having been identified via inode number). However, the affected children may not have been loaded yet, and are referred to by name.

We had a few choices for how to deal with this situation.

For now we have opted to always load the child entries in question before performing the rename. This is slightly tricky, as loading the child may take some time, and another rename or unlink operation may also be in progress, and may affect the child in question before our operation can take place. One option would have been to hold the rename lock while waiting on the children to be loaded. However, this would have blocked all other rename/unlink/rmdir operations for the duration of the load, which seems undesirable. Instead, we wait for the load to complete, then double check to confirm that the named entry that we desire is actually loaded. The original inode we loaded may have been renamed or unlinked, so we may find an unloaded entry or no entry at all. If we find an unloaded entry we have to repeat the load operation. We therefore may have to retry loading the requested children multiple times before we can make progress, but we should eventually succeed or fail. Once the children are loaded the rename itself is then fairly straightforward.

Another option would have been to allow the rename even though the requested children are not loaded. The main downside with this approach is that we still need to confirm if the destination child is an empty directory or not. This would have meant either loading the destination child inode anyway, or storing some extra data to track if an unloaded inode is an empty directory or not. This also makes the inode loading code more complicated, as an inode may be unlinked or renamed while it is already in the process of being loaded. When the load completes we would need to double-check which parent TreeInode the new entry needs to be inserted into. All-in-all this felt more complicated than simply always loading the affected children before performing rename/unlink/rmdir operations.