added clay architecture doc

Philip C Monk 2015-09-18 16:45:09 -04:00
parent 646fa7cf14
commit 0bfa034edb


@ -0,0 +1,415 @@
# clay
## high-level
clay is the primary filesystem for the arvo operating system,
which is the core of an urbit. The architecture of clay is
intrinsically connected with arvo, but we assume no knowledge of
either arvo or urbit. We will point out only those features of
arvo that are necessary for an understanding of clay, and we will
do so only when they arise.
The first relevant feature of arvo is that it is a deterministic
system where input and output are defined as a series of events
and effects. The state of arvo is simply a function of its event
log. None of the effects from an event are emitted until the
event is entered in the log and persisted, either to disk or
another trusted source of persistence, such as a Kafka cluster.
Consequently, arvo is a single-level store: everything in its
state is persistent.
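
As a rough illustration of "state is a function of the event log", here is a minimal Python sketch. The event shapes (`"put"`, `"del"`) and the names `apply_event` and `replay` are made up for this example and are not part of arvo; the point is only that replaying the same log always rebuilds the same state.

```python
from functools import reduce

def apply_event(state, event):
    """Pure transition: return the new state produced by one event."""
    kind, key, value = event
    new_state = dict(state)  # never mutate the previous state
    if kind == "put":
        new_state[key] = value
    elif kind == "del":
        new_state.pop(key, None)
    return new_state

def replay(event_log):
    """The state is a function of the log: fold the events from an empty state."""
    return reduce(apply_event, event_log, {})

# Hypothetical log; replaying it after a "reboot" yields the same state.
log = [("put", "a", 1), ("put", "b", 2), ("del", "a", None)]
state = replay(log)
```
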
In a more traditional OS, everything in RAM can be erased at any
time by power failure, and is always erased on reboot. Thus, a
primary purpose of a filesystem is to ensure files persist across
power failures and reboots. In arvo, both power failures and
reboots are special cases of suspending computation, which is
done safely since our event log is already persistent. Therefore,
clay is not needed in arvo for persistence. Why, then, do we have a
filesystem? There are two answers to this question.
First, clay provides a filesystem tree, which is a convenient
user interface for some applications. Unix has the useful concept
of virtual filesystems, which are used for everything from direct
access to devices, to random number generators, to the /proc
tree. It is easy and intuitive to read from and write to a
filesystem tree.
Second, clay has a distributed revision control system baked into
it. Traditional filesystems are not revision controlled, so
userspace software -- such as git -- is written on top of them to
do so. clay natively provides the same functionality as modern
DVCSes, and more.
clay has two other unique properties that we'll cover later on:
it supports typed data and is referentially transparent.
### Revision Control
Every urbit has one or more "desks", which are independently
revision-controlled branches. Each desk contains its own mark
definitions, apps, doc, and so forth.
Traditionally, an urbit has at least a base and a home desk. The
base desk has all the system software from the distribution. The
home desk is a fork of base with all the stuff specific to the
user of the urbit.
A desk is a series of numbered commits, the most recent of which
represents the current state of the desk. A commit is composed of
(1) an absolute time when it was created, (2) a list of zero or
more parents, and (3) a map from paths to data.
Most commits have exactly one parent, but the initial commit on a
desk may have zero parents, and merge commits have more than one
parent.
The non-meta data is stored in the map of paths to data. It's
worth noting that no constraints are put on this map, so, for
example, both /a/b and /a/b/c could have data. This is impossible
in a traditional Unix filesystem since it means that /a/b is both
a file and a directory. Conventionally, the final element in the
path is its mark -- much like a filename extension in Unix. Thus,
/doc/readme.md in Unix is stored as /doc/readme/md in urbit.
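
To make the shape of a commit concrete, here is a small Python sketch. The `Commit` class and the `unix_to_clay_path` helper are illustrative names only, and the file values are placeholder strings; the sketch just shows the three components of a commit, the unconstrained path map (both /a/b and /a/b/c hold data), and the convention of putting the mark last.

```python
from dataclasses import dataclass, field

@dataclass
class Commit:
    time: float                                   # absolute creation time
    parents: list = field(default_factory=list)   # zero or more parent commits
    files: dict = field(default_factory=dict)     # path (tuple of names) -> data

def unix_to_clay_path(unix_path):
    """Map '/doc/readme.md' to ('doc', 'readme', 'md'); the mark comes last."""
    *dirs, name = unix_path.strip("/").split("/")
    stem, dot, mark = name.rpartition(".")
    if not dot:
        raise ValueError("this sketch assumes the Unix name has an extension")
    return (*dirs, stem, mark)

# /a/b and /a/b/c may both hold data: the map has no file/directory split.
first = Commit(time=1.0, files={("a", "b"): "file data", ("a", "b", "c"): "more data"})
second = Commit(time=2.0, parents=[first],
                files={**first.files, unix_to_clay_path("/doc/readme.md"): "# hi"})
```
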
The data is not stored directly in the map; rather, a hash of the
data is stored, and we maintain a master blob store. Thus, if the
same data is referred to in multiple commits (as, for example,
when a file doesn't change between commits), only the hash is
duplicated.
In the master blob store, we either store the data directly, or
else we store a diff against another blob. The hash is dependent
only on the data within and not on whether or not it's stored
directly, so we may on occasion rearrange the contents of the
blob store for performance reasons.
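
The following Python sketch shows one way such a content-addressed blob store could behave. It is a toy under stated assumptions: SHA-256 stands in for whatever hash clay uses, and the "diff against another blob" is reduced to an appended suffix rather than a real diff. The `BlobStore` class and its method names are invented for the example.

```python
import hashlib

class BlobStore:
    def __init__(self):
        # hash -> ("direct", data) or ("delta", base_hash, suffix)
        self._blobs = {}

    @staticmethod
    def hash_of(data):
        return hashlib.sha256(data).hexdigest()

    def put(self, data, base_hash=None):
        key = self.hash_of(data)  # depends on the data only, not on the encoding
        if key in self._blobs:
            return key
        if base_hash is not None and data.startswith(self.get(base_hash)):
            base = self.get(base_hash)
            self._blobs[key] = ("delta", base_hash, data[len(base):])
        else:
            self._blobs[key] = ("direct", data)
        return key

    def get(self, key):
        entry = self._blobs[key]
        if entry[0] == "direct":
            return entry[1]
        _, base_hash, suffix = entry
        return self.get(base_hash) + suffix

    def flatten(self, key):
        """Re-store a blob directly; its hash, and every commit, is unaffected."""
        self._blobs[key] = ("direct", self.get(key))

store = BlobStore()
h1 = store.put(b"hello\n")
h2 = store.put(b"hello\nworld\n", base_hash=h1)
# Two commits referring to the same data duplicate only the hash.
commit_a = {("doc", "readme", "md"): h1}
commit_b = {("doc", "readme", "md"): h2, ("doc", "copy", "md"): h1}
```

Because commits hold only hashes, `flatten` can change how a blob is stored without touching any commit.
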
Recall that a desk is a series of numbered commits. Not every
commit in a desk must be numbered. For example, if the base desk
has had 50 commits since home was forked from it, then a merge
from base to home will only add a single revision number to home,
although the full commit history will be accessible by traversing
the parentage of the individual commits.
We do guarantee that the first commit is numbered 1, commits are
numbered consecutively after that (i.e. there are no "holes"),
the topmost commit is always numbered, and every numbered commit
is an ancestor of every later numbered commit.
There are three ways to refer to particular commits in the
revision history. Firstly, one can use the revision number.
Secondly, one can use any absolute time between one numbered
commit and the next (inclusive of the first, exclusive of the
second). Thirdly, every desk has a map of labels to revision
numbers. These labels may be used to refer to specific commits.
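
The three ways of naming a commit can be sketched as a single lookup. This Python example is illustrative only: the `Desk` class, the `("number" | "time" | "label", value)` tuples, and the use of plain numbers for times are assumptions. It also simplifies one case, treating any time at or after the latest commit as the latest revision rather than modelling requests for the future.

```python
from bisect import bisect_right

class Desk:
    def __init__(self):
        self.times = []   # times[i] is the creation time of revision i + 1
        self.labels = {}  # label -> revision number

    def commit(self, time):
        """Append a numbered commit; numbers are consecutive, starting at 1."""
        if self.times and time <= self.times[-1]:
            raise ValueError("commit times must increase")
        self.times.append(time)
        return len(self.times)

    def resolve(self, case):
        """Turn a revision number, time, or label into a revision number."""
        kind, value = case
        if kind == "number":
            revision = value
        elif kind == "label":
            revision = self.labels[value]
        elif kind == "time":
            # Inclusive of a commit's own time, exclusive of the next one's.
            revision = bisect_right(self.times, value)
        else:
            raise ValueError(f"unknown case kind: {kind}")
        if not 1 <= revision <= len(self.times):
            raise LookupError(f"no such revision: {case}")
        return revision

desk = Desk()
for t in (10, 20, 30):
    desk.commit(t)
desk.labels["release"] = 2
```
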
Additionally, clay is a global filesystem, so data on other urbits
is easily accessible the same way as data on our local urbit. In
general, the path to a particular revision of a desk is
/~urbit-name/desk-name/revision. Thus, to get /try/readme/md
from revision 5 of the home desk on ~sampel-sipnym, we refer to
/~sampel-sipnym/home/5/try/readme/md. Clay's namespace is thus
global and referentially transparent.
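
As a small illustration of the path layout just described, the following Python sketch splits a global path into its parts. The function name and the returned dictionary are invented for the example, and the revision slot is kept as a string, since it could equally hold a number, a time, or a label.

```python
def parse_clay_path(path):
    """Split /~urbit-name/desk-name/revision/... into its components."""
    parts = path.strip("/").split("/")
    if len(parts) < 3 or not parts[0].startswith("~"):
        raise ValueError(f"not a global clay path: {path}")
    ship, desk, case, *rest = parts
    return {"ship": ship, "desk": desk, "case": case, "path": tuple(rest)}

parsed = parse_clay_path("/~sampel-sipnym/home/5/try/readme/md")
```

Since the ship, desk, and revision are all part of the name, the same full path always denotes the same data, which is what makes caching results from other urbits safe.
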
XXX reactivity here?
### A Typed Filesystem
Since clay is a general filesystem for storing data of arbitrary
types, in order to revision control correctly it needs to be
aware of types all the way through. Traditional revision control
does an excellent job of handling source code, so for source code
we act very similarly to traditional revision control. The
challenge is to handle other data similarly well.
For example, modern VCSs generally support "binary files", which
are files for which the standard textual diffing, patching, and
merging algorithms are not helpful. A "diff" of two binary files
is just a pair of the files, "patching" this diff is just
replacing the old file with the new one, and "merging"
non-identical diffs is always a conflict, which can't even be
helpfully annotated. Without knowing anything about the structure
of a blob of data, this is the best we can do.
Often, though, "binary" files have some internal structure, and
it is possible to create diff, patch, and merge algorithms that
take advantage of this structure. An image may be the result of a
base image with some set of operations applied. With algorithms
aware of this set of operations, not only can revision control
software save space by not having to save every revision of the
image individually, but these transformations can also be made on
parallel branches and merged at will.
Suppose Alice is tasked with touching up a picture, improving the
color balance, adjusting the contrast, and so forth, while Bob
has the job of cropping the picture to fit where it's needed and
adding textual overlay. Without type-aware revision control,
these changes must be made serially, requiring Alice and Bob to
explicitly coordinate their efforts. With type-aware revision
control, these operations may be performed in parallel, and then
the two changesets can be merged programmatically.
Of course, even some kinds of text files may be better served by
diff, patch, and merge algorithms aware of the structure of the
files. Consider a file containing a pretty-printed JSON object.
Small changes in the JSON object may result in rather significant
changes in how the object is pretty-printed (for example, by
adding an indentation level or splitting a single line into
multiple lines).
A text file wrapped at 80 columns also reacts suboptimally with
unadorned Hunt-McIlroy diffs. A single word inserted in a
paragraph may push the final word or two of the line onto the
next line, and the entire rest of the paragraph may be flagged as
a change. Two diffs consisting of a single added word to
different sentences may be flagged as a conflict. In general,
prose should be diffed by sentence, not by line.
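
The re-wrapping problem is easy to reproduce. The Python sketch below compares a line-based and a sentence-based view of the same one-word edit. Note that it uses the standard library's `difflib.SequenceMatcher`, which is not the Hunt-McIlroy algorithm, and that the sentence splitter is a naive regular expression; both are stand-ins chosen only to show the effect.

```python
import difflib
import re
import textwrap

def changed_units(old_units, new_units):
    """Count the units that fall outside the matching blocks of a diff."""
    matcher = difflib.SequenceMatcher(a=old_units, b=new_units, autojunk=False)
    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changed += max(i2 - i1, j2 - j1)
    return changed

def sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?'."""
    return re.split(r"(?<=[.!?])\s+", " ".join(text.split()))

old = ("The quick brown fox jumps over the lazy dog. "
       "It then runs far away into the dark forest. "
       "Nobody ever sees it again after that day.")
new = old.replace("quick", "quick and clever")  # a small insertion

# Wrapping at a fixed width pushes words onto later lines.
line_changes = changed_units(textwrap.wrap(old, 40), textwrap.wrap(new, 40))
sentence_changes = changed_units(sentences(old), sentences(new))
```

Here `sentence_changes` counts a single changed sentence, while `line_changes` is larger because the re-wrap disturbs the lines that follow the edit.
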
As far as the author is aware, clay is the first generalized,
type-aware revision control system. We'll go into the workings
of this system in some detail.
### Marks
Central to a typed filesystem is the idea of types. In clay, we
call these "marks". A mark is a file that defines a type,
conversion routines to and from the mark, and diff, patch, and
merge routines.
For example, a `%txt` mark may be a list of lines of text, and it
may include conversions to `%mime` to allow it to be serialized
and sent to a browser or to the Unix filesystem. It will also
include Hunt-McIlroy diff, patch, and merge algorithms.
A `%json` mark would be defined as a json object in the code, and
it would have a parser to convert from `%txt` and a printer to
convert back to `%txt`. The diff, patch, and merge algorithms are
fairly straightforward for json, though they're very different
from the text ones.
More formally, a mark is a core with three arms, `++grab`,
`++grow`, and `++grad`. In `++grab` is a series of functions to
convert from other marks to the given mark. In `++grow` is a
series of functions to convert from the given mark to other
marks. In `++grad` is `++diff`, `++pact`, `++join`, and `++mash`.
The types are as follows, in an informal pseudocode:

    ++ grab:
      ++ mime: <mime> -> <mark-type>
      ++ txt: <txt> -> <mark-type>
      ...
    ++ grow:
      ++ mime: <mark-type> -> <mime>
      ++ txt: <mark-type> -> <txt>
      ...
    ++ grad:
      ++ diff: (<mark-type>, <mark-type>) -> <diff-type>
      ++ pact: (<mark-type>, <diff-type>) -> <mark-type>
      ++ join: (<diff-type>, <diff-type>) -> <diff-type> or NULL
      ++ mash: (<diff-type>, <diff-type>) -> <diff-type>

These types are basically what you would expect. Not every mark
has each of these functions defined -- all of them are optional
in the general case.
In general, for a particular mark, the `++grab` and `++grow` entries
(if they exist) should be inverses of each other.
In `++grad`, `++diff` takes two instances of a mark and produces
a diff of them. `++pact` takes an instance of a mark and patches
it with the given diff. `++join` takes two diffs and attempts to
merge them into a single diff. If there are conflicts, it
produces null. `++mash` takes two diffs and forces a merge,
annotating any conflicts.
In general, if `++diff` called with A and B produces diff D, then
`++pact` called with A and D should produce B. Also, if `++join`
of two diffs does not produce null, then `++mash` of the same
diffs should produce the same result.
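
These laws can be checked on a toy mark. The Python sketch below is not how any real clay mark is written; it invents a mark whose data is a flat dictionary and whose diff is a pair of "keys to put" and "keys that are gone", purely to show `diff`, `pact`, `join`, and `mash` satisfying the relationships described above.

```python
def diff(old, new):
    """Describe how to get from old to new."""
    put = {k: v for k, v in new.items() if k not in old or old[k] != v}
    gone = {k for k in old if k not in new}
    return {"put": put, "gone": gone}

def pact(old, delta):
    """Apply a diff to an instance of the mark."""
    new = {k: v for k, v in old.items() if k not in delta["gone"]}
    new.update(delta["put"])
    return new

def _change(delta, key):
    return ("gone",) if key in delta["gone"] else ("put", delta["put"][key])

def _conflicts(d1, d2):
    touched = (set(d1["put"]) | d1["gone"]) & (set(d2["put"]) | d2["gone"])
    return {k for k in touched if _change(d1, k) != _change(d2, k)}

def join(d1, d2):
    """Merge two diffs, or produce None if they conflict."""
    if _conflicts(d1, d2):
        return None
    return {"put": {**d1["put"], **d2["put"]}, "gone": d1["gone"] | d2["gone"]}

def mash(d1, d2):
    """Force a merge, annotating each conflicting key with both changes."""
    bad = _conflicts(d1, d2)
    put = {**d1["put"], **d2["put"]}
    for key in bad:
        put[key] = {"conflict": [_change(d1, key), _change(d2, key)]}
    return {"put": put, "gone": (d1["gone"] | d2["gone"]) - bad}

base = {"a": 1, "b": 2}
alice = {"a": 1, "b": 3}          # changes b
bob = {"a": 1, "b": 2, "c": 4}    # adds c
carol = {"a": 1, "b": 9}          # changes b differently from alice
da, db, dc = diff(base, alice), diff(base, bob), diff(base, carol)
```

With these values, `pact(base, da)` gives back `alice`, `join(da, db)` agrees with `mash(da, db)`, and `join(da, dc)` produces `None` because both diffs change `b`.
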
Alternately, instead of `++diff`, `++pact`, `++join`, and
`++mash`, a mark can provide the same functionality by defining
`++sted` to be the name of another mark to which we wish to
delegate the revision control responsibilities. Then, before
running any of those functions, clay will convert to the other
mark, and convert back afterward. For example, the `%hoon` mark
is revision-controlled in the same way as `%txt`, so its `++grad`
is simply `++sted %txt`. Of course, `++txt` must be defined in
`++grow` and `++grab` as well.
Every file in clay has a mark, and that mark must have a
fully-functioning `++grad`. Marks are used for more than just
clay, and other marks don't need a `++grad`, but if a piece of
data is to be saved to clay, we must know how to revision-control
it.
Additionally, if a file is to be synced out to unix, then it must
have conversion routines to and from the `%mime` mark.
## Using clay
### Reading and Subscribing
When reading from Clay, there are three types of requests. A
`%sing` request asks for data at a single revision. A `%next`
request asks to be notified the next time there's a change to
a given file. A `%many` request asks to be notified on every
change in a desk for a range of changes.
For `%sing` and `%next`, there are generally three things to be
queried. A `%u` request simply checks for the existence of a
file at a path. A `%x` request gets the data in the file at a
path. A `%y` request gets a hash of the data in the file at the
path combined with all its children and their data. Thus, `%y`
of a node changes if it or any of its children change.
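
The three kinds of query can be sketched over a single revision's path map. In this Python example the function names are invented, paths are tuples, and SHA-256 over a `repr` of the entries is an arbitrary stand-in for however clay actually hashes a subtree.

```python
import hashlib

def query_u(files, path):
    """%u: does a file exist at this path?"""
    return path in files

def query_x(files, path):
    """%x: the data in the file at this path (None if absent)."""
    return files.get(path)

def query_y(files, path):
    """%y: a hash over the file at this path and everything beneath it."""
    digest = hashlib.sha256()
    for candidate in sorted(files):
        if candidate[:len(path)] == path:  # the node itself or a descendant
            digest.update(repr((candidate, files[candidate])).encode())
    return digest.hexdigest()

files = {("a", "b"): "1", ("a", "b", "c"): "2", ("z",): "3"}
child_changed = {**files, ("a", "b", "c"): "changed"}
unrelated_changed = {**files, ("z",): "changed"}
```

Changing `/a/b/c` changes the `%y` result for `/a/b`, while changing `/z` leaves it alone.
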
A `%sing` request is fulfilled immediately if possible. If the
requested revision is in the future, or is on another ship for
which we don't have the result cached, we don't respond
immediately. If the requested revision is in the future, we wait
until the revision happens before we respond to the request. If
the request is for data on another ship, we pass on the request
to the other ship. In general, Clay subscriptions, like most
things in Urbit, aren't guaranteed to return immediately.
They'll return when they can, and they'll do so in a
referentially transparent manner.
A `%next` request checks the query at the given revision, and it
produces the result of the query the next time it changes, along
with the revision number when it changes. Thus, a `%next` of a
`%u` is triggered when a file is added or deleted, a `%next` of a
`%x` is triggered when a file is added, deleted, or changed, and
a `%next` of a `%y` is triggered when a file or any of its
children is added, deleted, or changed.
A `%many` request is triggered every time the given desk has a
new revision. Unlike a `%next`, a `%many` has both a start and
an end revision, after which it stops returning. For `%next`, a
single change is reported, and if the caller wishes to hear of
the next change, it must resubscribe. For `%many`, every revision
from the start to the end triggers a response. Since a `%many`
request doesn't ask for any particular data, there aren't `%u`,
`%x`, and `%y` versions for it.
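
The difference between `%next` and `%many` can be sketched over a list of revisions. This Python example is a simplification: it works on revisions that already exist instead of waiting for new ones, and the function names and the `exists`/`contents` query helpers are invented for illustration.

```python
def next_change(revisions, start, query):
    """Return (revision number, result) for the first change after `start`.

    revisions[i] holds the path -> data map of revision i + 1.
    """
    baseline = query(revisions[start - 1])
    for number in range(start + 1, len(revisions) + 1):
        result = query(revisions[number - 1])
        if result != baseline:
            return number, result
    return None  # no change yet; a real subscription would keep waiting

def many(revisions, start, end):
    """Yield every revision number from start to end that exists so far."""
    for number in range(start, min(end, len(revisions)) + 1):
        yield number

readme = ("doc", "readme", "md")
other = ("doc", "other", "md")
revisions = [
    {readme: "v1"},              # revision 1
    {readme: "v1", other: "x"},  # revision 2: unrelated file added
    {readme: "v2", other: "x"},  # revision 3: readme changed
    {other: "x"},                # revision 4: readme deleted
]

def exists(files):    # plays the role of a %u query
    return readme in files

def contents(files):  # plays the role of a %x query
    return files.get(readme)
```

A `%next` on `contents` from revision 1 skips revision 2, where the readme is untouched, and reports revision 3; a `%next` on `exists` does not fire until the deletion in revision 4.
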
### Unix sync
One of the primary functions of clay is as a convenient user
interface. While tools exist to use clay from within urbit, it's
often useful to be able to treat clay like any other filesystem
from the Unix perspective -- to "mount" it, as it were.
From urbit, you can run `|mount /path/to/directory %mount-point`,
and this will mount the given clay directory to the mount-point
directory in Unix. Every file is converted to `%mime` before it's
written to Unix, and converted back when read from Unix. The
entire directory is watched (a la Dropbox), and every change is
auto-committed to clay.
### Merging
Merging is a fundamental operation for a distributed revision
control system. At their root, clay's merges are similar to
git's, but with some additions to accommodate typed data. There
are seven different merge strategies.
Throughout our discussion, we'll say that the merge is from
Alice's desk to Bob's. Recall that a commit is a date (for all
new commits this will be the current date), a list of parents,
and the data itself.
A `%init` merge should be used iff it's the first commit to a
desk. The head of Alice's desk is used as the number 1 commit to
Bob's desk. Obviously, the ancestry remains intact through
traversing the parentage of the commit even though previous
commits are not numbered for Bob's desk.
A `%this` merge means to keep what's in Bob's desk, but join the
ancestry. Thus, the new commit has the head of each desk as
parents, but the data is exactly what's in Bob's desk. For those
following along in git, this is the 'ours' merge strategy, not
the '--ours' option to the 'recursive' merge strategy. In other
words, even if Alice makes a change that does not conflict with
Bob, we throw it away. It's Bob's way or the highway.
A `%that` merge means to take what's in Alice's desk, but join
the ancestry. This is the reverse of `%this`.
A `%fine` merge is a "fast-forward" merge. This succeeds iff one
head is in the ancestry of the other. In this case, we use the
descendant as our new head.
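
A fast-forward check comes down to an ancestry test. In this Python sketch, commits are plain string identifiers and `parents` is a hypothetical map from each commit to its list of parents; neither reflects clay's actual data structures.

```python
def ancestors(commit, parents):
    """All commits reachable through parent links, including `commit` itself."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen

def fine_merge(alice_head, bob_head, parents):
    """Return the descendant head, or None if neither head contains the other."""
    if bob_head in ancestors(alice_head, parents):
        return alice_head
    if alice_head in ancestors(bob_head, parents):
        return bob_head
    return None

# r <- a <- b is a straight line; x forks off from r.
parents = {"r": [], "a": ["r"], "b": ["a"], "x": ["r"]}
```
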
For `%meet`, `%mate`, and `%meld` merges, we first find the most
recent common ancestor to use as our merge base. If we have no
common ancestors, then we fail. If we have more than one most
recent common ancestor, then we have a criss-cross situation,
which should be handled delicately. At present, we delicately
throw up our hands and give up, but something akin to git's
'recursive' strategy should be implemented in the future.
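
Finding the merge base can be sketched as follows. The Python below uses the same toy representation of history as a map from commit identifier to parent list, which is an assumption for the example, and it is written for clarity rather than efficiency.

```python
def ancestors(commit, parents):
    """All commits reachable through parent links, including `commit` itself."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen

def merge_bases(alice_head, bob_head, parents):
    """Common ancestors that are not ancestors of another common ancestor."""
    common = ancestors(alice_head, parents) & ancestors(bob_head, parents)
    return {c for c in common
            if not any(c != d and c in ancestors(d, parents) for d in common)}

def find_merge_base(alice_head, bob_head, parents):
    bases = merge_bases(alice_head, bob_head, parents)
    if not bases:
        raise ValueError("no common ancestor")
    if len(bases) > 1:
        raise ValueError("criss-cross merge: more than one merge base")
    return next(iter(bases))

simple = {"r": [], "a": ["r"], "b": ["r"]}
# a2 and b2 each merged the other side's earlier head: a criss-cross.
criss_cross = {"r": [], "a1": ["r"], "b1": ["r"],
               "a2": ["a1", "b1"], "b2": ["b1", "a1"]}
```
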
There's a functional inclusion ordering on `%fine`, `%meet`,
`%mate`, and `%meld` such that if an earlier strategy would have
succeeded, then every later strategy will produce the same
result. Put another way, every earlier strategy is the same as
every later strategy except with a restricted domain.
A `%meet` merge only succeeds if the changes from the merge base
to Alice's head (hereafter, "Alice's changes") are in different
files than Bob's changes. In this case, the parents are both
Alice's and Bob's heads, and the data is the merge base plus
Alice's changed files plus Bob's changed files.
A `%mate` merge attempts to merge changes to the same file when
both Alice and Bob change it. If the merge is clean, we use it;
otherwise, we fail. A merge between different types of changes --
for example, deleting a file vs changing it -- is always a
conflict. If we succeed, the parents are both Alice's and Bob's
heads, and the data is the merge base plus Alice's changed files
plus Bob's changed files plus the merged files.
A `%meld` merge will succeed even if there are conflicts. If
there are conflicts in a file, then we use the merge base's
version of that file, and we produce a set of files with
conflicts. The parents are both Alice's and Bob's heads, and the
data is the merge base plus Alice's changed files plus Bob's
changed files plus the successfully merged files plus the merge
base's version of the conflicting files.
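
The relationship between `%meet`, `%mate`, and `%meld` can be sketched at the level of whole files. In this Python example each desk state is a map from path to data, `merge_file` is a hypothetical callback standing in for the mark's diff and join routines, and `merge_lines` is a deliberately crude three-way merge used only to exercise it. Two choices are assumptions of the sketch: both sides deleting a file is treated as a clean deletion, and both sides adding the same path is treated as a conflict.

```python
def changes(base, head):
    """Map each changed path to its new data (None means the file was deleted)."""
    return {p: head.get(p) for p in set(base) | set(head) if base.get(p) != head.get(p)}

def merge(strategy, base, alice, bob, merge_file):
    """Return (data, conflicts), or None if the strategy fails."""
    alice_changes, bob_changes = changes(base, alice), changes(base, bob)
    both = set(alice_changes) & set(bob_changes)
    if strategy == "meet" and both:
        return None  # %meet only handles changes to different files
    conflicts = set()
    updates = {**alice_changes, **bob_changes}
    for path in both:
        ours, theirs = alice_changes[path], bob_changes[path]
        if ours is None and theirs is None:
            continue  # both sides deleted the file
        if ours is None or theirs is None or path not in base:
            merged = None  # e.g. delete vs change: always a conflict
        else:
            merged = merge_file(base[path], ours, theirs)
        if merged is None:
            conflicts.add(path)
            updates[path] = base.get(path)  # keep the merge base's version
        else:
            updates[path] = merged
    if conflicts and strategy == "mate":
        return None  # %mate requires every file to merge cleanly
    data = dict(base)
    for path, value in updates.items():
        if value is None:
            data.pop(path, None)
        else:
            data[path] = value
    return data, conflicts

def merge_lines(base, ours, theirs):
    """Toy three-way merge for equal-length tuples of lines."""
    if not len(base) == len(ours) == len(theirs):
        return None
    merged = []
    for b, o, t in zip(base, ours, theirs):
        if o != b and t != b and o != t:
            return None  # both sides changed the same line differently
        merged.append(o if o != b else t)
    return tuple(merged)

base = {("a", "txt"): ("1", "2"), ("b", "txt"): ("x",)}
alice = {("a", "txt"): ("1!", "2"), ("b", "txt"): ("x",)}
bob_other_file = {("a", "txt"): ("1", "2"), ("b", "txt"): ("y",)}
bob_same_file = {("a", "txt"): ("1", "2!"), ("b", "txt"): ("x",)}
bob_same_line = {("a", "txt"): ("1?", "2"), ("b", "txt"): ("x",)}
```

With `bob_other_file`, all three strategies agree. With `bob_same_file`, `%meet` fails but `%mate` merges the file. With `bob_same_line`, only `%meld` succeeds, keeping the merge base's version and reporting the path as a conflict.
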
That's the extent of the merge options in clay proper. In
userspace there's a final option `%auto`, which is the most
common. `%auto` checks to see if Bob's desk exists, and if it
doesn't we use a `%init` merge. Otherwise, we progressively try
`%fine`, `%meet`, and `%mate` until one succeeds.
If none succeed, we merge Bob's desk into a scratch desk. Then,
we merge Alice's desk into the scratch desk with the `%meld`
option to force the merge. For each file in the produced set of
conflicting files, we call the `++mash` function for the
appropriate mark, which annotates the conflicts if we know how.
Finally, we display a message to the user informing them of the
scratch desk's existence, which files have annotated conflicts,
and which files have unannotated conflicts. When the user has
resolved the conflicts, they can merge the scratch desk back into
Bob's desk. This will be a `%fine` merge since Bob's head is in
the ancestry of the scratch desk.
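
The order in which `%auto` tries strategies can be sketched in a few lines of Python. Here `attempt` is a hypothetical callback that runs one merge strategy and returns its result, or `None` on failure; the scratch desk and the `++mash` annotation step are not modelled.

```python
def auto_merge(bob_exists, attempt):
    """Pick a strategy in the order described for %auto.

    Returns (strategy used, result of that strategy).
    """
    if not bob_exists:
        return "init", attempt("init")
    for strategy in ("fine", "meet", "mate"):
        result = attempt(strategy)
        if result is not None:
            return strategy, result
    # Nothing merged cleanly: force the merge (into a scratch desk in the
    # full flow) and leave conflict resolution to the user.
    return "meld", attempt("meld")

tried = []

def fake_attempt(strategy):
    """Stand-in merge that only succeeds from %mate onward."""
    tried.append(strategy)
    return "merged" if strategy in ("init", "mate", "meld") else None

outcome = auto_merge(True, fake_attempt)
```
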
### Autosync
Tracking and staying in sync with another desk is another
fundamental operation. We call this "autosync". This doesn't mean
simply mirroring a desk, since that wouldn't allow local changes.
We simply want to apply changes as they are made upstream, as
long as there are no conflicts with local changes.
This is implemented by watching the other desk, and, when it has
changes, merging these changes into our desk with the usual merge
strategies.
Note that it's quite reasonable for two desks to be autosynced to
each other. This results in any change on one desk being mirrored
to the other and vice versa.
Additionally, it's fine to set up an autosync even if one desk,
the other desk, or both desks do not exist. The sync will be
activated when the upstream desk comes into existence and will
create the downstream desk if needed.