mirror of
https://github.com/ilyakooo0/urbit.git
synced 2024-12-01 11:33:41 +03:00
added clay architecture doc
parent
646fa7cf14
commit
0bfa034edb
415
pub/doc/arvo/clay/architecture.md
Normal file
# clay

## high-level

clay is the primary filesystem for the arvo operating system,
which is the core of an urbit. The architecture of clay is
intrinsically connected with arvo, but we assume no knowledge of
either arvo or urbit. We will point out only those features of
arvo that are necessary for an understanding of clay, and we will
do so only when they arise.

The first relevant feature of arvo is that it is a deterministic
system where input and output are defined as a series of events
and effects. The state of arvo is simply a function of its event
log. None of the effects from an event are emitted until the
event is entered in the log and persisted, either to disk or
another trusted source of persistence, such as a Kafka cluster.
Consequently, arvo is a single-level store: everything in its
state is persistent.

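The relationship between the event log, the state, and the effects can be illustrated with a toy model. This is a minimal sketch in Python, not arvo's implementation: `ToyArvo`, `step`, `poke`, and `replay` are hypothetical names, the state is just a running total, and appending to a list stands in for persisting an event.

```python
def step(state, event):
    # Pure transition: returns (new_state, effects) and performs no I/O.
    new_state = state + event
    return new_state, [f"total is now {new_state}"]


class ToyArvo:
    """Hypothetical stand-in: the state is a pure function of the event log."""

    def __init__(self):
        self.log = []    # stands in for the persisted event log
        self.state = 0

    def poke(self, event):
        new_state, effects = step(self.state, event)
        self.log.append(event)   # "persist" the event first
        self.state = new_state
        return effects           # effects are released only after logging

    def replay(self):
        # Rebuild the state from nothing but the log, as after a reboot.
        state = 0
        for event in self.log:
            state, _ = step(state, event)
        return state


arvo = ToyArvo()
arvo.poke(2)
arvo.poke(3)
```

Because `step` is deterministic, `arvo.replay()` always agrees with `arvo.state`, which is the sense in which nothing in the state can be lost.
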
In a more traditional OS, everything in RAM can be erased at any
time by power failure, and is always erased on reboot. Thus, a
primary purpose of a filesystem is to ensure files persist across
power failures and reboots. In arvo, both power failures and
reboots are special cases of suspending computation, which is
done safely since our event log is already persistent. Therefore,
clay is not needed in arvo for persistence. Why, then, do we have a
filesystem? There are two answers to this question.

First, clay provides a filesystem tree, which is a convenient
user interface for some applications. Unix has the useful concept
of virtual filesystems, which are used for everything from direct
access to devices, to random number generators, to the /proc
tree. It is easy and intuitive to read from and write to a
filesystem tree.

Second, clay has a distributed revision control system baked into
it. Traditional filesystems are not revision controlled, so
userspace software -- such as git -- is written on top of them to
do so. clay natively provides the same functionality as modern
DVCSes, and more.

clay has two other unique properties that we'll cover later on:
it supports typed data and is referentially transparent.

### Revision Control

Every urbit has one or more "desks", which are independently
revision-controlled branches. Each desk contains its own mark
definitions, apps, doc, and so forth.

Traditionally, an urbit has at least a base and a home desk. The
base desk has all the system software from the distribution. The
home desk is a fork of base with all the stuff specific to the
user of the urbit.

A desk is a series of numbered commits, the most recent of which
represents the current state of the desk. A commit is composed of
(1) an absolute time when it was created, (2) a list of zero or
more parents, and (3) a map from paths to data.

Most commits have exactly one parent, but the initial commit on a
desk may have zero parents, and merge commits have more than one
parent.

The non-meta data is stored in the map of paths to data. It's
worth noting that no constraints are put on this map, so, for
example, both /a/b and /a/b/c could have data. This is impossible
in a traditional Unix filesystem since it means that /a/b is both
a file and a directory. Conventionally, the final element in the
path is its mark -- much like a filename extension in Unix. Thus,
/doc/readme.md in Unix is stored as /doc/readme/md in urbit.

The data is not stored directly in the map; rather, a hash of the
data is stored, and we maintain a master blob store. Thus, if the
same data is referred to in multiple commits (as, for example,
when a file doesn't change between commits), only the hash is
duplicated.

In the master blob store, we either store the data directly, or
else we store a diff against another blob. The hash is dependent
only on the data within and not on whether or not it's stored
directly, so we may on occasion rearrange the contents of the
blob store for performance reasons.

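The commit structure and the hash-addressed blob store described above can be sketched as plain data structures. This is an illustrative Python model, not clay's actual representation: `Commit`, `BlobStore`, and `blob_hash` are hypothetical names, SHA-256 is an assumed hash (the text does not name one), and the sketch stores every blob directly rather than as a diff against another blob.

```python
import hashlib
from dataclasses import dataclass


def blob_hash(data: bytes) -> str:
    # Assumption: SHA-256. The hash depends only on the data itself.
    return hashlib.sha256(data).hexdigest()


@dataclass
class Commit:
    time: float      # (1) absolute time of creation
    parents: list    # (2) zero or more parent commits
    files: dict      # (3) map from path to blob hash, not to the data


class BlobStore:
    def __init__(self):
        self.blobs = {}

    def put(self, data: bytes) -> str:
        key = blob_hash(data)
        self.blobs[key] = data   # identical data always lands on one key
        return key

    def get(self, key: str) -> bytes:
        return self.blobs[key]


store = BlobStore()
readme = store.put(b"hello clay\n")
first = Commit(time=1.0, parents=[], files={"/doc/readme/md": readme})
second = Commit(
    time=2.0,
    parents=[first],
    files={
        "/doc/readme/md": store.put(b"hello clay\n"),  # unchanged file
        "/a/b": store.put(b"x"),      # /a/b and /a/b/c can both hold data
        "/a/b/c": store.put(b"y"),
    },
)
```

The unchanged readme appears in both commits, but the store holds its bytes only once; only the hash is repeated.
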
Recall that a desk is a series of numbered commits. Not every
commit in a desk must be numbered. For example, if the base desk
has had 50 commits since home was forked from it, then a merge
from base to home will only add a single revision number to home,
although the full commit history will be accessible by traversing
the parentage of the individual commits.

We do guarantee that the first commit is numbered 1, commits are
numbered consecutively after that (i.e. there are no "holes"),
the topmost commit is always numbered, and every numbered commit
is an ancestor of every later numbered commit.

There are three ways to refer to particular commits in the
revision history. Firstly, one can use the revision number.
Secondly, one can use any absolute time between one numbered
commit and the next (inclusive of the first, exclusive of the
second). Thirdly, every desk has a map of labels to revision
numbers. These labels may be used to refer to specific commits.

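The three ways of naming a commit can be made concrete with a small lookup model. The following Python sketch is an assumption-laden illustration: `Desk`, `by_number`, `by_time`, and `by_label` are hypothetical names, times are plain numbers, and a time at or after the newest commit simply resolves to the newest revision (requests for revisions that do not exist yet are not modeled here).

```python
from bisect import bisect_right


class Desk:
    """Hypothetical model of one desk's numbered history."""

    def __init__(self):
        self.times = []   # times[n - 1] is the creation time of revision n
        self.labels = {}  # label -> revision number

    def commit(self, time):
        # Revisions are numbered consecutively from 1, with increasing times.
        if self.times and time <= self.times[-1]:
            raise ValueError("commit times must increase")
        self.times.append(time)
        return len(self.times)

    def by_number(self, number):
        if not 1 <= number <= len(self.times):
            raise KeyError(number)
        return number

    def by_time(self, time):
        # Revision n answers for times in [times[n - 1], times[n]):
        # inclusive of its own time, exclusive of the next commit's time.
        number = bisect_right(self.times, time)
        if number == 0:
            raise KeyError(time)  # earlier than the first commit
        return number

    def by_label(self, label):
        return self.labels[label]


desk = Desk()
desk.commit(10)
desk.commit(20)
desk.labels["release"] = 1
```
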
Additionally, clay is a global filesystem, so data on other urbits
is easily accessible the same way as data on our local urbit. In
general, the path to a particular revision of a desk is
/~urbit-name/desk-name/revision. Thus, to get /try/readme/md
from revision 5 of the home desk on ~sampel-sipnym, we refer to
/~sampel-sipnym/home/5/try/readme/md. Clay's namespace is thus
global and referentially transparent.

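The shape of a global path can be shown by splitting one into its parts. This Python sketch only illustrates the layout given above; `GlobalPath` and `parse_global_path` are hypothetical names, and the revision is kept as a string because it could equally be a number, a time, or a label.

```python
from typing import NamedTuple


class GlobalPath(NamedTuple):
    ship: str      # e.g. "~sampel-sipnym"
    desk: str      # e.g. "home"
    revision: str  # revision number, time, or label, left unparsed here
    path: str      # path within that revision of the desk


def parse_global_path(text: str) -> GlobalPath:
    parts = text.strip("/").split("/")
    if len(parts) < 3 or not parts[0].startswith("~"):
        raise ValueError(f"not a global clay path: {text!r}")
    ship, desk, revision, *rest = parts
    return GlobalPath(ship, desk, revision, "/" + "/".join(rest))


beam = parse_global_path("/~sampel-sipnym/home/5/try/readme/md")
```
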
XXX reactivity here?

### A Typed Filesystem

Since clay is a general filesystem for storing data of arbitrary
types, in order to revision control correctly it needs to be
aware of types all the way through. Traditional revision control
does an excellent job of handling source code, so for source code
we act very similarly to traditional revision control. The
challenge is to handle other data similarly well.

For example, modern VCSs generally support "binary files", which
are files for which the standard textual diffing, patching, and
merging algorithms are not helpful. A "diff" of two binary files
is just a pair of the files, "patching" this diff is just
replacing the old file with the new one, and "merging"
non-identical diffs is always a conflict, which can't even be
helpfully annotated. Without knowing anything about the structure
of a blob of data, this is the best we can do.

Often, though, "binary" files have some internal structure, and
it is possible to create diff, patch, and merge algorithms that
take advantage of this structure. An image may be the result of a
base image with some set of operations applied. With algorithms
aware of this set of operations, not only can revision control
software save space by not having to save every revision of the
image individually, these transformations can be made on parallel
branches and merged at will.

Suppose Alice is tasked with touching up a picture, improving the
color balance, adjusting the contrast, and so forth, while Bob
has the job of cropping the picture to fit where it's needed and
adding textual overlay. Without type-aware revision control,
these changes must be made serially, requiring Alice and Bob to
explicitly coordinate their efforts. With type-aware revision
control, these operations may be performed in parallel, and then
the two changesets can be merged programmatically.

Of course, even some kinds of text files may be better served by
diff, patch, and merge algorithms aware of the structure of the
files. Consider a file containing a pretty-printed JSON object.
Small changes in the JSON object may result in rather significant
changes in how the object is pretty-printed (for example, by
adding an indentation level or splitting a single line into
multiple lines).

A text file wrapped at 80 columns also interacts poorly with
unadorned Hunt-McIlroy diffs. A single word inserted in a
paragraph may push the final word or two of the line onto the
next line, and the entire rest of the paragraph may be flagged as
a change. Two diffs consisting of a single added word to
different sentences may be flagged as a conflict. In general,
prose should be diffed by sentence, not by line.

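The effect of re-wrapping on a line-based diff can be seen in a small experiment. The Python sketch below uses the standard library's `difflib.SequenceMatcher` as a stand-in for a line diff (it is not the Hunt-McIlroy algorithm), wraps at 30 columns instead of 80 to keep the sample short, and uses a deliberately naive sentence splitter; `changed_units` and `sentences` are hypothetical helper names.

```python
import difflib
import re
import textwrap


def changed_units(old, new):
    """Count the units of `new` that a sequence diff reports as inserted or replaced."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return sum(j2 - j1 for tag, _, _, j1, j2 in matcher.get_opcodes() if tag != "equal")


def sentences(text):
    # Naive splitter: normalise whitespace, then break after ., ! or ?.
    return re.split(r"(?<=[.!?])\s+", " ".join(text.split()))


old = ("The quick brown fox jumps over the lazy dog. "
       "It then sleeps in the sun for a long while. Nobody disturbs it.")
new = old.replace("quick brown", "quick and very clever brown")

# Line view: both versions re-wrapped at 30 columns, as an editor might do.
line_changes = changed_units(textwrap.wrap(old, 30), textwrap.wrap(new, 30))
# Sentence view: the same edit, seen as a list of sentences.
sentence_changes = changed_units(sentences(old), sentences(new))
```

The edit touches one sentence, so the sentence view reports a single changed unit, while the line view reports several changed lines because the inserted words shift the wrapping of everything after them.
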
As far as the author is aware, clay is the first generalized,
type-aware revision control system. We'll go into the workings
of this system in some detail.

### Marks

Central to a typed filesystem is the idea of types. In clay, we
call these "marks". A mark is a file that defines a type,
conversion routines to and from the mark, and diff, patch, and
merge routines.

For example, a `%txt` mark may be a list of lines of text, and it
may include conversions to `%mime` to allow it to be serialized
and sent to a browser or to the unix filesystem. It will also
include Hunt-McIlroy diff, patch, and merge algorithms.

A `%json` mark would be defined as a json object in the code, and
it would have a parser to convert from `%txt` and a printer to
convert back to `%txt`. The diff, patch, and merge algorithms are
fairly straightforward for json, though they're very different
from the text ones.

More formally, a mark is a core with three arms, `++grab`,
`++grow`, and `++grad`. In `++grab` is a series of functions to
convert from other marks to the given mark. In `++grow` is a
series of functions to convert from the given mark to other
marks. In `++grad` is `++diff`, `++pact`, `++join`, and `++mash`.

The types are as follows, in an informal pseudocode:

    ++ grab:
      ++ mime: <mime> -> <mark-type>
      ++ txt: <txt> -> <mark-type>
      ...
    ++ grow:
      ++ mime: <mark-type> -> <mime>
      ++ txt: <mark-type> -> <txt>
      ...
    ++ grad:
      ++ diff: (<mark-type>, <mark-type>) -> <diff-type>
      ++ pact: (<mark-type>, <diff-type>) -> <mark-type>
      ++ join: (<diff-type>, <diff-type>) -> <diff-type> or NULL
      ++ mash: (<diff-type>, <diff-type>) -> <diff-type>

These types are basically what you would expect. Not every mark
has each of these functions defined -- all of them are optional
in the general case.

In general, for a particular mark, the `++grab` and `++grow` entries
(if they exist) should be inverses of each other.

In `++grad`, `++diff` takes two instances of a mark and produces
a diff of them. `++pact` takes an instance of a mark and patches
it with the given diff. `++join` takes two diffs and attempts to
merge them into a single diff. If there are conflicts, it
produces null. `++mash` takes two diffs and forces a merge,
annotating any conflicts.

In general, if `++diff` called with A and B produces diff D, then
`++pact` called with A and D should produce B. Also, if `++join`
of two diffs does not produce null, then `++mash` of the same
diffs should produce the same result.

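The two laws just stated can be checked against a toy mark. The following is a minimal Python sketch for a flat key-value object, not any real clay mark: the function names mirror the arms in the text, while the diff format (a dictionary of changed keys), the `DELETED` sentinel, and the shape of the conflict annotation are all assumptions made for illustration.

```python
DELETED = object()  # sentinel meaning "this key was removed"


def diff(old: dict, new: dict) -> dict:
    """Changed or added keys map to their new value; removed keys map to DELETED."""
    d = {k: v for k, v in new.items() if k not in old or old[k] != v}
    d.update({k: DELETED for k in old if k not in new})
    return d


def pact(old: dict, d: dict) -> dict:
    """Patch `old` with diff `d`."""
    new = dict(old)
    for key, value in d.items():
        if value is DELETED:
            new.pop(key, None)
        else:
            new[key] = value
    return new


def join(d1: dict, d2: dict):
    """Merge two diffs, or return None if they disagree about any key."""
    for key in d1.keys() & d2.keys():
        if d1[key] != d2[key]:
            return None
    return {**d1, **d2}


def mash(d1: dict, d2: dict) -> dict:
    """Force a merge, annotating keys on which the two diffs disagree."""
    merged = {**d1, **d2}
    for key in d1.keys() & d2.keys():
        if d1[key] != d2[key]:
            merged[key] = {"conflict": [d1[key], d2[key]]}
    return merged


a = {"name": "clay", "rev": 1}
b = {"name": "clay", "rev": 2, "typed": True}
bump = diff(a, {"name": "clay", "rev": 2})
flag = diff(a, {"name": "clay", "rev": 1, "typed": True})
other_bump = diff(a, {"name": "clay", "rev": 3})
```

Here `pact(a, diff(a, b))` gives back `b`, `join(bump, flag)` agrees with `mash(bump, flag)`, and `join(bump, other_bump)` is `None` because both diffs change `rev` to different values.
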
Alternatively, instead of `++diff`, `++pact`, `++join`, and
`++mash`, a mark can provide the same functionality by defining
`++sted` to be the name of another mark to which we wish to
delegate the revision control responsibilities. Then, before
running any of those functions, clay will convert to the other
mark, and convert back afterward. For example, the `%hoon` mark
is revision-controlled in the same way as `%txt`, so its `++grad`
is simply `++sted %txt`. Of course, `++txt` must be defined in
`++grow` and `++grab` as well.

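Delegation of this kind amounts to wrapping another mark's routines in a pair of conversions. The Python sketch below shows the idea for diff and patch only, under stated assumptions: `sted` here is a hypothetical helper rather than clay's mechanism, the text diff is built on `difflib` opcodes instead of Hunt-McIlroy, and a source file is modeled as a single string that converts to and from a list of lines.

```python
import difflib


def txt_diff(old, new):
    """Diff two lists of lines as (start, end, replacement_lines) edits."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [(i1, i2, new[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal"]


def txt_pact(old, edits):
    """Apply edits back to front so earlier indices stay valid."""
    out = list(old)
    for start, end, replacement in reversed(edits):
        out[start:end] = replacement
    return out


def sted(grow, grab, other_diff, other_pact):
    """Build diff and pact for a mark by delegating to another mark's routines."""
    def diff(old, new):
        return other_diff(grow(old), grow(new))

    def pact(old, edits):
        return grab(other_pact(grow(old), edits))

    return diff, pact


def source_to_txt(source):   # plays the role of a grow conversion to text
    return source.split("\n")


def txt_to_source(lines):    # plays the role of a grab conversion from text
    return "\n".join(lines)


source_diff, source_pact = sted(source_to_txt, txt_to_source, txt_diff, txt_pact)
old_source = "|=  a=@\n(add a 1)\n"
new_source = "|=  a=@\n(add a 2)\n"
```

The delegating mark never defines its own diff format; `source_pact(old_source, source_diff(old_source, new_source))` reproduces `new_source` purely through the text routines.
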
Every file in clay has a mark, and that mark must have a
fully-functioning `++grad`. Marks are used for more than just
clay, and other marks don't need a `++grad`, but if a piece of
data is to be saved to clay, we must know how to revision-control
it.

Additionally, if a file is to be synced out to unix, then it must
have conversion routines to and from the `%mime` mark.

## Using clay

### Reading and Subscribing

When reading from Clay, there are three types of requests. A
`%sing` request asks for data at a single revision. A `%next`
request asks to be notified the next time there's a change to
a given file. A `%many` request asks to be notified on every
change in a desk for a range of changes.

For `%sing` and `%next`, there are generally three things to be
queried. A `%u` request simply checks for the existence of a
file at a path. A `%x` request gets the data in the file at a
path. A `%y` request gets a hash of the data in the file at the
path combined with all its children and their data. Thus, `%y`
of a node changes if it or any of its children change.

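The difference between the three kinds of query can be shown over a plain map from paths to data. This Python sketch is only an illustration: `query_u`, `query_x`, and `query_y` are hypothetical names, the revision is modeled as a dictionary of bytes, and SHA-256 with a simple path-plus-data encoding is an assumed way of combining a node with its children.

```python
import hashlib


def query_u(files, path):
    """Like %u: does a file exist at this path?"""
    return path in files


def query_x(files, path):
    """Like %x: the data at this path, or None if there is none."""
    return files.get(path)


def query_y(files, path):
    """Like %y: one hash over the node and everything beneath it."""
    prefix = path.rstrip("/") + "/"
    digest = hashlib.sha256()
    for p in sorted(files):
        if p == path or p.startswith(prefix):
            digest.update(p.encode() + b"\0" + files[p] + b"\0")
    return digest.hexdigest()


files = {"/a/b": b"1", "/a/b/c": b"2", "/d": b"3"}
child_changed = {**files, "/a/b/c": b"changed"}
sibling_changed = {**files, "/d": b"changed"}
```

Changing `/a/b/c` alters `query_y(..., "/a/b")` even though `query_x(..., "/a/b")` stays the same, while changing the unrelated `/d` alters neither.
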
A `%sing` request is fulfilled immediately if possible. If the
requested revision is in the future, or is on another ship for
which we don't have the result cached, we don't respond
immediately. If the requested revision is in the future, we wait
until the revision happens before we respond to the request. If
the request is for data on another ship, we pass on the request
to the other ship. In general, Clay subscriptions, like most
things in Urbit, aren't guaranteed to return immediately.
They'll return when they can, and they'll do so in a
referentially transparent manner.

A `%next` request checks the query at the given revision, and it
produces the result of the query the next time it changes, along
with the revision number when it changes. Thus, a `%next` of a
`%u` is triggered when a file is added or deleted, a `%next` of a
`%x` is triggered when a file is added, deleted, or changed, and
a `%next` of a `%y` is triggered when a file or any of its
children is added, deleted, or changed.

A `%many` request is triggered every time the given desk has a
new revision. Unlike a `%next`, a `%many` has both a start and
an end revision, after which it stops returning. For `%next`, a
single change is reported, and if the caller wishes to hear of
the next change, it must resubscribe. For `%many`, every revision
from the start to the end triggers a response. Since a `%many`
request doesn't ask for any particular data, there aren't `%u`,
`%x`, and `%y` versions for it.

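The triggering rules for `%next` and `%many` can be modeled over a list of revisions. The Python sketch below is a simplified, synchronous illustration with hypothetical names (`next_change`, `many`, `exists`, `data`): a real request would stay pending until a change arrives, whereas here "no change yet" is reported as `None`.

```python
def next_change(history, query, since):
    """Return (revision, result) for the first revision after `since` at which
    `query` gives a different result, or None if nothing has changed yet.

    history[n - 1] is the path -> data map at revision n (an assumed layout).
    """
    baseline = query(history[since - 1])
    for revision in range(since + 1, len(history) + 1):
        result = query(history[revision - 1])
        if result != baseline:
            return revision, result
    return None


def many(history, start, end):
    """Revisions in [start, end] that exist so far; each would trigger a response."""
    return [r for r in range(start, end + 1) if 1 <= r <= len(history)]


def exists(path):   # stands in for a %u query
    return lambda files: path in files


def data(path):     # stands in for a %x query
    return lambda files: files.get(path)


history = [
    {"/a": b"1"},               # revision 1
    {"/a": b"1", "/b": b"0"},   # revision 2: /b added
    {"/a": b"2", "/b": b"0"},   # revision 3: /a changed
    {"/b": b"0"},               # revision 4: /a deleted
]
```

Starting from revision 1, the existence query on `/a` first fires at revision 4 (the deletion), while the data query on `/a` already fires at revision 3 (the change).
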
### Unix sync

One of the primary functions of clay is as a convenient user
interface. While tools exist to use clay from within urbit, it's
often useful to be able to treat clay like any other filesystem
from the Unix perspective -- to "mount" it, as it were.

From urbit, you can run `|mount /path/to/directory %mount-point`,
and this will mount the given clay directory to the mount-point
directory in Unix. Every file is converted to `%mime` before it's
written to Unix, and converted back when read from Unix. The
entire directory is watched (a la Dropbox), and every change is
auto-committed to clay.

### Merging

Merging is a fundamental operation for a distributed revision
control system. At their root, clay's merges are similar to
git's, but with some additions to accommodate typed data. There
are seven different merge strategies.

Throughout our discussion, we'll say that the merge is from
Alice's desk to Bob's. Recall that a commit is a date (for all
new commits this will be the current date), a list of parents,
and the data itself.

A `%init` merge should be used iff it's the first commit to a
desk. The head of Alice's desk is used as the number 1 commit to
Bob's desk. Obviously, the ancestry remains intact through
traversing the parentage of the commit even though previous
commits are not numbered for Bob's desk.

A `%this` merge means to keep what's in Bob's desk, but join the
ancestry. Thus, the new commit has the head of each desk as
parents, but the data is exactly what's in Bob's desk. For those
following along in git, this is the 'ours' merge strategy, not
the '--ours' option to the 'recursive' merge strategy. In other
words, even if Alice makes a change that does not conflict with
Bob, we throw it away. It's Bob's way or the highway.

A `%that` merge means to take what's in Alice's desk, but join
the ancestry. This is the reverse of `%this`.

A `%fine` merge is a "fast-forward" merge. This succeeds iff one
head is in the ancestry of the other. In this case, we use the
descendant as our new head.

For `%meet`, `%mate`, and `%meld` merges, we first find the most
recent common ancestor to use as our merge base. If we have no
common ancestors, then we fail. If we have more than one most
recent common ancestor, then we have a criss-cross situation,
which should be handled delicately. At present, we delicately
throw up our hands and give up, but something akin to git's
'recursive' strategy should be implemented in the future.

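Both the `%fine` check and the search for a merge base reduce to questions about ancestry. The Python sketch below models commits as identifiers in a `parents` dictionary; `ancestors`, `fine`, and `merge_bases` are hypothetical names, and the quadratic search is chosen for clarity rather than speed.

```python
def ancestors(commit, parents):
    """All commits reachable through parent links, including `commit` itself."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def fine(alice, bob, parents):
    """Fast-forward: return the descendant head, or None if the heads diverged."""
    if bob in ancestors(alice, parents):
        return alice
    if alice in ancestors(bob, parents):
        return bob
    return None


def merge_bases(alice, bob, parents):
    """Most recent common ancestors: common ancestors that are not themselves
    an ancestor of another common ancestor."""
    common = ancestors(alice, parents) & ancestors(bob, parents)
    return {c for c in common
            if not any(c != d and c in ancestors(d, parents) for d in common)}


# A simple fork: both heads descend from "root".
fork = {"root": [], "a": ["root"], "b": ["root"], "a2": ["a"]}
# A criss-cross: each head merged both "x" and "y".
cross = {"root": [], "x": ["root"], "y": ["root"], "a": ["x", "y"], "b": ["x", "y"]}
# Two unrelated histories.
unrelated = {"p": [], "q": []}
```

An empty result from `merge_bases` corresponds to the "no common ancestors" failure, and a result with more than one element corresponds to the criss-cross case.
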
There's a functional inclusion ordering on `%fine`, `%meet`,
`%mate`, and `%meld` such that if an earlier strategy would have
succeeded, then every later strategy will produce the same
result. Put another way, every earlier strategy is the same as
every later strategy except with a restricted domain.

A `%meet` merge only succeeds if the changes from the merge base
to Alice's head (hereafter, "Alice's changes") are in different
files than Bob's changes. In this case, the parents are both
Alice's and Bob's heads, and the data is the merge base plus
Alice's changed files plus Bob's changed files.

A `%mate` merge attempts to merge changes to the same file when
both Alice and Bob change it. If the merge is clean, we use it;
otherwise, we fail. A merge between different types of changes --
for example, deleting a file vs changing it -- is always a
conflict. If we succeed, the parents are both Alice's and Bob's
heads, and the data is the merge base plus Alice's changed files
plus Bob's changed files plus the merged files.

A `%meld` merge will succeed even if there are conflicts. If
there are conflicts in a file, then we use the merge base's
version of that file, and we produce a set of files with
conflicts. The parents are both Alice's and Bob's heads, and the
data is the merge base plus Alice's changed files plus Bob's
changed files plus the successfully merged files plus the merge
base's version of the conflicting files.

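The data produced by `%meet`, `%mate`, and `%meld` can be sketched over plain path-to-content maps. This Python illustration rests on several assumptions that the text does not settle: all function names are hypothetical, file contents are tuples of lines, `merge_lines` is a toy three-way merge that only handles equal-length files, and identical changes on both sides are treated as clean.

```python
def changes(base, head):
    """Paths whose content differs from the base; None marks a deletion."""
    return {p: head.get(p) for p in base.keys() | head.keys()
            if base.get(p) != head.get(p)}


def apply_changes(base, changed):
    out = dict(base)
    for path, content in changed.items():
        if content is None:
            out.pop(path, None)
        else:
            out[path] = content
    return out


def merge_lines(base, a, b):
    # Toy three-way merge for equal-length tuples of lines (an assumption).
    if not len(base) == len(a) == len(b):
        return None
    out = []
    for x, y, z in zip(base, a, b):
        if y == z or z == x:
            out.append(y)
        elif y == x:
            out.append(z)
        else:
            return None  # both sides changed the same line differently
    return tuple(out)


def meet(base, alice, bob):
    ca, cb = changes(base, alice), changes(base, bob)
    if ca.keys() & cb.keys():
        return None  # both sides touched the same file
    return apply_changes(base, {**ca, **cb})


def meld(base, alice, bob, merge_file=merge_lines):
    ca, cb = changes(base, alice), changes(base, bob)
    chosen, conflicts = {**ca, **cb}, set()
    for path in ca.keys() & cb.keys():
        if ca[path] == cb[path]:
            continue  # assumption: identical changes are not a conflict
        merged = None
        if ca[path] is not None and cb[path] is not None and path in base:
            merged = merge_file(base[path], ca[path], cb[path])
        if merged is None:
            conflicts.add(path)
            chosen[path] = base.get(path)  # fall back to the merge base
        else:
            chosen[path] = merged
    return apply_changes(base, chosen), conflicts


def mate(base, alice, bob, merge_file=merge_lines):
    data, conflicts = meld(base, alice, bob, merge_file)
    return None if conflicts else data


base = {"/f": ("a", "b"), "/g": ("x",)}
alice = {"/f": ("A", "b"), "/g": ("x",)}
bob = {"/f": ("a", "B"), "/g": ("x",), "/h": ("new",)}
rival = {"/f": ("Z", "b"), "/g": ("x",)}
```

With these inputs `meet` fails because both sides touch `/f`, `mate` merges `/f` cleanly, and merging `alice` with `rival` only succeeds under `meld`, which keeps the base version of `/f` and reports it as conflicting.
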
That's the extent of the merge options in clay proper. In
userspace there's a final option `%auto`, which is the most
common. `%auto` checks to see if Bob's desk exists, and if it
doesn't, we use a `%init` merge. Otherwise, we progressively try
`%fine`, `%meet`, and `%mate` until one succeeds.

If none succeed, we merge Bob's desk into a scratch desk. Then,
we merge Alice's desk into the scratch desk with the `%meld`
option to force the merge. For each file in the produced set of
conflicting files, we call the `++mash` function for the
appropriate mark, which annotates the conflicts if we know how.

Finally, we display a message to the user informing them of the
scratch desk's existence, which files have annotated conflicts,
and which files have unannotated conflicts. When the user has
resolved the conflicts, they can merge the scratch desk back into
Bob's desk. This will be a `%fine` merge since Bob's head is in
the ancestry of the scratch desk.

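The control flow of `%auto` is a cascade over the other strategies. The Python sketch below captures only that ordering: `auto` is a hypothetical function, each strategy is passed in as a zero-argument callable that returns a result or `None` on failure, and the scratch desk, the `++mash` annotation step, and the message to the user are reduced to a comment.

```python
def auto(bob_exists, strategies):
    """Return (strategy_name, result) following the %auto ordering.

    `strategies` maps a name to a callable that returns a merge result,
    or None if that strategy fails (an assumed convention for this sketch).
    """
    if not bob_exists:
        return "init", strategies["init"]()
    for name in ("fine", "meet", "mate"):
        result = strategies[name]()
        if result is not None:
            return name, result
    # Nothing succeeded: force the merge. In the text this happens on a
    # scratch desk, and conflicting files are then passed to ++mash.
    return "meld", strategies["meld"]()


def fails():
    return None


clean = {"init": lambda: "first commit", "fine": fails, "meet": fails,
         "mate": lambda: "merged", "meld": lambda: "forced"}
conflicted = {**clean, "mate": fails}
```
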
### Autosync

Tracking and staying in sync with another desk is another
fundamental operation. We call this "autosync". This doesn't mean
simply mirroring a desk, since that wouldn't allow local changes.
We simply want to apply changes as they are made upstream, as
long as there are no conflicts with local changes.

This is implemented by watching the other desk, and, when it has
changes, merging these changes into our desk with the usual merge
strategies.

Note that it's quite reasonable for two desks to be autosynced to
each other. This results in any change on one desk being mirrored
to the other and vice versa.

Additionally, it's fine to set up an autosync even if one desk,
the other desk, or both desks do not exist. The sync will be
activated when the upstream desk comes into existence and will
create the downstream desk if needed.