added clay architecture doc

2025-01-02 03:52:13 +03:00 · 2015-09-18 16:45:09 -04:00 · 2015-09-18 16:45:09 -04:00 · 0bfa034edb
commit 0bfa034edb
parent 646fa7cf14
1 changed files with 415 additions and 0 deletions
--- a/pub/doc/arvo/clay/architecture.md
+++ b/pub/doc/arvo/clay/architecture.md
@@ -0,0 +1,415 @@
+# clay
+
+## high-level
+
+clay is the primary filesystem for the arvo operating system,
+which is the core of an urbit. The architecture of clay is
+intrinsically connected with arvo, but we assume no knowledge of
+either arvo or urbit. We will point out only those features of
+arvo that are necessary for an understanding of clay, and we will
+do so only when they arise.
+
+The first relevant feature of arvo is that it is a deterministic
+system where input and output are defined as a series of events
+and effects. The state of arvo is simply a function of its event
+log. None of the effects from an event are emitted until the
+event is entered in the log and persisted, either to disk or
+another trusted source of persistence, such as a Kafka cluster.
+Consequently, arvo is a single-level store: everything in its
+state is persistent. 
+
+In a more traditional OS, everything in RAM can be erased at any
+time by power failure, and is always erased on reboot. Thus, a
+primary purpose of a filesystem is to ensure files persist across
+power failures and reboots.  In arvo, both power failures and
+reboots are special cases of suspending computation, which is
+done safely since our event log is already persistent. Therefore,
+clay is not needed in arvo for persistence. Why, then, do we have a
+filesystem? There are two answers to this question.
+
+First, clay provides a filesystem tree, which is a convenient
+user interface for some applications. Unix has the useful concept
+of virtual filesystems, which are used for everything from direct
+access to devices, to random number generators, to the /proc
+tree. It is easy and intuitive to read from and write to a
+filesystem tree.
+
+Second, clay has a distributed revision control system baked into
+it.  Traditional filesystems are not revision controlled, so
+userspace software -- such as git -- is written on top of them to
+do so. clay natively provides the same functionality as modern
+DVCSes, and more.
+
+clay has two other unique properties that we'll cover later on:
+it supports typed data and is referentially transparent. 
+
+### Revision Control
+
+Every urbit has one or more "desks", which are independently
+revision-controlled branches. Each desk contains its own mark
+definitions, apps, doc, and so forth.
+
+Traditionally, an urbit has at least a base and a home desk. The
+base desk has all the system software from the distribution. The
+home desk is a fork of base with all the stuff specific to the
+user of the urbit.
+
+A desk is a series of numbered commits, the most recent of which
+represents the current state of the desk. A commit is composed of
+(1) an absolute time when it was created, (2) a list of zero or
+more parents, and (3) a map from paths to data.
+
+Most commits have exactly one parent, but the initial commit on a
+desk may have zero parents, and merge commits have more than one
+parent.
+
+The non-meta data is stored in the map of paths to data. It's
+worth noting that no constraints are put on this map, so, for
+example, both /a/b and /a/b/c could have data. This is impossible
+in a traditional Unix filesystem since it means that /a/b is both
+a file and a directory. Conventionally, the final element in the
+path is its mark -- much like a filename extension in Unix. Thus,
+/doc/readme.md in Unix is stored as /doc/readme/md in urbit.
+
+The data is not stored directly in the map; rather, a hash of the
+data is stored, and we maintain a master blob store. Thus, if the
+same data is referred to in multiple commits (as, for example,
+when a file doesn't change between commits), only the hash is
+duplicated.
+
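The commit and blob-store layout described above can be sketched in a few lines of Python. This is only an illustration, not clay's implementation (which is written in Hoon): the class names `BlobStore` and `Commit` are hypothetical, SHA-256 is an assumed stand-in for whatever hash clay uses, and storing blobs as diffs against other blobs is not modeled.

```python
import hashlib
from dataclasses import dataclass, field


class BlobStore:
    """Content-addressed store: identical data is kept only once."""

    def __init__(self):
        self.blobs = {}  # hash -> bytes

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()  # hash choice is an assumption
        self.blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self.blobs[key]


@dataclass
class Commit:
    time: float                                  # absolute creation time
    parents: list = field(default_factory=list)  # zero or more parent commits
    files: dict = field(default_factory=dict)    # path -> hash of the data


store = BlobStore()
first = Commit(time=1.0, files={"/doc/readme/md": store.put(b"# hello")})
second = Commit(
    time=2.0,
    parents=[first],
    files={
        "/doc/readme/md": store.put(b"# hello"),  # unchanged file: same hash
        "/a/b": store.put(b"data at a node"),      # /a/b holds data ...
        "/a/b/c": store.put(b"data below it"),     # ... and so does /a/b/c
    },
)
```

Both commits refer to the readme through the same hash, so the store holds three blobs rather than four.
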
+In the master blob store, we either store the data directly, or
+else we store a diff against another blob. The hash is dependent
+only on the data within and not on whether or not it's stored
+directly, so we may on occasion rearrange the contents of the
+blob store for performance reasons.
+
+Recall that a desk is a series of numbered commits. Not every
+commit in a desk must be numbered. For example, if the base desk
+has had 50 commits since home was forked from it, then a merge
+from base to home will only add a single revision number to home,
+although the full commit history will be accessible by traversing
+the parentage of the individual commits.
+
+We do guarantee that the first commit is numbered 1, commits are
+numbered consecutively after that (i.e. there are no "holes"),
+the topmost commit is always numbered, and every numbered commit
+is an ancestor of every later numbered commit.
+
+There are three ways to refer to particular commits in the
+revision history.  Firstly, one can use the revision number.
+Secondly, one can use any absolute time between one numbered
+commit and the next (inclusive of the first, exclusive of the
+second). Thirdly, every desk has a map of labels to revision
+numbers. These labels may be used to refer to specific commits.
+
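As a rough sketch of the three ways to name a revision, assume a desk keeps the creation time of each numbered commit in ascending order plus a label map. The `Desk` class and its method names below are hypothetical, and the sketch ignores times that lie in the future.

```python
from bisect import bisect_right


class Desk:
    def __init__(self):
        self.times = []   # times[i] is the creation time of revision i + 1
        self.labels = {}  # label -> revision number

    def add_revision(self, time, label=None):
        # Numbered commits are consecutive, and each is later than the last.
        assert not self.times or time > self.times[-1]
        self.times.append(time)
        number = len(self.times)
        if label is not None:
            self.labels[label] = number
        return number

    def by_number(self, number):
        return number if 1 <= number <= len(self.times) else None

    def by_time(self, time):
        # The revision whose interval [its time, next revision's time)
        # contains the given time; None if the time precedes revision 1.
        number = bisect_right(self.times, time)
        return number if number >= 1 else None

    def by_label(self, label):
        return self.labels.get(label)
```

With revisions created at times 10, 20, and 30, any time from 10 up to (but not including) 20 resolves to revision 1, matching the inclusive/exclusive rule above.
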
+Additionally, clay is a global filesystem, so data on other urbits
+is accessible in the same way as data on our local urbit.  In
+general, the path to a particular revision of a desk is
+/~urbit-name/desk-name/revision.  Thus, to get /try/readme/md
+from revision 5 of the home desk on ~sampel-sipnym, we refer to
+/~sampel-sipnym/home/5/try/readme/md.  Clay's namespace is thus
+global and referentially transparent.
+
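To make the path layout concrete, here is a small Python sketch that splits a global path into its parts. The function name `parse_global_path` is hypothetical. The revision is left as a string because it may be a number, a time, or a label.

```python
def parse_global_path(path):
    """Split /~urbit-name/desk-name/revision/rest into its four parts."""
    parts = path.split("/")
    # parts[0] is the empty string in front of the leading slash.
    if len(parts) < 4 or parts[0] != "" or not parts[1].startswith("~"):
        raise ValueError("expected /~urbit-name/desk-name/revision/...")
    ship, desk, revision = parts[1], parts[2], parts[3]
    local_path = "/" + "/".join(parts[4:])
    return ship, desk, revision, local_path


print(parse_global_path("/~sampel-sipnym/home/5/try/readme/md"))
```
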
+XXX reactivity here?
+
+### A Typed Filesystem
+
+Since clay is a general filesystem for storing data of arbitrary
+types, in order to revision control correctly it needs to be
+aware of types all the way through.  Traditional revision control
+does an excellent job of handling source code, so for source code
+we act very similarly to traditional revision control. The
+challenge is to handle other data similarly well.
+
+For example, modern VCSs generally support "binary files", which
+are files for which the standard textual diffing, patching, and
+merging algorithms are not helpful. A "diff" of two binary files
+is just a pair of the files, "patching" this diff is just
+replacing the old file with the new one, and "merging"
+non-identical diffs is always a conflict, which can't even be
+helpfully annotated. Without knowing anything about the structure
+of a blob of data, this is the best we can do.
+
+Often, though, "binary" files have some internal structure, and
+it is possible to create diff, patch, and merge algorithms that
+take advantage of this structure. An image may be the result of a
+base image with some set of operations applied. With algorithms
+aware of this set of operations, not only can revision control
+software save space by not having to save every revision of the
+image individually, these transformations can be made on parallel
+branches and merged at will.
+
+Suppose Alice is tasked with touching up a picture, improving the
+color balance, adjusting the contrast, and so forth, while Bob
+has the job of cropping the picture to fit where it's needed and
+adding textual overlay.  Without type-aware revision control,
+these changes must be made serially, requiring Alice and Bob to
+explicitly coordinate their efforts. With type-aware revision
+control, these operations may be performed in parallel, and then
+the two changesets can be merged programmatically.
+
+Of course, even some kinds of text files may be better served by
+diff, patch, and merge algorithms aware of the structure of the
+files. Consider a file containing a pretty-printed JSON object.
+Small changes in the JSON object may result in rather significant
+changes in how the object is pretty-printed (for example, by
+adding an indentation level or splitting a single line into
+multiple lines).
+
+A text file wrapped at 80 columns also reacts suboptimally with
+unadorned Hunt-McIlroy diffs. A single word inserted in a
+paragraph may push the final word or two of the line onto the
+next line, and the entire rest of the paragraph may be flagged as
+a change. Two diffs consisting of a single added word to
+different sentences may be flagged as a conflict. In general,
+prose should be diffed by sentence, not by line.
+
+As far as the author is aware, clay is the first generalized,
+type-aware revision control system.  We'll go into the workings
+of this system in some detail.
+
+### Marks
+
+Central to a typed filesystem is the idea of types. In clay, we
+call these "marks". A mark is a file that defines a type,
+conversion routines to and from the mark, and diff, patch, and
+merge routines.
+
+For example, a `%txt` mark may be a list of lines of text, and it
+may include conversions to `%mime` to allow it to be serialized
+and sent to a browser or to the Unix filesystem. It will also
+include Hunt-McIlroy diff, patch, and merge algorithms.
+
+A `%json` mark would be defined as a json object in the code, and
+it would have a parser to convert from `%txt` and a printer to
+convert back to `%txt`. The diff, patch, and merge algorithms are
+fairly straightforward for json, though they're very different
+from the text ones.
+
+More formally, a mark is a core with three arms, `++grab`,
+`++grow`, and `++grad`. In `++grab` is a series of functions to
+convert from other marks to the given mark.  In `++grow` is a
+series of functions to convert from the given mark to other
+marks. In `++grad` is `++diff`, `++pact`, `++join`, and `++mash`.
+
+The types are as follows, in an informal pseudocode:
+
+    ++  grab:
+      ++  mime: <mime> -> <mark-type>
+      ++  txt: <txt> -> <mark-type>
+      ...
+    ++  grow:
+      ++  mime: <mark-type> -> <mime>
+      ++  txt: <mark-type> -> <txt>
+      ...
+    ++  grad:
+      ++  diff: (<mark-type>, <mark-type>) -> <diff-type>
+      ++  pact: (<mark-type>, <diff-type>) -> <mark-type>
+      ++  join: (<diff-type>, <diff-type>) -> <diff-type> or NULL
+      ++  mash: (<diff-type>, <diff-type>) -> <diff-type>
+
+These types are basically what you would expect. Not every mark
+has each of these functions defined -- all of them are optional
+in the general case.
+
+In general, for a particular mark, the `++grab` and `++grow` entries
+(if they exist) should be inverses of each other.
+
+In `++grad`, `++diff` takes two instances of a mark and produces
+a diff of them. `++pact` takes an instance of a mark and patches
+it with the given diff. `++join` takes two diffs and attempts to
+merge them into a single diff. If there are conflicts, it
+produces null. `++mash` takes two diffs and forces a merge,
+annotating any conflicts.
+
+In general, if `++diff` called with A and B produces diff D, then
+`++pact` called with A and D should produce B. Also, if `++join`
+of two diffs does not produce null, then `++mash` of the same
+diffs should produce the same result.
+
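The two laws above can be checked on a toy example. The following Python sketch is not a real mark (marks are Hoon cores). It assumes a made-up type, a flat dictionary, and a made-up diff format that maps each changed key to either `("put", value)` or `("del",)`.

```python
def diff(old, new):
    """Produce a diff that turns `old` into `new`."""
    d = {}
    for key in old.keys() | new.keys():
        if key not in new:
            d[key] = ("del",)
        elif key not in old or old[key] != new[key]:
            d[key] = ("put", new[key])
    return d


def pact(old, d):
    """Patch `old` with the diff `d`."""
    new = dict(old)
    for key, op in d.items():
        if op[0] == "del":
            new.pop(key, None)
        else:
            new[key] = op[1]
    return new


def join(d1, d2):
    """Merge two diffs; produce None if they conflict."""
    for key in d1.keys() & d2.keys():
        if d1[key] != d2[key]:
            return None
    return {**d1, **d2}


def mash(d1, d2):
    """Force a merge, annotating each conflicting key."""
    merged = {**d1, **d2}
    for key in d1.keys() & d2.keys():
        if d1[key] != d2[key]:
            merged[key] = ("put", {"conflict": [d1[key], d2[key]]})
    return merged


base = {"x": 1, "y": 2}
alice = diff(base, {"x": 1, "y": 3})          # changes y
bob = diff(base, {"x": 1, "y": 2, "z": 4})    # adds z
carol = diff(base, {"x": 1, "y": 5})          # also changes y

print(pact(base, join(alice, bob)))   # both changes applied
print(join(alice, carol))             # None: conflicting edits to y
print(mash(alice, carol))             # forced merge with y annotated
```

Here `join` and `mash` agree whenever `join` succeeds, and `pact` undoes `diff`, which is all the laws require of this toy type.
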
+Alternatively, instead of `++diff`, `++pact`, `++join`, and
+`++mash`, a mark can provide the same functionality by defining
+`++sted` to be the name of another mark to which we wish to
+delegate the revision control responsibilities. Then, before
+running any of those functions, clay will convert to the other
+mark, and convert back afterward. For example, the `%hoon` mark
+is revision-controlled in the same way as `%txt`, so its `++grad`
+is simply `++sted %txt`. Of course, `++txt` must be defined in
+`++grow` and `++grab` as well.
+
+Every file in clay has a mark, and that mark must have a
+fully-functioning `++grad`. Marks are used for more than just
+clay, and other marks don't need a `++grad`, but if a piece of
+data is to be saved to clay, we must know how to revision-control
+it.
+
+Additionally, if a file is to be synced out to Unix, then it must
+have conversion routines to and from the `%mime` mark.
+
+## Using clay
+
+### Reading and Subscribing
+
+When reading from Clay, there are three types of requests.  A
+`%sing` request asks for data at a single revision.  A `%next`
+request asks to be notified the next time there's a change to a
+given file.  A `%many` request asks to be notified on every
+change in a desk for a range of changes.
+
+For `%sing` and `%next`, there are generally three things to be
+queried.  A `%u` request simply checks for the existence of a
+file at a path.  A `%x` request gets the data in the file at a
+path.  A `%y` request gets a hash of the data in the file at the
+path combined with all its children and their data.  Thus, `%y`
+of a node changes if it or any of its children change.
+
+A `%sing` request is fulfilled immediately if possible.  If the
+requested revision is in the future, or is on another ship for
+which we don't have the result cached, we don't respond
+immediately.  If the requested revision is in the future, we wait
+until the revision happens before we respond to the request.  If
+the request is for data on another ship, we pass on the request
+to the other ship.  In general, Clay subscriptions, like most
+things in Urbit, aren't guaranteed to return immediately.
+They'll return when they can, and they'll do so in a
+referentially transparent manner.
+
+A `%next` request checks the query at the given revision, and it
+produces the result of the query the next time it changes, along
+with the revision number when it changes.  Thus, a `%next` of a
+`%u` is triggered when a file is added or deleted, a `%next` of a
+`%x` is triggered when a file is added, deleted, or changed, and
+a `%next` of a `%y` is triggered when a file or any of its
+children is added, deleted, or changed.
+
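The three kinds of query, and when a `%next` on each one fires, can be modeled roughly as follows. This Python sketch assumes a revision is simply a dictionary from path to raw bytes (skipping the blob store), uses SHA-256 as a stand-in hash, and treats a `%next` as firing whenever the query result differs between two revisions. All function names are hypothetical.

```python
import hashlib


def query_u(files, path):
    """%u: does a file exist at this path?"""
    return path in files


def query_x(files, path):
    """%x: the data in the file at this path (None if absent)."""
    return files.get(path)


def query_y(files, path):
    """%y: a hash over the node at this path and all of its children."""
    prefix = path.rstrip("/") + "/"
    digest = hashlib.sha256()
    for p in sorted(files):
        if p == path or p.startswith(prefix):
            digest.update(p.encode() + b"\0" + hashlib.sha256(files[p]).digest())
    return digest.hexdigest()


QUERIES = {"u": query_u, "x": query_x, "y": query_y}


def next_fires(care, old_files, new_files, path):
    """Would a %next on this query fire between the two revisions?"""
    query = QUERIES[care]
    return query(old_files, path) != query(new_files, path)


old = {"/a/b": b"1", "/a/b/c": b"2"}
new = {"/a/b": b"1", "/a/b/c": b"3"}   # only the child changed
for care in ("u", "x", "y"):
    print(care, next_fires(care, old, new, "/a/b"))
```

Only the `%y` query on /a/b notices the change to /a/b/c, since `%u` and `%x` look at the node itself.
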
+A `%many` request is triggered every time the given desk has a
+new revision.  Unlike a `%next`, a `%many` has both a start and
+an end revision, after which it stops returning.  For `%next`, a
+single change is reported, and if the caller wishes to hear of
+the next change, it must resubscribe.  For `%many`, every revision
+from the start to the end triggers a response.  Since a `%many`
+request doesn't ask for any particular data, there aren't `%u`,
+`%x`, and `%y` versions for it.
+
+### Unix sync
+
+One of the primary functions of clay is as a convenient user
+interface. While tools exist to use clay from within urbit, it's
+often useful to be able to treat clay like any other filesystem
+from the Unix perspective -- to "mount" it, as it were.
+
+From urbit, you can run `|mount /path/to/directory %mount-point`,
+and this will mount the given clay directory to the mount-point
+directory in Unix. Every file is converted to `%mime` before it's
+written to Unix, and converted back when read from Unix. The
+entire directory is watched (a la Dropbox), and every change is
+auto-committed to clay.
+
+### Merging
+
+Merging is a fundamental operation for a distributed revision
+control system. At their root, clay's merges are similar to
+git's, but with some additions to accommodate typed data. There
+are seven different merge strategies.
+
+Throughout our discussion, we'll say that the merge is from
+Alice's desk to Bob's. Recall that a commit is a date (for all
+new commits this will be the current date), a list of parents,
+and the data itself.
+
+A `%init` merge should be used iff it's the first commit to a
+desk.  The head of Alice's desk is used as the number 1 commit to
+Bob's desk. Obviously, the ancestry remains intact through
+traversing the parentage of the commit even though previous
+commits are not numbered for Bob's desk.
+
+A `%this` merge means to keep what's in Bob's desk, but join the
+ancestry. Thus, the new commit has the head of each desk as
+parents, but the data is exactly what's in Bob's desk. For those
+following along in git, this is the 'ours' merge strategy, not
+the `-X ours` option to the 'recursive' merge strategy. In other
+words, even if Alice makes a change that does not conflict with
+Bob, we throw it away. It's Bob's way or the highway.
+
+A `%that` merge means to take what's in Alice's desk, but join
+the ancestry. This is the reverse of `%this`.
+
+A `%fine` merge is a "fast-forward" merge. This succeeds iff one
+head is in the ancestry of the other. In this case, we use the
+descendant as our new head.
+
+For `%meet`, `%mate`, and `%meld` merges, we first find the most
+recent common ancestor to use as our merge base. If we have no
+common ancestors, then we fail. If we have more than one most
+recent common ancestor, then we have a criss-cross situation,
+which should be handled delicately. At present, we delicately
+throw up our hands and give up, but something akin to git's
+'recursive' strategy should be implemented in the future.
+
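The ancestry checks behind `%fine` and the merge-base search can be sketched in Python. The sketch assumes the commit graph is given as a dictionary from commit id to a list of parent ids. The function names are hypothetical, and the quadratic merge-base search is chosen for clarity, not speed.

```python
def ancestors(parents, head):
    """All commits reachable from `head`, including `head` itself."""
    seen, stack = set(), [head]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def fine(parents, alice, bob):
    """Fast-forward: return the descendant head, or None if neither is."""
    if bob in ancestors(parents, alice):
        return alice
    if alice in ancestors(parents, bob):
        return bob
    return None


def merge_bases(parents, alice, bob):
    """Most recent common ancestors: common ancestors that are not
    themselves ancestors of another common ancestor."""
    common = ancestors(parents, alice) & ancestors(parents, bob)
    return {
        c for c in common
        if not any(c != d and c in ancestors(parents, d) for d in common)
    }


graph = {"r": [], "a1": ["r"], "a2": ["a1"], "b1": ["r"]}
print(fine(graph, "a2", "a1"))         # a2 descends from a1
print(fine(graph, "a2", "b1"))         # None: the heads have diverged
print(merge_bases(graph, "a2", "b1"))  # {'r'}
```

An empty result from `merge_bases` corresponds to the "no common ancestors" failure, and a result with more than one element corresponds to the criss-cross case.
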
+There's a functional inclusion ordering on `%fine`, `%meet`,
+`%mate`, and `%meld` such that if an earlier strategy would have
+succeeded, then every later strategy will produce the same
+result. Put another way, every earlier strategy is the same as
+every later strategy except with a restricted domain.
+
+A `%meet` merge only succeeds if the changes from the merge base
+to Alice's head (hereafter, "Alice's changes") are in different
+files than Bob's changes. In this case, the parents are both
+Alice's and Bob's heads, and the data is the merge base plus
+Alice's changed files plus Bob's changed files.
+
+A `%mate` merge attempts to merge changes to the same file when
+both Alice and Bob change it. If the merge is clean, we use it;
+otherwise, we fail. A merge between different types of changes --
+for example, deleting a file vs changing it -- is always a
+conflict. If we succeed, the parents are both Alice's and Bob's
+heads, and the data is the merge base plus Alice's changed files
+plus Bob's changed files plus the merged files.
+
+A `%meld` merge will succeed even if there are conflicts. If
+there are conflicts in a file, then we use the merge base's
+version of that file, and we produce a set of files with
+conflicts. The parents are both Alice's and Bob's heads, and the
+data is the merge base plus Alice's changed files plus Bob's
+changed files plus the successfully merged files plus the merge
+base's version of the conflicting files.
+
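The relationship between `%meet`, `%mate`, and `%meld` may be easier to see in code. The Python sketch below is a simplification under several assumptions: each side is a dictionary from path to a list of lines, `merge_file` stands in for the mark's own diff and join routines, and a file added on both sides is treated as a conflict. The names `changes`, `merge`, and `merge_lines` are hypothetical.

```python
def changes(base, head):
    """Paths whose data differs from the merge base (None marks a deletion)."""
    return {
        path: head.get(path)
        for path in base.keys() | head.keys()
        if base.get(path) != head.get(path)
    }


def merge_lines(base, a, b):
    """Toy three-way merge for equal-length line lists; None on conflict."""
    if not (len(base) == len(a) == len(b)):
        return None
    out = []
    for old, x, y in zip(base, a, b):
        if x != old and y != old and x != y:
            return None
        out.append(x if x != old else y)
    return out


def merge(strategy, base, alice, bob, merge_file=merge_lines):
    """Return (files, conflicts), or None if the strategy fails."""
    ali, bo = changes(base, alice), changes(base, bob)
    both = ali.keys() & bo.keys()
    if strategy == "meet" and both:
        return None
    files = dict(base)
    for side in (ali, bo):                 # files changed on one side only
        for path, data in side.items():
            if path in both:
                continue
            if data is None:
                files.pop(path, None)
            else:
                files[path] = data
    conflicts = set()
    for path in both:                      # files changed on both sides
        a, b = ali[path], bo[path]
        merged = None
        if path in base and a is not None and b is not None:
            merged = merge_file(base[path], a, b)
        if merged is not None:
            files[path] = merged
        else:
            conflicts.add(path)            # files keeps the base's version
    if conflicts and strategy == "mate":
        return None
    return files, conflicts


base = {"/a": ["1", "2"], "/b": ["x"]}
alice = {"/a": ["1!", "2"], "/b": ["x"], "/c": ["new"]}
bob = {"/a": ["1", "2!"], "/b": ["x"]}
print(merge("meet", base, alice, bob))   # None: both sides changed /a
print(merge("mate", base, alice, bob))   # clean merge of /a
```

When the two sides touch different files, all three strategies return the same result, which is the inclusion ordering described earlier.
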
+That's the extent of the merge options in clay proper. In
+userspace there's a final option `%auto`, which is the most
+common.  `%auto` checks to see if Bob's desk exists, and if it
+doesn't we use a `%init` merge. Otherwise, we progressively try
+`%fine`, `%meet`, and `%mate` until one succeeds.
+
+If none succeed, we merge Bob's desk into a scratch desk.  Then,
+we merge Alice's desk into the scratch desk with the `%meld`
+option to force the merge. For each file in the produced set of
+conflicting files, we call the `++mash` function for the
+appropriate mark, which annotates the conflicts if we know how.
+
+Finally, we display a message to the user informing them of the
+scratch desk's existence, which files have annotated conflicts,
+and which files have unannotated conflicts. When the user has
+resolved the conflicts, they can merge the scratch desk back into
+Bob's desk. This will be a `%fine` merge since Bob's head is in
+the ancestry of the scratch desk.
+
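The order in which `%auto` tries strategies can be summarized in a short Python sketch. Here `try_strategy` is a hypothetical callback that attempts one merge strategy and returns True on success. The scratch-desk fallback is only named, not implemented.

```python
def auto_merge(bob_desk_exists, try_strategy):
    """Return the name of the strategy that %auto ends up using."""
    if not bob_desk_exists:
        return "init"
    for name in ("fine", "meet", "mate"):
        if try_strategy(name):
            return name
    # Nothing succeeded: force the merge with %meld in a scratch desk and
    # leave conflict resolution to the user.
    return "meld-into-scratch"


print(auto_merge(True, lambda name: name == "meet"))
```
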
+### Autosync
+
+Tracking and staying in sync with another desk is another
+fundamental operation. We call this "autosync". This doesn't mean
+simply mirroring a desk, since that wouldn't allow local changes.
+We simply want to apply changes as they are made upstream, as
+long as there are no conflicts with local changes.
+
+This is implemented by watching the other desk, and, when it has
+changes, merging these changes into our desk with the usual merge
+strategies.
+
+Note that it's quite reasonable for two desks to be autosynced to
+each other. This results in any change on one desk being mirrored
+to the other and vice versa.
+
+Additionally, it's fine to set up an autosync even if one desk,
+the other desk, or both desks do not exist. The sync will be
+activated when the upstream desk comes into existence and will
+create the downstream desk if needed.