shrub/pub/doc/arvo/clay/commentary.md
2015-06-19 17:16:48 -04:00

%clay commentary

%clay is our filesystem.

The first part of this will be reference documentation for the data types used by our filesystem. In fact, as a general guide, we recommend reading and attempting to understand the data structures used in any Hoon code before you try to read the code itself. Although complete understanding of the data structures is impossible without seeing them used in the code, an 80% understanding greatly clarifies the code. As another general guide, when reading Hoon, it rarely pays off to understand every line of code when it appears. Try to get the gist of it, and then move on. The next time you come back to it, it'll likely make a lot more sense.

After a description of the data models, we'll give an overview of the interface that vanes and applications can use to interact with the filesystem.

Finally, we'll dive into the code and the algorithms themselves. You know, the fun part.

Data Models

As you're reading through this section, remember you can always come back to this when you run into these types later on. You're not going to remember everything the first time through, but it is worth reading, or at least skimming, this so that you get a rough idea of how our state is organized.

The types that are certainly worth reading are ++raft, ++room, ++dome, ++ankh, ++rung, ++rang, ++blob, ++yaki, and ++nori (possibly in that order). All in all, though, this section isn't too long, so many readers may wish to quickly read through all of it. If you get bored, though, just skip to the next section. You can always come back when you need to.

++raft, formal state

++  raft                                                ::  filesystem
          $:  fat=(map ship room)                       ::  domestic
              hoy=(map ship rung)                       ::  foreign
              ran=rang                                  ::  hashes
          ==                                            ::

This is the state of our vane. Anything that must be remembered between calls to clay is stored in this state.

fat is the set of domestic servers. This stores all the information that is specific to a particular ship on this pier. The keys to this map are the ships on the current pier.

hoy is the set of foreign servers. This stores all the information that is specific to a particular foreign ship. The keys to this map are all the ships whose filesystems we have attempted to access through clay.

ran is the store of all commits and deltas, keyed by hash. This is where all the "real" data we know is stored; the rest is "just bookkeeping".

++room, filesystem per domestic ship

++  room                                                ::  fs per ship
          $:  hun=duct                                  ::  terminal duct
              hez=(unit duct)                           ::  sync duch
              dos=(map desk dojo)                       ::  native desk
          ==                                            ::

This is the representation of the filesystem of a ship on our pier.

hun is the duct we use to send messages to dill to display notifications of filesystem changes. Only %note gifts should be produced along this duct. This is set by the %init kiss.

hez, if present, is the duct we use to send sync messages to unix so that they end up in the pier unix directory. Only %ergo gifts should be produced along this duct. This is set by %into and %invo kisses.

dos is a well-known operating system released in 1981. It is also the set of desks on this ship, mapped to their data.

++desk, filesystem branch

++  desk  ,@tas                                         ::  ship desk case spur

This is the name of a branch of the filesystem. The default desks are "arvo", "main", and "try". More may be created by simply referencing them. Desks have independent histories and states, and they may be merged into each other.

++dojo, domestic desk state

++  dojo  ,[p=cult q=dome]                              ::  domestic desk state

This is all the data that is specific to a particular desk on a domestic ship. p is the set of subscribers to this desk and q is the data in the desk.

++cult, subscriptions

++  cult  (map duct rave)                               ::  subscriptions

This is the set of subscriptions to a particular desk. The keys are the ducts from which the subscription requests came. The results will be produced along these ducts. The values are descriptions of the requested information.

++rave, general subscription request

++  rave                                                ::  general request
          $%  [& p=mood]                                ::  single request
              [| p=moat]                                ::  change range
          ==                                            ::

This represents a subscription request for a desk. The request can be for either a single item in the desk or else for a range of changes on the desk.

++rove, stored general subscription request

++  rove  (each mood moot)                              ::  stored request

When we store a request, we store subscriptions with a little extra information so that we can determine whether new versions actually affect the path we're subscribed to.

++mood, single subscription request

++  mood  ,[p=care q=case r=path]                       ::  request in desk

This represents a request for the state of the desk at a particular commit, specified by q. p specifies what kind of information is desired, and r specifies the path we are requesting.

++moat, range subscription request

++  moat  ,[p=case q=case r=path]                       ::  change range

This represents a request for all changes between p and q on path r. You will be notified when a change is made to the node referenced by the path or to any of its children.

++moot, stored range subscription request

++  moot  ,[p=case q=case r=path s=(map path lobe)]     ::

This is just a ++moat plus a map of paths to lobes. This map represents the data at the node referenced by the path at case p, if we've gotten to that case (else null). We only send a notification along the subscription if the data at a new revision is different than it was.
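
The change check described above can be sketched in Python. This is a model for illustration only, not clay's code: commits are reduced to a dict from path tuples to lobes, the stored map s is a plain dict, and None stands in for "we have not reached case p yet". The helper names lobes_under and should_notify are hypothetical.

```python
# Hypothetical model: q:yaki as a dict from path tuples to lobes (ints here).

def lobes_under(files, prefix):
    """Return the part of `files` at `prefix` or anywhere below it."""
    n = len(prefix)
    return {path: lobe for path, lobe in files.items() if path[:n] == prefix}


def should_notify(previous, files, prefix):
    """Compare the stored map (s in ++moot) with a new revision.

    Returns (changed, new_stored_map). `previous` is None when we have
    not yet reached the start of the range (an assumption of this model).
    """
    current = lobes_under(files, prefix)
    changed = previous is not None and current != previous
    return changed, current


rev1 = {("doc", "a"): 11, ("doc", "b"): 22, ("src", "c"): 33}
rev2 = {("doc", "a"): 11, ("doc", "b"): 22, ("src", "c"): 44}  # change outside /doc
rev3 = {("doc", "a"): 55, ("doc", "b"): 22, ("src", "c"): 44}  # change inside /doc

_, stored = should_notify(None, rev1, ("doc",))
changed2, stored = should_notify(stored, rev2, ("doc",))
changed3, stored = should_notify(stored, rev3, ("doc",))
```

Because only lobes are compared, a revision that touches files outside the subscribed path produces no notification, which is the point of keeping the extra map.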

++care, clay submode

++  care  ?(%u %v %w %x %y %z)                          ::  clay submode

This specifies what type of information is requested in a subscription or a scry.

%u requests the ++rang at the current moment. Because this information is not stored for any moment other than the present, we crash if the ++case is not a %da for now.

%v requests the ++dome at the specified commit.

%w requests the revision number of the desk.

%x requests the file at a specified path at the specified commit. If there is no node at that path or if the node has no contents (that is, if q:ankh is null), then this produces null.

%y requests an ++arch of the specified commit at the specified path.

%z requests the ++ankh of the specified commit at the specified path.

++arch, shallow filesystem node

++  arch  ,[p=@uvI q=(unit ,@uvI) r=(map ,@ta ,~)]      ::  fundamental node

This is analogous to ++ankh except that we have neither our contents nor the ankhs of our children. The other fields are exactly the same: p is a hash of the associated ankh; u.q, if it exists, is a hash of the contents of this node; and the keys of r are the names of our children. r is a map to null rather than a set so that the ordering of the map will be equivalent to that of r:ankh, allowing efficient conversion.

++case, specifying a commit

++  case                                                ::  ship desk case spur
          $%  [%da p=@da]                               ::  date
              [%tas p=@tas]                             ::  label
              [%ud p=@ud]                               ::  number
          ==                                            ::

A commit can be referred to in three ways: %da refers to the commit that was at the head on date p, %tas refers to the commit labeled p, and %ud refers to the commit numbered p. Note that since these all can be reduced down to a %ud, only numbered commits may be referenced with a ++case.

++dome, desk data

++  dome                                                ::  project state
          $:  ang=agon                                  ::  pedigree
              ank=ankh                                  ::  state
              let=@ud                                   ::  top id
              hit=(map ,@ud tako)                       ::  changes by id
              lab=(map ,@tas ,@ud)                      ::  labels
          ==                                            ::

This is the data that is actually stored in a desk.

ang is unused and should be removed.

ank is the current state of the desk. Thus, it is the state of the filesystem at revision let. The head of a desk is always a numbered commit.

let is the number of the most recently numbered commit. This is also the total number of numbered commits.

hit is a map of numerical ids to hashes of commits. These hashes are mapped into their associated commits in hut:rang. In general, the keys of this map are exactly the numbers from 1 to let, with no gaps. Of course, when there are no numbered commits, let is 0, so hit is null. Additionally, each of the commits is an ancestor of every commit numbered greater than this one. Thus, each is a descendant of every commit numbered less than this one. Since it is true that the date in each commit (t:yaki) is no earlier than that of each of its parents, the numbered commits are totally ordered in the same way by both pedigree and date. Of course, not every commit is numbered. If that sounds too complicated to you, don't worry about it. It basically behaves exactly as you would expect.

lab is a map of textual labels to numbered commits. Note that labels can only be applied to numbered commits. Labels must be unique across a desk.
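
To make the relationship between ++case and the fields of ++dome concrete, here is a small Python model of reducing a case to a revision number. It is a sketch under assumptions, not clay's implementation: dates are plain integers, commit_dates is a hypothetical stand-in for looking up t:yaki through hut:rang, and returning 0 for a date before the first commit is a choice made for this model.

```python
from dataclasses import dataclass, field


@dataclass
class Dome:
    let: int = 0                              # number of the head commit
    hit: dict = field(default_factory=dict)   # revision number -> tako
    lab: dict = field(default_factory=dict)   # label -> revision number


def case_to_ud(dome, commit_dates, case):
    """Reduce a (tag, value) case to a revision number, or None.

    commit_dates maps a tako to the date of its commit (t:yaki).
    """
    tag, value = case
    if tag == "ud":
        return value if 0 <= value <= dome.let else None
    if tag == "tas":
        return dome.lab.get(value)
    if tag == "da":
        # Dates never decrease along the numbered commits, so the first
        # match when scanning down from the head is the answer.
        for num in range(dome.let, 0, -1):
            if commit_dates[dome.hit[num]] <= value:
                return num
        return 0  # earlier than every commit: the empty desk (model choice)
    raise ValueError("unknown case tag: %r" % (tag,))


dome = Dome(let=3, hit={1: 0xA, 2: 0xB, 3: 0xC}, lab={"release": 2})
dates = {0xA: 100, 0xB: 200, 0xC: 300}
```

With this state, the label "release", the number 2, and any date from 200 up to but not including 300 all name the same commit.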

++ankh, filesystem node

++  ankh                                                ::  fs node (new)
          $:  p=cash                                    ::  recursive hash
              q=(unit ,[p=cash q=*])                    ::  file
              r=(map ,@ta ankh)                         ::  folders
          ==                                            ::

This is a single node in the filesystem. This may be a file or a directory or both. In earth filesystems, a node is a file xor a directory. On mars, we're inclusive, so a node is a file ior a directory.

p is a recursive hash that depends on the contents of this file or directory and on any children.

q is the contents of this file, if any. p.q is a hash of the contents while q.q is the data itself.

r is the set of children of this node. In the case of a pure file, this is empty. The keys are the names of the children and the values are, recursively, the nodes themselves.
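
The node structure, and its relationship to ++arch, can be modelled in a few lines of Python. This is an illustrative sketch, not clay's code: the hash values are placeholders rather than real hashes, and the helpers descend, read_file, and to_arch are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple


@dataclass
class Ankh:
    p: int = 0                                          # recursive hash (placeholder)
    q: Optional[Tuple[int, Any]] = None                 # (content hash, content) if a file
    r: Dict[str, "Ankh"] = field(default_factory=dict)  # children by name


def descend(ankh, path):
    """Follow `path` (a sequence of names) down from `ankh`; None if absent."""
    node = ankh
    for name in path:
        node = node.r.get(name)
        if node is None:
            return None
    return node


def read_file(ankh, path):
    """Like a %x read: the contents at `path`, or None if there are none."""
    node = descend(ankh, path)
    if node is None or node.q is None:
        return None
    return node.q[1]


def to_arch(ankh):
    """Like ++arch: keep the hashes and child names, drop contents and subtrees."""
    content_hash = None if ankh.q is None else ankh.q[0]
    return (ankh.p, content_hash, {name: None for name in ankh.r})


# /doc is both a file and a directory, which an ankh allows.
root = Ankh(p=1, r={
    "doc": Ankh(p=2, q=(7, "index"), r={"readme": Ankh(p=3, q=(8, "hello"))}),
})
```

Here read_file(root, ["doc"]) and read_file(root, ["doc", "readme"]) both succeed, while to_arch on the /doc node keeps only its two hashes and the name of its child.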

++cash, ankh hash

++  cash  ,@uvH                                         ::  ankh hash

This is a 128-bit hash of an ankh. These are mostly stored within ankhs themselves, and they are used to check for changes in possibly-deep hierarchies.

++rung, filesystem per neighbor ship

++  rung  $:  rus=(map desk rede)                       ::  neighbor desks
          ==                                            ::

This is the filesystem of a neighbor ship. The keys to this map are all the desks we know about on their ship.

++rede, desk state

++  rede                                                ::  universal project
          $:  lim=@da                                   ::  complete to
              qyx=cult                                  ::  subscribers
              ref=(unit rind)                           ::  outgoing requests
              dom=dome                                  ::  revision state
          ==                                            ::

This is our knowledge of the state of a desk, either foreign or domestic.

lim is the date of the last full update. We only respond to requests for stuff before this time.

qyx is the set of subscribers to this desk. For domestic desks, this is simply p:dojo, all subscribers to the desk, while for foreign desks this is all the subscribers from our ship to the foreign desk.

ref is the request manager for the desk. For domestic desks, this is null since we handle requests ourselves.

dom is the actual data in the desk.

++rind, request manager

++  rind                                                ::  request manager
          $:  nix=@ud                                   ::  request index
              bom=(map ,@ud ,[p=duct q=rave])           ::  outstanding
              fod=(map duct ,@ud)                       ::  current requests
              haw=(map mood (unit))                     ::  simple cache
          ==                                            ::

This is the request manager for a foreign desk.

nix is one more than the index of the most recent request. Thus, it is the next available request number.

bom is the set of outstanding requests. The keys of this map are some subset of the numbers between 0 and one less than nix. The members of the map are exactly those requests that have not yet been fully satisfied.

fod is the same set as bom, but from a different perspective. In particular, the values of fod are the same as the keys of bom, and the p out of the values of bom are the same as the keys of fod. Thus, we can map ducts to their associated request number and ++rave, and we can map numbers to their associated duct and ++rave.

haw is a map from simple requests to their values. This acts as a cache for requests that have already been made. Thus, the second request for a particular ++mood is nearly instantaneous.
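
The bookkeeping between nix, bom, and fod can be modelled as follows. This Python class is a sketch for illustration, not clay's code; in particular, replacing an existing request when the same duct opens a second one is an assumption of this model, and the method names are hypothetical.

```python
class Rind:
    """Toy request manager; the field names follow ++rind."""

    def __init__(self):
        self.nix = 0   # next available request number
        self.bom = {}  # request number -> (duct, rave)
        self.fod = {}  # duct -> request number
        self.haw = {}  # mood -> cached result

    def open(self, duct, rave):
        """Register a request on `duct` and return its number."""
        self.close(duct)  # assumption: at most one request per duct
        num = self.nix
        self.bom[num] = (duct, rave)
        self.fod[duct] = num
        self.nix += 1
        return num

    def close(self, duct):
        """Drop the request on `duct`, if any, and return its number."""
        num = self.fod.pop(duct, None)
        if num is not None:
            del self.bom[num]
        return num

    def consistent(self):
        """Check that bom and fod describe the same set of requests."""
        return (
            len(self.bom) == len(self.fod)
            and all(self.fod.get(d) == n for n, (d, _) in self.bom.items())
            and all(n < self.nix for n in self.bom)
        )


rind = Rind()
first = rind.open("duct-a", "rave-a")
second = rind.open("duct-b", "rave-b")
closed = rind.close("duct-a")
```

Request numbers are never reused, so the keys of bom are always some subset of the numbers below nix.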

++rang, data store

++  rang  $:  hut=(map tako yaki)                       ::
              lat=(map lobe blob)                       ::
          ==                                            ::

This is a set of data keyed by hash. Thus, this is where the "real" data is stored, but it is only meaningful if we know the hash of what we're looking for.

hut is a map from hashes to commits. We often get the hashes from hit:dome, which keys them by logical id. Not every commit has an id.

lat is a map from hashes to the actual data. We often get the hashes from a ++yaki, a commit, which references this map to get the data. There is no ++blob in any ++yaki. They are only accessible through this map.

++tako, commit reference

++  tako  ,@                                            ::  yaki ref

This is a hash of a ++yaki, a commit. These are most notably used as the keys in hut:rang, where they are associated with the actual ++yaki, and as the values in hit:dome, where sequential ids are associated with these.

++yaki, commit

++  yaki  ,[p=(list tako) q=(map path lobe) r=tako t=@da] ::  commit

This is a single commit.

p is a list of the hashes of the parents of this commit. In most cases, this will be a single commit, but in a merge there may be more parents. In theory, there may be an arbitrary number of parents, but in practice merges have exactly two parents. This may change in the future. For commit 1, there is no parent.

q is a map of the paths on a desk to the data at that location. If you understand what a ++lobe and a ++blob are, then the type signature here tells the whole story.

r is the hash associated with this commit.

t is the date at which this commit was made.

++lobe, data reference

++  lobe  ,@                                            ::  blob ref

This is a hash of a ++blob. These are most notably used in lat:rang, where they are associated with the actual ++blob, and as the values in q:yaki, where paths are associated with their data in a commit.

++blob, data

++  blob  $%  [%delta p=lobe q=lobe r=udon]             ::  delta on q
              [%direct p=lobe q=* r=umph]               ::
              [%indirect p=lobe q=* r=udon s=lobe]      ::
          ==                                            ::

This is a node of data. In every case, p is the hash of the blob.

%delta is the case where we define the data by a delta on other data. In practice, the other data is always the previous commit, but nothing depends on this. q is the hash of the parent blob, and r is the delta.

%direct is the case where we simply have the data directly. q is the data itself, and r is any preprocessing instructions. These almost always come from the creation of a file.

%indirect is both of the preceding cases at once. q is the direct data, r is the delta, and s is the parent blob. It should always be the case that applying r to s gives the same data as q directly (with the preprocessor instructions in p.r). This exists purely for performance reasons. This is unused at the moment, but in general these should be created when there is a long line of changes so that we do not have to traverse the delta chain back to the creation of the file.
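
Turning a lobe back into data means walking the delta chain until a blob that carries its data directly is found. The following Python sketch models that walk; it is not clay's code. Blobs are tuples shaped like the cases of ++blob, and patch is a hypothetical stand-in for whatever applies a ++udon.

```python
def lobe_to_data(lat, lobe, patch):
    """Resolve a lobe to its data, following %delta parents as needed.

    lat   -- dict from lobe to blob; blobs are tuples tagged like ++blob:
             ("delta", p, q, r), ("direct", p, q, r), ("indirect", p, q, r, s)
    patch -- function (old_data, delta) -> new_data
    """
    deltas = []
    blob = lat[lobe]
    while blob[0] == "delta":
        _, _, parent, delta = blob
        deltas.append(delta)
        blob = lat[parent]
    # Both %direct and %indirect carry their data in q.
    data = blob[2]
    # Deltas were collected newest first; apply them oldest first.
    for delta in reversed(deltas):
        data = patch(data, delta)
    return data


def append(old, delta):
    """Toy delta for the example: append text."""
    return old + delta


lat = {
    1: ("direct", 1, "a", "c"),
    2: ("delta", 2, 1, "b"),
    3: ("delta", 3, 2, "c"),
}
text = lobe_to_data(lat, 3, append)

# The same chain with blob 2 stored as %indirect: the walk stops there.
lat_short = dict(lat)
lat_short[2] = ("indirect", 2, "ab", "b", 1)
text_short = lobe_to_data(lat_short, 3, append)
```

The second lookup gives the same result while touching one fewer blob, which is the performance argument for %indirect.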

++udon, abstract delta

++  udon                                                ::  abstract delta
          $:  p=umph                                    ::  preprocessor
              $=  q                                     ::  patch
              $%  [%a p=* q=*]                          ::  trivial replace
                  [%b p=udal]                           ::  atomic indel
                  [%c p=(urge)]                         ::  list indel
                  [%d p=upas q=upas]                    ::  tree edit
              ==                                        ::
          ==                                            ::

This is an abstract change to a file. This is a superset of what would normally be called diffs. Diffs usually refer to changes in lines of text while we have the ability to do more interesting deltas on arbitrary data structures.

p is any preprocessor instructions.

%a refers to the trivial delta of a complete replace of old data with new data.

%b refers to changes in an opaque atom on the block level. This has very limited usefulness, and is not used at the moment.

%c refers to changes in a list of data. This is often lines of text, which is your classic diff. We, however, will work on any list of data.

%d refers to changes in a tree of data. This is general enough to describe changes to any hoon noun, but often a more special-purpose delta should be created for different content types. This is not used at the moment, and may in fact be unimplemented.

++urge, list change

++  urge  |*(a=_,* (list (unce a)))                     ::  list change

This is a parametrized type for list changes. For example, (urge ,@t) is a list change for lines of text.

++unce, change part of a list.

++  unce  |*  a=_,*                                     ::  change part
          $%  [%& p=@ud]                                ::  skip[copy]
              [%| p=(list a) q=(list a)]                ::  p -> q[chunk]
          ==                                            ::  

This is a single change in a list of elements of type a. For example, (unce ,@t) is a single change in lines of text.

%& means the next p lines are unchanged.

%| means the lines p have changed to q.
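
As an illustration of how a list change is read, here is a Python sketch that applies one to a list. It is a model, not the Hoon patch routine: it assumes p and q hold their elements in list order, and it chooses to reject a change that does not account for every element of the old list. Neither point is stated in this document.

```python
def apply_urge(old, urge):
    """Apply a list change to `old` and return the new list.

    urge is a list of ("&", n) and ("|", removed, added) entries,
    mirroring the two cases of ++unce.
    """
    new = []
    pos = 0
    for unce in urge:
        if unce[0] == "&":
            count = unce[1]
            if pos + count > len(old):
                raise ValueError("skip runs past the end of the list")
            new.extend(old[pos:pos + count])  # copy unchanged elements
            pos += count
        else:
            _, removed, added = unce
            if list(old[pos:pos + len(removed)]) != list(removed):
                raise ValueError("change does not match the old list")
            new.extend(added)                 # replace removed with added
            pos += len(removed)
    if pos != len(old):
        raise ValueError("change does not cover the whole list")
    return new


old_lines = ["a", "b", "c", "d"]
change = [("&", 1), ("|", ["b", "c"], ["x"]), ("&", 1)]
new_lines = apply_urge(old_lines, change)
```

Because each %| entry records the old elements as well as the new ones, a change can be checked against the list it is applied to before anything is replaced.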

++umph, preprocessing information

++  umph                                                ::  change filter
          $|  $?  %a                                    ::  no filter
                  %b                                    ::  jamfile
                  %c                                    ::  LF text
              ==                                        ::
          $%  [%d p=@ud]                                ::  blocklist
          ==                                            ::

This space intentionally left undocumented. This stuff will change once we get a well-typed clay.

++upas, tree change

++  upas                                                ::  tree change (%d)
          $&  [p=upas q=upas]                           ::  cell
          $%  [%0 p=axis]                               ::  copy old
              [%1 p=*]                                  ::  insert new
              [%2 p=axis q=udon]                        ::  mutate!
          ==                                            ::

This space intentionally left undocumented. This stuff is not known to work, and will likely change when we get a well-typed clay. Also, this is not a complicated type; it is not difficult to work out the meaning.

++nori, repository action

++  nori                                                ::  repository action
          $%  [& q=soba]                                ::  delta
              [| p=@tas]                                ::  label
          ==                                            ::

This describes a change that we are asking clay to make to the desk. There are two kinds of changes that may be made: we can modify files or we can apply a label to a commit.

In the | case, we will simply label the current commit with the given label. In the & case, we will apply the given changes.

++soba, delta

++  soba  ,[p=cart q=(list ,[p=path q=miso])]           ::  delta

This describes a set of changes to make to a desk. The cart is simply a pair of the old hash and the new hash of the desk. The list is a list of changes keyed by the file they're changing. Thus, the paths are paths to files to be changed while miso is a description of the change itself.

++miso, ankh delta

++  miso                                                ::  ankh delta
          $%  [%del p=*]                                ::  delete
              [%ins p=*]                                ::  insert
              [%mut p=udon]                             ::  mutate
          ==                                            ::

There are three kinds of changes that may be made to a node in a desk. We can insert a file, in which case p is the contents of the new file. We can delete a file, in which case p is the contents of the old file. Finally, we can mutate that file, in which case the udon describes the changes we are applying to the file.
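
The three kinds of change can be modelled as operations on a map from paths to file contents. This Python sketch is for illustration and is not clay's code: it ignores the cart in ++soba, the existence checks are assumptions of the model, and mutate is a hypothetical stand-in for applying a ++udon.

```python
def apply_changes(files, changes, mutate):
    """Apply a list of (path, miso) pairs to `files`.

    files   -- dict from path tuple to content
    changes -- list of (path, (tag, payload)), tag in {"ins", "del", "mut"}
    mutate  -- function (old_content, udon) -> new_content
    Returns a new dict; `files` itself is left untouched.
    """
    out = dict(files)
    for path, (tag, payload) in changes:
        if tag == "ins":
            if path in out:
                raise ValueError("insert over an existing file: %r" % (path,))
            out[path] = payload          # payload is the new contents
        elif tag == "del":
            if path not in out:
                raise ValueError("delete of a missing file: %r" % (path,))
            del out[path]                # payload is the old contents
        elif tag == "mut":
            if path not in out:
                raise ValueError("mutate of a missing file: %r" % (path,))
            out[path] = mutate(out[path], payload)
        else:
            raise ValueError("unknown miso tag: %r" % (tag,))
    return out


before = {("doc", "a"): "old", ("doc", "b"): "bye"}
changes = [
    (("doc", "c"), ("ins", "new")),
    (("doc", "b"), ("del", "bye")),
    (("doc", "a"), ("mut", "!")),
]
after = apply_changes(before, changes, lambda old, udon: old + udon)
```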

++mizu, merged state

++  mizu  ,[p=@u q=(map ,@ud tako) r=rang]              ::  new state

This is the input to the %merg kiss, which allows us to perform a merge. The p is the number of the new head commit. The q is a map from numbers to commit hashes. This is all the new numbered commits that are to be inserted. The keys to this should always be the numbers from let.dom plus one to p, inclusive. The r is the maps of all the new commits and data. Since these are merged into the current state, no old commits or data need be here.

++riff, request/desist

++  riff  ,[p=desk q=(unit rave)]                       ::  request/desist

This represents a request for data about a particular desk. If q contains a rave, then this opens a subscription to the desk for that data. If q is null, then this tells clay to cancel the subscription along this duct.

++riot, response

++  riot  (unit rant)                                   ::  response/complete

A riot is a response to a subscription. If null, the subscription has been completed, and no more responses will be sent. Otherwise, the rant is the produced data.

++rant, response data

++  rant                                                ::  namespace binding
          $:  p=[p=care q=case r=@tas]                  ::  clade release book
              q=path                                    ::  spur
              r=*                                       ::  data
          ==                                            ::

This is the data at a particular node in the filesystem. p.p specifies the type of data that was requested (and is produced). q.p gives the specific version reported (since a range of versions may be requested in a subscription). r.p is the desk. q is the path to the filesystem node. r is the data itself (in the format specified by p.p).

++nako, subscription response data

++  nako  $:  gar=(map ,@ud tako)                       ::  new ids
              let=@ud                                   ::  next id
              lar=(set yaki)                            ::  new commits
              bar=(set blob)                            ::  new content
          ==                                            ::

This is the data that is produced by a request for a range of revisions of a desk. This allows us to easily keep track of a remote repository -- all the new information we need is contained in the nako.

gar is a map of the revisions in the range to the hash of the commit at that revision. These hashes can be used with hut:rang to find the commit itself.

let is either the last revision number in the range or the most recent revision number, whichever is smaller.

lar is the set of new commits, and bar is the set of new content.
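
To show why a nako is enough to track a remote desk, here is a Python sketch that folds one into local state. It is a model under assumptions, not clay's code: domes, rangs, commits, and nakos are plain dicts, lar and bar are lists rather than sets so that the example values need not be hashable, and taking the larger of the two let values is a choice made for this model.

```python
def apply_nako(dome, rang, nako):
    """Fold a nako into copies of a dome and a rang, returning both."""
    hit = dict(dome["hit"])
    hit.update(nako["gar"])            # new revision numbers -> takos

    hut = dict(rang["hut"])
    for yaki in nako["lar"]:
        hut[yaki["r"]] = yaki          # r:yaki is the commit's own hash

    lat = dict(rang["lat"])
    for blob in nako["bar"]:
        lat[blob[1]] = blob            # p in every blob is its own hash

    new_dome = dict(dome, hit=hit, let=max(dome["let"], nako["let"]))
    return new_dome, {"hut": hut, "lat": lat}


dome = {"let": 1, "hit": {1: 0xA}}
rang = {
    "hut": {0xA: {"p": [], "q": {("readme",): 0x1}, "r": 0xA, "t": 100}},
    "lat": {0x1: ("direct", 0x1, "hello", "c")},
}
nako = {
    "gar": {2: 0xB},
    "let": 2,
    "lar": [{"p": [0xA], "q": {("readme",): 0x2}, "r": 0xB, "t": 200}],
    "bar": [("delta", 0x2, 0x1, "udon-placeholder")],
}
new_dome, new_rang = apply_nako(dome, rang, nako)
```

After the fold, revision 2 can be looked up through hit to a tako, through hut to its commit, and through lat to its content, without any further request to the remote ship.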

Public Interface

As with all vanes, there are exactly two ways to interact with clay. %clay exports a namespace accessible through .^, which is described above under ++care. The primary way of interacting with clay, though, is by sending kisses and receiving gifts.

++  gift                                                ::  out result <-$
          $%  [%ergo p=@p q=@tas r=@ud]                 ::  version update
              [%note p=@tD q=tank]                      ::  debug message
              [%writ p=riot]                            ::  response
          ==                                            ::
++  kiss                                                ::  in request ->$
          $%  [%info p=@p q=@tas r=nori]                ::  internal edit
              [%ingo p=@p q=@tas r=nori]                ::  internal noun edit
              [%init p=@p]                              ::  report install
              [%into p=@p q=@tas r=nori]                ::  external edit
              [%invo p=@p q=@tas r=nori]                ::  external noun edit
              [%merg p=@p q=@tas r=mizu]                ::  internal change
              [%wart p=sock q=@tas r=path s=*]          ::  network request
              [%warp p=sock q=riff]                     ::  file request
          ==                                            ::

There are only a small number of possible kisses, so it behooves us to describe each in detail.

          $%  [%info p=@p q=@tas r=nori]                ::  internal edit

              [%into p=@p q=@tas r=nori]                ::  external edit

These two kisses are nearly identical. At a high level, they apply changes to the filesystem. Whenever we add, remove, or edit a file, one of these cards is sent. The p is the ship whose filesystem we're trying to change, the q is the desk we're changing, and the r is the requested change. For the format of the requested change, see the documentation for ++nori above.

When a file is changed in the unix filesystem, vere will send a %into kiss. This tells clay that the duct over which the kiss was sent is the duct that unix is listening on for changes. From within Arvo, though, we should never send a %into kiss. The %info kiss is exactly identical except it does not reset the duct.

              [%ingo p=@p q=@tas r=nori]                ::  internal noun edit

              [%invo p=@p q=@tas r=nori]                ::  external noun edit

These kisses are currently identical to %info and %into, though this will not always be the case. The intent is for these kisses to allow typed changes to clay so that we may store typed data. This is currently unimplemented.

              [%init p=@p]                              ::  report install

Init is called when a ship is started on our pier. This simply creates a default room to go into our raft. Essentially, this initializes the filesystem for a ship.

              [%merg p=@p q=@tas r=mizu]                ::  internal change

This is called to perform a merge. This is most visibly called by :update to update the filesystem of the current ship to that of its sein. The p and q are as in %info, and the r is the description of the merge. See ++mizu above.

XX            [%wake ~]                                 ::  timer activate

XX This card is sent by unix at the time specified by ++doze. This time is usually the closest time specified in a subscription request. When %wake is called, we update our subscribers if there have been any changes.

              [%wart p=sock q=@tas r=path s=*]          ::  network request

This is a request that has come across the network for a particular file. When another ship asks for a file from us, that request comes to us in the form of a %wart kiss. This is handled by trivially turning it into a %warp.

              [%warp p=sock q=riff]                     ::  file request

This is a request for information about a particular desk. This is, in its most general form, a subscription, though in many cases it is the trivial case of a subscription -- a read. See ++riff for the format of the request.

Lifecycle of a Local Read

There are two real types of interaction with a filesystem: you can read, and you can write. We'll describe each process, detailing both the flow of control followed by the kernel and the algorithms involved. The simpler case is that of the read, so we'll begin with that.

When a vane or an application wishes to read a file from the filesystem, it sends a %warp kiss, as described above. Of course, you may request a file on another ship and, being a global filesystem, clay will happily produce it for you. That code pathway will be described in another section; here, we will restrict ourselves to examining the case of a read from a ship on our own pier.

The kiss can request either a single version of a file node or a range of versions of a desk. Here, we'll deal only with a request for a single version.

As in all vanes, a kiss enters clay via a call to ++call. Scanning through the arm, we quickly see where %warp is handled.

        ?:  =(p.p.q.hic q.p.q.hic)
          =+  une=(un p.p.q.hic now ruf)
          =+  wex=(di:une p.q.q.hic)
          =+  ^=  wao
            ?~  q.q.q.hic
              (ease:wex hen)
            (eave:wex hen u.q.q.q.hic)
          =+  ^=  woo
            abet:wao
          [-.woo abet:(pish:une p.q.q.hic +.woo ran.wao)]

We're following the familiar pattern of producing a list of moves and an updated state. In this case, the state is ++raft.

We first check to see if the sending and receiving ships are the same. If they're not, then this is a request for data on another ship. We describe that process later. Here, we discuss only the case of a local read.

At a high level, the call to ++un sets up the core for the domestic ship that contains the files we're looking for. The call to ++di sets up the core for the particular desk we're referring to.

After this, we perform the actual request. If there is no rave in the riff, then that means we are cancelling a request, so we call ++ease:de. Otherwise, we start a subscription with ++eave:de. We call ++abet:de to resolve our various types of output into actual moves. We produce the moves we found above and the ++un core resolved with ++pish:un (putting the modified desk in the room) and ++abet:un (putting the modified room in the raft).

Much of this is fairly straightforward, so we'll only describe ++ease, ++eave, and ++abet:de. Feel free to look up the code to the other steps -- it should be easy to follow.

Although it's called last, it's usually worth examining ++abet first, since it defines in what ways we can cause side effects. Let's do that, and also a few of the lines at the beginning of ++de.

    =|  yel=(list ,[p=duct q=gift])
    =|  byn=(list ,[p=duct q=riot])
    =|  vag=(list ,[p=duct q=gift])
    =|  say=(list ,[p=duct q=path r=ship s=[p=@ud q=riff]])
    =|  tag=(list ,[p=duct q=path c=note])
    |%
    ++  abet
      ^-  [(list move) rede]
      :_  red
      ;:  weld
        %+  turn  (flop yel)
        |=([a=duct b=gift] [hun %give b])
      ::
        %+  turn  (flop byn)
        |=([a=duct b=riot] [a %give [%writ b]])
      ::
        %+  turn  (flop vag)
        |=([a=duct b=gift] [a %give b])
      ::
        %+  turn  (flop say)
        |=  [a=duct b=path c=ship d=[p=@ud q=riff]]
        :-  a
        [%pass b %a %want [who c] [%q %re p.q.d (scot %ud p.d) ~] q.d]
      ::
        %+  turn  (flop tag)
        |=([a=duct b=path c=note] [a %pass b c])
      ==

This is very simple code. We see there are exactly five different kinds of side effects we can generate.

In yel we put gifts that we wish to be sent along the hun:room duct to dill. See the documentation for ++room above. This is how we display messages to the terminal.

In byn we put riots that we wish returned to subscribers. Recall that a riot is a response to a subscription. These are returned to our subscribers in the form of a %writ gift.

In vag we put gifts along with the ducts on which to send them. This allows us to produce arbitrary gifts, but in practice this is only used to produce %ergo gifts.

In say we put messages we wish to pass to ames. These messages are used to request information from clay on other piers. We must provide not only the duct and the request (the riff), but also the return path, the other ship to talk to, and the sequence number of the request.

In tag we put arbitrary notes we wish to pass to other vanes. For now, the only notes we pass here are %wait and %rest to the timer vane.
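
The accumulate-then-resolve pattern in ++abet can be modeled outside of Hoon. The following is a minimal Python sketch, not clay's code: the class name DeskCore, the string ducts, and the tuple shapes of the moves are all invented for illustration, and only three of the five lists are modeled. It shows why each list is flopped: effects are pushed onto the front as they occur, so reversing restores the order in which they were generated.

```python
from dataclasses import dataclass, field


@dataclass
class DeskCore:
    """Hypothetical stand-in for the ++de core; not the real data layout."""
    hun: str = "terminal-duct"
    yel: list = field(default_factory=list)  # gifts bound for the terminal
    byn: list = field(default_factory=list)  # (duct, riot) pairs for subscribers
    tag: list = field(default_factory=list)  # (duct, path, note) for other vanes

    def push(self, name, item):
        # Like [item list] in Hoon: the newest effect goes on the front.
        getattr(self, name).insert(0, item)

    def abet(self):
        # Reverse each list so moves come out oldest-first, then concatenate.
        moves = []
        for _duct, gift in reversed(self.yel):
            moves.append((self.hun, "give", gift))
        for duct, riot in reversed(self.byn):
            moves.append((duct, "give", ("writ", riot)))
        for duct, path, note in reversed(self.tag):
            moves.append((duct, "pass", path, note))
        return moves
```

As in the Hoon, the duct stored alongside each yel entry is ignored, and the gift goes out on hun instead.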

Now that we know what kinds of side effects we may have, we can jump into the handling of requests.

    ++  ease                                          ::  release request
      |=  hen=duct
      ^+  +>
      ?~  ref  +>
        =+  rov=(~(got by qyx) hen)
        =.  qyx  (~(del by qyx) hen)
        (mabe rov (cury best hen))
      =.  qyx  (~(del by qyx) hen)
      |-  ^+  +>+.$
      =+  nux=(~(get by fod.u.ref) hen)
      ?~  nux  +>+.$
      %=  +>+.$
        say        [[hen [(scot %ud u.nux) ~] for [u.nux syd ~]] say]
        fod.u.ref  (~(del by fod.u.ref) hen)
        bom.u.ref  (~(del by bom.u.ref) u.nux)
      ==

This is called when we're cancelling a subscription. For domestic desks, ref is null, so we're going to cancel any timer we might have created. We first delete the duct from our map of requests, and then we call ++mabe with ++best to send a %rest kiss to the timer vane if we have started a timer. We'll describe ++best and ++mabe momentarily.

Although we said we're not going to talk about foreign requests yet, it's easy to see that for foreign desks, we cancel any outstanding requests for this duct and send a message over ames to the other ship telling them to cancel the subscription.

    ++  best
      |=  [hen=duct tym=@da]
      %_(+> tag :_(tag [hen /tyme %t %rest tym]))

This simply pushes a %rest note onto tag, from where it will be passed back to arvo to be handled. This cancels the timer at the given duct (with the given time).

    ++  mabe                                            ::  maybe fire function
      |*  [rov=rove fun=$+(@da _+>.^$)]
      ^+  +>.$
      %-  fall  :_  +>.$
      %-  bind  :_  fun
      ^-  (unit ,@da)
      ?-    -.rov
          %&
        ?.  ?=(%da -.q.p.rov)  ~
        `p.q.p.rov
          %|
        =*  mot  p.rov
        %+  hunt
          ?.  ?=(%da -.p.mot)  ~
          ?.((lth now p.p.mot) ~ [~ p.p.mot])
        ?.  ?=(%da -.q.mot)  ~
        ?.((lth now p.q.mot) [~ now] [~ p.q.mot])
      ==

This decides whether the given request can only be satisfied in the future. In that case, we call the given function with the time in the future when we expect to have an update to give to this request. This is called with ++best to cancel timers and with ++bait to start them.

For single requests, we have a time if the request is for a particular time (which is assumed to be in the future). For ranges of requests, we check both the start and end cases to see if they are time cases. If so, we choose the earlier time.

If any of those give us a time, then we call the given function with the smallest time.
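
The time selection in ++mabe can be sketched in Python as follows. This is an illustrative model under assumptions, not a translation: cases are represented as hypothetical (kind, value) tuples, times are plain integers, and the function names are ours. The earliest helper plays the role of ++hunt, and None plays the role of a null unit.

```python
def earliest(a, b):
    """Smaller of two optional times; None means 'no time'."""
    if a is None:
        return b
    if b is None:
        return a
    return min(a, b)


def wake_time_single(case):
    """A single request has a wake-up time only if its case is a date."""
    kind, value = case
    return value if kind == "da" else None


def wake_time_range(start, end, now):
    """A range request looks at both ends and keeps the earlier time."""
    first = None
    if start[0] == "da" and now < start[1]:
        first = start[1]
    second = None
    if end[0] == "da":
        # An end date that has already passed yields 'now', as in the Hoon.
        second = end[1] if now < end[1] else now
    return earliest(first, second)


def mabe(wake, fun, state):
    """Call fun with the wake-up time if there is one; else keep the state."""
    return state if wake is None else fun(wake, state)
```

With this model, mabe(wake_time_range(("da", 10), ("da", 20), 5), fun, state) calls fun with 10, while a range whose ends are both revision numbers never calls fun at all.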

The more interesting case is, of course, when we're not cancelling a subscription but starting one.

    ++  eave                                          ::  subscribe
      |=  [hen=duct rav=rave]
      ^+  +>
      ?-    -.rav
          &
        ?:  &(=(p.p.rav %u) !=(p.q.p.rav now))
          ~&  [%clay-fail p.q.p.rav %now now]
          !!
        =+  ver=(aver p.rav)
        ?~  ver
          (duce hen rav)
        ?~  u.ver
          (blub hen)
        (blab hen p.rav u.u.ver)

There are two types of subscriptions -- either we're requesting a single file or we're requesting a range of versions of a desk. We'll discuss the simpler case first.

First, we check that we're not requesting the rang from any time other than the present. Since we don't store that information for any other time, we can't produce it in a referentially transparent manner for any time other than the present.

Then, we try to read the requested mood p.rav. If we can't access the requested data right now, we call ++duce to put the request in our queue to be satisfied when the information becomes available.

This case occurs when we make a request for a case whose (1) date is after the current date, (2) number is after the current number, or (3) label is not yet used.

    ++  duce                                            ::  produce request
      |=  [hen=duct rov=rove]
      ^+  +>
      =.  qyx  (~(put by qyx) hen rov)
      ?~  ref
        (mabe rov (cury bait hen))
      |-  ^+  +>+.$                                     ::  XX  why?
      =+  rav=(reve rov)
      =+  ^=  vaw  ^-  rave
        ?.  ?=([%& %v *] rav)  rav
        [%| [%ud let.dom] `case`q.p.rav r.p.rav]
      =+  inx=nix.u.ref
      %=  +>+.$
        say        [[hen [(scot %ud inx) ~] for [inx syd ~ vaw]] say]
        nix.u.ref  +(nix.u.ref)
        bom.u.ref  (~(put by bom.u.ref) inx [hen vaw])
        fod.u.ref  (~(put by fod.u.ref) hen inx)
      ==

The code for ++duce is nearly the exact inverse of ++ease, which in the case of a domestic desk is very simple -- we simply put the duct and rave into qyx and possibly start a timer with ++mabe and ++bait. Recall that ref is null for domestic desks and that ++mabe fires the given function with the time we need to be woken up at, if we need to be woken up at a particular time.

    ++  bait
      |=  [hen=duct tym=@da]
      %_(+> tag :_(tag [hen /tyme %t %wait tym]))

This sets an alarm by sending a %wait card with the given time to the timer vane.

Back in ++eave, if ++aver returned [~ ~], then we cancel the subscription. This occurs when we make (1) a %x request for a file that does not exist, (2) a %w request with a case that is not a number, or (3) a %w request with a nonempty path. ++blub is exactly what you would expect it to be.

    ++  blub                                          ::  ship stop
      |=  hen=duct
      %_(+> byn [[hen ~] byn])

We notify the duct that we're cancelling their subscription since it isn't satisfiable.

Otherwise, we have received the desired information, so we send it on to the subscriber with ++blab.

    ++  blab                                          ::  ship result
      |=  [hen=duct mun=mood dat=*]
      ^+  +>
      +>(byn [[hen ~ [p.mun q.mun syd] r.mun dat] byn])
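
The three outcomes of a single read map onto three arms. A small Python sketch of that dispatch, using two hypothetical sentinel objects for the two "empty" results of the nested unit (the names PENDING, MISSING, and handle_read are ours, not clay's):

```python
PENDING = object()  # models ~     : the data is not available yet
MISSING = object()  # models [~ ~] : the request can never be satisfied


def handle_read(result):
    """Pick the arm ++eave would call for a given read result."""
    if result is PENDING:
        return ("duce",)          # queue the request for later
    if result is MISSING:
        return ("blub",)          # tell the subscriber we're cancelling
    return ("blab", result)       # ship the data to the subscriber
```

The point of the nested unit is exactly this distinction between "not yet" and "never".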

The most interesting arm called in ++eave is, of course, ++aver, where we actually try to read the data.

    ++  aver                                          ::  read
      |=  mun=mood
      ^-  (unit (unit ,*))
      ?:  &(=(p.mun %u) !=(p.q.mun now))              ::  prevent bad things
        ~&  [%clay-fail p.q.mun %now now]
        !!
      =+  ezy=?~(ref ~ (~(get by haw.u.ref) mun))
      ?^  ezy  ezy
      =+  nao=(~(case-to-aeon ze lim dom ran) q.mun)
      ?~(nao ~ [~ (~(read-at-aeon ze lim dom ran) u.nao mun)])

We check immediately that we're not requesting the rang for any time other than the present.

If this is a foreign desk, then we check our cache for the specific request. If either this is a domestic desk or we don't have the request in our cache, then we have to actually go read the data from our dome.

We need to do two things. First, we try to find the number of the commit specified by the given case, and then we try to get the data there.

Here, we jump into arvo/zuse.hoon, which is where much of the algorithmic code is stored, as opposed to the clay interface, which is stored in arvo/clay.hoon. We examine ++case-to-aeon:ze.

  ++  case-to-aeon                                      ::    case-to-aeon:ze
    |=  lok=case                                        ::  act count through
    ^-  (unit aeon)
    ?-    -.lok
        %da
      ?:  (gth p.lok lim)  ~
      |-  ^-  (unit aeon)
      ?:  =(0 let)  [~ 0]                               ::  avoid underflow
      ?:  %+  gte  p.lok 
          =<  t
          %-  tako-to-yaki
          %-  aeon-to-tako
          let
        [~ let]
      $(let (dec let))
    ::
        %tas  (~(get by lab) p.lok)
        %ud   ?:((gth p.lok let) ~ [~ p.lok])
    ==

We handle each type of case differently. The latter two types are easy.

If we're requesting a revision by label, then we simply look up the requested label in lab from the given dome. If it exists, that is our aeon; else we produce null, indicating the requested revision does not yet exist.

If we're requesting a revision by number, we check if we've yet reached that number. If so, we produce the number; else we produce null.

If we're requesting a revision by date, we check first if the date is in the future, returning null if so. Else we start from the most recent revision and scan backwards until we find the first revision committed before that date, and we produce that. If we requested a date before any revisions were committed, we produce 0.
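
The three branches can be modeled in Python. This is a sketch under assumptions: cases are hypothetical (kind, value) tuples, commit_times is an invented dict from revision number to commit timestamp standing in for the hit and hut lookups, and None stands in for a null unit.

```python
def case_to_aeon(case, let, commit_times, labels, lim):
    """Resolve a case to a revision number, or None if it doesn't exist yet.

    let          -- current head revision number
    commit_times -- {revision: timestamp} for revisions 1..let (assumed shape)
    labels       -- {label: revision}
    lim          -- latest time we know anything about
    """
    kind, value = case
    if kind == "da":
        if value > lim:
            return None                 # date is in the future
        n = let
        while n > 0:                    # scan backwards from the head
            if value >= commit_times[n]:
                return n
            n -= 1
        return 0                        # before any revision was committed
    if kind == "tas":
        return labels.get(value)        # None if the label is unused
    if kind == "ud":
        return None if value > let else value
    raise ValueError("unknown case kind: %r" % (kind,))
```

For a desk with commits at times 100, 200, and 300, a date of 250 resolves to revision 2 and a date of 50 resolves to revision 0.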

The definitions of ++aeon-to-tako and ++tako-to-yaki are trivial.

  ++  aeon-to-tako  ~(got by hit)

  ++  tako-to-yaki  ~(got by hut)                       ::  grab yaki

We simply look up the aeon or tako in their respective maps (hit and hut).

Assuming we got a valid version number, ++aver calls ++read-at-aeon:ze, which reads the requested data at the given revision.

  ++  read-at-aeon                                      ::    read-at-aeon:ze
    |=  [oan=aeon mun=mood]                             ::  seek and read
    ^-  (unit)
    ?:  &(?=(%w p.mun) !?=(%ud -.q.mun))                ::  NB only for speed
      ?^(r.mun ~ [~ oan])
    (read:(rewind oan) mun)

If we're requesting the revision number with a case other than by number, then we go ahead and just produce the number we were given. Otherwise, we call ++rewind to rewind our state to the given revision, and then we call ++read to get the requested information.

  ++  rewind                                            ::    rewind:ze
    |=  oan=aeon                                        ::  rewind to aeon
    ^+  +>
    ?:  =(let oan)  +>
    ?:  (gth oan let)  !!                               ::  don't have version
    +>(ank (checkout-ankh q:(tako-to-yaki (aeon-to-tako oan))), let oan)

If we're already at the requested version, we do nothing. If we're requesting a version later than our head, we are unable to comply.

Otherwise, we get the hash of the commit at the number, and from that we get the commit itself (the yaki), which has the map of path to lobe that represents a version of the filesystem. We call ++checkout-ankh to check out the commit, and we replace ank in our context with the result.

  ++  checkout-ankh                                     ::    checkout-ankh:ze
    |=  hat=(map path lobe)                             ::  checkout commit
    ^-  ankh
    %-  cosh
    %+  roll  (~(tap by hat) ~)
    |=  [[pat=path bar=lobe] ank=ankh]
    ^-  ankh
    %-  cosh
    ?~  pat
      =+  zar=(lobe-to-noun bar)
      ank(q [~ (sham zar) zar])
    =+  nak=(~(get by r.ank) i.pat)
    %=  ank
      r  %+  ~(put by r.ank)  i.pat 
         $(pat t.pat, ank (fall nak _ankh))
    ==

Twice we call ++cosh, which hashes a commit, updating p in an ankh. Let's jump into that algorithm before we describe ++checkout-ankh.

++  cosh                                                ::  locally rehash
  |=  ank=ankh                                          ::  NB v/unix.c
  ank(p rehash:(zu ank))

We simply replace p in the ankh with the cash we get from a call to ++rehash:zu.

++  zu  !:                                              ::  filesystem
  |=  ank=ankh                                          ::  filesystem state
  =|  myz=(list ,[p=path q=miso])                       ::  changes in reverse
  =|  ram=path                                          ::  reverse path into
  |%
  ++  rehash                                            ::  local rehash
    ^-  cash
    %+  mix  ?~(q.ank 0 p.u.q.ank)
    =+  axe=1
    |-  ^-  cash
    ?~  r.ank  _@
    ;:  mix
      (shaf %dash (mix axe (shaf %dush (mix p.n.r.ank p.q.n.r.ank))))
      $(r.ank l.r.ank, axe (peg axe 2))
      $(r.ank r.r.ank, axe (peg axe 3))
    ==

++zu is a core we set up with a particular filesystem node to traverse a checkout of the filesystem and access the actual data inside it. One of the things we can do with it is to create a recursive hash of the node.

In ++rehash, if this node is a file, then we xor the remainder of the hash with the hash of the contents of the file. The remainder of the hash is 0 if we have no children, else we descend into our children. Basically, we do a half SHA-256 of the xor of the axis of this child and the half SHA-256 of the xor of the name of the child and the hash of the child. This is done for each child and all the results are xored together.

Now we return to our discussion of ++checkout-ankh.

We fold over every path in this version of the filesystem and create a great ankh out of them. First, we call ++lobe-to-noun to get the raw data referred to by each lobe.

  ++  lobe-to-noun                                      ::  grab blob
    |=  p=lobe                                          ::  ^-  *
    %-  blob-to-noun  
    (lobe-to-blob p)

This converts a lobe into the raw data it refers to by first getting the blob with ++lobe-to-blob and converting that into data with ++blob-to-noun.

  ++  lobe-to-blob  ~(got by lat)                       ::  grab blob

This just grabs the blob that the lobe refers to.

  ++  blob-to-noun                                      ::  grab blob
    |=  p=blob
    ?-   -.p
       %delta  (lump r.p (lobe-to-noun q.p))
       %direct  q.p
       %indirect  q.p
    ==

If we have either a direct or an indirect blob, then the data is stored right in the blob. Otherwise, we have to reconstruct it from the diffs. We do this by calling ++lump on the diff in the blob with the data obtained by recursively calling ++lobe-to-noun on the parent of this blob.
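
The recursion through a chain of deltas can be sketched in Python. The dict-based blob layout, the store mapping, and the apply_patch parameter are assumptions made for this example; in clay the store is lat and the patch function is ++lump.

```python
def blob_to_noun(lobe, store, apply_patch):
    """Resolve a lobe to its data, replaying deltas on top of their parents.

    store       -- {lobe: blob}, where a blob is a dict with a "kind" key
    apply_patch -- function (patch, source_data) -> patched_data
    """
    blob = store[lobe]
    if blob["kind"] in ("direct", "indirect"):
        return blob["data"]                       # data is stored in the blob
    if blob["kind"] == "delta":
        parent_data = blob_to_noun(blob["parent"], store, apply_patch)
        return apply_patch(blob["patch"], parent_data)
    raise ValueError("unknown blob kind: %r" % (blob["kind"],))
```

A delta whose parent is itself a delta simply recurses further, so the cost of a read grows with the length of the chain back to the nearest direct or indirect blob.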

++  lump                                                ::  apply patch
  |=  [don=udon src=*]
  ^-  *
  ?+    p.don  ~|(%unsupported !!)
      %a
    ?+  -.q.don  ~|(%unsupported !!)
      %a  q.q.don
      %c  (lurk ((hard (list)) src) p.q.don)
      %d  (lure src p.q.don)
    ==
  ::
      %c
    =+  dst=(lore ((hard ,@) src))
    %-  roly
    ?+  -.q.don  ~|(%unsupported !!)
      %a  ((hard (list ,@t)) q.q.don)
      %c  (lurk dst p.q.don)
    ==
  ==

This is defined in arvo/hoon.hoon for historical reasons which are likely no longer applicable. Since the ++umph structure will likely change when we convert clay to be a typed filesystem, we'll only give a high-level description of this process. If we have a %a udon, then we're performing a trivial replace, so we produce simply q.q.don. If we have a %c udon, then we're performing a list merge (as in, for example, lines of text). The merge is performed by ++lurk.

++  lurk                                                ::  apply list patch
  |*  [hel=(list) rug=(urge)]
  ^+  hel
  =+  war=`_hel`~
  |-  ^+  hel
  ?~  rug  (flop war)
  ?-    -.i.rug
      &
    %=   $
      rug  t.rug
      hel  (slag p.i.rug hel)
      war  (weld (flop (scag p.i.rug hel)) war)
    ==
  ::
      |
    %=  $
      rug  t.rug
      hel  =+  gur=(flop p.i.rug)
           |-  ^+  hel
           ?~  gur  hel
           ?>(&(?=(^ hel) =(i.gur i.hel)) $(hel t.hel, gur t.gur))
      war  (weld q.i.rug war)
    ==
  ==

We accumulate our final result in war. If there's nothing more in our list of merge instructions (unces), we just reverse war and produce it. Otherwise, we process another unce. If the unce is of type &, then we have p.i.rug lines of no changes, so we just copy them over from hel to war. If the unce is of type |, then we verify that the source lines (in hel) are what we expect them to be (p.i.rug), crashing on failure. If they're good, then we append the new lines in q.i.rug onto war.
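
The same idea in Python, as a simplified sketch. The ("keep", n) and ("swap", old, new) tuples are a hypothetical encoding of the two unce types, and unlike the Hoon version (which builds war in reverse and flops it at the end) this sketch keeps every list in forward order.

```python
def apply_list_patch(old, patch):
    """Apply a list of patch steps to a list of lines.

    ("keep", n)        -- copy the next n lines unchanged
    ("swap", old, new) -- check that the next lines equal `old`,
                          then emit `new` in their place
    """
    out = []
    rest = list(old)
    for step in patch:
        if step[0] == "keep":
            n = step[1]
            out.extend(rest[:n])
            rest = rest[n:]
        elif step[0] == "swap":
            _, expected, new = step
            expected = list(expected)
            if rest[:len(expected)] != expected:
                raise ValueError("patch does not match source")
            rest = rest[len(expected):]
            out.extend(new)
        else:
            raise ValueError("unknown step: %r" % (step[0],))
    return out
```

Applying [("keep", 1), ("swap", ["b", "c"], ["x"]), ("keep", 1)] to ["a", "b", "c", "d"] produces ["a", "x", "d"]. As in ++lurk, any source lines left over after the last step are dropped.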

And that's really it. List merges are pretty easy. Anyway, if you recall, we were discussing ++checkout-ankh.

  ++  checkout-ankh                                     ::    checkout-ankh:ze
    |=  hat=(map path lobe)                             ::  checkout commit
    ^-  ankh
    %-  cosh
    %+  roll  (~(tap by hat) ~)
    |=  [[pat=path bar=lobe] ank=ankh]
    ^-  ankh
    %-  cosh
    ?~  pat
      =+  zar=(lobe-to-noun bar)
      ank(q [~ (sham zar) zar])
    =+  nak=(~(get by r.ank) i.pat)
    %=  ank
      r  %+  ~(put by r.ank)  i.pat 
         $(pat t.pat, ank (fall nak _ankh))
    ==

If the path is null, then we calculate zar, the raw data at the path pat in this version. We produce the given ankh with the correct data.

Otherwise, we try to get the child we're looking at from our parent ankh. If it's already been created, this succeeds; otherwise, we simply create a default blank ankh. We place ourselves in our parent after recursively computing our children.

This algorithm really isn't that complicated, but it may not be immediately obvious why it works. An example should clear everything up.

Suppose hat is a map of the following information.

/greeting                 -->  "customary upon meeting"
/greeting/english         -->  "hello"
/greeting/spanish         -->  "hola"
/greeting/russian/short   -->  "привет"
/greeting/russian/long    -->  "Здравствуйте"
/farewell/russian         -->  "до свидания"

Furthermore, let's say that we process them in this order:

/greeting/english
/greeting/russian/short
/greeting/russian/long
/greeting
/greeting/spanish
/farewell/russian

Then, the first path we process is /greeting/english. Since our path is not null, we try to get nak, but because our ankh is blank at this point it doesn't find anything. Thus, we update our blank top-level ankh with a child %greeting and recurse with the blank nak to create the ankh of the new child.

In the recursion, our path is /english and our ankh is again blank. We try to get the english child of our ankh, but this of course fails. Thus, we update our blank /greeting ankh with a child english produced by recursing.

Now our path is null, so we call ++lobe-to-noun to get the actual data, and we place it in the brand-new ankh.

Next, we process /greeting/russian/short. Since our path is not null, we try to get the child named %greeting, which does exist since we created it earlier. We modify this child by recursing on it. Our path is now /russian/short, so we look for a %russian child in our /greeting ankh. This doesn't exist, so we add it by recursing. Our path is now /short, so we look for a %short child in our /greeting/russian ankh. This doesn't exist, so we add it by recursing. Now our path is null, so we set the contents of this node to "привет", and we're done processing this path.

Next, we process /greeting/russian/long. This proceeds similarly to the previous except that now the ankh for /greeting/russian already exists, so we simply reuse it rather than creating a new one. Of course, we still must create a new /greeting/russian/long ankh.

Next, we process /greeting. This ankh already exists, so after we've recursed once, our path is null, and our ankh is not blank -- it already has two children (and two grandchildren). We don't touch those, though, since a node may be both a file and a directory. We just add the contents of the file -- "customary upon meeting" -- to the existing ankh.

Next, we process /greeting/spanish. Of course, the /greeting ankh already exists, but it doesn't have a %spanish child, so we create that, taking care not to disturb the contents of the /greeting file. We put "hola" into the ankh and call it good.

Finally, we process /farewell/russian. Here, the /farewell ankh doesn't exist, so we create it. Clearly the newly-created ankh doesn't have any children, so we have to add a %russian child, and in this child we put our last content -- "до свидания".

We hope it's fairly obvious that the order we process the paths doesn't affect the final ankh tree. The tree will be constructed in a very different order depending on what order the paths come in, but the resulting tree is independent of order.
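
For readers who prefer to check this by running something, here is a small Python model of the fold. It is a sketch, not ++checkout-ankh itself: nodes are plain dicts with invented "file" and "kids" keys, the lobe lookup is skipped (the map holds the data directly), and the hashing done by ++cosh is left out entirely.

```python
from itertools import permutations


def blank():
    """A node with no file contents and no children."""
    return {"file": None, "kids": {}}


def checkout(hat):
    """Build a tree from (path, data) pairs, processed in the given order."""
    root = blank()
    for path, data in hat:
        node = root
        for name in path:
            # Reuse the child if it exists; otherwise create a blank one.
            node = node["kids"].setdefault(name, blank())
        node["file"] = data   # a node may be both a file and a directory
    return root


def order_independent(hat):
    """True if every processing order yields the same tree."""
    first = checkout(hat)
    return all(checkout(list(p)) == first for p in permutations(hat))


HAT = [
    (("greeting",), "customary upon meeting"),
    (("greeting", "english"), "hello"),
    (("greeting", "spanish"), "hola"),
    (("greeting", "russian", "short"), "привет"),
    (("greeting", "russian", "long"), "Здравствуйте"),
    (("farewell", "russian"), "до свидания"),
]
```

Calling order_independent(HAT) tries all 720 orderings of the six paths and compares the resulting trees.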

At any rate, we were talking about something important, weren't we? If you recall, that concludes our discussion of ++rewind, which was called from ++read-at-aeon. In summary, ++rewind returns a context in which our current state is (very nearly) as it was when the specified version of the desk was the head. This allows ++read-at-aeon to call ++read to read the requested information.

  ++  read                                              ::    read:ze
    |=  mun=mood                                        ::  read at point
    ^-  (unit)
    ?:  ?=(%v p.mun)
      [~ `dome`+<+<.read]
    ?:  &(?=(%w p.mun) !?=(%ud -.q.mun))
      ?^(r.mun ~ [~ let])
    ?:  ?=(%w p.mun)
      =+  ^=  yak
          %-  tako-to-yaki
          %-  aeon-to-tako
          let
      ?^(r.mun ~ [~ [t.yak (forge-nori yak)]])
      ::?>  ?=(^ hit)  ?^(r.mun ~ [~ i.hit])     ::  what do?? need [@da nori]
    (query(ank ank:(descend-path:(zu ank) r.mun)) p.mun)

If we're requesting the dome, then we just return that immediately.

If we're requesting the revision number of the desk and we're not requesting it by number, then we just return the current number of this desk. Note of course that this was really already handled in ++read-at-aeon.

If we're requesting a %w with a specific revision number, then we do something or other with the commit there. It's kind of weird, and it doesn't seem to work, so we'll ignore this case.

Otherwise, we descend into the ankh tree to the given path with ++descend-path:zu, and then we handle the specific request in ++query.

  ++  descend-path                                      ::  descend recursively
    |=  way=path
    ^+  +>
    ?~(way +> $(way t.way, +> (descend i.way)))

This is simple recursion down into the ankh tree. ++descend descends one level, so this will eventually get us down to the path we want.

  ++  descend                                           ::  descend
    |=  lol=@ta
    ^+  +>
    =+  you=(~(get by r.ank) lol)
    +>.$(ram [lol ram], ank ?~(you [*cash ~ ~] u.you))

ram is the path that we're at, so to descend one level we push the name of this level onto that path. We update the ankh with the correct one at that path if it exists; else we create a blank one.

Once we've descended to the correct level, we need to actually deal with the request.

  ++  query                                             ::    query:ze
    |=  ren=?(%u %v %x %y %z)                           ::  endpoint query
    ^-  (unit ,*)
    ?-  ren
      %u  [~ `rang`+<+>.query]
      %v  [~ `dome`+<+<.query]
      %x  ?~(q.ank ~ [~ q.u.q.ank])
      %y  [~ as-arch]
      %z  [~ ank]
    ==

Now that everything's set up, it's really easy. If they're requesting the rang, dome, or ankh, we give it to them. If the contents of a file, we give it to them if it is in fact a file. If the arch, then we calculate it with ++as-arch.

  ++  as-arch                                           ::    as-arch:ze
    ^-  arch                                            ::  arch report
    :+  p.ank
      ?~(q.ank ~ [~ p.u.q.ank])
    |-  ^-  (map ,@ta ,~)
    ?~  r.ank  ~
    [[p.n.r.ank ~] $(r.ank l.r.ank) $(r.ank r.r.ank)]

This very simply strips out all the "real" data and returns just our own hash, the hash of the file contents (if we're a file), and a map of the names of our immediate children.
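
The descend-then-query flow can be modeled in a few lines of Python. This is a sketch with an assumed node layout (dicts with "file" and "kids" keys, no hashes), covering only the %x, %y, and %z requests; the arch here is reduced to a flag and a list of child names, whereas the real ++as-arch also carries the hashes.

```python
def blank():
    return {"file": None, "kids": {}}


def descend_path(node, path):
    """Walk down one name at a time; a missing child becomes a blank node."""
    for name in path:
        child = node["kids"].get(name)
        node = child if child is not None else blank()
    return node


def query(node, care):
    """Answer a request at the node we've descended to."""
    if care == "x":
        return node["file"]            # None if this node is not a file
    if care == "y":
        return {"is_file": node["file"] is not None,
                "kids": sorted(node["kids"])}
    if care == "z":
        return node
    raise ValueError("unsupported care: %r" % (care,))


TREE = {
    "file": None,
    "kids": {
        "greeting": {
            "file": "customary upon meeting",
            "kids": {"english": {"file": "hello", "kids": {}}},
        },
    },
}
```

Note that descending into a path that doesn't exist is not an error: it lands on a blank node, and the %x query there reports that there is no file.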

Lifecycle of a Local Subscription

A subscription to a range of revisions of a desk initially follows the same path that a single read does. In ++eave, we checked the head of the given rave. If the head was &, then it was a single request, so we handled it above. If |, then we handle it with the following code.

        =+  nab=(~(case-to-aeon ze lim dom ran) p.p.rav)
        ?~  nab
          ?>  =(~ (~(case-to-aeon ze lim dom ran) q.p.rav))
          (duce hen (rive rav))
        =+  huy=(~(case-to-aeon ze lim dom ran) q.p.rav)
        ?:  &(?=(^ huy) |((lth u.huy u.nab) &(=(0 u.huy) =(0 u.nab))))
          (blub hen)
        =+  top=?~(huy let.dom u.huy)
        =+  sar=(~(lobes-at-path ze lim dom ran) u.nab r.p.rav)
        =+  ear=(~(lobes-at-path ze lim dom ran) top r.p.rav)
        =.  +>.$
          ?:  =(sar ear)  +>.$
          =+  fud=(~(make-nako ze lim dom ran) u.nab top)
          (bleb hen u.nab fud)
        ?^  huy
          (blub hen)
        =+  ^=  ptr  ^-  case
            [%ud +(let.dom)]
        (duce hen `rove`[%| ptr q.p.rav r.p.rav ear])
      ==

Recall that ++case-to-aeon:ze produces the revision number that a case corresponds to, if it corresponds to any. If it doesn't yet correspond to a revision, then it produces null.

Thus, we first check to see if we've even gotten to the beginning of the range of revisions requested. If not, then we assert that we haven't yet gotten to the end of the range either, because that would be really strange. We then immediately call ++duce, which, if you recall, for a local request, simply puts this duct and rove into our cult qyx, so that we know who to respond to when the revision does appear.

If we've already gotten to the first revision, then we can produce some content immediately. If we've also gotten to the final revision, and that revision is earlier than the start revision, then it's a bad request and we call ++blub, which tells the subscriber that his subscription will not be satisfied.

Otherwise, we find the data at the given path at the beginning of the subscription and at the last available revision in the subscription. If they're the same, then we don't send a notification. Otherwise, we call ++make-nako, which creates the ++nako we need to produce. We call ++bleb to actually produce the information.

If we already have the last requested revision, then we also tell the subscriber with ++blub that the subscription will receive no further updates.

If there will be more revisions in the subscription, then we call ++duce, adding the duct to our subscribers. We modify the rove to start at the next revision since we've already handled all the revisions up to the present.
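
The decision tree above is easier to see with the Hoon plumbing removed. Here is a Python sketch that returns the list of arms that would be called; the function name plan_range, the tuple encoding of the actions, and the lobes_at callback are all invented for this example.

```python
def plan_range(nab, huy, let, lobes_at):
    """Decide what to do with a range subscription.

    nab      -- revision the start case resolves to, or None if not reached
    huy      -- revision the end case resolves to, or None if not reached
    let      -- current head revision
    lobes_at -- function: revision -> data at the subscribed path
    """
    if nab is None:
        # We can't have reached the end of a range before its start.
        assert huy is None
        return [("duce", "unchanged")]
    if huy is not None and (huy < nab or (huy == 0 and nab == 0)):
        return [("blub",)]                      # bad request
    top = let if huy is None else huy
    actions = []
    if lobes_at(nab) != lobes_at(top):
        actions.append(("bleb", nab, top))      # something changed: send it
    if huy is not None:
        actions.append(("blub",))               # range is complete
    else:
        actions.append(("duce", let + 1))       # wait for the next revision
    return actions
```

For example, with a head at revision 3 and a path that changed between revisions 1 and 3, an open-ended subscription starting at 1 yields a bleb for revisions 1 through 3 followed by a duce starting at revision 4.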

We glossed over the calls to ++lobes-at-path, ++make-nako, and ++bleb, so we'll get back to those right now. ++bleb is simple, so we'll start with that.

    ++  bleb                                          ::  ship sequence
      |=  [hen=duct ins=@ud hip=nako]
      ^+  +>
      (blab hen [%w [%ud ins] ~] hip)

We're given a duct, the beginning revision number, and the nako that contains the updates since that revision. We use ++blab to produce this result to the subscriber. The care is %w, the case is the revision number of the beginning of the subscription, and the data is the nako itself.

We call ++lobes-at-path:ze to get the data at the particular path.

  ++  lobes-at-path                                     ::    lobes-at-path:ze
    |=  [oan=aeon pax=path]                             ::  data at path
    ^-  (map path lobe)
    ?:  =(0 oan)  ~
    %-  mo
    %+  skim
      %.  ~
      %~  tap  by
      =<  q
      %-  tako-to-yaki
      %-  aeon-to-tako
      oan
    |=  [p=path q=lobe]
    ?|  ?=(~ pax)
        ?&  !?=(~ p)
            =(-.pax -.p)
            $(p +.p, pax +.pax)
    ==  ==

At revision zero, the theoretical common revision between all repositories, there is no data, so we produce null.

We get the list of paths (paired with their lobe) in the revision referred to by the given number and we keep only those paths which begin with pax. Converting to a map, we now have a map from the subpaths at the given path to the hash of their data. This is simple and efficient to calculate and compare to later revisions. This allows us to easily tell if a node or its children have changed.
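
The prefix filter is simple enough to restate in Python. In this sketch, commits is a hypothetical dict from revision number to that revision's {path: lobe} map, and paths are tuples of names; neither shape is taken from clay.

```python
def lobes_at_path(commits, aeon, prefix):
    """Return the {path: lobe} entries at `aeon` whose path starts with `prefix`."""
    if aeon == 0:
        return {}                      # revision zero has no data
    n = len(prefix)
    return {path: lobe
            for path, lobe in commits[aeon].items()
            if path[:n] == prefix}


COMMITS = {
    1: {("doc", "a"): "lobe-1", ("src", "b"): "lobe-2"},
    2: {("doc", "a"): "lobe-1", ("src", "b"): "lobe-3"},
}
```

Because the values are content hashes, comparing lobes_at_path(COMMITS, 1, ("doc",)) with lobes_at_path(COMMITS, 2, ("doc",)) tells us whether anything under /doc changed without reading any file contents.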

Finally, we will describe ++make-nako:ze.

  ++  make-nako                                         ::  gack a through b
    |=  [a=aeon b=aeon]
    ^-  [(map aeon tako) aeon (set yaki) (set blob)]
    :_  :-  b
        =-  [(takos-to-yakis -<) (lobes-to-blobs ->)]
        %+  reachable-between-takos
          (~(get by hit) a)                             ::  if a not found, a=0
        (aeon-to-tako b)
    ^-  (map aeon tako)
    %-  mo  %+  skim  (~(tap by hit) ~)
    |=  [p=aeon *]
    &((gth p a) (lte p b))

We need to produce four things -- the numbers of the new commits, the number of the latest commit, the new commits themselves, and the new data itself.

The first is fairly easy to produce. We simply go over our map of numbered commits and produce all those numbered greater than a and not greater than b.

The second is even easier to produce -- b is clearly our most recent commit.

The third and fourth are slightly more interesting, though not too terribly difficult. First, we call ++reachable-between-takos.

  ++  reachable-between-takos
    |=  [a=(unit tako) b=tako]                          ::  pack a through b
    ^-  [(set tako) (set lobe)]
    =+  ^=  sar 
        ?~  a  ~
        (reachable-takos r:(tako-to-yaki u.a))
    =+  yak=`yaki`(tako-to-yaki b)
    %+  new-lobes-takos  (new-lobes ~ sar)              ::  get lobes
    |-  ^-  (set tako)                                  ::  walk onto sar
    ?:  (~(has in sar) r.yak)
      ~
    =+  ber=`(set tako)`(~(put in `(set tako)`~) `tako`r.yak)
    %-  ~(uni in ber)
    ^-  (set tako)
    %+  roll  p.yak
    |=  [yek=tako bar=(set tako)]
    ^-  (set tako)
    ?:  (~(has in bar) yek)                             ::  save some time
      bar
    %-  ~(uni in bar)
    ^$(yak (tako-to-yaki yek))

We take a possible starting commit and a definite ending commit, and we produce the set of commits and the set of data between them.

We let sar be the set of commits reachable from a. If a is null, then obviously no commits are reachable. Otherwise, we call ++reachable-takos to calculate this.

  ++  reachable-takos                                   ::  reachable
    |=  p=tako                                          ::  XX slow
    ^-  (set tako)
    =+  y=(tako-to-yaki p)
    =+  t=(~(put in _(set tako)) p)
    %+  roll  p.y
    |=  [q=tako s=_t]
    ?:  (~(has in s) q)                                 ::  already done
      s                                                 ::  hence skip
    (~(uni in s) ^$(p q))                               ::  otherwise traverse

We very simply produce the set of the given tako plus its parents, recursively.
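
Here is the same reachability computation as a Python sketch. It is an iterative variant with an explicit stack rather than the recursive fold used in the Hoon, and the parents dict (commit id to list of parent ids) is an assumed stand-in for looking up p in each yaki. The new_between helper is our own shorthand for "commits between a and b", computed as a set difference.

```python
def reachable(tako, parents):
    """Set containing `tako` and all of its ancestors."""
    seen = set()
    stack = [tako]
    while stack:
        t = stack.pop()
        if t in seen:
            continue                  # already done, hence skip
        seen.add(t)
        stack.extend(parents[t])      # otherwise traverse its parents
    return seen


def new_between(a, b, parents):
    """Commits reachable from b that are not reachable from a (a may be None)."""
    old = set() if a is None else reachable(a, parents)
    return reachable(b, parents) - old


PARENTS = {
    "c1": [],
    "c2": ["c1"],
    "c3": ["c1"],
    "c4": ["c2", "c3"],   # a merge commit with two parents
    "c5": ["c4"],
}
```

With this history, new_between("c2", "c5", PARENTS) is the set containing c3, c4, and c5: the merge brings in c3 even though it sits "beside" c2 rather than after it.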

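Back in ++reachable-between-takos, we let yak be the yaki of b, the ending commit. With this, we walk backwards from b through its parents, collecting every tako we reach and stopping whenever we hit one that is already in sar. The result is the set of takos reachable from b but not from a.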

We pass sar into ++new-lobes to get all the lobes referenced by any tako reachable from a. The result is passed into ++new-lobes-takos along with the new takos to do the same for them, without recomputing the lobes already calculated in the first step. This produces the sets of takos and lobes we need.

  ++  new-lobes                                         ::  object hash set
    |=  [b=(set lobe) a=(set tako)]                     ::  that aren't in b
    ^-  (set lobe)
    %+  roll  (~(tap in a) ~)
    |=  [tak=tako bar=(set lobe)]
    ^-  (set lobe)
    =+  yak=(tako-to-yaki tak)
    %+  roll  (~(tap by q.yak) ~)
    |=  [[path lob=lobe] far=_bar]
    ^-  (set lobe)
    ?~  (~(has in b) lob)                               ::  don't need
      far
    =+  gar=(lobe-to-blob lob)
    ?-  -.gar
      %direct  (~(put in far) lob)
      %delta  (~(put in $(lob q.gar)) lob)
      %indirect  (~(put in $(lob s.gar)) lob)
    ==

Here, we're creating the set of lobes referenced by the commits in a. We treat b as the set of lobes we already know about, so we don't need to recompute any of the lobes referenced in there.

The algorithm is pretty simple, so we won't bore you with the details. We simply traverse every commit in a, looking at every blob referenced there, and, if it's not already in b, we add it to our result. In the case of a direct blob, we're done. For a delta or an indirect blob, we recursively add every blob referenced within the blob.
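For readers who prefer to see the traversal spelled out, here is a small Python model of it. The shapes of `commits` and `blobs` are assumptions made for this sketch: each blob is reduced to a `(kind, base)` pair, where `base` is the lobe that a delta or indirect blob builds on.

```python
def new_lobes(known, takos, commits, blobs):
    """Collect lobes referenced by `takos` that are not already in `known`.

    commits: tako -> {path: lobe}   (hypothetical stand-in for the yaki's file map)
    blobs:   lobe -> (kind, base)   kind is "direct", "delta" or "indirect";
                                    base is None for a direct blob
    """
    found = set()

    def visit(lobe):
        if lobe in known:           # don't need
            return
        kind, base = blobs[lobe]
        if kind != "direct":
            visit(base)             # follow the chain toward a direct blob
        found.add(lobe)

    for tako in takos:
        for lobe in commits[tako].values():
            visit(lobe)
    return found


blobs = {"l1": ("direct", None), "l2": ("delta", "l1"), "l3": ("direct", None)}
commits = {"t1": {"/a": "l1"}, "t2": {"/a": "l2", "/b": "l3"}}
print(sorted(new_lobes({"l1"}, {"t2"}, commits, blobs)))
```

Note that when a lobe is already known, the sketch does not look behind it at all, which mirrors the early exit in the gate above.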

  ++  new-lobes-takos                                   ::  garg & repack
    |=  [b=(set lobe) a=(set tako)]
    ^-  [(set tako) (set lobe)]
    [a (new-lobes b a)]

Here, we just update the set of lobes we're given with the commits we're given and produce both sets.

This concludes our discussion of a local subscription.

Lifecycle of a Foreign Read or Subscription

Foreign reads and subscriptions are handled in much the same way as local ones. The interface is the same -- a vane or app sends a %warp kiss with the request. The difference is simply that the sock refers to the foreign ship.

Thus, we start in the same place -- in ++call, handling %warp. However, since the two sides of the sock are different, we follow a different path.

        =+  wex=(do now p.q.hic p.q.q.hic ruf)
        =+  ^=  woo
          ?~  q.q.q.hic
            abet:(ease:wex hen)
          abet:(eave:wex hen u.q.q.q.hic)
        [-.woo (posh q.p.q.hic p.q.q.hic +.woo ruf)]

If we compare this to how the local case was handled, we see that it's not all that different. We use ++do rather than ++un and ++de to set up the core for the foreign ship. This gives us a ++de core, so we either cancel or begin the request by calling ++ease or ++eave, exactly as in the local case. In either case, we call ++abet:de to resolve our various types of output into actual moves, as described in the local case. Finally, we call ++posh to update our raft, putting the modified rung into the raft.

We'll first trace through ++do.

  ++  do
    |=  [now=@da [who=ship him=ship] syd=@tas ruf=raft]
    =+  ^=  rug  ^-  rung
        =+  rug=(~(get by hoy.ruf) him)
        ?^(rug u.rug *rung)
    =+  ^=  red  ^-  rede
        =+  yit=(~(get by rus.rug) syd)
        ?^(yit u.yit `rede`[~2000.1.1 ~ [~ *rind] *dome])
    ((de now ~ ~) [who him] syd red ran.ruf)

If we already have a rung for this foreign ship, then we use that. Otherwise, we create a new blank one. If we already have a rede for this desk in the rung, then we use that; otherwise, we create a blank one. An important point to note here is that we let ref in the rede be [~ *rind]. Recall that for domestic desks ref is null. We use this to distinguish between foreign and domestic desks in ++de.
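The lookup-with-default pattern can be modeled in a few lines of Python. This is only a sketch: the `Rede` class, the dictionary layout of the request manager, and the name `find_rede` are all assumptions for illustration. The point it shows is that a freshly created foreign rede carries a non-null `ref`, while a domestic one would carry `None`.

```python
from dataclasses import dataclass, field


@dataclass
class Rede:
    lim: str = "~2000.1.1"                    # complete-to date
    qyx: dict = field(default_factory=dict)   # subscribers
    ref: object = None                        # None models a domestic desk
    dom: dict = field(default_factory=dict)   # revision state


def find_rede(hoy, him, syd):
    """Return the rede for desk `syd` on foreign ship `him`, or a blank one.

    `hoy` is modeled as ship -> {desk: Rede}. Nothing is written back
    here; in the text above, that is the job of ++posh.
    """
    rung = hoy.get(him, {})
    rede = rung.get(syd)
    if rede is None:
        # A blank foreign rede gets an empty request manager.
        rede = Rede(ref={"nix": 0, "bom": {}, "fod": {}, "haw": {}})
    return rede


blank = find_rede({}, "~bud", "main")
print(blank.ref is not None)
```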

With this information, we create a ++de core as usual.

Although we've already covered ++ease and ++eave, we'll go through them quickly again, highlighting the case of a foreign request.

    ++  ease                                          ::  release request
      |=  hen=duct
      ^+  +>
      ?~  ref  +>
        =+  rov=(~(got by qyx) hen)
        =.  qyx  (~(del by qyx) hen)
        (mabe rov (cury best hen))
      =.  qyx  (~(del by qyx) hen)
      |-  ^+  +>+.$
      =+  nux=(~(get by fod.u.ref) hen)
      ?~  nux  +>+.$
      %=  +>+.$
        say        [[hen [(scot %ud u.nux) ~] for [u.nux syd ~]] say]
        fod.u.ref  (~(del by fod.u.ref) hen)
        bom.u.ref  (~(del by bom.u.ref) u.nux)
      ==

Here, we still remove the duct from our cult (we maintain a cult even for foreign desks), but we also need to tell the foreign desk to cancel our subscription. We do this by sending a request (by appending to say, which gets resolved in ++abet:de to a %want kiss to ames) to the foreign ship to cancel the subscription. Since we no longer expect a response on this duct, we remove it from fod and bom, which are the maps between ducts, raves, and request sequence numbers. Basically, we remove every trace of the subscription from our request manager.

In the case of ++eave, where we're creating a new request, everything is exactly identical to the case of the local request except ++duce. We said that ++duce simply puts the request into our cult. This is true for a domestic request, but distinctly untrue for foreign requests.

    ++  duce                                          ::  produce request
      |=  [hen=duct rov=rove]
      ^+  +>
      =.  qyx  (~(put by qyx) hen rov)
      ?~  ref  +>.$
      |-  ^+  +>+.$                                   ::  XX  why?
      =+  rav=(reve rov)
      =+  ^=  vaw  ^-  rave
        ?.  ?=([%& %v *] rav)  rav
        [%| [%ud let.dom] `case`q.p.rav r.p.rav]
      =+  inx=nix.u.ref
      %=  +>+.$
        say        [[hen [(scot %ud inx) ~] for [inx syd ~ vaw]] say]
        nix.u.ref  +(nix.u.ref)
        bom.u.ref  (~(put by bom.u.ref) inx [hen vaw])
        fod.u.ref  (~(put by fod.u.ref) hen inx)
      ==

If we have a request manager (i.e. this is a foreign desk), then we do the approximate inverse of ++ease. We create a rave out of the given request and send it off to the foreign desk by putting it in say. Note that the rave is created to request the information starting at the next revision number. Since this is a new request, we put it into fod and bom to associate the request with its duct and its sequence number. Since we're using another sequence number, we must increment nix, which represents the next available sequence number.
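The bookkeeping done by ++duce and ++ease on the request manager can be modeled as a small Python class. This is a sketch under simplifying assumptions: ducts and raves are plain strings, outgoing messages are tuples, and a `None` rave stands in for the null rave that cancels a request. The field names follow the text; everything else is hypothetical.

```python
class RequestManager:
    """Toy model of the bookkeeping for requests to one foreign desk."""

    def __init__(self):
        self.nix = 0     # next available sequence number
        self.bom = {}    # sequence number -> (duct, rave)
        self.fod = {}    # duct -> sequence number
        self.say = []    # outgoing messages, newest first

    def duce(self, duct, rave):
        """Register a new request and queue it for the foreign ship."""
        inx = self.nix
        self.say.insert(0, (duct, inx, rave))
        self.nix += 1
        self.bom[inx] = (duct, rave)
        self.fod[duct] = inx

    def ease(self, duct):
        """Cancel the request on `duct`, if we have one outstanding."""
        inx = self.fod.get(duct)
        if inx is None:
            return
        self.say.insert(0, (duct, inx, None))   # None models the null rave
        del self.fod[duct]
        self.bom.pop(inx, None)


m = RequestManager()
m.duce("duct-1", "rave-a")
m.duce("duct-2", "rave-b")
m.ease("duct-1")
print(m.nix, sorted(m.bom), m.say[0])
```

Sequence numbers are never reused in this model: canceling a request removes it from both maps but leaves `nix` alone.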

And that's really it for this side of the request. Requesting foreign information isn't that hard. Let's see what it looks like on the other side. When we get a request from another ship for information on our ship, that comes to us in the form of a %wart from ames.

We handle a %wart in ++call, right next to where we handle the %warp case.

        %wart
      ?>  ?=(%re q.q.hic)
      =+  ryf=((hard riff) s.q.hic)
      :_  ..^$
      :~  :-  hen
          :^  %pass  [(scot %p p.p.q.hic) (scot %p q.p.q.hic) r.q.hic]
            %c
          [%warp [p.p.q.hic p.p.q.hic] ryf]
      ==

Every request we receive should be of type riff, so we coerce it into that type. We just convert this into a new %warp kiss that we pass to ourselves. This gets handled like normal, as a local request. When the request produces a value, it does so like normal as a %writ, which is returned to ++take along the path we just sent it on.

        %writ
      ?>  ?=([@ @ *] tea)
      =+  our=(need (slaw %p i.tea))
      =+  him=(need (slaw %p i.t.tea))
      :_  ..^$
      :~  :-  hen
          [%pass ~ %a [%want [our him] [%r %re %c t.t.tea] p.+.q.hin]]
      ==

Since we encoded the ship we need to respond to in the path, we can just pass our %want back to ames, so that we tell the requesting ship about the new data.
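The trick here is that the path carries all the state we need for the reply, so nothing has to be remembered between the %wart and the %writ. A minimal Python model of that round trip, with hypothetical helper names and ships written as plain strings, could look like this:

```python
def make_wire(our, him, rest):
    """Build the path we pass along with the %warp: both ships, then the rest."""
    return [our, him] + list(rest)


def parse_wire(tea):
    """Recover the two ships and the remaining path when the %writ comes back."""
    if len(tea) < 2:
        raise ValueError("wire too short")
    our, him, *rest = tea
    return our, him, rest


wire = make_wire("~zod", "~bud", ["main", "7"])
print(parse_wire(wire))
```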

This comes back to the original ship as a %waft from ames, which comes into ++take, right next to where we handled %writ.

        %waft
      ?>  ?=([@ @ ~] tea)
      =+  syd=(need (slaw %tas i.tea))
      =+  inx=(need (slaw %ud i.t.tea))
      =+  ^=  zat
        =<  wake
        (knit:(do now p.+.q.hin syd ruf) [inx ((hard riot) q.+.q.hin)])
      =^  mos  ruf
        =+  zot=abet.zat
        [-.zot (posh q.p.+.q.hin syd +.zot ruf)]
      [mos ..^$(ran.ruf ran.zat)]                         ::  merge in new obj

This gets the desk and sequence number from the path the request was sent over. This determines exactly which request is being responded to. We call ++knit:de to apply the changes to our local desk, and we call ++wake to update our subscribers. Then we call ++abet:de and ++posh as normal (like in ++eave).

We'll examine ++knit and ++wake, in that order.

    ++  knit                                          ::  external change
      |=  [inx=@ud rot=riot]
      ^+  +>
      ?>  ?=(^ ref)
      |-  ^+  +>+.$
      =+  ruv=(~(get by bom.u.ref) inx)
      ?~  ruv  +>+.$
      =>  ?.  |(?=(~ rot) ?=(& -.q.u.ruv))  .
          %_  .
            bom.u.ref  (~(del by bom.u.ref) inx)
            fod.u.ref  (~(del by fod.u.ref) p.u.ruv)
          ==
      ?~  rot
        =+  rav=`rave`q.u.ruv
        %=    +>+.$
            lim
          ?.(&(?=(| -.rav) ?=(%da -.q.p.rav)) lim `@da`p.q.p.rav)
        ::
            haw.u.ref
          ?.  ?=(& -.rav)  haw.u.ref
          (~(put by haw.u.ref) p.rav ~)
        ==
      ?<  ?=(%v p.p.u.rot)
      =.  haw.u.ref
        (~(put by haw.u.ref) [p.p.u.rot q.p.u.rot q.u.rot] ~ r.u.rot)
      ?.  ?=(%w p.p.u.rot)  +>+.$
      |-  ^+  +>+.^$
      =+  nez=[%w [%ud let.dom] ~]
      =+  nex=(~(get by haw.u.ref) nez)
      ?~  nex  +>+.^$
      ?~  u.nex  +>+.^$  ::  should never happen
      =.  +>+.^$     =+  roo=(edis ((hard nako) u.u.nex))
                     ?>(?=(^ ref.roo) roo)
      %=  $
        haw.u.ref  (~(del by haw.u.ref) nez)
      ==

This is kind of a long gate, but don't worry, it's not bad at all.

First, we assert that we're not a domestic desk. That wouldn't make any sense at all.

Since we have the sequence number of the request, we can get the duct and rave from bom in our request manager. If we didn't actually request this data (or the request was canceled before we got it), then we do nothing.

Else, we remove the request from bom and fod unless this was a subscription request and we didn't receive a null riot (which would indicate the last message on the subscription).

Now, if we received a null riot and this was a subscription request by date, we update lim in our request manager (representing the latest time at which we have complete information for this desk) to the date that was requested. If this was a single read request, then we record the null result in our simple cache haw to make future requests faster. Then we're done.

If we received actual data, then we put it into our cache haw. Unless it's a %w request, we're done.

If it is a %w request, then we try to get the %w at our current head from the cache. In general, that should be the thing we just put in a moment ago, but that is not guaranteed. The most common case where this is different is when we receive desk updates out of order. At any rate, since we now have new information, we need to apply it to our local copy of the desk. We do so in ++edis, and then we remove the stuff we just applied from the cache, since it's not really a true "single read", like what should really be in the cache.
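The loop at the end of ++knit can be thought of as draining a cache of pending updates, keyed by the revision each one starts from. Here is a Python sketch of that idea. The cache layout and the `apply_update` callback (standing in for ++edis) are assumptions for illustration.

```python
def apply_pending(let, cache, apply_update):
    """Apply cached updates for as long as one starts at our head revision.

    let:          our current head revision number
    cache:        starting revision -> update (hypothetical layout)
    apply_update: callable that applies an update and returns the new head
    """
    while let in cache:
        update = cache.pop(let)     # remove it, it's not a true single read
        let = apply_update(update)
    return let


# Updates arrived out of order: the one starting at revision 2 came first.
cache = {2: {"let": 3}, 1: {"let": 2}}
head = apply_pending(1, cache, lambda update: update["let"])
print(head, cache)
```

If the update for our current head has not arrived yet, the loop does nothing and the later update simply waits in the cache.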

    ++  edis                                          ::  apply subscription
      |=  nak=nako
      ^+  +>
      %=  +>
        hit.dom  (~(uni by hit.dom) gar.nak)
        let.dom  let.nak
        lat.ran  %+  roll  (~(tap in bar.nak) ~)
                 =<  .(yeb lat.ran)
                 |=  [sar=blob yeb=(map lobe blob)]
                 =+  zax=(blob-to-lobe sar)
                 %+  ~(put by yeb)  zax  sar
        hut.ran  %+  roll  (~(tap in lar.nak) ~)
                 =<  .(yeb hut.ran)
                 |=  [sar=yaki yeb=(map tako yaki)]
                 %+  ~(put by yeb)  r.sar  sar
      ==

This shows, of course, exactly why nako is defined the way it is. To become completely up to date with the foreign desk, we need to merge hit with the foreign one so that we have all the revision numbers. We update let so that we know which revision is the head.

We merge the new blobs in lat, keying them by their hash, which we get from a call to ++blob-to-lobe. Recall that the hash is stored in the actual blob itself. We do the same thing to the new yakis, putting them in hut, keyed by their hash.

Now, our local dome should be exactly the same as the foreign one.
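As a compact summary of ++edis, here is a Python model of the same four updates. The dictionary layouts are assumptions: each blob carries its own hash under a hypothetical `"lobe"` key, and each yaki carries its hash under `"tako"`.

```python
def edis(desk, store, nako):
    """Fold a received update into our local copy of a foreign desk.

    desk:  {"hit": {revision: tako}, "let": head revision}
    store: {"lat": {lobe: blob}, "hut": {tako: yaki}}
    nako:  {"gar": {revision: tako}, "let": int, "bar": [blob], "lar": [yaki]}
    """
    desk["hit"].update(nako["gar"])             # learn the new revision numbers
    desk["let"] = nako["let"]                   # move our head
    for blob in nako["bar"]:
        store["lat"][blob["lobe"]] = blob       # key each blob by its hash
    for yaki in nako["lar"]:
        store["hut"][yaki["tako"]] = yaki       # key each commit by its hash


desk = {"hit": {1: "t1"}, "let": 1}
store = {"lat": {}, "hut": {}}
nako = {
    "gar": {2: "t2"},
    "let": 2,
    "bar": [{"lobe": "l9", "data": "x"}],
    "lar": [{"tako": "t2", "parents": ["t1"]}],
}
edis(desk, store, nako)
print(desk["let"], sorted(store["hut"]))
```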

This concludes our discussion of ++knit. Now the changes have been applied to our local copy of the desk, and we just need to update our subscribers. We do so in ++wake:de.

    ++  wake                                          ::  update subscribers
      ^+  .
      =+  xiq=(~(tap by qyx) ~)
      =|  xaq=(list ,[p=duct q=rove])
      |-  ^+  ..wake
      ?~  xiq
        ..wake(qyx (~(gas by *cult) xaq))
      ?-    -.q.i.xiq
          &
        =+  cas=?~(ref ~ (~(get by haw.u.ref) `mood`p.q.i.xiq))
        ?^  cas
          %=    $
              xiq  t.xiq
              ..wake  ?~  u.cas  (blub p.i.xiq)
                      (blab p.i.xiq p.q.i.xiq u.u.cas)
          ==
        =+  nao=(~(case-to-aeon ze lim dom ran) q.p.q.i.xiq)
        ?~  nao  $(xiq t.xiq, xaq [i.xiq xaq])
        $(xiq t.xiq, ..wake (balk p.i.xiq u.nao p.q.i.xiq))
      ::
          |
        =+  mot=`moot`p.q.i.xiq
        =+  nab=(~(case-to-aeon ze lim dom ran) p.mot)
        ?~  nab
          $(xiq t.xiq, xaq [i.xiq xaq])
        =+  huy=(~(case-to-aeon ze lim dom ran) q.mot)
        ?~  huy
          =+  ptr=[%ud +(let.dom)]
          %=  $
            xiq     t.xiq
            xaq     [[p.i.xiq [%| ptr q.mot r.mot s.mot]] xaq]
            ..wake  =+  ^=  ear
                        (~(lobes-at-path ze lim dom ran) let.dom r.p.q.i.xiq)
                    ?:  =(s.p.q.i.xiq ear)  ..wake
                    =+  fud=(~(make-nako ze lim dom ran) u.nab let.dom)
                    (bleb p.i.xiq let.dom fud)
          ==
        %=  $
          xiq     t.xiq
          ..wake  =-  (blub:- p.i.xiq)
                  =+  ^=  ear
                      (~(lobes-at-path ze lim dom ran) u.huy r.p.q.i.xiq)
                  ?:  =(s.p.q.i.xiq ear)  (blub p.i.xiq)
                  =+  fud=(~(make-nako ze lim dom ran) u.nab u.huy)
                  (bleb p.i.xiq +(u.nab) fud)
        ==
      ==
    --

This is even longer than ++knit, but it's pretty similar to ++eave. We loop through each of our subscribers xiq, processing each in turn. When we're done, we just put the remaining subscribers back in our subscriber list.

If the subscriber is a single read, then, if this is a foreign desk, we look for the request in our cache haw (note that ++wake is called from other arms, and not only on foreign desks). Obviously, if we find an identical request there, then we can produce the result immediately. Referential transparency for the win. If the cached entry holds data, we produce it with a call to ++blab. If the cached entry is null, which is what ++knit records when the foreign ship answers a single read with a null riot, then we produce ++blub (canceled subscription with no data), since no data will ever arrive for this request.

If this is a domestic desk, or a foreign desk with no cache entry for the request, then we check to see if the case exists yet. If it doesn't, then we simply move on to the next subscriber, consing this one onto xaq so that we can check again the next time around. If it does exist, then we call ++balk to fulfill the request and produce it.

++balk is very simple, so we'll describe it here before we get to the subscription case.

    ++  balk                                          ::  read and send
      |=  [hen=duct oan=@ud mun=mood]
      ^+  +>
      =+  vid=(~(read-at-aeon ze lim dom ran) oan mun)
      ?~  vid  (blub hen)  (blab hen mun u.vid)

We call ++read-at-aeon on the given request and aeon. If you recall, this processes a mood at a particular aeon and produces the result, if there is one. If there is data at the requested location, then we produce it with ++blab. Else, we call ++blub to notify the subscriber that no data can ever come over this subscription since it is now impossible for there to ever be data for the given request. Because referential transparency.

At any rate, back to ++wake. If the given rave is a subscription request, then we proceed similarly to how we do in ++eave. We first try to get the aeon referred to by the starting case. If it doesn't exist yet, then we can't do anything interesting with this subscription, so we move on to the next one.

Otherwise, we try to get the aeon referred to by the ending case. If it doesn't exist yet, then we produce all the information we can. We call ++lobes-at-path at the given aeon and path to see if the requested path has actually changed. If it hasn't, then we don't produce anything; else, we produce the correct nako by calling ++bleb on the result of ++make-nako, as in ++eave. At any rate, we move on to the next subscription, putting back into our cult the current subscription with a new start case of the next aeon after the present.

If the aeon referred to by the ending case does exist, then we drop this subscriber from our cult and satisfy its request immediately. This is the same as before -- we check to see if the data at the path has actually changed, producing it if it has; else, we call ++blub since no more data can be produced over this subscription.
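The three outcomes for a subscription can be summarized in a short Python model. This is a deliberate simplification: plain revision numbers stand in for cases, a boolean `changed` stands in for the ++lobes-at-path comparison, and tuples stand in for the ++bleb and ++blub outputs. None of these names come from %clay itself.

```python
def wake_range(start, end, head, changed):
    """Decide what to do with one range subscription.

    start, end: revision numbers bounding the subscription
    head:       latest revision we currently have
    changed:    whether the data at the subscribed path differs from
                what the subscriber has already seen
    Returns (subscription to keep, or None) and a list of events.
    """
    if start > head:
        # The starting case doesn't exist yet: keep waiting.
        return (start, end), []
    if end > head:
        # Partway through: send what we have, keep listening from head + 1.
        events = [("bleb", head, (start, head))] if changed else []
        return (head + 1, end), events
    # The whole range exists: send any final data and close the subscription.
    events = [("bleb", start + 1, (start, end))] if changed else []
    events.append(("blub",))
    return None, events


print(wake_range(2, 9, 4, True))
print(wake_range(2, 4, 6, False))
```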

This concludes our discussion of foreign requests.