15 KiB
next | sort | title |
---|---|---|
true | 6 | Filesystem handbook |
Filesystem handbook
Urbit has its own revision-controlled filesystem, the %clay
vane. %clay
is like a simplified git
, but more reactive,
and also typed. Okay, this makes no sense.
The most common way to use %clay
is to mount a %clay
node in
a Unix directory. The Urbit process will watch this directory
and automatically record edits as changes, Dropbox style. The
mounted directory is always at the root of your pier directory.
Commands
Note that in both commands and generators, a currently unbound case (such as a version in the future) will make the calculation block, not complete. A remote case will cause a network request. A remote, unbound case will cause a waiting subscription.
Mounting to Unix
|mount [pax=path pot=$|(~ [knot ~])]
Mount the path pax
at the Unix mount point pot
, the name of a
subdirectory in your pier.
|mount %/pub/doc %documents
with a $PIER
of /home/nixon/urbit/fintud-macrep
, will mount
%/pub/doc
in /home/nixon/urbit/fintud-macrep/documents
.
The mount point is optional; if it's not supplied, the last knot
in the path (%doc
) will be used.
|unmount [mon=$|(term [knot path]) ~]
Undo a mount, either by specifying the path or the mount point:
|unmount %/pub/doc
|unmount %documents
It's a good habit to also delete the Unix subtree, but Urbit doesn't do it for you.
Revision-control operations
|merge [syd=desk src=beak how=$|(~ [germ ~])]
Merge the beak src
into the desk syd
, with optional merge
strategy how
.
The src
beak can be a desk (%home
); a plot-desk cell
([~doznec %home]
); or a plot-desk-case path (/=home=
).
|merge %home-work /=home= %fine
|merge %home-work /=home=
|sync [syd=desk her=plot org=$|(~ [desk ~])]
Activate autosync from the plot her
and source desk org
, into
the desk syd
. If org
is omitted, it's the same as syd
:
|sync %home-local ~doznec %home
|sync %home ~doznec
Note that |merge
takes a path because it needs a source case
(revision), which would make no sense for |sync
.
|label [syd=desk lab=term]
Label the current version of desk syd
:
|unsync [syd=desk her=plot org=desk ~]
Turn off autosync. The argument needs to match the original
|sync
perfectly, or Urbit will become angry and confused.
Filesystem manipulation
|rm [paz=(list path)]
Remove any leaf at each of the paths in paz
.
|rm /===/pub/fab/nixon/hoon
Remember that folders in %clay
are a consequence of the tree of
leaves; there is no rmdir
or mkdir
.
|cp [too=path fro=path how=$|(~ [germ ~])]
Copy the subtree fro
into the subtree too
, committing it with
the specified merge strategy.
|mv [too=path fro=path how=$|(~ [germ ~])]
In %clay
, |mv
is just a shorthand for |cp
then |rm
. The
|rm
doesn't happen unless the |cp
succeeds, obviously -- it's
good to be transactional.
Filesystem generators
+cal [paz=(list path)]
+cat [pax=path]
Produce the noun, if any, at each of these (global) paths.
+cat
produces one result, +cal
a list.
+ls [pax=path ~]
Produce the list of names in the folder at pax
.
Because generators aren't passed the dojo's default path, unlike
the current directory in Unix, it's not possible to build an
trivial +ls
that's the equivalent of Unix ls
. You always
have to write +ls %
.
+ll [pax=path ~]
Like +ls
, but the result is a list of full paths. Useful as
Urbit equivalent of the Unix wildcard *
.
A quick overview of %clay
%clay
is a typed, global revision-control system. Or in other
words, a typed, global referentially transparent namespace. It's
difficult to understate how awesome this is.
(Actually, in Layer 4 and 5 code, you can use the Hoon .^
rune
to literally dereference this namespace. And in Layer 5, a
generator will even block until the resource is available.)
(Another awesome global immutable namespace is IPFS. But IPFS is
distributed, whereas %clay
is just decentralized. IPFS stores
resources around the network in a DHT, like Freenet or
Bittorrent; %clay
stores resources on the publisher's server,
like HTTP or git.)
Path format
As a noun, a path in %clay
is a (list knot)
, where each
segment is an @ta
atom -- URL-safe text, restricted to [a z]
,
[0 9]
, .
, -
, _
and ~
. The list is a tuple terminated
with a Hoon null, ~
.
As an ordinary Hoon noun, [%foo %bar %baz]
has this structure.
But Hoon also supports the Unix path syntax: /foo/bar/baz
is
the same noun.
Relative paths
The Hoon path syntax is always defined relative to a default
path, which is configuration state in the Hoon parser. In
:dojo
, this works a little like the Unix current directory.
(But note that in Unix, relative paths are expanded by the
application, which can read the current directory from the
environment. In Urbit, the current directory and variables are
hidden by the dojo from any code it runs. The parser generates
the absolute path -- more like the way a Unix shell parser
unglobs *
.)
Relative path syntax: %
is the default path (Unix .
). %%
is the parent path (Unix ..
). Unix does not have ...
,
....
, etc. But Urbit has %%%
, %%%%
, etc. Urbit has no
local relative paths; in Unix, foo/bar
is a shorthand for
./foo/bar
, but in Urbit you have to write %/foo/bar
.
Unix has no top-level substitution syntax, but Urbit does. If
the default path is /foo/bar/baz
, /=/moo
means /foo/moo
,
and /=/moo/=/goo
means /foo/moo/baz/goo
. Also, instead of
/=/=/zoo
or /=/=/=/voo
, write /==zoo
or /===voo
. Your
fingers have enough miles on them already.
Beak
The top three knots in a %clay
path are /plot/desk/case
,
where plot
is of course an urbit; desk
is a branch name; and
case
is a revision identity, which is either (a) a label, (b) a
date, or (c) a change number. For obscure reasons, this prefix
is called the beak
.
Spur
The rest of the path, or spur
, navigates a tree of node
nouns. A node
is like an inode in a Unix filesystem, but
different.
An inode is either a file or a directory. A node
is both a
folder (which may be empty) and an optional leaf (a noun).
There is no rmdir
or mkdir
; an empty node is automatically
pruned, and creating a node creates its path. The absence of a
file-or-directory mode bit eliminates all kinds of strange corner
cases, especially in merging.
Leaf
%clay
is a typed filesystem, or more precisely a marked one.
When we sync Unix and Urbit paths, we convert a Unix file extension
(an informal specifications) into a Urbit *mark*
(an
executable specification)
The mark name is actually the last knot in the path. Or to put
it differently: if any %clay
node has a leaf, its name within
its parent is its mark.
This is ridiculously confusing without examples. Suppose we have the following Unix files, with directories to match:
doc.md
doc/intro.md
doc/start.md
These become the Urbit files
%/doc/md
%/doc/intro/md
%/doc/start/md
The folder map of the %/doc
node contains three entries: %md
,
%intro
, %start
. The folder of %/doc/intro
and that of
%/doc/start
each contain one entry: %md
(the mark of an atom
in Markdown syntax).
Perhaps this example helps explains why %clay
uses this node
design. One, it's a simple index-page model for any kind of
published tree. Two, this tree can expand its leaves smoothly,
just by adding content: if we decided %/doc/start
was not a
leaf but a tree, we could just add %/doc/start/child/md
.
And three, the %clay
node structure syncs invertibly with an
equivalent, and not unduly weird, Unix inode layout.
Mounting to Unix
The most convenient way of interacting with %clay
is mounting
it to Unix, and modifying it with Unix tools. The mount
directory is a flat subdirectory of your Urbit pier.
When you have a live mount point, Urbit monitors it with
inotify()
or equivalent. (It would be neat to have a FUSE
driver, but we don't.) If you shut your urbit off, it will
recheck the mount point when it reloads.
Unix files beginning with .
, with no extension, with an
extension that doesn't map to an Urbit mark, or containing data
that doesn't validate to the mark, are ignored. Depending on the
extension, there may be a more or less complex conversion from
the Unix length/bytestream pair to the Urbit noun.
More about desks and marks
The Hoon source code for a mark like %md
is in
/===/mar/md/hoon
. But relative to what beak? What's in the
/===
?
The mark source of a leaf in %clay
is always relative to its
own plot, desk and case. For example, a leaf at
`/~fintud-macrep/home/31/pub/doc/hello/md`
is controlled by the mark source
`/~fintud-macrep/home/31/mar/md/hoon`
If there is no such file or it doesn't compile, the mark is
effectively treated as %noun
, ie, an arbitrary value.
(Note that when updating a mark, any update which shrinks the set of nouns in that mark needs to at least adapt old nouns to new. Also, mark source updates should be very slow, but aren't. They should validate all nouns against the new mark, but don't.)
What can you do with a mark? Validate an arbitrary noun; perform
diffs, patches, and and conflict merges; transform to or from
another mark. The %ford
vane, which builds and converts nouns,
can even discover and apply multi-step conversion paths.
Marks are also used to describe network messages. In this case, the mark source beak is the beak of the receiving urbit.
Desks and merging
As in any git-shaped revision control system, the core operation of the system is merging.
One of the effects of same-beak marks is that it doesn't make
sense to create an empty desk. You can't populate an empty desk
properly with typed files. Instead, a new desk should be merged
from an existing desk -- normally the default desk, %home
.
It's also generally bad style to edit directly in the desk you
want to modify. Your Unix filesystem changes will appear as a
stream of small, unstructured changes. You should be editing a
working desk. Conventionally, to change %home
, merge %home
into %home-work
, edit there, and merge back as a "commit."
Ideally, your "commits" include modifications to a text file that
acts as a changelog.
So merges are important. Again as in git
, merge strategies are
important. That said, if you are not doing exciting things with
%clay
, you can skip the strategy subsection. By default,
%clay
will always use the %auto
meta-strategy, which will
always work if you're not doing exciting things.
Merge strategies
There are seven different merge strategies. Throughout our discussion, we'll say that the merge is from Alice's desk to Bob's.
Direct strategies
A %init
merge should be used iff it's the first commit to a
desk. The head of Alice's desk is used as the number 1 commit to
Bob's desk. Obviously, the ancestry remains intact when
traversing the parentage of the commit, even though previous
commits are not numbered for Bob's desk.
A %this
merge means to keep what's in Bob's desk, but join the
ancestry. Thus, the new commit has the head of each desk as
parents, but the data is exactly what's in Bob's desk. For those
following along in git, this is the 'ours' merge strategy, not
the '--ours' option to the 'recursive' merge strategy. In other
words, even if Alice makes a change that does not conflict with
Bob, we throw it away.
A %that
merge means to take what's in Alice's desk, but join
the ancestry. This is the reverse of %this
.
A %fine
merge is a "fast-forward" merge. This succeeds iff one
head is in the ancestry of the other. In this case, we use the
descendant as our new head.
For %meet
, %mate
, and %meld
merges, we first find the most
recent common ancestor to use as our merge base. If we have no
common ancestors, then we fail. If we have multiple most
recent common ancestors, then we have a criss-cross situation,
which should be handled delicately. At present, we don't handle
this kind of situation, but something akin to git's 'recursive'
strategy should be implemented in the future.
There's a functional inclusion ordering on %fine
, %meet
,
%mate
, and %meld
such that if an earlier strategy would have
succeeded, then every later strategy will produce the same
result. Put another way, every earlier strategy is the same as
every later strategy except with a restricted domain.
A %meet
merge only succeeds if the changes from the merge base
to Alice's head (hereafter, "Alice's changes") are in different
files than Bob's changes. In this case, the parents are both
Alice's and Bob's heads, and the data is the merge base plus
Alice's changed files plus Bob's changed files.
A %mate
merge attempts to merge changes to the same file when
both Alice and Bob change it. If the merge is clean, we use it;
otherwise, we fail. A merge between different types of changes --
for example, deleting a file vs changing it -- is always a
conflict. If we succeed, the parents are both Alice's and Bob's
heads, and the data is the merge base plus Alice's changed files
plus Bob's changed files plus the merged files.
A %meld
merge will succeed even if there are conflicts. If
there are conflicts in a file, then we use the merge base's
version of that file, and we produce a set of files with
conflicts. The parents are both Alice's and Bob's heads, and the
data is the merge base plus Alice's changed files plus Bob's
changed files plus the successfully merged files plus the merge
base's version of the conflicting files.
Meta-strategies
There's also a meta-strategy %auto
, which is the most common.
If no strategy is supplied, then %auto
is assumed. %auto
checks to see if Bob's desk exists, and if it doesn't we use a
%init
merge. Otherwise, we progressively try %fine
,
%meet
, and %mate
until one succeeds.
If none succeed, we merge Bob's desk into a scratch desk. Then,
we merge Alice's desk into the scratch desk with the %meld
option to force the merge. For each file in the produced set of
conflicting files, we call the ++mash
function for the
appropriate mark, which annotates the conflicts if we know how.
Finally, we display a message to the user informing them of the
scratch desk's existence, which files have annotated conflicts,
and which files have unannotated conflicts. When the user has
resolved the conflicts, they can merge the scratch desk back into
Bob's desk. This will be a %fine
merge since Bob's head is in
the ancestry of the scratch desk.
Autosync
Since %clay
is reactive, it has a subscription interface.
Changes to the filesystem create events which code at Layers 3 or
4 (vanes or apps) can listen to.
The :hood
appliance uses subscriptions to implement "autosync".
When one desk is synced to another, any changes to the first desk
are automatically applied to the second -- for any two desks, on
any two urbits.
Autosync isn't just mirroring. The target desk might have
changes of its own. We use the full merge capabilities of
%clay
to try to make the merge clean. If there are conflicts,
it'll notify you through :talk
, and ask you to resolve.
There can be complex sync flows, many of which are useful. Often, many urbits will be synced to some upstream desk that is trusted to provide updates. Sometimes, it's useful to sync two desks to each other, so that changes to one or the other are mirrored. Cyclical sync structures are normal and healthy. Also, one desk can be the target of multiple autosyncs.