---
title: Filesystem handbook
sort: 6
next: true
---

# Filesystem handbook

Urbit has its own revision-controlled filesystem, the `%clay` vane. `%clay` is like a simplified `git`, but more reactive, and also typed. Okay, this makes no sense.

The most common way to use `%clay` is to mount a `%clay` node in a Unix directory. The Urbit process will watch this directory and automatically record edits as changes, Dropbox style. The mounted directory is always at the root of your pier directory.

## Commands

Note that in both commands and generators, a currently unbound case (such as a version in the future) will make the calculation block, not complete. A remote case will cause a network request. A remote, unbound case will cause a waiting subscription.

### Mounting to Unix

#### `|mount [pax=path pot=$|(~ [knot ~])]`

Mount the path `pax` at the Unix mount point `pot`, the name of a subdirectory in your pier.

    |mount %/pub/doc %documents

with a `$PIER` of `/home/nixon/urbit/fintud-macrep`, will mount `%/pub/doc` in `/home/nixon/urbit/fintud-macrep/documents`.

The mount point is optional; if it's not supplied, the last knot in the path (`%doc`) will be used.

#### `|unmount [mon=$|(term [knot path]) ~]`

Undo a mount, either by specifying the path or the mount point:

    |unmount %/pub/doc
    |unmount %documents

It's a good habit to also delete the Unix subtree, but Urbit doesn't do it for you.

### Revision-control operations

#### `|merge [syd=desk src=beak how=$|(~ [germ ~])]`

Merge the beak `src` into the desk `syd`, with optional merge strategy `how`. The `src` beak can be a desk (`%home`); a plot-desk cell (`[~doznec %home]`); or a plot-desk-case path (`/=home=`).

    |merge %home-work /=home= %fine
    |merge %home-work /=home=

#### `|sync [syd=desk her=plot org=$|(~ [desk ~])]`

Activate autosync from the plot `her` and source desk `org`, into the desk `syd`.
If `org` is omitted, it's the same as `syd`:

    |sync %home-local ~doznec %home
    |sync %home ~doznec

Note that `|merge` takes a path because it needs a source case (revision), which would make no sense for `|sync`.

#### `|label [syd=desk lab=term]`

Label the current version of desk `syd` with the term `lab`.

#### `|unsync [syd=desk her=plot org=desk ~]`

Turn off autosync. The arguments need to match the original `|sync` perfectly, or Urbit will become angry and confused.

### Filesystem manipulation

#### `|rm [paz=(list path)]`

Remove any leaf at each of the paths in `paz`.

    |rm /===/pub/fab/nixon/hoon

Remember that folders in `%clay` are a consequence of the tree of leaves; there is no `rmdir` or `mkdir`.

#### `|cp [too=path fro=path how=$|(~ [germ ~])]`

Copy the subtree `fro` into the subtree `too`, committing it with the specified merge strategy.

#### `|mv [too=path fro=path how=$|(~ [germ ~])]`

In `%clay`, `|mv` is just a shorthand for `|cp` then `|rm`. The `|rm` doesn't happen unless the `|cp` succeeds, obviously -- it's good to be transactional.

### Filesystem generators

#### `+cal [paz=(list path)]`

#### `+cat [pax=path]`

Produce the noun, if any, at each of these (global) paths. `+cat` produces one result, `+cal` a list.

#### `+ls [pax=path ~]`

Produce the list of names in the folder at `pax`.

Because generators aren't passed the dojo's default path, unlike the current directory in Unix, it's not possible to build a trivial `+ls` that's the equivalent of Unix `ls`. You always have to write `+ls %`.

#### `+ll [pax=path ~]`

Like `+ls`, but the result is a list of full paths. Useful as the Urbit equivalent of the Unix wildcard `*`.

## A quick overview of `%clay`

`%clay` is a typed, global revision-control system. Or in other words, a typed, global referentially transparent namespace. It's difficult to overstate how awesome this is.

(Actually, in Layer 4 and 5 code, you can use the Hoon `.^` rune to literally *dereference* this namespace.
And in Layer 5, a generator will even *block* until the resource is available.)

(Another awesome global immutable namespace is IPFS. But IPFS is distributed, whereas `%clay` is just decentralized. IPFS stores resources around the network in a DHT, like Freenet or BitTorrent; `%clay` stores resources on the publisher's server, like HTTP or git.)

### Path format

As a noun, a path in `%clay` is a `(list knot)`, where each segment is an `@ta` atom -- URL-safe text, restricted to `[a z]`, `[0 9]`, `.`, `-`, `_` and `~`. The list is a tuple terminated with a Hoon null, `~`.

As an ordinary Hoon noun, `[%foo %bar %baz ~]` has this structure. But Hoon also supports the Unix path syntax: `/foo/bar/baz` is the same noun.

### Relative paths

The Hoon path syntax is always defined relative to a default path, which is configuration state in the Hoon parser. In `:dojo`, this works a little like the Unix current directory.

(But note that in Unix, relative paths are expanded by the application, which can read the current directory from the environment. In Urbit, the current directory and variables are hidden by the dojo from any code it runs. The parser generates the absolute path -- more like the way a Unix shell parser unglobs `*`.)

Relative path syntax:

`%` is the default path (Unix `.`). `%%` is the parent path (Unix `..`). Unix does not have `...`, `....`, etc. But Urbit has `%%%`, `%%%%`, etc.

Urbit has no local relative paths; in Unix, `foo/bar` is a shorthand for `./foo/bar`, but in Urbit you have to write `%/foo/bar`.

Unix has no top-level substitution syntax, but Urbit does. If the default path is `/foo/bar/baz`, `/=/moo` means `/foo/moo`, and `/=/moo/=/goo` means `/foo/moo/baz/goo`.

Also, instead of `/=/=/zoo` or `/=/=/=/voo`, write `/==zoo` or `/===voo`. Your fingers have enough miles on them already.
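
The expansion rules in this section can be modelled in a few lines. The Python sketch below is only an illustration and is not how the Hoon parser works; `split_knots` and `resolve` are hypothetical names, and the tokenising rule (each `=` is its own knot, with or without a `/` next to it) is an assumption inferred from the examples above.

```python
def split_knots(text):
    """Split '/a/b' style text into knots; each '=' is a knot of its own.

    Assumption: '/==zoo' tokenises the same way as '/=/=/zoo'.
    """
    knots, current = [], ""
    for ch in text:
        if ch in "/=":
            if current:
                knots.append(current)
                current = ""
            if ch == "=":
                knots.append("=")
        else:
            current += ch
    if current:
        knots.append(current)
    return knots


def resolve(path, default):
    """Expand a relative path against `default`, a list of knots."""
    if path.startswith("%"):
        rest = path.lstrip("%")
        ups = len(path) - len(rest) - 1  # '%' -> 0 levels up, '%%' -> 1, ...
        if ups > len(default):
            raise ValueError("more %s than knots in the default path")
        base = default[: len(default) - ups]
        return base + [knot for knot in rest.split("/") if knot]

    resolved = []
    for index, knot in enumerate(split_knots(path)):
        if knot == "=":
            # '=' borrows the knot at the same position in the default path.
            if index >= len(default):
                raise ValueError("no default knot at position %d" % index)
            resolved.append(default[index])
        else:
            resolved.append(knot)
    return resolved
```

With a default path of `/foo/bar/baz`, `resolve("/=/moo/=/goo", ["foo", "bar", "baz"])` yields the knots of `/foo/moo/baz/goo`, and `resolve("%%", ...)` yields the knots of `/foo/bar`.
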
### Beak

The top three knots in a `%clay` path are `/plot/desk/case`, where `plot` is of course an urbit; `desk` is a branch name; and `case` is a revision identity, which is either (a) a label, (b) a date, or (c) a change number.

For obscure reasons, this prefix is called the `beak`.

### Spur

The rest of the path, or `spur`, navigates a tree of `node` nouns. A `node` is like an inode in a Unix filesystem, but different. An inode is *either* a file or a directory. A `node` is *both* a folder (which may be empty) and an optional leaf (a noun).

There is no `rmdir` or `mkdir`; an empty node is automatically pruned, and creating a node creates its path. The absence of a file-or-directory mode bit eliminates all kinds of strange corner cases, especially in merging.

### Leaf

`%clay` is a typed filesystem, or more precisely a *marked* one. When we sync Unix and Urbit paths, we convert a Unix file extension (an informal specification) into an Urbit *mark* (an executable specification).

The mark name is actually the last knot in the path. Or to put it differently: if any `%clay` node has a leaf, its name within its parent is its mark.

This is ridiculously confusing without examples. Suppose we have the following Unix files, with directories to match:

    doc.md
    doc/intro.md
    doc/start.md

These become the Urbit files

    %/doc/md
    %/doc/intro/md
    %/doc/start/md

The folder map of the `%/doc` node contains three entries: `%md`, `%intro`, `%start`. The folder of `%/doc/intro` and that of `%/doc/start` each contain one entry: `%md` (the mark of an atom in Markdown syntax).

Perhaps this example helps explain *why* `%clay` uses this node design. One, it's a simple index-page model for any kind of published tree. Two, this tree can expand its leaves smoothly, just by adding content: if we decided `%/doc/start` was not a leaf but a tree, we could just add `%/doc/start/child/md`.
And three, the `%clay` node structure syncs invertibly with an equivalent, and not unduly weird, Unix inode layout.

### Mounting to Unix

The most convenient way of interacting with `%clay` is mounting it to Unix, and modifying it with Unix tools.

The mount directory is a flat subdirectory of your Urbit pier. When you have a live mount point, Urbit monitors it with `inotify()` or equivalent. (It would be neat to have a FUSE driver, but we don't.) If you shut your urbit off, it will recheck the mount point when it reloads.

Unix files beginning with `.`, with no extension, with an extension that doesn't map to an Urbit mark, or containing data that doesn't validate to the mark, are ignored.

Depending on the extension, there may be a more or less complex conversion from the Unix length/bytestream pair to the Urbit noun.

### More about desks and marks

The Hoon source code for a mark like `%md` is in `/===/mar/md/hoon`. But relative to what beak? What's in the `/===`?

The mark source of a leaf in `%clay` is always relative to its own plot, desk and case. For example, a leaf at `/~fintud-macrep/home/31/pub/doc/hello/md` is controlled by the mark source `/~fintud-macrep/home/31/mar/md/hoon`. If there is no such file or it doesn't compile, the mark is effectively treated as `%noun`, i.e., an arbitrary value.

(Note that when updating a mark, any update which shrinks the set of nouns in that mark needs to at least adapt old nouns to new. Also, mark source updates should be very slow, but aren't. They should validate all nouns against the new mark, but don't.)

What can you do with a mark? Validate an arbitrary noun; perform diffs, patches, and conflict merges; transform to or from another mark. The `%ford` vane, which builds and converts nouns, can even discover and apply multi-step conversion paths.

Marks are also used to describe network messages. In this case, the mark source beak is the beak of the receiving urbit.
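
As a rough illustration of the last few sections, the Python sketch below maps a Unix file under a mount point to `%clay` knots and finds the mark source for a leaf. `unix_to_clay` and `mark_source` are hypothetical helpers, not part of Urbit. The sketch only applies the filename rules (leading `.`, missing extension); it does not model the lookup from extension to mark or the validation of file contents.

```python
import posixpath


def unix_to_clay(relpath):
    """Map a path relative to the mount point to a list of knots.

    Returns None for files the sync would ignore on name alone:
    dotfiles and files without an extension.
    """
    directory, filename = posixpath.split(relpath)
    if filename.startswith("."):
        return None
    stem, ext = posixpath.splitext(filename)
    if not stem or len(ext) < 2:  # no extension, or a bare trailing '.'
        return None
    knots = [knot for knot in directory.split("/") if knot]
    # The extension becomes the last knot, which is the leaf's mark.
    return knots + [stem, ext[1:]]


def mark_source(leaf_path):
    """Return the path of the Hoon source for a leaf's mark.

    `leaf_path` is a full list of knots: plot, desk, case, then the spur.
    The mark source lives under the same beak as the leaf itself.
    """
    if len(leaf_path) < 4:
        raise ValueError("need a beak (plot, desk, case) and a spur")
    beak, mark = leaf_path[:3], leaf_path[-1]
    return beak + ["mar", mark, "hoon"]
```

For the example tree above, `unix_to_clay("doc/intro.md")` gives `["doc", "intro", "md"]`, and the leaf `/~fintud-macrep/home/31/pub/doc/hello/md` resolves to the mark source `/~fintud-macrep/home/31/mar/md/hoon`.
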
### Desks and merging

As in any git-shaped revision control system, the core operation of the system is merging.

One of the effects of same-beak marks is that it doesn't make sense to create an empty desk. You can't populate an empty desk properly with typed files. Instead, a new desk should be merged from an existing desk -- normally the default desk, `%home`.

It's also generally bad style to edit directly in the desk you want to modify. Your Unix filesystem changes will appear as a stream of small, unstructured changes. You should be editing a working desk. Conventionally, to change `%home`, merge `%home` into `%home-work`, edit there, and merge back as a "commit." Ideally, your "commits" include modifications to a text file that acts as a changelog.

So merges are important. Again as in `git`, merge strategies are important. That said, if you are not doing exciting things with `%clay`, you can skip the strategy subsection. By default, `%clay` will always use the `%auto` meta-strategy, which will always work if you're not doing exciting things.

#### Merge strategies

There are seven different merge strategies. Throughout our discussion, we'll say that the merge is from Alice's desk to Bob's.

##### Direct strategies

A `%init` merge should be used iff it's the first commit to a desk. The head of Alice's desk is used as the number 1 commit to Bob's desk. Obviously, the ancestry remains intact when traversing the parentage of the commit, even though previous commits are not numbered for Bob's desk.

A `%this` merge means to keep what's in Bob's desk, but join the ancestry. Thus, the new commit has the head of each desk as parents, but the data is exactly what's in Bob's desk. For those following along in git, this is the 'ours' merge strategy, not the '--ours' option to the 'recursive' merge strategy. In other words, even if Alice makes a change that does not conflict with Bob, we throw it away.

A `%that` merge means to take what's in Alice's desk, but join the ancestry. This is the reverse of `%this`.

A `%fine` merge is a "fast-forward" merge. This succeeds iff one head is in the ancestry of the other. In this case, we use the descendant as our new head.

For `%meet`, `%mate`, and `%meld` merges, we first find the most recent common ancestor to use as our merge base. If we have no common ancestors, then we fail. If we have multiple most recent common ancestors, then we have a criss-cross situation, which should be handled delicately. At present, we don't handle this kind of situation, but something akin to git's 'recursive' strategy should be implemented in the future.

There's a functional inclusion ordering on `%fine`, `%meet`, `%mate`, and `%meld` such that if an earlier strategy would have succeeded, then every later strategy will produce the same result. Put another way, every earlier strategy is the same as every later strategy except with a restricted domain.

A `%meet` merge only succeeds if the changes from the merge base to Alice's head (hereafter, "Alice's changes") are in different files than Bob's changes. In this case, the parents are both Alice's and Bob's heads, and the data is the merge base plus Alice's changed files plus Bob's changed files.

A `%mate` merge attempts to merge changes to the same file when both Alice and Bob change it. If the merge is clean, we use it; otherwise, we fail. A merge between different types of changes -- for example, deleting a file vs changing it -- is always a conflict. If we succeed, the parents are both Alice's and Bob's heads, and the data is the merge base plus Alice's changed files plus Bob's changed files plus the merged files.

A `%meld` merge will succeed even if there are conflicts. If there are conflicts in a file, then we use the merge base's version of that file, and we produce a set of files with conflicts.
The parents are both Alice's and Bob's heads, and the data is the merge base plus Alice's changed files plus Bob's changed files plus the successfully merged files plus the merge base's version of the conflicting files.

##### Meta-strategies

There's also a meta-strategy `%auto`, which is the most common. If no strategy is supplied, then `%auto` is assumed. `%auto` checks to see if Bob's desk exists, and if it doesn't we use a `%init` merge. Otherwise, we progressively try `%fine`, `%meet`, and `%mate` until one succeeds.

If none succeed, we merge Bob's desk into a scratch desk. Then, we merge Alice's desk into the scratch desk with the `%meld` option to force the merge. For each file in the produced set of conflicting files, we call the `++mash` function for the appropriate mark, which annotates the conflicts if we know how.

Finally, we display a message to the user informing them of the scratch desk's existence, which files have annotated conflicts, and which files have unannotated conflicts. When the user has resolved the conflicts, they can merge the scratch desk back into Bob's desk. This will be a `%fine` merge since Bob's head is in the ancestry of the scratch desk.

### Autosync

Since `%clay` is reactive, it has a subscription interface. Changes to the filesystem create events which code at Layers 3 or 4 (vanes or apps) can listen to.

The `:hood` appliance uses subscriptions to implement "autosync". When one desk is synced to another, any changes to the first desk are automatically applied to the second -- for any two desks, on any two urbits.

Autosync isn't just mirroring. The target desk might have changes of its own. We use the full merge capabilities of `%clay` to try to make the merge clean. If there are conflicts, it'll notify you through `:talk`, and ask you to resolve.

There can be complex sync flows, many of which are useful. Often, many urbits will be synced to some upstream desk that is trusted to provide updates.
Sometimes, it's useful to sync two desks to each other, so that changes to one or the other are mirrored. Cyclical sync structures are normal and healthy. Also, one desk can be the target of multiple autosyncs.
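
Autosync leans on the merge strategies described earlier, so it may help to see the file-level rules for `%meet`, `%mate` and `%meld` spelled out. The Python sketch below is a toy model, not `%clay`'s implementation. It treats a desk as a dict from path to content, ignores ancestry and parents entirely, and takes the per-file merge as a `merge_file` callback. Every name in it is hypothetical. Treating any add or delete on a doubly changed path as a conflict is an assumption that goes slightly beyond the "deleting vs changing" example in the text.

```python
MISSING = object()  # stands for "no file at this path"


class MergeFailed(Exception):
    """Raised when a strategy's preconditions do not hold."""


def changes(base, head):
    """Map each path that differs from base to its content in head."""
    return {
        path: head.get(path, MISSING)
        for path in set(base) | set(head)
        if base.get(path, MISSING) != head.get(path, MISSING)
    }


def apply_changes(base, *change_sets):
    """Return a copy of base with every change set applied in order."""
    data = dict(base)
    for change_set in change_sets:
        for path, content in change_set.items():
            if content is MISSING:
                data.pop(path, None)
            else:
                data[path] = content
    return data


def meet(base, alice, bob):
    """Succeed only if Alice and Bob changed different files."""
    a, b = changes(base, alice), changes(base, bob)
    if set(a) & set(b):
        raise MergeFailed("both sides changed the same file")
    return apply_changes(base, a, b)


def meld(base, alice, bob, merge_file):
    """Always succeed; conflicting files keep the merge base's version."""
    a, b = changes(base, alice), changes(base, bob)
    both = set(a) & set(b)
    merged, conflicts = {}, set()
    for path in both:
        result = None
        # Assumption: only "changed vs changed" is handed to merge_file.
        if path in base and a[path] is not MISSING and b[path] is not MISSING:
            result = merge_file(base[path], a[path], b[path])
        if result is None:
            conflicts.add(path)
        else:
            merged[path] = result
    only_a = {p: c for p, c in a.items() if p not in both}
    only_b = {p: c for p, c in b.items() if p not in both}
    return apply_changes(base, only_a, only_b, merged), conflicts


def mate(base, alice, bob, merge_file):
    """Like meld, but fail instead of reporting conflicts."""
    data, conflicts = meld(base, alice, bob, merge_file)
    if conflicts:
        raise MergeFailed("conflicts in: " + ", ".join(sorted(conflicts)))
    return data


def toy_line_merge(base, ours, theirs):
    """Toy stand-in for a mark's merge: line counts must match."""
    rows = [base.splitlines(), ours.splitlines(), theirs.splitlines()]
    if len({len(row) for row in rows}) != 1:
        return None
    out = []
    for old, mine, yours in zip(*rows):
        if mine == old:
            out.append(yours)
        elif yours in (old, mine):
            out.append(mine)
        else:
            return None  # the same line was changed two different ways
    return "\n".join(out)
```

Passing the file merge in as a function keeps the sketch honest about what it does not know: in `%clay`, diffing and merging a file belongs to its mark. The model also shows the inclusion ordering, since whenever `meet` succeeds, `mate` and `meld` return the same data.
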