From 4f045561e67cab8795b2e1382fedd390690088c7 Mon Sep 17 00:00:00 2001
From: Martin von Zweigbergk
Date: Sat, 12 Dec 2020 00:12:04 -0800
Subject: [PATCH] replace placeholder README.md with real content

---
 README.md | 213 ++++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 173 insertions(+), 40 deletions(-)

diff --git a/README.md b/README.md
index c0b22e1d0..ba3312790 100644
--- a/README.md
+++ b/README.md
@@ -1,53 +1,186 @@
-# New Project Template
+# Jujube

-This repository contains a template that can be used to seed a repository for a
-new Google open source project.

-See [go/releasing](http://go/releasing) (available externally at
-https://opensource.google/docs/releasing/) for more information about
-releasing a new Google open source project.
+## Disclaimer

-This template uses the Apache license, as is Google's default. See the
-documentation for instructions on using alternate license.
+This is not a Google product. It is an experimental version-control system
+(VCS). It is not ready for use. It was written by me, Martin von Zweigbergk
+(martinvonz@google.com). It is my personal hobby project. It does not indicate
+any commitment or direction from Google.

-## How to use this template

-1. Clone it from GitHub.
- * There is no reason to fork it.
-1. Create a new local repository and copy the files from this repo into it.
-1. Modify README.md and docs/contributing.md to represent your project, not the
- template project.
-1. Develop your new project!
+## Introduction

-``` shell
-git clone https://github.com/google/new-project
-mkdir my-new-thing
-cd my-new-thing
-git init
-cp -r ../new-project/* ../new-project/.github .
-git add *
-git commit -a -m 'Boilerplate for new Google open source project'
-```
+I started the project mostly in order to test the viability of some UX ideas in
+practice. I continue to use it for that, but my short-term goal now is to make
+it useful as an alternative CLI for Git repos.
-## Source Code Headers
+The storage design is similar to Git's in that it stores commits, trees, and
+blobs. However, the blobs are actually split into three types: normal files,
+symlinks (Unicode paths), and conflicts (more about that later).

-Every file containing source code must include copyright and license
-information. This includes any JS/CSS files that you might be serving out to
-browsers. (This is to help well-intentioned people avoid accidental copying that
-doesn't comply with the license.)
+The command-line tool is called `jj` for now because it's easy to type and easy
+to replace (rare in English). The project is called "Jujube" (a fruit) because
+that's the first word I could think of that matched "jj".

-Apache header:

- Copyright 2020 Google LLC
+## Features

- Licensed under the Apache License, Version 2.0 (the "License");
- you may not use this file except in compliance with the License.
- You may obtain a copy of the License at
+The following subsections describe the current features. The text is aimed at
+readers who are already familiar with other VCSs.

- https://www.apache.org/licenses/LICENSE-2.0
+### Compatible with Git

- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License.
+The tool currently has two backends. One is called "local store" and is very
+simple and inefficient. The other backend uses a Git repo as storage. The
+commits are stored as regular Git commits. Commits can be read from and written
+to an existing Git repo. This makes it possible to create a Jujube repo and use
+it as an alternative interface for a Git repo (it will be backed by the Git repo
+just like additional Git worktrees are).
+
+### Written as a library
+
+The repo consists of two main parts: the lib crate and the main (CLI)
+crate. Most of the code lives in the lib crate. The lib crate does not print
+anything to the terminal. The separate lib crate should make it relatively
+straightforward to add a GUI.
+
+
+### Operations are performed repo-first
+
+Almost all operations are done in the repo first and then possibly reflected in
+the working copy. The only exception so far is when committing the working copy,
+which naturally uses the working copy as input.
+
+This makes operations faster because the working copy doesn't need to be
+updated. It also means that the working copy won't see spurious changes, e.g.
+during a rebase operation. It makes it safe to update the working copy while
+some operation is running.
+
+### Supports Evolution
+
+Jujube copies the Evolution feature from Mercurial. It keeps track of when a
+commit gets rewritten. A commit has a list of predecessors in addition to the
+usual list of parents. This lets the tool figure out where to rebase descendant
+commits to when a commit has been rewritten (amended, rebased, etc.). See
+https://www.mercurial-scm.org/wiki/ChangesetEvolution for more information.
+
+### The working copy is a commit
+
+The working copy gets automatically committed when you interact with the
+tool. This simplifies both implementation and UX. It also means that the working
+copy is frequently backed up.
+
+Any changes to the working copy stay in place when you check out another
+commit. That is different from Git and Mercurial, but I think it's more
+intuitive for new users. To replicate the default behavior of Git/Mercurial, use
+`jj rebase -r @ -d ` (`@` is a name for the working copy
+commit). There is no need to stash/unstash.
+
+Commands become more consistent because the same command can operate on the repo
+or another commit.
+For example, `jj log` includes the working copy (much like
+`gitk` and other tools include a node for the working copy). `jj squash`
+squashes a commit into its parent, including if it's the working copy (like
+`git commit --amend`/`hg amend`).
+
+A commit description can be added to the working copy before "commit". The same
+command (`jj describe`) is used for changing the description of any commit.
+
+### Commits can contain conflicts
+
+When a merge conflict happens, it is recorded within the tree object as a
+special conflict object (not a file object with conflict markers). Conflicts are
+stored as a list of states to add and another list of states to remove. A
+regular 3-way merge adds [B,C] and removes [A] (the common ancestor). A
+modify/remove conflict adds [B] and removes [A]. An add/add conflict adds
+[B,C]. An octopus merge of N commits adds N states and removes N-1 states. A
+non-conflict state A is equivalent to a conflict state that just adds [A]. A
+"state" here can be a normal file, a symlink, or a tree. This support for
+in-tree conflicts has some interesting effects on both implementation and UX.
+
+It means that there is a consistent way of resolving conflicts: check out a
+commit with conflicts in it, resolve the conflicts, and amend the resolution
+into the conflicted commit. Then evolve descendant commits.
+
+It naturally enables collaborative conflict resolution.
+
+The in-tree conflicts mean that there is no need for book-keeping in
+rebase-like commands to support continue/abort operations. Instead, the rebase
+can simply continue and create the desired new DAG shape.
+
+Conflicts get simplified on rebase by removing pairs of matching states in the
+"add" and "remove" lists. For example, if B is based on A and then rebased to C,
+and then to D, it will be a regular 3-way merge between B and D with A as base
+(no trace of C).
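The add/remove bookkeeping and the cancellation rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Jujube's actual implementation: the `Conflict` class and the `as_conflict`, `merge`, and `rebase` helpers are hypothetical names, and plain strings stand in for files, symlinks, or trees.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Conflict:
    """Hypothetical model of a conflict: states to add and states to remove."""
    adds: tuple
    removes: tuple

    def simplified(self):
        """Cancel every state that appears in both lists, one pair at a time."""
        cancel = Counter(self.adds) & Counter(self.removes)
        return Conflict(_drop(self.adds, cancel), _drop(self.removes, cancel))


def _drop(states, cancel):
    """Drop cancel[s] occurrences of each state s, preserving order."""
    remaining = Counter(cancel)
    kept = []
    for state in states:
        if remaining[state] > 0:
            remaining[state] -= 1
        else:
            kept.append(state)
    return tuple(kept)


def as_conflict(state):
    """A non-conflict state A is treated as a conflict that just adds [A]."""
    return state if isinstance(state, Conflict) else Conflict((state,), ())


def merge(base, left, right):
    """3-way merge on conflicts: add left and right, remove base."""
    base, left, right = map(as_conflict, (base, left, right))
    return Conflict(
        left.adds + right.adds + base.removes,
        left.removes + right.removes + base.adds,
    ).simplified()


def rebase(tree, old_parent, new_parent):
    """Rebasing is a merge with the old parent as the base."""
    return merge(old_parent, tree, new_parent)


once = rebase("B", "A", "C")    # B was based on A, now rebased onto C
twice = rebase(once, "C", "D")  # the result is rebased again, onto D
print(once)   # Conflict(adds=('B', 'C'), removes=('A',))
print(twice)  # Conflict(adds=('B', 'D'), removes=('A',))
```

In this sketch the second rebase first produces adds [B, C, D] and removes [A, C]; cancelling the matching C leaves an ordinary 3-way merge, and the intermediate destination no longer appears.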
+This means that you can keep old commits rebased to head
+without resolving conflicts, and you still won't have messy recursive conflicts.
+
+The conflict handling also results in some Darcs-/Pijul-like properties. For
+example, if you rebase a commit and it results in conflicts, and you then back
+out that commit, the conflict will go away. (I plan to make that work even if
+there had been unrelated changes in the file, but I haven't gotten around to it
+yet.)
+
+The criss-cross merge case becomes simpler. In Git, the virtual ancestor may
+have conflicts and you may get nested conflict markers in the working copy. In
+Jujube, the result is a merge with multiple parts, which may even get simplified
+to not be recursive.
+
+The in-tree conflicts make it natural and easy to define the contents of a merge
+commit to be the difference compared to the merged parents (the so-called "evil"
+part of the merge), so that's what Jujube does. Rebasing merge commits therefore
+works as you would expect (Git and Mercurial both handle rebasing of merge
+commits poorly). It's even possible to change the number of parents while
+rebasing, so if A is a non-merge commit, you can make it a merge commit with
+`jj rebase -r A -d B -d C`. `jj diff -r ` will show you the
+diff compared to the merged parents.
+
+I intend for commands that present the contents of a tree (such as listing
+files) to use the "add" state(s) of the conflict, but that's not yet done.
+
+### Operations are logged
+
+Each write operation is logged to content-addressed storage, much like the
+commit storage. The Operation object has an associated View object, much like
+the Commit object has a Tree object. The view object contains all the heads
+currently in the repo, as well as the checked-out commit. It will also contain
+the refs if I add support for that. The operation object can have multiple
+parent operations, so it forms a DAG just like the commit graph does.
+There is normally only one parent operation, but there can be multiple
+parents if concurrent operations happened.
+
+I added the operation log as a solution for the problem of making concurrent
+repo edits safe. When the repo is loaded, it is loaded at a particular operation,
+which provides an immutable view of the repo. For a caller of the library to
+start making changes, they then have to start a transaction. Once they are done
+making changes to the transaction, they commit. The operation object is then
+created. This step cannot fail (except if the file system runs out of space or
+such). Pointers to the heads of the operation DAG are kept as files in a
+directory (the filename is the operation id). When a new operation object has
+been created, its operation id is added to the directory. The transaction's base
+operation id is then removed from that directory. If concurrent operations
+happened, there would be multiple new operation ids in the directory and only
+one base operation id would have been removed. If a reader sees the repo in this
+state, it will attempt to merge the views and create a new operation with
+multiple parents. If there are conflicts, the user will have to resolve them (I
+haven't implemented that yet).
+
+As a nice side-effect of adding the operation log to solve the concurrent-edits
+problem, we get some very useful UX features. Many UX features come from mapping
+commands that work on the commit graph onto the operation graph. For example, if
+you map `git revert`/`hg backout` onto the operation graph, you get an operation
+that undoes a previous operation (called `jj op undo`). Note that any operation
+can be undone, not just the latest one. If you map `git restore`/`hg revert` onto
+the operation graph, you get an operation that rewinds the repo state to an
+earlier point (called `jj op restore`).
+
+You can also see what the repo looked like at an earlier point with
+`jj --at-op= log`.
+As mentioned earlier, the checkout is also part of
+the view, so that command will show you where the working copy was at that
+operation. If you do `jj restore -o `, it will also update the
+working copy accordingly. This is actually how the working copy is always
+updated: we first commit a transaction with a pointer to the new checkout and
+then the working copy is updated to reflect that.
+
+## Future plans
+
+TODO