unison/docs/codebase-editor-design.markdown

_About this doc:_ API and UX design for Unison codebase editor WIP

`Id` is the basic type used to refer to things in a Unison codebase:

```Haskell
type Id
  = One Reference
  | Cycle [Hash] Int -- Int is which Hash in the `[Hash]` is being ref'd

type Reference = Builtin Text | HashRef Hash
```

Mutually recursive definitions get a `Cycle`-based `Id`.

`Scope` denotes a `Set Id`. We might build these up symbolically:

```Haskell
Scope.transitiveDependencies : Id -> Scope
Scope.transitiveDependents : Id -> Scope

Scope.union : Scope -> Scope -> Scope
-- maybe other set operations?
-- Scope.intersection : Scope -> Scope -> Scope
```

Here's the codebase API:

```Haskell
effect Codebase where

  -- Parse an expression
  Term.parse : Text ->{Codebase} Term

  -- Parse one or more type declarations
  TypeDeclarations.parse : Text ->{Codebase} TypeDeclarations

  -- Parse one or more term declarations
  TermDeclarations.parse : Text ->{Codebase} TermDeclarations

  -- Get the ids in a type declarations block
  TypeDeclarations.ids : TypeDeclarations -> Map Name Id

  -- Pretty-print some type declarations
  TypeDeclarations.text : TypeDeclarations ->{Codebase} Text

  -- Pretty-print some term declarations
  TermDeclarations.text : TermDeclarations ->{Codebase} Text

  -- optional : TypeDeclarations.Html : TypeDeclarations ->{Codebase} Html

  -- Get the ids in a type declarations block
  TermDeclarations.ids : TermDeclarations -> Map Name Id

  -- Actually add one or more type declarations to the codebase
  TypeDeclarations.add : TypeDeclarations ->{Codebase} ()

  -- Add one or more term declarations to the codebase
  TermDeclarations.add : TermDeclarations ->{Codebase} ()

  -- Resolve a name to zero or more ids with that name
  resolve : Name ->{Codebase} Set Id

  -- idea:
  -- search : Pattern ->{Codebase} Set Id

  -- Look up the names associated with an `Id`
  names : Id ->{Codebase} (Set Name)

  -- Add a name associated with the `Id`
  addName : Id -> Name ->{Codebase} ()

  -- Remove a name from association with the `Id`
  removeName : Id -> Name ->{Codebase} ()

  -- other metadata CRUD operations

  -- `addLicense id licenseId`
  addLicense : Id -> Id ->{Codebase} ()

  -- `addAuthor id authorId`
  addAuthor : Id -> Id ->{Codebase} ()

  getLicense : Id ->{Codebase} License
  getAuthor : Id ->{Codebase} Author

  -- The immediate dependencies of an `Id` (not transitive)
  dependencies : Id ->{Codebase} Set Id

  -- The immediate dependents of an `Id` (not transitive)
  dependents : Id ->{Codebase} Set Id

  -- The immediate ancestor of an `Id`
  ancestor : Id ->{Codebase} Id

  -- Evaluate a term to normal form
  run : Term ->{Codebase} Term
  runIO : Term ->{IO,Codebase} Term
  ...


  -- Create a new empty transaction, with the empty scope
  Transaction.new : Name ->{Codebase} Transaction

  -- Look up a transaction by name
  Transaction.resolve : Name ->{Codebase} Set Transaction

  -- Read the fields of a transaction

  -- The `Id`s that define the scope of the transaction
  Transaction.scope : Transaction ->{Codebase} Set Id

  -- Union the given `Scope` with the existing transaction's `Scope`.
  Transaction.addScope : Transaction -> Scope ->{Codebase} ()

  -- An element of the scope needs to have been updated if one of its
  -- dependencies has been edited or removed as part of the transaction.
  -- `remainder` is just a count of the elements of `scope` needing updating.
  -- Conceptually:
  -- length [ i | i <- Transactions.scope t,
                  Transaction.dependencies i `intersects` (
                    Transaction.editedSet t `union` Transaction.removed t),
                  Set.notSingleton $ Map.lookup i (Transaction.canonical t) ]
  Transaction.remainder : Transaction ->{Codebase} Natural

  -- We never delete from the `edited` set, we only accumulate.
  Transaction.edited : Transaction ->{Codebase} Map Id (Set Id)

  -- Convenience function, just pulls out keys of `edited`.
  Transaction.editedKeySet : Transaction ->{Codebase} Set Id

  -- This is always a submap of `edited` which reflects any conflict resolutions
  Transaction.canonical : Transaction ->{Codebase} Map Id (Set Id)

  -- These are definitions newly introduced as part of this transaction
  Transaction.added : Transaction ->{Codebase} Set Id

  -- These are definitions deleted as part of this transaction
  Transaction.removed : Transaction ->{Codebase} Set Id

  -- Add a definition to the `added` set of the transaction
  Transaction.add : Transaction -> Id ->{Codebase} ()

  -- Add a definition to the `removed` set of the transaction
  Transaction.remove : Transaction -> Id ->{Codebase} ()

  -- `edit t old new` adds `new` to the `edited` map for `old` and
  -- to the `canonical` map for `old`.
  Transaction.edit : Transaction -> Id -> Id ->{Codebase} ()

  -- `resolve id selected` replaces the canonical set with the selected
  -- `Id`, discarding any prior selections
  -- Discarded `Id` can be computed by comparing the `edited` (which has
  -- everything) and `canonical` (which has current selections)
  --
  -- What should this do if the selected `Id` is unrelated to any of the
  -- existing ids?
  Transaction.resolve : Transaction -> Id -> Id ->{Codebase} ()

  -- Actually apply the transaction to the codebase. Idempotent.
  -- What does this actually do???
  -- Presumably something with names...
  -- Arya proposal: If scope includes Id of a namespace / first-class
  --                naming preference object, then update that ... something something something
  -- Observation: Because you can have scopes that are smaller than "the entire
  --              codebase" (which isn't really a thing), git branches aren't
  --              a useful thing for tracking concurrent development (because
  --              git branches only work at the level of the whole git repo).
  -- Paul: seems like we need a `Branch`
  -- `Branch` assigns zero or more names to each Id mapping.
  -- `Transactions` are functions Branch -> Branch.
  -- `Transaction.commit` applies that function to a branch.
  Transaction.commit : Transaction -> Branch ->{Codebase} ()

  -- Branch might just be a name prefix. `<branch-name>.List.sort.name`
  -- You should be able to import some branch

  -- no Transaction.merge for now
```

User preferences wrt to name rendering? Id -> name preferences

### Richer `Edit` type and automatic merging

Observation: if we make the edits to an `Id` a bit richer, can do a lot more work for the user when reconciling concurrent edits. Here's the start of a proposal:

```Haskell
-- A substitution can preserve the old type exactly (can propagate fully),
-- or can be a subtype of the old type (can propagate fully, but need to re-synthesize dependent's
-- types, as they might become more general).
type Substitution.Typing = Exact | Subtype

namespace Substitution.Typing where

  combine : Substitution.Typing -> Substitution.Typing -> Substitution.Typing
  combine Exact Exact = Exact
  combine _ _ = Subtype

-- Edits have a commutative merge operation, see below
type Edit
  = Substitutions (Map Id (Edit.Typing, Id))
  | Replace Id
  | Conflict (Set Edit)
```

The `edits` / `canonical` map inside a `Transaction` instead denotes a `Map Id Edit`. To merge these maps, we just `unionWith Edit.merge`. `Edit.merge` does the obvious thing - substitution maps get merged as long as keys are disjoint, a `Replace i` and a `Substitutions m` applies the substitutions to `i` to produce `i'`, and results in a `Replace i'`, and anything else is a `Conflict` needing user intervention.

This `Edit` merge operation addresses a lot of the common use cases, like if Alice upgrades `foo` in a type preserving way, propagates that change, and Bob updates `bar` in a type preserving way, then Alice and Bob can merge their work automatically. Whereas if we just have an opaque "replace this `Id` with that `Id`", we don't have enough information to be able to merge their work (except via some heuristic where we attempt some sort of tree-based merge of the ASTs... I think this should be a last resort)

### Notes

Paul: I wonder if we can just use Git branches as the branch concept? Use the discipline that if a transaction's scope is less than the whole Git repo, then when applied, the new names are all still prefixed with the transaction id. Only if the transaction is maximal do we remove the prefix. Or perhaps it is best to just move away from using the VCS concept of branches.

### Repository format

Design goal - a Unison repository can be versioned using Git (or Hg, or whatever), and there should never be merge conflicts when merging two Unison repositories. That is, Git merge conflicts are a bad UX for surfacing concurrent edits that the user may wish to reconcile.

```
terms/
  reference-english-JasVXOEBBV8.markdown -- the reference docs ref'd below
  license-8JSJdkVvvow92.markdown
  ...
  jAjGDJnsdfL.ub -- binary form of the term
  jAjGDJnsdfL/
    source-98asdfjKjsldfj.markdown -- source code, in markdown form
    Runar.factorial.name -- just an empty file
    math.factorial.name
    reference-english-JasVXOEBBV8.hash -- reference docs, in English
    reference-spanish-9JasdfjHNBdjj.hash -- reference docs
    doc-english-OD03VvvsjK.hash -- other docs
    license-8JSJdkVvvow92.hash -- reference to the license for this term
types/ -- directory of all type declarations
  8sdfA1baBw.ub -- binary form of the type declaration
  8sdfA1baBw/
    0/ -- constructor id, has a set of names
      Nil.name -- empty file
      Empty.name -- empty file
    1/ -- constructor id, has a set of names
      Cons.name
      Prepend.name
    reference-english-KgLfAIBw312.hash -- reference docs
    doc-english-8AfjKBCXdkw.hash -- other docs
```

Sets are represented by directories of immutable empty files whose file names represent the elements of the set - the sets are union'd as a result of a Git merge. Deletions are handled without conflicts as well.

Observation is that we'll probably want some additional indexing structure (which won't be versioned) which can be cached on disk and derived from the primary repo format. This is useful for answering different queries on the codebase more efficiently.
Update codebase-editor-design.markdown 2018-09-07 18:49:34 +03:00			`_About this doc:_ API and UX design for Unison codebase editor WIP`
codebase editor design notes (very rough) 2018-09-07 01:43:19 +03:00
			`Id` is the basic type used to refer to things in a Unison codebase:

			```Haskell
			`type Id`
			`= One Reference`
			\| Cycle [Hash] Int -- Int is which Hash in the `[Hash]` is being ref'd
Update codebase-editor-design.markdown 2018-09-07 18:49:34 +03:00
			`type Reference = Builtin Text \| HashRef Hash`
codebase editor design notes (very rough) 2018-09-07 01:43:19 +03:00			```

Update codebase-editor-design.markdown 2018-09-07 18:49:34 +03:00			Mutually recursive definitions get a `Cycle`-based `Id`.

codebase editor design notes (very rough) 2018-09-07 01:43:19 +03:00			`Scope` denotes a `Set Id`. We might build these up symbolically:

			```Haskell
			`Scope.transitiveDependencies : Id -> Scope`
			`Scope.transitiveDependents : Id -> Scope`

			`Scope.union : Scope -> Scope -> Scope`
			`-- maybe other set operations?`
			`-- Scope.intersection : Scope -> Scope -> Scope`
			```

			`Here's the codebase API:`

			```Haskell
			`effect Codebase where`

			`-- Parse an expression`
			`Term.parse : Text ->{Codebase} Term`

			`-- Parse one or more type declarations`
			`TypeDeclarations.parse : Text ->{Codebase} TypeDeclarations`

			`-- Parse one or more term declarations`
			`TermDeclarations.parse : Text ->{Codebase} TermDeclarations`

			`-- Get the ids in a type declarations block`
			`TypeDeclarations.ids : TypeDeclarations -> Map Name Id`

			`-- Pretty-print some type declarations`
			`TypeDeclarations.text : TypeDeclarations ->{Codebase} Text`

			`-- Pretty-print some term declarations`
			`TermDeclarations.text : TermDeclarations ->{Codebase} Text`

			`-- optional : TypeDeclarations.Html : TypeDeclarations ->{Codebase} Html`

			`-- Get the ids in a type declarations block`
			`TermDeclarations.ids : TermDeclarations -> Map Name Id`

			`-- Actually add one or more type declarations to the codebase`
			`TypeDeclarations.add : TypeDeclarations ->{Codebase} ()`

			`-- Add one or more term declarations to the codebase`
			`TermDeclarations.add : TermDeclarations ->{Codebase} ()`

			`-- Resolve a name to zero or more ids with that name`
			`resolve : Name ->{Codebase} Set Id`

			`-- idea:`
			`-- search : Pattern ->{Codebase} Set Id`

			-- Look up the names associated with an `Id`
			`names : Id ->{Codebase} (Set Name)`

			-- Add a name associated with the `Id`
			`addName : Id -> Name ->{Codebase} ()`

			-- Remove a name from association with the `Id`
			`removeName : Id -> Name ->{Codebase} ()`

			`-- other metadata CRUD operations`

			-- `addLicense id licenseId`
			`addLicense : Id -> Id ->{Codebase} ()`

			-- `addAuthor id authorId`
			`addAuthor : Id -> Id ->{Codebase} ()`

			`getLicense : Id ->{Codebase} License`
			`getAuthor : Id ->{Codebase} Author`

			-- The immediate dependencies of an `Id` (not transitive)
			`dependencies : Id ->{Codebase} Set Id`

			-- The immediate dependents of an `Id` (not transitive)
			`dependents : Id ->{Codebase} Set Id`

			-- The immediate ancestor of an `Id`
			`ancestor : Id ->{Codebase} Id`

			`-- Evaluate a term to normal form`
			`run : Term ->{Codebase} Term`
			`runIO : Term ->{IO,Codebase} Term`
			`...`


			`-- Create a new empty transaction, with the empty scope`
			`Transaction.new : Name ->{Codebase} Transaction`

			`-- Look up a transaction by name`
			`Transaction.resolve : Name ->{Codebase} Set Transaction`

			`-- Read the fields of a transaction`

			-- The `Id`s that define the scope of the transaction
			`Transaction.scope : Transaction ->{Codebase} Set Id`

			-- Union the given `Scope` with the existing transaction's `Scope`.
			`Transaction.addScope : Transaction -> Scope ->{Codebase} ()`

			`-- An element of the scope needs to have been updated if one of its`
			`-- dependencies has been edited or removed as part of the transaction.`
			-- `remainder` is just a count of the elements of `scope` needing updating.
			`-- Conceptually:`
			`-- length [ i \| i <- Transactions.scope t,`
			Transaction.dependencies i `intersects` (
			Transaction.editedSet t `union` Transaction.removed t),
			`Set.notSingleton $ Map.lookup i (Transaction.canonical t) ]`
			`Transaction.remainder : Transaction ->{Codebase} Natural`

			-- We never delete from the `edited` set, we only accumulate.
			`Transaction.edited : Transaction ->{Codebase} Map Id (Set Id)`

			-- Convenience function, just pulls out keys of `edited`.
			`Transaction.editedKeySet : Transaction ->{Codebase} Set Id`

			-- This is always a submap of `edited` which reflects any conflict resolutions
			`Transaction.canonical : Transaction ->{Codebase} Map Id (Set Id)`

			`-- These are definitions newly introduced as part of this transaction`
			`Transaction.added : Transaction ->{Codebase} Set Id`

			`-- These are definitions deleted as part of this transaction`
			`Transaction.removed : Transaction ->{Codebase} Set Id`

			-- Add a definition to the `added` set of the transaction
			`Transaction.add : Transaction -> Id ->{Codebase} ()`

			-- Add a definition to the `removed` set of the transaction
			`Transaction.remove : Transaction -> Id ->{Codebase} ()`

			-- `edit t old new` adds `new` to the `edited` map for `old` and
			-- to the `canonical` map for `old`.
			`Transaction.edit : Transaction -> Id -> Id ->{Codebase} ()`

			-- `resolve id selected` replaces the canonical set with the selected
			-- `Id`, discarding any prior selections
			-- Discarded `Id` can be computed by comparing the `edited` (which has
			-- everything) and `canonical` (which has current selections)
			`--`
			-- What should this do if the selected `Id` is unrelated to any of the
			`-- existing ids?`
			`Transaction.resolve : Transaction -> Id -> Id ->{Codebase} ()`

			`-- Actually apply the transaction to the codebase. Idempotent.`
			`-- What does this actually do???`
			`-- Presumably something with names...`
			`-- Arya proposal: If scope includes Id of a namespace / first-class`
			`-- naming preference object, then update that ... something something something`
			`-- Observation: Because you can have scopes that are smaller than "the entire`
			`-- codebase" (which isn't really a thing), git branches aren't`
			`-- a useful thing for tracking concurrent development (because`
			`-- git branches only work at the level of the whole git repo).`
			-- Paul: seems like we need a `Branch`
			-- `Branch` assigns zero or more names to each Id mapping.
			-- `Transactions` are functions Branch -> Branch.
			-- `Transaction.commit` applies that function to a branch.
			`Transaction.commit : Transaction -> Branch ->{Codebase} ()`

			-- Branch might just be a name prefix. `<branch-name>.List.sort.name`
			`-- You should be able to import some branch`

			`-- no Transaction.merge for now`
			```

			`User preferences wrt to name rendering? Id -> name preferences`

Update codebase-editor-design.markdown 2018-09-07 18:49:34 +03:00			### Richer `Edit` type and automatic merging

			Observation: if we make the edits to an `Id` a bit richer, can do a lot more work for the user when reconciling concurrent edits. Here's the start of a proposal:

			```Haskell
			`-- A substitution can preserve the old type exactly (can propagate fully),`
			`-- or can be a subtype of the old type (can propagate fully, but need to re-synthesize dependent's`
			`-- types, as they might become more general).`
			`type Substitution.Typing = Exact \| Subtype`

			`namespace Substitution.Typing where`

			`combine : Substitution.Typing -> Substitution.Typing -> Substitution.Typing`
			`combine Exact Exact = Exact`
			`combine _ _ = Subtype`

			`-- Edits have a commutative merge operation, see below`
			`type Edit`
			`= Substitutions (Map Id (Edit.Typing, Id))`
			`\| Replace Id`
			`\| Conflict (Set Edit)`
			```

			The `edits` / `canonical` map inside a `Transaction` instead denotes a `Map Id Edit`. To merge these maps, we just `unionWith Edit.merge`. `Edit.merge` does the obvious thing - substitution maps get merged as long as keys are disjoint, a `Replace i` and a `Substitutions m` applies the substitutions to `i` to produce `i'`, and results in a `Replace i'`, and anything else is a `Conflict` needing user intervention.

			This `Edit` merge operation addresses a lot of the common use cases, like if Alice upgrades `foo` in a type preserving way, propagates that change, and Bob updates `bar` in a type preserving way, then Alice and Bob can merge their work automatically. Whereas if we just have an opaque "replace this `Id` with that `Id`", we don't have enough information to be able to merge their work (except via some heuristic where we attempt some sort of tree-based merge of the ASTs... I think this should be a last resort)

			`### Notes`

			`Paul: I wonder if we can just use Git branches as the branch concept? Use the discipline that if a transaction's scope is less than the whole Git repo, then when applied, the new names are all still prefixed with the transaction id. Only if the transaction is maximal do we remove the prefix. Or perhaps it is best to just move away from using the VCS concept of branches.`

			`### Repository format`

			`Design goal - a Unison repository can be versioned using Git (or Hg, or whatever), and there should never be merge conflicts when merging two Unison repositories. That is, Git merge conflicts are a bad UX for surfacing concurrent edits that the user may wish to reconcile.`

			```
			`terms/`
			`reference-english-JasVXOEBBV8.markdown -- the reference docs ref'd below`
			`license-8JSJdkVvvow92.markdown`
			`...`
			`jAjGDJnsdfL.ub -- binary form of the term`
			`jAjGDJnsdfL/`
			`source-98asdfjKjsldfj.markdown -- source code, in markdown form`
			`Runar.factorial.name -- just an empty file`
			`math.factorial.name`
			`reference-english-JasVXOEBBV8.hash -- reference docs, in English`
			`reference-spanish-9JasdfjHNBdjj.hash -- reference docs`
			`doc-english-OD03VvvsjK.hash -- other docs`
			`license-8JSJdkVvvow92.hash -- reference to the license for this term`
			`types/ -- directory of all type declarations`
			`8sdfA1baBw.ub -- binary form of the type declaration`
			`8sdfA1baBw/`
			`0/ -- constructor id, has a set of names`
			`Nil.name -- empty file`
			`Empty.name -- empty file`
			`1/ -- constructor id, has a set of names`
			`Cons.name`
			`Prepend.name`
			`reference-english-KgLfAIBw312.hash -- reference docs`
			`doc-english-8AfjKBCXdkw.hash -- other docs`
			```

			`Sets are represented by directories of immutable empty files whose file names represent the elements of the set - the sets are union'd as a result of a Git merge. Deletions are handled without conflicts as well.`

			`Observation is that we'll probably want some additional indexing structure (which won't be versioned) which can be cached on disk and derived from the primary repo format. This is useful for answering different queries on the codebase more efficiently.`