more docs

jackfoxy 2023-05-18 13:24:00 -07:00
parent f6b9cf0732
commit 00ce810a17
12 changed files with 150 additions and 55 deletions

55
docs/ref-ch01-introduction.md Normal file → Executable file

@ -2,7 +2,7 @@
## Manifesto
The relational data model is a fundamental component of the computing stack that until now has been conspicuously missing from Urbit. Why is this fundamental technology, with a sound foundation in relational algebra, set theory, and first order predicate calculus so frequently overlooked?
The relational data model has been conspicuously missing from Urbit. Why is this fundamental technology, with a sound foundation in relational algebra, set theory, and first order predicate calculus so frequently overlooked?
1. RDBMS technology is not typically covered in today's CS curriculums.
2. Developers don't want to hassle with setting up a server.
@ -28,37 +28,39 @@ An Urbit RDBMS deserves a _first principles_ approach to design and implementati
The Urbit RDBMS, Obelisk, consists of
1. A scripting language and parser (this document)
1. A scripting language (this document) and parser
2. A plan builder
3. Eventually, a front-end app...anyone can write one from the parser and plan APIs.
3. A front-end agent app...anyone can write one from the parser and plan APIs.
The scripting language, _urQL_, derives from SQL and varies in a few cases.
Queries are constructed in FROM..WHERE..SELECT.. order, the order of events in plan execution.
(The user should be cognizant of the ordering of events.)
Columns are atoms with auras.
Columns are typed atoms.
Table definitions do not allow for nullable columns.
All user-defined names follow the hoon term naming standard.
All user-defined names (excepting aliases) follow the hoon term naming standard.
All except the simplest functions are collected in their own section and aliased inline into SELECT clause and predicates.
Emphasizes composability and improves readability.
All except the simplest functions are collected in their own clause and inlined into SELECT clause and predicates by alias.
There are no inlined subqueries.
Inlined sub-queries are banned, improving readability.
JOINs and/or CTEs handle all such use cases and emphasize composability.
CTEs can be referenced for certain use cases in predicates.
Relational division is supported with a DIVIDED BY operator.
Set operations support nesting of queries on the right side.
All data manipulation commands (DELETE, INSERT, MERGE, UPDATE) as well as the SELECT statement can accept a dataset output by a prior TRANSFORM step and send its output dataset to the next step.
Reading and/or updating data on foreign ships is allowed provided the ship's pilot has granted permission.
Cross database joins are allowed, but not cross ship joins.
Views cannot be defined on foreign databases.
Queries can operate on previous versions and data of the databases via the AS OF clause.
This document has placeholders for Stored Procedures and Triggers, which have yet to be defined. We anticipate these will be points for integration with hoon.
Pivoting and Windowing will be in a future release.
This document has placeholders for Stored Procedures and Triggers, which have yet to be defined. We anticipate these will be points for integration with hoon and other agents.
## urQL language diagrams and general syntax
@ -81,7 +83,8 @@ All object names follow the hoon rules for terms, i.e. character set restricted
Column, table, and other aliases provide an alternative to referencing the qualified object name and follow the hoon term naming standards except that upper-case alphabetic characters are allowed and alias evaluation is case-agnostic, e.g. `t1` and `T1` represent the same alias.
All objects in the database *sys* and namespace *sys* are owned by the system and read only for all user commands. The namespace *sys* may not be specified in any other database.
All objects in the database *sys* and namespace *sys* are owned by the system and read only for all user commands.
The namespace *sys* may not be specified in any other database.
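The case-agnostic alias rule described above can be illustrated with a minimal sketch. It assumes that "case-agnostic" means simple lower-case folding before comparison; `alias_key` and `resolve_alias` are hypothetical helper names, not part of the Obelisk parser API.

```python
def alias_key(alias: str) -> str:
    """Fold an alias to the form used for comparison.

    Assumption: case-agnostic evaluation is plain lower-casing.
    """
    return alias.lower()


def resolve_alias(alias: str, bindings: dict) -> str:
    """Look up the object bound to an alias, ignoring case.

    `bindings` maps folded aliases to qualified object names
    (a hypothetical structure for illustration only).
    """
    return bindings[alias_key(alias)]


# Bind the alias once, then reference it in either case.
bindings = {alias_key("T1"): "my-database.dbo.my-table"}
print(resolve_alias("t1", bindings) == resolve_alias("T1", bindings))
```

Folding the alias once at binding time and once at lookup time keeps `t1` and `T1` interchangeable without changing how object names themselves are stored.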
## Common hints used throughout the reference
@ -102,11 +105,14 @@ All objects in the database *sys* and namespace *sys* are owned by the system an
<transform> [ AS ] <alias> --to do: refine this, it's not exactly <transform>
```
`<transform> ::=` from transform diagram.
When used as a `<common-table-expression>` (CTE) `<transform>` output must be a pass-thru virtual-table.
In a CTE the `WITH` clause is virtually the prior CTEs defined in the parent `<transform>`.
`<alias> ::= @t` case-agnostic, see alias naming discussion above.
Each `<common-table-expression>` is always referenced by alias, never inlined.
Each CTE is always referenced by alias, never inlined.
```
<table-set> ::=
@ -122,11 +128,13 @@ When `<view>, <table>` have the same name within a namespace, `<view>` is said t
A base-table, `<table>`, is the only manifestation of `<table-set>` that is not a computation.
Every other manifestation of `<table-set>` is a virtual-table and the row type may be a union type.
Every `<table-set>` is a virtual-table and the row type may be a union type.
If not cached, `<view>` must be evaluated to resolve.
`( column-1 [,...column-n] )` assigns column names to the widest row type of an incoming pass-thru table. `*` accepts an incoming pass-thru virtual-table assuming column names established by the previous set-command (`DELETE`, `INSERT`, `MERGE`, `QUERY`, or `UPDATE`) that created the pass-thru.
`( column-1 [,...column-n] )` assigns column names to the widest row type of an incoming pass-thru table.
`*` accepts an incoming pass-thru virtual-table assuming column names established by the previous set-command (`DELETE`, `INSERT`, `MERGE`, `QUERY`, or `UPDATE`) that created the pass-thru.
Similarly `*` as the output of `DELETE`, `INSERT`, `MERGE` creates a pass-thru virtual-table for consumption by the next step or ultimate product of a `<transform>`.
## Issues
@ -134,12 +142,13 @@ If not cached, `<view>` must be evaluated to resolve.
1. stored procedures TBD
2. triggers TBD
3. https://github.com/sigilante/l10n localization of date/time TBD
4. SELECT single column named top, bottom, or distinct is problematic
5. Add `DISTINCT` and other advanced aggregate features. Grouping Sets. Rollup. Cube. GROUPING function. Feature T301, 'Functional dependencies' from SQL 1999.
6. column:ast vase
7. value-literal:ast vase
8. set operators, multiple commands per transform
9. scalar and aggregate functions
10. grouping FROM/SELECT statements after set operation in `<transform>`
11. add aura @uc Bitcoin address 0c1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa
12. a path forward for arbitrary noun columns?
4. `SELECT` single column named top, bottom, or distinct is problematic
5. Add `DISTINCT` and other advanced aggregate features. Grouping Sets. Rollup. Cube. GROUPING function. Feature T301 'Functional dependencies' from SQL 1999 specification.
6. investigate changing column:ast and value-literal:ast to vase in parser
7. set operators, multiple commands per transform
8. scalar and aggregate functions
9. grouping FROM/SELECT statements after set operation in `<transform>`
10. add aura @uc Bitcoin address 0c1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa
11. a path forward for arbitrary noun columns?
12. pivoting and windowing will be in a future release.
13. `<view>` caching TBD.

35
docs/ref-ch02-types.md Normal file → Executable file

@ -1,7 +1,10 @@
# Types
All data presentations (nouns) of the Obelisk system available for user interaction -- whether reading, manipulation, or creation -- are strongly typed.
All data representations (nouns) of the Obelisk system are strongly typed.
The fundamental data element is an atom typed by an aura. All data cells (the intersection of a table row and table column) are a typed atom. Obelisk supports the following auras:
The fundamental data element is an atom typed by an aura.
All data cells (the intersection of a `<table-set>` row and column) are a typed atom.
Obelisk supports the following auras:
|aura|type|representation|
|----|----|--------------|
@ -32,25 +35,35 @@ The fundamental data element is an atom typed by an aura. All data cells (the in
|@uw|unsigned base-64|0wx5~J|
|@ux|unsigned hexadecimal|0x84.5fed|
All datasets in Obelisk are tables. All tables either are, or derive from, base-tables spawned by `CREATE TABLE`.
All datasets in Obelisk are sets, meaning each typed element only exists once.
They are also commonly regarded as tables, meaning the index of each cell (row/column intersection) can be calculated.
All tables either are, or derive from, base-tables spawned by `CREATE TABLE`.
Base-table (`<table>`} rows have exactly one type, the table's atomic aura-typed columns in a fixed order.
Base-table (`<table>`) rows have exactly one type, the table's atomic aura-typed columns in a fixed order.
```
<row-type> ::=
list <aura>
```
Each base-table is itself typed by its own definition.
Columns are sets typed by an aura and indexed by name.
```
<column-type> ::=
<aura/name>set
```
Each base-table is itself typed by its `<row-type>`.
```
<table-type> ::=
list <row-type>
<row-type>list
```
Base-table definitions include a unique primary ordering of rows, hence its type. This is not the case for every other instance of table (dataset).
Base-table definitions include a unique primary ordering of rows, hence a base-table has list type, not set type. This is not the case for every other instance of `<table-set>`.
```
<table-set-type> ::=
set <row-type>
| set list <row-type>
```
Rows from `<view>`s, `<common-table-expression>`'s, and command output from `<query>`, `<merge>`, or any other table that is not a base-table can only have an immutable row ordering, if it was so specified. In general all these other tables have types are unions of `<row-type>`.
<row-type>list
| set set <row-type>
```
Rows from `<view>`s, `<common-table-expression>`s, and command output from `<transform>`, or any other table that is not a base-table, can only have an immutable row ordering if it was so specified (i.e. the `SELECT` statement has an `ORDER BY` clause). In general all these other tables have types that are unions of `<row-type>`s.
All the other static types in Obelisk are defined in sur/ast/hoon.
## Remarks
Ultimately even `<table>` rows are typed as sets, not lists, once they are referenced in a statement because statements can generally choose column ordering.
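The distinction between the ordered list type of a base-table and the set type of other `<table-set>`s can be sketched in Python. This is only an illustration of the idea, not how Obelisk represents nouns: rows are modelled as tuples of atoms, and the helper names `as_base_table` and `as_table_set` are hypothetical. Rejecting duplicate rows in a base-table is an assumption drawn from the statement that each typed element exists only once.

```python
from typing import FrozenSet, List, Tuple

# Stand-in for <row-type>: atoms in a fixed column order.
Row = Tuple[object, ...]


def as_base_table(rows: List[Row]) -> List[Row]:
    """Model a base-table: rows keep their order and must be unique."""
    if len(set(rows)) != len(rows):
        raise ValueError("duplicate row in base-table")
    return list(rows)


def as_table_set(rows) -> FrozenSet[Row]:
    """Model an unordered <table-set>: duplicates collapse, order is lost."""
    return frozenset(rows)


ordered = as_base_table([(2, "b"), (1, "a")])
unordered = as_table_set([(1, "a"), (1, "a"), (2, "b")])
print(ordered[0])       # position is meaningful for the list type
print(len(unordered))   # the repeated row is stored once
```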

16
docs/ref-ch03-create.md Normal file → Executable file

@ -10,14 +10,19 @@ Example:
CREATE DATABASE my-database
```
Discussion:
`CREATE DATABASE` must be the only command in a script. The script will fail if there are prior commands. As the first command it will succeed and subsequent commands will be ignored.
API:
```
+$ create-database $:([%create-database name=@t])
```
## Remarks
`CREATE DATABASE` must be the only command in a script. The script will fail if there are prior commands. As the first command it will succeed and subsequent commands will be ignored.
## Produced Metadata
## Exceptions
# CREATE INDEX
```
@ -53,6 +58,11 @@ API:
==
```
## Remarks
## Produced Metadata
## Exceptions
# CREATE NAMESPACE

82
docs/ref-ch04-data-manipulation.md Normal file → Executable file

@ -4,15 +4,15 @@ TBD
# DELETE
Deletes rows from a `<table-set>`.
```
<delete> ::=
DELETE [ FROM ] [ <ship-qualifer> ] <table>
DELETE [ FROM ] <table-set>
[ WHERE <predicate> ]
```
Discussion:
Data in the namespace *sys* cannot be deleted.
API:
```
+$ delete
@ -22,6 +22,19 @@ API:
predicate=(unit predicate)
==
```
## Remarks
A stand-alone `DELETE` statement can only operate on a `<table>` and produces a `<transform>` of one command step with no CTEs.
When `<table-set>` is a `<table>` the command potentially mutates `<table>` and if so results in a state change of the Obelisk agent.
Data in the namespace *sys* cannot be deleted.
When `<table-set>` is a virtual table the command produces an output `<table-set>` which may be consumed as a pass-thru by a subsequent `<transform>` step.
## Produced Metadata
@@ROWCOUNT returns the total number of rows deleted
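
The relationship between the optional `WHERE` predicate and the reported row count can be sketched as follows. This is a hedged model rather than the Obelisk implementation: rows are plain dictionaries, `delete_rows` is a hypothetical helper, and treating an absent predicate as "delete every row" is an assumption carried over from conventional SQL.

```python
def delete_rows(rows, predicate=None):
    """Return (remaining_rows, rowcount) after a modelled DELETE.

    predicate: callable taking a row and returning True for rows to delete.
    Assumption: with no predicate, every row is deleted (as in SQL).
    """
    kept = [r for r in rows if predicate is not None and not predicate(r)]
    rowcount = len(rows) - len(kept)  # models the @@ROWCOUNT metadata
    return kept, rowcount


rows = [{"id": 1}, {"id": 2}, {"id": 3}]
remaining, rowcount = delete_rows(rows, lambda r: r["id"] > 1)
print(remaining, rowcount)
```

Returning the remaining rows instead of mutating the input mirrors the case where the target is a virtual table and the output is passed through to a later step.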
## Exceptions
`<table>` does not exist
@ -30,9 +43,11 @@ API:
# INSERT
Inserts rows into a `<table-set>`.
```
<insert> ::=
INSERT INTO [ <ship-qualifer> ] <table>
INSERT INTO <table-set>
[ ( <column> [ ,...n ] ) ]
{ VALUES (<scalar-expression> [ ,...n ] ) [ ,...n ]
| <query> }
@ -47,11 +62,7 @@ API:
| expression <binary-operator> expression }
```
Discussion:
The `VALUES` or `<query>` must provide data for all columns in the expected order.
Tables in the namespace *sys* cannot be inserted into.
Cord values are represented in single quotes 'this is a cord'.
Escape single quotes with double backslash thusly `'this is a cor\\'d'`.
TBD see functions chapter, still undergoing design development.
API:
```
@ -64,6 +75,25 @@ API:
==
```
## Remarks
A stand-alone `INSERT` statement can only operate on a `<table>` and produces a `<transform>` of one command step with no CTEs.
When `<table-set>` is a `<table>` the command potentially mutates `<table>` and if so results in a state change of the Obelisk agent.
Tables in the namespace *sys* cannot be inserted into.
When `<table-set>` is a virtual table the command produces an output `<table-set>` which may be consumed as a pass-thru by a subsequent `<transform>` step.
The `VALUES` or `<query>` must provide data for all columns in the expected order.
Cord values are represented in single quotes 'this is a cord'.
Escape single quotes with double backslash thusly `'this is a cor\\'d'`.
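
A client that builds urQL scripts programmatically needs to apply the cord quoting rule above. The following minimal sketch assumes the rule is exactly as stated (wrap in single quotes, prefix each embedded single quote with two backslashes); `quote_cord` is a hypothetical helper, and how other characters such as literal backslashes should be treated is not covered by the text and is not handled here.

```python
def quote_cord(text: str) -> str:
    """Render text as an urQL cord literal.

    Each embedded single quote is preceded by two backslashes,
    then the whole value is wrapped in single quotes.
    """
    escaped = text.replace("'", "\\\\'")  # two backslash characters, then the quote
    return "'" + escaped + "'"


print(quote_cord("this is a cord"))
print(quote_cord("this is a cor'd"))
```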
## Produced Metadata
@@ROWCOUNT returns the total number of rows inserted
## Exceptions
`<table>` does not exist
`GRANT` permission on `<table>` violated
@ -71,6 +101,8 @@ API:
# TRUNCATE TABLE
Removes all rows in a base table.
```
<truncate-table> ::=
TRUNCATE TABLE [ <ship-qualifer> ] <table>
@ -84,6 +116,15 @@ API:
table=qualified-object
==
```
## Remarks
The command potentially mutates `<table>` and if so results in a state change of the Obelisk agent.
Tables in the namespace *sys* cannot be truncated.
## Produced Metadata
none
## Exceptions
`<table>` does not exist
@ -92,6 +133,8 @@ API:
# UPDATE
Changes content of selected columns in existing rows of a `<table-set>`.
```
<update> ::=
UPDATE [ <ship-qualifer> ] <table>
@ -111,6 +154,25 @@ API:
==
```
## Remarks
A stand-alone `UPDATE` statement can only operate on a `<table>` and produces a `<transform>` of one command step with no CTEs.
When `<table-set>` is a `<table>` the command potentially mutates `<table>` and if so results in a state change of the Obelisk agent.
Data in the namespace *sys* cannot be updated.
When `<table-set>` is a virtual table the command produces an output `<table-set>` which may be consumed as a pass-thru by a subsequent `<transform>` step.
The `VALUES` or `<query>` must provide data for all columns in the expected order.
Cord values are represented in single quotes 'this is a cord'.
Escape single quotes with double backslash thusly `'this is a cor\\'d'`.
## Produced Metadata
@@ROWCOUNT returns the total number of rows updated
## Exceptions
`<table>` does not exist
`GRANT` permission on `<table>` violated

0
docs/ref-ch05-query.md Normal file → Executable file

0
docs/ref-ch06-merge.md Normal file → Executable file

17
docs/ref-ch07-transform.md Normal file → Executable file

@ -23,14 +23,6 @@
`AS OF` defaults to `NOW`.
`AS OF <inline-scalar>` inline Scalar function that returns `<timestamp>`.
```
<cmd> ::=
<delete>
| <insert>
| <merge>
| <query>
| <update>
```
```
<set-op> ::=
UNION
@ -45,6 +37,15 @@
Set operators `UNION`, etc. apply the previous result collection to the next query result or result from nested queries `( ... )`.
Left paren `(` can only exist singly, but right paren `)` may be stacked to any depth `...)))`.
```
<cmd> ::=
<delete>
| <insert>
| <merge>
| <query>
| <update>
```
```
<set-functions> ::=

0
docs/ref-ch08-functions.md Normal file → Executable file

0
docs/ref-ch09-permissions.md Normal file → Executable file

0
docs/ref-ch10-alter.md Normal file → Executable file

0
docs/ref-ch11-drop.md Normal file → Executable file

0
urql/sys.kelvin Normal file → Executable file