#936 pretty-print doc literals - spec out modified paragraph reflow/wrapping

Currently Unison takes doc literals like

  [:
  example usage:
   - note the call to bar
     - we pass it 1
  foo =
    bar 1
      baz
  :]

and renders them as

  [: example usage:
  - note the call to bar
  - we pass it 1
  foo =
  bar 1
  baz :]

due to the paragraph reflow/wrapping function.

Also, this function only works the first time you use it on a doc: if you `edit` a doc and mess with it, you won't get the wrapping refreshed the next time you view.  That's because the pretty-printing works by inserting newlines, which are treated as paragraph breaks by Unison the next time it sees the code.

I'm planning to fix this by
 - only doing paragraph reflow/wrapping for lines with zero indent (where zero is the left-most non-whitespace column of the doc)
 - normalizing docs during parse, including removing newlines that are within a paragraph.

See unison-src/transcripts/doc-formatting.md in this commit for more details of how this shakes out.  Not the .md.output because that's showing the current behavior without this change implemented.

Hopefully this is just a few more lines of code to implement, and then #936/#994 is good to merge.
This commit is contained in:
Chris Gibbs 2019-12-16 23:32:36 +00:00
parent 8f284d902c
commit 776af61064
6 changed files with 345 additions and 123 deletions

View File

@ -303,7 +303,7 @@ docBlock = do
segs <- many segment
closeTok <- closeBlock
let a = ann openTok <> ann closeTok
pure . docUnIndent $ Term.app a (Term.constructor a DD.docRef DD.docJoinId) (Term.seq a segs)
pure . docNormalize $ Term.app a (Term.constructor a DD.docRef DD.docJoinId) (Term.seq a segs)
where
segment = blob <|> linky
blob = do
@ -344,10 +344,17 @@ docBlock = do
-- Operates on the text of the Blobs within a doc (as parsed by docBlock):
-- - reduces the whitespace after all newlines so that at least one of the
-- non-initial lines has zero indent.
-- The pretty-printer will add indentation when rendering - we're undoing that here.
docUnIndent :: (Ord v, Show v) => AnnotatedTerm v a -> AnnotatedTerm v a
docUnIndent tm = case tm of
-- non-initial lines has zero indent (important because the pretty-printer adds
-- indenting every time the doc passes through it)
-- TODO implement the following
-- - remove trailing whitespace from each line
-- - remove newlines between any sequence of non-empty zero-indent lines
-- (i.e. undo manual line-breaking within paragraphs.)
-- Should be understood in tandem with Util.Pretty.paragraphyText, which
-- outputs doc text for display/edit/view.
-- See also unison-src/transcripts/doc-formatting.md.
docNormalize :: (Ord v, Show v) => AnnotatedTerm v a -> AnnotatedTerm v a
docNormalize tm = case tm of
-- This pattern is just `DD.DocJoin seqs`, but exploded in order to grab
-- the annotations. The aim is just to map `go` over it.
a@(Term.App' c@(Term.Constructor' DD.DocRef DD.DocJoinId) s@(Term.Sequence' seqs))

View File

@ -144,9 +144,15 @@ wrapImpl (p:ps) = wrap_ . Seq.fromList $
wrapString :: (LL.ListLike s Char, IsString s) => String -> Pretty s
wrapString s = wrap (lit $ fromString s)
-- 0. Preserve all leading and trailing whitespace
-- TODO - replace this spec...
-- 0. Preserve all leading and trailing whitespace
-- 1. Preserve all newlines
-- 2. Wrap all text in between newlines
-- TODO ... with this one:
-- Wrap all text on lines that don't start with a space.
-- Preserve other whitespace.
-- Should be understood in tandem with TermParser.docNormalize.
-- See also unison-src/transcripts/doc-formatting.md.
paragraphyText :: (LL.ListLike s Char, IsString s) => Text -> Pretty s
paragraphyText t = text start <> inner <> text end where
inner = sep "\n" . fmap (wrap . text) . Text.splitOn "\n" $ t'

View File

@ -1,17 +1,18 @@
This transcript verifies a few minor details about doc parsing and pretty-printing.
This transcript explains a few minor details about doc parsing and pretty-printing, both from a user point of view and with some implementation notes. (The ucm `add` commands and their output are hidden for brevity.)
Docs can be used as inline code comments.
```unison
foo : Nat -> Nat
foo n =
[: do the
thing :]
[: do the thing :]
n + 1
```
```ucm
```ucm:hide
.> add
```
```ucm
.> view foo
```
@ -21,48 +22,16 @@ Note that `@` and `:]` must be escaped within docs.
escaping = [: Docs look [: like \@this \:] :]
```
```ucm
```ucm:hide
.> add
```
```ucm
.> view escaping
```
(Alas you can't have `\@` or `\:]` in your doc, as there's currently no way to 'unescape' them.)
### Indenting
Handling of indenting in docs between the parser and pretty-printer is a bit fiddly.
```unison
-- The leading and trailing spaces are stripped from the stored Doc, and one
-- leading and trailing space is inserted again on view/edit
-- by the pretty-printer.
doc1 = [: hi :]
-- The above treatment is only applied to the first line of a Doc.
-- For storing subsequent lines, they are unindented until at least one of
-- them hits the left margin. You don't notice this because the
-- pretty-printer indents them again on view/edit.
doc2 = [: hello
- foo
- bar
and the rest. :]
-- That does mean that the following is pretty-printed not so prettily.
-- To fix that we'd need to get the lexer to help out with interpreting
-- doc literal indentation (because it knows what columns the `[:` was in.)
doc3 = [: - foo
- bar
and the rest. :]
-- You can do the following to avoid that problem.
-- TODO this hashes the same as 3
doc4 = [:
- foo
- bar
and the rest.
:]
-- Note that -- comments are preserved within doc literals.
commented = [:
example:
@ -72,15 +41,116 @@ commented = [:
:]
```
```ucm
```ucm:hide
.> add
.> view doc1 doc2 doc3 doc4 commented
```
```ucm
.> view commented
```
If we edit `foo` starting from the pretty-printed version above, and add it again, we end up with the same thing.
### Indenting, and paragraph reflow
Handling of indenting in docs between the parser and pretty-printer is a bit fiddly.
```unison
-- The leading and trailing spaces are stripped from the stored Doc by the
-- lexer, and one leading and trailing space is inserted again on view/edit
-- by the pretty-printer.
doc1 = [: hi :]
```
```ucm:hide
.> add
```
```ucm
.> view doc1
```
```unison
-- Lines (apart from the first line) are unindented until at least one of
-- them hits the left margin (by a post-processing step in the parser).
-- You may not notice this because the pretty-printer indents them again on
-- view/edit.
doc2 = [: hello
- foo
- bar
and the rest. :]
```
```ucm:hide
.> add
```
```ucm
.> view doc2
```
```unison
doc3 = [: When Unison identifies a paragraph, it removes any newlines from it before storing it, and then reflows the paragraph text to fit the display window on display/view/edit.
For these purposes, a paragraph is any sequence of non-empty lines that have zero indent (after the unindenting mentioned above.)
- So this is not a paragraph, even
though you might want it to be.
And this text | as a paragraph
is not treated | either.
Note that because of the special treatment of the first line mentioned above, where its leading space is removed, it is always treated as a paragraph.
:]
```
```ucm:hide
.> add
```
```ucm
.> view doc3
```
```unison
doc4 = [: Here's another example of some paragraphs.
All these lines have zero indent.
- Apart from this one. :]
```
```ucm:hide
.> add
```
```ucm
.> view doc4
```
```unison
-- The special treatment of the first line does mean that the following
-- is pretty-printed not so prettily. To fix that we'd need to get the
-- lexer to help out with interpreting doc literal indentation (because
-- it knows what columns the `[:` was in.)
doc5 = [: - foo
- bar
and the rest. :]
```
```ucm:hide
.> add
```
```ucm
.> view doc5
```
TODO fence
```
-- You can do the following to avoid that problem.
doc6 = [:
- foo
- bar
and the rest.
:]
```
TODO
### Reflowing paragraphs
```
.> add
```
TODO
```
.> view doc6
```

View File

@ -1,12 +1,11 @@
This transcript verifies a few minor details about doc parsing and pretty-printing.
This transcript explains a few minor details about doc parsing and pretty-printing, both from a user point of view and with some implementation notes. (The ucm `add` commands and their output are hidden for brevity.)
Docs can be used as inline code comments.
```unison
foo : Nat -> Nat
foo n =
[: do the
thing :]
[: do the thing :]
n + 1
```
@ -25,19 +24,12 @@ foo n =
```
```ucm
.> add
⍟ I've added these definitions:
foo : Nat -> Nat
.> view foo
foo : Nat -> Nat
foo n =
use Nat +
[: do the
thing :]
[: do the thing :]
n + 1
```
@ -62,12 +54,6 @@ escaping = [: Docs look [: like \@this \:] :]
```
```ucm
.> add
⍟ I've added these definitions:
escaping : Doc
.> view escaping
escaping : Doc
@ -76,40 +62,7 @@ escaping = [: Docs look [: like \@this \:] :]
```
(Alas you can't have `\@` or `\:]` in your doc, as there's currently no way to 'unescape' them.)
### Indenting
Handling of indenting in docs between the parser and pretty-printer is a bit fiddly.
```unison
-- The leading and trailing spaces are stripped from the stored Doc, and one
-- leading and trailing space is inserted again on view/edit
-- by the pretty-printer.
doc1 = [: hi :]
-- The above treatment is only applied to the first line of a Doc.
-- For storing subsequent lines, they are unindented until at least one of
-- them hits the left margin. You don't notice this because the
-- pretty-printer indents them again on view/edit.
doc2 = [: hello
- foo
- bar
and the rest. :]
-- That does mean that the following is pretty-printed not so prettily.
-- To fix that we'd need to get the lexer to help out with interpreting
-- doc literal indentation (because it knows what columns the `[:` was in.)
doc3 = [: - foo
- bar
and the rest. :]
-- You can do the following to avoid that problem.
-- TODO this hashes the same as 3
doc4 = [:
- foo
- bar
and the rest.
:]
-- Note that -- comments are preserved within doc literals.
commented = [:
example:
@ -128,27 +81,13 @@ commented = [:
⍟ These new definitions are ok to `add`:
commented : Doc
doc1 : Doc
doc2 : Doc
doc3 : Doc
doc4 : Doc
Now evaluating any watch expressions (lines starting with
`>`)... Ctrl+C cancels.
```
```ucm
.> add
⍟ I've added these definitions:
commented : Doc
doc1 : Doc
doc2 : Doc
doc3 : Doc
doc4 : Doc
.> view doc1 doc2 doc3 doc4 commented
.> view commented
commented : Doc
commented =
@ -156,28 +95,219 @@ commented = [:
-- a comment
f x = x + 1 :]
```
### Indenting, and paragraph reflow
Handling of indenting in docs between the parser and pretty-printer is a bit fiddly.
```unison
-- The leading and trailing spaces are stripped from the stored Doc by the
-- lexer, and one leading and trailing space is inserted again on view/edit
-- by the pretty-printer.
doc1 = [: hi :]
```
```ucm
I found and typechecked these definitions in scratch.u. If you
do an `add` or `update`, here's how your codebase would
change:
⍟ These new definitions are ok to `add`:
doc1 : Doc
Now evaluating any watch expressions (lines starting with
`>`)... Ctrl+C cancels.
```
```ucm
.> view doc1
doc1 : Doc
doc1 = [: hi :]
```
```unison
-- Lines (apart from the first line) are unindented until at least one of
-- them hits the left margin (by a post-processing step in the parser).
-- You may not notice this because the pretty-printer indents them again on
-- view/edit.
doc2 = [: hello
- foo
- bar
and the rest. :]
```
```ucm
I found and typechecked these definitions in scratch.u. If you
do an `add` or `update`, here's how your codebase would
change:
⍟ These new definitions are ok to `add`:
doc2 : Doc
Now evaluating any watch expressions (lines starting with
`>`)... Ctrl+C cancels.
```
```ucm
.> view doc2
doc2 : Doc
doc2 =
[: hello
- foo
- bar
and the rest. :]
```
```unison
doc3 = [: When Unison identifies a paragraph, it removes any newlines from it before storing it, and then reflows the paragraph text to fit the display window on display/view/edit.
For these purposes, a paragraph is any sequence of non-empty lines that have zero indent (after the unindenting mentioned above.)
- So this is not a paragraph, even
though you might want it to be.
And this text | as a paragraph
is not treated | either.
Note that because of the special treatment of the first line mentioned above, where its leading space is removed, it is always treated as a paragraph.
:]
```
```ucm
I found and typechecked these definitions in scratch.u. If you
do an `add` or `update`, here's how your codebase would
change:
⍟ These new definitions are ok to `add`:
doc3 : Doc
Now evaluating any watch expressions (lines starting with
`>`)... Ctrl+C cancels.
```
```ucm
.> view doc3
doc3 : Doc
doc3 =
[:
When Unison identifies a paragraph, it removes any newlines
from it before storing it, and then reflows the paragraph
text to fit the display window on display/view/edit.
For these purposes, a paragraph is any sequence of non-empty
lines that have zero indent (after the unindenting mentioned
above.)
- So this is not a paragraph, even
though you might want it to be.
And this text | as a paragraph
is not treated | either.
Note that because of the special treatment of the first line
mentioned above, where its leading space is removed, it is
always treated as a paragraph.
:]
```
```unison
doc4 = [: Here's another example of some paragraphs.
All these lines have zero indent.
- Apart from this one. :]
```
```ucm
I found and typechecked these definitions in scratch.u. If you
do an `add` or `update`, here's how your codebase would
change:
⍟ These new definitions are ok to `add`:
doc4 : Doc
Now evaluating any watch expressions (lines starting with
`>`)... Ctrl+C cancels.
```
```ucm
.> view doc4
doc4 : Doc
doc4 =
[: Here's another example of some paragraphs.
All these lines have zero indent.
- Apart from this one. :]
```
```unison
-- The special treatment of the first line does mean that the following
-- is pretty-printed not so prettily. To fix that we'd need to get the
-- lexer to help out with interpreting doc literal indentation (because
-- it knows what columns the `[:` was in.)
doc5 = [: - foo
- bar
and the rest. :]
```
```ucm
I found and typechecked these definitions in scratch.u. If you
do an `add` or `update`, here's how your codebase would
change:
⍟ These new definitions are ok to `add`:
doc5 : Doc
Now evaluating any watch expressions (lines starting with
`>`)... Ctrl+C cancels.
```
```ucm
.> view doc5
doc5 : Doc
doc5 =
[: - foo
- bar
and the rest. :]
```
If we edit `foo starting from the pretty-printed version above, and add it again, we end up with the same thing.
TODO fence
```
-- You can do the following to avoid that problem.
doc6 = [:
- foo
- bar
and the rest.
:]
```
TODO
```
.> add
### Reflowing paragraphs
```
TODO
```
.> view doc6
```

View File

@ -1,3 +1,6 @@
# Documenting Unison code
Unison documentation is written in Unison. Documentation is a value of the following type:
@ -50,7 +53,7 @@ use .builtin
docs.List.take = [:
`@List.take n xs` returns the first `n` elements of `xs`. (No need to add line breaks manually. The display command will do wrapping of text for you.)
`@List.take n xs` returns the first `n` elements of `xs`. (No need to add line breaks manually. The display command will do wrapping of text for you. Indent any lines where you don't want it to do this.)
## Examples:

View File

@ -1,3 +1,6 @@
# Documenting Unison code
Unison documentation is written in Unison. Documentation is a value of the following type:
@ -92,7 +95,7 @@ use .builtin
docs.List.take = [:
`@List.take n xs` returns the first `n` elements of `xs`. (No need to add line breaks manually. The display command will do wrapping of text for you.)
`@List.take n xs` returns the first `n` elements of `xs`. (No need to add line breaks manually. The display command will do wrapping of text for you. Indent any lines where you don't want it to do this.)
## Examples:
@ -149,7 +152,8 @@ Now that documentation is linked to the definition. We can view it if we like:
`builtin.List.take n xs` returns the first `n` elements of
`xs`. (No need to add line breaks manually. The display
command will do wrapping of text for you.)
command will do wrapping of text for you. Indent any lines
where you don't want it to do this.)
## Examples:
@ -170,7 +174,8 @@ Or there's also a convenient function, `docs`, which shows the `Doc` values that
`builtin.List.take n xs` returns the first `n` elements of
`xs`. (No need to add line breaks manually. The display
command will do wrapping of text for you.)
command will do wrapping of text for you. Indent any lines
where you don't want it to do this.)
## Examples:
@ -194,7 +199,8 @@ Note that if we view the source of the documentation, the various references are
[:
`@builtin.List.take n xs` returns the first `n` elements of
`xs`. (No need to add line breaks manually. The display
command will do wrapping of text for you.)
command will do wrapping of text for you. Indent any lines
where you don't want it to do this.)
## Examples: