From 776af610646042c06acd527c0f0ccaf03f4c4f68 Mon Sep 17 00:00:00 2001 From: Chris Gibbs Date: Mon, 16 Dec 2019 23:32:36 +0000 Subject: [PATCH] #936 pretty-print doc literals - spec out modified paragraph reflow/wrapping Currently Unison takes doc literals like [: example usage: - note the call to bar - we pass it 1 foo = bar 1 baz :] and renders them as [: example usage: - note the call to bar - we pass it 1 foo = bar 1 baz :] due to the paragraph reflow/wrapping function. Also, this function only works the first time you use it on a doc: if you `edit` a doc and mess with it, you won't get the wrapping refreshed the next time you view. That's because the pretty-printing works by inserting newlines, which are treated as paragraph breaks by Unison the next time it sees the code. I'm planning to fix this by - only doing paragraph reflow/wrapping for lines with zero indent (where zero is the left-most non-whitespace column of the doc) - normalizing docs during parse, including removing newlines that are within a paragraph. See unison-src/transcripts/doc-formatting.md in this commit for more details of how this shakes out. Not the .md.output because that's showing the current behavior without this change implemented. Hopefully this is just a few more lines of code to implement, and then #936/#994 is good to merge. --- parser-typechecker/src/Unison/TermParser.hs | 17 +- parser-typechecker/src/Unison/Util/Pretty.hs | 8 +- unison-src/transcripts/doc-formatting.md | 160 ++++++++--- .../transcripts/doc-formatting.output.md | 264 +++++++++++++----- unison-src/transcripts/docs.md | 5 +- unison-src/transcripts/docs.output.md | 14 +- 6 files changed, 345 insertions(+), 123 deletions(-) diff --git a/parser-typechecker/src/Unison/TermParser.hs b/parser-typechecker/src/Unison/TermParser.hs index e477ca4c3..592104a53 100644 --- a/parser-typechecker/src/Unison/TermParser.hs +++ b/parser-typechecker/src/Unison/TermParser.hs @@ -303,7 +303,7 @@ docBlock = do segs <- many segment closeTok <- closeBlock let a = ann openTok <> ann closeTok - pure . docUnIndent $ Term.app a (Term.constructor a DD.docRef DD.docJoinId) (Term.seq a segs) + pure . docNormalize $ Term.app a (Term.constructor a DD.docRef DD.docJoinId) (Term.seq a segs) where segment = blob <|> linky blob = do @@ -344,10 +344,17 @@ docBlock = do -- Operates on the text of the Blobs within a doc (as parsed by docBlock): -- - reduces the whitespace after all newlines so that at least one of the --- non-initial lines has zero indent. --- The pretty-printer will add indentation when rendering - we're undoing that here. -docUnIndent :: (Ord v, Show v) => AnnotatedTerm v a -> AnnotatedTerm v a -docUnIndent tm = case tm of +-- non-initial lines has zero indent (important because the pretty-printer adds +-- indenting every time the doc passes through it) +-- TODO implement the following +-- - remove trailing whitespace from each line +-- - remove newlines between any sequence of non-empty zero-indent lines +-- (i.e. undo manual line-breaking within paragraphs.) +-- Should be understood in tandem with Util.Pretty.paragraphyText, which +-- outputs doc text for display/edit/view. +-- See also unison-src/transcripts/doc-formatting.md. +docNormalize :: (Ord v, Show v) => AnnotatedTerm v a -> AnnotatedTerm v a +docNormalize tm = case tm of -- This pattern is just `DD.DocJoin seqs`, but exploded in order to grab -- the annotations. The aim is just to map `go` over it. a@(Term.App' c@(Term.Constructor' DD.DocRef DD.DocJoinId) s@(Term.Sequence' seqs)) diff --git a/parser-typechecker/src/Unison/Util/Pretty.hs b/parser-typechecker/src/Unison/Util/Pretty.hs index 7c3ef0825..b949dc9ba 100644 --- a/parser-typechecker/src/Unison/Util/Pretty.hs +++ b/parser-typechecker/src/Unison/Util/Pretty.hs @@ -144,9 +144,15 @@ wrapImpl (p:ps) = wrap_ . Seq.fromList $ wrapString :: (LL.ListLike s Char, IsString s) => String -> Pretty s wrapString s = wrap (lit $ fromString s) --- 0. Preserve all leading and trailing whitespace +-- TODO - replace this spec... +-- 0. Preserve all leading and trailing whitespace -- 1. Preserve all newlines -- 2. Wrap all text in between newlines +-- TODO ... with this one: +-- Wrap all text on lines that don't start with a space. +-- Preserve other whitespace. +-- Should be understood in tandem with TermParser.docNormalize. +-- See also unison-src/transcripts/doc-formatting.md. paragraphyText :: (LL.ListLike s Char, IsString s) => Text -> Pretty s paragraphyText t = text start <> inner <> text end where inner = sep "\n" . fmap (wrap . text) . Text.splitOn "\n" $ t' diff --git a/unison-src/transcripts/doc-formatting.md b/unison-src/transcripts/doc-formatting.md index 474e42387..d73365e2c 100644 --- a/unison-src/transcripts/doc-formatting.md +++ b/unison-src/transcripts/doc-formatting.md @@ -1,17 +1,18 @@ -This transcript verifies a few minor details about doc parsing and pretty-printing. +This transcript explains a few minor details about doc parsing and pretty-printing, both from a user point of view and with some implementation notes. (The ucm `add` commands and their output are hidden for brevity.) Docs can be used as inline code comments. ```unison foo : Nat -> Nat foo n = - [: do the - thing :] + [: do the thing :] n + 1 ``` -```ucm +```ucm:hide .> add +``` +```ucm .> view foo ``` @@ -21,48 +22,16 @@ Note that `@` and `:]` must be escaped within docs. escaping = [: Docs look [: like \@this \:] :] ``` -```ucm +```ucm:hide .> add +``` +```ucm .> view escaping ``` (Alas you can't have `\@` or `\:]` in your doc, as there's currently no way to 'unescape' them.) -### Indenting - -Handling of indenting in docs between the parser and pretty-printer is a bit fiddly. - ```unison - --- The leading and trailing spaces are stripped from the stored Doc, and one --- leading and trailing space is inserted again on view/edit --- by the pretty-printer. -doc1 = [: hi :] - --- The above treatment is only applied to the first line of a Doc. --- For storing subsequent lines, they are unindented until at least one of --- them hits the left margin. You don't notice this because the --- pretty-printer indents them again on view/edit. -doc2 = [: hello - - foo - - bar - and the rest. :] - --- That does mean that the following is pretty-printed not so prettily. --- To fix that we'd need to get the lexer to help out with interpreting --- doc literal indentation (because it knows what columns the `[:` was in.) -doc3 = [: - foo - - bar - and the rest. :] - --- You can do the following to avoid that problem. --- TODO this hashes the same as 3 -doc4 = [: - - foo - - bar - and the rest. - :] - -- Note that -- comments are preserved within doc literals. commented = [: example: @@ -72,15 +41,116 @@ commented = [: :] ``` -```ucm +```ucm:hide .> add -.> view doc1 doc2 doc3 doc4 commented +``` +```ucm +.> view commented ``` -If we edit `foo` starting from the pretty-printed version above, and add it again, we end up with the same thing. +### Indenting, and paragraph reflow +Handling of indenting in docs between the parser and pretty-printer is a bit fiddly. + +```unison +-- The leading and trailing spaces are stripped from the stored Doc by the +-- lexer, and one leading and trailing space is inserted again on view/edit +-- by the pretty-printer. +doc1 = [: hi :] +``` + +```ucm:hide +.> add +``` +```ucm +.> view doc1 +``` + +```unison +-- Lines (apart from the first line) are unindented until at least one of +-- them hits the left margin (by a post-processing step in the parser). +-- You may not notice this because the pretty-printer indents them again on +-- view/edit. +doc2 = [: hello + - foo + - bar + and the rest. :] +``` + +```ucm:hide +.> add +``` +```ucm +.> view doc2 +``` + +```unison +doc3 = [: When Unison identifies a paragraph, it removes any newlines from it before storing it, and then reflows the paragraph text to fit the display window on display/view/edit. + +For these purposes, a paragraph is any sequence of non-empty lines that have zero indent (after the unindenting mentioned above.) + + - So this is not a paragraph, even + though you might want it to be. + + And this text | as a paragraph + is not treated | either. + +Note that because of the special treatment of the first line mentioned above, where its leading space is removed, it is always treated as a paragraph. + :] +``` + +```ucm:hide +.> add +``` +```ucm +.> view doc3 +``` + +```unison +doc4 = [: Here's another example of some paragraphs. + + All these lines have zero indent. + + - Apart from this one. :] +``` + +```ucm:hide +.> add +``` +```ucm +.> view doc4 +``` + +```unison +-- The special treatment of the first line does mean that the following +-- is pretty-printed not so prettily. To fix that we'd need to get the +-- lexer to help out with interpreting doc literal indentation (because +-- it knows what columns the `[:` was in.) +doc5 = [: - foo + - bar + and the rest. :] +``` + +```ucm:hide +.> add +``` +```ucm +.> view doc5 +``` +TODO fence +``` +-- You can do the following to avoid that problem. +doc6 = [: + - foo + - bar + and the rest. + :] +``` TODO - -### Reflowing paragraphs - +``` +.> add +``` TODO +``` +.> view doc6 +``` diff --git a/unison-src/transcripts/doc-formatting.output.md b/unison-src/transcripts/doc-formatting.output.md index 4fbf52f4f..add29d403 100644 --- a/unison-src/transcripts/doc-formatting.output.md +++ b/unison-src/transcripts/doc-formatting.output.md @@ -1,12 +1,11 @@ -This transcript verifies a few minor details about doc parsing and pretty-printing. +This transcript explains a few minor details about doc parsing and pretty-printing, both from a user point of view and with some implementation notes. (The ucm `add` commands and their output are hidden for brevity.) Docs can be used as inline code comments. ```unison foo : Nat -> Nat foo n = - [: do the - thing :] + [: do the thing :] n + 1 ``` @@ -25,19 +24,12 @@ foo n = ``` ```ucm -.> add - - ⍟ I've added these definitions: - - foo : Nat -> Nat - .> view foo foo : Nat -> Nat foo n = use Nat + - [: do the - thing :] + [: do the thing :] n + 1 ``` @@ -62,12 +54,6 @@ escaping = [: Docs look [: like \@this \:] :] ``` ```ucm -.> add - - ⍟ I've added these definitions: - - escaping : Doc - .> view escaping escaping : Doc @@ -76,40 +62,7 @@ escaping = [: Docs look [: like \@this \:] :] ``` (Alas you can't have `\@` or `\:]` in your doc, as there's currently no way to 'unescape' them.) -### Indenting - -Handling of indenting in docs between the parser and pretty-printer is a bit fiddly. - ```unison --- The leading and trailing spaces are stripped from the stored Doc, and one --- leading and trailing space is inserted again on view/edit --- by the pretty-printer. -doc1 = [: hi :] - --- The above treatment is only applied to the first line of a Doc. --- For storing subsequent lines, they are unindented until at least one of --- them hits the left margin. You don't notice this because the --- pretty-printer indents them again on view/edit. -doc2 = [: hello - - foo - - bar - and the rest. :] - --- That does mean that the following is pretty-printed not so prettily. --- To fix that we'd need to get the lexer to help out with interpreting --- doc literal indentation (because it knows what columns the `[:` was in.) -doc3 = [: - foo - - bar - and the rest. :] - --- You can do the following to avoid that problem. --- TODO this hashes the same as 3 -doc4 = [: - - foo - - bar - and the rest. - :] - -- Note that -- comments are preserved within doc literals. commented = [: example: @@ -128,27 +81,13 @@ commented = [: ⍟ These new definitions are ok to `add`: commented : Doc - doc1 : Doc - doc2 : Doc - doc3 : Doc - doc4 : Doc Now evaluating any watch expressions (lines starting with `>`)... Ctrl+C cancels. ``` ```ucm -.> add - - ⍟ I've added these definitions: - - commented : Doc - doc1 : Doc - doc2 : Doc - doc3 : Doc - doc4 : Doc - -.> view doc1 doc2 doc3 doc4 commented +.> view commented commented : Doc commented = @@ -156,28 +95,219 @@ commented = [: -- a comment f x = x + 1 :] + +``` +### Indenting, and paragraph reflow + +Handling of indenting in docs between the parser and pretty-printer is a bit fiddly. + +```unison +-- The leading and trailing spaces are stripped from the stored Doc by the +-- lexer, and one leading and trailing space is inserted again on view/edit +-- by the pretty-printer. +doc1 = [: hi :] +``` + +```ucm + + I found and typechecked these definitions in scratch.u. If you + do an `add` or `update`, here's how your codebase would + change: + ⍟ These new definitions are ok to `add`: + + doc1 : Doc + + Now evaluating any watch expressions (lines starting with + `>`)... Ctrl+C cancels. + +``` +```ucm +.> view doc1 + doc1 : Doc doc1 = [: hi :] + +``` +```unison +-- Lines (apart from the first line) are unindented until at least one of +-- them hits the left margin (by a post-processing step in the parser). +-- You may not notice this because the pretty-printer indents them again on +-- view/edit. +doc2 = [: hello + - foo + - bar + and the rest. :] +``` + +```ucm + + I found and typechecked these definitions in scratch.u. If you + do an `add` or `update`, here's how your codebase would + change: + ⍟ These new definitions are ok to `add`: + + doc2 : Doc + + Now evaluating any watch expressions (lines starting with + `>`)... Ctrl+C cancels. + +``` +```ucm +.> view doc2 + doc2 : Doc doc2 = [: hello - foo - bar and the rest. :] + +``` +```unison +doc3 = [: When Unison identifies a paragraph, it removes any newlines from it before storing it, and then reflows the paragraph text to fit the display window on display/view/edit. + +For these purposes, a paragraph is any sequence of non-empty lines that have zero indent (after the unindenting mentioned above.) + + - So this is not a paragraph, even + though you might want it to be. + + And this text | as a paragraph + is not treated | either. + +Note that because of the special treatment of the first line mentioned above, where its leading space is removed, it is always treated as a paragraph. + :] +``` + +```ucm + + I found and typechecked these definitions in scratch.u. If you + do an `add` or `update`, here's how your codebase would + change: + ⍟ These new definitions are ok to `add`: + + doc3 : Doc + + Now evaluating any watch expressions (lines starting with + `>`)... Ctrl+C cancels. + +``` +```ucm +.> view doc3 + doc3 : Doc doc3 = + [: + When Unison identifies a paragraph, it removes any newlines + from it before storing it, and then reflows the paragraph + text to fit the display window on display/view/edit. + + For these purposes, a paragraph is any sequence of non-empty + lines that have zero indent (after the unindenting mentioned + above.) + + - So this is not a paragraph, even + though you might want it to be. + + And this text | as a paragraph + is not treated | either. + + Note that because of the special treatment of the first line + mentioned above, where its leading space is removed, it is + always treated as a paragraph. + :] + +``` +```unison +doc4 = [: Here's another example of some paragraphs. + + All these lines have zero indent. + + - Apart from this one. :] +``` + +```ucm + + I found and typechecked these definitions in scratch.u. If you + do an `add` or `update`, here's how your codebase would + change: + + ⍟ These new definitions are ok to `add`: + + doc4 : Doc + + Now evaluating any watch expressions (lines starting with + `>`)... Ctrl+C cancels. + +``` +```ucm +.> view doc4 + + doc4 : Doc + doc4 = + [: Here's another example of some paragraphs. + + All these lines have zero indent. + + - Apart from this one. :] + +``` +```unison +-- The special treatment of the first line does mean that the following +-- is pretty-printed not so prettily. To fix that we'd need to get the +-- lexer to help out with interpreting doc literal indentation (because +-- it knows what columns the `[:` was in.) +doc5 = [: - foo + - bar + and the rest. :] +``` + +```ucm + + I found and typechecked these definitions in scratch.u. If you + do an `add` or `update`, here's how your codebase would + change: + + ⍟ These new definitions are ok to `add`: + + doc5 : Doc + + Now evaluating any watch expressions (lines starting with + `>`)... Ctrl+C cancels. + +``` +```ucm +.> view doc5 + + doc5 : Doc + doc5 = [: - foo - bar and the rest. :] ``` -If we edit `foo starting from the pretty-printed version above, and add it again, we end up with the same thing. +TODO fence +``` +-- You can do the following to avoid that problem. +doc6 = [: + - foo + - bar + and the rest. + :] + +``` TODO +``` +.> add -### Reflowing paragraphs +``` TODO +``` +.> view doc6 + +``` + diff --git a/unison-src/transcripts/docs.md b/unison-src/transcripts/docs.md index bf099ceb3..8bfc40d5d 100644 --- a/unison-src/transcripts/docs.md +++ b/unison-src/transcripts/docs.md @@ -1,3 +1,6 @@ + + + # Documenting Unison code Unison documentation is written in Unison. Documentation is a value of the following type: @@ -50,7 +53,7 @@ use .builtin docs.List.take = [: -`@List.take n xs` returns the first `n` elements of `xs`. (No need to add line breaks manually. The display command will do wrapping of text for you.) +`@List.take n xs` returns the first `n` elements of `xs`. (No need to add line breaks manually. The display command will do wrapping of text for you. Indent any lines where you don't want it to do this.) ## Examples: diff --git a/unison-src/transcripts/docs.output.md b/unison-src/transcripts/docs.output.md index 565327f67..cb76bd5fd 100644 --- a/unison-src/transcripts/docs.output.md +++ b/unison-src/transcripts/docs.output.md @@ -1,3 +1,6 @@ + + + # Documenting Unison code Unison documentation is written in Unison. Documentation is a value of the following type: @@ -92,7 +95,7 @@ use .builtin docs.List.take = [: -`@List.take n xs` returns the first `n` elements of `xs`. (No need to add line breaks manually. The display command will do wrapping of text for you.) +`@List.take n xs` returns the first `n` elements of `xs`. (No need to add line breaks manually. The display command will do wrapping of text for you. Indent any lines where you don't want it to do this.) ## Examples: @@ -149,7 +152,8 @@ Now that documentation is linked to the definition. We can view it if we like: `builtin.List.take n xs` returns the first `n` elements of `xs`. (No need to add line breaks manually. The display - command will do wrapping of text for you.) + command will do wrapping of text for you. Indent any lines + where you don't want it to do this.) ## Examples: @@ -170,7 +174,8 @@ Or there's also a convenient function, `docs`, which shows the `Doc` values that `builtin.List.take n xs` returns the first `n` elements of `xs`. (No need to add line breaks manually. The display - command will do wrapping of text for you.) + command will do wrapping of text for you. Indent any lines + where you don't want it to do this.) ## Examples: @@ -194,7 +199,8 @@ Note that if we view the source of the documentation, the various references are [: `@builtin.List.take n xs` returns the first `n` elements of `xs`. (No need to add line breaks manually. The display - command will do wrapping of text for you.) + command will do wrapping of text for you. Indent any lines + where you don't want it to do this.) ## Examples: