#936 pretty-print doc literals - spec out modified paragraph reflow/wrapping

Currently Unison takes doc literals like [: example usage: - note the call to bar - we pass it 1 foo = bar 1 baz :] and renders them as [: example usage: - note the call to bar - we pass it 1 foo = bar 1 baz :] due to the paragraph reflow/wrapping function. Also, this function only works the first time you use it on a doc: if you `edit` a doc and mess with it, you won't get the wrapping refreshed the next time you view. That's because the pretty-printing works by inserting newlines, which are treated as paragraph breaks by Unison the next time it sees the code. I'm planning to fix this by - only doing paragraph reflow/wrapping for lines with zero indent (where zero is the left-most non-whitespace column of the doc) - normalizing docs during parse, including removing newlines that are within a paragraph. See unison-src/transcripts/doc-formatting.md in this commit for more details of how this shakes out. Not the .md.output because that's showing the current behavior without this change implemented. Hopefully this is just a few more lines of code to implement, and then #936/#994 is good to merge.
2024-09-17 13:27:30 +03:00 · 2019-12-16 23:32:36 +00:00 · 2019-12-16 23:32:36 +00:00 · 776af61064
commit 776af61064
parent 8f284d902c
6 changed files with 345 additions and 123 deletions
--- a/parser-typechecker/src/Unison/TermParser.hs
+++ b/parser-typechecker/src/Unison/TermParser.hs
@ -303,7 +303,7 @@ docBlock = do
  segs <- many segment
  closeTok <- closeBlock
  let a = ann openTok <> ann closeTok
-  pure . docUnIndent $ Term.app a (Term.constructor a DD.docRef DD.docJoinId) (Term.seq a segs)
+  pure . docNormalize $ Term.app a (Term.constructor a DD.docRef DD.docJoinId) (Term.seq a segs)
  where
  segment = blob <|> linky
  blob = do
@ -344,10 +344,17 @@ docBlock = do

 -- Operates on the text of the Blobs within a doc (as parsed by docBlock):
 -- - reduces the whitespace after all newlines so that at least one of the
--   non-initial lines has zero indent.
-- The pretty-printer will add indentation when rendering - we're undoing that here.
-docUnIndent :: (Ord v, Show v) => AnnotatedTerm v a -> AnnotatedTerm v a
-docUnIndent tm = case tm of
+--   non-initial lines has zero indent (important because the pretty-printer adds
+--   indenting every time the doc passes through it)
+-- TODO implement the following
+-- - remove trailing whitespace from each line
+-- - remove newlines between any sequence of non-empty zero-indent lines
+--   (i.e. undo manual line-breaking within paragraphs.)
+-- Should be understood in tandem with Util.Pretty.paragraphyText, which
+-- outputs doc text for display/edit/view.
+-- See also unison-src/transcripts/doc-formatting.md.
+docNormalize :: (Ord v, Show v) => AnnotatedTerm v a -> AnnotatedTerm v a
+docNormalize tm = case tm of
  -- This pattern is just `DD.DocJoin seqs`, but exploded in order to grab
  -- the annotations.  The aim is just to map `go` over it.
  a@(Term.App' c@(Term.Constructor' DD.DocRef DD.DocJoinId) s@(Term.Sequence' seqs))
--- a/parser-typechecker/src/Unison/Util/Pretty.hs
+++ b/parser-typechecker/src/Unison/Util/Pretty.hs
@ -144,9 +144,15 @@ wrapImpl (p:ps) = wrap_ . Seq.fromList $
 wrapString :: (LL.ListLike s Char, IsString s) => String -> Pretty s
 wrapString s = wrap (lit $ fromString s)

-- 0. Preserve all leading and trailing whitespace 
+-- TODO - replace this spec...
+-- 0. Preserve all leading and trailing whitespace
 -- 1. Preserve all newlines
 -- 2. Wrap all text in between newlines
+-- TODO ... with this one:
+-- Wrap all text on lines that don't start with a space.
+-- Preserve other whitespace.
+-- Should be understood in tandem with TermParser.docNormalize.
+-- See also unison-src/transcripts/doc-formatting.md.
 paragraphyText :: (LL.ListLike s Char, IsString s) => Text -> Pretty s
 paragraphyText t = text start <> inner <> text end where
  inner = sep "\n" . fmap (wrap . text) . Text.splitOn "\n" $ t'
--- a/unison-src/transcripts/doc-formatting.md
+++ b/unison-src/transcripts/doc-formatting.md
@ -1,17 +1,18 @@
-This transcript verifies a few minor details about doc parsing and pretty-printing.
+This transcript explains a few minor details about doc parsing and pretty-printing, both from a user point of view and with some implementation notes.  (The ucm `add` commands and their output are hidden for brevity.)

 Docs can be used as inline code comments.

 ```unison
 foo : Nat -> Nat
 foo n =
-  [: do the
-     thing :]
+  [: do the thing :]
  n + 1
 ```

-```ucm
+```ucm:hide
 .> add
+```
+```ucm
 .> view foo
 ```

@ -21,48 +22,16 @@ Note that `@` and `:]` must be escaped within docs.
 escaping = [: Docs look [: like \@this \:] :]
 ```

-```ucm
+```ucm:hide
 .> add
+```
+```ucm
 .> view escaping
 ```

 (Alas you can't have `\@` or `\:]` in your doc, as there's currently no way to 'unescape' them.)

-### Indenting
-
-Handling of indenting in docs between the parser and pretty-printer is a bit fiddly.
-
 ```unison
-
-- The leading and trailing spaces are stripped from the stored Doc, and one
-- leading and trailing space is inserted again on view/edit
-- by the pretty-printer.
-doc1 = [:   hi   :]
-
-- The above treatment is only applied to the first line of a Doc.
-- For storing subsequent lines, they are unindented until at least one of
-- them hits the left margin.  You don't notice this because the
-- pretty-printer indents them again on view/edit.
-doc2 = [: hello
-            - foo
-            - bar
-          and the rest. :]
-
-- That does mean that the following is pretty-printed not so prettily.
-- To fix that we'd need to get the lexer to help out with interpreting
-- doc literal indentation (because it knows what columns the `[:` was in.)
-doc3 = [:   - foo
-            - bar
-          and the rest. :]
-
-- You can do the following to avoid that problem.
-- TODO this hashes the same as 3
-doc4 = [:
-            - foo
-            - bar
-          and the rest.
-       :]
-
 -- Note that -- comments are preserved within doc literals.
 commented = [:
  example:
@ -72,15 +41,116 @@ commented = [:
 :]
 ```

-```ucm
+```ucm:hide
 .> add
-.> view doc1 doc2 doc3 doc4 commented
+```
+```ucm
+.> view commented
 ```

-If we edit `foo` starting from the pretty-printed version above, and add it again, we end up with the same thing.
+### Indenting, and paragraph reflow

+Handling of indenting in docs between the parser and pretty-printer is a bit fiddly.
+
+```unison
+-- The leading and trailing spaces are stripped from the stored Doc by the
+-- lexer, and one leading and trailing space is inserted again on view/edit
+-- by the pretty-printer.
+doc1 = [:   hi   :]
+```
+
+```ucm:hide
+.> add
+```
+```ucm
+.> view doc1
+```
+
+```unison
+-- Lines (apart from the first line) are unindented until at least one of
+-- them hits the left margin (by a post-processing step in the parser).
+-- You may not notice this because the pretty-printer indents them again on
+-- view/edit.
+doc2 = [: hello
+            - foo
+            - bar
+          and the rest. :]
+```
+
+```ucm:hide
+.> add
+```
+```ucm
+.> view doc2
+```
+
+```unison
+doc3 = [: When Unison identifies a paragraph, it removes any newlines from it before storing it, and then reflows the paragraph text to fit the display window on display/view/edit.
+
+For these purposes, a paragraph is any sequence of non-empty lines that have zero indent (after the unindenting mentioned above.)
+
+ - So this is not a paragraph, even
+   though you might want it to be.
+
+   And this text  | as a paragraph
+   is not treated | either.
+
+Note that because of the special treatment of the first line mentioned above, where its leading space is removed, it is always treated as a paragraph.
+   :]
+```
+
+```ucm:hide
+.> add
+```
+```ucm
+.> view doc3
+```
+
+```unison
+doc4 = [: Here's another example of some paragraphs.
+
+          All these lines have zero indent.
+
+            - Apart from this one. :]
+```
+
+```ucm:hide
+.> add
+```
+```ucm
+.> view doc4
+```
+
+```unison
+-- The special treatment of the first line does mean that the following
+-- is pretty-printed not so prettily.  To fix that we'd need to get the
+-- lexer to help out with interpreting doc literal indentation (because
+-- it knows what columns the `[:` was in.)
+doc5 = [:   - foo
+            - bar
+          and the rest. :]
+```
+
+```ucm:hide
+.> add
+```
+```ucm
+.> view doc5
+```
+TODO fence
+```
+-- You can do the following to avoid that problem.
+doc6 = [:
+            - foo
+            - bar
+          and the rest.
+       :]
+```
 TODO
-
-### Reflowing paragraphs
-
+```
+.> add
+```
 TODO
+```
+.> view doc6
+```
--- a/unison-src/transcripts/doc-formatting.output.md
+++ b/unison-src/transcripts/doc-formatting.output.md
@ -1,12 +1,11 @@
-This transcript verifies a few minor details about doc parsing and pretty-printing.
+This transcript explains a few minor details about doc parsing and pretty-printing, both from a user point of view and with some implementation notes.  (The ucm `add` commands and their output are hidden for brevity.)

 Docs can be used as inline code comments.

 ```unison
 foo : Nat -> Nat
 foo n =
-  [: do the
-     thing :]
+  [: do the thing :]
  n + 1
 ```

@ -25,19 +24,12 @@ foo n =

 ```
 ```ucm
-.> add
-
-  ⍟ I've added these definitions:
-  
-    foo : Nat -> Nat
-
 .> view foo

  foo : Nat -> Nat
  foo n =
    use Nat +
-    [: do the
-    thing :]
+    [: do the thing :]
    n + 1

 ```
@ -62,12 +54,6 @@ escaping = [: Docs look [: like \@this \:] :]

 ```
 ```ucm
-.> add
-
-  ⍟ I've added these definitions:
-  
-    escaping : Doc
-
 .> view escaping

  escaping : Doc
@ -76,40 +62,7 @@ escaping = [: Docs look [: like \@this \:] :]
 ```
 (Alas you can't have `\@` or `\:]` in your doc, as there's currently no way to 'unescape' them.)

-### Indenting
-
-Handling of indenting in docs between the parser and pretty-printer is a bit fiddly.
-
 ```unison
-- The leading and trailing spaces are stripped from the stored Doc, and one
-- leading and trailing space is inserted again on view/edit
-- by the pretty-printer.
-doc1 = [:   hi   :]
-
-- The above treatment is only applied to the first line of a Doc.
-- For storing subsequent lines, they are unindented until at least one of
-- them hits the left margin.  You don't notice this because the
-- pretty-printer indents them again on view/edit.
-doc2 = [: hello
-            - foo
-            - bar
-          and the rest. :]
-
-- That does mean that the following is pretty-printed not so prettily.
-- To fix that we'd need to get the lexer to help out with interpreting
-- doc literal indentation (because it knows what columns the `[:` was in.)
-doc3 = [:   - foo
-            - bar
-          and the rest. :]
-
-- You can do the following to avoid that problem.
-- TODO this hashes the same as 3
-doc4 = [:
-            - foo
-            - bar
-          and the rest.
-       :]
-
 -- Note that -- comments are preserved within doc literals.
 commented = [:
  example:
@ -128,27 +81,13 @@ commented = [:
    ⍟ These new definitions are ok to `add`:
    
      commented : Doc
-      doc1      : Doc
-      doc2      : Doc
-      doc3      : Doc
-      doc4      : Doc
   
  Now evaluating any watch expressions (lines starting with
  `>`)... Ctrl+C cancels.

 ```
 ```ucm
-.> add
-
-  ⍟ I've added these definitions:
-  
-    commented : Doc
-    doc1      : Doc
-    doc2      : Doc
-    doc3      : Doc
-    doc4      : Doc
-
-.> view doc1 doc2 doc3 doc4 commented
+.> view commented

  commented : Doc
  commented =
@ -156,28 +95,219 @@ commented = [:
    
    -- a comment
    f x = x + 1 :]
+
+```
+### Indenting, and paragraph reflow
+
+Handling of indenting in docs between the parser and pretty-printer is a bit fiddly.
+
+```unison
+-- The leading and trailing spaces are stripped from the stored Doc by the
+-- lexer, and one leading and trailing space is inserted again on view/edit
+-- by the pretty-printer.
+doc1 = [:   hi   :]
+```
+
+```ucm
+
+  I found and typechecked these definitions in scratch.u. If you
+  do an `add` or `update`, here's how your codebase would
+  change:
  
+    ⍟ These new definitions are ok to `add`:
+    
+      doc1 : Doc
+   
+  Now evaluating any watch expressions (lines starting with
+  `>`)... Ctrl+C cancels.
+
+```
+```ucm
+.> view doc1
+
  doc1 : Doc
  doc1 = [: hi :]
+
+```
+```unison
+-- Lines (apart from the first line) are unindented until at least one of
+-- them hits the left margin (by a post-processing step in the parser).
+-- You may not notice this because the pretty-printer indents them again on
+-- view/edit.
+doc2 = [: hello
+            - foo
+            - bar
+          and the rest. :]
+```
+
+```ucm
+
+  I found and typechecked these definitions in scratch.u. If you
+  do an `add` or `update`, here's how your codebase would
+  change:
  
+    ⍟ These new definitions are ok to `add`:
+    
+      doc2 : Doc
+   
+  Now evaluating any watch expressions (lines starting with
+  `>`)... Ctrl+C cancels.
+
+```
+```ucm
+.> view doc2
+
  doc2 : Doc
  doc2 =
    [: hello
    - foo
    - bar
    and the rest. :]
+
+```
+```unison
+doc3 = [: When Unison identifies a paragraph, it removes any newlines from it before storing it, and then reflows the paragraph text to fit the display window on display/view/edit.
+
+For these purposes, a paragraph is any sequence of non-empty lines that have zero indent (after the unindenting mentioned above.)
+
+ - So this is not a paragraph, even
+   though you might want it to be.
+
+   And this text  | as a paragraph
+   is not treated | either.
+
+Note that because of the special treatment of the first line mentioned above, where its leading space is removed, it is always treated as a paragraph.
+   :]
+```
+
+```ucm
+
+  I found and typechecked these definitions in scratch.u. If you
+  do an `add` or `update`, here's how your codebase would
+  change:
  
+    ⍟ These new definitions are ok to `add`:
+    
+      doc3 : Doc
+   
+  Now evaluating any watch expressions (lines starting with
+  `>`)... Ctrl+C cancels.
+
+```
+```ucm
+.> view doc3
+
  doc3 : Doc
  doc3 =
+    [:
+    When Unison identifies a paragraph, it removes any newlines
+    from it before storing it, and then reflows the paragraph
+    text to fit the display window on display/view/edit.
+    
+    For these purposes, a paragraph is any sequence of non-empty
+    lines that have zero indent (after the unindenting mentioned
+    above.)
+    
+    - So this is not a paragraph, even
+    though you might want it to be.
+    
+    And this text | as a paragraph
+    is not treated | either.
+    
+    Note that because of the special treatment of the first line
+    mentioned above, where its leading space is removed, it is
+    always treated as a paragraph.
+    :]
+
+```
+```unison
+doc4 = [: Here's another example of some paragraphs.
+
+          All these lines have zero indent.
+
+            - Apart from this one. :]
+```
+
+```ucm
+
+  I found and typechecked these definitions in scratch.u. If you
+  do an `add` or `update`, here's how your codebase would
+  change:
+  
+    ⍟ These new definitions are ok to `add`:
+    
+      doc4 : Doc
+   
+  Now evaluating any watch expressions (lines starting with
+  `>`)... Ctrl+C cancels.
+
+```
+```ucm
+.> view doc4
+
+  doc4 : Doc
+  doc4 =
+    [: Here's another example of some paragraphs.
+    
+    All these lines have zero indent.
+    
+    - Apart from this one. :]
+
+```
+```unison
+-- The special treatment of the first line does mean that the following
+-- is pretty-printed not so prettily.  To fix that we'd need to get the
+-- lexer to help out with interpreting doc literal indentation (because
+-- it knows what columns the `[:` was in.)
+doc5 = [:   - foo
+            - bar
+          and the rest. :]
+```
+
+```ucm
+
+  I found and typechecked these definitions in scratch.u. If you
+  do an `add` or `update`, here's how your codebase would
+  change:
+  
+    ⍟ These new definitions are ok to `add`:
+    
+      doc5 : Doc
+   
+  Now evaluating any watch expressions (lines starting with
+  `>`)... Ctrl+C cancels.
+
+```
+```ucm
+.> view doc5
+
+  doc5 : Doc
+  doc5 =
    [: - foo
    - bar
    and the rest. :]

 ```
-If we edit `foo starting from the pretty-printed version above, and add it again, we end up with the same thing.
+TODO fence
+```
+-- You can do the following to avoid that problem.
+doc6 = [:
+            - foo
+            - bar
+          and the rest.
+       :]
+
+```

 TODO
+```
+.> add

-### Reflowing paragraphs
+```

 TODO
+```
+.> view doc6
+
+```
+
--- a/unison-src/transcripts/docs.md
+++ b/unison-src/transcripts/docs.md
@ -1,3 +1,6 @@
+
+
+
 # Documenting Unison code

 Unison documentation is written in Unison. Documentation is a value of the following type:
@ -50,7 +53,7 @@ use .builtin

 docs.List.take = [:

-`@List.take n xs` returns the first `n` elements of `xs`. (No need to add line breaks manually. The display command will do wrapping of text for you.)
+`@List.take n xs` returns the first `n` elements of `xs`. (No need to add line breaks manually. The display command will do wrapping of text for you.  Indent any lines where you don't want it to do this.)

 ## Examples:

--- a/unison-src/transcripts/docs.output.md
+++ b/unison-src/transcripts/docs.output.md
@ -1,3 +1,6 @@
+
+
+
 # Documenting Unison code

 Unison documentation is written in Unison. Documentation is a value of the following type:
@ -92,7 +95,7 @@ use .builtin

 docs.List.take = [:

-`@List.take n xs` returns the first `n` elements of `xs`. (No need to add line breaks manually. The display command will do wrapping of text for you.)
+`@List.take n xs` returns the first `n` elements of `xs`. (No need to add line breaks manually. The display command will do wrapping of text for you.  Indent any lines where you don't want it to do this.)

 ## Examples:

@ -149,7 +152,8 @@ Now that documentation is linked to the definition. We can view it if we like:

  `builtin.List.take n xs` returns the first `n` elements of
  `xs`. (No need to add line breaks manually. The display
-  command will do wrapping of text for you.)
+  command will do wrapping of text for you. Indent any lines
+  where you don't want it to do this.)
  
  ## Examples:
  
@ -170,7 +174,8 @@ Or there's also a convenient function, `docs`, which shows the `Doc` values that

  `builtin.List.take n xs` returns the first `n` elements of
  `xs`. (No need to add line breaks manually. The display
-  command will do wrapping of text for you.)
+  command will do wrapping of text for you. Indent any lines
+  where you don't want it to do this.)
  
  ## Examples:
  
@ -194,7 +199,8 @@ Note that if we view the source of the documentation, the various references are
    [:
    `@builtin.List.take n xs` returns the first `n` elements of
    `xs`. (No need to add line breaks manually. The display
-    command will do wrapping of text for you.)
+    command will do wrapping of text for you. Indent any lines
+    where you don't want it to do this.)
    
    ## Examples: