From 776af610646042c06acd527c0f0ccaf03f4c4f68 Mon Sep 17 00:00:00 2001
From: Chris Gibbs <atacratic@users.noreply.github.com>
Date: Mon, 16 Dec 2019 23:32:36 +0000
Subject: [PATCH] #936 pretty-print doc literals - spec out modified paragraph
 reflow/wrapping

Currently Unison takes doc literals like

  [:
  example usage:
   - note the call to bar
     - we pass it 1
  foo =
    bar 1
      baz
  :]

and renders them as

  [: example usage:
  - note the call to bar
  - we pass it 1
  foo =
  bar 1
  baz :]

due to the paragraph reflow/wrapping function.

Also, this function only works the first time you use it on a doc: if you `edit` a doc and mess with it, you won't get the wrapping refreshed the next time you view.  That's because the pretty-printing works by inserting newlines, which are treated as paragraph breaks by Unison the next time it sees the code.

I'm planning to fix this by
 - only doing paragraph reflow/wrapping for lines with zero indent (where zero is the left-most non-whitespace column of the doc)
 - normalizing docs during parse, including removing newlines that are within a paragraph.

See unison-src/transcripts/doc-formatting.md in this commit for more details of how this shakes out.  Not the .md.output because that's showing the current behavior without this change implemented.

Hopefully this is just a few more lines of code to implement, and then #936/#994 is good to merge.
---
 parser-typechecker/src/Unison/TermParser.hs   |  17 +-
 parser-typechecker/src/Unison/Util/Pretty.hs  |   8 +-
 unison-src/transcripts/doc-formatting.md      | 160 ++++++++---
 .../transcripts/doc-formatting.output.md      | 264 +++++++++++++-----
 unison-src/transcripts/docs.md                |   5 +-
 unison-src/transcripts/docs.output.md         |  14 +-
 6 files changed, 345 insertions(+), 123 deletions(-)

diff --git a/parser-typechecker/src/Unison/TermParser.hs b/parser-typechecker/src/Unison/TermParser.hs
index e477ca4c3..592104a53 100644
--- a/parser-typechecker/src/Unison/TermParser.hs
+++ b/parser-typechecker/src/Unison/TermParser.hs
@@ -303,7 +303,7 @@ docBlock = do
   segs <- many segment
   closeTok <- closeBlock
   let a = ann openTok <> ann closeTok
-  pure . docUnIndent $ Term.app a (Term.constructor a DD.docRef DD.docJoinId) (Term.seq a segs)
+  pure . docNormalize $ Term.app a (Term.constructor a DD.docRef DD.docJoinId) (Term.seq a segs)
   where
   segment = blob <|> linky
   blob = do
@@ -344,10 +344,17 @@ docBlock = do
 
 -- Operates on the text of the Blobs within a doc (as parsed by docBlock):
 -- - reduces the whitespace after all newlines so that at least one of the
---   non-initial lines has zero indent.
--- The pretty-printer will add indentation when rendering - we're undoing that here.
-docUnIndent :: (Ord v, Show v) => AnnotatedTerm v a -> AnnotatedTerm v a
-docUnIndent tm = case tm of
+--   non-initial lines has zero indent (important because the pretty-printer adds
+--   indenting every time the doc passes through it)
+-- TODO implement the following
+-- - remove trailing whitespace from each line
+-- - remove newlines between any sequence of non-empty zero-indent lines
+--   (i.e. undo manual line-breaking within paragraphs.)
+-- Should be understood in tandem with Util.Pretty.paragraphyText, which
+-- outputs doc text for display/edit/view.
+-- See also unison-src/transcripts/doc-formatting.md.
+docNormalize :: (Ord v, Show v) => AnnotatedTerm v a -> AnnotatedTerm v a
+docNormalize tm = case tm of
   -- This pattern is just `DD.DocJoin seqs`, but exploded in order to grab
   -- the annotations.  The aim is just to map `go` over it.
   a@(Term.App' c@(Term.Constructor' DD.DocRef DD.DocJoinId) s@(Term.Sequence' seqs))
diff --git a/parser-typechecker/src/Unison/Util/Pretty.hs b/parser-typechecker/src/Unison/Util/Pretty.hs
index 7c3ef0825..b949dc9ba 100644
--- a/parser-typechecker/src/Unison/Util/Pretty.hs
+++ b/parser-typechecker/src/Unison/Util/Pretty.hs
@@ -144,9 +144,15 @@ wrapImpl (p:ps) = wrap_ . Seq.fromList $
 wrapString :: (LL.ListLike s Char, IsString s) => String -> Pretty s
 wrapString s = wrap (lit $ fromString s)
 
--- 0. Preserve all leading and trailing whitespace 
+-- TODO - replace this spec...
+-- 0. Preserve all leading and trailing whitespace
 -- 1. Preserve all newlines
 -- 2. Wrap all text in between newlines
+-- TODO ... with this one:
+-- Wrap all text on lines that don't start with a space.
+-- Preserve other whitespace.
+-- Should be understood in tandem with TermParser.docNormalize.
+-- See also unison-src/transcripts/doc-formatting.md.
 paragraphyText :: (LL.ListLike s Char, IsString s) => Text -> Pretty s
 paragraphyText t = text start <> inner <> text end where
   inner = sep "\n" . fmap (wrap . text) . Text.splitOn "\n" $ t'
diff --git a/unison-src/transcripts/doc-formatting.md b/unison-src/transcripts/doc-formatting.md
index 474e42387..d73365e2c 100644
--- a/unison-src/transcripts/doc-formatting.md
+++ b/unison-src/transcripts/doc-formatting.md
@@ -1,17 +1,18 @@
-This transcript verifies a few minor details about doc parsing and pretty-printing.
+This transcript explains a few minor details about doc parsing and pretty-printing, both from a user point of view and with some implementation notes.  (The ucm `add` commands and their output are hidden for brevity.)
 
 Docs can be used as inline code comments.
 
 ```unison
 foo : Nat -> Nat
 foo n =
-  [: do the
-     thing :]
+  [: do the thing :]
   n + 1
 ```
 
-```ucm
+```ucm:hide
 .> add
+```
+```ucm
 .> view foo
 ```
 
@@ -21,48 +22,16 @@ Note that `@` and `:]` must be escaped within docs.
 escaping = [: Docs look [: like \@this \:] :]
 ```
 
-```ucm
+```ucm:hide
 .> add
+```
+```ucm
 .> view escaping
 ```
 
 (Alas you can't have `\@` or `\:]` in your doc, as there's currently no way to 'unescape' them.)
 
-### Indenting
-
-Handling of indenting in docs between the parser and pretty-printer is a bit fiddly.
-
 ```unison
-
--- The leading and trailing spaces are stripped from the stored Doc, and one
--- leading and trailing space is inserted again on view/edit
--- by the pretty-printer.
-doc1 = [:   hi   :]
-
--- The above treatment is only applied to the first line of a Doc.
--- For storing subsequent lines, they are unindented until at least one of
--- them hits the left margin.  You don't notice this because the
--- pretty-printer indents them again on view/edit.
-doc2 = [: hello
-            - foo
-            - bar
-          and the rest. :]
-
--- That does mean that the following is pretty-printed not so prettily.
--- To fix that we'd need to get the lexer to help out with interpreting
--- doc literal indentation (because it knows what columns the `[:` was in.)
-doc3 = [:   - foo
-            - bar
-          and the rest. :]
-
--- You can do the following to avoid that problem.
--- TODO this hashes the same as 3
-doc4 = [:
-            - foo
-            - bar
-          and the rest.
-       :]
-
 -- Note that -- comments are preserved within doc literals.
 commented = [:
   example:
@@ -72,15 +41,116 @@ commented = [:
 :]
 ```
 
-```ucm
+```ucm:hide
 .> add
-.> view doc1 doc2 doc3 doc4 commented
+```
+```ucm
+.> view commented
 ```
 
-If we edit `foo` starting from the pretty-printed version above, and add it again, we end up with the same thing.
+### Indenting, and paragraph reflow
 
+Handling of indenting in docs between the parser and pretty-printer is a bit fiddly.
+
+```unison
+-- The leading and trailing spaces are stripped from the stored Doc by the
+-- lexer, and one leading and trailing space is inserted again on view/edit
+-- by the pretty-printer.
+doc1 = [:   hi   :]
+```
+
+```ucm:hide
+.> add
+```
+```ucm
+.> view doc1
+```
+
+```unison
+-- Lines (apart from the first line) are unindented until at least one of
+-- them hits the left margin (by a post-processing step in the parser).
+-- You may not notice this because the pretty-printer indents them again on
+-- view/edit.
+doc2 = [: hello
+            - foo
+            - bar
+          and the rest. :]
+```
+
+```ucm:hide
+.> add
+```
+```ucm
+.> view doc2
+```
+
+```unison
+doc3 = [: When Unison identifies a paragraph, it removes any newlines from it before storing it, and then reflows the paragraph text to fit the display window on display/view/edit.
+
+For these purposes, a paragraph is any sequence of non-empty lines that have zero indent (after the unindenting mentioned above.)
+
+ - So this is not a paragraph, even
+   though you might want it to be.
+
+   And this text  | as a paragraph
+   is not treated | either.
+
+Note that because of the special treatment of the first line mentioned above, where its leading space is removed, it is always treated as a paragraph.
+   :]
+```
+
+```ucm:hide
+.> add
+```
+```ucm
+.> view doc3
+```
+
+```unison
+doc4 = [: Here's another example of some paragraphs.
+
+          All these lines have zero indent.
+
+            - Apart from this one. :]
+```
+
+```ucm:hide
+.> add
+```
+```ucm
+.> view doc4
+```
+
+```unison
+-- The special treatment of the first line does mean that the following
+-- is pretty-printed not so prettily.  To fix that we'd need to get the
+-- lexer to help out with interpreting doc literal indentation (because
+-- it knows what columns the `[:` was in.)
+doc5 = [:   - foo
+            - bar
+          and the rest. :]
+```
+
+```ucm:hide
+.> add
+```
+```ucm
+.> view doc5
+```
+TODO fence
+```
+-- You can do the following to avoid that problem.
+doc6 = [:
+            - foo
+            - bar
+          and the rest.
+       :]
+```
 TODO
-
-### Reflowing paragraphs
-
+```
+.> add
+```
 TODO
+```
+.> view doc6
+```
diff --git a/unison-src/transcripts/doc-formatting.output.md b/unison-src/transcripts/doc-formatting.output.md
index 4fbf52f4f..add29d403 100644
--- a/unison-src/transcripts/doc-formatting.output.md
+++ b/unison-src/transcripts/doc-formatting.output.md
@@ -1,12 +1,11 @@
-This transcript verifies a few minor details about doc parsing and pretty-printing.
+This transcript explains a few minor details about doc parsing and pretty-printing, both from a user point of view and with some implementation notes.  (The ucm `add` commands and their output are hidden for brevity.)
 
 Docs can be used as inline code comments.
 
 ```unison
 foo : Nat -> Nat
 foo n =
-  [: do the
-     thing :]
+  [: do the thing :]
   n + 1
 ```
 
@@ -25,19 +24,12 @@ foo n =
 
 ```
 ```ucm
-.> add
-
-  ⍟ I've added these definitions:
-  
-    foo : Nat -> Nat
-
 .> view foo
 
   foo : Nat -> Nat
   foo n =
     use Nat +
-    [: do the
-    thing :]
+    [: do the thing :]
     n + 1
 
 ```
@@ -62,12 +54,6 @@ escaping = [: Docs look [: like \@this \:] :]
 
 ```
 ```ucm
-.> add
-
-  ⍟ I've added these definitions:
-  
-    escaping : Doc
-
 .> view escaping
 
   escaping : Doc
@@ -76,40 +62,7 @@ escaping = [: Docs look [: like \@this \:] :]
 ```
 (Alas you can't have `\@` or `\:]` in your doc, as there's currently no way to 'unescape' them.)
 
-### Indenting
-
-Handling of indenting in docs between the parser and pretty-printer is a bit fiddly.
-
 ```unison
--- The leading and trailing spaces are stripped from the stored Doc, and one
--- leading and trailing space is inserted again on view/edit
--- by the pretty-printer.
-doc1 = [:   hi   :]
-
--- The above treatment is only applied to the first line of a Doc.
--- For storing subsequent lines, they are unindented until at least one of
--- them hits the left margin.  You don't notice this because the
--- pretty-printer indents them again on view/edit.
-doc2 = [: hello
-            - foo
-            - bar
-          and the rest. :]
-
--- That does mean that the following is pretty-printed not so prettily.
--- To fix that we'd need to get the lexer to help out with interpreting
--- doc literal indentation (because it knows what columns the `[:` was in.)
-doc3 = [:   - foo
-            - bar
-          and the rest. :]
-
--- You can do the following to avoid that problem.
--- TODO this hashes the same as 3
-doc4 = [:
-            - foo
-            - bar
-          and the rest.
-       :]
-
 -- Note that -- comments are preserved within doc literals.
 commented = [:
   example:
@@ -128,27 +81,13 @@ commented = [:
     ⍟ These new definitions are ok to `add`:
     
       commented : Doc
-      doc1      : Doc
-      doc2      : Doc
-      doc3      : Doc
-      doc4      : Doc
    
   Now evaluating any watch expressions (lines starting with
   `>`)... Ctrl+C cancels.
 
 ```
 ```ucm
-.> add
-
-  ⍟ I've added these definitions:
-  
-    commented : Doc
-    doc1      : Doc
-    doc2      : Doc
-    doc3      : Doc
-    doc4      : Doc
-
-.> view doc1 doc2 doc3 doc4 commented
+.> view commented
 
   commented : Doc
   commented =
@@ -156,28 +95,219 @@ commented = [:
     
     -- a comment
     f x = x + 1 :]
+
+```
+### Indenting, and paragraph reflow
+
+Handling of indenting in docs between the parser and pretty-printer is a bit fiddly.
+
+```unison
+-- The leading and trailing spaces are stripped from the stored Doc by the
+-- lexer, and one leading and trailing space is inserted again on view/edit
+-- by the pretty-printer.
+doc1 = [:   hi   :]
+```
+
+```ucm
+
+  I found and typechecked these definitions in scratch.u. If you
+  do an `add` or `update`, here's how your codebase would
+  change:
   
+    ⍟ These new definitions are ok to `add`:
+    
+      doc1 : Doc
+   
+  Now evaluating any watch expressions (lines starting with
+  `>`)... Ctrl+C cancels.
+
+```
+```ucm
+.> view doc1
+
   doc1 : Doc
   doc1 = [: hi :]
+
+```
+```unison
+-- Lines (apart from the first line) are unindented until at least one of
+-- them hits the left margin (by a post-processing step in the parser).
+-- You may not notice this because the pretty-printer indents them again on
+-- view/edit.
+doc2 = [: hello
+            - foo
+            - bar
+          and the rest. :]
+```
+
+```ucm
+
+  I found and typechecked these definitions in scratch.u. If you
+  do an `add` or `update`, here's how your codebase would
+  change:
   
+    ⍟ These new definitions are ok to `add`:
+    
+      doc2 : Doc
+   
+  Now evaluating any watch expressions (lines starting with
+  `>`)... Ctrl+C cancels.
+
+```
+```ucm
+.> view doc2
+
   doc2 : Doc
   doc2 =
     [: hello
     - foo
     - bar
     and the rest. :]
+
+```
+```unison
+doc3 = [: When Unison identifies a paragraph, it removes any newlines from it before storing it, and then reflows the paragraph text to fit the display window on display/view/edit.
+
+For these purposes, a paragraph is any sequence of non-empty lines that have zero indent (after the unindenting mentioned above.)
+
+ - So this is not a paragraph, even
+   though you might want it to be.
+
+   And this text  | as a paragraph
+   is not treated | either.
+
+Note that because of the special treatment of the first line mentioned above, where its leading space is removed, it is always treated as a paragraph.
+   :]
+```
+
+```ucm
+
+  I found and typechecked these definitions in scratch.u. If you
+  do an `add` or `update`, here's how your codebase would
+  change:
   
+    ⍟ These new definitions are ok to `add`:
+    
+      doc3 : Doc
+   
+  Now evaluating any watch expressions (lines starting with
+  `>`)... Ctrl+C cancels.
+
+```
+```ucm
+.> view doc3
+
   doc3 : Doc
   doc3 =
+    [:
+    When Unison identifies a paragraph, it removes any newlines
+    from it before storing it, and then reflows the paragraph
+    text to fit the display window on display/view/edit.
+    
+    For these purposes, a paragraph is any sequence of non-empty
+    lines that have zero indent (after the unindenting mentioned
+    above.)
+    
+    - So this is not a paragraph, even
+    though you might want it to be.
+    
+    And this text | as a paragraph
+    is not treated | either.
+    
+    Note that because of the special treatment of the first line
+    mentioned above, where its leading space is removed, it is
+    always treated as a paragraph.
+    :]
+
+```
+```unison
+doc4 = [: Here's another example of some paragraphs.
+
+          All these lines have zero indent.
+
+            - Apart from this one. :]
+```
+
+```ucm
+
+  I found and typechecked these definitions in scratch.u. If you
+  do an `add` or `update`, here's how your codebase would
+  change:
+  
+    ⍟ These new definitions are ok to `add`:
+    
+      doc4 : Doc
+   
+  Now evaluating any watch expressions (lines starting with
+  `>`)... Ctrl+C cancels.
+
+```
+```ucm
+.> view doc4
+
+  doc4 : Doc
+  doc4 =
+    [: Here's another example of some paragraphs.
+    
+    All these lines have zero indent.
+    
+    - Apart from this one. :]
+
+```
+```unison
+-- The special treatment of the first line does mean that the following
+-- is pretty-printed not so prettily.  To fix that we'd need to get the
+-- lexer to help out with interpreting doc literal indentation (because
+-- it knows what columns the `[:` was in.)
+doc5 = [:   - foo
+            - bar
+          and the rest. :]
+```
+
+```ucm
+
+  I found and typechecked these definitions in scratch.u. If you
+  do an `add` or `update`, here's how your codebase would
+  change:
+  
+    ⍟ These new definitions are ok to `add`:
+    
+      doc5 : Doc
+   
+  Now evaluating any watch expressions (lines starting with
+  `>`)... Ctrl+C cancels.
+
+```
+```ucm
+.> view doc5
+
+  doc5 : Doc
+  doc5 =
     [: - foo
     - bar
     and the rest. :]
 
 ```
-If we edit `foo starting from the pretty-printed version above, and add it again, we end up with the same thing.
+TODO fence
+```
+-- You can do the following to avoid that problem.
+doc6 = [:
+            - foo
+            - bar
+          and the rest.
+       :]
+
+```
 
 TODO
+```
+.> add
 
-### Reflowing paragraphs
+```
 
 TODO
+```
+.> view doc6
+
+```
+
diff --git a/unison-src/transcripts/docs.md b/unison-src/transcripts/docs.md
index bf099ceb3..8bfc40d5d 100644
--- a/unison-src/transcripts/docs.md
+++ b/unison-src/transcripts/docs.md
@@ -1,3 +1,6 @@
+
+
+
 # Documenting Unison code
 
 Unison documentation is written in Unison. Documentation is a value of the following type:
@@ -50,7 +53,7 @@ use .builtin
 
 docs.List.take = [:
 
-`@List.take n xs` returns the first `n` elements of `xs`. (No need to add line breaks manually. The display command will do wrapping of text for you.)
+`@List.take n xs` returns the first `n` elements of `xs`. (No need to add line breaks manually. The display command will do wrapping of text for you.  Indent any lines where you don't want it to do this.)
 
 ## Examples:
 
diff --git a/unison-src/transcripts/docs.output.md b/unison-src/transcripts/docs.output.md
index 565327f67..cb76bd5fd 100644
--- a/unison-src/transcripts/docs.output.md
+++ b/unison-src/transcripts/docs.output.md
@@ -1,3 +1,6 @@
+
+
+
 # Documenting Unison code
 
 Unison documentation is written in Unison. Documentation is a value of the following type:
@@ -92,7 +95,7 @@ use .builtin
 
 docs.List.take = [:
 
-`@List.take n xs` returns the first `n` elements of `xs`. (No need to add line breaks manually. The display command will do wrapping of text for you.)
+`@List.take n xs` returns the first `n` elements of `xs`. (No need to add line breaks manually. The display command will do wrapping of text for you.  Indent any lines where you don't want it to do this.)
 
 ## Examples:
 
@@ -149,7 +152,8 @@ Now that documentation is linked to the definition. We can view it if we like:
 
   `builtin.List.take n xs` returns the first `n` elements of
   `xs`. (No need to add line breaks manually. The display
-  command will do wrapping of text for you.)
+  command will do wrapping of text for you. Indent any lines
+  where you don't want it to do this.)
   
   ## Examples:
   
@@ -170,7 +174,8 @@ Or there's also a convenient function, `docs`, which shows the `Doc` values that
 
   `builtin.List.take n xs` returns the first `n` elements of
   `xs`. (No need to add line breaks manually. The display
-  command will do wrapping of text for you.)
+  command will do wrapping of text for you. Indent any lines
+  where you don't want it to do this.)
   
   ## Examples:
   
@@ -194,7 +199,8 @@ Note that if we view the source of the documentation, the various references are
     [:
     `@builtin.List.take n xs` returns the first `n` elements of
     `xs`. (No need to add line breaks manually. The display
-    command will do wrapping of text for you.)
+    command will do wrapping of text for you. Indent any lines
+    where you don't want it to do this.)
     
     ## Examples: