Haskell strings can have "gaps": any amount of whitespace between two
backslashes is ignored. This allows writing multi-line strings. For
example, all of the strings below are the same:
```
"foobar"
"foo\ \bar"
"foo\
\bar"
```
When parsing a string literal, the lexer usually produces two fields: one
is the actual string the user wrote, stored as a 'SourceText'; the other
is the sanitized version, with gaps and other special characters removed.
When printing the string, GHC's Outputable instance uses the 'SourceText'
field. However, since that text contains the gaps as-is, we cannot change
the original indentation. To fix this, this commit splits the strings at
the gaps and prints each line separately, applying the layout rules.
It also applies the same logic to type-level strings.
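As a rough illustration of the splitting step (not the actual code in this
commit), the sketch below splits the body of a string literal at its gaps.
The name `splitGaps` is hypothetical, and the input is assumed to be the
source text without the surrounding quotes.

```haskell
import Data.Char (isSpace)

-- | Split the body of a string literal (source text without the
-- surrounding quotes) at its gaps. Hypothetical helper, for illustration.
splitGaps :: String -> [String]
splitGaps = go ""
  where
    -- 'acc' holds the current segment in reverse.
    go acc [] = [reverse acc]
    -- The control escape \^\ contains a backslash that cannot open a gap.
    go acc ('\\' : '^' : c : rest) = go (c : '^' : '\\' : acc) rest
    go acc ('\\' : c : rest)
      -- Backslash, whitespace, backslash: a gap, so start a new segment.
      | isSpace c,
        '\\' : rest' <- dropWhile isSpace rest =
          reverse acc : go "" rest'
      -- Any other escape (including an escaped backslash) is kept verbatim.
      | otherwise = go (c : '\\' : acc) rest
    go acc (c : rest) = go (c : acc) rest
```

Concatenating the segments gives back the literal without gaps, so a
printer is free to put each segment on its own line with whatever
indentation the layout rules call for.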
I implemented custom logic where we assign a score to every occurrence of
an operator based on its location, and the average of those scores
determines the fixity of the operator.
As you can imagine, the solution is a bit brittle, and it is easy to mislead
it with a deliberately crafted input, but it gave acceptable results for
every code snippet I found online. And since it returns the same AST no
matter how we infer the fixities, it is not the end of the world if we infer
something incorrectly.
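To make the "average score" idea concrete, here is a minimal sketch. The
message above does not spell out how a score is derived from a location or
how the average maps to a fixity, so the `Placement` type and the `score`
rule below are purely assumptions made for illustration; only the averaging
step reflects the description.

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- | Where an operator occurrence sits on its line (hypothetical model).
data Placement = Leading | Inline
  deriving (Eq, Show)

-- | Assumed scoring rule, for illustration only.
score :: Placement -> Double
score Leading = 1 -- the operator starts a line
score Inline = 0 -- the operator sits between its operands on one line

-- | Average the scores of all occurrences of each operator.
-- The result is sorted by operator name.
averageScores :: [(String, Placement)] -> [(String, Double)]
averageScores occs =
  [ (op, sum scores / fromIntegral (length scores))
  | grp@((op, _) : _) <- groupBy ((==) `on` fst) (sortOn fst occs),
    let scores = map (score . snd) grp
  ]
```

Because every occurrence contributes to the average, a single oddly
formatted use of an operator shifts the result only slightly, which fits
the "brittle but acceptable" behaviour described above.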
The code is not really optimized, and I think it has quadratic time
complexity. Notably, we use the opTreeLoc function quite often, and it
traverses the whole tree every time. Memoizing its result on the OpBranch
constructor would make formatting files with really long operator chains a
lot faster. We can do this once we decide to optimize for speed.
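The memoization could look roughly like the sketch below. The types are
simplified stand-ins rather than Ormolu's actual definitions: `Span` is
assumed to be a plain line range, and the smart constructor `opBranch` is a
hypothetical name.

```haskell
-- | A source span as a (first line, last line) pair (simplified stand-in).
type Span = (Int, Int)

-- | Operator tree where every branch caches the span it covers.
data OpTree a op
  = OpNode Span a
  | OpBranch Span (OpTree a op) op (OpTree a op)

-- | Constant-time lookup: no traversal, just read the stored span.
opTreeLoc :: OpTree a op -> Span
opTreeLoc (OpNode s _) = s
opTreeLoc (OpBranch s _ _ _) = s

-- | Smallest span covering both arguments.
combine :: Span -> Span -> Span
combine (a, b) (c, d) = (min a c, max b d)

-- | Smart constructor that computes the cached span once, at build time.
opBranch :: OpTree a op -> op -> OpTree a op -> OpTree a op
opBranch l op r = OpBranch (combine (opTreeLoc l) (opTreeLoc r)) l op r
```

The trade-off is one extra field per branch in exchange for not walking the
subtree on every call.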
Here is an example which fails to parse with bang patterns enabled but
succeeds otherwise:
    (!) :: Foo -> Int -> Int
    (Foo n) ! p = n + p
To run Ormolu on this, we must not enable bang patterns by default.
This removes the logic around “modifying newline” (or rather moves it to a
lower level) because it was very hard to reason about and almost blocked my
work on fixing issue #337.
I also dropped the debugging output because it's too verbose and I'm not
using it anyway.
As part of these changes I also changed how the ‘newline’ combinator works.
Now, similar to ‘space’, the second ‘newline’ in a row just tells the
rendering engine to prefix the next thing with a newline. Using the
‘newline’ combinator more than twice in a row has no effect.
To take full advantage of the new feature I also went through the code and
simplified some logic around outputting the exact number of newlines. It is
now harder to get things wrong, so we can be less careful with counting
newlines.
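The ‘newline’ behaviour described above can be sketched with a toy
renderer. This is not Ormolu's printer: `Cmd` and `render` are hypothetical
names, and for simplicity the sketch defers every newline (not only the
second one) until the next piece of text, which produces the same output.

```haskell
-- | A toy command stream (hypothetical, for illustration).
data Cmd = Txt String | Newline

-- | Render commands, allowing at most one blank line between texts.
render :: [Cmd] -> String
render = go 0
  where
    -- The Int counts pending newlines, capped at 2.
    go :: Int -> [Cmd] -> String
    -- At the end of input, finish the last line but emit no blank line.
    go n [] = replicate (min 1 n) '\n'
    -- A third or later 'Newline' in a row changes nothing.
    go n (Newline : rest) = go (min 2 (n + 1)) rest
    -- Flush the pending newlines right before the next piece of text.
    go n (Txt s : rest) = replicate n '\n' ++ s ++ go 0 rest
```

With this model, `render [Txt "a", Newline, Newline, Newline, Txt "b", Newline]`
yields `"a\n\nb\n"`, so callers no longer need to count newlines exactly.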
Previously, if an operator had preceding comments attached to its second
argument, they would end up printed right after the operator:
    a
    + -- b comment
    b
On the second run, however, the comment would be interpreted as attached to
‘(+)’ and the result would be:
    a
    + b -- b comment
This broke the idempotence guarantee.
The solution that this commit implements includes several steps:
* Introduce the concept of a “dirty line”. A line is dirty if it has
  something on it that can have a comment attached to it.
* ‘txt’ is supposed to output fixed bits of syntax that cannot have comments
  attached to them (at least in Ormolu's model).
* ‘atom’, on the other hand, outputs things that mark the current line as
  dirty.
* When we are about to print preceding comments for the second argument, we
  check if the current line is dirty. If it is, we output an extra newline
  to prevent the first comment from changing “hosts”.
* This leaves another problem: trailing whitespace after the operator in
  that case. We solve it by making spaces a bit “lazy”. When the ‘space’
  combinator is used (which is now the recommended way to separate
  different constructs), it just guarantees that the next thing we output
  on the same line will be separated from the previous output by a single
  space. So using ‘space’ twice still results in a single space in the
  output. This has the extra benefit of simplifying all the logic that made
  sure we get exactly one space, and not 0 or 2, when spaces are inserted
  conditionally and independently.
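The steps above can be sketched with a toy renderer. It only models the
ideas of a dirty line and a lazy space; the `Cmd` type, the `St` record and
`render` are hypothetical and much simpler than Ormolu's real printer. In
particular, a comment is assumed to always occupy the rest of its line.

```haskell
-- | A toy command stream (hypothetical, for illustration).
data Cmd
  = Txt String -- fixed syntax; cannot host a comment
  | Atom String -- marks the current line as dirty
  | Space -- lazy: printed only if more output follows on this line
  | Newline
  | Comment String -- a preceding comment for whatever comes next

data St = St
  { pendingSpace :: Bool, -- a space was requested but not printed yet
    dirty :: Bool, -- the line holds something a comment could attach to
    lineEmpty :: Bool -- nothing has been printed on this line yet
  }

freshLine :: St
freshLine = St False False True

render :: [Cmd] -> String
render = go freshLine
  where
    go _ [] = ""
    -- Repeated 'Space' commands collapse into one pending space.
    go st (Space : rest) = go (st {pendingSpace = not (lineEmpty st)}) rest
    -- A pending space is dropped at a line break: no trailing whitespace.
    go _ (Newline : rest) = '\n' : go freshLine rest
    go st (Txt s : rest) =
      emit st s ++ go (st {pendingSpace = False, lineEmpty = False}) rest
    go st (Atom s : rest) = emit st s ++ go (St False True False) rest
    go st (Comment c : rest)
      -- On a dirty line, break first so the comment keeps its "host".
      | dirty st = '\n' : c ++ "\n" ++ go freshLine rest
      | otherwise = emit st c ++ "\n" ++ go freshLine rest
    emit st s = (if pendingSpace st then " " else "") ++ s
```

For the operator example, rendering
`[Atom "a", Newline, Atom "+", Space, Comment "-- b comment", Atom "b"]`
gives `"a\n+\n-- b comment\nb"`: the comment moves to its own line and no
space is left after the operator.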
There has been a lot of good, intense work lately, and as a result some
examples have grown considerably. The problem is that we do not show diffs
when something is not formatted as expected; we show the entire
"expected/got" files. This works well when files are small, but not so well
when they are huge (some of our examples are well beyond 100 lines). It can
be hard to understand where the problem is.
This commit splits long examples into smaller ones to make it easier to see
what went wrong when a test fails.
Attach the comment if the next element is not a sibling. I think this is
quite often what we want, since if we put a comment inside a construct, we
prefer it to stay inside the same element.