Problem: the current handling of filepaths is error-prone and can be
simplified.
Solution: canonicalize filepaths at the boundaries, making their
management safer and simplifying the codebase.
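A minimal sketch of the idea, assuming paths arrive via CLI options (`fromCLI` is a hypothetical helper name; `canonicalizePath` comes from the `directory` package shipped with GHC):

```haskell
import System.Directory (canonicalizePath)
import System.FilePath (isAbsolute)

-- Canonicalize a user-supplied path once, at the boundary; the rest of
-- the code then only ever sees absolute, normalised paths.
fromCLI :: FilePath -> IO FilePath
fromCLI = canonicalizePath

main :: IO ()
main = do
  p <- fromCLI "."
  print (isAbsolute p)  -- canonical paths are always absolute: True
```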
Problem: there is currently an issue in stack or cabal
that produces a warning when building this project on
case-insensitive systems.
Solution: the current workaround is to add the GHC
option '-optP-Wno-nonportable-include-path'.
Problem: we are not testing the behavior of xrefcheck on Windows.
Solution: add a workflow to run
golden and tasty tests on CI
via a GitHub Actions Windows runner.
Some subproblems appear:
1.
Problem: the CI build fails because it needs the `pcre` package.
Solution: add it (somehow), see `install pacman dependencies`
in ci.yml.
2.
Problem: network errors are displayed differently on different platforms.
Solution: collect the output from both and use
`assert_diff expected_linux.gold || assert_diff expected_windows.gold`.
3.
Problem: the "Config matches" test is failing because the checkout action
clones files with CRLF, and the test asserts equality of two ByteStrings.
Solution: manually remove the CR characters.
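The CR removal can be sketched like this (a hypothetical helper, not the exact test code; `Data.ByteString.Char8` is from the `bytestring` package shipped with GHC):

```haskell
import qualified Data.ByteString.Char8 as BS8

-- Normalise CRLF line endings to LF before comparing ByteStrings, so a
-- CRLF checkout does not break the equality assertion. Note this drops
-- every '\r', including lone ones, which is fine for this purpose.
dropCR :: BS8.ByteString -> BS8.ByteString
dropCR = BS8.filter (/= '\r')
```

For example, `dropCR (BS8.pack "a\r\nb\r\n")` yields `"a\nb\n"`.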
Problem: xrefcheck uses UTF-8 symbols in reports, which are not supported
by most Windows shells by default.
Sometimes they are printed as question marks (which causes golden tests to fail),
and sometimes printing them raises an error.
Solution: use the `withCP65001` function from the `code-page` package, which
sets the correct codepage on Windows and does nothing on other OSes.
Problem:
We have a function `defConfigText :: Flavor -> ByteString` that
uses `fillHoles` to modify `defConfigUnfilled`.
This is an error-prone and overly complicated way to have a
`ByteString` with parametric blocks. Also, using `ByteString`
instead of `Text` to store text leads to CRLF-related issues when
launched on Windows.
Solution:
Remove `fillHoles` and `defConfigUnfilled`, and make
`defConfigText` create a `Text` using `nyan-interpolation`.
Problem: bats tests are not space-sensitive.
Solution: remove trailing spaces from xrefcheck output
(see the next problems), then remove `--ignore-trailing-space`
from `assert_diff`.
Problem: there are lines containing only spaces in
xrefcheck's output, because `Fmt.indentF` "indents"
empty lines too.
Solution: add an `Xrefcheck.Util.Interpolate.interpolateIndentF`
function that does not indent empty lines.
Same for `Fmt.blockListF` and `Fmt.blockListF'`.
The replacement functions also do not add trailing newlines, so they
are easier to use in interpolation blocks.
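The behavior of such a non-indenting indent function can be sketched on plain `String`s (a hypothetical standalone version; the real `interpolateIndentF` works on fmt builders):

```haskell
import Data.Char (isSpace)

-- Indent every line by n spaces, except lines that are empty (or
-- whitespace-only), so the output never contains space-only lines.
indentNonEmpty :: Int -> String -> String
indentNonEmpty n = unlines . map step . lines
  where
    step l
      | all isSpace l = l
      | otherwise     = replicate n ' ' <> l
```

For example, `indentNonEmpty 2 "a\n\nb\n"` gives `"  a\n\n  b\n"`: the middle line stays empty instead of becoming two spaces.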
Problem: when there is a current-file link `[a](#b)`, it is
printed like
```
- text: "a"
- link: (trailing space here)
- anchor: b
```
Solution: like with anchors, print `link: -` instead
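The fix can be sketched as follows (hypothetical helper and formatting, not the actual xrefcheck report code):

```haskell
-- Print "-" for an absent link instead of emitting nothing after
-- "link:", which would leave a trailing space on the line.
showLinkLine :: Maybe String -> String
showLinkLine Nothing     = "  - link: -"
showLinkLine (Just link) = "  - link: " <> link
```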
Problem:
We often need to create large strings, and we use different
fmt tools for this (by-hand concatenation, `unlinesF`, etc.).
Sometimes the result is unclear or too heavyweight, and it is
always error-prone.
Solution: use the `int` quasiquoter to build large strings and
have nice-looking, easy-to-read code.
Problem: at the moment, we're using the ignored option for mainly 2
purposes: 1) to ignore all files in the `.git` folder (`.git/**/*`), and
2) to ignore all build-related temporary files (the default config ignores
`.stack-work/**/*`). A more robust alternative might be to ignore all
files implicitly ignored by git.
Solution: use `git ls-files` to ignore all files implicitly ignored by git.
Problem: the output of xrefcheck contains ANSI-colored text,
which is bad when we redirect the output to a file
or when our terminal does not support colors.
Colorizing is performed in `Buildable` instances of various types,
so we can't just pass some extra flag there.
Solution: add a `--no-color` CLI option.
Create `colorIfNeeded` and `styleIfNeeded` functions that have a
`Data.Reflection.Given ColorMode` constraint, and replace all usages of
`color` and `style` with them, adding the new
constraint to instances.
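A simplified sketch of the idea with the `ColorMode` passed explicitly (the real code threads it via `Data.Reflection.Given` instead, so `Buildable` instances do not need an extra argument; types and ANSI codes here are illustrative):

```haskell
data ColorMode = WithColors | WithoutColors

data Color = Red | Green | Blue

-- Standard ANSI foreground escape codes.
ansiCode :: Color -> String
ansiCode Red   = "\ESC[31m"
ansiCode Green = "\ESC[32m"
ansiCode Blue  = "\ESC[34m"

-- Wrap the text in ANSI escapes only when colors are enabled.
colorIfNeeded :: ColorMode -> Color -> String -> String
colorIfNeeded WithoutColors _ s = s
colorIfNeeded WithColors    c s = ansiCode c <> s <> "\ESC[0m"
```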
Problem: in #158, we started using `optFootnotes` from the `cmark-gfm`
package. This feature hadn't yet been released on hackage, so we pulled
the package from its github repo.
Since then, cmark-gfm-0.2.5 has been released on hackage.
Solution: pull the latest version from hackage and add a lower bound on
the dependency.
Problem: we depend on the `roman-numerals` package,
which is outdated and can't be built with
GHC 9.2.4, the version used by Hackage to generate Haddocks.
Solution: the only place we use this package is showing header levels,
so we can show them by hand instead.
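Since markdown header levels only go up to 6, showing them by hand is a small lookup (a hypothetical sketch, not the exact xrefcheck code):

```haskell
-- Render a header level as a roman numeral without the
-- `roman-numerals` dependency; levels beyond 6 should not occur
-- in markdown, so we fall back to the plain number.
headerLevelToRoman :: Int -> String
headerLevelToRoman n = case n of
  1 -> "I"
  2 -> "II"
  3 -> "III"
  4 -> "IV"
  5 -> "V"
  6 -> "VI"
  _ -> show n
```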
Problem: `stack sdist` fails with the following message when there are no version constraints on `base`:
> Package check reported the following errors:
> The dependency 'build-depends: base' does not specify an upper bound on the version number. Each major release of the 'base' package changes the API in various ways and most packages will need some changes to compile with it. The recommended practice is to specify an upper bound on the version of the 'base' package. This ensures your package will continue to build when a new major version of the 'base' package is released. If you are not sure what upper bound to use then use the next major version. For example if you have tested your package with 'base' version 4.5 and 4.6 then use 'build-depends: base >= 4.5 && < 4.7'.
Solution: Add a version constraint to `base`.
Problem: `hspec` and `tasty` are testing frameworks with
almost the same functionality;
for historical reasons, xrefcheck used different frameworks
for tests and links-tests, and at Serokell we now prefer `tasty`.
Solution: use only `tasty`;
rewrite the code that uses `hspec` using the correspondence between
- `testGroup` and `describe`
- `testCase` and `it`
- `shouldBe` and `@?=`
Problem: currently, xrefcheck fails immediately after the first
observed error because `die` is used right in `markdownScanner`. What
we want is to collect all the errors from the different markdown files
and print them together with the broken links as xrefcheck's final
result. Also, despite the fact that the `makeError` function has
4 error messages, 2 of them are never reported, and the test case that
should check this only checks that at least one of the four files
throws an error.
Solution: make xrefcheck report all errors. Add a `ScanError` type
and propagate errors to report all of them, rather than failing
immediately after the first error is detected.
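The error-propagation idea can be sketched like this (hypothetical types, not the actual xrefcheck definitions):

```haskell
-- A scan error remembers which file it came from.
data ScanError = ScanError FilePath String
  deriving (Show, Eq)

-- Split per-file scan results into all errors and all successes,
-- instead of dying on the first failure.
collectScanResults :: [(FilePath, Either String a)] -> ([ScanError], [a])
collectScanResults = foldr step ([], [])
  where
    step (path, Left err) (errs, oks) = (ScanError path err : errs, oks)
    step (_,    Right x)  (errs, oks) = (errs, x : oks)
```

All collected `ScanError`s can then be printed alongside the broken links at the end of the run.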
Problem: we use a 2-step process to parse a URL: first `parseURI` and
then `mkURIBs`. Both of these functions can fail. At the moment, we're
ignoring their errors and simply throwing an `ExternalResourceInvalidUri`,
then displaying a generic error message to the user.
Solution: catch the errors from `parseURI` and `mkURIBs` and use them to
tell the user why the URL was invalid.
Problem: The current version of xrefcheck doesn't allow the square
brackets and some other special characters, like the angle brackets and
the curly brackets, to be present in the URLs, even in the query
strings, as they need to be percent-encoded first.
Solution: Allow some of the reserved characters, like the brackets, to
be present in the query strings of the URLs.
There exist two main standards of URL parsing: RFC 3986 and the Web
Hypertext Application Technology Working Group's URL standard. Ideally,
we want to be able to parse the URLs in accordance with the latter
standard, because it provides a much less ambiguous set of rules for
percent-encoding special characters, and is essentially a living
standard that gets updated constantly.
We allow these characters to be present in the query strings by using
the `parseURI` function from the `uri-bytestring` library with
`laxURIParseOptions`.
Problem: we had a hardcoded HTML tag parser that didn't work with all valid HTML tags.
Solution: replace it with the `tagsoup` library, which takes care of all the parsing.
Problem: The current version of xrefcheck handles the HTTP responses
with the 429 status code just like every other error, when it is
possible to try and eliminate the occurrences of such errors within the
program itself.
Solution: Each time the result of performing a request on a given link
is a 429 error, retrieve the Retry-After information, describing the
delay (in seconds), from the headers of the HTTP response, or,
alternatively, use a configurable default value if the Retry-After
header is absent, and rerun the request after an amount of time
described by the said value has passed. Only after the number of retries
reaches its limit, which, as of right now, is not
configurable and is hardcoded, is the 429 error treated as
'unfixable', and any further attempts to remove the error are
abandoned.
Additionally, the progress bar has been upgraded and the following
elements are supplied:
1. an extra color -- Blue -- indicating the errors that might get
eliminated during the verification;
2. a timer with the number of seconds left to wait before the request
is retried; if, during verification, a new 429 error emerges
with a Retry-After value greater than or equal to the
elapsed time, the timer is immediately updated with that value and
begins ticking down each second from scratch.
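The retry loop described above can be sketched as follows (hypothetical types: `Left seconds` stands in for a 429 response carrying its Retry-After delay, `Right` for any other outcome):

```haskell
import Control.Concurrent (threadDelay)

-- Retry an action up to a fixed limit while it keeps reporting 429;
-- once the limit is reached, the last 429 result is returned as-is,
-- i.e. the error becomes "unfixable".
retryOn429 :: Int -> IO (Either Int a) -> IO (Either Int a)
retryOn429 maxRetries act = go 0
  where
    go n = do
      res <- act
      case res of
        Left delaySec | n < maxRetries -> do
          threadDelay (delaySec * 1000000)  -- Retry-After is in seconds
          go (n + 1)
        _ -> pure res
```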
Problem: the --ignored and --root CLI options misbehaved when `./` was used in a path.
Solution: add path normalisation and use the path-equality functions from `System.FilePath` instead of plain string comparison.
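For example, with `System.FilePath` (shipped with GHC), a hypothetical comparison helper looks like:

```haskell
import System.FilePath (normalise, equalFilePath)

-- Compare user-supplied paths after normalisation instead of as raw
-- strings, so "./foo" and "foo" refer to the same thing.
samePath :: FilePath -> FilePath -> Bool
samePath a b = equalFilePath (normalise a) (normalise b)
```

Here `samePath "./foo" "foo"` is `True`, while naive string equality would say the paths differ.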
Problem: we had a lot of redundant dependencies and no linter to catch obvious errors.
Solution: add hlint support, enable the `-Weverything` flag, fix all the resulting hints, and add hlint to the CI pipeline.
Problem: with the new resolver version, we received an obscure error when trying to cross-compile the project to Windows on CI. Changing the `file-embed` version to the old one didn't help.
Solution: inline the content of this file into the Haskell source using the `raw-string-qq` library, which lets us avoid escaping and typing newline characters.
Problem: we had an old resolver version that used GHC 8.10.4 and outdated versions of libraries.
Solution: change the resolver to LTS 19.13, which uses GHC 9.0.2, and update the haskell.nix configuration files so the project can be built via the `nix-build` command.
Problem: In
```
import qualified Foo.Bar as Bar
import Foo.Bar (Bar)
```
the names of the imported modules are on different
vertical lines, which breaks autosorting
and makes the imports harder to read.
Solution: Use `ImportQualifiedPost`
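With `ImportQualifiedPost`, both imports start with the module name and line up (a self-contained example using `Data.Map` from `containers`, since `Foo.Bar` above is illustrative):

```haskell
{-# LANGUAGE ImportQualifiedPost #-}

import Data.Map (Map)
import Data.Map qualified as Map

ages :: Map String Int
ages = Map.fromList [("alice", 30)]

main :: IO ()
main = print (Map.lookup "alice" ages)  -- Just 30
```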
Problem: Stack cannot build projects with mixins, and they were only
used to splice in universum instead of base.
Solution: use `NoImplicitPrelude` and import `Universum`
everywhere explicitly.
Problem:
Currently we support only http and https links. If there is an `ftp://`
link, you will get an exception.
Solution:
Use `ftp-client` to check the connection to the FTP server, inspect
response statuses, and check file existence. This requires adding new
error types and some small refactoring.
Provide a test as a separate executable, to which the FTP host has to
be passed as a command-line argument.
Co-authored-by: Alexander Bantyev <alexander.bantyev@serokell.io>
Problem:
We do not know what to do with protected links, because we cannot check
them. So we have two options: assume that these links are valid, or not.
Solution:
Provide a config option letting the user decide whether to assume
protected links are valid or not.
Problem: now that we include the repository type in the config, it seems
to make sense to generate the config differently depending on the repository
type, especially taking into account that some fields currently
mix GitHub- and GitLab-specific contents.
Solution:
Leave placeholders in the default config and later fill them from the
code depending on the required repository type.
Add a mandatory repository type parameter to `dump-config` CLI command.
Along with a test checking config validity, add a golden test on the
produced config so that we can assess how sane it looks.
Problem: there are no tests checking the behavior of
the link-ignoring regexes.
Solution: add a test checking that broken links
matched by the regexes are not verified, and a
test checking that unmatched broken links
are verified and reported as errors.
Problem: it can be convenient not only to specify
exclusions in the config, but also to annotate
the excluded thing right in place, i.e. to add a
comment before a link, a paragraph,
or even a whole file to ignore that item.
Solution: support annotations as HTML comments in
the "<!-- xrefcheck: ignore mode -->" format, where
mode is "link", "paragraph", or "file". Modify
`nodeExtractInfo` from `Xrefcheck.Scanners.Markdown`
to just skip such a node in the AST while parsing. Take into
account that "ignore file" can only be at the top
of the file or right after the license. In markdown
terms: either the first node must be an HTML_BLOCK
with "<!-- xrefcheck: ignore file -->" content,
or the first is an HTML_BLOCK comment (something
between "<!--" and "-->") and the second is an
HTML_BLOCK with the "ignore file" content.
Also take into account that "ignore link" must be
followed by a link. Strictly speaking, after
"ignore link" there is either a LINK, or a TEXT and a LINK
(if there is some text before the link).
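Recognizing the annotation itself can be sketched like this (a hypothetical standalone parser, not the actual `Xrefcheck.Scanners.Markdown` code):

```haskell
data IgnoreMode = IgnoreLink | IgnoreParagraph | IgnoreFile
  deriving (Show, Eq)

-- Match a "<!-- xrefcheck: ignore MODE -->" comment, tolerating
-- arbitrary whitespace between the tokens.
parseIgnoreAnnotation :: String -> Maybe IgnoreMode
parseIgnoreAnnotation s = case words s of
  ["<!--", "xrefcheck:", "ignore", mode, "-->"] -> case mode of
    "link"      -> Just IgnoreLink
    "paragraph" -> Just IgnoreParagraph
    "file"      -> Just IgnoreFile
    _           -> Nothing
  _ -> Nothing
```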
Problem: external links must sometimes be ignored
and not checked for validity.
Solution: add `vcIgnoreRefs` to `Config`, a list
of regexes using `regex-tdfa`. Each regex defines a link
we want to ignore while validating. Change
`checkExternalResource` to support matching these regexes.
Problem: `stack upload` does not like that `base` does not have an
upper bound, it reports an error while checking the package.
Solution: add an upper bound on `base` to prevent people from using a
broken version of this package in case of breaking changes in `base`.
Since we build with stack, our current default is 4.13 (deduced from
the resolver). However, 4.14 does not seem to have any breaking changes
https://hackage.haskell.org/package/base-4.14.0.0/changelog
so it should be fine to permit it as well, and that's what we are doing.
Problem: LTS-14.16, which we are currently using, is a bit old,
and its GHC is old as well.
Solution: update the resolver and make the changes that were necessary.
They are pretty trivial. The only essential breaking change is the
removal of `parseUrl` from `req`.
Problem: base-noprelude is a non-standard dependency, unlike `base`.
Due to our use of `mixins` we don't need `base-noprelude` anymore,
we can just hide `Prelude`.
Solution: use `base` and hide `Prelude`.
Problem: `autoexporter` is used in one place to re-export one module.
Probably one day it will re-export more, but at this point it's really
overkill.
Moreover, it causes Hackage build to fail for the reasons mentioned
here:
https://github.com/haskell/hackage-server/issues/821
Solution: stop using it, re-export manually.
Problem: after we started using mixins feature, generated .cabal file
requires cabal 2.0. Attempt to upload our package to Hackage now reports
an error saying that the `autogen-modules` section has to mention `Paths_*`
files.
Solution: add `Paths_xrefcheck` to `generated-other-modules` section of
`package.yaml`.
Problem: lootbox is not uploaded to hackage, so when we upload
xrefcheck, its build on hackage fails.
Solution: we can use `mixins` feature to set Universum as prelude
without any intermediate packages.
This seems to be supported even by quite old versions of stack, like
1.7.3, so that should not bring build problems in most cases.