Commit Graph

79 Commits

Author SHA1 Message Date
Adrián Enríquez
0886062500
[#197] Canonicalize filepaths
Problem: the current usage of filepaths is error-prone and can be
simplified.

Solution: canonicalize filepaths at the boundaries, so their management
will be safer and will simplify the codebase.
2022-12-22 16:29:23 +01:00
Adrián Enríquez
7457a6b109
[#211] Specify language in a golden test
Problem: We have a Golden test that expects an output in English
and fails if a different language is configured.

Solution: Explicitly configure the language before running the
corresponding test.
2022-12-13 10:20:59 +01:00
Adrián Enríquez
e8d79e7f14
[#211] Case insensitive anchors
Problem: Some Markdown flavours such as the GitHub one are case
insensitive regarding anchors, but our analysis is currently
case sensitive and it produces false positives.

Solution: Support case-insensitivity depending on the configured
Markdown flavour. Apply this also to ambiguous and similar anchors
detection.
2022-12-13 10:20:32 +01:00
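The flavour-dependent comparison described in the [#211] anchors commit can be sketched as follows. This is a minimal illustration using only `base`, not the project's actual code: the `Flavor` constructors, the `caseInsensitiveAnchors` helper, and the choice of which flavour is case-sensitive are assumptions made for the example.

```haskell
import Data.Char (toLower)

-- Hypothetical flavour type, named for illustration only.
data Flavor = GitHub | GitLab
  deriving (Eq, Show)

-- Assumption: GitHub anchors are case-insensitive (as the commit states),
-- while the other flavour is treated as case-sensitive for this sketch.
caseInsensitiveAnchors :: Flavor -> Bool
caseInsensitiveAnchors GitHub = True
caseInsensitiveAnchors GitLab = False

-- Compare a referenced anchor with a defined anchor under the flavour's rules.
anchorsMatch :: Flavor -> String -> String -> Bool
anchorsMatch flavor referenced defined
  | caseInsensitiveAnchors flavor = map toLower referenced == map toLower defined
  | otherwise = referenced == defined
```

With these definitions, `anchorsMatch GitHub "Getting-Started" "getting-started"` holds, while the same call with `GitLab` does not.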
Adrián Enríquez
9c5f5f82b7
[#218] Change redirects default behaviour
Problem: Xrefcheck currently always follows redirect links.

Solution: We are changing its default behaviour regarding redirect
links to fail and report permanent redirects, and to pass for temporary
redirects. Further PRs will allow the user to configure other policies.
2022-12-12 10:19:01 +01:00
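The default policy from the [#218] commit (fail on permanent redirects, pass on temporary ones) can be illustrated with a small classifier. This is a sketch under stated assumptions: the `RedirectOutcome` type and `classifyRedirect` function are hypothetical names, and the grouping of status codes follows standard HTTP semantics (301 and 308 are permanent; 302, 303 and 307 are temporary), which may not be the exact set Xrefcheck inspects.

```haskell
-- Hypothetical result type for a single HTTP response status.
data RedirectOutcome
  = FailPermanent  -- report the link: the target has moved for good
  | PassTemporary  -- accept the link: the redirect may change later
  | NotARedirect   -- status is outside the redirect codes handled here
  deriving (Eq, Show)

-- Classify a status code according to the default policy described above.
classifyRedirect :: Int -> RedirectOutcome
classifyRedirect status
  | status `elem` [301, 308] = FailPermanent
  | status `elem` [302, 303, 307] = PassTemporary
  | otherwise = NotARedirect
```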
Adrián Enríquez
347f0eecd1
[#228] Rename local file reference tag
Problem: We have found that the current tag for local file references, current file, may lead to ambiguities.

Solution: Rename the tag that we use for local file references to be file-local instead.
2022-12-01 11:54:32 +01:00
Adrián Enríquez
d41a07e7bc
[#202] Remove poorly supported unicode symbols from the program output
Problem: We are using unicode symbols as visual clues in the program output that are not commonly supported and are therefore not always displayed as intended.

Solution: Remove the usage of these symbols, as the program output is already using other visual clues and the result will remain understandable for the user.
2022-12-01 09:47:51 +01:00
Anton Sorokin
fb77575b0b
[#164] Add workflow for running Windows tests on CI
Problem: we are not testing the behavior of xrefcheck on Windows.

Solution: add a workflow to run
golden and tasty tests on CI
via the GitHub Actions Windows runner.
Some subproblems appear:

1.
Problem: the CI build fails because it needs the `pcre` package
Solution: add it (somehow), see `install pacman dependencies`
in ci.yml

2.
Problem: Network errors are displayed differently on different platforms
Solution: collect output from both and use
`assert_diff expected_linux.gold || assert_diff expected_windows.gold`

3.
Problem: the "Config matches" test is failing because the checkout action
clones files with CRLF, and the test asserts equality of two ByteStrings
Solution: manually remove CR
2022-11-30 21:00:58 +02:00
Anton Sorokin
7115c657ea
[#223] Use nyan-interpolation for defConfigText
Problem:
We have a function `defConfigText :: Flavor -> ByteString` that
uses `fillHoles` to modify `defConfigUnfilled`.
This is an error-prone and overly complicated way to have a
`ByteString` with parametric blocks. Also, using `ByteString`
instead of `Text` to store text leads to CRLF-related issues when
launched on Windows.

Solution:
Remove `fillHoles` and `defConfigUnfilled`,
`defConfigText` creates a `Text` using `nyan-interpolation`.
2022-11-29 16:52:13 +02:00
Anton Sorokin
1c0fbfef95
[#200] Add --include-untracked CLI option
Problem: xrefcheck checks only files that are tracked by Git,
but sometimes we want to run xrefcheck on
files without adding them to Git, e.g. when we want to
test some generator of markdown files or when we actively
create markdown files during development.

Solution: add option to treat files that were neither
added to git nor ignored as existing.
2022-11-17 15:38:42 +02:00
Anton Sorokin
da25917fa6
[#200] Warnings about files that weren't added to git yet
Problem: after 0.2.2 release, xrefcheck cares only about files
that were added to Git. That can be confusing for users (see #200)

Solution:
If a scannable (currently it means markdown) file is not ignored
(by git or via config) and not tracked by git, print a warning to
stderr while scanning repo.

If a link targets such a file, change the error message from "file not exists"
to `Link target is not tracked by Git`

Suggest that the user run "git add" before running xrefcheck in both cases.

To do this, I've changed the `RepoInfo` type, so it also contains
information about untracked files now.
2022-11-17 15:32:02 +02:00
Anton Sorokin
8012dc94d3
[#213] Do not print trailing whitespaces
Problem: bats tests are not space sensitive
Solution: remove trailing spaces from xrefcheck output
(see next problems), remove `--ignore-trailing-space`
from `assert_diff`

Problem: there are lines containing only spaces in
xrefcheck's output, because `Fmt.indentF` "indents"
empty lines too.
Solution: add an `Xrefcheck.Util.Interpolate.interpolateIndentF`
function that does not indent empty lines.
Same for `Fmt.blockListF` and `Fmt.blockListF'`.
These functions do not add trailing newlines, so they are
easier to use in interpolation blocks.

Problem: when there is a current file link `[a](#b)`, it is
printed like
```
- text: "a"
- link: (trailing space here)
- anchor: b
```
Solution: like with anchors, print `link: -` instead
2022-11-10 15:10:59 +02:00
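The idea behind an indentation helper that skips empty lines, as described in the [#213] commit, can be sketched with plain `String`s. This is not the project's `interpolateIndentF` (which works with the `fmt` library's builders); `indentNonEmpty` is a hypothetical name used only for this illustration.

```haskell
import Data.List (intercalate)

-- Indent every non-empty line by n spaces and leave empty lines untouched,
-- so that no whitespace-only lines are produced.
-- Lines are joined without a trailing newline.
indentNonEmpty :: Int -> String -> String
indentNonEmpty n = intercalate "\n" . map indentLine . lines
  where
    indentLine :: String -> String
    indentLine line
      | null line = line
      | otherwise = replicate n ' ' ++ line
```

In GHCi, `indentNonEmpty 2 "a\n\nb"` evaluates to `"  a\n\n  b"`: the blank line in the middle stays empty instead of becoming two spaces.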
Anton Sorokin
1476bd3435
[#208] Remove excessive newlines from output
Problem: xrefcheck's output contains many redundant newlines, so
it takes more display space than it could.
For list of places where such newlines appear, see #208

Solution: don't print redundant newlines, so output is more compact.
Since tests are newline-sensitive, I've checked that now there
are no extra (e.g. 2 adjacent) blank lines in the `expected` part of tests.
2022-11-04 14:27:19 +02:00
Anton Sorokin
82bf996615
[#201] Use nyan-interpolation for building error messages
Problem:
We often need to create large strings, and we use different
fmt tools for this (by-hand concatenation, unlinesF, etc).
Sometimes this is unclear or too heavy, and it is always
error-prone.

Solution: use `int` quasiquoter to build large strings and
have nice-looking and easy-to-read code
2022-11-03 17:39:10 +02:00
Diogo Castro
e93a21e18a
[#206] Make bats tests space-sensitive
Problem: Right now, our bats tests ignore empty lines and
leading/trailing whitespace differences between the expected output and
the actual output.

However, this could lead to accidental bugs in xrefcheck's output.

Trailing whitespace isn't very concerning (except when it's excessive
and it causes the terminal to line-wrap), but additional/missing empty
lines and leading whitespace can lead to significant changes.

Solution: Let's make these tests sensitive to empty lines and leading
whitespace.
2022-10-26 13:14:45 +01:00
Anton Sorokin
23b52729b1
[#169] Rename ignore file annotation to ignore all
Problem: as in #169, the `ignore file` annotation ignores
not the file itself but all links in the file, which is not obvious

Solution: rename it to `ignore all`

Also renamed `IMFile :: IgnoreMode` and `IMSFile :: IgnoreModeState`
to `IMAll` and `IMSAll`
2022-10-26 11:31:52 +03:00
Sergey Gulin
1740be676a
[#92] Add support for image links
Problem: We should add support for image links.

Solution: Extract image links as regular links.
2022-10-26 09:37:56 +10:00
Sergey Gulin
9951c171df
[#171] Rename exclusion-related config options
Problem: The behaviours of `ignoreRefs`, `virtualFiles`, `notScanned` and
`ignored` are closely related. We need to make intent of these more
obvious to the user.

Solution: Rename `ignoreRefs`, `virtualFiles`, `notScanned` and
`ignored` to `ignoreExternalRefsTo`, `ignoreLocalRefsTo`,
`ignoreRefsFrom` and `ignore`. Also, update their yaml comments in
default config file.
2022-10-26 09:06:39 +10:00
Sergey Gulin
013457abcc
[#170] [#119] Reorganize top-level config keys
Problem: At the moment, the config yaml is organized in 3 top-level
keys: `traversal`, `verification` and `scanners`. However, the distinction
between the "traversal" and the "verification" stages is not relevant
to the user. This is entirely an internal concern.

Solution: Reorganize yaml config options under `exclusions`, `networking`
and `scanners`.
2022-10-26 07:39:01 +10:00
Anton Sorokin
ef0e26029a
[#157] Add support for autolinks
Problem: GitHub renders "implicit" links like `visit www.google.com` as links,
but we don't check them, since cmark-gfm renders them as text

Solution: add the `autolink` extension to `cmark-gfm`, so both
`www.google.com` and `https://google.com`
will be parsed as links and successfully verified
2022-10-25 22:20:27 +03:00
Anton Sorokin
602e6a8ec9
[#150] Change ignore link behavior
Problem:
Currently the `ignore link` annotation applies to the first link after it
(anywhere in the file). That can be bad for the user,
e.g. they may forget to delete this annotation, and it then ignores a link in
some random place

Solution:
Throw scan errors when `ignore link` is not followed by a node with a link.
To do this, we need to increase the amount of context in `ScannerM`
2022-10-25 11:20:36 +03:00
Anton Sorokin
c8a053a139
[#187] Check globs in config fields and CLI args
Problem: a user may think that globs in e.g. the config field `ignored`
have the same root as all markdown links, so if they add e.g. `/scripts/*`
to ignore, the link `[a](/scripts/*)` will not be checked. But we were
interpreting this glob as a filesystem-level absolute path
(so this link will still be checked). Also, we pretend to check the correctness
of user-supplied globs, but we're using the glob parser in a mode that always
succeeds (and somehow transforms bad globs into correct ones).

Solution: make a function `mkGlobPattern` that throws a readable error
when it sees absolute paths (like `/scripts/*`) or malformed globs (like `<a>`),
and use it for parsing all globs
in config fields and CLI args. Also remove the redundant "canonization" of globs
2022-10-24 17:17:41 +03:00
Sergey Gulin
bfbe20a5b0
[#139] Ignore build-related files
Problem: At the moment, we're using the ignored option mainly for 2
purposes: 1) to ignore all files in the `.git` folder (`.git/**/*`) and 2) to
ignore all build-related temporary files (the default config ignores
`.stack-work/**/*`). A more robust alternative might be to ignore all
files implicitly ignored by git.

Solution: Use `git ls-files` to ignore all files implicitly ignored by git.
2022-10-21 22:07:00 +10:00
Anton Sorokin
a03c9fff2a
fixup! [#165] [#192] Add tests for specific cases
2022-10-14 15:19:05 +03:00
Anton Sorokin
22b3ce1ad6
[#165] [#192] Add tests for specific cases
Problem: there are no tests for links to directories and non-markdown files.
Also, we aren't testing that links are case sensitive

Solution: add such tests to `check-local-refs`, also check that only links
to directories are allowed to have a trailing slash,
e.g. `a.md/` is bad and `dir1/` is ok
2022-10-13 18:18:33 +03:00
Sergey Gulin
af9e029853
[#151] Refactor IgnoreMode
Problem: Currently, the constructor `None` from `IgnoreMode` is used to
represent two distinct things: 1) no "ignore mode" is currently set
2) unrecognized "ignore mode" is found. It would make more sense to
change `getIgnoreMode`. Instead of returning `Maybe IgnoreMode`, it
should return type `GetIgnoreMode` with 3 possible constructors:
`NotAnAnnotation`, `ValidMode IgnoreMode`, `InvalidMode Text`.

Solution: Remove `None` from `IgnoreMode`, make `getIgnoreMode`
return `GetIgnoreMode`, use `Maybe Ignore` as scan state where
`Nothing` represents no mode is currently set.
2022-10-10 20:50:11 +10:00
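The three-way result type from the [#151] commit can be sketched as below. The `GetIgnoreMode` constructors come from the commit message; everything else is an assumption for illustration: the `IgnoreMode` constructor names, the use of `String` instead of `Text` (to stay within `base`), and the annotation wording `xrefcheck: ignore <mode>`, which is assumed to arrive with the comment delimiters already stripped.

```haskell
-- Hypothetical set of modes; the real constructors may differ.
data IgnoreMode = Link | Paragraph | File
  deriving (Eq, Show)

-- Result type named in the commit message (with String in place of Text).
data GetIgnoreMode
  = NotAnAnnotation        -- the comment is not an xrefcheck annotation
  | ValidMode IgnoreMode   -- a recognised "ignore" annotation
  | InvalidMode String     -- an "ignore" annotation with an unknown mode
  deriving (Eq, Show)

-- Classify the text of a comment.
getIgnoreMode :: String -> GetIgnoreMode
getIgnoreMode comment = case words comment of
  ("xrefcheck:" : "ignore" : rest) -> case rest of
    ["link"] -> ValidMode Link
    ["paragraph"] -> ValidMode Paragraph
    ["file"] -> ValidMode File
    _ -> InvalidMode (unwords rest)
  _ -> NotAnAnnotation
```

Separating `NotAnAnnotation` from `InvalidMode` lets a caller skip ordinary comments silently while still reporting a misspelled annotation.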
Anton Sorokin
8285cea456
[#138] Report links that escape repo directory
Problem: as in #138, when we see a local link, we check only
the existence of the referred file,
without checking that this file is part of the repo
and that the link will be compatible with GitHub's renderer

Solution: manually count "nesting levels" of all local links,
checking that the number of `".."`'s is always less
than the number of real directories
2022-10-10 13:19:55 +03:00
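The "nesting level" count from the [#138] commit can be sketched as a walk over path components that tracks the current depth below the repository root. This is an illustration using only `base`, not the project's implementation: `escapesRepo` and `splitComponents` are hypothetical names, and paths are assumed to use `/` as the separator.

```haskell
-- Split a path on '/' into its components.
splitComponents :: FilePath -> [String]
splitComponents path = case break (== '/') path of
  (component, []) -> [component]
  (component, _ : rest) -> component : splitComponents rest

-- Given the directory of the referencing file (relative to the repo root)
-- and a relative link, report whether following the link would step above
-- the repo root at any point.
escapesRepo :: FilePath -> FilePath -> Bool
escapesRepo fileDir link = go 0 (splitComponents fileDir ++ splitComponents link)
  where
    go :: Int -> [String] -> Bool
    go _ [] = False
    go depth (component : rest)
      | component == ".." = depth == 0 || go (depth - 1) rest
      | component == "." || null component = go depth rest
      | otherwise = go (depth + 1) rest
```

For example, `escapesRepo "docs" "../README.md"` is `False`, whereas `escapesRepo "docs" "../../other/README.md"` is `True`, even if such a file happens to exist on disk.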
Anton Sorokin
847b21bfbc
[#185] Add CLI option to disable output coloring
Problem: output of xrefcheck contains ANSI-colored text,
which is bad when we redirect output to a file
or when our terminal does not support colors.
Colorising is performed in `Buildable` instances of various types,
so we can't just pass some extra flag here

Solution: add CLI option `--no-color`.
Create `colorIfNeeded` and `styleIfNeeded` functions that have a
`Data.Reflection.Given ColorMode` constraint, and replace all usages of
`color` and `style` with them, adding the new
constraint to instances.
2022-10-08 22:26:10 +03:00
Anton Sorokin
6d45208211
[#165] Add tests for local references
Problem: We want all cases of local references to be covered by bats tests
(e.g. because we want to be sure that all this works well on Windows),
but there was only one case covered in `check-anchors`

Solution: make a `check-local-refs` folder and test different
types of links with different roots
2022-10-07 18:29:22 +03:00
Sergey Gulin
2b17bb0942
[#180] Make flavor a required parameter
Problem: As of #159 we made all config fields optional. However, it
makes sense to make the `flavor` field mandatory, as it affects
correctness and the user must make a choice here.

Solution: Make `flavor` a required parameter.
2022-10-04 22:54:45 +10:00
Sergey Gulin
2bcbfa1c47
[#156] Trim redundant config fields from test configs
Problem: In #159 we made all config fields optional. Now we can trim
redundant fields from test configs.

Solution: Trim redundant config fields. Also slightly refactor
`overrideVerify` function to make it more readable.
2022-10-04 22:49:45 +10:00
Sergey Gulin
c94ddfcf7d
[#135] CI: add stylish-haskell and shellcheck
Problem: We should add stylish-haskell and shellcheck to our pipeline.

Solution: Add stylish-haskell and shellcheck. Use stylish-haskell on repo.
2022-09-27 19:04:17 +10:00
Anton Sorokin
3df588ac8f
[#155] Footnote syntax support
Problem: we can wrongly report footnotes as broken links (#155),
because footnote support is disabled by default in cmark-gfm

Solution: add `optFootnotes` to `commonMarkToNode`
(this option was recently added to cmark-gfm-hs,
so we need to temporarily pull it from GitHub instead of Hackage)
2022-09-26 11:59:07 +03:00
Anton Sorokin
0d983beada
[#140] Reject unknown fields in yaml config
Problem: when parsing the yaml file, the `FromJSON` instance is used,
and by default it ignores unknown fields, but
we want to get errors instead (issue #140)

Solution: change the `FromJSON` instance for `Config` and the types inside it.
Luckily, they and only they use `aesonConfigOption`
2022-09-25 20:03:57 +03:00
Anton Sorokin
b412781020
[#149] Replace hspec with tasty
Problem: `hspec` and `tasty` are testing frameworks with
almost the same functionality.
For historical reasons, in xrefcheck we used different frameworks
for tests and links-tests, and at Serokell we prefer `tasty` now.

Solution: use only `tasty`,
 rewrite the code that uses `hspec` using the correspondence between
 - `testGroup` and `describe`
 - `testCase` and `it`
 - `@?=` and `shouldBe`
2022-09-25 18:51:41 +03:00
Sergey Gulin
95d5bad3cd
[#133] Refactor golden tests
Problem: We're using a common pattern in our bats tests:
  Run xrefcheck, redirect output to a temp file
  Check the temp file matches some .gold file using `diff`
  Delete temp file
We could encapsulate this pattern and make it easier to reuse.

Solution: In the `setup` function, create a temp directory. In the
`teardown` function, delete the temp directory. Create a `to_temp`
function that runs xrefcheck with desired options, pipes its output
through the `prepare` helper function and saves it in a file inside
the temp directory. Create a `assert_diff` function that reads the temp
file, and uses `diff` to compare it against some expected output.
2022-09-24 23:47:24 +10:00
Sergey Gulin
a99005d731
[#156] Make all config options optional
Problem: In #126 we made the `ignoreRefs` option required (to match the
other options). However, having it optional is better for
backwards-compatibility and to help users migrate to newer xrefcheck
versions.

Solution: Make all config options optional.
2022-09-24 05:51:39 +10:00
Sergey Gulin
c8d19a3f98
[#56] Dump all the errors from different files
Problem: Currently, xrefcheck fails immediately after the first
observed error because `die` is used right in `markdownScanner`. What
we want is dumping all the errors from different markdowns and then
print them as a final xrefcheck's result together with the broken
links. Also, despite the fact that in the `makeError` function we have
4 error messages, 2 of them are not reported, and the test case that
should check this only checks that at least one of the four files
throws an error.

Solution: Make xrefcheck report all errors. Add `ScanError` type
and propagate errors to report all of them, rather than failing
immediately after the first error is detected.
2022-09-23 17:13:50 +10:00
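The change in the [#56] commit from failing on the first error to reporting all of them can be illustrated with `Either` and `partitionEithers`. This is a toy sketch, not the project's scanner: the fields of `ScanError`, the `FileInfo` type, and the failure condition in `scanFile` are all hypothetical.

```haskell
import Data.Either (partitionEithers)
import Data.List (isInfixOf)

-- Hypothetical error type: the offending file plus a description.
data ScanError = ScanError FilePath String
  deriving (Eq, Show)

-- Hypothetical successful scan result.
newtype FileInfo = FileInfo FilePath
  deriving (Eq, Show)

-- Toy scanner: a file fails when it contains an unrecognised annotation.
-- Errors are returned as values instead of terminating the program.
scanFile :: (FilePath, String) -> Either ScanError FileInfo
scanFile (path, contents)
  | "ignore unknown" `isInfixOf` contents =
      Left (ScanError path "unrecognised annotation")
  | otherwise = Right (FileInfo path)

-- Scan every file and collect all errors alongside all successful results.
scanAll :: [(FilePath, String)] -> ([ScanError], [FileInfo])
scanAll = partitionEithers . map scanFile
```

Because every file is scanned before anything is reported, a caller can print the full list of scan errors together with the broken links found in the files that did parse.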
Sergey Gulin
943d0c881b
[#76] Add tests for virtualFiles glob patterns
Problem: The `virtualFiles` config allows the user to use glob patterns
to specify files that do not physically exist in the repository but
should be treated as existing nevertheless. However, we do not yet
have any tests around our usage of glob patterns. We should write some to
1) ensure it behaves in a sensible way even in corner cases and 2)
document the behaviour.

Solution: Add tests that document how the `virtualFiles` glob patterns work.
2022-09-19 17:32:24 +10:00
Sergey Gulin
db77aaa9e4
[#137] Remove checkLocalhost option
Problem: In #85, we added the `checkLocalhost` option to decide
whether to verify links to localhost. However, upon further
reflection, it seems like this could have been subsumed by the
existing `ignoreRefs` option instead.

Solution: Remove `check-localhost` CLI option and `checkLocalhost`
config option. Add a regex matching localhost links to the
`ignoreRefs` field of the default config.
2022-09-16 03:00:53 +10:00
Sergey Gulin
332da5569e
[#77] Add support for glob patterns to ignored and notScanned
Problem: The `virtualFiles` config option supports glob patterns. On the
other hand, `ignored` only supports exact matches and `notScanned`
matches on prefixes. There is also a bug where `ignored` does not
ignore files if they contain broken xrefcheck annotations.

Solution: Add support for glob patterns to `ignored` and
`notScanned`. Filter ignored files before parsing their contents.
2022-09-08 22:49:27 +10:00
Sergey Gulin
a3f2d28216
[#125] Display URL parsing errors
Problem: We use a 2-step process to parse a URL: we use `parseURI` and
then `mkURIBs`. Both of these functions can fail. At the moment, we're
ignoring their errors and simply throwing an `ExternalResourceInvalidUri`,
and then displaying a generic error message to the user.

Solution: Catch errors from `parseURI` and `mkURIBs` and use them to
tell the user why the URL was invalid.
2022-09-08 22:31:12 +10:00
Sergey Gulin
36a1da6473
[#120] Fix bug with ignoring checks for relative anchors
Problem: When a file contains a reference to another file, and that
reference contains an anchor, that anchor is not checked.

Solution: Normalise relative anchor links before check.
2022-09-07 21:48:13 +10:00
Constantine Ter-Matevosian
80b5edd1c7
[#49] Allow certain reserved characters in the URLs
Problem: The current version of xrefcheck doesn't allow the square
brackets and some other special characters, like the angle brackets and
the curly brackets, to be present in the URLs, even in the query
strings, as they need to be percent-encoded first.

Solution: Allow some of the reserved characters, like the brackets, to
be present in the query strings of the URLs.
There exist two main standards of URL parsing: RFC 3986 and the Web
Hypertext Application Technology Working Group's URL standard. Ideally,
we want to be able to parse the URLs in accordance with the latter
standard, because it provides a much less ambiguous set of rules for
percent-encoding special characters, and is essentially a living
standard that gets updated constantly.
We allow these characters to be present in the query strings by using
the `parseURI` function from the `uri-bytestring` library with
`laxURIParseOptions`.
2022-09-06 04:39:40 +10:00
Nurlan Alkuatov
86e17eb3a4
[#99] Support Retry-After headers with dates
Problem: We currently support obtaining `Retry-After` header
values as seconds. However, the HTTP specification states that the header
value can also be a date, e.g: `Wed, 21 Oct 2015 07:28:00 GMT`.

Solution: Support `Retry-After` headers with dates.
2022-09-05 13:17:05 +06:00
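Handling both forms of the `Retry-After` value from the [#99] commit can be sketched as follows. This is an illustration, not the project's parser: the `RetryAfter` type and `parseRetryAfter` function are hypothetical names, the sketch assumes the `time` package is available, and it only accepts the fixed-length HTTP date format shown in the commit message.

```haskell
import Control.Applicative ((<|>))
import Data.Time (UTCTime, defaultTimeLocale, parseTimeM)
import Text.Read (readMaybe)

-- Hypothetical representation of the two allowed header forms.
data RetryAfter
  = RetryInSeconds Int  -- delay in seconds, e.g. "120"
  | RetryAt UTCTime     -- absolute date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
  deriving (Eq, Show)

-- Try the numeric form first, then fall back to the date form.
parseRetryAfter :: String -> Maybe RetryAfter
parseRetryAfter raw = (RetryInSeconds <$> seconds) <|> (RetryAt <$> httpDate)
  where
    -- A non-negative whole number of seconds.
    seconds :: Maybe Int
    seconds = readMaybe raw >>= \n -> if n >= 0 then Just n else Nothing

    -- The date format used in the example from the commit message.
    httpDate :: Maybe UTCTime
    httpDate = parseTimeM True defaultTimeLocale "%a, %d %b %Y %H:%M:%S GMT" raw
```

A caller would turn `RetryAt` into a delay by subtracting the current time, and would still need a fallback for values that match neither form.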
Sergey Gulin
c4f0bfb9fc
[#94] Add support for the id attribute in anchors
Problem: The `name` attribute was deprecated, and web devs are now
encouraged to use the `id` attribute instead. We should add support
for the `id` attribute, while retaining support for the `name`
attribute.

Solution: Add support for the `id` attribute.
2022-08-29 18:41:20 +10:00
Sergey Gulin
114a6138a3
[#126] Remove note ignoreRefs is optional from config
Problem: We made `ignoreRefs` a required option. But the config file
generated with the `dump-config` option still contains a note that
`ignoreRefs` can be omitted.

Solution: Remove the note that says `ignoreRefs` can be omitted.
2022-08-27 04:54:56 +10:00
Sergey Gulin
f198777f57
[#126] Make ignoreRefs a required parameter
Problem: The `ignoreRefs` parameter, in the config file, was
optional. This is in stark contrast with all the other parameters, all
of which are required.

Solution: Make `ignoreRefs` a required parameter in the config file.
2022-08-26 21:48:33 +10:00
Nurlan Alkuatov
96960f7ebf
[#90] Forbid verifying a single file
Problem: Verifying a single file using the `-r` option misbehaves
when there are absolute links present in the file. Since `-r`
option expects a directory, we can just forbid verifying a single
file.

Solution: Fail with an error message when the user tries to
specify a file as the repository's root directory.
2022-07-24 18:59:53 +06:00
Andrei Borzenkov
a6b4513587
[#95] Support HTML tag parsing compatible with HTML spec
Problem: We had a hardcoded HTML tag parser that doesn't work with all valid HTML tags

Solution: Replace it with the `tagsoup` library, which takes care of all the parsing
2022-07-17 20:48:51 +04:00
Andrei Borzenkov
2c41713578
[#104] Add maxRetries to configurable options
Problem: the maxRetries option was hardcoded in the source

Solution: Add it to the verify options in the config and add a new CLI option to override the retry count
2022-07-14 17:25:52 +03:00