Commit Graph

65 Commits

Author SHA1 Message Date
Adrián Enríquez
9421c42421
[#244] Symlink scanner
Problem: As GitHub and GitLab do not render symlinks as the file they
point to, we are considering implementing a new scanner for symlinks
that verifies them to some extent.

Solution: A scanner that validates the reference from a symlink has been
implemented in the same style as the markdown scanner.
2023-02-01 13:06:57 +01:00
Adrián Enríquez
9c2ac77619
[#270] Handle relative redirects
Problem: Currently, Xrefcheck can follow redirects with an absolute
location link, but it cannot handle relative ones.

Solution: After parsing the location link, obtain the corresponding
absolute link by using the original request one.
2023-01-27 22:50:33 +01:00
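The resolution step described in [#270] can be illustrated with a minimal Python sketch. This is not Xrefcheck's Haskell implementation; the function name `resolve_redirect` and the example URLs are hypothetical, and only the standard library's `urljoin` is used.

```python
from urllib.parse import urljoin


def resolve_redirect(request_url: str, location: str) -> str:
    """Return the absolute URL that a redirect's Location value points to."""
    # urljoin leaves an absolute Location untouched and resolves a relative
    # one against the URL of the request that produced the redirect.
    return urljoin(request_url, location)


if __name__ == "__main__":
    base = "https://example.com/docs/old"  # hypothetical request URL
    for location in ("new", "/other", "https://example.org/x"):
        print(location, "->", resolve_redirect(base, location))
```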
Adrián Enríquez
29c1ab1d0c
[#237] Modify coloring options
Problem: We noticed that output was not being colorized in GitLab CI because
of the guesses currently used to decide whether to show colors.

Solution: On the one hand, we extend the current guesses to also enable
coloring by default when the CI env var is set to true. On the other
hand, we also add a new flag, color, which avoids these guesses and
enables colors.
2023-01-23 18:27:12 +01:00
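The decision order described in [#237] might look roughly like the following Python sketch. The function name `should_colorize` is hypothetical, and the final terminal check is an assumed stand-in for the existing guesses, which the commit message does not spell out.

```python
from typing import Mapping, Optional


def should_colorize(
    color_flag: Optional[bool],
    env: Mapping[str, str],
    is_tty: bool,
) -> bool:
    """Decide whether to colorize output.

    color_flag is True or False when the user passed an explicit flag,
    and None when no flag was given.
    """
    # An explicit flag skips all guessing.
    if color_flag is not None:
        return color_flag
    # The commit enables colors by default when the CI env var is "true".
    if env.get("CI") == "true":
        return True
    # Assumption: the remaining guess is whether output goes to a terminal.
    return is_tty


if __name__ == "__main__":
    print(should_colorize(None, {"CI": "true"}, is_tty=False))
```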
Adrián Enríquez
fef5153d3a
[#242] No scan symlinks as md files
Problem: When the repository contains a symlink to a markdown file, it
is processed by xrefcheck as if it were the same markdown file but in the
symlink's location. This leads to broken references and can be avoided
because neither GitHub nor GitLab try to render symlinks as the file
they point to.

Solution: Treat symlinks as non-scannable files. In the future, we
may add a new dedicated scanner for symlinks if that proves
workable.
2023-01-18 13:28:54 +01:00
Adrián Enríquez
c7633755a9
[Chore] Fix changelog typo
Problem: There is a duplicated line in the changelog unreleased section,
which seems to be an accidental copy-paste typo or a bad merge conflict
resolution.

Solution: Delete the aforementioned changelog line.
2023-01-18 13:28:50 +01:00
Adrián Enríquez
0b4ce991a1
[#25] Redirect links with configuration rules
Problem: We previously changed the default behaviour of Xrefcheck when
following link redirects, but did not provide a way to configure it.

Solution: We are adding a new field in the configuration file to allow
writing a list of redirect rules that will be applied to links that
match them.
2022-12-30 17:11:01 +01:00
Adrián Enríquez
b30413dd41
[#254] Revise dump-config command
Problem: xrefcheck does not allow printing the config to stdout instead
of writing it to a file. Also, it is easy to overwrite your changes by
mistake by executing the command again.

Solution: provide a --stdout flag to print the config to stdout, and do
not write it to a file unless a --force flag has been included.
2022-12-23 12:19:25 +01:00
Adrián Enríquez
e8d79e7f14
[#211] Case insensitive anchors
Problem: Some Markdown flavours such as the GitHub one are case
insensitive regarding anchors, but our analysis is currently
case sensitive and it produces false positives.

Solution: Support case-insensitivity depending on the configured
Markdown flavour. Apply this also to ambiguous and similar anchors
detection.
2022-12-13 10:20:32 +01:00
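A minimal Python sketch of the flavour-dependent comparison described in [#211]. The function name `anchor_exists` and the boolean `case_insensitive` parameter are hypothetical; which flavours set it is left to the caller, since the commit message only names GitHub as case-insensitive.

```python
from typing import Iterable


def anchor_exists(anchor: str, known_anchors: Iterable[str], case_insensitive: bool) -> bool:
    """Check whether a referenced anchor matches one of the anchors in a file."""
    if case_insensitive:
        # Compare case-folded forms so "#Install" matches "#install".
        folded = {known.casefold() for known in known_anchors}
        return anchor.casefold() in folded
    # Case-sensitive flavours require an exact match.
    return anchor in set(known_anchors)


if __name__ == "__main__":
    anchors = ["install", "usage"]
    print(anchor_exists("Install", anchors, case_insensitive=True))
    print(anchor_exists("Install", anchors, case_insensitive=False))
```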
Adrián Enríquez
9c5f5f82b7
[#218] Change redirects default behaviour
Problem: Xrefcheck currently always follows redirect links.

Solution: We are changing its default behaviour regarding redirect
links to fail and report permanent redirects, and to pass for temporary
redirects. Further PRs will allow the user to configure other policies.
2022-12-12 10:19:01 +01:00
Adrián Enríquez
5ff63f342a
[#228] Changelog entry
2022-12-01 12:00:49 +01:00
Adrián Enríquez
347f0eecd1
[#228] Rename local file reference tag
Problem: We have found that the current tag for local file references, current file, may lead to ambiguities.

Solution: Rename the tag that we use for local file references to be file-local instead.
2022-12-01 11:54:32 +01:00
Adrián Enríquez
d41a07e7bc
[#202] Remove poorly supported unicode symbols from the program output
Problem: We are using unicode symbols as visual clues in the program output that are not commonly supported and are therefore not always displayed as intended.

Solution: Remove the usage of these symbols, as the program output is already using other visual clues and the result will remain understandable for the user.
2022-12-01 09:47:51 +01:00
Anton Sorokin
fb77575b0b
[#164] Add workflow for running Windows tests on CI
Problem: we are not testing the behavior of xrefcheck on Windows.

Solution: add a workflow to run
golden and tasty tests on CI
via the GitHub Actions Windows runner.
Some subproblems appear:

1.
Problem: the CI build fails because it needs the `pcre` package
Solution: add it (somehow), see `install pacman dependencies`
in ci.yml

2.
Problem: network errors are displayed differently on different platforms
Solution: collect output from both and use
`assert_diff expected_linux.gold || assert_diff expected_windows.gold`

3.
Problem: the "Config matches" test is failing because the checkout action
clones files with CRLF, and the test asserts equality of two ByteStrings
Solution: manually remove CR
2022-11-30 21:00:58 +02:00
Anton Sorokin
1c0fbfef95
[#200] Add --include-untracked CLI option
Problem: xrefcheck checks only files that are tracked by Git,
but sometimes we want to run xrefcheck on
files without adding them to Git, e.g. when we want to
test some generator of markdown files or when we actively
create markdown files during development.

Solution: add an option to treat files that are neither
added to Git nor ignored as existing.
2022-11-17 15:38:42 +02:00
Sorokin-Anton
59cb36cb38
[Chore] Fix changelog entry (#205)
2022-10-26 13:30:47 +03:00
Anton Sorokin
23b52729b1
[#169] Rename ignore file annotation to ignore all
Problem: as in #169, the `ignore file` annotation ignores
not the file itself but all links in the file, which is not obvious

Solution: rename it to `ignore all`

Also renamed `IMFile :: IgnoreMode` and `IMSFile :: IgnoreModeState`
to `IMAll` and `IMSAll`
2022-10-26 11:31:52 +03:00
Sergey Gulin
1740be676a
[#92] Add support for image links
Problem: We should add support for image links.

Solution: Extract image links as regular links.
2022-10-26 09:37:56 +10:00
Sergey Gulin
9951c171df
[#171] Rename exclusion-related config options
Problem: The behaviours of `ignoreRefs`, `virtualFiles`, `notScanned` and
`ignored` are closely related. We need to make intent of these more
obvious to the user.

Solution: Rename `ignoreRefs`, `virtualFiles`, `notScanned` and
`ignored` to `ignoreExternalRefsTo`, `ignoreLocalRefsTo`,
`ignoreRefsFrom` and `ignore`. Also, update their yaml comments in
default config file.
2022-10-26 09:06:39 +10:00
Sergey Gulin
013457abcc
[#170] [#119] Reorganize top-level config keys
Problem: At the moment, the config yaml is organized in 3 top-level
keys: `traversal`, `verification` and `scanners`. However, the distinction
between the "traversal" and the "verification" stages is not relevant
to the user. This is entirely an internal concern.

Solution: Reorganize yaml config options under `exclusions`, `networking`
and `scanners`.
2022-10-26 07:39:01 +10:00
Anton Sorokin
ef0e26029a
[#157] Add support for autolinks
Problem: GitHub renders "implicit" links like `visit www.google.com` as links,
but we don't check them, since cmark-gfm renders them as text

Solution: add the `autolink` extension to `cmark-gfm`, so both
`www.google.com` and `https://google.com`
will be parsed as links and successfully verified
2022-10-25 22:20:27 +03:00
Diogo Castro
94d55601ae
v0.2.2 release
2022-10-25 10:27:27 +01:00
Anton Sorokin
602e6a8ec9
[#150] Change ignore link behavior
Problem:
Currently the `ignore link` annotation applies to the first link after it
(anywhere in the file). That can be bad for the user,
e.g. they may forget to delete this annotation, and it then ignores a link in
some random place.

Solution:
Throw scan errors when `ignore link` is not followed by a node with a link.
To do this, we need to increase the amount of context in `ScannerM`
2022-10-25 11:20:36 +03:00
Anton Sorokin
c8a053a139
[#187] Check globs in config fields and CLI args
Problem: a user may think that globs in e.g. the config field `ignored`
have the same root as all markdown links, so that adding e.g. `/scripts/*`
to the ignored list means the link `[a](/scripts/*)` will not be checked. But we were
interpreting this glob as a filesystem-level absolute path
(so this link would still be checked). Also, we pretend to check the correctness
of user-supplied globs, but we are using the glob parser in a mode that always
succeeds (and somehow transforms bad globs into valid ones).

Solution: add a function `mkGlobPattern` that throws a readable error
when it sees absolute paths (like `/scripts/*`) or malformed globs (like `<a>`),
and use it for parsing all globs
in config fields and CLI args. Also remove the redundant "canonization" of globs
2022-10-24 17:17:41 +03:00
Sergey Gulin
bfbe20a5b0
[#139] Ignore build-related files
Problem: At the moment, we're using the ignored option mainly for 2
purposes: 1) to ignore all files in the `.git` folder (`.git/**/*`) and 2) to
ignore all build-related temporary files (the default config ignores
`.stack-work/**/*`). A more robust alternative might be to ignore all
files implicitly ignored by git.

Solution: Use `git ls-files` to ignore all files implicitly ignored by git.
2022-10-21 22:07:00 +10:00
Anton Sorokin
8285cea456
[#138] Report links that escape repo directory
Problem: as in #138, when we see a local link, we only check the
existence of the referenced file,
without checking that this file is part of the repo
and that the link will be compatible with GitHub's renderer

Solution: manually count the "nesting levels" of all local links,
checking that the number of `".."`s is always less
than the number of real directories
2022-10-10 13:19:55 +03:00
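The nesting-level count from [#138] can be sketched in Python as follows. This is an illustration under assumptions, not the project's code: the function name `escapes_repo` is hypothetical, the link is assumed to be relative, and `file_dir` is assumed to be an already normalized directory path relative to the repository root.

```python
def escapes_repo(file_dir: str, link: str) -> bool:
    """Return True if following `link` from `file_dir` leaves the repo root.

    file_dir: directory of the referencing file, relative to the repo
              root ("" for the root itself), assumed to contain no "..".
    link:     relative path part of a local link.
    """
    # Start at the nesting level of the directory containing the file.
    depth = len([part for part in file_dir.split("/") if part not in ("", ".")])
    for part in link.split("/"):
        if part in ("", "."):
            continue  # empty and "." segments do not change the level
        if part == "..":
            depth -= 1
            if depth < 0:
                return True  # climbed above the repository root
        else:
            depth += 1
    return False


if __name__ == "__main__":
    print(escapes_repo("docs", "../README.md"))
    print(escapes_repo("", "../outside.md"))
```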
Anton Sorokin
d096b56951
Automatically disable coloring if it is not supported
Problem: users need to manually disable ANSI coloring when their terminal
does not support it. Also, we need to explicitly disable colors
if we want to redirect output to a file during tests

Solution: use `System.Console.Pretty.supportsPretty`
to detect whether our terminal supports colors
2022-10-08 22:26:11 +03:00
Anton Sorokin
847b21bfbc
[#185] Add CLI option to disable output coloring
Problem: the output of xrefcheck contains ANSI-colored text,
which is bad when we redirect output to a file
or when our terminal does not support colors.
Colorizing is performed in the `Buildable` instances of various types,
so we can't just pass some extra flag here

Solution: add the CLI option `--no-color`.
Create `colorIfNeeded` and `styleIfNeeded` functions that have a
`Data.Reflection.Given ColorMode` constraint, and replace all usages of
`color` and `style` with them, adding the new
constraint to instances.
2022-10-08 22:26:10 +03:00
Anton Sorokin
fd96731b47
[#166] Rename references to current file
Problem: as in #166, we are wrongly calling links to the current file,
e.g. `[a](#heading)`, "local", which is easily confused with "local"
meaning "link to anything in the current repository".

Solution: call them "links to current file"
and rename constructor `LocalLoc` to `CurrentFileLoc`
2022-10-07 15:02:16 +03:00
Sergey Gulin
2b17bb0942
[#180] Make flavor a required parameter
Problem: As of #159 we made all config fields optional. However, it
makes sense to make the `flavor` field mandatory, as it affects
correctness and the user must make a choice here.

Solution: Make `flavor` a required parameter.
2022-10-04 22:54:45 +10:00
Diogo Castro
1964f57280
[#162] Do not cancel the progress bar thread
Problem: The output of xrefcheck sometimes appears as a confusing mess,
see here for details and examples:
https://github.com/serokell/xrefcheck/issues/162

The culprit seems to be the `withAsync` used in `verifyRepo`. This
spawns a thread that prints and refreshes the progress bar, while the
main thread coordinates the verification of references.

The problem here is that, as the docs of `withAsync` explain, when the
main thread finishes/throws, the spawned thread will be cancelled with
`uninterruptibleCancel`.

If the thread is in the middle of updating the progress bar (printing
carriage returns, writing over an existing line, printing control
characters to change the text color, etc), it'll be abruptly interrupted
and will not be given the chance to finish cleanly.

Solution: replace `withAsync` with `loopAsyncUntil`. This lets the
printer thread finish what it's currently doing.
2022-09-26 14:12:57 +01:00
Anton Sorokin
3df588ac8f
[#155] Footnote syntax support
Problem: we can wrongly report footnotes as broken links (#155),
because footnote support is disabled by default in cmark-gfm

Solution: add `optFootnotes` to `commonMarkToNode`
(this option was recently added to cmark-gfm-hs,
so we need to temporarily pull it from GitHub instead of Hackage)
2022-09-26 11:59:07 +03:00
Anton Sorokin
0d983beada
[#140] Reject unknown fields in yaml config
Problem: when parsing the yaml file, the `fromJSON` instance is used,
and by default it ignores unknown fields, but
we want to get errors instead (issue #140)

Solution: change the `fromJSON` instance for `Config` and the types inside it.
Luckily, they and only they use `aesonConfigOption`
2022-09-25 20:03:57 +03:00
Diogo Castro
0871e29907
v0.2.1 release
2022-09-24 08:30:53 +01:00
Sergey Gulin
a99005d731
[#156] Make all config options optional
Problem: In #126 we made the `ignoreRefs` option required (to match the
other options). However, having it optional is better for
backwards-compatibility and to help users migrate to newer xrefcheck
versions.

Solution: Make all config options optional.
2022-09-24 05:51:39 +10:00
Sergey Gulin
c8d19a3f98
[#56] Dump all the errors from different files
Problem: Currently, xrefcheck fails immediately after the first
observed error because `die` is used right in `markdownScanner`. What
we want is to collect all the errors from the different Markdown files
and then print them as part of xrefcheck's final result, together with
the broken links. Also, despite the fact that in the `makeError` function we have
4 error messages, 2 of them are not reported, and the test case that
should check this only checks that at least one of the four files
throws an error.

Solution: Make xrefcheck report all errors. Add a `ScanError` type
and propagate errors to report all of them, rather than failing
immediately after the first error is detected.
2022-09-23 17:13:50 +10:00
Diogo Castro
0a4b70f1e8
[Chore] Fix changelog
Problem: A `0.2.1` section was added to the changelog by mistake in #68,
even though no release was made. This was corrected and undone in #100.
It seems it was added back in again in #88, most likely due to a rebase
mixup.

Solution: Delete this section again.
2022-09-21 11:23:42 +01:00
Sergey Gulin
db77aaa9e4
[#137] Remove checkLocalhost option
Problem: In #85, we added the `checkLocalhost` option to decide
whether to verify links to localhost. However, upon further
reflection, it seems like this could have been subsumed by the
existing `ignoreRefs` option instead.

Solution: Remove `check-localhost` CLI option and `checkLocalhost`
config option. Add a regex matching localhost links to the
`ignoreRefs` field of the default config.
2022-09-16 03:00:53 +10:00
Sergey Gulin
332da5569e
[#77] Add support for glob patterns to ignored and notScanned
Problem: The `virtualFiles` config option supports glob patterns. On the
other hand, `ignored` only supports exact matches and `notScanned`
matches on prefixes. There is also a bug where `ignored` does not
ignore files if they contain broken xrefcheck annotations.

Solution: Add support for glob patterns to `ignored` and
`notScanned`. Filter ignored files before parsing their contents.
2022-09-08 22:49:27 +10:00
Sergey Gulin
a3f2d28216
[#125] Display URL parsing errors
Problem: We use a 2-step process to parse a URL: we use `parseURI` and
then `mkURIBs`. Both of these functions can fail. At the moment, we're
ignoring their errors and simply throwing an `ExternalResourceInvalidUri`,
and then displaying a generic error message to the user.

Solution: Catch errors from `parseURI` and `mkURIBs` and use them to
tell the user why the URL was invalid.
2022-09-08 22:31:12 +10:00
Sergey Gulin
36a1da6473
[#120] Fix bug with ignoring checks for relative anchors
Problem: When a file contains a reference to another file, and that
reference contains an anchor, that anchor is not checked.

Solution: Normalise relative anchor links before check.
2022-09-07 21:48:13 +10:00
Constantine Ter-Matevosian
80b5edd1c7
[#49] Allow certain reserved characters in the URLs
Problem: The current version of xrefcheck doesn't allow the square
brackets and some other special characters, like the angle brackets and
the curly brackets, to be present in the URLs, even in the query
strings, as they need to be percent-encoded first.

Solution: Allow some of the reserved characters, like the brackets, to
be present in the query strings of the URLs.
There exist two main standards of URL parsing: RFC 3986 and the Web
Hypertext Application Technology Working Group's URL standard. Ideally,
we want to be able to parse the URLs in accordance with the latter
standard, because it provides a much less ambiguous set of rules for
percent-encoding special characters, and is essentially a living
standard that gets updated constantly.
We allow these characters to be present in the query strings by using
the `parseURI` function from the `uri-bytestring` library with
`laxURIParseOptions`.
2022-09-06 04:39:40 +10:00
Nurlan Alkuatov
86e17eb3a4
[#99] Support Retry-After headers with dates
Problem: We currently support obtaining `Retry-After` header
values as seconds. However, the http specs state that the header
value can be also a date, e.g: `Wed, 21 Oct 2015 07:28:00 GMT`.

Solution: Support `Retry-After` headers with dates.
2022-09-05 13:17:05 +06:00
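The two `Retry-After` forms mentioned in [#99] can be handled as in this Python sketch. It is an illustration only, not the project's Haskell code; the function name `retry_after_seconds` is hypothetical, and `now` is assumed to be a timezone-aware datetime supplied by the caller.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: str, now: datetime) -> Optional[float]:
    """Convert a Retry-After header value into a delay in seconds.

    Returns None when the value is neither a number of seconds nor a date.
    """
    value = value.strip()
    # First form: a non-negative integer number of seconds.
    if value.isascii() and value.isdigit():
        return float(value)
    # Second form: an HTTP date such as "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        # Assumption: treat a date without zone information as UTC.
        when = when.replace(tzinfo=timezone.utc)
    # A date in the past means there is nothing left to wait for.
    return max(0.0, (when - now).total_seconds())


if __name__ == "__main__":
    current = datetime(2015, 10, 21, 7, 27, 0, tzinfo=timezone.utc)
    print(retry_after_seconds("120", current))
    print(retry_after_seconds("Wed, 21 Oct 2015 07:28:00 GMT", current))
```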
Sergey Gulin
ea88032a95
[#94] Changelog update: Add support for the id attribute in anchors
Problem: We added support for the `id` attribute in anchors. Since
this is a "user-facing change", we should add an entry to `CHANGES.md`.

Solution: Add an entry to `CHANGES.md`.
2022-08-29 18:48:11 +10:00
Sergey Gulin
81ebde0d99
[#126] Changelog update: Make ignoreRefs a required parameter
Problem: We made `ignoreRefs` a required option. Since this
is a "user-facing change", we should add an entry to `CHANGES.md`.

Solution: Add an entry to `CHANGES.md`.
2022-08-29 17:32:50 +10:00
Nurlan Alkuatov
96960f7ebf
[#90] Forbid verifying a single file
Problem: Verifying a single file using the `-r` option misbehaves
when there are absolute links present in the file. Since the `-r`
option expects a directory, we can just forbid verifying a single
file.

Solution: Fail with an error message when the user tries to
specify a file as the repository's root directory.
2022-07-24 18:59:53 +06:00
Diogo Castro
e5be7a4321
[Chore] Add missing changelog entries
2022-07-22 16:51:28 +01:00
Constantine Ter-Matevosian
032395007b
[#31] Handle the "429 too many requests" errors
Problem: The current version of xrefcheck handles the HTTP responses
with the 429 status code just like every other error, when it is
possible to try and eliminate the occurrences of such errors within the
program itself.

Solution: Each time a request for a given link results in a 429 error,
retrieve the Retry-After value, which describes the delay (in seconds),
from the headers of the HTTP response or, if the Retry-After header is
absent, use a configurable default value, and rerun the request after
that delay has passed. Only once the number of retries has reached its
limit, which is currently hardcoded and not configurable, is the 429
error treated as 'unfixable' and no further attempts are made to
eliminate it.

Additionally, the progress bar has been upgraded and the following
elements are supplied:
1. an extra color -- Blue -- indicating the errors that might get
   eliminated during the verification;
2. a timer with the number of seconds left to wait before the request is
   restarted; if, during the verification, a new 429 error emerges
   with a new Retry-After value greater than or equal to the
   elapsed time, the timer is immediately updated with that value and
   begins ticking down each second from scratch.
2022-07-14 17:25:52 +03:00
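The retry policy described in [#31] can be sketched in Python like this. It is an illustration under assumptions, not the project's Haskell code: the function name `fetch_with_retries`, the default values, and the injected `fetch` and `sleep` callables are all hypothetical, and header lookup assumes the exact key `Retry-After`.

```python
import time
from typing import Callable, Mapping, Tuple

# A response is modelled as (status code, headers) for this sketch.
Response = Tuple[int, Mapping[str, str]]


def fetch_with_retries(
    fetch: Callable[[], Response],
    max_retries: int = 3,          # hypothetical limit
    default_delay: float = 10.0,   # hypothetical default delay in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Repeat a request while it answers 429, honouring Retry-After."""
    retries = 0
    while True:
        status, headers = fetch()
        # Anything other than 429 is final; so is 429 once retries run out.
        if status != 429 or retries >= max_retries:
            return status
        raw = headers.get("Retry-After", "")
        # Use the header's seconds value, or the default when it is absent.
        delay = float(raw) if raw.isdigit() else default_delay
        sleep(delay)
        retries += 1
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a caller that wants real delays can rely on the `time.sleep` default.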
Андреев Кирилл
a9a156ef41
Undo errorneous version increment
2021-11-09 15:50:13 +04:00
Constantine Ter-Matevosian
b9e7ffb99d
[#75] Fix the root with an appended slash support
Problem: The results of the repository analysis will always contain
invalid references if the root contains a trailing forward slash.

Solution: Strip the root's trailing slash (if present) before passing it
as an argument to the System.FilePath.Posix.takeDirectory
function.
2021-10-26 15:16:47 +03:00
Constantine Ter-Matevosian
82accc132c
[#74] Detect and handle duplicate links during verification
Problem: All the duplicate external links get verified independently,
which is wasteful.

Solution: Store the verification results in the map. Verify the link
once if it hasn't been verified already, and insert the results
into the map; alternatively, if the link has been verified before,
return the results by retrieving the respective value from the map.
The traversal of all the filepath-reference pairs is done via
'forConcurrentlyCaching', which is semantically similar to
Control.Concurrent.Async.forConcurrently, except it stores the
action results in a map (in our case, we need to store the
verification results of only the external references).
2021-10-08 19:29:11 +03:00
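The caching idea from [#74] can be sketched in Python as below. Unlike the `forConcurrentlyCaching` traversal named in the commit message, this sketch is sequential and only illustrates the map lookup; the function name `verify_all` and the `verify` callable are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def verify_all(
    references: Iterable[Tuple[str, str]],
    verify: Callable[[str], bool],
) -> List[Tuple[str, str, bool]]:
    """Verify (file path, external link) pairs, checking each link once."""
    cache: Dict[str, bool] = {}
    results: List[Tuple[str, str, bool]] = []
    for path, link in references:
        if link not in cache:
            # First occurrence: run the verification and remember the result.
            cache[link] = verify(link)
        # Later occurrences reuse the stored result from the map.
        results.append((path, link, cache[link]))
    return results


if __name__ == "__main__":
    refs = [("a.md", "https://example.com"), ("b.md", "https://example.com")]
    print(verify_all(refs, lambda link: True))
```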