95 Commits

Author SHA1 Message Date
Adrián Enríquez
86d9409e88
[Chore] Fix symlinks test on Windows Git Bash (#283) 2023-05-15 12:43:03 +02:00
Adrián Enríquez
9421c42421
[#244] Symlink scanner
Problem: As GitHub and GitLab do not render symlinks as the file they
point to, we are considering implementing a new scanner for symlinks
that verifies them to some extent.

Solution: A scanner that validates the reference from a symlink has been
implemented in the same style as the markdown scanner.
2023-02-01 13:06:57 +01:00
Adrián Enríquez
9c2ac77619
[#270] Handle relative redirects
Problem: Currently, Xrefcheck can follow redirects with an absolute
location link, but it cannot handle relative ones.

Solution: After parsing the location link, obtain the corresponding
absolute link by using the original request one.
2023-01-27 22:50:33 +01:00
Adrián Enríquez
236a006230
[#239][#249] Add path case sensitivity tests
Problem: We currently have golden tests for anchor case-sensitivity, but
not for path case-sensitivity.

Solution: A golden test for path case-sensitivity has been added and the
previous one has been renamed.
2023-01-27 19:51:14 +01:00
Adrián Enríquez
f16f95cc40
[#239][#249] Parse quoted git ls-files
Problem: When a file name contains unusual characters, its output line
in git ls-files is quoted and Xrefcheck parses the file name wrong.

Solution: Use the git ls-files -z option to output lines verbatim with
null character line terminators. The file not found message has also
been improved for the case in which its link contains a backslash.
2023-01-27 19:51:14 +01:00
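A minimal sketch of the idea behind f16f95cc40, assuming the output of `git ls-files -z` is already available as a `String`. The helper name `splitNul` is hypothetical and is not taken from the xrefcheck codebase; it only shows why NUL terminators make parsing unambiguous even for file names with unusual characters.

```haskell
-- Hypothetical helper (not xrefcheck's actual code): split the
-- NUL-terminated output of `git ls-files -z` into file names.
-- With -z, names are emitted verbatim, so no unquoting is needed.
splitNul :: String -> [String]
splitNul s = case break (== '\0') s of
  -- No terminator left: keep a trailing name only if it is non-empty.
  (name, [])       -> [name | not (null name)]
  -- Found a terminator: emit the name and continue after it.
  (name, _ : rest) -> name : splitNul rest
```

For example, `splitNul "a.md\0dir/b c.md\0"` yields the two names `a.md` and `dir/b c.md`.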
Adrián Enríquez
71589e5f5a
[#239][#249] Simplify progress bar interface
Problem: After changing the progress bar interface to require progress
unit witnesses, some functions related to an anonymous task timestamp
also started to require a progress unit witness, which complicates its
usage unnecessarily.

Solution: Do not require a progress unit witness for getTaskTimestamp.
2023-01-27 19:51:14 +01:00
Adrián Enríquez
eea6118476
[#239][#249] Further filepath refactor
Problem: After refactoring the FilePath usages in the codebase to have a
canonical representation of them, we noticed that further improvements
could be applied, such as clarifying whether the path is system
dependent and avoiding absolute file system paths.

Solution: We now use POSIX relative paths during the analysis, and
system dependent ones for reading file contents in the scan phase.
2023-01-27 19:51:04 +01:00
Adrián Enríquez
29c1ab1d0c
[#237] Modify coloring options
Problem: We noticed that output was not being colorized in GitLab CI due
to the currently implemented guesses for whether to show colors or not.

Solution: On the one hand, we extend the current guesses to also enable
coloring by default when the CI env var is set to true. On the other
hand, we also add a new flag, color, which avoids these guesses and
enables colors.
2023-01-23 18:27:12 +01:00
Adrián Enríquez
62ad4bf3da
[#235] Refactor progress bar interface
Problem: The current progress bar interface is quite low-level and
error-prone.

Solution: The proposed solution here modifies both the progress bar
internal model and interface by representing progress units by
witnesses. It does not require great changes, but the resulting
interface is more clear and less error-prone.
2023-01-20 10:44:49 +01:00
Adrián Enríquez
fef5153d3a
[#242] No scan symlinks as md files
Problem: When the repository contains a symlink to a markdown file, it
is processed by xrefcheck as if it was the same markdown file but in the
symlink's location. This leads to broken references and can be avoided
because neither GitHub nor GitLab try to render symlinks as the file
they point to.

Solution: Treat symlinks as non-scannable files. In the future, we
will consider adding a new dedicated scanner for symlinks if it
works.
2023-01-18 13:28:54 +01:00
Adrián Enríquez
eccb6d0068
[#25] Handle 304 redirect
Problem: After recent work on Xrefcheck redirect behavior, it remained
to discuss how to handle 304 redirects.

Solution: With the current default configuration, 304 redirects are
considered as valid, which seems appropriate taking into account that
304 responses usually mean that you previously received a successful
response for that same request and there is no need to retransmit the
resource again. We add a test case for this default config and update
the FAQ section.
2023-01-16 18:51:59 +01:00
Adrián Enríquez
05fe537ae1
[Chore] Simplify regexp usages 2022-12-30 17:12:16 +01:00
Adrián Enríquez
0b4ce991a1
[#25] Redirect links with configuration rules
Problem: We previously changed the default behaviour of Xrefcheck when
following link redirects, but did not provide a way to configure it.

Solution: We are adding a new field in the configuration file to allow
writing a list of redirect rules that will be applied to links that
match them.
2022-12-30 17:11:01 +01:00
YuriRomanowski
a4dc29bf2a
[#217] Retry on response timeout (#234)
Problem: Currently, a response timeout immediately results in a
failure; it is desirable to be able to configure retries on
timeouts.

Solution: The new ExternalHttpTimeout error is added, which is treated
in a similar way as the ExternalHttpTooManyRequests error.
A new field is added to the config that specifies how many timeouts are
allowed. The default value is 1.
2022-12-29 21:59:48 +05:00
Adrián Enríquez
c9486e7ac6
[Chore] Update comment and error message 2022-12-23 15:17:13 +01:00
Adrián Enríquez
b30413dd41
[#254] Revise dump-config command
Problem: xrefcheck does not allow printing the config to stdout instead
of writing it to a file. Also, it is easy to overwrite your changes by
mistake by executing the command again.

Solution: provide a --stdout flag to print the config to stdout, and do
not write it to a file unless a --force flag has been included.
2022-12-23 12:19:25 +01:00
Adrián Enríquez
0886062500
[#197] Canonicalize filepaths
Problem: the current usage of filepaths is error-prone and can be
simplified.

Solution: canonicalize filepaths at the boundaries, so their management
will be safer and will simplify the codebase.
2022-12-22 16:29:23 +01:00
Adrián Enríquez
7457a6b109
[#211] Specify language in a golden test
Problem: We have a Golden test that expects an output in English
and fails if a different language is configured.

Solution: Explicitly configure the language before running the
corresponding test.
2022-12-13 10:20:59 +01:00
Adrián Enríquez
e8d79e7f14
[#211] Case insensitive anchors
Problem: Some Markdown flavours such as the GitHub one are case
insensitive regarding anchors, but our analysis is currently
case sensitive and it produces false positives.

Solution: Support case-insensitivity depending on the configured
Markdown flavour. Apply this also to ambiguous and similar anchors
detection.
2022-12-13 10:20:32 +01:00
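A minimal sketch of the comparison described in e8d79e7f14. The `CaseSensitivity` type and `anchorMatches` function are hypothetical names introduced for illustration; how each Markdown flavour maps to a case-sensitivity setting is left out, since the commit message does not spell it out.

```haskell
import Data.Char (toLower)

-- Hypothetical setting, assumed to be derived from the configured flavour.
data CaseSensitivity = CaseSensitive | CaseInsensitive

-- Compare an anchor from a link with an anchor found in the target file.
anchorMatches :: CaseSensitivity -> String -> String -> Bool
anchorMatches CaseSensitive   a b = a == b
anchorMatches CaseInsensitive a b = map toLower a == map toLower b
```

The same comparison function could then be reused for the ambiguous and similar anchor detection that the commit mentions.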
Adrián Enríquez
9c5f5f82b7
[#218] Change redirects default behaviour
Problem: Xrefcheck currently always follows redirect links.

Solution: We are changing its default behaviour regarding redirect
links to fail and report permanent redirects, and to pass for temporary
redirects. Further PRs will allow the user to configure other policies.
2022-12-12 10:19:01 +01:00
Adrián Enríquez
347f0eecd1
[#228] Rename local file reference tag
Problem: We have found that the current tag for local file references, current file, may lead to ambiguities.

Solution: Rename the tag that we use for local file references to be file-local instead.
2022-12-01 11:54:32 +01:00
Adrián Enríquez
d41a07e7bc
[#202] Remove poorly supported unicode symbols from the program output
Problem: We are using unicode symbols as visual clues in the program output that are not commonly supported and are therefore not always displayed as intended.

Solution: Remove the usage of these symbols, as the program output is already using other visual clues and the result will remain understandable for the user.
2022-12-01 09:47:51 +01:00
Anton Sorokin
fb77575b0b
[#164] Add workflow for running Windows tests on CI
Problem: we are not testing behavior of xrefcheck on Windows

Solution: add a workflow to run
golden and tasty tests on CI
via github-actions windows runner
Some subproblems appear:

1.
Problem: CI build fails because it needs the `pcre` package
Solution: add it (somehow), see `install pacman dependencies`
in ci.yml

2.
Problem: Network errors are displayed differently on different platforms
Solution: collect output from both and use
`assert_diff expected_linux.gold || assert_diff expected_windows.gold`

3.
Problem: "Config matches" test is failing because the checkout action
clones files with CRLF, and the test asserts equality of two ByteStrings
Solution: manually remove CR
2022-11-30 21:00:58 +02:00
Anton Sorokin
7115c657ea
[#223] Use nyan-interpolation for defConfigText
Problem:
We have a function `defConfigText :: Flavor -> ByteString` that
uses `fillHoles` to modify `defConfigUnfilled`.
This is a somewhat error-prone and very complicated way to have a
`ByteString` with parametric blocks. Also using `ByteString`
instead of `Text` to store text leads to CRLF-related issues when
launched on Windows.

Solution:
Remove `fillHoles` and `defConfigUnfilled`,
`defConfigText` creates a `Text` using `nyan-interpolation`.
2022-11-29 16:52:13 +02:00
Anton Sorokin
1c0fbfef95
[#200] Add --include-untracked CLI option
Problem: xrefcheck checks only files that are tracked by Git,
but sometimes we want to run xrefcheck on
files without adding them to Git, e.g. when we want to
test some generator of markdown files or when we actively
create markdown files during development.

Solution: add option to treat files that were neither
added to git nor ignored as existing.
2022-11-17 15:38:42 +02:00
Anton Sorokin
da25917fa6
[#200] Warnings about files that weren't added to git yet
Problem: after 0.2.2 release, xrefcheck cares only about files
that were added to Git. That can be confusing for users (see #200)

Solution:
If a scannable (currently it means markdown) file is not ignored
(by git or via config) and not tracked by git, print a warning to
stderr while scanning repo.

If a link targets such a file, change the error message from "file not exists"
to `Link target is not tracked by Git`

Suggest that the user run "git add" before running xrefcheck in both cases.

To do this, I've changed the `RepoInfo` type, so it also contains
information about untracked files now.
2022-11-17 15:32:02 +02:00
Anton Sorokin
8012dc94d3
[#213] Do not print trailing whitespaces
Problem: bats tests are not space sensitive
Solution: remove trailing spaces from xrefcheck output
(see next problems), remove `--ignore-trailing-space`
from `assert_diff`

Problem: there are lines containing only spaces in
xrefcheck's output, because `Fmt.indentF` "indents"
empty lines too.
Solution: add an `Xrefcheck.Util.Interpolate.interpolateIndentF`
function that does not indent empty lines.
Same for `Fmt.blockListF` and `Fmt.blockListF'`.
These functions do not add trailing newlines, so they are
easier to use in interpolation blocks.

Problem: when there is a current file link `[a](#b)`, it is
printed like
```
- text: "a"
- link: (trailing space here)
- anchor: b
```
Solution: like with anchors, print `link: -` instead
2022-11-10 15:10:59 +02:00
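A minimal sketch of the indentation behaviour described in 8012dc94d3, written against plain `String` rather than the `Fmt` builder types. The name `indentNonEmpty` is hypothetical; it is not the actual `interpolateIndentF` implementation, only an illustration of skipping empty lines and not appending a trailing newline.

```haskell
import Data.List (intercalate)

-- Hypothetical String-based analogue: indent every non-empty line by
-- n spaces, leave empty lines untouched, and add no trailing newline.
indentNonEmpty :: Int -> String -> String
indentNonEmpty n = intercalate "\n" . map indentLine . lines
  where
    indentLine "" = ""                     -- no whitespace-only lines
    indentLine l  = replicate n ' ' ++ l
```

With this approach, indenting `"a\n\nb"` by two spaces leaves the middle line empty instead of producing a line with two spaces.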
Anton Sorokin
1476bd3435
[#208] Remove excessive newlines from output
Problem: xrefcheck's output contains many redundant newlines, so
it takes more display space than it could.
For the list of places where such newlines appear, see #208

Solution: don't print redundant newlines, so output is more compact.
Since tests are newline-sensitive, I've checked that now there
are no extra (e.g. 2 adjacent) blank lines in `expected` part of tests.
2022-11-04 14:27:19 +02:00
Anton Sorokin
82bf996615
[#201] Use nyan-interpolation for building error messages
Problem:
We often need to create large strings, and we use different
fmt tools for this (by-hand concatenation, unlinesF, etc).
Sometimes it is unclear or too heavy, and it always can
be called error-prone

Solution: use `int` quasiquoter to build large strings and
have nice-looking and easy-to-read code
2022-11-03 17:39:10 +02:00
Diogo Castro
e93a21e18a
[#206] Make bats tests space-sensitive
Problem: Right now, our bats tests ignore empty lines and
leading/trailing whitespace differences between the expected output and
the actual output.

However, this could lead to accidental bugs in xrefcheck's output.

Trailing whitespace isn't very concerning (except when it's excessive
and it causes the terminal to line-wrap), but additional/missing empty
lines and leading whitespace can lead to significant changes.

Solution: Let's make these tests sensitive to empty lines and leading
whitespace.
2022-10-26 13:14:45 +01:00
Anton Sorokin
23b52729b1
[#169] Rename ignore file annotation to ignore all
Problem: as in #169, the `ignore file` annotation ignores
not the file itself but all links in the file, which is not obvious

Solution: rename it to `ignore all`

Also renamed `IMFile :: IgnoreMode` and `IMSFile :: IgnoreModeState`
to `IMAll` and `IMSAll`
2022-10-26 11:31:52 +03:00
Sergey Gulin
1740be676a
[#92] Add support for image links
Problem: We should add support for image links.

Solution: Extract image links as regular links.
2022-10-26 09:37:56 +10:00
Sergey Gulin
9951c171df
[#171] Rename exclusion-related config options
Problem: The behaviours of `ignoreRefs`, `virtualFiles`, `notScanned` and
`ignored` are closely related. We need to make intent of these more
obvious to the user.

Solution: Rename `ignoreRefs`, `virtualFiles`, `notScanned` and
`ignored` to `ignoreExternalRefsTo`, `ignoreLocalRefsTo`,
`ignoreRefsFrom` and `ignore`. Also, update their yaml comments in
default config file.
2022-10-26 09:06:39 +10:00
Sergey Gulin
013457abcc
[#170] [#119] Reorganize top-level config keys
Problem: At the moment, the config yaml is organized in 3 top-level
keys: `traversal`, `verification` and `scanners`. However, the distinction
between the "traversal" and the "verification" stages is not relevant
to the user. This is entirely an internal concern.

Solution: Reorganize yaml config options under `exclusions`, `networking`
and `scanners`.
2022-10-26 07:39:01 +10:00
Anton Sorokin
ef0e26029a
[#157] Add support for autolinks
Problem: GitHub renders "implicit" links like `visit www.google.com` as links,
but we don't check them, since cmark-gfm renders them as text

Solution: add the `autolink` extension to `cmark-gfm`, so both
`www.google.com` and `https://google.com`
will be parsed as links and successfully verified
2022-10-25 22:20:27 +03:00
Anton Sorokin
602e6a8ec9
[#150] Change ignore link behavior
Problem:
Currently the `ignore link` annotation applies to the first link after it
(anywhere in the file). That can be bad for the user,
e.g. they may forget to delete this annotation, and then it ignores a link in
some random place

Solution:
Throw scan errors when `ignore link` is not followed by a node with a link.
To do this, we need to increase the amount of context in `ScannerM`
2022-10-25 11:20:36 +03:00
Anton Sorokin
c8a053a139
[#187] Check globs in config fields and CLI args
Problem: a user may think that globs in e.g. the config field `ignored`
have the same root as all markdown links, so that adding e.g. `/scripts/*`
to ignore means the link `[a](/scripts/*)` will not be checked. But we were
interpreting this glob as a filesystem-level absolute path
(so this link would still be checked). Also, we claim to check the correctness
of user-supplied globs, but we're using the glob parser in a mode that always
succeeds (and somehow transforms bad globs into correct ones).

Solution: add a function `mkGlobPattern` that throws a readable error
when it sees absolute paths (like `/scripts/*`) or malformed globs (like `<a>`),
and use it for parsing all globs
in config fields and CLI args. Also remove the redundant "canonization" of globs
2022-10-24 17:17:41 +03:00
Sergey Gulin
bfbe20a5b0
[#139] Ignore build-related files
Problem: At the moment, we're using the ignored option for mainly 2
purposes: 1) to ignore all files in the `.git` folder (`.git/**/*`) and
2) to ignore all build-related temporary files (the default config ignores
`.stack-work/**/*`). A more robust alternative might be to ignore all
files implicitly ignored by git.

Solution: Use `git ls-files` to ignore all files implicitly ignored by git.
2022-10-21 22:07:00 +10:00
Anton Sorokin
a03c9fff2a
fixup! [#165] [#192] Add tests for specific cases 2022-10-14 15:19:05 +03:00
Anton Sorokin
22b3ce1ad6
[#165] [#192] Add tests for specific cases
Problem: there are no tests for links to directories and non-markdown files.
Also we aren't testing that links are case sensitive

Solution: add such tests to `check-local-refs`, also check that only links
to directories are allowed to have a trailing slash,
e.g. `a.md/` is bad and `dir1/` is ok
2022-10-13 18:18:33 +03:00
Sergey Gulin
af9e029853
[#151] Refactor IgnoreMode
Problem: Currently, the constructor `None` from `IgnoreMode` is used to
represent two distinct things: 1) no "ignore mode" is currently set
2) unrecognized "ignore mode" is found. It would make more sense to
change `getIgnoreMode`. Instead of returning `Maybe IgnoreMode`, it
should return type `GetIgnoreMode` with 3 possible constructors:
`NotAnAnnotation`, `ValidMode IgnoreMode`, `InvalidMode Text`.

Solution: Remove `None` from `IgnoreMode`, make `getIgnoreMode`
return `GetIgnoreMode`, use `Maybe Ignore` as scan state where
`Nothing` represents no mode is currently set.
2022-10-10 20:50:11 +10:00
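A minimal sketch of the three-way result type described in af9e029853. The `GetIgnoreMode` constructors come from the commit message (with `String` standing in for `Text`); the `IgnoreMode` constructor names, the `file` keyword, and the `xrefcheck: ignore ...` comment parsing shown here are assumptions made for illustration and may not match the codebase at that commit.

```haskell
-- Assumed constructor names; the real IgnoreMode may differ.
data IgnoreMode = IMLink | IMParagraph | IMFile
  deriving (Eq, Show)

-- Constructors as listed in the commit message (String instead of Text).
data GetIgnoreMode
  = NotAnAnnotation
  | ValidMode IgnoreMode
  | InvalidMode String
  deriving (Eq, Show)

-- Hypothetical parser over the text inside an HTML comment.
getIgnoreMode :: String -> GetIgnoreMode
getIgnoreMode comment = case words comment of
  ("xrefcheck:" : "ignore" : rest) -> case rest of
    ["link"]      -> ValidMode IMLink
    ["paragraph"] -> ValidMode IMParagraph
    ["file"]      -> ValidMode IMFile
    _             -> InvalidMode (unwords rest)  -- annotation, bad mode
  _ -> NotAnAnnotation                           -- ordinary comment
```

Separating `NotAnAnnotation` from `InvalidMode` lets the scanner ignore ordinary comments while still reporting a malformed annotation, which a single `None` constructor could not express.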
Anton Sorokin
8285cea456
[#138] Report links that escape repo directory
Problem: as in #138, when we see a local link, we are checking only
the existence of the referred file,
not checking that this file is a part of the repo
and that the link will be compatible with GitHub's renderer

Solution: manually count "nesting levels" of all local links,
checking that the number of `".."`'s is always less
than the number of real directories
2022-10-10 13:19:55 +03:00
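A minimal sketch of the nesting-level check described in 8285cea456. The names `segments` and `escapesRepo` are hypothetical, and the exact criterion used here (a link escapes as soon as the running depth drops below the repository root) is an assumption rather than the project's actual rule.

```haskell
-- Split a '/'-separated link path into segments, dropping empty ones.
segments :: String -> [String]
segments path = case break (== '/') path of
  (seg, [])       -> [seg | not (null seg)]
  (seg, _ : rest) -> [seg | not (null seg)] ++ segments rest

-- `depth` is how many directories below the repo root the linking
-- file lives. Returns True if the link ever climbs above the root.
escapesRepo :: Int -> [String] -> Bool
escapesRepo = go
  where
    go d _ | d < 0     = True          -- climbed above the root
    go _ []            = False
    go d (".." : rest) = go (d - 1) rest
    go d ("."  : rest) = go d rest
    go d (_    : rest) = go (d + 1) rest
```

For a file one directory deep, `escapesRepo 1 (segments "../README.md")` is `False`, while `escapesRepo 0 (segments "../x.md")` is `True`.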
Anton Sorokin
847b21bfbc
[#185] Add CLI option to disable output coloring
Problem: output of xrefcheck contains ANSI-colored text,
which is bad when we redirect output to file
or when our terminal is not supporting colors.
Colorising is performed in `Buildable` instances of various types,
so we can't just pass some extra flag here

Solution: add CLI option `--no-color`
Create `colorIfNeeded` and `styleIfNeeded` functions that have
`Data.Reflection.Given ColorMode` constraint, and replace all usages of
`color` and `style` by them, adding new
constraint to instances.
2022-10-08 22:26:10 +03:00
Anton Sorokin
6d45208211
[#165] Add tests for local references
Problem: We want all cases of local references to be covered by bats tests
(e.g. because we want to be sure that all this works well on Windows),
but there was only one case covered in `check-anchors`

Solution: make a `check-local-refs` folder and test different
types of links with different roots
2022-10-07 18:29:22 +03:00
Sergey Gulin
2b17bb0942
[#180] Make flavor a required parameter
Problem: As of #159 we made all config fields optional. However, it
makes sense to make the `flavor` field mandatory, as it affects
correctness and the user must make a choice here.

Solution: Make `flavor` a required parameter.
2022-10-04 22:54:45 +10:00
Sergey Gulin
2bcbfa1c47
[#156] Trim redundant config fields from test configs
Problem: In #159 we made all config fields optional. Now we can trim
redundant fields from test configs.

Solution: Trim redundant config fields. Also slightly refactor
`overrideVerify` function to make it more readable.
2022-10-04 22:49:45 +10:00
Sergey Gulin
c94ddfcf7d
[#135] CI: add stylish-haskell and shellcheck
Problem: We should add stylish-haskell and shellcheck to our pipeline.

Solution: Add stylish-haskell and shellcheck. Use stylish-haskell on repo.
2022-09-27 19:04:17 +10:00
Anton Sorokin
3df588ac8f
[#155] Footnote syntax support
Problem: we can wrongly report footnotes as broken links (#155),
because footnotes support is disabled by default in cmark-gfm

Solution: add `optFootnotes` to `commonMarkToNode`
(this option was recently added to cmark-gfm-hs,
so we need to temporarily pull it from GitHub instead of Hackage)
2022-09-26 11:59:07 +03:00
Anton Sorokin
0d983beada
[#140] Reject unknown fields in yaml config
Problem: when parsing the yaml file, the `fromJSON` instance is used,
and by default it ignores unknown fields, and
we want to get errors instead (issue #140)

Solution: change the `fromJSON` instance for `Config` and the types inside it.
Luckily, they and only they use `aesonConfigOption`
2022-09-25 20:03:57 +03:00
Anton Sorokin
b412781020
[#149] Replace hspec with tasty
Problem: `hspec` and `tasty` are testing frameworks with
almost the same functionality,
for historical reasons in xrefcheck we used different frameworks
for tests and links-tests, and in Serokell we prefer `tasty` now.

Solution: use only `tasty`,
rewrite code that uses `hspec` using the correspondence between
- `testGroup` and `describe`
- `testCase` and `it`
- `shouldBe` and `@?=`
2022-09-25 18:51:41 +03:00