Check cross-references in repository documents
Diogo Castro 2fd11bfeb1
[#147] Improve readme
Improved the readme and fixed several problems:
* Mention support for GitLab - this is important and wasn't mentioned
  anywhere.
* Add a FAQ clarifying how xrefcheck behaves in some important
  situations.
* We don't need to get into a lot of detail about the syntax of the
  `xrefcheck: ignore` annotations, where they're allowed and where
  they're not. A general idea and a couple of examples are more than
  enough.
* Added the backlink `[↑](#xrefcheck)` where it was missing.
* Fixed inconsistent header levels: we were using `###` where we should
  be using `##`
* `nix run` should now be `nix shell`
* Add a link to `tests/configs/github-config.yaml` which contains a list
  of all supported config options.
* Instead of mentioning GitHub Actions in the "usage" section and nix in
  a separate section, mention everything in the "usage" section.
* Fixed link to `stack2cabal`
* Fixed typos and rephrased some bits.
2022-11-18 11:39:02 +00:00

Xrefcheck


Xrefcheck is a tool for verifying local and external references in a repository's documentation. It is quick, easy to set up, and suitable for running in a CI pipeline.

Output sample

Motivation

As a project evolves, links in its markdown documentation tend to break. This usually happens because:

  1. A file has been moved;
  2. A markdown header has been renamed;
  3. An external site has ceased to exist.

This tool will help you to keep references in order. You can run xrefcheck continuously in your CI pipeline, and it will let you know when it finds a broken link.

Aims

Compared to alternative solutions, this tool tries to achieve the following:

  • Quickness
    • References are verified in parallel.
    • References with the same target URI are only verified once.
    • It first attempts to verify external links with a HEAD request; only when that fails does it try a GET request.
  • Resilience
    • When you have many links to the same domain, the service is likely to start replying with "429 Too Many Requests". When this happens, xrefcheck will wait the requested number of seconds before retrying.
  • Easy setup: no extra actions are required; just run xrefcheck in the repository root.
  • Conservative verification: the tool should not report false positives (e.g. on sites that require authentication), which makes it suitable for CI.
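
The verification strategy above (each distinct URI checked once, in parallel, HEAD before GET, waiting out "429 Too Many Requests") can be sketched in Python. This is a minimal illustration under stated assumptions, not xrefcheck's own implementation: the injected send callable, the retry limit of 3, and treating 401/403 as valid (mirroring the default described in the FAQ) are choices made for this example, and only the delay-in-seconds form of Retry-After is handled.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Mapping, Tuple

# Hypothetical transport: performs one request and returns (status, headers).
# Injecting it keeps the sketch independent of any particular HTTP library.
Sender = Callable[[str, str], Tuple[int, Mapping[str, str]]]


def request_with_retry(method: str, uri: str, send: Sender,
                       max_retries: int, sleep: Callable[[float], None]) -> int:
    """Send one request; on 429, wait Retry-After seconds and try again."""
    status, headers = send(method, uri)
    retries = 0
    while status == 429 and retries < max_retries:
        # Assumes the delay-seconds form of Retry-After; defaults to 1 second.
        sleep(float(headers.get("Retry-After", "1")))
        status, headers = send(method, uri)
        retries += 1
    return status


def check_once(uri: str, send: Sender, *, ignore_auth_failures: bool = True,
               max_retries: int = 3,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    def ok(status: int) -> bool:
        if ignore_auth_failures and status in (401, 403):
            return True  # protected resource: assume the link is valid
        return 200 <= status < 300

    # HEAD is cheaper, so it goes first; GET is only a fallback.
    if ok(request_with_retry("HEAD", uri, send, max_retries, sleep)):
        return True
    return ok(request_with_retry("GET", uri, send, max_retries, sleep))


def check_all(uris: Iterable[str], send: Sender, *, workers: int = 8,
              ignore_auth_failures: bool = True,
              sleep: Callable[[float], None] = time.sleep) -> Dict[str, bool]:
    unique = list(dict.fromkeys(uris))  # each distinct URI is verified only once
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(
            lambda u: check_once(u, send,
                                 ignore_auth_failures=ignore_auth_failures,
                                 sleep=sleep),
            unique)
        return dict(zip(unique, results))
```

A real send function would wrap an HTTP client that follows redirects; keeping it injectable also makes the retry and fallback logic easy to exercise with a fake server.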

Features

  • Supports both GitHub and GitLab flavored markdown.
  • Supports Windows and Unix systems.
  • Supports relative and absolute local links.
  • Supports external links (http, https, ftp and ftps).
  • Detects broken and ambiguous anchors in local links.
  • Integration with GitHub Actions.
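
To make "broken" and "ambiguous" anchors concrete, here is a simplified Python sketch. The slug rule (lowercase, drop punctuation, turn spaces into hyphens) approximates GitHub-style header anchors and is an assumption of this example; it deliberately omits the numeric suffixes GitHub appends to repeated headers, so it flags any slug shared by two headers as ambiguous. xrefcheck's actual rules, and GitLab's, may differ in the details.

```python
import re
from collections import defaultdict
from typing import Dict, List


def slugify(header: str) -> str:
    """Simplified GitHub-style anchor: lowercase, drop punctuation, spaces become hyphens."""
    text = header.strip().lower()
    text = re.sub(r"[^\w\- ]", "", text)  # keep letters, digits, underscores, hyphens, spaces
    return text.replace(" ", "-")


def anchors_by_slug(headers: List[str]) -> Dict[str, List[str]]:
    """Map every slug to the headers that produce it."""
    table: Dict[str, List[str]] = defaultdict(list)
    for header in headers:
        table[slugify(header)].append(header)
    return table


def check_anchor(anchor: str, headers: List[str]) -> str:
    """Classify a #anchor reference against a file's headers."""
    matches = anchors_by_slug(headers).get(anchor, [])
    if not matches:
        return "broken"      # no header produces this anchor
    if len(matches) > 1:
        return "ambiguous"   # several headers produce the same anchor
    return "ok"
```

For instance, slugify("Further work") yields further-work, and with headers "Notes" and "Notes!" in the same file, a link to #notes would be classified as ambiguous.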

Dependencies

Xrefcheck requires you to have git version 2.18.0 or later in your PATH.
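
If a CI script should fail early when this requirement is not met, a small pre-flight check is enough. The Python sketch below is a hypothetical helper, not part of xrefcheck; it assumes git prints its version in the usual "git version X.Y.Z" form and ignores platform suffixes such as ".windows.1".

```python
import re
import subprocess
from typing import Optional, Tuple

MIN_GIT = (2, 18, 0)  # minimum version stated in the Dependencies section


def parse_git_version(output: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from text such as 'git version 2.34.1'."""
    m = re.search(r"git version (\d+)\.(\d+)(?:\.(\d+))?", output)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2)), int(m.group(3) or "0")


def git_is_recent_enough() -> bool:
    """True if a git found on PATH reports a version >= MIN_GIT."""
    try:
        out = subprocess.run(["git", "--version"], capture_output=True,
                             text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError):
        return False  # git is missing from PATH or failed to run
    version = parse_git_version(out)
    return version is not None and version >= MIN_GIT
```

Comparing tuples keeps the check correct for versions like 2.9 versus 2.18, which a plain string comparison would get wrong.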

Usage

We provide the following ways for you to use xrefcheck:

If none of those are suitable for you, please open an issue!

To find all broken links in a repository, simply run xrefcheck from its root folder:

xrefcheck

To also display a list of all links and anchors:

xrefcheck --verbose

For a description of other options:

xrefcheck --help

To configure xrefcheck, run:

xrefcheck dump-config --type GitHub

This will create a .xrefcheck.yaml file with all the configuration options (here's an example). This file should be committed to your repository.

Build instructions

Run stack install to build everything and install the executable. If you wish to use cabal, you need to run stack2cabal first!

FAQ

  1. How do I ignore specific files?

    • To ignore a specific file, you can use either the --ignore <glob pattern> command-line option or the ignore list in the config file. Links to those files will be reported as errors, and links from those files will not be verified.
  2. How do I ignore specific links?

    • Add an entry to the ignoreLocalRefsTo or ignoreExternalRefsTo lists in the config file.
    • Alternatively, add a <!-- xrefcheck: ignore link --> annotation before the link:
      <!-- xrefcheck: ignore link -->
      Link to some [invalid resource](https://fictitious.uri/).
      
      A [valid link](https://www.google.com)
      followed by an <!-- xrefcheck: ignore link --> [invalid link](https://fictitious.uri/).
      
    • You can also use a <!-- xrefcheck: ignore paragraph --> annotation to ignore all links in a paragraph.
  3. How do I ignore all links from a specific markdown file?

    • Add a glob pattern to the ignoreRefsFrom list in the config file.
    • Alternatively, add a <!-- xrefcheck: ignore all --> annotation at the top of the file.
  4. How do I ignore all external links?

    • If you wish to ignore all http/ftp links, you can use --mode local-only.
  5. How does xrefcheck handle links that require authentication?

    • It's common for projects to contain links to protected resources. By default, when xrefcheck attempts to verify a link and gets a 403 Forbidden or a 401 Unauthorized response, it assumes the link is valid.
    • This behavior can be disabled by setting ignoreAuthFailures: false in the config file.
  6. How does xrefcheck handle redirects?

    • xrefcheck follows up to 10 HTTP redirects.
  7. How does xrefcheck handle localhost links?

    • By default, xrefcheck will ignore links to localhost.
    • This behavior can be disabled by removing the corresponding entry from the ignoreExternalRefsTo list in the config file.
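
The annotation rules from the FAQ can be approximated with a few regular expressions. This is a deliberately simplified Python sketch with a hypothetical links_to_check function, not xrefcheck's implementation: it only recognises inline [text](target) links, treats blank lines as paragraph boundaries, expects the ignore paragraph annotation to sit directly above (or inside) the paragraph it applies to, and honours ignore all only at the very top of the file. A real markdown parser handles many cases this ignores, such as reference-style links and code spans.

```python
import re
from typing import List

# Hypothetical regexes approximating the annotations described in the FAQ.
IGNORE_ALL = re.compile(r"\s*<!--\s*xrefcheck:\s*ignore\s+all\s*-->")
TOKEN = re.compile(
    r"<!--\s*xrefcheck:\s*ignore\s+(?P<mode>link|paragraph)\s*-->"  # annotation
    r"|\[[^\]]*\]\((?P<target>[^)\s]+)\)"                           # inline link
)


def links_to_check(markdown: str) -> List[str]:
    """Return inline link targets that are not covered by an ignore annotation."""
    if IGNORE_ALL.match(markdown):
        return []  # 'ignore all' at the top of the file: skip every link in it
    targets: List[str] = []
    for paragraph in re.split(r"\n\s*\n", markdown):  # blank line = new paragraph
        skip_next_link = False
        skip_rest_of_paragraph = False
        for m in TOKEN.finditer(paragraph):
            mode = m.group("mode")
            if mode == "link":
                skip_next_link = True
            elif mode == "paragraph":
                skip_rest_of_paragraph = True
            elif skip_next_link or skip_rest_of_paragraph:
                skip_next_link = False  # the 'ignore link' annotation is consumed
            else:
                targets.append(m.group("target"))
    return targets
```

Applied to the ignore link example from question 2, this sketch keeps only https://www.google.com and skips both fictitious.uri links.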

Further work

  • Support for non-Unix systems.
  • Support link detection in markup languages other than Markdown.
    • Haskell Haddock is next in line.

A comparison with other solutions

  • linky - a highly configurable verifier written in Rust; it scans one specified file at a time and works well with system utilities like find. It requires some configuration before it can be applied to a repository or added to CI.
  • awesome_bot - a solution written in Ruby that can be easily included in CI or integrated into GitHub. Its features include detection of duplicated URLs, specifying allowed HTTP error codes, and report generation. At the time of writing, it scans only external references, and checking anchors is not possible.
  • remark-validate-links and remark-lint-no-dead-urls - a highly configurable JavaScript solution for checking local and external links, respectively. It can check multiple repositories at once if they are gathered in one folder. It doesn't handle "429 Too Many Requests", so false positives are likely when you have many links to the same domain.
  • markdown-link-check - another checker written in JavaScript, scans one specific file at a time. Supports mailto: link resolution.
  • url-checker - GitHub Action which checks external links in specified files. Does not check local links.
  • linkcheck - advanced site crawler, verifies links in HTML files. There are other solutions for this particular task which we don't mention here.

At the time of writing, none of the listed solutions support ftp/ftps links.

Issue tracker

We use GitHub issues as our issue tracker. You can log in with your GitHub account to leave a comment or create a new issue.

For Contributors

Please see CONTRIBUTING.md for more information.

About Serokell

Xrefcheck is maintained and funded with ❤️ by Serokell. The names and logo for Serokell are trademarks of Serokell OÜ.

We love open source software! See our other projects or hire us to design, develop and grow your idea!