Problem: We're using a common pattern in our bats tests:
1. Run xrefcheck, redirect output to a temp file
2. Check the temp file matches some .gold file using `diff`
3. Delete the temp file
We could encapsulate this pattern and make it easier to reuse.
Solution: In the `setup` function, create a temp directory. In the
`teardown` function, delete the temp directory. Create a `to_temp`
function that runs xrefcheck with desired options, pipes its output
through the `prepare` helper function and saves it in a file inside
the temp directory. Create an `assert_diff` function that reads the temp
file, and uses `diff` to compare it against some expected output.
Problem: In #126 we made the `ignoreRefs` option required (to match the
other options). However, having it optional is better for
backwards-compatibility and to help users migrate to newer xrefcheck
versions.
Solution: Make all config options optional.
Problem: Currently, xrefcheck fails immediately after the first
observed error because `die` is used right in `markdownScanner`. What
we want is to collect all the errors from the different markdown files
and then print them as part of xrefcheck's final result, together with
the broken links. Also, although the `makeError` function has
4 error messages, 2 of them are not reported, and the test case that
should check this only checks that at least one of the four files
throws an error.
Solution: Make xrefcheck report all errors. Add a `ScanError` type
and propagate errors to report all of them, rather than failing
immediately after the first error is detected.
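The idea of collecting errors instead of stopping at the first one can be sketched as follows. This is only an illustration: the `ScanError` fields, `scanFile`, and `scanAll` are hypothetical stand-ins and do not reflect the actual xrefcheck definitions.

```haskell
import Data.Either (partitionEithers)

-- Hypothetical, simplified stand-in for the `ScanError` type.
data ScanError = ScanError
  { seFile :: FilePath
  , seMessage :: String
  } deriving (Show, Eq)

-- Hypothetical scanner: a file "scans" successfully unless it is empty.
-- On success it returns the file path and its number of lines.
scanFile :: (FilePath, String) -> Either ScanError (FilePath, Int)
scanFile (path, content)
  | null content = Left (ScanError path "empty file")
  | otherwise = Right (path, length (lines content))

-- Scan every file and keep both the errors and the successful results,
-- rather than aborting on the first `Left`.
scanAll :: [(FilePath, String)] -> ([ScanError], [(FilePath, Int)])
scanAll = partitionEithers . map scanFile

main :: IO ()
main = do
  let (errs, oks) = scanAll [("a.md", "x\ny"), ("b.md", ""), ("c.md", "")]
  mapM_ print errs -- every scan error is reported
  mapM_ print oks  -- results for the files that scanned fine
```

With this shape, the caller decides when to fail, so all scan errors can be printed next to the broken links.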
Problem: A `0.2.1` section was added to the changelog by mistake in #68,
even though no release was made. This was corrected and undone in #100.
It seems it was added back in again in #88, most likely due to a rebase
mixup.
Solution: Delete this section again.
Problem: The `virtualFiles` config allows the user to use glob patterns
to specify files that do not physically exist in the repository but
should be treated as existing nevertheless. However, we do not yet
have any tests around our usage of glob patterns. We should write some to
1) ensure it behaves in a sensible way even in corner cases and 2)
document the behaviour.
Solution: Add tests that document how the `virtualFiles` glob patterns work.
Problem: In #85, we added the `checkLocalhost` option to decide
whether to verify links to localhost. However, upon further
reflection, it seems like this could have been subsumed by the
existing `ignoreRefs` option instead.
Solution: Remove `check-localhost` CLI option and `checkLocalhost`
config option. Add a regex matching localhost links to the
`ignoreRefs` field of the default config.
Problem: The `virtualFiles` config option supports glob patterns. On the
other hand, `ignored` only supports exact matches and `notScanned`
matches on prefixes. There is also a bug where `ignored` does not
ignore files if they contain broken xrefcheck annotations.
Solution: Add support for glob patterns to `ignored` and
`notScanned`. Filter ignored files before parsing their contents.
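The difference between exact, prefix, and glob matching can be illustrated with a toy matcher. This is a deliberately simplified sketch: `globMatch` is a hypothetical helper that only understands `*` and `?` (neither crossing a `/`), and it is not the glob implementation xrefcheck uses.

```haskell
import Data.List (isPrefixOf, tails)

-- Toy glob matcher (an assumption for illustration only):
-- `*` matches any run of characters other than '/',
-- `?` matches exactly one character other than '/'.
globMatch :: String -> FilePath -> Bool
globMatch [] [] = True
globMatch [] _ = False
globMatch ('*' : ps) s = any (globMatch ps) candidates
  where
    -- Suffixes of `s` obtained by consuming 0..k leading non-'/' characters.
    candidates = take (length (takeWhile (/= '/') s) + 1) (tails s)
globMatch ('?' : ps) (c : cs) = c /= '/' && globMatch ps cs
globMatch (p : ps) (c : cs) = p == c && globMatch ps cs
globMatch _ [] = False

main :: IO ()
main = do
  let path = "docs/guide/intro.md"
  print (path == "docs/guide")           -- exact match
  print ("docs/guide" `isPrefixOf` path) -- prefix match
  print (globMatch "docs/*/*.md" path)   -- glob match
```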
Problem: We use a 2-step process to parse a URL: we use `parseURI` and
then `mkURIBs`. Both of these functions can fail. At the moment, we're
ignoring their errors and simply throwing an `ExternalResourceInvalidUri`,
and then displaying a generic error message to the user.
Solution: Catch errors from `parseURI` and `mkURIBs` and use them to
tell the user why the URL was invalid.
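A minimal sketch of keeping the error from each step, rather than collapsing both into one generic failure, might look like this. The two steps below (`stepOne`, `stepTwo`) and the `UriError` type are hypothetical stand-ins; they do not use the real `parseURI` or `mkURIBs` signatures.

```haskell
import Data.Bifunctor (first)
import Data.List (isPrefixOf)

-- Hypothetical error type that records which step failed and why.
data UriError
  = ParseStepError String
  | ConvertStepError String
  deriving (Show, Eq)

-- Stand-in for the first parsing step: reject strings containing spaces.
stepOne :: String -> Either String String
stepOne s
  | ' ' `elem` s = Left "unexpected space"
  | otherwise = Right s

-- Stand-in for the second step: require an http(s) scheme.
stepTwo :: String -> Either String String
stepTwo s
  | any (`isPrefixOf` s) ["http://", "https://"] = Right s
  | otherwise = Left "unsupported scheme"

-- Run both steps, tagging each failure with the step it came from.
parseUrl :: String -> Either UriError String
parseUrl s = do
  parsed <- first ParseStepError (stepOne s)
  first ConvertStepError (stepTwo parsed)

main :: IO ()
main = mapM_ (print . parseUrl)
  [ "https://example.com"
  , "https://exa mple.com"
  , "ftp://example.com"
  ]
```

Because the specific reason survives in the error value, the final message can say why the URL was rejected.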
Problem: When a file contains a reference to another file, and that
reference contains an anchor, that anchor is not checked.
Solution: Normalise relative anchor links before checking them.
Problem: The current version of xrefcheck doesn't allow the square
brackets and some other special characters, like the angle brackets and
the curly brackets, to be present in the URLs, even in the query
strings, as they need to be percent-encoded first.
Solution: Allow some of the reserved characters, like the brackets, to
be present in the query strings of the URLs.
There exist two main standards of URL parsing: RFC 3986 and the Web
Hypertext Application Technology Working Group's URL standard. Ideally,
we want to be able to parse the URLs in accordance with the latter
standard, because it provides a much less ambiguous set of rules for
percent-encoding special characters, and is essentially a living
standard that gets updated constantly.
We allow these characters to be present in the query strings by using
the `parseURI` function from the `uri-bytestring` library with
`laxURIParseOptions`.
Problem: We currently support obtaining `Retry-After` header
values as seconds. However, the HTTP specification states that the header
value can also be a date, e.g. `Wed, 21 Oct 2015 07:28:00 GMT`.
Solution: Support `Retry-After` headers with dates.
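One way to accept both forms of the header is to try the value as a number of seconds first and fall back to a date. The sketch below uses the `time` library and assumes the date is in the format shown above; `retryAfterSeconds` is a hypothetical name, not the function used in xrefcheck.

```haskell
import Data.Time
  ( UTCTime (..), defaultTimeLocale, diffUTCTime, fromGregorian
  , parseTimeM, rfc822DateFormat, secondsToDiffTime )
import Text.Read (readMaybe)

-- Given the current time and the raw header value, return the number of
-- seconds to wait, or Nothing if the value is neither a non-negative
-- integer nor a date in the assumed format.
retryAfterSeconds :: UTCTime -> String -> Maybe Integer
retryAfterSeconds now raw =
  case readMaybe raw :: Maybe Integer of
    Just secs
      | secs >= 0 -> Just secs
      | otherwise -> Nothing
    Nothing -> do
      date <- parseTimeM True defaultTimeLocale rfc822DateFormat raw
      -- A date in the past means there is nothing left to wait for.
      Just (max 0 (ceiling (diffUTCTime date now)))

main :: IO ()
main = do
  -- Fixed "current time" so the example is reproducible: 2015-10-21 07:27:00 UTC.
  let now = UTCTime (fromGregorian 2015 10 21) (secondsToDiffTime (7 * 3600 + 27 * 60))
  print (retryAfterSeconds now "120")
  print (retryAfterSeconds now "Wed, 21 Oct 2015 07:28:00 GMT")
  print (retryAfterSeconds now "soon")
```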
Problem: We added support for the `id` attribute in anchors. Since
this is a user-facing change, we should add an entry to `CHANGES.md`.
Solution: Add an entry to `CHANGES.md`.
Problem: The `name` attribute was deprecated, and web devs are now
encouraged to use the `id` attribute instead. We should add support
for the `id` attribute, while retaining support for the `name`
attribute.
Solution: Add support for the `id` attribute.
Problem: We made `ignoreRefs` a required option. Since this
is a user-facing change, we should add an entry to `CHANGES.md`.
Solution: Add an entry to `CHANGES.md`.
Problem: We made `ignoreRefs` a required option. But the `README.md`
still contains a note that `ignoreRefs` is optional.
Solution: Remove the note that says `ignoreRefs` is optional.
Problem: We made `ignoreRefs` a required option. But the config file
generated with the `dump-config` option still contains a note that
`ignoreRefs` can be omitted.
Solution: Remove the note that says `ignoreRefs` can be omitted.
Problem: The `ignoreRefs` parameter, in the config file, was
optional. This is in stark contrast with all the other parameters, all
of which are required.
Solution: Make `ignoreRefs` a required parameter in the config file.
Problem: Verifying a single file using the `-r` option misbehaves
when there are absolute links present in the file. Since the `-r`
option expects a directory, we can just forbid verifying a single
file.
Solution: Fail with an error message when the user tries to
specify a file as the repository's root directory.
Problem: The help message of the `--mode` option doesn't say which modes
are supported. The only way for a user to find that out is to
purposefully call `--mode` with a garbage value, and then xrefcheck will
complain about it and print the list of supported modes.
Solution: Improve the help message to display the list of supported
modes.
Problem: We had a hardcoded HTML tag parser that doesn't work with all valid HTML tags.
Solution: Replace it with the `tagsoup` library, which takes care of all the parsing.
Problem: The current version of xrefcheck handles HTTP responses
with the 429 status code just like any other error, even though
such errors can often be resolved within the program itself by
retrying.
Solution: Each time a request for a given link results in a 429 error,
retrieve the Retry-After delay (in seconds) from the headers of the
HTTP response or, if the Retry-After header is absent, use a
configurable default value, and rerun the request after that delay
has passed. Only once the number of retries reaches its limit, which
is currently hardcoded and not configurable, is the 429 error
considered 'unfixable' and reported without further retries.
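The retry policy described above can be sketched roughly as follows. All names and values here (`Response`, `Verdict`, `defaultRetryAfter`, `maxRetries`, `checkWithRetries`) are assumptions made for illustration; the real implementation and its limits may differ.

```haskell
import Data.IORef (atomicModifyIORef', newIORef)
import Data.Maybe (fromMaybe)

-- Hypothetical, simplified outcome of a single request.
data Response
  = Ok
  | TooManyRequests (Maybe Int) -- optional Retry-After delay in seconds
  | OtherError Int
  deriving (Show, Eq)

data Verdict = Success | Failed Int | GaveUpOn429
  deriving (Show, Eq)

-- Assumed values, chosen only for this example.
defaultRetryAfter, maxRetries :: Int
defaultRetryAfter = 10
maxRetries = 3

-- Repeat the request while it answers 429, waiting for the advertised
-- (or default) delay, until the retry limit is reached.
-- The `wait` action is a parameter so the sketch can run without sleeping.
checkWithRetries :: (Int -> IO ()) -> IO Response -> IO Verdict
checkWithRetries wait request = go 0
  where
    go :: Int -> IO Verdict
    go retries = do
      response <- request
      case response of
        Ok -> pure Success
        OtherError code -> pure (Failed code)
        TooManyRequests delay
          | retries >= maxRetries -> pure GaveUpOn429
          | otherwise -> do
              wait (fromMaybe defaultRetryAfter delay)
              go (retries + 1)

main :: IO ()
main = do
  -- Scripted responses standing in for a real server.
  scripted <- newIORef [TooManyRequests (Just 2), TooManyRequests Nothing, Ok]
  let step [] = ([], Ok)
      step (r : rs) = (rs, r)
      request = atomicModifyIORef' scripted step
      wait secs = putStrLn ("waiting " ++ show secs ++ "s")
  verdict <- checkWithRetries wait request
  print verdict
```

In a real checker, `wait` would typically be a thread delay; passing it in keeps the policy itself easy to test.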
Additionally, the progress bar has been upgraded and the following
elements are supplied:
1. an extra color -- Blue -- indicating the errors that might get
eliminated during the verification;
2. a timer with the number of seconds left to wait for the restart of
the request; if, during the verification, a new 429 error emerges
with a new Retry-After value greater than or equal to the
elapsed time, the timer is immediately updated with that value and
begins ticking down each second from scratch.
Problem: We used the default port to test error reports when checking localhost links, but that port may be in use by another program, in which case xrefcheck reports a different message.
Solution: Use a port that is unlikely to be used by other programs.