Problem:
1. Running `nix develop` in the project root does not provide a
development environment of any sort, as evidenced by:
    ghc-pkg list
printing only boot libraries.
2. Tests can fail with this error:
    xrefcheck-tests: Network.Socket.bind: resource busy (Address already in use)
due to a potential conflict with an already running application.
Solution:
1. flake.nix: inherit devShells
2. tests: configurable mock server port
With these changes I've been able to build the project and run its tests
as follows:
    nix shell nixpkgs#haskellPackages.hpack nixpkgs#cabal-install
    nix develop -c $SHELL
    hpack
    vsftpd \
      -orun_as_launching_user=yes \
      -olisten_port=2221 \
      -olisten=yes \
      -oftp_username=$(whoami) \
      -oanon_root=./ftp-tests/ftp_root \
      -opasv_min_port=2222 \
      -ohide_file='{.*}' \
      -odeny_file='{.*}' \
      -oseccomp_sandbox=no \
      -olog_ftp_protocol=yes \
      -oxferlog_enable=yes \
      -ovsftpd_log_file=./ftp.log &
    cabal test ftp-tests --test-options="--ftp-host ftp://127.0.0.1:2221"
    cabal test xrefcheck-tests --test-options="--mock-server-port 3001"
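The idea behind the configurable mock server port can be illustrated with a minimal Python sketch. This is not the project's Haskell test code: only the `--mock-server-port` option name comes from the commands above, while the function name and the default port of 3000 are assumptions made for the example.

```python
import argparse

# Assumed default for illustration only; the real default is not stated here.
DEFAULT_MOCK_SERVER_PORT = 3000


def parse_test_options(argv):
    """Parse test options, letting the caller override the mock server port."""
    parser = argparse.ArgumentParser(prog="tests-sketch")
    parser.add_argument(
        "--mock-server-port",
        type=int,
        default=DEFAULT_MOCK_SERVER_PORT,
        help="port the mock HTTP server binds to",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    # Picking a free port avoids "Address already in use" when another
    # application already occupies the default one.
    options = parse_test_options(["--mock-server-port", "3001"])
    print(options.mock_server_port)
```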
Problem: node16 is now deprecated and github-runner provided by nixpkgs
no longer supports this runtime. However,
"actions/{checkout,upload-artifact}@v3" uses this runtime.
Solution: Update CI pipeline to use "actions/{checkout,upload-artifact}@v4".
Problem: After a recent switch to GitHub Actions, the job that creates
the prerelease suddenly stopped working.
Solution: Switch to the autoreleasing script from serokell.nix,
which uses 'gh' instead of 'hub'.
Problem: we'd like to get some statistics about our users, so that
we know how many people use `xrefcheck` or at least view its README,
from what OS they do it, etc.
Solution:
There is a tool called [Scarf](https://about.scarf.sh/) that
can be used for this purpose.
1. Create a Docker package in Scarf which acts as a proxy,
so that instead of `docker pull serokell/xrefcheck` one will use
`docker pull serokell.docker.scarf.sh/serokell/xrefcheck` and
Scarf will automatically record some (anonymized) data.
2. Create a File package in Scarf which acts as another proxy.
It downloads artifacts from GitHub releases, e.g.:
    https://serokell.gateway.scarf.sh/xrefcheck/v0.2.2/xrefcheck-x86_64-linux
3. Mention these links in usage instructions.
4. Add a tracking pixel to README to also get statistics about
README views.
Problem: For historical reasons some Serokell projects use niv to
organize nix-related dependencies. However, the majority of other
projects in our company use nix flakes to organize these dependencies.
Solution: Add a flake and provide the CI attributes as flake outputs.
Problem: As GitHub and GitLab do not render symlinks as the file they
point to, we are considering implementing a new scanner for symlinks
that verifies them to some extent.
Solution: A scanner that validates the reference from a symlink has been
implemented in the same style as the markdown scanner.
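As a rough illustration of what validating a symlink's reference can mean, the Python sketch below reads the link target and checks that it exists relative to the link's directory. The function name and the exact check are assumptions for the example; the actual scanner is written in Haskell and may verify more or less than this.

```python
import os
from typing import Tuple


def check_symlink(link_path: str) -> Tuple[str, bool]:
    """Return the raw link target and whether it points to an existing path."""
    target = os.readlink(link_path)
    # A relative target is interpreted relative to the symlink's directory;
    # os.path.join leaves an absolute target unchanged.
    resolved = os.path.join(os.path.dirname(link_path), target)
    return target, os.path.exists(resolved)
```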
Problem: Currently, Xrefcheck can follow redirects with an absolute
location link, but it cannot handle relative ones.
Solution: After parsing the location link, obtain the corresponding
absolute link by resolving it against the original request link.
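Resolving a relative redirect location against the request link can be sketched in Python with the standard library's `urljoin`. The function name and the example URLs are hypothetical; Xrefcheck's own implementation is in Haskell.

```python
from urllib.parse import urljoin


def resolve_redirect(request_url: str, location: str) -> str:
    """Turn a redirect location into an absolute URL.

    An absolute location is returned unchanged; a relative one is
    resolved against the URL of the request that produced the redirect.
    """
    return urljoin(request_url, location)


if __name__ == "__main__":
    base = "https://example.com/docs/guide.html"
    print(resolve_redirect(base, "intro.html"))
    print(resolve_redirect(base, "/about"))
    print(resolve_redirect(base, "https://other.example/x"))
```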
Problem: We currently have golden tests for anchor case-sensitivity, but
not for path case-sensitivity.
Solution: A golden test for path case-sensitivity has been added and the
previous one has been renamed.
Problem: When a file name contains unusual characters, its output line
in git ls-files is quoted and Xrefcheck parses the file name incorrectly.
Solution: Use the git ls-files -z option to output lines verbatim with
null character line terminators. The file not found message has also
been improved for the case in which its link contains a backslash.
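To show why NUL-terminated output is easier to parse, here is a small Python sketch that splits such output into file names. The function name is hypothetical and the code is not taken from Xrefcheck, which does this in Haskell.

```python
import os
from typing import List


def parse_ls_files_z(raw: bytes) -> List[str]:
    """Split NUL-terminated `git ls-files -z` output into file names.

    With -z, names are emitted verbatim (no quoting or escaping), so
    splitting on the NUL byte is enough. The trailing terminator yields
    one empty chunk, which is dropped.
    """
    return [os.fsdecode(entry) for entry in raw.split(b"\0") if entry]


# The raw bytes could be obtained with, for example:
#   subprocess.run(["git", "ls-files", "-z"],
#                  capture_output=True, check=True).stdout
```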
Problem: After changing the progress bar interface to require progress
unit witnesses, some functions related to an anonymous task timestamp
also started to require a progress unit witness, which complicates their
usage unnecessarily.
Solution: Do not require a progress unit witness for getTaskTimestamp.
Problem: After refactoring the FilePath usages in the codebase to have a
canonical representation of them, we noticed that further improvements
could be applied, such as clarifying whether the path is system
dependent and avoiding absolute file system paths.
Solution: We now use POSIX relative paths during the analysis, and
system dependent ones for reading file contents in the scan phase.
Problem: We noticed that output was not being colorized in GitLab CI
because of the heuristics currently used to decide whether to show colors.
Solution: First, we extend the current heuristics to also enable
coloring by default when the CI env var is set to true. Second, we add
a new flag, color, which bypasses these heuristics and enables colors.
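The resulting decision order can be sketched in Python as follows. The terminal check stands in for the pre-existing heuristics, which are not described in this message, so it is an assumption, as are the function and parameter names.

```python
from typing import Mapping


def should_colorize(
    env: Mapping[str, str],
    stdout_is_tty: bool,
    force_color: bool = False,
) -> bool:
    """Decide whether to colorize output."""
    # The explicit flag bypasses every heuristic.
    if force_color:
        return True
    # New heuristic: CI systems commonly set CI=true.
    if env.get("CI") == "true":
        return True
    # Assumed stand-in for the earlier heuristics: color only on a terminal.
    return stdout_is_tty
```

For instance, `should_colorize(os.environ, sys.stdout.isatty())` would enable colors in a CI job that sets `CI=true` even though its output is not a terminal.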
Problem: The current progress bar interface is quite low-level and
error-prone.
Solution: The proposed solution here modifies both the progress bar
internal model and interface by representing progress units by
witnesses. It does not require great changes, but the resulting
interface is clearer and less error-prone.
Problem: When the repository contains a symlink to a markdown file, it
is processed by xrefcheck as if it were the same markdown file but in the
symlink's location. This leads to broken references and can be avoided
because neither GitHub nor GitLab try to render symlinks as the file
they point to.
Solution: Treat symlinks as non-scannable files. In the future, we
may add a dedicated scanner for symlinks if that proves workable.
Problem: There is a duplicated line in the changelog unreleased section,
which seems to be an accidental copy-paste typo or a bad merge conflict
resolution.
Solution: Delete the aforementioned changelog line.
Problem: After recent work on Xrefcheck redirect behavior, we still
needed to decide how to handle 304 redirects.
Solution: With the current default configuration, 304 redirects are
considered valid, which seems appropriate given that a 304 response
usually means that you previously received a successful response for
that same request and there is no need to retransmit the resource. We
add a test case for this default config and update the FAQ section.
Problem: We previously changed the default behaviour of Xrefcheck when
following link redirects, but did not provide a way to configure it.
Solution: We are adding a new field in the configuration file to allow
writing a list of redirect rules that will be applied to links that
match them.
Problem: Currently, a response timeout immediately results in a
failure; we would like to be able to configure retries on timeouts.
Solution: A new ExternalHttpTimeout error is added, which is treated
similarly to the ExternalHttpTooManyRequests error.
A new field is added to the config that specifies how many timeouts are
allowed. Its default value is 1.
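One possible reading of the timeout allowance is sketched below in Python. It assumes that "allowed" means the number of timeouts tolerated before the check fails, so with the default of 1 a link is retried once after its first timeout. That interpretation, the exception class, and the function name are assumptions; the actual Haskell logic may count differently.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class ResponseTimeout(Exception):
    """Hypothetical stand-in for the ExternalHttpTimeout error."""


def check_with_timeout_budget(fetch: Callable[[], T], max_timeouts: int = 1) -> T:
    """Call `fetch`, tolerating up to `max_timeouts` timeouts before failing."""
    timeouts = 0
    while True:
        try:
            return fetch()
        except ResponseTimeout:
            timeouts += 1
            # Assumed semantics: fail only once the budget is exceeded.
            if timeouts > max_timeouts:
                raise
```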