feat: improve change detection
Previously, we were storing the last `modtime` and `size` for a given path in `boltdb`.
This was used to determine if a file had changed and whether we should format it.
In addition, if a formatter's executable changed (`modtime` or `size`), we would delete
all path entries in the database before processing, thereby forcing each file to be
formatted.
This commit introduces a new approach to change detection, one which takes into account
changes to the underlying file, the formatter's executable, _and_ the formatter's
configuration.
Now, when deciding if we should format a file, we do the following:
- Hash each matching formatter, in sequence, using its `name`, `options`, `priority`
as well as the `modtime` and `size` of its executable.
This is pre-computed on a per-pipeline basis.
- We then add the file's `modtime` and `size` to generate a format signature.
You can think of this signature as a unique representation of what we are about to do
with the file.
- The format signature is then compared with a cache entry (if available).
- If the signatures match, we have already applied this sequence of formatters, with these
particular options etc. to this file when it had this `modtime` and `size`, so there is
no more processing to be done.
- If the signatures do no match, we should format the file, recording the new format
signature in the cache when we are finished.
This approach is simpler in terms of storage, and has the added benefit of finer grained
change detection versus the brute force cache busting we were doing before.
In terms of performance impact, with the pre-computing of hashes per-pipeline and the
simpler storage schema, there appears to have been no significant impact.
Manual testing with [nixpkgs](https://github.com/nixos/nixpkgs) shows comparable run times
for both hot and cold caches.
> [!NOTE]
> Since this changes the database schema, rather than implementing some form of migration
> logic to remove the old buckets and so on, I decided to upgrade the hash algorithm we
> use when determining the filename for the db file.
>
> Previously, we were using `sha1`, matching the behaviour from `1.0`.
> Now we use `sha256`, which results in a slightly longer db name, but has the benefit of
> ensuring a new db instance will be created on first invocation, as well as making
> [gosec](https://golangci-lint.run/usage/linters/#gosec) happy.
Closes #455
Signed-off-by: Brian McGee <brian@bmcgee.ie>
2024-10-17 13:54:09 +03:00
|
|
|
package format //nolint:testpackage
|
2024-10-16 15:44:41 +03:00
|
|
|
|
|
|
|
import (
|
feat: improve change detection
Previously, we were storing the last `modtime` and `size` for a given path in `boltdb`.
This was used to determine if a file had changed and whether we should format it.
In addition, if a formatter's executable changed (`modtime` or `size`), we would delete
all path entries in the database before processing, thereby forcing each file to be
formatted.
This commit introduces a new approach to change detection, one which takes into account
changes to the underlying file, the formatter's executable, _and_ the formatter's
configuration.
Now, when deciding if we should format a file, we do the following:
- Hash each matching formatter, in sequence, using its `name`, `options`, `priority`
as well as the `modtime` and `size` of its executable.
This is pre-computed on a per-pipeline basis.
- We then add the file's `modtime` and `size` to generate a format signature.
You can think of this signature as a unique representation of what we are about to do
with the file.
- The format signature is then compared with a cache entry (if available).
- If the signatures match, we have already applied this sequence of formatters, with these
particular options etc. to this file when it had this `modtime` and `size`, so there is
no more processing to be done.
- If the signatures do no match, we should format the file, recording the new format
signature in the cache when we are finished.
This approach is simpler in terms of storage, and has the added benefit of finer grained
change detection versus the brute force cache busting we were doing before.
In terms of performance impact, with the pre-computing of hashes per-pipeline and the
simpler storage schema, there appears to have been no significant impact.
Manual testing with [nixpkgs](https://github.com/nixos/nixpkgs) shows comparable run times
for both hot and cold caches.
> [!NOTE]
> Since this changes the database schema, rather than implementing some form of migration
> logic to remove the old buckets and so on, I decided to upgrade the hash algorithm we
> use when determining the filename for the db file.
>
> Previously, we were using `sha1`, matching the behaviour from `1.0`.
> Now we use `sha256`, which results in a slightly longer db name, but has the benefit of
> ensuring a new db instance will be created on first invocation, as well as making
> [gosec](https://golangci-lint.run/usage/linters/#gosec) happy.
Closes #455
Signed-off-by: Brian McGee <brian@bmcgee.ie>
2024-10-17 13:54:09 +03:00
|
|
|
"os"
|
|
|
|
"os/exec"
|
|
|
|
"path/filepath"
|
2024-10-16 15:44:41 +03:00
|
|
|
"testing"
|
feat: improve change detection
Previously, we were storing the last `modtime` and `size` for a given path in `boltdb`.
This was used to determine if a file had changed and whether we should format it.
In addition, if a formatter's executable changed (`modtime` or `size`), we would delete
all path entries in the database before processing, thereby forcing each file to be
formatted.
This commit introduces a new approach to change detection, one which takes into account
changes to the underlying file, the formatter's executable, _and_ the formatter's
configuration.
Now, when deciding if we should format a file, we do the following:
- Hash each matching formatter, in sequence, using its `name`, `options`, `priority`
as well as the `modtime` and `size` of its executable.
This is pre-computed on a per-pipeline basis.
- We then add the file's `modtime` and `size` to generate a format signature.
You can think of this signature as a unique representation of what we are about to do
with the file.
- The format signature is then compared with a cache entry (if available).
- If the signatures match, we have already applied this sequence of formatters, with these
particular options etc. to this file when it had this `modtime` and `size`, so there is
no more processing to be done.
- If the signatures do no match, we should format the file, recording the new format
signature in the cache when we are finished.
This approach is simpler in terms of storage, and has the added benefit of finer grained
change detection versus the brute force cache busting we were doing before.
In terms of performance impact, with the pre-computing of hashes per-pipeline and the
simpler storage schema, there appears to have been no significant impact.
Manual testing with [nixpkgs](https://github.com/nixos/nixpkgs) shows comparable run times
for both hot and cold caches.
> [!NOTE]
> Since this changes the database schema, rather than implementing some form of migration
> logic to remove the old buckets and so on, I decided to upgrade the hash algorithm we
> use when determining the filename for the db file.
>
> Previously, we were using `sha1`, matching the behaviour from `1.0`.
> Now we use `sha256`, which results in a slightly longer db name, but has the benefit of
> ensuring a new db instance will be created on first invocation, as well as making
> [gosec](https://golangci-lint.run/usage/linters/#gosec) happy.
Closes #455
Signed-off-by: Brian McGee <brian@bmcgee.ie>
2024-10-17 13:54:09 +03:00
|
|
|
"time"
|
2024-10-16 15:44:41 +03:00
|
|
|
|
2024-11-09 18:52:33 +03:00
|
|
|
"github.com/numtide/treefmt/v2/config"
|
|
|
|
"github.com/numtide/treefmt/v2/stats"
|
|
|
|
"github.com/numtide/treefmt/v2/test"
|
2024-10-16 15:44:41 +03:00
|
|
|
"github.com/stretchr/testify/require"
|
|
|
|
)
|
|
|
|
|
|
|
|
func TestInvalidFormatterName(t *testing.T) {
|
|
|
|
as := require.New(t)
|
|
|
|
|
|
|
|
const batchSize = 1024
|
|
|
|
|
|
|
|
cfg := &config.Config{}
|
|
|
|
cfg.OnUnmatched = "info"
|
|
|
|
|
|
|
|
statz := stats.New()
|
|
|
|
|
|
|
|
// simple "empty" config
|
feat: improve change detection
Previously, we were storing the last `modtime` and `size` for a given path in `boltdb`.
This was used to determine if a file had changed and whether we should format it.
In addition, if a formatter's executable changed (`modtime` or `size`), we would delete
all path entries in the database before processing, thereby forcing each file to be
formatted.
This commit introduces a new approach to change detection, one which takes into account
changes to the underlying file, the formatter's executable, _and_ the formatter's
configuration.
Now, when deciding if we should format a file, we do the following:
- Hash each matching formatter, in sequence, using its `name`, `options`, `priority`
as well as the `modtime` and `size` of its executable.
This is pre-computed on a per-pipeline basis.
- We then add the file's `modtime` and `size` to generate a format signature.
You can think of this signature as a unique representation of what we are about to do
with the file.
- The format signature is then compared with a cache entry (if available).
- If the signatures match, we have already applied this sequence of formatters, with these
particular options etc. to this file when it had this `modtime` and `size`, so there is
no more processing to be done.
- If the signatures do no match, we should format the file, recording the new format
signature in the cache when we are finished.
This approach is simpler in terms of storage, and has the added benefit of finer grained
change detection versus the brute force cache busting we were doing before.
In terms of performance impact, with the pre-computing of hashes per-pipeline and the
simpler storage schema, there appears to have been no significant impact.
Manual testing with [nixpkgs](https://github.com/nixos/nixpkgs) shows comparable run times
for both hot and cold caches.
> [!NOTE]
> Since this changes the database schema, rather than implementing some form of migration
> logic to remove the old buckets and so on, I decided to upgrade the hash algorithm we
> use when determining the filename for the db file.
>
> Previously, we were using `sha1`, matching the behaviour from `1.0`.
> Now we use `sha256`, which results in a slightly longer db name, but has the benefit of
> ensuring a new db instance will be created on first invocation, as well as making
> [gosec](https://golangci-lint.run/usage/linters/#gosec) happy.
Closes #455
Signed-off-by: Brian McGee <brian@bmcgee.ie>
2024-10-17 13:54:09 +03:00
|
|
|
_, err := NewCompositeFormatter(cfg, &statz, batchSize)
|
2024-10-16 15:44:41 +03:00
|
|
|
as.NoError(err)
|
|
|
|
|
|
|
|
// valid name using all the acceptable characters
|
|
|
|
cfg.FormatterConfigs = map[string]*config.Formatter{
|
|
|
|
"echo_command-1234567890": {
|
2024-10-21 13:42:29 +03:00
|
|
|
Command: "echo",
|
|
|
|
Includes: []string{"*"},
|
2024-10-16 15:44:41 +03:00
|
|
|
},
|
|
|
|
}
|
|
|
|
|
feat: improve change detection
Previously, we were storing the last `modtime` and `size` for a given path in `boltdb`.
This was used to determine if a file had changed and whether we should format it.
In addition, if a formatter's executable changed (`modtime` or `size`), we would delete
all path entries in the database before processing, thereby forcing each file to be
formatted.
This commit introduces a new approach to change detection, one which takes into account
changes to the underlying file, the formatter's executable, _and_ the formatter's
configuration.
Now, when deciding if we should format a file, we do the following:
- Hash each matching formatter, in sequence, using its `name`, `options`, `priority`
as well as the `modtime` and `size` of its executable.
This is pre-computed on a per-pipeline basis.
- We then add the file's `modtime` and `size` to generate a format signature.
You can think of this signature as a unique representation of what we are about to do
with the file.
- The format signature is then compared with a cache entry (if available).
- If the signatures match, we have already applied this sequence of formatters, with these
particular options etc. to this file when it had this `modtime` and `size`, so there is
no more processing to be done.
- If the signatures do no match, we should format the file, recording the new format
signature in the cache when we are finished.
This approach is simpler in terms of storage, and has the added benefit of finer grained
change detection versus the brute force cache busting we were doing before.
In terms of performance impact, with the pre-computing of hashes per-pipeline and the
simpler storage schema, there appears to have been no significant impact.
Manual testing with [nixpkgs](https://github.com/nixos/nixpkgs) shows comparable run times
for both hot and cold caches.
> [!NOTE]
> Since this changes the database schema, rather than implementing some form of migration
> logic to remove the old buckets and so on, I decided to upgrade the hash algorithm we
> use when determining the filename for the db file.
>
> Previously, we were using `sha1`, matching the behaviour from `1.0`.
> Now we use `sha256`, which results in a slightly longer db name, but has the benefit of
> ensuring a new db instance will be created on first invocation, as well as making
> [gosec](https://golangci-lint.run/usage/linters/#gosec) happy.
Closes #455
Signed-off-by: Brian McGee <brian@bmcgee.ie>
2024-10-17 13:54:09 +03:00
|
|
|
_, err = NewCompositeFormatter(cfg, &statz, batchSize)
|
2024-10-16 15:44:41 +03:00
|
|
|
as.NoError(err)
|
|
|
|
|
|
|
|
// test with some bad examples
|
|
|
|
for _, character := range []string{
|
|
|
|
" ", ":", "?", "*", "[", "]", "(", ")", "|", "&", "<", ">", "\\", "/", "%", "$", "#", "@", "`", "'",
|
|
|
|
} {
|
|
|
|
cfg.FormatterConfigs = map[string]*config.Formatter{
|
|
|
|
"touch_" + character: {
|
|
|
|
Command: "touch",
|
|
|
|
},
|
|
|
|
}
|
|
|
|
|
feat: improve change detection
Previously, we were storing the last `modtime` and `size` for a given path in `boltdb`.
This was used to determine if a file had changed and whether we should format it.
In addition, if a formatter's executable changed (`modtime` or `size`), we would delete
all path entries in the database before processing, thereby forcing each file to be
formatted.
This commit introduces a new approach to change detection, one which takes into account
changes to the underlying file, the formatter's executable, _and_ the formatter's
configuration.
Now, when deciding if we should format a file, we do the following:
- Hash each matching formatter, in sequence, using its `name`, `options`, `priority`
as well as the `modtime` and `size` of its executable.
This is pre-computed on a per-pipeline basis.
- We then add the file's `modtime` and `size` to generate a format signature.
You can think of this signature as a unique representation of what we are about to do
with the file.
- The format signature is then compared with a cache entry (if available).
- If the signatures match, we have already applied this sequence of formatters, with these
particular options etc. to this file when it had this `modtime` and `size`, so there is
no more processing to be done.
- If the signatures do no match, we should format the file, recording the new format
signature in the cache when we are finished.
This approach is simpler in terms of storage, and has the added benefit of finer grained
change detection versus the brute force cache busting we were doing before.
In terms of performance impact, with the pre-computing of hashes per-pipeline and the
simpler storage schema, there appears to have been no significant impact.
Manual testing with [nixpkgs](https://github.com/nixos/nixpkgs) shows comparable run times
for both hot and cold caches.
> [!NOTE]
> Since this changes the database schema, rather than implementing some form of migration
> logic to remove the old buckets and so on, I decided to upgrade the hash algorithm we
> use when determining the filename for the db file.
>
> Previously, we were using `sha1`, matching the behaviour from `1.0`.
> Now we use `sha256`, which results in a slightly longer db name, but has the benefit of
> ensuring a new db instance will be created on first invocation, as well as making
> [gosec](https://golangci-lint.run/usage/linters/#gosec) happy.
Closes #455
Signed-off-by: Brian McGee <brian@bmcgee.ie>
2024-10-17 13:54:09 +03:00
|
|
|
_, err = NewCompositeFormatter(cfg, &statz, batchSize)
|
|
|
|
as.ErrorIs(err, ErrInvalidName)
|
2024-10-16 15:44:41 +03:00
|
|
|
}
|
|
|
|
}
|
feat: improve change detection
Previously, we were storing the last `modtime` and `size` for a given path in `boltdb`.
This was used to determine if a file had changed and whether we should format it.
In addition, if a formatter's executable changed (`modtime` or `size`), we would delete
all path entries in the database before processing, thereby forcing each file to be
formatted.
This commit introduces a new approach to change detection, one which takes into account
changes to the underlying file, the formatter's executable, _and_ the formatter's
configuration.
Now, when deciding if we should format a file, we do the following:
- Hash each matching formatter, in sequence, using its `name`, `options`, `priority`
as well as the `modtime` and `size` of its executable.
This is pre-computed on a per-pipeline basis.
- We then add the file's `modtime` and `size` to generate a format signature.
You can think of this signature as a unique representation of what we are about to do
with the file.
- The format signature is then compared with a cache entry (if available).
- If the signatures match, we have already applied this sequence of formatters, with these
particular options etc. to this file when it had this `modtime` and `size`, so there is
no more processing to be done.
- If the signatures do no match, we should format the file, recording the new format
signature in the cache when we are finished.
This approach is simpler in terms of storage, and has the added benefit of finer grained
change detection versus the brute force cache busting we were doing before.
In terms of performance impact, with the pre-computing of hashes per-pipeline and the
simpler storage schema, there appears to have been no significant impact.
Manual testing with [nixpkgs](https://github.com/nixos/nixpkgs) shows comparable run times
for both hot and cold caches.
> [!NOTE]
> Since this changes the database schema, rather than implementing some form of migration
> logic to remove the old buckets and so on, I decided to upgrade the hash algorithm we
> use when determining the filename for the db file.
>
> Previously, we were using `sha1`, matching the behaviour from `1.0`.
> Now we use `sha256`, which results in a slightly longer db name, but has the benefit of
> ensuring a new db instance will be created on first invocation, as well as making
> [gosec](https://golangci-lint.run/usage/linters/#gosec) happy.
Closes #455
Signed-off-by: Brian McGee <brian@bmcgee.ie>
2024-10-17 13:54:09 +03:00
|
|
|
|
|
|
|
func TestFormatSignature(t *testing.T) {
|
|
|
|
as := require.New(t)
|
|
|
|
|
|
|
|
const batchSize = 1024
|
|
|
|
|
|
|
|
statz := stats.New()
|
|
|
|
|
|
|
|
tempDir := t.TempDir()
|
|
|
|
|
|
|
|
// symlink some formatters into temp dir, so we can mess with their mod times
|
|
|
|
binPath := filepath.Join(tempDir, "bin")
|
|
|
|
as.NoError(os.Mkdir(binPath, 0o755))
|
|
|
|
|
|
|
|
binaries := []string{"black", "elm-format", "gofmt"}
|
|
|
|
|
|
|
|
for _, name := range binaries {
|
|
|
|
src, err := exec.LookPath(name)
|
|
|
|
as.NoError(err)
|
|
|
|
as.NoError(os.Symlink(src, filepath.Join(binPath, name)))
|
|
|
|
}
|
|
|
|
|
|
|
|
// prepend our test bin directory to PATH
|
|
|
|
t.Setenv("PATH", binPath+":"+os.Getenv("PATH"))
|
|
|
|
|
|
|
|
// start with 2 formatters
|
|
|
|
cfg := &config.Config{
|
|
|
|
OnUnmatched: "info",
|
|
|
|
FormatterConfigs: map[string]*config.Formatter{
|
|
|
|
"python": {
|
|
|
|
Command: "black",
|
|
|
|
Includes: []string{"*.py"},
|
|
|
|
},
|
|
|
|
"elm": {
|
|
|
|
Command: "elm-format",
|
|
|
|
Options: []string{"--yes"},
|
|
|
|
Includes: []string{"*.elm"},
|
|
|
|
},
|
|
|
|
},
|
|
|
|
}
|
|
|
|
|
|
|
|
oldSignature := assertSignatureChangedAndStable(t, as, cfg, nil)
|
|
|
|
|
|
|
|
t.Run("change formatter mod time", func(t *testing.T) {
|
|
|
|
for _, name := range []string{"black", "elm-format"} {
|
|
|
|
t.Logf("changing mod time of %s", name)
|
|
|
|
|
|
|
|
// tweak mod time
|
|
|
|
newTime := time.Now().Add(-time.Minute)
|
|
|
|
as.NoError(test.Lutimes(t, filepath.Join(binPath, name), newTime, newTime))
|
|
|
|
|
|
|
|
oldSignature = assertSignatureChangedAndStable(t, as, cfg, oldSignature)
|
|
|
|
}
|
|
|
|
})
|
|
|
|
|
|
|
|
t.Run("modify formatter options", func(_ *testing.T) {
|
|
|
|
f, err := NewCompositeFormatter(cfg, &statz, batchSize)
|
|
|
|
as.NoError(err)
|
|
|
|
|
|
|
|
oldSignature = assertSignatureChangedAndStable(t, as, cfg, nil)
|
|
|
|
|
|
|
|
// adjust python includes
|
|
|
|
python := cfg.FormatterConfigs["python"]
|
|
|
|
python.Includes = []string{"*.py", "*.pyi"}
|
|
|
|
|
|
|
|
newHash, err := f.signature()
|
|
|
|
as.NoError(err)
|
|
|
|
as.Equal(oldSignature, newHash, "hash should not have changed")
|
|
|
|
|
|
|
|
// adjust python excludes
|
|
|
|
python.Excludes = []string{"*.pyi"}
|
|
|
|
|
|
|
|
newHash, err = f.signature()
|
|
|
|
as.NoError(err)
|
|
|
|
as.Equal(oldSignature, newHash, "hash should not have changed")
|
|
|
|
|
|
|
|
// adjust python options
|
|
|
|
python.Options = []string{"-w", "-s"}
|
|
|
|
oldSignature = assertSignatureChangedAndStable(t, as, cfg, oldSignature)
|
|
|
|
|
|
|
|
// adjust python priority
|
|
|
|
python.Priority = 100
|
|
|
|
oldSignature = assertSignatureChangedAndStable(t, as, cfg, oldSignature)
|
|
|
|
|
|
|
|
// adjust command
|
|
|
|
python.Command = "deadnix"
|
|
|
|
oldSignature = assertSignatureChangedAndStable(t, as, cfg, oldSignature)
|
|
|
|
})
|
|
|
|
|
|
|
|
t.Run("add/remove formatters", func(_ *testing.T) {
|
|
|
|
cfg.FormatterConfigs["go"] = &config.Formatter{
|
|
|
|
Command: "gofmt",
|
|
|
|
Options: []string{"-w"},
|
|
|
|
Includes: []string{"*.go"},
|
|
|
|
}
|
|
|
|
|
|
|
|
oldSignature = assertSignatureChangedAndStable(t, as, cfg, oldSignature)
|
|
|
|
|
|
|
|
// remove python formatter
|
|
|
|
delete(cfg.FormatterConfigs, "python")
|
|
|
|
oldSignature = assertSignatureChangedAndStable(t, as, cfg, oldSignature)
|
|
|
|
|
|
|
|
// remove elm formatter
|
|
|
|
delete(cfg.FormatterConfigs, "elm")
|
|
|
|
oldSignature = assertSignatureChangedAndStable(t, as, cfg, oldSignature)
|
|
|
|
})
|
|
|
|
}
|
|
|
|
|
|
|
|
func assertSignatureChangedAndStable(
|
|
|
|
t *testing.T,
|
|
|
|
as *require.Assertions,
|
|
|
|
cfg *config.Config,
|
|
|
|
oldSignature signature,
|
|
|
|
) (h signature) {
|
|
|
|
t.Helper()
|
|
|
|
|
|
|
|
statz := stats.New()
|
|
|
|
f, err := NewCompositeFormatter(cfg, &statz, 1024)
|
|
|
|
as.NoError(err)
|
|
|
|
|
|
|
|
newHash, err := f.signature()
|
|
|
|
as.NoError(err)
|
|
|
|
as.NotEqual(oldSignature, newHash, "hash should have changed")
|
|
|
|
|
|
|
|
sameHash, err := f.signature()
|
|
|
|
as.NoError(err)
|
|
|
|
as.Equal(newHash, sameHash, "hash should not have changed")
|
|
|
|
|
|
|
|
return newHash
|
|
|
|
}
|