treefmt/format/formatter_test.go

185 lines
4.5 KiB
Go
Raw Normal View History

feat: improve change detection Previously, we were storing the last `modtime` and `size` for a given path in `boltdb`. This was used to determine if a file had changed and whether we should format it. In addition, if a formatter's executable changed (`modtime` or `size`), we would delete all path entries in the database before processing, thereby forcing each file to be formatted. This commit introduces a new approach to change detection, one which takes into account changes to the underlying file, the formatter's executable, _and_ the formatter's configuration. Now, when deciding if we should format a file, we do the following: - Hash each matching formatter, in sequence, using its `name`, `options`, `priority` as well as the `modtime` and `size` of its executable. This is pre-computed on a per-pipeline basis. - We then add the file's `modtime` and `size` to generate a format signature. You can think of this signature as a unique representation of what we are about to do with the file. - The format signature is then compared with a cache entry (if available). - If the signatures match, we have already applied this sequence of formatters, with these particular options etc. to this file when it had this `modtime` and `size`, so there is no more processing to be done. - If the signatures do no match, we should format the file, recording the new format signature in the cache when we are finished. This approach is simpler in terms of storage, and has the added benefit of finer grained change detection versus the brute force cache busting we were doing before. In terms of performance impact, with the pre-computing of hashes per-pipeline and the simpler storage schema, there appears to have been no significant impact. Manual testing with [nixpkgs](https://github.com/nixos/nixpkgs) shows comparable run times for both hot and cold caches. > [!NOTE] > Since this changes the database schema, rather than implementing some form of migration > logic to remove the old buckets and so on, I decided to upgrade the hash algorithm we > use when determining the filename for the db file. > > Previously, we were using `sha1`, matching the behaviour from `1.0`. > Now we use `sha256`, which results in a slightly longer db name, but has the benefit of > ensuring a new db instance will be created on first invocation, as well as making > [gosec](https://golangci-lint.run/usage/linters/#gosec) happy. Closes #455 Signed-off-by: Brian McGee <brian@bmcgee.ie>
2024-10-17 13:54:09 +03:00
package format //nolint:testpackage
import (
feat: improve change detection Previously, we were storing the last `modtime` and `size` for a given path in `boltdb`. This was used to determine if a file had changed and whether we should format it. In addition, if a formatter's executable changed (`modtime` or `size`), we would delete all path entries in the database before processing, thereby forcing each file to be formatted. This commit introduces a new approach to change detection, one which takes into account changes to the underlying file, the formatter's executable, _and_ the formatter's configuration. Now, when deciding if we should format a file, we do the following: - Hash each matching formatter, in sequence, using its `name`, `options`, `priority` as well as the `modtime` and `size` of its executable. This is pre-computed on a per-pipeline basis. - We then add the file's `modtime` and `size` to generate a format signature. You can think of this signature as a unique representation of what we are about to do with the file. - The format signature is then compared with a cache entry (if available). - If the signatures match, we have already applied this sequence of formatters, with these particular options etc. to this file when it had this `modtime` and `size`, so there is no more processing to be done. - If the signatures do no match, we should format the file, recording the new format signature in the cache when we are finished. This approach is simpler in terms of storage, and has the added benefit of finer grained change detection versus the brute force cache busting we were doing before. In terms of performance impact, with the pre-computing of hashes per-pipeline and the simpler storage schema, there appears to have been no significant impact. Manual testing with [nixpkgs](https://github.com/nixos/nixpkgs) shows comparable run times for both hot and cold caches. > [!NOTE] > Since this changes the database schema, rather than implementing some form of migration > logic to remove the old buckets and so on, I decided to upgrade the hash algorithm we > use when determining the filename for the db file. > > Previously, we were using `sha1`, matching the behaviour from `1.0`. > Now we use `sha256`, which results in a slightly longer db name, but has the benefit of > ensuring a new db instance will be created on first invocation, as well as making > [gosec](https://golangci-lint.run/usage/linters/#gosec) happy. Closes #455 Signed-off-by: Brian McGee <brian@bmcgee.ie>
2024-10-17 13:54:09 +03:00
"os"
"os/exec"
"path/filepath"
"testing"
feat: improve change detection Previously, we were storing the last `modtime` and `size` for a given path in `boltdb`. This was used to determine if a file had changed and whether we should format it. In addition, if a formatter's executable changed (`modtime` or `size`), we would delete all path entries in the database before processing, thereby forcing each file to be formatted. This commit introduces a new approach to change detection, one which takes into account changes to the underlying file, the formatter's executable, _and_ the formatter's configuration. Now, when deciding if we should format a file, we do the following: - Hash each matching formatter, in sequence, using its `name`, `options`, `priority` as well as the `modtime` and `size` of its executable. This is pre-computed on a per-pipeline basis. - We then add the file's `modtime` and `size` to generate a format signature. You can think of this signature as a unique representation of what we are about to do with the file. - The format signature is then compared with a cache entry (if available). - If the signatures match, we have already applied this sequence of formatters, with these particular options etc. to this file when it had this `modtime` and `size`, so there is no more processing to be done. - If the signatures do no match, we should format the file, recording the new format signature in the cache when we are finished. This approach is simpler in terms of storage, and has the added benefit of finer grained change detection versus the brute force cache busting we were doing before. In terms of performance impact, with the pre-computing of hashes per-pipeline and the simpler storage schema, there appears to have been no significant impact. Manual testing with [nixpkgs](https://github.com/nixos/nixpkgs) shows comparable run times for both hot and cold caches. > [!NOTE] > Since this changes the database schema, rather than implementing some form of migration > logic to remove the old buckets and so on, I decided to upgrade the hash algorithm we > use when determining the filename for the db file. > > Previously, we were using `sha1`, matching the behaviour from `1.0`. > Now we use `sha256`, which results in a slightly longer db name, but has the benefit of > ensuring a new db instance will be created on first invocation, as well as making > [gosec](https://golangci-lint.run/usage/linters/#gosec) happy. Closes #455 Signed-off-by: Brian McGee <brian@bmcgee.ie>
2024-10-17 13:54:09 +03:00
"time"
"github.com/numtide/treefmt/v2/config"
"github.com/numtide/treefmt/v2/stats"
"github.com/numtide/treefmt/v2/test"
"github.com/stretchr/testify/require"
)
func TestInvalidFormatterName(t *testing.T) {
as := require.New(t)
const batchSize = 1024
cfg := &config.Config{}
cfg.OnUnmatched = "info"
statz := stats.New()
// simple "empty" config
feat: improve change detection Previously, we were storing the last `modtime` and `size` for a given path in `boltdb`. This was used to determine if a file had changed and whether we should format it. In addition, if a formatter's executable changed (`modtime` or `size`), we would delete all path entries in the database before processing, thereby forcing each file to be formatted. This commit introduces a new approach to change detection, one which takes into account changes to the underlying file, the formatter's executable, _and_ the formatter's configuration. Now, when deciding if we should format a file, we do the following: - Hash each matching formatter, in sequence, using its `name`, `options`, `priority` as well as the `modtime` and `size` of its executable. This is pre-computed on a per-pipeline basis. - We then add the file's `modtime` and `size` to generate a format signature. You can think of this signature as a unique representation of what we are about to do with the file. - The format signature is then compared with a cache entry (if available). - If the signatures match, we have already applied this sequence of formatters, with these particular options etc. to this file when it had this `modtime` and `size`, so there is no more processing to be done. - If the signatures do no match, we should format the file, recording the new format signature in the cache when we are finished. This approach is simpler in terms of storage, and has the added benefit of finer grained change detection versus the brute force cache busting we were doing before. In terms of performance impact, with the pre-computing of hashes per-pipeline and the simpler storage schema, there appears to have been no significant impact. Manual testing with [nixpkgs](https://github.com/nixos/nixpkgs) shows comparable run times for both hot and cold caches. > [!NOTE] > Since this changes the database schema, rather than implementing some form of migration > logic to remove the old buckets and so on, I decided to upgrade the hash algorithm we > use when determining the filename for the db file. > > Previously, we were using `sha1`, matching the behaviour from `1.0`. > Now we use `sha256`, which results in a slightly longer db name, but has the benefit of > ensuring a new db instance will be created on first invocation, as well as making > [gosec](https://golangci-lint.run/usage/linters/#gosec) happy. Closes #455 Signed-off-by: Brian McGee <brian@bmcgee.ie>
2024-10-17 13:54:09 +03:00
_, err := NewCompositeFormatter(cfg, &statz, batchSize)
as.NoError(err)
// valid name using all the acceptable characters
cfg.FormatterConfigs = map[string]*config.Formatter{
"echo_command-1234567890": {
Command: "echo",
Includes: []string{"*"},
},
}
feat: improve change detection Previously, we were storing the last `modtime` and `size` for a given path in `boltdb`. This was used to determine if a file had changed and whether we should format it. In addition, if a formatter's executable changed (`modtime` or `size`), we would delete all path entries in the database before processing, thereby forcing each file to be formatted. This commit introduces a new approach to change detection, one which takes into account changes to the underlying file, the formatter's executable, _and_ the formatter's configuration. Now, when deciding if we should format a file, we do the following: - Hash each matching formatter, in sequence, using its `name`, `options`, `priority` as well as the `modtime` and `size` of its executable. This is pre-computed on a per-pipeline basis. - We then add the file's `modtime` and `size` to generate a format signature. You can think of this signature as a unique representation of what we are about to do with the file. - The format signature is then compared with a cache entry (if available). - If the signatures match, we have already applied this sequence of formatters, with these particular options etc. to this file when it had this `modtime` and `size`, so there is no more processing to be done. - If the signatures do no match, we should format the file, recording the new format signature in the cache when we are finished. This approach is simpler in terms of storage, and has the added benefit of finer grained change detection versus the brute force cache busting we were doing before. In terms of performance impact, with the pre-computing of hashes per-pipeline and the simpler storage schema, there appears to have been no significant impact. Manual testing with [nixpkgs](https://github.com/nixos/nixpkgs) shows comparable run times for both hot and cold caches. > [!NOTE] > Since this changes the database schema, rather than implementing some form of migration > logic to remove the old buckets and so on, I decided to upgrade the hash algorithm we > use when determining the filename for the db file. > > Previously, we were using `sha1`, matching the behaviour from `1.0`. > Now we use `sha256`, which results in a slightly longer db name, but has the benefit of > ensuring a new db instance will be created on first invocation, as well as making > [gosec](https://golangci-lint.run/usage/linters/#gosec) happy. Closes #455 Signed-off-by: Brian McGee <brian@bmcgee.ie>
2024-10-17 13:54:09 +03:00
_, err = NewCompositeFormatter(cfg, &statz, batchSize)
as.NoError(err)
// test with some bad examples
for _, character := range []string{
" ", ":", "?", "*", "[", "]", "(", ")", "|", "&", "<", ">", "\\", "/", "%", "$", "#", "@", "`", "'",
} {
cfg.FormatterConfigs = map[string]*config.Formatter{
"touch_" + character: {
Command: "touch",
},
}
feat: improve change detection Previously, we were storing the last `modtime` and `size` for a given path in `boltdb`. This was used to determine if a file had changed and whether we should format it. In addition, if a formatter's executable changed (`modtime` or `size`), we would delete all path entries in the database before processing, thereby forcing each file to be formatted. This commit introduces a new approach to change detection, one which takes into account changes to the underlying file, the formatter's executable, _and_ the formatter's configuration. Now, when deciding if we should format a file, we do the following: - Hash each matching formatter, in sequence, using its `name`, `options`, `priority` as well as the `modtime` and `size` of its executable. This is pre-computed on a per-pipeline basis. - We then add the file's `modtime` and `size` to generate a format signature. You can think of this signature as a unique representation of what we are about to do with the file. - The format signature is then compared with a cache entry (if available). - If the signatures match, we have already applied this sequence of formatters, with these particular options etc. to this file when it had this `modtime` and `size`, so there is no more processing to be done. - If the signatures do no match, we should format the file, recording the new format signature in the cache when we are finished. This approach is simpler in terms of storage, and has the added benefit of finer grained change detection versus the brute force cache busting we were doing before. In terms of performance impact, with the pre-computing of hashes per-pipeline and the simpler storage schema, there appears to have been no significant impact. Manual testing with [nixpkgs](https://github.com/nixos/nixpkgs) shows comparable run times for both hot and cold caches. > [!NOTE] > Since this changes the database schema, rather than implementing some form of migration > logic to remove the old buckets and so on, I decided to upgrade the hash algorithm we > use when determining the filename for the db file. > > Previously, we were using `sha1`, matching the behaviour from `1.0`. > Now we use `sha256`, which results in a slightly longer db name, but has the benefit of > ensuring a new db instance will be created on first invocation, as well as making > [gosec](https://golangci-lint.run/usage/linters/#gosec) happy. Closes #455 Signed-off-by: Brian McGee <brian@bmcgee.ie>
2024-10-17 13:54:09 +03:00
_, err = NewCompositeFormatter(cfg, &statz, batchSize)
as.ErrorIs(err, ErrInvalidName)
}
}
feat: improve change detection Previously, we were storing the last `modtime` and `size` for a given path in `boltdb`. This was used to determine if a file had changed and whether we should format it. In addition, if a formatter's executable changed (`modtime` or `size`), we would delete all path entries in the database before processing, thereby forcing each file to be formatted. This commit introduces a new approach to change detection, one which takes into account changes to the underlying file, the formatter's executable, _and_ the formatter's configuration. Now, when deciding if we should format a file, we do the following: - Hash each matching formatter, in sequence, using its `name`, `options`, `priority` as well as the `modtime` and `size` of its executable. This is pre-computed on a per-pipeline basis. - We then add the file's `modtime` and `size` to generate a format signature. You can think of this signature as a unique representation of what we are about to do with the file. - The format signature is then compared with a cache entry (if available). - If the signatures match, we have already applied this sequence of formatters, with these particular options etc. to this file when it had this `modtime` and `size`, so there is no more processing to be done. - If the signatures do no match, we should format the file, recording the new format signature in the cache when we are finished. This approach is simpler in terms of storage, and has the added benefit of finer grained change detection versus the brute force cache busting we were doing before. In terms of performance impact, with the pre-computing of hashes per-pipeline and the simpler storage schema, there appears to have been no significant impact. Manual testing with [nixpkgs](https://github.com/nixos/nixpkgs) shows comparable run times for both hot and cold caches. > [!NOTE] > Since this changes the database schema, rather than implementing some form of migration > logic to remove the old buckets and so on, I decided to upgrade the hash algorithm we > use when determining the filename for the db file. > > Previously, we were using `sha1`, matching the behaviour from `1.0`. > Now we use `sha256`, which results in a slightly longer db name, but has the benefit of > ensuring a new db instance will be created on first invocation, as well as making > [gosec](https://golangci-lint.run/usage/linters/#gosec) happy. Closes #455 Signed-off-by: Brian McGee <brian@bmcgee.ie>
2024-10-17 13:54:09 +03:00
func TestFormatSignature(t *testing.T) {
as := require.New(t)
const batchSize = 1024
statz := stats.New()
tempDir := t.TempDir()
// symlink some formatters into temp dir, so we can mess with their mod times
binPath := filepath.Join(tempDir, "bin")
as.NoError(os.Mkdir(binPath, 0o755))
binaries := []string{"black", "elm-format", "gofmt"}
for _, name := range binaries {
src, err := exec.LookPath(name)
as.NoError(err)
as.NoError(os.Symlink(src, filepath.Join(binPath, name)))
}
// prepend our test bin directory to PATH
t.Setenv("PATH", binPath+":"+os.Getenv("PATH"))
// start with 2 formatters
cfg := &config.Config{
OnUnmatched: "info",
FormatterConfigs: map[string]*config.Formatter{
"python": {
Command: "black",
Includes: []string{"*.py"},
},
"elm": {
Command: "elm-format",
Options: []string{"--yes"},
Includes: []string{"*.elm"},
},
},
}
oldSignature := assertSignatureChangedAndStable(t, as, cfg, nil)
t.Run("change formatter mod time", func(t *testing.T) {
for _, name := range []string{"black", "elm-format"} {
t.Logf("changing mod time of %s", name)
// tweak mod time
newTime := time.Now().Add(-time.Minute)
as.NoError(test.Lutimes(t, filepath.Join(binPath, name), newTime, newTime))
oldSignature = assertSignatureChangedAndStable(t, as, cfg, oldSignature)
}
})
t.Run("modify formatter options", func(_ *testing.T) {
f, err := NewCompositeFormatter(cfg, &statz, batchSize)
as.NoError(err)
oldSignature = assertSignatureChangedAndStable(t, as, cfg, nil)
// adjust python includes
python := cfg.FormatterConfigs["python"]
python.Includes = []string{"*.py", "*.pyi"}
newHash, err := f.signature()
as.NoError(err)
as.Equal(oldSignature, newHash, "hash should not have changed")
// adjust python excludes
python.Excludes = []string{"*.pyi"}
newHash, err = f.signature()
as.NoError(err)
as.Equal(oldSignature, newHash, "hash should not have changed")
// adjust python options
python.Options = []string{"-w", "-s"}
oldSignature = assertSignatureChangedAndStable(t, as, cfg, oldSignature)
// adjust python priority
python.Priority = 100
oldSignature = assertSignatureChangedAndStable(t, as, cfg, oldSignature)
// adjust command
python.Command = "deadnix"
oldSignature = assertSignatureChangedAndStable(t, as, cfg, oldSignature)
})
t.Run("add/remove formatters", func(_ *testing.T) {
cfg.FormatterConfigs["go"] = &config.Formatter{
Command: "gofmt",
Options: []string{"-w"},
Includes: []string{"*.go"},
}
oldSignature = assertSignatureChangedAndStable(t, as, cfg, oldSignature)
// remove python formatter
delete(cfg.FormatterConfigs, "python")
oldSignature = assertSignatureChangedAndStable(t, as, cfg, oldSignature)
// remove elm formatter
delete(cfg.FormatterConfigs, "elm")
oldSignature = assertSignatureChangedAndStable(t, as, cfg, oldSignature)
})
}
func assertSignatureChangedAndStable(
t *testing.T,
as *require.Assertions,
cfg *config.Config,
oldSignature signature,
) (h signature) {
t.Helper()
statz := stats.New()
f, err := NewCompositeFormatter(cfg, &statz, 1024)
as.NoError(err)
newHash, err := f.signature()
as.NoError(err)
as.NotEqual(oldSignature, newHash, "hash should have changed")
sameHash, err := f.signature()
as.NoError(err)
as.Equal(newHash, sameHash, "hash should not have changed")
return newHash
}