mirror of
https://github.com/borgbackup/borg.git
synced 2024-10-03 23:42:59 +03:00
changes to locally stored files cache:

- store as files.<H(archive_name)>
- user can manually control suffix via env var
- if local files cache is not found, build from previous archive.
- enable rebuilding the files cache via loading the previous archive's metadata
  from the repo (better than starting with empty files cache and needing to
  read/chunk/hash all files). previous archive == same archive name, latest
  timestamp in repo.
- remove AdHocCache (not needed any more, slow)
- remove BORG_CACHE_IMPL, we only have one
- remove cache lock (this was blocking parallel backups to same repo from same
  machine/user).

Cache entries now have ctime AND mtime.

Note: TTL and age still needed for discarding removed files. But due to the
separate files caches per series, the TTL was lowered to 2 (from 20).
This commit is contained in:
parent 385eeeb4d5
commit a891559578
docs/faq.rst (43 changed lines)
@@ -837,50 +837,29 @@ already used.
 By default, ctime (change time) is used for the timestamps to have a rather
 safe change detection (see also the --files-cache option).
 
-Furthermore, pathnames recorded in files cache are always absolute, even if you
-specify source directories with relative pathname. If relative pathnames are
-stable, but absolute are not (for example if you mount a filesystem without
-stable mount points for each backup or if you are running the backup from a
-filesystem snapshot whose name is not stable), borg will assume that files are
-different and will report them as 'added', even though no new chunks will be
-actually recorded for them. To avoid this, you could bind mount your source
-directory in a directory with the stable path.
+Furthermore, pathnames used as key into the files cache are **as archived**,
+so make sure these are always the same (see ``borg list``).
 
 .. _always_chunking:
 
 It always chunks all my files, even unchanged ones!
 ---------------------------------------------------
 
-Borg maintains a files cache where it remembers the timestamp, size and
+Borg maintains a files cache where it remembers the timestamps, size and
 inode of files. When Borg does a new backup and starts processing a
 file, it first looks whether the file has changed (compared to the values
 stored in the files cache). If the values are the same, the file is assumed
 unchanged and thus its contents won't get chunked (again).
 
-Borg can't keep an infinite history of files of course, thus entries
-in the files cache have a "maximum time to live" which is set via the
-environment variable BORG_FILES_CACHE_TTL (and defaults to 20).
-Every time you do a backup (on the same machine, using the same user), the
-cache entries' ttl values of files that were not "seen" are incremented by 1
-and if they reach BORG_FILES_CACHE_TTL, the entry is removed from the cache.
-
-So, for example, if you do daily backups of 26 different data sets A, B,
-C, ..., Z on one machine (using the default TTL), the files from A will be
-already forgotten when you repeat the same backups on the next day and it
-will be slow because it would chunk all the files each time. If you set
-BORG_FILES_CACHE_TTL to at least 26 (or maybe even a small multiple of that),
-it would be much faster.
-
-Besides using a higher BORG_FILES_CACHE_TTL (which also increases memory usage),
-there is also BORG_FILES_CACHE_SUFFIX which can be used to have separate (smaller)
-files caches for each backup set instead of the default one (big) unified files cache.
-
-Another possible reason is that files don't always have the same path, for
-example if you mount a filesystem without stable mount points for each backup
-or if you are running the backup from a filesystem snapshot whose name is not
-stable. If the directory where you mount a filesystem is different every time,
-Borg assumes they are different files. This is true even if you back up these
-files with relative pathnames - borg uses full pathnames in files cache regardless.
+The files cache is stored separately (using a different filename suffix) per
+archive series, thus using always the same name for the archive is strongly
+recommended. The "rebuild files cache from previous archive in repo" feature
+also depends on that.
+Alternatively, there is also BORG_FILES_CACHE_SUFFIX which can be used to
+manually set a custom suffix (if you can't just use the same archive name).
+
+Another possible reason is that files don't always have the same path -
+borg uses the paths as seen in the archive when using ``borg list``.
 
 It is possible for some filesystems, such as ``mergerfs`` or network filesystems,
 to return inconsistent inode numbers across runs, causing borg to consider them changed.
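The "paths as archived" rule above can be illustrated with a short sketch. This is not borg's actual `id_hash` (which is keyed per repository); stdlib BLAKE2b stands in for it, and the example paths are made up:

```python
import hashlib

def path_key(archived_path: str) -> bytes:
    """Hash the path exactly as it will appear in the archive.
    borg's real id_hash is keyed per repository; plain BLAKE2b stands in here.
    """
    return hashlib.blake2b(archived_path.encode(), digest_size=32).digest()

# The same file reached via different path spellings yields different cache
# keys, which is why unstable mount points or changing path prefixes cause
# cache misses:
assert path_key("home/user/data.txt") != path_key("/home/user/data.txt")
assert path_key("home/user/data.txt") == path_key("home/user/data.txt")
```

Because the key is a hash of the exact path string, keeping archive paths stable across runs is what keeps the files cache effective.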
@@ -474,18 +474,20 @@ guess what files you have based on a specific set of chunk sizes).
 The cache
 ---------
 
-The **files cache** is stored in ``cache/files`` and is used at backup time to
-quickly determine whether a given file is unchanged and we have all its chunks.
+The **files cache** is stored in ``cache/files.<SUFFIX>`` and is used at backup
+time to quickly determine whether a given file is unchanged and we have all its
+chunks.
 
 In memory, the files cache is a key -> value mapping (a Python *dict*) and contains:
 
-* key: id_hash of the encoded, absolute file path
+* key: id_hash of the encoded path (same path as seen in archive)
 * value:
 
+  - age (0 [newest], ..., BORG_FILES_CACHE_TTL - 1)
   - file inode number
   - file size
-  - file ctime_ns (or mtime_ns)
-  - age (0 [newest], 1, 2, 3, ..., BORG_FILES_CACHE_TTL - 1)
+  - file ctime_ns
+  - file mtime_ns
   - list of chunk (id, size) tuples representing the file's contents
 
 To determine whether a file has not changed, cached values are looked up via
@@ -514,7 +516,7 @@ be told to ignore the inode number in the check via --files-cache.
 The age value is used for cache management. If a file is "seen" in a backup
 run, its age is reset to 0, otherwise its age is incremented by one.
 If a file was not seen in BORG_FILES_CACHE_TTL backups, its cache entry is
-removed. See also: :ref:`always_chunking` and :ref:`a_status_oddity`
+removed.
 
 The files cache is a python dictionary, storing python objects, which
 generates a lot of overhead.
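The aging rule in this hunk (seen files reset to age 0, unseen files age by one and are dropped at the TTL) can be sketched as a simplified model. `age_files_cache` is a made-up helper, not borg's API, and entries are reduced to `(age, metadata)` pairs keyed by path hash:

```python
def age_files_cache(files: dict, seen: set, ttl: int) -> dict:
    """Apply the aging rule: entries for files seen in this backup run keep
    age 0, unseen entries age by one and are dropped once they reach the TTL."""
    aged = {}
    for path_hash, (age, meta) in files.items():
        new_age = 0 if path_hash in seen else age + 1
        if new_age < ttl:
            aged[path_hash] = (new_age, meta)
    return aged

cache = {b"a": (0, "meta-a"), b"b": (1, "meta-b")}
cache = age_files_cache(cache, seen={b"a"}, ttl=2)
print(sorted(cache))  # b"b" reached the TTL of 2 and was dropped -> [b'a']
```

With the new per-series caches, a low TTL like 2 suffices, since one cache no longer has to outlive backups of unrelated data sets.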
@@ -66,8 +66,7 @@ General:
     cache entries for backup sources other than the current sources.
 BORG_FILES_CACHE_TTL
     When set to a numeric value, this determines the maximum "time to live" for the files cache
-    entries (default: 20). The files cache is used to determine quickly whether a file is unchanged.
-    The FAQ explains this more detailed in: :ref:`always_chunking`
+    entries (default: 2). The files cache is used to determine quickly whether a file is unchanged.
 BORG_USE_CHUNKS_ARCHIVE
     When set to no (default: yes), the ``chunks.archive.d`` folder will not be used. This reduces
     disk space usage but slows down cache resyncs.
@@ -85,15 +84,6 @@ General:
     - ``pyfuse3``: only try to load pyfuse3
     - ``llfuse``: only try to load llfuse
     - ``none``: do not try to load an implementation
-BORG_CACHE_IMPL
-    Choose the implementation for the clientside cache, choose one of:
-
-    - ``adhoc``: builds a non-persistent chunks cache by querying the repo. Chunks cache contents
-      are somewhat sloppy for already existing chunks, concerning their refcount ("infinite") and
-      size (0). No files cache (slow, will chunk all input files). DEPRECATED.
-    - ``adhocwithfiles``: Like ``adhoc``, but with a persistent files cache. Default implementation.
-    - ``cli``: Determine the cache implementation from cli options. Without special options, will
-      usually end up with the ``local`` implementation.
 BORG_SELFTEST
     This can be used to influence borg's builtin self-tests. The default is to execute the tests
     at the beginning of each borg command invocation.
@@ -1345,7 +1345,7 @@ def process_file(self, *, path, parent_fd, name, st, cache, flags=flags_normal,
                     item.chunks.append(chunk_entry)
         else:  # normal case, no "2nd+" hardlink
             if not is_special_file:
-                hashed_path = safe_encode(os.path.join(self.cwd, path))
+                hashed_path = safe_encode(item.path)  # path as in archive item!
                 started_hashing = time.monotonic()
                 path_hash = self.key.id_hash(hashed_path)
                 self.stats.hashing_time += time.monotonic() - started_hashing
@@ -161,13 +161,12 @@ def wrapper(self, args, **kwargs):
                 if "compression" in args:
                     manifest_.repo_objs.compressor = args.compression.compressor
                 if secure:
-                    assert_secure(repository, manifest_, self.lock_wait)
+                    assert_secure(repository, manifest_)
                 if cache:
                     with Cache(
                         repository,
                         manifest_,
                         progress=getattr(args, "progress", False),
-                        lock_wait=self.lock_wait,
                         cache_mode=getattr(args, "files_cache_mode", FILES_CACHE_MODE_DISABLED),
                         iec=getattr(args, "iec", False),
                     ) as cache_:
@@ -230,7 +229,7 @@ def wrapper(self, args, **kwargs):
                 manifest_ = Manifest.load(
                     repository, compatibility, ro_cls=RepoObj if repository.version > 1 else RepoObj1
                 )
-                assert_secure(repository, manifest_, self.lock_wait)
+                assert_secure(repository, manifest_)
                 if manifest:
                     kwargs["other_manifest"] = manifest_
                 if cache:
@@ -238,7 +237,6 @@ def wrapper(self, args, **kwargs):
                         repository,
                         manifest_,
                         progress=False,
-                        lock_wait=self.lock_wait,
                         cache_mode=getattr(args, "files_cache_mode", FILES_CACHE_MODE_DISABLED),
                         iec=getattr(args, "iec", False),
                     ) as cache_:
@@ -222,10 +222,9 @@ def create_inner(archive, cache, fso):
                 repository,
                 manifest,
                 progress=args.progress,
-                lock_wait=self.lock_wait,
-                prefer_adhoc_cache=args.prefer_adhoc_cache,
                 cache_mode=args.files_cache_mode,
                 iec=args.iec,
+                archive_name=args.name,
             ) as cache:
                 archive = Archive(
                     manifest,
@@ -787,12 +786,6 @@ def build_parser_create(self, subparsers, common_parser, mid_common_parser):
             help="only display items with the given status characters (see description)",
         )
         subparser.add_argument("--json", action="store_true", help="output stats as JSON. Implies ``--stats``.")
-        subparser.add_argument(
-            "--prefer-adhoc-cache",
-            dest="prefer_adhoc_cache",
-            action="store_true",
-            help="experimental: prefer AdHocCache (w/o files cache) over AdHocWithFilesCache (with files cache).",
-        )
         subparser.add_argument(
             "--stdin-name",
             metavar="NAME",
@@ -37,7 +37,7 @@ def _list_inner(cache):
 
         # Only load the cache if it will be used
         if ItemFormatter.format_needs_cache(format):
-            with Cache(repository, manifest, lock_wait=self.lock_wait) as cache:
+            with Cache(repository, manifest) as cache:
                 _list_inner(cache)
         else:
             _list_inner(cache=None)
@@ -111,7 +111,7 @@ def do_prune(self, args, repository, manifest):
             keep += prune_split(archives, rule, num, kept_because)
 
         to_delete = set(archives) - set(keep)
-        with Cache(repository, manifest, lock_wait=self.lock_wait, iec=args.iec) as cache:
+        with Cache(repository, manifest, iec=args.iec) as cache:
             list_logger = logging.getLogger("borg.output.list")
             # set up counters for the progress display
             to_delete_len = len(to_delete)
@@ -12,11 +12,12 @@
 files_cache_logger = create_logger("borg.debug.files_cache")
 
 from .constants import CACHE_README, FILES_CACHE_MODE_DISABLED, ROBJ_FILE_STREAM
+from .checksums import xxh64
 from .hashindex import ChunkIndex, ChunkIndexEntry
 from .helpers import Error
 from .helpers import get_cache_dir, get_security_dir
-from .helpers import hex_to_bin, parse_stringified_list
-from .helpers import format_file_size
+from .helpers import hex_to_bin, bin_to_hex, parse_stringified_list
+from .helpers import format_file_size, safe_encode
 from .helpers import safe_ns
 from .helpers import yes
 from .helpers import ProgressIndicatorMessage
@@ -25,14 +26,13 @@
 from .item import ChunkListEntry
 from .crypto.key import PlaintextKey
 from .crypto.file_integrity import IntegrityCheckedFile, FileIntegrityError
-from .fslocking import Lock
 from .manifest import Manifest
 from .platform import SaveFile
 from .remote import RemoteRepository
 from .repository import LIST_SCAN_LIMIT, Repository
 
-# note: cmtime might be either a ctime or a mtime timestamp, chunks is a list of ChunkListEntry
-FileCacheEntry = namedtuple("FileCacheEntry", "age inode size cmtime chunks")
+# chunks is a list of ChunkListEntry
+FileCacheEntry = namedtuple("FileCacheEntry", "age inode size ctime mtime chunks")
 
 
 class SecurityManager:
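The split of the old ambiguous `cmtime` field into separate `ctime`/`mtime` fields means each cache mode can compare its own timestamp. A standalone sketch (using the field layout from the diff, with a made-up `FakeStat` standing in for `os.stat_result`):

```python
from collections import namedtuple

# Same field layout as the new FileCacheEntry in the diff; FakeStat is a
# stand-in for os.stat_result with just the fields we compare here.
FileCacheEntry = namedtuple("FileCacheEntry", "age inode size ctime mtime chunks")
FakeStat = namedtuple("FakeStat", "st_ino st_size st_ctime_ns st_mtime_ns")

def timestamps_match(entry, st, cache_mode):
    """With separate ctime/mtime fields, 'c' mode compares ctime and 'm' mode
    compares mtime - no shared field that could hold either timestamp."""
    if "c" in cache_mode and entry.ctime != st.st_ctime_ns:
        return False
    if "m" in cache_mode and entry.mtime != st.st_mtime_ns:
        return False
    return True

entry = FileCacheEntry(0, 42, 1024, ctime=100, mtime=200, chunks=[])
st = FakeStat(42, 1024, st_ctime_ns=100, st_mtime_ns=999)  # only mtime changed
print(timestamps_match(entry, st, "c"), timestamps_match(entry, st, "m"))  # True False
```

Storing both timestamps lets a cache written in "c" mode still be useful when later read in "m" mode, and vice versa.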
@@ -154,7 +154,7 @@ def assert_key_type(self, key):
         if self.known() and not self.key_matches(key):
             raise Cache.EncryptionMethodMismatch()
 
-    def assert_secure(self, manifest, key, *, warn_if_unencrypted=True, lock_wait=None):
+    def assert_secure(self, manifest, key, *, warn_if_unencrypted=True):
         # warn_if_unencrypted=False is only used for initializing a new repository.
         # Thus, avoiding asking about a repository that's currently initializing.
         self.assert_access_unknown(warn_if_unencrypted, manifest, key)
@@ -194,9 +194,9 @@ def assert_access_unknown(self, warn_if_unencrypted, manifest, key):
         raise Cache.CacheInitAbortedError()
 
 
-def assert_secure(repository, manifest, lock_wait):
+def assert_secure(repository, manifest):
     sm = SecurityManager(repository)
-    sm.assert_secure(manifest, manifest.key, lock_wait=lock_wait)
+    sm.assert_secure(manifest, manifest.key)
 
 
 def cache_dir(repository, path=None):
@@ -204,13 +204,11 @@ def cache_dir(repository, path=None):
 
 
 class CacheConfig:
-    def __init__(self, repository, path=None, lock_wait=None):
+    def __init__(self, repository, path=None):
         self.repository = repository
         self.path = cache_dir(repository, path)
         logger.debug("Using %s as cache", self.path)
         self.config_path = os.path.join(self.path, "config")
-        self.lock = None
-        self.lock_wait = lock_wait
 
     def __enter__(self):
         self.open()
@@ -235,7 +233,6 @@ def create(self):
             config.write(fd)
 
     def open(self):
-        self.lock = Lock(os.path.join(self.path, "lock"), exclusive=True, timeout=self.lock_wait).acquire()
         self.load()
 
     def load(self):
@@ -278,9 +275,7 @@ def save(self, manifest=None):
             self._config.write(fd)
 
     def close(self):
-        if self.lock is not None:
-            self.lock.release()
-            self.lock = None
+        pass
 
     def _check_upgrade(self, config_path):
         try:
@@ -296,10 +291,6 @@ def _check_upgrade(self, config_path):
             raise Exception("%s does not look like a Borg cache." % config_path) from None
 
 
-def get_cache_impl():
-    return os.environ.get("BORG_CACHE_IMPL", "adhocwithfiles")
-
-
 class Cache:
     """Client Side cache"""
 
@@ -330,8 +321,7 @@ class RepositoryReplay(Error):
 
     @staticmethod
     def break_lock(repository, path=None):
-        path = cache_dir(repository, path)
-        Lock(os.path.join(path, "lock"), exclusive=True).break_lock()
+        pass
 
     @staticmethod
     def destroy(repository, path=None):
@@ -350,71 +340,117 @@ def __new__(
         sync=True,
         warn_if_unencrypted=True,
         progress=False,
-        lock_wait=None,
-        prefer_adhoc_cache=False,
         cache_mode=FILES_CACHE_MODE_DISABLED,
         iec=False,
+        archive_name=None,
     ):
-        def adhocwithfiles():
-            return AdHocWithFilesCache(
-                manifest=manifest,
-                path=path,
-                warn_if_unencrypted=warn_if_unencrypted,
-                progress=progress,
-                iec=iec,
-                lock_wait=lock_wait,
-                cache_mode=cache_mode,
-            )
-
-        def adhoc():
-            return AdHocCache(manifest=manifest, lock_wait=lock_wait, iec=iec)
-
-        impl = get_cache_impl()
-        if impl != "cli":
-            methods = dict(adhocwithfiles=adhocwithfiles, adhoc=adhoc)
-            try:
-                method = methods[impl]
-            except KeyError:
-                raise RuntimeError("Unknown BORG_CACHE_IMPL value: %s" % impl)
-            return method()
-
-        return adhoc() if prefer_adhoc_cache else adhocwithfiles()
+        return AdHocWithFilesCache(
+            manifest=manifest,
+            path=path,
+            warn_if_unencrypted=warn_if_unencrypted,
+            progress=progress,
+            iec=iec,
+            cache_mode=cache_mode,
+            archive_name=archive_name,
+        )
 
 
 class FilesCacheMixin:
     """
-    Massively accelerate processing of unchanged files by caching their chunks list.
-    With that, we can avoid having to read and chunk them to get their chunks list.
+    Massively accelerate processing of unchanged files.
+    We read the "files cache" (either from cache directory or from previous archive
+    in repo) that has metadata for all "already stored" files, like size, ctime/mtime,
+    inode number and chunks id/size list.
+    When finding a file on disk, we use the metadata to determine if the file is unchanged.
+    If so, we use the cached chunks list and skip reading/chunking the file contents.
     """
 
     FILES_CACHE_NAME = "files"
 
-    def __init__(self, cache_mode):
+    def __init__(self, cache_mode, archive_name=None):
+        self.archive_name = archive_name  # ideally a SERIES name
+        assert not ("c" in cache_mode and "m" in cache_mode)
+        assert "d" in cache_mode or "c" in cache_mode or "m" in cache_mode
         self.cache_mode = cache_mode
         self._files = None
-        self._newest_cmtime = None
+        self._newest_cmtime = 0
+        self._newest_path_hashes = set()
 
     @property
     def files(self):
         if self._files is None:
-            self._files = self._read_files_cache()
+            self._files = self._read_files_cache()  # try loading from cache dir
+            if self._files is None:
+                self._files = self._build_files_cache()  # try loading from repository
+            if self._files is None:
+                self._files = {}  # start from scratch
         return self._files
 
+    def _build_files_cache(self):
+        """rebuild the files cache by reading previous archive from repository"""
+        if "d" in self.cache_mode:  # d(isabled)
+            return
+        if not self.archive_name:
+            return
+        from .archive import Archive
+
+        # get the latest archive with the IDENTICAL name, supporting archive series:
+        archives = self.manifest.archives.list(match=self.archive_name, sort_by=["ts"], last=1)
+        if not archives:
+            # nothing found
+            return
+        prev_archive = archives[0]
+        files = {}
+        logger.debug(
+            f"Building files cache from {prev_archive.name} {prev_archive.ts} {bin_to_hex(prev_archive.id)} ..."
+        )
+        files_cache_logger.debug("FILES-CACHE-BUILD: starting...")
+        archive = Archive(self.manifest, prev_archive.id)
+        for item in archive.iter_items(preload=False):
+            # only put regular files' infos into the files cache:
+            if stat.S_ISREG(item.mode):
+                path_hash = self.key.id_hash(safe_encode(item.path))
+                # keep track of the key(s) for the most recent timestamp(s):
+                ctime_ns = item.ctime
+                if ctime_ns > self._newest_cmtime:
+                    self._newest_cmtime = ctime_ns
+                    self._newest_path_hashes = {path_hash}
+                elif ctime_ns == self._newest_cmtime:
+                    self._newest_path_hashes.add(path_hash)
+                mtime_ns = item.mtime
+                if mtime_ns > self._newest_cmtime:
+                    self._newest_cmtime = mtime_ns
+                    self._newest_path_hashes = {path_hash}
+                elif mtime_ns == self._newest_cmtime:
+                    self._newest_path_hashes.add(path_hash)
+                # add the file to the in-memory files cache
+                entry = FileCacheEntry(
+                    item.get("inode", 0), item.size, int_to_timestamp(ctime_ns), int_to_timestamp(mtime_ns), item.chunks
+                )
+                files[path_hash] = msgpack.packb(entry)  # takes about 240 Bytes per file
+        # deal with special snapshot / timestamp granularity case, see FAQ:
+        for path_hash in self._newest_path_hashes:
+            del files[path_hash]
+        files_cache_logger.debug("FILES-CACHE-BUILD: finished, %d entries loaded.", len(files))
+        return files
+
     def files_cache_name(self):
         suffix = os.environ.get("BORG_FILES_CACHE_SUFFIX", "")
-        return self.FILES_CACHE_NAME + "." + suffix if suffix else self.FILES_CACHE_NAME
+        # when using archive series, we automatically make up a separate cache file per series.
+        # when not, the user may manually do that by using the env var.
+        if not suffix:
+            # avoid issues with too complex or long archive_name by hashing it:
+            suffix = bin_to_hex(xxh64(self.archive_name.encode()))
+        return self.FILES_CACHE_NAME + "." + suffix
 
-    def discover_files_cache_name(self, path):
-        return [
-            fn for fn in os.listdir(path) if fn == self.FILES_CACHE_NAME or fn.startswith(self.FILES_CACHE_NAME + ".")
-        ][0]
-
-    def _create_empty_files_cache(self, path):
-        with IntegrityCheckedFile(path=os.path.join(path, self.files_cache_name()), write=True) as fd:
-            pass  # empty file
-        return fd.integrity_data
+    def discover_files_cache_names(self, path):
+        return [fn for fn in os.listdir(path) if fn.startswith(self.FILES_CACHE_NAME + ".")]
 
     def _read_files_cache(self):
+        """read files cache from cache directory"""
         if "d" in self.cache_mode:  # d(isabled)
             return
 
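The per-series cache file naming shown in `files_cache_name` can be sketched standalone. An explicit `BORG_FILES_CACHE_SUFFIX` wins; otherwise the archive name is hashed into a short hex suffix. borg hashes with `xxh64` from its own `checksums` module; stdlib BLAKE2b with an 8-byte digest stands in for it here:

```python
import hashlib

FILES_CACHE_NAME = "files"

def files_cache_name(archive_name: str, env_suffix: str = "") -> str:
    """Derive a per-series files cache name: use the explicit suffix if set,
    otherwise hash the archive (series) name into a short, filename-safe
    hex suffix (BLAKE2b stands in for borg's xxh64 here)."""
    suffix = env_suffix
    if not suffix:
        # hashing avoids issues with overly long or complex archive names
        suffix = hashlib.blake2b(archive_name.encode(), digest_size=8).hexdigest()
    return FILES_CACHE_NAME + "." + suffix

assert files_cache_name("my-series") == files_cache_name("my-series")
assert files_cache_name("my-series") != files_cache_name("other-series")
assert files_cache_name("anything", env_suffix="custom") == "files.custom"
```

Because the suffix is derived from the archive name, backups of different series to the same repository automatically get separate, smaller files caches without any locking between them.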
@@ -447,17 +483,17 @@ def _read_files_cache(self):
         except FileIntegrityError as fie:
             msg = "The files cache is corrupted. [%s]" % str(fie)
         if msg is not None:
-            logger.warning(msg)
-            logger.warning("Continuing without files cache - expect lower performance.")
-            files = {}
-        files_cache_logger.debug("FILES-CACHE-LOAD: finished, %d entries loaded.", len(files))
+            logger.debug(msg)
+            files = None
+        files_cache_logger.debug("FILES-CACHE-LOAD: finished, %d entries loaded.", len(files or {}))
         return files
 
     def _write_files_cache(self, files):
+        """write files cache to cache directory"""
         if self._newest_cmtime is None:
             # was never set because no files were modified/added
             self._newest_cmtime = 2**63 - 1  # nanoseconds, good until y2262
-        ttl = int(os.environ.get("BORG_FILES_CACHE_TTL", 20))
+        ttl = int(os.environ.get("BORG_FILES_CACHE_TTL", 2))
        files_cache_logger.debug("FILES-CACHE-SAVE: starting...")
         # TODO: use something like SaveFile here, but that didn't work due to SyncFile missing .seek().
         with IntegrityCheckedFile(path=os.path.join(self.path, self.files_cache_name()), write=True) as fd:
@@ -469,7 +505,7 @@ def _write_files_cache(self, files):
             entry = FileCacheEntry(*msgpack.unpackb(item))
             if (
                 entry.age == 0
-                and timestamp_to_int(entry.cmtime) < self._newest_cmtime
+                and max(timestamp_to_int(entry.ctime), timestamp_to_int(entry.mtime)) < self._newest_cmtime
                 or entry.age > 0
                 and entry.age < ttl
             ):
@@ -517,10 +553,10 @@ def file_known_and_unchanged(self, hashed_path, path_hash, st):
         if "i" in cache_mode and entry.inode != st.st_ino:
             files_cache_logger.debug("KNOWN-CHANGED: file inode number has changed: %r", hashed_path)
             return True, None
-        if "c" in cache_mode and timestamp_to_int(entry.cmtime) != st.st_ctime_ns:
+        if "c" in cache_mode and timestamp_to_int(entry.ctime) != st.st_ctime_ns:
             files_cache_logger.debug("KNOWN-CHANGED: file ctime has changed: %r", hashed_path)
             return True, None
-        elif "m" in cache_mode and timestamp_to_int(entry.cmtime) != st.st_mtime_ns:
+        if "m" in cache_mode and timestamp_to_int(entry.mtime) != st.st_mtime_ns:
             files_cache_logger.debug("KNOWN-CHANGED: file mtime has changed: %r", hashed_path)
             return True, None
         # we ignored the inode number in the comparison above or it is still same.
@@ -538,30 +574,25 @@ def file_known_and_unchanged(self, hashed_path, path_hash, st):
     def memorize_file(self, hashed_path, path_hash, st, chunks):
         if not stat.S_ISREG(st.st_mode):
             return
-        cache_mode = self.cache_mode
         # note: r(echunk) modes will update the files cache, d(isabled) mode won't
-        if "d" in cache_mode:
+        if "d" in self.cache_mode:
             files_cache_logger.debug("FILES-CACHE-NOUPDATE: files cache disabled")
             return
-        if "c" in cache_mode:
-            cmtime_type = "ctime"
-            cmtime_ns = safe_ns(st.st_ctime_ns)
-        elif "m" in cache_mode:
-            cmtime_type = "mtime"
-            cmtime_ns = safe_ns(st.st_mtime_ns)
-        else:  # neither 'c' nor 'm' in cache_mode, avoid UnboundLocalError
-            cmtime_type = "ctime"
-            cmtime_ns = safe_ns(st.st_ctime_ns)
+        ctime_ns = safe_ns(st.st_ctime_ns)
+        mtime_ns = safe_ns(st.st_mtime_ns)
         entry = FileCacheEntry(
-            age=0, inode=st.st_ino, size=st.st_size, cmtime=int_to_timestamp(cmtime_ns), chunks=chunks
+            age=0,
+            inode=st.st_ino,
+            size=st.st_size,
+            ctime=int_to_timestamp(ctime_ns),
+            mtime=int_to_timestamp(mtime_ns),
+            chunks=chunks,
         )
         self.files[path_hash] = msgpack.packb(entry)
-        self._newest_cmtime = max(self._newest_cmtime or 0, cmtime_ns)
+        self._newest_cmtime = max(self._newest_cmtime or 0, ctime_ns)
+        self._newest_cmtime = max(self._newest_cmtime or 0, mtime_ns)
         files_cache_logger.debug(
-            "FILES-CACHE-UPDATE: put %r [has %s] <- %r",
-            entry._replace(chunks="[%d entries]" % len(entry.chunks)),
-            cmtime_type,
-            hashed_path,
+            "FILES-CACHE-UPDATE: put %r <- %r", entry._replace(chunks="[%d entries]" % len(entry.chunks)), hashed_path
         )

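The core of the change above: instead of picking one "cmtime" (ctime *or* mtime) at memorize time, the entry now records both timestamps, so the comparison side can honor whichever letters the user enabled. A self-contained sketch with a hypothetical simplified entry type (borg's real `FileCacheEntry` is msgpack-packed and also applies `safe_ns` clamping):

```python
import os
import stat
from collections import namedtuple

# Simplified stand-in for FileCacheEntry after this change:
# ctime and mtime are stored separately instead of one combined cmtime.
FileEntry = namedtuple("FileEntry", "age inode size ctime mtime chunks")


def memorize(files: dict, path_hash: bytes, st: os.stat_result, chunks) -> None:
    if not stat.S_ISREG(st.st_mode):
        return  # only regular files are tracked in the files cache
    files[path_hash] = FileEntry(
        age=0,                   # freshly seen in this backup run
        inode=st.st_ino,
        size=st.st_size,
        ctime=st.st_ctime_ns,    # both timestamps kept, independently
        mtime=st.st_mtime_ns,
        chunks=chunks,
    )
```

Keeping both timestamps also explains the two `self._newest_cmtime = max(...)` lines: the "newest seen" watermark must cover ctime and mtime alike.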
@@ -615,7 +646,7 @@ def seen_chunk(self, id, size=None):
         if entry.refcount and size is not None:
             assert isinstance(entry.size, int)
             if not entry.size:
-                # AdHocWithFilesCache / AdHocCache:
+                # AdHocWithFilesCache:
                 # Here *size* is used to update the chunk's size information, which will be zero for existing chunks.
                 self.chunks[id] = entry._replace(size=size)
         return entry.refcount != 0
@@ -659,7 +690,13 @@ def add_chunk(


 class AdHocWithFilesCache(FilesCacheMixin, ChunksMixin):
     """
-    Like AdHocCache, but with a files cache.
+    An ad-hoc chunks and files cache.
+
+    Chunks: it does not maintain accurate reference count.
+    Chunks that were not added during the current lifetime won't have correct size set (0 bytes)
+    and will have an infinite reference count (MAX_VALUE).
+
+    Files: if a previous_archive_id is given, ad-hoc build an in-memory files cache from that archive.
     """

     def __init__(
@@ -668,16 +705,15 @@ def __init__(
         path=None,
         warn_if_unencrypted=True,
         progress=False,
-        lock_wait=None,
         cache_mode=FILES_CACHE_MODE_DISABLED,
         iec=False,
+        archive_name=None,
     ):
         """
         :param warn_if_unencrypted: print warning if accessing unknown unencrypted repository
-        :param lock_wait: timeout for lock acquisition (int [s] or None [wait forever])
         :param cache_mode: what shall be compared in the file stat infos vs. cached stat infos comparison
         """
-        FilesCacheMixin.__init__(self, cache_mode)
+        FilesCacheMixin.__init__(self, cache_mode, archive_name)
         ChunksMixin.__init__(self)
         assert isinstance(manifest, Manifest)
         self.manifest = manifest
@@ -688,7 +724,7 @@ def __init__(

         self.path = cache_dir(self.repository, path)
         self.security_manager = SecurityManager(self.repository)
-        self.cache_config = CacheConfig(self.repository, self.path, lock_wait)
+        self.cache_config = CacheConfig(self.repository, self.path)

         # Warn user before sending data to a never seen before unencrypted repository
         if not os.path.exists(self.path):
@@ -721,7 +757,6 @@ def create(self):
         with open(os.path.join(self.path, "README"), "w") as fd:
             fd.write(CACHE_README)
         self.cache_config.create()
-        self._create_empty_files_cache(self.path)

     def open(self):
         if not os.path.isdir(self.path):
@@ -757,7 +792,6 @@ def check_cache_compatibility(self):
     def wipe_cache(self):
         logger.warning("Discarding incompatible cache and forcing a cache rebuild")
         self._chunks = ChunkIndex()
-        self._create_empty_files_cache(self.path)
         self.cache_config.manifest_id = ""
         self.cache_config._config.set("cache", "manifest", "")

@@ -773,46 +807,3 @@ def update_compatibility(self):

         self.cache_config.ignored_features.update(repo_features - my_features)
         self.cache_config.mandatory_features.update(repo_features & my_features)
-
-
-class AdHocCache(ChunksMixin):
-    """
-    Ad-hoc, non-persistent cache.
-
-    The AdHocCache does not maintain accurate reference count, nor does it provide a files cache
-    (which would require persistence).
-    Chunks that were not added during the current AdHocCache lifetime won't have correct size set
-    (0 bytes) and will have an infinite reference count (MAX_VALUE).
-    """
-
-    def __init__(self, manifest, warn_if_unencrypted=True, lock_wait=None, iec=False):
-        ChunksMixin.__init__(self)
-        assert isinstance(manifest, Manifest)
-        self.manifest = manifest
-        self.repository = manifest.repository
-        self.key = manifest.key
-        self.repo_objs = manifest.repo_objs
-
-        self.security_manager = SecurityManager(self.repository)
-        self.security_manager.assert_secure(manifest, self.key, lock_wait=lock_wait)
-
-    # Public API
-
-    def __enter__(self):
-        self._chunks = None
-        return self
-
-    def __exit__(self, exc_type, exc_val, exc_tb):
-        if exc_type is None:
-            self.security_manager.save(self.manifest, self.key)
-        self._chunks = None
-
-    files = None  # type: ignore
-    cache_mode = "d"
-
-    def file_known_and_unchanged(self, hashed_path, path_hash, st):
-        files_cache_logger.debug("UNKNOWN: files cache not implemented")
-        return False, None
-
-    def memorize_file(self, hashed_path, path_hash, st, chunks):
-        pass
@@ -1189,7 +1189,7 @@ def default(self, o):
         from ..legacyremote import LegacyRemoteRepository
         from ..remote import RemoteRepository
         from ..archive import Archive
-        from ..cache import AdHocCache, AdHocWithFilesCache
+        from ..cache import AdHocWithFilesCache

         if isinstance(o, (LegacyRepository, LegacyRemoteRepository)) or isinstance(o, (Repository, RemoteRepository)):
             return {"id": bin_to_hex(o.id), "location": o._location.canonical_path()}
@@ -1197,8 +1197,6 @@ def default(self, o):
             return o.info()
         if isinstance(o, (AdHocWithFilesCache,)):
             return {"path": o.path}
-        if isinstance(o, AdHocCache):
-            return {}
         if callable(getattr(o, "to_json", None)):
             return o.to_json()
         return super().default(o)
@@ -1,4 +1,3 @@
-import io
 import json
 import os
 from configparser import ConfigParser
@@ -16,26 +15,6 @@ def corrupt_archiver(archiver):
     archiver.cache_path = json.loads(cmd(archiver, "repo-info", "--json"))["cache"].get("path")


-def corrupt(file, amount=1):
-    with open(file, "r+b") as fd:
-        fd.seek(-amount, io.SEEK_END)
-        corrupted = bytes(255 - c for c in fd.read(amount))
-        fd.seek(-amount, io.SEEK_END)
-        fd.write(corrupted)
-
-
-def test_cache_files(archiver):
-    corrupt_archiver(archiver)
-    if archiver.cache_path is None:
-        pytest.skip("no cache path for this kind of Cache implementation")
-
-    cmd(archiver, "create", "test", "input")
-    corrupt(os.path.join(archiver.cache_path, "files"))
-    out = cmd(archiver, "create", "test1", "input")
-    # borg warns about the corrupt files cache, but then continues without files cache.
-    assert "files cache is corrupted" in out
-
-
 def test_old_version_interfered(archiver):
     corrupt_archiver(archiver)
     if archiver.cache_path is None:
@@ -11,7 +11,6 @@
 import pytest

 from ... import platform
-from ...cache import get_cache_impl
 from ...constants import *  # NOQA
 from ...manifest import Manifest
 from ...platform import is_cygwin, is_win32, is_darwin
@@ -525,25 +524,6 @@ def test_create_pattern_intermediate_folders_first(archivers, request):
     assert out_list.index("d x/b") < out_list.index("- x/b/foo_b")


-@pytest.mark.skipif(get_cache_impl() != "adhoc", reason="only works with AdHocCache")
-def test_create_no_cache_sync_adhoc(archivers, request):  # TODO: add test for AdHocWithFilesCache
-    archiver = request.getfixturevalue(archivers)
-    create_test_files(archiver.input_path)
-    cmd(archiver, "repo-create", RK_ENCRYPTION)
-    cmd(archiver, "repo-delete", "--cache-only")
-    create_json = json.loads(
-        cmd(archiver, "create", "--no-cache-sync", "--prefer-adhoc-cache", "--json", "test", "input")
-    )
-    info_json = json.loads(cmd(archiver, "info", "-a", "test", "--json"))
-    create_stats = create_json["cache"]["stats"]
-    info_stats = info_json["cache"]["stats"]
-    assert create_stats == info_stats
-    cmd(archiver, "repo-delete", "--cache-only")
-    cmd(archiver, "create", "--no-cache-sync", "--prefer-adhoc-cache", "test2", "input")
-    cmd(archiver, "repo-info")
-    cmd(archiver, "check")
-
-
 def test_create_archivename_with_placeholder(archivers, request):
     archiver = request.getfixturevalue(archivers)
     create_test_files(archiver.input_path)
@@ -676,7 +656,7 @@ def test_file_status(archivers, request):
     assert "A input/file1" in output
     assert "A input/file2" in output
     # should find first file as unmodified
-    output = cmd(archiver, "create", "--list", "test2", "input")
+    output = cmd(archiver, "create", "--list", "test", "input")
     assert "U input/file1" in output
     # although surprising, this is expected. For why, see:
     # https://borgbackup.readthedocs.org/en/latest/faq.html#i-am-seeing-a-added-status-for-a-unchanged-file
@@ -693,13 +673,13 @@ def test_file_status_cs_cache_mode(archivers, request):
     time.sleep(1)  # file2 must have newer timestamps than file1
     create_regular_file(archiver.input_path, "file2", size=10)
     cmd(archiver, "repo-create", RK_ENCRYPTION)
-    cmd(archiver, "create", "test1", "input", "--list", "--files-cache=ctime,size")
+    cmd(archiver, "create", "test", "input", "--list", "--files-cache=ctime,size")
     # modify file1, but cheat with the mtime (and atime) and also keep same size:
     st = os.stat("input/file1")
     create_regular_file(archiver.input_path, "file1", contents=b"321")
     os.utime("input/file1", ns=(st.st_atime_ns, st.st_mtime_ns))
     # this mode uses ctime for change detection, so it should find file1 as modified
-    output = cmd(archiver, "create", "test2", "input", "--list", "--files-cache=ctime,size")
+    output = cmd(archiver, "create", "test", "input", "--list", "--files-cache=ctime,size")
     assert "M input/file1" in output

@@ -710,12 +690,12 @@ def test_file_status_ms_cache_mode(archivers, request):
     time.sleep(1)  # file2 must have newer timestamps than file1
     create_regular_file(archiver.input_path, "file2", size=10)
     cmd(archiver, "repo-create", RK_ENCRYPTION)
-    cmd(archiver, "create", "--list", "--files-cache=mtime,size", "test1", "input")
+    cmd(archiver, "create", "--list", "--files-cache=mtime,size", "test", "input")
     # change mode of file1, no content change:
     st = os.stat("input/file1")
     os.chmod("input/file1", st.st_mode ^ stat.S_IRWXO)  # this triggers a ctime change, but mtime is unchanged
     # this mode uses mtime for change detection, so it should find file1 as unmodified
-    output = cmd(archiver, "create", "--list", "--files-cache=mtime,size", "test2", "input")
+    output = cmd(archiver, "create", "--list", "--files-cache=mtime,size", "test", "input")
     assert "U input/file1" in output

@@ -726,9 +706,9 @@ def test_file_status_rc_cache_mode(archivers, request):
     time.sleep(1)  # file2 must have newer timestamps than file1
     create_regular_file(archiver.input_path, "file2", size=10)
     cmd(archiver, "repo-create", RK_ENCRYPTION)
-    cmd(archiver, "create", "--list", "--files-cache=rechunk,ctime", "test1", "input")
+    cmd(archiver, "create", "--list", "--files-cache=rechunk,ctime", "test", "input")
     # no changes here, but this mode rechunks unconditionally
-    output = cmd(archiver, "create", "--list", "--files-cache=rechunk,ctime", "test2", "input")
+    output = cmd(archiver, "create", "--list", "--files-cache=rechunk,ctime", "test", "input")
     assert "A input/file1" in output

@@ -748,7 +728,7 @@ def test_file_status_excluded(archivers, request):
     if has_lchflags:
         assert "- input/file3" in output
     # should find second file as excluded
-    output = cmd(archiver, "create", "test1", "input", "--list", "--exclude-nodump", "--exclude", "*/file2")
+    output = cmd(archiver, "create", "test", "input", "--list", "--exclude-nodump", "--exclude", "*/file2")
     assert "U input/file1" in output
     assert "- input/file2" in output
     if has_lchflags:
@@ -781,14 +761,14 @@ def to_dict(borg_create_output):
     create_regular_file(archiver.input_path, "testfile1", contents=b"test1")
     time.sleep(1.0 if is_darwin else 0.01)  # testfile2 must have newer timestamps than testfile1
     create_regular_file(archiver.input_path, "testfile2", contents=b"test2")
-    result = cmd(archiver, "create", "--stats", "test_archive2", archiver.input_path)
+    result = cmd(archiver, "create", "--stats", "test_archive", archiver.input_path)
     result = to_dict(result)
     assert result["Added files"] == 2
     assert result["Unchanged files"] == 0
     assert result["Modified files"] == 0
     # Archive a dir with 1 unmodified file and 1 modified
     create_regular_file(archiver.input_path, "testfile1", contents=b"new data")
-    result = cmd(archiver, "create", "--stats", "test_archive3", archiver.input_path)
+    result = cmd(archiver, "create", "--stats", "test_archive", archiver.input_path)
     result = to_dict(result)
     # Should process testfile2 as added because of
     # https://borgbackup.readthedocs.io/en/stable/faq.html#i-am-seeing-a-added-status-for-an-unchanged-file
@@ -826,18 +806,18 @@ def test_create_topical(archivers, request):
     output = cmd(archiver, "create", "test", "input")
     assert "file1" not in output
     # shouldn't be listed even if unchanged
-    output = cmd(archiver, "create", "test0", "input")
+    output = cmd(archiver, "create", "test", "input")
     assert "file1" not in output
     # should list the file as unchanged
-    output = cmd(archiver, "create", "test1", "input", "--list", "--filter=U")
+    output = cmd(archiver, "create", "test", "input", "--list", "--filter=U")
     assert "file1" in output
     # should *not* list the file as changed
-    output = cmd(archiver, "create", "test2", "input", "--list", "--filter=AM")
+    output = cmd(archiver, "create", "test", "input", "--list", "--filter=AM")
     assert "file1" not in output
     # change the file
     create_regular_file(archiver.input_path, "file1", size=1024 * 100)
     # should list the file as changed
-    output = cmd(archiver, "create", "test3", "input", "--list", "--filter=AM")
+    output = cmd(archiver, "create", "test", "input", "--list", "--filter=AM")
     assert "file1" in output

@@ -1,17 +1,17 @@
-import os.path
+import os

 import pytest

 from .hashindex import H
 from .key import TestKey
 from ..archive import Statistics
-from ..cache import AdHocCache
+from ..cache import AdHocWithFilesCache
 from ..crypto.key import AESOCBRepoKey
 from ..manifest import Manifest
 from ..repository import Repository


-class TestAdHocCache:
+class TestAdHocWithFilesCache:
     @pytest.fixture
     def repository(self, tmpdir):
         self.repository_location = os.path.join(str(tmpdir), "repository")
@@ -32,7 +32,7 @@ def manifest(self, repository, key):

     @pytest.fixture
     def cache(self, repository, key, manifest):
-        return AdHocCache(manifest)
+        return AdHocWithFilesCache(manifest)

     def test_does_not_contain_manifest(self, cache):
         assert not cache.seen_chunk(Manifest.MANIFEST_ID)
@@ -40,11 +40,6 @@ def test_does_not_contain_manifest(self, cache):
     def test_seen_chunk_add_chunk_size(self, cache):
         assert cache.add_chunk(H(1), {}, b"5678", stats=Statistics()) == (H(1), 4)

-    def test_files_cache(self, cache):
-        assert cache.file_known_and_unchanged(b"foo", bytes(32), None) == (False, None)
-        assert cache.cache_mode == "d"
-        assert cache.files is None
-
     def test_reuse_after_add_chunk(self, cache):
         assert cache.add_chunk(H(3), {}, b"5678", stats=Statistics()) == (H(3), 4)
         assert cache.reuse_chunk(H(3), 4, Statistics()) == (H(3), 4)
@@ -52,3 +47,9 @@ def test_reuse_after_add_chunk(self, cache):
     def test_existing_reuse_after_add_chunk(self, cache):
         assert cache.add_chunk(H(1), {}, b"5678", stats=Statistics()) == (H(1), 4)
         assert cache.reuse_chunk(H(1), 4, Statistics()) == (H(1), 4)
+
+    def test_files_cache(self, cache):
+        st = os.stat(".")
+        assert cache.file_known_and_unchanged(b"foo", bytes(32), st) == (False, None)
+        assert cache.cache_mode == "d"
+        assert cache.files == {}
@@ -104,7 +104,6 @@ def archiver(tmp_path, set_env_variables):
     archiver.patterns_file_path = os.fspath(tmp_path / "patterns")
     os.environ["BORG_KEYS_DIR"] = archiver.keys_path
     os.environ["BORG_CACHE_DIR"] = archiver.cache_path
-    # os.environ["BORG_CACHE_IMPL"] = "adhocwithfiles"
     os.mkdir(archiver.input_path)
     os.chmod(archiver.input_path, 0o777)  # avoid troubles with fakeroot / FUSE
     os.mkdir(archiver.output_path)