From dd1023df44db2f348f59991432e61ce4db507dbf Mon Sep 17 00:00:00 2001
From: Manu
Date: Wed, 9 Mar 2022 14:06:40 +0400
Subject: [PATCH 1/2] Add discussion and note on file attrs.

---
 docs/faq.rst | 58 ++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 49 insertions(+), 9 deletions(-)

diff --git a/docs/faq.rst b/docs/faq.rst
index f678c3d83..020e11ffd 100644
--- a/docs/faq.rst
+++ b/docs/faq.rst
@@ -880,13 +880,30 @@ If you run into that, try this:
 What's the expected backup performance?
 ---------------------------------------
 
-A first backup will usually be somehow "slow" because there is a lot of data
-to process. Performance here depends on a lot of factors, so it is hard to
-give specific numbers.
+Compared to simply copying files (e.g. with ``rsync``), Borg has more work to do.
+This can make creation of the first archive slower, but saves time
+and disk space on subsequent runs. Here is what Borg does when you run ``borg create``:
+
+- Borg chunks the file (using the relatively expensive buzhash algorithm)
+- It then computes the "id" of the chunk (hmac-sha256 (often slow, except
+  if your CPU has sha256 acceleration) or blake2b (fast, in software))
+- Then it checks whether this chunk is already in the repo (local hashtable lookup,
+  fast). If so, the processing of the chunk is completed here. Otherwise it needs to
+  process the chunk:
+- Compresses the chunks (the default lz4 is super fast)
+- Encrypts (AES, usually fast if your CPU has AES acceleration, as most
+  CPUs have had for about 10 years)
+- Authenticates ("signs") using hmac-sha256 or blake2b (see above),
+- Transmits to repo (remote repo), usually involving an SSH connection
+  (does its own encryption)
+- Stores the chunk into a key/value store (the key is the chunk id, the value
+  is the data). While doing that, it computes a CRC32 of the data (repo low-level
+  checksum, used by borg check --repository) and also updates the repo index
+  (another hashtable).
 
 Subsequent backups are usually very fast if most files are unchanged and only
 a few are new or modified. The high performance on unchanged files primarily depends
-only on a few factors (like fs recursion + metadata reading performance and the
+only on a few factors (like FS recursion + metadata reading performance and the
 files cache working as expected) and much less on other factors.
 
 E.g., for this setup:
@@ -904,14 +921,37 @@ few FAQ entries below.
 
 .. _slow_backup:
 
-Why is backup slow for me?
+Why is my backup so slow?
 --------------------------
 
-So, if you feel your Borg backup is too slow somehow, you should find out why.
+If you feel your Borg backup is too slow, here is what you can do:
 
-The usual way to approach this is to add ``--list --filter=AME --stats`` to your
-``borg create`` call to produce more log output, including a file list (with file status
-characters) and also some statistics at the end of the backup.
+- Make sure Borg has enough RAM (depends on how big your repo is / how many
+  files you have)
+- Use one of the blake2 modes for ``--encryption``, unless you positively know
+  your CPU (and OpenSSL) accelerates sha256 (then stay with hmac-sha256).
+- Don't use any expensive compression. The default, lz4, is super fast.
+  Uncompressed is often slower than lz4.
+- Just wait. You can also interrupt it and start it again as often as you like;
+  it will converge towards a valid "completed" state (see ``--checkpoint-interval``,
+  maybe use the default, but in any case don't make it too short). It starts from
+  the beginning each time, but it is still faster because it does not have to store
+  data that is already in the repo from the last checkpoint.
+- If you don't need additional file attributes, you can disable them with ``--noflags``,
+  ``--noacls``, ``--noxattrs``. This can lead to noticeable performance improvements
+  when your backup consists of many small files.
+
+If you feel that Borg "freezes" on a file, it could be in the middle of processing a
+large file (like ISOs or VM images). Borg < 1.2 announces file names *after* finishing
+with the file, so it may still display the name of a previous small file while it is
+already processing the next (larger) one. For very big files, the progress display can
+show such a previous short file for a long time while the big one is processed. With
+Borg 1.2 this was changed to announcing the filename before starting to process it.
+
+To see which files have changed and which take more time to process, you can also add
+``--list --filter=AME --stats`` to your ``borg create`` call to produce more log output,
+including a file list (with file status characters) and also some statistics at
+the end of the backup.
 
 Then you do the backup and look at the log output:
 
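As a rough illustration of the options discussed in the patch above (this is not part of the patch itself; the repository path, archive name and data directory are placeholders), a ``borg create`` call that combines the logging and performance-related flags could look like this::

   # show per-file status (A/M/E) and summary statistics, keep the fast
   # default lz4 compression, and skip flags/ACLs/xattrs if you do not need them
   $ borg create --list --filter=AME --stats \
         --compression lz4 \
         --noflags --noacls --noxattrs \
         /path/to/repo::'{hostname}-{now}' /path/to/data

Whether skipping flags, ACLs and xattrs is acceptable depends on whether you need those attributes restored later.
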
From af5103d8335bba3265780b33b48c47ec50e42f6d Mon Sep 17 00:00:00 2001
From: Manu
Date: Wed, 9 Mar 2022 15:46:44 +0400
Subject: [PATCH 2/2] Incorporate feedback by TW

---
 docs/faq.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/faq.rst b/docs/faq.rst
index 020e11ffd..d982aec2c 100644
--- a/docs/faq.rst
+++ b/docs/faq.rst
@@ -890,12 +890,12 @@ and disk space on subsequent runs. Here is what Borg does when you run ``borg
 - Then it checks whether this chunk is already in the repo (local hashtable lookup,
   fast). If so, the processing of the chunk is completed here. Otherwise it needs to
   process the chunk:
-- Compresses the chunks (the default lz4 is super fast)
+- Compresses (the default lz4 is super fast)
 - Encrypts (AES, usually fast if your CPU has AES acceleration, as most
   CPUs have had for about 10 years)
 - Authenticates ("signs") using hmac-sha256 or blake2b (see above),
-- Transmits to repo (remote repo), usually involving an SSH connection
-  (does its own encryption)
+- Transmits to the repo. If the repo is remote, this usually involves an SSH connection
+  (does its own encryption / authentication).
 - Stores the chunk into a key/value store (the key is the chunk id, the value
   is the data). While doing that, it computes a CRC32 of the data (repo low-level
   checksum, used by borg check --repository) and also updates the repo index
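To make the hashing and compression trade-offs described in the per-chunk pipeline above concrete, here is a hedged sketch (again not part of the patches; repository path, archive name and data directory are placeholders, and ``repokey-blake2`` is just one of the blake2 modes)::

   # pick a blake2-based mode if sha256 is not hardware accelerated on your CPU
   $ borg init --encryption=repokey-blake2 /path/to/repo

   # lz4 is the default compression; it is spelled out here only for clarity
   $ borg create --compression lz4 --stats /path/to/repo::'{hostname}-{now}' /path/to/data

   # check only the repository level, which relies on the low-level CRC32 checksums
   $ borg check --repository-only /path/to/repo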