From d36da71fb6961cafb4c475f85558efe41e68691c Mon Sep 17 00:00:00 2001
From: sedrubal
Date: Sat, 1 Aug 2020 19:33:41 +0200
Subject: [PATCH] duperemove: add page (#4231)

---
 pages/linux/duperemove.md | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)
 create mode 100644 pages/linux/duperemove.md

diff --git a/pages/linux/duperemove.md b/pages/linux/duperemove.md
new file mode 100644
index 0000000000..a5746fa941
--- /dev/null
+++ b/pages/linux/duperemove.md
@@ -0,0 +1,22 @@
+# duperemove
+
+> Finds duplicate file system extents and optionally schedules them for deduplication.
+> An extent is a small part of a file inside the file system.
+> On some file systems, one extent can be referenced multiple times if parts of the content of the files are identical.
+> More information: .
+
+- Search for duplicate extents in a directory and show them:
+
+`duperemove -r {{path/to/directory}}`
+
+- Deduplicate duplicate extents on a Btrfs or XFS (experimental) file system:
+
+`duperemove -r -d {{path/to/directory}}`
+
+- Use a hash file to store extent hashes (uses less memory and can be reused on subsequent runs):
+
+`duperemove -r -d --hashfile={{path/to/hashfile}} {{path/to/directory}}`
+
+- Limit I/O threads (for the hashing and dedupe stages) and CPU threads (for the duplicate extent finding stage):
+
+`duperemove -r -d --hashfile={{path/to/hashfile}} --io-threads={{N}} --cpu-threads={{N}} {{path/to/directory}}`
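
The page describes "finding duplicate extents" without showing what that search looks like. The hashing stage can be pictured roughly as: split file contents into blocks, hash each block, and group identical hashes. Below is a minimal, conceptual Python sketch of that idea only. It is not duperemove's implementation. `find_duplicate_blocks` and its 4096-byte fixed block size are hypothetical choices for illustration. Real extents come from the file system and vary in length, and the sketch only reports candidates; it does not ask the kernel to share anything.

```python
import hashlib
import os
import tempfile
from collections import defaultdict


def find_duplicate_blocks(paths, block_size=4096):
    """Return {digest: [(path, offset), ...]} for block contents seen more than once.

    Hypothetical simplified model: each file is cut into fixed-size blocks
    (the last block may be shorter). Real extents are file-system defined.
    """
    locations = defaultdict(list)
    for path in paths:
        with open(path, "rb") as f:
            offset = 0
            while True:
                block = f.read(block_size)
                if not block:
                    break
                # Identical content yields an identical digest.
                digest = hashlib.sha256(block).hexdigest()
                locations[digest].append((path, offset))
                offset += len(block)
    # Keep only digests that occur at more than one location.
    return {d: locs for d, locs in locations.items() if len(locs) > 1}


if __name__ == "__main__":
    # Two files that share one 4 KiB block at different offsets.
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "a.bin")
        b = os.path.join(tmp, "b.bin")
        with open(a, "wb") as f:
            f.write(b"x" * 4096 + b"a" * 4096)
        with open(b, "wb") as f:
            f.write(b"b" * 4096 + b"x" * 4096)
        for digest, locs in find_duplicate_blocks([a, b]).items():
            print(digest[:12], locs)
```

The sketch keeps every digest in memory. That is the cost the `--hashfile` option on the page is meant to reduce by storing extent hashes on disk.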