BASH - Files - Delete duplicate files
To run a command (here fdupes -dN) in every directory, including sub-directories:
find . -type d ! -name . -exec bash -c 'cd "$1" && fdupes -dN .' _ {} \;
or, limited to a maximum depth of three levels:
find . -maxdepth 3 -type d ! -name . -exec bash -c 'cd "$1" && fdupes -dN .' _ {} \;
NOTE: Set -maxdepth as required, or remove it completely.
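NOTE: A minimal sketch of the same idea wrapped in a reusable function. The name run_in_each_dir is hypothetical, and -mindepth is assumed to be available (GNU and BSD find). Each directory is passed to the inner shell as an argument rather than pasted into the command string, so names containing quotes or spaces should be handled.

```shell
# run_in_each_dir ROOT CMD [ARGS...]
# Runs CMD inside every sub-directory of ROOT (ROOT itself is skipped).
run_in_each_dir() {
    root=$1
    shift
    # -mindepth 1 skips ROOT; the directory arrives in the inner shell as $1,
    # and the command with its arguments follows it.
    find "$root" -mindepth 1 -type d -exec sh -c '
        dir=$1
        shift
        cd "$dir" && "$@"
    ' sh {} "$@" \;
}

# Example (not run here): run_in_each_dir . fdupes -dN .
```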
Removes every file whose path contains a copy counter such as " (1)", for example "photo (1).jpg".
WARNING: This matches on the name only. It does not check file contents.
find . -regex '.* ([0-9]).*' -delete
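NOTE: Because the command above never looks at file contents, it may be worth previewing which numbered copies really are identical to their un-numbered sibling. The following is a sketch only: the function name list_verified_copies is hypothetical, it matches on the file name with -name instead of -regex, it assumes the counter sits directly before the extension, and it does not handle file names containing newlines.

```shell
# list_verified_copies DIR
# Prints every "name (N).ext" file under DIR whose un-numbered sibling
# "name.ext" exists and has identical contents. Nothing is deleted.
list_verified_copies() {
    find "$1" -type f -name '* ([0-9])*' -exec sh -c '
        for copy do
            # Strip the " (N)" counter that precedes the extension.
            orig=$(printf "%s\n" "$copy" |
                sed -E "s/ \([0-9]\)(\.[^/]*)?\$/\1/")
            # Skip names where nothing was stripped.
            [ "$orig" != "$copy" ] || continue
            # cmp -s is silent and succeeds only when contents are identical.
            if [ -f "$orig" ] && cmp -s "$orig" "$copy"; then
                printf "%s\n" "$copy"
            fi
        done
    ' sh {} +
}
```

Only the files printed by this function have been confirmed as byte-for-byte copies.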
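To find duplicates by content, print the MD5 hash of every file and sort the list:
find "$rootdir" -name "*.jpg" -print0 | while IFS= read -r -d '' file; do md5sum "$file"; done | sort
NOTE: Files with the same contents generate the same hash, so duplicates end up on adjacent lines and can be found easily.
Alternative:
find "$rootdir" -name '*.jpg' -exec md5sum {} + | sort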
find "$@" -type f -print0 | xargs -0 -n1 md5sum | sort --key=1,32 | uniq -w 32 -d --all-repeated=separate | sed -r 's/^[0-9a-f]*( )*//;s/([^a-zA-Z0-9./_-])/\\\1/g;s/(.+)/ls \1/'
NOTE:
- xargs runs md5sum on every file found in the folders passed as arguments to the script.
- Next, sort and uniq extract all the entries that share a checksum (and are, therefore, copies of the same file), and sed turns them into a sequence of shell commands. As written, the generated commands are ls; replace ls with rm in the sed expression to build commands that remove the files.
- The -print0 and -0 options, together with the escaping done by sed, make sure that things work even if you have file names with spaces or non-ASCII characters.
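NOTE: A trimmed-down sketch of the pipeline above that stops before the sed step, so it only reports groups and generates no commands. The function name print_duplicate_groups is hypothetical, and GNU md5sum and GNU uniq (for -w and --all-repeated) are assumed.

```shell
# print_duplicate_groups DIR...
# Prints "checksum  path" lines for every file whose MD5 checksum occurs
# more than once, with a blank line between groups.
print_duplicate_groups() {
    find "$@" -type f -exec md5sum {} + |
        sort |
        uniq -w 32 --all-repeated=separate   # compare only the 32-char hash
}

# Example (not run here): print_duplicate_groups ~/Downloads ~/Documents
```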
Using fslint
FSlint is a duplicate file finder utility that has both GUI and CLI modes.
FSlint finds not just duplicates, but also bad symlinks, bad names, temporary files, bad IDs, empty directories, non-stripped binaries, and more.
sudo apt install fslint
NOTE: FSlint command line options:
FSlint provides the following collection of CLI utilities to find duplicates and other lint in your filesystem:
- findup — find DUPlicate files
- findnl — find Name Lint (problems with filenames)
- findu8 — find filenames with invalid utf8 encoding
- findbl — find Bad Links (various problems with symlinks)
- findsn — find Same Name (problems with clashing names)
- finded — find Empty Directories
- findid — find files with dead user IDs
- findns — find Non Stripped executables
- findrs — find Redundant Whitespace in files
- findtf — find Temporary Files
- findul — find possibly Unused Libraries
- zipdir — Reclaim wasted space in ext2 directory entries
All of these utilities are located in the /usr/share/fslint/fslint/ directory.
For example, to find duplicates in a given directory, do:
/usr/share/fslint/fslint/findup ~/Downloads/
Similarly, to find empty directories, the command would be:
/usr/share/fslint/fslint/finded ~/Downloads/
To get more details on each utility, for example findup, run:
/usr/share/fslint/fslint/findup --help
For more details about FSlint, refer to the help section and man pages.
/usr/share/fslint/fslint/fslint --help
Man page:
man fslint
Using fdupes
Fdupes is a command line utility that identifies and removes duplicate files within the specified directories and their sub-directories.
Fdupes identifies the duplicates by comparing file sizes, partial MD5 signatures, full MD5 signatures, and finally performing a byte-by-byte comparison for verification.
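NOTE: The staged comparison described above can be illustrated for a single pair of files. This is a sketch of the idea only, not how Fdupes is implemented: the function name same_content is hypothetical, the partial MD5 stage is left out, and md5sum is assumed to be installed.

```shell
# same_content A B
# Succeeds (exit status 0) only if A and B have identical contents.
same_content() {
    # Stage 1: files of different sizes can never be duplicates.
    [ "$(wc -c < "$1")" -eq "$(wc -c < "$2")" ] || return 1
    # Stage 2: compare MD5 checksums of the full contents.
    [ "$(md5sum < "$1")" = "$(md5sum < "$2")" ] || return 1
    # Stage 3: a byte-by-byte check guards against hash collisions.
    cmp -s "$1" "$2"
}

# Example (not run here): same_content a.pdf b.pdf && echo duplicate
```

The cheap checks run first so that most non-duplicates are rejected without reading whole files twice.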
Similar to the Rdfind utility, Fdupes comes with quite a handful of options to perform operations, such as:
- Recursively search duplicate files in directories and sub-directories
- Exclude empty files and hidden files from consideration
- Show the size of the duplicates
- Delete duplicates immediately as they are encountered
- Do not treat files with different owner/group or permission bits as duplicates
- And a lot more.
sudo apt install fdupes
NOTE: Usage:
Fdupes usage is pretty simple. Just run the following command to find out the duplicate files in a directory, for example ~/Downloads.
fdupes ~/Downloads
Sample output from my system:
/home/peter/Downloads/Hyperledger.pdf
/home/peter/Downloads/Hyperledger(1).pdf
As you can see, there is one duplicate file in the ~/Downloads directory. By default, only the duplicates in the given directory itself are shown. To include duplicates from sub-directories, use the -r option as shown below.
fdupes -r ~/Downloads
Now you will see the duplicates from the ~/Downloads directory and its sub-directories as well.
Fdupes can also find duplicates in multiple directories at once.
fdupes ~/Downloads ~/Documents/ostechnix
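You can also search multiple directories and recurse into only some of them. The -r option recurses into every directory given, whereas -R recurses only into the directories listed after it:
fdupes ~/Downloads -R ~/Documents/ostechnix
NOTE: This searches for duplicates in the “~/Downloads” directory and in the “~/Documents/ostechnix” directory and its sub-directories.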
Sometimes, you might want to know the size of the duplicates in a directory. If so, use -S option like below.
fdupes -S ~/Downloads
403635 bytes each:
/home/sk/Downloads/Hyperledger.pdf
/home/sk/Downloads/Hyperledger(1).pdf
Similarly, to view the size of the duplicates in parent and child directories, use -Sr option.
We can exclude empty and hidden files from consideration using -n and -A respectively.
fdupes -n ~/Downloads
fdupes -A ~/Downloads
NOTE: The first command will exclude zero-length files from consideration and the latter will exclude hidden files from consideration while searching for duplicates in the specified directory.
To summarize duplicate files information, use -m option.
fdupes -m ~/Downloads
1 duplicate files (in 1 sets), occupying 403.6 kilobytes
To delete all duplicates, use -d option.
fdupes -d ~/Downloads
Sample output:
[1] /home/sk/Downloads/Hyperledger Fabric Installation.pdf
[2] /home/sk/Downloads/Hyperledger Fabric Installation(1).pdf
Set 1 of 1, preserve files [1 - 2, all]:
This command prompts you for the files to preserve and deletes all other duplicates. Enter the number of a file to preserve it and delete the remaining files in the set. Be careful when using this option: you might delete original files if you are not careful.
If you want to preserve the first file in each set of duplicates and delete the others without prompting each time, use -dN option (not recommended).
fdupes -dN ~/Downloads
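NOTE: Since -dN deletes without prompting, it may help to preview what would be removed first. The sketch below assumes the default fdupes output format (one path per line, a blank line between sets) and assumes that -dN keeps the first path of each set as listed. The function name would_delete is hypothetical.

```shell
# would_delete
# Reads fdupes-style output on stdin and prints every path except the
# first one of each set. Nothing is deleted.
would_delete() {
    awk '
        BEGIN { first = 1 }
        /^$/  { first = 1; next }   # blank line: a new set starts next
        first { first = 0; next }   # first path of the set is kept
              { print }             # remaining paths would be removed
    '
}

# Example (not run here): fdupes -r ~/Downloads | would_delete
```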
To delete duplicates as they are encountered, use -I flag.
fdupes -I ~/Downloads
For more details about Fdupes, view the help section and man pages.
fdupes --help
Manpage:
man fdupes
Using rdfind
Rdfind, short for "redundant data find", is a free and open source utility that finds duplicate files across and/or within directories and sub-directories.
It compares files based on their content, not on their file names.
Rdfind uses a ranking algorithm to classify original and duplicate files. If two or more files are equal, Rdfind decides which one is the original and treats the rest as duplicates.
Once it has found the duplicates, it reports them to you. You can then decide to either delete them or replace them with hard links or symbolic (soft) links.
sudo apt install rdfind
rdfind ~/Downloads
rdfind -deleteduplicates true ~/.
NOTE: rdfind saves the results in a file named results.txt in the current working directory.
You can view the name of the possible duplicate files in results.txt file.
By reviewing the results.txt file, you can easily find the duplicates. You can remove the duplicates manually if you want to.
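NOTE: A sketch for pulling only the duplicate names out of results.txt. It assumes the column layout "duptype id depth size device inode priority name" with comment lines starting with #, and that the file treated as the original is marked DUPTYPE_FIRST_OCCURRENCE; check the header of your own results.txt before relying on it. The function name list_rdfind_duplicates is hypothetical.

```shell
# list_rdfind_duplicates [FILE]
# Prints the names of the files that rdfind classified as duplicates.
# FILE defaults to results.txt in the current directory.
list_rdfind_duplicates() {
    awk '
        /^#/   { next }                              # skip comment lines
        NF < 8 { next }                              # skip incomplete lines
        $1 == "DUPTYPE_FIRST_OCCURRENCE" { next }    # treated as the original
        {
            line = $0
            # Drop the seven leading columns; the rest is the file name,
            # which may itself contain spaces.
            for (i = 1; i <= 7; i++) sub(/^[^ ]+ +/, "", line)
            print line
        }
    ' "${1:-results.txt}"
}
```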
You can use the -dryrun option to find all duplicates in a given directory without changing anything and output the summary in your Terminal:
rdfind -dryrun true ~/Downloads
Once you have found the duplicates, you can replace them with either hardlinks or symlinks.
To replace all duplicates with hardlinks, run:
rdfind -makehardlinks true ~/Downloads
To replace all duplicates with symlinks/soft links, run:
rdfind -makesymlinks true ~/Downloads
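NOTE: After replacing duplicates with links, you may want to confirm how two paths are related. A minimal sketch follows; the function name link_kind is hypothetical, and the -ef test operator is assumed to be supported by your shell (it is in bash and dash).

```shell
# link_kind A B
# Prints "symlink" if one path is a symbolic link resolving to the other,
# "hardlink" if both names refer to the same inode, otherwise "separate".
link_kind() {
    if [ "$1" -ef "$2" ]; then
        # -ef follows symlinks, so check for a symlink explicitly.
        if [ -L "$1" ] || [ -L "$2" ]; then
            echo symlink
        else
            echo hardlink
        fi
    else
        echo separate
    fi
}

# Example (not run here): link_kind original.pdf "original(1).pdf"
```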
You may have some empty files in a directory and want to ignore them. If so, use -ignoreempty option like below.
rdfind -ignoreempty true ~/Downloads
If you don’t want the old files anymore, just delete duplicate files instead of replacing them with hard or soft links.
To delete all duplicates, simply run:
rdfind -deleteduplicates true ~/Downloads
If you do not want to ignore empty files, so that duplicate empty files are deleted along with all other duplicates, run:
rdfind -deleteduplicates true -ignoreempty false ~/Downloads
For more details, refer to the help section:
rdfind --help
The manual pages:
man rdfind