Ubuntu - File - Delete duplicate files
find . -regex '.* ([0-9]).*' -delete
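The command above deletes immediately, and because -regex is matched against the whole path it can also catch directories or anything stored below a directory named like "Photos (1)". As a cautious sketch (the function name list_numbered_copies is made up for illustration, and -name is swapped in for -regex so only the file name itself is tested), you could preview the matches first:

```shell
# Hypothetical helper: list regular files whose names look like numbered
# copies, e.g. "report (1).pdf", without deleting anything.
list_numbered_copies() {
    # -type f skips directories; -name tests only the file name, not the path.
    find "${1:-.}" -type f -name '* ([0-9])*' -print
}
```

Running list_numbered_copies ~/Downloads prints the candidates; once the list looks right, the same find expression can be run with -delete in place of -print.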
find "$@" -type f -print0 | xargs -0 -n1 md5sum | sort --key=1,32 | uniq -w 32 -d --all-repeated=separate | sed -r 's/^[0-9a-f]*( )*//;s/([^a-zA-Z0-9./_-])/\\\1/g;s/(.+)/ls \1/'
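The pipeline above hashes every file with md5sum, groups lines that share the same 32-character checksum, and prints an ls command for each member of a group. A shorter variant of the same idea is sketched below; it is not the original author's method. The function name list_extra_copies is an assumption, and it treats the first path in sort order as the copy to keep, printing only the others. It assumes GNU xargs and md5sum, and file names without backslashes or newlines (md5sum escapes those).

```shell
# Hypothetical helper: print every file whose MD5 checksum matches a file
# that sorts earlier. Nothing is deleted.
list_extra_copies() {
    find "$@" -type f -print0 |
        xargs -0 -r md5sum |
        LC_ALL=C sort |
        # An md5sum line is: 32 hex digits, two separator characters, then
        # the path, so the path starts at column 35.
        awk 'seen[$1]++ { print substr($0, 35) }'
}
```

For example, list_extra_copies ~/Downloads prints one path per redundant copy, which can be reviewed before anything is removed.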
Using rdfind
Rdfind (short for "redundant data find") is a free and open source utility that finds duplicate files across and/or within directories and subdirectories.
It compares files based on their content, not on their file names.
Rdfind uses a ranking algorithm to classify original and duplicate files. If you have two or more identical files, Rdfind decides which one to treat as the original and considers the rest duplicates. The ranking depends mainly on the order of the directories given on the command line and on how deep in the tree each file was found.
Once it has found the duplicates, it reports them to you. You can then decide whether to delete them or replace them with hard links or symbolic (soft) links.
sudo apt install rdfind
rdfind ~/Downloads
rdfind -deleteduplicates true ~/.
NOTE: rdfind saves the results in a file named results.txt in the current working directory.
You can view the names of the possible duplicate files in the results.txt file.
After reviewing results.txt, you can remove the duplicates manually if you want to.
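To pull only the duplicate paths out of results.txt, something like the following sketch may help. It assumes the layout rdfind normally writes: comment lines starting with #, then one line per file with seven space-separated columns (duplicate type, id, depth, size, device, inode, priority) followed by the file name, where the copy kept as the original is marked DUPTYPE_FIRST_OCCURRENCE. The function name list_reported_duplicates is made up here; check the header of your own results.txt before relying on it.

```shell
# Hypothetical helper: print the paths that a results.txt file marks as
# duplicates, i.e. every entry that is not a first occurrence.
list_reported_duplicates() {
    awk '
        /^#/ || NF < 8 { next }                    # skip comments and short lines
        $1 == "DUPTYPE_FIRST_OCCURRENCE" { next }  # skip the copies being kept
        {
            # Strip the seven leading columns; the rest is the file name,
            # which may itself contain spaces.
            for (i = 1; i <= 7; i++) sub(/^[^ ]+ +/, "")
            print
        }
    ' "${1:-results.txt}"
}
```

Called without arguments, it reads results.txt in the current directory.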
You can use the -dryrun option to find all duplicates in a given directory without changing anything and print a summary in your terminal:
rdfind -dryrun true ~/Downloads
Once you have found the duplicates, you can replace them with either hard links or symlinks.
To replace all duplicates with hardlinks, run:
rdfind -makehardlinks true ~/Downloads
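After a hard-link run, two paths that were duplicates should point at the same inode on the same device. A minimal way to check this, assuming GNU stat (as shipped with Ubuntu) and using the made-up function name is_hardlinked, could look like this:

```shell
# Hypothetical helper: succeed if both paths are names for the same file
# (same device and inode) and neither one is a symbolic link.
is_hardlinked() {
    [ ! -L "$1" ] && [ ! -L "$2" ] &&
        [ "$(stat -c '%d:%i' -- "$1")" = "$(stat -c '%d:%i' -- "$2")" ]
}
```

For example, is_hardlinked a.iso "a (1).iso" && echo "same file" prints the message only when the two names share one copy of the data.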
To replace all duplicates with symlinks/soft links, run:
rdfind -makesymlinks true ~/Downloads
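To see which files ended up as symlinks and where they point, a short sketch using GNU find's -printf (the function name list_symlinks is only illustrative) might be:

```shell
# Hypothetical helper: print each symbolic link under a directory together
# with its target, one "link -> target" pair per line.
list_symlinks() {
    find "${1:-.}" -type l -printf '%p -> %l\n'
}
```

Keep in mind that a symlink breaks if the file it points to is later moved or deleted, whereas a hard link does not.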
You may have some empty files in a directory and want to ignore them. If so, use the -ignoreempty option as shown below.
rdfind -ignoreempty true ~/Downloads
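If you are unsure whether a directory contains empty files at all, you can list them beforehand with find. This is a small sketch; the function name list_empty_files is an assumption, and -empty is a GNU find test.

```shell
# Hypothetical helper: list zero-length regular files under a directory.
list_empty_files() {
    find "${1:-.}" -type f -empty
}
```

Every empty file has the same (empty) content, so all of them count as duplicates of one another when empty files are not ignored.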
If you no longer need the duplicate files, you can delete them instead of replacing them with hard or soft links.
To delete all duplicates, simply run:
rdfind -deleteduplicates true ~/Downloads
If you want empty files to be treated as duplicates too and deleted along with the others, turn off -ignoreempty:
rdfind -deleteduplicates true -ignoreempty false ~/Downloads
For more details, refer to the help section:
rdfind --help
And the manual page:
man rdfind