ubuntu:file:delete_duplicate_files [2020/02/01 14:46] peterubuntu:file:delete_duplicate_files [2022/06/13 10:22] (current) – removed peter
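====== Ubuntu - File - Delete duplicate files ======

Delete files whose names contain a copy suffix such as " (1)", for example "report (1).pdf". This matches file names only and does not compare contents. Run the first command to preview the matches, then the second to delete them:

<code bash>
find . -type f -name '* ([0-9])*' -print
find . -type f -name '* ([0-9])*' -delete
</code>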
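Because the commands above trust the file name alone, a "report (1).pdf" that differs from "report.pdf" would also be deleted. A more careful preview checks that the numbered copy is byte-identical to the un-numbered file next to it. This is a minimal sketch assuming GNU find, bash and the " (N)" naming scheme with a single digit; the function name list_numbered_copies is hypothetical. It only prints paths and deletes nothing.

<code bash>
# list_numbered_copies (hypothetical name): print "name (N).ext" files whose
# content is byte-identical to "name.ext" in the same directory.
# Assumption: copies use the " (N)" suffix with a single digit. Deletes nothing.
list_numbered_copies() {
  local dir=${1:-.} pat=' ([0-9])' copy name orig
  find "$dir" -type f -name '* ([0-9])*' -print0 |
    while IFS= read -r -d '' copy; do
      name=${copy##*/}
      # Remove the first " (N)" from the file name to guess the original's path.
      orig=${copy%/*}/${name/$pat/}
      if [ -f "$orig" ] && cmp -s -- "$orig" "$copy"; then
        printf '%s\n' "$copy"
      fi
    done
}
</code>

For example, list_numbered_copies ~/Downloads prints only the numbered copies that really duplicate an existing file; files such as "notes (2).txt" whose content differs from "notes.txt" are left out.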
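
----

List files with identical content (same MD5 checksum). Groups of duplicates are separated by blank lines, and each file is printed as an "ls" command with special characters escaped. Pass directories as arguments when using it in a script; with no arguments, find searches the current directory.

<code bash>
# sort -k1,1 groups lines by checksum; uniq -w 32 compares only the 32-character MD5 hash
find "$@" -type f -print0 | xargs -0 -r md5sum | sort -k1,1 | uniq -w 32 --all-repeated=separate | sed -r 's/^[0-9a-f]*( )*//;s/([^a-zA-Z0-9./_-])/\\\1/g;s/(.+)/ls \1/'
</code>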
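If the goal is to remove copies rather than inspect them, the same checksum pipeline can instead print every file except the first one in each group. This is a minimal sketch assuming GNU find, xargs and md5sum in text mode (lines of the form "HASH  PATH"); the function name list_redundant_copies is hypothetical, and paths containing newlines or backslashes are not handled.

<code bash>
# list_redundant_copies (hypothetical name): print every file whose MD5 checksum
# matches a file listed before it, i.e. all but the first file of each group.
# Nothing is deleted. Assumes GNU tools and md5sum's "HASH  PATH" line format.
# Lines are sorted by checksum, then by path, so the kept file is predictable.
# In each line, characters 1-32 are the hash and the path starts at character 35.
list_redundant_copies() {
  find "${@:-.}" -type f -print0 |
    xargs -0 -r md5sum |
    sort -k1,1 -k2 |
    awk '{ hash = substr($0, 1, 32)
           if (hash == prev) print substr($0, 35)
           prev = hash }'
}
</code>

For example, list_redundant_copies ~/Downloads > copies.txt writes the candidates to a file you can review before deleting anything. Which file is kept depends only on the sort order of the paths, not on which file is the "original".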

----

===== Using fdupes =====

**Fdupes** is a command line utility that finds and removes duplicate files in the specified directories and their sub-directories.

Fdupes identifies duplicates by comparing file sizes, then partial MD5 signatures, then full MD5 signatures, and finally performs a byte-by-byte comparison to confirm each match.
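The staged approach keeps such tools fast: a cheap check (file size) discards most candidates before any file is hashed, and the final byte-by-byte comparison rules out hash collisions. Below is a minimal sketch of that idea using GNU find, md5sum and cmp. It is not fdupes's actual implementation, it skips the partial-MD5 stage, and the function name staged_dupes is hypothetical.

<code bash>
# staged_dupes (hypothetical name): size -> full MD5 -> byte-by-byte check.
# Prints "FIRST<TAB>DUPLICATE" for each confirmed duplicate; deletes nothing.
# Assumes GNU find/xargs/md5sum; paths with newlines or tabs are not handled.
#   Stage 1: print "SIZE<TAB>PATH" and keep only paths whose size occurs twice or more.
#   Stage 2: hash only those candidates and sort them by checksum, then path.
#   Stage 3: confirm each match against the first file of its group with cmp.
staged_dupes() {
  local dir=${1:-.} line hash path prev_hash='' first=''
  find "$dir" -type f -printf '%s\t%p\n' |
    awk -F'\t' '{ size[NR] = $1; name[NR] = $2; count[$1]++ }
                END { for (i = 1; i <= NR; i++) if (count[size[i]] > 1) print name[i] }' |
    xargs -d '\n' -r md5sum | sort -k1,1 -k2 |
    while IFS= read -r line; do
      hash=${line:0:32}
      path=${line:34}
      if [ "$hash" = "$prev_hash" ]; then
        if cmp -s -- "$first" "$path"; then
          printf '%s\t%s\n' "$first" "$path"
        fi
      else
        prev_hash=$hash
        first=$path
      fi
    done
}
</code>

A file with a unique size is never hashed at all, which is where most of the savings come from on large directories.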

Like the **Rdfind** utility, Fdupes has a handful of options that let you:

  * Recursively search for duplicate files in directories and sub-directories
  * Exclude empty files and hidden files from consideration
  * Show the size of the duplicates
  * Delete duplicates immediately as they are encountered
  * Not count files with different owner/group or permission bits as duplicates
  * And more.

Install it with:

<code bash>
sudo apt install fdupes
</code>

Usage:

To find duplicate files in a directory, for example ~/Downloads, run:

<code bash>
fdupes ~/Downloads
</code>

Sample output from my system:

<code bash>
/home/peter/Downloads/Hyperledger.pdf
/home/peter/Downloads/Hyperledger(1).pdf
</code>

As you can see, there is one duplicate file in ~/Downloads. By default, fdupes only checks the given directory, not its sub-directories. To include sub-directories, use the **-r** option:

<code bash>
fdupes -r ~/Downloads
</code>

Now fdupes reports duplicates in ~/Downloads and in all of its sub-directories.
Fdupes can also search multiple directories at once:

<code bash>
fdupes ~/Downloads ~/Documents/ostechnix
</code>

To search several directories but recurse into only some of them, use **-R** instead of **-r**. It applies only to the directories listed after it:

<code bash>
fdupes ~/Downloads -R ~/Documents/ostechnix
</code>

The above command searches for duplicates in the "~/Downloads" directory (without its sub-directories) and in the "~/Documents/ostechnix" directory and its sub-directories.
To see the size of the duplicates in a directory, use the **-S** option:

<code bash>
fdupes -S ~/Downloads
</code>

Sample output:

<code bash>
403635 bytes each:
/home/sk/Downloads/Hyperledger.pdf
/home/sk/Downloads/Hyperledger(1).pdf
</code>

Similarly, to show the sizes of duplicates in the directory and its sub-directories, combine it with -r: fdupes -Sr ~/Downloads

To exclude empty files and hidden files from consideration, use -n and -A respectively:

<code bash>
fdupes -n ~/Downloads
fdupes -A ~/Downloads
</code>

The first command ignores zero-length files and the second ignores hidden files while searching for duplicates in the specified directory.

To summarize the duplicate files found, use the -m option:

<code bash>
fdupes -m ~/Downloads
</code>

Sample output:

<code bash>
1 duplicate files (in 1 sets), occupying 403.6 kilobytes
</code>
To delete duplicates, use the -d option:

<code bash>
fdupes -d ~/Downloads
</code>

Sample output:

<code bash>
[1] /home/sk/Downloads/Hyperledger Fabric Installation.pdf
[2] /home/sk/Downloads/Hyperledger Fabric Installation(1).pdf

Set 1 of 1, preserve files [1 - 2, all]:
</code>

For each set of duplicates, fdupes asks which files to preserve and deletes all the others. Enter the number of the file to keep (or "all" to keep every file in the set); the remaining files in the set are deleted. Be careful with this option: it is easy to delete an original file by mistake.

To preserve the first file in each set of duplicates and delete the others without prompting, add the -N option (not recommended):

<code bash>
fdupes -dN ~/Downloads
</code>
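A more cautious alternative to -dN is to write the list of copies to a file, review it, and only then delete. The -f (--omitfirst) option of fdupes leaves out the first file of each set, so together with -r it lists every file except one copy per set. The helper below is a minimal sketch; its name delete_listed_paths and the list file name dupes.txt are hypothetical, and it assumes one path per line.

<code bash>
# delete_listed_paths (hypothetical helper): delete each path listed in FILE,
# one path per line, skipping blank lines such as the separators fdupes
# prints between sets. Paths containing newlines are not supported.
delete_listed_paths() {
  local list=$1 path
  # "|| [ -n "$path" ]" also handles a last line without a trailing newline.
  while IFS= read -r path || [ -n "$path" ]; do
    [ -n "$path" ] || continue
    rm -v -- "$path"
  done < "$list"
}

# Usage (requires fdupes):
#   fdupes -r -f ~/Downloads > dupes.txt   # -f omits the first file of each set
#   less dupes.txt                         # review, and edit the list if needed
#   delete_listed_paths dupes.txt
</code>

Editing dupes.txt before the last step lets you keep a different copy than the one fdupes happened to list first.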

To delete duplicates as they are encountered, use the -I flag:

<code bash>
fdupes -I ~/Downloads
</code>

For more details about Fdupes, see the help output:

<code bash>
fdupes --help
</code>

and the manual page:

<code bash>
man fdupes
</code>

----

===== Using rdfind =====

**Rdfind** stands for "redundant data find". It is a free and open source utility that finds duplicate files across and/or within directories and sub-directories.

It compares files by their content, not by their file names.

Rdfind uses a ranking algorithm to decide which of two or more identical files is the original and treats the rest as duplicates. A file found under a directory listed earlier on the command line ranks higher; if that ties, the file at the shallower depth wins, and then the file that was found first.

Once it has found the duplicates, it reports them. You can then delete them or replace them with hard links or symbolic (soft) links.

<code bash>
sudo apt install rdfind

rdfind ~/Downloads

# Deletes duplicates across your whole home directory; preview with -dryrun true first
rdfind -deleteduplicates true ~/.
</code>
<WRAP info>
NOTE: rdfind saves its results in a file named results.txt in the current working directory.

The file lists the possible duplicate files, so by reviewing results.txt you can find the duplicates and, if you prefer, remove them manually.
</WRAP>

You can use the **-dryrun** option to find all duplicates in a given directory without changing anything and print a summary in your terminal:

<code bash>
rdfind -dryrun true ~/Downloads
</code>

Once you have found the duplicates, you can replace them with either hard links or symlinks.

To replace all duplicates with hard links, run:

<code bash>
rdfind -makehardlinks true ~/Downloads
</code>

To replace all duplicates with symlinks (soft links), run:

<code bash>
rdfind -makesymlinks true ~/Downloads
</code>
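To check afterwards what a replaced file has become, the shell's -ef test can help: it is true when two paths refer to the same device and inode, following symbolic links. Combined with -L, it tells a hard link from a symlink. This is a minimal sketch in plain bash, independent of rdfind; the function names are hypothetical.

<code bash>
# is_hardlinked A B (hypothetical): true if neither A nor B is a symlink and
# both names point to the same inode, i.e. they are hard links to the same data.
is_hardlinked() {
  [ ! -L "$1" ] && [ ! -L "$2" ] && [ "$1" -ef "$2" ]
}

# is_symlink_to LINK TARGET (hypothetical): true if LINK is a symbolic link
# that resolves to the same file as TARGET.
is_symlink_to() {
  [ -L "$1" ] && [ "$1" -ef "$2" ]
}

# Example (paths are placeholders):
#   is_hardlinked ~/Downloads/a.pdf ~/Downloads/b.pdf && echo "hard links"
#   stat -c '%h %i %n' ~/Downloads/a.pdf   # link count, inode number, name
</code>

A plain copy fails both checks even when its content is identical, which makes these tests handy for confirming that -makehardlinks or -makesymlinks actually replaced the files.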

You may have some empty files in a directory and want to ignore them. Use the **-ignoreempty** option (in current rdfind versions this is already the default):

<code bash>
rdfind -ignoreempty true ~/Downloads
</code>

If you no longer need the duplicates, you can delete them instead of replacing them with hard or soft links.

To delete all duplicates, run:

<code bash>
rdfind -deleteduplicates true ~/Downloads
</code>

To include empty files as well, so that they are deleted along with the other duplicates, run the following. Since all empty files have the same (empty) content, only one of them is kept:

<code bash>
rdfind -deleteduplicates true -ignoreempty false ~/Downloads
</code>

For more details, see the help output:

<code bash>
rdfind --help
</code>

and the manual page:

<code bash>
man rdfind
</code>
ubuntu/file/delete_duplicate_files.1580568379.txt.gz · Last modified: 2020/07/15 09:30 (external edit)