Finding and removing duplicate files in osx with a script
From: http://www.chriswrites.com/2012/02/how-to-find-and-delete-duplicate-files-in-mac-os-x/
How do I modify this so that it deletes only the first version of each file it sees?
Open Terminal from Spotlight or the Utilities folder
Change to the directory (folder) you want to search from (sub-folders are included) using the cd command. At the command prompt, type cd followed by the path, for example cd ~/Documents to change to the Documents folder in your home directory
At the command prompt, type the following command:
find . -size 20 ! -type d -exec cksum {} \; | sort | tee /tmp/f.tmp | cut -f 1,2 -d ' ' | uniq -d | grep -hif - /tmp/f.tmp > duplicates.txt
This method uses a simple checksum to determine whether files are identical. The names of duplicate items will be listed in a file named duplicates.txt in the current directory. Open this to view the names of identical files.
There are now various ways to delete the duplicates. To delete all the files in the text file, at the command prompt type:
while read file; do rm "$file"; done < duplicates.txt
macos bash
migrated from stackoverflow.com Sep 30 '12 at 11:43
This question came from our site for professional and enthusiast programmers.
edited Sep 27 '13 at 18:20 by Daniel Beck♦
asked Sep 30 '12 at 4:47 by Jay
4 Answers
accepted
Firstly, you'll have to reorder the first command line so the order of files found by the find command is maintained:
find . -size 20 ! -type d -exec cksum {} \; | tee /tmp/f.tmp | cut -f 1,2 -d ' ' | sort | uniq -d | grep -hif - /tmp/f.tmp > duplicates.txt
(Note: for testing purposes on my machine I used find . -type f -exec cksum {} \;)
Secondly, one way to print all but the first duplicate is by use of an auxiliary file, let's say /tmp/f2.tmp. Then we could do something like:
while IFS= read -r line; do
  # cksum prints "checksum size name"; the first two fields identify the content
  checksum=$(echo "$line" | cut -f 1,2 -d' ')
  # take everything from field 3 on, so names containing spaces survive
  file=$(echo "$line" | cut -f 3- -d' ')
  if grep -qxF "$checksum" /tmp/f2.tmp; then
    # /tmp/f2.tmp already contains the checksum
    # print the file name
    # (printf is safer than echo, when for example "$file" starts with "-")
    printf '%s\n' "$file"
  else
    echo "$checksum" >> /tmp/f2.tmp
  fi
done < duplicates.txt
Just make sure that /tmp/f2.tmp exists and is empty before you run this, for example through the following commands:
rm -f /tmp/f2.tmp
touch /tmp/f2.tmp
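If the auxiliary file feels awkward, the same keep-the-first idea can probably be done in a single pass. What follows is only a sketch, not part of the original commands: the function name list_later_duplicates is made up for illustration, it uses -type f (as in my test) rather than -size 20, and it assumes file names contain no newlines. awk remembers each checksum-and-size pair and prints only the paths that repeat a pair it has already seen, so the first file find visits is never listed.

```shell
#!/bin/bash
# list_later_duplicates DIR (hypothetical helper): print every regular file
# under DIR whose cksum checksum and size match a file seen earlier in
# find's traversal order. Nothing is deleted.
list_later_duplicates() {
  find "${1:-.}" -type f -exec cksum {} + |
    awk '{
      key = $1 " " $2                 # checksum and byte count
      if (seen[key]++) {              # true from the second occurrence on
        sub(/^[0-9]+ [0-9]+ /, "")    # drop the two leading fields, keep the path
        print
      }
    }'
}
```

Review the output before feeding it to a delete loop. Since cksum is only a CRC, comparing a listed file against its original with cmp first is a cheap safeguard.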
Hope this helps =)
Another option is to use fdupes:
brew install fdupes
fdupes -r .
fdupes -r . finds duplicate files recursively under the current directory. Add -d to delete the duplicates; you'll be prompted for which files to keep. If you add -dN instead, fdupes always keeps the first file in each set and deletes the others without prompting.
fdupes is awesome! Worked like a charm! Thanks bro!
– racl101
Dec 3 '14 at 6:40
I wrote a script that renames your files to match a hash of their contents.
It uses a subset of the file's bytes so it's fast, and if there's a collision it appends a counter to the name like this:
3101ace8db9f.jpg
3101ace8db9f (1).jpg
3101ace8db9f (2).jpg
This makes it easy to review and delete duplicates on your own, without trusting somebody else's software with your photos more than you need to.
Script:
https://gist.github.com/SimplGy/75bb4fd26a12d4f16da6df1c4e506562
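For anyone curious how such a rename could work, here is a rough sketch. It is not the linked gist, whose details may well differ: the function name rename_by_hash, the 64 KiB prefix, and the choice of cksum as the hash are all assumptions made for illustration.

```shell
#!/bin/bash
# rename_by_hash FILE... (hypothetical helper): rename each file to the CRC
# of its first 64 KiB, keeping the extension and appending " (n)" when that
# name is already taken in the same directory.
rename_by_hash() {
  local f dir base ext hash target n
  for f in "$@"; do
    [ -f "$f" ] || continue
    case $f in
      */*) dir=${f%/*} ;;
      *)   dir=. ;;
    esac
    base=${f##*/}
    case $base in
      ?*.*) ext=".${base##*.}" ;;   # has an extension and is not a bare dotfile
      *)    ext="" ;;
    esac
    # Hash only a prefix of the file so large files stay fast.
    hash=$(head -c 65536 < "$f" | cksum | cut -d ' ' -f 1)
    target="$dir/$hash$ext"
    n=0
    # Bump the counter until the name is free (or already is this file).
    while [ -e "$target" ] && [ ! "$target" -ef "$f" ]; do
      n=$((n + 1))
      target="$dir/$hash ($n)$ext"
    done
    [ "$target" -ef "$f" ] || mv -- "$f" "$target"
  done
}
```

Because only a prefix is hashed, files that end up sharing a name stem are candidates rather than proven duplicates; cmp can settle any pair before you delete one.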

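This can be done with the help of the EagleFiler app, developed by Michael Tsai. The following AppleScript compares the checksums of the records selected in the frontmost browser window and moves the older copy of each duplicate to the trash.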
tell application "EagleFiler"
    set _checksums to {}
    set _recordsSeen to {}
    set _records to selected records of browser window 1
    set _trash to trash of document of browser window 1
    repeat with _record in _records
        set _checksum to _record's checksum
        set _matches to my findMatch(_checksum, _checksums, _recordsSeen)
        if _matches is {} then
            set _checksums to {_checksum} & _checksums
            set _recordsSeen to {_record} & _recordsSeen
        else
            set _otherRecord to item 1 of _matches
            if _otherRecord's modification date > _record's modification date then
                set _record's container to _trash
            else
                set _otherRecord's container to _trash
                set _checksums to {_checksum} & _checksums
                set _recordsSeen to {_record} & _recordsSeen
            end if
        end if
    end repeat
end tell
on findMatch(_checksum, _checksums, _recordsSeen)
    tell application "EagleFiler"
        if _checksum is "" then return {}
        if _checksums contains _checksum then
            repeat with i from 1 to length of _checksums
                if item i of _checksums is _checksum then
                    return item i of _recordsSeen
                end if
            end repeat
        end if
        return {}
    end tell
end findMatch
You can also delete duplicates automatically with the duplicate file remover suggested in this post.
(1) What is “EagleFiler”? Is it part of macOS? If not, where do you get it? (2) Is that meant to be one long block of code (the way I fixed it)? (3) Please fix your indentation. (4) Exactly how does one use this?
– Scott
Nov 7 at 8:29
Accepted answer: answered Sep 30 '12 at 13:42 by Janito Vaqueiro Ferreira Filho
fdupes answer: answered Nov 28 '12 at 15:45 by user495470, edited Dec 18 '15 at 15:19 by waldyrious
Hash-rename script answer: answered Nov 27 '16 at 0:42 by SimplGy
EagleFiler answer: answered Nov 7 at 8:10 by Dejise, edited 2 days ago