Find duplicate files
Update: As Pádraig Brady, the fslint maintainer, pointed out: fslint/findup *is* a shell script.
My 500-GB Seagate FreeAgent Desktop is almost filled to the brim (there's *only* ~70 GB of free space left), so I need to find all duplicate files for clean-up.
Fortunately, there are tools to do just this. I tried fslint, which is also available in the Fedora repository. I also found several nifty scripts on the web.
I settled on a Perl script, found on PerlMonks, which I modified a bit (used digest() instead of hexdigest(), and removed the calculation of total duplicate file size).
I'm a shell-script junkie, so I whipped up something in Bash. It's not as fast as the Perl implementation or fslint, but it does the job.
(Awk is pretty cool, isn't it?)
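My script isn't reproduced in the post, but the md5sum-plus-awk idea behind it can be sketched like this (the function name find_dups is made up for this example, GNU md5sum is assumed to be on the PATH, and the size-based pre-filter a faster version would use is skipped):

```shell
# A minimal sketch, not the actual script from the post: hash every
# regular file under the given directory, then let awk group paths
# that share an md5 checksum.
find_dups() {
    find "${1:-.}" -type f -exec md5sum {} + |
    awk '
        {
            hash = $1
            # The path starts after the 32-char hash and two spaces.
            path = substr($0, length(hash) + 3)
            count[hash]++
            files[hash] = files[hash] path "\n"
        }
        END {
            # Print only groups with more than one file,
            # separated by blank lines.
            for (h in count)
                if (count[h] > 1)
                    printf "%s\n", files[h]
        }
    '
}
```

Running find_dups ~/downloads then prints each group of identical files, one path per line, with groups separated by a blank line.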
Of course, I tested all three on a directory with about 300 duplicate files; here are the results.
fslint/findup:
real 0m3.093s
user 0m1.812s
sys 0m0.368s

Perl:
real 0m4.668s
user 0m0.644s
sys 0m0.188s

Shell:
real 0m30.475s
user 0m1.842s
sys 0m1.692s

Okay, so the shell script's performance was abysmal, but hey, it's always reassuring to know that there is more than one way to do it. (Errr... that's a Perl motto.)
fslint/findup is a shell script :)
http://code.google.com/p/fslint/source/browse/trunk/fslint/findup
WOW! Didn't realize that. Looks like I don't have to reinvent the wheel, then.
Thanks for pointing this out. :)