Find duplicate files

Update: As Pádraig Brady, the fslint maintainer, pointed out, fslint/findup *is* a shell script.

My 500-GB Seagate FreeAgent Desktop is almost filled to the brim (there's *only* ~70 GB of free space left), so I need to find all the duplicate files for a clean-up.

Fortunately, there are tools to do just that. I tried fslint, which is also available in the Fedora repository, and I also found several nifty scripts on the web.
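For reference, fslint is mainly a GUI, but it ships its command-line scripts too. Assuming the stock package layout (the /usr/share/fslint/fslint/ path below is where the package puts its scripts, as far as I can tell, since they're not on the default PATH), findup can be run directly; the directory argument here is just a placeholder:

$ sudo yum install fslint
$ /usr/share/fslint/fslint/findup /media/freeagent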

I settled on a Perl script found on PerlMonks, which I modified a bit (used digest() instead of hexdigest(), and removed the calculation of duplicate file size).

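Something along these lines -- a minimal sketch in the same spirit, not the exact PerlMonks listing: group files by size first, then MD5 only the size collisions.

#!/usr/bin/perl
# Sketch: group files by size, then MD5 only the groups where
# sizes collide. digest() returns the binary digest (the tweak
# mentioned above), which works fine as a hash key.
use strict;
use warnings;
use File::Find;
use Digest::MD5;

my %by_size;
find(sub {
    return unless -f $_;
    push @{ $by_size{ -s _ } }, $File::Find::name;
}, @ARGV ? @ARGV : '.');

my %by_digest;
for my $group (values %by_size) {
    next if @$group < 2;    # a unique size can't have a duplicate
    for my $file (@$group) {
        open my $fh, '<', $file or next;
        binmode $fh;
        push @{ $by_digest{ Digest::MD5->new->addfile($fh)->digest } }, $file;
        close $fh;
    }
}

for my $dupes (grep { @$_ > 1 } values %by_digest) {
    print join("\n", @$dupes), "\n\n";
}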


I'm a shell-script junkie, so I whipped up something in Bash. It's not as fast as the Perl implementation or fslint, but it does the job.

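Mine boiled down to something like this (again a sketch, not the exact script I ran): md5sum everything, sort by digest, and let awk print the runs of files that share a digest.

#!/bin/bash
# Sketch: hash every file, sort by digest, then have awk print
# each run of two or more files that share the same digest.
find "${1:-.}" -type f -exec md5sum {} + | sort | awk '
{
    md5  = $1
    name = $0
    sub(/^[^ ]+[ *]+/, "", name)      # strip the digest column
    if (md5 == prev) {
        if (!dup) print "\n" prevname # first file of a new group
        print name
        dup = 1
    } else {
        dup = 0
    }
    prev = md5
    prevname = name
}'

Hashing every file up front, instead of only the files that collide on size the way the Perl script does, is likely a big part of why the shell version clocks in slower.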


(Awk is pretty cool, isn't it?)

Of course, I tested all three on a directory with about 300 duplicate files; here are the results.

fslint/findup:

real    0m3.093s
user    0m1.812s
sys     0m0.368s

Perl:

real    0m4.668s
user    0m0.644s
sys     0m0.188s

Shell:

real    0m30.475s
user    0m1.842s
sys     0m1.692s

Okay, so the shell script's performance was abysmal, but hey, it's always reassuring to know that there's more than one way to do it. (Err... that's a Perl motto.)

Comments

  1. fslint/findup is shell script :)
    http://code.google.com/p/fslint/source/browse/trunk/fslint/findup

  2. WOW! Didn't realize that. Looks like I don't have to reinvent the wheel, then.

    Thanks for pointing this out. :)


