Removing duplicate files in your iTunes library

One of the problems with using a networked drive as the iTunes library for multiple computers in the house is that I often end up with duplicate songs in the library. I think this comes from importing the library directory when you also have “Copy files to the iTunes library” set in the iTunes advanced prefs. iTunes imports a song and then tries to copy it over itself. Seeing that there’s already a file by that name, it creates a file called “songname 1.mp3”.

At least that’s my theory.

To clean up a library full of duplicate files, here’s what I did. I installed Duff, a Unix utility that quickly finds duplicate files. Duff works by comparing the contents of any files that have identical sizes. I sent Duff’s output to a text file with the command

duff -r /Volumes/music/ > duplicatemusic.txt

Once Duff was done running, I ran a short command to grab all the lines from the output that end in 1.mp3 and delete those files.

grep '1\.mp3$' duplicatemusic.txt | tr '\012' '\000' | xargs -0 rm

If you’ve done whatever it is that causes duplicates a number of times, you might have *2.mp3, *3.mp3, etc. Just run that command again, replacing 1.mp3 with 2.mp3 and so on. The one-liner above could probably be improved to grep for any single digit followed by .mp3, but it’s quick enough to run it a few times that I didn’t bother.
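
For anyone who does want the single-pass version, here is a rough sketch of what that grep might look like. It assumes the duplicates follow the “songname 1.mp3” pattern described above (a space, one or more digits, then .mp3 at the end of the line). The function name numbered_dupes and the sample paths are made up for illustration and are not part of Duff.

```shell
# numbered_dupes (hypothetical helper): print the lines of a saved listing
# that end in a space, one or more digits, and ".mp3".
numbered_dupes() {
    grep -E ' [0-9]+\.mp3$' "$1"
}

# Demonstration on a throwaway list file with made-up paths.
list=$(mktemp)
printf '%s\n' \
    '/Volumes/music/Artist/Song.mp3' \
    '/Volumes/music/Artist/Song 1.mp3' \
    '/Volumes/music/Artist/Song 2.mp3' \
    '/Volumes/music/Artist/Song 12.mp3' > "$list"

# Prints the three numbered copies and leaves Song.mp3 out.
numbered_dupes "$list"
```

Nothing is deleted here. After checking the printed list, the same tr and xargs -0 rm steps from the command above could be added to the end of the pipe. The pattern will also match any track whose real name happens to end in a number, so the check is worth doing.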

Jonathan Dingman
September 4, 2007 5:08 AM

Awesome Adam, thanks so much. I just transferred all my music over from my desktop to my Mac and I was dreading having to go through and remove all duplicate songs.

This is a life saver, thanks again.

Raj
September 4, 2007 7:25 AM

This is useful, I never knew about the tr command. I’m pretty new to bash, so am I missing something, or could you have piped the output of duff right into grep without having to save the contents to a file?

Adam Kalsey
September 4, 2007 8:27 AM

I could have, but creating the file accomplished several things.

Inspecting the job first. My Japanese language instruction files are named “Pimsleur Learning Japanese 1.mp3” and so forth. They would have been gone. And somehow I ended up with only the duplicate files on a couple of songs. I only had the …1.mp3 copy of some. So I edited out the lines I didn’t want to delete.

Avoiding multiple passes. Duff is fast, but it still takes an hour or so to process 10k files on a NAS device over Wi-Fi. Since I’m running the removal command multiple times (1.mp3, 2.mp3, etc.) I wouldn’t want to make Duff do the same job repeatedly. Just save the output and work from there.

Making sure it’s really done. By visually inspecting the file, I discovered that some duplicate files ended up with much higher numbers at the end. If iTunes sees a number at the end of a file name, it just increments that when making its copy. So my copy of Nelson Riddle’s Theme from Route 66 became “Theme from Route 66 67.mp3”.

Magento
October 6, 2007 10:18 AM

That’s a nice way of cleaning out the dupes. No more manually deleting, that’s a timesaver. It would be nice though if there was some kind of option in iTunes to filter dupes from the library.

Thanks for the tip.

Andrew
October 31, 2007 1:38 AM

Hi Adam,

I’m having a slightly different problem. iTunes indicates that I have 28.58GB of music in the library. When I go to my Music folder and highlight all the music folders the total size is 32GB. 3.42GB is a pretty big discrepancy. Do you know of any program that would match the iTunes library with the actual files on the hard drive and delete the ones that are not in the library?

Thanks

bill johnston
October 31, 2007 3:18 PM

# create an empty working file

touch .tmpDupeFile .tmpSortedFile

# build a big list of md5 signatures

find . -type f -print0 | xargs -0 md5 -r > .tmpDupeFile

# sort the signatures

cat .tmpDupeFile | sort > .tmpSortedFile

# create a list of duplicated files

cat .tmpSortedFile | awk '{ if ($1 == oldmd5) { printf "rm %s \n", $2 } oldmd5 = $1 }' > duplicates.sh

# clean up

rm .tmpDupeFile
rm .tmpSortedFile

Andrew
October 31, 2007 5:14 PM

Bill,

I appreciate your help but I’m afraid that I just don’t know what to do with all that information since I’m not a programmer. I was hoping for some kind of software solution to my problem. Thanks.

Adam Kalsey
January 3, 2008 12:06 PM

Bill,

Your awk script doesn’t take into account that most iTunes filenames contain spaces. A file called “My Music” ends up creating ‘rm My’ instead of ‘rm “My Music”’

I changed it to…

cat .tmpSortedFile | awk '{ if ($1 == oldmd5) { printf "rm \"%s\" \n", substr($0, index($0, " ")+1) } oldmd5 = $1 }' > duplicates.sh

This grabs the whole filename and wraps it in quotes so that duplicates.sh works properly.
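
A possible variation, offered only as a sketch: rather than generating rm commands, print just the duplicate paths, one per line. The list can then be reviewed and fed through the same tr and xargs -0 rm step used in the post, which avoids trouble with names that themselves contain quotes or dollar signs. This assumes each input line is a signature, a single space, and then the path, already sorted by signature. The function name list_repeats and the sample signatures are invented for illustration.

```shell
# list_repeats (hypothetical helper): read sorted "signature path" lines and
# print the path from every line whose signature matches the previous line's.
list_repeats() {
    awk '{
        if ($1 == oldmd5) {
            # Everything after the first space is the file name, spaces included.
            print substr($0, index($0, " ") + 1)
        }
        oldmd5 = $1
    }' "$1"
}

# Demonstration with made-up signatures in a throwaway file.
sorted=$(mktemp)
printf '%s\n' \
    'aaa111 ./Artist/My Music.mp3' \
    'aaa111 ./Backup/My Music.mp3' \
    'bbb222 ./Artist/Other Song.mp3' > "$sorted"

# Prints ./Backup/My Music.mp3, the second file sharing a signature.
list_repeats "$sorted"
```

Which copy ends up on the list depends only on the sort order, so it is worth reading the output before deleting anything.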

Todd
January 13, 2008 9:10 PM

Adam, you are THE MAN! Thank you so much for this. I had imported my MP3s into iTunes from an external drive, and M3U files in the folders caused duplicates to be created in my iTunes library. I was just getting ready to delete my entire iTunes library and re-import because of the duplicates. Your work here saved me hours of reimporting. For some reason, I didn’t find your site earlier (when Googling for iTunes duplicate solutions), but did find it when searching for info on making sure deleting items from iTunes would also delete the files (hah). I had booted to Windows to use Robocopy to move my M3U files into a backup folder. I was literally just launching Mac OS to delete my library and start over when I found your site.

I’m running Mac OS X Leopard, and I’m not sure whether you are. Either way, I found some discrepancies in your method when running under OS X. I’ve created a blog entry on my site with the changes for OS X, including a quick tutorial on compiling Duff, for those who might not be familiar with it. (I wasn’t, so I documented it as I went.)

Anyway, you can check out my post at this link. http://www.togeo.com/togeo/wordpress/?p=47

Thanks again for the excellent post on your site, you’ve got a cool blog otherwise, too!!!

Franky
May 12, 2008 1:47 PM

How do you send Duff’s output to a text file? I’m dumb. I’m running Leopard on an iMac and I have triples and quadruples of the same songs and it’s killing storage capacity.


©1999-2008 Adam Kalsey.
Content management by Movable Type.