Today I was curious what my friend's favourite artists were. Rather than just ask him and spoil the surprise of this year's birthday present, I decided to sneakily scrape his iTunes library. He has a wide taste in music and buys a lot of it, so he's a great source of new music. His iTunes library is filled with interesting artists, songs and compilations. On the other hand, he's in the industry and collects a lot of amateur music. So when looking for an ideal birthday gift idea in his iTunes library, I'll need to consider how often he actually plays each song.
iTunes keeps its music metadata in an XML file, which lets it decorate music files with metadata not supported in ID3 tags.
Using LINQ, there are two ways I could query this data:-
- Use LINQ to XML.
- Convert the DTD to XSD, generate proxy classes using LINQ to XSD and then query the loaded file with LINQ to Objects.
I *need* my LINQ & Lambda fix - now - so I'm going to take the fast option, numero uno. If anyone is interested in seeing LINQ to XSD in action, let me know in the comments.
Niel Bornstein's article on hacking the iTunes XML describes the DTD clearly. Even though his article is old, the key-value pair structure allows for the addition of new properties in newer iTunes versions.
If you've at least seen some XML before, simply perusing the file is enough to understand the simple format.
- On Windows, you can find the library XML file in My Documents\My Music\iTunes\iTunes Music Library.XML
- On OS X, it's in ~/Music/iTunes/iTunes Music Library.XML
Basically, the document root is a plist tag split up into three sections - header, track info and playlist info. Each section is within a dict tag, and each entity within the Track and Playlist sections is contained within its own dict tag.
The XML element for a track looks like :-
<key>839</key>
<dict>
    <key>Track ID</key><integer>839</integer>
    <key>Name</key><string>Sweet Georgia Brown</string>
    <key>Artist</key><string>Count Basie &amp; His Orchestra</string>
    <key>Composer</key><string>Bernie/Pinkard/Casey</string>
    <key>Album</key><string>Prime Time</string>
    <key>Genre</key><string>Jazz</string>
    <key>Kind</key><string>Protected AAC audio file</string>
    <key>Size</key><integer>3771502</integer>
    <key>Total Time</key><integer>219173</integer>
    <key>Disc Number</key><integer>1</integer>
    <key>Disc Count</key><integer>1</integer>
    <key>Track Number</key><integer>3</integer>
    <key>Track Count</key><integer>8</integer>
    <key>Year</key><integer>1977</integer>
    <key>Date Modified</key><date>2004-06-16T18:10:55Z</date>
    <key>Date Added</key><date>2004-06-16T18:08:31Z</date>
    <key>Bit Rate</key><integer>128</integer>
    <key>Sample Rate</key><integer>44100</integer>
    <key>Play Count</key><integer>3</integer>
    <key>Play Date</key><integer>-1119376103</integer>
    <key>Play Date UTC</key><date>2004-08-17T16:39:53Z</date>
    <key>Rating</key><integer>100</integer>
    <key>Artwork Count</key><integer>1</integer>
    <key>File Type</key><integer>1295274016</integer>
    <key>File Creator</key><integer>1752133483</integer>
    <key>Location</key><string>file://localhost/Users/niel/Music/music.mp4</string>
    <key>File Folder Count</key><integer>4</integer>
    <key>Library Folder Count</key><integer>1</integer>
</dict>
To construct my query, I broke my goal into four tasks:
- Find the collection of tracks.
- For each song, find the play count and artist name.
- Sort the song list by play count, in descending order.
- Select any artist only once.
And here's the code to write out my friend's favourite artists in descending order of popularity.
using System;
using System.IO;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml.Linq;

namespace iTunesXmlParser {
    class Program {

        static void Main(string[] args) {
            // the first command-line argument is the path to the library XML file
            string xmlFile = args[0];
            using (var sr = new StreamReader(xmlFile)) {
                var doc = XDocument.Load(sr);

                // find the tracks dictionary: the element that follows <key>Tracks</key>
                XElement tracksDictionary = (from element in doc.Descendants()
                                             where element.Value.Equals("Tracks")
                                             select element).First().NextNode as XElement;

                // get all distinct track artist names ordered descendingly by track play count
                // NOTE: if no play count exists for a track, the track is assumed to have 0 plays
                List<string> artistNames = (from el in tracksDictionary.Descendants()
                                            let playCountValues = (from trackField in el.Descendants()
                                                                   where trackField.Value == "Play Count"
                                                                   select (trackField.NextNode as XElement))
                                            let artistValues = (from trackField in el.Descendants()
                                                                where trackField.Value == "Artist"
                                                                select (trackField.NextNode as XElement))
                                            where
                                                el.Name.LocalName == "dict" // find the track key-value pair property set
                                                && artistValues.Count() > 0 // an artist property should exist
                                            orderby playCountValues.Count() == 0 ? 0 : Int32.Parse(playCountValues.First().Value) descending
                                            select artistValues.First().Value).Distinct().ToList();

                Console.WriteLine("Number of artists:" + artistNames.Count);

                // write out the artist names in descending order of popularity
                foreach (var artistName in artistNames) {
                    Console.WriteLine(artistName);
                }
            }

            Console.WriteLine("hit enter to exit");
            Console.ReadLine();
        }
    }
}
The let keyword allows you to define a variable within a query. In my case, playCountValues and artistValues are each an IEnumerable<XElement> holding the play count and artist name values respectively. let is great for keeping values that are used more than once elsewhere in the query.
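To see let in isolation, here is a minimal sketch that has nothing to do with iTunes. The class name LetExample, the method MultiWordTitles and the sample titles are made up for illustration; the point is only that wordCount is computed once and then reused by both the where and the orderby clauses.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LetExample {
    // Hypothetical helper: keeps titles with more than one word,
    // longest (by word count) first.
    public static List<string> MultiWordTitles(string[] titles) {
        var query = from title in titles
                    // 'let' names an intermediate value once so that
                    // later clauses can reuse it without recomputing it
                    let wordCount = title.Split(' ').Length
                    where wordCount > 1
                    orderby wordCount descending
                    select title + " (" + wordCount + " words)";
        return query.ToList();
    }

    static void Main() {
        string[] titles = { "Sweet Georgia Brown", "Prime Time", "Jazz" };
        foreach (var line in MultiWordTitles(titles)) {
            Console.WriteLine(line);
        }
        // prints:
        // Sweet Georgia Brown (3 words)
        // Prime Time (2 words)
    }
}
```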
This query is a bit complex and unusually long because the iTunes XML doesn't store key-values like <Name>DJ Shadow</Name> (the typical format). Notice the need to get the value element with (element.NextNode as XElement).
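One way to tame the key-then-value layout is to read a track's dict into an ordinary dictionary first. The following is only a sketch, not what my program above does: PlistHelper and ReadFields are names I've invented, and it uses ElementsAfterSelf() rather than NextNode so that any non-element nodes between a key and its value are skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class PlistHelper {
    // Hypothetical helper: pairs each <key> child of a plist <dict>
    // with the element that immediately follows it.
    public static Dictionary<string, string> ReadFields(XElement dict) {
        return dict.Elements("key")
                   .Where(k => k.ElementsAfterSelf().Any())
                   .ToDictionary(k => k.Value,
                                 k => k.ElementsAfterSelf().First().Value);
    }
}

class Program {
    static void Main() {
        // a cut-down version of the track element shown earlier
        var track = XElement.Parse(
            "<dict>" +
            "<key>Name</key><string>Sweet Georgia Brown</string>" +
            "<key>Artist</key><string>Count Basie &amp; His Orchestra</string>" +
            "<key>Play Count</key><integer>3</integer>" +
            "</dict>");

        var fields = PlistHelper.ReadFields(track);
        Console.WriteLine(fields["Artist"] + ": " + fields["Play Count"]);
        // prints: Count Basie & His Orchestra: 3
    }
}
```

With something like this in place, a query could use fields["Artist"] instead of hunting for the sibling element each time, at the cost of building a dictionary per track.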
After running this on my friend's library, the console spat out over 5000 lines of artist names. Success! Top of the list is "Zero 7". Second on the list is "DJ Shadow" - whom I've never heard of before. I'll definitely check these artists out.
The next thing I'll do is find out which "Zero 7" albums he's got and perhaps cross-reference the list with an Amazon page to find an album he doesn't own. This year's birthday present decision will be completely automated :)
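I haven't written that part yet, but the album lookup might look roughly like the sketch below. AlbumFinder, Field and AlbumsBy are hypothetical names, the album titles in the sample are placeholders rather than real data, and the code assumes it is handed the same tracks dictionary element that my program finds above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class AlbumFinder {
    // Reads one field of a track <dict>, or returns null if the key is absent.
    static string Field(XElement trackDict, string key) {
        return trackDict.Elements("key")
                        .Where(k => k.Value == key)
                        .Select(k => k.ElementsAfterSelf().FirstOrDefault())
                        .Where(v => v != null)
                        .Select(v => v.Value)
                        .FirstOrDefault();
    }

    // Distinct album names for one artist, sorted alphabetically.
    public static List<string> AlbumsBy(XElement tracksDictionary, string artist) {
        return tracksDictionary.Elements("dict")
                               .Where(t => Field(t, "Artist") == artist)
                               .Select(t => Field(t, "Album"))
                               .Where(a => a != null)
                               .Distinct()
                               .OrderBy(a => a)
                               .ToList();
    }

    static void Main() {
        // made-up sample library with placeholder album names
        var tracks = XElement.Parse(
            "<dict>" +
            "<key>1</key><dict><key>Artist</key><string>Zero 7</string><key>Album</key><string>Sample Album A</string></dict>" +
            "<key>2</key><dict><key>Artist</key><string>Zero 7</string><key>Album</key><string>Sample Album A</string></dict>" +
            "<key>3</key><dict><key>Artist</key><string>Zero 7</string><key>Album</key><string>Sample Album B</string></dict>" +
            "<key>4</key><dict><key>Artist</key><string>DJ Shadow</string><key>Album</key><string>Sample Album C</string></dict>" +
            "</dict>");

        foreach (var album in AlbumsBy(tracks, "Zero 7")) {
            Console.WriteLine(album);
        }
    }
}
```

The cross-referencing against Amazon is a separate problem and isn't attempted here.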
Till next time on LINQ & Lambda. If there's any particular LINQ provider you want me to cover, please leave a message in the comments.
