Re.Mark

My Life As A Blog

Archive for June 2010

Dynamic Football Stats


A couple of days ago I noticed that The Guardian had published data about last week’s England vs USA game. I downloaded the data (which is in a Google Apps spreadsheet) and saved each sheet as a CSV file.

Originally, I intended to read the data with IronPython. Reading CSV data with Python is very simple: there’s a built-in csv module. However, this module is written in C, which means it’s not available in IronPython (see here for more info). There is a project called IronClad that allows Python modules written in C to be used from IronPython. At the moment, it’s built against .NET 2, which means that I could get it to work in .NET 2, but I had plans to use .NET 4 and the dynamic support in C#. Time for another approach.
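For comparison, this is roughly what the built-in module offers on CPython. It is a minimal sketch in Python 3, not something that ran under IronPython at the time, and the column names and rows are invented placeholders rather than the Guardian’s data:

```python
import csv
import io

# Placeholder CSV text standing in for one exported sheet (hypothetical data).
SAMPLE = "Player Name,Goals\nPlayer A,1\nPlayer B,0\n"


def read_rows(text):
    """Return each CSV record as a dict keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_rows(SAMPLE)
for row in rows:
    # Every value comes back as a string; no type conversion is done here.
    print(row["Player Name"], row["Goals"])
```

Note that the csv module hands back every field as a string, so any integer handling still has to be added on top.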

Using the CsvReader class, it’s easy to access the data in a CSV file. I started with the Player Summaries sheet. To make this dynamic (and, therefore, useful for each of these sheets and, potentially, other, as yet unknown, sheets) I created a class to hold each row of data. Here it is:


using System.Collections.Generic;
using System.Dynamic;

public class DynamicDataObject : DynamicObject
{
    private readonly Dictionary<string, dynamic> data;

    public DynamicDataObject(Dictionary<string, dynamic> data)
    {
        this.data = data;
    }

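    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        // Return false for an unknown field name instead of throwing KeyNotFoundException.
        dynamic value;
        bool found = data.TryGetValue(binder.Name, out value);
        result = value;
        return found;
    }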
}

Because the class inherits from DynamicObject, its members are resolved at runtime, meaning that I can use the names of the data fields as if they were properties defined on the class. Next I created a DataReader class that reads the data from the CSV file and exposes it as an IEnumerable<DynamicDataObject>. Here’s that class:


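using System.Collections.Generic;
using System.IO;
// Assumption: CsvReader is the third-party LumenWorks "A Fast CSV Reader" class.
using LumenWorks.Framework.IO.Csv;

public class DataReader : IEnumerable<DynamicDataObject>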
{
    private readonly List<DynamicDataObject> dataList;

    public DataReader(string filename)
    {
        this.dataList = new List<DynamicDataObject>();
        using (StreamReader streamReader = new StreamReader(filename))
        {
            using (CsvReader reader = new CsvReader(streamReader, true))
            {
                string[] headers = reader.GetFieldHeaders();
                Dictionary<string, string> cleanHeaders = CleanHeaders(headers);
                while (reader.ReadNextRecord())
                {
                    Dictionary<string, dynamic> data = new Dictionary<string, dynamic>();
                    foreach (string header in headers)
                    {
                        // Read the field once; keep integers as int and everything else as string.
                        string text = reader[header];
                        int result;
                        dynamic value;
                        if (int.TryParse(text, out result))
                        {
                            value = result;
                        }
                        else
                        {
                            value = text;
                        }
                        data.Add(cleanHeaders[header], value);
                    }
                    this.dataList.Add(new DynamicDataObject(data));
                }
            }
        }
    }

    private Dictionary<string, string> CleanHeaders(string[] headers)
    {
        Dictionary<string, string> result = new Dictionary<string, string>();
        foreach (string header in headers)
        {
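            // Drop any bracketed suffix and trim first, so "Name (x)" becomes "Name" rather than "Name_".
            string cleanheader = header.Split('(')[0].Trim();
            cleanheader = cleanheader.Replace(' ', '_');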
            result.Add(header, cleanheader);
        }
        return result;
    }

    #region IEnumerable<DynamicDataObject> Members

    public IEnumerator<DynamicDataObject> GetEnumerator()
    {
        return this.dataList.GetEnumerator();
    }

    #endregion

    #region IEnumerable Members

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return this.dataList.GetEnumerator();
    }

    #endregion
}

There are a couple of things worth pointing out. The first is that I’ve cleaned up the field names so that they can be used in code (by replacing spaces with underscores and removing anything in brackets). The second is that if a value is an integer, I’m storing it as an integer. I’m storing these values as type dynamic, which will come in handy when we want to query the data.
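The same two clean-up rules can be sketched in a few lines of Python 3. This is an illustration of the idea rather than a port of the class above: the helper names and the "Passes (total)" column are hypothetical, and the sketch trims the name before replacing spaces so that a bracketed header does not leave a trailing underscore.

```python
def clean_header(header):
    """Drop any bracketed suffix, trim, then swap spaces for underscores."""
    return header.split("(")[0].strip().replace(" ", "_")


def parse_value(text):
    """Store integers as int and leave everything else as a string."""
    try:
        return int(text)
    except ValueError:
        return text


def clean_record(record):
    """Apply both rules to one row held as a {header: text} dict."""
    return {clean_header(key): parse_value(value) for key, value in record.items()}


# Hypothetical row; the column names are placeholders, not the real sheet's.
record = clean_record({"Player Name": "Player A", "Passes (total)": "42"})
print(record)
```

Because the integer fields end up as real integers, a later comparison such as "greater than zero" works without any casting at the call site.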

Speaking of querying that data, I wanted to use LINQ.  Here’s some simple code I wrote in a console application to try it out:


static void Main(string[] args)
{
    DataReader reader = new DataReader(@"C:\Users\Mark\Downloads\Eng-USA Data\Player Summaries.csv");

    var result = from dynamic player in reader
                 where player.Goals > 0
                 select player;

    foreach (dynamic player in result)
    {
        Console.WriteLine(player.Player_Name + " - " + player.Goals + " goals");
    }

    Console.WriteLine("Press any key to exit...");
    Console.ReadKey();
}

And here’s the output:

[Image: screenshot of the console output]

By using the dynamic support in C#, the LINQ query just works and I can reference properties dynamically without having to create a class specifically for each sheet of data. It’s important in the LINQ query to declare player as type dynamic. Otherwise C# will revert to its statically typed ways and inform you that the Goals property doesn’t exist, which, given that it only exists at runtime, is correct. Now I can analyse the data easily. Doesn’t change the result though…
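For readers who know Python better than C#, the closest analogue to this runtime member lookup is probably __getattr__. The sketch below (Python 3) assumes a hypothetical Row class and placeholder player data; it is meant to illustrate the idea, not to mirror DynamicObject exactly.

```python
class Row:
    """Expose dictionary keys as attributes, resolved at runtime."""

    def __init__(self, data):
        self._data = dict(data)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name.startswith("_"):
            # Avoid recursing if _data itself has not been set yet.
            raise AttributeError(name)
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(name)


# Placeholder rows (hypothetical data), already cleaned and typed.
players = [
    Row({"Player_Name": "Player A", "Goals": 1}),
    Row({"Player_Name": "Player B", "Goals": 0}),
]

# The list comprehension plays the role of the LINQ query.
scorers = [p for p in players if p.Goals > 0]
for p in scorers:
    print(p.Player_Name + " - " + str(p.Goals) + " goals")
```

As in the C# version, nothing checks the field names ahead of time: a misspelled name only fails when the line runs.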

Written by remark

June 17, 2010 at 12:41 pm

Posted in .NET, c#, Python

Occasionally Connected Silverlight Applications


Earlier this year, Dr Dave and I worked on a Proof of Concept with Trader Media (probably most famous for Autotrader) and Fortune Cookie.  You can read more about the project here.   The application needed to be able to cope with being disconnected some of the time.  Dr Dave and I took what we learned from this aspect of the project and wrote an article for MSDN Magazine, which you can read here.

Written by remark

June 8, 2010 at 10:01 am

Finding Lyrics


I was looking some lyrics up online this week, so I wondered how hard it would be to write a simple application to find lyrics to your favourite song. Or to your least favourite song. Or, in fact, to any arbitrary song. Via programmableweb, I found the API to lyricsfly, which looked easy to use. Another IronPython console app beckoned.

Keeping it simple, I decided to use optparse to parse the command-line options and urllib to make the HTTP calls. This way the program can be called with the user_id that lyricsfly requires (head to their website and you can get a temporary weekly key to try this out) along with the artist name and song title. What I decided not to do at this stage was to process the resulting XML. Or handle any errors. Or handle cases where the user_id, artist or title is not supplied. But, although rudimentary, it works. Here’s the code:


from System import Console
import urllib
from optparse import OptionParser

print "Starting"

parser = OptionParser()
parser.add_option("-i", "--user_id",
                  action="store", type="string", dest="user_id",
                  help="The user id for the Lyrics Fly service")
parser.add_option("-a", "--artist",
                  action="store", type="string", dest="artist",
                  help="Artist name")
parser.add_option("-t", "--title",
                  action="store", type="string", dest="title",
                  help="Song title")                  

(options, args) = parser.parse_args()

print "Parsed options"

if (options.user_id):
    user_id = options.user_id
 
if (options.artist):
    artist = options.artist
    
if (options.title):
    title = options.title  
    
print "Getting Lyrics for " + artist + " - " + title 

query = urllib.urlencode([("i", user_id), ("a", artist), ("t", title)])   

url = "http://api.lyricsfly.com/api/api.php?" + query
print url

data = urllib.urlopen(url)
print data.read()

print
print "Press any key to exit.."
Console.ReadKey()
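On CPython 3 the same query-building step looks slightly different, because urlencode lives in urllib.parse and argparse has superseded optparse. The following is a sketch under those assumptions: build_url is a hypothetical helper, the key, artist and title are placeholders, and it only builds the URL without calling the service. Marking the options as required also covers the missing-argument case that the script above leaves unhandled.

```python
import argparse
from urllib.parse import urlencode

# Endpoint taken from the script above.
API_URL = "http://api.lyricsfly.com/api/api.php"


def build_url(argv):
    """Parse the command-line options and return the query URL."""
    parser = argparse.ArgumentParser(description="Build a lyricsfly query URL")
    parser.add_argument("-i", "--user_id", required=True,
                        help="The user id for the Lyrics Fly service")
    parser.add_argument("-a", "--artist", required=True, help="Artist name")
    parser.add_argument("-t", "--title", required=True, help="Song title")
    options = parser.parse_args(argv)

    # urlencode escapes each value, turning spaces into '+'.
    query = urlencode([("i", options.user_id),
                       ("a", options.artist),
                       ("t", options.title)])
    return API_URL + "?" + query


# Placeholder arguments (hypothetical key, artist and title).
url = build_url(["-i", "KEY", "-a", "Some Artist", "-t", "Some Title"])
print(url)
```

With required=True, argparse prints a usage message and exits if any of the three options is missing, rather than failing later with an undefined name.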

Written by remark

June 4, 2010 at 4:51 pm

Posted in .NET, Python