Re.Mark

My Life As A Blog

Archive for the ‘Python’ Category

A Bit More Text Processing


Yesterday I wrote a simple Python class counting word frequency.  The next step, I figured, was to strip out punctuation.  So, taking the class I wrote yesterday, I ended up with this:

from __future__ import print_function  # same output on Python 2 and 3
from collections import defaultdict
import re

class Reader(object):

    def read(self, strings):
        self.data = defaultdict(int)
        for string in strings:
            # Turn line breaks into spaces so the words either side stay separate
            clean_string = string.replace('\n', ' ')
            # Mark the end of each sentence with a new line
            clean_string = self.__split_sentences(clean_string)
            # Drop the remaining punctuation
            clean_string = self.__remove_punctuation(clean_string)
            for sentence in clean_string.splitlines():
                # split() with no argument ignores runs of whitespace,
                # so empty strings are never counted as words
                for word in sentence.split():
                    self.data[word.lower()] += 1
        for key in self.data:
            print(key, self.data[key])

    def __split_sentences(self, string):
        return re.sub(r'[.:?!]', '\n', string)

    def __remove_punctuation(self, string):
        return re.sub(r'[;,()\-"]', '', string)

It still seems fairly simple: strip out new lines, replace the ends of sentences with new line characters, strip out other punctuation, split into sentences, strip out extra whitespace and then split into words.  This time around I’m also doing all word comparisons in lower case, and I’ve changed the naming convention to be a bit more Pythonic.  The double underscores, for those wondering, trigger name mangling, which is as close as Python gets to private methods (a single leading underscore is the usual "private by convention" marker).  I’ve used the Python regular expression library (re) to do the (only slightly) more complex string replacements.  OK, so here’s some code to use in an interactive shell session to see the new class at work:

>>> from TextProcessor import Reader
>>> reader = Reader()
>>> text = ["This is a sentence.  And another one!", "This is"]
>>> reader.read(text)
a 1
and 1
sentence 1
this 2
is 2
one 1
another 1

It’s edging closer to a usable class and is still pretty simple.  It probably needs to return something rather than print the output.
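One possible way to do that, sketched here for Python 3 rather than taken from the post: a hypothetical count_words function that applies the same cleaning steps as read but returns a collections.Counter instead of printing. Counter only arrived in Python 2.7, after this post was written, so treat this as an assumption about the environment.

```python
from collections import Counter
import re


def count_words(strings):
    """Return a Counter mapping each lower-cased word to its frequency.

    Hypothetical helper: the same cleaning steps as Reader.read, but the
    counts are returned to the caller instead of printed.
    """
    counts = Counter()
    for string in strings:
        # Treat line breaks as word separators
        text = string.replace('\n', ' ')
        # Mark sentence boundaries with new lines
        text = re.sub(r'[.:?!]', '\n', text)
        # Remove the remaining punctuation
        text = re.sub(r'[;,()\-"]', '', text)
        for sentence in text.splitlines():
            # split() with no argument skips runs of whitespace
            counts.update(word.lower() for word in sentence.split())
    return counts


counts = count_words(["This is a sentence.  And another one!", "This is"])
print(counts['this'], counts['is'], counts['one'])  # 2 2 1
```

Returning a Counter also gives the caller most_common() for free, which is handy once the text gets longer than a couple of sentences.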

Written by remark

January 6, 2010 at 3:19 pm

Posted in Development, Python

Simple Text Processing with Python


On a couple of recent projects, it was useful to be able to calculate word frequency in text.  There are a number of ways to do this, but I wondered how hard it would be to do in Python.  I saw a little of Michael Sparks’ talk at DevDays in London and, while I don’t remember the detail, it did act as a reminder.  It turns out that a naive implementation is very easy indeed.  Here it is:

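from __future__ import print_function  # same output on Python 2 and 3
from collections import defaultdict

class Reader(object):
    def Read(self, strings):
        self.data = defaultdict(int)  # a missing word starts at a count of 0
        for string in strings:
            # split() with no argument also copes with repeated spaces
            for word in string.split():
                self.data[word] += 1
        for key in self.data:
            print(key, self.data[key])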

Take a collection of strings, split them into words, and then keep track of the count of each word, using the word as a key.  The important class here is defaultdict, without which I’d have to check whether a word had already been inserted (not difficult, but it would have meant more code).  Here’s some code that uses the Reader class, and its output:
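For comparison, here is a sketch of the same counting loop with and without defaultdict, written for Python 3; both function names are made up for illustration. The membership check in the plain-dict version is the extra code that defaultdict(int) saves, because a defaultdict calls int() to supply a 0 the first time a missing key is used.

```python
from collections import defaultdict


def count_with_dict(strings):
    """Count words with a plain dict: each new word needs an explicit check."""
    counts = {}
    for string in strings:
        for word in string.split():
            if word in counts:
                counts[word] += 1
            else:
                counts[word] = 1
    return counts


def count_with_defaultdict(strings):
    """Same count with defaultdict(int): missing keys start at int() == 0."""
    counts = defaultdict(int)
    for string in strings:
        for word in string.split():
            counts[word] += 1
    return counts


strings = ["Hello", "Hello", "Hello World"]
print(count_with_dict(strings))  # {'Hello': 3, 'World': 1}
print(count_with_defaultdict(strings) == count_with_dict(strings))  # True
```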

>>> from TextProcessor import Reader
>>> reader = Reader()
>>> strings = ["Hello", "Hello", "Hello World", "Some other stuff with more spaces in it"]
>>> reader.Read(strings)
spaces 1
stuff 1
Some 1
it 1
other 1
in 1
World 1
with 1
Hello 3
more 1

To make it useful there’s more to be done (punctuation being an obvious issue for this implementation), but it’s a good start.
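To make the punctuation problem concrete: splitting on whitespace leaves punctuation attached, so "Hello," and "Hello" are counted as different words. The sketch below (Python 3) shows the problem and one possible fix that is not the approach the author took: instead of stripping punctuation, let a regular expression find the words. The pattern and the words helper are assumptions chosen for illustration.

```python
import re
from collections import Counter

text = "Hello, World! Hello again; don't say hello."

# Naive approach: punctuation stays attached to the words
naive = Counter(text.split())
print(naive['Hello'], naive['Hello,'])  # 1 1

# Alternative sketch: \w+ matches a run of letters, digits or underscores,
# and (?:'\w+)* keeps contractions such as "don't" in one piece.
WORD = re.compile(r"\w+(?:'\w+)*")


def words(text):
    """Yield lower-cased words with surrounding punctuation removed."""
    for match in WORD.finditer(text):
        yield match.group().lower()


better = Counter(words(text))
print(better['hello'])  # 3
print(better["don't"])  # 1
```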

Written by remark

January 5, 2010 at 4:46 pm

Posted in Development, Python

Simple Configuration with IronPython and .NET 4


Recently I posted a short article about how to do simple configuration with IronPython.  I figured that it would be easier with .NET 4.0, thanks to its dynamic support.  And it is.  Using Visual Studio 2010, create a new Console Application.  Add a file called Configuration.py and set its Copy to Output Directory property to Copy always.  Here’s the Python code for that file:


configuration.Text = "Hello from IronPython"
configuration.Count = 4

And here’s the code for the Program class:


class Program
{
    static void Main(string[] args)
    {
        dynamic configuration = ConfigureFromIronPython();

        for (int i = 0; i < configuration.Count; i++)
        {
            Console.WriteLine(configuration.Text);
        }

        Console.WriteLine("Press any key to exit...");
        Console.ReadKey(true);
    }

    private static dynamic ConfigureFromIronPython()
    {
        dynamic configuration = new ExpandoObject();
        ScriptEngine engine = Python.CreateEngine();
        ScriptScope scope = engine.CreateScope();
        scope.SetVariable("configuration", configuration);
        engine.ExecuteFile("Configuration.py", scope);
        return configuration;
    }
}

For that to work, you need to add references to IronPython and Microsoft.Scripting (there’s a CTP of IronPython for .NET 4.0 that you can get here).  You’ll also need a few using statements:


using System.Dynamic;
using IronPython.Hosting;
using Microsoft.Scripting.Hosting;

The two things that make this work are ExpandoObject and the dynamic keyword.  ExpandoObject is a dynamic property bag that lets us add, set and get members at runtime as needed.  The dynamic keyword means that .NET resolves member access at runtime (rather than checking it at compile time, as it traditionally does).  The result is, I think, simpler and more elegant than the configuration code I’ve become accustomed to.
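The same pattern does not depend on .NET. As a rough analogue, here is a sketch in plain CPython (3.3 or later) in which types.SimpleNamespace stands in for ExpandoObject and exec runs the configuration file with a configuration variable in scope, much as ScriptScope.SetVariable does in the C# above. The function name configure_from_python is hypothetical, and exec runs arbitrary code, so this is only suitable for configuration files you trust.

```python
import os
import tempfile
import types


def configure_from_python(path):
    """Run a Python configuration file and return the attributes it set.

    Hypothetical CPython analogue of ConfigureFromIronPython: SimpleNamespace
    plays the role of ExpandoObject, accepting any attribute at runtime.
    """
    configuration = types.SimpleNamespace()
    with open(path) as config_file:
        source = config_file.read()
    # Expose the property bag to the script under the name it expects
    exec(compile(source, path, "exec"), {"configuration": configuration})
    return configuration


# Write the same two settings as Configuration.py to a temporary file
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write('configuration.Text = "Hello from IronPython"\n')
    f.write("configuration.Count = 4\n")

try:
    configuration = configure_from_python(f.name)
    # Prints "Hello from IronPython" four times, like the C# Main method
    for _ in range(configuration.Count):
        print(configuration.Text)
finally:
    os.remove(f.name)
```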

Written by remark

September 28, 2009 at 12:53 pm

Posted in .NET, c#, Design, Python

Simple configuration using IronPython


I’ve just posted an article on how to use IronPython to configure an application – read it here.

Written by remark

September 14, 2009 at 4:49 pm

Posted in .NET, c#, Design, Development, Python

IronPython in Action Discount


If you like the sound of IronPython in Action then get your ebook or print copy of IronPython in Action at a 40% discount, courtesy of Manning Publications. Valid only at manning.com/foord – Use code remarkipia40 at checkout.

Written by remark

July 1, 2009 at 4:32 pm

Posted in .NET, Books, Python

IronPython in Action


I first installed IronPython in late 2006: the combination of .NET and Python was impossible to resist.  My interest and enthusiasm were both raised further by the announcement of the DLR at MIX in 2007.  Since then, I’ve learned about it from blog posts, presentations and occasional bouts of experimentation.  But I always wanted a book about it, to learn more and to illuminate the undiscovered corners.  So I was impatient for the publication of IronPython in Action and bought a copy as soon as it became available.  Having just finished the book, I thought I’d post some thoughts about it.

The first thing I realised when I started to read the book was that the authors, Michael Foord and Christian Muirhead, had to satisfy two distinct audiences: Python programmers interested in the .NET implementation and .NET programmers interested in the DLR and Python.  The result is that there are sections that provide introductions to aspects of both Python and .NET.  I think you’ll want more information on whichever area is new to you, but this is a good starting point.

This is a how-to book, so once the introductions are over, it gets into accomplishing specific tasks.  Along the way, dynamic language features such as duck typing and first-class functions are introduced.  There’s good coverage of how to use IronPython with a number of .NET technologies such as ASP.NET, WPF, WinForms and Silverlight.

My personal interest was to see how to combine C# and IronPython, and this topic is covered, including sections on metaprogramming and embedding.  I would have liked to see more on this topic but, given the breadth of what is being covered, there is inevitably a limited amount of space for any given area.  And that, for me, sums up the book: a very good introduction to what you can achieve with IronPython and the DLR.

If you’re interested in the DLR and IronPython, this book is worth reading.  It’s a very good introduction, and it will serve as a useful reference when you start your next foray into IronPython.  And if you’re not interested in the DLR and IronPython, reading this book may just change your mind.

Written by remark

July 1, 2009 at 2:40 pm

Posted in .NET, Books, Python

AIC 2009 Slides available


As Matt announced, the slides from the Architect Insight Conference 2009 are all now online.  The keynote videos are there too.  As Marc notes, there’s something there for most architectural interests – including my session on Dynamic Languages and Architecture (with what could become my trademark use of translucent black.)

Written by remark

June 16, 2009 at 5:18 pm

Posted in .NET, Architecture, Events, Python, Ruby


How to embed IronPython


I’ve just posted an article about embedding IronPython into your apps as a scripting language. Read it here.

Written by remark

June 10, 2009 at 7:27 pm

Use .NET classes in IronPython


It’s really simple to use a .NET class in IronPython.  The first thing to remember is that you’ll need to add a reference (just as you would in .NET).  To do that, you use the AddReference method of the clr module.  In the case of an assembly you’ve written, IronPython needs to be able to find it, which means the folder containing it needs to be in sys.path.  You can add that folder using the append method of sys.path.  Here’s a simple example.  First, let’s create a simple class called User in C# in a solution called SampleClasses:

using System;

namespace SampleClasses
{
    public class User
    {
        public string Name { get; set; }

        public DateTime DateOfBirth { get; set; }
    }
}

For the sake of simplicity, let’s copy the dll to a folder called lib on the C drive.  OK, time to fire up IronPython (which I’m going to assume you’ve already installed).  Open a command prompt and type “ipy”.  You should see the IronPython banner followed by the >>> prompt.

Next, let’s ensure the SampleClasses assembly is available to IronPython:

>>> import sys
>>> sys.path.append('C:\\lib')

Once we’ve done that we can add a reference:

>>> import clr
>>> clr.AddReference('SampleClasses')

Now we need to import the User class from the SampleClasses namespace:

>>> from SampleClasses import User

We’re all set.  Create an instance of User and set one of its properties:

>>> a = User()
>>> a.Name = 'Bob'
>>> a.Name
'Bob'

That’s all there is to it.  Now you can go and experiment with IronPython and classes you’ve already written in .NET.

Written by remark

June 4, 2009 at 2:49 pm

Posted in .NET, c#, Python, Software

AIC 2009 – Dynamic Languages and Architecture


Thanks to all of you who attended my session at AIC earlier today.  The slides will be made available online over the next week or so.

I think the most interesting capability the DLR makes possible is using static and dynamic languages together.  And there’s another benefit of learning a new language: when we only use one language, we tend to think in that language, whereas having other languages in our toolkit means that we have other approaches available to us.

So, where can you start to take advantage of dynamic languages?  The areas I discussed today were:

  • extending your application by adding scripting support
  • configuring your application with a dynamic language
  • creating a DSL using a dynamic language
  • writing one or more layers of your architecture in a dynamic language
  • testing your application(s) with a dynamic language

Of all of these, extending and testing are probably the best places to start.

I also talked a bit about the DLR (and a couple of the Iron languages, IronPython and IronRuby) and the way that C# will be taking advantage of the DLR.  It’s fascinating to see how programming languages are evolving and how the trends towards dynamic, functional and concurrent programming are influencing that evolution.

Here are the links that I gave out in my session:

http://www.codeplex.com/IronPython

http://ironruby.net/

http://www.codeplex.com/dlr

http://devhawk.net/

http://blog.jimmy.schementi.com/

Dynamic Languages on .NET (www.microsoftpdc.com)

http://ironpython-urls.blogspot.com/

http://www.ironpythoninaction.com/

I also mentioned the Anders Hejlsberg session on the future of C# – you can watch that here – and a Channel 9 video of Anders Hejlsberg and Gilad Bracha discussing language design, which you can find here.

As part of the preparation for the session, I exchanged some emails with a few folks including Michael Foord.  For those of you who’d like to see Michael’s take on this subject, he’s posted about it here.

Written by remark

May 8, 2009 at 5:29 pm
