Re.Mark

My Life As A Blog

Archive for the ‘Python’ Category

Text Processing in WPF

Having seen that I could run my Python code to calculate word frequency in IronPython, my thoughts turned to displaying the results.  Having previously built a simple WPF application in IronPython, I decided to start there.  The first step was to create a simple XAML file:


<Window xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="Text Processor App" Width="640" Height="480">
  <StackPanel>
    <Label>Text Processor</Label>
    <ListBox x:Name="listbox1">
      <ListBox.ItemTemplate>
        <DataTemplate>
          <StackPanel Orientation="Horizontal">
            <TextBlock Text="{Binding Path=[0]}" Margin="0,0,10,0" />
            <TextBlock Text="{Binding Path=[1]}" />
          </StackPanel>
        </DataTemplate>
      </ListBox.ItemTemplate>
    </ListBox>
  </StackPanel>
</Window>

I called this file main.xaml.  The results from the Reader class in Python are a list of tuples: each tuple holds a word and the frequency with which it appears.  The CLR type is IronPython.Runtime.PythonTuple, which allows us to access the values via an indexer.  That is what the Path=[0] and Path=[1] bindings in the XAML rely on.  The next step was to create a file called main.py (in the same folder as main.xaml):


import clr
clr.AddReferenceByPartialName("PresentationFramework")

from System.IO import File
from System.Windows.Markup import XamlReader
from System.Windows import Application
from TextProcessor import Reader

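# Load the window definition from the XAML file, then release the file.
xaml_stream = File.OpenRead('main.xaml')
window = XamlReader.Load(xaml_stream)
xaml_stream.Close()

reader = Reader()
text = ["This is a multiline string.\nWith many lines.", "This isn't.", "And nor is this"]
reader.read(text)
# The ListBox is the second child of the window's StackPanel.
window.Content.Children[1].ItemsSource = reader.get_sorted_results()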

Application().Run(window)

The next step is to fire up a command prompt and type:

ipy main.py

And it works. Here’s the output on my machine.

[Image: screenshot of the running application]

Written by remark

January 11, 2010 at 7:49 pm

Posted in .NET, Development, Python

Running Text Processor in IronPython

I couldn’t resist trying to run the Python code I wrote earlier today in IronPython.  And the good news is it just works.  I spun up an IronPython interactive shell and went through the same lines as I had done with Python earlier.  Not a surprise, of course, but it may help when it comes to displaying the results.

Written by remark

January 7, 2010 at 6:58 pm

Posted in .NET, Development, Python

Sorting results in Python

After yesterday’s exercise, I decided that today it’d be good to be able to retrieve the results.  And I’d like the results to be sorted.  A quick trawl across the internet and I found this post about sorting dictionaries in Python.  For an introduction to sorting in Python, this article is helpful.  So, I modified the Reader class by adding an import statement:

from operator import itemgetter

I took out the result-printing loop and added a new method to the class:

def get_sorted_results(self):
    return sorted(self.data.iteritems(), key=itemgetter(1), reverse=True)

And that’s it.  To see the sorted results, here’s some code entered into the interactive shell:

>>> from TextProcessor import Reader
>>> reader = Reader()
>>> text = ["This is a multiline string.\nWith many lines.", "This isn't.", "And nor is this."]
>>> reader.read(text)
>>> for key, value in reader.get_sorted_results():
...     print key, value
... 
this 3
is 2
a 1
and 1
string 1
many 1
lines 1
multiline 1
nor 1
with 1
isn't 1
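
As a standalone sketch of the same sort, the snippet below applies sorted with itemgetter(1) to a plain dictionary.  The counts dictionary and the alphabetical tie-break are illustrative assumptions, not part of the Reader class; get_sorted_results itself leaves the order of tied words unspecified.  It uses items() rather than iteritems() so that it runs on both Python 2 and Python 3.

```python
from operator import itemgetter

# Hypothetical word counts standing in for reader.data.
counts = {"this": 3, "is": 2, "a": 1, "nor": 1}

# Same idea as get_sorted_results: order the (word, count) pairs by count,
# highest first.
by_count = sorted(counts.items(), key=itemgetter(1), reverse=True)

# Assumption for illustration: break ties alphabetically so the order is
# repeatable. Sorting on (-count, word) keeps the highest counts first.
by_count_then_word = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))

for word, count in by_count_then_word:
    print("%s %d" % (word, count))
```

The second form costs a lambda, but it might be worth it if the list is shown to a user and the order of the one-off words should not change between runs.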

The next step would seem to be displaying the results in something other than a console.

Written by remark

January 7, 2010 at 6:18 pm

Posted in Development, Python

A Bit More Text Processing

Yesterday I wrote a simple Python class counting word frequency.  The next step, I figured, was to strip out punctuation.  So, taking the class I wrote yesterday, I ended up with this:

from collections import defaultdict
import re

class Reader:
    
    def read(self, strings):
        self.data = defaultdict(int)
        for string in strings:
            # Replace newlines with spaces so words on adjacent lines stay separate.
            clean_string = string.replace('\n', ' ')
            clean_string = self.__split_sentences(clean_string)
            clean_string = self.__remove_punctuation(clean_string)
            for sentence in clean_string.splitlines():
                clean_sentence = sentence.strip()
                for word in clean_sentence.split(" "):
                    # Skip the empty strings that split(" ") yields for repeated spaces.
                    if word:
                        self.data[word.lower()] += 1
        for key in self.data:
            print key, self.data[key]

    def __split_sentences(self, string):
        # Turn end-of-sentence punctuation into line breaks.
        return re.sub(r'[.:?!]', '\n', string)

    def __remove_punctuation(self, string):
        # Remove the remaining punctuation; apostrophes are kept (e.g. "isn't").
        return re.sub(r'[;,()\-"]', '', string)

Still seems fairly simple: strip out new lines, replace end of sentences with new line characters, strip out other punctuation, split into sentences, strip out extra whitespace and then split into words.  Also, this time around I’m doing all word comparison in lower case.  I’ve also changed the naming convention to be a bit more pythonic.  The double leading underscores, for those wondering, trigger Python’s name mangling, which makes the methods effectively private to the class (a single leading underscore is the lighter-weight convention for the same intent).  I’ve used the Python Regular Expression library (re) to do the (only slightly) more complex string replacements.  OK, so here’s some code to use in an interactive shell session to see the new class at work:

>>> from TextProcessor import Reader
>>> reader = Reader()
>>> text = ["This is a sentence.  And another one!", "This is"]
>>> reader.read(text)
a 1
and 1
sentence 1
this 2
is 2
one 1
another 1
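
The effect of the double underscores mentioned above can be seen directly.  This is a minimal sketch using a cut-down, hypothetical Reader with a single helper (not the class from this post): inside the class the name works as written, but from outside it only exists under a mangled name.

```python
class Reader(object):
    def __split_sentences(self, string):
        return string.replace(".", "\n")

    def split(self, string):
        # Inside the class, the double-underscore name resolves normally.
        return self.__split_sentences(string)

reader = Reader()
print(reader.split("One. Two."))

# Outside the class the plain name is not found...
print(hasattr(reader, "__split_sentences"))
# ...because it is stored under the mangled name _Reader__split_sentences.
print(hasattr(reader, "_Reader__split_sentences"))
```

So the method is hidden from casual use rather than truly inaccessible.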

It’s edging closer to a usable class and still pretty simple.  Probably needs to return something rather than printing the output.
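
One way the "return something" idea might look is sketched below.  The function name count_words is hypothetical, and the sketch makes one simplifying assumption that differs from the class above: it calls split() with no argument, which splits on any run of whitespace, so the separate newline and strip steps are not needed.

```python
from collections import defaultdict
import re

def count_words(strings):
    """Return a dict mapping each lower-cased word to its frequency."""
    counts = defaultdict(int)
    for string in strings:
        # Treat sentence-ending punctuation as a sentence break.
        text = re.sub(r'[.:?!]', '\n', string)
        # Drop the remaining punctuation handled by the class above.
        text = re.sub(r'[;,()\-"]', '', text)
        # split() with no argument splits on any run of whitespace, so
        # newlines and repeated spaces never produce empty words.
        for word in text.split():
            counts[word.lower()] += 1
    return dict(counts)

result = count_words(["This is a sentence.  And another one!", "This is"])
print(sorted(result.items()))
```

Returning a plain dict leaves it to the caller to decide whether to print, sort or display the counts.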

Written by remark

January 6, 2010 at 3:19 pm

Posted in Development, Python

Simple Text Processing with Python

On a couple of recent projects, it would have been useful to be able to calculate word frequency in text.  There are a number of ways to do this, but I wondered how hard it would be to do in Python.  I saw a little bit of Michael Sparks’ talk at DevDays in London, and, while I don’t remember the detail, it did act as a reminder.  It turns out that a naive implementation is very easy indeed.  Here it is:

from collections import defaultdict

class Reader:
    def Read(self, strings):
        self.data = defaultdict(int)
        for string in strings:
            for word in string.split(" "):
                self.data[word] += 1
        for key in self.data:
            print key, self.data[key]

Take a collection of strings, split them into words and then keep track of the count of each word using the word as a key.  The important class here is defaultdict, without which I’d have to check whether a word had already been inserted (not difficult, but it would have been more code).  Here’s some code that uses the Reader class and its output:

>>> from TextProcessor import Reader
>>> reader = Reader()
>>> strings = ["Hello", "Hello", "Hello World", "Some other stuff with more spaces in it"]
>>> reader.Read(strings)
spaces 1
stuff 1
Some 1
it 1
other 1
in 1
World 1
with 1
Hello 3
more 1
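
To illustrate the point about defaultdict, here is a small side-by-side sketch.  The variable names are made up for the example; the two loops count the same words, once with a plain dict and an explicit check, and once with defaultdict(int).

```python
from collections import defaultdict

words = "Hello Hello Hello World".split(" ")

# With a plain dict, each word has to be checked before it is incremented.
plain = {}
for word in words:
    if word not in plain:
        plain[word] = 0
    plain[word] += 1

# defaultdict(int) supplies 0 for a missing key, so the check disappears.
counted = defaultdict(int)
for word in words:
    counted[word] += 1

# Both approaches end up with the same counts.
print(plain == dict(counted))
```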

To make it useful, there’s more to be done (punctuation being an obvious issue for this implementation) but it’s a useful start.

Written by remark

January 5, 2010 at 4:46 pm

Posted in Development, Python

Simple Configuration with IronPython and .NET 4

Recently I posted a short article about how to do simple configuration with IronPython.  I figured that it would be easier with .NET 4.0 thanks to the dynamic support.  And it is.  Using Visual Studio 2010, create a new Console Application.  Add one file called Configuration.py and set its Copy to Output Directory property to Copy Always.  Here’s the Python code for that file:


configuration.Text = "Hello from IronPython"
configuration.Count = 4

And here’s the code for the Program class:


class Program
{
    static void Main(string[] args)
    {
        dynamic configuration = ConfigureFromIronPython();

        for (int i = 0; i < configuration.Count; i++)
        {
            Console.WriteLine(configuration.Text);
        }

        Console.WriteLine("Press any key to exit...");
        Console.ReadKey(true);
    }

    private static dynamic ConfigureFromIronPython()
    {
        dynamic configuration = new ExpandoObject();
        ScriptEngine engine = Python.CreateEngine();
        ScriptScope scope = engine.CreateScope();
        scope.SetVariable("configuration", configuration);
        engine.ExecuteFile("Configuration.py", scope);
        return configuration;
    }
}

For that to work you need to add references to IronPython and Microsoft.Scripting (there’s a CTP of IronPython for .NET 4.0 that you can get here).  You’ll also need a few using statements:


using System.Dynamic;
using IronPython.Hosting;
using Microsoft.Scripting.Hosting;

The two things that make this work are the ExpandoObject and the dynamic keyword.  The ExpandoObject is a dynamic property bag that allows us to set and get members at runtime as needed.  The dynamic keyword means that .NET will resolve the properties at runtime (rather than the traditional behaviour of checking at compile time).  The result is (I think) simpler and more elegant than the configuration code to which I’ve become accustomed.
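
The property-bag idea can also be sketched in plain Python, which may help show what the script is doing to the object it is handed.  This is only an analogue under stated assumptions: PropertyBag, CONFIG_SOURCE and configure are hypothetical names, the script is held in a string rather than read from Configuration.py, and nothing here uses the IronPython hosting API.

```python
class PropertyBag(object):
    """Empty object that accepts arbitrary attributes, like a property bag."""

# Stand-in for the contents of Configuration.py (same two assignments).
CONFIG_SOURCE = '''
configuration.Text = "Hello from IronPython"
configuration.Count = 4
'''

def configure(source):
    configuration = PropertyBag()
    # Run the script with "configuration" pre-defined in its scope, much as
    # scope.SetVariable does before engine.ExecuteFile in the C# version.
    exec(source, {"configuration": configuration})
    return configuration

configuration = configure(CONFIG_SOURCE)
for _ in range(configuration.Count):
    print(configuration.Text)
```

As in the C# version, the host never declares Text or Count; they exist only because the script assigned them.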

Written by remark

September 28, 2009 at 12:53 pm

Posted in .NET, c#, Design, Python

Simple configuration using IronPython

I’ve just posted an article on how to use IronPython to configure an application – read it here.

Written by remark

September 14, 2009 at 4:49 pm

Posted in .NET, c#, Design, Development, Python
