Optimising the ipaddress module from Python 3.3

February 27th, 2014 by exhuma.twn

As of Python 3.3, the “ipaddress” module has been integrated into the stdlib. Personally, I find it a bit premature, as the library code does not look very PEP8 compliant. Still, it fills a huge gap in the stdlib.

Over the last few days, I needed to find a way to collapse consecutive IP networks into supernets whenever possible. Turns out, there’s a function for that: ipaddress.collapse_addresses. Unfortunately, I was unable to use it as-is because I don’t have a collection of networks, but rather object instances which have “network” as a member variable. Extracting the networks, collapsing them and then correlating the results back to the original instances would be impossible.
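For reference, this is roughly how the stdlib function behaves on a plain collection of networks. The addresses are made-up documentation ranges, not data from my test run.

```python
import ipaddress

networks = [
    ipaddress.ip_network("192.0.2.0/25"),
    ipaddress.ip_network("192.0.2.128/25"),
    ipaddress.ip_network("198.51.100.0/24"),
]

# collapse_addresses returns an iterator, so materialise it to look at it.
collapsed = list(ipaddress.collapse_addresses(networks))

# The two adjacent /25 networks are merged into a single /24,
# the unrelated /24 is passed through unchanged.
print(collapsed)
```

Note that the result only contains network objects. Nothing in it tells you which input ended up in which supernet, which is exactly the part I am missing.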

So I decided to dive into the stdlib source code and get some “inspiration” to accomplish this task. To me personally, the code was fairly difficult to follow. About 60 lines comprised of two functions where one calls the other one recursively.

I thought I could do better. And preliminary tests are promising. It’s no longer recursive (it’s shift-reduceish if you will) and about 30 lines shorter. Now, the original code does some type checking which I might decide to add later on, increasing the number of lines a bit, and maybe even hurting performance. I’m still confident.
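To give an idea of what “shift-reduceish” could look like, here is a minimal sketch. It is not the code I benchmarked, only an illustration under a few assumptions: all networks are of the same IP version, Python 3.7+ is available (for subnet_of), and the names collapse_grouped and Entry are made up for this example. Each item is “shifted” onto a stack, and the two topmost entries are “reduced” into their common supernet for as long as they are siblings. The original objects travel along with the network they belong to.

```python
import ipaddress
from collections import namedtuple
from operator import attrgetter


def collapse_grouped(items, key=attrgetter("network")):
    """Collapse the networks of *items* and keep track of their owners.

    Returns a list of (network, [items]) tuples. Assumes that all
    networks share the same IP version.
    """
    stack = []  # entries are [network, members]
    for item in sorted(items, key=key):
        net = key(item)
        # Sorted input guarantees that a covered network can only be
        # covered by the entry currently on top of the stack.
        if stack and net.subnet_of(stack[-1][0]):
            stack[-1][1].append(item)
            continue
        stack.append([net, [item]])  # shift
        while len(stack) >= 2:  # reduce while the top two are siblings
            (a, a_items), (b, b_items) = stack[-2], stack[-1]
            if a.prefixlen == b.prefixlen and a.supernet() == b.supernet():
                stack[-2:] = [[a.supernet(), a_items + b_items]]
            else:
                break
    return [(net, members) for net, members in stack]


# Hypothetical stand-in for the object instances mentioned above.
Entry = namedtuple("Entry", "name network")
entries = [
    Entry("a", ipaddress.ip_network("2001:db8::/65")),
    Entry("b", ipaddress.ip_network("2001:db8:0:0:8000::/65")),
    Entry("c", ipaddress.ip_network("2001:db8:1::/48")),
]
grouped = collapse_grouped(entries)
for net, members in grouped:
    print(net, [member.name for member in members])
```

The sort does most of the work here: once the input is ordered, only the top of the stack ever needs to be looked at, which is why no recursion is needed.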

A run with 300k IPv6 networks took 93 seconds with the new algorithm using up 490MB of memory. The old, stdlib code took 230 seconds to finish with a peak memory usage of 550MB. All in all, good results.

Note that in both cases, the 300k addresses had to be loaded into memory, so they will take up a considerable amount as well, but that size is the same in both runs.

I still have an idea in mind to improve the memory usage. I’ll give that a try.

Here are a few stats:

With the new algorithm:

collapsing 300000 IPv6 networks 1 times
generating 300000 addresses...
... done
new:  92.98410562699428
        Command being timed: "./env/bin/python mantest.py 300000"
        User time (seconds): 92.79
        System time (seconds): 0.28
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 1:33.07
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 491496
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 123911
        Voluntary context switches: 1
        Involuntary context switches: 154
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

and with the old algorithm:

collapsing 300000 IPv6 networks 1 times
generating 300000 addresses...
... done
old:  229.66894743399462
        Command being timed: "./env/bin/python mantest.py 300000"
        User time (seconds): 229.35
        System time (seconds): 0.38
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 3:49.76
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 549592
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 144970
        Voluntary context switches: 1
        Involuntary context switches: 1218
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

I’ll add more details as I go… I’m too “into it” and keep forgetting time and to post fun stuff on-line… stay tuned.

Colourising python logging for console output.

December 27th, 2013 by exhuma.twn

I’ve seen my fair share of code fragments colourising console output. Especially when using logging. Sometimes the colour codes are directly embedded into the format string, which makes it really hairy to deal with different colours for different levels. Sometimes even the log message is wrapped in a colour string along the lines: LOG.info("{YELLOW}Message{NORMAL}") or something equally atrocious.

Most logging frameworks support this use-case with “Formatters”. Use them! Here’s a quick example of how to do it “the right way™”:

Disclaimer: For whatever reason, this gist is borking the foobar.lu theme. I’m guessing it’s the UTF-8 char in the docstring? So maybe a web-server misconfig? So I’ll have to link it the “old way”! Go figure…

Clicky clicky → https://gist.github.com/exhuma/8147910
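In case the link is not at hand, the general idea can be sketched in a few lines. This is not the content of the gist, just a minimal illustration: the class name ColourFormatter, the colour table and the format string are choices made for this example, and it assumes a terminal which understands ANSI escape codes.

```python
import logging

RESET = "\033[0m"
# ANSI colour codes per log level (an arbitrary choice for this sketch).
LEVEL_COLOURS = {
    logging.DEBUG: "\033[36m",       # cyan
    logging.INFO: "\033[32m",        # green
    logging.WARNING: "\033[33m",     # yellow
    logging.ERROR: "\033[31m",       # red
    logging.CRITICAL: "\033[1;31m",  # bold red
}


class ColourFormatter(logging.Formatter):
    """Wrap the fully formatted line in the colour of its level."""

    def format(self, record):
        message = super().format(record)
        colour = LEVEL_COLOURS.get(record.levelno)
        if colour is None:  # custom levels stay uncoloured
            return message
        return f"{colour}{message}{RESET}"


def make_console_handler():
    """Return a stream handler which colourises its output."""
    handler = logging.StreamHandler()
    handler.setFormatter(
        ColourFormatter("%(levelname)-8s %(name)s: %(message)s"))
    return handler


if __name__ == "__main__":
    log = logging.getLogger("demo")
    log.addHandler(make_console_handler())
    log.setLevel(logging.DEBUG)
    log.warning("Message")  # no colour codes in the message itself
```

The log calls stay clean, and only the handler attached to the console knows about colours. A file handler on the same logger keeps writing plain text.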

Introduction to google-closure with plovr

September 1st, 2013 by exhuma.twn

I’m about to embark on a quest to understand the development of custom google-closure components (UI widgets if you will). Reading through the relevant section in “Closure – The Definitive Guide” makes me believe it’s not all too difficult. But there are still a bunch of concepts which I need to familiarize myself with. This article briefly outlines my aim for this “learning trail”, and starts off with a tiny HelloWorld project using plovr. This article assumes a minimal knowledge of google closure (you should know what “provides” and “requires” are; “exportSymbol” should also not surprise you). Read the rest of this entry »

Automagic __repr__ for SQLAlchemy entities with primary key columns with Declarative Base.

July 5th, 2013 by exhuma.twn

According to the Python documentation about __repr__, a call to repr() should give you a valid Python expression if possible. This is a very useful guideline. And it is also something I like to implement in my Python projects as much as possible.

Now, for mapped database entities, you might argue that it makes sense to have a default constructor as long as it accepts the primary key columns.

By default, it is possible to create new instances by specifying column values in SQLAlchemy. For example:

user = User(name=u'John Doe', email=u'john.doe@example.com')

It should be possible to create such “repr” values automatically for primary keys. All the required meta info is available. Digging through the SA docs, I found that it is possible to customize Base in order to add behaviour to all mapped entities!

Here’s the result:

With this in place, all representations of DB entities will finally make sense and be copy/pasteable directly into your code.

Of course, by nature of ORMs, the new instances created this way will be detached from the session and need to be merged before you can do any DB related operations on them! A simple example:

from mymodel import User, Session

sess = Session()
user = User(name=u'John Doe', email=u'john.doe@example.com')
user = sess.merge(user)

Uploading the contents of a variable using fabric

June 25th, 2013 by exhuma.twn

More than once I needed to create files on the staging/production box which I had no need of on the local development box (for example, a complex logging configuration).

This fragment contains a simple function which tries to do this in a safe manner, and also ensures proper cleanup in case of failure.
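The pattern itself can be sketched without Fabric. The following is not the original fragment, only an illustration of the idea: the function name upload_contents and the upload parameter are made up for this example. The upload callable is expected to take a local and a remote path. With Fabric 1.x, fabric.api.put should fit that shape, but treat that as an assumption.

```python
import os
import tempfile


def upload_contents(contents, remote_path, upload):
    """Write *contents* to a temporary file and hand it to *upload*.

    *upload* is any callable taking (local_path, remote_path). The
    temporary file is removed again, even if the upload fails.
    """
    fd, local_path = tempfile.mkstemp(suffix=".upload")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(contents)
        return upload(local_path, remote_path)
    finally:
        # Cleanup happens on success and on failure alike.
        os.remove(local_path)
```

Passing the upload function in as an argument keeps the sketch testable on a box without any remote host: a dummy function which only records its arguments is enough to check the cleanup.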

Formatting PostgreSQL CSV logs

April 24th, 2013 by exhuma.twn

The problem

Today I needed to keep an eye on PostgreSQL logs. Luckily, I decided upon installation to log everything using the “csvlog” format. But there’s a small catch, depending on how you read that log. This catch is newline characters in database queries.

This has nothing to do with PostgreSQL directly. In fact, it does the right thing, in that it quotes all required fields. Now, a quoted field can contain a newline character. But if you read the file on a line-by-line basis (using methods like file_handle.readline), this will cause problems. No matter what programming language you use, if you call readline, it will read up to the next newline character and return that. So, let’s say you have the following CSV record:

2013-03-21 10:41:19.651 CET,"ipbase","ipbase_test",13426,"[local]",514ad5bf.3472,139,"SELECT",2013-03-21 10:41:19 CET,2/5828,3741,LOG,00000,"duration: 0.404 ms  statement: SELECT\n                 p2.device,\n                 p2.scope,\n                 p2.label,\n                 p2.direction\n             FROM port p1\n             INNER JOIN port p2 USING (link)\n             WHERE p1.device='E'\n             AND p1.scope='provisioned'\n             AND p1.label='Eg'\n             AND (p1.device = p2.device\n                 AND p1.scope = p2.scope\n                 AND p1.label=p2.label) = false",,,,,,,,,""

If you read this naïvely with “readline” calls, you will get the following:

 1:2013-03-21 10:41:19.651 CET,"ipbase","ipbase_test",13426,"[local]",514ad5bf.3472,139,"SELECT",2013-03-21 10:41:19 CET,2/5828,3741,LOG,00000,"duration: 0.404 ms  statement: SELECT
 2:                p2.device,
 3:                p2.scope,
 4:                p2.label,
 5:                p2.direction
 6:            FROM port p1
 7:            INNER JOIN port p2 USING (link)
 8:            WHERE p1.device='E'
 9:            AND p1.scope='provisioned'
10:            AND p1.label='Eg'
11:            AND (p1.device = p2.device
12:                AND p1.scope = p2.scope
13:                AND p1.label=p2.label) = false",,,,,,,,,""

Now, this is really annoying if you want to parse the file properly.

The solution

Read the file byte-by-byte, and feed a line to the CSV parser only if you hit a newline outside of quoted text. Obviously you should consider the newline style (\n, \r or \r\n) and the proper quote and escape characters when doing this.
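A minimal sketch of this approach, under a few simplifying assumptions: the input is already decoded text, records end in \n only, and the double quote is the only quote character (a doubled quote inside a field toggles the state twice, so it needs no special handling). The names iter_records and parse_record are made up for this example, and this is not the script mentioned further down.

```python
import csv


def iter_records(chars, quote='"'):
    """Yield complete CSV records from an iterable of characters.

    A newline only terminates a record when it is seen outside of
    quoted text.
    """
    buffer = []
    in_quotes = False
    for char in chars:
        if char == quote:
            in_quotes = not in_quotes
        if char == "\n" and not in_quotes:
            yield "".join(buffer)
            buffer = []
        else:
            buffer.append(char)
    if buffer:  # trailing record without a final newline
        yield "".join(buffer)


def parse_record(record):
    """Feed one complete record to the CSV parser."""
    return next(csv.reader([record]))


sample = 'LOG,"statement: SELECT\n    p2.device\n  FROM port p1",1\nLOG,"SELECT 1",2\n'
for record in iter_records(sample):
    print(parse_record(record))
```

For a real file, the characters could come from something like iter(lambda: handle.read(1), ""), which keeps returning single characters until the end of the file is reached.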

What about Python?

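It turns out that Python’s csv module also consumes its input line by line, so handing it single lines obtained from readline runs into this problem. However, it is possible to work around this: when csv.reader is given the open file object itself (opened with newline='' on Python 3), it keeps pulling further lines for as long as a quoted field is still open.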
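As a small illustration of how the csv module behaves when it is handed the stream itself rather than single lines from readline, here is a sketch using an in-memory stand-in for the log file. The sample data is a shortened, made-up variant of the record above, and read_records is a name chosen for this example.

```python
import csv
import io

SAMPLE = (
    '2013-03-21 10:41:19.651 CET,"ipbase","statement: SELECT\n'
    '    p2.device\n'
    '  FROM port p1",LOG\n'
    '2013-03-21 10:41:20.000 CET,"ipbase","statement: SELECT 1",LOG\n'
)


def read_records(stream):
    """Parse all CSV records from an open text stream."""
    return list(csv.reader(stream))


# newline="" disables newline translation, as recommended by the csv docs.
# For a file on disk this would be: open(path, newline="")
records = read_records(io.StringIO(SAMPLE, newline=""))

for record in records:
    print(len(record), repr(record[2]))
```

The four physical lines come back as two records, and the statement keeps its embedded newlines. This works for a file which is read from start to end. When following a log which is still being written to, the last record may be incomplete at the time of reading.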

For my own purpose, I wrote a simple script, reading from the postgres log until interrupted.

You are free to use this for your own purpose, modify or extend it as you like.

You can find it here: exhuma/postgresql-logmon

Recovering from a corrupted git repo

February 23rd, 2013 by exhuma.twn

I do a lot of work on the go. Offline. Sometimes it takes a long time to push changes to a remote repository. As always, Murphy’s law applies, and the one repo that explodes in my face is the one with ten days’ worth of work in it.

While working, suddenly my laptop hung. Music looping. No mouse movement. Nothing. The only possible solution was to do a cold reboot. I was not worried. Everything was saved, and I had only changed a few lines which I could easily recover if something went awry. So I rebooted.

Once back in the system, I immediately wanted to do a git status and git diff. Git spat back the following error message:

jukebox$ git st
fatal: object 9bd41c2f96f295924af92a9da175cb3686f13359 is corrupted

My laptop had already shown some strange and erratic behaviour over the last few days. I had left a memtest running for about 24 hours earlier this week without errors. The only possible explanation left was the hard disk.

Fun times ahead! 10 days of work at risk… 10 days of important changes! Sweat building up on my forehead. Bloody sweat!

I trust my tools to keep my code safe. I trust git. I trust vim. I do microscopic commits, and I knew my current uncommitted changes only involved a few lines. So maybe only the last commit got corrupted? Let’s see…

Read the rest of this entry »

Adding unicode glyphs to docutils (ReStructured Text, Sphinx, …) documents.

February 13th, 2013 by exhuma.twn

docutils supports adding a wide range of unicode glyphs into documents, while still keeping the source document readable and plain ASCII.

The character mappings are available in include files, and have to be included into the document before being useful.

For example, adding a horizontal arrow is done by including 'isoamsa.txt' which provides (amongst others) the replacement |hArr|. An example document might be:

.. include:: <isoamsa.txt>

This is some text containing an arrow symbol: |hArr|

Have a look at the extensions available in the official docutils distribution for more information.

Getting ENUM DataType to work with Doctrine 2 in Zend Framework 2

January 24th, 2013 by wickeddoc

When using Doctrine 2 with a MySQL Database which has tables with ENUM datatypes, you might run into the following error message:

‘Unknown database type enum requested, Doctrine\DBAL\Platforms\MySqlPlatform may not support it.’

This is because Doctrine 2 doesn’t support the ENUM DataType natively as you can read here: Doctrine Cookbook.

In their Doctrine Cookbook they give a solution how you can resolve this by mapping the ENUM to a STRING datatype. But if you’re using Zend Framework 2 you’ll probably run into the same question as I have: where do I put this stuff for it to work?

This might not be the perfect solution, but it worked for me.

Adding the required code to the onBootstrap() method of the Module.php file of the “default” module did the job.

use Zend\Mvc\MvcEvent;
class Module
{
    public function onBootstrap(MvcEvent $e)
    {
        $em = $e->getApplication()->getServiceManager()->get('Doctrine\ORM\EntityManager');
        $platform = $em->getConnection()->getDatabasePlatform();
        $platform->registerDoctrineTypeMapping('enum', 'string');
    }
}


More convenient development with the closure library

August 23rd, 2012 by exhuma.twn

While cleaning up my build process using ‘closure’, I stumbled across plovr. After using it for only 20 minutes, I am convinced that this should be in the toolbox of *everyone* developing with the closure library!

While the documentation is still a bit sparse, you can have it set up in no time! There’s no need to regurgitate a basic setup example in this post. Everything necessary is readily available over at http://plovr.com/!

