Optimising the ipaddress module from Python 3.3

February 27th, 2014 by exhuma.twn

As of Python 3.3, the “ipaddress” module has been integrated into the stdlib. Personally, I find it a bit premature, as the library code does not look very PEP8 compliant. Still, it fills a huge gap in the stdlib.

Over the last few days, I needed to find a way to collapse consecutive IP networks into supernets whenever possible. Turns out, there’s a function for that: ipaddress.collapse_addresses. Unfortunately, I was unable to use it as-is because I don’t have a collection of networks, but rather object instances which have “network” as a member variable. It would be impossible to extract the networks, collapse them and then correlate the results back to the original instances.

So I decided to dive into the stdlib source code and get some “inspiration” to accomplish this task. To me personally, the code was fairly difficult to follow. About 60 lines comprised of two functions where one calls the other one recursively.

I thought I could do better. And preliminary tests are promising. It’s no longer recursive (it’s shift-reduceish if you will) and about 30 lines shorter. Now, the original code does some type checking which I might decide to add later on, increasing the number of lines a bit, and maybe even hit performance. I’m still confident.
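
To give an idea of what a non-recursive, stack-based collapse can look like, here is a minimal sketch. It is not the code that was benchmarked for this post: the function name collapse is made up, it works on plain network objects rather than on instances carrying a “network” member, it assumes all networks share one IP version, and it relies on subnet_of (Python 3.7+).

```python
import ipaddress


def collapse(networks):
    """Collapse networks into the smallest list of covering supernets.

    Illustrative sketch only: assumes every item is an IPv4Network or
    every item is an IPv6Network (mixed versions cannot be sorted).
    """
    stack = []
    # Sorting puts a containing network before everything it contains.
    for net in sorted(set(networks)):
        while stack:
            top = stack[-1]
            if net.subnet_of(top):
                # Already covered by the network on top of the stack.
                net = None
                break
            if (top.prefixlen == net.prefixlen
                    and top.supernet() == net.supernet()):
                # "Reduce": two siblings merge into their common supernet,
                # which may in turn merge with the new top of the stack.
                stack.pop()
                net = net.supernet()
            else:
                break
        if net is not None:
            # "Shift": nothing more to merge, keep the network.
            stack.append(net)
    return stack


example = [
    ipaddress.ip_network("2001:db8::/33"),
    ipaddress.ip_network("2001:db8:8000::/33"),
    ipaddress.ip_network("2001:db8:1::/48"),
]
print(collapse(example))
```

Because every network is pushed and popped at most a handful of times, the work after sorting stays roughly linear in the number of networks.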

A run with 300k IPv6 networks took 93 seconds with the new algorithm, with a peak memory usage of about 490MB. The old stdlib code took 230 seconds to finish, with a peak memory usage of 550MB. All in all, good results.

Note that in both cases, the 300k networks had to be loaded into memory, so they will take up a considerable amount as well, but that size is the same in both runs.

I still have an idea in mind to improve the memory usage. I’ll give that a try.

Here are a few stats:

With the new algorithm:

collapsing 300000 IPv6 networks 1 times
generating 300000 addresses...
... done
new:  92.98410562699428
        Command being timed: "./env/bin/python mantest.py 300000"
        User time (seconds): 92.79
        System time (seconds): 0.28
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 1:33.07
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 491496
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 123911
        Voluntary context switches: 1
        Involuntary context switches: 154
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

and with the old algorithm:

collapsing 300000 IPv6 networks 1 times
generating 300000 addresses...
... done
old:  229.66894743399462
        Command being timed: "./env/bin/python mantest.py 300000"
        User time (seconds): 229.35
        System time (seconds): 0.38
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 3:49.76
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 549592
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 144970
        Voluntary context switches: 1
        Involuntary context switches: 1218
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

I’ll add more details as I go… I’m too “into it” and keep forgetting time and to post fun stuff on-line… stay tuned.

Posted in Python | No Comments »

Colourising python logging for console output.

December 27th, 2013 by exhuma.twn

I’ve seen my fair share of code fragments colourising console output. Especially when using logging. Sometimes the colour codes are directly embedded into the format string, which makes it really hairy to deal with different colours for different levels. Sometimes even the log message is wrapped in a colour string along the lines: LOG.info("{YELLOW}Message{NORMAL}") or something equally atrocious.

Most logging frameworks support this use-case with “Formatters”. Use them! Here’s a quick example of how to do it “the right way™”:

Disclaimer: For whatever reason, this gist is borking the foobar.lu theme. I’m guessing it’s the UTF-8 char in the docstring? So maybe a web-server misconfig? So I’ll have to link it the “old way”! Go figure…

Clicky clicky → https://gist.github.com/exhuma/8147910
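
For readers who cannot follow the link, here is a minimal sketch of the Formatter idea. It is not the code from the gist: the class name ColourFormatter, the colour choices and the format string are assumptions made for illustration, and it assumes a terminal that understands ANSI escape codes.

```python
import logging


class ColourFormatter(logging.Formatter):
    """Wrap the fully formatted record in an ANSI colour per log level."""

    # Colour choices are arbitrary; adjust to taste.
    COLOURS = {
        logging.DEBUG: "\033[36m",       # cyan
        logging.INFO: "\033[32m",        # green
        logging.WARNING: "\033[33m",     # yellow
        logging.ERROR: "\033[31m",       # red
        logging.CRITICAL: "\033[1;31m",  # bold red
    }
    RESET = "\033[0m"

    def format(self, record):
        # Let the base class do the real formatting, then add colour.
        message = super().format(record)
        colour = self.COLOURS.get(record.levelno)
        if colour is None:
            return message
        return "{}{}{}".format(colour, message, self.RESET)


handler = logging.StreamHandler()
handler.setFormatter(ColourFormatter("%(levelname)-8s %(name)s: %(message)s"))

LOG = logging.getLogger("demo")
LOG.addHandler(handler)
LOG.setLevel(logging.DEBUG)
LOG.warning("Disk almost full")
```

The log calls themselves stay free of colour codes, and swapping the formatter on the handler is enough to turn colours off again.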

Posted in Python | No Comments »

Automagic __repr__ for SQLAlchemy entities with primary key columns with Declarative Base.

July 5th, 2013 by exhuma.twn

According to the Python documentation about __repr__, a call to repr() should give you a valid Python expression if possible. This is a very useful guideline. And it is also something I like to implement in my Python projects as much as possible.

Now, for mapped database entities, you might argue that it makes sense to have a default constructor as long as it accepts the primary key columns.

By default, it is possible to create new instances by specifying column values in SQLAlchemy. For example:

user = User(name=u'John Doe', email=u'john.doe@example.com')

It should be possible to create such “repr” values automatically for primary keys. All the required meta info is available. Digging through the SA docs, I found that it is possible to customize Base in order to add behaviour to all mapped entities!

Here’s the result:

With this in place, all representations of DB entities will finally make sense and be copy/pasteable directly into your code.

Of course, by nature of ORMs, the new instances created this way will be detached from the session and need to be merged before you can do any DB related operations on them! A simple example:

from mymodel import User, Session

sess = Session()
user = User(name=u'John Doe', email=u'john.doe@example.com')
user = sess.merge(user)
sess.refresh(user)

Posted in Python | 6 Comments »

Uploading the contents of a variable using fabric

June 25th, 2013 by exhuma.twn

More than once I needed to create files on the staging/production box which I had no need of on the local development box (For example complex logging configuration).

This fragment contains a simple function which tries to do this in a safe manner, while also ensuring proper cleanup in case of failure.

Posted in Coding Voodoo, Python | No Comments »

Formatting PostgreSQL CSV logs

April 24th, 2013 by exhuma.twn

The problem

Today I needed to keep an eye on PostgreSQL logs. Luckily, I decided upon installation to log everything using the “csvlog” format. But there’s a small catch, depending how you read that log. This catch is newline characters in database queries.

This has nothing to do with PostgreSQL directly. In fact, it does the right thing, in that it quotes all required fields. Now, a quoted field can contain a newline character. But if you read the file on a line-by-line basis (using methods like file_handle.readline), this will cause problems. No matter what programming language you use, if you call readline, it will read up to the next newline character and return that. So, let’s say you have the following CSV record:

2013-03-21 10:41:19.651 CET,"ipbase","ipbase_test",13426,"[local]",514ad5bf.3472,139,"SELECT",2013-03-21 10:41:19 CET,2/5828,3741,LOG,00000,"duration: 0.404 ms  statement: SELECT\n                 p2.device,\n                 p2.scope,\n                 p2.label,\n                 p2.direction\n             FROM port p1\n             INNER JOIN port p2 USING (link)\n             WHERE p1.device='E'\n             AND p1.scope='provisioned'\n             AND p1.label='Eg'\n             AND (p1.device = p2.device\n                 AND p1.scope = p2.scope\n                 AND p1.label=p2.label) = false",,,,,,,,,""

If you read this naïvely with “readline” calls, you will get the following:

 1:2013-03-21 10:41:19.651 CET,"ipbase","ipbase_test",13426,"[local]",514ad5bf.3472,139,"SELECT",2013-03-21 10:41:19 CET,2/5828,3741,LOG,00000,"duration: 0.404 ms  statement: SELECT
 2:                p2.device,
 3:                p2.scope,
 4:                p2.label,
 5:                p2.direction
 6:            FROM port p1
 7:            INNER JOIN port p2 USING (link)
 8:            WHERE p1.device='E'
 9:            AND p1.scope='provisioned'
10:            AND p1.label='Eg'
11:            AND (p1.device = p2.device
12:                AND p1.scope = p2.scope
13:                AND p1.label=p2.label) = false",,,,,,,,,""

Now, this is really annoying if you want to parse the file properly.

The solution

Read the file byte-by-byte, and feed a line to the CSV parser only if you hit a newline outside of quoted text. Obviously you should consider the newline style (\n, \r or \r\n) and the proper quote and escape characters when doing this.
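
A minimal sketch of that approach in Python, under a few assumptions: the stream is opened in text mode, records end with \n (or \r\n), and a quote inside a field is escaped by doubling it, so toggling a flag on every quote character is enough. The name iter_records is made up for this example.

```python
import csv
import io


def iter_records(stream, quotechar='"'):
    """Yield one raw CSV record at a time, reading character by character.

    A newline only ends a record when it is seen outside quoted text.
    """
    buffer = []
    in_quotes = False
    while True:
        char = stream.read(1)
        if not char:
            break
        buffer.append(char)
        if char == quotechar:
            # A doubled quote ("") toggles twice, leaving the state as is.
            in_quotes = not in_quotes
        elif char == "\n" and not in_quotes:
            yield "".join(buffer)
            buffer = []
    if buffer:
        # Trailing data without a final newline.
        yield "".join(buffer)


sample = io.StringIO(
    '2013-03-21 10:41:19.651 CET,"ipbase","statement: SELECT\n'
    '    p2.device\n'
    '  FROM port p1",LOG\n'
    '2013-03-21 10:41:20.000 CET,"ipbase","statement: SELECT 1",LOG\n'
)

rows = [next(csv.reader([raw])) for raw in iter_records(sample)]
for row in rows:
    print(len(row), repr(row[2]))
```

Each complete record is only handed to the CSV parser once its closing newline has been seen, which also works when the file is still being written to.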

What about Python?

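It turns out that Python’s csv module can trip you up in the same way. The builtin reader consumes its input line by line, so if you parse each line on its own (for example while following a growing log file), a record that spans several lines arrives in pieces. Given a whole file object opened with newline='', csv.reader does reassemble quoted fields that contain newlines.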

For my own purpose, I wrote a simple script, reading from the postgres log until interrupted.

You are free to use this for your own purpose, modify or extend it as you like.

You can find it here: exhuma/postgresql-logmon

Posted in Python | No Comments »

A comprehensive guide through Python packaging (a.k.a. setup scripts)

May 13th, 2012 by exhuma.twn

One of the really useful things in Python is the setup script. When doing “serious” business, you really should look into them. Setup scripts are amazingly powerful, yet they don’t necessarily need to be complex. Because of this flexibility, though, the documentation around them seems like a lot to read. Additionally, the state of packaging has changed quite a bit over the years, so you now have a couple of packages (distutils, setuptools, distribute) which all seem to try to solve the same problem. See the current state of packaging for more details on this.

This post attempts to summarize the important bits using a “Hello World” project and stepping through the process of creating the setup.py file:

  • Creating a package distribution
  • Automatic generation of executables
  • Version numbers
  • Dependency management
  • Publishing your package (thus also making it available for automatic dependency resolution)
  • Some more food for thought.

NOTE: The setup.py script we will construct in this post will use two bits of code which may not work in all cases:

  • importing the package itself (to centralize the version number)
  • Reading the long_description content from a text-file

Both methodologies have their issues. Importing the package only works if you can import it cleanly (i.e. without dependencies) from standard python. Reading the text file only works if the setup.py is called from the proper location.

Both shortcuts work most of the time. There are corner cases, however, where they are not possible. If that is the case, you will need to live without these helpful shortcuts. See the comments on this article for more on this.
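
One possible way to soften both caveats, sketched here as an assumption rather than as what the full article does: resolve files relative to the directory of setup.py, and read the version string out of the package source with a regular expression instead of importing the package. The helper names read_text and read_version, the package name helloworld and the file name README.txt are all placeholders.

```python
import os
import re

# Matches a line such as:  __version__ = "1.0.2"
VERSION_RE = re.compile(
    r"""^__version__\s*=\s*['"]([^'"]+)['"]""", re.MULTILINE)


def read_text(base_dir, *parts):
    """Read a text file located relative to base_dir."""
    with open(os.path.join(base_dir, *parts), encoding="utf-8") as handle:
        return handle.read()


def read_version(base_dir, package):
    """Extract __version__ from <package>/__init__.py without importing it."""
    match = VERSION_RE.search(read_text(base_dir, package, "__init__.py"))
    if match is None:
        raise RuntimeError("No __version__ found in %s" % package)
    return match.group(1)


# Inside setup.py this could be used along these lines (placeholder names):
#
#     here = os.path.dirname(os.path.abspath(__file__))
#     setup(
#         name="helloworld",
#         version=read_version(here, "helloworld"),
#         long_description=read_text(here, "README.txt"),
#     )
```

Because nothing is imported, missing dependencies of the package cannot break the setup script, and because paths are anchored at the script's own directory, the current working directory no longer matters.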

Read the rest of this entry »

Posted in Python | 30 Comments »

Unable to easy_install psycopg2 on debian

October 29th, 2009 by exhuma.twn

Problem:

$ easy_install psycopg2
Searching for psycopg2
Reading http://pypi.python.org/simple/psycopg2/
Reading http://initd.org/projects/psycopg2
Reading http://initd.org/pub/software/psycopg/
Best match: psycopg2 2.0.13
Downloading http://initd.org/pub/software/psycopg/psycopg2-2.0.13.tar.gz
Processing psycopg2-2.0.13.tar.gz
Running psycopg2-2.0.13/setup.py -q bdist_egg --dist-dir /tmp/easy_install-cHE0C_/psycopg2-2.0.13/egg-dist-tmp-x-CxRS
error: Setup script exited with error: No such file or directory

Solution:

This most likely indicates that you are missing the “libpq” headers:

sudo aptitude install libpq-dev

Installing that package should solve the problem.

Posted in Python | No Comments »

Python startup (command completion & history)

August 21st, 2008 by exhuma.twn

If you want command completion and a history in your python shell, export the PYTHONSTARTUP env var (export PYTHONSTARTUP=$HOME/.pystartup) in your bashrc and create a file ~/.pystartup with the following contents:

import atexit
import os
import readline
import rlcompleter

historyPath = os.path.expanduser("~/.pyhistory")

def save_history(historyPath=historyPath):
    import readline
    readline.write_history_file(historyPath)

if os.path.exists(historyPath):
    readline.read_history_file(historyPath)

readline.parse_and_bind('tab: complete')

atexit.register(save_history)
del os, atexit, readline, rlcompleter, save_history, historyPath

Posted in Python | No Comments »

Vim script (mapping) to generate python getters and setters

December 31st, 2007 by exhuma.twn

Something that I need quite often is to create custom accessors and mutators for class attributes. For example, convert this:

class MyClass(object):
   
   def __init__(self):
      self.has_changes = False
      self.some_attribute = False

into this:

class MyClass(object):
   
   def __init__(self):
      self.__has_changes = False
      self.__some_attribute = False

   def get_some_attribute(self):
      "Accessor: some_attribute"
      return self.__some_attribute

   def set_some_attribute(self, input):
      "Mutator: some_attribute"
      self.__some_attribute = input
      self.__has_changes = True

   some_attribute = property(get_some_attribute, set_some_attribute)

   def get_has_changes(self):
      "Accessor: has_changes"
      return self.__has_changes

   has_changes = property(get_has_changes)

This particular example makes it easy to track whether an instance contains changes, without the need to call myclass.get_some_attribute() or myclass.set_some_attribute(foo). You can simply do myclass.some_attribute = foo and the has_changes attribute will change accordingly.
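
For comparison, here is the same behaviour written with the @property decorator. This is an alternative spelling and not what the Vim mapping further down generates; the final lines are a small usage example.

```python
class MyClass(object):

    def __init__(self):
        self.__has_changes = False
        self.__some_attribute = False

    @property
    def some_attribute(self):
        "Accessor: some_attribute"
        return self.__some_attribute

    @some_attribute.setter
    def some_attribute(self, value):
        "Mutator: some_attribute"
        self.__some_attribute = value
        # Every write through the property marks the instance as changed.
        self.__has_changes = True

    @property
    def has_changes(self):
        "Accessor: has_changes (read-only, there is no setter)"
        return self.__has_changes


myclass = MyClass()
print(myclass.has_changes)
myclass.some_attribute = "foo"
print(myclass.has_changes)
```

Since has_changes has no setter, assigning to it from outside raises an AttributeError, which keeps the flag under the control of the class.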

If your class has many attributes, writing custom accessors and mutators can be tedious. So here’s a small Vim mapping that gets you started. Sure, you may still need to fine-tune some generated code, but the bulk is there.

nmap <F6> yyP<home>widef get_<end>(self):<esc><down><esc>yyP>>I"Accessor: <end>"<esc><down>yyP>>Ireturn self.__<esc>o<esc><down>yyPIdef set_<end>(self, input):<esc><down>yyP>>I"Mutator: <end>"<esc><down>yyP>>Iself.__<end> = input<esc>o<esc><down><home>wveyA = property(get_<esc>pA, set_<esc>pA)<esc>o<esc>

Put this into your vimrc, or (like I do) into the ~/.vim/ftplugin/python.vim file so it only gets loaded for Python files. Then you only need to write the attribute name of the class, put your cursor on that line, be sure to be in normal mode (hit <esc> a few times) 😉 and hit F6.

If you want to change the shortcut, simply change the first parameter to this mapping line.

Posted in Python | No Comments »

Calculate the distance between two GPS-Coordinates

September 17th, 2007 by exhuma.twn

This function uses the Haversine formula to calculate the great-circle distance between two points, which takes the spherical nature of the earth into account.
As the earth is not a perfect sphere, the function approximates it by using an average radius.

from math import sin, cos, radians, sqrt, asin

def lldistance(a, b):
   """
   Calculates the distance between two GPS points (decimal degrees)
   @param a: 2-tuple (latitude, longitude) of point A
   @param b: 2-tuple (latitude, longitude) of point B
   @return: distance in m
   """

   r = 6367442.5             # average earth radius in m
   dLat = radians(a[0]-b[0])
   dLon = radians(a[1]-b[1])
   x = sin(dLat/2) ** 2 + \
       cos(radians(a[0])) * cos(radians(b[0])) *\
       sin(dLon/2) ** 2
   y = 2 * asin(sqrt(x))
   d = r * y

   return d
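
A quick sanity check of the formula, restated compactly so that the snippet runs on its own (the names haversine_m and EARTH_RADIUS_M are made up for this check): two points on the same meridian, one degree of latitude apart, should be r·π/180 metres apart.

```python
from math import asin, cos, pi, radians, sin, sqrt

EARTH_RADIUS_M = 6367442.5  # same average radius as used above


def haversine_m(a, b, r=EARTH_RADIUS_M):
    """Great-circle distance in metres between (lat, lon) tuples in degrees."""
    d_lat = radians(a[0] - b[0])
    d_lon = radians(a[1] - b[1])
    x = (sin(d_lat / 2) ** 2
         + cos(radians(a[0])) * cos(radians(b[0])) * sin(d_lon / 2) ** 2)
    return 2 * r * asin(sqrt(x))


# One degree of latitude along the same meridian.
one_degree = haversine_m((49.0, 6.0), (50.0, 6.0))
expected = EARTH_RADIUS_M * pi / 180
print(one_degree, expected)
```

With the longitudes equal, the cosine term drops out and the result reduces to the radius times the latitude difference in radians, which is what the check compares against.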

Posted in Python | 1 Comment »
