Formatting PostgreSQL CSV logs

April 24th, 2013 by exhuma.twn

The problem

Today I needed to keep an eye on PostgreSQL logs. Luckily, I decided upon installation to log everything using the “csvlog” format. But there’s a small catch, depending how you read that log. This catch is newline characters in database queries.

This has nothing to do with PostgreSQL directly. In fact, it does the right thing, in that it quotes all required fields. Now, a quoted field can contain a newline character. But if you read the file on a line-by-line basis (using methods like file_handle.readline, this will case problems. No matter what programming language you use, if you call readline, it will read up to the next newline character and return that. So, let’s say you have the following CSV record:

2013-03-21 10:41:19.651 CET,"ipbase","ipbase_test",13426,"[local]",514ad5bf.3472,139,"SELECT",2013-03-21 10:41:19 CET,2/5828,3741,LOG,00000,"duration: 0.404 ms  statement: SELECT\n                 p2.device,\n                 p2.scope,\n                 p2.label,\n                 p2.direction\n             FROM port p1\n             INNER JOIN port p2 USING (link)\n             WHERE p1.device='E'\n             AND p1.scope='provisioned'\n             AND p1.label='Eg'\n             AND (p1.device = p2.device\n                 AND p1.scope = p2.scope\n                 AND p1.label=p2.label) = false",,,,,,,,,""

If you read this naïvely with “readline” calls, you will get the following:

 1:2013-03-21 10:41:19.651 CET,"ipbase","ipbase_test",13426,"[local]",514ad5bf.3472,139,"SELECT",2013-03-21 10:41:19 CET,2/5828,3741,LOG,00000,"duration: 0.404 ms  statement: SELECT
 2:                p2.device,
 3:                p2.scope,
 4:                p2.label,
 5:                p2.direction
 6:            FROM port p1
 7:            INNER JOIN port p2 USING (link)
 8:            WHERE p1.device='E'
 9:            AND p1.scope='provisioned'
10:            AND p1.label='Eg'
11:            AND (p1.device = p2.device
12:                AND p1.scope = p2.scope
13:                AND p1.label=p2.label) = false",,,,,,,,,""

Now, this is really annoying if you want to parse the file properly.

The solution

Read the file byte-by-byte, and feed a line to the CSV parser only if you hit a newline outside of quoted text. Obviously you should consider the newline style (\n, \r or \r\n) and the proper quote and escape characters when doing this.

What about Python?

It turns out, Python’s csv module suffers from this problem. The builtin CSV module reads files line-by-line. However, it is possible to override the default behavior.

For my own purpose, I wrote a simple script, reading from the postgres log until interrupted.

You are free to use this for your own purpose, modify or extend it as you like.

You can find it here: exhuma/postgresql-logmon

Posted in Python | No Comments »

Recovering from a corrupted git repo

February 23rd, 2013 by exhuma.twn

I do a lot of work on the go. Offline. Sometimes it takes a long time to push changes to a remote repository. As always, Murphy’s law applies, and the one repo that explodes into my face is the one with ten days worth of work in it.

While working, suddenly my laptop hang. Music looping. No mouse movement. Nothing. The only possible solution was to do a cold-reboot. I was not worried. Everything was saved, and I only changed a few lines and can easily recover if something went awry. So I rebooted.

Once back in the system, I immediately wanted to do a git status and git diff. Git spat back the following error message:

jukebox$ git st
fatal: object 9bd41c2f96f295924af92a9da175cb3686f13359 is corrupted

My Laptop had shown some strange and erratic behaviour over the last few days already. I already left a memtest running for about 24 hours earlier this week without errors. The only possible explanation left was the hard-disk.

Fun times ahead! 10 days of work at risk… 10 days of important changes! Sweat building up my forehead. Bloody sweat!

I trust my tools to keep my code safe. I trust git. I trust vim. I do microscopic commits, and I knew my current uncommitted changes only involved a few lines. So maybe only the last commit got corrupted? Let’s see…

Read the rest of this entry »

Posted in Coding Voodoo | No Comments »

Adding unicode glyphs to docutils (ReStructured Text, Sphinx, …) documents.

February 13th, 2013 by exhuma.twn

docutils supports adding a wide range of unicode glyphs into documents, while still keeping the source document readable and plain ASCII.

The character mappings are available in include files, and have to be included into the document before being useful.

For example, adding a horizontal arrow is done by including 'isoamsa.txt' which provides (amongst others) the replacement |hArr|. An example document might be:

.. include:: <isoamsa.txt>

This is some text containing an arrow symbol: |hArr|

Hav a look at the extensions available in the official docutils distribution for more information.

Posted in Coding Voodoo | No Comments »

Getting ENUM DataType to work with Doctrine 2 in Zend Framework 2

January 24th, 2013 by wickeddoc

When using Doctrine 2 with a MySQL Database which has tables with ENUM datatypes, you might run into the following error message:

‘Unknown database type enum requested, Doctrine\DBAL\Platforms\MySqlPlatform may not support it.’

This is because Doctrine 2 doesn’t support the ENUM DataType natively as you can read here: Doctrine Cookbook.

In their Doctrine Cookbook they give a solution how you can resolve this by mapping the ENUM to a STRING datatype. But if you’re using Zend Framework 2 you’ll probably run into the same question as I have: where do I put this stuff for it to work?

This might not be the perfect solution, but it worked for me.

Adding the required code to the onBootstrap() method of the Module.php file of the “default” module did the job.

class Module
{
    public function onBootstrap(MvcEvent $e)
    {
        ...
        $em = $e->getApplication()->getServiceManager()->get('Doctrine\ORM\EntityManager');
        $platform = $em->getConnection()->getDatabasePlatform();
        $platform->registerDoctrineTypeMapping('enum', 'string');
        ...
       
    }

}

Posted in Doctrine, PHP, Zend Framework | No Comments »

Format JSON documents on the command line

August 31st, 2012 by exhuma.twn

I tend to test my REST services using curl. Most of the time the JSON responses are not pretty-printed, which makes testing larger documents a pain. Copy pasting the results in online JSON formatters was really getting annoying. So I wrote a few lines of python to do that job for me. Note that this is a two-minute-hack and by no means very generic. It works for me, and with curl. The handling for Headers is especially eerie :P But maybe you will still find it useful.

Without further ado, here it is: https://github.com/exhuma/braindump/tree/master/jsonformat

Installation

pip install jsonf

or

easy_install jsonf

Screenshot

Posted in Uncategorized | No Comments »

Tunnel your browser connections (i.e. your web traffic) through an SSH tunnel.

August 24th, 2012 by exhuma.twn

I actually did not believe it’s this simple.

Creating the tunnel is as easy as typing:

ssh -D 1080 user@remote_host

Which can be improved with

ssh -fND 1080 user@remote_host

See the man page on the details of the extra options.

This will open up port 1080 on your local machine, providing a SOCKS proxy (SOCKS5 if I’m right). You can then specify this in any application supporting SOCKS proxies. This includes Firefox and Chrome. With Firefox it’s straight-forward. You can find it in the usual proxy page in the settings.

For chrome it’s a bit more tricky. While you can specify a SOCKS proxy, it seems to ignore it. If you want to enable your tunnel, you have to run chrome with the following command-line flag:

chromium-browser --proxy-server="socks5://localhost:1080"

(or use chrome. Whatever rocks your boat).

Once this is set up, what will happen is that your application/browser will send all requests to your locally running SSH instance. This in turn will forward it to the remote host, where the request will be sent out on the web. The response takes the inverse direction. As stated by linode, this is great if you’re on an untrustworthy network!

Sources:

Posted in Linux, Techno Voodoo | No Comments »

More convenient development with the closure library

August 23rd, 2012 by exhuma.twn

While cleaning up my build process using ‘closure’, I stumbled across plovr. After using it for only 20 minutes, I am convinced that this should be in the toolbox of *everyone* developing with the closure library!

While the documentation is still a bit sparse, you can have it set up in no time! There’s no need to regurgitate a basic setup example in this post. Everything necessary is readily availble over at http://plovr.com/!

Posted in Coding Voodoo, JavaScript | No Comments »

Update to bash completion for fabric tasks.

August 20th, 2012 by exhuma.twn

In a previous post I’d shown a way to enable bash-completion for fabric tasks.

With later revisions of fabric, you can now organise tasks in submodules. The new update takes this into account.

Posted in Uncategorized | No Comments »

Specifying “externs” using closurebuilder.py

August 13th, 2012 by exhuma.twn

The last hour or so I have been struggling with an “extern” definition using closurebuilder.py. When running the compiler manually, you specify externs like this:

  java -jar compiler.jar [...] --externs myexterns.js

So, naturally, my command looked like this:

  closurebuilder.py [...] --compiler_flags="--externs myexterns.js"

However, this resulted in the following error:

  "--externs myexterns.js" is not a valid option

My next step was to verify this by running the compiler manually. And it worked. So, without a doubt, it had something to do with closurebuilder.py. Suspecting a bug, I updated to the latest SVN revision. No luck. Still getting the same error.

So I dug into the code of closurebuilder, and the problem was obvious:

Each additional command-line flag --compiler_flags gets appended to a list. This list is then used as-is in the subprocess call. The subprocess API however expects each argument as a new item in the list. And technically --externs myexterns.js are two arguments. The first is --externs, the second is myexterns.js. The usual --compiler_flags="--compilation_level=ADVANCED_OPTIMIZATIONS" works because it is considered as one argument (it does not contain white-space).

So to get it working properly, you will have to modify the above line to the following:

  closurebuilder.py [...] --compiler_flags="--externs" --compiler_flags="myexterns.js"

Whether it makes sense to fix this inside closurebuilder.py is debatable. The arguments need to be passed as a list for security reasons (to be able to do proper shell escaping). Personally I don't care that I have to specify my compiler flags in this wonky fashion. It's in a makefile afterall...

Posted in Uncategorized | No Comments »

Clarification about MANIFEST changes in the packaging guide.

May 27th, 2012 by exhuma.twn

Just today I needed to remove files from a package. And I realised that I forgot to mention that removing files from the MANIFEST.in files has a small gotcha… Here’s the small note I added to the original article:

When running the setup script, it will create a folder with the extension .egg-info. If you make changes to your MANIFEST.in file, you should delete this folder. It contains a file named SOURCES.txt which contains all the file in the package. If you remove files from your manifest, the will only disappear when that file is changes as well. So the easiest way is to delete the egg-info folder and let the setup script re-create it!

Posted in Uncategorized | No Comments »

« Previous Entries

Pages

Recent Posts

Categories

Links


Archives

Meta