Using the ltree module in PostgreSQL
PostgreSQL offers the very interesting module ltree. It’s used to represent a tree structure within the database. One could represent a tree quite easily as a table which references itself. However, if you ever went along that road, you will know that it becomes very slow with larger trees. In addition, the SQL queries that reference themselves are very unhandy.
If you want to know more, what ltree can do for you, I suggest you go ahead and read the usage examples on the ltree homepage.
Installing ltree
This is the part that eluded me. On the homepage it only states:
cd contrib/ltree make make install make installcheck
Which is perfectly fine if you compiled PostgreSQL from source. But what comes next? Or what do you have to do on distros like Debian/Ubuntu, SuSE, RedHat, or other Linux distros that actually offer ready-made packages for the PostgreSQL contrib modules (ltree is part of those!)?
The answer is simple: Install the PostgreSQL contrib package (postgresql-contrib on debian/ubuntu).
Ok, so you now have the contrib modules installed. But nothing happened to your PostgreSQL installation. Nothing changed. You still cannot use ltree….. what gives?!??!
Don’t Panic!
The explanation is very simple if you understand the basics about PostgreSQL modules. You now have the modules installed, which is in fact – in the simplest case – just a SQL script. You still need to activate/install that module into the database you want to use it in. The solution is even present on the ltree homepage. But they don’t explain what’s happening:
psql ltreetest < /usr/local/pgsql/share/contrib/ltree.sql
This installs the ltree module into the database called “ltreetest”. You can install it in any database you wish. It simply creates a set of functions, the ltree type and operators to use on the ltree type. Once you have this imported, you are ready to use ltree in your database.
I tried this in both Debian Sarge and Ubuntu Dapper Drake. From what I can tell, nothing changed since then. And the annoying thing is, that it seems to me that both systems don’t include the ltreetest.sql script which they import as a next step. So I was again a bit blocked. But using ltree is in fact amazingly easy.
To get you started:
CREATE TABLE test(
id SERIAL NOT NULL PRIMARY KEY,
path ltree,
someothercolumn VARCHAR(20)
);
INSERT INTO test (path, someothercolumn) VALUES ('TOP.Vehicle.Car', 'car');
INSERT INTO test (path, someothercolumn) VALUES ('TOP.Vehicle.Truck', 'truck');
INSERT INTO test (path, someothercolumn) VALUES ('TOP.Vehicle.Car.Trabbi', 'trabbi');
I haven’t tried it yet without the “TOP” node. But my guess is, it should work.
An important note to Windows users
During the PostgreSQL installation, the question about contrib modules comes up. If you select them here, the modules will get installed into the default database template, and as such in all databases you will create! So unless you really want to have the module installed in all database go right ahead and select the module. But in general it’n not a good idea.
On Windows the contrib modules are located somewhere in the installation folder of PostgreSQL whether you select them in the installation wizard or not. So you can still install the modules by manually importing the SQL script in the database you want.
Now that you have a database ready for testing, it’s time to head back to the ltree-page and learn about the usage.
Have fun!
Posted in Coding Voodoo | 2 Comments »

June 28th, 2007 at 3:07 pm
Looks Very interesting…
Gonna have a look on that if that can help me to store populations generated during Genetic Programming.
One thing I am not to sure is the performance issue.
They represent the tree as a whole nested structure. It is actually in spirit quite close to a nested list (python style- assumption: First element of a list or a nested list is always a head node)
tree = ['A', ['B', ['D', 'H', 'I'], ‘E’], ['C','F', 'G']]
I am totally crap with data bases in general, so my question might seem a bit naive, but…
Is it not much quicker to store it in the data base as paired elements of a hash map ?
(python-style dictionary):
tree = { ‘A’:['B','C'], ‘B’:['D','E'], ‘C’:['F','G'], ‘D’:['H','I'] }
Reconstruction of a nested structure would be something done separately, and this way, you can not only store trees but graphs and it should not get too slow to store larger trees.
June 29th, 2007 at 8:45 pm
What you see here in the postgres queries is probably not how it’s stored internally. The simple text-content separating the nodes with dots, is simply a convenience notation.
Under the hood it uses some C-code which I did not look at. I don’t like reading C
But I suspect it’s a simple linked list. The reason “ltree” exists is simply because querying a tree using database tables is painfully slow. The ltree module aleviates this.
How it’s represented internally is not (and should not be) the concern of the database designer.